Maybe you could post your nutch-default.xml and nutch-site.xml. That
"SEVERE bad conf file" shouldn't happen.
Plus it looks like you're including both protocol-http and
protocol-httpclient.
I believe they are mutually exclusive. Just use one or the other. I think
someone posted here recently saying they had problems with httpclient.
I'm using protocol-http myself.
Howie
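One way to narrow down the "top-level element not <nutch-conf>" message is to print the top-level element of each conf file the log shows being parsed (nutch-default.xml, crawl-tool.xml, nutch-site.xml). The following is only a rough sketch using Python's standard XML parser, not anything Nutch ships. The helper names root_tag and check_conf are made up for illustration, and the expected element name is taken from the SEVERE line in the log.

```python
import sys
import xml.etree.ElementTree as ET

# Expected top-level element, taken from the SEVERE message in the log.
EXPECTED_ROOT = "nutch-conf"


def root_tag(path):
    """Return the tag name of the top-level element of an XML file."""
    return ET.parse(path).getroot().tag


def check_conf(paths, expected=EXPECTED_ROOT):
    """Map each path to True if its top-level element equals `expected`."""
    return {p: root_tag(p) == expected for p in paths}


if __name__ == "__main__":
    # Hypothetical usage: python check_conf.py conf/nutch-site.xml conf/crawl-tool.xml
    for path, ok in check_conf(sys.argv[1:]).items():
        print(("ok   " if ok else "BAD  ") + path)
```

A file that is not well-formed XML at all will raise a ParseError instead of being reported as BAD, which is itself a useful hint.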
060307 141033 parsing file:/home/hdiwan/nutch-0.7.1/conf/nutch-default.xml
060307 141033 parsing file:/home/hdiwan/nutch-0.7.1/conf/crawl-tool.xml
060307 141033 parsing file:/home/hdiwan/nutch-0.7.1/conf/nutch-site.xml
060307 141033 SEVERE bad conf file: top-level element not <nutch-conf>
060307 141033 No FS indicated, using default:local
060307 141033 crawl started in: ../SpectraSearch/crawl/
060307 141033 rootUrlFile = ../SpectraSearch/urls
060307 141033 threads = 3
060307 141033 depth = 2
060307 141033 Created webdb at LocalFS,/home/hdiwan/SpectraSearch/crawl/db
060307 141033 Starting URL processing
060307 141033 Plugins: looking in: /home/hdiwan/nutch-0.7.1/build/plugins
060307 141033 not including: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-file
060307 141033 not including: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-ftp
060307 141033 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-http/plugin.xml
060307 141033 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
060307 141033 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-httpclient/plugin.xml
060307 141034 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http
060307 141034 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/parse-html/plugin.xml
060307 141034 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/parse-js/plugin.xml
060307 141034 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.js.JSParseFilter
060307 141034 impl: point=org.apache.nutch.parse.HtmlParseFilter class=org.apache.nutch.parse.js.JSParseFilter
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/parse-text/plugin.xml
060307 141034 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-pdf
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-rss
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-msword
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-ext
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/index-basic/plugin.xml
060307 141034 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/index-more/plugin.xml
060307 141034 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.more.MoreIndexingFilter
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/query-basic/plugin.xml
060307 141034 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/query-more/plugin.xml
060307 141034 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.more.TypeQueryFilter
060307 141034 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.more.DateQueryFilter
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/query-site/plugin.xml
060307 141034 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/query-url/plugin.xml
060307 141034 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/urlfilter-regex/plugin.xml
060307 141034 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/urlfilter-prefix
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/creativecommons
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/language-identifier
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/clustering-carrot2
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/ontology
060307 141034 SEVERE org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.protocol.Protocol does not exist.
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
    at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
    at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
    at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
Caused by: java.lang.RuntimeException: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.protocol.Protocol does not exist.
    at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:147)
    at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:40)
    ... 4 more
Caused by: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.protocol.Protocol does not exist.
    at org.apache.nutch.plugin.PluginRepository.installExtensions(PluginRepository.java:78)
    at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:61)
    at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:144)
    ... 5 more
That's from my log. A preliminary investigation follows, with steps and
results pasted:
1. Check the nutch-0.7.1 jar file for the relevant class:
% jar tvf ./nutch-0.7.1.jar | grep Protocol
server: 2:14pm % jar tvf ./nutch-0.7.1.jar | grep Protocol.class
756 Tue Mar 07 13:17:04 PST 2006 org/apache/nutch/mapReduce/InterTrackerProtocol.class
491 Tue Mar 07 13:17:04 PST 2006 org/apache/nutch/mapReduce/JobSubmissionProtocol.class
324 Tue Mar 07 13:17:04 PST 2006 org/apache/nutch/mapReduce/MapOutputProtocol.class
409 Tue Mar 07 13:17:04 PST 2006 org/apache/nutch/mapReduce/TaskUmbilicalProtocol.class
517 Tue Mar 07 13:17:04 PST 2006 org/apache/nutch/protocol/Protocol.class
469 Tue Mar 07 13:17:04 PST 2006 org/apache/nutch/searcher/DistributedSearch$Protocol.class
So it indeed exists.
2. Perhaps the class is missing from the source tree, so look for it there:
find ./src/java -name 'Protocol.java' -print
server: 2:14pm % find ./src -name 'Protocol.java' -print [~/nutch-0.7.1]
./src/java/org/apache/nutch/protocol/Protocol.java
Now I'm stumped... Help!
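On the question of whether both protocol-http and protocol-httpclient are being loaded, the "parsing: .../plugin.xml" lines in a crawl log can be scanned mechanically. The sketch below assumes the log is read from disk with each entry on a single line (not wrapped as in this mail). The names loaded_plugins and protocol_plugins are hypothetical helpers, and the sample lines are copied from the log above.

```python
import re

# Matches entries such as:
#   060307 141033 parsing: /path/to/build/plugins/protocol-http/plugin.xml
# and captures the plugin directory name ("protocol-http").
PARSING_RE = re.compile(r"parsing: \S*/plugins/([^/\s]+)/plugin\.xml")


def loaded_plugins(log_text):
    """Return plugin directory names from 'parsing: .../plugin.xml' lines, in log order."""
    return PARSING_RE.findall(log_text)


def protocol_plugins(log_text):
    """Return only the loaded plugins whose name starts with 'protocol-'."""
    return [name for name in loaded_plugins(log_text) if name.startswith("protocol-")]


# A few lines from the log in this thread, joined onto single lines.
sample = """\
060307 141033 not including: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-ftp
060307 141033 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-http/plugin.xml
060307 141033 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-httpclient/plugin.xml
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/parse-html/plugin.xml
"""

print(protocol_plugins(sample))  # prints ['protocol-http', 'protocol-httpclient']
```

If more than one protocol-* name comes back, the plugin list in the conf files is the place to look.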
--
Cheers,
Hasan Diwan <[EMAIL PROTECTED]>