I placed the URLs for a crawl in urls per the tutorial [1]. Then:
% ./bin/nutch crawl urls -dir crawl.test -depth 2
... gives me the following log:
060213 131631 parsing file:/home/hdiwan/nutch-0.7.1/conf/nutch-default.xml
060213 131631 parsing file:/home/hdiwan/nutch-0.7.1/conf/crawl-tool.xml
060213 131631 parsing file:/home/hdiwan/nutch-0.7.1/conf/nutch-site.xml
060213 131631 No FS indicated, using default:local
060213 131631 crawl started in: crawl.test
060213 131631 rootUrlFile = urls
060213 131631 threads = 10
060213 131631 depth = 2
060213 131632 Created webdb at LocalFS,/home/hdiwan/nutch-0.7.1/crawl.test/db
060213 131632 Starting URL processing
060213 131632 Plugins: looking in: /home/hdiwan/nutch-0.7.1/build/plugins
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-file
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-ftp
060213 131632 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-http/plugin.xml
060213 131632 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-httpclient
060213 131632 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/parse-html/plugin.xml
060213 131632 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-js
060213 131632 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/parse-text/plugin.xml
060213 131632 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-pdf
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-rss
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-msword
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-ext
060213 131632 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/index-basic/plugin.xml
060213 131632 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/index-more
060213 131632 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/query-basic/plugin.xml
060213 131632 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
060213 131632 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/query-url/plugin.xml
060213 131632 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
060213 131632 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/urlfilter-regex/plugin.xml
060213 131632 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/urlfilter-prefix
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/creativecommons
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/language-identifier
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/clustering-carrot2
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/ontology
060213 131632 SEVERE org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.protocol.Protocol does not exist.
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
    at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
    at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
    at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
Caused by: java.lang.RuntimeException: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.protocol.Protocol does not exist.
    at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:147)
    at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:40)
    ... 4 more
Caused by: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.protocol.Protocol does not exist.
    at org.apache.nutch.plugin.PluginRepository.installExtensions(PluginRepository.java:78)
    at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:61)
    at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:144)
    ... 5 more
... but org/apache/nutch/protocol/Protocol.java does exist, as does
org/apache/nutch/protocol/Protocol.class, and jar tvf nutch-0.7.1.jar
shows that the jar holds the class file. I could investigate further,
but would like some pointers as to where I should be looking first. Thanks!
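
For anyone who wants to repeat the jar check without the jar tool, here is a minimal Python sketch. It assumes only that a jar can be read as an ordinary zip archive; jar_contains is a hypothetical helper name, and the jar path and entry name are passed in by the caller. It confirms only that the class file is packaged, nothing about how the plugin system resolves extension points.

```python
import sys
import zipfile


def jar_contains(jar_path, entry_name):
    """Return True if the jar (read as a zip archive) has an entry with this exact name."""
    with zipfile.ZipFile(jar_path) as jar:
        return entry_name in jar.namelist()


if __name__ == "__main__":
    # Illustrative usage only; both arguments are supplied on the command line, e.g.
    #   python jar_check.py nutch-0.7.1.jar org/apache/nutch/protocol/Protocol.class
    if len(sys.argv) == 3:
        print(jar_contains(sys.argv[1], sys.argv[2]))
    else:
        print("usage: jar_check.py JAR_PATH ENTRY_NAME")
```

Entry names inside a jar use forward slashes and include the .class suffix, so the package path has to be written the same way it appears in the jar tvf listing.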
--
Cheers,
Hasan Diwan <[EMAIL PROTECTED]>
1. http://lucene.apache.org/nutch/tutorial.html