Hi,

Wanting to index/search my local file-system, I've followed the directions at:
http://www.searchmorph.com/wp/2005/02/11/getting-nutch-to-search-your-filesystem/

I see:

$ bin/nutch crawl urls -dir ../crawltest1 -depth 2
051217 150747 parsing file:/H:/p/nutch/nutch-0.7.1/conf/nutch-default.xml
051217 150748 parsing file:/H:/p/nutch/nutch-0.7.1/conf/crawl-tool.xml
051217 150748 parsing file:/H:/p/nutch/nutch-0.7.1/conf/nutch-site.xml
051217 150748 No FS indicated, using default:local
051217 150748 crawl started in: ../crawltest1
051217 150748 rootUrlFile = urls
051217 150748 threads = 10
051217 150748 depth = 2
051217 150749 Created webdb at LocalFS,H:\p\nutch\crawltest1\db
051217 150750 Starting URL processing
051217 150750 Plugins: looking in: H:\p\nutch\nutch-0.7.1\plugins
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\clustering-carrot2
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\creativecommons
051217 150750 parsing: H:\p\nutch\nutch-0.7.1\plugins\index-basic\plugin.xml
051217 150750 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\index-more
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\language-identifier
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\nutch-extensionpoints
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\ontology
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\parse-ext
051217 150750 parsing: H:\p\nutch\nutch-0.7.1\plugins\parse-html\plugin.xml
051217 150750 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\parse-js
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\parse-msword
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\parse-pdf
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\parse-rss
051217 150750 parsing: H:\p\nutch\nutch-0.7.1\plugins\parse-text\plugin.xml
051217 150750 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
051217 150750 parsing: H:\p\nutch\nutch-0.7.1\plugins\protocol-file\plugin.xml
051217 150751 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.file.File
051217 150751 not including: H:\p\nutch\nutch-0.7.1\plugins\protocol-ftp
051217 150751 parsing: H:\p\nutch\nutch-0.7.1\plugins\protocol-http\plugin.xml
051217 150751 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
051217 150751 not including: H:\p\nutch\nutch-0.7.1\plugins\protocol-httpclient
051217 150751 parsing: H:\p\nutch\nutch-0.7.1\plugins\query-basic\plugin.xml
051217 150751 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter
051217 150751 not including: H:\p\nutch\nutch-0.7.1\plugins\query-more
051217 150751 parsing: H:\p\nutch\nutch-0.7.1\plugins\query-site\plugin.xml
051217 150751 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
051217 150751 parsing: H:\p\nutch\nutch-0.7.1\plugins\query-url\plugin.xml
051217 150751 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
051217 150751 not including: H:\p\nutch\nutch-0.7.1\plugins\urlfilter-prefix
051217 150751 not including: H:\p\nutch\nutch-0.7.1\plugins\urlfilter-regex
051217 150751 SEVERE org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
        at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
        at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
Caused by: java.lang.RuntimeException: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
        at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:147)
        at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:40)
        ... 4 more
Caused by: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
        at org.apache.nutch.plugin.PluginRepository.installExtensions(PluginRepository.java:78)
        at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:61)
        at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:144)
        ... 5 more

[EMAIL PROTECTED] /cygdrive/h/p/nutch/nutch-0.7.1

Any ideas on the:

051217 150751 SEVERE org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.

... issue would be welcomed...

BTW I have:

$ more nutch-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="nutch-conf.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<nutch-conf>
<property>
  <name>plugin.includes</name>
  <value>protocol-file|protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>
</nutch-conf>

AND

$ more crawl-urlfilter.txt
-^(file|ftp|mailto|https):
It excludes and filters out file URLs. Make it look like this:
-^(ftp|mailto|https):
Near the bottom there needs to be an entry like this:
+.*

in my conf directory... I have not set any environment variables, apart from JAVA_HOME.

Any help would be gratefully received.

Thanks,
Stephen
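
As a side check on the plugin.includes value quoted above, here is a minimal Python sketch that tests each plugin directory name from the log against that value. It assumes the value is treated as a regular expression that must match the whole plugin name; that is an assumption about Nutch, not something taken from its source, although it is consistent with the log (protocol-http is parsed while protocol-httpclient is skipped). The helper name split_plugins is hypothetical and exists only for this illustration.

```python
import re

# Value of plugin.includes, copied from the nutch-site.xml shown in the post.
PLUGIN_INCLUDES = (
    r"protocol-file|protocol-http|parse-(text|html)|index-basic"
    r"|query-(basic|site|url)"
)

# Plugin directory names exactly as they appear in the crawl log.
PLUGIN_DIRS = [
    "clustering-carrot2", "creativecommons", "index-basic", "index-more",
    "language-identifier", "nutch-extensionpoints", "ontology", "parse-ext",
    "parse-html", "parse-js", "parse-msword", "parse-pdf", "parse-rss",
    "parse-text", "protocol-file", "protocol-ftp", "protocol-http",
    "protocol-httpclient", "query-basic", "query-more", "query-site",
    "query-url", "urlfilter-prefix", "urlfilter-regex",
]


def split_plugins(includes, names):
    """Hypothetical helper: partition names by whole-name regex match.

    ASSUMPTION: the include pattern has to match the entire plugin name,
    which is why re.fullmatch is used rather than re.search.
    """
    pattern = re.compile(includes)
    included = [n for n in names if pattern.fullmatch(n)]
    skipped = [n for n in names if not pattern.fullmatch(n)]
    return included, skipped


included, skipped = split_plugins(PLUGIN_INCLUDES, PLUGIN_DIRS)
print("included:", included)
print("skipped: ", skipped)
```

Under that assumption the included list is the same eight plugins the log reports as "parsing:", and nutch-extensionpoints lands in the skipped list, matching its "not including:" line. Whether that skipped plugin is connected to the "extension point ... does not exist" message is only a guess; the log does not confirm it.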
