I'm seeing the same issue... I'll share an answer if/when I find it...
Stephen

On 12/17/05, Alfred Ostermeier <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> I have just installed nutch 0.7.1. I'm running it on Win XP and cygwin.
> Crawling an http-URL worked well. But: Crawling an file-URL failed. I did
> configure nutch exactly as described in
> http://wiki.apache.org/nutch/FAQ#head-c721b23b43b15885f5ea7d8da62c1c40a37878e6.
> That means, I activated the "protocol-file" plugin. Below is the content
> of the log-file with the errors.
>
> The only Google hit for "IndexingFilter does not exist"
> ( http://www.mail-archive.com/[email protected]/msg00878.html )
> suspected the CLASSPATH among other things. To which folders or jar
> file(s) has the CLASSPATH to be set - if yes? Currently mine is set to the
> current directory. I unfortunately couldn't find the jar-file with the
> class IndexingFilter.
>
> Regards,
> Alfred
>
> ----------------------------------------------------------------------------
>
> run java in c:\j2sdk1.4.2_04\jre
> 051217 235048 parsing file:/C:/nutch-0.7.1/conf/nutch-default.xml
> 051217 235049 parsing file:/C:/nutch-0.7.1/conf/crawl-tool.xml
> 051217 235049 parsing file:/C:/nutch-0.7.1/conf/nutch-site.xml
> 051217 235049 No FS indicated, using default:local
> 051217 235049 crawl started in: crawl.test
> 051217 235049 rootUrlFile = urls
> 051217 235049 threads = 10
> 051217 235049 depth = 3
> 051217 235049 Created webdb at LocalFS,C:\nutch-0.7.1\crawl.test\db
> 051217 235049 Starting URL processing
> 051217 235049 Plugins: looking in: C:\nutch-0.7.1\plugins
> 051217 235049 not including: C:\nutch-0.7.1\plugins\clustering-carrot2
> 051217 235049 not including: C:\nutch-0.7.1\plugins\creativecommons
> 051217 235049 parsing: C:\nutch-0.7.1\plugins\index-basic\plugin.xml
> 051217 235049 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
> 051217 235049 not including: C:\nutch-0.7.1\plugins\index-more
> 051217 235049 not including: C:\nutch-0.7.1\plugins\language-identifier
> 051217 235049 not including: C:\nutch-0.7.1\plugins\nutch-extensionpoints
> 051217 235049 not including: C:\nutch-0.7.1\plugins\ontology
> 051217 235049 not including: C:\nutch-0.7.1\plugins\parse-ext
> 051217 235049 parsing: C:\nutch-0.7.1\plugins\parse-html\plugin.xml
> 051217 235049 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
> 051217 235049 not including: C:\nutch-0.7.1\plugins\parse-js
> 051217 235049 not including: C:\nutch-0.7.1\plugins\parse-msword
> 051217 235049 not including: C:\nutch-0.7.1\plugins\parse-pdf
> 051217 235049 not including: C:\nutch-0.7.1\plugins\parse-rss
> 051217 235049 parsing: C:\nutch-0.7.1\plugins\parse-text\plugin.xml
> 051217 235049 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
> 051217 235049 parsing: C:\nutch-0.7.1\plugins\protocol-file\plugin.xml
> 051217 235049 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.file.File
> 051217 235049 not including: C:\nutch-0.7.1\plugins\protocol-ftp
> 051217 235049 parsing: C:\nutch-0.7.1\plugins\protocol-http\plugin.xml
> 051217 235049 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
> 051217 235049 not including: C:\nutch-0.7.1\plugins\protocol-httpclient
> 051217 235049 parsing: C:\nutch-0.7.1\plugins\query-basic\plugin.xml
> 051217 235049 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter
> 051217 235049 not including: C:\nutch-0.7.1\plugins\query-more
> 051217 235049 parsing: C:\nutch-0.7.1\plugins\query-site\plugin.xml
> 051217 235049 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
> 051217 235049 parsing: C:\nutch-0.7.1\plugins\query-url\plugin.xml
> 051217 235049 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
> 051217 235049 not including: C:\nutch-0.7.1\plugins\urlfilter-prefix
> 051217 235049 not including: C:\nutch-0.7.1\plugins\urlfilter-regex
> 051217 235049 SEVERE org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
> java.lang.ExceptionInInitializerError
>         at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
>         at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
>         at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
>         at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
> Caused by: java.lang.RuntimeException: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
>         at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:147)
>         at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:40)
>         ... 4 more
> Caused by: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
>         at org.apache.nutch.plugin.PluginRepository.installExtensions(PluginRepository.java:78)
>         at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:61)
>         at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:144)
>         ... 5 more
> Exception in thread "main"
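
For anyone comparing their own log against the one above, here is a minimal Python sketch (not part of Nutch) that summarises which plugin directories a log reports as loaded and which as skipped. The function name `scan_plugins` and the three sample lines are illustrative only, and the regexes assume the "parsing:" / "not including:" message format seen in the log above. In that log, nutch-extensionpoints is among the skipped directories, so it may be worth checking the plugin.includes setting, given that the error complains about a missing extension point. That connection is a guess from the log, not a confirmed fix.

```python
import re

# Assumed message formats, taken from the log quoted above.
# Plugin directory names are assumed to contain only word chars, dots, hyphens.
_SKIPPED = re.compile(r"not including:\s*\S*[\\/]plugins[\\/]([\w.-]+)")
_LOADED = re.compile(r"parsing:\s*\S*[\\/]plugins[\\/]([\w.-]+)[\\/]plugin\.xml")


def scan_plugins(log_text):
    """Return (loaded, skipped) sets of plugin directory names found in log_text."""
    loaded = set(_LOADED.findall(log_text))
    skipped = set(_SKIPPED.findall(log_text))
    return loaded, skipped


# Three lines copied from the log above, used here as sample input.
sample = r"""
051217 235049 not including: C:\nutch-0.7.1\plugins\nutch-extensionpoints
051217 235049 parsing: C:\nutch-0.7.1\plugins\index-basic\plugin.xml
051217 235049 not including: C:\nutch-0.7.1\plugins\urlfilter-regex
"""

loaded, skipped = scan_plugins(sample)
print("loaded: ", sorted(loaded))
print("skipped:", sorted(skipped))
```

Running it over a full log gives a quick list to compare against whatever plugin.includes is set to in conf/nutch-site.xml or conf/crawl-tool.xml.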
