I can get the 'crawl' to run without a 'SEVERE' error by altering my conf/nutch-site.xml to read:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="nutch-conf.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<nutch-conf>

  <property>
    <name>plugin.includes</name>
    <value>nutch-extensionpoints|protocol-file|protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
  </property>

  <property>
    <name>file.content.limit</name>
    <value>-1</value>
  </property>

</nutch-conf>

The file originally had:

  <value>protocol-file|urlfilter-regex|parse-(xml|text|html|js|pdf)|index-basic|query-(basic|site|url)</value>

So the key appears to be the string 'nutch-extensionpoints|' prefixing the <value>.

Hmmmm... now to understand why this makes a difference and to see if I can get Tomcat to use my brand new file-system crawl...

Light at the end of the tunnel... :-) and I hope it's not a train...

-Stephen

----

On 12/18/05, Stephen Fitch <[EMAIL PROTECTED]> wrote:
>
> I tried J2SE v1.4.2_10 and 5.0 u6... same issue..
>
> I threw away my nutch directory and un-tarballed a new one... same
> issue...
>
> I should add that this issue is on a Windows box with CYGWIN/bash and
> the following env variables
>
> [EMAIL PROTECTED] /cygdrive/h/p/nutch/nutch-0.7.1
> $ env | grep NUTCH
> NUTCH_HOME=/cygdrive/h/p/nutch/nutch-0.7.1
> NUTCH_CONF_DIR=/cygdrive/h/p/nutch/nutch-0.7.1/conf
>
> [EMAIL PROTECTED] /cygdrive/h/p/nutch/nutch-0.7.1
> $ env | grep CLASSPATH
> CLASSPATH=/cygdrive/h/p/nutch/nutch-0.7.1/lib
>
> [EMAIL PROTECTED] /cygdrive/h/p/nutch/nutch-0.7.1
> $ env | grep JAVA
> QTJAVA="D:\Program Files\Java\jre1.5.0\lib\ext\QTJava.zip"
> JAVA_HOME=/cygdrive/e/Program Files/Java/jdk1.4.2_10
>
> I see...
>
> 051218 125130 parsing: H:\p\nutch\nutch-0.7.1\plugins\urlfilter-regex\plugin.xml
> 051218 125130 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
> 051218 125130 SEVERE org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
> java.lang.ExceptionInInitializerError
>         at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
>         at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
>         at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
>         at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
> Caused by: java.lang.RuntimeException: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
>         at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:147)
>         at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:40)
>         ... 4 more
> Caused by: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
>         at org.apache.nutch.plugin.PluginRepository.installExtensions(PluginRepository.java:78)
>         at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:61)
>         at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:144)
>         ... 5 more
>
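A possible reading of why the 'nutch-extensionpoints|' prefix matters, as a rough sketch only. It assumes (not confirmed anywhere in this thread) that plugin.includes is treated as a regular expression matched against each plugin's id, and that the nutch-extensionpoints plugin is the one that declares extension points such as org.apache.nutch.indexer.IndexingFilter. The plugin id list and the included() helper are made up for illustration and are not Nutch code; Python is used only to try the two <value> strings side by side.

```python
import re

# Illustrative plugin ids only; a real plugins/ directory holds more.
PLUGIN_IDS = [
    "nutch-extensionpoints",
    "protocol-file",
    "protocol-http",
    "urlfilter-regex",
    "parse-text",
    "parse-html",
    "index-basic",
    "query-basic",
]

# The two <value> strings quoted in the message above.
ORIGINAL = ("protocol-file|urlfilter-regex|parse-(xml|text|html|js|pdf)"
            "|index-basic|query-(basic|site|url)")
WORKING = ("nutch-extensionpoints|protocol-file|protocol-http|urlfilter-regex"
           "|parse-(text|html)|index-basic|query-(basic|site|url)")


def included(pattern, plugin_ids):
    """Return the ids the pattern matches in full (assumed matching rule)."""
    regex = re.compile(pattern)
    return [pid for pid in plugin_ids if regex.fullmatch(pid)]


if __name__ == "__main__":
    for label, pattern in (("original", ORIGINAL), ("working", WORKING)):
        print(label, included(pattern, PLUGIN_IDS))
```

Under those assumptions the original value never selects nutch-extensionpoints, so index-basic would be loaded with no IndexingFilter extension point to attach to, which would fit the "does not exist" message in the trace.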
