I can get the 'crawl' to run without a 'SEVERE' error by altering my
conf/nutch-site.xml to read:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="nutch-conf.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<nutch-conf>
<property>
<name>plugin.includes</name>
<value>nutch-extensionpoints|protocol-file|protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>
<property>
<name>file.content.limit</name>
<value>-1</value>
</property>
</nutch-conf>

The file originally had:

<value>protocol-file|urlfilter-regex|parse-(xml|text|html|js|pdf)|index-basic|query-(basic|site|url)</value>

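So the key appears to be the string 'nutch-extensionpoints|' prefixing the
<value>. (The new value also adds protocol-http and drops the xml, js and
pdf parsers, but the prefix looks like the part that matters.)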
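
A rough way to see what the prefix might change, as a sketch only. It
assumes that Nutch treats plugin.includes as a regular expression that must
match a plugin's whole id, and that the extension points (such as
org.apache.nutch.indexer.IndexingFilter) are declared by a plugin whose id is
nutch-extensionpoints. Neither assumption is confirmed in this thread. The
helper name is_included is made up for illustration and is not part of Nutch.

```python
import re

# The two <value> strings, copied from the message above.
ORIGINAL = ("protocol-file|urlfilter-regex|parse-(xml|text|html|js|pdf)"
            "|index-basic|query-(basic|site|url)")
WORKING = ("nutch-extensionpoints|protocol-file|protocol-http|urlfilter-regex"
           "|parse-(text|html)|index-basic|query-(basic|site|url)")


def is_included(plugin_id, includes):
    """Hypothetical check: does the whole plugin id match the expression?

    Assumption: this mirrors how plugin.includes is applied; it is not
    taken from the Nutch source.
    """
    return re.fullmatch(includes, plugin_id) is not None


if __name__ == "__main__":
    for plugin_id in ("nutch-extensionpoints", "protocol-file", "index-basic"):
        print(plugin_id,
              "original:", is_included(plugin_id, ORIGINAL),
              "working:", is_included(plugin_id, WORKING))
```

If those assumptions hold, the original value would leave out the plugin
that declares the extension points while still loading index-basic, which
would fit the "extension point ... does not exist" message in the log.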

Hmmmm... now to understand why this makes a difference and to see if I can
get Tomcat to use my brand new file-system crawl...

Light at the end of the tunnel... :-) and I hope it's not a train...

-Stephen


----


On 12/18/05, Stephen Fitch <[EMAIL PROTECTED]> wrote:
>
> I tried J2SE v1.4.2_10 and 5.0 u6... same issue..
>
> I threw away my nutch directory and un-tarballed a new one... same
> issue...
>
> I should add that this issue is on a Windows box with CYGWIN/bash and
> the following env variables:
>
> [EMAIL PROTECTED] /cygdrive/h/p/nutch/nutch-0.7.1
> $ env | grep NUTCH
> NUTCH_HOME=/cygdrive/h/p/nutch/nutch-0.7.1
> NUTCH_CONF_DIR=/cygdrive/h/p/nutch/nutch-0.7.1/conf
>
> [EMAIL PROTECTED] /cygdrive/h/p/nutch/nutch-0.7.1
> $ env | grep CLASSPATH
> CLASSPATH=/cygdrive/h/p/nutch/nutch-0.7.1/lib
>
> [EMAIL PROTECTED] /cygdrive/h/p/nutch/nutch-0.7.1
> $ env | grep JAVA
> QTJAVA="D:\Program Files\Java\jre1.5.0\lib\ext\QTJava.zip"
> JAVA_HOME=/cygdrive/e/Program Files/Java/jdk1.4.2_10
>
> I see...
>
> 051218 125130 parsing: H:\p\nutch\nutch-0.7.1\plugins\urlfilter-regex\plugin.xml
> 051218 125130 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
> 051218 125130 SEVERE org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
> java.lang.ExceptionInInitializerError
>         at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
>         at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
>         at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
>         at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
> Caused by: java.lang.RuntimeException: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
>         at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:147)
>         at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:40)
>         ... 4 more
> Caused by: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
>         at org.apache.nutch.plugin.PluginRepository.installExtensions(PluginRepository.java:78)
>         at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:61)
>         at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:144)
>         ... 5 more
>
