Hi,

Wanting to index/search my local file-system, I've followed the directions
at:

http://www.searchmorph.com/wp/2005/02/11/getting-nutch-to-search-your-filesystem/

I see:

$ bin/nutch crawl urls -dir ../crawltest1 -depth 2
051217 150747 parsing file:/H:/p/nutch/nutch-0.7.1/conf/nutch-default.xml
051217 150748 parsing file:/H:/p/nutch/nutch-0.7.1/conf/crawl-tool.xml
051217 150748 parsing file:/H:/p/nutch/nutch-0.7.1/conf/nutch-site.xml
051217 150748 No FS indicated, using default:local
051217 150748 crawl started in: ../crawltest1
051217 150748 rootUrlFile = urls
051217 150748 threads = 10
051217 150748 depth = 2
051217 150749 Created webdb at LocalFS,H:\p\nutch\crawltest1\db
051217 150750 Starting URL processing
051217 150750 Plugins: looking in: H:\p\nutch\nutch-0.7.1\plugins
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\clustering-carrot2
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\creativecommons
051217 150750 parsing: H:\p\nutch\nutch-0.7.1\plugins\index-basic\plugin.xml
051217 150750 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\index-more
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\language-identifier
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\nutch-extensionpoints
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\ontology
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\parse-ext
051217 150750 parsing: H:\p\nutch\nutch-0.7.1\plugins\parse-html\plugin.xml
051217 150750 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\parse-js
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\parse-msword
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\parse-pdf
051217 150750 not including: H:\p\nutch\nutch-0.7.1\plugins\parse-rss
051217 150750 parsing: H:\p\nutch\nutch-0.7.1\plugins\parse-text\plugin.xml
051217 150750 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
051217 150750 parsing: H:\p\nutch\nutch-0.7.1\plugins\protocol-file\plugin.xml
051217 150751 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.file.File
051217 150751 not including: H:\p\nutch\nutch-0.7.1\plugins\protocol-ftp
051217 150751 parsing: H:\p\nutch\nutch-0.7.1\plugins\protocol-http\plugin.xml
051217 150751 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
051217 150751 not including: H:\p\nutch\nutch-0.7.1\plugins\protocol-httpclient
051217 150751 parsing: H:\p\nutch\nutch-0.7.1\plugins\query-basic\plugin.xml
051217 150751 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter
051217 150751 not including: H:\p\nutch\nutch-0.7.1\plugins\query-more
051217 150751 parsing: H:\p\nutch\nutch-0.7.1\plugins\query-site\plugin.xml
051217 150751 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
051217 150751 parsing: H:\p\nutch\nutch-0.7.1\plugins\query-url\plugin.xml
051217 150751 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
051217 150751 not including: H:\p\nutch\nutch-0.7.1\plugins\urlfilter-prefix
051217 150751 not including: H:\p\nutch\nutch-0.7.1\plugins\urlfilter-regex
051217 150751 SEVERE org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
        at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
        at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
Caused by: java.lang.RuntimeException: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
        at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:147)
        at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:40)
        ... 4 more
Caused by: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
        at org.apache.nutch.plugin.PluginRepository.installExtensions(PluginRepository.java:78)
        at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:61)
        at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:144)
        ... 5 more

[EMAIL PROTECTED] /cygdrive/h/p/nutch/nutch-0.7.1

Any ideas on the:

051217 150751 SEVERE org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.

... issue would be welcomed...

BTW, I have:
$ more nutch-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="nutch-conf.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<nutch-conf>
<property>
    <name>plugin.includes</name>

    <value>protocol-file|protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>
</nutch-conf>

AND
$ more crawl-urlfilter.txt
-^(file|ftp|mailto|https):

It excludes and filters out file URLs. Make it look like this:

-^(ftp|mailto|https):

Near the bottom there needs to be an entry like this:

+.*

in my conf directory. I have not set any environment variables apart from
JAVA_HOME.
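
To cross-check the plugin.includes value against the log, here is a rough
Python sketch (not Nutch code). It assumes each plugin directory name is
tested as a whole against the plugin.includes regular expression; that
assumption is mine, but it does reproduce the "parsing" / "not including"
split in the log above. The function name split_plugins is made up for this
illustration.

```python
import re

# plugin.includes value copied from the nutch-site.xml shown above
PLUGIN_INCLUDES = (
    r"protocol-file|protocol-http|parse-(text|html)|index-basic|"
    r"query-(basic|site|url)"
)

# Plugin directory names as they appear in the log above
PLUGIN_DIRS = [
    "clustering-carrot2", "creativecommons", "index-basic", "index-more",
    "language-identifier", "nutch-extensionpoints", "ontology", "parse-ext",
    "parse-html", "parse-js", "parse-msword", "parse-pdf", "parse-rss",
    "parse-text", "protocol-file", "protocol-ftp", "protocol-http",
    "protocol-httpclient", "query-basic", "query-more", "query-site",
    "query-url", "urlfilter-prefix", "urlfilter-regex",
]


def split_plugins(includes, names):
    """Split names into (included, skipped) using a whole-name regex match.

    Assumption: the whole directory name must match the pattern, so
    "protocol-httpclient" is not matched by the "protocol-http" alternative.
    """
    pattern = re.compile(includes)
    included = [n for n in names if pattern.fullmatch(n)]
    skipped = [n for n in names if not pattern.fullmatch(n)]
    return included, skipped


if __name__ == "__main__":
    included, skipped = split_plugins(PLUGIN_INCLUDES, PLUGIN_DIRS)
    print("included:", ", ".join(included))
    print("skipped: ", ", ".join(skipped))
```

Under that assumption the skipped list contains nutch-extensionpoints and
urlfilter-regex, in line with the "not including" entries in the log. Whether
that is what triggers the "extension point ... does not exist" error is not
something this sketch establishes.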

Any help gratefully welcomed.



Thanks,

Stephen
