You don't have the plugins in there. New Nutch requires the plugins to be compiled. To be certain, also change the plugins path in the conf file to the absolute path where the plugins are located.
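A rough way to check both points above (plugins present and compiled, and the absolute path to put in the conf file) is a small script like the one below. This is only a sketch: the helper name check_plugins is made up for illustration, and it assumes that a compiled plugin folder holds a plugin.xml together with at least one .jar file, which may not match every Nutch build layout.

```python
from pathlib import Path


def check_plugins(plugin_dir):
    """Return (absolute_path, problems) for a plugins folder.

    Hypothetical helper, not part of Nutch.
    """
    # Absolute form of the folder, suitable for pasting into the conf file.
    root = Path(plugin_dir).resolve()
    if not root.is_dir():
        return root, [f"{root} is not a directory"]

    problems = []
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        if not (sub / "plugin.xml").is_file():
            problems.append(f"{sub.name}: no plugin.xml")
        # Assumption: a compiled plugin ships at least one .jar next to plugin.xml.
        elif not any(sub.glob("*.jar")):
            problems.append(f"{sub.name}: no .jar found (not compiled?)")
    return root, problems


if __name__ == "__main__":
    import sys

    folder = sys.argv[1] if len(sys.argv) > 1 else "plugins"
    root, problems = check_plugins(folder)
    print("absolute plugins path:", root)
    for line in problems or ["no problems found"]:
        print("  " + line)
```

Run it with the plugins folder as the only argument, for example the C:\cygwin\nutch\plugins folder from the log below. Any folder it reports is worth rebuilding before retrying the crawl.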
CC-

-----Original Message-----
From: Matt Pasiewicz [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 01, 2005 6:13 PM
To: [email protected]
Subject: RE: Installing Nutch on Windows

Ah, yes, I'm inching ever closer now. Here is what I'm getting now.

--------------------------------------

run java in /cygdrive/c/PROGRA~1/java/jre1.5.0_03
050601 160524 parsing file:/C:/cygwin/nutch/conf/nutch-default.xml
050601 160524 parsing file:/C:/cygwin/nutch/conf/crawl-tool.xml
050601 160524 parsing file:/C:/cygwin/nutch/conf/nutch-site.xml
050601 160524 No FS indicated, using default:local
050601 160524 crawl started in: crawl.text
050601 160524 rootUrlFile = urls
050601 160524 threads = 10
050601 160524 depth = 3
050601 160525 Created webdb at LocalFS,C:\cygwin\nutch\crawl.text\db
050601 160525 Starting URL processing
050601 160525 Plugins: looking in: C:\cygwin\nutch\plugins
050601 160525 not including: C:\cygwin\nutch\plugins\clustering-carrot2
050601 160525 not including: C:\cygwin\nutch\plugins\creativecommons
050601 160525 parsing: C:\cygwin\nutch\plugins\index-basic\plugin.xml
050601 160525 impl: point=net.nutch.indexer.IndexingFilter class=net.nutch.indexer.basic.BasicIndexingFilter
050601 160525 not including: C:\cygwin\nutch\plugins\index-more
050601 160525 not including: C:\cygwin\nutch\plugins\language-identifier
050601 160525 not including: C:\cygwin\nutch\plugins\ontology
050601 160525 not including: C:\cygwin\nutch\plugins\parse-ext
050601 160525 parsing: C:\cygwin\nutch\plugins\parse-html\plugin.xml
050601 160525 impl: point=net.nutch.parse.Parser class=net.nutch.parse.html.HtmlParser
050601 160525 not including: C:\cygwin\nutch\plugins\parse-mp3
050601 160525 not including: C:\cygwin\nutch\plugins\parse-msword
050601 160525 not including: C:\cygwin\nutch\plugins\parse-pdf
050601 160525 not including: C:\cygwin\nutch\plugins\parse-rtf
050601 160525 parsing: C:\cygwin\nutch\plugins\parse-text\plugin.xml
050601 160525 impl: point=net.nutch.parse.Parser class=net.nutch.parse.text.TextParser
050601 160525 not including: C:\cygwin\nutch\plugins\protocol-file
050601 160525 not including: C:\cygwin\nutch\plugins\protocol-ftp
050601 160525 parsing: C:\cygwin\nutch\plugins\protocol-http\plugin.xml
050601 160525 impl: point=net.nutch.protocol.Protocol class=net.nutch.protocol.http.Http
050601 160525 parsing: C:\cygwin\nutch\plugins\query-basic\plugin.xml
050601 160525 impl: point=net.nutch.searcher.QueryFilter class=net.nutch.searcher.basic.BasicQueryFilter
050601 160525 not including: C:\cygwin\nutch\plugins\query-more
050601 160525 parsing: C:\cygwin\nutch\plugins\query-site\plugin.xml
050601 160525 impl: point=net.nutch.indexer.IndexingFilter class=net.nutch.searcher.site.SiteIndexingFilter
050601 160525 impl: point=net.nutch.searcher.QueryFilter class=net.nutch.searcher.site.SiteQueryFilter
050601 160525 parsing: C:\cygwin\nutch\plugins\query-url\plugin.xml
050601 160525 impl: point=net.nutch.searcher.QueryFilter class=net.nutch.searcher.url.URLQueryFilter
050601 160525 not including: C:\cygwin\nutch\plugins\urlfilter-prefix
050601 160525 not including: C:\cygwin\nutch\plugins\urlfilter-regex
Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
        at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
        at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
Caused by: java.lang.RuntimeException: org.apache.nutch.net.URLFilter not found.
        at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:44)
        ... 4 more

> -----Original Message-----
> From: J B [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, June 01, 2005 4:10 PM
> To: [email protected]
> Subject: RE: Installing Nutch on Windows
>
> Matt,
>
> I don't even set NUTCH_JAVA_HOME since this is only to override the
> normal JAVA_HOME. If you unset/remove NUTCH_JAVA_HOME altogether,
> Nutch should default to JAVA_HOME, which is enough.
>
> The error at the bottom of the stack,
>
> > 050601 154453 No FS indicated, using default:local
> > Exception in thread "main" java.lang.RuntimeException: crawl.text already exists.
>
> also suggests that you have not removed a previously generated crawl
> directory.
>
> Again, I am new to this so I could be very wrong...
>
> Jon
>
> > From: "Matt Pasiewicz" <[EMAIL PROTECTED]>
> > Reply-To: [email protected]
> > To: <[email protected]>
> > Subject: RE: Installing Nutch on Windows
> > Date: Wed, 1 Jun 2005 15:52:06 -0600
> >
> > Well, thanks to Jon's Cygwin explanation, I feel like I'm getting a
> > little closer, but now I'm getting a bit of a prob from the log below.
> > Cygwin seems to see the path to NUTCH_JAVA_HOME
> > (/cygdrive/c/PROGRA~1/java/jre1.5.0_03) just fine, but something seems
> > to be going wrong. Any ideas?
> >
> > -----------------------------------
> >
> > NUTCH_JAVA_HOME: not found
> > run java in /cygdrive/c/PROGRA~1/java/jre1.5.0_03
> > 050601 154453 parsing file:/C:/cygwin/nutch/conf/nutch-default.xml
> > 050601 154453 parsing file:/C:/cygwin/nutch/conf/crawl-tool.xml
> > 050601 154453 parsing file:/C:/cygwin/nutch/conf/nutch-site.xml
> > 050601 154453 No FS indicated, using default:local
> > Exception in thread "main" java.lang.RuntimeException: crawl.text already exists.
> >         at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:121)
