Ah, yes, I'm inching ever closer.
Here is what I'm getting now.

--------------------------------------

        run java in /cygdrive/c/PROGRA~1/java/jre1.5.0_03
        050601 160524 parsing file:/C:/cygwin/nutch/conf/nutch-default.xml
        050601 160524 parsing file:/C:/cygwin/nutch/conf/crawl-tool.xml
        050601 160524 parsing file:/C:/cygwin/nutch/conf/nutch-site.xml
        050601 160524 No FS indicated, using default:local
        050601 160524 crawl started in: crawl.text
        050601 160524 rootUrlFile = urls
        050601 160524 threads = 10
        050601 160524 depth = 3
        050601 160525 Created webdb at LocalFS,C:\cygwin\nutch\crawl.text\db
        050601 160525 Starting URL processing
        050601 160525 Plugins: looking in: C:\cygwin\nutch\plugins
        050601 160525 not including: C:\cygwin\nutch\plugins\clustering-carrot2
        050601 160525 not including: C:\cygwin\nutch\plugins\creativecommons
        050601 160525 parsing: C:\cygwin\nutch\plugins\index-basic\plugin.xml
        050601 160525 impl: point=net.nutch.indexer.IndexingFilter class=net.nutch.indexer.basic.BasicIndexingFilter
        050601 160525 not including: C:\cygwin\nutch\plugins\index-more
        050601 160525 not including: C:\cygwin\nutch\plugins\language-identifier
        050601 160525 not including: C:\cygwin\nutch\plugins\ontology
        050601 160525 not including: C:\cygwin\nutch\plugins\parse-ext
        050601 160525 parsing: C:\cygwin\nutch\plugins\parse-html\plugin.xml
        050601 160525 impl: point=net.nutch.parse.Parser class=net.nutch.parse.html.HtmlParser
        050601 160525 not including: C:\cygwin\nutch\plugins\parse-mp3
        050601 160525 not including: C:\cygwin\nutch\plugins\parse-msword
        050601 160525 not including: C:\cygwin\nutch\plugins\parse-pdf
        050601 160525 not including: C:\cygwin\nutch\plugins\parse-rtf
        050601 160525 parsing: C:\cygwin\nutch\plugins\parse-text\plugin.xml
        050601 160525 impl: point=net.nutch.parse.Parser class=net.nutch.parse.text.TextParser
        050601 160525 not including: C:\cygwin\nutch\plugins\protocol-file
        050601 160525 not including: C:\cygwin\nutch\plugins\protocol-ftp
        050601 160525 parsing: C:\cygwin\nutch\plugins\protocol-http\plugin.xml
        050601 160525 impl: point=net.nutch.protocol.Protocol class=net.nutch.protocol.http.Http
        050601 160525 parsing: C:\cygwin\nutch\plugins\query-basic\plugin.xml
        050601 160525 impl: point=net.nutch.searcher.QueryFilter class=net.nutch.searcher.basic.BasicQueryFilter
        050601 160525 not including: C:\cygwin\nutch\plugins\query-more
        050601 160525 parsing: C:\cygwin\nutch\plugins\query-site\plugin.xml
        050601 160525 impl: point=net.nutch.indexer.IndexingFilter class=net.nutch.searcher.site.SiteIndexingFilter
        050601 160525 impl: point=net.nutch.searcher.QueryFilter class=net.nutch.searcher.site.SiteQueryFilter
        050601 160525 parsing: C:\cygwin\nutch\plugins\query-url\plugin.xml
        050601 160525 impl: point=net.nutch.searcher.QueryFilter class=net.nutch.searcher.url.URLQueryFilter
        050601 160525 not including: C:\cygwin\nutch\plugins\urlfilter-prefix
        050601 160525 not including: C:\cygwin\nutch\plugins\urlfilter-regex
        Exception in thread "main" java.lang.ExceptionInInitializerError
                at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
                at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
                at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
                at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
Caused by: java.lang.RuntimeException: org.apache.nutch.net.URLFilter not found.
        at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:44)
        ... 4 more
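
Just before the failure, the log shows both URL filter plugins (urlfilter-prefix and urlfilter-regex) under "not including", and then URLFilters fails with "org.apache.nutch.net.URLFilter not found". So one thing worth checking is which plugin directories the plugin.includes property in conf/nutch-site.xml (or nutch-default.xml) actually matches. Here is a minimal Java sketch, assuming the value is a regular expression matched against each whole plugin directory name. The default pattern in it is a hypothetical example, not the value from this setup, and the class name PluginIncludesCheck is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PluginIncludesCheck {

    // True if the whole plugin name matches the include pattern.
    // Assumption: the name is compared in full, not as a substring.
    static boolean isIncluded(String includesRegex, String pluginName) {
        return Pattern.compile(includesRegex).matcher(pluginName).matches();
    }

    public static void main(String[] args) {
        // Hypothetical plugin.includes value; pass your real one as the first argument.
        String includes = args.length > 0 ? args[0]
                : "protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)";

        // Plugin directory names taken from the log above.
        List<String> plugins = Arrays.asList(
                "index-basic", "parse-html", "parse-text", "protocol-http",
                "query-basic", "query-site", "query-url",
                "urlfilter-prefix", "urlfilter-regex");

        for (String name : plugins) {
            System.out.println((isIncluded(includes, name) ? "include " : "skip    ") + name);
        }
    }
}
```

If neither urlfilter-* name comes out as "include" with your real value, that would be consistent with the URLFilter error above.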
 

> -----Original Message-----
> From: J B [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, June 01, 2005 4:10 PM
> To: [email protected]
> Subject: RE: Installing Nutch on Windows
> 
> Matt,
> 
> I don't even set NUTCH_JAVA_HOME, since it only serves to
> override the normal JAVA_HOME. If you unset/remove
> NUTCH_JAVA_HOME altogether, Nutch should default to
> JAVA_HOME, which is enough.
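
The precedence Jon describes (NUTCH_JAVA_HOME, if set, overrides JAVA_HOME; otherwise JAVA_HOME is used) fits in a few lines of Java. This is a minimal sketch, assuming the bin/nutch launcher treats an empty NUTCH_JAVA_HOME the same as an unset one. It only reports which directory would be picked and does not launch anything. JavaHomeResolver is a hypothetical name.

```java
import java.util.Map;

public class JavaHomeResolver {

    // NUTCH_JAVA_HOME wins when it is set and non-empty;
    // otherwise fall back to JAVA_HOME (null if neither is set).
    static String resolveJavaHome(Map<String, String> env) {
        String override = env.get("NUTCH_JAVA_HOME");
        if (override != null && !override.isEmpty()) {
            return override;
        }
        return env.get("JAVA_HOME");
    }

    public static void main(String[] args) {
        String home = resolveJavaHome(System.getenv());
        System.out.println(home == null
                ? "neither NUTCH_JAVA_HOME nor JAVA_HOME is set"
                : "java home: " + home);
    }
}
```
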
> 
> The error at the bottom of the stack,
> 
> >050601 154453 No FS indicated, using default:local
> >Exception in thread "main" java.lang.RuntimeException: crawl.text already exists.
> 
> also suggests that you have not removed a previously 
> generated crawl directory.
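
From a Cygwin shell, running `rm -rf crawl.text` in the directory the crawl is launched from clears the old output. The Java sketch below does the same thing. It assumes crawl.text (the output directory named in the log) is relative to the working directory, and it deletes everything under that path, so point it at the right place. CleanCrawlDir is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class CleanCrawlDir {

    // Deletes a directory tree, children before parents (like rm -rf).
    // Does nothing if the path does not exist.
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts entries ahead of the directories that contain them.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path crawlDir = Paths.get(args.length > 0 ? args[0] : "crawl.text");
        deleteRecursively(crawlDir);
        System.out.println("removed (if present): " + crawlDir.toAbsolutePath());
    }
}
```
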
> 
> Again, I am new to this so I could be very wrong...
> 
> Jon
> 
> 
> 
> 
> >From: "Matt Pasiewicz" <[EMAIL PROTECTED]>
> >Reply-To: [email protected]
> >To: <[email protected]>
> >Subject: RE: Installing Nutch on Windows
> >Date: Wed, 1 Jun 2005 15:52:06 -0600
> >
> >Well, thanks to Jon's Cygwin explanation, I feel like I'm getting a
> >little closer, but now I'm running into a problem, shown in the log
> >below. Cygwin seems to see the path to NUTCH_JAVA_HOME
> >(/cygdrive/c/PROGRA~1/java/jre1.5.0_03) just fine, but something seems
> >to be going wrong. Any ideas?
> >
> >
> >  -----------------------------------
> >
> >
> >
> >NUTCH_JAVA_HOME: not found
> >
> >run java in /cygdrive/c/PROGRA~1/java/jre1.5.0_03
> >
> >050601 154453 parsing file:/C:/cygwin/nutch/conf/nutch-default.xml
> >
> >050601 154453 parsing file:/C:/cygwin/nutch/conf/crawl-tool.xml
> >
> >050601 154453 parsing file:/C:/cygwin/nutch/conf/nutch-site.xml
> >
> >050601 154453 No FS indicated, using default:local
> >Exception in thread "main" java.lang.RuntimeException: crawl.text already exists.
> >
> >at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:121)
> >
> >
> 
> 
> 
