Ah, yes, I'm inching ever closer.
Here is what I'm getting now:
--------------------------------------
run java in /cygdrive/c/PROGRA~1/java/jre1.5.0_03
050601 160524 parsing file:/C:/cygwin/nutch/conf/nutch-default.xml
050601 160524 parsing file:/C:/cygwin/nutch/conf/crawl-tool.xml
050601 160524 parsing file:/C:/cygwin/nutch/conf/nutch-site.xml
050601 160524 No FS indicated, using default:local
050601 160524 crawl started in: crawl.text
050601 160524 rootUrlFile = urls
050601 160524 threads = 10
050601 160524 depth = 3
050601 160525 Created webdb at LocalFS,C:\cygwin\nutch\crawl.text\db
050601 160525 Starting URL processing
050601 160525 Plugins: looking in: C:\cygwin\nutch\plugins
050601 160525 not including: C:\cygwin\nutch\plugins\clustering-carrot2
050601 160525 not including: C:\cygwin\nutch\plugins\creativecommons
050601 160525 parsing: C:\cygwin\nutch\plugins\index-basic\plugin.xml
050601 160525 impl: point=net.nutch.indexer.IndexingFilter class=net.nutch.indexer.basic.BasicIndexingFilter
050601 160525 not including: C:\cygwin\nutch\plugins\index-more
050601 160525 not including: C:\cygwin\nutch\plugins\language-identifier
050601 160525 not including: C:\cygwin\nutch\plugins\ontology
050601 160525 not including: C:\cygwin\nutch\plugins\parse-ext
050601 160525 parsing: C:\cygwin\nutch\plugins\parse-html\plugin.xml
050601 160525 impl: point=net.nutch.parse.Parser class=net.nutch.parse.html.HtmlParser
050601 160525 not including: C:\cygwin\nutch\plugins\parse-mp3
050601 160525 not including: C:\cygwin\nutch\plugins\parse-msword
050601 160525 not including: C:\cygwin\nutch\plugins\parse-pdf
050601 160525 not including: C:\cygwin\nutch\plugins\parse-rtf
050601 160525 parsing: C:\cygwin\nutch\plugins\parse-text\plugin.xml
050601 160525 impl: point=net.nutch.parse.Parser class=net.nutch.parse.text.TextParser
050601 160525 not including: C:\cygwin\nutch\plugins\protocol-file
050601 160525 not including: C:\cygwin\nutch\plugins\protocol-ftp
050601 160525 parsing: C:\cygwin\nutch\plugins\protocol-http\plugin.xml
050601 160525 impl: point=net.nutch.protocol.Protocol class=net.nutch.protocol.http.Http
050601 160525 parsing: C:\cygwin\nutch\plugins\query-basic\plugin.xml
050601 160525 impl: point=net.nutch.searcher.QueryFilter class=net.nutch.searcher.basic.BasicQueryFilter
050601 160525 not including: C:\cygwin\nutch\plugins\query-more
050601 160525 parsing: C:\cygwin\nutch\plugins\query-site\plugin.xml
050601 160525 impl: point=net.nutch.indexer.IndexingFilter class=net.nutch.searcher.site.SiteIndexingFilter
050601 160525 impl: point=net.nutch.searcher.QueryFilter class=net.nutch.searcher.site.SiteQueryFilter
050601 160525 parsing: C:\cygwin\nutch\plugins\query-url\plugin.xml
050601 160525 impl: point=net.nutch.searcher.QueryFilter class=net.nutch.searcher.url.URLQueryFilter
050601 160525 not including: C:\cygwin\nutch\plugins\urlfilter-prefix
050601 160525 not including: C:\cygwin\nutch\plugins\urlfilter-regex
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
Caused by: java.lang.RuntimeException: org.apache.nutch.net.URLFilter not found.
at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:44)
... 4 more
> -----Original Message-----
> From: J B [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, June 01, 2005 4:10 PM
> To: [email protected]
> Subject: RE: Installing Nutch on Windows
>
> Matt,
>
> I don't even set NUTCH_JAVA_HOME since this is only to
> override the normal JAVA_HOME. If you unset/remove
> NUTCH_JAVA_HOME altogether, Nutch should default to
> JAVA_HOME, which is enough.
>
> The error at the bottom of the stack,
>
> >050601 154453 No FS indicated, using default:local
> >Exception in thread "main" java.lang.RuntimeException: crawl.text already exists.
>
> also suggests that you have not removed a previously
> generated crawl directory.
>
> Again, I am new to this so I could be very wrong...
>
> Jon
>
>
>
>
> >From: "Matt Pasiewicz" <[EMAIL PROTECTED]>
> >Reply-To: [email protected]
> >To: <[email protected]>
> >Subject: RE: Installing Nutch on Windows
> >Date: Wed, 1 Jun 2005 15:52:06 -0600
> >
> >Well, thanks to Jon's Cygwin explanation, I feel like I'm getting a
> >little closer, but now I'm getting a bit of a prob from the log below.
> >Cygwin seems to see the path to NUTCH_JAVA_HOME
> >(/cygdrive/c/PROGRA~1/java/jre1.5.0_03) just fine, but something seems
> >to be going wrong. Any ideas?
> >
> >
> > -----------------------------------
> >
> >NUTCH_JAVA_HOME: not found
> >run java in /cygdrive/c/PROGRA~1/java/jre1.5.0_03
> >050601 154453 parsing file:/C:/cygwin/nutch/conf/nutch-default.xml
> >050601 154453 parsing file:/C:/cygwin/nutch/conf/crawl-tool.xml
> >050601 154453 parsing file:/C:/cygwin/nutch/conf/nutch-site.xml
> >050601 154453 No FS indicated, using default:local
> >Exception in thread "main" java.lang.RuntimeException: crawl.text already exists.
> >at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:121)
> >
> >
>