Hi all,
I'm having some problems getting Nutch to run on Windows XP under Cygwin.
This is with the nutch-2006-01-17 nightly build.
Intranet crawl:
When I run this (after creating the urls file, etc.):
bin/nutch crawl urls -dir cdir -depth 2 >&log
I get this in the log:
060117 114833 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
060117 114834 crawl started in: cdir
060117 114834 rootUrlDir = urls
060117 114834 threads = 10
060117 114834 depth = 2
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
060117 114834 Injector: starting
060117 114834 Injector: crawlDb: cdir\crawldb
060117 114834 Injector: urlDir: urls
060117 114834 Injector: Converting injected urls to crawl db entries.
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
060117 114834 Running job: job_krj0e1
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
060117 114835 parsing \tmp\nutch\mapred\local\localRunner\job_krj0e1.xml
060117 114835 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , \tmp\nutch\mapred\local\localRunner\job_krj0e1.xml , nutch-site.xml
        at org.apache.nutch.mapred.InputFormatBase.listFiles(InputFormatBase.java:85)
        at org.apache.nutch.mapred.InputFormatBase.getSplits(InputFormatBase.java:95)
        at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:63)
060117 114835 map 0%
java.io.IOException: Job failed!
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:102)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
Exception in thread "main"
I see that nutch-site.xml and mapred-default.xml are both empty.
Whole-web setup:
When I run this (after the mkdirs):
bin/nutch admin db -create
I get this at the prompt:
Exception in thread "main" java.lang.NoClassDefFoundError: admin
I don't speak Java, so I'm not sure what it's saying.
Please help.
TIA.