Hi all,
I'm having some problems getting Nutch to run on Windows XP under Cygwin.
This is with the nutch-2006-01-17 nightly build.
Intranet crawl........
When I do this (after making urls file, etc.):
bin/nutch crawl urls -dir cdir -depth 2 >&log
I get this in the log:
060117 114833 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
060117 114834 crawl started in: cdir
060117 114834 rootUrlDir = urls
060117 114834 threads = 10
060117 114834 depth = 2
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
060117 114834 Injector: starting
060117 114834 Injector: crawlDb: cdir\crawldb
060117 114834 Injector: urlDir: urls
060117 114834 Injector: Converting injected urls to crawl db entries.
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
060117 114834 Running job: job_krj0e1
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
060117 114835 parsing \tmp\nutch\mapred\local\localRunner\job_krj0e1.xml
060117 114835 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , \tmp\nutch\mapred\local\localRunner\job_krj0e1.xml , nutch-site.xml
    at org.apache.nutch.mapred.InputFormatBase.listFiles(InputFormatBase.java:85)
    at org.apache.nutch.mapred.InputFormatBase.getSplits(InputFormatBase.java:95)
    at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:63)
060117 114835 map 0%
java.io.IOException: Job failed!
    at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:102)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
Exception in thread "main"
I see that:
nutch-site.xml is empty
mapred-default.xml is empty
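In case it helps anyone reproduce this, here is roughly how those files can be checked from the Cygwin prompt. It is only a sketch: describe_path is a helper name I made up, and the conf/ paths assume the commands are run from the nutch-nightly directory shown in the log. It also reports whether urls is a plain file or a directory, since I do not know which of the two the Injector expects.

```shell
# Hypothetical helper (not part of Nutch): classify a path as missing,
# a directory, an empty file, or a non-empty file.
describe_path() {
  if [ ! -e "$1" ]; then
    echo "$1: missing"
  elif [ -d "$1" ]; then
    echo "$1: directory"
  elif [ ! -s "$1" ]; then
    echo "$1: empty file"
  else
    echo "$1: non-empty file"
  fi
}

# Assumed to be run from the top of the nutch-nightly checkout.
for p in conf/nutch-site.xml conf/mapred-default.xml urls; do
  describe_path "$p"
done
```

Each path gets one line of output, so the result can be pasted straight into a reply.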
Whole Web setup............................
When I do this (after the mkdirs):
bin/nutch admin db -create
I get this at the prompt:
Exception in thread "main" java.lang.NoClassDefFoundError: admin
I don't speak Java, so I'm not sure what it's saying.
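My best guess, which I have not confirmed against the script itself, is that bin/nutch hands any command word it does not recognise to java as if it were a class name, and java then cannot find a class called admin. A rough shell sketch of that pattern follows. The function resolve_class is made up for illustration and is not the real bin/nutch; the two class names are the ones from the stack trace above.

```shell
# Illustrative only: NOT the real bin/nutch script.
# Known command words map to a Java class name; anything else is
# passed through unchanged and would be used as the class name.
resolve_class() {
  case "$1" in
    crawl)  echo "org.apache.nutch.crawl.Crawl" ;;
    inject) echo "org.apache.nutch.crawl.Injector" ;;
    *)      echo "$1" ;;
  esac
}

resolve_class crawl   # prints org.apache.nutch.crawl.Crawl
resolve_class admin   # prints admin
```

If that is right, the error would mean that this build simply has no admin command, rather than that something is broken in my Java setup.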
Please help.
TIA.
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general