If I remember correctly, crawl expects its first argument to be a directory that contains the urls file, not the urls file itself -- that's why the Injector complains about "No input directories specified".
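For example, something along these lines should work (the directory name "seeds" is just a placeholder I picked, and I haven't re-checked this against the latest nightly):

  mkdir seeds
  mv urls seeds/urls
  bin/nutch crawl seeds -dir cdir -depth 2 >&log

That way the Injector is handed a directory of seed files to read, which should get you past the "No input directories specified" error.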
On Tue, 2006-01-17 at 11:36 -0800, Chris Shepard wrote:
> Hi all,
>
> Having some problems getting nutch to run on XP/Cygwin.
> This is re nutch-2006-01-17
>
> Intranet crawl........
>
> When I do this (after making urls file, etc.):
>
> bin/nutch crawl urls -dir cdir -depth 2 >&log
>
> I get this in the log:
>
> 060117 114833 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
> 060117 114834 crawl started in: cdir
> 060117 114834 rootUrlDir = urls
> 060117 114834 threads = 10
> 060117 114834 depth = 2
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
> 060117 114834 Injector: starting
> 060117 114834 Injector: crawlDb: cdir\crawldb
> 060117 114834 Injector: urlDir: urls
> 060117 114834 Injector: Converting injected urls to crawl db entries.
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
> 060117 114834 Running job: job_krj0e1
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
> 060117 114835 parsing \tmp\nutch\mapred\local\localRunner\job_krj0e1.xml
> 060117 114835 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
> java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , \tmp\nutch\mapred\local\localRunner\job_krj0e1.xml , nutch-site.xml
>         at org.apache.nutch.mapred.InputFormatBase.listFiles(InputFormatBase.java:85)
>         at org.apache.nutch.mapred.InputFormatBase.getSplits(InputFormatBase.java:95)
>         at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:63)
> 060117 114835 map 0%
> java.io.IOException: Job failed!
>         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:102)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
> Exception in thread "main"
>
> I see that:
>
> nutch-site.xml is empty
> mapred-default is empty
>
>
> Whole Web setup............................
>
> When I do this: (after mkdirs)
>
> bin/nutch admin db -create
>
> I get this at the prompt:
>
> Exception in thread "main"
> java.lang.NoClassDefFoundError: admin
>
> I don't speak Java, so I'm not sure what it's saying.
>
> Please help.
>
> TIA.
