If I remember correctly, crawl expects a directory that contains the urls file, not the urls file itself.
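Something like this should set that up. It is an untested sketch, run from the nutch directory, and it assumes the seed list is currently a plain file named urls. The inner file name seed.txt and the example URL are hypothetical. Only the directory name goes on the crawl command line:

```shell
# Run from the Nutch install directory (assumed).
if [ -f urls ]; then
  # "urls" is a plain file: turn it into a directory holding that file.
  # "seed.txt" is a hypothetical name; any file inside the directory should do.
  mv urls urls.tmp && mkdir urls && mv urls.tmp urls/seed.txt
else
  mkdir -p urls
  # If the directory is empty, add a placeholder seed list (one URL per line).
  # www.example.com is a hypothetical example URL; use your own.
  [ -n "$(ls -A urls)" ] || echo "http://www.example.com/" > urls/seed.txt
fi
ls urls   # the seed list should now be inside the directory
# Then run the crawl exactly as before, pointing at the directory:
# bin/nutch crawl urls -dir cdir -depth 2 >&log
```

The crawl command itself does not change; only what "urls" refers to does.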
On Tue, 2006-01-17 at 11:36 -0800, Chris Shepard wrote:
> Hi all,
>
> Having some problems getting nutch to run on
> XP/Cygwin.
> This is re nutch-2006-01-17
>
> Intranet crawl........
>
> When I do this (after making urls file, etc.):
>
> bin/nutch crawl urls -dir cdir -depth 2 >&log
>
> I get this in the log:
>
> 060117 114833 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
> 060117 114834 crawl started in: cdir
> 060117 114834 rootUrlDir = urls
> 060117 114834 threads = 10
> 060117 114834 depth = 2
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
> 060117 114834 Injector: starting
> 060117 114834 Injector: crawlDb: cdir\crawldb
> 060117 114834 Injector: urlDir: urls
> 060117 114834 Injector: Converting injected urls to crawl db entries.
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
> 060117 114834 Running job: job_krj0e1
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
> 060117 114835 parsing \tmp\nutch\mapred\local\localRunner\job_krj0e1.xml
> 060117 114835 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
> java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , \tmp\nutch\mapred\local\localRunner\job_krj0e1.xml , nutch-site.xml
>         at org.apache.nutch.mapred.InputFormatBase.listFiles(InputFormatBase.java:85)
>         at org.apache.nutch.mapred.InputFormatBase.getSplits(InputFormatBase.java:95)
>         at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:63)
> 060117 114835 map 0%
> java.io.IOException: Job failed!
>         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:102)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
> Exception in thread "main"
>
> I see that:
>
> nutch-site.xml is empty
> mapred-default is empty
>
>
> Whole Web setup............................
>
> When I do this: (after mkdirs)
>
> bin/nutch admin db -create
>
> I get this at the prompt:
>
> Exception in thread "main" java.lang.NoClassDefFoundError: admin
>
> I don't speak Java, so I'm not sure what it's saying.
>
>
> Please help.
>
> TIA.
