Please try this command:

    bin/nutch crawl search -dir /usr/data/crawl -depth 2 &> crawl.log &

where the "search" folder contains the files that list your URLs. The crawler will write its data into the /usr/data/crawl/crawldb folder; crawl.log is the log file.

Hope this helps.

Thanks,
Sudhi
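P.S. Here is a minimal sketch of the layout that command assumes; the directory name "search" and the file name "urls.txt" are only examples, since Nutch just wants a directory of plain-text files with one URL per line:

    # a directory of seed-URL files, one URL per line
    mkdir search
    echo "http://lucene.apache.org/nutch/" > search/urls.txt

    # crawl two levels deep, logging to crawl.log
    bin/nutch crawl search -dir /usr/data/crawl -depth 2 &> crawl.log &

    # when it finishes, the crawldb (and segments) should be under the crawl dir
    ls /usr/data/crawl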
BDalton <[EMAIL PROTECTED]> wrote:

Thank you, that seemed to fix the problem. Unfortunately, another problem followed. With the command:

    bin/nutch crawl urls1 -dir newcrawled -depth 2 >& crawl.log

I now get a directory called "newcrawled"; however, crawl.log is created empty, without any information. The index created also contains no data, and there are no error messages. I'm using the July 18 nightly and have no problems with 0.7.2.

Sami Siren-2 wrote:
>
> In 0.8 you submit a _directory_ containing urls.txt, not the file itself,
> so remove the /urls.txt part from your command line and it should go fine.
>
> --
> Sami Siren
>
> BDalton wrote:
>
>> I get this error:
>>
>> bin/nutch crawl url.txt -dir newcrawled -depth 2 >& crawl.log
>>
>> Exception in thread "main" java.io.IOException: Input directory
>> d:/nutch3/urls/urls.txt in local is invalid.
>>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
>>     at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
>>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
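In other words, a minimal sketch of the 0.7-vs-0.8 change Sami describes (the file and directory names below are illustrative):

    # 0.7.x accepted the seed file itself
    bin/nutch crawl urls.txt -dir newcrawled -depth 2 >& crawl.log

    # 0.8 expects a directory containing the seed file(s)
    mkdir urls1
    mv urls.txt urls1/
    bin/nutch crawl urls1 -dir newcrawled -depth 2 >& crawl.log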
