Check the disk space in your temporary directory.

On Monday 26 July 2010 18:30:26 Yousef Ourabi wrote:
> Hello:
>
> I keep on running into the following exception on both Nutch 1.1 and
> the nightly build. I seem to get this after 3 or 4 iterations of the
> fetch/parse/updatedb loop... any thoughts?
>
> Fetcher: java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
>     at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1107)
>     at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1145)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1116)
>
> ParseSegment: starting at 2010-07-25 18:12:50
> ParseSegment: segment: crawl/segments/20100725174011
> Exception in thread "main"
> org.apache.hadoop.mapred.InvalidInputException: Input path does not
> exist: file: ...
> /nutch-2010-07-07_04-49-04/crawl/segments/20100725174011/content
>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
>     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>     at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:156)
>     at org.apache.nutch.parse.ParseSegment.run(ParseSegment.java:177)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.parse.ParseSegment.main(ParseSegment.java:163)
>
> I have a simple wrapper script that does the following:
>
> bin/nutch inject crawl/crawldb urls
>
> #10
> bin/nutch generate crawl/crawldb crawl/segments
> SEGMENT=crawl/segments/`ls -tr crawl/segments|tail -1`
> bin/nutch fetch $SEGMENT -noParsing
> bin/nutch parse $SEGMENT
> bin/nutch updatedb crawl/crawldb $SEGMENT -filter -normalize

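
A rough way to check this from the wrapper script before each iteration is sketched below. It is only an illustration: the function name has_free_space, the TMP_DIR and MIN_KB variables and the 1 GB threshold are assumptions of mine, and it presumes Hadoop writes its temporary data under /tmp (the usual default location for hadoop.tmp.dir); adjust the path if your configuration points elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if DIR has at least MIN_KB kilobytes free.
has_free_space() {
    dir=$1
    min_kb=$2
    # df -P prints one line per filesystem in POSIX format;
    # with -k, column 4 is the available space in kilobytes.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge "$min_kb" ]
}

# Assumed location of the temporary data; override via the environment.
TMP_DIR=${TMP_DIR:-/tmp}
# Assumed threshold of 1 GB; tune it to the size of your segments.
MIN_KB=${MIN_KB:-1048576}

if has_free_space "$TMP_DIR" "$MIN_KB"; then
    echo "OK: enough free space in $TMP_DIR"
else
    echo "WARNING: less than $MIN_KB KB free in $TMP_DIR" >&2
fi
```

Running this just before the fetch step would show whether the temporary directory is filling up over the iterations. A plain `df -h /tmp` at the prompt gives the same information by hand.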
Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350

