Hello:

I keep running into the following exceptions with both Nutch 1.1 and
the nightly build. They show up after 3 or 4 iterations of the
fetch/parse/updatedb loop. Any thoughts?



Fetcher: java.io.IOException: Job failed!
     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
     at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1107)
     at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1145)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
     at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1116)

ParseSegment: starting at 2010-07-25 18:12:50
ParseSegment: segment: crawl/segments/20100725174011
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file: .../nutch-2010-07-07_04-49-04/crawl/segments/20100725174011/content
     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
     at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:156)
     at org.apache.nutch.parse.ParseSegment.run(ParseSegment.java:177)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
     at org.apache.nutch.parse.ParseSegment.main(ParseSegment.java:163)

I have a simple wrapper script that does the following:

bin/nutch inject crawl/crawldb urls

# repeat 10 times
for i in $(seq 1 10); do
  bin/nutch generate crawl/crawldb crawl/segments
  SEGMENT=crawl/segments/$(ls -tr crawl/segments | tail -1)
  bin/nutch fetch "$SEGMENT" -noParsing
  bin/nutch parse "$SEGMENT"
  bin/nutch updatedb crawl/crawldb "$SEGMENT" -filter -normalize
done
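
The two traces suggest a cascade. The fetch job fails first, and parse then runs anyway on a segment that has no content/ directory. Below is a minimal sketch of one loop iteration that stops at the first failing step. It is not a fix for the underlying fetch failure. It assumes that bin/nutch exits non-zero when a job fails, and that the crawl runs on the local filesystem (the error path starts with "file:"), so a plain `-d` test on the segment works. NUTCH, CRAWL_DIR and crawl_iteration are hypothetical names introduced for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical variables for this sketch: point NUTCH at your bin/nutch
# and CRAWL_DIR at the crawl directory used by the script above.
NUTCH=${NUTCH:-bin/nutch}
CRAWL_DIR=${CRAWL_DIR:-crawl}

# One generate/fetch/parse/updatedb iteration that returns non-zero at the
# first failing step instead of running the remaining steps anyway.
crawl_iteration() {
  "$NUTCH" generate "$CRAWL_DIR/crawldb" "$CRAWL_DIR/segments" || return 1

  # Newest segment, picked the same way as in the original script.
  local segment
  segment="$CRAWL_DIR/segments/$(ls -tr "$CRAWL_DIR/segments" | tail -1)"

  # Assumes bin/nutch exits non-zero when the fetch job fails.
  if ! "$NUTCH" fetch "$segment" -noParsing; then
    echo "fetch failed for $segment; not parsing it" >&2
    return 1
  fi

  # Extra guard (local filesystem only): parse reads $segment/content,
  # which is the input path reported as missing in the exception.
  if [ ! -d "$segment/content" ]; then
    echo "no content/ in $segment; not parsing it" >&2
    return 1
  fi

  "$NUTCH" parse "$segment" || return 1
  "$NUTCH" updatedb "$CRAWL_DIR/crawldb" "$segment" -filter -normalize
}
```

After the inject step, you might drive it with something like `for i in $(seq 1 10); do crawl_iteration || break; done`. The loop then stops at the first failed segment, which leaves that segment in place for inspection. Without the check, the failure would be buried under the follow-on parse exception.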
