By the looks of it, there was a problem parsing the data in that particular segment. Please try reparsing the segment.
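For example, assuming a standard Nutch 1.x (trunk) checkout run from the top-level directory, and using one of the failing segment paths from your stack trace, reparsing would look something like:

  bin/nutch parse crawl/segments/20111112043120

After that the segment should contain parse_data (and parse_text) directories; once both failing segments have them, the LinkDb invert step should find its input paths.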
On Sat, Nov 12, 2011 at 11:46 AM, Rum Raisin <[email protected]> wrote:

> Sorry continuing, since yahoo keyboard shortcuts triggered premature
> email...
>
> It already created other directories like... with directories like
> crawl_generate under them below. But why does it give this error? It
> couldn't create the parse_data file earlier that its expecting now? Or it
> thinks there should be data in that directory but there's nothing there?
>
> /nutch-trunk/crawl/segments/20111112043249
> /nutch-trunk/crawl/segments/20111112043120
> /nutch-trunk/crawl/segments/20111112043717
> /nutch-trunk/crawl/segments/20111112042823
> /nutch-trunk/crawl/segments/20111112043256
>
>
> ________________________________
> From: Rum Raisin <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Saturday, November 12, 2011 11:38 AM
> Subject: Input path does not exist (parse_data)
>
> I get this error running nutch trunk under eclipse...
> I don't understand what the problem is. It already created other
> directories like...
>
>
> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
> Input path does not exist:
> file:/home/jeff/workspace/nutch-trunk/crawl/segments/20111112043120/parse_data
> Input path does not exist:
> file:/home/jeff/workspace/nutch-trunk/crawl/segments/20111112042823/parse_data
>   at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
>   at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
>   at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
>   at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>   at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:175)
>   at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149)
>   at org.apache.nutch.crawl.Crawl.run(Crawl.java:143)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)

--
Lewis

