Sorry, continuing here, since the Yahoo keyboard shortcuts sent my email prematurely...
It already created other directories like the ones below, with directories like crawl_generate under them. But why does it give this error? Could it not create the parse_data file earlier that it's expecting now? Or does it think there should be data in that directory, but there's nothing there?

/nutch-trunk/crawl/segments/20111112043249
/nutch-trunk/crawl/segments/20111112043120
/nutch-trunk/crawl/segments/20111112043717
/nutch-trunk/crawl/segments/20111112042823
/nutch-trunk/crawl/segments/20111112043256

________________________________
From: Rum Raisin <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Saturday, November 12, 2011 11:38 AM
Subject: Input path does not exist (parse_data)

I get this error running Nutch trunk under Eclipse... I don't understand what the problem is. It already created other directories like...

Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/jeff/workspace/nutch-trunk/crawl/segments/20111112043120/parse_data
Input path does not exist: file:/home/jeff/workspace/nutch-trunk/crawl/segments/20111112042823/parse_data
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:175)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149)
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)

