I just encountered this exception when using Nutch build #736:
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/opt/tomcat6/webapps/nutch/data/segments/20090302235647/parse_data
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:39)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:190)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:782)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1127)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:170)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:147)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:128)
What is causing this exception? I am running the crawl on my own
dedicated server.
Thanks!
Tony
--
Are you RCholic? www.RCholic.com
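Gentleness, kindness, respectfulness, frugality, deference, benevolence, righteousness, propriety, wisdom, trustworthiness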