Actually I'm not sure if I'm looking at the right log lines. Please explain in more detail what exactly I should look for. Anyway, I found the following line just before the error:
Error parsing: http://eu.apachecon.com/js/jquery.akslideshow.js: failed(2,0): Can't retrieve Tika parser for mime-type text/javascript

But I can see that parsing errors like this already appeared earlier during the crawl.

2011/7/12 Markus Jelsma <[email protected]>:
> Were there errors during parsing of that last segment?
>
>> I'm starting with nutch and I ran a simple job as described in the
>> nutch tutorial. After a while I get the following error:
>>
>> CrawlDb update: URL filtering: true
>> CrawlDb update: Merging segment data into db.
>> CrawlDb update: finished at 2011-07-12 12:32:03, elapsed: 00:00:03
>> LinkDb: starting at 2011-07-12 12:32:03
>> LinkDb: linkdb: /Users/toom/Downloads/nutch-1.3/sites/linkdb
>> LinkDb: URL normalize: true
>> LinkDb: URL filter: true
>> LinkDb: adding segment: file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110707140238
>> LinkDb: adding segment: file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712113732
>> LinkDb: adding segment: file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712114256
>> LinkDb: adding segment: file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712122856
>> LinkDb: adding segment: file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712122908
>> LinkDb: adding segment: file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712123051
>> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
>> Input path does not exist: file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110707140238/parse_data
>> Input path does not exist: file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712113732/parse_data
>> Input path does not exist: file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712114256/parse_data
>>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
>>     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
>>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
>>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>>     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:175)
>>     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149)
>>     at org.apache.nutch.crawl.Crawl.run(Crawl.java:142)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:54)
>
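In case it helps: the stack trace above suggests the LinkDb step aborts because some segments were fetched but never parsed, so they have no parse_data directory. A quick way to see which segments are affected could be something like the sketch below (the segment path is taken from the log above; adjust it to your crawl directory):

```shell
# Sketch: list segments that are missing a parse_data directory.
# The path is an assumption based on the log output above.
for seg in /Users/toom/Downloads/nutch-1.3/sites/segments/*; do
  if [ ! -d "$seg/parse_data" ]; then
    echo "missing parse_data: $seg"
  fi
done
```

I believe the affected segments can then be parsed individually with something like `bin/nutch parse <segment>` before re-running the linkdb step, but check the command list printed by `bin/nutch` with no arguments for your version.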

