Hi,
I've successfully finished the (inject)-generate-fetch-parse-update cycle
several times (Nutch 1.6 on Ubuntu 12.04), but now the command
    ./bin/nutch invertlinks crawl_instanz1/crawldb/ -dir crawl_instanz1/segments/
fails with:
LinkDb: starting at 2013-02-16 11:23:21
LinkDb: linkdb: crawl_instanz1/crawldb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: internal links will be ignored.
LinkDb: adding segment: file:/home/nutch/nutch-1.6-instanz-1/crawl_instanz1/segments/20130215170554
LinkDb: adding segment: file:/home/nutch/nutch-1.6-instanz-1/crawl_instanz1/segments/20130215172647
LinkDb: adding segment: file:/home/nutch/nutch-1.6-instanz-1/crawl_instanz1/segments/20130215164631
LinkDb: merging with existing linkdb: crawl_instanz1/crawldb
LinkDb: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:195)
    at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:295)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:260)
hadoop.log says:
2013-02-16 11:23:35,202 WARN mapred.LocalJobRunner - job_local_0002
java.lang.ClassCastException: org.apache.nutch.crawl.CrawlDatum cannot be cast to org.apache.nutch.crawl.Inlinks
    at org.apache.nutch.crawl.LinkDbFilter.map(LinkDbFilter.java:39)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
2013-02-16 11:23:36,098 ERROR crawl.LinkDb - LinkDb: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:195)
    at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:295)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:260)
What can I do?
Thanks.
Peter