Hi,
I am getting the following exception when I run a crawl with Nutch, and I am
stuck because of it. I would really appreciate any pointers toward resolving
it. I found a related mail thread here
<http://www.mail-archive.com/[email protected]/msg07745.htm>, but
it doesn't describe a solution to the problem.
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
    at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
I looked at hadoop.log, and it contains the following stack trace:
mapred.TaskTracker - Error running child
java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
    at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
    at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
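
From reading the trace, my guess is that DeleteDuplicates' record reader is
passing a document id of -1 into MultiReader.isDeleted(), which then indexes
off the front of its sub-reader array. The standalone snippet below is just a
sketch I put together to confirm that isDeleted(-1) on a MultiReader produces
this exact exception; the class name and the RAMDirectory/IndexWriter setup
are my own scaffolding, and I am only assuming the -1 in the real job comes
from an empty or inconsistent index segment:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.store.RAMDirectory;

public class IsDeletedRepro {
    public static void main(String[] args) throws Exception {
        // Scaffolding only: build a tiny one-document index in memory.
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(new Field("url", "http://example.com/",
                          Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(doc);
        writer.close();

        // Wrap the index in a MultiReader, which is what the trace above
        // shows DeleteDuplicates going through.
        IndexReader reader =
            new MultiReader(new IndexReader[] { IndexReader.open(dir) });
        System.out.println(reader.isDeleted(0));   // valid doc id: prints false
        System.out.println(reader.isDeleted(-1));  // ArrayIndexOutOfBoundsException: -1
    }
}

So the question seems to be why DDRecordReader.next() ends up computing a
negative doc id during my crawl.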
Thanks,
Manoj.