Hello. It seems I had a similar problem when trying to dedup. I solved it by updating Nutch with the following patch:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg06705.html

I hope this will help you. Good luck!
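For what it's worth, the -1 in MultiReader.isDeleted usually means Lucene was handed a document id that maps below the first sub-reader, which I believe can happen when one of the part indexes being deduped is empty and the record reader's doc-id bookkeeping goes negative. I don't know the exact contents of the patch above, but the sketch below illustrates the general idea of skipping empty indexes before building the MultiReader. The SafeMultiReaderFactory class and its path-based open() are my own illustration (assuming the Lucene 2.x API), not Nutch code:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;

/**
 * Hypothetical helper (not part of Nutch): opens a set of part-indexes
 * and drops the empty ones before wrapping them in a MultiReader, so
 * that doc-id arithmetic never maps into a sub-reader at index -1.
 */
public class SafeMultiReaderFactory {

  public static MultiReader open(String[] indexDirs) throws IOException {
    List<IndexReader> readers = new ArrayList<IndexReader>();
    for (int i = 0; i < indexDirs.length; i++) {
      IndexReader reader = IndexReader.open(indexDirs[i]);
      if (reader.maxDoc() > 0) {
        readers.add(reader);   // keep only non-empty part-indexes
      } else {
        reader.close();        // an empty index contributes no docs but
                               // can confuse the doc-id bookkeeping
      }
    }
    return new MultiReader(readers.toArray(new IndexReader[readers.size()]));
  }
}

Again, this is only a sketch of the "empty index" theory; the patch linked above is what actually fixed it for me.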
2008/1/13, Manoj Bist <[EMAIL PROTECTED]>:
> Hi,
>
> I am getting the following exception when I do a crawl using Nutch, and I
> am stuck because of it. I would really appreciate any pointers towards
> resolving this. I found a related mail thread here
> <http://www.mail-archive.com/[email protected]/msg07745.htm> but
> it doesn't describe a solution to the problem.
>
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>         at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
>
> I looked at hadoop.log and it has the following stack trace:
>
> mapred.TaskTracker - Error running child
> java.lang.ArrayIndexOutOfBoundsException: -1
>         at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>         at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>         at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>
> Thanks,
>
> Manoj.
>
> --
> Tired of reading blogs? Listen to your favorite blogs at
> http://www.blogbard.com !!!!
