Hello. I apparently had a similar problem when trying to dedup; I
solved it by updating Nutch with the following patch:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg06705.html

I hope this helps you. Good luck!
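For what it's worth, here is a minimal sketch (plain Lucene 2.x, no Nutch or
Hadoop) of the failure mode I believe that patch addresses. It assumes the
unpatched record reader can hand MultiReader.isDeleted() a negative document
number; the class name DedupCrashSketch and the example URL are made up for
illustration and are not taken from the Nutch sources.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.store.RAMDirectory;

// Hypothetical class, for illustration only.
public class DedupCrashSketch {
  public static void main(String[] args) throws Exception {
    // Build a tiny one-document index in memory (Lucene 2.x API, as shipped with Nutch).
    RAMDirectory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
    Document doc = new Document();
    doc.add(new Field("url", "http://example.com/", Field.Store.YES, Field.Index.UN_TOKENIZED));
    writer.addDocument(doc);
    writer.close();

    // Wrap the index in a MultiReader, as DeleteDuplicates does with the part indexes.
    MultiReader multi = new MultiReader(new IndexReader[] { IndexReader.open(dir) });

    // A negative document number makes MultiReader's internal reader lookup
    // return -1, so the subsequent subReaders[] access fails with
    // java.lang.ArrayIndexOutOfBoundsException: -1
    multi.isDeleted(-1);
  }
}

Run against the Lucene jar that ships with Nutch, this dies with
java.lang.ArrayIndexOutOfBoundsException: -1 from MultiReader.isDeleted, which
matches the hadoop.log trace quoted below.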

2008/1/13, Manoj Bist <[EMAIL PROTECTED]>:
> Hi,
>
> I am getting the following exception when I do a crawl using Nutch, and I am
> kind of stuck because of it. I would really appreciate any pointers toward
> resolving it. I found a related mail thread here
> <http://www.mail-archive.com/[email protected]/msg07745.htm>, but
> it doesn't describe a solution to the problem.
>
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>         at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
>
> I looked at hadoop.log and it has the following stack trace.
>
>  mapred.TaskTracker - Error running child
> java.lang.ArrayIndexOutOfBoundsException: -1
>         at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>         at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>         at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>
>
> Thanks,
>
> Manoj.
>
