Hey guys
 
I've been banging my head against this error for a while now, but I don't seem
to be getting anywhere! I have tried creating and recreating the index several
times, and have made sure that all settings are "by the book". I read in
another post that this error can be caused by a corrupted index, but I don't
think that's the case here: I only have a few URLs in the index, crawled with
depth 1, so it's not even a large crawl!
 
There are two directories in my crawled/indexes directory, viz. part-00000
and part-00001.
 
P.S. This is a fresh install of Nutch with a fresh index.
 
Please help before I go insane!
 
Error log

Dedup: starting
Dedup: adding indexes in: crawled/indexes
DeleteDuplicates: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:402)
        at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
        at org.apache.nutch.indexer.DeleteDuplicates.run(DeleteDuplicates.java:506)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
        at org.apache.nutch.indexer.DeleteDuplicates.main(DeleteDuplicates.java:490)

 
Additional error log
 
Task TASKID="tip_0009_m_000001" TASK_TYPE="MAP" TASK_STATUS="FAILED" FINISH_TIME="1170237489795" ERROR="java.lang.ArrayIndexOutOfBoundsException: -1
        at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:109)
        at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
        at org.apache.hadoop.mapred.MapTask$2.next(MapTask.java:166)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:183)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1367)
 
Thanks
Hetal
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
