OK, I ran some bigger test crawls (> 150K) with the 0.9 RC. Everything worked fine (inject, generate, fetch, updatedb, readdb, linkdb, mergesegs, mergedb, merge, index) except delete duplicates, which fails with the error below when run against segment indexes on the DFS.
Because of the way I am automating some of my crawls (sorting names alphabetically and only running part of the list), only one segment part-xxxxx had results and the others had 0 results. I don't know whether that would cause this, and I don't think this bug is critical for the 0.9 release, but I wanted to bring it up. My guess is that this is a small bug within the Lucene libraries when the directories have 0 results.

What is everyone's opinion on this in terms of the release? My vote would be to move forward with the release.

Dennis Kubes

Task Id : task_0027_m_000003_3, Status : FAILED
task_0027_m_000003_3: Error running child
task_0027_m_000003_3: java.lang.ArrayIndexOutOfBoundsException: -1
task_0027_m_000003_3:     at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
task_0027_m_000003_3:     at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
task_0027_m_000003_3:     at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
task_0027_m_000003_3:     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
task_0027_m_000003_3:     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
task_0027_m_000003_3:     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
DeleteDuplicates: java.io.IOException: Job failed!

Chris Mattmann wrote:
> Folks,
>
> As an FYI, here is a link to the log of the steps that I followed to get to
> this point in the release:
>
> http://people.apache.org/~mattmann/NUTCH_0.9_release_log_v2.doc
>
> Cheers,
> Chris
>
> On 4/2/07 10:52 PM, "Chris Mattmann" <[EMAIL PROTECTED]> wrote:
>
>> Hi Folks,
>>
>> I have posted a candidate for the Apache Nutch 0.9 release at
>>
>> http://people.apache.org/~mattmann/nutch_0.9/rc2/
>>
>> See the included CHANGES-0.9.txt file for details on release
>> contents and latest changes. The release was made from the 0.9-dev trunk,
>> including the recent patch applied by Dennis.
>> I've also created a branch for
>> this release candidate at:
>> http://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.9.
>>
>> Please vote on releasing these packages as Apache Nutch 0.9.
>> The vote is open for the next 72 hours. Only votes from Nutch
>> committers are binding, but everyone is welcome to check the release
>> candidate and voice their approval or disapproval. The vote passes if
>> at least three binding +1 votes are cast.
>>
>> [ ] +1 Release the packages as Apache Nutch 0.9
>> [ ] -1 Do not release the packages because...
>>
>> Thanks!
>>
>> Cheers,
>>
>> Chris
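
To illustrate the guess in the report that indexes with 0 results trigger the exception, here is a minimal, self-contained sketch. It is a simplified model and an assumption, not the actual Lucene or Nutch source: the class EmptyIndexSketch, its methods, and the starts/deleted arrays are all hypothetical. It shows how a binary search over per-segment start offsets could return -1 when an index has no segments at all, and how a bounds check before the deletion lookup would avoid the failure.

```java
// Hypothetical model of a multi-segment reader; not the real Lucene code.
final class EmptyIndexSketch {

    // starts[i] is the first global doc id of segment i; the last entry
    // is the total document count. An index with no segments is {0}.
    static int segmentIndex(int[] starts, int doc) {
        int lo = 0;
        int hi = starts.length - 2; // index of last segment; -1 if none
        while (hi >= lo) {
            int mid = (lo + hi) >>> 1;
            if (doc < starts[mid]) {
                hi = mid - 1;
            } else if (doc > starts[mid]) {
                lo = mid + 1;
            } else {
                // Skip over empty segments that share the same start.
                while (mid + 1 <= starts.length - 2
                        && starts[mid + 1] == starts[mid]) {
                    mid++;
                }
                return mid;
            }
        }
        return hi; // -1 when there are no segments
    }

    // Unguarded lookup: fails with index -1 on an index with no segments.
    static boolean isDeletedUnguarded(boolean[][] deleted, int[] starts, int doc) {
        int i = segmentIndex(starts, doc);
        return deleted[i][doc - starts[i]];
    }

    // Guarded iteration: check doc < maxDoc before asking about deletion.
    // Returns the next live doc id at or after doc, or -1 if there is none.
    static int nextLiveDoc(boolean[][] deleted, int[] starts, int doc) {
        int maxDoc = starts[starts.length - 1];
        while (doc < maxDoc && isDeletedUnguarded(deleted, starts, doc)) {
            doc++;
        }
        return doc < maxDoc ? doc : -1;
    }

    public static void main(String[] args) {
        int[] emptyStarts = {0};             // no segments, 0 documents
        boolean[][] noSegments = new boolean[0][];
        try {
            isDeletedUnguarded(noSegments, emptyStarts, 0);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded lookup failed: " + e.getMessage());
        }
        System.out.println("guarded result: "
                + nextLiveDoc(noSegments, emptyStarts, 0));
    }
}
```

If the real cause matches this model, skipping index parts that contain no documents, or checking the document count before the first deletion lookup, would be enough to avoid the failed task.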
