OK, I ran some bigger test crawls (> 150K) with the 0.9 RC.  Everything 
worked fine (inject, generate, fetch, updatedb, readdb, linkdb, 
mergesegs, mergedb, merge, index) except delete duplicates, which fails 
with the error below when run against segment indexes on the DFS.

Because of the way I am automating some of my crawls (sorting names 
alphabetically and only running part of the list), only one segment 
part-xxxxx had results and the others had 0 results.  I don't know 
whether that would cause this, and I don't think this bug is critical 
for the 0.9 release, but I wanted to bring it up.

My guess is that this is a small bug within the Lucene libraries 
when the directories have 0 results.  What is everyone's opinion on this 
in terms of the release?  My vote would be to move forward with the release.

Dennis Kubes

Task Id : task_0027_m_000003_3, Status : FAILED
task_0027_m_000003_3: Error running child
task_0027_m_000003_3: java.lang.ArrayIndexOutOfBoundsException: -1
task_0027_m_000003_3:   at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
task_0027_m_000003_3:   at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
task_0027_m_000003_3:   at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
task_0027_m_000003_3:   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
task_0027_m_000003_3:   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
task_0027_m_000003_3:   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
DeleteDuplicates: java.io.IOException: Job failed!
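
To illustrate the guess above, here is a rough, self-contained sketch of 
how an index of -1 could come out of a lookup over zero segments, and how 
a caller could guard against it.  This is NOT the Lucene MultiReader or 
Nutch DeleteDuplicates source; the class EmptyIndexSketch and the methods 
findSegment, isDeleted, and nextLiveDoc are hypothetical names invented 
for this example, and the real cause may be different.

```java
public class EmptyIndexSketch {

    // Hypothetical lookup: starts[i] is the first global doc id of segment i.
    // Returns the index of the segment that should contain doc.
    static int findSegment(int[] starts, int doc) {
        int lo = 0;
        int hi = starts.length - 1;          // -1 when there are no segments
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (doc < starts[mid]) {
                hi = mid - 1;
            } else if (doc > starts[mid]) {
                lo = mid + 1;
            } else {
                return mid;
            }
        }
        return hi;                           // loop never runs for zero segments
    }

    // Unguarded check: with zero segments this indexes deleted[-1] and
    // throws ArrayIndexOutOfBoundsException.
    static boolean isDeleted(boolean[][] deleted, int[] starts, int doc) {
        int i = findSegment(starts, doc);
        return deleted[i][doc - starts[i]];
    }

    // Guarded iteration (an assumed workaround, not a committed fix):
    // bail out on an empty index and test the bound before calling isDeleted.
    // Returns the next non-deleted doc id at or after doc, or -1 if none.
    static int nextLiveDoc(boolean[][] deleted, int[] starts, int maxDoc, int doc) {
        if (maxDoc <= 0) {
            return -1;                       // empty index: nothing to read
        }
        while (doc < maxDoc && isDeleted(deleted, starts, doc)) {
            doc++;
        }
        return doc < maxDoc ? doc : -1;
    }

    public static void main(String[] args) {
        int[] noStarts = new int[0];
        boolean[][] noDocs = new boolean[0][];
        System.out.println("segment for doc 0 in empty index: "
            + findSegment(noStarts, 0));
        System.out.println("guarded read on empty index: "
            + nextLiveDoc(noDocs, noStarts, 0, 0));
        try {
            isDeleted(noDocs, noStarts, 0);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded read failed: " + e);
        }
    }
}
```

If something like this is what happens, a check for an empty index in the 
record reader before it calls isDeleted would sidestep the error without 
touching Lucene, but that is an assumption until someone reproduces it 
with an empty part-xxxxx index.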

Chris Mattmann wrote:
> Folks,
> 
>  As an FYI, here is a link to the log of the steps that I followed to get to
> this point in the release:
> 
> http://people.apache.org/~mattmann/NUTCH_0.9_release_log_v2.doc
> 
> Cheers,
>   Chris
> 
> 
> 
> On 4/2/07 10:52 PM, "Chris Mattmann" <[EMAIL PROTECTED]> wrote:
> 
>> Hi Folks,
>>  
>> I have posted a candidate for the Apache Nutch 0.9 release at
>>  
>>  http://people.apache.org/~mattmann/nutch_0.9/rc2/
>>  
>> See the included CHANGES-0.9.txt file for details on release
>> contents and latest changes. The release was made from the 0.9-dev trunk,
>> including the recent patch applied by Dennis. I've also created a branch for
>> this release candidate at:
>> http://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.9.
>>  
>> Please vote on releasing these packages as Apache Nutch 0.9.
>> The vote is open for the next 72 hours. Only votes from Nutch
>> committers are binding, but everyone is welcome to check the release
>> candidate and voice their approval or disapproval. The vote  passes if
>> at least three binding +1 votes are cast.
>>  
>> [ ] +1 Release the packages as Apache Nutch 0.9
>> [ ] -1 Do not release the packages because...
>>  
>> Thanks!
>>  
>> Cheers,
>>
>>  Chris
>>
>>
>>
>>
>>
> 
> 

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
