NUTCH-436 now has a patch, if we want to add it to this release.

Dennis Kubes

Andrzej Bialecki wrote:
> Sean Dean wrote:
>> As for which Hadoop version is included in the next Nutch release, I 
>> share Sami's concern about 0.10.1, as it throws NPEs on anything 
>> above 100-200k URLs. I can volunteer to test any other version we are 
>> interested in; my regular fetches are about 13 million URLs and take a 
>> couple of days to complete.
>>  
>> If anyone has a specific Hadoop jar they would like to share, I don't 
>> mind testing it; otherwise I can just build the "most popular" version 
>> from source and swap it in for my current one. For the record, I've 
>> been using Hadoop 0.9.1 for a long time without any problems on 
>> these fairly large crawls.
>>
>>   
> 
> It's clear to me, then, that we should bring Nutch to 0.11.2 first anyway. 
> Then, if we have time and you are willing, we could test 0.12; 
> if it's stable enough for your 13-million-URL crawl, it's likely 
> good enough for the rest of us.
> 
> If there are no dissenting votes, I'll apply the patch to bring in 
> 0.11.2 some time tomorrow. I will also create a JIRA issue and attach 
> the patches that take that revision to Hadoop 0.12, so that folks can test them.
> 
> Thanks for your comments!
> 

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
