Sean Dean wrote:
> As for which Hadoop version is included in the next Nutch release, I share 
> the same concern as Sami with 0.10.1 as it NPE's on anything above 100-200k 
> URLs. I can volunteer to test any other version we are interested in, my 
> regular fetches are about 13 million URLs and take a couple days to complete.
>  
> If anyone has a specific Hadoop jar they would like to share I don't mind 
> testing it, otherwise I can just build the "most popular" version from source 
> and replace that with my current one. For the record, I've been using Hadoop 
> 0.9.1 for the longest time without any problems on these somewhat large 
> crawls.

It's clear to me, then, that we should bring Nutch up to Hadoop 0.11.2 
first anyway. Then, if we have time and you are willing, we could test 
0.12; if it's stable enough for your 13-million-URL crawl, it's likely 
good enough for the rest of us.

If there are no dissenting votes, I'll apply the patch to bring in 
0.11.2 sometime tomorrow. I will also create a JIRA issue and attach 
the patches needed to go from that revision to Hadoop 0.12, so that 
folks may test them.

Thanks for your comments!

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
