Sean Dean wrote:
> As for which Hadoop version is included in the next Nutch release, I share
> the same concern as Sami with 0.10.1 as it NPE's on anything above 100-200k
> URLs. I can volunteer to test any other version we are interested in, my
> regular fetches are about 13 million URLs and take a couple days to complete.
>
> If anyone has a specific Hadoop jar they would like to share I don't mind
> testing it, otherwise I can just build the "most popular" version from source
> and replace that with my current one. For the record, I've been using Hadoop
> 0.9.1 for the longest time without any problems on these somewhat large
> crawls.

It's clear to me then that we should bring Nutch to Hadoop 0.11.2 first anyway. Then, if we have time and if you are willing, we could test 0.12; if it's stable enough for your 13 million URL crawl, it's likely good enough for the rest of us.

If there are no dissenting votes, I'll apply the patch to bring in 0.11.2 some time tomorrow. I will also create a JIRA issue and attach the patches from that revision to Hadoop 0.12 so that folks may test them.

Thanks for your comments!

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com