NUTCH-436 has a patch now if we want to add that to this release.

Dennis Kubes

Andrzej Bialecki wrote:
> Sean Dean wrote:
>> As for which Hadoop version is included in the next Nutch release, I
>> share the same concern as Sami with 0.10.1 as it NPE's on anything
>> above 100-200k URLs. I can volunteer to test any other version we are
>> interested in, my regular fetches are about 13 million URLs and take a
>> couple days to complete.
>>
>> If anyone has a specific Hadoop jar they would like to share I don't
>> mind testing it, otherwise I can just build the "most popular" version
>> from source and replace that with my current one. For the record, I've
>> been using Hadoop 0.9.1 for the longest time without any problems on
>> these somewhat large crawls.
>>
>
> It's clear to me then that we should bring Nutch to 0.11.2 first anyway.
> Then, if we have time and if you are willing, we could test the 0.12 and
> if it's stable enough for your 13 mln crawl then it's likely it's good
> enough for the rest of us.
>
> If there are no dissenting votes, I'll apply the patch to bring in
> 0.11.2 some time tomorrow. I will also create a JIRA issue and attach
> the patches from that revision to Hadoop 0.12 so that folks may test them.
>
> Thanks for your comments!
>

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers