The crawl for 1M pages completed successfully. There was an issue with doing a copyToLocal, but that has already been filed as a HADOOP bug, and the patch will be included in 0.12.x.
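For reference, the statistics below come from the crawldb stats dump. A minimal sketch of the invocation (the crawldb path is just an example, not necessarily the exact one used here):

  # dump aggregate statistics for the crawl database
  bin/nutch readdb crawldb -stats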
Statistics for CrawlDb: crawldb
TOTAL urls:     10839170
retry 0:        10816148
retry 1:        23022
min score:      0.0090
avg score:      0.173
max score:      2119.167
status 1 (db_unfetched):   9899275
status 2 (db_fetched):     667354
status 3 (db_gone):        11195
status 4 (db_redir_temp):  219507
status 5 (db_redir_perm):  41839

Dennis Kubes

Andrzej Bialecki wrote:
> Dennis Kubes wrote:
>>
>> Andrzej Bialecki wrote:
>>> Dennis Kubes wrote:
>>>> I agree there may be subtle bugs.
>>>>
>>>> I can do, say, a full dmoz crawl (~5M pages) with nutch trunk and hadoop
>>>> 12.1 on a small cluster of 5 machines if this would help? We have
>>>> already
>>>>
>>>
>>> Certainly, that would be most welcome.
>>
>> I will start that up today.
>
> Thanks!
>
>>>
>>> 0.12.1 is not out the door yet. I can create a patch that uses the
>>> latest Hadoop trunk binaries, so that we could test it.
>>
>> I can just pull it down from source. Let me know if that isn't what
>> we want.
>
> Great, please do.
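(On the copyToLocal issue mentioned at the top: copyToLocal is the standard FsShell command for pulling output from DFS onto the local filesystem, roughly as sketched below. The paths are illustrative only, not the actual ones used in this crawl.)

  # copy a finished DFS directory down to local disk
  bin/hadoop dfs -copyToLocal crawl/crawldb /tmp/crawldb-local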