> My Nutch cycle completed successfully over the weekend. Deployment and
> searching also work fine.
>
> The only major/minor functional difference I noticed was that during
> fetching Hadoop stored the fetched data in memory until it reached a
> certain amount (100 megabytes or so) then wrote it all to disk (in the

Yes. The Hadoop team implemented an in-memory buffer with spill-to-disk
functionality. I believe the amount stored in memory before a spill is
configurable.

Dennis Kubes

> Hadoop temp directory) at once. During the write operation, which lasted
> no more than 8 seconds each time, the Java thread count would lower down to
> between 6 and 10. This differed from previous versions where the data was
> streamed in real-time, but I don't see this costing anything in
> performance as the Java processes didn't lock despite the lowering of
> total threads.
>
>
> ----- Original Message ----
> From: Sean Dean <[EMAIL PROTECTED]>
> To: nutch-dev@lucene.apache.org
> Sent: Wednesday, March 7, 2007 6:52:05 PM
> Subject: Re: 0.9 release
>
>
> Great, thanks a lot.
>
> I have started a complete Nutch cycle (generate, fetch, updatedb,
> invertlinks, index and dedup) on a 13 million document segment, and this
> should take no longer than a couple days. I will let you know of any
> problems, but hopefully it will work out with no errors at all.
>
> All this testing will be based off revision 515791 in trunk.
>
>
> ----- Original Message ----
> From: Andrzej Bialecki <[EMAIL PROTECTED]>
> To: nutch-dev@lucene.apache.org
> Sent: Wednesday, March 7, 2007 5:04:21 PM
> Subject: Re: 0.9 release
>
>
> Sean Dean wrote:
>> As it stands now with what's in trunk under 0.9-dev, one of the biggest
>> problems is the version of Hadoop we have included. It fails on anything
>> above 200k URLs, and should be considered a "blocker" issue.
>>
>> It's my understanding that Andrzej has a newer Hadoop JAR with some
>> custom patches applied, but hasn't had the time yet to commit them back
>> to Nutch. When he does get the chance, some testing will need to be
>> initiated and I can be a small help there.
>>
>> Not to make it sound like the end of the world, but since almost
>> everything in Nutch revolves around Hadoop we should get this issue
>> corrected before we make other "big" plans for fixes and changes.
>>
>
> To be precise, the version is 0.11.2 release and it's been committed
> just now (rev. 515791). Your help in testing would be most welcome ...
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
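
For readers unfamiliar with the buffer-and-spill behavior discussed in this thread, here is a minimal Python sketch of the general pattern. It is not Hadoop's implementation: the class name `SpillBuffer`, the `spill_threshold_bytes` parameter, and the spill file naming are all hypothetical, chosen only for illustration. Records accumulate in memory and are written to a temporary directory in one batch once a configurable size is reached.

```python
import os
import tempfile


class SpillBuffer:
    """Hypothetical illustration: buffer records in memory, spill to disk in batches."""

    def __init__(self, spill_threshold_bytes, spill_dir):
        # Both parameters are assumptions for this sketch, not Hadoop settings.
        self.spill_threshold_bytes = spill_threshold_bytes
        self.spill_dir = spill_dir
        self.records = []        # records currently held in memory
        self.buffered_bytes = 0  # total size of the in-memory records
        self.spill_files = []    # paths of the files written so far

    def add(self, record: bytes) -> None:
        """Keep the record in memory; spill once the threshold is reached."""
        self.records.append(record)
        self.buffered_bytes += len(record)
        if self.buffered_bytes >= self.spill_threshold_bytes:
            self.spill()

    def spill(self) -> None:
        """Write every buffered record to one new file, then empty the buffer."""
        if not self.records:
            return
        fd, path = tempfile.mkstemp(prefix="spill-", dir=self.spill_dir)
        with os.fdopen(fd, "wb") as out:
            for record in self.records:
                out.write(record)
        self.spill_files.append(path)
        self.records.clear()
        self.buffered_bytes = 0
```

With a threshold of roughly 100 MB, a sketch like this would produce the pattern Sean describes: little disk activity while the buffer fills, then a short burst of writes. A caller would also invoke `spill()` once at the end to flush whatever remains in memory.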