That's funny. I am already working on a step-by-step tutorial on how to set up Nutch and Hadoop from scratch over a cluster of 6 machines, 1 name node and 6 data nodes. Hopefully I will have it done tonight or tomorrow and I will post it on the list.
Dennis

-----Original Message-----
From: Vertical Search [mailto:[EMAIL PROTECTED]
Sent: Friday, March 17, 2006 5:15 PM
To: [email protected]
Subject: Re: Large Mapreduce Sizes and Long Index Times

Dennis,

I have been following this thread. I am a recent adopter trying to learn the art and science of Hadoop and Nutch, more Nutch though. Is there a way you can document "Lessons learned"? It can reduce quite a bit of heartbreak during the various phases of crawling. I can help you document it if need be.

Thanks

On 3/17/06, Dennis Kubes <[EMAIL PROTECTED]> wrote:
>
> Finally got an index working with the Hadoop file system but just to do the
> apache.org site took around 2-3 hours and on each machine the mapreduce
> local data was around 4.5 Gigs. Anybody know what might be causing this?
>
> Dennis

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
