Dennis, I have been following this thread. I am a recent adopter trying to learn the art and science of Hadoop and Nutch, though more Nutch than Hadoop. Is there a way you could document "lessons learned"? It could spare people quite a bit of heartbreak during the various phases of crawling. I can help you document it if need be.
Thanks

On 3/17/06, Dennis Kubes <[EMAIL PROTECTED]> wrote:
>
> Finally got an index working with the Hadoop file system but just to do the
> apache.org site took around 2-3 hours and on each machine the mapreduce
> local data was around 4.5 Gigs. Anybody know what might be causing this?
>
> Dennis
