Dennis,
I have been following this thread as a recent adopter trying to learn the
art and science of Hadoop and Nutch, mostly Nutch.
Is there a way you could document the "lessons learned"? It could spare
people quite a bit of heartbreak during the various phases of crawling.
I can help you document it if need be.

Thanks


On 3/17/06, Dennis Kubes <[EMAIL PROTECTED]> wrote:
>
> Finally got an index working with the Hadoop file system but just to do
> the apache.org site took around 2-3 hours and on each machine the
> mapreduce local data was around 4.5 Gigs.  Anybody know what might be
> causing this?
>
> Dennis
>
>
