[Nutch-general] RE: Large Mapreduce Sizes and Long Index Times

Dennis Kubes Sat, 18 Mar 2006 14:27:01 -0800

That's funny. I am already working on a tutorial on how to set up Nutch and
Hadoop from scratch, step by step, over a cluster of 6 machines: 1 name
node and 6 data nodes. Hopefully I will have it done tonight or tomorrow,
and I will post it to the list.


Dennis 

-----Original Message-----
From: Vertical Search [mailto:[EMAIL PROTECTED] 
Sent: Friday, March 17, 2006 5:15 PM
To: [email protected]
Subject: Re: Large Mapreduce Sizes and Long Index Times

Dennis,
I have been following this thread. I am a recent adopter myself, trying to
learn the art and science of Hadoop and Nutch (more Nutch, though).
Is there a way you could document the "lessons learned"? It could save quite a
bit of heartbreak during the various phases of crawling. I can help you
document it if need be.

Thanks


On 3/17/06, Dennis Kubes <[EMAIL PROTECTED]> wrote:
>
> Finally got an index working with the Hadoop file system, but just doing
> the apache.org site took around 2-3 hours, and on each machine the
> MapReduce local data was around 4.5 GB. Does anybody know what might be
> causing this?
>
> Dennis
>
>



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general