Re: Nutch Hadoop Optimization

Julien Nioche Thu, 15 Dec 2011 12:00:35 -0800

> So I have Nutch running on a Hadoop cluster with three data nodes.  The
> machines are all pretty beefy, but Nutch isn't performing any faster than
> when I was running in pseudo-distributed mode on one machine. How do I
> configure Nutch to take full advantage of the cluster?
>


Beefy machines won't help much with the fetching step, which is IO bound
and usually takes most of the time.
How big is your crawldb?  How long do the generate, parse and update steps
take? Adding machines won't make a massive difference if your crawldb or
segments are small.
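
To illustrate the point about small crawldbs, here is a rough sketch of a toy
cost model, not a measurement of Nutch or Hadoop. It assumes each job pays a
fixed startup overhead and that the remaining work splits evenly across nodes.
The function names and all the numbers (30 seconds of overhead, 1 ms per
record) are made up for illustration only.

```python
def job_time(records, nodes, secs_per_record=0.001, fixed_overhead_secs=30.0):
    """Toy estimate: fixed per-job overhead plus work split evenly across nodes.

    Both default values are hypothetical placeholders, not Nutch measurements.
    """
    return fixed_overhead_secs + records * secs_per_record / nodes


def speedup(records, nodes, **kwargs):
    """Ratio of the single-node estimate to the multi-node estimate."""
    return job_time(records, 1, **kwargs) / job_time(records, nodes, **kwargs)


if __name__ == "__main__":
    # Compare a small and a large (hypothetical) crawldb on three nodes.
    for records in (10_000, 100_000_000):
        print(f"{records:>12} records: speedup on 3 nodes = {speedup(records, 3):.2f}")
```

Under these assumptions the small crawldb barely benefits from three nodes,
because the fixed overhead dominates, while the large one gets close to a
threefold speedup. Plugging in your own timings for the generate, parse and
update steps would show which case you are in.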

Julien

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
