On Monday 15 August 2011 14:59:20 webdev1977 wrote:
> I have been looking at the pros and cons of running Nutch locally in
> pseudo-distributed mode.  I have a very large machine with lots of
> processors and memory (16 GB).  I am not able to get more machines to set up
> a proper Hadoop cluster.
> 
> Is it worth the overhead to set up Hadoop in pseudo-distributed mode? Will I
> see any gains in fetching large amounts of content from only three domains?

You have many cores that you aren't using right now, and you can put them to
work in pseudo-distributed mode. Fetching probably won't go faster, since it
isn't the real bottleneck in many cases. The slow jobs are parsing, updating
the crawldb (if it is large), and merging the linkdb (terrible performance).
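
As a rough illustration of how the extra cores get used, here is a minimal
sketch of a conf/mapred-site.xml for a single machine. It assumes Hadoop
0.20-era property names; the localhost:9001 address and the slot counts are
illustrative assumptions, not tuned values, so size them to your own core
count and memory.

```xml
<?xml version="1.0"?>
<!-- conf/mapred-site.xml: sketch for pseudo-distributed mode.
     Assumes Hadoop 0.20-era property names; all values are illustrative. -->
<configuration>
  <!-- Run a real JobTracker on this machine instead of the local job runner. -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <!-- Maximum number of map tasks the TaskTracker runs at once (assumed value). -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <!-- Maximum number of reduce tasks the TaskTracker runs at once (assumed value). -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>
```

With more than one map and reduce slot, jobs such as parsing and the crawldb
update can run several tasks side by side on the one machine.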

> 
> If it is worth it, can anyone point me to a good tutorial/post for setting
> it up?

Try searching Google for "hadoop nutch tutorial".

> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Is-running-nutch-in-psuedo-distributed-
> mode-really-worth-it-tp3255677p3255677.html Sent from the Nutch - User
> mailing list archive at Nabble.com.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
