Hi folks,

Would anyone be willing to share a few pros and cons of using many nodes
versus one very powerful machine for large-scale crawling? Many of the
advantages and disadvantages of course overlap with those of Hadoop and
distributed computing in general, but what I'm really looking for are good
reasons not to use a single machine for Nutch.

 

One example might be that more machines give you more IP addresses to
fetch from, and therefore less chance of being blocked by web admins.
Is that correct?

 

Joe
