Hi folks,
Would anyone be willing to share a few pros and cons of using many nodes vs. one very powerful machine for large-scale crawling? Of course, many of the advantages and disadvantages overlap with those of Hadoop and distributed computing in general, but what I'm really looking for are good reasons not to run Nutch on a single machine.

One example might be that more machines give you more IP addresses to fetch from, and therefore less chance of being blocked by web admins. Is that correct?

Joe

