Hi,

I want to know if anyone has been able to successfully run a distributed crawl
across multiple machines, involving millions of pages. How hard is it to do
that? Is it just a matter of configuration and setup, or does it also require
some implementation work?

Also, if I want to crawl around 20,000 websites (say to depth 5) in a day, is
that possible, and if so, roughly how many machines would I need? What
configuration would that require? I would appreciate even very approximate
numbers; I understand it might not be trivial to estimate, or maybe it is :-)

TIA
Pushpesh
