Have you looked at the following pages?

http://wiki.apache.org/nutch/HardwareRequirements

and

http://wiki.apache.org/nutch/

There is no quick answer if you are planning to crawl millions of
pages. Read, try, then read some more.
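
For a very rough feel of the sizing question, here is a back-of-envelope
sketch. Only the 20,000 sites and the one-day window come from the
question; the pages-per-site and per-machine fetch rate are placeholder
assumptions, not measured Nutch figures, and the function name is made up
for illustration. Depth 5 does not fix the page count by itself, so you
would have to estimate pages per site from a small trial crawl.

```python
import math

def machines_needed(sites, pages_per_site, pages_per_sec_per_machine, hours=24.0):
    """Rough number of fetch machines needed to finish within `hours`."""
    total_pages = sites * pages_per_site
    required_rate = total_pages / (hours * 3600)  # overall pages per second
    return math.ceil(required_rate / pages_per_sec_per_machine)

# 20,000 sites is from the question. 500 pages per site and 10 pages/sec
# per machine are ASSUMED placeholders; replace them with your own numbers.
print(machines_needed(20_000, 500, 10.0))
```

This only counts raw fetching. Politeness delays per host, parsing,
indexing and bandwidth limits will all change the answer, which is why
trying it on a small scale first is the only reliable way to find out.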


On 12/28/05, Pushpesh Kr. Rajwanshi <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Has anyone successfully run a distributed crawl on multiple machines
> covering millions of pages? How hard is it to do? Do I just have to do
> some configuration and setup, or do I also need to implement something
> myself?
>
> Also, if I want to crawl around 20,000 websites (say to depth 5) in a
> day, is that possible, and if so, roughly how many machines would I
> need and what configuration would they require? I would appreciate even
> very approximate numbers, as I understand they might not be trivial to
> work out. Or maybe they are :-)
>
> TIA
> Pushpesh
>
>
