Have you tried the following: http://wiki.apache.org/nutch/HardwareRequirements
and http://wiki.apache.org/nutch/

There is no quick answer if you are planning to crawl millions of pages. Read, try, and read again.

On 12/28/05, Pushpesh Kr. Rajwanshi <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I want to know if anyone has successfully run a distributed crawl on
> multiple machines involving millions of pages, and how hard is it to
> do that? Do I just have to do some configuration and setup, or do I
> need to implement something as well?
>
> Also, if I want to crawl around 20,000 websites (say to depth 5) in a
> day, is it possible, and if so, roughly how many machines would I
> need? What configuration would I need? I would appreciate even very
> approximate numbers, as I understand it might not be trivial to work
> out :-)
>
> TIA
> Pushpesh
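To see why there is no quick answer, here is a minimal back-of-envelope sketch of the arithmetic behind the "how many machines" question. Every input is a hypothetical assumption, not a Nutch figure: `pages_per_site` stands in for whatever a depth-5 crawl actually yields per site (which varies enormously), and `pages_per_sec_per_machine` is an arbitrary placeholder throughput, not a measured rate. The sketch also ignores per-host politeness delays, DNS lookups, parsing, and indexing, all of which matter in practice.

```python
import math


def crawl_estimate(sites, pages_per_site, seconds=86_400,
                   pages_per_sec_per_machine=20.0):
    """Rough crawl sizing. All inputs are hypothetical assumptions.

    Returns (total_pages, required_pages_per_second, machines_needed).
    """
    # Total pages to fetch; pages_per_site is an assumed average,
    # since crawl depth alone does not determine it.
    total_pages = sites * pages_per_site
    # Sustained fetch rate needed to finish within the time budget.
    required_rate = total_pages / seconds
    # Machines needed if each sustains the assumed per-machine rate.
    machines = math.ceil(required_rate / pages_per_sec_per_machine)
    return total_pages, required_rate, machines


if __name__ == "__main__":
    # 20,000 sites in one day, under three different page-count guesses.
    for pages_per_site in (10, 100, 1000):
        total, rate, machines = crawl_estimate(20_000, pages_per_site)
        print(f"{pages_per_site:>5} pages/site: {total:,} pages, "
              f"{rate:.1f} pages/s, ~{machines} machine(s)")
```

Changing only the pages-per-site guess moves the answer by an order of magnitude, which is why measuring your own fetch rate on real hardware is the only reliable way to size the cluster.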
