Hi NN,

Thanks for replying. Actually, I wanted to know whether distributed crawling in Nutch works reliably, and with what success. I have managed to set up a distributed crawl on two machines (one master and one slave), but when I try with more than two machines there seem to be problems, especially while injecting URLs into the crawlDB. So I was wondering whether anyone has successfully done a massive crawl with Nutch, involving millions of pages.
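For context, the step that breaks for me is the inject phase. On the mapred (0.8-era) branch the per-depth cycle looks roughly like this; this is only a sketch, and the exact command names vary by Nutch version, while the paths (crawl/crawldb, urls/, crawl/segments) are my own layout, not anything canonical:

```shell
# Sketch of one Nutch crawl cycle (mapred / 0.8-era command style).
# Paths are illustrative assumptions, not required names.

# Seed the crawl database with the start URLs -- this is the step
# that fails for me once more than two machines are involved:
bin/nutch inject crawl/crawldb urls

# Then repeat the generate/fetch/update cycle once per depth level
# (five times for depth 5):
bin/nutch generate crawl/crawldb crawl/segments   # select URLs due for fetching
segment=`ls -d crawl/segments/2* | tail -1`       # newest segment directory
bin/nutch fetch $segment                          # fetch the selected pages
bin/nutch updatedb crawl/crawldb $segment         # merge fetch results into crawldb
```

With one master and one slave this cycle completes for me; with a third machine the inject step is where things go wrong.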
My requirement is to crawl roughly 20,000 websites (to, say, depth 5) in a day, and I was wondering how many machines that would require. I would truly appreciate any response on this.

Thanks in advance,
Pushpesh

On 12/28/05, Nutch Newbie <[EMAIL PROTECTED]> wrote:
>
> Have you tried the following:
>
> http://wiki.apache.org/nutch/HardwareRequirements
>
> and
>
> http://wiki.apache.org/nutch/
>
> There is no quick answer if one is planning to crawl millions of
> pages.. Read.. Try.. Read..
>
>
> On 12/28/05, Pushpesh Kr. Rajwanshi <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I want to know if anyone has been able to successfully run a
> > distributed crawl on multiple machines, crawling millions of pages,
> > and how hard it is to do that. Do I just have to do some configuration
> > and setup, or some implementation work as well?
> >
> > Also, can anyone tell me: if I want to crawl around 20,000 websites
> > (say to depth 5) in a day, is that possible, and if yes, how many
> > machines would I roughly require? And what configuration will I need?
> > I would appreciate even very approximate numbers, as I can understand
> > it might not be trivial to find out, or maybe it is :-)
> >
> > TIA
> > Pushpesh
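To make the machine-count question concrete, here is the kind of back-of-envelope arithmetic I have been attempting. Every input below (average pages per site within depth 5, sustainable fetch rate per machine) is my own guess for illustration, not a measured figure:

```python
import math

# All inputs are assumptions for illustration, not measurements.
sites = 20_000
avg_pages_per_site = 250          # guess: pages reachable within depth 5, politeness-limited
pages_per_day = sites * avg_pages_per_site        # 5,000,000 pages/day

fetch_rate_per_machine = 20       # guess: pages/second one fetcher node sustains
seconds_per_day = 24 * 60 * 60
capacity_per_machine = fetch_rate_per_machine * seconds_per_day  # 1,728,000 pages/day

machines = math.ceil(pages_per_day / capacity_per_machine)
print(machines)  # 3 fetcher machines under these assumptions
```

By these (guessed) numbers, roughly three fetcher machines would cover the raw fetching alone, ignoring parsing, indexing, and DFS overhead, so the real number would be higher. Corrections to any of the assumed rates would be very welcome.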
