Hi NN,

Thanks for replying. Actually, I wanted to know whether distributed crawling in
Nutch works reliably, and at what scale. I have successfully set up a
distributed crawl on 2 machines (1 master and 1 slave), but when I try with
more than two machines there seems to be a problem, especially while injecting
URLs into the crawlDB. So I was wondering whether anyone has done a massive
crawl with Nutch, successfully crawling millions of pages?

My requirement is to crawl around 20,000 websites (say to depth 5) in a day,
and I was wondering how many machines that would require.
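To put some rough numbers on my own question: every figure below is an assumption (average pages reachable per site at depth 5, sustained per-machine fetch rate), so this is just a back-of-envelope way to frame the sizing:

```shell
# Back-of-envelope machine count; every input is an assumed figure
sites=20000
pages_per_site=500                     # assumption: avg pages per site at depth 5
total=$((sites * pages_per_site))      # total pages to fetch in the day
rate=20                                # assumption: sustained pages/sec per machine
per_machine_day=$((rate * 86400))      # pages one machine can fetch per day
# ceiling division: machines needed to finish within 24 hours
machines=$(( (total + per_machine_day - 1) / per_machine_day ))
echo "$total pages total, roughly $machines machines"
```

Treat this as a way to structure the question rather than an answer: real throughput depends heavily on politeness delays, host diversity, and bandwidth, so the per-machine rate could easily be off by an order of magnitude either way.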

I would truly appreciate any response on this.

Thanks in advance,
Pushpesh


On 12/28/05, Nutch Newbie <[EMAIL PROTECTED]> wrote:
>
> Have you tried the following:
>
> http://wiki.apache.org/nutch/HardwareRequirements
>
> and
>
> http://wiki.apache.org/nutch/
>
> There is no quick answer if one is planning to crawl a million
> pages.. Read.. try.. read..
>
>
> On 12/28/05, Pushpesh Kr. Rajwanshi <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I want to know if anyone has been able to successfully run a distributed
> > crawl on multiple machines, crawling millions of pages, and how hard it
> > is to do. Do I just have to do some configuration and setup, or some
> > implementation as well?
> >
> > Also, can anyone tell me: if I want to crawl around 20,000 websites (say
> > to depth 5) in a day, is it possible, and if so, roughly how many
> > machines would I require? And what configuration will I need? I would
> > appreciate even very approximate numbers, as I can understand it might
> > not be trivial to find out, or maybe it is :-)
> >
> > TIA
> > Pushpesh
> >
> >
>