Any chance you could walk through your setup?
For example, how were the twenty boxes assigned? Maybe
upload your conf files somewhere, and outline which commands
you actually ran?
Thanks,
Earl
--- Doug Cutting <[EMAIL PROTECTED]> wrote:
> Pushpesh Kr. Rajwanshi wrote:
> > I want to know if anyone is able to successfully run distributed
> > crawl on multiple machines involving crawling millions of pages?
> > and how hard is it to do that? Do I just have to do some
> > configuration and set up or do some implementations also?
>
> I recently performed a four-level deep crawl, starting from urls in
> DMOZ, limiting each level to 16M urls. This ran on 20 machines taking
> around 24 hours using about 100Mbit and retrieved around 50M pages. I
> used Nutch unmodified, specifying only a few configuration options. So,
> yes, it is possible.
>
> Doug
>
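
For a rough sense of scale, the figures quoted above can be turned into
per-machine rates. This is only a back-of-the-envelope sketch. It assumes
the 100Mbit is aggregate bandwidth across all 20 machines, that the link
was saturated for the full 24 hours, and that every byte fetched was page
content. None of that is stated in the message, and the function name is
just illustrative.

```python
def crawl_estimates(pages, machines, hours, link_mbit):
    """Rough throughput figures for a crawl (all inputs are approximations)."""
    seconds = hours * 3600
    pages_per_sec = pages / seconds
    per_machine = pages_per_sec / machines
    # Assumption: the link is saturated for the whole run, so this is an
    # upper bound on the average number of bytes fetched per page.
    total_bytes = link_mbit * 1_000_000 / 8 * seconds
    avg_page_kb = total_bytes / pages / 1000
    return pages_per_sec, per_machine, avg_page_kb


# Figures from the quoted message: ~50M pages, 20 machines, ~24 hours, ~100Mbit.
total, per_machine, avg_kb = crawl_estimates(50_000_000, 20, 24, 100)
print(f"{total:.0f} pages/s overall, {per_machine:.0f} pages/s per machine")
print(f"at most ~{avg_kb:.1f} KB per page")
```

Under those assumptions that works out to roughly 580 pages/s overall,
about 29 pages/s per machine, and at most around 21.6 KB per page.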