+1
On Mon, 2006-01-02 at 13:39 -0800, Earl Cahill wrote:
> Any chance you could walk through your implementation?
> Like how the twenty boxes were assigned? Maybe
> upload your confs somewhere, and outline what commands
> you actually ran?
>
> Thanks,
> Earl
>
> --- Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> > Pushpesh Kr. Rajwanshi wrote:
> > > I want to know whether anyone has been able to successfully run a
> > > distributed crawl on multiple machines, crawling millions of
> > > pages, and how hard it is to do. Do I just have to do some
> > > configuration and setup, or does it require writing code as well?
> >
> > I recently performed a four-level-deep crawl, starting from URLs in
> > DMOZ and limiting each level to 16M URLs. It ran on 20 machines,
> > took around 24 hours at about 100Mbit, and retrieved around 50M
> > pages. I used Nutch unmodified, specifying only a few configuration
> > options. So, yes, it is possible.
> >
> > Doug
> >
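
While we wait for the full walkthrough, here is my rough understanding
of what the "few configuration options" plus the standard commands
would look like: set fs.default.name to the NDFS namenode and
mapred.job.tracker to the jobtracker in nutch-site.xml on all twenty
boxes, start the NDFS and MapReduce daemons, and then run the
inject/generate/fetch/updatedb cycle once per level, with -topN
capping each level at 16M. Below is a minimal driver sketch (Python)
of that loop; the property names, argument order, paths, and the
segment-listing trick are my guesses from reading the mapred branch,
not something Doug has confirmed, so corrections welcome.

#!/usr/bin/env python
"""Rough sketch of the per-level crawl loop described above.

The depth (4) and the 16M-per-level cap come from Doug's numbers; the
command names (inject, generate, fetch, updatedb, ndfs) and the -topN
flag are what I believe the mapred branch provides.  The paths, the
argument order, and this driver script itself are my own guesses,
not a confirmed recipe.
"""
import subprocess

NUTCH = "bin/nutch"           # assumed launcher, run from the Nutch root
CRAWLDB = "crawl/crawldb"     # hypothetical crawl db path on NDFS
SEGMENTS = "crawl/segments"   # hypothetical segments directory on NDFS
SEEDS = "seeds"               # NDFS directory holding the DMOZ seed url lists
TOP_N = "16000000"            # 16M urls per level, as in Doug's crawl
DEPTH = 4                     # four-level-deep crawl


def run(*args):
    """Run one Nutch command, aborting the crawl if it fails."""
    print("running: " + " ".join(args))
    subprocess.check_call(list(args))


def newest_segment():
    """Return the segment that generate just created.

    Assumes 'bin/nutch ndfs -ls' lists paths with the path as the first
    token, and that segment names (timestamps) sort chronologically.
    """
    out = subprocess.check_output([NUTCH, "ndfs", "-ls", SEGMENTS]).decode()
    paths = [line.split()[0] for line in out.splitlines()
             if line.startswith(SEGMENTS + "/")]
    return sorted(paths)[-1]


# Seed the crawl db once from the DMOZ url lists.
run(NUTCH, "inject", CRAWLDB, SEEDS)

for level in range(DEPTH):
    print("=== level %d of %d ===" % (level + 1, DEPTH))
    # Pick at most TOP_N of the best-scoring urls and make a new segment.
    run(NUTCH, "generate", CRAWLDB, SEGMENTS, "-topN", TOP_N)
    segment = newest_segment()
    # Fetch the segment, then fold the discovered links back into the db.
    run(NUTCH, "fetch", segment)
    run(NUTCH, "updatedb", CRAWLDB, segment)

Doug, does that roughly match what you ran, or were there other knobs
(fetcher thread count, map/reduce task counts, etc.) that mattered at
this scale?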