+1
On Mon, 2006-01-02 at 13:39 -0800, Earl Cahill wrote:
> Any chance you could walk through your implementation?
> Like how the twenty boxes were assigned? Maybe
> upload your confs somewhere, and outline what commands
> you actually ran?
>
> Thanks,
> Earl
>
> --- Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> > Pushpesh Kr. Rajwanshi wrote:
> > > I want to know whether anyone has been able to successfully run a
> > > distributed crawl on multiple machines, crawling millions of
> > > pages, and how hard it is to do. Do I just have to do some
> > > configuration and setup, or does it require writing code as well?
> >
> > I recently performed a four-level-deep crawl, starting from URLs in
> > DMOZ and limiting each level to 16M URLs. It ran on 20 machines,
> > took around 24 hours at about 100Mbit, and retrieved around 50M
> > pages. I used Nutch unmodified, specifying only a few configuration
> > options. So, yes, it is possible.
> >
> > Doug
> >
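
While we wait for the full walkthrough, here is my rough understanding
of what the "few configuration options" plus the standard commands
would look like: set fs.default.name to the NDFS namenode and
mapred.job.tracker to the jobtracker in nutch-site.xml on all twenty
boxes, start the NDFS and MapReduce daemons, and then run the
inject/generate/fetch/updatedb cycle once per level, with -topN
capping each level at 16M. Below is a minimal driver sketch (Python)
of that loop; the property names, argument order, paths, and the
segment-listing trick are my guesses from reading the mapred branch,
not something Doug has confirmed, so corrections welcome.

#!/usr/bin/env python
"""Rough sketch of the per-level crawl loop described above.

The depth (4) and the 16M-per-level cap come from Doug's numbers; the
command names (inject, generate, fetch, updatedb, ndfs) and the -topN
flag are what I believe the mapred branch provides.  The paths, the
argument order, and this driver script itself are my own guesses,
not a confirmed recipe.
"""
import subprocess

NUTCH = "bin/nutch"           # assumed launcher, run from the Nutch root
CRAWLDB = "crawl/crawldb"     # hypothetical crawl db path on NDFS
SEGMENTS = "crawl/segments"   # hypothetical segments directory on NDFS
SEEDS = "seeds"               # NDFS directory holding the DMOZ seed url lists
TOP_N = "16000000"            # 16M urls per level, as in Doug's crawl
DEPTH = 4                     # four-level-deep crawl


def run(*args):
    """Run one Nutch command, aborting the crawl if it fails."""
    print("running: " + " ".join(args))
    subprocess.check_call(list(args))


def newest_segment():
    """Return the segment that generate just created.

    Assumes 'bin/nutch ndfs -ls' lists paths with the path as the first
    token, and that segment names (timestamps) sort chronologically.
    """
    out = subprocess.check_output([NUTCH, "ndfs", "-ls", SEGMENTS]).decode()
    paths = [line.split()[0] for line in out.splitlines()
             if line.startswith(SEGMENTS + "/")]
    return sorted(paths)[-1]


# Seed the crawl db once from the DMOZ url lists.
run(NUTCH, "inject", CRAWLDB, SEEDS)

for level in range(DEPTH):
    print("=== level %d of %d ===" % (level + 1, DEPTH))
    # Pick at most TOP_N of the best-scoring urls and make a new segment.
    run(NUTCH, "generate", CRAWLDB, SEGMENTS, "-topN", TOP_N)
    segment = newest_segment()
    # Fetch the segment, then fold the discovered links back into the db.
    run(NUTCH, "fetch", segment)
    run(NUTCH, "updatedb", CRAWLDB, segment)

Doug, does that roughly match what you ran, or were there other knobs
(fetcher thread count, map/reduce task counts, etc.) that mattered at
this scale?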