Any chance you could walk through your setup?  For instance, how
were the twenty boxes assigned?  Maybe you could upload your confs
somewhere and outline the commands you actually ran?

Thanks,
Earl

--- Doug Cutting <[EMAIL PROTECTED]> wrote:

> Pushpesh Kr. Rajwanshi wrote:
> > I want to know whether anyone has successfully run a distributed
> > crawl on multiple machines, crawling millions of pages.  How hard
> > is that to do?  Do I just have to do some configuration and setup,
> > or do I need to implement something as well?
> 
> I recently performed a four-level-deep crawl, starting from URLs in
> DMOZ and limiting each level to 16M URLs.  This ran on 20 machines,
> took around 24 hours using about 100Mbit of bandwidth, and retrieved
> around 50M pages.  I used Nutch unmodified, specifying only a few
> configuration options.  So, yes, it is possible.
> 
> Doug
> 
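For a rough sense of scale, here is a back-of-envelope sketch of the rates those figures imply.  It assumes "about 100Mbit" means roughly 100 megabits per second of aggregate bandwidth sustained for the whole run, and that the 24 hours and 50M pages are totals across all 20 machines; the function name crawl_rates is just illustrative.

```python
# Back-of-envelope rates implied by the crawl figures quoted above.
# Assumption: "about 100Mbit" is read as ~100 megabits per second of
# aggregate bandwidth, sustained for the entire 24-hour run.

PAGES = 50_000_000       # "around 50M pages"
HOURS = 24               # "around 24 hours"
MACHINES = 20            # "20 machines"
BANDWIDTH_BITS = 100e6   # assumed 100 Mbit/s aggregate


def crawl_rates(pages, hours, machines, bandwidth_bits):
    """Return (pages/s overall, pages/s per machine, bytes per page)."""
    seconds = hours * 3600
    pages_per_sec = pages / seconds
    per_machine = pages_per_sec / machines
    # Average transfer budget per page if the link were fully used.
    bytes_per_page = (bandwidth_bits / 8) / pages_per_sec
    return pages_per_sec, per_machine, bytes_per_page


if __name__ == "__main__":
    total, per_box, size = crawl_rates(PAGES, HOURS, MACHINES, BANDWIDTH_BITS)
    print(f"{total:.0f} pages/s overall, {per_box:.1f} pages/s per machine, "
          f"~{size / 1000:.1f} KB per page")
```

Under those assumptions this works out to roughly 579 pages/s overall, about 29 pages/s per machine, and an average budget of around 21.6 KB per fetched page.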
