Pushpesh Kr. Rajwanshi wrote:
I want to know if anyone has been able to successfully run a distributed crawl on multiple machines, crawling millions of pages. How hard is it to do that? Do I just have to do some configuration and setup, or is some implementation work also required?
I recently performed a four-level-deep crawl, starting from URLs in DMOZ and limiting each level to 16M URLs. This ran on 20 machines, took around 24 hours using about 100Mbit of bandwidth, and retrieved around 50M pages. I used Nutch unmodified, specifying only a few configuration options. So, yes, it is possible.
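For the archives: a crawl like the one described above follows the standard Nutch whole-web crawl loop (inject, then repeat generate/fetch/updatedb once per level). A sketch of that loop is below; the `crawl/` paths and seed directory name are illustrative, and the -topN value matches the 16M-per-level limit mentioned. On a Hadoop cluster the same commands run distributed once the cluster is configured.

```shell
# One-time: inject DMOZ seed URLs into a new crawl db.
# "urls" is a directory containing flat files of seed URLs (name is illustrative).
bin/nutch inject crawl/crawldb urls

# Repeat once per level of depth (four times for a four-level crawl).
for level in 1 2 3 4; do
  # Generate a fetch list of at most 16M top-scoring URLs.
  bin/nutch generate crawl/crawldb crawl/segments -topN 16000000

  # Pick up the segment just created (the most recent one).
  segment=`ls -d crawl/segments/* | tail -1`

  # Fetch the pages in that segment.
  bin/nutch fetch $segment

  # Fold the fetch results back into the crawl db,
  # so newly discovered links are candidates for the next level.
  bin/nutch updatedb crawl/crawldb $segment
done
```

Exact command names and arguments vary a little between Nutch versions, so check `bin/nutch` with no arguments for the list supported by your release.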
Doug
