Large intranet crawl

Venkat Shyam Mon, 01 Oct 2007 11:04:14 -0700

I am trying to deploy a large intranet crawl (a single domain with around 500,000
documents) and want to use a distributed crawl with at least 3 to 4 nodes for
crawling/indexing. I have not been able to get Nutch/Hadoop to work in
distributed fashion for a single domain. It looks like, because of politeness,
a single domain can be crawled from only one machine. If anyone has experience
crawling a large intranet site, please share.
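To illustrate the behavior described above, here is a minimal sketch assuming the fetch list is split across nodes by hashing each URL's host, so that all requests to one host stay on one fetcher where its politeness delay can be enforced. The functions `partition_for` and `assign` and the host `intranet.example.com` are hypothetical; this is not Nutch's actual API, only a model of host-based partitioning.

```python
from urllib.parse import urlsplit
import zlib


def partition_for(url: str, num_partitions: int) -> int:
    """Pick a partition from the URL's host only (hypothetical helper).

    crc32 is used instead of Python's built-in hash() because it is
    deterministic across runs.
    """
    host = (urlsplit(url).hostname or "").lower()
    return zlib.crc32(host.encode("utf-8")) % num_partitions


def assign(urls, num_partitions):
    """Group URLs into per-partition fetch lists (hypothetical helper)."""
    buckets = {i: [] for i in range(num_partitions)}
    for u in urls:
        buckets[partition_for(u, num_partitions)].append(u)
    return buckets


# Every URL shares one host, so with 4 "nodes" only one partition gets work.
urls = [f"http://intranet.example.com/doc/{i}" for i in range(6)]
buckets = assign(urls, 4)
used = [p for p, b in buckets.items() if b]
print(used)
```

Because the partition key is the host, adding nodes does not spread a single-domain crawl; only a different partitioning key (at the cost of per-host politeness) would.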
   
  Shyam


