Ok guys,

I once again need some advice here.  I have 4 dual-processor, quad-core
1.8 GHz Xeon servers; each server has 4 GB of RAM and runs Linux.  I am using
Nutch from svn (build #334, I think) with Hadoop DFS.  I need to know which
parameters I can set to get optimal performance from these servers.  I have a
seed list of about 10,000 URLs (ignoring external links will be set to true).
My goal is to complete the crawl in the shortest possible time.  Furthermore,
I intend to run a single crawl (depth 5) and thus have a single index.

What advice would you give on this approach, and on the Nutch/Hadoop
variables/parameters and their settings?

Regards,
 
Hilkiah G. Lavinier MEng (Hons), ACGI 
6 Winston Lane, 
Goodwill, 
Roseau, Dominica 
Mbl: (767) 275 3382
Hm : (767) 440 3924
Fax: (767) 440 4991
VoIP USA: (646) 432 4487
 
Email: [EMAIL PROTECTED]
Email: [EMAIL PROTECTED]
IM: Yahoo hilkiah / MSN [EMAIL PROTECTED]
IM: ICQ #8978201  / AOL hilkiah21






      