Re: Optimising the speed of Nutch.

2012-02-22 Thread Bharat Goyal
Went through the checklist and made some changes, e.g. I increased the number of fetcher threads from the default of 10 to 30, but I still see Nutch eating up all the resources: CPU usage is as high as 100%. -Bharat

On Tuesday 21 February 2012 04:45 PM, Julien Nioche wrote: See

Re: Optimising the speed of Nutch.

2012-02-22 Thread remi tassing
Try decreasing the number of fetcher threads instead...

On Wed, Feb 22, 2012 at 2:33 PM, Bharat Goyal bharat.go...@shiksha.com wrote: Went through the checklist and made some changes, e.g. I increased the number of fetcher threads from the default of 10 to 30, but I still see Nutch eating up all the
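For reference, the fetcher thread count discussed in this thread is controlled by the Nutch 1.x property `fetcher.threads.fetch`, which is normally overridden in `conf/nutch-site.xml`. The Python sketch below assumes that Hadoop-style `<configuration>`/`<property>` file layout and builds such an override; the value 5 and the helper name `nutch_site_xml` are hypothetical and purely illustrative, not a recommended setting.

```python
import xml.etree.ElementTree as ET


def nutch_site_xml(overrides):
    """Build a nutch-site.xml <configuration> body from a dict of property overrides.

    Hypothetical helper for illustration; each key becomes a <property> with
    a <name> and a <value> child, in dict insertion order.
    """
    root = ET.Element("configuration")
    for name, value in overrides.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


# Illustrative only: fewer fetcher threads than the default of 10.
print(nutch_site_xml({"fetcher.threads.fetch": 5}))
```

Any property not listed in `nutch-site.xml` keeps the value from `nutch-default.xml`, so only the settings being tuned need to appear in the override file.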

Optimising the speed of Nutch.

2012-02-21 Thread Bharat Goyal
Hi, I have a list of around 1000 seed URLs, which I crawl to depth 2 or 3. This is done on a local machine with the following configuration (no other large resource-consuming processes running): dual core (2.4 GHz), 4 GB RAM. It takes around 14-15 hours to crawl this seed list, which generates

Re: Optimising the speed of Nutch.

2012-02-21 Thread Julien Nioche
See http://wiki.apache.org/nutch/OptimizingCrawls for a checklist.

On 21 February 2012 10:47, Bharat Goyal bharat.go...@shiksha.com wrote: The number of fetcher threads is equal to the default value (10). What is the optimum number of threads? Also, the fetching and parsing are not separate.
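Whether fetching and parsing run as one step is governed in Nutch 1.x by the `fetcher.parse` property: when it is `true` the fetcher parses content as it downloads it, otherwise parsing is done in a separate `parse` step. A quick way to see what a local setup overrides is to read the values out of `nutch-site.xml`. The sketch below assumes the same Hadoop-style XML layout; the helper name `get_property` and the sample file contents are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET


def get_property(xml_text, name, default=None):
    """Return the <value> text of property `name` in a Hadoop-style
    configuration document, or `default` if the property is not set there.

    Hypothetical helper; a property missing here would fall back to
    nutch-default.xml in a real Nutch installation.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return default


# Hypothetical nutch-site.xml contents, for illustration only.
site = """<configuration>
  <property><name>fetcher.threads.fetch</name><value>10</value></property>
  <property><name>fetcher.parse</name><value>true</value></property>
</configuration>"""

print(get_property(site, "fetcher.threads.fetch"))  # 10
print(get_property(site, "fetcher.parse"))          # true
```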