How many fetcher threads do you have at play? Also, are you separating fetching and parsing?
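For context, here is a minimal sketch of what those two knobs look like as overrides in nutch-site.xml. The property names follow Nutch 1.x conventions and the values are only illustrative; check the conf/nutch-default.xml shipped with your 1.2 install for the exact names and defaults.

<!-- Sketch of nutch-site.xml overrides relevant to the two questions above. -->
<configuration>
  <!-- Number of parallel fetcher threads (the 1.x default is 10). -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>50</value>
  </property>
  <!-- Politeness delay in seconds between requests to the same host; the
       default of 5s dominates crawl time when many URLs share a few hosts. -->
  <property>
    <name>fetcher.server.delay</name>
    <value>2.0</value>
  </property>
  <!-- Do not parse inside the fetcher; run parsing as a separate step so a
       slow or failing parse does not stall fetcher threads. -->
  <property>
    <name>fetcher.parse</name>
    <value>false</value>
  </property>
</configuration>

With fetcher.parse disabled you would then run the parse job as its own step after each fetch (e.g. bin/nutch parse on the segment) - again, verify against the 1.2 scripts you are using.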
These are (generally speaking) places to get started.

On Tue, Feb 21, 2012 at 8:19 AM, Bharat Goyal <[email protected]> wrote:
> Hi,
>
> I have a list of around 1000 seed URLs, which I crawl to depth=2 or 3.
> This is done on a local machine (with no other large resource-consuming
> processes running) with the following configuration:
> Dual Core (2.4 GHz),
> 4GB RAM
>
> It takes around 14-15 hours to crawl this seed list, which generates
> around 21k web pages of content. Is there any way this can be optimized
> to take less time? Nutch (1.2) settings are all default.
>
> Thanks for the help.
>
> Regards,
> Bharat Goyal

--
*Lewis*

