How many fetcher threads do you have in play?
Also, are you separating fetching and parsing?

These are (generally speaking) places to get started.
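As a concrete starting point (just a sketch; the property names below are the standard Nutch 1.x ones, and the values are placeholders you would tune for your own host mix), you could raise the fetcher thread count and defer parsing to a separate step in conf/nutch-site.xml:

  <property>
    <name>fetcher.threads.fetch</name>
    <value>50</value>   <!-- default is 10 -->
  </property>
  <property>
    <name>fetcher.parse</name>
    <value>false</value>   <!-- do not parse while fetching -->
  </property>

With fetcher.parse set to false, fetching and parsing then run as separate jobs on each segment, roughly:

  bin/nutch fetch $SEGMENT -threads 50
  bin/nutch parse $SEGMENT

How far you can push the thread count depends on how many distinct hosts are in your seed list and how polite you need to be to each of them, so treat 50 as an illustration rather than a recommendation.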

On Tue, Feb 21, 2012 at 8:19 AM, Bharat Goyal <[email protected]> wrote:

> Hi,
>
> I have a list of around 1000 seed URLs, which I crawl to depth = 2 or 3.
> This is done on a local machine (with no other large resource-consuming
> processes running) with this configuration:
> Dual Core (2.4 GHz),
> 4 GB RAM
>
> It takes around 14-15 hours to crawl this seed list, which yields content
> for around 21k web pages. Is there any way this can be optimized to take
> less time? The Nutch (1.2) settings are all default.
>
> Thanks for the help.
>
> Regards,
>
> Bharat Goyal
>
>



-- 
*Lewis*
