Try decreasing the number of fetcher threads instead...
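
A minimal sketch of what that could look like, assuming the overrides go in
conf/nutch-site.xml (which takes precedence over nutch-default.xml). The
value 5 is only an illustrative starting point on a dual-core machine, not a
measured optimum. Setting fetcher.parse to false is the usual way to keep
parsing out of the fetch step, in which case each segment needs a separate
parse pass afterwards.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Number of fetcher threads; the default is 10.
       5 is an assumed example value, tune it for your hardware. -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>5</value>
  </property>
  <!-- Assumption: fetch without parsing, then parse each segment separately. -->
  <property>
    <name>fetcher.parse</name>
    <value>false</value>
  </property>
</configuration>
```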

On Wed, Feb 22, 2012 at 2:33 PM, Bharat Goyal <[email protected]> wrote:

> Went through the checklist and made some changes, e.g. increased the number
> of fetcher threads from the default 10 to 30, but I still see Nutch eating
> up all the resources; CPU usage is as high as 100%.
>
> -Bharat
>
> On Tuesday 21 February 2012 04:45 PM, Julien Nioche wrote:
>
>> See http://wiki.apache.org/nutch/OptimizingCrawls for a checklist.
>>
>> On 21 February 2012 10:47, Bharat Goyal <[email protected]> wrote:
>>
>>> The number of fetcher threads is equal to the default value (10). What is
>>> the optimum number of threads? Also, fetching and parsing are not separate.
>>>
>>> -Bharat
>>>
>>>
>>> On Tuesday 21 February 2012 04:11 PM, Lewis John Mcgibbney wrote:
>>>
>>>> How many fetcher threads do you have at play?
>>>> Also, are you separating fetching and parsing?
>>>>
>>>> These are (generally speaking) places to get started.
>>>>
>>>> On Tue, Feb 21, 2012 at 8:19 AM, Bharat Goyal <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a list of around 1000 seed URLs, which I crawl to depth 2 or 3.
>>>>> This is done on a local machine (with no other large
>>>>> resource-consuming processes running) with this configuration:
>>>>> Dual Core (2.4 GHz),
>>>>> 4 GB RAM
>>>>>
>>>>> It takes around 14-15 hours to crawl this seed list, which yields the
>>>>> content of around 21k web pages. Is there any way this can be optimized
>>>>> to take less time? Nutch (1.2) settings are all default.
>>>>>
>>>>> Thanks for the help.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Bharat Goyal
>>>>>
>>>>> DISCLAIMER
>>>>> This email is intended only for the person or the entity to whom it is
>>>>> addressed and may contain information which is confidential and
>>>>> privileged.
>>>>> Any review, retransmission, dissemination or any other use of the said
>>>>> information by person or entities other than intended recipient is
>>>>> unauthorized and prohibited. If you are not the intended recipient,
>>>>> please
>>>>> delete this email and contact the sender.
>>>>>
>>
>>
>
