See http://wiki.apache.org/nutch/OptimizingCrawls for a checklist.
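
As a rough sanity check before tuning anything, the figures quoted below (about 21k pages, 14-15 hours, 10 fetcher threads) can be turned into a throughput estimate. This is only a back-of-the-envelope sketch: the 14.5 hour midpoint is an assumption, and the variable names are made up for illustration.

```python
# Figures taken from the thread; the 14.5 h midpoint of "14-15 hours" is an assumption.
pages = 21_000
hours = 14.5
threads = 10  # default fetcher thread count mentioned in the thread

elapsed_s = hours * 3600

# Overall crawl rate across all threads.
pages_per_second = pages / elapsed_s

# Average wall-clock thread time spent per fetched page.
seconds_per_page_per_thread = threads * elapsed_s / pages

print(f"{pages_per_second:.2f} pages/s overall")
print(f"{seconds_per_page_per_thread:.1f} s of thread time per page")
```

If the thread time per page comes out far larger than a typical page download, the threads are probably spending most of their time waiting rather than fetching, which is the kind of thing the checklist is meant to help track down.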

On 21 February 2012 10:47, Bharat Goyal <[email protected]> wrote:

> The number of fetcher threads is at the default value (10). What is the
> optimum value for the number of threads? Also, fetching and parsing are not
> separate.
>
> -Bharat
>
>
> On Tuesday 21 February 2012 04:11 PM, Lewis John Mcgibbney wrote:
>
>> How many fetcher threads do you have at play?
>> Also, are you separating fetching and parsing?
>>
>> These are (generally speaking) places to get started.
>>
>> On Tue, Feb 21, 2012 at 8:19 AM, Bharat Goyal <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I have a list of around 1000 seed URLs, which I crawl to depth 2 or 3.
>>> This is done on a local machine (with no other large resource-consuming
>>> processes running) with the following configuration:
>>> Dual Core (2.4 GHz),
>>> 4 GB RAM
>>>
>>> It takes around 14-15 hours to crawl this seed list, which generates
>>> around 21k web pages of content. Is there any way this can be optimized
>>> to take less time? The Nutch (1.2) settings are all default.
>>>
>>> Thanks for the help.
>>>
>>> Regards,
>>>
>>> Bharat Goyal
>>>
>>> DISCLAIMER
>>> This email is intended only for the person or the entity to whom it is
>>> addressed and may contain information which is confidential and
>>> privileged.
>>> Any review, retransmission, dissemination or any other use of the said
>>> information by person or entities other than intended recipient is
>>> unauthorized and prohibited. If you are not the intended recipient,
>>> please
>>> delete this email and contact the sender.
>>>
>>>
>>
>>
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
