The number of fetcher threads is at the default value (10). What is the
optimum value for the number of threads? Also, fetching and parsing are
not separate.
-Bharat
On Tuesday 21 February 2012 04:11 PM, Lewis John Mcgibbney wrote:
How many fetcher threads do you have at play?
Also, are you separating fetching and parsing?
These are (generally speaking) good places to get started.
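As a rough illustration of the two settings raised above (fetcher thread count, and fetching separately from parsing), the sketch below generates override entries of the kind that would go in conf/nutch-site.xml. It assumes the Nutch 1.x property names fetcher.threads.fetch and fetcher.parse; the values are hypothetical placeholders, not recommendations, and should be checked against the nutch-default.xml shipped with the installed version.

```python
import xml.etree.ElementTree as ET

# Hypothetical override values, for illustration only; not tuned settings.
overrides = {
    "fetcher.threads.fetch": "20",  # fetcher thread count (the thread reports a default of 10)
    "fetcher.parse": "false",       # fetch only; parsing then runs as a separate step
}


def build_site_config(props):
    """Return a <configuration> element in the Hadoop-style property layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


# Serialize to a string that can be pasted into the site configuration file.
xml_text = ET.tostring(build_site_config(overrides), encoding="unicode")
print(xml_text)
```

With parsing switched off in the fetcher, each fetched segment has to be parsed in its own step before the crawl database is updated.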
On Tue, Feb 21, 2012 at 8:19 AM, Bharat Goyal <[email protected]> wrote:
Hi,
I have a list of around 1000 seed URLs, which I crawl to depth 2 or 3.
This is done on a local machine (with no other large resource-consuming
processes running) with the following configuration:
Dual Core (2.4 GHz),
4 GB RAM
It takes around 14-15 hours to crawl this seed list, which yields
around 21k web pages. Is there any way to optimize this so that it
takes less time? The Nutch (1.2) settings are all default.
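To put the reported figures in perspective, here is a back-of-the-envelope throughput calculation. The page count and duration come from the message; the 5-second per-host delay is an assumption (believed to be the stock fetcher.server.delay value, but worth confirming in nutch-default.xml).

```python
# Figures taken from the message: about 21k pages in 14-15 hours.
pages = 21_000
hours = 14.5  # midpoint of the reported 14-15 hours

pages_per_second = pages / (hours * 3600)
seconds_per_page = 1 / pages_per_second

# Assumption: a 5-second politeness delay between requests to the same host.
assumed_delay_s = 5.0
# Upper bound on throughput while only a single host is being fetched.
single_host_rate = 1 / assumed_delay_s

print(f"observed: {pages_per_second:.2f} pages/s ({seconds_per_page:.1f} s per page)")
print(f"one host at {assumed_delay_s:.0f} s delay: at most {single_host_rate:.2f} pages/s")
```

If that assumption holds, the observed rate is only about twice what a single politely crawled host allows. That would suggest the crawl is limited more by per-host waiting than by CPU or memory, though this is an inference from the numbers rather than a measurement.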
Thanks for the help.
Regards,
Bharat Goyal
DISCLAIMER
This email is intended only for the person or the entity to whom it is
addressed and may contain information which is confidential and privileged.
Any review, retransmission, dissemination or any other use of the said
information by person or entities other than intended recipient is
unauthorized and prohibited. If you are not the intended recipient, please
delete this email and contact the sender.