Thanks Julien for your response.

However, I see that much of my bandwidth is unused, so I don't think
bandwidth is the problem.
The site I am crawling is my own.
I want to crawl as fast as possible, so I need to use the full power of my
machine.

Thanks,
Marseldi

-----Original Message-----
From: Julien Nioche [mailto:[email protected]] 
Sent: Monday, January 10, 2011 5:49 PM
To: [email protected]
Subject: Re: Read time out exception during fetch process

Looks like you're having exactly the opposite problem: the timeouts are
likely due to the number of threads being too large for your bandwidth.
Try a smaller value and see if it works better. It could also be that
the target host detects that you are querying too hard and blocks you.

BTW, unless the site you are hitting is yours, I would not advise you to hit
it with more than 1 thread at a time as it would be considered impolite.
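
A minimal sketch of what trying a smaller value might look like in
nutch-site.xml. The values 20 and 30000 are illustrative assumptions, not
tuned recommendations. http.timeout is not mentioned in this thread; its
name and millisecond unit are taken from the stock nutch-default.xml, so
check it against your Nutch version before relying on it.

```xml
<!-- Fewer fetcher threads overall than the 80 that produced timeouts;
     20 is an assumed starting point to adjust up or down. -->
<property>
 <name>fetcher.threads.fetch</name>
 <value>20</value>
</property>

<!-- More than 1 thread per host only makes sense on a site you own. -->
<property>
 <name>fetcher.threads.per.host</name>
 <value>20</value>
</property>

<!-- Assumed: a longer network timeout in milliseconds, so a slow
     response is not immediately reported as "Read timed out". -->
<property>
 <name>http.timeout</name>
 <value>30000</value>
</property>
```

Raising the thread count step by step while watching for the first
timeouts should show roughly where the server or the link saturates.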


On 10 January 2011 16:40, Marseld Dedgjonaj
<[email protected]> wrote:

> Hello everyone,
>
> I installed Nutch to crawl all links of a single website.
>
> With the default values of fetcher.threads.fetch (10) and
> fetcher.threads.per.host (1), the crawl works fine, but performance is
> poor (most of the CPU and bandwidth is unused).
>
> If I change these parameters in nutch-site.xml to make better use of
> the resources:
>
> <property>
>  <name>fetcher.threads.fetch</name>
>  <value>80</value>
> </property>
>
> <property>
>  <name>fetcher.threads.per.host</name>
>  <value>80</value>
> </property>
>
> I get a "fetch of http://www.mysite.com/... failed with:
> java.net.SocketTimeoutException: Read timed out" exception for every URL.
>
>
>
> My machine has a Core 2 Quad CPU and 8 GB of RAM.
>
> Any advice on what to do?
>
>
>
> Thanks in advance
>
> Marseldi
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com


