Looks like you're having exactly the opposite problem: the timeouts are
likely due to the number of threads being too large for your bandwidth.
Try a smaller value and see if it works better. It could also be that
the target host detects that you are hitting it too hard and blocks you.
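To make the bandwidth argument concrete, here is a rough Java sketch. It
assumes, purely for illustration, that N fetcher threads share the link
evenly, and it uses made-up figures (a 1 MB/s link, 50 KB pages). It is not
how Nutch schedules fetches. Also note that a socket "Read timed out" fires
when a single read waits too long for data, so this only shows the trend:
the more threads, the thinner each connection's share and the longer every
read stalls.

```java
/**
 * Back-of-envelope model (an assumption, not Nutch's behaviour):
 * N threads sharing a link of B bytes/second each get about B / N.
 */
public class TimeoutEstimate {

    /** Bytes per second each thread gets if the link is shared evenly. */
    static double perThreadBytesPerSec(double linkBytesPerSec, int threads) {
        return linkBytesPerSec / threads;
    }

    /** Seconds one thread needs to download a page of the given size. */
    static double secondsPerPage(double linkBytesPerSec, int threads, double pageBytes) {
        return pageBytes / perThreadBytesPerSec(linkBytesPerSec, threads);
    }

    public static void main(String[] args) {
        // Hypothetical figures: a 1 MB/s link and 50 KB pages.
        double link = 1_000_000;
        double page = 50_000;
        // 10 is the default fetcher.threads.fetch; 80 is the value tried above.
        for (int threads : new int[] {10, 80}) {
            System.out.printf("%d threads: %.0f bytes/s each, %.1f s per page%n",
                    threads, perThreadBytesPerSec(link, threads),
                    secondsPerPage(link, threads, page));
        }
    }
}
```

Running it prints 0.5 s per page for 10 threads and 4.0 s for 80: the same
link, but each connection is eight times slower.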

BTW, unless the site you are hitting is yours, I would not advise you to hit
it with more than one thread at a time, as that would be considered impolite.
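A minimal sketch of what "one thread at a time" means for a single host,
written as a hypothetical Java class (PoliteGate is an illustrative name, not
a Nutch class): at most one fetch per host is in flight, and the host gets a
cool-down after each fetch finishes. Nutch handles politeness through its own
fetcher settings, such as fetcher.server.delay; the 5-second delay below is
just an assumed value.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-host politeness gate; not Nutch's implementation. */
public class PoliteGate {
    private final long delayMillis;
    private final Set<String> inFlight = new HashSet<>();
    private final Map<String, Long> nextAllowedAt = new HashMap<>();

    public PoliteGate(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    /** Returns true (and marks the host busy) if a fetch may start at nowMillis. */
    public synchronized boolean tryAcquire(String host, long nowMillis) {
        if (inFlight.contains(host)) {
            return false; // only one connection per host at a time
        }
        if (nowMillis < nextAllowedAt.getOrDefault(host, 0L)) {
            return false; // host is still in its cool-down period
        }
        inFlight.add(host);
        return true;
    }

    /** Call when the fetch finishes; starts the host's cool-down. */
    public synchronized void release(String host, long nowMillis) {
        inFlight.remove(host);
        nextAllowedAt.put(host, nowMillis + delayMillis);
    }

    public static void main(String[] args) {
        PoliteGate gate = new PoliteGate(5_000); // assumed 5 s delay
        System.out.println(gate.tryAcquire("www.mysite.com", 0));      // true
        System.out.println(gate.tryAcquire("www.mysite.com", 100));    // false: busy
        gate.release("www.mysite.com", 1_000);
        System.out.println(gate.tryAcquire("www.mysite.com", 3_000));  // false: cool-down
        System.out.println(gate.tryAcquire("www.mysite.com", 6_000));  // true
        System.out.println(gate.tryAcquire("other.example.org", 100)); // true: other host
    }
}
```

Passing the clock value in explicitly keeps the gate easy to test; a real
fetcher would use System.currentTimeMillis() and wait or requeue the URL
whenever tryAcquire returns false.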


On 10 January 2011 16:40, Marseld Dedgjonaj
<[email protected]> wrote:

> Hello everyone,
>
> I installed Nutch to crawl all the links of a single website.
>
> When I crawl with the default values of the fetcher.threads.fetch (10) and
> fetcher.threads.per.host (1) parameters, it works fine, but performance is
> not good (most of the CPU and bandwidth is unused).
>
> If I change these parameters in nutch-site.xml to make better use of the
> resources:
>
> <property>
>   <name>fetcher.threads.fetch</name>
>   <value>80</value>
> </property>
>
> <property>
>   <name>fetcher.threads.per.host</name>
>   <value>80</value>
> </property>
>
> I get a "fetch of http://www.mysite.com/... failed with:
> java.net.SocketTimeoutException: Read timed out" exception for each URL.
>
> My machine has a Core 2 Quad CPU and 8 GB of RAM.
>
> Any advice on what to do?
>
> Thanks in advance
>
> Marseldi



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
