Hello everyone,

I installed Nutch to crawl all the links of a single website.

When I crawl with the default values of fetcher.threads.fetch (10) and
fetcher.threads.per.host (1), it works fine, but performance is poor: most
of the CPU and bandwidth goes unused.

If I change these parameters in nutch-site.xml to make better use of the
resources:

 

<property>
  <name>fetcher.threads.fetch</name>
  <value>80</value>
</property>

<property>
  <name>fetcher.threads.per.host</name>
  <value>80</value>
</property>

 

then I get a "fetch of http://www.mysite.com/... failed with:
java.net.SocketTimeoutException: Read timed out" exception for each URL.

 

My machine has a Core 2 Quad CPU and 8 GB of RAM.

Could anyone advise me on what to do?

 

Thanks in advance 

Marseldi

 


