[ http://issues.apache.org/jira/browse/NUTCH-69?page=all ]
     
Andrzej Bialecki  resolved NUTCH-69:
------------------------------------

    Resolution: Invalid

This behaviour is caused by improper configuration. When crawling fewer hosts 
than (fetcher threads / threads per host), some threads will always be blocked. 
Solution: change the configuration to use fewer fetcher threads, or more threads 
per host, or increase max.http.delay so that blocked threads wait longer.
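The blocking condition above can be sketched as simple arithmetic (a minimal illustration, assuming the standard Nutch properties fetcher.threads.fetch and fetcher.threads.per.host; the helper function name is hypothetical):

```python
def idle_threads(num_hosts, threads_fetch, threads_per_host):
    """Threads that can never acquire a host slot when too few hosts are crawled.

    Only num_hosts * threads_per_host threads can fetch concurrently;
    any fetcher threads beyond that remain blocked.
    """
    usable = num_hosts * threads_per_host
    return max(0, threads_fetch - usable)

# Crawling 2 hosts with 10 fetcher threads and 1 thread per host:
# only 2 threads can ever work; the other 8 stay blocked.
print(idle_threads(2, 10, 1))  # 8
```

Raising threads_per_host or lowering threads_fetch until threads_fetch <= num_hosts * threads_per_host removes the permanently blocked threads.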

> fetcher.threads.per.host ignored
> --------------------------------
>
>          Key: NUTCH-69
>          URL: http://issues.apache.org/jira/browse/NUTCH-69
>      Project: Nutch
>         Type: Bug
>   Components: fetcher
>     Reporter: Matthias Jaekle

>
> Fetcher ignores 'maximum threads per host'.
> If you fetch only a few domains with multiple threads, some webservers feel 
> attacked or can no longer serve you.
> So you lose lots of existing pages in your segments.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
