[ http://issues.apache.org/jira/browse/NUTCH-69?page=all ]

Andrzej Bialecki resolved NUTCH-69:
-----------------------------------
    Resolution: Invalid

This behaviour is caused by improper configuration. When crawling fewer hosts than (fetcher threads / threads per host), some threads will always be blocked. Solution: change the configuration to use fewer threads, or more threads per host, or increase the max.http.delay so that blocked threads wait longer.

> fetcher.threads.per.host ignored
> --------------------------------
>
>          Key: NUTCH-69
>          URL: http://issues.apache.org/jira/browse/NUTCH-69
>      Project: Nutch
>         Type: Bug
>   Components: fetcher
>     Reporter: Matthias Jaekle
>
> The fetcher ignores 'maximum threads per host'.
> If you fetch a small set of domains with many threads, some web servers feel
> attacked or can no longer serve you.
> So you lose many existing pages in your segments.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
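The blocking condition described in the resolution can be sketched with a little arithmetic. This is an illustrative back-of-the-envelope model, not Nutch code; the function name and parameters are invented for the example:

```python
def blocked_threads(fetcher_threads, threads_per_host, num_hosts):
    """Estimate permanently idle fetcher threads.

    Each host admits at most `threads_per_host` concurrent fetchers,
    so at most num_hosts * threads_per_host threads can ever be active;
    the rest stay blocked for the whole crawl.
    """
    active = min(fetcher_threads, num_hosts * threads_per_host)
    return fetcher_threads - active

# With 10 fetcher threads, 1 thread per host, and only 2 hosts,
# 8 threads can never acquire a host and sit blocked.
print(blocked_threads(10, 1, 2))  # -> 8
```

This mirrors the advice in the resolution: either lower the thread count, raise threads-per-host, or crawl more hosts, so that fetcher_threads <= num_hosts * threads_per_host.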