I was wondering whether the current release of Nutch provides any support for handling slow servers. The issue was previously described in the following JIRA entry:

https://issues.apache.org/jira/browse/NUTCH-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588746#action_12588746

While incorporating server latency information into the generation of fetch lists would be nice to have, I was wondering whether any configuration parameter is available to enforce a timeout on the total fetch duration for a single URL. In my current setup, I'm observing that over 50% of the time needed to complete a fetch task is due to a handful of slow hosts.

Has anyone on the list been able to optimize their crawls to minimize the impact of slow hosts?

-yp
