I was wondering whether the current release of Nutch provides any support for
handling slow servers. The issue has been previously described in the following
JIRA entry:
https://issues.apache.org/jira/browse/NUTCH-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588746#action_12588746
While being able to incorporate server latency information into the
generation of fetch lists would be nice to have, I was wondering whether any
configuration parameter is available to enforce a timeout on the
effective fetch duration for a single URL. In my current setup, I'm
observing that over 50% of the time needed to complete a fetch task is
due to a handful of slow hosts.
Has anyone on the list been able to optimize their crawls to minimize
the impact of slow hosts?
-yp