[ https://issues.apache.org/jira/browse/NUTCH-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
behnam nikbakht updated NUTCH-1303:
---
Attachment: NUTCH-1303.patch
> Fetcher to skip queues for URLS getting repeated exceptions, based on
> percentage
>
>
> Key: NUTCH-1303
> URL: https://issues.apache.org/jira/browse/NUTCH-1303
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Affects Versions: 1.4
> Reporter: behnam nikbakht
> Labels: fetch
> Attachments: NUTCH-1303.patch
>
>
> As described in https://issues.apache.org/jira/browse/NUTCH-769, skipping
> queues with a high exception count is a good solution, but it is not easy to
> choose a value for fetcher.max.exceptions.per.queue when queue sizes differ.
> I suggest defining a ratio instead of an absolute value, so that a queue is
> cleared when its ratio of exceptions to requests exceeds the threshold.
> Also, guarding the fetcher against high exception counts is not sufficient
> on its own. The value of fetcher.throughput.threshold.pages ensures a
> worthwhile fetch throughput against slow hosts, but it clears all queues,
> not only the slow queue.
> I suggest that this threshold, like fetcher.max.exceptions.per.queue, be
> enforced on each queue individually rather than on all of them.
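
The ratio-based check proposed above could look roughly like the following. This is only a sketch of the idea, not the contents of NUTCH-1303.patch: the class name, its methods, and the minRequests guard (which keeps a queue from being cleared on its very first failure) are all assumptions made for illustration and are not part of Nutch.

```java
// Hypothetical sketch: none of these names exist in Nutch.
// One instance would be kept per fetch queue.
final class QueueExceptionRatio {
    private final double maxRatio;  // e.g. 0.5 clears a queue once more than half its requests failed
    private final int minRequests;  // assumed guard: ignore the ratio until enough requests were made
    private int requests;
    private int exceptions;

    QueueExceptionRatio(double maxRatio, int minRequests) {
        if (maxRatio < 0.0 || maxRatio > 1.0) {
            throw new IllegalArgumentException("maxRatio must be in [0, 1]");
        }
        if (minRequests < 1) {
            throw new IllegalArgumentException("minRequests must be >= 1");
        }
        this.maxRatio = maxRatio;
        this.minRequests = minRequests;
    }

    // Record the outcome of one request made against this queue.
    void record(boolean failed) {
        requests++;
        if (failed) {
            exceptions++;
        }
    }

    // Exceptions per request so far; 0.0 while no request has been made.
    double ratio() {
        return requests == 0 ? 0.0 : (double) exceptions / requests;
    }

    // True once enough requests were made and the exception ratio exceeds the limit.
    boolean shouldClear() {
        return requests >= minRequests && ratio() > maxRatio;
    }
}
```

Because the decision depends on exceptions divided by requests, the same setting behaves the same way for a queue of ten URLs and a queue of ten thousand, which is the difficulty with an absolute fetcher.max.exceptions.per.queue value that the description points out.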