[ https://issues.apache.org/jira/browse/NUTCH-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783247#action_12783247 ]
Julien Nioche commented on NUTCH-769:
-------------------------------------
I did indeed miss a couple of lines when I was trying to untangle this
functionality from my (heavily modified) local copy.
checkExceptionThreshold should be called after line 664:
    case ProtocolStatus.EXCEPTION:
      logError(fit.url, status.getMessage());
      int killedURLs = fetchQueues.checkExceptionThreshold(fit.getQueueID());
      reporter.incrCounter("FetcherStatus", "Exceptions", killedURLs);
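For illustration, here is a minimal sketch of what checkExceptionThreshold might do. It is not the code in the patch. The sketch assumes that each per-host queue keeps a counter of consecutive exceptions, that a successful fetch resets it, and that the threshold comes from fetcher.max.exceptions.per.queue, with -1 meaning disabled. The class FetchQueuesSketch and its helper methods (add, next, recordSuccess, size) are hypothetical stand-ins for the Fetcher's real queue classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for the Fetcher's per-host queues, showing one way
 * checkExceptionThreshold could work. Not the actual Nutch class.
 */
public class FetchQueuesSketch {

  /** One queue per host: pending URLs plus a count of consecutive exceptions. */
  private static class QueueState {
    final Deque<String> urls = new ArrayDeque<>();
    int consecutiveExceptions = 0;
  }

  private final Map<String, QueueState> queues = new HashMap<>();
  // Assumed to be read from fetcher.max.exceptions.per.queue; -1 disables the check.
  private final int maxExceptionsPerQueue;

  public FetchQueuesSketch(int maxExceptionsPerQueue) {
    this.maxExceptionsPerQueue = maxExceptionsPerQueue;
  }

  public synchronized void add(String queueId, String url) {
    queues.computeIfAbsent(queueId, k -> new QueueState()).urls.add(url);
  }

  /** Takes the next URL off a queue, or returns null if it is empty or unknown. */
  public synchronized String next(String queueId) {
    QueueState q = queues.get(queueId);
    return q == null ? null : q.urls.poll();
  }

  /** Resets the counter after a successful fetch, so only exceptions "in a row" count. */
  public synchronized void recordSuccess(String queueId) {
    QueueState q = queues.get(queueId);
    if (q != null) {
      q.consecutiveExceptions = 0;
    }
  }

  /**
   * Records one exception for the queue. If the threshold is enabled and has been
   * reached, empties the queue and returns the number of URLs dropped; else 0.
   */
  public synchronized int checkExceptionThreshold(String queueId) {
    QueueState q = queues.get(queueId);
    if (q == null) {
      return 0;
    }
    q.consecutiveExceptions++;
    if (maxExceptionsPerQueue == -1 || q.consecutiveExceptions < maxExceptionsPerQueue) {
      return 0;
    }
    int killed = q.urls.size();
    q.urls.clear();
    return killed;
  }

  public synchronized int size(String queueId) {
    QueueState q = queues.get(queueId);
    return q == null ? 0 : q.urls.size();
  }

  public static void main(String[] args) {
    FetchQueuesSketch queues = new FetchQueuesSketch(3);
    for (int i = 0; i < 10; i++) {
      queues.add("slow.example.com", "http://slow.example.com/page" + i);
    }
    int killed = 0;
    // Every fetch from this host fails; the third exception in a row empties the queue.
    for (int attempt = 0; attempt < 3; attempt++) {
      queues.next("slow.example.com");
      killed += queues.checkExceptionThreshold("slow.example.com");
    }
    // prints "killed=7 remaining=0"
    System.out.println("killed=" + killed + " remaining=" + queues.size("slow.example.com"));
  }
}
```

In this sketch the return value is the number of URLs removed from the queue, which matches how killedURLs is passed to reporter.incrCounter at the call site above.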
I'll attach a modified version of the patch.
Thanks
J.
> Fetcher to skip queues for URLS getting repeated exceptions
> -------------------------------------------------------------
>
> Key: NUTCH-769
> URL: https://issues.apache.org/jira/browse/NUTCH-769
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Reporter: Julien Nioche
> Priority: Minor
> Attachments: NUTCH-769-2.patch, NUTCH-769.patch
>
>
> As discussed on the mailing list (see
> http://www.mail-archive.com/nutch-u...@lucene.apache.org/msg15360.html), this
> patch allows the Fetcher to clear its URL queues when more than a set number
> of exceptions have been encountered in a row. This can speed up fetching
> substantially when target hosts are unresponsive (since each fetch would
> throw a TimeoutException), and it reduces cases where a whole Fetch step is
> slowed down by a few queues.
> By default, the parameter fetcher.max.exceptions.per.queue has a value of -1,
> which deactivates the feature.
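The speed-up claim can be illustrated with a rough back-of-the-envelope model. The model assumes that URLs in one queue are fetched one after another, that every failed fetch costs a full timeout, and that a successful fetch resets the "in a row" count. These are simplifications, not a description of how the Fetcher actually schedules threads. The class name ExceptionThresholdCost and the 10-second timeout are hypothetical values chosen only for illustration.

```java
import java.util.Arrays;

/**
 * Rough model of how many fetches are spent on one queue, with and without
 * a consecutive-exception threshold. Hypothetical illustration only.
 */
public class ExceptionThresholdCost {

  /**
   * Returns how many fetches are attempted on one queue. failures[i] is true if the
   * i-th fetch throws. maxExceptionsPerQueue == -1 disables the check (the default).
   */
  public static int attempts(boolean[] failures, int maxExceptionsPerQueue) {
    int consecutive = 0;
    int attempted = 0;
    for (boolean failed : failures) {
      attempted++;
      // A success resets the count, so only exceptions "in a row" are counted.
      consecutive = failed ? consecutive + 1 : 0;
      if (maxExceptionsPerQueue != -1 && consecutive >= maxExceptionsPerQueue) {
        break; // the remaining URLs in the queue are dropped without being fetched
      }
    }
    return attempted;
  }

  public static void main(String[] args) {
    boolean[] deadHost = new boolean[100];
    Arrays.fill(deadHost, true); // this host never answers
    int timeoutSeconds = 10;     // hypothetical per-fetch timeout
    for (int max : new int[] {-1, 3}) {
      int n = attempts(deadHost, max);
      System.out.println("max=" + max + ": " + n + " attempts, ~" + (n * timeoutSeconds) + " s");
    }
  }
}
```

Under these assumptions, a dead host with 100 queued URLs costs about 1000 s when the feature is disabled (-1). With a threshold of 3, it costs about 30 s.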