[ https://issues.apache.org/jira/browse/NUTCH-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel resolved NUTCH-2729.
------------------------------------
    Resolution: Fixed

> protocol-okhttp: fix marking of truncated content
> -------------------------------------------------
>
>                 Key: NUTCH-2729
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2729
>             Project: Nutch
>          Issue Type: Bug
>          Components: plugin, protocol
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.16
>
>
> The plugin protocol-okhttp marks content as "truncated" including the reason
> for the truncation - content limit or time limit exceeded, network disconnect
> during fetch.
> The detection of truncation by content limit has one bug: if the fetched
> content is exactly the size of the content limit the loop to request more
> content is exited. It should be continued by requesting one byte more to
> reliably detect whether content is truncated or not.
> Note that the Content-Length header cannot be used to determine truncation
> reliably: it does not indicate the real content length for compressed or
> chunked content or it might be wrong.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
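
For illustration, the "one byte more" check from the issue description can be sketched against a plain java.io.InputStream. This is not the protocol-okhttp code: the class TruncationSketch, the method readWithLimit and the parameter maxContent are hypothetical names, and the sketch assumes a non-negative limit. It keeps reading until either the stream ends or one byte beyond the limit has arrived, so content of exactly maxContent bytes is not reported as truncated.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical sketch, not the actual protocol-okhttp implementation.
class TruncationSketch {

  /** Bytes kept (at most the limit) and whether the stream held more than that. */
  static final class Result {
    final byte[] content;
    final boolean truncated;

    Result(byte[] content, boolean truncated) {
      this.content = content;
      this.truncated = truncated;
    }
  }

  /**
   * Reads at most maxContent bytes and reports truncation by content limit.
   * Assumes maxContent >= 0. I/O errors are rethrown unchecked to keep the
   * sketch short.
   */
  static Result readWithLimit(InputStream in, int maxContent) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    // Request up to maxContent + 1 bytes in total: the extra byte is only a
    // probe that tells "exactly at the limit" apart from "over the limit".
    long remaining = (long) maxContent + 1;
    try {
      while (remaining > 0) {
        int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
        if (n == -1) {
          break; // end of stream before the probe byte: nothing was cut off
        }
        out.write(buf, 0, n);
        remaining -= n;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    byte[] all = out.toByteArray();
    if (all.length > maxContent) {
      // The probe byte arrived: drop it and mark the content as truncated.
      return new Result(Arrays.copyOf(all, maxContent), true);
    }
    return new Result(all, false);
  }
}
```

With a limit of 10, a 10-byte stream yields truncated == false, while an 11-byte stream yields 10 bytes of content and truncated == true. A loop that stops as soon as exactly 10 bytes have been read cannot tell these two cases apart.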