[ 
https://issues.apache.org/jira/browse/NUTCH-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2729.
------------------------------------
    Resolution: Fixed

> protocol-okhttp: fix marking of truncated content
> -------------------------------------------------
>
>                 Key: NUTCH-2729
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2729
>             Project: Nutch
>          Issue Type: Bug
>          Components: plugin, protocol
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.16
>
>
> The plugin protocol-okhttp marks content as "truncated" and records the reason
> for the truncation: content limit exceeded, time limit exceeded, or network
> disconnect during fetch.
> The detection of truncation by content limit has one bug: if the fetched
> content is exactly the size of the content limit, the loop that requests more
> content is exited. Instead, the loop should continue and request one more byte
> to reliably detect whether the content is truncated.
> Note that the Content-Length header cannot be used to determine truncation
> reliably: it does not indicate the real content length for compressed or
> chunked content, and it might simply be wrong.
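
The "one byte more" idea described above can be sketched roughly as follows.
This is only an illustration on a plain java.io.InputStream, not the actual
protocol-okhttp code: the class TruncationSketch, the method readLimited, the
Result holder and the parameter maxContent are all hypothetical names, and
maxContent is assumed to be non-negative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical sketch, not the Nutch implementation.
class TruncationSketch {

  /** Hypothetical result holder: the kept bytes plus a truncation flag. */
  static final class Result {
    final byte[] content;
    final boolean truncated;

    Result(byte[] content, boolean truncated) {
      this.content = content;
      this.truncated = truncated;
    }
  }

  /**
   * Reads at most maxContent bytes of payload, but asks the stream for up to
   * maxContent + 1 bytes. Only if that extra byte actually arrives is the
   * content known to be longer than the limit.
   */
  static Result readLimited(InputStream in, int maxContent) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    long limit = (long) maxContent + 1; // one byte beyond the content limit
    long total = 0;
    while (total < limit) {
      int want = (int) Math.min((long) buf.length, limit - total);
      int n = in.read(buf, 0, want);
      if (n == -1) {
        break; // end of stream reached before the extra byte
      }
      out.write(buf, 0, n);
      total += n;
    }
    boolean truncated = total > maxContent;
    byte[] all = out.toByteArray();
    // Drop the probe byte so the stored content never exceeds the limit.
    byte[] content = truncated ? Arrays.copyOf(all, maxContent) : all;
    return new Result(content, truncated);
  }
}
```

With this approach, a stream of exactly maxContent bytes hits end of stream
while the extra byte is requested and is reported as not truncated, whereas a
stream of maxContent + 1 bytes or more is reported as truncated. The
Content-Length header is not consulted at all.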



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
