[ https://issues.apache.org/jira/browse/NUTCH-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel resolved NUTCH-2557.
------------------------------------
    Resolution: Fixed

Thanks, [~gbouchar] and [~omkar20895]!

> protocol-http fails to follow redirections when an HTTP response body is
> invalid
> --------------------------------------------------------------------------------
>
> Key: NUTCH-2557
> URL: https://issues.apache.org/jira/browse/NUTCH-2557
> Project: Nutch
> Issue Type: Sub-task
> Affects Versions: 1.14
> Reporter: Gerard Bouchar
> Priority: Major
> Fix For: 1.15
>
>
> If a server sends a redirection (3XX status code, with a Location header),
> protocol-http tries to parse the HTTP response body anyway. Thus, if an error
> occurs while decoding the body, the redirection is not followed and the
> information is lost. Browsers follow the redirection and close the socket as
> soon as they can.
> * Example: this page is a redirection to its https version, with an HTTP
> body containing invalid gzip-encoded content. Browsers follow the
> redirection, but Nutch throws an error:
> ** [http://www.webarcelona.net/es/blog?page=2]
>
> The HttpResponse::getContent method can already return null. I think it should
> at least return null when parsing the HTTP response body fails.
> Ideally, we would adopt the same behavior as browsers, and not even try
> parsing the body when the headers indicate a redirection.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
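
For illustration only, the behavior requested in the issue can be sketched as below. This is not the patch that was committed to Nutch: the class RedirectAwareBody and its methods isRedirect and readBody are hypothetical names, and the sketch only assumes what the issue describes, namely that a 3XX response with a Location header should not have its body decoded, and that a body which fails to decode should yield null instead of an exception.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of Nutch's protocol-http plugin.
final class RedirectAwareBody {

  /** True for a 3XX status code that comes with a Location header. */
  static boolean isRedirect(int statusCode, String location) {
    return statusCode >= 300 && statusCode < 400 && location != null;
  }

  /**
   * Returns the decoded body, or null when the response is a redirection
   * or when the body cannot be decoded.
   */
  static byte[] readBody(int statusCode, String location,
                         String contentEncoding, byte[] rawBody) {
    // Browser-like behavior: do not even look at the body of a redirection.
    if (isRedirect(statusCode, location)) {
      return null;
    }
    // Nothing to decode: hand the bytes back unchanged.
    if (rawBody == null || !"gzip".equalsIgnoreCase(contentEncoding)) {
      return rawBody;
    }
    try (GZIPInputStream in =
             new GZIPInputStream(new ByteArrayInputStream(rawBody));
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (IOException e) {
      // Invalid gzip data: report "no content" instead of failing the fetch.
      return null;
    }
  }

  public static void main(String[] args) {
    byte[] notGzip = "not gzip".getBytes(StandardCharsets.UTF_8);
    // A redirection whose body claims to be gzip but is not.
    System.out.println(
        readBody(301, "https://example.org/", "gzip", notGzip) == null);
    // A normal response with the same broken body.
    System.out.println(readBody(200, null, "gzip", notGzip) == null);
  }
}
```

With this shape, the caller can still read the Location header and follow the redirection even when the body is garbage, which is the case reported for the example URL.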