[ 
https://issues.apache.org/jira/browse/NUTCH-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gerard Bouchar updated NUTCH-2564:
----------------------------------
    Description: 
When a server sends an invalid Content-Length header (one whose value is not a 
valid number) with a plain (non-chunked) HTTP body, browsers simply ignore the 
header. protocol-http is inconsistent: if the header consists only of 
whitespace, it is ignored, but if it contains any other characters, an error is 
thrown, which prevents us from doing anything with the page.

It should simply ignore invalid Content-Length headers.

 

Relevant code: 
[https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L354-L359]
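A minimal sketch of the proposed behaviour, in Java. The class name 
LenientContentLength and its parse method are hypothetical and not part of 
Nutch. The sketch treats a blank value and a non-numeric value the same way, 
reporting "no usable length" instead of throwing. It also rejects anything other 
than ASCII digits, because the HTTP grammar defines Content-Length as 1*DIGIT 
(RFC 7230, section 3.3.2).

```java
import java.util.OptionalLong;

/**
 * Hypothetical helper (not part of Nutch): reads a Content-Length value
 * leniently, so that a missing, blank or malformed header is simply ignored.
 */
public final class LenientContentLength {

  private LenientContentLength() {}

  /** Returns the length, or an empty OptionalLong if the value is unusable. */
  public static OptionalLong parse(String headerValue) {
    if (headerValue == null) {
      return OptionalLong.empty();                 // header absent
    }
    String trimmed = headerValue.trim();
    // Blank values (already ignored today) and values containing anything other
    // than ASCII digits, such as "12abc" or "-5", are treated the same way.
    if (trimmed.isEmpty() || !trimmed.chars().allMatch(c -> c >= '0' && c <= '9')) {
      return OptionalLong.empty();
    }
    try {
      return OptionalLong.of(Long.parseLong(trimmed));
    } catch (NumberFormatException overflow) {
      return OptionalLong.empty();                 // too large to fit in a long
    }
  }

  public static void main(String[] args) {
    String[] samples = {"1024", "   ", "12abc", "-5", null};
    for (String s : samples) {
      System.out.println("[" + s + "] -> " + parse(s));
    }
  }
}
```

When parse returns an empty OptionalLong, a caller would handle the response as 
if it had no Content-Length header and read the body until the server closes 
the connection. This is deliberately looser than RFC 7230, section 3.3.3, which 
treats an invalid Content-Length as a framing error; the aim here is to behave 
like browsers.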

 

  was:
When a server sends an invalid Content-Length header (one whose value is not a 
valid number) with a plain (non-chunked) HTTP body, browsers simply ignore the 
header. protocol-http is inconsistent: if the header consists only of 
whitespace, it is ignored, but if it contains any other characters, an error is 
thrown, which prevents us from doing anything with the page.

 

If the HTTP body is chunked, protocol-http always ignores the Content-Length 
header, whether it is valid or not.
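This precedence matches RFC 7230, section 3.3.3, under which Transfer-Encoding 
overrides Content-Length. The sketch below is hypothetical: BodyFraming, 
Strategy and choose are not Nutch names. It assumes headers arrive in a simple 
map with lower-cased names and ignores bodiless cases such as HEAD requests and 
204 or 304 responses. It shows where a lenient length check would sit among the 
three ways of finding the end of a response body.

```java
import java.util.Map;
import java.util.OptionalLong;

/**
 * Hypothetical sketch (not Nutch code): decide how to find the end of an HTTP
 * response body. Simplifications: header names are lower-cased map keys, and
 * bodiless cases (HEAD requests, 1xx/204/304 responses) are not handled.
 */
public final class BodyFraming {

  public enum Strategy { CHUNKED, FIXED_LENGTH, UNTIL_CLOSE }

  private BodyFraming() {}

  public static Strategy choose(Map<String, String> headers) {
    String transferEncoding = headers.get("transfer-encoding");
    if (transferEncoding != null) {
      // Transfer-Encoding overrides Content-Length, whether valid or not.
      String[] codings = transferEncoding.split(",", -1);
      String last = codings[codings.length - 1].trim();
      return last.equalsIgnoreCase("chunked") ? Strategy.CHUNKED : Strategy.UNTIL_CLOSE;
    }
    // No Transfer-Encoding: trust Content-Length only if it is a usable number.
    return parseLength(headers.get("content-length")).isPresent()
        ? Strategy.FIXED_LENGTH
        : Strategy.UNTIL_CLOSE;
  }

  /** Lenient parse: empty for a missing, blank, non-digit or overflowing value. */
  static OptionalLong parseLength(String value) {
    if (value == null) {
      return OptionalLong.empty();
    }
    String trimmed = value.trim();
    if (trimmed.isEmpty() || !trimmed.chars().allMatch(c -> c >= '0' && c <= '9')) {
      return OptionalLong.empty();
    }
    try {
      return OptionalLong.of(Long.parseLong(trimmed));
    } catch (NumberFormatException overflow) {
      return OptionalLong.empty();
    }
  }

  public static void main(String[] args) {
    System.out.println(choose(Map.of("transfer-encoding", "chunked", "content-length", "abc")));
    System.out.println(choose(Map.of("content-length", "512")));
    System.out.println(choose(Map.of("content-length", "12abc")));
  }
}
```

Running main prints CHUNKED, FIXED_LENGTH and UNTIL_CLOSE: the invalid length 
is irrelevant in the chunked case and falls back to reading until close in the 
plain case.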

 

It should simply ignore invalid Content-Length headers.

 

Relevant code: 
[https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L354-L359]

 


> protocol-http throws an error when the content-length header is not a number
> ----------------------------------------------------------------------------
>
>                 Key: NUTCH-2564
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2564
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Gerard Bouchar
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
