New submission from Michael Del Monte:

Initially reported at https://github.com/kennethreitz/requests/issues/2622

Closely related to http://bugs.python.org/issue19996

An HTTP response with an invalid header line that contains non-blank characters
but *no* colon (in contrast to http://bugs.python.org/issue19996, where the
invalid line began with a colon) triggers the same behavior.

httplib.HTTPMessage.readheaders() oddly does not even appear to attempt to
follow RFC 2616, which requires the header section to end with a blank line.
The invalid header line, which admittedly also violates RFC 2616, is at least
non-blank and should not end the header section. Yet readheaders() treats it
as the end of the headers and then fails to process the rest of the response
properly.

The problem is worse when the response uses chunked transfer encoding: if
readheaders() stops before it reaches the Transfer-Encoding header, the chunked
body is not decoded. An example (why are banks always the miscreants here?) is:

p = requests.get("http://www.merrickbank.com/")
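Chunked transfer coding wraps the body in hex-length-prefixed chunks, so a
client that never sees "Transfer-Encoding: chunked" reads that framing as if it
were content. A minimal sketch of the decoding step that gets skipped in that
case (decode_chunked is a hypothetical name; it assumes the whole body is
already in memory, discards chunk extensions and ignores trailers):

```python
def decode_chunked(data):
    """Decode an in-memory HTTP/1.1 chunked body (sketch, no trailer support)."""
    out = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        size_field = data[pos:eol].split(b";", 1)[0]  # drop any chunk extension
        size = int(size_field, 16)                     # chunk size is hexadecimal
        pos = eol + 2
        if size == 0:
            break                                      # last-chunk; trailers ignored
        out += data[pos:pos + size]
        pos += size + 2                                # skip chunk data and its CRLF
    return bytes(out)


raw_body = b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n"
print(decode_chunked(raw_body))  # b'Hello, world' under Python 3
```

Without the header, the caller gets raw_body as-is, size lines and all.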

My recommended fix would be to insert these lines at httplib:327, as a new
elif branch immediately before the existing else branch that breaks out of the
header loop in readheaders():

                # continue reading headers on non-blank lines
                elif line.strip():
                    continue
                # break only on blank lines

This would cause readheaders() to terminate only on a blank line (or at end of
input), in accordance with RFC 2616.
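The same hypothetical model, changed to follow the proposed rule, shows the
intended effect: a non-blank line without a colon is skipped, and only a blank
line ends the header section. As before, split_headers_proposed and the sample
lines are illustrative, not httplib's code:

```python
def split_headers_proposed(lines):
    """Hypothetical model of the proposed rule: stop only at a blank line."""
    headers = []
    for line in lines:
        if not line.strip():
            break  # blank (or whitespace-only) line ends the header section
        name, sep, value = line.partition(":")
        if not sep:
            continue  # malformed but non-blank: skip it and keep reading
        headers.append((name.strip().lower(), value.strip()))
    return headers


raw = [
    "Content-Type: text/html\r\n",
    "this line has no colon\r\n",
    "Transfer-Encoding: chunked\r\n",
    "\r\n",
    "5\r\n",  # body starts here and is never read as a header
]
print(split_headers_proposed(raw))
# [('content-type', 'text/html'), ('transfer-encoding', 'chunked')]
```

Skipping the malformed line, rather than rejecting the response, is the lenient
choice argued for above; a stricter parser could refuse the response instead.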

----------
components: Library (Lib)
messages: 244672
nosy: mgdelmonte
priority: normal
severity: normal
status: open
title: httplib fails to handle semivalid HTTP headers
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24363>
_______________________________________
_______________________________________________
Python-bugs-list mailing list