[issue20007] .read(0) on http.client.HTTPResponse drops the rest of the content

Simon Sapin Tue, 17 Dec 2013 06:30:48 -0800

New submission from Simon Sapin:

When given a file-like object, html5lib calls .read(0) in order to check if the 
result is bytes or Unicode:


https://github.com/html5lib/html5lib-python/blob/e269a2fd0aafcd83af7cf1e65bba65c0e5a2c18b/html5lib/inputstream.py#L434
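For illustration, the zero-length read check can be sketched roughly as follows. This is a minimal approximation, not html5lib's actual code; the helper name `yields_bytes` is hypothetical. It relies on the expectation that reading zero items consumes nothing from the stream and only reveals the result type.

```python
import io


def yields_bytes(stream):
    """Hypothetical helper: report whether a file-like object returns bytes.

    Asking for zero items should leave the stream position untouched,
    so only the type of the (empty) result is inspected.
    """
    return isinstance(stream.read(0), bytes)


binary = io.BytesIO(b"<p>hi</p>")
text = io.StringIO("<p>hi</p>")

print(yields_bytes(binary))  # bytes stream
print(yields_bytes(text))    # text stream
print(binary.read())         # the content is still available afterwards
```

With well-behaved streams such as io.BytesIO, the content remains readable after the check, which is the behaviour the caller depends on.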

When given the result of urllib.request.urlopen(), it parses an empty document 
because of this bug.

Test case:

>>> from urllib.request import urlopen
>>> response = urlopen('http://python.org')
>>> response.read(0)
b''
>>> len(response.read())
0

For comparison:

>>> response = urlopen('http://python.org')
>>> len(response.read())
20317

The bug is here:

http://hg.python.org/cpython/file/d489394a73de/Lib/http/client.py#l541

'if not n:' assumes that "zero bytes have been read" indicates EOF, which is 
not the case when we ask for zero bytes.
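To make the pattern concrete without a network connection, here is a small self-contained sketch. `SketchResponse` is a hypothetical stand-in, not the real HTTPResponse, and the `fixed` flag is only there to compare the two checks side by side. The guard shown (treating an empty result as EOF only when a non-zero amount was requested) is one possible approach, not necessarily what a patch would look like.

```python
import io


class SketchResponse:
    """Hypothetical stand-in for a response object (not http.client code)."""

    def __init__(self, body, fixed):
        self._fp = io.BytesIO(body)
        self._fixed = fixed

    def read(self, amt=None):
        if self._fp is None:
            # The stream was already treated as exhausted.
            return b""
        data = self._fp.read() if amt is None else self._fp.read(amt)
        if self._fixed:
            # An empty result only means EOF if we asked for something.
            at_eof = not data and amt != 0
        else:
            # Problematic pattern: an empty result always means EOF,
            # even when zero bytes were requested.
            at_eof = not data
        if at_eof:
            self._fp = None
        return data


buggy = SketchResponse(b"payload", fixed=False)
buggy.read(0)
print(len(buggy.read()))  # content is lost after read(0)

guarded = SketchResponse(b"payload", fixed=True)
guarded.read(0)
print(len(guarded.read()))  # content is still there
```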

----------
messages: 206446
nosy: ssapin
priority: normal
severity: normal
status: open
title: .read(0) on http.client.HTTPResponse drops the rest of the content
type: behavior

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20007>
_______________________________________