New submission from Simon Sapin:

When given a file-like object, html5lib calls .read(0) in order to check if the result is bytes or Unicode:
https://github.com/html5lib/html5lib-python/blob/e269a2fd0aafcd83af7cf1e65bba65c0e5a2c18b/html5lib/inputstream.py#L434

When given the result of urllib.request.urlopen(), html5lib parses an empty document because of this bug.

Test case:

>>> from urllib.request import urlopen
>>> response = urlopen('http://python.org')
>>> response.read(0)
b''
>>> len(response.read())
0

For comparison:

>>> response = urlopen('http://python.org')
>>> len(response.read())
20317

The bug is here:
http://hg.python.org/cpython/file/d489394a73de/Lib/http/client.py#l541

'if not n:' assumes that "zero bytes have been read" indicates EOF, which is not the case when we ask for zero bytes.

----------
messages: 206446
nosy: ssapin
priority: normal
severity: normal
status: open
title: .read(0) on http.client.HTTPResponse drops the rest of the content
type: behavior

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20007>
_______________________________________
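
For illustration only, the failure mode described in this report can be reproduced without a network connection. The sketch below is not the http.client source: SketchResponse and its guard_zero_request flag are hypothetical names made up for this example, and the early return for amt == 0 is just one possible way to avoid the problem, not necessarily the fix that should go into CPython.

```python
import io


class SketchResponse:
    """Hypothetical stand-in for a response object (not http.client code)."""

    def __init__(self, body, guard_zero_request):
        self._fp = io.BytesIO(body)
        # Assumption for this sketch: a flag to switch the guard on or off.
        self._guard = guard_zero_request
        self.closed = False

    def read(self, amt=None):
        if self.closed:
            return b""
        if self._guard and amt == 0:
            # A request for zero bytes says nothing about EOF: return early.
            return b""
        data = self._fp.read() if amt is None else self._fp.read(amt)
        if not data:
            # "Zero bytes were read" is taken to mean EOF, which is wrong
            # when the caller asked for zero bytes in the first place.
            self.closed = True
        return data


body = b"<html>hello</html>"

buggy = SketchResponse(body, guard_zero_request=False)
buggy.read(0)              # marks the response as finished by mistake
buggy_rest = buggy.read()  # the remaining content is lost

fixed = SketchResponse(body, guard_zero_request=True)
fixed.read(0)              # leaves the stream untouched
fixed_rest = fixed.read()  # the full body is still available

print(len(buggy_rest), len(fixed_rest))
```

With the guard disabled, the second read returns nothing, which matches the behaviour shown in the test case above. With the guard enabled, the full body comes back.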