New submission from Alexander Belopolsky:

This problem happens when I unpack a file from a 200+ MB zip archive as follows:
    import zipfile

    with zipfile.ZipFile(archive) as z:
        data = b''
        with z.open(filename, 'rU') as f:
            for line in f:
                data += line

I cannot reduce it to a test case suitable for posting here, but the culprit is the following code in zipfile.py:

    def peek(self, n=1):
        """Returns buffered bytes without advancing the position."""
        if n > len(self._readbuffer) - self._offset:
            chunk = self.read(n)
            self._offset -= len(chunk)

See http://hg.python.org/cpython/file/81f8375e60ce/Lib/zipfile.py#l605

The problem occurs when peek() is called at the boundary of the decompression buffer and read() has to go through more than one read buffer. Since self._offset then indexes only into the most recently filled buffer, it is smaller than len(chunk), and peek() returns with a nonsensical negative self._offset.

This problem does not appear to affect 3.x since changeset 028e8e0b03e8.

----------
messages: 206779
nosy: belopolsky
priority: normal
severity: normal
status: open
title: zipfile's readline() drops data in universal newline mode
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20048>
_______________________________________
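To illustrate the offset arithmetic, here is a toy model of the buffering. It is a sketch under stated assumptions, not zipfile's actual code: ToyZipExtFile, its _refill() helper, the block_size parameter and both peek variants are hypothetical. The model assumes each refill replaces the read buffer with the next block, so a read() that crosses a block boundary leaves self._offset pointing into the newest block only. buggy_peek() repeats the arithmetic of the peek() excerpt in the report; fixed_peek() shows one possible way to avoid the negative offset (pushing the peeked bytes back in front of the remaining buffer) and is not necessarily how CPython fixed it.

```python
class ToyZipExtFile:
    """Hypothetical model of a ZipExtFile-style read buffer (not the real class).

    The stream is handed out in fixed-size blocks, and every refill
    replaces self._readbuffer, like one decompression step would.
    """

    def __init__(self, data, block_size):
        self._source = data
        self._pos = 0                 # how far into the stream we have "decompressed"
        self._block_size = block_size
        self._readbuffer = b''
        self._offset = 0              # read position inside self._readbuffer

    def _refill(self):
        # Discard the exhausted buffer and load the next block.
        self._readbuffer = self._source[self._pos:self._pos + self._block_size]
        self._pos += len(self._readbuffer)
        self._offset = 0
        return len(self._readbuffer) > 0

    def read(self, n):
        out = b''
        while len(out) < n:
            if self._offset >= len(self._readbuffer) and not self._refill():
                break                 # end of stream
            take = self._readbuffer[self._offset:self._offset + n - len(out)]
            self._offset += len(take)
            out += take
        return out

    def buggy_peek(self, n=1):
        # Same arithmetic as the 2.7 peek() excerpt in the report.
        if n > len(self._readbuffer) - self._offset:
            chunk = self.read(n)
            self._offset -= len(chunk)    # negative if read() refilled the buffer
        return self._readbuffer[self._offset:self._offset + n]

    def fixed_peek(self, n=1):
        if n > len(self._readbuffer) - self._offset:
            chunk = self.read(n)
            if len(chunk) > self._offset:
                # read() crossed into a new block: put the peeked bytes
                # back in front of what is left of the current buffer.
                self._readbuffer = chunk + self._readbuffer[self._offset:]
                self._offset = 0
            else:
                # chunk lies entirely inside the current buffer: just rewind.
                self._offset -= len(chunk)
        return self._readbuffer[self._offset:self._offset + n]


buggy = ToyZipExtFile(b'abcdefghij', block_size=4)
print(buggy.read(3))                       # b'abc' (buffer b'abcd', offset 3)
print(buggy.buggy_peek(3), buggy._offset)  # b'' -1
print(buggy.read(10))                      # b'hefghij'

fixed = ToyZipExtFile(b'abcdefghij', block_size=4)
print(fixed.read(3))                       # b'abc'
print(fixed.fixed_peek(3))                 # b'def'
print(fixed.read(10))                      # b'defghij'
```

In this toy model the negative offset makes the next read return b'hefghij' instead of b'defghij': b'def' is lost and b'h' is repeated. The exact symptom in the real 2.7 module may differ, since its read() and read1() manage the buffer in more detail.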