New submission from Julian Taylor:

Probably a case of 'don't do that', but reading lines from a compressed file in binary mode produces bytes with invalid newlines in encodings where '\n' is encoded as something else:
import lzma

with lzma.open("test.xz", "wt", encoding="UTF-32-LE") as f:
    f.write('0 1 2\n3 4 5')

lzma.open("test.xz", "rb").readlines()[0].decode('UTF-32-LE')

Fails with:
UnicodeDecodeError: 'utf-32-le' codec can't decode byte 0x0a in position 20: truncated data

as readlines() produces:
b'0\x00\x00\x00 \x00\x00\x001\x00\x00\x00 \x00\x00\x002\x00\x00\x00\n'

The last newline should be '\n'.encode('UTF-32-LE') == b'\n\x00\x00\x00'

----------
components: Library (Lib)
messages: 291661
nosy: jtaylor
priority: normal
severity: normal
status: open
title: binary compressed file reading corrupts newlines (lzma, gzip, bz2)

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30073>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
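
For reference, the behaviour reported above can be reproduced in memory, next to two ways of reading that appear to avoid it. This is only a sketch, not a proposed fix: the variable names are illustrative, and lzma.compress() with io.BytesIO stands in for the "test.xz" file from the report. It assumes that binary-mode readlines() splits on the single byte 0x0a regardless of the text encoding of the payload, which is what the output in the report suggests.

```python
import io
import lzma

ENCODING = "UTF-32-LE"

# Build the compressed payload in memory (stand-in for "test.xz").
compressed = lzma.compress("0 1 2\n3 4 5".encode(ENCODING))

# Binary mode: lines are split on the byte 0x0a, so the 4-byte
# UTF-32-LE newline b'\n\x00\x00\x00' is cut after its first byte.
with lzma.open(io.BytesIO(compressed), "rb") as f:
    raw_lines = f.readlines()

# Option 1: let text mode decode first and split on characters.
with lzma.open(io.BytesIO(compressed), "rt", encoding=ENCODING) as f:
    text_lines = f.readlines()

# Option 2: decompress everything, decode once, then split.
decoded_lines = (
    lzma.decompress(compressed).decode(ENCODING).splitlines(keepends=True)
)

print(raw_lines[0])
print(text_lines)
print(decoded_lines)
```

Either option decodes before splitting, so the line boundaries fall between characters rather than in the middle of a multi-byte code unit. Option 2 holds the whole decompressed payload in memory, so text mode is probably the better fit for large files.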