Esa Peuha added the comment:
This code

    import _lzma

    with open('22h_ticks_bad.bi5', 'rb') as f:
        infile = f.read()

    # Split the compressed input at offset i and feed the two halves, one
    # after the other, to a fresh incremental decompressor.
    for i in range(8191, 8195):
        decompressor = _lzma.LZMADecompressor()
        first_out = decompressor.decompress(infile[:i])
        first_len = len(first_out)
        last_out = decompressor.decompress(infile[i:])
        last_len = len(last_out)
        # split offset, output after first half, total output, end reached?
        print(i, first_len, first_len + last_len, decompressor.eof)

prints this:

    8191 36243 45480 True
    8192 36251 45473 False
    8193 36253 45475 False
    8194 36260 45480 True
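To look beyond the four offsets above, one could sweep many split points and
compare each two-chunk result against a one-shot decompression of the same
bytes. The sketch below does that with the public lzma module instead of
_lzma; bad_split_offsets is a hypothetical helper name, and the synthetic
input built with lzma.compress(..., format=lzma.FORMAT_ALONE) is only a
stand-in for 22h_ticks_bad.bi5 (the legacy .lzma container is an assumption
about that file, not something stated in the report). On a build without the
problem, the returned list would be expected to be empty.

```python
import lzma


def bad_split_offsets(compressed, offsets):
    """Return the split offsets at which two-chunk decompression disagrees
    with decompressing the whole input in a single call."""
    # Reference result from one decompress() call on the full input.
    expected = lzma.LZMADecompressor().decompress(compressed)
    bad = []
    for i in offsets:
        d = lzma.LZMADecompressor()  # fresh state for every split point
        out = d.decompress(compressed[:i])
        if not d.eof:
            # decompress() raises EOFError once eof is set, so only feed
            # the second chunk if the stream has not ended yet.
            out += d.decompress(compressed[i:])
        if out != expected or not d.eof:
            bad.append(i)
    return bad


# Synthetic stand-in data (assumption: legacy .lzma / FORMAT_ALONE container).
data = b"".join(b"tick %d\n" % n for n in range(5000))
blob = lzma.compress(data, format=lzma.FORMAT_ALONE)
print(bad_split_offsets(blob, range(0, len(blob) + 1, 13)))
```

Against a real file, blob would instead be the bytes read from disk, and the
offsets could be narrowed to the region around the suspicious 8192 boundary.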
It seems to me that this is a subtle bug in liblzma: if the input stream to the
incremental decompressor is split at the wrong place, the internal state of
the decompressor is corrupted. For this particular file, it happens when the
split occurs after 8192 or 8193 bytes: the total output is then a few bytes
short (45473 and 45475 instead of 45480) and eof stays False. lzma.py happens
to use a buffer of 8192 bytes, so it hits exactly this case. There is nothing
wrong with the compressed file itself, since lzma.py decompresses it correctly
if the buffer size is set to almost any other value.
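The buffer-size observation can be checked with a simplified model of buffered
reading: feed the compressed bytes to a fresh decompressor a fixed number of
bytes at a time and compare the result across chunk sizes. This is a sketch
under stated assumptions, not lzma.py's actual read loop; decompress_in_chunks
is a hypothetical name, and the test data is synthetic rather than the file
from this report.

```python
import io
import lzma


def decompress_in_chunks(stream, chunk_size):
    """Decompress a binary stream by feeding it chunk_size bytes at a time."""
    d = lzma.LZMADecompressor()
    parts = []
    while not d.eof:
        chunk = stream.read(chunk_size)
        if not chunk:
            # Input ran out before the decompressor reached end of stream.
            raise EOFError("compressed data ended before end-of-stream marker")
        parts.append(d.decompress(chunk))
    return b"".join(parts)


# Synthetic stand-in data; to try a real file, pass open(path, 'rb') instead.
data = b"".join(b"%d,%d\n" % (n, n * n % 9973) for n in range(20000))
blob = lzma.compress(data, format=lzma.FORMAT_ALONE)
for size in (4096, 8191, 8192, 8193, 10000):
    result = decompress_in_chunks(io.BytesIO(blob), size)
    print(size, len(result), result == data)
```

Raising on premature end of input, rather than silently returning a short
result, makes a truncated decompression like the 8192 case above visible.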
----------
nosy: +Esa.Peuha
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue21872>
_______________________________________