New submission from Hiroshi Miura <miur...@linux.com>:

When decompressing a particular archive, the result is truncated: the last word
is missing. The attached test data has an uncompressed size of 12800 bytes and
was compressed with the LZMA1+BCJ algorithm into 11327 bytes.
The data is the payload of a 7zip archive.

Here is a pytest test that reproduces it.


:: code-block::

    import lzma

    def test_lzma_raw_decompressor_lzmabcj():
        # testdata_path is assumed to be a pathlib.Path to the test-data
        # directory, defined elsewhere in the test module.
        # Filter chain: x86 BCJ, then LZMA1 with options decoded from the
        # 5-byte properties field.
        filters = [
            {'id': lzma.FILTER_X86},
            lzma._decode_filter_properties(lzma.FILTER_LZMA1, b']\x00\x00\x01\x00'),
        ]
        decompressor = lzma.LZMADecompressor(format=lzma.FORMAT_RAW, filters=filters)
        with testdata_path.joinpath('lzmabcj.bin').open('rb') as infile:
            out = decompressor.decompress(infile.read(11327))
        assert len(out) == 12800


The test fails because len(out) is 12796 bytes: the last 4 bytes, which should
be b'\x00\x00\x00\x00', are missing.
When I specify a single LZMA1 filter for decompression, I get data of the
expected length, 12800 bytes. (*1)
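
To make such comparisons easier, a small helper can report the output length,
the last four bytes, and the decompressor state after a single decompress()
call. This is only a sketch: describe_raw_decode is a hypothetical name, and the
input below is synthetic data produced by lzma.compress(), not the attached
lzmabcj.bin, so it is not expected to show the truncation.

```python
import lzma


def describe_raw_decode(data, filters):
    """Decode raw data in one call and summarize the result (hypothetical helper)."""
    decompressor = lzma.LZMADecompressor(format=lzma.FORMAT_RAW, filters=filters)
    out = decompressor.decompress(data)
    return {
        'length': len(out),                        # number of bytes produced
        'tail': out[-4:],                          # last four bytes of the output
        'eof': decompressor.eof,                   # True once end of stream was seen
        'needs_input': decompressor.needs_input,   # True if more input is required
    }


# Synthetic stand-in for the attached file: 12800 zero bytes, LZMA1 only.
lzma1_only = [{'id': lzma.FILTER_LZMA1}]
sample = b'\x00' * 12800
packed = lzma.compress(sample, format=lzma.FORMAT_RAW, filters=lzma1_only)

report = describe_raw_decode(packed, lzma1_only)
print(report)
```

To inspect the attached file instead, pass its bytes and the filter list from
the test above.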

When I create test data with LZMA2+BCJ and examine it, I get the expected
data.
When I specify a single LZMA2 filter against the LZMA2+BCJ payload, the result
is exactly the same as the (*1) data.
This suggests that the LZMA1/LZMA2 --> BCJ pipeline is suspect.
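
The LZMA2+BCJ check described above can be sketched as follows. The payload
here is an assumption (synthetic bytes compressed in the same script, not data
taken from a 7zip archive), and raw_decode is a hypothetical helper name.

```python
import lzma

# Filter chains: x86 BCJ followed by LZMA2, and LZMA2 alone.
bcj_lzma2 = [{'id': lzma.FILTER_X86}, {'id': lzma.FILTER_LZMA2}]
lzma2_only = [{'id': lzma.FILTER_LZMA2}]


def raw_decode(data, filters):
    """Decode raw data with a fresh LZMADecompressor (hypothetical helper)."""
    decompressor = lzma.LZMADecompressor(format=lzma.FORMAT_RAW, filters=filters)
    return decompressor.decompress(data)


# Synthetic 12800-byte payload, compressed with the full LZMA2+BCJ chain.
sample = bytes(range(256)) * 50
packed = lzma.compress(sample, format=lzma.FORMAT_RAW, filters=bcj_lzma2)

full = raw_decode(packed, bcj_lzma2)          # LZMA2 decode, then BCJ decode
without_bcj = raw_decode(packed, lzma2_only)  # LZMA2 decode only, BCJ left applied

print(len(full), len(without_bcj))
```

The BCJ filter does not change the data length, so both results should have the
same size; only the full chain is expected to reproduce the original bytes.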


After investigating and learning that _lzmamodule.c is a thin wrapper around
liblzma, I found that the problem can be reproduced in liblzma itself.
I've reported it upstream to the xz-devel mailing list, with test code:
https://www.mail-archive.com/xz-devel@tukaani.org/msg00370.html

----------
components: Extension Modules
files: lzmabcj.bin
messages: 373008
nosy: miurahr
priority: normal
severity: normal
status: open
title: LZMADecompressor.decompress(FORMAT_RAW) truncate output when input is
particular LZMA+BCJ data
versions: Python 3.6, Python 3.7, Python 3.8, Python 3.9
Added file: https://bugs.python.org/file49296/lzmabcj.bin

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41210>
_______________________________________