[issue41210] Docs: More description of reason about LZMA1 data handling with FORMAT_ALONE

2020-08-02 Thread Hiroshi Miura


Hiroshi Miura  added the comment:

Here is a draft of additional text


Usage of :const:`FILTER_LZMA1` with :const:`FORMAT_RAW` is not recommended.
Because it may produce a wrong output in a certain condition, decompressing 
a combination of :const:`FILTER_LZMA1` and BCJ filters in :const:`FORMAT_RAW`.
It is because LZMA1 format sometimes lacks End of Stream (EOS) mark that
lead BCJ filters can not be flushed.


I've tried to write without a description of liblzma implementation, but only a 
nature of API and file format specification.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41210] Docs: More description of reason about LZMA1 data handling with FORMAT_ALONE

2020-07-13 Thread Janae


Janae  added the comment:

Here is a BCJ only CFFI test project.
https://apkgreat.com/fifa-14-apk/
All works are very interesting. thanks, to post and your works.

--
nosy: +Janae147

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41210] Docs: More description of reason about LZMA1 data handling with FORMAT_ALONE

2020-07-13 Thread Ma Lin


Ma Lin  added the comment:

It is better to raise a warning when using problematic combination.

But IMO either "raising a warning" or "adding more description to doc" is too 
dependent on the implementation detail of liblzma.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41210] Docs: More description of reason about LZMA1 data handling with FORMAT_ALONE

2020-07-12 Thread Hiroshi Miura


Hiroshi Miura  added the comment:

Lasse Collin gives me explanation of LZMA1 data format and suggestion how to 
implement.

I'd like to change an issue to a documentation issue to add more description 
about limitation on FORMAT_ALONE and LZMA1.

A suggestion from Lasse is as follows:

> liblzma cannot be used to decode data from .7z files except in certain
> cases. This isn't a bug, it's a missing feature.
>
> The raw encoder and decoder APIs only support streams that contain an
> end of payload marker (EOPM) alias end of stream (EOS) marker. .7z
> files use LZMA1 without such an end marker. Instead, the end is handled
> by the decoder knowing the exact uncompressed size of the data.
>
> The API of liblzma supports LZMA1 without end marker via
> lzma_alone_decoder(). That API can be abused to properly decode raw
> LZMA1 with known uncompressed size by feeding the decoder a fake 13-byte
> header. Everything else in the public API requires some end marker.
>
> Decoding LZMA1 without BCJ or other extra filters from .7z with
> lzma_raw_decoder() kind of works but you will notice that it will never
> return LZMA_STREAM_END, only LZMA_OK. This is because it will never see
> an end marker. A minor downside is that it won't then do a small
> integrity check at the end either (one variable in the range decoder
> state will be 0 at the end of any valid LZMA1 stream);
> lzma_alone_decoder() does this check even when end marker is missing.
>
> If you use lzma_raw_decoder() for end-markerless LZMA1, make sure that
> you never give it more output space than the real uncompressed size. In
> rare cases this could result in extra output or an error since the
> decoder would try to decode more output using the input it has gotten
> so far. Overall I think the hack with lzma_alone_decoder() is a better
> way with the current API.
>
> BCJ filters process the input data in chunks of a few bytes long, thus
> they need to hold a few bytes of look-ahead buffer. With some filters
> like ARM the look-ahead is aligned and if the uncompressed size is a
> multiple of this alignment, lzma_raw_decoder() will give you all the
> data even when the LZMA1 layer doesn't have an end marker. The x86
> filter has one-byte alignment but needs to see five bytes from the
> future before producing output. When LZMA1 layer doesn't return
> LZMA_STREAM_END, the x86 filter doesn't know that the end was reached
> and cannot flush the last bytes out.
>
> Using liblzma to decode .7z works in these cases:
>
> - LZMA1 using a fake 13-byte header with lzma_alone_decoder():
>
> 1 byte LZMA properties byte that encodes lc/lp/pb
> 4 bytes dictionary size as little endian uint32_t
> 8 bytes uncompressed size as little endian uint64_t;
> UINT64_MAX means unknown and then (and only then)
> EOPM must be present

--
title: LZMADecompressor.decompress(FORMAT_RAW) truncate output when input is 
paticular LZMA+BCJ  data -> Docs: More description of reason about LZMA1 data 
handling with FORMAT_ALONE

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com