Hello! Sorry for the delayed reply.
On 2020-04-04 Sebastian Andrzej Siewior wrote:
> I had an archive of ~35GiB which decompressed into ~80GiB and it took
> almost 20 minutes to do so. Then I was thinking if it would be
> possible to decompress it in parallel by feeding the individual
> blocks to the available CPUs.
> The patch at the bottom is a small C proof of concept so it is
> possible. I managed to decompress the same file in slightly over two
> minutes on system with 16 CPUs.
> Could this feature be merged in an improved way into the `xz' binary?
> Here are a few things I don't like
> - The tool forks `xz -lv' to get the list of blocks. I didn't find an
> API to get this information.
There is such an API in 5.3.1alpha and xz.git master branch
(lzma_file_info_decoder). xz --list uses that API. It hasn't been
tested much but it should work.
> - To decompress an individual block I create a new lzma_stream, feed
> the first few bytes from the image so it knows what it is and then
> feed the block. Once the block is done lzma_end() the stream and
> start over. It would be nice to create one stream for each CPU and
> then just reset the state after each block and reuse as much as
> possible of currently allocated memory.
While this approach works, it's not what I want to merge in xz, sorry.
The block headers can store the compressed and uncompressed sizes. xz
does this when compressing in threaded mode. The idea is that then
threaded decompression is possible in streamed mode (no need for
functionality like xz --list) so that xz can decompress from stdin to
stdout.
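
To illustrate what "the block headers can store the sizes" means at the
byte level, here is a minimal sketch that reads the two optional size
fields from the start of a Block Header, following the .xz file format
specification as I understand it. The names block_sizes, vli_decode and
block_sizes_parse are hypothetical helpers, not liblzma API, and the
sketch does not verify the header CRC32 or parse the filter flags.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type; not part of liblzma. */
struct block_sizes {
	bool has_compressed;
	bool has_uncompressed;
	uint64_t compressed;
	uint64_t uncompressed;
};

/*
 * Decode one variable-length integer: 7 bits per byte, least significant
 * group first, bit 0x80 set on every byte except the last.
 * Returns the number of bytes consumed, or 0 on error.
 */
static size_t vli_decode(const uint8_t *buf, size_t size_max, uint64_t *num)
{
	if (size_max == 0)
		return 0;
	if (size_max > 9)
		size_max = 9;

	*num = buf[0] & 0x7F;
	size_t i = 0;
	while (buf[i++] & 0x80) {
		/* Reject truncated and non-minimal encodings. */
		if (i >= size_max || buf[i] == 0x00)
			return 0;
		*num |= (uint64_t)(buf[i] & 0x7F) << (i * 7);
	}
	return i;
}

/*
 * Parse the optional Compressed Size and Uncompressed Size fields.
 * hdr points at the Block Header Size byte; len is the number of bytes
 * available. Returns false if the header is truncated or malformed.
 */
static bool block_sizes_parse(const uint8_t *hdr, size_t len,
		struct block_sizes *out)
{
	/* A first byte of 0x00 marks the Index, not a Block Header. */
	if (len < 2 || hdr[0] == 0x00)
		return false;

	size_t hdr_size = ((size_t)hdr[0] + 1) * 4;
	if (len < hdr_size)
		return false;

	uint8_t flags = hdr[1];
	if (flags & 0x3C)	/* reserved bits must be zero */
		return false;

	out->has_compressed = (flags & 0x40) != 0;
	out->has_uncompressed = (flags & 0x80) != 0;
	out->compressed = 0;
	out->uncompressed = 0;

	size_t pos = 2;
	size_t end = hdr_size - 4;	/* the last four bytes are the CRC32 */

	if (out->has_compressed) {
		size_t n = vli_decode(hdr + pos, end - pos, &out->compressed);
		if (n == 0)
			return false;
		pos += n;
	}
	if (out->has_uncompressed) {
		size_t n = vli_decode(hdr + pos, end - pos, &out->uncompressed);
		if (n == 0)
			return false;
		pos += n;
	}
	return true;
}
```

If both fields are present, a streaming decoder knows how many input
bytes belong to the block and how large an output buffer it needs before
it has decoded anything, which is what would let it hand the block to
another thread.
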
It would be nice to have a threaded decompressor implemented inside
liblzma. The buffer-to-buffer API makes it a bit more annoying to do
than doing it outside liblzma (e.g. within xz using liblzma APIs) but
having it in liblzma would make the feature available to other
applications too.
While not possible in liblzma, one advantage of your mmap-based approach
is that there are fewer intermediate buffers and that there is no limit
on how badly out-of-order the blocks can finish. In contrast, the encoder
in liblzma allocates extra memory to allow a certain level of
out-of-order completion while keeping all cores busy, and the same thing
would need to be done in a decoder.
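
The bookkeeping behind "a certain level of out-of-order completion" can
be sketched roughly as a fixed window of output slots. This is only an
illustration of the idea and not how liblzma actually implements it: the
names reorder_window, window_dispatch, window_finish and window_flush and
the slot count are all made up, and a real version would need locking
and per-slot output buffers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit on how many blocks may be in flight at once. */
#define WINDOW_SLOTS 4

struct reorder_window {
	uint64_t next_dispatch;	/* index of the next block to hand out */
	uint64_t next_flush;	/* index of the next block to write out */
	bool done[WINDOW_SLOTS];	/* done[i % WINDOW_SLOTS] for block i */
};

static void window_init(struct reorder_window *w)
{
	w->next_dispatch = 0;
	w->next_flush = 0;
	for (size_t i = 0; i < WINDOW_SLOTS; ++i)
		w->done[i] = false;
}

/*
 * Reserve a slot for the next block. Returns false when the window is
 * full, that is, when the oldest in-flight block has not been flushed.
 */
static bool window_dispatch(struct reorder_window *w, uint64_t *block_index)
{
	if (w->next_dispatch - w->next_flush >= WINDOW_SLOTS)
		return false;

	*block_index = w->next_dispatch++;
	w->done[*block_index % WINDOW_SLOTS] = false;
	return true;
}

/* A worker reports that a block is finished; any order is allowed. */
static void window_finish(struct reorder_window *w, uint64_t block_index)
{
	w->done[block_index % WINDOW_SLOTS] = true;
}

/*
 * Write out finished blocks strictly in order and free their slots.
 * Returns how many blocks were flushed.
 */
static size_t window_flush(struct reorder_window *w)
{
	size_t flushed = 0;
	while (w->next_flush < w->next_dispatch
			&& w->done[w->next_flush % WINDOW_SLOTS]) {
		w->done[w->next_flush % WINDOW_SLOTS] = false;
		++w->next_flush;
		++flushed;
	}
	return flushed;
}
```

The trade-off is visible in WINDOW_SLOTS: more slots means more memory
but fewer stalls when one slow block holds up the ones behind it. With
mmap'd output there is no such window, because each block writes
directly to its final position.
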
I'm aware that threaded decompression (and also some other features)
should have been implemented years ago. However, I haven't had the
energy to do it.
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode