Re: [xz-devel] Parallel decompression support

2020-05-11 Thread Lasse Collin
(Note that you need to send from the same address that was used to
subscribe to the list.)

On 2020-04-19 Sebastian Andrzej Siewior wrote:
> So (if I understood you correctly) what you suggest is to feed the
> liblzma API and the lib will then fire up threads as it sees new
> blocks. That means for a default compressed file (-6, 24 MiB block
> size) each thread will allocate 24 MiB as a buffer where it stores
> the output. Once the previous thread is complete, it will be able to
> save/write its data.

Yes. See how the threaded compression API works. This design makes it
easy for applications to use threaded (de)compression, and it works for
streamed (de)compression, which is a common use case.
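For example, setting up the threaded encoder is roughly this (an
untested sketch; the helper name is made up, and a threaded decoder
could offer the same shape of API):

#include <lzma.h>

static lzma_ret init_threaded_encoder(lzma_stream *strm,
        uint32_t nthreads)
{
    lzma_mt mt = {
        .flags = 0,
        .threads = nthreads,    /* number of worker threads */
        .block_size = 0,        /* 0 = derive from the preset
                                   (24 MiB at the default -6) */
        .timeout = 0,           /* don't force partial output */
        .preset = LZMA_PRESET_DEFAULT,
        .filters = NULL,        /* NULL = use the preset above */
        .check = LZMA_CHECK_CRC64,
    };

    return lzma_stream_encoder_mt(strm, &mt);
}

After this the application runs its normal lzma_code() loop; the
threading is invisible to it.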

Note that some extra memory needs to be allocated because blocks can
finish out of order. If the code assumes that blocks finish strictly in
order, the worker threads won't be busy all the time.

With decompression one has to decide how much memory may be used by
default. If there is no limit, in the extreme case a decoder could read
the whole input file into RAM and allocate an output buffer for the
whole uncompressed file. This problem doesn't exist in your mmap (or
pread) approach.
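If a default limit is wanted, one option is to derive it from the
amount of RAM (a sketch; lzma_physmem() is a real liblzma function,
but the 25 % policy and the fallback value below are just examples):

#include <lzma.h>

static uint64_t default_decoder_memlimit(void)
{
    uint64_t ram = lzma_physmem(); /* 0 if it cannot be determined */
    if (ram == 0)
        return UINT64_C(256) << 20; /* arbitrary 256 MiB fallback */

    return ram / 4; /* allow the decoder up to 25 % of RAM */
}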

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode



Re: [xz-devel] Parallel decompression support

2020-04-18 Thread Lasse Collin
Hello! Sorry for the delayed reply.

On 2020-04-04 Sebastian Andrzej Siewior wrote:
> I had an archive of ~35 GiB which decompressed into ~80 GiB, and it
> took almost 20 minutes to do so. Then I wondered whether it would be
> possible to decompress it in parallel by feeding the individual
> blocks to the available CPUs.
> 
> The patch at the bottom is a small C proof of concept showing that it
> is possible. I managed to decompress the same file in slightly over
> two minutes on a system with 16 CPUs.
> 
> Could this feature be merged in an improved way into the `xz' binary?
> Here are a few things I don't like:
> - The tool forks `xz -lv' to get the list of blocks. I didn't find an
>   API to get this information.

There is such an API in 5.3.1alpha and in the xz.git master branch
(lzma_file_info_decoder). xz --list uses that API. It hasn't been
tested much, but it should work.
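An untested sketch of how it can be used with a seekable file
descriptor (error handling trimmed; ownership of the returned index
passes to the caller):

#include <lzma.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static lzma_index *read_file_index(int fd, uint64_t file_size)
{
    lzma_stream strm = LZMA_STREAM_INIT;
    lzma_index *idx = NULL;
    uint8_t buf[BUFSIZ];

    if (lzma_file_info_decoder(&strm, &idx, UINT64_MAX, file_size)
            != LZMA_OK)
        return NULL;

    while (true) {
        if (strm.avail_in == 0) {
            ssize_t n = read(fd, buf, sizeof(buf));
            if (n <= 0)
                break;
            strm.next_in = buf;
            strm.avail_in = (size_t)n;
        }

        lzma_ret ret = lzma_code(&strm, LZMA_RUN);
        if (ret == LZMA_SEEK_NEEDED) {
            /* The decoder wants input from strm.seek_pos. */
            if (lseek(fd, (off_t)strm.seek_pos, SEEK_SET) == -1)
                break;
            strm.avail_in = 0; /* drop data from the old offset */
        } else if (ret == LZMA_STREAM_END) {
            lzma_end(&strm);
            return idx; /* describes every block in the file */
        } else if (ret != LZMA_OK) {
            break;
        }
    }

    lzma_end(&strm);
    return NULL;
}

The resulting lzma_index can then be walked with lzma_index_iter_init()
and lzma_index_iter_next() in LZMA_INDEX_ITER_BLOCK mode to get each
block's offsets and sizes.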

> - To decompress an individual block I create a new lzma_stream, feed
>   the first few bytes from the image so it knows what it is, and then
>   feed the block. Once the block is done, I lzma_end() the stream and
>   start over. It would be nice to create one stream for each CPU and
>   then just reset the state after each block, reusing as much as
>   possible of the currently allocated memory.

While this approach works, it's not what I want to merge in xz, sorry.

The block headers can store the compressed and uncompressed sizes. xz
does this when compressing in threaded mode. The idea is that threaded
decompression is then possible in streamed mode (no need for
functionality like xz --list), so that xz can decompress from stdin to
stdout.
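A decoder can detect this from a Block Header alone. Roughly (an
untested sketch; "check" comes from the Stream Header, and the filter
options decoded into "filters" are leaked for brevity):

#include <lzma.h>
#include <stdbool.h>

static bool block_has_sizes(const uint8_t *buf, lzma_check check)
{
    lzma_filter filters[LZMA_FILTERS_MAX + 1];
    lzma_block block = {
        .version = 0,
        .check = check,
        .filters = filters,
        .header_size = lzma_block_header_size_decode(buf[0]),
    };

    if (lzma_block_header_decode(&block, NULL, buf) != LZMA_OK)
        return false;

    /* xz -T stores both sizes; single-threaded xz omits them. */
    return block.compressed_size != LZMA_VLI_UNKNOWN
            && block.uncompressed_size != LZMA_VLI_UNKNOWN;
}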

It would be nice to have a threaded decompressor implemented inside
liblzma. The buffer-to-buffer API makes it a bit more annoying to do
there than outside liblzma (e.g. within xz using liblzma APIs), but
having it in liblzma would make the feature available to other
applications too.

One advantage of your mmap-based approach, which isn't possible inside
liblzma, is that there are fewer intermediate buffers and there is no
limit on how badly out of order the blocks can finish. In contrast, the
encoder in liblzma allocates extra memory to allow a certain level of
out-of-order completion while keeping all cores busy, and the same
thing would need to be done in a decoder.
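For comparison, a worker in the mmap/pread scheme boils down to
something like this (an untested sketch; the helper name is made up,
"in"/"out" point into the input and output mappings at offsets taken
from the index, "check" comes from the Stream Header, and the decoded
filter options are leaked for brevity):

#include <lzma.h>
#include <stdbool.h>

static bool decode_one_block(const uint8_t *in, size_t in_size,
        uint8_t *out, size_t out_size, lzma_check check)
{
    lzma_filter filters[LZMA_FILTERS_MAX + 1];
    lzma_block block = {
        .version = 0,
        .check = check,
        .filters = filters,
        .header_size = lzma_block_header_size_decode(in[0]),
    };

    if (lzma_block_header_decode(&block, NULL, in) != LZMA_OK)
        return false;

    lzma_stream strm = LZMA_STREAM_INIT;
    if (lzma_block_decoder(&strm, &block) != LZMA_OK)
        return false;

    /* Decode straight into the output mapping: no extra buffers. */
    strm.next_in = in + block.header_size;
    strm.avail_in = in_size - block.header_size;
    strm.next_out = out;
    strm.avail_out = out_size;

    lzma_ret ret = lzma_code(&strm, LZMA_FINISH);
    lzma_end(&strm);
    return ret == LZMA_STREAM_END;
}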

I'm aware that threaded decompression (and also some other features)
should have been implemented years ago. However, I haven't had the
energy to do it.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode