On 2020-12-12 Sebastian Andrzej Siewior wrote:
> This is WIP, the decoder appears to work based on:
> 
> |$ xz -dv < buster-pl.xz | openssl sha1
> |  100 %         10,2 GiB / 40,0 GiB = 0,255   114 MiB/s       6:00
> |(stdin)= 5eb4e2a3ce2253a6ec3fc86ee7ad8db0a5395959
> |
> |vs
> |
> |$ ./src/xz/.libs/xz -dv < buster-pl.xz | openssl sha1
> |  100 %         10,2 GiB / 40,0 GiB = 0,255   815 MiB/s       0:50
> |  (stdin)= 5eb4e2a3ce2253a6ec3fc86ee7ad8db0a5395959

Looks promising. :-)

> Parts of the mt-decoder are copied from the other decoder. Not sure if
> this is good or if it should be merged somehow with the single-threaded
> decoder.

It's good to first make a separate threaded decoder. Once it is
finished, merging the two can be considered if it looks straightforward
enough. The amount of duplicated code isn't that large anyway.

> Threads that have finished decoding remain idle until their output
> buffer has been fully consumed. Once allocated, the output buffer
> stays allocated until the thread is cleaned up. In the example above
> this saved 5 secs compared to freeing the buffer once it was fully
> consumed and allocating a new one when the next data arrives. The
> input buffer is freshly allocated for each block since block sizes
> generally vary.

Yes, reusing buffers and encoder/decoder states can be useful (fewer
page faults). Perhaps even the input buffer could be reused if it is OK
to waste some memory and it makes a difference in speed.
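
Roughly, the grow-only reuse I mean looks like this (the names are made
up for the example, they aren't from the patch or from liblzma):

    #include <stdint.h>
    #include <stdlib.h>

    /* Illustrative only: reuse one buffer across Blocks, growing it when
     * a Block needs more space. The buffer keeps its largest size, which
     * is the "waste some memory" trade-off. */
    struct reuse_buf {
            uint8_t *data;
            size_t alloc;   /* current capacity */
    };

    /* Make sure the buffer can hold `needed` bytes. Returns 0 on success,
     * -1 on allocation failure. No locking; each thread owns its buffer. */
    static int
    reuse_buf_reserve(struct reuse_buf *b, size_t needed)
    {
            if (b->alloc >= needed)
                    return 0;   /* big enough already, no free()+malloc() churn */

            uint8_t *p = realloc(b->data, needed);
            if (p == NULL)
                    return -1;

            b->data = p;
            b->alloc = needed;
            return 0;
    }

Whether the wasted memory matters depends on how much the Block sizes
vary in practice.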

> I made my own output queue since the output size is known. I have no
> idea if this is good or if it would be better to use lzma_outq
> instead.

The current lzma_outq isn't flexible enough for a decoder. It's a bit
primitive even for encoding: it works fine but it wastes a little
memory. However, since the LZMA encoder needs a lot of memory anyway,
the overall difference is around (or under) 10 %, which likely doesn't
matter too much.

The idea of lzma_outq is to have a pool of output buffers that is
separate from the pool of worker threads. Different data takes
different amounts of time to compress. The separate pools allow Blocks
to finish out of order and worker threads to be reused immediately as
long as there is enough extra buffer space in the output queue. This is
an important detail for encoder performance (to prevent idle threads),
and with a quick try it seems it might help with decoding too. The
significance depends a lot on the data, of course.
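
To make the idea concrete, here is a rough sketch (not the real
lzma_outq API, the names are invented, and the locking between threads
is left out): the output slots form their own pool, workers can finish
them in any order, and the consumer drains them strictly in Block
order, so a worker is free again as soon as any slot is available.

    #include <stddef.h>
    #include <stdint.h>

    #define SLOTS 8   /* more slots than workers lets fast Blocks overtake slow ones */

    enum slot_state { SLOT_FREE, SLOT_BUSY, SLOT_FINISHED };

    struct out_slot {
            enum slot_state state;
            uint64_t seq;           /* Block number held by this slot */
            size_t size;            /* decoded bytes in buf */
            uint8_t buf[1 << 20];   /* fixed size only for the sketch */
    };

    struct out_queue {
            struct out_slot slot[SLOTS];
            uint64_t next_in;       /* next Block number to assign */
            uint64_t next_out;      /* next Block number to deliver */
    };

    /* Reserve a slot for the next Block; NULL means the queue is full and
     * the producer must wait for the consumer to drain in-order output. */
    static struct out_slot *
    outq_reserve(struct out_queue *q)
    {
            for (size_t i = 0; i < SLOTS; ++i) {
                    if (q->slot[i].state == SLOT_FREE) {
                            q->slot[i].state = SLOT_BUSY;
                            q->slot[i].seq = q->next_in++;
                            return &q->slot[i];
                    }
            }
            return NULL;
    }

    /* A worker calls this when its Block is done; order doesn't matter. */
    static void
    outq_finish(struct out_slot *s, size_t decoded_size)
    {
            s->size = decoded_size;
            s->state = SLOT_FINISHED;
    }

    /* The consumer drains strictly in Block order; returns the slot
     * holding Block next_out if it is finished, otherwise NULL. */
    static struct out_slot *
    outq_pop_in_order(struct out_queue *q)
    {
            for (size_t i = 0; i < SLOTS; ++i) {
                    struct out_slot *s = &q->slot[i];
                    if (s->state == SLOT_FINISHED && s->seq == q->next_out) {
                            q->next_out++;
                            return s;   /* caller copies buf, then sets state = SLOT_FREE */
                    }
            }
            return NULL;
    }

In real code the slot size would come from the Block Header and the
whole thing needs a mutex and condition variables, but the point is
that slot lifetime is decoupled from worker lifetime.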

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode
