Re: [xz-devel] [PATCH v2] liblzma: Add multi-threaded decoder

Sebastian Andrzej Siewior Wed, 27 Jan 2021 15:17:27 -0800

On 2021-01-24 23:56:15 [+0200], Lasse Collin wrote:
> Hello!
Hi,

> I haven't made much progress with this still, I'm sorry. :-( Below are
> comments about a few small details. It's not much but I will (slowly)
> keep reading and testing.


Thank you.

> (1) Segfault due to thr->outbuf == NULL
> 
> I changed CHUNK_SIZE to 1 to test corner cases. I used
> good-1-block_header-1.xz as the test file. It can segfault in
> worker_decoder() on the line calling thr->block_decoder.code(...)
> because thr->outbuf is NULL (so the problem was introduced in the outq
> patch). This happens because of "thr->outbuf = NULL;" later in the
> function.
> 
> It looks like that it marks the outbuf finished and returns the thread
> to the pool too early or forgets to set thr->state = THR_IDLE. As a
> temporary workaround, I added "thr->state = THR_IDLE;" after
> "thr->outbuf = NULL;".

I moved the logic to reset the in buffer only after LZMA_STREAM_END.

> (2) Block decoder must return LZMA_STREAM_END on success
> 
> Because of end marker and integrity check, the output buffer will be
> full before the last bytes of input have been processed by the Block
> decoder. Thus it is not enough to look at the input and output
> positions to determine when decoding has been finished; only
> LZMA_STREAM_END should be used to determine that decoding was
> successful.
> 
> In theory it is OK to mark the outbuf as finished once the output is
> full but for simplicity I suggest doing so (and returning the thread to
> the pool) only after LZMA_STREAM_END.
> 
> I committed a new test file bad-1-check-crc32-2.xz. The last byte in
> the Block (last byte of Check) is wrong. Change CHUNK_SIZE to 1 and try
> "xz -t -T2 file bad-1-check-crc32-2.xz". The file must be detected to
> be corrupt (LZMA_DATA_ERROR).
> 

I changed that as suggested.

> (3) Bad input where the whole input or output buffer cannot be used
> 
> In the old single-threaded decoding, lzma_code() will eventually return
> LZMA_BUF_ERROR if the calls to lzma_code() cannot make any progress,
> that is, no more input is consumed and no more output is produced. This
> condition can happen with correct code if the input file is corrupt in
> a certain way, for example, a truncated .xz file.
> 
> Since the no-progress detection is centralized in lzma_code(), the
> internal decoders including Block decoder don't try to detect this
> situation. Currently this means that worker_decoder() should detect it
> to catch bad input and prevent hanging on certain malformed Blocks.
> However, since the Block decoder knows both Compressed Size and
> Uncompressed Size, I think I will improve Block decoder instead so
> don't do anything about this for now.
> 
> I committed two test files, bad-1-lzma2-9.xz and bad-1-lzma2-10.xz. The
> -9 may make worker_decoder() not notice that the Block is invalid. The
> -10 makes the decoder hang. Like I said, I might fix these by changing
> the Block decoder.

So I updated the logic that if LZMA_OK is returned from the block
decoder and either the complete in-buffer or out-buffer has been
consumed then it returns LZMA_DATA_ERROR.

> (4) Usage of partial_update in worker_decoder()
> 
> Terminology: main mutex means coder->mutex alias thr->coder->mutex.
> 
> In worker_decoder(), the main mutex is locked every time there is new
> output available in the worker thread. partial_update is only used to
> determine when to signal thr->coder->cond.
> 
> To reduce contention on the main mutex, worker_decoder() could lock it
> only when
>   - decoding of the Block has been finished (successfully or
> unsuccessfully, that is, ret != LZMA_OK), or
>   - there is new output available and partial_update is true; if
> partial_update is false, thr->outbuf->pos is not touched.
> 
> This way only one worker will be frequently locking the main mutex.
> However, I haven't tried it and thus don't know how much this affects
> performance in practice. One possible problem might be that it may
> introduce a small delay in output availability when the main thread
> switches reading from the next outbuf in the list.

I did implement it as described, didn't I?

> (5) Use of mythread_condtime_set()
> 
> In the encoder the absolute time is calculated once per lzma_code()
> call. The comment in wait_for_work() in in stream_encoder_mt.c was
> wrong. The reason the absolute time is calculated once per lzma_code()
> call is to ensure that blocking multiple times won't make the timeout
> ineffective if each blocking takes less than timeout milliseconds. So
> it should be done similarly in the decoder.

oh, okay. I assumed it as in "waiting is okay but no longer than timeout
secs".
It is possible that at SEQ_INDEX time we wait less than `timeout' copy a
few bytes and then wait again less than `timeout' until the out-buffer
is full. But hey, we make progress ;)
At SEQ_BLOCK_HEADER we would wait only once and return on the second
iteration because we made progress. But yes, depending on how much has
been done earlier, we would spent maybe too much time in waiting.
Good. Let me change that.

> (6) Use of lzma_outq_enable_partial_output()
> 
> It should be safe to call it unconditionally:
> 
>     if (thr->outbuf == coder->outq.head)
>         lzma_outq_enable_partial_output(&coder->outq,
>                                         thr_do_partial_update);
> 
> If outq.head is something else, it is either already finished or
> partial output has already been enabled. In both cases
> lzma_outq_enable_partial_output() will do nothing.

Yeah, that is true. I tried to avoid a function call since it should
only be used on the first block since the following once would get
enabled after previous block was fully consumed. But yeah, let me change
that.

Let me retest this updated thing tomorrow and see how it goes.

Sebastian

Re: [xz-devel] [PATCH v2] liblzma: Add multi-threaded decoder

Reply via email to