On 2021-12-31 Sebastian Andrzej Siewior wrote: > On 2021-12-15 23:33:58 [+0200], Lasse Collin wrote: > > Yes. It's fairly simple from implementation point of view but is it > > clear enough for the users, I'm not sure. > > > > I suppose the alternative is having just one limit value and a flag > > to tell if it is a soft limit (so no limit for single-threaded > > case) or a hard limit (return LZMA_MEM_ERROR if too low for even > > single thread). Having separate soft and hard limits instead can > > achieve the same and a little more, so I think I'll choose the > > two-value approach and hope it's clear enough for users. > > The value approach might work. I'm not sure if the term `soft' and > `hard' are good here. Using `memlimit' and `memlimit_threaded' (or so) > might make more obvious and easier to understand. > But then this just some documentation that needs to be read and > understood so maybe `softlimit' and `hardlimit' will work just fine.
I now plan to use memlimit_threading and memlimit_stop in the lzma_mt structure. Documentation is still needed but hopefully those are a bit more obvious. > > I was hoping to get this finished by Christmas but due to a recent > > sad event, late January is my target for the next alpha release > > now. And I'm late again. :-( This is more work than I had expected because there unfortunately are a few problems in the code and fixing them all requires quite significant changes (and I'm slow). As a bonus, working on this made me notice a few small bugs in the old liblzma code too (not yet committed). The following tries to explain some of the problems and what I have done locally. I don't have code to show yet because it still contains too many small FIXMEs but, as unbelievable as it might sound, this will get done. I need a few more days; I have other things I must do too. The biggest issue is handling of memory usage and threaded vs. direct mode. The memory usage limiting code makes assumptions that are true with the most common files but there are situations where these assumptions fail: (1) If a non-first Block requires a lot more memory than the first Block and so the memory limit would be exceeded in threaded mode, the decoder will not switch to direct mode even with LZMA_MEMLIMIT_COMPLETE. Instead the decoder proceeds with one thread and uses as much memory as that needs. (2) If a non-first Block lacks size info in its Block Header, the decoder won't switch to direct mode. It returns LZMA_PROG_ERROR instead. (3) The per-thread input buffers can grow as bigger Blocks are seen but the buffers cannot shrink. This has pros and cons. It's a problem if a single Block is very big and others are not. I thought it's better to first decode the Block Header to coder->block_options and then, based on the facts from that Block Header, determine memory usage and how to proceed (including switching to/from direct mode). This way there is no need to assume or expect anything. (coder->block_options need to be copied to a thread-specific structure before initializing the decoder.) For direct mode, I added separate SEQ states for it. This also helps making the code more similar to the single-threaded decoder in both looks and behavior. I hope that with memlimit_threading = 0 the threaded version can have identical externally-visible behavior as the original single-threaded version. This way xz doesn't need both functions (the single-threaded function is still needed if built with --disable-threads). Corner cases of the buffer-to-buffer API: (4) In some use cases there might be long pauses where no new input is available (for example, sending a live log file over network with compression). It is essential that the decoder will still provide all output that is easily possible from the input so far. That is, if the decoder was called without providing any new input, it might need to be handled specially. SEQ_BLOCK_HEADER and SEQ_INDEX return immediately if the application isn't providing any new input data, and so eventually lzma_code() will return LZMA_BUF_ERROR even when there would be output available from the worker threads. try_copy_decoded() could be called earlier but there is more to fix (see (5) and (6)). (Also remember my comment above that I changed the code so that Block Header is decoded first before getting a thread. That adds one more SEQ point where waiting for output is needed.) (5) The decoder must work when the application provides an output buffer whose size is exactly the uncompressed size of the file. This means that one cannot simply use *out_pos == out_size to determine when to return LZMA_OK. Perhaps the decoder hasn't marked its lzma_outbuf as finished but no more output will be coming, or there is an empty Block (empty Blocks perhaps shouldn't have been allowed in the format at all but it's too late for that). In short, instead of *out_pos == out_size one has to check !lzma_outq_is_empty() && lzma_outq_is_readable() after the call to lzma_outq_read() to determine if more output space is truly required. try_copy_decoded() and SEQ_INDEX have this problem but fixing is not complicated. Race condition in SEQ_BLOCK_HEADER: ret = get_thread(coder, allocator); if (ret != LZMA_OK) return ret; if (!coder->thr_write) { // No out buffer but making progress? if (start_out_pos != *out_pos) return LZMA_OK; mythread_mutex_lock(&coder->mutex); if (!lzma_outq_is_readable(&coder->outq)) ret = wait_cond_progress(coder, &wait_abs); mythread_mutex_unlock(&coder->mutex); The call to get_thread() locks coder->mutex for a while. If getting a thread fails and no output has been produced, the mutex is locked again to wait on coder->cond if no output is currently readable from the output queue. It is possible that a thread finishes and puts itself to coder->threads_free after the call to get_thread() fails but before the mutex is locked again. Thus, it's possible that get_thread() would now succeed but the code will wait on coder->cond anyway. In practice this is only a minor issue as there will likely be output fairly soon, but it may sometimes add a small unneeded delay. To fix it, it should check for all conditions that would allow continuing. I did it by creating a function that contains the functionality of try_copy_decoded() and checking for the conditions to start a Block, all while keeping coder->mutex locked. This way the conditions to start a new Block need to be checked in one place only. This helper function also made it easy to fix (4) and (5). The sizes from Block Header are untrusted so one has to be paranoid with them. This means checking for integer overflows when doing addition. Also, since the sizes are 63-bit (lzma_vli is uint64_t but valid known values fit into 63 bits), assigning these to a size_t needs care to avoid truncation on 32-bit systems. These are easy to forget but forgetting can cause security issues. I added support for lzma_get_progress(). It is needed to make xz -v progress indicator useful. Other news: Gitweb on git.tukaani.org is currently broken. I'll get it fixed in a few days. The repositories still work. -- Lasse Collin