On 2013-06-26 Alexander Clouter wrote:
> Attached is a patch that enables 'streaming' support for xz output,
> in short LZMA_SYNC_FLUSH is called every X milliseconds.

I like the idea.

The patch uses LZMA_SYNC_FLUSH after every X milliseconds even if all
read() calls are able to fill the buffer without blocking. A possible
alternative could be to flush when at least X milliseconds have passed
and read() gives EAGAIN. That is, don't flush as long as input is
coming faster than xz can compress it. I don't know if this is a good or
bad idea. It might mean much higher latency especially in threaded mode
(which doesn't support LZMA_SYNC_FLUSH yet).

A few other thoughts:

The timeout must be disabled when --list (MODE_LIST) is used.

gettimeofday() shouldn't fail as long as the first argument is sane and
the second argument is NULL, so there's no need to test the return
value (I hope).

It could be good to use clock_gettime(CLOCK_MONOTONIC, ...) when it is
available. It makes a difference if the system time jumps for some
reason. The threading code in liblzma uses it already so it's not a new
dependency. Currently message.c uses gettimeofday() and that could use
clock_gettime() too.

If select() gives EINTR, there should be a test for user_abort.
Otherwise if there is no input, xz won't react to SIGINT until the
timeout has expired.

I noticed that there is a race condition in signal handling in the
existing xz code. If e.g. SIGINT is sent after the value of user_abort
has been checked but before a blocking read() or write(), the read/write
will block and another signal is needed to make xz notice that
user_abort has been set. This affects the same code as your patch so I
think this should be fixed first.

Could signals be a good way to set a flag when to flush? It would allow
triggering flushing from another process. xz already supports
SIGUSR1/SIGINFO for to show progress info if --verbose wasn't used.

A possible problem is how to raise such signals within xz.
timer_create() and friends look nice but after checking a few OSes I
think they aren't portable enough. setitimer() could be more portable
but in practice it would mean using SIGALRM. Currently xz uses alarm()
for the progress indicator. Creating a thread solely for sending timer
signals should work, but I'm not sure I like that. Maybe just polling
the time like your patch does is the way to go.

> The patch is for 5.0.0 (what is currently in Debian
> 'oldstable/squeeze') but if the community likes the look of the
> patch, I can roll a version for whatever is at the HEAD of the git
> tree.

It won't apply directly because there's new code that uses
LZMA_FULL_FLUSH. But let's not worry about it until I have fixed the
race condition with signals and user_abort. It may need select() or
poll(), which may then be used to implement flushing too.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode

Reply via email to