Hi,

On 2022-03-23 18:31:12 -0400, Robert Haas wrote:
> On Wed, Mar 23, 2022 at 5:14 PM Andres Freund <and...@anarazel.de> wrote:
> > The most likely source of problems would be errors thrown while zstd
> > threads are alive. Should make sure that that can't happen.
> >
> > What is the lifetime of the threads zstd spawns? Are they tied to a
> > single compression call? A single ZSTD_createCCtx()? If the latter, how
> > bulletproof is our code ensuring that we don't leak such contexts?
>
> I haven't found any real documentation explaining how libzstd manages
> its threads. I am assuming that it is tied to the ZSTD_CCtx, but I
> don't know. I guess I could try to figure it out from the source code.

I found the following section in the manual [1]:

    ZSTD_c_nbWorkers=400,    /* Select how many threads will be spawned to compress in parallel.
                              * When nbWorkers >= 1, triggers asynchronous mode when invoking ZSTD_compressStream*() :
                              * ZSTD_compressStream*() consumes input and flush output if possible, but immediately gives back control to caller,
                              * while compression is performed in parallel, within worker thread(s).
                              * (note : a strong exception to this rule is when first invocation of ZSTD_compressStream2() sets ZSTD_e_end :
                              *  in which case, ZSTD_compressStream2() delegates to ZSTD_compress2(), which is always a blocking call).
                              * More workers improve speed, but also increase memory usage.
                              * Default value is `0`, aka "single-threaded mode" : no worker is spawned,
                              * compression is performed inside Caller's thread, and all invocations are blocking */

"ZSTD_compressStream*() consumes input ... immediately gives back
control" pretty much confirms that.
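For reference, a minimal sketch of what that async mode looks like from
the caller's side (buffer sizes made up, and real code of course needs
ZSTD_isError() checks on every call):

#include <zstd.h>

int
main(void)
{
	static char src[64 * 1024];	/* stand-in for a chunk of backup data */
	static char dst[ZSTD_COMPRESSBOUND(64 * 1024)];
	ZSTD_CCtx  *cctx = ZSTD_createCCtx();
	ZSTD_inBuffer in = {src, sizeof(src), 0};
	ZSTD_outBuffer out = {dst, sizeof(dst), 0};
	size_t		remaining;

	ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);
	/* nbWorkers >= 1 switches ZSTD_compressStream2() to asynchronous mode */
	ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 4);

	/*
	 * Consumes the input and returns more or less immediately, while the
	 * compression itself runs in the worker threads. (Note it's not
	 * ZSTD_e_end on the first call, which would degrade to a blocking
	 * ZSTD_compress2(), per the manual excerpt above.)
	 */
	ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_continue);

	/* Finish the frame, waiting for the workers to drain. */
	do
	{
		remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
	} while (remaining != 0);

	ZSTD_freeCCtx(cctx);		/* presumably this also reaps the workers */
	return 0;
}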
Do we care about zstd's memory usage here? I think it's OK to mostly
ignore work_mem/maintenance_work_mem here, but I could also see limiting
concurrency so that the estimated memory usage would fit into
work_mem/maintenance_work_mem.

> It's probably also worth mentioning here that even if, contrary to
> expectations, the compression threads hang around to the end of time
> and chill, in practice nobody is likely to run BASE_BACKUP and then
> keep the connection open for a long time afterward. So it probably
> wouldn't really affect resource utilization in real-world scenarios
> even if the threads never exited, as long as they didn't, you know,
> busy-loop in the background. And I assume the actual library behavior
> can't be nearly that bad. This is a pretty mainstream piece of
> software.

I'm not really worried about resource utilization, more about the mere
existence of threads moving us into undefined-behaviour territory. I
don't think that's possible here, but IIRC it is UB to fork() while
threads are present and then do pretty much *anything* other than
immediately exec*().

> > > but that's not to say that there couldn't be problems. I worry a bit that
> > > the mere presence of threads could in some way mess things up, but I don't
> > > know what the mechanism for that would be, and I don't want to postpone
> > > shipping useful features based on nebulous fears.
> >
> > One thing that'd be good to test for is cancelling in-progress
> > server-side compression. And perhaps a few assertions that ensure that
> > we don't escape with some threads still running. That'd have to be
> > platform dependent, but I don't see a problem with that in this case.
>
> More specific suggestions, please?

I was thinking of something like calling pthread_is_threaded_np() before
and after the zstd section and erroring out if the two results differ.
But I had forgotten that that's a macOS-ism.
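I.e. something vaguely like this around the zstd section (entirely
untested, function names made up; in the backend the assert() would of
course be an Assert() or elog(ERROR)):

#include <assert.h>
#include <pthread.h>			/* pthread_is_threaded_np(), macOS only */

/* stand-in for the server-side ZSTD_compressStream2() loop */
static void
run_zstd_compression(void)
{
}

static void
compress_backup_data(void)
{
	int			was_threaded = pthread_is_threaded_np();

	run_zstd_compression();

	/*
	 * Assumes the flag drops back to 0 once libzstd's worker threads have
	 * exited - I haven't checked whether it actually does.
	 */
	assert(pthread_is_threaded_np() == was_threaded);
}

int
main(void)
{
	compress_backup_data();
	return 0;
}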
> > > For both parallel and non-parallel zstd compression, I see differences
> > > between the compressed size depending on where the compression is
> > > done. I don't know whether this is an expected behavior of the zstd
> > > library or a bug. Both files uncompress OK and pass pg_verifybackup,
> > > but that doesn't mean we're not, for example, selecting different
> > > compression levels where we shouldn't be. I'll try to figure out
> > > what's going on here.
> > >
> > > zstd, client-side: 1.7GB, 17 seconds
> > > zstd, server-side: 1.3GB, 25 seconds
> > > parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
> > > parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds
> >
> > What causes this fairly massive client-side/server-side size difference?
>
> You seem not to have read what I wrote about this exact point in the
> text which you quoted.

Somehow not...

Perhaps it's related to the amount of memory fed to ZSTD_compressStream2()
in one invocation? I recall that there are some differences between the
basebackup client side and server side around buffer sizes - but that was
from before all the recent-ish changes...

Greetings,

Andres Freund

[1] http://facebook.github.io/zstd/zstd_manual.html