On Wed, Mar 23, 2022 at 7:31 PM Andres Freund <and...@anarazel.de> wrote:
> I found the following section in the manual [1]:
>
>     ZSTD_c_nbWorkers=400,  /* Select how many threads will be spawned to compress in parallel.
>                             * When nbWorkers >= 1, triggers asynchronous mode when invoking ZSTD_compressStream*() :
>                             * ZSTD_compressStream*() consumes input and flush output if possible, but immediately gives back control to caller,
>                             * while compression is performed in parallel, within worker thread(s).
>                             * (note : a strong exception to this rule is when first invocation of ZSTD_compressStream2() sets ZSTD_e_end :
>                             *  in which case, ZSTD_compressStream2() delegates to ZSTD_compress2(), which is always a blocking call).
>                             * More workers improve speed, but also increase memory usage.
>                             * Default value is `0`, aka "single-threaded mode" : no worker is spawned,
>                             * compression is performed inside Caller's thread, and all invocations are blocking */
>
> "ZSTD_compressStream*() consumes input ... immediately gives back control"
> pretty much confirms that.
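For anyone who wants to see the shape of what that documentation is describing, here's a minimal, hypothetical sketch against the plain libzstd streaming API (not our actual basebackup code; error handling is abbreviated, the buffer and payload names are just illustrative, and ZSTD_c_nbWorkers presumably only does anything useful if libzstd was built with threading support):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>

int
main(void)
{
    ZSTD_CCtx  *cctx = ZSTD_createCCtx();
    size_t      bufsize = ZSTD_CStreamOutSize();
    void       *outbuf = malloc(bufsize);
    const char *payload = "some data to compress";
    ZSTD_inBuffer in = {payload, strlen(payload), 0};
    size_t      remaining;

    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 4);   /* asynchronous mode */

    /*
     * Feed input with ZSTD_e_continue.  With nbWorkers >= 1, these calls
     * queue work for the worker threads and return quickly.
     */
    while (in.pos < in.size)
    {
        ZSTD_outBuffer out = {outbuf, bufsize, 0};
        size_t      ret = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_continue);

        if (ZSTD_isError(ret))
        {
            fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(ret));
            return 1;
        }
        /* out.dst[0 .. out.pos) now holds compressed bytes; write them out. */
    }

    /* Finish the frame; loop until the library reports everything flushed. */
    do
    {
        ZSTD_outBuffer out = {outbuf, bufsize, 0};

        remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
        if (ZSTD_isError(remaining))
        {
            fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(remaining));
            return 1;
        }
        /* write out.dst[0 .. out.pos) here as well */
    } while (remaining != 0);

    free(outbuf);
    ZSTD_freeCCtx(cctx);
    return 0;
}

The point of interest for us is that, in asynchronous mode, the ZSTD_e_continue calls hand the data off and return, so the caller isn't stalled while the worker threads grind through the compression.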
I saw that language too, but I didn't consider it conclusive. It would be nice if their documentation had a bit more detail on what's really happening.

> Do we care about zstd's memory usage here? I think it's OK to mostly ignore
> work_mem/maintenance_work_mem here, but I could also see limiting concurrency
> so that estimated memory usage would fit into work_mem/maintenance_work_mem.

I think it's possible that we want to do nothing and possible that we want to do something, but I think it's very unlikely that the thing we want to do is related to maintenance_work_mem. Say we soft-cap the compression level to the highest one we think will fit within maintenance_work_mem. I think the most likely outcome is that people will not get the compression level they requested and will be confused about why that has happened. It also seems possible that we'll be wrong about how much memory will be used - say, because somebody changes the library behavior in a new release - and will limit it to the wrong level. If we're going to do anything here, I think it should be to limit based on the compression level itself and not based on how much memory we think that level will use.

But that leaves the question of whether we should even try to impose some kind of limit, and there I'm not sure. It feels like it might be overengineered, because we're only talking about users who have replication privileges, and if those accounts are subverted there are big problems anyway. I think if we imposed a governance system here it would get very little use. On the other hand, the higher zstd compression levels, 20 and above, can actually use a ton of memory, so we might want to limit access to those somehow. Apparently, on the command line, you have to say --ultra to use them; I'm not sure whether there's a corresponding API-level restriction or whether that's a guard built specifically into the CLI.

> Perhaps it's related to the amounts of memory fed to ZSTD_compressStream2() in
> one invocation? I recall that there's some differences between basebackup
> client / serverside around buffer sizes - but that's before all the recent-ish
> changes...

That thought occurred to me too, but I haven't investigated it yet.
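As a footnote on the level question: the library does report at run time the range of levels it will accept, so if we did decide to clamp or reject the very high levels we could at least ask it rather than hard-coding a number. A hedged sketch, which doesn't settle whether --ultra is purely a CLI-side guard:

#include <stdio.h>
#include <zstd.h>

int
main(void)
{
    /* Ask libzstd which compression levels it accepts. */
    ZSTD_bounds levels = ZSTD_cParam_getBounds(ZSTD_c_compressionLevel);

    if (ZSTD_isError(levels.error))
    {
        fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(levels.error));
        return 1;
    }

    /* upperBound matches ZSTD_maxCLevel(), currently 22. */
    printf("accepted levels: %d .. %d\n", levels.lowerBound, levels.upperBound);
    return 0;
}

-- 
Robert Haas
EDB: http://www.enterprisedb.com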