Thanks for working on this, Corey.

On Fri, Mar 14, 2025 at 04:03:16PM -0400, Corey Huinker wrote:
> 0003 -
>
> Storing the restore function calls in the archive entry hogged a lot of
> memory and made people nervous. This introduces a new function pointer
> that generates those restore SQL calls right before they're written to
> disk, thus reducing the memory load from "stats for every object to be
> dumped" to just one object. Thanks to Nathan for diagnosing some weird
> quirks with various formats.
>
> 0004 -
>
> This replaces the query in the prepared statement with one that batches
> them 100 relations at a time, and then maintains that result set until
> it is consumed. It seems to have obvious speedups.
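Just to make sure I'm reading 0004 correctly, the batching boils down to
something like the following (illustrative only -- the helper name, the
column list, and the query are made up for the example; the real patch
replaces the prepared-statement query and fetches the full statistics
data):

    /*
     * Sketch of the batching idea: instead of one query per relation,
     * fetch a whole batch of relations in one round trip and let the
     * caller walk the result set as entries are written out.
     */
    static PGresult *
    fetchStatsBatch(Archive *fout, const Oid *oids, int noids)
    {
        PQExpBuffer query = createPQExpBuffer();
        PGresult   *res;

        Assert(noids > 0);

        appendPQExpBufferStr(query,
                             "SELECT c.oid, c.relpages, c.reltuples, c.relallvisible "
                             "FROM pg_catalog.pg_class c "
                             "WHERE c.oid IN (");
        for (int i = 0; i < noids; i++)
            appendPQExpBuffer(query, "%s'%u'", (i > 0) ? ", " : "", oids[i]);
        appendPQExpBufferStr(query, ") ORDER BY c.oid");

        /* one round trip for the whole batch */
        res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
        destroyPQExpBuffer(query);
        return res;
    }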
I've been doing a variety of tests with my toy database of 100K relations
[0], and I'm seeing around 20% less memory usage. That's still 20% more
than without stats, but it's a pretty nice improvement nonetheless.

I'd propose two small changes to the design:

* I tested a variety of batch sizes, and to my surprise, I saw the best
  results with around 64 relations per batch. I imagine the absolute best
  batch size will vary greatly depending on the workload. It might also
  depend on work_mem and friends.

* The custom format actually does two WriteToc() calls, and since these
  patches move the queries to this part of pg_dump, it means we'll run all
  the queries twice. The comments around this code suggest that the second
  pass isn't strictly necessary and that it is really only useful for
  data/parallel restore, so we could probably skip it for no-data dumps.
  (There's a rough sketch of what I have in mind at the end of this
  message.)

With those two changes, a pg_upgrade-style dump of my test database goes
from ~21.6 seconds without these patches to ~11.2 seconds with them. For
reference, the same dump without stats takes ~7 seconds on HEAD.

[0] https://postgr.es/m/Z9R9-mFbxukqKmg4%40nathan

-- 
nathan
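For the record, the WriteToc() change I'm thinking of would go at the end
of _CloseArchive() in src/bin/pg_dump/pg_backup_custom.c, roughly like
this (a sketch only, not a tested patch; in particular the "does this dump
include data" flag is assumed here to be AH->public.dopt->dumpData and may
need to be spelled differently):

    /*
     * Sketch: only re-write the TOC to fill in the data offsets when the
     * dump actually contains data.  The second pass only helps (parallel)
     * data restore, and with the stats patches it would re-run all of the
     * statistics queries for nothing on a no-data dump.
     */
    if (ctx->hasSeek &&
        AH->public.dopt->dumpData &&
        fseeko(AH->FH, tpos, SEEK_SET) == 0)
        WriteToc(AH);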