On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote: > I ended up writing several patches that shaved some time for pg_restore -l, > and reduced the toc.dat size.
I've only just started taking a look at these patches, and I intend to do a more thorough review in the hopefully-not-too-distant future. > First patch is "finishing" the job of removing has oids support. When this > support was removed, instead of dropping the field from the dumps and > increasing the dump versions, the field was kept as is. This field stores a > boolean as a string, "true" or "false". This is not free, and requires 10 > bytes per toc entry. This sounds reasonable to me. I wonder why this wasn't done when WITH OIDS was removed in v12. > The second patch removes calls to sscanf and replaces them with strtoul. This > was the biggest speedup for pg_restore -l. Nice. > The third patch changes the dump format further to remove these strtoul calls > and store the integers as is instead. Do we need to worry about endianness here? > The fourth patch is dirtier and does more changes to the dump format. Instead > of storing the owner, tablespace, table access method and schema of each > object as a string, pg_dump builds an array of these, stores them at the > beginning of the file and replaces the strings with integer fields in the > dump. > This reduces the file size further, and removes a lot of calls to ReadStr, > thus > saving quite some time. This sounds promising. > Patch Toc size Dump -s duration pg_restore -l duration > HEAD 214M 23.1s 1.27s > #1 (has oid) 210M 22.9s 1.26s > #2 (scanf) 210M 22.9s 1.07s > #3 (no strtoul) 202M 22.8s 0.94s > #4 (string list) 181M 23.1s 0.87s At a glance, the size improvements in 0004 look the most interesting to me. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com