On Monday, September 18, 2023 11:52:47 PM CEST Nathan Bossart wrote: > On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote: > > I ended up writing several patches that shaved some time for pg_restore > > -l, > > and reduced the toc.dat size. > > I've only just started taking a look at these patches, and I intend to do a > more thorough review in the hopefully-not-too-distant future.
Thank you very much. > Since cfbot is failing on some pg_upgrade and pg_dump tests, I've set this > to waiting-on-author. I did not notice anything running meson test -v, I'll look further into it in the next days. > > First patch is "finishing" the job of removing has oids support. When this > > support was removed, instead of dropping the field from the dumps and > > increasing the dump versions, the field was kept as is. This field stores > > a > > boolean as a string, "true" or "false". This is not free, and requires 10 > > bytes per toc entry. > > This sounds reasonable to me. I wonder why this wasn't done when WITH OIDS > was removed in v12. I suppose it is an oversight, or not wanting to increase the dump version for no reason. > > The second patch removes calls to sscanf and replaces them with strtoul. > > This was the biggest speedup for pg_restore -l. > > Nice. > > > The third patch changes the dump format further to remove these strtoul > > calls and store the integers as is instead. > > Do we need to worry about endianness here? I used the ReadInt/WriteInt functions already defined in pg_dump that take care of this issue, so there should be no need to worry. > > The fourth patch is dirtier and does more changes to the dump format. > > Instead of storing the owner, tablespace, table access method and schema > > of each object as a string, pg_dump builds an array of these, stores them > > at the beginning of the file and replaces the strings with integer fields > > in the dump. This reduces the file size further, and removes a lot of > > calls to ReadStr, thus saving quite some time. > > This sounds promising. > > > Patch Toc size Dump -s duration pg_restore -l duration > > HEAD 214M 23.1s 1.27s > > #1 (has oid) 210M 22.9s 1.26s > > #2 (scanf) 210M 22.9s 1.07s > > #3 (no strtoul) 202M 22.8s 0.94s > > #4 (string list) 181M 23.1s 0.87s > > At a glance, the size improvements in 0004 look the most interesting to me. Yes it is, and the speed benefits are interesting too (at least for my usecase)