On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote:
> I ended up writing several patches that shaved some time off pg_restore -l
> and reduced the toc.dat size.

I've only just started taking a look at these patches, and I intend to do a
more thorough review in the hopefully-not-too-distant future.

> The first patch "finishes" the job of removing hasoids support. When that
> support was removed, the field was kept as is instead of being dropped from
> the dumps along with a bump of the dump format version. This field stores a
> boolean as a string, "true" or "false". That is not free: it costs 10
> bytes per TOC entry.

This sounds reasonable to me.  I wonder why this wasn't done when WITH OIDS
was removed in v12.

> The second patch removes calls to sscanf and replaces them with strtoul. This 
> was the biggest speedup for pg_restore -l.

Nice.

> The third patch changes the dump format further to remove these strtoul calls 
> and store the integers as is instead.

Do we need to worry about endianness here?

> The fourth patch is dirtier and makes more changes to the dump format.
> Instead of storing the owner, tablespace, table access method and schema of
> each object as a string, pg_dump builds an array of these, stores it at the
> beginning of the file, and replaces the strings with integer fields in the
> dump.
> This reduces the file size further and removes a lot of calls to ReadStr,
> thus saving quite some time.

This sounds promising.

> Patch              TOC size   Dump -s duration   pg_restore -l duration
> HEAD               214M       23.1s              1.27s
> #1 (has oid)       210M       22.9s              1.26s
> #2 (scanf)         210M       22.9s              1.07s
> #3 (no strtoul)    202M       22.8s              0.94s
> #4 (string list)   181M       23.1s              0.87s

At a glance, the size improvements in 0004 look the most interesting to me.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

