Hello list,

I'm debugging the abysmal performance of pg_restoring a huge (10TB) database, which includes a table with more than 1000 partitions.

As part of pg_restore -j... --section=post-data
I see *days* being spent in the sequential creation of foreign keys:

  ALTER TABLE the_master_partitioned_table
  ADD CONSTRAINT ...
  FOREIGN KEY (columnX) REFERENCES another_table(columnX)

Each of the above queries takes 1-2 days, and multiple such queries are issued sequentially.

My questions:

+ the_master_partitioned_table has at least 5 foreign key
  constraints, from different columns to different tables.
  Despite pg_restore being invoked with -j, they all run sequentially.
  I guess pg_restore dependency resolution decides so. Is there a
  reason for that?

+ The backend process for each of the above ALTER TABLE commands does not
  parallelize the foreign key checks for the different partitions. I
  know, because in the logs I see gigabytes of temporary files being
  written, with the CONTEXT showing queries issued incrementally on
  all the different partitions:

  :LOG:      temporary file: path "pg_tblspc/16390/PG_17_202406281/pgsql_tmp/pgsql_tmp3363462.579", size 1073741824
  :CONTEXT:  SQL statement "SELECT fk."columnX" FROM ONLY "public"."table_partition_214" fk
             LEFT OUTER JOIN ONLY "public"."another_table" pk
                 ON ( pk."columnX" OPERATOR(pg_catalog.=) fk."columnX")
             WHERE pk."columnX" IS NULL AND (fk."columnX" IS NOT NULL)"

  Why can't the backend issue these queries in parallel workers?

+ Based on the pg_restore manual, I am experimenting with manually issuing
  DISABLE TRIGGERS before the restoration, but I can't see a difference
  when I run --section=post-data separately. Is it supposed to speed
  things up?
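
  To be concrete about this last point, the statements I mean are roughly
  of the following form. This is only a sketch using the placeholder
  table name from above, not a copy of my exact session:

    -- Disable all triggers on the table, including the internal
    -- foreign key triggers (that part requires superuser).
    ALTER TABLE the_master_partitioned_table DISABLE TRIGGER ALL;

    -- ... run pg_restore -j... --section=post-data here ...

    -- Re-enable the triggers afterwards.
    ALTER TABLE the_master_partitioned_table ENABLE TRIGGER ALL;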


Thanks in advance,
Dimitris


