Hello list,
I'm debugging the abysmal performance of pg_restoring a huge
(10TB) database, which includes a table with more than 1000 partitions.
As part of pg_restore -j... --section=post-data
I see *days* being spent in the sequential creation of foreign keys:
ALTER TABLE the_master_partitioned_table
ADD CONSTRAINT ...
FOREIGN KEY (columnX) REFERENCES another_table(columnX)
Each of the above queries take 1-2 days, and multiple such queries are
issued sequentially.
My questions:
+ This master_partitioned_table contains at least 5 foreign key
constraints from different columns to different tables.
Despite pg_restore being invoked with -j, they all run sequentially.
I guess pg_restore dependency resolution decides so. Is there a
reason for that?
+ The backend process for each of the above ALTER TABLE commands, does not
parallelize the foreign key checks for the different partitions. I
know, because in the logs I see gigabytes of temporary files being
written, with the CONTEXT showing queries issued incrementally on
all the different partitions:
:LOG: temporary file: path
"pg_tblspc/16390/PG_17_202406281/pgsql_tmp/pgsql_tmp3363462.579", size
1073741824
:CONTEXT: SQL statement "SELECT fk."columnX" FROM ONLY
"public"."table_partition_214" fk
LEFT OUTER JOIN ONLY "public"."another_table" pk
ON ( pk."columnX" OPERATOR(pg_catalog.=) fk."columnX")
WHERE pk."columnX" IS NULL AND (fk."columnX" IS NOT NULL)"
Why can't the backend issue these queries in parallel workers?
+ Based on the pg_restore manual, I am experimenting with manually issuing
DISABLE TRIGGERS before the restoration, but I can't see a difference
when I'm doing the section=post-data separately. Is it supposed to speed
things up?
Thanks in advance,
Dimitris