When adding a check to pg_upgrade a while back I noticed in a profile that the
cluster compatibility check phase spends a lot of time in connectToServer.  Some
of this can be attributed to the data type checks, which run serially and each
connect to every database in turn to run their query.  This seemed like a place
where we can do better.

The attached patch moves the checks from individual functions, each of which
loops over all databases, into a struct which is consumed by a single umbrella
check where all data type queries are executed against a database using the
same connection.  This way we can amortize the connectToServer overhead across
more accesses to the database.
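
To illustrate the shape of the change, here is a minimal standalone sketch.  It
is not the code from the patch: the struct layout, the placeholder queries and
the fake_connect/fake_query helpers are all made up for illustration, and the
"connection" is just a counter rather than a real libpq connection.  It only
shows why driving the checks from a table lets the per-database connection be
shared across all of them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one data type check (not the patch's struct). */
typedef struct
{
    const char *status;     /* progress message for the check */
    const char *query;      /* placeholder for the query finding offenders */
} DataTypeCheck;

static const DataTypeCheck data_type_checks[] = {
    {"check A", "SELECT 1 /* placeholder query A */"},
    {"check B", "SELECT 1 /* placeholder query B */"},
    {"check C", "SELECT 1 /* placeholder query C */"},
};

#define N_CHECKS (sizeof(data_type_checks) / sizeof(data_type_checks[0]))

/* Counters standing in for the real work done against the old cluster. */
typedef struct
{
    int connections;
    int queries;
} Stats;

/* Stand-in for connectToServer: only counts the connection attempt. */
static void
fake_connect(Stats *stats)
{
    stats->connections++;
}

/* Stand-in for running one check query on the current connection. */
static void
fake_query(Stats *stats, const char *query)
{
    assert(query != NULL);
    stats->queries++;
}

/* Old shape: every check loops over all databases and connects each time. */
Stats
run_checks_serially(int ndbs)
{
    Stats stats = {0, 0};

    for (size_t c = 0; c < N_CHECKS; c++)
    {
        for (int db = 0; db < ndbs; db++)
        {
            fake_connect(&stats);
            fake_query(&stats, data_type_checks[c].query);
        }
    }
    return stats;
}

/* New shape: connect once per database and run every check on it. */
Stats
run_checks_per_connection(int ndbs)
{
    Stats stats = {0, 0};

    for (int db = 0; db < ndbs; db++)
    {
        fake_connect(&stats);
        for (size_t c = 0; c < N_CHECKS; c++)
            fake_query(&stats, data_type_checks[c].query);
    }
    return stats;
}
```

With the three placeholder checks and 100 databases, the first function opens
300 connections and the second opens 100, while both run the same 300 queries.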

In the trivial case, a single database, I don't see a performance regression
compared to the current approach.  In a cluster with 100 (empty) databases there
is a ~15% reduction in time to run a --check pass.  While it won't move the
earth in terms of wallclock time, consuming fewer resources on the old cluster,
and thereby making --check cheaper, might be the bigger win.

--
Daniel Gustafsson

Attachment: 0001-pg_upgrade-run-all-data-type-checks-per-connection.patch
Description: Binary data
