When adding a check to pg_upgrade a while back, I noticed in a profile that the cluster compatibility check phase spends a lot of time in connectToServer. Some of this can be attributed to the data type checks, which run serially, each connecting to every database in turn to run its query. This seemed like a place where we can do better.
The attached patch moves the checks from individual functions, which each loop over all databases, into a struct which is consumed by a single umbrella check where all data type queries are executed against a database using the same connection. This way we can amortize the connectToServer overhead across more accesses to the database.

In the trivial case of a single database, I don't see a reduction in performance compared to the current approach. In a cluster with 100 (empty) databases there is a ~15% reduction in the time to run a --check pass. While it won't move the earth in terms of wallclock time, consuming fewer resources on the old cluster, and thereby making --check cheaper, might be the bigger win.

--
Daniel Gustafsson
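
To illustrate the restructuring, here is a minimal sketch of the two loop shapes. It is not the code from the patch: the DataTypeCheck struct, the mock_connect and mock_run_query stand-ins, and the counters are all hypothetical names made up for this example, and no real connection is opened. It only shows how moving the database loop to the outside changes the number of connections while leaving the number of queries the same.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one data type check (not the patch's struct). */
typedef struct DataTypeCheck
{
	const char *status;		/* message to show while the check runs */
	const char *query;		/* placeholder for the query finding offending columns */
} DataTypeCheck;

static int	connections_opened = 0;
static int	queries_run = 0;

/* Stand-in for connectToServer(): only counts how often it is called. */
static void
mock_connect(const char *dbname)
{
	(void) dbname;
	connections_opened++;
}

/* Stand-in for executing a query on the current connection. */
static void
mock_run_query(const char *dbname, const char *query)
{
	(void) dbname;
	(void) query;
	queries_run++;
}

/* Old shape: every check loops over all databases and connects each time. */
static int
run_checks_per_function(const char *const *dbs, size_t ndbs,
						const DataTypeCheck *checks, size_t nchecks)
{
	assert(checks != NULL || nchecks == 0);
	connections_opened = 0;
	queries_run = 0;

	for (size_t c = 0; c < nchecks; c++)
	{
		for (size_t d = 0; d < ndbs; d++)
		{
			mock_connect(dbs[d]);
			mock_run_query(dbs[d], checks[c].query);
		}
	}
	return connections_opened;
}

/* New shape: connect once per database, then run every check on it. */
static int
run_checks_per_connection(const char *const *dbs, size_t ndbs,
						  const DataTypeCheck *checks, size_t nchecks)
{
	assert(checks != NULL || nchecks == 0);
	connections_opened = 0;
	queries_run = 0;

	for (size_t d = 0; d < ndbs; d++)
	{
		mock_connect(dbs[d]);
		for (size_t c = 0; c < nchecks; c++)
			mock_run_query(dbs[d], checks[c].query);
	}
	return connections_opened;
}
```

With N databases and M checks, the first function connects N * M times and the second N times, while both run N * M queries. That is the amortization described above, under the assumption that connection setup is the dominant per-database cost.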
0001-pg_upgrade-run-all-data-type-checks-per-connection.patch