On Sun, Jul 10, 2022 at 9:31 PM Michael Paquier <mich...@paquier.xyz> wrote:
> Hmm. That would mean that the more LOs a cluster has, the more bloat
> there will be in the new cluster once the upgrade is done. That could
> be quite a few gigs worth of data laying around depending on the data
> inserted in the source cluster, and we don't have a way to know which
> files to remove post-upgrade, do we?

The files that are being leaked here are the files backing the pg_largeobject table and its index as they existed in the new cluster just prior to the upgrade. Hopefully, the table is a zero-length file and the index is just one block, because you're supposed to use a newly-initdb'd cluster as the target for a pg_upgrade operation. Now, you don't actually have to do that: as we've been discussing, there aren't as many sanity checks in this code as there probably should be. But it would still be surprising to initdb a new cluster, load gigabytes of large objects into it, and then use it as the target cluster for a pg_upgrade.

As for whether it's possible to know which files to remove post-upgrade, that's the same problem as trying to figure out whether there are orphaned files in any other PostgreSQL cluster. We don't have a tool for it, but if you're sure that the system is more or less quiescent (in particular, no uncommitted DDL), you can compare pg_class.relfilenode to the actual filesystem contents and figure out what extra stuff is present at the filesystem level.

I am not saying we shouldn't try to fix this up more thoroughly, just that I think you are overestimating the consequences.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
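A minimal sketch of the pg_class-versus-filesystem comparison described above, in Python. It assumes you have already collected the live filenodes for one database by running `SELECT pg_relation_filenode(oid) FROM pg_class WHERE pg_relation_filenode(oid) IS NOT NULL;` in that database. Use that function rather than raw `pg_class.relfilenode`, because mapped catalogs store 0 there. It also assumes you are listing that database's directory in the default tablespace (`$PGDATA/base/<database oid>`). The function names `orphan_candidates` and `orphan_candidates_in_dir` are hypothetical, not part of any PostgreSQL tool. As noted above, the result is only trustworthy on a quiescent system, so treat the output as candidates to inspect, not files to delete blindly.

```python
import os
import re

# Permanent relation data files are named <filenode>, optionally followed by
# a fork suffix (_fsm, _vm, _init) and/or a segment number (.1, .2, ...).
DATA_FILE_RE = re.compile(r"^(\d+)(?:_(?:fsm|vm|init))?(?:\.\d+)?$")


def orphan_candidates(filenames, live_filenodes):
    """Return (sorted) the data files whose filenode is not in live_filenodes.

    filenames: names found in one database directory.
    live_filenodes: filenodes that the catalogs say are in use there.
    Names that do not look like permanent relation data files
    (PG_VERSION, pg_filenode.map, temp files such as t3_16384, ...)
    are skipped.
    """
    live = set(live_filenodes)
    orphans = []
    for name in sorted(filenames):
        m = DATA_FILE_RE.match(name)
        if m is None:
            continue  # not a permanent relation data file
        if int(m.group(1)) not in live:
            orphans.append(name)
    return orphans


def orphan_candidates_in_dir(db_dir, live_filenodes):
    """Hypothetical convenience wrapper that lists db_dir on disk."""
    return orphan_candidates(os.listdir(db_dir), live_filenodes)


if __name__ == "__main__":
    files = ["PG_VERSION", "pg_filenode.map", "1259", "1259_fsm",
             "16384", "16384.1", "16390", "16390_vm", "t3_20000"]
    print(orphan_candidates(files, {1259, 16384}))
```

With the sample listing above, only the files belonging to filenode 16390 (its main fork and visibility map) are reported, since neither 1259 nor 16384 is missing from the catalog set and the remaining names are not relation data files.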