On Tue, Aug 23, 2022 at 1:46 AM Robert Haas <robertmh...@gmail.com> wrote: > > On Mon, Aug 22, 2022 at 3:55 AM Amit Kapila <amit.kapil...@gmail.com> wrote: > > To solve that problem, how about rewriting the system table in the new > > cluster which has a conflicting relfilenode? I think we can probably > > do this conflict checking before processing the tables from the old > > cluster. > > Thanks for chiming in. > > Right now, there are two parts to the relfilenumber preservation > system, and this scheme doesn't quite fit into either of them. First, > the dump includes commands to set pg_class.relfilenode in the new > cluster to the same value that it had in the old cluster. The dump > can't include any SQL commands that depend on what's happening in the > new cluster because pg_dump(all) only connects to a single cluster, > which in this case is the old cluster. Second, pg_upgrade itself > copies the files from the old cluster to the new cluster. This doesn't > involve a database connection at all. So there's no part of the > current relfilenode preservation mechanism that can look at the old > AND the new database and decide on some SQL to execute against the new > database. > > I thought for a while that we could use the information that's already > gathered by get_rel_infos() to do what you're suggesting here, but it > doesn't quite work, because that function excludes system tables, and > we can't afford to do that here. We'd either need to modify that query > to include system tables - at least for the new cluster - or run a > separate one to gather information about system tables in the new > cluster. Then, we could put all the pg_class.relfilenode values we > found in the new cluster into a hash table, loop over the list of rels > this function found in the old cluster, and for each one, probe into > the hash table. If we find a match, that's a system table that needs > to be moved out of the way before calling create_new_objects(), or > maybe inside that function but before it runs pg_restore. > > That doesn't seem too crazy, I think. It's a little bit of new > mechanism, but it doesn't sound horrific. It's got the advantage of > being significantly cheaper than my proposal of moving everything out > of the way unconditionally, and at the same time it retains one of the > key advantages of that proposal - IMV, anyway - which is that we don't > need separate relfilenode ranges for user and system objects any more. > So I guess on balance I kind of like it, but maybe I'm missing > something.
Okay, so this seems exactly the same as your previous proposal but instead of unconditionally rewriting all the system tables we will rewrite only those conflict with the user table or pg_largeobject from the previous cluster. Although it might have additional implementation complexity on the pg upgrade side, it seems cheaper than rewriting everything. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com