Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Robert Haas writes: > On Thu, Apr 21, 2022 at 1:53 PM Tom Lane wrote: >> 0001 attached is a revised patch that does it that way. This seems >> like a clearly better answer. >> 0002 contains the perhaps-slightly-more-controversial changes of >> changing the macro names and explicitly pinning no databases. > Both patches look good to me. Pushed, thanks for looking. regards, tom lane
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Apr 21, 2022 at 1:53 PM Tom Lane wrote: > Robert Haas writes: > > On Wed, Apr 20, 2022 at 4:56 PM Tom Lane wrote: > >> Having just had to bury my nose in renumber_oids.pl, I thought of a > >> different approach we could take to expose these OIDs to Catalog.pm. > >> That's to invent a new macro that Catalog.pm recognizes, and write > >> something about like this in pg_database.h: > >> DECLARE_OID_DEFINING_MACRO(Template0ObjectId, 4); > >> DECLARE_OID_DEFINING_MACRO(PostgresObjectId, 5); > > > I like it! > > 0001 attached is a revised patch that does it that way. This seems > like a clearly better answer. > > 0002 contains the perhaps-slightly-more-controversial changes of > changing the macro names and explicitly pinning no databases. Both patches look good to me. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Robert Haas writes: > On Wed, Apr 20, 2022 at 4:56 PM Tom Lane wrote: >> Having just had to bury my nose in renumber_oids.pl, I thought of a >> different approach we could take to expose these OIDs to Catalog.pm. >> That's to invent a new macro that Catalog.pm recognizes, and write >> something about like this in pg_database.h: >> DECLARE_OID_DEFINING_MACRO(Template0ObjectId, 4); >> DECLARE_OID_DEFINING_MACRO(PostgresObjectId, 5); > I like it! 0001 attached is a revised patch that does it that way. This seems like a clearly better answer. 0002 contains the perhaps-slightly-more-controversial changes of changing the macro names and explicitly pinning no databases. regards, tom lane diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm index 0275795dea..ece0a934f0 100644 --- a/src/backend/catalog/Catalog.pm +++ b/src/backend/catalog/Catalog.pm @@ -44,6 +44,8 @@ sub ParseHeader $catalog{columns} = []; $catalog{toasting}= []; $catalog{indexing}= []; + $catalog{other_oids} = []; + $catalog{foreign_keys} = []; $catalog{client_code} = []; open(my $ifh, '<', $input_file) || die "$input_file: $!"; @@ -118,6 +120,14 @@ sub ParseHeader index_decl => $6 }; } + elsif (/^DECLARE_OID_DEFINING_MACRO\(\s*(\w+),\s*(\d+)\)/) + { + push @{ $catalog{other_oids} }, + { +other_name => $1, +other_oid => $2 + }; + } elsif ( /^DECLARE_(ARRAY_)?FOREIGN_KEY(_OPT)?\(\s*\(([^)]+)\),\s*(\w+),\s*\(([^)]+)\)\)/ ) @@ -572,6 +582,10 @@ sub FindAllOidsFromHeaders { push @oids, $index->{index_oid}; } + foreach my $other (@{ $catalog->{other_oids} }) + { + push @oids, $other->{other_oid}; + } } return \@oids; diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c index 520f77971b..d7e5c02f95 100644 --- a/src/backend/catalog/catalog.c +++ b/src/backend/catalog/catalog.c @@ -339,9 +339,11 @@ IsPinnedObject(Oid classId, Oid objectId) * robustness. */ - /* template1 is not pinned */ + /* template1, template0, postgres are not pinned */ if (classId == DatabaseRelationId && - objectId == TemplateDbOid) + (objectId == TemplateDbOid || + objectId == Template0ObjectId || + objectId == PostgresObjectId)) return false; /* the public namespace is not pinned */ diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl index 2d02b02267..f4ec6d6d40 100644 --- a/src/backend/catalog/genbki.pl +++ b/src/backend/catalog/genbki.pl @@ -472,7 +472,7 @@ EOM $catalog->{rowtype_oid_macro}, $catalog->{rowtype_oid} if $catalog->{rowtype_oid_macro}; - # Likewise for macros for toast and index OIDs + # Likewise for macros for toast, index, and other OIDs foreach my $toast (@{ $catalog->{toasting} }) { printf $def "#define %s %s\n", @@ -488,6 +488,12 @@ EOM $index->{index_oid_macro}, $index->{index_oid} if $index->{index_oid_macro}; } + foreach my $other (@{ $catalog->{other_oids} }) + { + printf $def "#define %s %s\n", + $other->{other_name}, $other->{other_oid} + if $other->{other_name}; + } print $def "\n"; diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 1cb4a5b0d2..5e2eeefc4c 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -59,11 +59,11 @@ #include "sys/mman.h" #endif -#include "access/transam.h" #include "access/xlog_internal.h" #include "catalog/pg_authid_d.h" #include "catalog/pg_class_d.h" /* pgrminclude ignore */ #include "catalog/pg_collation_d.h" +#include "catalog/pg_database_d.h" /* pgrminclude ignore */ #include "common/file_perm.h" #include "common/file_utils.h" #include "common/logging.h" diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index d3588607e7..786d592e2b 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -2901,10 +2901,11 @@ dumpDatabase(Archive *fout) qdatname = pg_strdup(fmtId(datname)); /* - * Prepare the CREATE DATABASE command. We must specify encoding, locale, - * and tablespace since those can't be altered later. Other DB properties - * are left to the DATABASE PROPERTIES entry, so that they can be applied - * after reconnecting to the target DB. + * Prepare the CREATE DATABASE command. We must specify OID (if we want + * to preserve that), as well as the encoding, locale, and tablespace + * since those can't be altered later. Other DB properties are left to + * the DATABASE PROPERTIES entry, so that they can be applied after + * reconnecting to the target DB. */ if (dopt->binary_upgrade) { diff --git a/src/include/access/transam.h b/src/include/access/transam.h index 9a2816de51..338dfca5a0 100644 --- a/src/include/access/transam.h +++ b/src/include/access/transam.h @@ -196,10 +196,6 @@ FullTransactionIdAdvance(FullTransactionId *dest) #define FirstUnpinnedObjectId 12000 #define FirstNormalObjectId 16384 -/* OIDs of Template0 and Postgres database are fixed */ -#define
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Wed, Apr 20, 2022 at 4:56 PM Tom Lane wrote: > Having just had to bury my nose in renumber_oids.pl, I thought of a > different approach we could take to expose these OIDs to Catalog.pm. > That's to invent a new macro that Catalog.pm recognizes, and write > something about like this in pg_database.h: > > DECLARE_OID_DEFINING_MACRO(Template0ObjectId, 4); > DECLARE_OID_DEFINING_MACRO(PostgresObjectId, 5); > > If that seems more plausible to you I'll set about preparing a patch. I like it! -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Robert Haas writes: > On Wed, Apr 20, 2022 at 2:34 PM Tom Lane wrote: >> The attached draft patch attempts to improve this situation. >> It reserves these OIDs, and creates the associated macros, through >> the normal BKI infrastructure by adding entries in pg_database.dat. >> We have to delete those rows again during initdb, which is slightly >> ugly but surely no more so than initdb's other direct manipulations >> of pg_database. > I'm not sure I really like this approach, but if you're firmly > convinced that it's better than cleaning up the loose ends in some > other way, I'm not going to waste a lot of energy fighting about it. Having just had to bury my nose in renumber_oids.pl, I thought of a different approach we could take to expose these OIDs to Catalog.pm. That's to invent a new macro that Catalog.pm recognizes, and write something about like this in pg_database.h: DECLARE_OID_DEFINING_MACRO(Template0ObjectId, 4); DECLARE_OID_DEFINING_MACRO(PostgresObjectId, 5); That would result in (a) the OIDs becoming known to Catalog.pm as reserved, though it wouldn't have any great clue about their semantics; and (b) this getting emitted into pg_database_d.h: #define Template0ObjectId 4 #define PostgresObjectId 5 Then we'd not need the dummy entries in pg_database.dat, which does seem cleaner now that I think about it. A downside is that with no context, Catalog.pm could not provide name translation services during postgres.bki generation for such OIDs --- but at least for these entries, we don't need that. If that seems more plausible to you I'll set about preparing a patch. regards, tom lane
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Wed, Apr 20, 2022 at 2:34 PM Tom Lane wrote: > The attached draft patch attempts to improve this situation. > It reserves these OIDs, and creates the associated macros, through > the normal BKI infrastructure by adding entries in pg_database.dat. > We have to delete those rows again during initdb, which is slightly > ugly but surely no more so than initdb's other direct manipulations > of pg_database. I'm not sure I really like this approach, but if you're firmly convinced that it's better than cleaning up the loose ends in some other way, I'm not going to waste a lot of energy fighting about it. It doesn't seem horrible. > There are a few loose ends: > > * I'm a bit inclined to simplify IsPinnedObject by just teaching > it that *no* entries of pg_database are pinned, which would correspond > to the evident lack of enforcement in dropdb(). Can anyone see a > reason why we might pin some database in future? It's kind of curious that we don't pin anything now. There's kind of nothing to stop you from hosing the system by dropping template0 and/or template1, or mutating them into some crazy form that doesn't work. But having said that, if as a matter of policy we don't even pin template0 or template1 or postgres, then it seems unlikely that we would suddenly decide to pin some new database in the future. > * I had to set up the additional pg_database entries with nonzero > datfrozenxid to avoid an assertion failure during initdb's first VACUUM. > (That VACUUM will overwrite template1's datfrozenxid before computing > the global minimum frozen XID, but not these others; and it doesn't like > finding that the minimum is zero.) This feels klugy. An alternative is > to delete the extra pg_database rows before that VACUUM, which would > mean taking those deletes out of make_template0 and make_postgres and > putting them somewhere seemingly unrelated, so that's a bit crufty too. > Anybody have a preference? Not really. If anything that's an argument against this entire approach, but I already commented on that point above. > * The new macro names seem ill-chosen. Template0ObjectId is spelled > randomly differently from the longstanding TemplateDbOid, and surely > PostgresObjectId is about as vague a name as could possibly have > been thought of (please name an object that it couldn't apply to). > I'm a little inclined to rename TemplateDbOid to Template1DbOid and > use Template0DbOid and PostgresDbOid for the others, but I didn't pull > that trigger here. Seems totally reasonable. I don't find the current naming horrible or I'd not have committed it that way, but putting Db in there makes sense, too. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Robert Haas writes: > On Sat, Jan 22, 2022 at 2:20 AM Shruthi Gowda wrote: >> Agree. In the latest patch, the template0 and postgres OIDs are fixed >> to unused manually assigned OIDs 4 and 5 respectively. These OIDs are >> no more listed as unused OIDs. > Thanks. Committed with a few more cosmetic changes. I happened to take a closer look at this patch today, and I'm pretty unhappy with the way that the assignment of those OIDs was managed. There are two big problems: 1. IsPinnedObject() will now report that template0 and postgres are pinned. This seems not to prevent one from dropping them (I guess dropdb() doesn't consult IsPinnedObject), but it would probably bollix any pg_shdepend management that should happen for them. 2. The Catalog.pm infrastructure knows nothing about these OIDs. While the unused_oids script was hack-n-slashed to claim that the OIDs are used, other scripts won't know about them; for example duplicate_oids won't report conflicts if someone tries to reuse those OIDs. The attached draft patch attempts to improve this situation. It reserves these OIDs, and creates the associated macros, through the normal BKI infrastructure by adding entries in pg_database.dat. We have to delete those rows again during initdb, which is slightly ugly but surely no more so than initdb's other direct manipulations of pg_database. There are a few loose ends: * I'm a bit inclined to simplify IsPinnedObject by just teaching it that *no* entries of pg_database are pinned, which would correspond to the evident lack of enforcement in dropdb(). Can anyone see a reason why we might pin some database in future? * I had to set up the additional pg_database entries with nonzero datfrozenxid to avoid an assertion failure during initdb's first VACUUM. (That VACUUM will overwrite template1's datfrozenxid before computing the global minimum frozen XID, but not these others; and it doesn't like finding that the minimum is zero.) This feels klugy. An alternative is to delete the extra pg_database rows before that VACUUM, which would mean taking those deletes out of make_template0 and make_postgres and putting them somewhere seemingly unrelated, so that's a bit crufty too. Anybody have a preference? * The new macro names seem ill-chosen. Template0ObjectId is spelled randomly differently from the longstanding TemplateDbOid, and surely PostgresObjectId is about as vague a name as could possibly have been thought of (please name an object that it couldn't apply to). I'm a little inclined to rename TemplateDbOid to Template1DbOid and use Template0DbOid and PostgresDbOid for the others, but I didn't pull that trigger here. Comments? regards, tom lane diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c index 520f77971b..d7e5c02f95 100644 --- a/src/backend/catalog/catalog.c +++ b/src/backend/catalog/catalog.c @@ -339,9 +339,11 @@ IsPinnedObject(Oid classId, Oid objectId) * robustness. */ - /* template1 is not pinned */ + /* template1, template0, postgres are not pinned */ if (classId == DatabaseRelationId && - objectId == TemplateDbOid) + (objectId == TemplateDbOid || + objectId == Template0ObjectId || + objectId == PostgresObjectId)) return false; /* the public namespace is not pinned */ diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 1cb4a5b0d2..04454b3d7c 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -59,11 +59,11 @@ #include "sys/mman.h" #endif -#include "access/transam.h" #include "access/xlog_internal.h" #include "catalog/pg_authid_d.h" #include "catalog/pg_class_d.h" /* pgrminclude ignore */ #include "catalog/pg_collation_d.h" +#include "catalog/pg_database_d.h" /* pgrminclude ignore */ #include "common/file_perm.h" #include "common/file_utils.h" #include "common/logging.h" @@ -1806,14 +1806,24 @@ make_template0(FILE *cmdfd) * objects in the old cluster, the problem scenario only exists if the OID * that is in use in the old cluster is also used in the new cluster - and * the new cluster should be the result of a fresh initdb.) - * - * We use "STRATEGY = file_copy" here because checkpoints during initdb - * are cheap. "STRATEGY = wal_log" would generate more WAL, which would - * be a little bit slower and make the new cluster a little bit bigger. */ static const char *const template0_setup[] = { - "CREATE DATABASE template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false OID = " - CppAsString2(Template0ObjectId) + /* + * Since pg_database.dat includes a dummy row for template0, we have + * to remove that before creating the database for real. + */ + "DELETE FROM pg_database WHERE datname = 'template0';\n\n", + + /* + * Clone template1 to make template0. + * + * We use "STRATEGY = file_copy" here because checkpoints during + * initdb are cheap. "STRATEGY = wal_log" would generate more WAL, + * which would be a little bit slower and make the
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Mon, Mar 21, 2022 at 8:52 PM Andres Freund wrote: > I noticed this still has an open CF entry: > https://commitfest.postgresql.org/37/3296/ > I assume it can be marked as committed? Yeah, done now. But don't forget that we still need to do something on the "wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:" thread, and I think the ball is in your court there. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On 2022-01-24 14:44:10 -0500, Robert Haas wrote: > On Sat, Jan 22, 2022 at 2:20 AM Shruthi Gowda wrote: > > Agree. In the latest patch, the template0 and postgres OIDs are fixed > > to unused manually assigned OIDs 4 and 5 respectively. These OIDs are > > no more listed as unused OIDs. > > Thanks. Committed with a few more cosmetic changes. I noticed this still has an open CF entry: https://commitfest.postgresql.org/37/3296/ I assume it can be marked as committed? - Andres
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Jan 25, 2022 at 10:19:53AM +0530, Shruthi Gowda wrote: > On Tue, Jan 25, 2022 at 1:14 AM Robert Haas wrote: > > On Sat, Jan 22, 2022 at 2:20 AM Shruthi Gowda wrote: > > > Agree. In the latest patch, the template0 and postgres OIDs are fixed > > > to unused manually assigned OIDs 4 and 5 respectively. These OIDs are > > > no more listed as unused OIDs. > > > > Thanks. Committed with a few more cosmetic changes. > > Thanks, Robert for committing this patch. If I'm not wrong, this can be closed. https://commitfest.postgresql.org/37/3296/
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Jan 25, 2022 at 1:14 AM Robert Haas wrote: > > On Sat, Jan 22, 2022 at 2:20 AM Shruthi Gowda wrote: > > Agree. In the latest patch, the template0 and postgres OIDs are fixed > > to unused manually assigned OIDs 4 and 5 respectively. These OIDs are > > no more listed as unused OIDs. > > Thanks. Committed with a few more cosmetic changes. Thanks, Robert for committing this patch.
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Sat, Jan 22, 2022 at 2:20 AM Shruthi Gowda wrote: > Agree. In the latest patch, the template0 and postgres OIDs are fixed > to unused manually assigned OIDs 4 and 5 respectively. These OIDs are > no more listed as unused OIDs. Thanks. Committed with a few more cosmetic changes. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Sat, Jan 22, 2022 at 12:47:35PM +0530, Shruthi Gowda wrote: > On Sat, Jan 22, 2022 at 12:27 AM Tom Lane wrote: > > > > Robert Haas writes: > > > It seems to me that what this comment is saying is that OIDs in the > > > second and third categories are doled out by counters. Therefore, we > > > can't know which of those OIDs will get used, or how many of them will > > > get used, or which objects will get which OIDs. Therefore, I think we > > > should go back to the approach that you were using for template0 and > > > handle both that database and postgres using that method. That is, > > > assign an OID manually, and make sure unused_oids knows that it should > > > be counted as already used. > > > > Indeed. If you're going to manually assign OIDs to these databases, > > do it honestly, and put them into the range intended for that purpose. > > Trying to take short-cuts is just going to cause trouble down the road. > > Understood. I will rework the patch accordingly. Thanks Thanks. To get the rsync update reduction we need to preserve database oids. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Sat, Jan 22, 2022 at 12:17 AM Robert Haas wrote: > > On Fri, Jan 21, 2022 at 8:40 AM Shruthi Gowda wrote: > > From what I see in the code, template0 and postgres are the last > > things that get created in initdb phase. The system OIDs that get > > assigned to these DBs vary from release to release. At present, the > > system assigned OIDs of template0 and postgres are 13679 and 13680 > > respectively. I feel it would be safe to assign 16000 and 16001 for > > template0 and postgres respectively from the unpinned object OID range > > 12000 - 16383. In the future, even if the initdb unpinned objects > > reach the range of 16000 issues can only arise if initdb() creates > > another system-created database for which the system assigns these > > reserved OIDs (16000, 16001). > > It doesn't seem safe to me to rely on that. We don't know what could > happen in the future if the number of built-in objects increases. > Looking at the lengthy comment on this topic in transam.h, I see that > there are three ranges: > > 1- manually assigned OIDs > 1-11999 OIDs assigned by genbki.pl > 12000-16384 OIDs assigned to unpinned objects post-bootstrap > > It seems to me that what this comment is saying is that OIDs in the > second and third categories are doled out by counters. Therefore, we > can't know which of those OIDs will get used, or how many of them will > get used, or which objects will get which OIDs. Therefore, I think we > should go back to the approach that you were using for template0 and > handle both that database and postgres using that method. That is, > assign an OID manually, and make sure unused_oids knows that it should > be counted as already used. Agree. In the latest patch, the template0 and postgres OIDs are fixed to unused manually assigned OIDs 4 and 5 respectively. These OIDs are no more listed as unused OIDs. Regards, Shruthi KC EnterpriseDB: http://www.enterprisedb.com v13-0001-pg_upgrade-perserve-database-OID-patch.patch Description: Binary data
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Sat, Jan 22, 2022 at 12:27 AM Tom Lane wrote: > > Robert Haas writes: > > It seems to me that what this comment is saying is that OIDs in the > > second and third categories are doled out by counters. Therefore, we > > can't know which of those OIDs will get used, or how many of them will > > get used, or which objects will get which OIDs. Therefore, I think we > > should go back to the approach that you were using for template0 and > > handle both that database and postgres using that method. That is, > > assign an OID manually, and make sure unused_oids knows that it should > > be counted as already used. > > Indeed. If you're going to manually assign OIDs to these databases, > do it honestly, and put them into the range intended for that purpose. > Trying to take short-cuts is just going to cause trouble down the road. Understood. I will rework the patch accordingly. Thanks Regards, Shruthi KC EnterpriseDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Robert Haas writes: > It seems to me that what this comment is saying is that OIDs in the > second and third categories are doled out by counters. Therefore, we > can't know which of those OIDs will get used, or how many of them will > get used, or which objects will get which OIDs. Therefore, I think we > should go back to the approach that you were using for template0 and > handle both that database and postgres using that method. That is, > assign an OID manually, and make sure unused_oids knows that it should > be counted as already used. Indeed. If you're going to manually assign OIDs to these databases, do it honestly, and put them into the range intended for that purpose. Trying to take short-cuts is just going to cause trouble down the road. regards, tom lane
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Fri, Jan 21, 2022 at 8:40 AM Shruthi Gowda wrote: > From what I see in the code, template0 and postgres are the last > things that get created in initdb phase. The system OIDs that get > assigned to these DBs vary from release to release. At present, the > system assigned OIDs of template0 and postgres are 13679 and 13680 > respectively. I feel it would be safe to assign 16000 and 16001 for > template0 and postgres respectively from the unpinned object OID range > 12000 - 16383. In the future, even if the initdb unpinned objects > reach the range of 16000 issues can only arise if initdb() creates > another system-created database for which the system assigns these > reserved OIDs (16000, 16001). It doesn't seem safe to me to rely on that. We don't know what could happen in the future if the number of built-in objects increases. Looking at the lengthy comment on this topic in transam.h, I see that there are three ranges: 1- manually assigned OIDs 1-11999 OIDs assigned by genbki.pl 12000-16384 OIDs assigned to unpinned objects post-bootstrap It seems to me that what this comment is saying is that OIDs in the second and third categories are doled out by counters. Therefore, we can't know which of those OIDs will get used, or how many of them will get used, or which objects will get which OIDs. Therefore, I think we should go back to the approach that you were using for template0 and handle both that database and postgres using that method. That is, assign an OID manually, and make sure unused_oids knows that it should be counted as already used. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Fri, Jan 21, 2022 at 1:08 AM Robert Haas wrote: > > On Thu, Jan 20, 2022 at 11:03 AM Shruthi Gowda wrote: > > It is not required for PostgresObjectId. The unused_oids script > > provides a list of unused oids in the manually-assignable OIDs range > > (1-). > > Well, so ... why are we not treating the OIDs for these two databases > the same? If there's a range from which we can assign OIDs without > risk of duplication and without needing to update this script, perhaps > we ought to assign both of them from that range and leave the script > alone. >From what I see in the code, template0 and postgres are the last things that get created in initdb phase. The system OIDs that get assigned to these DBs vary from release to release. At present, the system assigned OIDs of template0 and postgres are 13679 and 13680 respectively. I feel it would be safe to assign 16000 and 16001 for template0 and postgres respectively from the unpinned object OID range 12000 - 16383. In the future, even if the initdb unpinned objects reach the range of 16000 issues can only arise if initdb() creates another system-created database for which the system assigns these reserved OIDs (16000, 16001). > + * that is in use in the old cluster is also used in th new cluster - and > > th -> the. Fixed > +preserves the DB, tablespace, relfilenode OIDs so TOAST and other references > > Insert "and" before "relfilenode". Fixed Attached is the latest patch for review. Shruthi KC EnterpriseDB: http://www.enterprisedb.com v12-0001-pg_upgrade-perserve-database-OID-patch.patch Description: Binary data
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Jan 20, 2022 at 11:03 AM Shruthi Gowda wrote: > It is not required for PostgresObjectId. The unused_oids script > provides a list of unused oids in the manually-assignable OIDs range > (1-). Well, so ... why are we not treating the OIDs for these two databases the same? If there's a range from which we can assign OIDs without risk of duplication and without needing to update this script, perhaps we ought to assign both of them from that range and leave the script alone. + * that is in use in the old cluster is also used in th new cluster - and th -> the. +preserves the DB, tablespace, relfilenode OIDs so TOAST and other references Insert "and" before "relfilenode". I think this is in pretty good shape now. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Jan 20, 2022 at 7:57 PM Robert Haas wrote: > > On Thu, Jan 20, 2022 at 7:09 AM Shruthi Gowda wrote: > > > Here's an updated version in which I've reverted the changes to gram.y > > > and tried to improve the comments and documentation. Could you have a > > > look at implementing (2) above? > > > > Attached is the patch that implements comment (2). > > This probably needs minor rebasing on account of the fact that I just > pushed the patch to remove datlastsysoid. I intended to do that before > you posted a new version to save you the trouble, but I was too slow > (or you were too fast, however you want to look at it). I have rebased the patch. Kindly have a look at it. > + errmsg("Invalid value for option \"%s\"", defel->defname), > > Per the "error message style" section of the documentation, primary > error messages neither begin with a capital letter nor end with a > period, while errdetail() messages are complete sentences and thus > both begin with a capital letter and end with a period. But what I > think you should really do here is get rid of the error detail and > convey all the information in a primary error message. e.g. "OID %u is > a system OID", or maybe better, "OIDs less than %u are reserved for > system objects". Fixed > + errmsg("database oid %u is already used by database %s", > + errmsg("data directory exists for database oid %u", dboid)); > > Usually we write "OID" rather than "oid" in error messages. I think > maybe it would be best to change the text slightly too. I suggest: > > database OID %u is already in use by database \"%s\" > data directory already exists for database with OID %u The second error message will be reported when the data directory with the given OID exists in the data path but the corresponding DB does not exist. This could happen only if the user creates directories in the data folder which is indeed not an invalid usage. I have updated the error message to "data directory with the specified OID %u already exists" because the error message you recommended gives a slightly different meaning. > + * it would fail. To avoid that, assign a fixed OID to template0 and > + * postgres rather than letting the server choose one. > > a fixed OID -> fixed OIDs > one -> them > > Or maybe put this comment back the way I had it and just talk about > postgres, and then change the comment in make_postgres to say "Assign > a fixed OID to postgres, for the same reasons as template0." Fixed > + /* > + * Make sure that binary upgrade propagate the database OID to the new > + * cluster > + */ > > This comment doesn't really seem necessary. It's sort of self-explanatory. Removed > +# Push the OID that is reserved for template0 database. > +my $Template0ObjectId = > + Catalog::FindDefinedSymbol('access/transam.h', '..', 'Template0ObjectId'); > +push @{$oids}, $Template0ObjectId; > > Don't you need to do this for PostgresObjectId also? It is not required for PostgresObjectId. The unused_oids script provides a list of unused oids in the manually-assignable OIDs range (1-). Shruthi KC EnterpriseDB: http://www.enterprisedb.com v11-0001-pg_upgrade-perserve-database-OID-patch.patch Description: Binary data
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Jan 20, 2022 at 7:09 AM Shruthi Gowda wrote: > > Here's an updated version in which I've reverted the changes to gram.y > > and tried to improve the comments and documentation. Could you have a > > look at implementing (2) above? > > Attached is the patch that implements comment (2). This probably needs minor rebasing on account of the fact that I just pushed the patch to remove datlastsysoid. I intended to do that before you posted a new version to save you the trouble, but I was too slow (or you were too fast, however you want to look at it). + errmsg("Invalid value for option \"%s\"", defel->defname), Per the "error message style" section of the documentation, primary error messages neither begin with a capital letter nor end with a period, while errdetail() messages are complete sentences and thus both begin with a capital letter and end with a period. But what I think you should really do here is get rid of the error detail and convey all the information in a primary error message. e.g. "OID %u is a system OID", or maybe better, "OIDs less than %u are reserved for system objects". + errmsg("database oid %u is already used by database %s", + errmsg("data directory exists for database oid %u", dboid)); Usually we write "OID" rather than "oid" in error messages. I think maybe it would be best to change the text slightly too. I suggest: database OID %u is already in use by database \"%s\" data directory already exists for database with OID %u + * it would fail. To avoid that, assign a fixed OID to template0 and + * postgres rather than letting the server choose one. a fixed OID -> fixed OIDs one -> them Or maybe put this comment back the way I had it and just talk about postgres, and then change the comment in make_postgres to say "Assign a fixed OID to postgres, for the same reasons as template0." + /* + * Make sure that binary upgrade propagate the database OID to the new + * cluster + */ This comment doesn't really seem necessary. It's sort of self-explanatory. +# Push the OID that is reserved for template0 database. +my $Template0ObjectId = + Catalog::FindDefinedSymbol('access/transam.h', '..', 'Template0ObjectId'); +push @{$oids}, $Template0ObjectId; Don't you need to do this for PostgresObjectId also? -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Jan 18, 2022 at 2:34 AM Robert Haas wrote: > > On Mon, Jan 17, 2022 at 9:57 AM Shruthi Gowda wrote: > > I have rebased and generated the patches on top of PostgreSQL commit > > ID cf925936ecc031355cd56fbd392ec3180517a110. > > Kindly apply v8-0001-pg_upgrade-Preserve-relfilenodes-and-tablespace-O.patch > > first and then v8-0002-Preserve-database-OIDs-in-pg_upgrade.patch. > > OK, so looking over 0002, I noticed a few things: > > 1. datlastsysoid isn't being used for anything any more. That's not a > defect in your patch, but I've separately proposed to remove it. okay > 2. I realized that the whole idea here depends on not having initdb > create more than one database without a fixed OID. The patch solves > that by nailing down the OID of template0, which is a sufficient > solution. However, I think nailing down the (initial) OID of postgres > as well would be a good idea, just in case somebody in the future > decides to add another system-created database. I agree with your thought. In my latest patch, postgres db gets created with a fixed OID. I have chosen an arbitrary number 16000 as postgres OID from the unpinned object OID range1200 - 16383. > 3. The changes to gram.y don't do anything. Sure, you've added a new > "OID" token, but nothing generates that token, so it has no effect. > The reason the syntax works is that createdb_opt_name accepts "IDENT", > which means any string that's not in the keyword list (see kwlist.h). > But that's there already, so you don't need to do anything in this > file. okay > 4. I felt that the documentation and comments could be somewhat improved. The documentation and comment updates are more accurate with the required details. Thanks. > Here's an updated version in which I've reverted the changes to gram.y > and tried to improve the comments and documentation. Could you have a > look at implementing (2) above? Attached is the patch that implements comment (2). Shruthi KC EnterpriseDB: http://www.enterprisedb.com v10-0001-pg_upgrade-perserve-database-OID-patch.patch Description: Binary data
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Jan 18, 2022 at 12:35 AM Robert Haas wrote: > > On Tue, Dec 14, 2021 at 1:21 PM Shruthi Gowda wrote: > > Thanks, Robert for the updated version. I reviewed the changes and it > > looks fine. > > I also tested the patch. The patch works as expected. > > Thanks. > > > > - I adjusted the function header comment for heap_create. Your > > > proposed comment seemed like it was pretty detailed but not 100% > > > correct. It also made one of the lines kind of long because you didn't > > > wrap the text in the surrounding style. I decided to make it simpler > > > and shorter instead of longer still and 100% correct. > > > > The comment update looks fine. However, I still feel it would be good to > > mention on which (rare) circumstance a valid relfilenode can get passed. > > In general, I think it's the job of a function parameter comment to > describe what the parameter does, not how the callers actually use it. > One problem with describing the latter is that, if someone later adds > another caller, there is a pretty good chance that they won't notice > that the comment needs to be changed. More fundamentally, the > parameter function comments should be like an instruction manual for > how to use the function. If you are trying to figure out how to use > this function, it is not helpful to know that "most callers like to > pass false" for this parameter. What you need to know is what value > your new call site should pass, and knowing what "most callers" do or > that something is "rare" doesn't really help. If we want to make this > comment more detailed, we should approach it from the point of view of > explaining how it ought to be set. It's clear now. Thanks for clarifying. > I've committed the v8-0001 patch you attached. I'll write separately > about v8-0002. Sure. Thank you.
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Mon, Jan 17, 2022 at 9:57 AM Shruthi Gowda wrote: > I have rebased and generated the patches on top of PostgreSQL commit > ID cf925936ecc031355cd56fbd392ec3180517a110. > Kindly apply v8-0001-pg_upgrade-Preserve-relfilenodes-and-tablespace-O.patch > first and then v8-0002-Preserve-database-OIDs-in-pg_upgrade.patch. OK, so looking over 0002, I noticed a few things: 1. datlastsysoid isn't being used for anything any more. That's not a defect in your patch, but I've separately proposed to remove it. 2. I realized that the whole idea here depends on not having initdb create more than one database without a fixed OID. The patch solves that by nailing down the OID of template0, which is a sufficient solution. However, I think nailing down the (initial) OID of postgres as well would be a good idea, just in case somebody in the future decides to add another system-created database. 3. The changes to gram.y don't do anything. Sure, you've added a new "OID" token, but nothing generates that token, so it has no effect. The reason the syntax works is that createdb_opt_name accepts "IDENT", which means any string that's not in the keyword list (see kwlist.h). But that's there already, so you don't need to do anything in this file. 4. I felt that the documentation and comments could be somewhat improved. Here's an updated version in which I've reverted the changes to gram.y and tried to improve the comments and documentation. Could you have a look at implementing (2) above? -- Robert Haas EDB: http://www.enterprisedb.com v9-0001-pg_upgrade-perserve-database-OID-patch-from-shrut.patch Description: Binary data
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Dec 14, 2021 at 1:21 PM Shruthi Gowda wrote: > Thanks, Robert for the updated version. I reviewed the changes and it > looks fine. > I also tested the patch. The patch works as expected. Thanks. > > - I adjusted the function header comment for heap_create. Your > > proposed comment seemed like it was pretty detailed but not 100% > > correct. It also made one of the lines kind of long because you didn't > > wrap the text in the surrounding style. I decided to make it simpler > > and shorter instead of longer still and 100% correct. > > The comment update looks fine. However, I still feel it would be good to > mention on which (rare) circumstance a valid relfilenode can get passed. In general, I think it's the job of a function parameter comment to describe what the parameter does, not how the callers actually use it. One problem with describing the latter is that, if someone later adds another caller, there is a pretty good chance that they won't notice that the comment needs to be changed. More fundamentally, the parameter function comments should be like an instruction manual for how to use the function. If you are trying to figure out how to use this function, it is not helpful to know that "most callers like to pass false" for this parameter. What you need to know is what value your new call site should pass, and knowing what "most callers" do or that something is "rare" doesn't really help. If we want to make this comment more detailed, we should approach it from the point of view of explaining how it ought to be set. I've committed the v8-0001 patch you attached. I'll write separately about v8-0002. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Sat, Jan 15, 2022 at 11:17 AM Julien Rouhaud wrote: > > Hi, > > On Fri, Dec 17, 2021 at 01:03:06PM +0530, Shruthi Gowda wrote: > > > > I have updated the DBOID preserve patch to handle this case and > > generated the latest patch on top of your v7-001-preserve-relfilenode > > patch. > > The cfbot reports that the patch doesn't apply anymore: > http://cfbot.cputube.org/patch_36_3296.log > === Applying patches on top of PostgreSQL commit ID > 5513dc6a304d8bda114004a3b906cc6fde5d6274 === > === applying patch ./v7-0002-Preserve-database-OIDs-in-pg_upgrade.patch > [...] > patching file src/bin/pg_upgrade/info.c > Hunk #1 FAILED at 190. > Hunk #2 succeeded at 351 (offset 27 lines). > 1 out of 2 hunks FAILED -- saving rejects to file > src/bin/pg_upgrade/info.c.rej > patching file src/bin/pg_upgrade/pg_upgrade.h > Hunk #1 FAILED at 145. > 1 out of 1 hunk FAILED -- saving rejects to file > src/bin/pg_upgrade/pg_upgrade.h.rej > patching file src/bin/pg_upgrade/relfilenode.c > Hunk #1 FAILED at 193. > 1 out of 1 hunk FAILED -- saving rejects to file > src/bin/pg_upgrade/relfilenode.c.rej > > Could you send a rebased version? In the meantime I willl switch the cf entry > to Waiting on Author. I have rebased and generated the patches on top of PostgreSQL commit ID cf925936ecc031355cd56fbd392ec3180517a110. Kindly apply v8-0001-pg_upgrade-Preserve-relfilenodes-and-tablespace-O.patch first and then v8-0002-Preserve-database-OIDs-in-pg_upgrade.patch. Thanks & Regards, Shruthi KC EnterpriseDB: http://www.enterprisedb.com v8-0002-Preserve-database-OIDs-in-pg_upgrade.patch Description: Binary data v8-0001-pg_upgrade-Preserve-relfilenodes-and-tablespace-O.patch Description: Binary data
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Hi, On Fri, Dec 17, 2021 at 01:03:06PM +0530, Shruthi Gowda wrote: > > I have updated the DBOID preserve patch to handle this case and > generated the latest patch on top of your v7-001-preserve-relfilenode > patch. The cfbot reports that the patch doesn't apply anymore: http://cfbot.cputube.org/patch_36_3296.log === Applying patches on top of PostgreSQL commit ID 5513dc6a304d8bda114004a3b906cc6fde5d6274 === === applying patch ./v7-0002-Preserve-database-OIDs-in-pg_upgrade.patch [...] patching file src/bin/pg_upgrade/info.c Hunk #1 FAILED at 190. Hunk #2 succeeded at 351 (offset 27 lines). 1 out of 2 hunks FAILED -- saving rejects to file src/bin/pg_upgrade/info.c.rej patching file src/bin/pg_upgrade/pg_upgrade.h Hunk #1 FAILED at 145. 1 out of 1 hunk FAILED -- saving rejects to file src/bin/pg_upgrade/pg_upgrade.h.rej patching file src/bin/pg_upgrade/relfilenode.c Hunk #1 FAILED at 193. 1 out of 1 hunk FAILED -- saving rejects to file src/bin/pg_upgrade/relfilenode.c.rej Could you send a rebased version? In the meantime I willl switch the cf entry to Waiting on Author.
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Mon, Dec 13, 2021 at 8:43 PM Shruthi Gowda wrote: > > On Mon, Dec 6, 2021 at 11:25 PM Robert Haas wrote: > > > > On Sun, Dec 5, 2021 at 11:44 PM Sadhuprasad Patro wrote: > > > 3. > > > @@ -504,11 +525,15 @@ createdb(ParseState *pstate, const CreatedbStmt > > > *stmt) > > > */ > > > pg_database_rel = table_open(DatabaseRelationId, RowExclusiveLock); > > > > > > - do > > > + /* Select an OID for the new database if is not explicitly configured. > > > */ > > > + if (!OidIsValid(dboid)) > > > { > > > - dboid = GetNewOidWithIndex(pg_database_rel, DatabaseOidIndexId, > > > -Anum_pg_database_oid); > > > - } while (check_db_file_conflict(dboid)); > > > > > > I think we need to do 'check_db_file_conflict' for the USER given OID > > > also.. right? It may already be present. > > > > Hopefully, if that happens, we straight up fail later on. > > That's right. If a database with user-specified OID exists, the > createdb fails with a "duplicate key value" error. > If just a data directory with user-specified OID exists, > MakePGDirectory() fails to create the directory and the cleanup > callback createdb_failure_callback() removes the directory that was > not created by 'createdb()' function. > The subsequent create database call with the same OID will succeed. > Should we handle the case where a data directory exists and the > corresponding DB with that oid does not exist? I presume this > situation doesn't arise unless the user tries to create directories in > the data path. Any thoughts? I have updated the DBOID preserve patch to handle this case and generated the latest patch on top of your v7-001-preserve-relfilenode patch. Thanks & Regards Shruthi KC EnterpriseDB: http://www.enterprisedb.com v7-0002-Preserve-database-OIDs-in-pg_upgrade.patch Description: Binary data
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On 12/15/21 12:09 AM, tushar wrote: I spent much of today reviewing 0001. Here's an updated version, so far only lightly tested. Please check whether I've broken anything. Thanks Robert, I tested from v96/12/13/v14 -> v15( with patch) things are working fine i.e table /index relfilenode is preserved, not changing after pg_upgrade. I covered tablespace OIDs testing scenarios and that is also preserved after pg_upgrade. -- regards,tushar EnterpriseDB https://www.enterprisedb.com/ The Enterprise PostgreSQL Company
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On 12/14/21 2:35 AM, Robert Haas wrote: I spent much of today reviewing 0001. Here's an updated version, so far only lightly tested. Please check whether I've broken anything. Thanks Robert, I tested from v96/12/13/v14 -> v15( with patch) things are working fine i.e table /index relfilenode is preserved, not changing after pg_upgrade. -- regards,tushar EnterpriseDB https://www.enterprisedb.com/ The Enterprise PostgreSQL Company
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Dec 14, 2021 at 2:35 AM Robert Haas wrote: > > On Mon, Dec 13, 2021 at 9:40 AM Shruthi Gowda wrote: > > > I am reviewing another patch > > > "v5-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg" as well > > > and will provide the comments soon if any... > > I spent much of today reviewing 0001. Here's an updated version, so > far only lightly tested. Please check whether I've broken anything. > Here are the changes: Thanks, Robert for the updated version. I reviewed the changes and it looks fine. I also tested the patch. The patch works as expected. > - I adjusted the function header comment for heap_create. Your > proposed comment seemed like it was pretty detailed but not 100% > correct. It also made one of the lines kind of long because you didn't > wrap the text in the surrounding style. I decided to make it simpler > and shorter instead of longer still and 100% correct. The comment update looks fine. However, I still feel it would be good to mention on which (rare) circumstance a valid relfilenode can get passed. > - I removed a one-line comment that said /* Override the toast > relfilenode */ because it preceded an error check, not a line of code > that would have done what the comment claimed. > > - I removed a one-line comment that said /* Override the relfilenode > */ because the following code would only sometimes override the > relfilenode. The code didn't seem complex enough to justify a a longer > and more accurate comment, so I just took it out. Fine > - I changed a test for (relkind == RELKIND_RELATION || relkind == > RELKIND_SEQUENCE || relkind == RELKIND_MATVIEW) to use > RELKIND_HAS_STORAGE(). It's true that not all of the storage types > that RELKIND_HAS_STORAGE() tests are possible here, but that's not a > reason to avoiding using the macro. If somebody adds a new relkind > with storage in the future, they might miss the need to manually > update this place, but they will not likely miss the need to update > RELKIND_HAS_STORAGE() since, if they did, their code probably wouldn't > work at all. I agree. > - I changed the way that you were passing create_storage down to > heap_create. I think I said before that you should EITHER fix it so > that we set create_storage = true only when the relation actually has > storage OR ELSE have heap_create() itself override the value to false > when there is no storage. You did both. There are times when it's > reasonable to ensure the same thing in multiple places, but this > doesn't seem to be one of them. So I took that out. I chose to retain > the code in heap_create() that overrides the value to false, added a > comment explaining that it does that, and then adjusted the callers to > ignore the storage type. I then added comments, and in one place an > assertion, to make it clearer what is happening. The changes are fine. Thanks for the fine-tuning. > - In pg_dump.c, I adjusted the comment that says "Not every relation > has storage." and the test that immediately follows, to ignore the > relfilenode when relkind says it's a partitioned table. Really, > partitioned tables should never have had relfilenodes, but as it turns > out, they used to have them. > Fine. Understood. Thanks & Regards, Shruthi KC EnterpriseDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Mon, Dec 13, 2021 at 9:40 AM Shruthi Gowda wrote: > > I am reviewing another patch > > "v5-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg" as well > > and will provide the comments soon if any... I spent much of today reviewing 0001. Here's an updated version, so far only lightly tested. Please check whether I've broken anything. Here are the changes: - I adjusted the function header comment for heap_create. Your proposed comment seemed like it was pretty detailed but not 100% correct. It also made one of the lines kind of long because you didn't wrap the text in the surrounding style. I decided to make it simpler and shorter instead of longer still and 100% correct. - I removed a one-line comment that said /* Override the toast relfilenode */ because it preceded an error check, not a line of code that would have done what the comment claimed. - I removed a one-line comment that said /* Override the relfilenode */ because the following code would only sometimes override the relfilenode. The code didn't seem complex enough to justify a a longer and more accurate comment, so I just took it out. - I changed a test for (relkind == RELKIND_RELATION || relkind == RELKIND_SEQUENCE || relkind == RELKIND_MATVIEW) to use RELKIND_HAS_STORAGE(). It's true that not all of the storage types that RELKIND_HAS_STORAGE() tests are possible here, but that's not a reason to avoiding using the macro. If somebody adds a new relkind with storage in the future, they might miss the need to manually update this place, but they will not likely miss the need to update RELKIND_HAS_STORAGE() since, if they did, their code probably wouldn't work at all. - I changed the way that you were passing create_storage down to heap_create. I think I said before that you should EITHER fix it so that we set create_storage = true only when the relation actually has storage OR ELSE have heap_create() itself override the value to false when there is no storage. You did both. There are times when it's reasonable to ensure the same thing in multiple places, but this doesn't seem to be one of them. So I took that out. I chose to retain the code in heap_create() that overrides the value to false, added a comment explaining that it does that, and then adjusted the callers to ignore the storage type. I then added comments, and in one place an assertion, to make it clearer what is happening. - In pg_dump.c, I adjusted the comment that says "Not every relation has storage." and the test that immediately follows, to ignore the relfilenode when relkind says it's a partitioned table. Really, partitioned tables should never have had relfilenodes, but as it turns out, they used to have them. Let me know your thoughts. -- Robert Haas EDB: http://www.enterprisedb.com v7-0001-pg_upgrade-Preserve-relfilenodes-and-tablespace-O.patch Description: Binary data
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Mon, Dec 6, 2021 at 11:25 PM Robert Haas wrote: > > On Sun, Dec 5, 2021 at 11:44 PM Sadhuprasad Patro wrote: > > 3. > > @@ -504,11 +525,15 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) > > */ > > pg_database_rel = table_open(DatabaseRelationId, RowExclusiveLock); > > > > - do > > + /* Select an OID for the new database if is not explicitly configured. */ > > + if (!OidIsValid(dboid)) > > { > > - dboid = GetNewOidWithIndex(pg_database_rel, DatabaseOidIndexId, > > -Anum_pg_database_oid); > > - } while (check_db_file_conflict(dboid)); > > > > I think we need to do 'check_db_file_conflict' for the USER given OID > > also.. right? It may already be present. > > Hopefully, if that happens, we straight up fail later on. That's right. If a database with user-specified OID exists, the createdb fails with a "duplicate key value" error. If just a data directory with user-specified OID exists, MakePGDirectory() fails to create the directory and the cleanup callback createdb_failure_callback() removes the directory that was not created by 'createdb()' function. The subsequent create database call with the same OID will succeed. Should we handle the case where a data directory exists and the corresponding DB with that oid does not exist? I presume this situation doesn't arise unless the user tries to create directories in the data path. Any thoughts? Thanks & Regards Shruthi KC EnterpriseDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Mon, Dec 6, 2021 at 10:14 AM Sadhuprasad Patro wrote: > > On Tue, Oct 26, 2021 at 6:55 PM Shruthi Gowda wrote: > > > > > > I have revised the patch w.r.t the way 'create_storage' is interpreted > > in heap_create() along with some minor changes to preserve the DBOID > > patch. > > > > Hi Shruthi, > > I am reviewing the attached patches and providing a few comments here > below for patch "v5-0002-Preserve-database-OIDs-in-pg_upgrade.patch" > > 1. > --- a/doc/src/sgml/ref/create_database.sgml > +++ b/doc/src/sgml/ref/create_database.sgml > @@ -31,7 +31,8 @@ CREATE DATABASE class="parameter">name > - [ IS_TEMPLATE [=] class="parameter">istemplate ] ] > + [ IS_TEMPLATE [=] class="parameter">istemplate ] > + [ OID [=] class="parameter">db_oid ] ] > > Replace "db_oid" with 'oid'. Below in the listitem, we have mentioned 'oid'. Replaced "db_oid" with "oid" > > 2. > --- a/src/backend/commands/dbcommands.c > +++ b/src/backend/commands/dbcommands.c > + if ((dboid < FirstNormalObjectId) && > + (strcmp(dbname, "template0") != 0) && > + (!IsBinaryUpgrade)) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE)), > + errmsg("Invalid value for option \"%s\"", defel->defname), > + errhint("The specified OID %u is less than the minimum OID for user > objects %u.", > + dboid, FirstNormalObjectId)); > + } > > Are we sure that 'IsBinaryUpgrade' will be set properly, before the > createdb function is called? Can we recheck once ? I believe 'IsBinaryUpgrade' will be set to true when pg_upgrade is invoked. pg_ugrade internally does pg_dump and pg_restore for every database in the cluster. > 3. > @@ -504,11 +525,15 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) > */ > pg_database_rel = table_open(DatabaseRelationId, RowExclusiveLock); > > - do > + /* Select an OID for the new database if is not explicitly configured. */ > + if (!OidIsValid(dboid)) > { > - dboid = GetNewOidWithIndex(pg_database_rel, DatabaseOidIndexId, > -Anum_pg_database_oid); > - } while (check_db_file_conflict(dboid)); > > I think we need to do 'check_db_file_conflict' for the USER given OID > also.. right? It may already be present. If a datafile with user-specified OID exists, the create database fails with the below error. postgres=# create database d2 oid 16452; ERROR: could not create directory "base/16452": File exists > 4. > --- a/src/bin/initdb/initdb.c > +++ b/src/bin/initdb/initdb.c > > /* > + * Create template0 database with oid Template0ObjectId i.e, 4 > + */ > + > > Better to mention here, why OID 4 is reserved for template0 database?. The comment is updated to explain why template0 oid is fixed. > 5. > + /* > + * Create template0 database with oid Template0ObjectId i.e, 4 > + */ > + static const char *const template0_setup[] = { > + "CREATE DATABASE template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false OID > " > + CppAsString2(Template0ObjectId) ";\n\n", > > Can we write something like, 'OID = CppAsString2(Template0ObjectId)'? > mention "=". Fixed > 6. > + > + /* > + * We use the OID of postgres to determine datlastsysoid > + */ > + "UPDATE pg_database SET datlastsysoid = " > + "(SELECT oid FROM pg_database " > + "WHERE datname = 'postgres');\n\n", > + > > Make the above comment a single line comment. As Robert confirmed, this part of the code is moved from a different place. > 7. > There are some spelling mistakes in the comments as below, please > correct the same > + /* > + * Make sure that binary upgrade propogate the database OID to the > new => correct spelling > + * cluster > + */ > > +/* OID 4 is reserved for Templete0 database */ > > Correct spelling > +#define Template0ObjectId 4 > Fixed. > I am reviewing another patch > "v5-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg" as well > and will provide the comments soon if any... Thanks. I have rebased relfilenode oid preserve patch. You may use the rebased patch for review. Thanks & Regards Shruthi K C EnterpriseDB: http://www.enterprisedb.com v6-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg.patch Description: Binary data v6-0002-Preserve-database-OIDs-in-pg_upgrade.patch Description: Binary data
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Sun, Dec 5, 2021 at 11:44 PM Sadhuprasad Patro wrote: > 1. > --- a/doc/src/sgml/ref/create_database.sgml > +++ b/doc/src/sgml/ref/create_database.sgml > @@ -31,7 +31,8 @@ CREATE DATABASE class="parameter">name > - [ IS_TEMPLATE [=] class="parameter">istemplate ] ] > + [ IS_TEMPLATE [=] class="parameter">istemplate ] > + [ OID [=] class="parameter">db_oid ] ] > > Replace "db_oid" with 'oid'. Below in the listitem, we have mentioned 'oid'. I agree that the listitem and the synopsis need to be consistent, but it could be made consistent either by changing that one to db_oid or this one to oid. > 2. > --- a/src/backend/commands/dbcommands.c > +++ b/src/backend/commands/dbcommands.c > + if ((dboid < FirstNormalObjectId) && > + (strcmp(dbname, "template0") != 0) && > + (!IsBinaryUpgrade)) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE)), > + errmsg("Invalid value for option \"%s\"", defel->defname), > + errhint("The specified OID %u is less than the minimum OID for user > objects %u.", > + dboid, FirstNormalObjectId)); > + } > > Are we sure that 'IsBinaryUpgrade' will be set properly, before the > createdb function is called? Can we recheck once ? How could it be set incorrectly, and how could we recheck this? > 3. > @@ -504,11 +525,15 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) > */ > pg_database_rel = table_open(DatabaseRelationId, RowExclusiveLock); > > - do > + /* Select an OID for the new database if is not explicitly configured. */ > + if (!OidIsValid(dboid)) > { > - dboid = GetNewOidWithIndex(pg_database_rel, DatabaseOidIndexId, > -Anum_pg_database_oid); > - } while (check_db_file_conflict(dboid)); > > I think we need to do 'check_db_file_conflict' for the USER given OID > also.. right? It may already be present. Hopefully, if that happens, we straight up fail later on. > 4. > --- a/src/bin/initdb/initdb.c > +++ b/src/bin/initdb/initdb.c > > /* > + * Create template0 database with oid Template0ObjectId i.e, 4 > + */ > + > > Better to mention here, why OID 4 is reserved for template0 database?. I'm not sure how we would give a reason for selecting an arbitrary constant? We could perhaps explain why we use a fixed OID. But there's no reason it has to be 4, I think. > 5. > + /* > + * Create template0 database with oid Template0ObjectId i.e, 4 > + */ > + static const char *const template0_setup[] = { > + "CREATE DATABASE template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false OID > " > + CppAsString2(Template0ObjectId) ";\n\n", > > Can we write something like, 'OID = CppAsString2(Template0ObjectId)'? > mention "=". That seems like a good idea, because it would be more consistent. > 6. > + > + /* > + * We use the OID of postgres to determine datlastsysoid > + */ > + "UPDATE pg_database SET datlastsysoid = " > + "(SELECT oid FROM pg_database " > + "WHERE datname = 'postgres');\n\n", > + > > Make the above comment a single line comment. I think what Shruthi did is more correct. It doesn't have to be done as a single-line comment just because it can fit on one line. And Shruthi didn't write this comment anyway, it's only moved slightly from where it was before. > 7. > There are some spelling mistakes in the comments as below, please > correct the same > + /* > + * Make sure that binary upgrade propogate the database OID to the > new => correct spelling > + * cluster > + */ > > +/* OID 4 is reserved for Templete0 database */ > > Correct spelling > +#define Template0ObjectId 4 Yes, those would be good to fix. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Oct 26, 2021 at 6:55 PM Shruthi Gowda wrote: > > > I have revised the patch w.r.t the way 'create_storage' is interpreted > in heap_create() along with some minor changes to preserve the DBOID > patch. > Hi Shruthi, I am reviewing the attached patches and providing a few comments here below for patch "v5-0002-Preserve-database-OIDs-in-pg_upgrade.patch" 1. --- a/doc/src/sgml/ref/create_database.sgml +++ b/doc/src/sgml/ref/create_database.sgml @@ -31,7 +31,8 @@ CREATE DATABASE name - [ IS_TEMPLATE [=] istemplate ] ] + [ IS_TEMPLATE [=] istemplate ] + [ OID [=] db_oid ] ] Replace "db_oid" with 'oid'. Below in the listitem, we have mentioned 'oid'. 2. --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c + if ((dboid < FirstNormalObjectId) && + (strcmp(dbname, "template0") != 0) && + (!IsBinaryUpgrade)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE)), + errmsg("Invalid value for option \"%s\"", defel->defname), + errhint("The specified OID %u is less than the minimum OID for user objects %u.", + dboid, FirstNormalObjectId)); + } Are we sure that 'IsBinaryUpgrade' will be set properly, before the createdb function is called? Can we recheck once ? 3. @@ -504,11 +525,15 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) */ pg_database_rel = table_open(DatabaseRelationId, RowExclusiveLock); - do + /* Select an OID for the new database if is not explicitly configured. */ + if (!OidIsValid(dboid)) { - dboid = GetNewOidWithIndex(pg_database_rel, DatabaseOidIndexId, -Anum_pg_database_oid); - } while (check_db_file_conflict(dboid)); I think we need to do 'check_db_file_conflict' for the USER given OID also.. right? It may already be present. 4. --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c /* + * Create template0 database with oid Template0ObjectId i.e, 4 + */ + Better to mention here, why OID 4 is reserved for template0 database?. 5. + /* + * Create template0 database with oid Template0ObjectId i.e, 4 + */ + static const char *const template0_setup[] = { + "CREATE DATABASE template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false OID " + CppAsString2(Template0ObjectId) ";\n\n", Can we write something like, 'OID = CppAsString2(Template0ObjectId)'? mention "=". 6. + + /* + * We use the OID of postgres to determine datlastsysoid + */ + "UPDATE pg_database SET datlastsysoid = " + "(SELECT oid FROM pg_database " + "WHERE datname = 'postgres');\n\n", + Make the above comment a single line comment. 7. There are some spelling mistakes in the comments as below, please correct the same + /* + * Make sure that binary upgrade propogate the database OID to the new => correct spelling + * cluster + */ +/* OID 4 is reserved for Templete0 database */ > Correct spelling +#define Template0ObjectId 4 I am reviewing another patch "v5-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg" as well and will provide the comments soon if any... Thanks & Regards SadhuPrasad EnterpriseDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Oct 7, 2021 at 7:33 PM Robert Haas wrote: > > On Thu, Oct 7, 2021 at 3:24 AM Shruthi Gowda wrote: > > Every other > > caller/flow passes false for 'create_storage' and we still need to > > create storage in heap_create() if relkind has storage. > > That seems surprising. I have revised the patch w.r.t the way 'create_storage' is interpreted in heap_create() along with some minor changes to preserve the DBOID patch. Regards, Shruthi KC EnterpriseDB: http://www.enterprisedb.com v5-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg.patch Description: Binary data v5-0002-Preserve-database-OIDs-in-pg_upgrade.patch Description: Binary data
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Oct 7, 2021 at 3:24 AM Shruthi Gowda wrote: > Every other > caller/flow passes false for 'create_storage' and we still need to > create storage in heap_create() if relkind has storage. That seems surprising. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Oct 7, 2021 at 2:05 AM Robert Haas wrote: > > On Mon, Oct 4, 2021 at 12:44 PM Shruthi Gowda wrote: > > Thanks for the inputs, Robert. In the v4 patch, an unused OID (i.e, 4) > > is fixed for the template0 and the same is removed from unused oid > > list. > > > > In addition to the review comment fixes, I have removed some code that > > is no longer needed/doesn't make sense since we preserve the OIDs. > > This is not a full review, but I'm wondering about this bit of code: > > - if (!RELKIND_HAS_STORAGE(relkind) || OidIsValid(relfilenode)) > + if (!RELKIND_HAS_STORAGE(relkind) || (OidIsValid(relfilenode) > && !create_storage)) > create_storage = false; > else > { > create_storage = true; > - relfilenode = relid; > + > + /* > +* Create the storage with oid same as relid if relfilenode is > +* unspecified by the caller > +*/ > + if (!OidIsValid(relfilenode)) > + relfilenode = relid; > } > > This seems hard to understand, and I wonder if perhaps it can be > simplified. If !RELKIND_HAS_STORAGE(relkind), then we're going to set > create_storage to false if it was previously true, and otherwise just > do nothing. Otherwise, if !create_storage, we'll enter the > create_storage = false branch which effectively does nothing. > Otherwise, if !OidIsValid(relfilenode), we'll set relfilenode = relid. > So couldn't we express that like this? > > if (!RELKIND_HAS_STORAGE(relkind)) > create_storage = false; > else if (create_storage && !OidIsValid(relfilenode)) > relfilenode = relid; > > If so, that seems more clear. 'create_storage' flag says whether or not to create the storage when a valid relfilenode is passed. 'create_storage' flag alone cannot make the storage creation decision in heap_create(). Only binary upgrade flow sets the 'create_storage' flag to true and expects storage gets created with specified relfilenode. Every other caller/flow passes false for 'create_storage' and we still need to create storage in heap_create() if relkind has storage. That's why I have explicitly set 'create_storage = true' in the else flow and initialize relfilenode on need basis. Regards, Shruthi KC EnterpriseDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Mon, Oct 4, 2021 at 12:44 PM Shruthi Gowda wrote: > Thanks for the inputs, Robert. In the v4 patch, an unused OID (i.e, 4) > is fixed for the template0 and the same is removed from unused oid > list. > > In addition to the review comment fixes, I have removed some code that > is no longer needed/doesn't make sense since we preserve the OIDs. This is not a full review, but I'm wondering about this bit of code: - if (!RELKIND_HAS_STORAGE(relkind) || OidIsValid(relfilenode)) + if (!RELKIND_HAS_STORAGE(relkind) || (OidIsValid(relfilenode) && !create_storage)) create_storage = false; else { create_storage = true; - relfilenode = relid; + + /* +* Create the storage with oid same as relid if relfilenode is +* unspecified by the caller +*/ + if (!OidIsValid(relfilenode)) + relfilenode = relid; } This seems hard to understand, and I wonder if perhaps it can be simplified. If !RELKIND_HAS_STORAGE(relkind), then we're going to set create_storage to false if it was previously true, and otherwise just do nothing. Otherwise, if !create_storage, we'll enter the create_storage = false branch which effectively does nothing. Otherwise, if !OidIsValid(relfilenode), we'll set relfilenode = relid. So couldn't we express that like this? if (!RELKIND_HAS_STORAGE(relkind)) create_storage = false; else if (create_storage && !OidIsValid(relfilenode)) relfilenode = relid; If so, that seems more clear. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Fri, Sep 24, 2021 at 12:44 AM Robert Haas wrote: > > On Wed, Sep 22, 2021 at 3:07 PM Shruthi Gowda wrote: > > > - The comment in binary_upgrade_set_pg_class_oids() is still not > > > accurate. You removed the sentence which says "Indexes cannot have > > > toast tables, so we need not make this probe in the index code path" > > > but the immediately preceding sentence is still inaccurate in at least > > > two ways. First, it only talks about tables, but the code now applies > > > to indexes. Second, it only talks about OIDs, but now also deals with > > > refilenodes. It's really important to fully update every comment that > > > might be affected by your changes! > > > > The comment is updated. > > Looks good. > > > The SQL query will not result in duplicate rows because the first join > > filters the duplicate rows if any with the on clause ' i.indisvalid' > > on it. The result of the first join is further left joined with pg_class and > > pg_class will not have duplicate rows for a given oid. > > Oh, you're right. My mistake. > > > As per your suggestion, reloid and relfilenode are absorbed in the same > > place. > > An additional parameter called 'suppress_storage' is passed to heap_create() > > which indicates whether or not to create the storage when the caller > > passed a valid relfilenode. > > I find it confusing to have both suppress_storage and create_storage > with one basically as the negation of the other. To avoid that sort of > thing I generally have a policy that variables and options should say > whether something should happen, rather than whether it should be > prevented from happening. Sometimes there are good reasons - such as > strong existing precedent - to deviate from this practice but I think > it's good to follow when possible. So my proposal is to always have > create_storage and never suppress_storage, and if some function needs > to adjust the value of create_storage that was passed to it then OK. Sure, I agree. In the latest patch, only 'create_storage' is used. > > I did not make the changes to set the oid and relfilenode in the same call. > > I feel the uniformity w.r.t the other function signatures in > > pg_upgrade_support.c will be lost because currently each function sets > > only one attribute. > > Also, renaming the applicable function names to represent that they > > set both oid and relfilenode will make the function name even longer. > > We may opt to not include the relfilenode in the function name instead > > use a generic name like binary_upgrade_set_next_xxx_pg_class_oid() but > > then > > we will end up with some functions that set two attributes and some > > functions that set one attribute. > > OK. > > > I have also attached the patch to preserve the DB oid. As discussed, > > template0 will be created with a fixed oid during initdb. I am using > > OID 2 for template0. Even though oid 2 is already in use for the > > 'pg_am' catalog I see no harm in using it for template0 DB because oid > > doesn’t have to be unique across the database - it has to be unique > > for the particular catalog table. Kindly let me know if I am missing > > something? > > Apparently, if we did decide to pick an unused oid for template0 then > > I see a challenge in removing that oid from the unused oid list. I > > could not come up with a feasible solution for handling it. > > You are correct that there is no intrinsic reason why the same OID > can't be used in various different catalogs. We have a policy of not > doing that, though; I'm not clear on the reason. Maybe it'd be OK to > deviate from that policy here, but another option would be to simply > change the unused_oids script (and maybe some of the others). It > already has: > > my $FirstGenbkiObjectId = > Catalog::FindDefinedSymbol('access/transam.h', '..', 'FirstGenbkiObjectId'); > push @{$oids}, $FirstGenbkiObjectId; > > Presumably it could be easily adapted to push the value of some other > defined symbol into @{$oids} also, thus making that OID in effect > used. Thanks for the inputs, Robert. In the v4 patch, an unused OID (i.e, 4) is fixed for the template0 and the same is removed from unused oid list. In addition to the review comment fixes, I have removed some code that is no longer needed/doesn't make sense since we preserve the OIDs. Regards, Shruthi KC EnterpriseDB: http://www.enterprisedb.com v4-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg.patch Description: Binary data v4-0002-Preserve-database-OIDs-in-pg_upgrade.patch Description: Binary data
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Wed, Sep 22, 2021 at 3:07 PM Shruthi Gowda wrote: > > - The comment in binary_upgrade_set_pg_class_oids() is still not > > accurate. You removed the sentence which says "Indexes cannot have > > toast tables, so we need not make this probe in the index code path" > > but the immediately preceding sentence is still inaccurate in at least > > two ways. First, it only talks about tables, but the code now applies > > to indexes. Second, it only talks about OIDs, but now also deals with > > refilenodes. It's really important to fully update every comment that > > might be affected by your changes! > > The comment is updated. Looks good. > The SQL query will not result in duplicate rows because the first join > filters the duplicate rows if any with the on clause ' i.indisvalid' > on it. The result of the first join is further left joined with pg_class and > pg_class will not have duplicate rows for a given oid. Oh, you're right. My mistake. > As per your suggestion, reloid and relfilenode are absorbed in the same place. > An additional parameter called 'suppress_storage' is passed to heap_create() > which indicates whether or not to create the storage when the caller > passed a valid relfilenode. I find it confusing to have both suppress_storage and create_storage with one basically as the negation of the other. To avoid that sort of thing I generally have a policy that variables and options should say whether something should happen, rather than whether it should be prevented from happening. Sometimes there are good reasons - such as strong existing precedent - to deviate from this practice but I think it's good to follow when possible. So my proposal is to always have create_storage and never suppress_storage, and if some function needs to adjust the value of create_storage that was passed to it then OK. > I did not make the changes to set the oid and relfilenode in the same call. > I feel the uniformity w.r.t the other function signatures in > pg_upgrade_support.c will be lost because currently each function sets > only one attribute. > Also, renaming the applicable function names to represent that they > set both oid and relfilenode will make the function name even longer. > We may opt to not include the relfilenode in the function name instead > use a generic name like binary_upgrade_set_next_xxx_pg_class_oid() but > then > we will end up with some functions that set two attributes and some > functions that set one attribute. OK. > I have also attached the patch to preserve the DB oid. As discussed, > template0 will be created with a fixed oid during initdb. I am using > OID 2 for template0. Even though oid 2 is already in use for the > 'pg_am' catalog I see no harm in using it for template0 DB because oid > doesn’t have to be unique across the database - it has to be unique > for the particular catalog table. Kindly let me know if I am missing > something? > Apparently, if we did decide to pick an unused oid for template0 then > I see a challenge in removing that oid from the unused oid list. I > could not come up with a feasible solution for handling it. You are correct that there is no intrinsic reason why the same OID can't be used in various different catalogs. We have a policy of not doing that, though; I'm not clear on the reason. Maybe it'd be OK to deviate from that policy here, but another option would be to simply change the unused_oids script (and maybe some of the others). It already has: my $FirstGenbkiObjectId = Catalog::FindDefinedSymbol('access/transam.h', '..', 'FirstGenbkiObjectId'); push @{$oids}, $FirstGenbkiObjectId; Presumably it could be easily adapted to push the value of some other defined symbol into @{$oids} also, thus making that OID in effect used. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Aug 24, 2021 at 2:27 AM Robert Haas wrote: > It's pretty clear from the discussion, I think, that the database OID > one is going to need rework to be considered. > > Regarding the other one: > > - The comment in binary_upgrade_set_pg_class_oids() is still not > accurate. You removed the sentence which says "Indexes cannot have > toast tables, so we need not make this probe in the index code path" > but the immediately preceding sentence is still inaccurate in at least > two ways. First, it only talks about tables, but the code now applies > to indexes. Second, it only talks about OIDs, but now also deals with > refilenodes. It's really important to fully update every comment that > might be affected by your changes! The comment is updated. > - The SQL query in that function isn't completely correct. There is a > left join from pg_class to pg_index whose ON clause includes > "c.reltoastrelid = i.indrelid AND i.indisvalid." The reason it's > likely that is because it is possible, in corner cases, for a TOAST > table to have multiple TOAST indexes. I forget exactly how that > happens, but I think it might be like if a REINDEX CONCURRENTLY on the > TOAST table fails midway through, or something of that sort. Now if > that happens, the LEFT JOIN you added is going to cause the output to > contain multiple rows, because you didn't replicate the i.indisvalid > condition into that ON clause. And then it will fail. Apparently we > don't have a pg_upgrade test case for this scenario; we probably > should. Actually what I think would be even better than putting > i.indisvalid into that ON clause would be to join off of i.indrelid > rather than c.reltoastrelid. The SQL query will not result in duplicate rows because the first join filters the duplicate rows if any with the on clause ' i.indisvalid' on it. The result of the first join is further left joined with pg_class and pg_class will not have duplicate rows for a given oid. > - The code that decodes the various columns of this query does so in a > slightly different order than the query itself. It would be better to > make it match. Perhaps put relkind first in both cases. I might also > think about trying to make the column naming a bit more consistent, > e.g. relkind, relfilenode, toast_oid, toast_relfilenode, > toast_index_oid, toast_index_relfilenode. Fixed. > - In heap_create(), the wording of the error messages is not quite > consistent. You have "relfilenode value not set when in binary upgrade > mode", "toast relfilenode value not set when in binary upgrade mode", > and "pg_class index relfilenode value not set when in binary upgrade > mode". Why does the last one mention pg_class when the other two > don't? The error message is made consistent. This code chuck is moved to a different place as a part of another review comment fix. > - The code in heap_create() now has no comments whatsoever, which is a > shame, because it's actually kind of a tricky bit of logic. Someone > might wonder why we override the relfilenode inside that function > instead of doing it at the same places where we absorb > binary_upgrade_next_{heap,index,toast}_pg_class_oid and the passing > down the relfilenode. I think the answer is that passing down the > relfilenode from the caller would result in storage not actually being > created, whereas in this case we want it to be created but just with > the value we specify, and the reason we want that is because we need > later DDL that happens after these statements but before the old > cluster's relations are moved over to execute successfully, which it > won't if the storage is altogether absent. > However, that raises the question of whether this patch has even got > the basic design right. Maybe we ought to actually be absorbing the > relfilenode setting at the same places where we're doing so for the > OID, and then passing an additional parameter to heap_create() like > bool suppress_storage or something like that. Maybe, taking it even > further, we ought to be changing the signatures of > binary_upgrade_next_heap_pg_class_oid and friends to be two-argument > functions, and pass down the OID and the relfilenode in the same call, > rather than calling two separate functions. I'm not so much concerned > about the cost of calling two functions as the potential for > confusion. I'm not honestly sure that either of these changes are the > right thing to do, but I am pretty strongly inclined to do at least > the first part - trying to absorb reloid and relfilenode in the same > places. If we're not going to do that we certainly need to explain why > we're doing it the way we are in the comments. As per your suggestion, reloid and relfilenode are absorbed in the same place. An additional parameter called 'suppress_storage' is passed to heap_create() which indicates whether or not to create the storage when the caller passed a valid relfilenode. I did not make the changes to set the oid and relfilenode in the same
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Aug 26, 2021 at 01:24:46PM -0400, Stephen Frost wrote: > Greetings, > > * Bruce Momjian (br...@momjian.us) wrote: > > On Thu, Aug 26, 2021 at 01:03:54PM -0400, Stephen Frost wrote: > > > Yes, we're talking about either incremental (or perhaps differential) > > > backup where only the files which are actually different would be backed > > > up. Just like with PG, I can't provide any complete guarantees that > > > we'd be able to actually make this possible after a major version with > > > pgBackRest with this change, but it definitely isn't possible *without* > > > this change. I can't see any reason why we wouldn't be able to do a > > > checksum-based incremental backup though (which would be *much* faster > > > than a regular backup) once this change is made and have that be a > > > reliable and trustworthy backup. I'd want to think about it more and > > > discuss it with David in some detail before saying if we could maybe > > > perform a timestamp-based incremental backup (without checksum'ing the > > > files, as we do in normal situations), but that would really just be a > > > bonus. > > > > Well, it would be nice to know exactly how it would help pgBackRest if > > that is one of the reasons we are adding this feature. > > pgBackRest keeps a manifest for every file in the PG data directory that > is backed up and we identify that file by the filename. Further, we > calculate a checksum for every file. If the filenames didn't change > then we'd be able to compare the file in the new cluster against the > file and checksum in the manifest in order to be able to perform the > incremental/differential backup. We don't store the inodes in the > manifest though, and we don't have any concept of looking at multiple > data directories at the same time or anything like that (which would > also mean that the old data directory would have to be kept around for > that to even work, which seems like a good bit of additional > complication and risk that someone might start up the old cluster by > accident..). > > That's how it'd be very helpful to pgBackRest for the filenames to be > preserved across pg_upgrade's. OK, that is clear. > > > > > > As far as TDE, I haven't seen any concrete plan for that, so why add > > > > > > this code for that reason? > > > > > > > > > > That this would help with TDE (of which there seems little doubt...) > > > > > is > > > > > an additional benefit to this. Specifically, taking the existing work > > > > > that's already been done to allow block-by-block encryption and > > > > > adjusting it for AES-XTS and then using the db-dir+relfileno+block > > > > > number as the IV, just like many disk encryption systems do, avoids > > > > > the > > > > > concerns that were brought up about using LSN for the IV with CTR and > > > > > it's certainly not difficult to do, but it does depend on this change. > > > > > This was all discussed previously and it sure looks like a sensible > > > > > approach to use that mirrors what many other systems already do > > > > > successfully. > > > > > > > > Well, I would think we would not add this for TDE until we were sure > > > > someone was working on adding TDE. > > > > > > That this would help with TDE is what I'd consider an added bonus. > > > > Not if we have no plans to implement TDE, which was my point. Why not > > wait to see if we are actually going to implement TDE rather than adding > > it now. It is just so obvious, why do I have to state this? > > There's been multiple years of effort put into implementing TDE and I'm > sure hopeful that it continues as I'm trying to put effort into moving > it forward myself. I'm a bit baffled by the idea that we're just Well, this is the first time I am hearing this publicly. > suddenly going to stop putting effort into TDE as it is brought up time > and time again by clients that I've talked to as one of the few reasons > they haven't moved to PG yet- I can't believe that hasn't been > experienced by folks at other organizations too, I mean, there's people > maintaining forks of PG specifically for TDE ... Agreed. > > > I've certainly done it and I'd be kind of surprised if others haven't, > > > but I've also played a lot with pg_dump in various modes, so perhaps > > > that's not a great representation. I've definitely had to explain to > > > clients why there's a whole different set of filenames after a > > > pg_upgrade and why that is the case for an 'in place' upgrade before > > > too. > > > > Uh, so I guess I am right that few people have mentioned this in the > > past. Why were users caring about the file names? > > This is a bit baffling to me. Users and admins certainly care about > what files their data is stored in and knowing how to find them. > Covering the data directory structure is a commonly asked for part of > the training that I regularly do for clients. I just never thought people cared about the file names, since I have never heard a complaint
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Aug 26, 2021 at 01:20:38PM -0400, Robert Haas wrote: > So I think this proposed change is in the safe direction. If > relfilenodes were currently preserved and we wanted to make them not > be preserved, then I think you would be quite right to say "whoa, > whoa, that could be a problem." Indeed it could. If anyone then in the > future wanted to introduce a dependency on them staying the same, they > would have a problem. However, nothing in the server itself can care > about relfilenodes - or anything else - being *different* across a > pg_upgrade. The whole point of pg_upgrade is to make it feel like you > have the same database after you run it as you did before you ran it, > even though under the hood a lot of surgery has been done. Barring > bugs, you can never be sad about there being too LITTLE difference > between the post-upgrade database and the pre-upgrade database. Yes, this makes sense, and it is good we have stated the possible benefits now: * pgBackRest * pg_upgrade diagnostics * TDE (maybe) We can eventually evaluate the value of this based on those items. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Greetings, * Bruce Momjian (br...@momjian.us) wrote: > On Thu, Aug 26, 2021 at 01:03:54PM -0400, Stephen Frost wrote: > > Yes, we're talking about either incremental (or perhaps differential) > > backup where only the files which are actually different would be backed > > up. Just like with PG, I can't provide any complete guarantees that > > we'd be able to actually make this possible after a major version with > > pgBackRest with this change, but it definitely isn't possible *without* > > this change. I can't see any reason why we wouldn't be able to do a > > checksum-based incremental backup though (which would be *much* faster > > than a regular backup) once this change is made and have that be a > > reliable and trustworthy backup. I'd want to think about it more and > > discuss it with David in some detail before saying if we could maybe > > perform a timestamp-based incremental backup (without checksum'ing the > > files, as we do in normal situations), but that would really just be a > > bonus. > > Well, it would be nice to know exactly how it would help pgBackRest if > that is one of the reasons we are adding this feature. pgBackRest keeps a manifest for every file in the PG data directory that is backed up and we identify that file by the filename. Further, we calculate a checksum for every file. If the filenames didn't change then we'd be able to compare the file in the new cluster against the file and checksum in the manifest in order to be able to perform the incremental/differential backup. We don't store the inodes in the manifest though, and we don't have any concept of looking at multiple data directories at the same time or anything like that (which would also mean that the old data directory would have to be kept around for that to even work, which seems like a good bit of additional complication and risk that someone might start up the old cluster by accident..). That's how it'd be very helpful to pgBackRest for the filenames to be preserved across pg_upgrade's. > > > > > As far as TDE, I haven't seen any concrete plan for that, so why add > > > > > this code for that reason? > > > > > > > > That this would help with TDE (of which there seems little doubt...) is > > > > an additional benefit to this. Specifically, taking the existing work > > > > that's already been done to allow block-by-block encryption and > > > > adjusting it for AES-XTS and then using the db-dir+relfileno+block > > > > number as the IV, just like many disk encryption systems do, avoids the > > > > concerns that were brought up about using LSN for the IV with CTR and > > > > it's certainly not difficult to do, but it does depend on this change. > > > > This was all discussed previously and it sure looks like a sensible > > > > approach to use that mirrors what many other systems already do > > > > successfully. > > > > > > Well, I would think we would not add this for TDE until we were sure > > > someone was working on adding TDE. > > > > That this would help with TDE is what I'd consider an added bonus. > > Not if we have no plans to implement TDE, which was my point. Why not > wait to see if we are actually going to implement TDE rather than adding > it now. It is just so obvious, why do I have to state this? There's been multiple years of effort put into implementing TDE and I'm sure hopeful that it continues as I'm trying to put effort into moving it forward myself. I'm a bit baffled by the idea that we're just suddenly going to stop putting effort into TDE as it is brought up time and time again by clients that I've talked to as one of the few reasons they haven't moved to PG yet- I can't believe that hasn't been experienced by folks at other organizations too, I mean, there's people maintaining forks of PG specifically for TDE ... Seems like maybe we were both seeing something as obvious to the other that wasn't actually the case. > > > > > > make this required, general improved sanity when working with > > > > > > pg_upgrade > > > > > > is frankly a benefit in its own right too...). If the additional > > > > > > code > > > > > > > > > > How? I am not aware of any advantage except cosmetic. > > > > > > > > Having to resort to matching up inode numbers between the two clusters > > > > after a pg_upgrade to figure out what files are actually the same > > > > underneath is a pain that goes beyond just cosmetics imv. Removing that > > > > additional level that admins, and developers for that matter, have to go > > > > through would be a nice improvement on its own. > > > > > > OK, I was just not aware anyone did that, since I have never hard anyone > > > complain about it before. > > > > I've certainly done it and I'd be kind of surprised if others haven't, > > but I've also played a lot with pg_dump in various modes, so perhaps > > that's not a great representation. I've definitely had to explain to > > clients why there's a whole different set of filenames after a > > pg_upgrade
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Aug 26, 2021 at 12:51 PM Bruce Momjian wrote: > I just don't want to add requirements/complexity to pg_upgrade without > clearly stated reasons because future database changes will need to > honor this new preservation behavior. Well, I agree that it's good to have reasons clearly stated and I hope that at this point you agree that they have been. Whether you agree with them is another question, but I hope you at least agree that they have been stated. As far as the other part of your concern, what I think makes this change pretty safe is that we are preserving more things rather than fewer. I can imagine some server behavior depending on something being the same between the old and the new clusters, but it is harder to imagine a dependency on something not being preserved. For example, we know that the OIDs of pg_type rows have to be the same in the old and new cluster because arrays are stored on disk with the type OIDs included. Therefore those need to be preserved. If in the future we changed things so that arrays - and other container types - did not include the type OIDs in the on-disk representation, then perhaps it would no longer be necessary to preserve the OIDs of pg_type rows across a pg_upgrade. However, it would not be harmful to do so. It just might not be required. So I think this proposed change is in the safe direction. If relfilenodes were currently preserved and we wanted to make them not be preserved, then I think you would be quite right to say "whoa, whoa, that could be a problem." Indeed it could. If anyone then in the future wanted to introduce a dependency on them staying the same, they would have a problem. However, nothing in the server itself can care about relfilenodes - or anything else - being *different* across a pg_upgrade. The whole point of pg_upgrade is to make it feel like you have the same database after you run it as you did before you ran it, even though under the hood a lot of surgery has been done. Barring bugs, you can never be sad about there being too LITTLE difference between the post-upgrade database and the pre-upgrade database. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Aug 26, 2021 at 01:03:54PM -0400, Stephen Frost wrote: > Yes, we're talking about either incremental (or perhaps differential) > backup where only the files which are actually different would be backed > up. Just like with PG, I can't provide any complete guarantees that > we'd be able to actually make this possible after a major version with > pgBackRest with this change, but it definitely isn't possible *without* > this change. I can't see any reason why we wouldn't be able to do a > checksum-based incremental backup though (which would be *much* faster > than a regular backup) once this change is made and have that be a > reliable and trustworthy backup. I'd want to think about it more and > discuss it with David in some detail before saying if we could maybe > perform a timestamp-based incremental backup (without checksum'ing the > files, as we do in normal situations), but that would really just be a > bonus. Well, it would be nice to know exactly how it would help pgBackRest if that is one of the reasons we are adding this feature. > > > > As far as TDE, I haven't seen any concrete plan for that, so why add > > > > this code for that reason? > > > > > > That this would help with TDE (of which there seems little doubt...) is > > > an additional benefit to this. Specifically, taking the existing work > > > that's already been done to allow block-by-block encryption and > > > adjusting it for AES-XTS and then using the db-dir+relfileno+block > > > number as the IV, just like many disk encryption systems do, avoids the > > > concerns that were brought up about using LSN for the IV with CTR and > > > it's certainly not difficult to do, but it does depend on this change. > > > This was all discussed previously and it sure looks like a sensible > > > approach to use that mirrors what many other systems already do > > > successfully. > > > > Well, I would think we would not add this for TDE until we were sure > > someone was working on adding TDE. > > That this would help with TDE is what I'd consider an added bonus. Not if we have no plans to implement TDE, which was my point. Why not wait to see if we are actually going to implement TDE rather than adding it now. It is just so obvious, why do I have to state this? > > > > > make this required, general improved sanity when working with > > > > > pg_upgrade > > > > > is frankly a benefit in its own right too...). If the additional code > > > > > > > > How? I am not aware of any advantage except cosmetic. > > > > > > Having to resort to matching up inode numbers between the two clusters > > > after a pg_upgrade to figure out what files are actually the same > > > underneath is a pain that goes beyond just cosmetics imv. Removing that > > > additional level that admins, and developers for that matter, have to go > > > through would be a nice improvement on its own. > > > > OK, I was just not aware anyone did that, since I have never hard anyone > > complain about it before. > > I've certainly done it and I'd be kind of surprised if others haven't, > but I've also played a lot with pg_dump in various modes, so perhaps > that's not a great representation. I've definitely had to explain to > clients why there's a whole different set of filenames after a > pg_upgrade and why that is the case for an 'in place' upgrade before > too. Uh, so I guess I am right that few people have mentioned this in the past. Why were users caring about the file names? > > > > > was a huge burden or even a moderate one then that might be an > > > > > argument > > > > > against, but it hardly sounds like it will be given Robert's thorough > > > > > analysis so far and the (admittedly not complete, but not that far > > > > > from > > > > > it based on the DB OID review) proposed patch. > > > > > > > > I am find to add it if it is minor, but I want to see the calculus of > > > > its value vs complexity, which I have not seen spelled out. > > > > > > I feel that this, along with the prior discussions, spells it out > > > sufficiently given the patch's complexity looks to be reasonably minor > > > and very similar to the existing things that pg_upgrade already does. > > > Had pg_upgrade done this in the first place, I don't think there would > > > have been nearly this amount of discussion about it. > > > > Well, there is a reason pg_upgrade didn't initially do this --- because > > it adds complexity, and potentially makes future changes to pg_upgrade > > necessary if the server behavior changes. > > I have a very hard time seeing what changes might happen in the server > in this space that wouldn't have an impact on pg_upgrade, with or > without this. I don't know, but I have to ask since I can't know the future, so any "preseration" has to be studied. > > I am not saying this change is wrong, but I think the reasons need to be > > stated in this thread, rather than just moving forward. > > Ok, they've been stated and it seems to at least Robert
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Greetings, * Bruce Momjian (br...@momjian.us) wrote: > On Thu, Aug 26, 2021 at 12:34:56PM -0400, Stephen Frost wrote: > > * Bruce Momjian (br...@momjian.us) wrote: > > > On Thu, Aug 26, 2021 at 11:36:51AM -0400, Stephen Frost wrote: > > > > * Bruce Momjian (br...@momjian.us) wrote: > > > > > On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote: > > > > > > Anyone see a flaw in that analysis? > > > > > > > > > > I am still waiting to hear the purpose of this preservation. As long > > > > > as > > > > > you don't apply the patch, I guess I will just stop asking. > > > > > > > > I'm a bit confused why this question keeps coming up as we've discussed > > > > multiple reasons (incremental backups, possible use for TDE which would > > > > > > I have not seen much explaination on pgBackRest, except me mentioning > > > it. Is this something really useful? > > > > Being able to quickly perform a backup on a newly upgraded cluster would > > certainly be valuable and that's definitely not possible today due to > > all of the filenames changing. > > You mean incremental backup, right? I was told this by the pgBackRest > developers during PGCon, but I have not heard that stated publicly, so I > hate to go just on what I heard rather than seeing that stated publicly. Yes, we're talking about either incremental (or perhaps differential) backup where only the files which are actually different would be backed up. Just like with PG, I can't provide any complete guarantees that we'd be able to actually make this possible after a major version with pgBackRest with this change, but it definitely isn't possible *without* this change. I can't see any reason why we wouldn't be able to do a checksum-based incremental backup though (which would be *much* faster than a regular backup) once this change is made and have that be a reliable and trustworthy backup. I'd want to think about it more and discuss it with David in some detail before saying if we could maybe perform a timestamp-based incremental backup (without checksum'ing the files, as we do in normal situations), but that would really just be a bonus. > > > As far as TDE, I haven't seen any concrete plan for that, so why add > > > this code for that reason? > > > > That this would help with TDE (of which there seems little doubt...) is > > an additional benefit to this. Specifically, taking the existing work > > that's already been done to allow block-by-block encryption and > > adjusting it for AES-XTS and then using the db-dir+relfileno+block > > number as the IV, just like many disk encryption systems do, avoids the > > concerns that were brought up about using LSN for the IV with CTR and > > it's certainly not difficult to do, but it does depend on this change. > > This was all discussed previously and it sure looks like a sensible > > approach to use that mirrors what many other systems already do > > successfully. > > Well, I would think we would not add this for TDE until we were sure > someone was working on adding TDE. That this would help with TDE is what I'd consider an added bonus. > > > > make this required, general improved sanity when working with pg_upgrade > > > > is frankly a benefit in its own right too...). If the additional code > > > > > > How? I am not aware of any advantage except cosmetic. > > > > Having to resort to matching up inode numbers between the two clusters > > after a pg_upgrade to figure out what files are actually the same > > underneath is a pain that goes beyond just cosmetics imv. Removing that > > additional level that admins, and developers for that matter, have to go > > through would be a nice improvement on its own. > > OK, I was just not aware anyone did that, since I have never hard anyone > complain about it before. I've certainly done it and I'd be kind of surprised if others haven't, but I've also played a lot with pg_dump in various modes, so perhaps that's not a great representation. I've definitely had to explain to clients why there's a whole different set of filenames after a pg_upgrade and why that is the case for an 'in place' upgrade before too. > > > > was a huge burden or even a moderate one then that might be an argument > > > > against, but it hardly sounds like it will be given Robert's thorough > > > > analysis so far and the (admittedly not complete, but not that far from > > > > it based on the DB OID review) proposed patch. > > > > > > I am find to add it if it is minor, but I want to see the calculus of > > > its value vs complexity, which I have not seen spelled out. > > > > I feel that this, along with the prior discussions, spells it out > > sufficiently given the patch's complexity looks to be reasonably minor > > and very similar to the existing things that pg_upgrade already does. > > Had pg_upgrade done this in the first place, I don't think there would > > have been nearly this amount of discussion about it. > > Well, there is a reason pg_upgrade didn't
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Aug 26, 2021 at 12:37:19PM -0400, Robert Haas wrote: > On Thu, Aug 26, 2021 at 11:48 AM Bruce Momjian wrote: > > I am find to add it if it is minor, but I want to see the calculus of > > its value vs complexity, which I have not seen spelled out. > > I don't think it's going to be all that complicated, but we're going > to have to wait until we have something closer to a final patch before > we can really evaluate that. I am honestly a little puzzled about why > you think complexity is such a big issue for this patch in particular. > I feel we do probably several hundred things every release cycle that > are more complicated than this, so it doesn't seem like this is > particularly extraordinary or needs a lot of extra scrutiny. I do > think there is some risk that there are messy cases we can't handle > cleanly, but if that becomes an issue then I'll abandon the effort > until a solution can be found. I'm not trying to relentlessly drive > something through that is a bad idea on principle. > > I agree with all Stephen's comments, too. I just don't want to add requirements/complexity to pg_upgrade without clearly stated reasons because future database changes will need to honor this new preservation behavior. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Aug 26, 2021 at 12:34:56PM -0400, Stephen Frost wrote: > * Bruce Momjian (br...@momjian.us) wrote: > > On Thu, Aug 26, 2021 at 11:36:51AM -0400, Stephen Frost wrote: > > > * Bruce Momjian (br...@momjian.us) wrote: > > > > On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote: > > > > > Anyone see a flaw in that analysis? > > > > > > > > I am still waiting to hear the purpose of this preservation. As long as > > > > you don't apply the patch, I guess I will just stop asking. > > > > > > I'm a bit confused why this question keeps coming up as we've discussed > > > multiple reasons (incremental backups, possible use for TDE which would > > > > I have not seen much explaination on pgBackRest, except me mentioning > > it. Is this something really useful? > > Being able to quickly perform a backup on a newly upgraded cluster would > certainly be valuable and that's definitely not possible today due to > all of the filenames changing. You mean incremental backup, right? I was told this by the pgBackRest developers during PGCon, but I have not heard that stated publicly, so I hate to go just on what I heard rather than seeing that stated publicly. > > As far as TDE, I haven't seen any concrete plan for that, so why add > > this code for that reason? > > That this would help with TDE (of which there seems little doubt...) is > an additional benefit to this. Specifically, taking the existing work > that's already been done to allow block-by-block encryption and > adjusting it for AES-XTS and then using the db-dir+relfileno+block > number as the IV, just like many disk encryption systems do, avoids the > concerns that were brought up about using LSN for the IV with CTR and > it's certainly not difficult to do, but it does depend on this change. > This was all discussed previously and it sure looks like a sensible > approach to use that mirrors what many other systems already do > successfully. Well, I would think we would not add this for TDE until we were sure someone was working on adding TDE. > > > make this required, general improved sanity when working with pg_upgrade > > > is frankly a benefit in its own right too...). If the additional code > > > > How? I am not aware of any advantage except cosmetic. > > Having to resort to matching up inode numbers between the two clusters > after a pg_upgrade to figure out what files are actually the same > underneath is a pain that goes beyond just cosmetics imv. Removing that > additional level that admins, and developers for that matter, have to go > through would be a nice improvement on its own. OK, I was just not aware anyone did that, since I have never hard anyone complain about it before. > > > was a huge burden or even a moderate one then that might be an argument > > > against, but it hardly sounds like it will be given Robert's thorough > > > analysis so far and the (admittedly not complete, but not that far from > > > it based on the DB OID review) proposed patch. > > > > I am find to add it if it is minor, but I want to see the calculus of > > its value vs complexity, which I have not seen spelled out. > > I feel that this, along with the prior discussions, spells it out > sufficiently given the patch's complexity looks to be reasonably minor > and very similar to the existing things that pg_upgrade already does. > Had pg_upgrade done this in the first place, I don't think there would > have been nearly this amount of discussion about it. Well, there is a reason pg_upgrade didn't initially do this --- because it adds complexity, and potentially makes future changes to pg_upgrade necessary if the server behavior changes. I am not saying this change is wrong, but I think the reasons need to be stated in this thread, rather than just moving forward. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Thu, Aug 26, 2021 at 11:39 AM Stephen Frost wrote: > > This looks like a pretty good analysis to me. As it relates to the > > question about allowing users to specify an OID, I'd be inclined to > > allow it but only for OIDs >64k. We've certainly reserved things in the > > past and I don't see any issue with having that reservation here, but if > > we're going to build the capability to specify the OID into CREATE > > DATABASE then it seems a bit odd to disallow users from using it, as > > long as we're preventing them from causing problems with it. > > > > Are there issues that you see with allowing users to specify the OID > > even with the >64k restriction..? I can't think of one offhand but > > perhaps I'm missing something. > > So I actually should have said 16k here, not 64k, as somebody already > pointed out to me off-list. Whee! Hah, yes, of course. > I don't know of a reason not to let people do that, other than that it > seems like an attractive nuisance. People will do it and it will fail > because they chose a duplicate OID, or they'll complain that a regular > dump and restore didn't preserve their database OIDs, or maybe they'll > expect that they can copy a database from one cluster to another > because they gave it the same OID! That said, I don't see a great harm > in it. It just seems to me like exposing knobs to users that don't > seem to have any legitimate use may be borrowing trouble. We're going to have to gate this somehow to allow the OIDs under 16k to be used, so it seems like what you're suggesting is that we have that gate in place but then allow any OID to be used if you've crossed that gate? That is, if we do something like: SELECT pg_catalog.binary_upgrade_allow_setting_db_oid(); CREATE DATABASE blah WITH OID 1234; for pg_upgrade, well, users who are interested may well figure out how to do that themselves if they decide they want to set the OID, whereas if it 'just works' provided they don't try to use an OID too low then maybe they won't try to bypass the restriction against using system OIDs..? Ok, I'll give you that this is a stretch and I'm on the fence about if it's worthwhile or not to include and document and if, as you say, it's inviting trouble to allow users to set it. Users do seem to have a knack for finding things even when they aren't documented and then we get to deal with those complaints too. :) Perhaps others have some stronger feelings one way or another. Thanks, Stephen signature.asc Description: PGP signature
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Aug 26, 2021 at 11:48 AM Bruce Momjian wrote: > I am find to add it if it is minor, but I want to see the calculus of > its value vs complexity, which I have not seen spelled out. I don't think it's going to be all that complicated, but we're going to have to wait until we have something closer to a final patch before we can really evaluate that. I am honestly a little puzzled about why you think complexity is such a big issue for this patch in particular. I feel we do probably several hundred things every release cycle that are more complicated than this, so it doesn't seem like this is particularly extraordinary or needs a lot of extra scrutiny. I do think there is some risk that there are messy cases we can't handle cleanly, but if that becomes an issue then I'll abandon the effort until a solution can be found. I'm not trying to relentlessly drive something through that is a bad idea on principle. I agree with all Stephen's comments, too. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Greetings, * Bruce Momjian (br...@momjian.us) wrote: > On Thu, Aug 26, 2021 at 11:36:51AM -0400, Stephen Frost wrote: > > * Bruce Momjian (br...@momjian.us) wrote: > > > On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote: > > > > Anyone see a flaw in that analysis? > > > > > > I am still waiting to hear the purpose of this preservation. As long as > > > you don't apply the patch, I guess I will just stop asking. > > > > I'm a bit confused why this question keeps coming up as we've discussed > > multiple reasons (incremental backups, possible use for TDE which would > > I have not seen much explaination on pgBackRest, except me mentioning > it. Is this something really useful? Being able to quickly perform a backup on a newly upgraded cluster would certainly be valuable and that's definitely not possible today due to all of the filenames changing. > As far as TDE, I haven't seen any concrete plan for that, so why add > this code for that reason? That this would help with TDE (of which there seems little doubt...) is an additional benefit to this. Specifically, taking the existing work that's already been done to allow block-by-block encryption and adjusting it for AES-XTS and then using the db-dir+relfileno+block number as the IV, just like many disk encryption systems do, avoids the concerns that were brought up about using LSN for the IV with CTR and it's certainly not difficult to do, but it does depend on this change. This was all discussed previously and it sure looks like a sensible approach to use that mirrors what many other systems already do successfully. > > make this required, general improved sanity when working with pg_upgrade > > is frankly a benefit in its own right too...). If the additional code > > How? I am not aware of any advantage except cosmetic. Having to resort to matching up inode numbers between the two clusters after a pg_upgrade to figure out what files are actually the same underneath is a pain that goes beyond just cosmetics imv. Removing that additional level that admins, and developers for that matter, have to go through would be a nice improvement on its own. > > was a huge burden or even a moderate one then that might be an argument > > against, but it hardly sounds like it will be given Robert's thorough > > analysis so far and the (admittedly not complete, but not that far from > > it based on the DB OID review) proposed patch. > > I am find to add it if it is minor, but I want to see the calculus of > its value vs complexity, which I have not seen spelled out. I feel that this, along with the prior discussions, spells it out sufficiently given the patch's complexity looks to be reasonably minor and very similar to the existing things that pg_upgrade already does. Had pg_upgrade done this in the first place, I don't think there would have been nearly this amount of discussion about it. Thanks, Stephen signature.asc Description: PGP signature
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Aug 26, 2021 at 11:39 AM Stephen Frost wrote: > This looks like a pretty good analysis to me. As it relates to the > question about allowing users to specify an OID, I'd be inclined to > allow it but only for OIDs >64k. We've certainly reserved things in the > past and I don't see any issue with having that reservation here, but if > we're going to build the capability to specify the OID into CREATE > DATABASE then it seems a bit odd to disallow users from using it, as > long as we're preventing them from causing problems with it. > > Are there issues that you see with allowing users to specify the OID > even with the >64k restriction..? I can't think of one offhand but > perhaps I'm missing something. So I actually should have said 16k here, not 64k, as somebody already pointed out to me off-list. Whee! I don't know of a reason not to let people do that, other than that it seems like an attractive nuisance. People will do it and it will fail because they chose a duplicate OID, or they'll complain that a regular dump and restore didn't preserve their database OIDs, or maybe they'll expect that they can copy a database from one cluster to another because they gave it the same OID! That said, I don't see a great harm in it. It just seems to me like exposing knobs to users that don't seem to have any legitimate use may be borrowing trouble. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Aug 26, 2021 at 11:36:51AM -0400, Stephen Frost wrote: > Greetings, > > * Bruce Momjian (br...@momjian.us) wrote: > > On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote: > > > Anyone see a flaw in that analysis? > > > > I am still waiting to hear the purpose of this preservation. As long as > > you don't apply the patch, I guess I will just stop asking. > > I'm a bit confused why this question keeps coming up as we've discussed > multiple reasons (incremental backups, possible use for TDE which would I have not seen much explaination on pgBackRest, except me mentioning it. Is this something really useful? As far as TDE, I haven't seen any concrete plan for that, so why add this code for that reason? > make this required, general improved sanity when working with pg_upgrade > is frankly a benefit in its own right too...). If the additional code How? I am not aware of any advantage except cosmetic. > was a huge burden or even a moderate one then that might be an argument > against, but it hardly sounds like it will be given Robert's thorough > analysis so far and the (admittedly not complete, but not that far from > it based on the DB OID review) proposed patch. I am find to add it if it is minor, but I want to see the calculus of its value vs complexity, which I have not seen spelled out. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Aug 26, 2021 at 11:35:01AM -0400, Robert Haas wrote: > On Thu, Aug 26, 2021 at 11:24 AM Bruce Momjian wrote: > > On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote: > > > Anyone see a flaw in that analysis? > > > > I am still waiting to hear the purpose of this preservation. As long as > > you don't apply the patch, I guess I will just stop asking. > > You make it sound like I didn't answer that question the last time you > asked it, but I did.[1] I went back to the previous thread and found > that, in fact, there's at least one email *from you* appearing to > endorse that concept for reasons unrelated to TDE[2] and another where > you appear to agree that it would be useful for TDE to do it.[3] > Stephen Frost also wrote up his discussion during the Unconference and > some of his reasons for liking the idea.[4] > > If you've changed your mind about this being a good idea, or if you no > longer think it's useful without TDE, that's fine. Everyone is > entitled to change their opinion. But then please say that straight > out. It baffles me why you're now acting as if it hasn't been > discussed when it clearly has been, and both you and I were > participants in that discussion. > > [1] > https://www.postgresql.org/message-id/ca+tgmob7msyh3vray87usr22uakvvsyy4zbaqw2ao2cfoud...@mail.gmail.com > [2] https://www.postgresql.org/message-id/20210601140949.gc22...@momjian.us > [3] https://www.postgresql.org/message-id/20210527210023.gj5...@momjian.us > [4] > https://www.postgresql.org/message-id/20210531201652.gy20...@tamriel.snowman.net Yes, it would help incremental backup of pgBackRest, as reported by the developers. However, I have seen no discussion if this is useful enough reason to add the complexity to preserve this. The TODO list shows "Desirability" as the first item to be discussed, so I expected that to be discussed first. Also, with TDE not progressing (and my approach not even needing this), I have not seen a full discussion if this item is desirable based on its complexity. What I did see is this patch appear with no context of why it is useful given our current plans, except for pgBackRest, which I think I mentioned. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Tue, Aug 17, 2021 at 2:50 PM Robert Haas wrote: > > > Less sure that this is a good idea, though. In particular, I do not > > > think that you can make it work in the face of > > > alter database template1 rename to oops; > > > create database template1; > > > > That is a really good point. If we can't categorically force the OID > > of those databases to have a particular, fixed value, and based on > > this example that seems to be impossible, then there's always a > > possibility that we might find a value in the old cluster that doesn't > > happen to match what is present in the new cluster. Seen from that > > angle, the problem is really with databases that are pre-existent in > > the new cluster but whose contents still need to be dumped. Maybe we > > could (optionally? conditionally?) drop those databases from the new > > cluster and then recreate them with the OID that we want them to have. > > Actually, we do that already. create_new_objects() runs pg_restore > with --create for most databases, but with --clean --create for > template1 and postgres. This means that template1 and postgres will > always be recreated in the new cluster, and other databases are > assumed not to exist in the new cluster and the upgrade will fail if > they unexpectedly do. And the reason why pg_upgrade does that is that > it wants to "propagate [the] database-level properties" of postgres > and template1. So suppose we just make the database OID one of the > database-level properties that we want to propagate. That should > mostly just work, but where can things go wrong? > > The only real failure mode is we try to create a database in the new > cluster and find out that the OID is already in use. If the new OID > that collides >64k, then the user has messed with the new cluster > before doing that. And since pg_upgrade is pretty clearly already > assuming that you shouldn't do that, it's fine to also make that > assumption in this case. We can disregard such cases as user error. > > If the new OID that collides is <64k, then it must be colliding with > template0, template1, or postgres in the new cluster, because those > are the only databases that can have such OIDs since, currently, we > don't allow users to specify an OID for a new database. And the > problem cannot be with template1, because we hard-code its OID to 1. > If there is a database with OID 1 in either cluster, it must be > template1, and if there is a database with OID 1 in both clusters, it > must be template1 in both cases, and we'll just drop and recreate it > with OID 1 and everything is fine. So we need only consider template0 > and postgres, which are created with system-generated OIDs. And, it > would be no issue if either of those databases had the same OID in the > old and new cluster, so the only possible OID collision is one where > the same system-generated OID was assigned to one of those databases > in the old cluster and to the other in the new cluster. > > First consider the case where template0 has OID, say, 13000, in the > old cluster, and postgres has that OID in the new cluster. No problem > occurs, because template0 isn't transferred anyway. The reverse > direction is a problem, though. If postgres had been assigned OID > 13000 in the old cluster and, by sheer chance, template0 had that OID > in the new cluster, then the upgrade would fail, because it wouldn't > be able to recreate the postgres database with the correct OID. > > But that doesn't seem very difficult to fix. I think all we need to do > is have initdb assign a fixed OID to template0 at creation time. Then, > in any new release to which someone might be trying to upgrade, the > system-generated OID assigned to postgres in the old release can't > match the fixed OID assigned to template0 in the new release, so the > one problem case is ruled out. We do need, however, to make sure that > the assign-my-database-a-fixed-OID syntax is either entirely > restricted to initdb & pg_upgrade or at least that OIDS < 64k can only > be assigned in one of those modes. Otherwise, some creative person > could manufacture new problem cases by setting up the source database > so that the OID of one of their databases matches the fixed OID we > gave to template0 or template1, or the system-generated OID for > postgres in the new cluster. > > In short, as far as I can see, all we need to do to preserve database > OIDs across pg_upgrade is: > > 1. Add a new syntax for creating a database with a given OID, and use > it in pg_dump --binary-upgrade. > 2. Don't let users use it at least for OIDs <64k, or maybe just don't > let them use it at all. > 3. But let initdb use it, and have initdb set the initial OID for > template0 to a fixed value < 1. If the user changes it later, no > problem; the cluster into which they are upgrading won't contain any > databases with high-numbered OIDs. > > Anyone see a
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Greetings, * Bruce Momjian (br...@momjian.us) wrote: > On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote: > > Anyone see a flaw in that analysis? > > I am still waiting to hear the purpose of this preservation. As long as > you don't apply the patch, I guess I will just stop asking. I'm a bit confused why this question keeps coming up as we've discussed multiple reasons (incremental backups, possible use for TDE which would make this required, general improved sanity when working with pg_upgrade is frankly a benefit in its own right too...). If the additional code was a huge burden or even a moderate one then that might be an argument against, but it hardly sounds like it will be given Robert's thorough analysis so far and the (admittedly not complete, but not that far from it based on the DB OID review) proposed patch. Thanks, Stephen signature.asc Description: PGP signature
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Aug 26, 2021 at 11:24 AM Bruce Momjian wrote: > On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote: > > Anyone see a flaw in that analysis? > > I am still waiting to hear the purpose of this preservation. As long as > you don't apply the patch, I guess I will just stop asking. You make it sound like I didn't answer that question the last time you asked it, but I did.[1] I went back to the previous thread and found that, in fact, there's at least one email *from you* appearing to endorse that concept for reasons unrelated to TDE[2] and another where you appear to agree that it would be useful for TDE to do it.[3] Stephen Frost also wrote up his discussion during the Unconference and some of his reasons for liking the idea.[4] If you've changed your mind about this being a good idea, or if you no longer think it's useful without TDE, that's fine. Everyone is entitled to change their opinion. But then please say that straight out. It baffles me why you're now acting as if it hasn't been discussed when it clearly has been, and both you and I were participants in that discussion. [1] https://www.postgresql.org/message-id/ca+tgmob7msyh3vray87usr22uakvvsyy4zbaqw2ao2cfoud...@mail.gmail.com [2] https://www.postgresql.org/message-id/20210601140949.gc22...@momjian.us [3] https://www.postgresql.org/message-id/20210527210023.gj5...@momjian.us [4] https://www.postgresql.org/message-id/20210531201652.gy20...@tamriel.snowman.net -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote: > Anyone see a flaw in that analysis? I am still waiting to hear the purpose of this preservation. As long as you don't apply the patch, I guess I will just stop asking. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Aug 17, 2021 at 2:50 PM Robert Haas wrote: > > Less sure that this is a good idea, though. In particular, I do not > > think that you can make it work in the face of > > alter database template1 rename to oops; > > create database template1; > > That is a really good point. If we can't categorically force the OID > of those databases to have a particular, fixed value, and based on > this example that seems to be impossible, then there's always a > possibility that we might find a value in the old cluster that doesn't > happen to match what is present in the new cluster. Seen from that > angle, the problem is really with databases that are pre-existent in > the new cluster but whose contents still need to be dumped. Maybe we > could (optionally? conditionally?) drop those databases from the new > cluster and then recreate them with the OID that we want them to have. Actually, we do that already. create_new_objects() runs pg_restore with --create for most databases, but with --clean --create for template1 and postgres. This means that template1 and postgres will always be recreated in the new cluster, and other databases are assumed not to exist in the new cluster and the upgrade will fail if they unexpectedly do. And the reason why pg_upgrade does that is that it wants to "propagate [the] database-level properties" of postgres and template1. So suppose we just make the database OID one of the database-level properties that we want to propagate. That should mostly just work, but where can things go wrong? The only real failure mode is we try to create a database in the new cluster and find out that the OID is already in use. If the new OID that collides >64k, then the user has messed with the new cluster before doing that. And since pg_upgrade is pretty clearly already assuming that you shouldn't do that, it's fine to also make that assumption in this case. We can disregard such cases as user error. If the new OID that collides is <64k, then it must be colliding with template0, template1, or postgres in the new cluster, because those are the only databases that can have such OIDs since, currently, we don't allow users to specify an OID for a new database. And the problem cannot be with template1, because we hard-code its OID to 1. If there is a database with OID 1 in either cluster, it must be template1, and if there is a database with OID 1 in both clusters, it must be template1 in both cases, and we'll just drop and recreate it with OID 1 and everything is fine. So we need only consider template0 and postgres, which are created with system-generated OIDs. And, it would be no issue if either of those databases had the same OID in the old and new cluster, so the only possible OID collision is one where the same system-generated OID was assigned to one of those databases in the old cluster and to the other in the new cluster. First consider the case where template0 has OID, say, 13000, in the old cluster, and postgres has that OID in the new cluster. No problem occurs, because template0 isn't transferred anyway. The reverse direction is a problem, though. If postgres had been assigned OID 13000 in the old cluster and, by sheer chance, template0 had that OID in the new cluster, then the upgrade would fail, because it wouldn't be able to recreate the postgres database with the correct OID. But that doesn't seem very difficult to fix. I think all we need to do is have initdb assign a fixed OID to template0 at creation time. Then, in any new release to which someone might be trying to upgrade, the system-generated OID assigned to postgres in the old release can't match the fixed OID assigned to template0 in the new release, so the one problem case is ruled out. We do need, however, to make sure that the assign-my-database-a-fixed-OID syntax is either entirely restricted to initdb & pg_upgrade or at least that OIDS < 64k can only be assigned in one of those modes. Otherwise, some creative person could manufacture new problem cases by setting up the source database so that the OID of one of their databases matches the fixed OID we gave to template0 or template1, or the system-generated OID for postgres in the new cluster. In short, as far as I can see, all we need to do to preserve database OIDs across pg_upgrade is: 1. Add a new syntax for creating a database with a given OID, and use it in pg_dump --binary-upgrade. 2. Don't let users use it at least for OIDs <64k, or maybe just don't let them use it at all. 3. But let initdb use it, and have initdb set the initial OID for template0 to a fixed value < 1. If the user changes it later, no problem; the cluster into which they are upgrading won't contain any databases with high-numbered OIDs. Anyone see a flaw in that analysis? -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Aug 24, 2021 at 02:34:26PM -0400, Robert Haas wrote: > On Tue, Aug 24, 2021 at 2:16 PM Bruce Momjian wrote: > > One other issue --- the more that pg_upgrade preserves, the more likely > > pg_upgrade will break when some internal changes happen in Postgres. > > Therefore, if you want pg_upgrade to preserve something, you have to > > have a good reason --- even code simplicity might not be a sufficient > > reason. > > While I accept that as a general principle, I don't think it's really > applicable in this case. pg_upgrade already knows all about > relfilenodes; it has a source file called relfilenode.c. I don't see > that a pg_upgrade that preserves relfilenodes is any more or less > likely to break in the future than a pg_upgrade that renumbers all the > files so that the relation OID and the relfilenode are equal. You've > got about the same amount of reliance on the on-disk layout either > way. I was making more of a general statement that preservation can be problematic and its impact must be researched. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Aug 24, 2021 at 2:16 PM Bruce Momjian wrote: > One other issue --- the more that pg_upgrade preserves, the more likely > pg_upgrade will break when some internal changes happen in Postgres. > Therefore, if you want pg_upgrade to preserve something, you have to > have a good reason --- even code simplicity might not be a sufficient > reason. While I accept that as a general principle, I don't think it's really applicable in this case. pg_upgrade already knows all about relfilenodes; it has a source file called relfilenode.c. I don't see that a pg_upgrade that preserves relfilenodes is any more or less likely to break in the future than a pg_upgrade that renumbers all the files so that the relation OID and the relfilenode are equal. You've got about the same amount of reliance on the on-disk layout either way. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Aug 24, 2021 at 12:43:20PM -0400, Bruce Momjian wrote: > Yes, it is a trade-off. If we had pg_upgrade create the new cluster, > the pg_upgrade instructions would be simpler, but pg_upgrade would be > more complex since it has to adjust _everything_ properly so pg_upgrade > works --- I never got to that point, but I am willing to explore what > would be required. One other issue --- the more that pg_upgrade preserves, the more likely pg_upgrade will break when some internal changes happen in Postgres. Therefore, if you want pg_upgrade to preserve something, you have to have a good reason --- even code simplicity might not be a sufficient reason. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Tue, Aug 24, 2021 at 12:43 PM Bruce Momjian wrote: > > Yes, it is a trade-off. If we had pg_upgrade create the new cluster, > > the pg_upgrade instructions would be simpler, but pg_upgrade would be > > more complex since it has to adjust _everything_ properly so pg_upgrade > > works --- I never got to that point, but I am willing to explore what > > would be required. > > It's probably a topic for another thread, rather than this one, but I > think that would be very cool. Yes, definite +1 on this. Thanks, Stephen signature.asc Description: PGP signature
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Aug 24, 2021 at 12:43 PM Bruce Momjian wrote: > Yes, it is a trade-off. If we had pg_upgrade create the new cluster, > the pg_upgrade instructions would be simpler, but pg_upgrade would be > more complex since it has to adjust _everything_ properly so pg_upgrade > works --- I never got to that point, but I am willing to explore what > would be required. It's probably a topic for another thread, rather than this one, but I think that would be very cool. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Aug 23, 2021 at 5:12 PM Stephen Frost wrote: > > Regarding that ... I have to wonder just what promises we feel we've > > made when it comes to what a user is expected to be able to do with the > > new cluster *before* pg_upgrade is run on it. For my part, I sure feel > > like it's "nothing", in which case it seems like we can do things that > > we can't do with a running system, like literally just DROP and recreate > > with the correct OID of any databases we need to, or even push that back > > to the user to do that at initdb time with some kind of error thrown by > > pg_upgrade during the --check phase. "Initial databases have > > non-standard OIDs, recreate destination cluster with initdb > > --with-oid=12341" or something along those lines. > > Yeah, possibly. Honestly, I find it weird that pg_upgrade expects the > new cluster to already exist. It seems like it would be more sensible > if it created the cluster itself. That's not entirely trivial, because > for example you have to create it with the correct locale settings and > stuff. But if you require the cluster to exist already, then you run > into the kinds of questions that you're asking here, and whether the > answer is "nothing" as you propose here or something more than that, > it's clearly not "whatever you want" nor anything close to that. Yeah, I'd had a similar thought and also tend to agree that it'd make more sense for pg_upgrade to set up the new cluster too, and doing so in a way that makes sure that it matches the old cluster as that's rather important. Having the user do it also implies that there is some freedom for the user to mess around with the new cluster before running pg_upgrade, it seems to me anyway, and that's certainly not something that we've built anything into pg_upgrade to deal with cleanly.. It isn't like initdb takes all *that* long to run either, and reducing the number of steps that the user has to take to perform an upgrade sure seems like a good thing to do. Anyhow, just wanted to throw that out there as another way we might approach this. Thanks, Stephen signature.asc Description: PGP signature
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Aug 24, 2021 at 12:04 PM Tom Lane wrote: > Per upthread discussion, it seems impractical to fully guarantee > that database OIDs match, which seems to mean that the whole premise > collapses. Like Bruce, I want to see a plausible use case justifying > any partial-guarantee scenario before we add more complication (= bugs) > to pg_upgrade. I think you might be overlooking the emails from Stephen and I where we suggested how that could be made to work? -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Aug 24, 2021 at 11:24:21AM -0400, Robert Haas wrote: > On Mon, Aug 23, 2021 at 5:12 PM Stephen Frost wrote: > > Regarding that ... I have to wonder just what promises we feel we've > > made when it comes to what a user is expected to be able to do with the > > new cluster *before* pg_upgrade is run on it. For my part, I sure feel > > like it's "nothing", in which case it seems like we can do things that > > we can't do with a running system, like literally just DROP and recreate > > with the correct OID of any databases we need to, or even push that back > > to the user to do that at initdb time with some kind of error thrown by > > pg_upgrade during the --check phase. "Initial databases have > > non-standard OIDs, recreate destination cluster with initdb > > --with-oid=12341" or something along those lines. > > Yeah, possibly. Honestly, I find it weird that pg_upgrade expects the > new cluster to already exist. It seems like it would be more sensible > if it created the cluster itself. That's not entirely trivial, because > for example you have to create it with the correct locale settings and > stuff. But if you require the cluster to exist already, then you run > into the kinds of questions that you're asking here, and whether the > answer is "nothing" as you propose here or something more than that, > it's clearly not "whatever you want" nor anything close to that. Yes, it is a trade-off. If we had pg_upgrade create the new cluster, the pg_upgrade instructions would be simpler, but pg_upgrade would be more complex since it has to adjust _everything_ properly so pg_upgrade works --- I never got to that point, but I am willing to explore what would be required. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Aug 24, 2021 at 11:28:37AM -0400, Robert Haas wrote: > On Mon, Aug 23, 2021 at 8:29 PM Bruce Momjian wrote: > > I assume this patch is not going to be applied until there is an actual > > use case for preserving these values. > > My interpretation of the preceding discussion was that several people > thought this change was a good idea regardless of whether anything > ever happens with TDE, so I wasn't seeing a reason to wait. > Personally, I've always thought that it was quite odd that pg_upgrade > didn't preserve the relfilenode values, so I'm in favor of the change. > I bet we could even make some simplifications to that code if we got > all of this sorted out, which seems like it would be nice. Yes, if this ends up being a cleanup with no added complexity, that would be nice, but I had not seen how that was possible in the psat. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Aug 24, 2021 at 12:04:00PM -0400, Tom Lane wrote: > Robert Haas writes: > > On Mon, Aug 23, 2021 at 8:29 PM Bruce Momjian wrote: > >> I assume this patch is not going to be applied until there is an actual > >> use case for preserving these values. > > > ... > > > That being said, if you or somebody else thinks that this is a bad > > idea or that the reasons offered up until now are insufficient, feel > > free to make that argument. I just work here... > > Per upthread discussion, it seems impractical to fully guarantee > that database OIDs match, which seems to mean that the whole premise > collapses. Like Bruce, I want to see a plausible use case justifying > any partial-guarantee scenario before we add more complication (= bugs) > to pg_upgrade. Yes, pg_upgrade is already complex enough, so why add more complexity for some cosmetic value. (I think "cosmetic" flew out the window with pg_upgrade long ago. ;-) ) I know that pgBackRest has asked for stable relfilenodes to make incremental file system backups after pg_upgrade smaller, but if we want to make relfilenodes stable, we had better understand that is _why_ we are adding this complexity. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Robert Haas writes: > On Mon, Aug 23, 2021 at 8:29 PM Bruce Momjian wrote: >> I assume this patch is not going to be applied until there is an actual >> use case for preserving these values. > ... > That being said, if you or somebody else thinks that this is a bad > idea or that the reasons offered up until now are insufficient, feel > free to make that argument. I just work here... Per upthread discussion, it seems impractical to fully guarantee that database OIDs match, which seems to mean that the whole premise collapses. Like Bruce, I want to see a plausible use case justifying any partial-guarantee scenario before we add more complication (= bugs) to pg_upgrade. regards, tom lane
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Mon, Aug 23, 2021 at 8:29 PM Bruce Momjian wrote: > I assume this patch is not going to be applied until there is an actual > use case for preserving these values. My interpretation of the preceding discussion was that several people thought this change was a good idea regardless of whether anything ever happens with TDE, so I wasn't seeing a reason to wait. Personally, I've always thought that it was quite odd that pg_upgrade didn't preserve the relfilenode values, so I'm in favor of the change. I bet we could even make some simplifications to that code if we got all of this sorted out, which seems like it would be nice. I think it was also mentioned that this might be nice for pgBackRest, which apparently permits incremental backups across major version upgrades but likes filenames to match. That being said, if you or somebody else thinks that this is a bad idea or that the reasons offered up until now are insufficient, feel free to make that argument. I just work here... -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Mon, Aug 23, 2021 at 5:12 PM Stephen Frost wrote: > Regarding that ... I have to wonder just what promises we feel we've > made when it comes to what a user is expected to be able to do with the > new cluster *before* pg_upgrade is run on it. For my part, I sure feel > like it's "nothing", in which case it seems like we can do things that > we can't do with a running system, like literally just DROP and recreate > with the correct OID of any databases we need to, or even push that back > to the user to do that at initdb time with some kind of error thrown by > pg_upgrade during the --check phase. "Initial databases have > non-standard OIDs, recreate destination cluster with initdb > --with-oid=12341" or something along those lines. Yeah, possibly. Honestly, I find it weird that pg_upgrade expects the new cluster to already exist. It seems like it would be more sensible if it created the cluster itself. That's not entirely trivial, because for example you have to create it with the correct locale settings and stuff. But if you require the cluster to exist already, then you run into the kinds of questions that you're asking here, and whether the answer is "nothing" as you propose here or something more than that, it's clearly not "whatever you want" nor anything close to that. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Aug 24, 2021 at 5:59 AM Bruce Momjian wrote: > > On Mon, Aug 23, 2021 at 04:57:31PM -0400, Robert Haas wrote: > > On Fri, Aug 20, 2021 at 1:36 PM Shruthi Gowda wrote: > > > Thanks Robert for your comments. > > > I have split the patch into two portions. One that handles DB OID and > > > the other that > > > handles tablespace OID and relfilenode OID. > > > > It's pretty clear from the discussion, I think, that the database OID > > one is going to need rework to be considered. > > I assume this patch is not going to be applied until there is an actual > use case for preserving these values. JFI, I added an entry into commit fest for this patch. link: https://commitfest.postgresql.org/34/3296/
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Mon, Aug 23, 2021 at 04:57:31PM -0400, Robert Haas wrote: > On Fri, Aug 20, 2021 at 1:36 PM Shruthi Gowda wrote: > > Thanks Robert for your comments. > > I have split the patch into two portions. One that handles DB OID and > > the other that > > handles tablespace OID and relfilenode OID. > > It's pretty clear from the discussion, I think, that the database OID > one is going to need rework to be considered. I assume this patch is not going to be applied until there is an actual use case for preserving these values. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Fri, Aug 20, 2021 at 1:36 PM Shruthi Gowda wrote: > > Thanks Robert for your comments. > > I have split the patch into two portions. One that handles DB OID and > > the other that > > handles tablespace OID and relfilenode OID. > > It's pretty clear from the discussion, I think, that the database OID > one is going to need rework to be considered. Regarding that ... I have to wonder just what promises we feel we've made when it comes to what a user is expected to be able to do with the new cluster *before* pg_upgrade is run on it. For my part, I sure feel like it's "nothing", in which case it seems like we can do things that we can't do with a running system, like literally just DROP and recreate with the correct OID of any databases we need to, or even push that back to the user to do that at initdb time with some kind of error thrown by pg_upgrade during the --check phase. "Initial databases have non-standard OIDs, recreate destination cluster with initdb --with-oid=12341" or something along those lines. Also open to the idea of simply forcing 'template1' to always being OID=1 even if it's dropped/recreated and then just dropping/recreating the template0 and postgres databases if they've got different OIDs than what the old cluster did- after all, they should be getting entirely re-populated as part of the pg_upgrade process itself. Thanks, Stephen signature.asc Description: PGP signature
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Fri, Aug 20, 2021 at 1:36 PM Shruthi Gowda wrote: > Thanks Robert for your comments. > I have split the patch into two portions. One that handles DB OID and > the other that > handles tablespace OID and relfilenode OID. It's pretty clear from the discussion, I think, that the database OID one is going to need rework to be considered. Regarding the other one: - The comment in binary_upgrade_set_pg_class_oids() is still not accurate. You removed the sentence which says "Indexes cannot have toast tables, so we need not make this probe in the index code path" but the immediately preceding sentence is still inaccurate in at least two ways. First, it only talks about tables, but the code now applies to indexes. Second, it only talks about OIDs, but now also deals with refilenodes. It's really important to fully update every comment that might be affected by your changes! - The SQL query in that function isn't completely correct. There is a left join from pg_class to pg_index whose ON clause includes "c.reltoastrelid = i.indrelid AND i.indisvalid." The reason it's likely that is because it is possible, in corner cases, for a TOAST table to have multiple TOAST indexes. I forget exactly how that happens, but I think it might be like if a REINDEX CONCURRENTLY on the TOAST table fails midway through, or something of that sort. Now if that happens, the LEFT JOIN you added is going to cause the output to contain multiple rows, because you didn't replicate the i.indisvalid condition into that ON clause. And then it will fail. Apparently we don't have a pg_upgrade test case for this scenario; we probably should. Actually what I think would be even better than putting i.indisvalid into that ON clause would be to join off of i.indrelid rather than c.reltoastrelid. - The code that decodes the various columns of this query does so in a slightly different order than the query itself. It would be better to make it match. Perhaps put relkind first in both cases. I might also think about trying to make the column naming a bit more consistent, e.g. relkind, relfilenode, toast_oid, toast_relfilenode, toast_index_oid, toast_index_relfilenode. - In heap_create(), the wording of the error messages is not quite consistent. You have "relfilenode value not set when in binary upgrade mode", "toast relfilenode value not set when in binary upgrade mode", and "pg_class index relfilenode value not set when in binary upgrade mode". Why does the last one mention pg_class when the other two don't? - The code in heap_create() now has no comments whatsoever, which is a shame, because it's actually kind of a tricky bit of logic. Someone might wonder why we override the relfilenode inside that function instead of doing it at the same places where we absorb binary_upgrade_next_{heap,index,toast}_pg_class_oid and the passing down the relfilenode. I think the answer is that passing down the relfilenode from the caller would result in storage not actually being created, whereas in this case we want it to be created but just with the value we specify, and the reason we want that is because we need later DDL that happens after these statements but before the old cluster's relations are moved over to execute successfully, which it won't if the storage is altogether absent. However, that raises the question of whether this patch has even got the basic design right. Maybe we ought to actually be absorbing the relfilenode setting at the same places where we're doing so for the OID, and then passing an additional parameter to heap_create() like bool suppress_storage or something like that. Maybe, taking it even further, we ought to be changing the signatures of binary_upgrade_next_heap_pg_class_oid and friends to be two-argument functions, and pass down the OID and the relfilenode in the same call, rather than calling two separate functions. I'm not so much concerned about the cost of calling two functions as the potential for confusion. I'm not honestly sure that either of these changes are the right thing to do, but I am pretty strongly inclined to do at least the first part - trying to absorb reloid and relfilenode in the same places. If we're not going to do that we certainly need to explain why we're doing it the way we are in the comments. It's not really this patch's fault, but it would sure be nice if we had some better testing for this area. Suppose this patch somehow changed nothing from the present behavior. How would we know? Or suppose it managed to somehow set all the relfilenodes in the new cluster to random values rather than the intended one? There's no automated testing that would catch any of that, and it's not obvious how it could be added to test.sh. I suppose what we really need to do at some point is rewrite that as a TAP test, but that seems like a separate project from this patch. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
> The rest of this email will be detailed review comments on the patch > as presented, and thus probably only interesting to someone actually > working on the patch. Feel free to skip if that's not you. > > - I suggest splitting the patch into one portion that deals with > database OID and another portion that deals with tablespace OID and > relfilenode OID, or maybe splitting it all the way into three separate > patches, one for each. This could allow the uncontroversial parts to > get committed first while we're wondering what to do about the problem > described above. Thanks Robert for your comments. I have split the patch into two portions. One that handles DB OID and the other that handles tablespace OID and relfilenode OID. > - There are two places in the patch, one in dumpDatabase() and one in > generate_old_dump() where blank lines are removed with no other > changes. Please avoid whitespace-only hunks. These changes are avoided. > - If possible, please try to pgindent the new code. It's pretty good > what you did, but e.g. the declaration of > binary_upgrade_next_pg_tablespace_oid probably has less whitespace > than pgindent is going to want. Taken care in latest patches > - The comments in dumpDatabase() claim that "postgres" and "template1" > are handled specially in some way, but there seems to be no code that > matches those comments. The comment is removed. > - heap_create()'s logic around setting create_storage looks slightly > redundant. I'm not positive what would be better, but ... suppose you > just took the part that's currently gated by if (!IsBinaryUpgrade) and > did it unconditionally. Then put if (IsBinaryUpgrade) around the else > clause, but delete the last bit from there that sets create_storage. > Maybe we'd still want a comment saying that it's intentional that > create_storage = true even though it will be overwritten later, but > then, I think, we wouldn't need to set create_storage in two different > places. Maybe I'm wrong. > > - If we're not going to do that, then I think you should swap the if > and else clauses and reverse the sense of the test. In createdb(), > CreateTableSpace(), and a bunch of existing places, we do if > (IsBinaryUpgrade) { ... } else { ... } so I don't think it makes sense > for this one to instead do if (!IsBinaryUpgrade) { ... } else { ... }. I have avoided the redundant code and removed the comment as it does not make sense now that we are setting the create_storage conditionally. (In the original patch, create_storage was set to TRUE by default for binary upgrade case which was wrong and was hitting assert in the following flow). > - I'm not sure that I'd bother renaming > binary_upgrade_set_pg_class_oids_and_relfilenodes(). It's such a long > name, and a relfilenode is kind of an OID, so the current name isn't > even really wrong. I'd probably drop the header comment too, since it > seems rather obvious. But both of these things are judgement calls. I agree. I have retained the old function name. > - Inside that function, there is a comment that says "Indexes cannot > have toast tables, so we need not make this probe in the index code > path." However, you have moved the code from someplace where it didn't > happen for indexes to someplace where it happens for both tables and > indexes. Therefore the comment, which was true when the code was where > it was before, is now false. So you need to update it. The comment is updated. > - It is not clear to me why pg_upgrade's Makefile needs to be changed > to include -DFRONTEND in CPPFLAGS. All of the .c files in this > directory include postgres_fe.h rather than postgres.h, and that file > has #define FRONTEND 1. Moreover, there are no actual code changes in > this directory, so why should the Makefile need any change? Makefile change is removed. > - A couple of comment changes - and the commit message - mention data > encryption, but that's not a feature that this patch implements, nor > are we committed to adding it in the immediate future (or ever, > really). So I think those places should be revised to say that we do > this because we want the filenames to match between the old and new > clusters, and leave the reasons why that might be a good thing up to > the reader's imagination. Taken care. Regards, Shruthi KC EnterpriseDB: http://www.enterprisedb.com v2-0001-Preserve-database-OIDs-in-pg_upgrade.patch Description: Binary data v2-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg.patch Description: Binary data
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Aug 17, 2021 at 1:54 PM Tom Lane wrote: > Right. If pg_upgrade explicitly ignores template0 then its OID > need not be stable ... at least, not unless there's a chance it > could conflict with some other database OID, which would become > a live possibility if we let users get at "WITH OID = n". Well, that might be a good reason not to let them do that, then, at least for n<64k. > > In fact, I'd probably go so far as to > > hardcode that in such a way that even if you drop those databases and > > recreate them, they get recreated with the same hard-coded OID. > > Less sure that this is a good idea, though. In particular, I do not > think that you can make it work in the face of > alter database template1 rename to oops; > create database template1; That is a really good point. If we can't categorically force the OID of those databases to have a particular, fixed value, and based on this example that seems to be impossible, then there's always a possibility that we might find a value in the old cluster that doesn't happen to match what is present in the new cluster. Seen from that angle, the problem is really with databases that are pre-existent in the new cluster but whose contents still need to be dumped. Maybe we could (optionally? conditionally?) drop those databases from the new cluster and then recreate them with the OID that we want them to have. > > The only hard requirement for this feature is if we > > use the database OID for some kind of encryption or integrity checking > > or checksum type feature. > > It's fairly unclear to me why that is so important as to justify the > amount of klugery that this line of thought seems to be bringing. Well, I think it would make sense to figure out how small we can make the kludge first, and then decide whether it's larger than we can tolerate. From my point of view, I completely understand why people to whom those kinds of features are important want to include all the fields that make up a buffer tag in the checksum or other integrity check. Right now, if somebody copies a page from one place to another, or if the operating system fumbles things and switches some pages around, we have no direct way of detecting that anything bad has happened. This is not the only problem that would need to be solved in order to fix that, but it's one of them, and I don't particularly see why it's not a valid goal. It's not as if a 16-bit checksum that is computed in exactly the same way for every page in the cluster is such state-of-the-art technology that only fools question its surpassing excellence. -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Aug 17, 2021 at 11:56:30AM -0400, Robert Haas wrote: > On Wed, Aug 11, 2021 at 3:41 AM Shruthi Gowda wrote: > > I have fixed all the issues and now the patch is working as expected. > > Hi, > > I'm changing the subject line since the patch does something which was > discussed on that thread but isn't really related to the old email > subject. In general, I think this patch is uncontroversial and in > reasonably good shape. However, there's one part that I'm not too sure > about. If Tom Lane happens to be paying attention to this thread, I > think his feedback would be particularly useful, since he knows a lot > about the inner workings of pg_dump. Opinions from anybody else would > be great, too. Anyway, here's the hunk that worries me: What is the value of preserving db/ts/relfilenode OIDs? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Aug 17, 2021 at 11:07 PM Robert Haas wrote: > > On Tue, Aug 17, 2021 at 12:42 PM Tom Lane wrote: > > Actually though ... I've not read the patch, but what does it do about > > the fact that the postgres and template0 DBs do not have stable OIDs? > > I cannot imagine any way to force those to match across PG versions > > that would not be an unsustainable crock. > > Well, it's interesting that you mention that, because there's a > comment in the patch that probably has to do with this: > > +/* > + * Make sure that pg_upgrade does not change database OID. Don't care > + * about "postgres" database, backend will assign it fixed OID anyway. > + * ("template1" has fixed OID too but the value 1 should not collide with > + * any other OID so backend pays no attention to it.) > + */ > In the original patch the author intended to avoid dumping the postgres DB OID like below: + if (dopt->binary_upgrade && dbCatId.oid != PostgresDbOid) Since postgres OID is not hardcoded/fixed I removed the check. My bad I missed updating the comment section. Sorry for the confusion. Regards, Shruthi KC EnterpriseDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
I wrote: > Robert Haas writes: >> The only hard requirement for this feature is if we >> use the database OID for some kind of encryption or integrity checking >> or checksum type feature. > It's fairly unclear to me why that is so important as to justify the > amount of klugery that this line of thought seems to be bringing. And, not to put too fine a point on it, how will you possibly do that without entirely breaking CREATE DATABASE? regards, tom lane
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Robert Haas writes: > I wasn't able to properly understand that comment, and to be honest > I'm not sure I precisely understand your concern either. I don't quite > see why the template0 database matters. I think that database isn't > going to be dumped, or restored, so as far as pg_upgrade is concerned > it might as well not exist in either cluster, and I don't see why > pg_upgrade can't therefore just ignore it completely. But template1 > and postgres are another matter. If I understand correctly, those > databases are going to be created in the new cluster by initdb, but > then pg_upgrade is going to populate them with data - including > relation files - from the old cluster. Right. If pg_upgrade explicitly ignores template0 then its OID need not be stable ... at least, not unless there's a chance it could conflict with some other database OID, which would become a live possibility if we let users get at "WITH OID = n". (Having said that, I'm not sure that pg_upgrade special-cases template0, or that it should do so.) > To be honest, what I'd be inclined to do about that is just nail down > those OIDs for future releases. Yeah, I was thinking along similar lines. > In fact, I'd probably go so far as to > hardcode that in such a way that even if you drop those databases and > recreate them, they get recreated with the same hard-coded OID. Less sure that this is a good idea, though. In particular, I do not think that you can make it work in the face of alter database template1 rename to oops; create database template1; > The only hard requirement for this feature is if we > use the database OID for some kind of encryption or integrity checking > or checksum type feature. It's fairly unclear to me why that is so important as to justify the amount of klugery that this line of thought seems to be bringing. regards, tom lane
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Aug 17, 2021 at 12:42 PM Tom Lane wrote: > Actually though ... I've not read the patch, but what does it do about > the fact that the postgres and template0 DBs do not have stable OIDs? > I cannot imagine any way to force those to match across PG versions > that would not be an unsustainable crock. Well, it's interesting that you mention that, because there's a comment in the patch that probably has to do with this: +/* + * Make sure that pg_upgrade does not change database OID. Don't care + * about "postgres" database, backend will assign it fixed OID anyway. + * ("template1" has fixed OID too but the value 1 should not collide with + * any other OID so backend pays no attention to it.) + */ I wasn't able to properly understand that comment, and to be honest I'm not sure I precisely understand your concern either. I don't quite see why the template0 database matters. I think that database isn't going to be dumped, or restored, so as far as pg_upgrade is concerned it might as well not exist in either cluster, and I don't see why pg_upgrade can't therefore just ignore it completely. But template1 and postgres are another matter. If I understand correctly, those databases are going to be created in the new cluster by initdb, but then pg_upgrade is going to populate them with data - including relation files - from the old cluster. And, yeah, I don't see how we could make those database OIDs match, which is not great. To be honest, what I'd be inclined to do about that is just nail down those OIDs for future releases. In fact, I'd probably go so far as to hardcode that in such a way that even if you drop those databases and recreate them, they get recreated with the same hard-coded OID. Now that doesn't do anything to create stability when people upgrade from an old release to a current one, but I don't really see that as an enormous problem. The only hard requirement for this feature is if we use the database OID for some kind of encryption or integrity checking or checksum type feature. Then, you want to avoid having the database OID change when you upgrade, so that the encryption or integrity check or checksum in question does not have to be recomputed for every page as part of pg_upgrade. But, that only matters if you're going between two releases that support that feature, which will not be the case if you're upgrading from some old release. Apart from that kind of feature, it still seems like a nice-to-have to keep database OIDs the same, but if those cases end up as exceptions, oh well. Does that seem reasonable, or am I missing something big? -- Robert Haas EDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Stephen Frost writes: > Also agreed on this, though I wonder- do we actually need to explicitly > make CREATE DATABASE q WITH OID = 1234; only work during binary upgrade > mode in the backend? That strikes me as perhaps doing more work than we > really need to while also preventing something that users might actually > like to do. There should be adequate defenses against a duplicate OID already, so +1 --- no reason to insist this only be used during binary upgrade. Actually though ... I've not read the patch, but what does it do about the fact that the postgres and template0 DBs do not have stable OIDs? I cannot imagine any way to force those to match across PG versions that would not be an unsustainable crock. regards, tom lane
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Robert Haas writes: > > To me, adding a separate TOC entry for a thing that is not really a > > separate object seems like a scary hack that might come back to bite > > us. Unfortunately, I don't know enough about pg_dump to say exactly > > how it might come back to bite us, which leaves wide open the > > possibility that I am completely wrong I just think it's the > > intention that archive entries correspond to actual objects in the > > database, not commands that we want executed in some particular order. > > I agree, this seems like a moderately bad idea. It could get broken > either by executing only one of the TOC entries during restore, or > by executing them in the wrong order. The latter possibility could > be forestalled by adding a dependency, which I do not see this hunk > doing, which is clearly a bug. The former possibility would require > user intervention, so maybe it's in the category of "if you break > this you get to keep both pieces". Still, it's ugly. Yeah, agreed. > > If that criticism is indeed correct, then my proposal would be to > > instead add a WITH OID = nnn option to CREATE DATABASE and allow it to > > be used only in binary upgrade mode. > > If it's not too complicated to implement, that seems like an OK idea > from here. I don't have any great love for the way we handle OID > preservation in binary upgrade mode, so not doing it exactly the same > way for databases doesn't seem like a disadvantage. Also agreed on this, though I wonder- do we actually need to explicitly make CREATE DATABASE q WITH OID = 1234; only work during binary upgrade mode in the backend? That strikes me as perhaps doing more work than we really need to while also preventing something that users might actually like to do. Either way, we'll need to check that the OID given to us can be used for the database, I'd think. Having pg_dump only include it in binary upgrade mode is fine though. Thanks, Stephen signature.asc Description: PGP signature
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Robert Haas writes: > To me, adding a separate TOC entry for a thing that is not really a > separate object seems like a scary hack that might come back to bite > us. Unfortunately, I don't know enough about pg_dump to say exactly > how it might come back to bite us, which leaves wide open the > possibility that I am completely wrong I just think it's the > intention that archive entries correspond to actual objects in the > database, not commands that we want executed in some particular order. I agree, this seems like a moderately bad idea. It could get broken either by executing only one of the TOC entries during restore, or by executing them in the wrong order. The latter possibility could be forestalled by adding a dependency, which I do not see this hunk doing, which is clearly a bug. The former possibility would require user intervention, so maybe it's in the category of "if you break this you get to keep both pieces". Still, it's ugly. > If that criticism is indeed correct, then my proposal would be to > instead add a WITH OID = nnn option to CREATE DATABASE and allow it to > be used only in binary upgrade mode. If it's not too complicated to implement, that seems like an OK idea from here. I don't have any great love for the way we handle OID preservation in binary upgrade mode, so not doing it exactly the same way for databases doesn't seem like a disadvantage. regards, tom lane
preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Wed, Aug 11, 2021 at 3:41 AM Shruthi Gowda wrote: > I have fixed all the issues and now the patch is working as expected. Hi, I'm changing the subject line since the patch does something which was discussed on that thread but isn't really related to the old email subject. In general, I think this patch is uncontroversial and in reasonably good shape. However, there's one part that I'm not too sure about. If Tom Lane happens to be paying attention to this thread, I think his feedback would be particularly useful, since he knows a lot about the inner workings of pg_dump. Opinions from anybody else would be great, too. Anyway, here's the hunk that worries me: + + /* +* Need a separate entry, otherwise the command will be run in the +* same transaction as the CREATE DATABASE command, which is not +* allowed. +*/ + ArchiveEntry(fout, +dbCatId, /* catalog ID */ +dbDumpId, /* dump ID */ +ARCHIVE_OPTS(.tag = datname, + .owner = dba, + .description = "SET_DB_OID", + .section = SECTION_PRE_DATA, + .createStmt = setDBIdQry->data, + .dropStmt = NULL)); + To me, adding a separate TOC entry for a thing that is not really a separate object seems like a scary hack that might come back to bite us. Unfortunately, I don't know enough about pg_dump to say exactly how it might come back to bite us, which leaves wide open the possibility that I am completely wrong I just think it's the intention that archive entries correspond to actual objects in the database, not commands that we want executed in some particular order. If that criticism is indeed correct, then my proposal would be to instead add a WITH OID = nnn option to CREATE DATABASE and allow it to be used only in binary upgrade mode. That has the disadvantage of being inconsistent with the way that we preserve OIDs everywhere else, but the only other alternatives are (1) do something like the above, (2) remove the requirement that CREATE DATABASE run in its own transaction, and (3) give up. (2) sounds hard and (3) is unappealing. The rest of this email will be detailed review comments on the patch as presented, and thus probably only interesting to someone actually working on the patch. Feel free to skip if that's not you. - I suggest splitting the patch into one portion that deals with database OID and another portion that deals with tablespace OID and relfilenode OID, or maybe splitting it all the way into three separate patches, one for each. This could allow the uncontroversial parts to get committed first while we're wondering what to do about the problem described above. - There are two places in the patch, one in dumpDatabase() and one in generate_old_dump() where blank lines are removed with no other changes. Please avoid whitespace-only hunks. - If possible, please try to pgindent the new code. It's pretty good what you did, but e.g. the declaration of binary_upgrade_next_pg_tablespace_oid probably has less whitespace than pgindent is going to want. - The comments in dumpDatabase() claim that "postgres" and "template1" are handled specially in some way, but there seems to be no code that matches those comments. - heap_create()'s logic around setting create_storage looks slightly redundant. I'm not positive what would be better, but ... suppose you just took the part that's currently gated by if (!IsBinaryUpgrade) and did it unconditionally. Then put if (IsBinaryUpgrade) around the else clause, but delete the last bit from there that sets create_storage. Maybe we'd still want a comment saying that it's intentional that create_storage = true even though it will be overwritten later, but then, I think, we wouldn't need to set create_storage in two different places. Maybe I'm wrong. - If we're not going to do that, then I think you should swap the if and else clauses and reverse the sense of the test. In createdb(), CreateTableSpace(), and a bunch of existing places, we do if (IsBinaryUpgrade) { ... } else { ... } so I don't think it makes sense for this one to instead do if (!IsBinaryUpgrade) { ... } else { ... }. - I'm not sure that I'd bother renaming binary_upgrade_set_pg_class_oids_and_relfilenodes(). It's such a long name, and a relfilenode is kind of an OID, so the current name isn't even really wrong. I'd probably drop the header comment too, since it seems rather obvious. But both of these things are judgement calls. - Inside that function, there is a comment that says "Indexes cannot have toast tables, so we need not make this probe in the index code path." However, you have moved the code from someplace where it didn't happen for indexes to someplace where it happens for both tables and indexes. Therefore the comment, which was true when the code was