Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-04-21 Thread Tom Lane
Robert Haas  writes:
> On Thu, Apr 21, 2022 at 1:53 PM Tom Lane  wrote:
>> 0001 attached is a revised patch that does it that way.  This seems
>> like a clearly better answer.
>> 0002 contains the perhaps-slightly-more-controversial changes of
>> changing the macro names and explicitly pinning no databases.

> Both patches look good to me.

Pushed, thanks for looking.

regards, tom lane




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-04-21 Thread Robert Haas
On Thu, Apr 21, 2022 at 1:53 PM Tom Lane  wrote:
> Robert Haas  writes:
> > On Wed, Apr 20, 2022 at 4:56 PM Tom Lane  wrote:
> >> Having just had to bury my nose in renumber_oids.pl, I thought of a
> >> different approach we could take to expose these OIDs to Catalog.pm.
> >> That's to invent a new macro that Catalog.pm recognizes, and write
> >> something about like this in pg_database.h:
> >> DECLARE_OID_DEFINING_MACRO(Template0ObjectId, 4);
> >> DECLARE_OID_DEFINING_MACRO(PostgresObjectId, 5);
>
> > I like it!
>
> 0001 attached is a revised patch that does it that way.  This seems
> like a clearly better answer.
>
> 0002 contains the perhaps-slightly-more-controversial changes of
> changing the macro names and explicitly pinning no databases.

Both patches look good to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-04-21 Thread Tom Lane
Robert Haas  writes:
> On Wed, Apr 20, 2022 at 4:56 PM Tom Lane  wrote:
>> Having just had to bury my nose in renumber_oids.pl, I thought of a
>> different approach we could take to expose these OIDs to Catalog.pm.
>> That's to invent a new macro that Catalog.pm recognizes, and write
>> something about like this in pg_database.h:
>> DECLARE_OID_DEFINING_MACRO(Template0ObjectId, 4);
>> DECLARE_OID_DEFINING_MACRO(PostgresObjectId, 5);

> I like it!

0001 attached is a revised patch that does it that way.  This seems
like a clearly better answer.

0002 contains the perhaps-slightly-more-controversial changes of
changing the macro names and explicitly pinning no databases.

regards, tom lane

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 0275795dea..ece0a934f0 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -44,6 +44,8 @@ sub ParseHeader
 	$catalog{columns} = [];
 	$catalog{toasting}= [];
 	$catalog{indexing}= [];
+	$catalog{other_oids}  = [];
+	$catalog{foreign_keys} = [];
 	$catalog{client_code} = [];
 
 	open(my $ifh, '<', $input_file) || die "$input_file: $!";
@@ -118,6 +120,14 @@ sub ParseHeader
 index_decl => $6
 			  };
 		}
+		elsif (/^DECLARE_OID_DEFINING_MACRO\(\s*(\w+),\s*(\d+)\)/)
+		{
+			push @{ $catalog{other_oids} },
+			  {
+other_name => $1,
+other_oid  => $2
+			  };
+		}
 		elsif (
 			/^DECLARE_(ARRAY_)?FOREIGN_KEY(_OPT)?\(\s*\(([^)]+)\),\s*(\w+),\s*\(([^)]+)\)\)/
 		  )
@@ -572,6 +582,10 @@ sub FindAllOidsFromHeaders
 		{
 			push @oids, $index->{index_oid};
 		}
+		foreach my $other (@{ $catalog->{other_oids} })
+		{
+			push @oids, $other->{other_oid};
+		}
 	}
 
 	return \@oids;
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 520f77971b..d7e5c02f95 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -339,9 +339,11 @@ IsPinnedObject(Oid classId, Oid objectId)
 	 * robustness.
 	 */
 
-	/* template1 is not pinned */
+	/* template1, template0, postgres are not pinned */
 	if (classId == DatabaseRelationId &&
-		objectId == TemplateDbOid)
+		(objectId == TemplateDbOid ||
+		 objectId == Template0ObjectId ||
+		 objectId == PostgresObjectId))
 		return false;
 
 	/* the public namespace is not pinned */
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 2d02b02267..f4ec6d6d40 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -472,7 +472,7 @@ EOM
 	  $catalog->{rowtype_oid_macro}, $catalog->{rowtype_oid}
 	  if $catalog->{rowtype_oid_macro};
 
-	# Likewise for macros for toast and index OIDs
+	# Likewise for macros for toast, index, and other OIDs
 	foreach my $toast (@{ $catalog->{toasting} })
 	{
 		printf $def "#define %s %s\n",
@@ -488,6 +488,12 @@ EOM
 		  $index->{index_oid_macro}, $index->{index_oid}
 		  if $index->{index_oid_macro};
 	}
+	foreach my $other (@{ $catalog->{other_oids} })
+	{
+		printf $def "#define %s %s\n",
+		  $other->{other_name}, $other->{other_oid}
+		  if $other->{other_name};
+	}
 
 	print $def "\n";
 
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 1cb4a5b0d2..5e2eeefc4c 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -59,11 +59,11 @@
 #include "sys/mman.h"
 #endif
 
-#include "access/transam.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_class_d.h" /* pgrminclude ignore */
 #include "catalog/pg_collation_d.h"
+#include "catalog/pg_database_d.h"	/* pgrminclude ignore */
 #include "common/file_perm.h"
 #include "common/file_utils.h"
 #include "common/logging.h"
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d3588607e7..786d592e2b 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2901,10 +2901,11 @@ dumpDatabase(Archive *fout)
 	qdatname = pg_strdup(fmtId(datname));
 
 	/*
-	 * Prepare the CREATE DATABASE command.  We must specify encoding, locale,
-	 * and tablespace since those can't be altered later.  Other DB properties
-	 * are left to the DATABASE PROPERTIES entry, so that they can be applied
-	 * after reconnecting to the target DB.
+	 * Prepare the CREATE DATABASE command.  We must specify OID (if we want
+	 * to preserve that), as well as the encoding, locale, and tablespace
+	 * since those can't be altered later.  Other DB properties are left to
+	 * the DATABASE PROPERTIES entry, so that they can be applied after
+	 * reconnecting to the target DB.
 	 */
 	if (dopt->binary_upgrade)
 	{
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 9a2816de51..338dfca5a0 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -196,10 +196,6 @@ FullTransactionIdAdvance(FullTransactionId *dest)
 #define FirstUnpinnedObjectId	12000
 #define FirstNormalObjectId		16384
 
-/* OIDs of Template0 and Postgres database are fixed */
-#define 

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-04-20 Thread Robert Haas
On Wed, Apr 20, 2022 at 4:56 PM Tom Lane  wrote:
> Having just had to bury my nose in renumber_oids.pl, I thought of a
> different approach we could take to expose these OIDs to Catalog.pm.
> That's to invent a new macro that Catalog.pm recognizes, and write
> something about like this in pg_database.h:
>
> DECLARE_OID_DEFINING_MACRO(Template0ObjectId, 4);
> DECLARE_OID_DEFINING_MACRO(PostgresObjectId, 5);
>
> If that seems more plausible to you I'll set about preparing a patch.

I like it!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-04-20 Thread Tom Lane
Robert Haas  writes:
> On Wed, Apr 20, 2022 at 2:34 PM Tom Lane  wrote:
>> The attached draft patch attempts to improve this situation.
>> It reserves these OIDs, and creates the associated macros, through
>> the normal BKI infrastructure by adding entries in pg_database.dat.
>> We have to delete those rows again during initdb, which is slightly
>> ugly but surely no more so than initdb's other direct manipulations
>> of pg_database.

> I'm not sure I really like this approach, but if you're firmly
> convinced that it's better than cleaning up the loose ends in some
> other way, I'm not going to waste a lot of energy fighting about it.

Having just had to bury my nose in renumber_oids.pl, I thought of a
different approach we could take to expose these OIDs to Catalog.pm.
That's to invent a new macro that Catalog.pm recognizes, and write
something about like this in pg_database.h:

DECLARE_OID_DEFINING_MACRO(Template0ObjectId, 4);
DECLARE_OID_DEFINING_MACRO(PostgresObjectId, 5);

That would result in (a) the OIDs becoming known to Catalog.pm
as reserved, though it wouldn't have any great clue about their
semantics; and (b) this getting emitted into pg_database_d.h:

#define Template0ObjectId 4
#define PostgresObjectId 5

Then we'd not need the dummy entries in pg_database.dat, which
does seem cleaner now that I think about it.  A downside is that
with no context, Catalog.pm could not provide name translation
services during postgres.bki generation for such OIDs --- but
at least for these entries, we don't need that.

If that seems more plausible to you I'll set about preparing a patch.

regards, tom lane




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-04-20 Thread Robert Haas
On Wed, Apr 20, 2022 at 2:34 PM Tom Lane  wrote:
> The attached draft patch attempts to improve this situation.
> It reserves these OIDs, and creates the associated macros, through
> the normal BKI infrastructure by adding entries in pg_database.dat.
> We have to delete those rows again during initdb, which is slightly
> ugly but surely no more so than initdb's other direct manipulations
> of pg_database.

I'm not sure I really like this approach, but if you're firmly
convinced that it's better than cleaning up the loose ends in some
other way, I'm not going to waste a lot of energy fighting about it.
It doesn't seem horrible.

> There are a few loose ends:
>
> * I'm a bit inclined to simplify IsPinnedObject by just teaching
> it that *no* entries of pg_database are pinned, which would correspond
> to the evident lack of enforcement in dropdb().  Can anyone see a
> reason why we might pin some database in future?

It's kind of curious that we don't pin anything now. There's kind of
nothing to stop you from hosing the system by dropping template0
and/or template1, or mutating them into some crazy form that doesn't
work. But having said that, if as a matter of policy we don't even pin
template0 or template1 or postgres, then it seems unlikely that we
would suddenly decide to pin some new database in the future.

> * I had to set up the additional pg_database entries with nonzero
> datfrozenxid to avoid an assertion failure during initdb's first VACUUM.
> (That VACUUM will overwrite template1's datfrozenxid before computing
> the global minimum frozen XID, but not these others; and it doesn't like
> finding that the minimum is zero.)  This feels klugy.  An alternative is
> to delete the extra pg_database rows before that VACUUM, which would
> mean taking those deletes out of make_template0 and make_postgres and
> putting them somewhere seemingly unrelated, so that's a bit crufty too.
> Anybody have a preference?

Not really. If anything that's an argument against this entire
approach, but I already commented on that point above.

> * The new macro names seem ill-chosen.  Template0ObjectId is spelled
> randomly differently from the longstanding TemplateDbOid, and surely
> PostgresObjectId is about as vague a name as could possibly have
> been thought of (please name an object that it couldn't apply to).
> I'm a little inclined to rename TemplateDbOid to Template1DbOid and
> use Template0DbOid and PostgresDbOid for the others, but I didn't pull
> that trigger here.

Seems totally reasonable. I don't find the current naming horrible or
I'd not have committed it that way, but putting Db in there makes
sense, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-04-20 Thread Tom Lane
Robert Haas  writes:
> On Sat, Jan 22, 2022 at 2:20 AM Shruthi Gowda  wrote:
>> Agree. In the latest patch, the template0 and postgres OIDs are fixed
>> to unused manually assigned OIDs 4 and 5 respectively. These OIDs are
>> no more listed as unused OIDs.

> Thanks. Committed with a few more cosmetic changes.

I happened to take a closer look at this patch today, and I'm pretty
unhappy with the way that the assignment of those OIDs was managed.
There are two big problems:

1. IsPinnedObject() will now report that template0 and postgres are
pinned.  This seems not to prevent one from dropping them (I guess
dropdb() doesn't consult IsPinnedObject), but it would probably bollix
any pg_shdepend management that should happen for them.

2. The Catalog.pm infrastructure knows nothing about these OIDs.
While the unused_oids script was hack-n-slashed to claim that the
OIDs are used, other scripts won't know about them; for example
duplicate_oids won't report conflicts if someone tries to reuse
those OIDs.

The attached draft patch attempts to improve this situation.
It reserves these OIDs, and creates the associated macros, through
the normal BKI infrastructure by adding entries in pg_database.dat.
We have to delete those rows again during initdb, which is slightly
ugly but surely no more so than initdb's other direct manipulations
of pg_database.

There are a few loose ends:

* I'm a bit inclined to simplify IsPinnedObject by just teaching
it that *no* entries of pg_database are pinned, which would correspond
to the evident lack of enforcement in dropdb().  Can anyone see a
reason why we might pin some database in future?

* I had to set up the additional pg_database entries with nonzero
datfrozenxid to avoid an assertion failure during initdb's first VACUUM.
(That VACUUM will overwrite template1's datfrozenxid before computing
the global minimum frozen XID, but not these others; and it doesn't like
finding that the minimum is zero.)  This feels klugy.  An alternative is
to delete the extra pg_database rows before that VACUUM, which would
mean taking those deletes out of make_template0 and make_postgres and
putting them somewhere seemingly unrelated, so that's a bit crufty too.
Anybody have a preference?

* The new macro names seem ill-chosen.  Template0ObjectId is spelled
randomly differently from the longstanding TemplateDbOid, and surely
PostgresObjectId is about as vague a name as could possibly have
been thought of (please name an object that it couldn't apply to).
I'm a little inclined to rename TemplateDbOid to Template1DbOid and
use Template0DbOid and PostgresDbOid for the others, but I didn't pull
that trigger here.

Comments?

regards, tom lane

diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 520f77971b..d7e5c02f95 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -339,9 +339,11 @@ IsPinnedObject(Oid classId, Oid objectId)
 	 * robustness.
 	 */
 
-	/* template1 is not pinned */
+	/* template1, template0, postgres are not pinned */
 	if (classId == DatabaseRelationId &&
-		objectId == TemplateDbOid)
+		(objectId == TemplateDbOid ||
+		 objectId == Template0ObjectId ||
+		 objectId == PostgresObjectId))
 		return false;
 
 	/* the public namespace is not pinned */
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 1cb4a5b0d2..04454b3d7c 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -59,11 +59,11 @@
 #include "sys/mman.h"
 #endif
 
-#include "access/transam.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_class_d.h" /* pgrminclude ignore */
 #include "catalog/pg_collation_d.h"
+#include "catalog/pg_database_d.h"	/* pgrminclude ignore */
 #include "common/file_perm.h"
 #include "common/file_utils.h"
 #include "common/logging.h"
@@ -1806,14 +1806,24 @@ make_template0(FILE *cmdfd)
 	 * objects in the old cluster, the problem scenario only exists if the OID
 	 * that is in use in the old cluster is also used in the new cluster - and
 	 * the new cluster should be the result of a fresh initdb.)
-	 *
-	 * We use "STRATEGY = file_copy" here because checkpoints during initdb
-	 * are cheap. "STRATEGY = wal_log" would generate more WAL, which would
-	 * be a little bit slower and make the new cluster a little bit bigger.
 	 */
 	static const char *const template0_setup[] = {
-		"CREATE DATABASE template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false OID = "
-		CppAsString2(Template0ObjectId)
+		/*
+		 * Since pg_database.dat includes a dummy row for template0, we have
+		 * to remove that before creating the database for real.
+		 */
+		"DELETE FROM pg_database WHERE datname = 'template0';\n\n",
+
+		/*
+		 * Clone template1 to make template0.
+		 *
+		 * We use "STRATEGY = file_copy" here because checkpoints during
+		 * initdb are cheap.  "STRATEGY = wal_log" would generate more WAL,
+		 * which would be a little bit slower and make the 

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-03-22 Thread Robert Haas
On Mon, Mar 21, 2022 at 8:52 PM Andres Freund  wrote:
> I noticed this still has an open CF entry: 
> https://commitfest.postgresql.org/37/3296/
> I assume it can be marked as committed?

Yeah, done now. But don't forget that we still need to do something on
the "wrong fds used for refilenodes after pg_upgrade relfilenode
changes Reply-To:" thread, and I think the ball is in your court
there.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-03-21 Thread Andres Freund
On 2022-01-24 14:44:10 -0500, Robert Haas wrote:
> On Sat, Jan 22, 2022 at 2:20 AM Shruthi Gowda  wrote:
> > Agree. In the latest patch, the template0 and postgres OIDs are fixed
> > to unused manually assigned OIDs 4 and 5 respectively. These OIDs are
> > no more listed as unused OIDs.
> 
> Thanks. Committed with a few more cosmetic changes.

I noticed this still has an open CF entry: 
https://commitfest.postgresql.org/37/3296/
I assume it can be marked as committed?

- Andres




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-03-12 Thread Justin Pryzby
On Tue, Jan 25, 2022 at 10:19:53AM +0530, Shruthi Gowda wrote:
> On Tue, Jan 25, 2022 at 1:14 AM Robert Haas  wrote:
> > On Sat, Jan 22, 2022 at 2:20 AM Shruthi Gowda  wrote:
> > > Agree. In the latest patch, the template0 and postgres OIDs are fixed
> > > to unused manually assigned OIDs 4 and 5 respectively. These OIDs are
> > > no more listed as unused OIDs.
> >
> > Thanks. Committed with a few more cosmetic changes.
> 
> Thanks, Robert for committing this patch.

If I'm not wrong, this can be closed.
https://commitfest.postgresql.org/37/3296/




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-24 Thread Shruthi Gowda
On Tue, Jan 25, 2022 at 1:14 AM Robert Haas  wrote:
>
> On Sat, Jan 22, 2022 at 2:20 AM Shruthi Gowda  wrote:
> > Agree. In the latest patch, the template0 and postgres OIDs are fixed
> > to unused manually assigned OIDs 4 and 5 respectively. These OIDs are
> > no more listed as unused OIDs.
>
> Thanks. Committed with a few more cosmetic changes.

Thanks, Robert for committing this patch.




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-24 Thread Robert Haas
On Sat, Jan 22, 2022 at 2:20 AM Shruthi Gowda  wrote:
> Agree. In the latest patch, the template0 and postgres OIDs are fixed
> to unused manually assigned OIDs 4 and 5 respectively. These OIDs are
> no more listed as unused OIDs.

Thanks. Committed with a few more cosmetic changes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-24 Thread Bruce Momjian
On Sat, Jan 22, 2022 at 12:47:35PM +0530, Shruthi Gowda wrote:
> On Sat, Jan 22, 2022 at 12:27 AM Tom Lane  wrote:
> >
> > Robert Haas  writes:
> > > It seems to me that what this comment is saying is that OIDs in the
> > > second and third categories are doled out by counters. Therefore, we
> > > can't know which of those OIDs will get used, or how many of them will
> > > get used, or which objects will get which OIDs. Therefore, I think we
> > > should go back to the approach that you were using for template0 and
> > > handle both that database and postgres using that method. That is,
> > > assign an OID manually, and make sure unused_oids knows that it should
> > > be counted as already used.
> >
> > Indeed.  If you're going to manually assign OIDs to these databases,
> > do it honestly, and put them into the range intended for that purpose.
> > Trying to take short-cuts is just going to cause trouble down the road.
> 
> Understood. I will rework the patch accordingly. Thanks

Thanks.  To get the rsync update reduction we need to preserve database
oids.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-21 Thread Shruthi Gowda
On Sat, Jan 22, 2022 at 12:17 AM Robert Haas  wrote:
>
> On Fri, Jan 21, 2022 at 8:40 AM Shruthi Gowda  wrote:
> > From what I see in the code, template0 and postgres are the last
> > things that get created in initdb phase. The system OIDs that get
> > assigned to these DBs vary from release to release. At present, the
> > system assigned OIDs of template0 and postgres are 13679 and 13680
> > respectively. I feel it would be safe to assign 16000 and 16001 for
> > template0 and postgres respectively from the unpinned object OID range
> > 12000 - 16383. In the future, even if the initdb unpinned objects
> > reach the range of 16000 issues can only arise if initdb() creates
> > another system-created database for which the system assigns these
> > reserved OIDs (16000, 16001).
>
> It doesn't seem safe to me to rely on that. We don't know what could
> happen in the future if the number of built-in objects increases.
> Looking at the lengthy comment on this topic in transam.h, I see that
> there are three ranges:
>
> 1- manually assigned OIDs
> 1-11999 OIDs assigned by genbki.pl
> 12000-16384 OIDs assigned to unpinned objects post-bootstrap
>
> It seems to me that what this comment is saying is that OIDs in the
> second and third categories are doled out by counters. Therefore, we
> can't know which of those OIDs will get used, or how many of them will
> get used, or which objects will get which OIDs. Therefore, I think we
> should go back to the approach that you were using for template0 and
> handle both that database and postgres using that method. That is,
> assign an OID manually, and make sure unused_oids knows that it should
> be counted as already used.

Agree. In the latest patch, the template0 and postgres OIDs are fixed
to unused manually assigned OIDs 4 and 5 respectively. These OIDs are
no more listed as unused OIDs.


Regards,
Shruthi KC
EnterpriseDB: http://www.enterprisedb.com


v13-0001-pg_upgrade-perserve-database-OID-patch.patch
Description: Binary data


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-21 Thread Shruthi Gowda
On Sat, Jan 22, 2022 at 12:27 AM Tom Lane  wrote:
>
> Robert Haas  writes:
> > It seems to me that what this comment is saying is that OIDs in the
> > second and third categories are doled out by counters. Therefore, we
> > can't know which of those OIDs will get used, or how many of them will
> > get used, or which objects will get which OIDs. Therefore, I think we
> > should go back to the approach that you were using for template0 and
> > handle both that database and postgres using that method. That is,
> > assign an OID manually, and make sure unused_oids knows that it should
> > be counted as already used.
>
> Indeed.  If you're going to manually assign OIDs to these databases,
> do it honestly, and put them into the range intended for that purpose.
> Trying to take short-cuts is just going to cause trouble down the road.

Understood. I will rework the patch accordingly. Thanks

Regards,
Shruthi KC
EnterpriseDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-21 Thread Tom Lane
Robert Haas  writes:
> It seems to me that what this comment is saying is that OIDs in the
> second and third categories are doled out by counters. Therefore, we
> can't know which of those OIDs will get used, or how many of them will
> get used, or which objects will get which OIDs. Therefore, I think we
> should go back to the approach that you were using for template0 and
> handle both that database and postgres using that method. That is,
> assign an OID manually, and make sure unused_oids knows that it should
> be counted as already used.

Indeed.  If you're going to manually assign OIDs to these databases,
do it honestly, and put them into the range intended for that purpose.
Trying to take short-cuts is just going to cause trouble down the road.

regards, tom lane




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-21 Thread Robert Haas
On Fri, Jan 21, 2022 at 8:40 AM Shruthi Gowda  wrote:
> From what I see in the code, template0 and postgres are the last
> things that get created in initdb phase. The system OIDs that get
> assigned to these DBs vary from release to release. At present, the
> system assigned OIDs of template0 and postgres are 13679 and 13680
> respectively. I feel it would be safe to assign 16000 and 16001 for
> template0 and postgres respectively from the unpinned object OID range
> 12000 - 16383. In the future, even if the initdb unpinned objects
> reach the range of 16000 issues can only arise if initdb() creates
> another system-created database for which the system assigns these
> reserved OIDs (16000, 16001).

It doesn't seem safe to me to rely on that. We don't know what could
happen in the future if the number of built-in objects increases.
Looking at the lengthy comment on this topic in transam.h, I see that
there are three ranges:

1- manually assigned OIDs
1-11999 OIDs assigned by genbki.pl
12000-16384 OIDs assigned to unpinned objects post-bootstrap

It seems to me that what this comment is saying is that OIDs in the
second and third categories are doled out by counters. Therefore, we
can't know which of those OIDs will get used, or how many of them will
get used, or which objects will get which OIDs. Therefore, I think we
should go back to the approach that you were using for template0 and
handle both that database and postgres using that method. That is,
assign an OID manually, and make sure unused_oids knows that it should
be counted as already used.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-21 Thread Shruthi Gowda
On Fri, Jan 21, 2022 at 1:08 AM Robert Haas  wrote:
>
> On Thu, Jan 20, 2022 at 11:03 AM Shruthi Gowda  wrote:
> > It is not required for PostgresObjectId. The unused_oids script
> > provides a list of unused oids in the manually-assignable OIDs range
> > (1-).
>
> Well, so ... why are we not treating the OIDs for these two databases
> the same? If there's a range from which we can assign OIDs without
> risk of duplication and without needing to update this script, perhaps
> we ought to assign both of them from that range and leave the script
> alone.

>From what I see in the code, template0 and postgres are the last
things that get created in initdb phase. The system OIDs that get
assigned to these DBs vary from release to release. At present, the
system assigned OIDs of template0 and postgres are 13679 and 13680
respectively. I feel it would be safe to assign 16000 and 16001 for
template0 and postgres respectively from the unpinned object OID range
12000 - 16383. In the future, even if the initdb unpinned objects
reach the range of 16000 issues can only arise if initdb() creates
another system-created database for which the system assigns these
reserved OIDs (16000, 16001).

> + * that is in use in the old cluster is also used in th new cluster - and
>
> th -> the.

Fixed

> +preserves the DB, tablespace, relfilenode OIDs so TOAST and other references
>
> Insert "and" before "relfilenode".

Fixed

Attached is the latest patch for review.

Shruthi KC
EnterpriseDB: http://www.enterprisedb.com


v12-0001-pg_upgrade-perserve-database-OID-patch.patch
Description: Binary data


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-20 Thread Robert Haas
On Thu, Jan 20, 2022 at 11:03 AM Shruthi Gowda  wrote:
> It is not required for PostgresObjectId. The unused_oids script
> provides a list of unused oids in the manually-assignable OIDs range
> (1-).

Well, so ... why are we not treating the OIDs for these two databases
the same? If there's a range from which we can assign OIDs without
risk of duplication and without needing to update this script, perhaps
we ought to assign both of them from that range and leave the script
alone.

+ * that is in use in the old cluster is also used in th new cluster - and

th -> the.

+preserves the DB, tablespace, relfilenode OIDs so TOAST and other references

Insert "and" before "relfilenode".

I think this is in pretty good shape now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-20 Thread Shruthi Gowda
On Thu, Jan 20, 2022 at 7:57 PM Robert Haas  wrote:
>
> On Thu, Jan 20, 2022 at 7:09 AM Shruthi Gowda  wrote:
> > > Here's an updated version in which I've reverted the changes to gram.y
> > > and tried to improve the comments and documentation. Could you have a
> > > look at implementing (2) above?
> >
> > Attached is the patch that implements comment (2).
>
> This probably needs minor rebasing on account of the fact that I just
> pushed the patch to remove datlastsysoid. I intended to do that before
> you posted a new version to save you the trouble, but I was too slow
> (or you were too fast, however you want to look at it).

I have rebased the patch. Kindly have a look at it.

> + errmsg("Invalid value for option \"%s\"", defel->defname),
>
> Per the "error message style" section of the documentation, primary
> error messages neither begin with a capital letter nor end with a
> period, while errdetail() messages are complete sentences and thus
> both begin with a capital letter and end with a period. But what I
> think you should really do here is get rid of the error detail and
> convey all the information in a primary error message. e.g. "OID %u is
> a system OID", or maybe better, "OIDs less than %u are reserved for
> system objects".

Fixed

> + errmsg("database oid %u is already used by database %s",
> + errmsg("data directory exists for database oid %u", dboid));
>
> Usually we write "OID" rather than "oid" in error messages. I think
> maybe it would be best to change the text slightly too. I suggest:
>
> database OID %u is already in use by database \"%s\"
> data directory already exists for database with OID %u

The second error message will be reported when the data directory with
the given OID exists in the data path but the corresponding DB does
not exist. This could happen only if the user creates directories in
the data folder which is indeed not an invalid usage. I have updated
the error message to "data directory with the specified OID %u already
exists" because the error message you recommended gives a slightly
different meaning.

> + * it would fail. To avoid that, assign a fixed OID to template0 and
> + * postgres rather than letting the server choose one.
>
> a fixed OID -> fixed OIDs
> one -> them
>
> Or maybe put this comment back the way I had it and just talk about
> postgres, and then change the comment in make_postgres to say "Assign
> a fixed OID to postgres, for the same reasons as template0."

Fixed

> + /*
> + * Make sure that binary upgrade propagate the database OID to the new
> + * cluster
> + */
>
> This comment doesn't really seem necessary. It's sort of self-explanatory.

Removed

> +# Push the OID that is reserved for template0 database.
> +my $Template0ObjectId =
> +  Catalog::FindDefinedSymbol('access/transam.h', '..', 'Template0ObjectId');
> +push @{$oids}, $Template0ObjectId;
>
> Don't you need to do this for PostgresObjectId also?

It is not required for PostgresObjectId. The unused_oids script
provides a list of unused oids in the manually-assignable OIDs range
(1-).

Shruthi KC
EnterpriseDB: http://www.enterprisedb.com


v11-0001-pg_upgrade-perserve-database-OID-patch.patch
Description: Binary data


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-20 Thread Robert Haas
On Thu, Jan 20, 2022 at 7:09 AM Shruthi Gowda  wrote:
> > Here's an updated version in which I've reverted the changes to gram.y
> > and tried to improve the comments and documentation. Could you have a
> > look at implementing (2) above?
>
> Attached is the patch that implements comment (2).

This probably needs minor rebasing on account of the fact that I just
pushed the patch to remove datlastsysoid. I intended to do that before
you posted a new version to save you the trouble, but I was too slow
(or you were too fast, however you want to look at it).

+ errmsg("Invalid value for option \"%s\"", defel->defname),

Per the "error message style" section of the documentation, primary
error messages neither begin with a capital letter nor end with a
period, while errdetail() messages are complete sentences and thus
both begin with a capital letter and end with a period. But what I
think you should really do here is get rid of the error detail and
convey all the information in a primary error message. e.g. "OID %u is
a system OID", or maybe better, "OIDs less than %u are reserved for
system objects".

+ errmsg("database oid %u is already used by database %s",
+ errmsg("data directory exists for database oid %u", dboid));

Usually we write "OID" rather than "oid" in error messages. I think
maybe it would be best to change the text slightly too. I suggest:

database OID %u is already in use by database \"%s\"
data directory already exists for database with OID %u

+ * it would fail. To avoid that, assign a fixed OID to template0 and
+ * postgres rather than letting the server choose one.

a fixed OID -> fixed OIDs
one -> them

Or maybe put this comment back the way I had it and just talk about
postgres, and then change the comment in make_postgres to say "Assign
a fixed OID to postgres, for the same reasons as template0."

+ /*
+ * Make sure that binary upgrade propagate the database OID to the new
+ * cluster
+ */

This comment doesn't really seem necessary. It's sort of self-explanatory.

+# Push the OID that is reserved for template0 database.
+my $Template0ObjectId =
+  Catalog::FindDefinedSymbol('access/transam.h', '..', 'Template0ObjectId');
+push @{$oids}, $Template0ObjectId;

Don't you need to do this for PostgresObjectId also?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-20 Thread Shruthi Gowda
On Tue, Jan 18, 2022 at 2:34 AM Robert Haas  wrote:
>
> On Mon, Jan 17, 2022 at 9:57 AM Shruthi Gowda  wrote:
> > I have rebased and generated the patches on top of PostgreSQL commit
> > ID cf925936ecc031355cd56fbd392ec3180517a110.
> > Kindly apply v8-0001-pg_upgrade-Preserve-relfilenodes-and-tablespace-O.patch
> > first and then v8-0002-Preserve-database-OIDs-in-pg_upgrade.patch.
>
> OK, so looking over 0002, I noticed a few things:
>
> 1. datlastsysoid isn't being used for anything any more. That's not a
> defect in your patch, but I've separately proposed to remove it.

okay

> 2. I realized that the whole idea here depends on not having initdb
> create more than one database without a fixed OID. The patch solves
> that by nailing down the OID of template0, which is a sufficient
> solution. However, I think nailing down the (initial) OID of postgres
> as well would be a good idea, just in case somebody in the future
> decides to add another system-created database.

I agree with your thought. In my latest patch, postgres db gets
created with a fixed OID.
I have chosen an arbitrary number 16000 as postgres OID from the
unpinned object OID range1200 - 16383.

> 3. The changes to gram.y don't do anything. Sure, you've added a new
> "OID" token, but nothing generates that token, so it has no effect.
> The reason the syntax works is that createdb_opt_name accepts "IDENT",
> which means any string that's not in the keyword list (see kwlist.h).
> But that's there already, so you don't need to do anything in this
> file.

okay

> 4. I felt that the documentation and comments could be somewhat improved.

The documentation and comment updates are more accurate with the
required details. Thanks.

> Here's an updated version in which I've reverted the changes to gram.y
> and tried to improve the comments and documentation. Could you have a
> look at implementing (2) above?

Attached is the patch that implements comment (2).

Shruthi KC
EnterpriseDB: http://www.enterprisedb.com


v10-0001-pg_upgrade-perserve-database-OID-patch.patch
Description: Binary data


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-17 Thread Shruthi Gowda
On Tue, Jan 18, 2022 at 12:35 AM Robert Haas  wrote:
>
> On Tue, Dec 14, 2021 at 1:21 PM Shruthi Gowda  wrote:
> > Thanks, Robert for the updated version. I reviewed the changes and it
> > looks fine.
> > I also tested the patch. The patch works as expected.
>
> Thanks.
>
> > > - I adjusted the function header comment for heap_create. Your
> > > proposed comment seemed like it was pretty detailed but not 100%
> > > correct. It also made one of the lines kind of long because you didn't
> > > wrap the text in the surrounding style. I decided to make it simpler
> > > and shorter instead of longer still and 100% correct.
> >
> > The comment update looks fine. However, I still feel it would be good to
> > mention on which (rare) circumstance a valid relfilenode can get passed.
>
> In general, I think it's the job of a function parameter comment to
> describe what the parameter does, not how the callers actually use it.
> One problem with describing the latter is that, if someone later adds
> another caller, there is a pretty good chance that they won't notice
> that the comment needs to be changed. More fundamentally, the
> parameter function comments should be like an instruction manual for
> how to use the function. If you are trying to figure out how to use
> this function, it is not helpful to know that "most callers like to
> pass false" for this parameter. What you need to know is what value
> your new call site should pass, and knowing what "most callers" do or
> that something is "rare" doesn't really help. If we want to make this
> comment more detailed, we should approach it from the point of view of
> explaining how it ought to be set.

It's clear now. Thanks for clarifying.

> I've committed the v8-0001 patch you attached. I'll write separately
> about v8-0002.

Sure. Thank you.




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-17 Thread Robert Haas
On Mon, Jan 17, 2022 at 9:57 AM Shruthi Gowda  wrote:
> I have rebased and generated the patches on top of PostgreSQL commit
> ID cf925936ecc031355cd56fbd392ec3180517a110.
> Kindly apply v8-0001-pg_upgrade-Preserve-relfilenodes-and-tablespace-O.patch
> first and then v8-0002-Preserve-database-OIDs-in-pg_upgrade.patch.

OK, so looking over 0002, I noticed a few things:

1. datlastsysoid isn't being used for anything any more. That's not a
defect in your patch, but I've separately proposed to remove it.

2. I realized that the whole idea here depends on not having initdb
create more than one database without a fixed OID. The patch solves
that by nailing down the OID of template0, which is a sufficient
solution. However, I think nailing down the (initial) OID of postgres
as well would be a good idea, just in case somebody in the future
decides to add another system-created database.

3. The changes to gram.y don't do anything. Sure, you've added a new
"OID" token, but nothing generates that token, so it has no effect.
The reason the syntax works is that createdb_opt_name accepts "IDENT",
which means any string that's not in the keyword list (see kwlist.h).
But that's there already, so you don't need to do anything in this
file.

4. I felt that the documentation and comments could be somewhat improved.

Here's an updated version in which I've reverted the changes to gram.y
and tried to improve the comments and documentation. Could you have a
look at implementing (2) above?

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v9-0001-pg_upgrade-perserve-database-OID-patch-from-shrut.patch
Description: Binary data


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-17 Thread Robert Haas
On Tue, Dec 14, 2021 at 1:21 PM Shruthi Gowda  wrote:
> Thanks, Robert for the updated version. I reviewed the changes and it
> looks fine.
> I also tested the patch. The patch works as expected.

Thanks.

> > - I adjusted the function header comment for heap_create. Your
> > proposed comment seemed like it was pretty detailed but not 100%
> > correct. It also made one of the lines kind of long because you didn't
> > wrap the text in the surrounding style. I decided to make it simpler
> > and shorter instead of longer still and 100% correct.
>
> The comment update looks fine. However, I still feel it would be good to
> mention on which (rare) circumstance a valid relfilenode can get passed.

In general, I think it's the job of a function parameter comment to
describe what the parameter does, not how the callers actually use it.
One problem with describing the latter is that, if someone later adds
another caller, there is a pretty good chance that they won't notice
that the comment needs to be changed. More fundamentally, the
parameter function comments should be like an instruction manual for
how to use the function. If you are trying to figure out how to use
this function, it is not helpful to know that "most callers like to
pass false" for this parameter. What you need to know is what value
your new call site should pass, and knowing what "most callers" do or
that something is "rare" doesn't really help. If we want to make this
comment more detailed, we should approach it from the point of view of
explaining how it ought to be set.

I've committed the v8-0001 patch you attached. I'll write separately
about v8-0002.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-17 Thread Shruthi Gowda
On Sat, Jan 15, 2022 at 11:17 AM Julien Rouhaud  wrote:
>
> Hi,
>
> On Fri, Dec 17, 2021 at 01:03:06PM +0530, Shruthi Gowda wrote:
> >
> > I have updated the DBOID preserve patch to handle this case and
> > generated the latest patch on top of your v7-001-preserve-relfilenode
> > patch.
>
> The cfbot reports that the patch doesn't apply anymore:
> http://cfbot.cputube.org/patch_36_3296.log
> === Applying patches on top of PostgreSQL commit ID 
> 5513dc6a304d8bda114004a3b906cc6fde5d6274 ===
> === applying patch ./v7-0002-Preserve-database-OIDs-in-pg_upgrade.patch
> [...]
> patching file src/bin/pg_upgrade/info.c
> Hunk #1 FAILED at 190.
> Hunk #2 succeeded at 351 (offset 27 lines).
> 1 out of 2 hunks FAILED -- saving rejects to file 
> src/bin/pg_upgrade/info.c.rej
> patching file src/bin/pg_upgrade/pg_upgrade.h
> Hunk #1 FAILED at 145.
> 1 out of 1 hunk FAILED -- saving rejects to file 
> src/bin/pg_upgrade/pg_upgrade.h.rej
> patching file src/bin/pg_upgrade/relfilenode.c
> Hunk #1 FAILED at 193.
> 1 out of 1 hunk FAILED -- saving rejects to file 
> src/bin/pg_upgrade/relfilenode.c.rej
>
> Could you send a rebased version?  In the meantime I willl switch the cf entry
> to Waiting on Author.

I have rebased and generated the patches on top of PostgreSQL commit
ID cf925936ecc031355cd56fbd392ec3180517a110.
Kindly apply v8-0001-pg_upgrade-Preserve-relfilenodes-and-tablespace-O.patch
first and then v8-0002-Preserve-database-OIDs-in-pg_upgrade.patch.

Thanks & Regards,
Shruthi KC
EnterpriseDB: http://www.enterprisedb.com


v8-0002-Preserve-database-OIDs-in-pg_upgrade.patch
Description: Binary data


v8-0001-pg_upgrade-Preserve-relfilenodes-and-tablespace-O.patch
Description: Binary data


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-14 Thread Julien Rouhaud
Hi,

On Fri, Dec 17, 2021 at 01:03:06PM +0530, Shruthi Gowda wrote:
> 
> I have updated the DBOID preserve patch to handle this case and
> generated the latest patch on top of your v7-001-preserve-relfilenode
> patch.

The cfbot reports that the patch doesn't apply anymore:
http://cfbot.cputube.org/patch_36_3296.log
=== Applying patches on top of PostgreSQL commit ID 
5513dc6a304d8bda114004a3b906cc6fde5d6274 ===
=== applying patch ./v7-0002-Preserve-database-OIDs-in-pg_upgrade.patch
[...]
patching file src/bin/pg_upgrade/info.c
Hunk #1 FAILED at 190.
Hunk #2 succeeded at 351 (offset 27 lines).
1 out of 2 hunks FAILED -- saving rejects to file src/bin/pg_upgrade/info.c.rej
patching file src/bin/pg_upgrade/pg_upgrade.h
Hunk #1 FAILED at 145.
1 out of 1 hunk FAILED -- saving rejects to file 
src/bin/pg_upgrade/pg_upgrade.h.rej
patching file src/bin/pg_upgrade/relfilenode.c
Hunk #1 FAILED at 193.
1 out of 1 hunk FAILED -- saving rejects to file 
src/bin/pg_upgrade/relfilenode.c.rej

Could you send a rebased version?  In the meantime I willl switch the cf entry
to Waiting on Author.




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-12-16 Thread Shruthi Gowda
On Mon, Dec 13, 2021 at 8:43 PM Shruthi Gowda  wrote:
>
> On Mon, Dec 6, 2021 at 11:25 PM Robert Haas  wrote:
> >
> > On Sun, Dec 5, 2021 at 11:44 PM Sadhuprasad Patro  wrote:
> > > 3.
> > > @@ -504,11 +525,15 @@ createdb(ParseState *pstate, const CreatedbStmt 
> > > *stmt)
> > >   */
> > >   pg_database_rel = table_open(DatabaseRelationId, RowExclusiveLock);
> > >
> > > - do
> > > + /* Select an OID for the new database if is not explicitly configured. 
> > > */
> > > + if (!OidIsValid(dboid))
> > >   {
> > > - dboid = GetNewOidWithIndex(pg_database_rel, DatabaseOidIndexId,
> > > -Anum_pg_database_oid);
> > > - } while (check_db_file_conflict(dboid));
> > >
> > > I think we need to do 'check_db_file_conflict' for the USER given OID
> > > also.. right? It may already be present.
> >
> > Hopefully, if that happens, we straight up fail later on.
>
> That's right. If a database with user-specified OID exists, the
> createdb fails with a "duplicate key value" error.
> If just a data directory with user-specified OID exists,
> MakePGDirectory() fails to create the directory and the cleanup
> callback createdb_failure_callback() removes the directory that was
> not created by 'createdb()' function.
> The subsequent create database call with the same OID will succeed.
> Should we handle the case where a data directory exists and the
> corresponding DB with that oid does not exist? I presume this
> situation doesn't arise unless the user tries to create directories in
> the data path. Any thoughts?

I have updated the DBOID preserve patch to handle this case and
generated the latest patch on top of your v7-001-preserve-relfilenode
patch.

Thanks & Regards
Shruthi KC
EnterpriseDB: http://www.enterprisedb.com


v7-0002-Preserve-database-OIDs-in-pg_upgrade.patch
Description: Binary data


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-12-15 Thread tushar

On 12/15/21 12:09 AM, tushar wrote:

I spent much of today reviewing 0001. Here's an updated version, so
far only lightly tested. Please check whether I've broken anything.

Thanks Robert, I tested from v96/12/13/v14 -> v15( with patch)
things are working fine i.e table /index relfilenode is preserved,
not changing after pg_upgrade. 
I covered tablespace OIDs testing scenarios and that is also preserved 
after pg_upgrade.


--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-12-14 Thread tushar

On 12/14/21 2:35 AM, Robert Haas wrote:

I spent much of today reviewing 0001. Here's an updated version, so
far only lightly tested. Please check whether I've broken anything.

Thanks Robert, I tested from v96/12/13/v14 -> v15( with patch)
things are working fine i.e table /index relfilenode is preserved,
not changing after pg_upgrade.

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-12-14 Thread Shruthi Gowda
On Tue, Dec 14, 2021 at 2:35 AM Robert Haas  wrote:
>
> On Mon, Dec 13, 2021 at 9:40 AM Shruthi Gowda  wrote:
> > > I am reviewing another patch
> > > "v5-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg" as well
> > > and will provide the comments soon if any...
>
> I spent much of today reviewing 0001. Here's an updated version, so
> far only lightly tested. Please check whether I've broken anything.
> Here are the changes:

Thanks, Robert for the updated version. I reviewed the changes and it
looks fine.
I also tested the patch. The patch works as expected.

> - I adjusted the function header comment for heap_create. Your
> proposed comment seemed like it was pretty detailed but not 100%
> correct. It also made one of the lines kind of long because you didn't
> wrap the text in the surrounding style. I decided to make it simpler
> and shorter instead of longer still and 100% correct.

The comment update looks fine. However, I still feel it would be good to
mention on which (rare) circumstance a valid relfilenode can get passed.

> - I removed a one-line comment that said /* Override the toast
> relfilenode */ because it preceded an error check, not a line of code
> that would have done what the comment claimed.
>
> - I removed a one-line comment that said /* Override the relfilenode
> */  because the following code would only sometimes override the
> relfilenode. The code didn't seem complex enough to justify a a longer
> and more accurate comment, so I just took it out.

Fine

> - I changed a test for (relkind == RELKIND_RELATION || relkind ==
> RELKIND_SEQUENCE || relkind == RELKIND_MATVIEW) to use
> RELKIND_HAS_STORAGE(). It's true that not all of the storage types
> that RELKIND_HAS_STORAGE() tests are possible here, but that's not a
> reason to avoiding using the macro. If somebody adds a new relkind
> with storage in the future, they might miss the need to manually
> update this place, but they will not likely miss the need to update
> RELKIND_HAS_STORAGE() since, if they did, their code probably wouldn't
> work at all.

I agree.

> - I changed the way that you were passing create_storage down to
> heap_create. I think I said before that you should EITHER fix it so
> that we set create_storage = true only when the relation actually has
> storage OR ELSE have heap_create() itself override the value to false
> when there is no storage. You did both. There are times when it's
> reasonable to ensure the same thing in multiple places, but this
> doesn't seem to be one of them. So I took that out. I chose to retain
> the code in heap_create() that overrides the value to false, added a
> comment explaining that it does that, and then adjusted the callers to
> ignore the storage type. I then added comments, and in one place an
> assertion, to make it clearer what is happening.

The changes are fine. Thanks for the fine-tuning.

> - In pg_dump.c, I adjusted the comment that says "Not every relation
> has storage." and the test that immediately follows, to ignore the
> relfilenode when relkind says it's a partitioned table. Really,
> partitioned tables should never have had relfilenodes, but as it turns
> out, they used to have them.
>

Fine. Understood.

Thanks & Regards,
Shruthi KC
EnterpriseDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-12-13 Thread Robert Haas
On Mon, Dec 13, 2021 at 9:40 AM Shruthi Gowda  wrote:
> > I am reviewing another patch
> > "v5-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg" as well
> > and will provide the comments soon if any...

I spent much of today reviewing 0001. Here's an updated version, so
far only lightly tested. Please check whether I've broken anything.
Here are the changes:

- I adjusted the function header comment for heap_create. Your
proposed comment seemed like it was pretty detailed but not 100%
correct. It also made one of the lines kind of long because you didn't
wrap the text in the surrounding style. I decided to make it simpler
and shorter instead of longer still and 100% correct.

- I removed a one-line comment that said /* Override the toast
relfilenode */ because it preceded an error check, not a line of code
that would have done what the comment claimed.

- I removed a one-line comment that said /* Override the relfilenode
*/  because the following code would only sometimes override the
relfilenode. The code didn't seem complex enough to justify a a longer
and more accurate comment, so I just took it out.

- I changed a test for (relkind == RELKIND_RELATION || relkind ==
RELKIND_SEQUENCE || relkind == RELKIND_MATVIEW) to use
RELKIND_HAS_STORAGE(). It's true that not all of the storage types
that RELKIND_HAS_STORAGE() tests are possible here, but that's not a
reason to avoiding using the macro. If somebody adds a new relkind
with storage in the future, they might miss the need to manually
update this place, but they will not likely miss the need to update
RELKIND_HAS_STORAGE() since, if they did, their code probably wouldn't
work at all.

- I changed the way that you were passing create_storage down to
heap_create. I think I said before that you should EITHER fix it so
that we set create_storage = true only when the relation actually has
storage OR ELSE have heap_create() itself override the value to false
when there is no storage. You did both. There are times when it's
reasonable to ensure the same thing in multiple places, but this
doesn't seem to be one of them. So I took that out. I chose to retain
the code in heap_create() that overrides the value to false, added a
comment explaining that it does that, and then adjusted the callers to
ignore the storage type. I then added comments, and in one place an
assertion, to make it clearer what is happening.

- In pg_dump.c, I adjusted the comment that says "Not every relation
has storage." and the test that immediately follows, to ignore the
relfilenode when relkind says it's a partitioned table. Really,
partitioned tables should never have had relfilenodes, but as it turns
out, they used to have them.

Let me know your thoughts.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v7-0001-pg_upgrade-Preserve-relfilenodes-and-tablespace-O.patch
Description: Binary data


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-12-13 Thread Shruthi Gowda
On Mon, Dec 6, 2021 at 11:25 PM Robert Haas  wrote:
>
> On Sun, Dec 5, 2021 at 11:44 PM Sadhuprasad Patro  wrote:
> > 3.
> > @@ -504,11 +525,15 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
> >   */
> >   pg_database_rel = table_open(DatabaseRelationId, RowExclusiveLock);
> >
> > - do
> > + /* Select an OID for the new database if is not explicitly configured. */
> > + if (!OidIsValid(dboid))
> >   {
> > - dboid = GetNewOidWithIndex(pg_database_rel, DatabaseOidIndexId,
> > -Anum_pg_database_oid);
> > - } while (check_db_file_conflict(dboid));
> >
> > I think we need to do 'check_db_file_conflict' for the USER given OID
> > also.. right? It may already be present.
>
> Hopefully, if that happens, we straight up fail later on.

That's right. If a database with user-specified OID exists, the
createdb fails with a "duplicate key value" error.
If just a data directory with user-specified OID exists,
MakePGDirectory() fails to create the directory and the cleanup
callback createdb_failure_callback() removes the directory that was
not created by 'createdb()' function.
The subsequent create database call with the same OID will succeed.
Should we handle the case where a data directory exists and the
corresponding DB with that oid does not exist? I presume this
situation doesn't arise unless the user tries to create directories in
the data path. Any thoughts?


Thanks & Regards
Shruthi KC
EnterpriseDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-12-13 Thread Shruthi Gowda
On Mon, Dec 6, 2021 at 10:14 AM Sadhuprasad Patro  wrote:
>
> On Tue, Oct 26, 2021 at 6:55 PM Shruthi Gowda  wrote:
> >
> >
> > I have revised the patch w.r.t the way 'create_storage' is interpreted
> > in heap_create() along with some minor changes to preserve the DBOID
> > patch.
> >
>
> Hi Shruthi,
>
> I am reviewing the attached patches and providing a few comments here
> below for patch "v5-0002-Preserve-database-OIDs-in-pg_upgrade.patch"
>
> 1.
> --- a/doc/src/sgml/ref/create_database.sgml
> +++ b/doc/src/sgml/ref/create_database.sgml
> @@ -31,7 +31,8 @@ CREATE DATABASE  class="parameter">name
> -   [ IS_TEMPLATE [=]  class="parameter">istemplate ] ]
> +   [ IS_TEMPLATE [=]  class="parameter">istemplate ]
> +   [ OID [=]  class="parameter">db_oid ] ]
>
> Replace "db_oid" with 'oid'. Below in the listitem, we have mentioned 'oid'.

Replaced "db_oid" with "oid"

>
> 2.
> --- a/src/backend/commands/dbcommands.c
> +++ b/src/backend/commands/dbcommands.c
> + if ((dboid < FirstNormalObjectId) &&
> + (strcmp(dbname, "template0") != 0) &&
> + (!IsBinaryUpgrade))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE)),
> + errmsg("Invalid value for option \"%s\"", defel->defname),
> + errhint("The specified OID %u is less than the minimum OID for user
> objects %u.",
> + dboid, FirstNormalObjectId));
> + }
>
> Are we sure that 'IsBinaryUpgrade' will be set properly, before the
> createdb function is called? Can we recheck once ?

I believe 'IsBinaryUpgrade' will be set to true when pg_upgrade is invoked.
pg_ugrade internally does pg_dump and pg_restore for every database in
the cluster.

> 3.
> @@ -504,11 +525,15 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
>   */
>   pg_database_rel = table_open(DatabaseRelationId, RowExclusiveLock);
>
> - do
> + /* Select an OID for the new database if is not explicitly configured. */
> + if (!OidIsValid(dboid))
>   {
> - dboid = GetNewOidWithIndex(pg_database_rel, DatabaseOidIndexId,
> -Anum_pg_database_oid);
> - } while (check_db_file_conflict(dboid));
>
> I think we need to do 'check_db_file_conflict' for the USER given OID
> also.. right? It may already be present.

If a datafile with user-specified OID exists, the create database
fails with the below error.
postgres=# create database d2 oid 16452;
ERROR:  could not create directory "base/16452": File exists

> 4.
> --- a/src/bin/initdb/initdb.c
> +++ b/src/bin/initdb/initdb.c
>
> /*
> + * Create template0 database with oid Template0ObjectId i.e, 4
> + */
> +
>
> Better to mention here, why OID 4 is reserved for template0 database?.

The comment is updated to explain why template0 oid is fixed.

> 5.
> + /*
> + * Create template0 database with oid Template0ObjectId i.e, 4
> + */
> + static const char *const template0_setup[] = {
> + "CREATE DATABASE template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false OID 
> "
> + CppAsString2(Template0ObjectId) ";\n\n",
>
> Can we write something like, 'OID = CppAsString2(Template0ObjectId)'?
> mention "=".

Fixed

> 6.
> +
> + /*
> + * We use the OID of postgres to determine datlastsysoid
> + */
> + "UPDATE pg_database SET datlastsysoid = "
> + "(SELECT oid FROM pg_database "
> + "WHERE datname = 'postgres');\n\n",
> +
>
> Make the above comment a single line comment.

As Robert confirmed, this part of the code is moved from a different place.

> 7.
> There are some spelling mistakes in the comments as below, please
> correct the same
> + /*
> + * Make sure that binary upgrade propogate the database OID to the
> new => correct spelling
> + * cluster
> + */
>
> +/* OID 4 is reserved for Templete0 database */
>  > Correct spelling
> +#define Template0ObjectId 4
>

Fixed.

> I am reviewing another patch
> "v5-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg" as well
> and will provide the comments soon if any...

Thanks. I have rebased relfilenode oid preserve patch. You may use the
rebased patch for review.

Thanks & Regards
Shruthi K C
EnterpriseDB: http://www.enterprisedb.com


v6-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg.patch
Description: Binary data


v6-0002-Preserve-database-OIDs-in-pg_upgrade.patch
Description: Binary data


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-12-06 Thread Robert Haas
On Sun, Dec 5, 2021 at 11:44 PM Sadhuprasad Patro  wrote:
> 1.
> --- a/doc/src/sgml/ref/create_database.sgml
> +++ b/doc/src/sgml/ref/create_database.sgml
> @@ -31,7 +31,8 @@ CREATE DATABASE  class="parameter">name
> -   [ IS_TEMPLATE [=]  class="parameter">istemplate ] ]
> +   [ IS_TEMPLATE [=]  class="parameter">istemplate ]
> +   [ OID [=]  class="parameter">db_oid ] ]
>
> Replace "db_oid" with 'oid'. Below in the listitem, we have mentioned 'oid'.

I agree that the listitem and the synopsis need to be consistent, but
it could be made consistent either by changing that one to db_oid or
this one to oid.

> 2.
> --- a/src/backend/commands/dbcommands.c
> +++ b/src/backend/commands/dbcommands.c
> + if ((dboid < FirstNormalObjectId) &&
> + (strcmp(dbname, "template0") != 0) &&
> + (!IsBinaryUpgrade))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE)),
> + errmsg("Invalid value for option \"%s\"", defel->defname),
> + errhint("The specified OID %u is less than the minimum OID for user
> objects %u.",
> + dboid, FirstNormalObjectId));
> + }
>
> Are we sure that 'IsBinaryUpgrade' will be set properly, before the
> createdb function is called? Can we recheck once ?

How could it be set incorrectly, and how could we recheck this?

> 3.
> @@ -504,11 +525,15 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
>   */
>   pg_database_rel = table_open(DatabaseRelationId, RowExclusiveLock);
>
> - do
> + /* Select an OID for the new database if is not explicitly configured. */
> + if (!OidIsValid(dboid))
>   {
> - dboid = GetNewOidWithIndex(pg_database_rel, DatabaseOidIndexId,
> -Anum_pg_database_oid);
> - } while (check_db_file_conflict(dboid));
>
> I think we need to do 'check_db_file_conflict' for the USER given OID
> also.. right? It may already be present.

Hopefully, if that happens, we straight up fail later on.

> 4.
> --- a/src/bin/initdb/initdb.c
> +++ b/src/bin/initdb/initdb.c
>
> /*
> + * Create template0 database with oid Template0ObjectId i.e, 4
> + */
> +
>
> Better to mention here, why OID 4 is reserved for template0 database?.

I'm not sure how we would give a reason for selecting an arbitrary
constant? We could perhaps explain why we use a fixed OID. But there's
no reason it has to be 4, I think.

> 5.
> + /*
> + * Create template0 database with oid Template0ObjectId i.e, 4
> + */
> + static const char *const template0_setup[] = {
> + "CREATE DATABASE template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false OID 
> "
> + CppAsString2(Template0ObjectId) ";\n\n",
>
> Can we write something like, 'OID = CppAsString2(Template0ObjectId)'?
> mention "=".

That seems like a good idea, because it would be more consistent.

> 6.
> +
> + /*
> + * We use the OID of postgres to determine datlastsysoid
> + */
> + "UPDATE pg_database SET datlastsysoid = "
> + "(SELECT oid FROM pg_database "
> + "WHERE datname = 'postgres');\n\n",
> +
>
> Make the above comment a single line comment.

I think what Shruthi did is more correct. It doesn't have to be done
as a single-line comment just because it can fit on one line. And
Shruthi didn't write this comment anyway, it's only moved slightly
from where it was before.

> 7.
> There are some spelling mistakes in the comments as below, please
> correct the same
> + /*
> + * Make sure that binary upgrade propogate the database OID to the
> new => correct spelling
> + * cluster
> + */
>
> +/* OID 4 is reserved for Templete0 database */
>  > Correct spelling
> +#define Template0ObjectId 4

Yes, those would be good to fix.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-12-05 Thread Sadhuprasad Patro
On Tue, Oct 26, 2021 at 6:55 PM Shruthi Gowda  wrote:
>
>
> I have revised the patch w.r.t the way 'create_storage' is interpreted
> in heap_create() along with some minor changes to preserve the DBOID
> patch.
>

Hi Shruthi,

I am reviewing the attached patches and providing a few comments here
below for patch "v5-0002-Preserve-database-OIDs-in-pg_upgrade.patch"

1.
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -31,7 +31,8 @@ CREATE DATABASE name
-   [ IS_TEMPLATE [=] istemplate ] ]
+   [ IS_TEMPLATE [=] istemplate ]
+   [ OID [=] db_oid ] ]

Replace "db_oid" with 'oid'. Below in the listitem, we have mentioned 'oid'.

2.
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
+ if ((dboid < FirstNormalObjectId) &&
+ (strcmp(dbname, "template0") != 0) &&
+ (!IsBinaryUpgrade))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE)),
+ errmsg("Invalid value for option \"%s\"", defel->defname),
+ errhint("The specified OID %u is less than the minimum OID for user
objects %u.",
+ dboid, FirstNormalObjectId));
+ }

Are we sure that 'IsBinaryUpgrade' will be set properly, before the
createdb function is called? Can we recheck once ?

3.
@@ -504,11 +525,15 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
  */
  pg_database_rel = table_open(DatabaseRelationId, RowExclusiveLock);

- do
+ /* Select an OID for the new database if is not explicitly configured. */
+ if (!OidIsValid(dboid))
  {
- dboid = GetNewOidWithIndex(pg_database_rel, DatabaseOidIndexId,
-Anum_pg_database_oid);
- } while (check_db_file_conflict(dboid));

I think we need to do 'check_db_file_conflict' for the USER given OID
also.. right? It may already be present.

4.
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c

/*
+ * Create template0 database with oid Template0ObjectId i.e, 4
+ */
+

Better to mention here, why OID 4 is reserved for template0 database?.

5.
+ /*
+ * Create template0 database with oid Template0ObjectId i.e, 4
+ */
+ static const char *const template0_setup[] = {
+ "CREATE DATABASE template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false OID "
+ CppAsString2(Template0ObjectId) ";\n\n",

Can we write something like, 'OID = CppAsString2(Template0ObjectId)'?
mention "=".

6.
+
+ /*
+ * We use the OID of postgres to determine datlastsysoid
+ */
+ "UPDATE pg_database SET datlastsysoid = "
+ "(SELECT oid FROM pg_database "
+ "WHERE datname = 'postgres');\n\n",
+

Make the above comment a single line comment.

7.
There are some spelling mistakes in the comments as below, please
correct the same
+ /*
+ * Make sure that binary upgrade propogate the database OID to the
new => correct spelling
+ * cluster
+ */

+/* OID 4 is reserved for Templete0 database */
 > Correct spelling
+#define Template0ObjectId 4


I am reviewing another patch
"v5-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg" as well
and will provide the comments soon if any...

Thanks & Regards
SadhuPrasad
EnterpriseDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-10-26 Thread Shruthi Gowda
On Thu, Oct 7, 2021 at 7:33 PM Robert Haas  wrote:
>
> On Thu, Oct 7, 2021 at 3:24 AM Shruthi Gowda  wrote:
> > Every other
> > caller/flow passes false for 'create_storage' and we still need to
> > create storage in heap_create() if relkind has storage.
>
> That seems surprising.

I have revised the patch w.r.t the way 'create_storage' is interpreted
in heap_create() along with some minor changes to preserve the DBOID
patch.

Regards,
Shruthi KC
EnterpriseDB: http://www.enterprisedb.com


v5-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg.patch
Description: Binary data


v5-0002-Preserve-database-OIDs-in-pg_upgrade.patch
Description: Binary data


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-10-07 Thread Robert Haas
On Thu, Oct 7, 2021 at 3:24 AM Shruthi Gowda  wrote:
> Every other
> caller/flow passes false for 'create_storage' and we still need to
> create storage in heap_create() if relkind has storage.

That seems surprising.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-10-07 Thread Shruthi Gowda
On Thu, Oct 7, 2021 at 2:05 AM Robert Haas  wrote:
>
> On Mon, Oct 4, 2021 at 12:44 PM Shruthi Gowda  wrote:
> > Thanks for the inputs, Robert. In the v4 patch, an unused OID (i.e, 4)
> > is fixed for the template0 and the same is removed from unused oid
> > list.
> >
> > In addition to the review comment fixes, I have removed some code that
> > is no longer needed/doesn't make sense since we preserve the OIDs.
>
> This is not a full review, but I'm wondering about this bit of code:
>
> -   if (!RELKIND_HAS_STORAGE(relkind) || OidIsValid(relfilenode))
> +   if (!RELKIND_HAS_STORAGE(relkind) || (OidIsValid(relfilenode)
> && !create_storage))
> create_storage = false;
> else
> {
> create_storage = true;
> -   relfilenode = relid;
> +
> +   /*
> +* Create the storage with oid same as relid if relfilenode is
> +* unspecified by the caller
> +*/
> +   if (!OidIsValid(relfilenode))
> +   relfilenode = relid;
> }
>
> This seems hard to understand, and I wonder if perhaps it can be
> simplified. If !RELKIND_HAS_STORAGE(relkind), then we're going to set
> create_storage to false if it was previously true, and otherwise just
> do nothing. Otherwise, if !create_storage, we'll enter the
> create_storage = false branch which effectively does nothing.
> Otherwise, if !OidIsValid(relfilenode), we'll set relfilenode = relid.
> So couldn't we express that like this?
>
> if (!RELKIND_HAS_STORAGE(relkind))
> create_storage = false;
> else if (create_storage && !OidIsValid(relfilenode))
> relfilenode = relid;
>
> If so, that seems more clear.

'create_storage' flag says whether or not to create the storage when a
valid relfilenode is passed. 'create_storage' flag alone cannot make
the storage creation decision in heap_create().
Only binary upgrade flow sets the 'create_storage' flag to true and
expects storage gets created with specified relfilenode. Every other
caller/flow passes false for 'create_storage' and we still need to
create storage in heap_create() if relkind has storage. That's why I
have explicitly set 'create_storage = true' in the else flow and
initialize relfilenode on need basis.

Regards,
Shruthi KC
EnterpriseDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-10-06 Thread Robert Haas
On Mon, Oct 4, 2021 at 12:44 PM Shruthi Gowda  wrote:
> Thanks for the inputs, Robert. In the v4 patch, an unused OID (i.e, 4)
> is fixed for the template0 and the same is removed from unused oid
> list.
>
> In addition to the review comment fixes, I have removed some code that
> is no longer needed/doesn't make sense since we preserve the OIDs.

This is not a full review, but I'm wondering about this bit of code:

-   if (!RELKIND_HAS_STORAGE(relkind) || OidIsValid(relfilenode))
+   if (!RELKIND_HAS_STORAGE(relkind) || (OidIsValid(relfilenode)
&& !create_storage))
create_storage = false;
else
{
create_storage = true;
-   relfilenode = relid;
+
+   /*
+* Create the storage with oid same as relid if relfilenode is
+* unspecified by the caller
+*/
+   if (!OidIsValid(relfilenode))
+   relfilenode = relid;
}

This seems hard to understand, and I wonder if perhaps it can be
simplified. If !RELKIND_HAS_STORAGE(relkind), then we're going to set
create_storage to false if it was previously true, and otherwise just
do nothing. Otherwise, if !create_storage, we'll enter the
create_storage = false branch which effectively does nothing.
Otherwise, if !OidIsValid(relfilenode), we'll set relfilenode = relid.
So couldn't we express that like this?

if (!RELKIND_HAS_STORAGE(relkind))
create_storage = false;
else if (create_storage && !OidIsValid(relfilenode))
relfilenode = relid;

If so, that seems more clear.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-10-04 Thread Shruthi Gowda
On Fri, Sep 24, 2021 at 12:44 AM Robert Haas  wrote:
>
> On Wed, Sep 22, 2021 at 3:07 PM Shruthi Gowda  wrote:
> > > - The comment in binary_upgrade_set_pg_class_oids() is still not
> > > accurate. You removed the sentence which says "Indexes cannot have
> > > toast tables, so we need not make this probe in the index code path"
> > > but the immediately preceding sentence is still inaccurate in at least
> > > two ways. First, it only talks about tables, but the code now applies
> > > to indexes. Second, it only talks about OIDs, but now also deals with
> > > refilenodes. It's really important to fully update every comment that
> > > might be affected by your changes!
> >
> > The comment is updated.
>
> Looks good.
>
> > The SQL query will not result in duplicate rows because the first join
> > filters the duplicate rows if any with the on clause ' i.indisvalid'
> > on it. The result of the first join is further left joined with pg_class and
> > pg_class will not have duplicate rows for a given oid.
>
> Oh, you're right. My mistake.
>
> > As per your suggestion, reloid and relfilenode are absorbed in the same 
> > place.
> > An additional parameter called 'suppress_storage' is passed to heap_create()
> > which indicates whether or not to create the storage when the caller
> > passed a valid relfilenode.
>
> I find it confusing to have both suppress_storage and create_storage
> with one basically as the negation of the other. To avoid that sort of
> thing I generally have a policy that variables and options should say
> whether something should happen, rather than whether it should be
> prevented from happening. Sometimes there are good reasons - such as
> strong existing precedent - to deviate from this practice but I think
> it's good to follow when possible. So my proposal is to always have
> create_storage and never suppress_storage, and if some function needs
> to adjust the value of create_storage that was passed to it then OK.

Sure, I agree. In the latest patch, only 'create_storage' is used.

> > I did not make the changes to set the oid and relfilenode in the same call.
> > I feel the uniformity w.r.t the other function signatures in
> > pg_upgrade_support.c will be lost because currently each function sets
> > only one attribute.
> > Also, renaming the applicable function names to represent that they
> > set both oid and relfilenode will make the function name even longer.
> > We may opt to not include the relfilenode in the function name instead
> > use a generic name like binary_upgrade_set_next_xxx_pg_class_oid() but
> > then
> > we will end up with some functions that set two attributes and some
> > functions that set one attribute.
>
> OK.
>
> > I have also attached the patch to preserve the DB oid. As discussed,
> > template0 will be created with a fixed oid during initdb. I am using
> > OID 2 for template0. Even though oid 2 is already in use for the
> > 'pg_am' catalog I see no harm in using it for template0 DB because oid
> > doesn’t have to be unique across the database - it has to be unique
> > for the particular catalog table. Kindly let me know if I am missing
> > something?
> > Apparently, if we did decide to pick an unused oid for template0 then
> > I see a challenge in removing that oid from the unused oid list. I
> > could not come up with a feasible solution for handling it.
>
> You are correct that there is no intrinsic reason why the same OID
> can't be used in various different catalogs. We have a policy of not
> doing that, though; I'm not clear on the reason. Maybe it'd be OK to
> deviate from that policy here, but another option would be to simply
> change the unused_oids script (and maybe some of the others). It
> already has:
>
> my $FirstGenbkiObjectId =
>   Catalog::FindDefinedSymbol('access/transam.h', '..', 'FirstGenbkiObjectId');
> push @{$oids}, $FirstGenbkiObjectId;
>
> Presumably it could be easily adapted to push the value of some other
> defined symbol into @{$oids} also, thus making that OID in effect
> used.

Thanks for the inputs, Robert. In the v4 patch, an unused OID (i.e, 4)
is fixed for the template0 and the same is removed from unused oid
list.

In addition to the review comment fixes, I have removed some code that
is no longer needed/doesn't make sense since we preserve the OIDs.

Regards,
Shruthi KC
EnterpriseDB: http://www.enterprisedb.com


v4-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg.patch
Description: Binary data


v4-0002-Preserve-database-OIDs-in-pg_upgrade.patch
Description: Binary data


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-09-23 Thread Robert Haas
On Wed, Sep 22, 2021 at 3:07 PM Shruthi Gowda  wrote:
> > - The comment in binary_upgrade_set_pg_class_oids() is still not
> > accurate. You removed the sentence which says "Indexes cannot have
> > toast tables, so we need not make this probe in the index code path"
> > but the immediately preceding sentence is still inaccurate in at least
> > two ways. First, it only talks about tables, but the code now applies
> > to indexes. Second, it only talks about OIDs, but now also deals with
> > refilenodes. It's really important to fully update every comment that
> > might be affected by your changes!
>
> The comment is updated.

Looks good.

> The SQL query will not result in duplicate rows because the first join
> filters the duplicate rows if any with the on clause ' i.indisvalid'
> on it. The result of the first join is further left joined with pg_class and
> pg_class will not have duplicate rows for a given oid.

Oh, you're right. My mistake.

> As per your suggestion, reloid and relfilenode are absorbed in the same place.
> An additional parameter called 'suppress_storage' is passed to heap_create()
> which indicates whether or not to create the storage when the caller
> passed a valid relfilenode.

I find it confusing to have both suppress_storage and create_storage
with one basically as the negation of the other. To avoid that sort of
thing I generally have a policy that variables and options should say
whether something should happen, rather than whether it should be
prevented from happening. Sometimes there are good reasons - such as
strong existing precedent - to deviate from this practice but I think
it's good to follow when possible. So my proposal is to always have
create_storage and never suppress_storage, and if some function needs
to adjust the value of create_storage that was passed to it then OK.

> I did not make the changes to set the oid and relfilenode in the same call.
> I feel the uniformity w.r.t the other function signatures in
> pg_upgrade_support.c will be lost because currently each function sets
> only one attribute.
> Also, renaming the applicable function names to represent that they
> set both oid and relfilenode will make the function name even longer.
> We may opt to not include the relfilenode in the function name instead
> use a generic name like binary_upgrade_set_next_xxx_pg_class_oid() but
> then
> we will end up with some functions that set two attributes and some
> functions that set one attribute.

OK.

> I have also attached the patch to preserve the DB oid. As discussed,
> template0 will be created with a fixed oid during initdb. I am using
> OID 2 for template0. Even though oid 2 is already in use for the
> 'pg_am' catalog I see no harm in using it for template0 DB because oid
> doesn’t have to be unique across the database - it has to be unique
> for the particular catalog table. Kindly let me know if I am missing
> something?
> Apparently, if we did decide to pick an unused oid for template0 then
> I see a challenge in removing that oid from the unused oid list. I
> could not come up with a feasible solution for handling it.

You are correct that there is no intrinsic reason why the same OID
can't be used in various different catalogs. We have a policy of not
doing that, though; I'm not clear on the reason. Maybe it'd be OK to
deviate from that policy here, but another option would be to simply
change the unused_oids script (and maybe some of the others). It
already has:

my $FirstGenbkiObjectId =
  Catalog::FindDefinedSymbol('access/transam.h', '..', 'FirstGenbkiObjectId');
push @{$oids}, $FirstGenbkiObjectId;

Presumably it could be easily adapted to push the value of some other
defined symbol into @{$oids} also, thus making that OID in effect
used.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-09-22 Thread Shruthi Gowda
On Tue, Aug 24, 2021 at 2:27 AM Robert Haas  wrote:
> It's pretty clear from the discussion, I think, that the database OID
> one is going to need rework to be considered.
>
> Regarding the other one:
>
> - The comment in binary_upgrade_set_pg_class_oids() is still not
> accurate. You removed the sentence which says "Indexes cannot have
> toast tables, so we need not make this probe in the index code path"
> but the immediately preceding sentence is still inaccurate in at least
> two ways. First, it only talks about tables, but the code now applies
> to indexes. Second, it only talks about OIDs, but now also deals with
> refilenodes. It's really important to fully update every comment that
> might be affected by your changes!

The comment is updated.

> - The SQL query in that function isn't completely correct. There is a
> left join from pg_class to pg_index whose ON clause includes
> "c.reltoastrelid = i.indrelid AND i.indisvalid." The reason it's
> likely that is because it is possible, in corner cases, for a TOAST
> table to have multiple TOAST indexes. I forget exactly how that
> happens, but I think it might be like if a REINDEX CONCURRENTLY on the
> TOAST table fails midway through, or something of that sort. Now if
> that happens, the LEFT JOIN you added is going to cause the output to
> contain multiple rows, because you didn't replicate the i.indisvalid
> condition into that ON clause. And then it will fail. Apparently we
> don't have a pg_upgrade test case for this scenario; we probably
> should. Actually what I think would be even better than putting
> i.indisvalid into that ON clause would be to join off of i.indrelid
> rather than c.reltoastrelid.

The SQL query will not result in duplicate rows because the first join
filters the duplicate rows if any with the on clause ' i.indisvalid'
on it. The result of the first join is further left joined with pg_class and
pg_class will not have duplicate rows for a given oid.

> - The code that decodes the various columns of this query does so in a
> slightly different order than the query itself. It would be better to
> make it match. Perhaps put relkind first in both cases. I might also
> think about trying to make the column naming a bit more consistent,
> e.g. relkind, relfilenode, toast_oid, toast_relfilenode,
> toast_index_oid, toast_index_relfilenode.

Fixed.

> - In heap_create(), the wording of the error messages is not quite
> consistent. You have "relfilenode value not set when in binary upgrade
> mode", "toast relfilenode value not set when in binary upgrade mode",
> and "pg_class index relfilenode value not set when in binary upgrade
> mode". Why does the last one mention pg_class when the other two
> don't?

The error message is made consistent. This code chuck is moved to a different
place as a part of another review comment fix.

> - The code in heap_create() now has no comments whatsoever, which is a
> shame, because it's actually kind of a tricky bit of logic. Someone
> might wonder why we override the relfilenode inside that function
> instead of doing it at the same places where we absorb
> binary_upgrade_next_{heap,index,toast}_pg_class_oid and the passing
> down the relfilenode. I think the answer is that passing down the
> relfilenode from the caller would result in storage not actually being
> created, whereas in this case we want it to be created but just with
> the value we specify, and the reason we want that is because we need
> later DDL that happens after these statements but before the old
> cluster's relations are moved over to execute successfully, which it
> won't if the storage is altogether absent.

> However, that raises the question of whether this patch has even got
> the basic design right. Maybe we ought to actually be absorbing the
> relfilenode setting at the same places where we're doing so for the
> OID, and then passing an additional parameter to heap_create() like
> bool suppress_storage or something like that. Maybe, taking it even
> further, we ought to be changing the signatures of
> binary_upgrade_next_heap_pg_class_oid and friends to be two-argument
> functions, and pass down the OID and the relfilenode in the same call,
> rather than calling two separate functions. I'm not so much concerned
> about the cost of calling two functions as the potential for
> confusion. I'm not honestly sure that either of these changes are the
> right thing to do, but I am pretty strongly inclined to do at least
> the first part - trying to absorb reloid and relfilenode in the same
> places. If we're not going to do that we certainly need to explain why
> we're doing it the way we are in the comments.

As per your suggestion, reloid and relfilenode are absorbed in the same place.
An additional parameter called 'suppress_storage' is passed to heap_create()
which indicates whether or not to create the storage when the caller
passed a valid relfilenode.

I did not make the changes to set the oid and relfilenode in the same 

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 01:24:46PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Thu, Aug 26, 2021 at 01:03:54PM -0400, Stephen Frost wrote:
> > > Yes, we're talking about either incremental (or perhaps differential)
> > > backup where only the files which are actually different would be backed
> > > up.  Just like with PG, I can't provide any complete guarantees that
> > > we'd be able to actually make this possible after a major version with
> > > pgBackRest with this change, but it definitely isn't possible *without*
> > > this change.  I can't see any reason why we wouldn't be able to do a
> > > checksum-based incremental backup though (which would be *much* faster
> > > than a regular backup) once this change is made and have that be a
> > > reliable and trustworthy backup.  I'd want to think about it more and
> > > discuss it with David in some detail before saying if we could maybe
> > > perform a timestamp-based incremental backup (without checksum'ing the
> > > files, as we do in normal situations), but that would really just be a
> > > bonus.
> > 
> > Well, it would be nice to know exactly how it would help pgBackRest if
> > that is one of the reasons we are adding this feature.
> 
> pgBackRest keeps a manifest for every file in the PG data directory that
> is backed up and we identify that file by the filename.  Further, we
> calculate a checksum for every file.  If the filenames didn't change
> then we'd be able to compare the file in the new cluster against the
> file and checksum in the manifest in order to be able to perform the
> incremental/differential backup.  We don't store the inodes in the
> manifest though, and we don't have any concept of looking at multiple
> data directories at the same time or anything like that (which would
> also mean that the old data directory would have to be kept around for
> that to even work, which seems like a good bit of additional
> complication and risk that someone might start up the old cluster by
> accident..).
> 
> That's how it'd be very helpful to pgBackRest for the filenames to be
> preserved across pg_upgrade's.

OK, that is clear.

> > > > > > As far as TDE, I haven't seen any concrete plan for that, so why add
> > > > > > this code for that reason?
> > > > > 
> > > > > That this would help with TDE (of which there seems little doubt...) 
> > > > > is
> > > > > an additional benefit to this.  Specifically, taking the existing work
> > > > > that's already been done to allow block-by-block encryption and
> > > > > adjusting it for AES-XTS and then using the db-dir+relfileno+block
> > > > > number as the IV, just like many disk encryption systems do, avoids 
> > > > > the
> > > > > concerns that were brought up about using LSN for the IV with CTR and
> > > > > it's certainly not difficult to do, but it does depend on this change.
> > > > > This was all discussed previously and it sure looks like a sensible
> > > > > approach to use that mirrors what many other systems already do
> > > > > successfully.
> > > > 
> > > > Well, I would think we would not add this for TDE until we were sure
> > > > someone was working on adding TDE.
> > > 
> > > That this would help with TDE is what I'd consider an added bonus.
> > 
> > Not if we have no plans to implement TDE, which was my point.  Why not
> > wait to see if we are actually going to implement TDE rather than adding
> > it now.  It is just so obvious, why do I have to state this?
> 
> There's been multiple years of effort put into implementing TDE and I'm
> sure hopeful that it continues as I'm trying to put effort into moving
> it forward myself.  I'm a bit baffled by the idea that we're just

Well, this is the first time I am hearing this publicly.

> suddenly going to stop putting effort into TDE as it is brought up time
> and time again by clients that I've talked to as one of the few reasons
> they haven't moved to PG yet- I can't believe that hasn't been
> experienced by folks at other organizations too, I mean, there's people
> maintaining forks of PG specifically for TDE ...

Agreed.

> > > I've certainly done it and I'd be kind of surprised if others haven't,
> > > but I've also played a lot with pg_dump in various modes, so perhaps
> > > that's not a great representation.  I've definitely had to explain to
> > > clients why there's a whole different set of filenames after a
> > > pg_upgrade and why that is the case for an 'in place' upgrade before
> > > too.
> > 
> > Uh, so I guess I am right that few people have mentioned this in the
> > past.  Why were users caring about the file names?
> 
> This is a bit baffling to me.  Users and admins certainly care about
> what files their data is stored in and knowing how to find them.
> Covering the data directory structure is a commonly asked for part of
> the training that I regularly do for clients.

I just never thought people cared about the file names, since I have
never heard a complaint 

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 01:20:38PM -0400, Robert Haas wrote:
> So I think this proposed change is in the safe direction. If
> relfilenodes were currently preserved and we wanted to make them not
> be preserved, then I think you would be quite right to say "whoa,
> whoa, that could be a problem." Indeed it could. If anyone then in the
> future wanted to introduce a dependency on them staying the same, they
> would have a problem. However, nothing in the server itself can care
> about relfilenodes - or anything else - being *different* across a
> pg_upgrade. The whole point of pg_upgrade is to make it feel like you
> have the same database after you run it as you did before you ran it,
> even though under the hood a lot of surgery has been done. Barring
> bugs, you can never be sad about there being too LITTLE difference
> between the post-upgrade database and the pre-upgrade database.

Yes, this makes sense, and it is good we have stated the possible
benefits now:

*  pgBackRest
*  pg_upgrade diagnostics
*  TDE (maybe)

We can eventually evaluate the value of this based on those items.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Aug 26, 2021 at 01:03:54PM -0400, Stephen Frost wrote:
> > Yes, we're talking about either incremental (or perhaps differential)
> > backup where only the files which are actually different would be backed
> > up.  Just like with PG, I can't provide any complete guarantees that
> > we'd be able to actually make this possible after a major version with
> > pgBackRest with this change, but it definitely isn't possible *without*
> > this change.  I can't see any reason why we wouldn't be able to do a
> > checksum-based incremental backup though (which would be *much* faster
> > than a regular backup) once this change is made and have that be a
> > reliable and trustworthy backup.  I'd want to think about it more and
> > discuss it with David in some detail before saying if we could maybe
> > perform a timestamp-based incremental backup (without checksum'ing the
> > files, as we do in normal situations), but that would really just be a
> > bonus.
> 
> Well, it would be nice to know exactly how it would help pgBackRest if
> that is one of the reasons we are adding this feature.

pgBackRest keeps a manifest for every file in the PG data directory that
is backed up and we identify that file by the filename.  Further, we
calculate a checksum for every file.  If the filenames didn't change
then we'd be able to compare the file in the new cluster against the
file and checksum in the manifest in order to be able to perform the
incremental/differential backup.  We don't store the inodes in the
manifest though, and we don't have any concept of looking at multiple
data directories at the same time or anything like that (which would
also mean that the old data directory would have to be kept around for
that to even work, which seems like a good bit of additional
complication and risk that someone might start up the old cluster by
accident..).

That's how it'd be very helpful to pgBackRest for the filenames to be
preserved across pg_upgrade's.

> > > > > As far as TDE, I haven't seen any concrete plan for that, so why add
> > > > > this code for that reason?
> > > > 
> > > > That this would help with TDE (of which there seems little doubt...) is
> > > > an additional benefit to this.  Specifically, taking the existing work
> > > > that's already been done to allow block-by-block encryption and
> > > > adjusting it for AES-XTS and then using the db-dir+relfileno+block
> > > > number as the IV, just like many disk encryption systems do, avoids the
> > > > concerns that were brought up about using LSN for the IV with CTR and
> > > > it's certainly not difficult to do, but it does depend on this change.
> > > > This was all discussed previously and it sure looks like a sensible
> > > > approach to use that mirrors what many other systems already do
> > > > successfully.
> > > 
> > > Well, I would think we would not add this for TDE until we were sure
> > > someone was working on adding TDE.
> > 
> > That this would help with TDE is what I'd consider an added bonus.
> 
> Not if we have no plans to implement TDE, which was my point.  Why not
> wait to see if we are actually going to implement TDE rather than adding
> it now.  It is just so obvious, why do I have to state this?

There's been multiple years of effort put into implementing TDE and I'm
sure hopeful that it continues as I'm trying to put effort into moving
it forward myself.  I'm a bit baffled by the idea that we're just
suddenly going to stop putting effort into TDE as it is brought up time
and time again by clients that I've talked to as one of the few reasons
they haven't moved to PG yet- I can't believe that hasn't been
experienced by folks at other organizations too, I mean, there's people
maintaining forks of PG specifically for TDE ...

Seems like maybe we were both seeing something as obvious to the other
that wasn't actually the case.

> > > > > > make this required, general improved sanity when working with 
> > > > > > pg_upgrade
> > > > > > is frankly a benefit in its own right too...).  If the additional 
> > > > > > code
> > > > > 
> > > > > How?  I am not aware of any advantage except cosmetic.
> > > > 
> > > > Having to resort to matching up inode numbers between the two clusters
> > > > after a pg_upgrade to figure out what files are actually the same
> > > > underneath is a pain that goes beyond just cosmetics imv.  Removing that
> > > > additional level that admins, and developers for that matter, have to go
> > > > through would be a nice improvement on its own.
> > > 
> > > OK, I was just not aware anyone did that, since I have never hard anyone
> > > complain about it before.
> > 
> > I've certainly done it and I'd be kind of surprised if others haven't,
> > but I've also played a lot with pg_dump in various modes, so perhaps
> > that's not a great representation.  I've definitely had to explain to
> > clients why there's a whole different set of filenames after a
> > pg_upgrade 

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Robert Haas
On Thu, Aug 26, 2021 at 12:51 PM Bruce Momjian  wrote:
> I just don't want to add requirements/complexity to pg_upgrade without
> clearly stated reasons because future database changes will need to
> honor this new preservation behavior.

Well, I agree that it's good to have reasons clearly stated and I hope
that at this point you agree that they have been. Whether you agree
with them is another question, but I hope you at least agree that they
have been stated.

As far as the other part of your concern, what I think makes this
change pretty safe is that we are preserving more things rather than
fewer. I can imagine some server behavior depending on something being
the same between the old and the new clusters, but it is harder to
imagine a dependency on something not being preserved. For example, we
know that the OIDs of pg_type rows have to be the same in the old and
new cluster because arrays are stored on disk with the type OIDs
included. Therefore those need to be preserved. If in the future we
changed things so that arrays - and other container types - did not
include the type OIDs in the on-disk representation, then perhaps it
would no longer be necessary to preserve the OIDs of pg_type rows
across a pg_upgrade. However, it would not be harmful to do so. It
just might not be required.

So I think this proposed change is in the safe direction. If
relfilenodes were currently preserved and we wanted to make them not
be preserved, then I think you would be quite right to say "whoa,
whoa, that could be a problem." Indeed it could. If anyone then in the
future wanted to introduce a dependency on them staying the same, they
would have a problem. However, nothing in the server itself can care
about relfilenodes - or anything else - being *different* across a
pg_upgrade. The whole point of pg_upgrade is to make it feel like you
have the same database after you run it as you did before you ran it,
even though under the hood a lot of surgery has been done. Barring
bugs, you can never be sad about there being too LITTLE difference
between the post-upgrade database and the pre-upgrade database.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 01:03:54PM -0400, Stephen Frost wrote:
> Yes, we're talking about either incremental (or perhaps differential)
> backup where only the files which are actually different would be backed
> up.  Just like with PG, I can't provide any complete guarantees that
> we'd be able to actually make this possible after a major version with
> pgBackRest with this change, but it definitely isn't possible *without*
> this change.  I can't see any reason why we wouldn't be able to do a
> checksum-based incremental backup though (which would be *much* faster
> than a regular backup) once this change is made and have that be a
> reliable and trustworthy backup.  I'd want to think about it more and
> discuss it with David in some detail before saying if we could maybe
> perform a timestamp-based incremental backup (without checksum'ing the
> files, as we do in normal situations), but that would really just be a
> bonus.

Well, it would be nice to know exactly how it would help pgBackRest if
that is one of the reasons we are adding this feature.

> > > > As far as TDE, I haven't seen any concrete plan for that, so why add
> > > > this code for that reason?
> > > 
> > > That this would help with TDE (of which there seems little doubt...) is
> > > an additional benefit to this.  Specifically, taking the existing work
> > > that's already been done to allow block-by-block encryption and
> > > adjusting it for AES-XTS and then using the db-dir+relfileno+block
> > > number as the IV, just like many disk encryption systems do, avoids the
> > > concerns that were brought up about using LSN for the IV with CTR and
> > > it's certainly not difficult to do, but it does depend on this change.
> > > This was all discussed previously and it sure looks like a sensible
> > > approach to use that mirrors what many other systems already do
> > > successfully.
> > 
> > Well, I would think we would not add this for TDE until we were sure
> > someone was working on adding TDE.
> 
> That this would help with TDE is what I'd consider an added bonus.

Not if we have no plans to implement TDE, which was my point.  Why not
wait to see if we are actually going to implement TDE rather than adding
it now.  It is just so obvious, why do I have to state this?

> > > > > make this required, general improved sanity when working with 
> > > > > pg_upgrade
> > > > > is frankly a benefit in its own right too...).  If the additional code
> > > > 
> > > > How?  I am not aware of any advantage except cosmetic.
> > > 
> > > Having to resort to matching up inode numbers between the two clusters
> > > after a pg_upgrade to figure out what files are actually the same
> > > underneath is a pain that goes beyond just cosmetics imv.  Removing that
> > > additional level that admins, and developers for that matter, have to go
> > > through would be a nice improvement on its own.
> > 
> > OK, I was just not aware anyone did that, since I have never hard anyone
> > complain about it before.
> 
> I've certainly done it and I'd be kind of surprised if others haven't,
> but I've also played a lot with pg_dump in various modes, so perhaps
> that's not a great representation.  I've definitely had to explain to
> clients why there's a whole different set of filenames after a
> pg_upgrade and why that is the case for an 'in place' upgrade before
> too.

Uh, so I guess I am right that few people have mentioned this in the
past.  Why were users caring about the file names?

> > > > > was a huge burden or even a moderate one then that might be an 
> > > > > argument
> > > > > against, but it hardly sounds like it will be given Robert's thorough
> > > > > analysis so far and the (admittedly not complete, but not that far 
> > > > > from
> > > > > it based on the DB OID review) proposed patch.
> > > > 
> > > > I am find to add it if it is minor, but I want to see the calculus of
> > > > its value vs complexity, which I have not seen spelled out.
> > > 
> > > I feel that this, along with the prior discussions, spells it out
> > > sufficiently given the patch's complexity looks to be reasonably minor
> > > and very similar to the existing things that pg_upgrade already does.
> > > Had pg_upgrade done this in the first place, I don't think there would
> > > have been nearly this amount of discussion about it.
> > 
> > Well, there is a reason pg_upgrade didn't initially do this --- because
> > it adds complexity, and potentially makes future changes to pg_upgrade
> > necessary if the server behavior changes.
> 
> I have a very hard time seeing what changes might happen in the server
> in this space that wouldn't have an impact on pg_upgrade, with or
> without this.

I don't know, but I have to ask since I can't know the future, so any
"preseration" has to be studied.

> > I am not saying this change is wrong, but I think the reasons need to be
> > stated in this thread, rather than just moving forward.
> 
> Ok, they've been stated and it seems to at least Robert 

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Aug 26, 2021 at 12:34:56PM -0400, Stephen Frost wrote:
> > * Bruce Momjian (br...@momjian.us) wrote:
> > > On Thu, Aug 26, 2021 at 11:36:51AM -0400, Stephen Frost wrote:
> > > > * Bruce Momjian (br...@momjian.us) wrote:
> > > > > On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote:
> > > > > > Anyone see a flaw in that analysis?
> > > > > 
> > > > > I am still waiting to hear the purpose of this preservation.  As long 
> > > > > as
> > > > > you don't apply the patch, I guess I will just stop asking.
> > > > 
> > > > I'm a bit confused why this question keeps coming up as we've discussed
> > > > multiple reasons (incremental backups, possible use for TDE which would
> > > 
> > > I have not seen much explaination on pgBackRest, except me mentioning
> > > it.  Is this something really useful?
> > 
> > Being able to quickly perform a backup on a newly upgraded cluster would
> > certainly be valuable and that's definitely not possible today due to
> > all of the filenames changing.
> 
> You mean incremental backup, right?  I was told this by the pgBackRest
> developers during PGCon, but I have not heard that stated publicly, so I
> hate to go just on what I heard rather than seeing that stated publicly.

Yes, we're talking about either incremental (or perhaps differential)
backup where only the files which are actually different would be backed
up.  Just like with PG, I can't provide any complete guarantees that
we'd be able to actually make this possible after a major version with
pgBackRest with this change, but it definitely isn't possible *without*
this change.  I can't see any reason why we wouldn't be able to do a
checksum-based incremental backup though (which would be *much* faster
than a regular backup) once this change is made and have that be a
reliable and trustworthy backup.  I'd want to think about it more and
discuss it with David in some detail before saying if we could maybe
perform a timestamp-based incremental backup (without checksum'ing the
files, as we do in normal situations), but that would really just be a
bonus.

> > > As far as TDE, I haven't seen any concrete plan for that, so why add
> > > this code for that reason?
> > 
> > That this would help with TDE (of which there seems little doubt...) is
> > an additional benefit to this.  Specifically, taking the existing work
> > that's already been done to allow block-by-block encryption and
> > adjusting it for AES-XTS and then using the db-dir+relfileno+block
> > number as the IV, just like many disk encryption systems do, avoids the
> > concerns that were brought up about using LSN for the IV with CTR and
> > it's certainly not difficult to do, but it does depend on this change.
> > This was all discussed previously and it sure looks like a sensible
> > approach to use that mirrors what many other systems already do
> > successfully.
> 
> Well, I would think we would not add this for TDE until we were sure
> someone was working on adding TDE.

That this would help with TDE is what I'd consider an added bonus.

> > > > make this required, general improved sanity when working with pg_upgrade
> > > > is frankly a benefit in its own right too...).  If the additional code
> > > 
> > > How?  I am not aware of any advantage except cosmetic.
> > 
> > Having to resort to matching up inode numbers between the two clusters
> > after a pg_upgrade to figure out what files are actually the same
> > underneath is a pain that goes beyond just cosmetics imv.  Removing that
> > additional level that admins, and developers for that matter, have to go
> > through would be a nice improvement on its own.
> 
> OK, I was just not aware anyone did that, since I have never hard anyone
> complain about it before.

I've certainly done it and I'd be kind of surprised if others haven't,
but I've also played a lot with pg_dump in various modes, so perhaps
that's not a great representation.  I've definitely had to explain to
clients why there's a whole different set of filenames after a
pg_upgrade and why that is the case for an 'in place' upgrade before
too.

> > > > was a huge burden or even a moderate one then that might be an argument
> > > > against, but it hardly sounds like it will be given Robert's thorough
> > > > analysis so far and the (admittedly not complete, but not that far from
> > > > it based on the DB OID review) proposed patch.
> > > 
> > > I am find to add it if it is minor, but I want to see the calculus of
> > > its value vs complexity, which I have not seen spelled out.
> > 
> > I feel that this, along with the prior discussions, spells it out
> > sufficiently given the patch's complexity looks to be reasonably minor
> > and very similar to the existing things that pg_upgrade already does.
> > Had pg_upgrade done this in the first place, I don't think there would
> > have been nearly this amount of discussion about it.
> 
> Well, there is a reason pg_upgrade didn't 

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 12:37:19PM -0400, Robert Haas wrote:
> On Thu, Aug 26, 2021 at 11:48 AM Bruce Momjian  wrote:
> > I am find to add it if it is minor, but I want to see the calculus of
> > its value vs complexity, which I have not seen spelled out.
> 
> I don't think it's going to be all that complicated, but we're going
> to have to wait until we have something closer to a final patch before
> we can really evaluate that. I am honestly a little puzzled about why
> you think complexity is such a big issue for this patch in particular.
> I feel we do probably several hundred things every release cycle that
> are more complicated than this, so it doesn't seem like this is
> particularly extraordinary or needs a lot of extra scrutiny. I do
> think there is some risk that there are messy cases we can't handle
> cleanly, but if that becomes an issue then I'll abandon the effort
> until a solution can be found. I'm not trying to relentlessly drive
> something through that is a bad idea on principle.
> 
> I agree with all Stephen's comments, too.

I just don't want to add requirements/complexity to pg_upgrade without
clearly stated reasons because future database changes will need to
honor this new preservation behavior.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 12:34:56PM -0400, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Thu, Aug 26, 2021 at 11:36:51AM -0400, Stephen Frost wrote:
> > > * Bruce Momjian (br...@momjian.us) wrote:
> > > > On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote:
> > > > > Anyone see a flaw in that analysis?
> > > > 
> > > > I am still waiting to hear the purpose of this preservation.  As long as
> > > > you don't apply the patch, I guess I will just stop asking.
> > > 
> > > I'm a bit confused why this question keeps coming up as we've discussed
> > > multiple reasons (incremental backups, possible use for TDE which would
> > 
> > I have not seen much explaination on pgBackRest, except me mentioning
> > it.  Is this something really useful?
> 
> Being able to quickly perform a backup on a newly upgraded cluster would
> certainly be valuable and that's definitely not possible today due to
> all of the filenames changing.

You mean incremental backup, right?  I was told this by the pgBackRest
developers during PGCon, but I have not heard that stated publicly, so I
hate to go just on what I heard rather than seeing that stated publicly.

> > As far as TDE, I haven't seen any concrete plan for that, so why add
> > this code for that reason?
> 
> That this would help with TDE (of which there seems little doubt...) is
> an additional benefit to this.  Specifically, taking the existing work
> that's already been done to allow block-by-block encryption and
> adjusting it for AES-XTS and then using the db-dir+relfileno+block
> number as the IV, just like many disk encryption systems do, avoids the
> concerns that were brought up about using LSN for the IV with CTR and
> it's certainly not difficult to do, but it does depend on this change.
> This was all discussed previously and it sure looks like a sensible
> approach to use that mirrors what many other systems already do
> successfully.

Well, I would think we would not add this for TDE until we were sure
someone was working on adding TDE.

> > > make this required, general improved sanity when working with pg_upgrade
> > > is frankly a benefit in its own right too...).  If the additional code
> > 
> > How?  I am not aware of any advantage except cosmetic.
> 
> Having to resort to matching up inode numbers between the two clusters
> after a pg_upgrade to figure out what files are actually the same
> underneath is a pain that goes beyond just cosmetics imv.  Removing that
> additional level that admins, and developers for that matter, have to go
> through would be a nice improvement on its own.

OK, I was just not aware anyone did that, since I have never hard anyone
complain about it before.
 
> > > was a huge burden or even a moderate one then that might be an argument
> > > against, but it hardly sounds like it will be given Robert's thorough
> > > analysis so far and the (admittedly not complete, but not that far from
> > > it based on the DB OID review) proposed patch.
> > 
> > I am find to add it if it is minor, but I want to see the calculus of
> > its value vs complexity, which I have not seen spelled out.
> 
> I feel that this, along with the prior discussions, spells it out
> sufficiently given the patch's complexity looks to be reasonably minor
> and very similar to the existing things that pg_upgrade already does.
> Had pg_upgrade done this in the first place, I don't think there would
> have been nearly this amount of discussion about it.

Well, there is a reason pg_upgrade didn't initially do this --- because
it adds complexity, and potentially makes future changes to pg_upgrade
necessary if the server behavior changes.

I am not saying this change is wrong, but I think the reasons need to be
stated in this thread, rather than just moving forward.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Aug 26, 2021 at 11:39 AM Stephen Frost  wrote:
> > This looks like a pretty good analysis to me.  As it relates to the
> > question about allowing users to specify an OID, I'd be inclined to
> > allow it but only for OIDs >64k.  We've certainly reserved things in the
> > past and I don't see any issue with having that reservation here, but if
> > we're going to build the capability to specify the OID into CREATE
> > DATABASE then it seems a bit odd to disallow users from using it, as
> > long as we're preventing them from causing problems with it.
> >
> > Are there issues that you see with allowing users to specify the OID
> > even with the >64k restriction..?  I can't think of one offhand but
> > perhaps I'm missing something.
> 
> So I actually should have said 16k here, not 64k, as somebody already
> pointed out to me off-list. Whee!

Hah, yes, of course.

> I don't know of a reason not to let people do that, other than that it
> seems like an attractive nuisance. People will do it and it will fail
> because they chose a duplicate OID, or they'll complain that a regular
> dump and restore didn't preserve their database OIDs, or maybe they'll
> expect that they can copy a database from one cluster to another
> because they gave it the same OID! That said, I don't see a great harm
> in it. It just seems to me like exposing knobs to users that don't
> seem to have any legitimate use may be borrowing trouble.

We're going to have to gate this somehow to allow the OIDs under 16k to
be used, so it seems like what you're suggesting is that we have that
gate in place but then allow any OID to be used if you've crossed that
gate?

That is, if we do something like:

SELECT pg_catalog.binary_upgrade_allow_setting_db_oid();
CREATE DATABASE blah WITH OID 1234;

for pg_upgrade, well, users who are interested may well figure out how
to do that themselves if they decide they want to set the OID, whereas
if it 'just works' provided they don't try to use an OID too low then
maybe they won't try to bypass the restriction against using system
OIDs..?

Ok, I'll give you that this is a stretch and I'm on the fence about if
it's worthwhile or not to include and document and if, as you say, it's
inviting trouble to allow users to set it.  Users do seem to have a
knack for finding things even when they aren't documented and then we
get to deal with those complaints too. :)

Perhaps others have some stronger feelings one way or another.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Robert Haas
On Thu, Aug 26, 2021 at 11:48 AM Bruce Momjian  wrote:
> I am find to add it if it is minor, but I want to see the calculus of
> its value vs complexity, which I have not seen spelled out.

I don't think it's going to be all that complicated, but we're going
to have to wait until we have something closer to a final patch before
we can really evaluate that. I am honestly a little puzzled about why
you think complexity is such a big issue for this patch in particular.
I feel we do probably several hundred things every release cycle that
are more complicated than this, so it doesn't seem like this is
particularly extraordinary or needs a lot of extra scrutiny. I do
think there is some risk that there are messy cases we can't handle
cleanly, but if that becomes an issue then I'll abandon the effort
until a solution can be found. I'm not trying to relentlessly drive
something through that is a bad idea on principle.

I agree with all Stephen's comments, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Aug 26, 2021 at 11:36:51AM -0400, Stephen Frost wrote:
> > * Bruce Momjian (br...@momjian.us) wrote:
> > > On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote:
> > > > Anyone see a flaw in that analysis?
> > > 
> > > I am still waiting to hear the purpose of this preservation.  As long as
> > > you don't apply the patch, I guess I will just stop asking.
> > 
> > I'm a bit confused why this question keeps coming up as we've discussed
> > multiple reasons (incremental backups, possible use for TDE which would
> 
> I have not seen much explaination on pgBackRest, except me mentioning
> it.  Is this something really useful?

Being able to quickly perform a backup on a newly upgraded cluster would
certainly be valuable and that's definitely not possible today due to
all of the filenames changing.

> As far as TDE, I haven't seen any concrete plan for that, so why add
> this code for that reason?

That this would help with TDE (of which there seems little doubt...) is
an additional benefit to this.  Specifically, taking the existing work
that's already been done to allow block-by-block encryption and
adjusting it for AES-XTS and then using the db-dir+relfileno+block
number as the IV, just like many disk encryption systems do, avoids the
concerns that were brought up about using LSN for the IV with CTR and
it's certainly not difficult to do, but it does depend on this change.
This was all discussed previously and it sure looks like a sensible
approach to use that mirrors what many other systems already do
successfully.

> > make this required, general improved sanity when working with pg_upgrade
> > is frankly a benefit in its own right too...).  If the additional code
> 
> How?  I am not aware of any advantage except cosmetic.

Having to resort to matching up inode numbers between the two clusters
after a pg_upgrade to figure out what files are actually the same
underneath is a pain that goes beyond just cosmetics imv.  Removing that
additional level that admins, and developers for that matter, have to go
through would be a nice improvement on its own.

> > was a huge burden or even a moderate one then that might be an argument
> > against, but it hardly sounds like it will be given Robert's thorough
> > analysis so far and the (admittedly not complete, but not that far from
> > it based on the DB OID review) proposed patch.
> 
> I am find to add it if it is minor, but I want to see the calculus of
> its value vs complexity, which I have not seen spelled out.

I feel that this, along with the prior discussions, spells it out
sufficiently given the patch's complexity looks to be reasonably minor
and very similar to the existing things that pg_upgrade already does.
Had pg_upgrade done this in the first place, I don't think there would
have been nearly this amount of discussion about it.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Robert Haas
On Thu, Aug 26, 2021 at 11:39 AM Stephen Frost  wrote:
> This looks like a pretty good analysis to me.  As it relates to the
> question about allowing users to specify an OID, I'd be inclined to
> allow it but only for OIDs >64k.  We've certainly reserved things in the
> past and I don't see any issue with having that reservation here, but if
> we're going to build the capability to specify the OID into CREATE
> DATABASE then it seems a bit odd to disallow users from using it, as
> long as we're preventing them from causing problems with it.
>
> Are there issues that you see with allowing users to specify the OID
> even with the >64k restriction..?  I can't think of one offhand but
> perhaps I'm missing something.

So I actually should have said 16k here, not 64k, as somebody already
pointed out to me off-list. Whee!

I don't know of a reason not to let people do that, other than that it
seems like an attractive nuisance. People will do it and it will fail
because they chose a duplicate OID, or they'll complain that a regular
dump and restore didn't preserve their database OIDs, or maybe they'll
expect that they can copy a database from one cluster to another
because they gave it the same OID! That said, I don't see a great harm
in it. It just seems to me like exposing knobs to users that don't
seem to have any legitimate use may be borrowing trouble.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 11:36:51AM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote:
> > > Anyone see a flaw in that analysis?
> > 
> > I am still waiting to hear the purpose of this preservation.  As long as
> > you don't apply the patch, I guess I will just stop asking.
> 
> I'm a bit confused why this question keeps coming up as we've discussed
> multiple reasons (incremental backups, possible use for TDE which would

I have not seen much explaination on pgBackRest, except me mentioning
it.  Is this something really useful?

As far as TDE, I haven't seen any concrete plan for that, so why add
this code for that reason?

> make this required, general improved sanity when working with pg_upgrade
> is frankly a benefit in its own right too...).  If the additional code

How?  I am not aware of any advantage except cosmetic.

> was a huge burden or even a moderate one then that might be an argument
> against, but it hardly sounds like it will be given Robert's thorough
> analysis so far and the (admittedly not complete, but not that far from
> it based on the DB OID review) proposed patch.

I am find to add it if it is minor, but I want to see the calculus of
its value vs complexity, which I have not seen spelled out.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 11:35:01AM -0400, Robert Haas wrote:
> On Thu, Aug 26, 2021 at 11:24 AM Bruce Momjian  wrote:
> > On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote:
> > > Anyone see a flaw in that analysis?
> >
> > I am still waiting to hear the purpose of this preservation.  As long as
> > you don't apply the patch, I guess I will just stop asking.
> 
> You make it sound like I didn't answer that question the last time you
> asked it, but I did.[1] I went back to the previous thread and found
> that, in fact, there's at least one email *from you* appearing to
> endorse that concept for reasons unrelated to TDE[2] and another where
> you appear to agree that it would be useful for TDE to do it.[3]
> Stephen Frost also wrote up his discussion during the Unconference and
> some of his reasons for liking the idea.[4]
> 
> If you've changed your mind about this being a good idea, or if you no
> longer think it's useful without TDE, that's fine. Everyone is
> entitled to change their opinion. But then please say that straight
> out. It baffles me why you're now acting as if it hasn't been
> discussed when it clearly has been, and both you and I were
> participants in that discussion.
> 
> [1] 
> https://www.postgresql.org/message-id/ca+tgmob7msyh3vray87usr22uakvvsyy4zbaqw2ao2cfoud...@mail.gmail.com
> [2] https://www.postgresql.org/message-id/20210601140949.gc22...@momjian.us
> [3] https://www.postgresql.org/message-id/20210527210023.gj5...@momjian.us
> [4] 
> https://www.postgresql.org/message-id/20210531201652.gy20...@tamriel.snowman.net

Yes, it would help incremental backup of pgBackRest, as reported by the
developers.  However, I have seen no discussion if this is useful enough
reason to add the complexity to preserve this.  The TODO list shows
"Desirability" as the first item to be discussed, so I expected that to
be discussed first.  Also, with TDE not progressing (and my approach not
even needing this), I have not seen a full discussion if this item is
desirable based on its complexity.

What I did see is this patch appear with no context of why it is useful
given our current plans, except for pgBackRest, which I think I
mentioned.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, Aug 17, 2021 at 2:50 PM Robert Haas  wrote:
> > > Less sure that this is a good idea, though.  In particular, I do not
> > > think that you can make it work in the face of
> > > alter database template1 rename to oops;
> > > create database template1;
> >
> > That is a really good point. If we can't categorically force the OID
> > of those databases to have a particular, fixed value, and based on
> > this example that seems to be impossible, then there's always a
> > possibility that we might find a value in the old cluster that doesn't
> > happen to match what is present in the new cluster. Seen from that
> > angle, the problem is really with databases that are pre-existent in
> > the new cluster but whose contents still need to be dumped. Maybe we
> > could (optionally? conditionally?) drop those databases from the new
> > cluster and then recreate them with the OID that we want them to have.
> 
> Actually, we do that already. create_new_objects() runs pg_restore
> with --create for most databases, but with --clean --create for
> template1 and postgres. This means that template1 and postgres will
> always be recreated in the new cluster, and other databases are
> assumed not to exist in the new cluster and the upgrade will fail if
> they unexpectedly do. And the reason why pg_upgrade does that is that
> it wants to "propagate [the] database-level properties" of postgres
> and template1. So suppose we just make the database OID one of the
> database-level properties that we want to propagate. That should
> mostly just work, but where can things go wrong?
> 
> The only real failure mode is we try to create a database in the new
> cluster and find out that the OID is already in use. If the new OID
> that collides >64k, then the user has messed with the new cluster
> before doing that. And since pg_upgrade is pretty clearly already
> assuming that you shouldn't do that, it's fine to also make that
> assumption in this case. We can disregard such cases as user error.
> 
> If the new OID that collides is <64k, then it must be colliding with
> template0, template1, or postgres in the new cluster, because those
> are the only databases that can have such OIDs since, currently, we
> don't allow users to specify an OID for a new database. And the
> problem cannot be with template1, because we hard-code its OID to 1.
> If there is a database with OID 1 in either cluster, it must be
> template1, and if there is a database with OID 1 in both clusters, it
> must be template1 in both cases, and we'll just drop and recreate it
> with OID 1 and everything is fine. So we need only consider template0
> and postgres, which are created with system-generated OIDs. And, it
> would be no issue if either of those databases had the same OID in the
> old and new cluster, so the only possible OID collision is one where
> the same system-generated OID was assigned to one of those databases
> in the old cluster and to the other in the new cluster.
> 
> First consider the case where template0 has OID, say, 13000, in the
> old cluster, and postgres has that OID in the new cluster. No problem
> occurs, because template0 isn't transferred anyway. The reverse
> direction is a problem, though. If postgres had been assigned OID
> 13000 in the old cluster and, by sheer chance, template0 had that OID
> in the new cluster, then the upgrade would fail, because it wouldn't
> be able to recreate the postgres database with the correct OID.
> 
> But that doesn't seem very difficult to fix. I think all we need to do
> is have initdb assign a fixed OID to template0 at creation time. Then,
> in any new release to which someone might be trying to upgrade, the
> system-generated OID assigned to postgres in the old release can't
> match the fixed OID assigned to template0 in the new release, so the
> one problem case is ruled out. We do need, however, to make sure that
> the assign-my-database-a-fixed-OID syntax is either entirely
> restricted to initdb & pg_upgrade or at least that OIDS < 64k can only
> be assigned in one of those modes. Otherwise, some creative person
> could manufacture new problem cases by setting up the source database
> so that the OID of one of their databases matches the fixed OID we
> gave to template0 or template1, or the system-generated OID for
> postgres in the new cluster.
> 
> In short, as far as I can see, all we need to do to preserve database
> OIDs across pg_upgrade is:
> 
> 1. Add a new syntax for creating a database with a given OID, and use
> it in pg_dump --binary-upgrade.
> 2. Don't let users use it at least for OIDs <64k, or maybe just don't
> let them use it at all.
> 3. But let initdb use it, and have initdb set the initial OID for
> template0 to a fixed value < 1. If the user changes it later, no
> problem; the cluster into which they are upgrading won't contain any
> databases with high-numbered OIDs.
> 
> Anyone see a 

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote:
> > Anyone see a flaw in that analysis?
> 
> I am still waiting to hear the purpose of this preservation.  As long as
> you don't apply the patch, I guess I will just stop asking.

I'm a bit confused why this question keeps coming up as we've discussed
multiple reasons (incremental backups, possible use for TDE which would
make this required, general improved sanity when working with pg_upgrade
is frankly a benefit in its own right too...).  If the additional code
was a huge burden or even a moderate one then that might be an argument
against, but it hardly sounds like it will be given Robert's thorough
analysis so far and the (admittedly not complete, but not that far from
it based on the DB OID review) proposed patch.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Robert Haas
On Thu, Aug 26, 2021 at 11:24 AM Bruce Momjian  wrote:
> On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote:
> > Anyone see a flaw in that analysis?
>
> I am still waiting to hear the purpose of this preservation.  As long as
> you don't apply the patch, I guess I will just stop asking.

You make it sound like I didn't answer that question the last time you
asked it, but I did.[1] I went back to the previous thread and found
that, in fact, there's at least one email *from you* appearing to
endorse that concept for reasons unrelated to TDE[2] and another where
you appear to agree that it would be useful for TDE to do it.[3]
Stephen Frost also wrote up his discussion during the Unconference and
some of his reasons for liking the idea.[4]

If you've changed your mind about this being a good idea, or if you no
longer think it's useful without TDE, that's fine. Everyone is
entitled to change their opinion. But then please say that straight
out. It baffles me why you're now acting as if it hasn't been
discussed when it clearly has been, and both you and I were
participants in that discussion.

[1] 
https://www.postgresql.org/message-id/ca+tgmob7msyh3vray87usr22uakvvsyy4zbaqw2ao2cfoud...@mail.gmail.com
[2] https://www.postgresql.org/message-id/20210601140949.gc22...@momjian.us
[3] https://www.postgresql.org/message-id/20210527210023.gj5...@momjian.us
[4] 
https://www.postgresql.org/message-id/20210531201652.gy20...@tamriel.snowman.net

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Bruce Momjian
On Thu, Aug 26, 2021 at 11:00:47AM -0400, Robert Haas wrote:
> Anyone see a flaw in that analysis?

I am still waiting to hear the purpose of this preservation.  As long as
you don't apply the patch, I guess I will just stop asking.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-26 Thread Robert Haas
On Tue, Aug 17, 2021 at 2:50 PM Robert Haas  wrote:
> > Less sure that this is a good idea, though.  In particular, I do not
> > think that you can make it work in the face of
> > alter database template1 rename to oops;
> > create database template1;
>
> That is a really good point. If we can't categorically force the OID
> of those databases to have a particular, fixed value, and based on
> this example that seems to be impossible, then there's always a
> possibility that we might find a value in the old cluster that doesn't
> happen to match what is present in the new cluster. Seen from that
> angle, the problem is really with databases that are pre-existent in
> the new cluster but whose contents still need to be dumped. Maybe we
> could (optionally? conditionally?) drop those databases from the new
> cluster and then recreate them with the OID that we want them to have.

Actually, we do that already. create_new_objects() runs pg_restore
with --create for most databases, but with --clean --create for
template1 and postgres. This means that template1 and postgres will
always be recreated in the new cluster, and other databases are
assumed not to exist in the new cluster and the upgrade will fail if
they unexpectedly do. And the reason why pg_upgrade does that is that
it wants to "propagate [the] database-level properties" of postgres
and template1. So suppose we just make the database OID one of the
database-level properties that we want to propagate. That should
mostly just work, but where can things go wrong?

The only real failure mode is we try to create a database in the new
cluster and find out that the OID is already in use. If the new OID
that collides >64k, then the user has messed with the new cluster
before doing that. And since pg_upgrade is pretty clearly already
assuming that you shouldn't do that, it's fine to also make that
assumption in this case. We can disregard such cases as user error.

If the new OID that collides is <64k, then it must be colliding with
template0, template1, or postgres in the new cluster, because those
are the only databases that can have such OIDs since, currently, we
don't allow users to specify an OID for a new database. And the
problem cannot be with template1, because we hard-code its OID to 1.
If there is a database with OID 1 in either cluster, it must be
template1, and if there is a database with OID 1 in both clusters, it
must be template1 in both cases, and we'll just drop and recreate it
with OID 1 and everything is fine. So we need only consider template0
and postgres, which are created with system-generated OIDs. And, it
would be no issue if either of those databases had the same OID in the
old and new cluster, so the only possible OID collision is one where
the same system-generated OID was assigned to one of those databases
in the old cluster and to the other in the new cluster.

First consider the case where template0 has OID, say, 13000, in the
old cluster, and postgres has that OID in the new cluster. No problem
occurs, because template0 isn't transferred anyway. The reverse
direction is a problem, though. If postgres had been assigned OID
13000 in the old cluster and, by sheer chance, template0 had that OID
in the new cluster, then the upgrade would fail, because it wouldn't
be able to recreate the postgres database with the correct OID.

But that doesn't seem very difficult to fix. I think all we need to do
is have initdb assign a fixed OID to template0 at creation time. Then,
in any new release to which someone might be trying to upgrade, the
system-generated OID assigned to postgres in the old release can't
match the fixed OID assigned to template0 in the new release, so the
one problem case is ruled out. We do need, however, to make sure that
the assign-my-database-a-fixed-OID syntax is either entirely
restricted to initdb & pg_upgrade or at least that OIDS < 64k can only
be assigned in one of those modes. Otherwise, some creative person
could manufacture new problem cases by setting up the source database
so that the OID of one of their databases matches the fixed OID we
gave to template0 or template1, or the system-generated OID for
postgres in the new cluster.

In short, as far as I can see, all we need to do to preserve database
OIDs across pg_upgrade is:

1. Add a new syntax for creating a database with a given OID, and use
it in pg_dump --binary-upgrade.
2. Don't let users use it at least for OIDs <64k, or maybe just don't
let them use it at all.
3. But let initdb use it, and have initdb set the initial OID for
template0 to a fixed value < 1. If the user changes it later, no
problem; the cluster into which they are upgrading won't contain any
databases with high-numbered OIDs.

Anyone see a flaw in that analysis?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-24 Thread Bruce Momjian
On Tue, Aug 24, 2021 at 02:34:26PM -0400, Robert Haas wrote:
> On Tue, Aug 24, 2021 at 2:16 PM Bruce Momjian  wrote:
> > One other issue --- the more that pg_upgrade preserves, the more likely
> > pg_upgrade will break when some internal changes happen in Postgres.
> > Therefore, if you want pg_upgrade to preserve something, you have to
> > have a good reason --- even code simplicity might not be a sufficient
> > reason.
> 
> While I accept that as a general principle, I don't think it's really
> applicable in this case. pg_upgrade already knows all about
> relfilenodes; it has a source file called relfilenode.c. I don't see
> that a pg_upgrade that preserves relfilenodes is any more or less
> likely to break in the future than a pg_upgrade that renumbers all the
> files so that the relation OID and the relfilenode are equal. You've
> got about the same amount of reliance on the on-disk layout either
> way.

I was making more of a general statement that preservation can be
problematic and its impact must be researched.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-24 Thread Robert Haas
On Tue, Aug 24, 2021 at 2:16 PM Bruce Momjian  wrote:
> One other issue --- the more that pg_upgrade preserves, the more likely
> pg_upgrade will break when some internal changes happen in Postgres.
> Therefore, if you want pg_upgrade to preserve something, you have to
> have a good reason --- even code simplicity might not be a sufficient
> reason.

While I accept that as a general principle, I don't think it's really
applicable in this case. pg_upgrade already knows all about
relfilenodes; it has a source file called relfilenode.c. I don't see
that a pg_upgrade that preserves relfilenodes is any more or less
likely to break in the future than a pg_upgrade that renumbers all the
files so that the relation OID and the relfilenode are equal. You've
got about the same amount of reliance on the on-disk layout either
way.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-24 Thread Bruce Momjian
On Tue, Aug 24, 2021 at 12:43:20PM -0400, Bruce Momjian wrote:
> Yes, it is a trade-off.  If we had pg_upgrade create the new cluster,
> the pg_upgrade instructions would be simpler, but pg_upgrade would be
> more complex since it has to adjust _everything_ properly so pg_upgrade
> works --- I never got to that point, but I am willing to explore what
> would be required.

One other issue --- the more that pg_upgrade preserves, the more likely
pg_upgrade will break when some internal changes happen in Postgres. 
Therefore, if you want pg_upgrade to preserve something, you have to
have a good reason --- even code simplicity might not be a sufficient
reason.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-24 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, Aug 24, 2021 at 12:43 PM Bruce Momjian  wrote:
> > Yes, it is a trade-off.  If we had pg_upgrade create the new cluster,
> > the pg_upgrade instructions would be simpler, but pg_upgrade would be
> > more complex since it has to adjust _everything_ properly so pg_upgrade
> > works --- I never got to that point, but I am willing to explore what
> > would be required.
> 
> It's probably a topic for another thread, rather than this one, but I
> think that would be very cool.

Yes, definite +1 on this.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-24 Thread Robert Haas
On Tue, Aug 24, 2021 at 12:43 PM Bruce Momjian  wrote:
> Yes, it is a trade-off.  If we had pg_upgrade create the new cluster,
> the pg_upgrade instructions would be simpler, but pg_upgrade would be
> more complex since it has to adjust _everything_ properly so pg_upgrade
> works --- I never got to that point, but I am willing to explore what
> would be required.

It's probably a topic for another thread, rather than this one, but I
think that would be very cool.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-24 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Aug 23, 2021 at 5:12 PM Stephen Frost  wrote:
> > Regarding that ... I have to wonder just what promises we feel we've
> > made when it comes to what a user is expected to be able to do with the
> > new cluster *before* pg_upgrade is run on it.  For my part, I sure feel
> > like it's "nothing", in which case it seems like we can do things that
> > we can't do with a running system, like literally just DROP and recreate
> > with the correct OID of any databases we need to, or even push that back
> > to the user to do that at initdb time with some kind of error thrown by
> > pg_upgrade during the --check phase.  "Initial databases have
> > non-standard OIDs, recreate destination cluster with initdb
> > --with-oid=12341" or something along those lines.
> 
> Yeah, possibly. Honestly, I find it weird that pg_upgrade expects the
> new cluster to already exist. It seems like it would be more sensible
> if it created the cluster itself. That's not entirely trivial, because
> for example you have to create it with the correct locale settings and
> stuff. But if you require the cluster to exist already, then you run
> into the kinds of questions that you're asking here, and whether the
> answer is "nothing" as you propose here or something more than that,
> it's clearly not "whatever you want" nor anything close to that.

Yeah, I'd had a similar thought and also tend to agree that it'd make
more sense for pg_upgrade to set up the new cluster too, and doing so in
a way that makes sure that it matches the old cluster as that's rather
important.  Having the user do it also implies that there is some
freedom for the user to mess around with the new cluster before running
pg_upgrade, it seems to me anyway, and that's certainly not something
that we've built anything into pg_upgrade to deal with cleanly..

It isn't like initdb takes all *that* long to run either, and reducing
the number of steps that the user has to take to perform an upgrade sure
seems like a good thing to do.  Anyhow, just wanted to throw that out
there as another way we might approach this.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-24 Thread Robert Haas
On Tue, Aug 24, 2021 at 12:04 PM Tom Lane  wrote:
> Per upthread discussion, it seems impractical to fully guarantee
> that database OIDs match, which seems to mean that the whole premise
> collapses.  Like Bruce, I want to see a plausible use case justifying
> any partial-guarantee scenario before we add more complication (= bugs)
> to pg_upgrade.

I think you might be overlooking the emails from Stephen and I where
we suggested how that could be made to work?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-24 Thread Bruce Momjian
On Tue, Aug 24, 2021 at 11:24:21AM -0400, Robert Haas wrote:
> On Mon, Aug 23, 2021 at 5:12 PM Stephen Frost  wrote:
> > Regarding that ... I have to wonder just what promises we feel we've
> > made when it comes to what a user is expected to be able to do with the
> > new cluster *before* pg_upgrade is run on it.  For my part, I sure feel
> > like it's "nothing", in which case it seems like we can do things that
> > we can't do with a running system, like literally just DROP and recreate
> > with the correct OID of any databases we need to, or even push that back
> > to the user to do that at initdb time with some kind of error thrown by
> > pg_upgrade during the --check phase.  "Initial databases have
> > non-standard OIDs, recreate destination cluster with initdb
> > --with-oid=12341" or something along those lines.
> 
> Yeah, possibly. Honestly, I find it weird that pg_upgrade expects the
> new cluster to already exist. It seems like it would be more sensible
> if it created the cluster itself. That's not entirely trivial, because
> for example you have to create it with the correct locale settings and
> stuff. But if you require the cluster to exist already, then you run
> into the kinds of questions that you're asking here, and whether the
> answer is "nothing" as you propose here or something more than that,
> it's clearly not "whatever you want" nor anything close to that.

Yes, it is a trade-off.  If we had pg_upgrade create the new cluster,
the pg_upgrade instructions would be simpler, but pg_upgrade would be
more complex since it has to adjust _everything_ properly so pg_upgrade
works --- I never got to that point, but I am willing to explore what
would be required.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-24 Thread Bruce Momjian
On Tue, Aug 24, 2021 at 11:28:37AM -0400, Robert Haas wrote:
> On Mon, Aug 23, 2021 at 8:29 PM Bruce Momjian  wrote:
> > I assume this patch is not going to be applied until there is an actual
> > use case for preserving these values.
> 
> My interpretation of the preceding discussion was that several people
> thought this change was a good idea regardless of whether anything
> ever happens with TDE, so I wasn't seeing a reason to wait.
> Personally, I've always thought that it was quite odd that pg_upgrade
> didn't preserve the relfilenode values, so I'm in favor of the change.
> I bet we could even make some simplifications to that code if we got
> all of this sorted out, which seems like it would be nice.

Yes, if this ends up being a cleanup with no added complexity, that
would be nice, but I had not seen how that was possible in the psat.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-24 Thread Bruce Momjian
On Tue, Aug 24, 2021 at 12:04:00PM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Mon, Aug 23, 2021 at 8:29 PM Bruce Momjian  wrote:
> >> I assume this patch is not going to be applied until there is an actual
> >> use case for preserving these values.
> 
> > ...
> 
> > That being said, if you or somebody else thinks that this is a bad
> > idea or that the reasons offered up until now are insufficient, feel
> > free to make that argument. I just work here...
> 
> Per upthread discussion, it seems impractical to fully guarantee
> that database OIDs match, which seems to mean that the whole premise
> collapses.  Like Bruce, I want to see a plausible use case justifying
> any partial-guarantee scenario before we add more complication (= bugs)
> to pg_upgrade.

Yes, pg_upgrade is already complex enough, so why add more complexity
for some cosmetic value.  (I think "cosmetic" flew out the window with
pg_upgrade long ago.  ;-)  )

I know that pgBackRest has asked for stable relfilenodes to make
incremental file system backups after pg_upgrade smaller, but if we want
to make relfilenodes stable, we had better understand that is _why_ we
are adding this complexity.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-24 Thread Tom Lane
Robert Haas  writes:
> On Mon, Aug 23, 2021 at 8:29 PM Bruce Momjian  wrote:
>> I assume this patch is not going to be applied until there is an actual
>> use case for preserving these values.

> ...

> That being said, if you or somebody else thinks that this is a bad
> idea or that the reasons offered up until now are insufficient, feel
> free to make that argument. I just work here...

Per upthread discussion, it seems impractical to fully guarantee
that database OIDs match, which seems to mean that the whole premise
collapses.  Like Bruce, I want to see a plausible use case justifying
any partial-guarantee scenario before we add more complication (= bugs)
to pg_upgrade.

regards, tom lane




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-24 Thread Robert Haas
On Mon, Aug 23, 2021 at 8:29 PM Bruce Momjian  wrote:
> I assume this patch is not going to be applied until there is an actual
> use case for preserving these values.

My interpretation of the preceding discussion was that several people
thought this change was a good idea regardless of whether anything
ever happens with TDE, so I wasn't seeing a reason to wait.
Personally, I've always thought that it was quite odd that pg_upgrade
didn't preserve the relfilenode values, so I'm in favor of the change.
I bet we could even make some simplifications to that code if we got
all of this sorted out, which seems like it would be nice.

I think it was also mentioned that this might be nice for pgBackRest,
which apparently permits incremental backups across major version
upgrades but likes filenames to match.

That being said, if you or somebody else thinks that this is a bad
idea or that the reasons offered up until now are insufficient, feel
free to make that argument. I just work here...

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-24 Thread Robert Haas
On Mon, Aug 23, 2021 at 5:12 PM Stephen Frost  wrote:
> Regarding that ... I have to wonder just what promises we feel we've
> made when it comes to what a user is expected to be able to do with the
> new cluster *before* pg_upgrade is run on it.  For my part, I sure feel
> like it's "nothing", in which case it seems like we can do things that
> we can't do with a running system, like literally just DROP and recreate
> with the correct OID of any databases we need to, or even push that back
> to the user to do that at initdb time with some kind of error thrown by
> pg_upgrade during the --check phase.  "Initial databases have
> non-standard OIDs, recreate destination cluster with initdb
> --with-oid=12341" or something along those lines.

Yeah, possibly. Honestly, I find it weird that pg_upgrade expects the
new cluster to already exist. It seems like it would be more sensible
if it created the cluster itself. That's not entirely trivial, because
for example you have to create it with the correct locale settings and
stuff. But if you require the cluster to exist already, then you run
into the kinds of questions that you're asking here, and whether the
answer is "nothing" as you propose here or something more than that,
it's clearly not "whatever you want" nor anything close to that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-24 Thread Shruthi Gowda
On Tue, Aug 24, 2021 at 5:59 AM Bruce Momjian  wrote:
>
> On Mon, Aug 23, 2021 at 04:57:31PM -0400, Robert Haas wrote:
> > On Fri, Aug 20, 2021 at 1:36 PM Shruthi Gowda  wrote:
> > > Thanks Robert for your comments.
> > > I have split the patch into two portions. One that handles DB OID and
> > > the other that
> > > handles tablespace OID and relfilenode OID.
> >
> > It's pretty clear from the discussion, I think, that the database OID
> > one is going to need rework to be considered.
>
> I assume this patch is not going to be applied until there is an actual
> use case for preserving these values.

JFI, I added an entry into commit fest for this patch.
link: https://commitfest.postgresql.org/34/3296/




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-23 Thread Bruce Momjian
On Mon, Aug 23, 2021 at 04:57:31PM -0400, Robert Haas wrote:
> On Fri, Aug 20, 2021 at 1:36 PM Shruthi Gowda  wrote:
> > Thanks Robert for your comments.
> > I have split the patch into two portions. One that handles DB OID and
> > the other that
> > handles tablespace OID and relfilenode OID.
> 
> It's pretty clear from the discussion, I think, that the database OID
> one is going to need rework to be considered.

I assume this patch is not going to be applied until there is an actual
use case for preserving these values.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-23 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Aug 20, 2021 at 1:36 PM Shruthi Gowda  wrote:
> > Thanks Robert for your comments.
> > I have split the patch into two portions. One that handles DB OID and
> > the other that
> > handles tablespace OID and relfilenode OID.
> 
> It's pretty clear from the discussion, I think, that the database OID
> one is going to need rework to be considered.

Regarding that ... I have to wonder just what promises we feel we've
made when it comes to what a user is expected to be able to do with the
new cluster *before* pg_upgrade is run on it.  For my part, I sure feel
like it's "nothing", in which case it seems like we can do things that
we can't do with a running system, like literally just DROP and recreate
with the correct OID of any databases we need to, or even push that back
to the user to do that at initdb time with some kind of error thrown by
pg_upgrade during the --check phase.  "Initial databases have
non-standard OIDs, recreate destination cluster with initdb
--with-oid=12341" or something along those lines.

Also open to the idea of simply forcing 'template1' to always being
OID=1 even if it's dropped/recreated and then just dropping/recreating
the template0 and postgres databases if they've got different OIDs than
what the old cluster did- after all, they should be getting entirely
re-populated as part of the pg_upgrade process itself.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-23 Thread Robert Haas
On Fri, Aug 20, 2021 at 1:36 PM Shruthi Gowda  wrote:
> Thanks Robert for your comments.
> I have split the patch into two portions. One that handles DB OID and
> the other that
> handles tablespace OID and relfilenode OID.

It's pretty clear from the discussion, I think, that the database OID
one is going to need rework to be considered.

Regarding the other one:

- The comment in binary_upgrade_set_pg_class_oids() is still not
accurate. You removed the sentence which says "Indexes cannot have
toast tables, so we need not make this probe in the index code path"
but the immediately preceding sentence is still inaccurate in at least
two ways. First, it only talks about tables, but the code now applies
to indexes. Second, it only talks about OIDs, but now also deals with
refilenodes. It's really important to fully update every comment that
might be affected by your changes!

- The SQL query in that function isn't completely correct. There is a
left join from pg_class to pg_index whose ON clause includes
"c.reltoastrelid = i.indrelid AND i.indisvalid." The reason it's
likely that is because it is possible, in corner cases, for a TOAST
table to have multiple TOAST indexes. I forget exactly how that
happens, but I think it might be like if a REINDEX CONCURRENTLY on the
TOAST table fails midway through, or something of that sort. Now if
that happens, the LEFT JOIN you added is going to cause the output to
contain multiple rows, because you didn't replicate the i.indisvalid
condition into that ON clause. And then it will fail. Apparently we
don't have a pg_upgrade test case for this scenario; we probably
should. Actually what I think would be even better than putting
i.indisvalid into that ON clause would be to join off of i.indrelid
rather than c.reltoastrelid.

- The code that decodes the various columns of this query does so in a
slightly different order than the query itself. It would be better to
make it match. Perhaps put relkind first in both cases. I might also
think about trying to make the column naming a bit more consistent,
e.g. relkind, relfilenode, toast_oid, toast_relfilenode,
toast_index_oid, toast_index_relfilenode.

- In heap_create(), the wording of the error messages is not quite
consistent. You have "relfilenode value not set when in binary upgrade
mode", "toast relfilenode value not set when in binary upgrade mode",
and "pg_class index relfilenode value not set when in binary upgrade
mode". Why does the last one mention pg_class when the other two
don't?

- The code in heap_create() now has no comments whatsoever, which is a
shame, because it's actually kind of a tricky bit of logic. Someone
might wonder why we override the relfilenode inside that function
instead of doing it at the same places where we absorb
binary_upgrade_next_{heap,index,toast}_pg_class_oid and the passing
down the relfilenode. I think the answer is that passing down the
relfilenode from the caller would result in storage not actually being
created, whereas in this case we want it to be created but just with
the value we specify, and the reason we want that is because we need
later DDL that happens after these statements but before the old
cluster's relations are moved over to execute successfully, which it
won't if the storage is altogether absent.

However, that raises the question of whether this patch has even got
the basic design right. Maybe we ought to actually be absorbing the
relfilenode setting at the same places where we're doing so for the
OID, and then passing an additional parameter to heap_create() like
bool suppress_storage or something like that. Maybe, taking it even
further, we ought to be changing the signatures of
binary_upgrade_next_heap_pg_class_oid and friends to be two-argument
functions, and pass down the OID and the relfilenode in the same call,
rather than calling two separate functions. I'm not so much concerned
about the cost of calling two functions as the potential for
confusion. I'm not honestly sure that either of these changes are the
right thing to do, but I am pretty strongly inclined to do at least
the first part - trying to absorb reloid and relfilenode in the same
places. If we're not going to do that we certainly need to explain why
we're doing it the way we are in the comments.

It's not really this patch's fault, but it would sure be nice if we
had some better testing for this area. Suppose this patch somehow
changed nothing from the present behavior. How would we know? Or
suppose it managed to somehow set all the relfilenodes in the new
cluster to random values rather than the intended one? There's no
automated testing that would catch any of that, and it's not obvious
how it could be added to test.sh. I suppose what we really need to do
at some point is rewrite that as a TAP test, but that seems like a
separate project from this patch.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-20 Thread Shruthi Gowda
> The rest of this email will be detailed review comments on the patch
> as presented, and thus probably only interesting to someone actually
> working on the patch. Feel free to skip if that's not you.
>
> - I suggest splitting the patch into one portion that deals with
> database OID and another portion that deals with tablespace OID and
> relfilenode OID, or maybe splitting it all the way into three separate
> patches, one for each. This could allow the uncontroversial parts to
> get committed first while we're wondering what to do about the problem
> described above.

Thanks Robert for your comments.
I have split the patch into two portions. One that handles DB OID and
the other that
handles tablespace OID and relfilenode OID.

> - There are two places in the patch, one in dumpDatabase() and one in
> generate_old_dump() where blank lines are removed with no other
> changes. Please avoid whitespace-only hunks.

These changes are avoided.

> - If possible, please try to pgindent the new code. It's pretty good
> what you did, but e.g. the declaration of
> binary_upgrade_next_pg_tablespace_oid probably has less whitespace
> than pgindent is going to want.

Taken care in latest patches

> - The comments in dumpDatabase() claim that "postgres" and "template1"
> are handled specially in some way, but there seems to be no code that
> matches those comments.

The comment is removed.

> - heap_create()'s logic around setting create_storage looks slightly
> redundant. I'm not positive what would be better, but ... suppose you
> just took the part that's currently gated by if (!IsBinaryUpgrade) and
> did it unconditionally. Then put if (IsBinaryUpgrade) around the else
> clause, but delete the last bit from there that sets create_storage.
> Maybe we'd still want a comment saying that it's intentional that
> create_storage = true even though it will be overwritten later, but
> then, I think, we wouldn't need to set create_storage in two different
> places. Maybe I'm wrong.
>
> - If we're not going to do that, then I think you should swap the if
> and else clauses and reverse the sense of the test. In createdb(),
> CreateTableSpace(), and a bunch of existing places, we do if
> (IsBinaryUpgrade) { ... } else { ... } so I don't think it makes sense
> for this one to instead do if (!IsBinaryUpgrade) { ... } else { ... }.

I have avoided the redundant code and removed the comment as it does
not make sense now that we are setting the create_storage conditionally.
(In the original patch, create_storage was set to TRUE by default for binary
upgrade case which was wrong and was hitting assert in the following flow).

> - I'm not sure that I'd bother renaming
> binary_upgrade_set_pg_class_oids_and_relfilenodes(). It's such a long
> name, and a relfilenode is kind of an OID, so the current name isn't
> even really wrong. I'd probably drop the header comment too, since it
> seems rather obvious. But both of these things are judgement calls.

I agree. I have retained the old function name.

> - Inside that function, there is a comment that says "Indexes cannot
> have toast tables, so we need not make this probe in the index code
> path." However, you have moved the code from someplace where it didn't
> happen for indexes to someplace where it happens for both tables and
> indexes. Therefore the comment, which was true when the code was where
> it was before, is now false. So you need to update it.

The comment is updated.

> - It is not clear to me why pg_upgrade's Makefile needs to be changed
> to include -DFRONTEND in CPPFLAGS. All of the .c files in this
> directory include postgres_fe.h rather than postgres.h, and that file
> has #define FRONTEND 1. Moreover, there are no actual code changes in
> this directory, so why should the Makefile need any change?

Makefile change is removed.

> - A couple of comment changes - and the commit message - mention data
> encryption, but that's not a feature that this patch implements, nor
> are we committed to adding it in the immediate future (or ever,
> really). So I think those places should be revised to say that we do
> this because we want the filenames to match between the old and new
> clusters, and leave the reasons why that might be a good thing up to
> the reader's imagination.

Taken care.

Regards,
Shruthi KC
EnterpriseDB: http://www.enterprisedb.com


v2-0001-Preserve-database-OIDs-in-pg_upgrade.patch
Description: Binary data


v2-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg.patch
Description: Binary data


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-17 Thread Robert Haas
On Tue, Aug 17, 2021 at 1:54 PM Tom Lane  wrote:
> Right.  If pg_upgrade explicitly ignores template0 then its OID
> need not be stable ... at least, not unless there's a chance it
> could conflict with some other database OID, which would become
> a live possibility if we let users get at "WITH OID = n".

Well, that might be a good reason not to let them do that, then, at
least for n<64k.

> > In fact, I'd probably go so far as to
> > hardcode that in such a way that even if you drop those databases and
> > recreate them, they get recreated with the same hard-coded OID.
>
> Less sure that this is a good idea, though.  In particular, I do not
> think that you can make it work in the face of
> alter database template1 rename to oops;
> create database template1;

That is a really good point. If we can't categorically force the OID
of those databases to have a particular, fixed value, and based on
this example that seems to be impossible, then there's always a
possibility that we might find a value in the old cluster that doesn't
happen to match what is present in the new cluster. Seen from that
angle, the problem is really with databases that are pre-existent in
the new cluster but whose contents still need to be dumped. Maybe we
could (optionally? conditionally?) drop those databases from the new
cluster and then recreate them with the OID that we want them to have.

> > The only hard requirement for this feature is if we
> > use the database OID for some kind of encryption or integrity checking
> > or checksum type feature.
>
> It's fairly unclear to me why that is so important as to justify the
> amount of klugery that this line of thought seems to be bringing.

Well, I think it would make sense to figure out how small we can make
the kludge first, and then decide whether it's larger than we can
tolerate. From my point of view, I completely understand why people to
whom those kinds of features are important want to include all the
fields that make up a buffer tag in the checksum or other integrity
check. Right now, if somebody copies a page from one place to another,
or if the operating system fumbles things and switches some pages
around, we have no direct way of detecting that anything bad has
happened. This is not the only problem that would need to be solved in
order to fix that, but it's one of them, and I don't particularly see
why it's not a valid goal. It's not as if a 16-bit checksum that is
computed in exactly the same way for every page in the cluster is such
state-of-the-art technology that only fools question its surpassing
excellence.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-17 Thread Bruce Momjian
On Tue, Aug 17, 2021 at 11:56:30AM -0400, Robert Haas wrote:
> On Wed, Aug 11, 2021 at 3:41 AM Shruthi Gowda  wrote:
> > I have fixed all the issues and now the patch is working as expected.
> 
> Hi,
> 
> I'm changing the subject line since the patch does something which was
> discussed on that thread but isn't really related to the old email
> subject. In general, I think this patch is uncontroversial and in
> reasonably good shape. However, there's one part that I'm not too sure
> about. If Tom Lane happens to be paying attention to this thread, I
> think his feedback would be particularly useful, since he knows a lot
> about the inner workings of pg_dump. Opinions from anybody else would
> be great, too. Anyway, here's the hunk that worries me:

What is the value of preserving db/ts/relfilenode OIDs?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-17 Thread Shruthi Gowda
On Tue, Aug 17, 2021 at 11:07 PM Robert Haas  wrote:
>
> On Tue, Aug 17, 2021 at 12:42 PM Tom Lane  wrote:
> > Actually though ... I've not read the patch, but what does it do about
> > the fact that the postgres and template0 DBs do not have stable OIDs?
> > I cannot imagine any way to force those to match across PG versions
> > that would not be an unsustainable crock.
>
> Well, it's interesting that you mention that, because there's a
> comment in the patch that probably has to do with this:
>
> +/*
> + * Make sure that pg_upgrade does not change database OID. Don't care
> + * about "postgres" database, backend will assign it fixed OID anyway.
> + * ("template1" has fixed OID too but the value 1 should not collide with
> + * any other OID so backend pays no attention to it.)
> + */
>
In the original patch the author intended to avoid dumping the
postgres DB OID like below:
+ if (dopt->binary_upgrade && dbCatId.oid != PostgresDbOid)

Since postgres OID is not hardcoded/fixed I removed the check.
My bad I missed updating the comment section. Sorry for the confusion.

Regards,
Shruthi KC
EnterpriseDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-17 Thread Tom Lane
I wrote:
> Robert Haas  writes:
>> The only hard requirement for this feature is if we
>> use the database OID for some kind of encryption or integrity checking
>> or checksum type feature.

> It's fairly unclear to me why that is so important as to justify the
> amount of klugery that this line of thought seems to be bringing.

And, not to put too fine a point on it, how will you possibly do
that without entirely breaking CREATE DATABASE?

regards, tom lane




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-17 Thread Tom Lane
Robert Haas  writes:
> I wasn't able to properly understand that comment, and to be honest
> I'm not sure I precisely understand your concern either. I don't quite
> see why the template0 database matters. I think that database isn't
> going to be dumped, or restored, so as far as pg_upgrade is concerned
> it might as well not exist in either cluster, and I don't see why
> pg_upgrade can't therefore just ignore it completely. But template1
> and postgres are another matter. If I understand correctly, those
> databases are going to be created in the new cluster by initdb, but
> then pg_upgrade is going to populate them with data - including
> relation files - from the old cluster.

Right.  If pg_upgrade explicitly ignores template0 then its OID
need not be stable ... at least, not unless there's a chance it
could conflict with some other database OID, which would become
a live possibility if we let users get at "WITH OID = n".

(Having said that, I'm not sure that pg_upgrade special-cases
template0, or that it should do so.)

> To be honest, what I'd be inclined to do about that is just nail down
> those OIDs for future releases.

Yeah, I was thinking along similar lines.

> In fact, I'd probably go so far as to
> hardcode that in such a way that even if you drop those databases and
> recreate them, they get recreated with the same hard-coded OID.

Less sure that this is a good idea, though.  In particular, I do not
think that you can make it work in the face of
alter database template1 rename to oops;
create database template1;

> The only hard requirement for this feature is if we
> use the database OID for some kind of encryption or integrity checking
> or checksum type feature.

It's fairly unclear to me why that is so important as to justify the
amount of klugery that this line of thought seems to be bringing.

regards, tom lane




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-17 Thread Robert Haas
On Tue, Aug 17, 2021 at 12:42 PM Tom Lane  wrote:
> Actually though ... I've not read the patch, but what does it do about
> the fact that the postgres and template0 DBs do not have stable OIDs?
> I cannot imagine any way to force those to match across PG versions
> that would not be an unsustainable crock.

Well, it's interesting that you mention that, because there's a
comment in the patch that probably has to do with this:

+/*
+ * Make sure that pg_upgrade does not change database OID. Don't care
+ * about "postgres" database, backend will assign it fixed OID anyway.
+ * ("template1" has fixed OID too but the value 1 should not collide with
+ * any other OID so backend pays no attention to it.)
+ */

I wasn't able to properly understand that comment, and to be honest
I'm not sure I precisely understand your concern either. I don't quite
see why the template0 database matters. I think that database isn't
going to be dumped, or restored, so as far as pg_upgrade is concerned
it might as well not exist in either cluster, and I don't see why
pg_upgrade can't therefore just ignore it completely. But template1
and postgres are another matter. If I understand correctly, those
databases are going to be created in the new cluster by initdb, but
then pg_upgrade is going to populate them with data - including
relation files - from the old cluster. And, yeah, I don't see how we
could make those database OIDs match, which is not great.

To be honest, what I'd be inclined to do about that is just nail down
those OIDs for future releases. In fact, I'd probably go so far as to
hardcode that in such a way that even if you drop those databases and
recreate them, they get recreated with the same hard-coded OID. Now
that doesn't do anything to create stability when people upgrade from
an old release to a current one, but I don't really see that as an
enormous problem. The only hard requirement for this feature is if we
use the database OID for some kind of encryption or integrity checking
or checksum type feature. Then, you want to avoid having the database
OID change when you upgrade, so that the encryption or integrity check
or checksum in question does not have to be recomputed for every page
as part of pg_upgrade. But, that only matters if you're going between
two releases that support that feature, which will not be the case if
you're upgrading from some old release. Apart from that kind of
feature, it still seems like a nice-to-have to keep database OIDs the
same, but if those cases end up as exceptions, oh well.

Does that seem reasonable, or am I missing something big?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-17 Thread Tom Lane
Stephen Frost  writes:
> Also agreed on this, though I wonder- do we actually need to explicitly
> make CREATE DATABASE q WITH OID = 1234; only work during binary upgrade
> mode in the backend?  That strikes me as perhaps doing more work than we
> really need to while also preventing something that users might actually
> like to do.

There should be adequate defenses against a duplicate OID already,
so +1 --- no reason to insist this only be used during binary upgrade.

Actually though ... I've not read the patch, but what does it do about
the fact that the postgres and template0 DBs do not have stable OIDs?
I cannot imagine any way to force those to match across PG versions
that would not be an unsustainable crock.

regards, tom lane




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-17 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas  writes:
> > To me, adding a separate TOC entry for a thing that is not really a
> > separate object seems like a scary hack that might come back to bite
> > us. Unfortunately, I don't know enough about pg_dump to say exactly
> > how it might come back to bite us, which leaves wide open the
> > possibility that I am completely wrong I just think it's the
> > intention that archive entries correspond to actual objects in the
> > database, not commands that we want executed in some particular order.
> 
> I agree, this seems like a moderately bad idea.  It could get broken
> either by executing only one of the TOC entries during restore, or
> by executing them in the wrong order.  The latter possibility could
> be forestalled by adding a dependency, which I do not see this hunk
> doing, which is clearly a bug.  The former possibility would require
> user intervention, so maybe it's in the category of "if you break
> this you get to keep both pieces".  Still, it's ugly.

Yeah, agreed.

> > If that criticism is indeed correct, then my proposal would be to
> > instead add a WITH OID = nnn option to CREATE DATABASE and allow it to
> > be used only in binary upgrade mode.
> 
> If it's not too complicated to implement, that seems like an OK idea
> from here.  I don't have any great love for the way we handle OID
> preservation in binary upgrade mode, so not doing it exactly the same
> way for databases doesn't seem like a disadvantage.

Also agreed on this, though I wonder- do we actually need to explicitly
make CREATE DATABASE q WITH OID = 1234; only work during binary upgrade
mode in the backend?  That strikes me as perhaps doing more work than we
really need to while also preventing something that users might actually
like to do.

Either way, we'll need to check that the OID given to us can be used for
the database, I'd think.

Having pg_dump only include it in binary upgrade mode is fine though.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-17 Thread Tom Lane
Robert Haas  writes:
> To me, adding a separate TOC entry for a thing that is not really a
> separate object seems like a scary hack that might come back to bite
> us. Unfortunately, I don't know enough about pg_dump to say exactly
> how it might come back to bite us, which leaves wide open the
> possibility that I am completely wrong I just think it's the
> intention that archive entries correspond to actual objects in the
> database, not commands that we want executed in some particular order.

I agree, this seems like a moderately bad idea.  It could get broken
either by executing only one of the TOC entries during restore, or
by executing them in the wrong order.  The latter possibility could
be forestalled by adding a dependency, which I do not see this hunk
doing, which is clearly a bug.  The former possibility would require
user intervention, so maybe it's in the category of "if you break
this you get to keep both pieces".  Still, it's ugly.

> If that criticism is indeed correct, then my proposal would be to
> instead add a WITH OID = nnn option to CREATE DATABASE and allow it to
> be used only in binary upgrade mode.

If it's not too complicated to implement, that seems like an OK idea
from here.  I don't have any great love for the way we handle OID
preservation in binary upgrade mode, so not doing it exactly the same
way for databases doesn't seem like a disadvantage.

regards, tom lane




preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-08-17 Thread Robert Haas
On Wed, Aug 11, 2021 at 3:41 AM Shruthi Gowda  wrote:
> I have fixed all the issues and now the patch is working as expected.

Hi,

I'm changing the subject line since the patch does something which was
discussed on that thread but isn't really related to the old email
subject. In general, I think this patch is uncontroversial and in
reasonably good shape. However, there's one part that I'm not too sure
about. If Tom Lane happens to be paying attention to this thread, I
think his feedback would be particularly useful, since he knows a lot
about the inner workings of pg_dump. Opinions from anybody else would
be great, too. Anyway, here's the hunk that worries me:

+
+   /*
+* Need a separate entry, otherwise the command will
be run in the
+* same transaction as the CREATE DATABASE command, which is not
+* allowed.
+*/
+   ArchiveEntry(fout,
+dbCatId,   /* catalog ID */
+dbDumpId,  /* dump ID */
+ARCHIVE_OPTS(.tag = datname,
+ .owner = dba,
+
.description = "SET_DB_OID",
+
.section = SECTION_PRE_DATA,
+
.createStmt = setDBIdQry->data,
+
.dropStmt = NULL));
+

To me, adding a separate TOC entry for a thing that is not really a
separate object seems like a scary hack that might come back to bite
us. Unfortunately, I don't know enough about pg_dump to say exactly
how it might come back to bite us, which leaves wide open the
possibility that I am completely wrong I just think it's the
intention that archive entries correspond to actual objects in the
database, not commands that we want executed in some particular order.
If that criticism is indeed correct, then my proposal would be to
instead add a WITH OID = nnn option to CREATE DATABASE and allow it to
be used only in binary upgrade mode. That has the disadvantage of
being inconsistent with the way that we preserve OIDs everywhere else,
but the only other alternatives are (1) do something like the above,
(2) remove the requirement that CREATE DATABASE run in its own
transaction, and (3) give up. (2) sounds hard and (3) is unappealing.

The rest of this email will be detailed review comments on the patch
as presented, and thus probably only interesting to someone actually
working on the patch. Feel free to skip if that's not you.

- I suggest splitting the patch into one portion that deals with
database OID and another portion that deals with tablespace OID and
relfilenode OID, or maybe splitting it all the way into three separate
patches, one for each. This could allow the uncontroversial parts to
get committed first while we're wondering what to do about the problem
described above.

- There are two places in the patch, one in dumpDatabase() and one in
generate_old_dump() where blank lines are removed with no other
changes. Please avoid whitespace-only hunks.

- If possible, please try to pgindent the new code. It's pretty good
what you did, but e.g. the declaration of
binary_upgrade_next_pg_tablespace_oid probably has less whitespace
than pgindent is going to want.

- The comments in dumpDatabase() claim that "postgres" and "template1"
are handled specially in some way, but there seems to be no code that
matches those comments.

- heap_create()'s logic around setting create_storage looks slightly
redundant. I'm not positive what would be better, but ... suppose you
just took the part that's currently gated by if (!IsBinaryUpgrade) and
did it unconditionally. Then put if (IsBinaryUpgrade) around the else
clause, but delete the last bit from there that sets create_storage.
Maybe we'd still want a comment saying that it's intentional that
create_storage = true even though it will be overwritten later, but
then, I think, we wouldn't need to set create_storage in two different
places. Maybe I'm wrong.

- If we're not going to do that, then I think you should swap the if
and else clauses and reverse the sense of the test. In createdb(),
CreateTableSpace(), and a bunch of existing places, we do if
(IsBinaryUpgrade) { ... } else { ... } so I don't think it makes sense
for this one to instead do if (!IsBinaryUpgrade) { ... } else { ... }.

- I'm not sure that I'd bother renaming
binary_upgrade_set_pg_class_oids_and_relfilenodes(). It's such a long
name, and a relfilenode is kind of an OID, so the current name isn't
even really wrong. I'd probably drop the header comment too, since it
seems rather obvious. But both of these things are judgement calls.

- Inside that function, there is a comment that says "Indexes cannot
have toast tables, so we need not make this probe in the index code
path." However, you have moved the code from someplace where it didn't
happen for indexes to someplace where it happens for both tables and
indexes. Therefore the comment, which was true when the code was