Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-14 Thread Kyotaro Horiguchi
At Thu, 14 Apr 2022 11:13:01 -0400, Robert Haas  wrote 
in 
> On Tue, Apr 12, 2022 at 6:55 AM Kyotaro Horiguchi
>  wrote:
> > (My mailer has been fixed.)
> 
> Cool.
> 
> > So, I created the patches for back-patching from 10 to 14.  (With
> > fixed a silly bug of the v1-pg14 that HaveVirtualXIDsDelayingChkpt and
> > HaveVirtualXIDsDelayingChkptEnd are inverted..)
> 
> I am very sorry not to use these, but as I said in my previous post on
> this thread, I prefer to commit what I wrote and Markus revised rather
> than these versions. I have done that now.

Not at all. It's just an unfortunate crossing.
Thanks for fixing this.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-14 Thread Robert Haas
On Tue, Apr 12, 2022 at 6:55 AM Kyotaro Horiguchi
 wrote:
> (My mailer has been fixed.)

Cool.

> So, I created the patches for back-patching from 10 to 14.  (With
> fixed a silly bug of the v1-pg14 that HaveVirtualXIDsDelayingChkpt and
> HaveVirtualXIDsDelayingChkptEnd are inverted..)

I am very sorry not to use these, but as I said in my previous post on
this thread, I prefer to commit what I wrote and Markus revised rather
than these versions. I have done that now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-12 Thread Kyotaro Horiguchi
(My mailer has been fixed.)

At Mon, 11 Apr 2022 21:45:59 +0200, Markus Wanner 
 wrote in 
> On Mon, 2022-04-11 at 15:21 -0400, Robert Haas wrote:
> > ... before v13, the commit in question actually
> > changed the size of PGXACT, which is really quite bad -- it needs to
> > be 12 bytes for performance reasons. And there's no spare bytes
> > available, so I think we should follow one of the suggestions that he
> > had over in that email thread, and put delayChkptEnd in PGPROC even
> > though delayChkpt is in PGXACT.
> 
> This makes sense to me.  Kudos to Kyotaro for considering this.
> 
> At first read, this sounded like a trade-off between compatibility and
> performance for PG 12 and older.  But I realize leaving delayChkpt in
> PGXACT and adding just delayChkptEnd to PGPROC is compatible and leaves
> PGXACT at a size of 12 bytes.  So this sounds like a good approach to
> me.

Thanks!

So, I created the patches for back-patching from 10 to 14.  (With
fixed a silly bug of the v1-pg14 that HaveVirtualXIDsDelayingChkpt and
HaveVirtualXIDsDelayingChkptEnd are inverted..)

They revert delayChkpt-related changes made by the patch and add
delayChkptEnd stuff.  I compared among every pair of the patches for
neighbouring versions, to make sure not making the same change in
different way and they have the same set of hunks.

This version takes the way sharing the common static function
(*ChkptGuts) between the functions *Chkpt() and *ChkptEnd().

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 471ca4aa38a92f837ed4e4aeda40648c3b0b0a9b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 6 Apr 2022 15:47:50 +0900
Subject: [PATCH v2] Fix ABI/API break

Commit bbace5697d caused ABI break by changing the type and width of
PGPROC.delayChkpt and API break by changing the signature of some
public functions.  Restore the member and function signature to
previous state and add delayChkptEnd to an existing gap in PGPROC
struct.

Backpatch to 10, all supported branches.
---
 src/backend/access/transam/multixact.c  |  6 +--
 src/backend/access/transam/twophase.c   | 13 +++---
 src/backend/access/transam/xact.c   |  5 +--
 src/backend/access/transam/xlog.c   | 10 ++---
 src/backend/access/transam/xloginsert.c |  2 +-
 src/backend/catalog/storage.c   |  6 +--
 src/backend/storage/buffer/bufmgr.c |  6 +--
 src/backend/storage/ipc/procarray.c | 60 +++--
 src/backend/storage/lmgr/proc.c |  6 ++-
 src/include/storage/proc.h  | 42 +++--
 src/include/storage/procarray.h |  8 ++--
 11 files changed, 82 insertions(+), 82 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 50d8bab9e2..b643564f16 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -3075,8 +3075,8 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
 	 * crash/basebackup, even though the state of the data directory would
 	 * require it.
 	 */
-	Assert((MyProc->delayChkpt & DELAY_CHKPT_START) == 0);
-	MyProc->delayChkpt |= DELAY_CHKPT_START;
+	Assert(!MyProc->delayChkpt);
+	MyProc->delayChkpt = true;
 
 	/* WAL log truncation */
 	WriteMTruncateXlogRec(newOldestMultiDB,
@@ -3102,7 +3102,7 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
 	/* Then offsets */
 	PerformOffsetsTruncation(oldestMulti, newOldestMulti);
 
-	MyProc->delayChkpt &= ~DELAY_CHKPT_START;
+	MyProc->delayChkpt = false;
 
 	END_CRIT_SECTION();
 	LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index dea3f485f7..911d93fbf4 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -474,7 +474,8 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
 	}
 	proc->xid = xid;
 	Assert(proc->xmin == InvalidTransactionId);
-	proc->delayChkpt = 0;
+	proc->delayChkpt = false;
+	proc->delayChkptEnd = false;
 	proc->statusFlags = 0;
 	proc->pid = 0;
 	proc->databaseId = databaseid;
@@ -1165,8 +1166,7 @@ EndPrepare(GlobalTransaction gxact)
 
 	START_CRIT_SECTION();
 
-	Assert((MyProc->delayChkpt & DELAY_CHKPT_START) == 0);
-	MyProc->delayChkpt |= DELAY_CHKPT_START;
+	MyProc->delayChkpt = true;
 
 	XLogBeginInsert();
 	for (record = records.head; record != NULL; record = record->next)
@@ -1209,7 +1209,7 @@ EndPrepare(GlobalTransaction gxact)
 	 * checkpoint starting after this will certainly see the gxact as a
 	 * candidate for fsyncing.
 	 */
-	MyProc->delayChkpt &= ~DELAY_CHKPT_START;
+	MyProc->delayChkpt = false;
 
 	/*
 	 * Remember that we have this GlobalTransaction entry locked for us.  If
@@ -2276,8 +2276,7 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	START_CRIT_SECTION();
 
 	/* See notes in RecordTransactionCommit */
-	Assert((MyProc->delayChkpt & DELAY_CHKPT_START) == 0);
-	

Re: API stability

2022-04-11 Thread Kyotaro Horiguchi
At Mon, 11 Apr 2022 12:48:25 +0200, Matthias van de Meent 
 wrote in 
> On Mon, 11 Apr 2022 at 06:30, Kyotaro Horiguchi  
> wrote:
> I won't speak for Robert H., but this might be because of gmail not
> putting this mail in the right thread: Your mail client dropped the
> "[was: pgsql: ...]" tag, which Gmail subsequently displays as a
> different thread (that is, in my Gmail UI there's three "Re: API
> stability" threads, one of which has the [was: pgsql: ...]-tag, and
> two of which seem to be started by you as a reply on the original
> thread, but with the [was: pgsql: ...]-tag dropped and thus considered
> a new thread).

Mmm. d*** gmail..  My main mailer does that defaltly but I think I can
*fix* that behavior.

Thanks!

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-11 Thread Markus Wanner
On Mon, 2022-04-11 at 15:21 -0400, Robert Haas wrote:
> ... before v13, the commit in question actually
> changed the size of PGXACT, which is really quite bad -- it needs to
> be 12 bytes for performance reasons. And there's no spare bytes
> available, so I think we should follow one of the suggestions that he
> had over in that email thread, and put delayChkptEnd in PGPROC even
> though delayChkpt is in PGXACT.

This makes sense to me.  Kudos to Kyotaro for considering this.

At first read, this sounded like a trade-off between compatibility and
performance for PG 12 and older.  But I realize leaving delayChkpt in
PGXACT and adding just delayChkptEnd to PGPROC is compatible and leaves
PGXACT at a size of 12 bytes.  So this sounds like a good approach to
me.

Best Regards

Markus





Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-11 Thread Robert Haas
On Fri, Apr 8, 2022 at 11:50 AM Robert Haas  wrote:
> On Fri, Apr 8, 2022 at 4:47 AM Markus Wanner
>  wrote:
> > I agree with Michael, it would be nice to not duplicate the code, but
> > use a common underlying method.  A modified patch is attached.
>
> I don't think this is better, but I don't think it's worth arguing
> about, either, so I'll do it this way if nobody objects.
>
> Meanwhile, I've committed the patch for master to master.

Well, I've just realized that Kyotaro Horiguchi volunteered to fix
this on an email thread I did not see because of the way Gmail breaks
the thread if you change the subject line. And he developed a very
similar patch to what we have here. I'm going to use this one as the
basis for going forward because I've already studied it in detail and
it's less work for me to stick with what I know than to go study
something else. But, he also noticed something which we didn't notice
here, which is that before v13, the commit in question actually
changed the size of PGXACT, which is really quite bad -- it needs to
be 12 bytes for performance reasons. And there's no spare bytes
available, so I think we should follow one of the suggestions that he
had over in that email thread, and put delayChkptEnd in PGPROC even
though delayChkpt is in PGXACT.

Comments?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: API stability

2022-04-11 Thread Robert Haas
On Mon, Apr 11, 2022 at 6:48 AM Matthias van de Meent
 wrote:
> So, this might be the reason Robert overlooked your declaration to
> volunteer: he was looking for volunteers in the thread "Re: API
> Stability [was: pgsql: ...]" in the Gmail UI, which didn't show your
> messages there because of the different subject line.

Yes, that's what happened.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: API stability

2022-04-11 Thread Matthias van de Meent
On Mon, 11 Apr 2022 at 06:30, Kyotaro Horiguchi  wrote:
>
> (a bit off-topic)
>
> I'm not sure where I am..
>
> At Wed, 06 Apr 2022 10:36:30 +0900 (JST), Kyotaro Horiguchi 
>  wrote in
> me> > this if nobody else would like to do it, but let me ask whether
> me> > Kyotaro Horiguchi would like to propose a patch, since the original
> me> > patch did, and/or whether you would like to propose a patch, as the
> me> > person reporting the issue.
> me>
> me> I'd like to do that. Let me see.
>
> At Thu, 7 Apr 2022 10:04:20 -0400, Robert Haas  wrote 
> in
> > struct, which is what we now need to fix. Since I don't hear anyone
> > else volunteering to take care of that, I'll go work on it.
>
> Just confirmation. Is my message above didn't look like declaring that
> I'd like to volunteering?  If so, please teach me the correct way to
> say that, since I don't want to repeat the same mistake.  Or are there
> some other reasons?  (Sorry if this looks like a blame, but I asking
> plainly (really:).)

I won't speak for Robert H., but this might be because of gmail not
putting this mail in the right thread: Your mail client dropped the
"[was: pgsql: ...]" tag, which Gmail subsequently displays as a
different thread (that is, in my Gmail UI there's three "Re: API
stability" threads, one of which has the [was: pgsql: ...]-tag, and
two of which seem to be started by you as a reply on the original
thread, but with the [was: pgsql: ...]-tag dropped and thus considered
a new thread).

So, this might be the reason Robert overlooked your declaration to
volunteer: he was looking for volunteers in the thread "Re: API
Stability [was: pgsql: ...]" in the Gmail UI, which didn't show your
messages there because of the different subject line.

Kind regards,

Matthias van de Meent




Re: API stability

2022-04-10 Thread Kyotaro Horiguchi
(a bit off-topic)

I'm not sure where I am..

At Wed, 06 Apr 2022 10:36:30 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
me> > this if nobody else would like to do it, but let me ask whether
me> > Kyotaro Horiguchi would like to propose a patch, since the original
me> > patch did, and/or whether you would like to propose a patch, as the
me> > person reporting the issue.
me> 
me> I'd like to do that. Let me see.

At Thu, 7 Apr 2022 10:04:20 -0400, Robert Haas  wrote in 
> struct, which is what we now need to fix. Since I don't hear anyone
> else volunteering to take care of that, I'll go work on it.

Just confirmation. Is my message above didn't look like declaring that
I'd like to volunteering?  If so, please teach me the correct way to
say that, since I don't want to repeat the same mistake.  Or are there
some other reasons?  (Sorry if this looks like a blame, but I asking
plainly (really:).)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-08 Thread Robert Haas
On Fri, Apr 8, 2022 at 4:47 AM Markus Wanner
 wrote:
> I agree with Michael, it would be nice to not duplicate the code, but
> use a common underlying method.  A modified patch is attached.

I don't think this is better, but I don't think it's worth arguing
about, either, so I'll do it this way if nobody objects.

Meanwhile, I've committed the patch for master to master.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-08 Thread Markus Wanner
On Fri, 2022-04-08 at 08:47 +0900, Michael Paquier wrote:
> On Thu, Apr 07, 2022 at 11:19:15AM -0400, Robert Haas wrote:
> > Here are patches for master and v14 to do things this way.
> > Comments?
> 
> Thanks for the patches.  They look correct.

+1, looks good to me and addresses my specific original concern.

> For ~14, I'd rather avoid
> the code duplication done by GetVirtualXIDsDelayingChkptEnd() and
> HaveVirtualXIDsDelayingChkpt() that could be avoided with an extra
> bool argument to the existing routine.  The same kind of duplication
> happens with GetVirtualXIDsDelayingChkpt() and
> GetVirtualXIDsDelayingChkptEnd().

I agree with Michael, it would be nice to not duplicate the code, but
use a common underlying method.  A modified patch is attached.

Best Regards

Markus

From: Robert Haas 
Date: Thu, 7 Apr 2022 11:15:07 -0400
Subject: [PATCH] Rethink the delay-checkpoint-end mechanism in the
 back-branches.

The back-patch of commit bbace5697df12398e87ffd9879171c39d27f5b33 had
the unfortunate effect of changing the layout of PGPROC in the
back-branches, which could break extensions. This happened because it
changed the delayChkpt from type bool to type int. So, change it back,
and add a new bool delayChkptEnd field instead. The new field should
fall within what used to be padding space within the struct, and so
hopefully won't cause any extensions to break.

Per report from Markus Wanner and discussion with Tom Lane and others.
---
 backend/access/transam/multixact.c  |   6 +-
 backend/access/transam/twophase.c   |  13 ++--
 backend/access/transam/xact.c   |   6 +-
 backend/access/transam/xlog.c   |  10 +--
 backend/access/transam/xloginsert.c |   2 
 backend/catalog/storage.c   |   6 +-
 backend/storage/buffer/bufmgr.c |   6 +-
 backend/storage/ipc/procarray.c | 100 +++-
 include/storage/proc.h  |  35 +---
 include/storage/procarray.h |   7 +-
 10 files changed, 107 insertions(+), 84 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 50d8bab9e21..b643564f16a 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -3075,8 +3075,8 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
 	 * crash/basebackup, even though the state of the data directory would
 	 * require it.
 	 */
-	Assert((MyProc->delayChkpt & DELAY_CHKPT_START) == 0);
-	MyProc->delayChkpt |= DELAY_CHKPT_START;
+	Assert(!MyProc->delayChkpt);
+	MyProc->delayChkpt = true;
 
 	/* WAL log truncation */
 	WriteMTruncateXlogRec(newOldestMultiDB,
@@ -3102,7 +3102,7 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
 	/* Then offsets */
 	PerformOffsetsTruncation(oldestMulti, newOldestMulti);
 
-	MyProc->delayChkpt &= ~DELAY_CHKPT_START;
+	MyProc->delayChkpt = false;
 
 	END_CRIT_SECTION();
 	LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index dea3f485f7a..633a6f1747f 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -474,8 +474,9 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
 	}
 	proc->xid = xid;
 	Assert(proc->xmin == InvalidTransactionId);
-	proc->delayChkpt = 0;
+	proc->delayChkpt = false;
 	proc->statusFlags = 0;
+	proc->delayChkptEnd = false;
 	proc->pid = 0;
 	proc->databaseId = databaseid;
 	proc->roleId = owner;
@@ -1165,8 +1166,7 @@ EndPrepare(GlobalTransaction gxact)
 
 	START_CRIT_SECTION();
 
-	Assert((MyProc->delayChkpt & DELAY_CHKPT_START) == 0);
-	MyProc->delayChkpt |= DELAY_CHKPT_START;
+	MyProc->delayChkpt = true;
 
 	XLogBeginInsert();
 	for (record = records.head; record != NULL; record = record->next)
@@ -1209,7 +1209,7 @@ EndPrepare(GlobalTransaction gxact)
 	 * checkpoint starting after this will certainly see the gxact as a
 	 * candidate for fsyncing.
 	 */
-	MyProc->delayChkpt &= ~DELAY_CHKPT_START;
+	MyProc->delayChkpt = false;
 
 	/*
 	 * Remember that we have this GlobalTransaction entry locked for us.  If
@@ -2276,8 +2276,7 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	START_CRIT_SECTION();
 
 	/* See notes in RecordTransactionCommit */
-	Assert((MyProc->delayChkpt & DELAY_CHKPT_START) == 0);
-	MyProc->delayChkpt |= DELAY_CHKPT_START;
+	MyProc->delayChkpt = true;
 
 	/*
 	 * Emit the XLOG commit record. Note that we mark 2PC commits as
@@ -2325,7 +2324,7 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	TransactionIdCommitTree(xid, nchildren, children);
 
 	/* Checkpoint can proceed now */
-	MyProc->delayChkpt &= ~DELAY_CHKPT_START;
+	MyProc->delayChkpt = false;
 
 	END_CRIT_SECTION();
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index c5e7261921d..e8523693c0a 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1335,9 +1335,9 

Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-08 Thread Michael Paquier
On Thu, Apr 07, 2022 at 10:19:35PM -0400, Robert Haas wrote:
> Yeah, that's exactly why I didn't do what Michael proposes. If we're
> going to go to this trouble to avoid changing the layout of a PGPROC,
> we must be doing that on the theory that extension code cares about
> delayChkpt. And if that is so, it seems reasonable to suppose that it
> might also want to call the associated functions.

Compatibility does not strike me as a problem with two static inline 
functions used as wrappers of their common logic.

> Honestly, I wouldn't have thought that this mattered, because I
> wouldn't have guessed that any non-core code cared about delayChkpt.
> But I would have been wrong.

That's a minor point.  If you wish to keep this code as you are
proposing, that's fine as well by me.
--
Michael


signature.asc
Description: PGP signature


Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-07 Thread Robert Haas
On Thu, Apr 7, 2022 at 7:52 PM Tom Lane  wrote:
> Michael Paquier  writes:
> > On Thu, Apr 07, 2022 at 11:19:15AM -0400, Robert Haas wrote:
> >> Here are patches for master and v14 to do things this way. Comments?
>
> > Thanks for the patches.  They look correct.  For ~14, I'd rather avoid
> > the code duplication done by GetVirtualXIDsDelayingChkptEnd() and
> > HaveVirtualXIDsDelayingChkpt() that could be avoided with an extra
> > bool argument to the existing routine.
>
> Isn't adding another argument an API break?  (If there's any outside
> code calling GetVirtualXIDsDelayingChkpt, which it seems like there
> might be.)

Yeah, that's exactly why I didn't do what Michael proposes. If we're
going to go to this trouble to avoid changing the layout of a PGPROC,
we must be doing that on the theory that extension code cares about
delayChkpt. And if that is so, it seems reasonable to suppose that it
might also want to call the associated functions.

Honestly, I wouldn't have thought that this mattered, because I
wouldn't have guessed that any non-core code cared about delayChkpt.
But I would have been wrong.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: API stability

2022-04-07 Thread Kyotaro Horiguchi
(Mmm. My mailer automatically teared off the [was: ..] part from the
subject..)

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: API stability

2022-04-07 Thread Kyotaro Horiguchi
At Fri, 8 Apr 2022 08:47:42 +0900, Michael Paquier  wrote 
in 
> On Thu, Apr 07, 2022 at 11:19:15AM -0400, Robert Haas wrote:
> > Here are patches for master and v14 to do things this way. Comments?
> 
> Thanks for the patches.  They look correct.  For ~14, I'd rather avoid
> the code duplication done by GetVirtualXIDsDelayingChkptEnd() and
> HaveVirtualXIDsDelayingChkpt() that could be avoided with an extra
> bool argument to the existing routine.  The same kind of duplication
> happens with GetVirtualXIDsDelayingChkpt() and
> GetVirtualXIDsDelayingChkptEnd().

FWIW, it is done in [1].

[1] 
https://www.postgresql.org/message-id/20220406.164521.17171257901083417.horikyota.ntt%40gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-07 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Apr 07, 2022 at 11:19:15AM -0400, Robert Haas wrote:
>> Here are patches for master and v14 to do things this way. Comments?

> Thanks for the patches.  They look correct.  For ~14, I'd rather avoid
> the code duplication done by GetVirtualXIDsDelayingChkptEnd() and
> HaveVirtualXIDsDelayingChkpt() that could be avoided with an extra
> bool argument to the existing routine.

Isn't adding another argument an API break?  (If there's any outside
code calling GetVirtualXIDsDelayingChkpt, which it seems like there
might be.)

regards, tom lane




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-07 Thread Michael Paquier
On Thu, Apr 07, 2022 at 11:19:15AM -0400, Robert Haas wrote:
> Here are patches for master and v14 to do things this way. Comments?

Thanks for the patches.  They look correct.  For ~14, I'd rather avoid
the code duplication done by GetVirtualXIDsDelayingChkptEnd() and
HaveVirtualXIDsDelayingChkpt() that could be avoided with an extra
bool argument to the existing routine.  The same kind of duplication
happens with GetVirtualXIDsDelayingChkpt() and
GetVirtualXIDsDelayingChkptEnd().
--
Michael


signature.asc
Description: PGP signature


Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-07 Thread Petr Jelinek


> On 7. 4. 2022, at 17:19, Robert Haas  wrote:
> 
> On Tue, Apr 5, 2022 at 10:17 AM Tom Lane  wrote:
>> What I think you need to do is:
>> 
>> 1. In the back branches, revert delayChkpt to its previous type and
>> semantics.  Squeeze a separate delayChkptEnd bool in somewhere
>> (you can't change the struct size either ...).
>> 
>> 2. In HEAD, rename the field to something like delayChkptFlags,
>> to ensure that any code touching it has to be inspected and updated.
> 
> Here are patches for master and v14 to do things this way. Comments?


Yeah I think this should do it (compilers should warn on master even without 
the rename, but who notices that right? :) )

Thanks,
Petr





Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-07 Thread Tom Lane
Robert Haas  writes:
> Here are patches for master and v14 to do things this way. Comments?

WFM.

regards, tom lane




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-07 Thread Robert Haas
On Tue, Apr 5, 2022 at 10:17 AM Tom Lane  wrote:
> What I think you need to do is:
>
> 1. In the back branches, revert delayChkpt to its previous type and
> semantics.  Squeeze a separate delayChkptEnd bool in somewhere
> (you can't change the struct size either ...).
>
> 2. In HEAD, rename the field to something like delayChkptFlags,
> to ensure that any code touching it has to be inspected and updated.

Here are patches for master and v14 to do things this way. Comments?

-- 
Robert Haas
EDB: http://www.enterprisedb.com


0001-Rename-delayChkpt-to-delayChkptFlags.patch
Description: Binary data


0001-Rethink-the-delay-checkpoint-end-mechanism-in-the-ba.patch
Description: Binary data


Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-07 Thread Robert Haas
On Thu, Apr 7, 2022 at 2:28 AM Michael Paquier  wrote:
> > Well, perhaps it's not the end of the world, but it's still a large
> > PITA for the maintainer of such an extension.  They can't "just fix it"
> > because some percentage of their userbase will still need to compile
> > against older minor releases.  Nor have you provided any way to handle
> > that requirement via conditional compilation.
>
> For example, I recall that some external extensions make use of
> sizeof(PGPROC) for their own business.  Isn't 412ad7a going to be a
> problem to change this structure's internals for already-compiled code
> on stable branches?

I don't think that commit changed sizeof(PGPROC), but it did affect
the position of the delayChkpt and statusFlags members within the
struct, which is what we now need to fix. Since I don't hear anyone
else volunteering to take care of that, I'll go work on it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-07 Thread Michael Paquier
On Tue, Apr 05, 2022 at 03:16:20PM -0400, Tom Lane wrote:
> Robert Haas  writes:
>> On Tue, Apr 5, 2022 at 10:32 AM Tom Lane  wrote:
>>> My point is that we want that to happen in HEAD, but it's not okay
>>> for it to happen in a minor release of a stable branch.
> 
>> I understand, but I am not sure that I agree. I think that if an
>> extension stops compiling against a back-branch, someone will notice
>> the next time they try to compile it and will fix it. Maybe that's not
>> amazing, but I don't think it's a huge deal either.

I agree with Tom's argument.  The internals of this structure should
not have changed in a stable branch.

> Well, perhaps it's not the end of the world, but it's still a large
> PITA for the maintainer of such an extension.  They can't "just fix it"
> because some percentage of their userbase will still need to compile
> against older minor releases.  Nor have you provided any way to handle
> that requirement via conditional compilation.

For example, I recall that some external extensions make use of
sizeof(PGPROC) for their own business.  Isn't 412ad7a going to be a
problem to change this structure's internals for already-compiled code
on stable branches?
--
Michael


signature.asc
Description: PGP signature


Re: API stability

2022-04-06 Thread Kyotaro Horiguchi
At Wed, 06 Apr 2022 18:13:17 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Wed, 6 Apr 2022 10:30:32 +0200, Alvaro Herrera  
> wrote in 
> > For code documentation purposes, I think it is slightly better to use
> > bits8 than uint8 for variables where you're storing independent bit flags.
> 
> Oh, agreed.  Will fix in the next version along with other fixes.

The immediately folloing member statusFlags is in uint8.  So using
bits8 here results in the following look.

>   bits8   delayChkptFlags;/* for DELAY_CHKPT_* flags */
>
>   uint8   statusFlags;/* this backend's status flags, see 
> PROC_*
>* above. 
> mirrored in

PGPROC has another member that fits  bits*.

>   uint64  fpLockBits; /* lock modes held for each 
> fast-path slot */

Do I change this in this patch? Or leave them for another chance?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: API stability

2022-04-06 Thread Kyotaro Horiguchi
At Wed, 6 Apr 2022 10:30:32 +0200, Alvaro Herrera  
wrote in 
> On 2022-Apr-06, Kyotaro Horiguchi wrote:
> 
> > For master, renamed delayChkpt to delayChkptFlags and changed it to
> > uint8.
> 
> For code documentation purposes, I think it is slightly better to use
> bits8 than uint8 for variables where you're storing independent bit flags.

Oh, agreed.  Will fix in the next version along with other fixes.

Thanks!

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: API stability

2022-04-06 Thread Alvaro Herrera
On 2022-Apr-06, Kyotaro Horiguchi wrote:

> For master, renamed delayChkpt to delayChkptFlags and changed it to
> uint8.

For code documentation purposes, I think it is slightly better to use
bits8 than uint8 for variables where you're storing independent bit flags.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)




Re: API stability

2022-04-06 Thread Kyotaro Horiguchi
At Wed, 06 Apr 2022 15:53:32 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Wed, 06 Apr 2022 15:31:53 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
> > At Wed, 06 Apr 2022 14:30:37 +0900 (JST), Kyotaro Horiguchi 
> >  wrote in 
> > > So if we don't want to move any member in PGPROC, we do:
> > > 
> > > 14: after statusFlags.
> > > 13: after delayChkpt.
> > > 12-10: after syncRepState (and before syncRepLinks).
> > > 
> > > If we allow to shift some members, the new flag can be placed more
> > > saner place.
> > > 
> > > 14: after delayChkpt ((uint8)statusFlags moves forward by 1 byte)
> > > 13: after delayChkpt (no member moves)
> > > 12-10: after subxids ((bool)procArrayGroupMember moves forward by 1 byte)
> > > 
> > > I continue working on the last direction above.
> > 
> > Hmm. That is ABI break.  I go with the first way.
> 
> By the way, the patch for -14 changed the sigunature of two public
> functions.
> 
> -GetVirtualXIDsDelayingChkpt(int *nvxids)
> +GetVirtualXIDsDelayingChkpt(int *nvxids, int type)
> 
> -HaveVirtualXIDsDelayingChkpt(VirtualTransactionId *vxids, int nvxids)
> +HaveVirtualXIDsDelayingChkpt(VirtualTransactionId *vxids, int nvxids, int 
> type)
> 
> Do I need to restore the signature?

For master, renamed delayChkpt to delayChkptFlags and changed it to uint8.

For 14, restored delayChkpt to bool and added delayChkptEnd into a gap in 
PGPROC, then restored the signature of the two functions above to before the 
patch. Then added a new functions ..DelayingChkptEnd().

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 307c295369d24e8a0ec4d3977ce1547162e7eeb9 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 6 Apr 2022 14:47:02 +0900
Subject: [PATCH v1] Rename and change type of delayChkpt

Commit 412ad7a556 changed the type of PGPROC.delayChkpt, but did not
change its name.  Rename it to delayChkptFlags so as to ensure that
any code touching it has to be inspected and updated.

On the way renaming the struct member, narrow the width of the member
to uint8.  It has been inadvertently widen from bool to int.

Note: Comments are not intentionally reflowed so that this can be
easily reviewed.
---
 src/backend/access/transam/multixact.c  |  6 +++---
 src/backend/access/transam/twophase.c   | 22 +++---
 src/backend/access/transam/xact.c   | 12 ++--
 src/backend/access/transam/xlog.c   |  2 +-
 src/backend/access/transam/xloginsert.c |  2 +-
 src/backend/catalog/storage.c   |  6 +++---
 src/backend/storage/buffer/bufmgr.c |  6 +++---
 src/backend/storage/ipc/procarray.c | 22 +++---
 src/backend/storage/lmgr/proc.c |  4 ++--
 src/include/storage/proc.h  |  6 +++---
 10 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 9f65c600d0..45907d1b44 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -3088,8 +3088,8 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
 	 * crash/basebackup, even though the state of the data directory would
 	 * require it.
 	 */
-	Assert((MyProc->delayChkpt & DELAY_CHKPT_START) == 0);
-	MyProc->delayChkpt |= DELAY_CHKPT_START;
+	Assert((MyProc->delayChkptFlags & DELAY_CHKPT_START) == 0);
+	MyProc->delayChkptFlags |= DELAY_CHKPT_START;
 
 	/* WAL log truncation */
 	WriteMTruncateXlogRec(newOldestMultiDB,
@@ -3115,7 +3115,7 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
 	/* Then offsets */
 	PerformOffsetsTruncation(oldestMulti, newOldestMulti);
 
-	MyProc->delayChkpt &= ~DELAY_CHKPT_START;
+	MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
 
 	END_CRIT_SECTION();
 	LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 4dc8ccc12b..0a0e35e79f 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -475,7 +475,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
 	}
 	proc->xid = xid;
 	Assert(proc->xmin == InvalidTransactionId);
-	proc->delayChkpt = 0;
+	proc->delayChkptFlags = 0;
 	proc->statusFlags = 0;
 	proc->pid = 0;
 	proc->databaseId = databaseid;
@@ -1151,11 +1151,11 @@ EndPrepare(GlobalTransaction gxact)
 	 * Now writing 2PC state data to WAL. We let the WAL's CRC protection
 	 * cover us, so no need to calculate a separate CRC.
 	 *
-	 * We have to set delayChkpt here, too; otherwise a checkpoint starting
+	 * We have to set delayChkptFlags here, too; otherwise a checkpoint starting
 	 * immediately after the WAL record is inserted could complete without
 	 * fsync'ing our state file.  (This is essentially the same kind of race
 	 * condition as the COMMIT-to-clog-write case that RecordTransactionCommit
-	 * uses delayChkpt for; see notes there.)
+	 * uses delayChkptFlags for; see notes there.)
 	 *
 	 * We 

Re: API stability

2022-04-06 Thread Kyotaro Horiguchi
At Wed, 06 Apr 2022 15:31:53 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Wed, 06 Apr 2022 14:30:37 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
> > So if we don't want to move any member in PGPROC, we do:
> > 
> > 14: after statusFlags.
> > 13: after delayChkpt.
> > 12-10: after syncRepState (and before syncRepLinks).
> > 
> > If we allow to shift some members, the new flag can be placed more
> > saner place.
> > 
> > 14: after delayChkpt ((uint8)statusFlags moves forward by 1 byte)
> > 13: after delayChkpt (no member moves)
> > 12-10: after subxids ((bool)procArrayGroupMember moves forward by 1 byte)
> > 
> > I continue working on the last direction above.
> 
> Hmm. That is ABI break.  I go with the first way.

By the way, the patch for -14 changed the sigunature of two public
functions.

-GetVirtualXIDsDelayingChkpt(int *nvxids)
+GetVirtualXIDsDelayingChkpt(int *nvxids, int type)

-HaveVirtualXIDsDelayingChkpt(VirtualTransactionId *vxids, int nvxids)
+HaveVirtualXIDsDelayingChkpt(VirtualTransactionId *vxids, int nvxids, int type)


Do I need to restore the signature?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: API stability

2022-04-06 Thread Kyotaro Horiguchi
At Wed, 06 Apr 2022 14:30:37 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> So if we don't want to move any member in PGPROC, we do:
> 
> 14: after statusFlags.
> 13: after delayChkpt.
> 12-10: after syncRepState (and before syncRepLinks).
> 
> If we allow to shift some members, the new flag can be placed more
> saner place.
> 
> 14: after delayChkpt ((uint8)statusFlags moves forward by 1 byte)
> 13: after delayChkpt (no member moves)
> 12-10: after subxids ((bool)procArrayGroupMember moves forward by 1 byte)
> 
> I continue working on the last direction above.

Hmm. That is ABI break.  I go with the first way.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: API stability

2022-04-05 Thread Kyotaro Horiguchi
me> I'd like to do that. Let me see.

At Tue, 5 Apr 2022 10:29:03 -0400, Robert Haas  wrote in 
> On Tue, Apr 5, 2022 at 10:17 AM Tom Lane  wrote:
> > What I think you need to do is:
> >
> > 1. In the back branches, revert delayChkpt to its previous type and
> > semantics.  Squeeze a separate delayChkptEnd bool in somewhere
> > (you can't change the struct size either ...).
> >
> > 2. In HEAD, rename the field to something like delayChkptFlags,
> > to ensure that any code touching it has to be inspected and updated.
> 
> Well, we can do it that way, I suppose.

The change is easy on head, but is it better use uint8 instead of int
for delayChkptFlags?

In the back branches, we have, on gcc/Linux/x86-64,
14's PGPROC is 880 bytes and has gaps:

- 6 bytes after statusFlag
- 4 bytes after syncRepState
- 2 bytes after subxidStatus
- 3 bytes after procArrayGroupMember
- 3 bytes after clogGroupMember
- 3 bytes after fpVXIDLock

It seems that we can place the new variable in the first place above,
since the two are not bonded together, or at least in less tightly
bonded than other candidates.

13's PGPROC is 856 bytes and has a 7 bytes gap after delayChkpt.

Versions Ealier than 13 have delayChkpt in PGXACT (12 bytes).  It is
tightly packed and dones't have a room for a new member.  Can we add
the new flag to PGPROC instead of PGXACT?
  
12 and 11's PGPROC is 848 bytes and has gaps:
   - 4 bytes after syncRepState
   - 3 bytes after procArrayGroupMember
   - 3 bytes after clogGroupMember
   - 4 bytes after clogGroupMemberPage
   - 3 bytes after fpVXIDLock


10's PGPROC is 816 bytes and has gaps:
   - 4 bytes after cvWaitLink
   - 4 bytes after syncRepState
   - 3 bytes after procArrayGroupMember
   - 3 bytes after fpVXIDLock

So if we don't want to move any member in PGPROC, we do:

14: after statusFlags.
13: after delayChkpt.
12-10: after syncRepState (and before syncRepLinks).

If we allow to shift some members, the new flag can be placed more
saner place.

14: after delayChkpt ((uint8)statusFlags moves forward by 1 byte)
13: after delayChkpt (no member moves)
12-10: after subxids ((bool)procArrayGroupMember moves forward by 1 byte)

I continue working on the last direction above.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: API stability

2022-04-05 Thread Kyotaro Horiguchi
At Tue, 5 Apr 2022 10:01:56 -0400, Robert Haas  wrote in 
> I think as the person who committed that patch I'm on the hook to fix
> this if nobody else would like to do it, but let me ask whether
> Kyotaro Horiguchi would like to propose a patch, since the original
> patch did, and/or whether you would like to propose a patch, as the
> person reporting the issue.

I'd like to do that. Let me see.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-05 Thread Tom Lane
Robert Haas  writes:
> On Tue, Apr 5, 2022 at 10:32 AM Tom Lane  wrote:
>> My point is that we want that to happen in HEAD, but it's not okay
>> for it to happen in a minor release of a stable branch.

> I understand, but I am not sure that I agree. I think that if an
> extension stops compiling against a back-branch, someone will notice
> the next time they try to compile it and will fix it. Maybe that's not
> amazing, but I don't think it's a huge deal either.

Well, perhaps it's not the end of the world, but it's still a large
PITA for the maintainer of such an extension.  They can't "just fix it"
because some percentage of their userbase will still need to compile
against older minor releases.  Nor have you provided any way to handle
that requirement via conditional compilation.

regards, tom lane




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-05 Thread Robert Haas
On Tue, Apr 5, 2022 at 10:32 AM Tom Lane  wrote:
> Robert Haas  writes:
> > On Tue, Apr 5, 2022 at 10:17 AM Tom Lane  wrote:
> >> Renaming it would constitute an API break, which is if anything worse
> >> than an ABI break.
>
> > I don't think so, because an API break will cause a compilation
> > failure, which an extension author can easily fix.
>
> My point is that we want that to happen in HEAD, but it's not okay
> for it to happen in a minor release of a stable branch.

I understand, but I am not sure that I agree. I think that if an
extension stops compiling against a back-branch, someone will notice
the next time they try to compile it and will fix it. Maybe that's not
amazing, but I don't think it's a huge deal either. On the other hand,
if existing builds that someone has already shipped stop working with
a new server release, that's a much larger issue. The extension
packager can't go back and retroactively add a dependency on the
server version to the already-shipped package. A new package can be
shipped and specify a minimum minor version with which it will work,
but an old package is what it is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-05 Thread Tom Lane
Robert Haas  writes:
> On Tue, Apr 5, 2022 at 10:17 AM Tom Lane  wrote:
>> Renaming it would constitute an API break, which is if anything worse
>> than an ABI break.

> I don't think so, because an API break will cause a compilation
> failure, which an extension author can easily fix.

My point is that we want that to happen in HEAD, but it's not okay
for it to happen in a minor release of a stable branch.

regards, tom lane




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-05 Thread Robert Haas
On Tue, Apr 5, 2022 at 10:17 AM Tom Lane  wrote:
> Renaming it would constitute an API break, which is if anything worse
> than an ABI break.

I don't think so, because an API break will cause a compilation
failure, which an extension author can easily fix.

> While we're complaining at you, let me point out that changing a field's
> content and semantics while not changing its name is a time bomb waiting
> to break any third-party code that looks at (or modifies...) the field.
>
> What I think you need to do is:
>
> 1. In the back branches, revert delayChkpt to its previous type and
> semantics.  Squeeze a separate delayChkptEnd bool in somewhere
> (you can't change the struct size either ...).
>
> 2. In HEAD, rename the field to something like delayChkptFlags,
> to ensure that any code touching it has to be inspected and updated.

Well, we can do it that way, I suppose.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-05 Thread Tom Lane
Robert Haas  writes:
> On Tue, Apr 5, 2022 at 9:02 AM Markus Wanner
>  wrote:
>> And for this specific case: Is it worth reverting this change and
>> applying a fully backwards compatible fix, instead?

> I think it's normally our policy to avoid changing definitions of
> accessible structs in back branches, except that we allow ourselves
> the indulgence of adding new members at the end or in padding space.
> So what would probably be best is if, in the back-branches, we changed
> "delayChkpt" back to a boolean, renamed it to delayChkptStart, and
> added a separate Boolean called delayChkptEnd. Maybe that could be
> added just after statusFlags, where I think it would fall into padding
> space.

Renaming it would constitute an API break, which is if anything worse
than an ABI break.

While we're complaining at you, let me point out that changing a field's
content and semantics while not changing its name is a time bomb waiting
to break any third-party code that looks at (or modifies...) the field.

What I think you need to do is:

1. In the back branches, revert delayChkpt to its previous type and
semantics.  Squeeze a separate delayChkptEnd bool in somewhere
(you can't change the struct size either ...).

2. In HEAD, rename the field to something like delayChkptFlags,
to ensure that any code touching it has to be inspected and updated.

In other words, this is already an API break in HEAD, and that's
fine, but it didn't break it hard enough to draw anyone's attention,
which is not fine.

regards, tom lane




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-05 Thread Robert Haas
On Tue, Apr 5, 2022 at 9:02 AM Markus Wanner
 wrote:
> And for this specific case: Is it worth reverting this change and
> applying a fully backwards compatible fix, instead?

I think it's normally our policy to avoid changing definitions of
accessible structs in back branches, except that we allow ourselves
the indulgence of adding new members at the end or in padding space.
So what would probably be best is if, in the back-branches, we changed
"delayChkpt" back to a boolean, renamed it to delayChkptStart, and
added a separate Boolean called delayChkptEnd. Maybe that could be
added just after statusFlags, where I think it would fall into padding
space.

I think as the person who committed that patch I'm on the hook to fix
this if nobody else would like to do it, but let me ask whether
Kyotaro Horiguchi would like to propose a patch, since the original
patch did, and/or whether you would like to propose a patch, as the
person reporting the issue.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-05 Thread Markus Wanner



On 24.03.22 20:32, Robert Haas wrote:

Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.


This patch changed the delayChkpt field of struct PGPROC from bool to 
int.  Back-porting this change could be considered an API breaking 
change for extensions using this field.


I'm not certain about padding behavior of compilers in general (or 
standards requirements around that), but at least on my machine, it 
seems sizeof(PGPROC) did not change, so padding led to subsequent fields 
still having the same offset.


Nonetheless, the meaning of the field itself changed.  And the 
additional assert now also triggers for the following pseudo-code of the 
extension I'm concerned about:


/*
 * Prevent checkpoints being emitted in between additional
 * information in the logical message and the following
 * prepare record.
 */
MyProc->delayChkpt = true;

LogLogicalMessage(...);

/* Note that this will also reset the delayChkpt flag. */
PrepareTransaction(...);


Now, I'm well aware this is not an official API, it just happens to be 
accessible for extensions.  So I guess the underlying question is:  What 
can extension developers expect?  Which parts are okay to change even in 
stable branches and which can be relied upon to remain stable?


And for this specific case: Is it worth reverting this change and 
applying a fully backwards compatible fix, instead?


Regards

Markus Wanner