Re: Improve comment on cid mapping (was Re: Adding CommandID to heap xlog records)
Hi, On 2023-06-26 09:57:56 +0300, Heikki Linnakangas wrote: > diff --git a/src/backend/replication/logical/snapbuild.c > b/src/backend/replication/logical/snapbuild.c > index 0786bb0ab7..e403feeccd 100644 > --- a/src/backend/replication/logical/snapbuild.c > +++ b/src/backend/replication/logical/snapbuild.c > @@ -41,10 +41,15 @@ > * transactions we need Snapshots that see intermediate versions of the > * catalog in a transaction. During normal operation this is achieved by > using > * CommandIds/cmin/cmax. The problem with that however is that for space > - * efficiency reasons only one value of that is stored > - * (cf. combocid.c). Since combo CIDs are only available in memory we log > - * additional information which allows us to get the original (cmin, cmax) > - * pair during visibility checks. Check the reorderbuffer.c's comment above > + * efficiency reasons, the cmin and cmax are not included in WAL records. We > + * cannot read the cmin/cmax from the tuple itself, either, because it is > + * reset on crash recovery. Even if we could, we could not decode combocids > + * which are only tracked in the original backend's memory. To work around > + * that, heapam writes an extra WAL record (XLOG_HEAP2_NEW_CID) every time a > + * catalog row is modified, which includes the cmin and cmax of the > + * tuple. During decoding, we insert the ctid->(cmin,cmax) mappings into the > + * reorder buffer, and use them at visibility checks instead of the cmin/cmax > + * on the tuple itself. Check the reorderbuffer.c's comment above > * ResolveCminCmaxDuringDecoding() for details. > * > * To facilitate all this we need our own visibility routine, as the normal > -- > 2.30.2 LGTM > From 9140a0d98fd21b595eac6d75521a6b1a9f1b Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas > Date: Mon, 26 Jun 2023 09:56:02 +0300 > Subject: [PATCH v2 2/2] Remove redundant check for fast_forward. > > We already checked for it earlier in the function. > > Discussion: > https://www.postgresql.org/message-id/1ba2899e-77f8-7866-79e5-f3b7d1251...@iki.fi > --- > src/backend/replication/logical/decode.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/src/backend/replication/logical/decode.c > b/src/backend/replication/logical/decode.c > index d91055a440..7039d425e2 100644 > --- a/src/backend/replication/logical/decode.c > +++ b/src/backend/replication/logical/decode.c > @@ -422,8 +422,7 @@ heap2_decode(LogicalDecodingContext *ctx, > XLogRecordBuffer *buf) > switch (info) > { > case XLOG_HEAP2_MULTI_INSERT: > - if (!ctx->fast_forward && > - SnapBuildProcessChange(builder, xid, > buf->origptr)) > + if (SnapBuildProcessChange(builder, xid, buf->origptr)) > DecodeMultiInsert(ctx, buf); > break; > case XLOG_HEAP2_NEW_CID: > -- > 2.30.2 LGTM^2 Greetings, Andres Freund
Improve comment on cid mapping (was Re: Adding CommandID to heap xlog records)
On 28/02/2023 15:52, Heikki Linnakangas wrote: So unfortunately I don't see much opportunity to simplify logical decoding with this. However, please take a look at the first two patches attached. They're tiny cleanups that make sense on their own. Rebased these small patches. I'll add this to the commitfest. -- Heikki Linnakangas Neon (https://neon.tech) From 66289440ac65ea386cd138aeeed27b0032c2bb80 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Mon, 26 Jun 2023 09:48:26 +0300 Subject: [PATCH v2 1/2] Improve comment on why we need ctid->(cmin,cmax) mapping. Combocids are only part of the problem. Explain the problem in more detail. Discussion: https://www.postgresql.org/message-id/1ba2899e-77f8-7866-79e5-f3b7d1251...@iki.fi --- src/backend/replication/logical/snapbuild.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index 0786bb0ab7..e403feeccd 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -41,10 +41,15 @@ * transactions we need Snapshots that see intermediate versions of the * catalog in a transaction. During normal operation this is achieved by using * CommandIds/cmin/cmax. The problem with that however is that for space - * efficiency reasons only one value of that is stored - * (cf. combocid.c). Since combo CIDs are only available in memory we log - * additional information which allows us to get the original (cmin, cmax) - * pair during visibility checks. Check the reorderbuffer.c's comment above + * efficiency reasons, the cmin and cmax are not included in WAL records. We + * cannot read the cmin/cmax from the tuple itself, either, because it is + * reset on crash recovery. Even if we could, we could not decode combocids + * which are only tracked in the original backend's memory. To work around + * that, heapam writes an extra WAL record (XLOG_HEAP2_NEW_CID) every time a + * catalog row is modified, which includes the cmin and cmax of the + * tuple. During decoding, we insert the ctid->(cmin,cmax) mappings into the + * reorder buffer, and use them at visibility checks instead of the cmin/cmax + * on the tuple itself. Check the reorderbuffer.c's comment above * ResolveCminCmaxDuringDecoding() for details. * * To facilitate all this we need our own visibility routine, as the normal -- 2.30.2 From 9140a0d98fd21b595eac6d75521a6b1a9f1b Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Mon, 26 Jun 2023 09:56:02 +0300 Subject: [PATCH v2 2/2] Remove redundant check for fast_forward. We already checked for it earlier in the function. Discussion: https://www.postgresql.org/message-id/1ba2899e-77f8-7866-79e5-f3b7d1251...@iki.fi --- src/backend/replication/logical/decode.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index d91055a440..7039d425e2 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -422,8 +422,7 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) switch (info) { case XLOG_HEAP2_MULTI_INSERT: - if (!ctx->fast_forward && -SnapBuildProcessChange(builder, xid, buf->origptr)) + if (SnapBuildProcessChange(builder, xid, buf->origptr)) DecodeMultiInsert(ctx, buf); break; case XLOG_HEAP2_NEW_CID: -- 2.30.2
Re: Adding CommandID to heap xlog records
I took another stab at this from a different angle, and tried to use this to simplify logical decoding. The theory was that if we included the command ID in the WAL records, we wouldn't need the separate HEAP2_NEW_CID record anymore, and could remove much of the code in reorderbuffer.c that's concerned with tracking ctid->(cmin,cmax) mapping. Unfortunately, it didn't work out. Here's one problem: Insert with cmin 1 Commit Delete the same tuple with cmax 2. Abort Even if we store the cmin in the INSERT record, and set it on the tuple on replay, the DELETE overwrites it. That's OK for the original transactions, because they only look at the cmin/cmax of their own transaction, but it's a problem for logical decoding. If we see the inserted tuple during logical decoding, we need the cmin of the tuple. We could still just replace the HEAP2_NEW_CID records with the CIDs in the heap INSERT/UPDATE/DELETE records, and use that information to maintain the ctid->(cmin,cmax) mapping in reorderbuffer.c like we do today. But that doesn't really simplify reorderbuffer.c much. Attached is a patch for that, for the archives sake. Another problem with that is that logical decoding needs slightly different information than what we store on the tuples on disk. My original motivation for this was for Neon, which needs the WAL replay to restore the same CID as what's stored on disk, whether it's cmin, cmax or combocid. But for logical decoding, we need the cmin or cmax, *not* the combocid. To cater for both uses, we'd need to include both the original cmin/cmax and the possible combocid, which again makes it more complicated. So unfortunately I don't see much opportunity to simplify logical decoding with this. However, please take a look at the first two patches attached. They're tiny cleanups that make sense on their own. - Heikki From d666309a4cffc48e5091c0a97b8e2e7ec668f058 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Tue, 28 Feb 2023 13:22:40 +0200 Subject: [PATCH 1/4] Improve comment on why we need ctid->(cmin,cmax) mapping --- src/backend/replication/logical/snapbuild.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index 62542827e4b..93755646564 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -41,10 +41,15 @@ * transactions we need Snapshots that see intermediate versions of the * catalog in a transaction. During normal operation this is achieved by using * CommandIds/cmin/cmax. The problem with that however is that for space - * efficiency reasons only one value of that is stored - * (cf. combocid.c). Since combo CIDs are only available in memory we log - * additional information which allows us to get the original (cmin, cmax) - * pair during visibility checks. Check the reorderbuffer.c's comment above + * efficiency reasons, the cmin and cmax are not included in WAL records. We + * cannot read the cmin/cmax from the tuple itself, either, because it is + * reset on crash recovery. Even if we could, we could not decode combocids + * which are only tracked in the original backend's memory. To work around + * that, heapam writes an extra WAL record (XLOG_HEAP2_NEW_CID) every time a + * catalog row is modified, which includes the cmin and cmax of the + * tuple. During decoding, we insert the ctid->(cmin,cmax) mappings into the + * reorder buffer, and use them at visibility checks instead of the cmin/cmax + * on the tuple itself. Check the reorderbuffer.c's comment above * ResolveCminCmaxDuringDecoding() for details. * * To facilitate all this we need our own visibility routine, as the normal -- 2.30.2 From 44435eb4121425f4babd9ed8847e6ddc137ba436 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Tue, 28 Feb 2023 12:55:49 +0200 Subject: [PATCH 2/4] Remove redundant check for fast_forward --- src/backend/replication/logical/decode.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index 8fe7bb65f1f..d8962345da4 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -394,8 +394,7 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) switch (info) { case XLOG_HEAP2_MULTI_INSERT: - if (!ctx->fast_forward && -SnapBuildProcessChange(builder, xid, buf->origptr)) + if (SnapBuildProcessChange(builder, xid, buf->origptr)) DecodeMultiInsert(ctx, buf); break; case XLOG_HEAP2_NEW_CID: -- 2.30.2 From bc5103128fb3b6583a0b2b248fd205b59a347fe5 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Tue, 28 Feb 2023 13:02:56 +0200 Subject: [PATCH 3/4] Remove combocid field in logical decode that was just for debugging. (I don't think this is worth doing without the next patch) ---
Re: Adding CommandID to heap xlog records
On Mon, 16 Jan 2023 at 19:56, vignesh C wrote: > > On Thu, 3 Nov 2022 at 15:06, Ian Lawrence Barwick wrote: > > > > 2022年9月30日(金) 1:04 Matthias van de Meent : > > > > > > On Wed, 28 Sept 2022 at 19:40, Bruce Momjian wrote: > > > > > > > > On Thu, Sep 22, 2022 at 11:12:32PM +0200, Matthias van de Meent wrote: > > > > > On Thu, 8 Sept 2022 at 23:24, Tom Lane wrote: > > > > > > > > > > > > Matthias van de Meent writes: > > > > > > > Please find attached a patch that adds the CommandId of the > > > > > > > inserting > > > > > > > transaction to heap (batch)insert, update and delete records. It > > > > > > > is > > > > > > > based on the changes we made in the fork we maintain for Neon. > > > > > > > > > > > > This seems like a very significant cost increment with returns > > > > > > to only a minuscule number of users. We certainly cannot consider > > > > > > it unless you provide some evidence that that impression is wrong. > > > > > > > > > > Attached a proposed set of patches to reduce overhead of the inital > > > > > patch. > > > > > > > > This might be obvious to some, but the patch got a lot larger. :-( > > > > > > Sorry for that, but updating the field from which the redo manager > > > should pull its information does indeed touch a lot of files because > > > most users of xl_info are only interested in the 4 bits reserved for > > > the redo-manager. Most of 0001 is therefore updates to point code to > > > the new field in XLogRecord, and renaming the variables and arguments > > > from info to rminfo. > > > > > > [tangent] With that refactoring, I also clean up a lot of code that > > > was using a wrong macro/constant for rmgr flags; `info & > > > ~XLR_INFO_MASK` may have the same value as `info & > > > XLR_RMGR_INFO_MASK`, but that's only guaranteed by the documentation; > > > and would require the same significant rework if new bits were > > > assigned to non-XLR_INFO_MASK and non-XLR_RMGR_INFO_MASK. [/tangent] > > > > > > 0002 grew a bit as well, but not to a degree that I think is worrying > > > or otherwise impossible to review. > > > > Hi > > > > This entry was marked as "Needs review" in the CommitFest app but cfbot > > reports the patch no longer applies. > > > > We've marked it as "Waiting on Author". As CommitFest 2022-11 is > > currently underway, this would be an excellent time update the patch. > > > > Once you think the patchset is ready for review again, you (or any > > interested party) can move the patch entry forward by visiting > > > > https://commitfest.postgresql.org/40/3882/ > > > > and changing the status to "Needs review". > > I was not sure if you will be planning to post an updated version of > patch as the patch has been awaiting your attention from last > commitfest, please post an updated version for it soon or update the > commitfest entry accordingly. There has been no updates on this thread for some time, so this has been switched as Returned with Feedback. Feel free to open it in the next commitfest if you plan to continue on this. Regards, Vignesh
Re: Adding CommandID to heap xlog records
On Thu, 3 Nov 2022 at 15:06, Ian Lawrence Barwick wrote: > > 2022年9月30日(金) 1:04 Matthias van de Meent : > > > > On Wed, 28 Sept 2022 at 19:40, Bruce Momjian wrote: > > > > > > On Thu, Sep 22, 2022 at 11:12:32PM +0200, Matthias van de Meent wrote: > > > > On Thu, 8 Sept 2022 at 23:24, Tom Lane wrote: > > > > > > > > > > Matthias van de Meent writes: > > > > > > Please find attached a patch that adds the CommandId of the > > > > > > inserting > > > > > > transaction to heap (batch)insert, update and delete records. It is > > > > > > based on the changes we made in the fork we maintain for Neon. > > > > > > > > > > This seems like a very significant cost increment with returns > > > > > to only a minuscule number of users. We certainly cannot consider > > > > > it unless you provide some evidence that that impression is wrong. > > > > > > > > Attached a proposed set of patches to reduce overhead of the inital > > > > patch. > > > > > > This might be obvious to some, but the patch got a lot larger. :-( > > > > Sorry for that, but updating the field from which the redo manager > > should pull its information does indeed touch a lot of files because > > most users of xl_info are only interested in the 4 bits reserved for > > the redo-manager. Most of 0001 is therefore updates to point code to > > the new field in XLogRecord, and renaming the variables and arguments > > from info to rminfo. > > > > [tangent] With that refactoring, I also clean up a lot of code that > > was using a wrong macro/constant for rmgr flags; `info & > > ~XLR_INFO_MASK` may have the same value as `info & > > XLR_RMGR_INFO_MASK`, but that's only guaranteed by the documentation; > > and would require the same significant rework if new bits were > > assigned to non-XLR_INFO_MASK and non-XLR_RMGR_INFO_MASK. [/tangent] > > > > 0002 grew a bit as well, but not to a degree that I think is worrying > > or otherwise impossible to review. > > Hi > > This entry was marked as "Needs review" in the CommitFest app but cfbot > reports the patch no longer applies. > > We've marked it as "Waiting on Author". As CommitFest 2022-11 is > currently underway, this would be an excellent time update the patch. > > Once you think the patchset is ready for review again, you (or any > interested party) can move the patch entry forward by visiting > > https://commitfest.postgresql.org/40/3882/ > > and changing the status to "Needs review". I was not sure if you will be planning to post an updated version of patch as the patch has been awaiting your attention from last commitfest, please post an updated version for it soon or update the commitfest entry accordingly. Regards, Vignesh
Re: Adding CommandID to heap xlog records
2022年9月30日(金) 1:04 Matthias van de Meent : > > On Wed, 28 Sept 2022 at 19:40, Bruce Momjian wrote: > > > > On Thu, Sep 22, 2022 at 11:12:32PM +0200, Matthias van de Meent wrote: > > > On Thu, 8 Sept 2022 at 23:24, Tom Lane wrote: > > > > > > > > Matthias van de Meent writes: > > > > > Please find attached a patch that adds the CommandId of the inserting > > > > > transaction to heap (batch)insert, update and delete records. It is > > > > > based on the changes we made in the fork we maintain for Neon. > > > > > > > > This seems like a very significant cost increment with returns > > > > to only a minuscule number of users. We certainly cannot consider > > > > it unless you provide some evidence that that impression is wrong. > > > > > > Attached a proposed set of patches to reduce overhead of the inital patch. > > > > This might be obvious to some, but the patch got a lot larger. :-( > > Sorry for that, but updating the field from which the redo manager > should pull its information does indeed touch a lot of files because > most users of xl_info are only interested in the 4 bits reserved for > the redo-manager. Most of 0001 is therefore updates to point code to > the new field in XLogRecord, and renaming the variables and arguments > from info to rminfo. > > [tangent] With that refactoring, I also clean up a lot of code that > was using a wrong macro/constant for rmgr flags; `info & > ~XLR_INFO_MASK` may have the same value as `info & > XLR_RMGR_INFO_MASK`, but that's only guaranteed by the documentation; > and would require the same significant rework if new bits were > assigned to non-XLR_INFO_MASK and non-XLR_RMGR_INFO_MASK. [/tangent] > > 0002 grew a bit as well, but not to a degree that I think is worrying > or otherwise impossible to review. Hi This entry was marked as "Needs review" in the CommitFest app but cfbot reports the patch no longer applies. We've marked it as "Waiting on Author". As CommitFest 2022-11 is currently underway, this would be an excellent time update the patch. Once you think the patchset is ready for review again, you (or any interested party) can move the patch entry forward by visiting https://commitfest.postgresql.org/40/3882/ and changing the status to "Needs review". Thanks Ian Barwick
Re: Adding CommandID to heap xlog records
On Wed, 28 Sept 2022 at 19:40, Bruce Momjian wrote: > > On Thu, Sep 22, 2022 at 11:12:32PM +0200, Matthias van de Meent wrote: > > On Thu, 8 Sept 2022 at 23:24, Tom Lane wrote: > > > > > > Matthias van de Meent writes: > > > > Please find attached a patch that adds the CommandId of the inserting > > > > transaction to heap (batch)insert, update and delete records. It is > > > > based on the changes we made in the fork we maintain for Neon. > > > > > > This seems like a very significant cost increment with returns > > > to only a minuscule number of users. We certainly cannot consider > > > it unless you provide some evidence that that impression is wrong. > > > > Attached a proposed set of patches to reduce overhead of the inital patch. > > This might be obvious to some, but the patch got a lot larger. :-( Sorry for that, but updating the field from which the redo manager should pull its information does indeed touch a lot of files because most users of xl_info are only interested in the 4 bits reserved for the redo-manager. Most of 0001 is therefore updates to point code to the new field in XLogRecord, and renaming the variables and arguments from info to rminfo. [tangent] With that refactoring, I also clean up a lot of code that was using a wrong macro/constant for rmgr flags; `info & ~XLR_INFO_MASK` may have the same value as `info & XLR_RMGR_INFO_MASK`, but that's only guaranteed by the documentation; and would require the same significant rework if new bits were assigned to non-XLR_INFO_MASK and non-XLR_RMGR_INFO_MASK. [/tangent] 0002 grew a bit as well, but not to a degree that I think is worrying or otherwise impossible to review. Kind regards, Matthias van de Meent.
Re: Adding CommandID to heap xlog records
On Thu, Sep 22, 2022 at 11:12:32PM +0200, Matthias van de Meent wrote: > On Thu, 8 Sept 2022 at 23:24, Tom Lane wrote: > > > > Matthias van de Meent writes: > > > Please find attached a patch that adds the CommandId of the inserting > > > transaction to heap (batch)insert, update and delete records. It is > > > based on the changes we made in the fork we maintain for Neon. > > > > This seems like a very significant cost increment with returns > > to only a minuscule number of users. We certainly cannot consider > > it unless you provide some evidence that that impression is wrong. > > Attached a proposed set of patches to reduce overhead of the inital patch. This might be obvious to some, but the patch got a lot larger. :-( --- > contrib/pg_walinspect/pg_walinspect.c| 4 +- > src/backend/access/brin/brin_pageops.c | 16 +++--- > src/backend/access/brin/brin_xlog.c | 8 +-- > src/backend/access/gin/ginxlog.c | 6 +-- > src/backend/access/gist/gistxlog.c | 6 +-- > src/backend/access/hash/hash_xlog.c | 6 +-- > src/backend/access/heap/heapam.c | 40 +++ > src/backend/access/nbtree/nbtinsert.c| 18 +++ > src/backend/access/nbtree/nbtpage.c | 8 +-- > src/backend/access/nbtree/nbtxlog.c | 10 ++-- > src/backend/access/rmgrdesc/brindesc.c | 20 > src/backend/access/rmgrdesc/clogdesc.c | 10 ++-- > src/backend/access/rmgrdesc/committsdesc.c | 10 ++-- > src/backend/access/rmgrdesc/dbasedesc.c | 12 ++--- > src/backend/access/rmgrdesc/genericdesc.c| 2 +- > src/backend/access/rmgrdesc/gindesc.c| 8 +-- > src/backend/access/rmgrdesc/gistdesc.c | 8 +-- > src/backend/access/rmgrdesc/hashdesc.c | 8 +-- > src/backend/access/rmgrdesc/heapdesc.c | 46 - > src/backend/access/rmgrdesc/logicalmsgdesc.c | 8 +-- > src/backend/access/rmgrdesc/mxactdesc.c | 14 ++--- > src/backend/access/rmgrdesc/nbtdesc.c| 8 +-- > src/backend/access/rmgrdesc/relmapdesc.c | 8 +-- > src/backend/access/rmgrdesc/replorigindesc.c | 8 +-- > src/backend/access/rmgrdesc/seqdesc.c| 8 +-- > src/backend/access/rmgrdesc/smgrdesc.c | 10 ++-- > src/backend/access/rmgrdesc/spgdesc.c| 8 +-- > src/backend/access/rmgrdesc/standbydesc.c| 12 ++--- > src/backend/access/rmgrdesc/tblspcdesc.c | 10 ++-- > src/backend/access/rmgrdesc/xactdesc.c | 34 ++-- > src/backend/access/rmgrdesc/xlogdesc.c | 28 +- > src/backend/access/spgist/spgxlog.c | 6 +-- > src/backend/access/transam/clog.c| 8 +-- > src/backend/access/transam/commit_ts.c | 8 +-- > src/backend/access/transam/multixact.c | 48 - > src/backend/access/transam/twophase.c| 2 +- > src/backend/access/transam/xact.c| 36 +++-- > src/backend/access/transam/xlog.c| 34 ++-- > src/backend/access/transam/xloginsert.c | 31 --- > src/backend/access/transam/xlogprefetcher.c | 2 +- > src/backend/access/transam/xlogreader.c | 2 +- > src/backend/access/transam/xlogrecovery.c| 54 ++-- > src/backend/access/transam/xlogstats.c | 2 +- > src/backend/catalog/storage.c| 15 +++--- > src/backend/commands/dbcommands.c| 30 ++- > src/backend/commands/sequence.c | 6 +-- > src/backend/commands/tablespace.c| 8 +-- > src/backend/postmaster/autovacuum.c | 4 +- > src/backend/replication/logical/decode.c | 38 +++--- > src/backend/replication/logical/message.c| 6 +-- > src/backend/replication/logical/origin.c | 6 +-- > src/backend/storage/ipc/standby.c| 10 ++-- > src/backend/utils/cache/relmapper.c | 6 +-- > src/bin/pg_resetwal/pg_resetwal.c| 2 +- > src/bin/pg_rewind/parsexlog.c| 10 ++-- > src/bin/pg_waldump/pg_waldump.c | 6 +-- > src/include/access/brin_xlog.h | 2 +- > src/include/access/clog.h| 2 +- > src/include/access/ginxlog.h | 2 +- > src/include/access/gistxlog.h| 2 +- > src/include/access/hash_xlog.h | 2 +- > src/include/access/heapam_xlog.h | 4 +- > src/include/access/multixact.h | 3 +- > src/include/access/nbtxlog.h | 2 +- > src/include/access/spgxlog.h | 2 +- > src/include/access/xact.h| 6 +-- > src/include/access/xlog.h| 2 +- > src/include/access/xloginsert.h | 3 +- > src/include/access/xlogreader.h | 1 + > src/include/access/xlogrecord.h | 11 +--- > src/include/access/xlogstats.h | 2 +- >
Re: Adding CommandID to heap xlog records
Matthias van de Meent writes: > Please find attached a patch that adds the CommandId of the inserting > transaction to heap (batch)insert, update and delete records. It is > based on the changes we made in the fork we maintain for Neon. This seems like a very significant cost increment with returns to only a minuscule number of users. We certainly cannot consider it unless you provide some evidence that that impression is wrong. regards, tom lane
Adding CommandID to heap xlog records
Hi, The current WAL records generated by the Heap tableAM do not contain the command ID of the query that inserted/updated/deleted the records. The CID is not included in XLog because it is useful only to visibility checks in an active read/write transaction, which currently only appear in a primary node. In Neon [0], we're using XLog to reconstruct page state, as opposed to writing out dirty pages. This has the benefit of saving write IO for dirty page writes, but this does mean that we need the CID in heap insert/update/delete records to correctly mark the tuples, such that modified pages that are flushed from the buffer pool get reconstructed correctly. A more detailed write-up why we do this is here [1]. Neon does not need to be the only user of this API, as adding CID to xlog records also allows the primary to offload (partial) queries to a remote physical replica that would utilise the same transaction and snapshot of the primary. Right now, it's not possible to offload the read-component of RW queries to a secondary [2]. The attached patch would make multi-node transactions possible on systems with a single primary node and multiple read replicas, without the need for prepared commits and special extra code to achieve snapshot consistency, as a consistent snapshot could be copied and used by physical replicas (think parallel workers, but on a different server). Please find attached a patch that adds the CommandId of the inserting transaction to heap (batch)insert, update and delete records. It is based on the changes we made in the fork we maintain for Neon. Kind regards, Matthias van de Meent [0] https://github.com/neondatabase/neon/#neon [1] https://github.com/neondatabase/neon/blob/main/docs/core_changes.md#add-t_cid-to-heap-wal-records [2] At least not without blocking XLog replay of the primary transaction on the secondary, due to the same issues that Neon encountered: you need the CommandID to distinguish between this transactions' updates in the current command and previous commands. v1-0001-Add-cid-to-heap-xlog-records-that-insert-update-d.patch Description: Binary data