Re: logical decoding and replication of sequences, take 2

Ashutosh Bapat Thu, 14 Dec 2023 07:45:01 -0800

On Thu, Dec 14, 2023 at 2:51 PM Amit Kapila <[email protected]> wrote:
>
> On Thu, Dec 14, 2023 at 2:45 PM Ashutosh Bapat
> <[email protected]> wrote:
> >
> > On Thu, Dec 14, 2023 at 12:37 PM Dilip Kumar <[email protected]> wrote:
> > >
> > > I think you forgot to attach the patch.
> >
> > Sorry. Here it is.
> >
> > On Thu, Dec 14, 2023 at 2:36 PM Amit Kapila <[email protected]> wrote:
> > >
> > > >
> > > It looks like the solution works. But this is the only place where we
> > > process a change before SNAPSHOT reaches FULL. But this is also the
> > > only record which affects a decision to queue/not a following change.
> > > So it should be ok. The sequence_hash'es as separate for each
> > > transaction and they are cleaned when processing COMMIT record.
> > > >
> > >
> > > But it is possible that even commit or abort also happens before the
> > > snapshot reaches full state in which case the hash table will have
> > > stale or invalid (for aborts) entries. That will probably be cleaned
> > > at a later point by running_xact records.
> >
> > Why would cleaning wait till running_xact records? Won't txn entry
> > itself be removed when processing commit/abort record? At the same the
> > sequence hash will be cleaned as well.
> >
> > > Now, I think in theory, it
> > > is possible that the same RelFileLocator can again be allocated before
> > > we clean up the existing entry which can probably confuse the system.
> >
> > How? The transaction allocating the first time would be cleaned before
> > it happens the second time. So shouldn't matter.
> >
>
> It can only be cleaned if we process it but xact_decode won't allow us
> to process it and I don't think it would be a good idea to add another
> hack for sequences here. See below code:
>
> xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
> {
> SnapBuild  *builder = ctx->snapshot_builder;
> ReorderBuffer *reorder = ctx->reorder;
> XLogReaderState *r = buf->record;
> uint8 info = XLogRecGetInfo(r) & XLOG_XACT_OPMASK;
>
> /*
> * If the snapshot isn't yet fully built, we cannot decode anything, so
> * bail out.
> */
> if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
> return;


That may be true for a transaction which is decoded, but I think all
the transactions which are added to ReorderBuffer should be cleaned up
once they have been processed irrespective of whether they are
decoded/sent downstream or not. In this case I see the sequence hash
being cleaned up for the sequence related transaction in Hayato's
reproducer. See attached patch with a diagnostic change and the output
below (notice sequence cleanup called on transaction 767).
2023-12-14 21:06:36.756 IST [386957] LOG:  logical decoding found
initial starting point at 0/15B2F68
2023-12-14 21:06:36.756 IST [386957] DETAIL:  Waiting for transactions
(approximately 1) older than 767 to end.
2023-12-14 21:06:36.756 IST [386957] STATEMENT:  SELECT
pg_create_logical_replication_slot('slot', 'test_decoding');
2023-12-14 21:07:05.679 IST [386957] LOG:  XXX: smgr_decode. snapshot
is SNAPBUILD_BUILDING_SNAPSHOT
2023-12-14 21:07:05.679 IST [386957] STATEMENT:  SELECT
pg_create_logical_replication_slot('slot', 'test_decoding');
2023-12-14 21:07:05.679 IST [386957] LOG:  XXX: seq_decode. snapshot
is SNAPBUILD_BUILDING_SNAPSHOT
2023-12-14 21:07:05.679 IST [386957] STATEMENT:  SELECT
pg_create_logical_replication_slot('slot', 'test_decoding');
2023-12-14 21:07:05.679 IST [386957] LOG:  XXX: skipped
2023-12-14 21:07:05.679 IST [386957] STATEMENT:  SELECT
pg_create_logical_replication_slot('slot', 'test_decoding');
2023-12-14 21:07:05.710 IST [386957] LOG:  logical decoding found
initial consistent point at 0/15B3388
2023-12-14 21:07:05.710 IST [386957] DETAIL:  Waiting for transactions
(approximately 1) older than 768 to end.
2023-12-14 21:07:05.710 IST [386957] STATEMENT:  SELECT
pg_create_logical_replication_slot('slot', 'test_decoding');
2023-12-14 21:07:39.292 IST [386298] LOG:  checkpoint starting: time
2023-12-14 21:07:40.919 IST [386957] LOG:  XXX: seq_decode. snapshot
is SNAPBUILD_FULL_SNAPSHOT
2023-12-14 21:07:40.919 IST [386957] STATEMENT:  SELECT
pg_create_logical_replication_slot('slot', 'test_decoding');
2023-12-14 21:07:40.919 IST [386957] LOG:  XXX: the sequence is transactional
2023-12-14 21:07:40.919 IST [386957] STATEMENT:  SELECT
pg_create_logical_replication_slot('slot', 'test_decoding');
2023-12-14 21:07:40.919 IST [386957] LOG:  sequence cleanup called on
transaction 767
2023-12-14 21:07:40.919 IST [386957] STATEMENT:  SELECT
pg_create_logical_replication_slot('slot', 'test_decoding');
2023-12-14 21:07:40.919 IST [386957] LOG:  logical decoding found
consistent point at 0/15B3518
2023-12-14 21:07:40.919 IST [386957] DETAIL:  There are no running transactions.
2023-12-14 21:07:40.919 IST [386957] STATEMENT:  SELECT
pg_create_logical_replication_slot('slot', 'test_decoding');

We see similar output when pg_logical_slot_get_changes() is called.

I haven't found the code path from where the sequence cleanup gets
called. But it's being called. Am I missing something?

-- 
Best Wishes,
Ashutosh Bapat

diff --git a/src/backend/replication/logical/decode.c 
b/src/backend/replication/logical/decode.c
index 73e38cafd8..8e2ebbd8e0 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -1543,10 +1543,15 @@ smgr_decode(LogicalDecodingContext *ctx, 
XLogRecordBuffer *buf)
        elog(LOG, "XXX: smgr_decode. snapshot is %s", 
SnapBuildIdentify(builder));
 
        /*
-        * If we don't have snapshot or we are just fast-forwarding, there is no
-        * point in decoding relfilenode information.
+        * If we are not building snapshot yet or we are just fast-forwarding, 
there
+        * is no point in decoding relfilenode information.  If the sequence
+        * associated with relfilenode here is changed in the same transaction 
but
+        * after snapshot was built, the relfilenode needs to be present in the
+        * sequence hash table so that the change can be deemed as 
transactional.
+        * Otherwise it will not be queued. Hence we process this change even 
before
+        * we have built snapshot.
         */
-       if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
+       if (SnapBuildCurrentState(builder) < SNAPBUILD_BUILDING_SNAPSHOT ||
                ctx->fast_forward)
        {
                elog(LOG, "XXX: skipped");
diff --git a/src/backend/replication/logical/reorderbuffer.c 
b/src/backend/replication/logical/reorderbuffer.c
index f7d9a26cfc..25a742fef0 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1078,6 +1078,8 @@ ReorderBufferSequenceCleanup(ReorderBuffer *rb, 
ReorderBufferTXN *txn)
                                                HASH_REMOVE, NULL) == NULL)
                        elog(ERROR, "hash table corrupted");
        }
+
+       elog(LOG, "sequence cleanup called on transaction %d", txn->xid);
 }
 
 /*

Re: logical decoding and replication of sequences, take 2

Reply via email to