On Thu, Dec 14, 2023 at 2:51 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Thu, Dec 14, 2023 at 2:45 PM Ashutosh Bapat > <ashutosh.bapat....@gmail.com> wrote: > > > > On Thu, Dec 14, 2023 at 12:37 PM Dilip Kumar <dilipbal...@gmail.com> wrote: > > > > > > I think you forgot to attach the patch. > > > > Sorry. Here it is. > > > > On Thu, Dec 14, 2023 at 2:36 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > > > > > > > > > It looks like the solution works. But this is the only place where we > > > process a change before SNAPSHOT reaches FULL. But this is also the > > > only record which affects a decision to queue/not a following change. > > > So it should be ok. The sequence_hash'es as separate for each > > > transaction and they are cleaned when processing COMMIT record. > > > > > > > > > > But it is possible that even commit or abort also happens before the > > > snapshot reaches full state in which case the hash table will have > > > stale or invalid (for aborts) entries. That will probably be cleaned > > > at a later point by running_xact records. > > > > Why would cleaning wait till running_xact records? Won't txn entry > > itself be removed when processing commit/abort record? At the same the > > sequence hash will be cleaned as well. > > > > > Now, I think in theory, it > > > is possible that the same RelFileLocator can again be allocated before > > > we clean up the existing entry which can probably confuse the system. > > > > How? The transaction allocating the first time would be cleaned before > > it happens the second time. So shouldn't matter. > > > > It can only be cleaned if we process it but xact_decode won't allow us > to process it and I don't think it would be a good idea to add another > hack for sequences here. See below code: > > xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) > { > SnapBuild *builder = ctx->snapshot_builder; > ReorderBuffer *reorder = ctx->reorder; > XLogReaderState *r = buf->record; > uint8 info = XLogRecGetInfo(r) & XLOG_XACT_OPMASK; > > /* > * If the snapshot isn't yet fully built, we cannot decode anything, so > * bail out. > */ > if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT) > return;
That may be true for a transaction which is decoded, but I think all the transactions which are added to ReorderBuffer should be cleaned up once they have been processed irrespective of whether they are decoded/sent downstream or not. In this case I see the sequence hash being cleaned up for the sequence related transaction in Hayato's reproducer. See attached patch with a diagnostic change and the output below (notice sequence cleanup called on transaction 767). 2023-12-14 21:06:36.756 IST [386957] LOG: logical decoding found initial starting point at 0/15B2F68 2023-12-14 21:06:36.756 IST [386957] DETAIL: Waiting for transactions (approximately 1) older than 767 to end. 2023-12-14 21:06:36.756 IST [386957] STATEMENT: SELECT pg_create_logical_replication_slot('slot', 'test_decoding'); 2023-12-14 21:07:05.679 IST [386957] LOG: XXX: smgr_decode. snapshot is SNAPBUILD_BUILDING_SNAPSHOT 2023-12-14 21:07:05.679 IST [386957] STATEMENT: SELECT pg_create_logical_replication_slot('slot', 'test_decoding'); 2023-12-14 21:07:05.679 IST [386957] LOG: XXX: seq_decode. snapshot is SNAPBUILD_BUILDING_SNAPSHOT 2023-12-14 21:07:05.679 IST [386957] STATEMENT: SELECT pg_create_logical_replication_slot('slot', 'test_decoding'); 2023-12-14 21:07:05.679 IST [386957] LOG: XXX: skipped 2023-12-14 21:07:05.679 IST [386957] STATEMENT: SELECT pg_create_logical_replication_slot('slot', 'test_decoding'); 2023-12-14 21:07:05.710 IST [386957] LOG: logical decoding found initial consistent point at 0/15B3388 2023-12-14 21:07:05.710 IST [386957] DETAIL: Waiting for transactions (approximately 1) older than 768 to end. 2023-12-14 21:07:05.710 IST [386957] STATEMENT: SELECT pg_create_logical_replication_slot('slot', 'test_decoding'); 2023-12-14 21:07:39.292 IST [386298] LOG: checkpoint starting: time 2023-12-14 21:07:40.919 IST [386957] LOG: XXX: seq_decode. snapshot is SNAPBUILD_FULL_SNAPSHOT 2023-12-14 21:07:40.919 IST [386957] STATEMENT: SELECT pg_create_logical_replication_slot('slot', 'test_decoding'); 2023-12-14 21:07:40.919 IST [386957] LOG: XXX: the sequence is transactional 2023-12-14 21:07:40.919 IST [386957] STATEMENT: SELECT pg_create_logical_replication_slot('slot', 'test_decoding'); 2023-12-14 21:07:40.919 IST [386957] LOG: sequence cleanup called on transaction 767 2023-12-14 21:07:40.919 IST [386957] STATEMENT: SELECT pg_create_logical_replication_slot('slot', 'test_decoding'); 2023-12-14 21:07:40.919 IST [386957] LOG: logical decoding found consistent point at 0/15B3518 2023-12-14 21:07:40.919 IST [386957] DETAIL: There are no running transactions. 2023-12-14 21:07:40.919 IST [386957] STATEMENT: SELECT pg_create_logical_replication_slot('slot', 'test_decoding'); We see similar output when pg_logical_slot_get_changes() is called. I haven't found the code path from where the sequence cleanup gets called. But it's being called. Am I missing something? -- Best Wishes, Ashutosh Bapat
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index 73e38cafd8..8e2ebbd8e0 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -1543,10 +1543,15 @@ smgr_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) elog(LOG, "XXX: smgr_decode. snapshot is %s", SnapBuildIdentify(builder)); /* - * If we don't have snapshot or we are just fast-forwarding, there is no - * point in decoding relfilenode information. + * If we are not building snapshot yet or we are just fast-forwarding, there + * is no point in decoding relfilenode information. If the sequence + * associated with relfilenode here is changed in the same transaction but + * after snapshot was built, the relfilenode needs to be present in the + * sequence hash table so that the change can be deemed as transactional. + * Otherwise it will not be queued. Hence we process this change even before + * we have built snapshot. */ - if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT || + if (SnapBuildCurrentState(builder) < SNAPBUILD_BUILDING_SNAPSHOT || ctx->fast_forward) { elog(LOG, "XXX: skipped"); diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index f7d9a26cfc..25a742fef0 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -1078,6 +1078,8 @@ ReorderBufferSequenceCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn) HASH_REMOVE, NULL) == NULL) elog(ERROR, "hash table corrupted"); } + + elog(LOG, "sequence cleanup called on transaction %d", txn->xid); } /*