On 12/15/23 03:33, Amit Kapila wrote:
> On Thu, Dec 14, 2023 at 9:14 PM Ashutosh Bapat
> <ashutosh.bapat....@gmail.com> wrote:
>>
>> On Thu, Dec 14, 2023 at 2:51 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>>>
>>> It can only be cleaned if we process it but xact_decode won't allow us
>>> to process it and I don't think it would be a good idea to add another
>>> hack for sequences here. See below code:
>>>
>>> xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
>>> {
>>>     SnapBuild *builder = ctx->snapshot_builder;
>>>     ReorderBuffer *reorder = ctx->reorder;
>>>     XLogReaderState *r = buf->record;
>>>     uint8 info = XLogRecGetInfo(r) & XLOG_XACT_OPMASK;
>>>
>>>     /*
>>>      * If the snapshot isn't yet fully built, we cannot decode anything, so
>>>      * bail out.
>>>      */
>>>     if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
>>>         return;
>>
>> That may be true for a transaction which is decoded, but I think all
>> the transactions which are added to ReorderBuffer should be cleaned up
>> once they have been processed irrespective of whether they are
>> decoded/sent downstream or not. In this case I see the sequence hash
>> being cleaned up for the sequence related transaction in Hayato's
>> reproducer.
>>
>
> It was because the test you are using was not designed to show the
> problem I mentioned. In this case, the rollback was after a full
> snapshot state was reached.
>

Right, I haven't tried to reproduce this, but it very much looks like
the entry would not be removed if the xact aborts/commits before the
snapshot reaches FULL state.

I suppose one way to deal with this would be to first check if an entry
for the same relfilenode exists. If it does, the original transaction
must have terminated, but we haven't cleaned it up yet - in which case
we can just "move" the relfilenode to the new transaction.

However, can't that happen even with full snapshots? I mean, let's say a
transaction creates a relfilenode and terminates without writing an
abort record (surely that's possible, right?). And then another xact
comes and generates the same relfilenode (presumably that's unlikely,
but perhaps possible?). Aren't we in pretty much the same situation,
until the next RUNNING_XACTS cleans up the hash table?

I think tracking all relfilenodes would fix the original issue (with
treating some changes as transactional), and the tweak that "moves" the
relfilenode to the new xact would fix this other issue too.

That being said, I feel a bit uneasy about it, for similar reasons as
Amit. If we start processing records before reaching a full snapshot,
that seems like shifting the assumptions a bit. For example, it means
we'd create ReorderBufferTXN entries for cases that we'd have skipped
before. OTOH this is (or should be) only a very temporary period while
starting the replication, I believe.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
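
PS: To make the "move" idea above a bit more concrete, here is a rough,
standalone sketch. It is not the patch and does not use the actual
reorderbuffer structures: RelfilenodeTable, track_relfilenode(),
lookup_tracked() and forget_xact() are all hypothetical names, and the
fixed-size array merely stands in for the real hash table. The only
point is the lookup-then-reassign step when a stale entry for the same
relfilenode is still around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real reorderbuffer data structures. */
#define MAX_TRACKED 64

typedef struct TrackedRelfilenode
{
    bool        in_use;
    uint32_t    relfilenode;    /* lookup key */
    uint32_t    xid;            /* xact believed to have created it */
} TrackedRelfilenode;

typedef struct RelfilenodeTable
{
    TrackedRelfilenode entries[MAX_TRACKED];
} RelfilenodeTable;

/* Return the entry tracking this relfilenode, or NULL if there is none. */
TrackedRelfilenode *
lookup_tracked(RelfilenodeTable *table, uint32_t relfilenode)
{
    for (size_t i = 0; i < MAX_TRACKED; i++)
    {
        TrackedRelfilenode *e = &table->entries[i];

        if (e->in_use && e->relfilenode == relfilenode)
            return e;
    }
    return NULL;
}

/*
 * Remember that xid created relfilenode. If a stale entry for the same
 * relfilenode exists, its xact must have ended without being cleaned up,
 * so the entry is simply "moved" to the new xid.
 * Returns false only if the table is full.
 */
bool
track_relfilenode(RelfilenodeTable *table, uint32_t relfilenode, uint32_t xid)
{
    TrackedRelfilenode *e;

    assert(table != NULL);

    e = lookup_tracked(table, relfilenode);
    if (e != NULL)
    {
        e->xid = xid;           /* the "move" */
        return true;
    }

    for (size_t i = 0; i < MAX_TRACKED; i++)
    {
        e = &table->entries[i];
        if (!e->in_use)
        {
            e->in_use = true;
            e->relfilenode = relfilenode;
            e->xid = xid;
            return true;
        }
    }
    return false;
}

/* Drop every entry owned by xid; returns how many were removed. */
int
forget_xact(RelfilenodeTable *table, uint32_t xid)
{
    int         removed = 0;

    for (size_t i = 0; i < MAX_TRACKED; i++)
    {
        TrackedRelfilenode *e = &table->entries[i];

        if (e->in_use && e->xid == xid)
        {
            e->in_use = false;
            removed++;
        }
    }
    return removed;
}
```

With this shape, a later cleanup for the old xid finds nothing to remove,
and the entry goes away only when the new owner is cleaned up. Whether
that is the right behavior for the real hash table is exactly the open
question above.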