On Fri, Sep 2, 2022 at 6:26 AM osumi.takami...@fujitsu.com
<osumi.takami...@fujitsu.com> wrote:
>
> I've met an assertion failure of logical decoding with below scenario on HEAD.
>
> ---
> <preparation>
> create table tab1 (val integer);
> select 'init' from pg_create_logical_replication_slot('regression_slot',
> 'test_decoding');
>
> <session1>
> begin;
> savepoint sp1;
> insert into tab1 values (1);
>
> <session2>
> checkpoint; -- for RUNNING_XACT
> select data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1');
>
> <session1>
> truncate tab1; -- for NEW_CID
> commit;
> begin;
> insert into tab1 values (3);
>

BTW, if I just change the truncate statement to "Analyze tab1" in your
test, I get a different assertion failure:

postgres.exe!ExceptionalCondition(const char * conditionName, const char * errorType, const char * fileName, int lineNumber) Line 70 C
postgres.exe!AssertTXNLsnOrder(ReorderBuffer * rb) Line 902 C
postgres.exe!ReorderBufferTXNByXid(ReorderBuffer * rb, unsigned int xid, bool create, bool * is_new, unsigned __int64 lsn, bool create_as_top) Line 681 C
postgres.exe!ReorderBufferAddNewTupleCids(ReorderBuffer * rb, unsigned int xid, unsigned __int64 lsn, RelFileLocator locator, ItemPointerData tid, unsigned int cmin, unsigned int cmax, unsigned int combocid) Line 3188 C
postgres.exe!SnapBuildProcessNewCid(SnapBuild * builder, unsigned int xid, unsigned __int64 lsn, xl_heap_new_cid * xlrec) Line 823 C
postgres.exe!heap2_decode(LogicalDecodingContext * ctx, XLogRecordBuffer * buf) Line 408 C
postgres.exe!LogicalDecodingProcessRecord(LogicalDecodingContext * ctx, XLogReaderState * record) Line 119 C
postgres.exe!pg_logical_slot_get_changes_guts(FunctionCallInfoBaseData * fcinfo, bool confirm, bool binary) Line 274 C
postgres.exe!pg_logical_slot_get_changes(FunctionCallInfoBaseData * fcinfo) Line 339 C

This matches the call stack we see intermittently in the buildfarm (BF)
[1][2]. The difference from your scenario is that the Truncate statement
generates an additional WAL record, XLOG_STANDBY_LOCK, prior to
XLOG_HEAP2_NEW_CID.

I think we can fix this in one of the following ways:

a. Assert(prev_first_lsn <= cur_txn->first_lsn); -- Explain in comments
that equality is possible when the subtransaction and the transaction were
not previously logged, as happened in this scenario.

b. Track the txn that prev_first_lsn belongs to (say, prev_txn). If
prev_txn's toptxn is the same as cur_txn, or cur_txn's toptxn is the same
as prev_txn, perform the assert mentioned in (a); otherwise, keep the
current Assert.

It seems (b) will be more robust. Thoughts?

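To make option (b) concrete, here is a minimal standalone sketch of the relaxed ordering check. It is not PostgreSQL code: ToyTxn and the toy_* functions are hypothetical stand-ins for ReorderBufferTXN and AssertTXNLsnOrder, and it assumes that the current Assert is the strict form of (a), i.e. prev_first_lsn < cur_txn->first_lsn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ReorderBufferTXN; only the fields the check needs. */
typedef struct ToyTxn
{
    uint32_t        xid;
    uint64_t        first_lsn;
    struct ToyTxn  *toptxn;     /* NULL when this is itself a top-level txn */
} ToyTxn;

/* True when one of the two txns is recorded as the top-level txn of the other. */
static bool
toy_txns_related(const ToyTxn *a, const ToyTxn *b)
{
    return a->toptxn == b || b->toptxn == a;
}

/*
 * Option (b): walk the txns in list order. A txn and its top-level txn may
 * share a first_lsn (the "<=" check from option (a)); any other adjacent
 * pair must still have strictly increasing first_lsn (assumed current check).
 */
bool
toy_lsn_order_ok(ToyTxn *const *txns, size_t ntxns)
{
    const ToyTxn *prev_txn = NULL;

    for (size_t i = 0; i < ntxns; i++)
    {
        const ToyTxn *cur_txn = txns[i];

        if (prev_txn != NULL)
        {
            if (toy_txns_related(prev_txn, cur_txn))
            {
                if (prev_txn->first_lsn > cur_txn->first_lsn)
                    return false;
            }
            else if (prev_txn->first_lsn >= cur_txn->first_lsn)
                return false;
        }
        prev_txn = cur_txn;
    }
    return true;
}

/* Assert-style wrapper, mirroring how the real check is used in assert builds. */
void
toy_assert_lsn_order(ToyTxn *const *txns, size_t ntxns)
{
    assert(toy_lsn_order_ok(txns, ntxns));
}
```

With this shape, a top-level txn and its subtransaction that share a first_lsn pass the check, while two unrelated txns with the same first_lsn are still rejected. That is the extra strictness (b) keeps over (a).
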
Note: I have added Sawada-San because some time back we had an off-list
discussion about this intermittent BF failure, but we were not able to
arrive at an exact test that shows it.

[1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-08-20%2002%3A45%3A34
[2] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09

--
With Regards,
Amit Kapila.