Hi Ajin,
At 2026-01-29 11:33:59, "Ajin Cherian" <[email protected]> wrote: >I tested with patch 0002+0004 on HEAD, and the test added by patch >0002 is failing like below. > >not ok 15 - snapshot_build 289 ms ># (test process exited with exit code 1) > >I see the postgres crashed and when I look at the core file, I see the >below stack trace: >Core was generated by >`/home/ajin/postgresql/postgres3/postgres/tmp_install/home/ajin/install-oss/bin/postgres >'' '' '' '' '' '' '''. >Program terminated with signal SIGABRT, Aborted. >#0 0x00007436092c7e9c in __pthread_kill_implementation () from >/lib64/libc.so.6 >Missing rpms, try: dnf --enablerepo='*debug*' install >zlib-ng-compat-debuginfo-2.2.3-2.el10.x86_64 >glibc-debuginfo-2.39-38.el10.x86_64 >libicu-debuginfo-74.2-5.el10.x86_64 >libstdc++-debuginfo-14.3.1-2.1.el10.x86_64 >libgcc-debuginfo-14.3.1-2.1.el10.x86_64 >(gdb) bt >#0 0x00007436092c7e9c in __pthread_kill_implementation () from >/lib64/libc.so.6 >#1 0x0000743609271a96 in raise () from /lib64/libc.so.6 >#2 0x00007436092598fa in abort () from /lib64/libc.so.6 >#3 0x0000000000a00a93 in ExceptionalCondition >(conditionName=conditionName@entry=0xb1e039 "txn->ninvalidations == >0", fileName=fileName@entry=0xb1dcd8 "reorderbuffer.c", >lineNumber=lineNumber@entry=3207) at assert.c:65 >#4 0x0000000000836c94 in ReorderBufferForget (rb=0x3cf96a40, >xid=xid@entry=1136, lsn=34819568) at reorderbuffer.c:3207 >#5 0x0000000000823aea in DecodeCommit (ctx=ctx@entry=0x3cf869d0, >buf=buf@entry=0x7ffd069389a0, parsed=parsed@entry=0x7ffd069387d0, >xid=xid@entry=1136, two_phase=false) at decode.c:707 >#6 0x000000000082497c in xact_decode (ctx=ctx@entry=0x3cf869d0, >buf=buf@entry=0x7ffd069389a0) at decode.c:237 >#7 0x00000000008246eb in LogicalDecodingProcessRecord >(ctx=ctx@entry=0x3cf869d0, record=0x3cf86da8) at decode.c:116 >#8 0x0000000000829602 in DecodingContextFindStartpoint >(ctx=ctx@entry=0x3cf869d0) at logical.c:647 >#9 0x000000000084e8ea in create_logical_replication_slot >(name=name@entry=0x3ce8fb70 "isolation_slot", >plugin=plugin@entry=0x3ce8fc10 "test_decoding", >temporary=temporary@entry=false, two_phase=two_phase@entry=false, >failover=failover@entry=false, > restart_lsn=restart_lsn@entry=0, find_startpoint=true) at slotfuncs.c:177 >#10 0x000000000084f300 in pg_create_logical_replication_slot >(fcinfo=<optimized out>) at slotfuncs.c:207 >#11 0x00000000006c77c4 in ExecMakeTableFunctionResult >(setexpr=0x3cf61b18, econtext=0x3cf61968, argContext=<optimized out>, >expectedDesc=0x3cf79f48, randomAccess=false) at execSRF.c:234 >#12 0x00000000006da839 in FunctionNext (node=node@entry=0x3cf61758) at >nodeFunctionscan.c:94 > >(gdb) f 4 >#4 0x0000000000836c94 in ReorderBufferForget (rb=0x3cf96a40, >xid=xid@entry=1136, lsn=34819568) at reorderbuffer.c:3207 >3207 Assert(txn->ninvalidations == 0); >(gdb) p *txn >$1 = {txn_flags = 1, xid = 1136, toplevel_xid = 0, gid = 0x0, >first_lsn = 34814328, final_lsn = 34819568, end_lsn = 0, toptxn = 0x0, >restart_decoding_lsn = 0, origin_id = 0, origin_lsn = 0, {commit_time >= 0, prepare_time = 0, abort_time = 0}, base_snapshot = 0x0, > base_snapshot_lsn = 0, base_snapshot_node = {prev = 0x0, next = >0x0}, snapshot_now = 0x0, command_id = 4294967295, nentries = 14, >nentries_mem = 14, changes = {head = {prev = 0x3cfb5510, next = >0x3cfb4ae8}}, tuplecids = {head = {prev = 0x3cfb5440, next = >0x3cfb4a80}}, > ntuplecids = 13, tuplecid_hash = 0x0, toast_hash = 0x0, subtxns = >{head = {prev = 0x3cfb2c68, next = 0x3cfb2c68}}, nsubtxns = 0, >ninvalidations = 22, invalidations = 0x3cf96c50, >ninvalidations_distributed = 0, invalidations_distributed = 0x0, node >= {prev = 0x3cfb2b30, > next = 0x3cf96a48}, catchange_node = {prev = 0x3cf96a68, next = >0x3cf96a68}, txn_node = {first_child = 0x0, next_sibling = 0x0, >prev_or_parent = 0x0}, size = 1472, total_size = 1472, >output_plugin_private = 0x0} > >Looks like the assert in ReorderBufferForget failed because >ninvalidations is not 0. > >You need to build with asserts enabled. Oops, I forgot to add the --enable-cassert option. After adding it, I was able to reproduce the issue. I spent some time researching the code. Now, I am wonder about the relationship between the invalidation message and the base snapshot of that XID. 1) IIUC, current design requires that an XID with an invalidation message must have a base snapshot (see the handling in SnapBuildCommitTxn and ReorderBufferForget regarding base snapshots). This is consistent with the requirement when adding other regular changes. In that case, when adding an invalidation message, should we also apply the same checks as for other regular changes? Specifically, for transactions without enough information to decode, we would skip adding the invalidation message (see SnapBuildProcessChange). This should be fine because at that point no decoding has been done yet, so there is no cache to invalidate. Based on this idea, I have updated the 0003 and 0004 patches (see attached v7.1-0003 and v7.1-0004). 2) When executing an invalidation message with an XID, do we necessarily require that the XID has a base snapshot? I did not see a strict relationship between the two. In other words, when executing an invalidation message, we could potentially remove the check for xid->base_snapshot. From another perspective, before executing an invalidation message, the current logic already processes the base snapshot, and a base snapshot may be not present (see the handling in SnapBuildCommitTxn with if (needs_snapshot)). Based on this idea, I have updated the 0003 and 0004 patches (see attached v7.2-0003 and v7.2-0004). Regards, Haiyang Li
v7.1-0003-Track-transaction-committed-in-BUILDING_SNAPSHOT.patch
Description: Binary data
v7.1-0004-Track-transaction-committed-in-BUILDING_SNAPSHOT.patch
Description: Binary data
v7.2-0003-Track-transaction-committed-in-BUILDING_SNAPSHOT.patch
Description: Binary data
v7.2-0004-Track-transaction-committed-in-BUILDING_SNAPSHOT.patch
Description: Binary data
