Hi, hackers

I've hit an assertion failure in logical decoding on HEAD with the scenario below.

---
<preparation>
create table tab1 (val integer);
select 'init' from pg_create_logical_replication_slot('regression_slot', 'test_decoding');

<session1>
begin;
savepoint sp1;
insert into tab1 values (1);

<session2>
checkpoint; -- for RUNNING_XACT
select data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');

<session1>
truncate tab1; -- for NEW_CID
commit;
begin;
insert into tab1 values (3);

<session2>
checkpoint;
select data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');

<session1>
commit;

<session2>
select data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
---


It is not required, but it is advisable to increase LOG_SNAPSHOT_INTERVAL_MS
(a compile-time constant in src/backend/postmaster/bgwriter.c) so that
RUNNING_XACT records are written only by our explicit checkpoint commands.

In the above scenario, the first checkpoint writes a RUNNING_XACT record after
the WAL record that associates the subtransaction with its top-level transaction
(the one that leads to ReorderBufferAssignChild). Once decoding restarts from
that RUNNING_XACT, the association between the top-level transaction and the
subtransaction is lost. As a result, when the subtransaction changes the catalog,
decoding the NEW_CID record (written after RUNNING_XACT) cannot mark the
top-level transaction as catalog-modifying.

This trips the assertion that checks the invariant that, when a subtransaction
modifies the catalog, its top-level transaction must be marked as
catalog-modifying as well.

I think we need to remember the relationship between top-level transactions and
their subtransactions in the serialized snapshot when decoding RUNNING_XACT,
even before any catalog change has been seen, so that the association survives
a restart. What do you think?


The stack trace of this failure and related information are below.

(gdb) bt
#0  0x00007f2632588387 in raise () from /lib64/libc.so.6
#1  0x00007f2632589a78 in abort () from /lib64/libc.so.6
#2  0x0000000000b3eba1 in ExceptionalCondition (conditionName=0xd137e0 "!needs_snapshot || needs_timetravel",
    errorType=0xd130c5 "FailedAssertion", fileName=0xd130b9 "snapbuild.c", lineNumber=1116) at assert.c:69
#3  0x0000000000911257 in SnapBuildCommitTxn (builder=0x23f0638, lsn=22386632, xid=728, nsubxacts=1,
    subxacts=0x2bfcc88, xinfo=79) at snapbuild.c:1116
#4  0x00000000008fa420 in DecodeCommit (ctx=0x23e0108, buf=0x7fff4a1f9220, parsed=0x7fff4a1f9020, xid=728,
    two_phase=false) at decode.c:630
#5  0x00000000008f9953 in xact_decode (ctx=0x23e0108, buf=0x7fff4a1f9220) at decode.c:216
#6  0x00000000008f967d in LogicalDecodingProcessRecord (ctx=0x23e0108, record=0x23e04a0) at decode.c:119
#7  0x0000000000900b63 in pg_logical_slot_get_changes_guts (fcinfo=0x23d80a8, confirm=true, binary=false)
    at logicalfuncs.c:271
#8  0x0000000000900ca0 in pg_logical_slot_get_changes (fcinfo=0x23d80a8) at logicalfuncs.c:338
...
(gdb) frame 3
#3  0x0000000000911257 in SnapBuildCommitTxn (builder=0x23f0638, lsn=22386632, xid=728, nsubxacts=1,
    subxacts=0x2bfcc88, xinfo=79) at snapbuild.c:1116
1116            Assert(!needs_snapshot || needs_timetravel);
(gdb) list
1111            {
1112                    /* record that we cannot export a general snapshot anymore */
1113                    builder->committed.includes_all_transactions = false;
1114            }
1115
1116            Assert(!needs_snapshot || needs_timetravel);
1117
1118            /*
1119             * Adjust xmax of the snapshot builder, we only do that for committed,
1120             * catalog modifying, transactions, everything else isn't interesting for



Best Regards,
        Takamichi Osumi