On Mon, Nov 4, 2019 at 5:22 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Wed, Oct 30, 2019 at 9:38 AM vignesh C <vignes...@gmail.com> wrote: > > > > On Tue, Oct 22, 2019 at 10:52 PM Tomas Vondra > > <tomas.von...@2ndquadrant.com> wrote: > > > > > > I think the patch should do the simplest thing possible, i.e. what it > > > does today. Otherwise we'll never get it committed. > > > > > I found a couple of crashes while reviewing and testing flushing of > > open transaction data: > > > > Thanks for doing these tests. However, I don't think these issues are > anyway related to this patch. It seems to be base code issues > manifested by this patch. See my analysis below. > > > Issue 1: > > #0 0x00007f22c5722337 in raise () from /lib64/libc.so.6 > > #1 0x00007f22c5723a28 in abort () from /lib64/libc.so.6 > > #2 0x0000000000ec5390 in ExceptionalCondition > > (conditionName=0x10ea814 "!dlist_is_empty(head)", errorType=0x10ea804 > > "FailedAssertion", > > fileName=0x10ea7e0 "../../../../src/include/lib/ilist.h", > > lineNumber=458) at assert.c:54 > > #3 0x0000000000b4fb91 in dlist_tail_element_off (head=0x19e4db8, > > off=64) at ../../../../src/include/lib/ilist.h:458 > > #4 0x0000000000b546d0 in ReorderBufferAbortOld (rb=0x191b6b0, > > oldestRunningXid=3834) at reorderbuffer.c:1966 > > #5 0x0000000000b3ca03 in DecodeStandbyOp (ctx=0x19af990, > > buf=0x7ffcbc26dc50) at decode.c:332 > > > > This seems to be the problem of base code where we abort immediately > after serializing the changes because in that case, the changes list > will be empty. I think you can try to reproduce it via the debugger > or by hacking the code such that it serializes after every change and > then if you abort after one change, it should hit this problem. > > > > > Issue 2: > > #0 0x00007f1d7ddc4337 in raise () from /lib64/libc.so.6 > > #1 0x00007f1d7ddc5a28 in abort () from /lib64/libc.so.6 > > #2 0x0000000000ec4e1d in ExceptionalCondition > > (conditionName=0x10ead30 "txn->final_lsn != InvalidXLogRecPtr", > > errorType=0x10ea284 "FailedAssertion", > > fileName=0x10ea2d0 "reorderbuffer.c", lineNumber=3052) at assert.c:54 > > #3 0x0000000000b577e0 in ReorderBufferRestoreCleanup (rb=0x2ae36b0, > > txn=0x2bafb08) at reorderbuffer.c:3052 > > #4 0x0000000000b52b1c in ReorderBufferCleanupTXN (rb=0y x2ae36b0, > > txn=0x2bafb08) at reorderbuffer.c:1318 > > #5 0x0000000000b5279d in ReorderBufferCleanupTXN (rb=0x2ae36b0, > > txn=0x2b9d778) at reorderbuffer.c:1257 > > #6 0x0000000000b5475c in ReorderBufferAbortOld (rb=0x2ae36b0, > > oldestRunningXid=3835) at reorderbuffer.c:1973 > > > > This seems to be again the problem with base code as we don't update > the final_lsn for subtransactions during ReorderBufferAbortOld. This > can also be reproduced with some hacking in code or via debugger in a > similar way as explained for the previous problem but with a > difference that there must be subtransaction involved in this case. > > > #7 0x0000000000b3ca03 in DecodeStandbyOp (ctx=0x2b676d0, > > buf=0x7ffcbc74cc00) at decode.c:332 > > #8 0x0000000000b3c208 in LogicalDecodingProcessRecord (ctx=0x2b676d0, > > record=0x2b67990) at decode.c:121 > > #9 0x0000000000b70b2b in XLogSendLogical () at walsender.c:2845 > > > > These failures come randomly. > > I'm not able to reproduce this issue with simple test case. > > Yeah, it appears to be difficult to reproduce unless you hack the code > to serialize every change or use debugger to forcefully flush the > changes every time. >
Thanks Amit for your analysis, I was able to reproduce the above issue consistently by making some code changes and with help of debugger. I did one change so that it flushes every time instead of flushing after the buffer size exceeds the logical_decoding_work_mem, attached one of the transactions and called abort. When the server restarts after abort, this problem occurs consistently. I could reproduce the issue with base code also. It seems like this issue is not an issue of 0001-Add-logical_decoding_work_mem-to-limit-ReorderBuffer patch and exists from base code. I will post the issue in hackers with details. Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com