On Sun, Sep 4, 2022 at 11:10 PM Tomas Vondra <tomas.von...@enterprisedb.com> wrote: > > On 9/4/22 16:08, Tomas Vondra wrote: > > ... > > > > so in fact we *know* 849 is a subxact of 848, but we don't call > > ReorderBufferAssignChild in this case. In fact we can't even do the > > assignment easily in this case, because we create the subxact first, so > > that the crash happens right when we attempt to create the toplevel one, > > and we never even get a chance to do the assignment: > > > > 1) process the NEW_CID record, logged for 849 (subxact) > > 2) process CIDs in the WAL record, which has topleve_xid 848 > > > > > > So IMHO we need to figure out what to do for WAL records that create > > both the toplevel and subxact - either we need to skip them, or rethink > > how we create the ReorderBufferTXN structs. > > > > This fixes the crash for me, by adding a ReorderBufferAssignChild call > to SnapBuildProcessNewCid, and tweaking ReorderBufferAssignChild to > ensure we don't try to create the top xact before updating the subxact > and removing it from the toplevel_by_lsn list. > > Essentially, what's happening is this: > > 1) We read the NEW_CID record, which is logged with XID 849, i.e. the > subxact. But we don't know it's a subxact, so we create it as a > top-level xact with the LSN. > > 2) We start processing contents of the NEW_CID, which however has info > that 849 is subxact of 848, calls ReorderBufferAddNewTupleCids which > promptly does ReorderBufferTXNByXid() with the top-level XID, which > creates it with the same LSN, and crashes because of the assert. > > I'm not sure what's the right/proper way to fix this ... > > The problem is ReorderBufferAssignChild was coded in a way that did not > expect the subxact to be created first (as a top-level xact). >
I think there was a previously hard-coded way to detect that and we have removed it in commit (https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e3ff789acfb2754cd7b5e87f6f4463fd08e35996). I think it is possible that subtransaction gets logged without previous top-level txn record as shown in the commit shared. -- With Regards, Amit Kapila.