On 3/7/22 22:11, Tomas Vondra wrote:
>
>
> On 3/7/22 17:39, Tomas Vondra wrote:
>>
>>
>> On 3/1/22 12:53, Amit Kapila wrote:
>>> On Mon, Feb 28, 2022 at 5:16 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>>>>
>>>> On Sat, Feb 12, 2022 at 6:04 AM Tomas Vondra
>>>> <tomas.von...@enterprisedb.com> wrote:
>>>>>
>>>>> On 2/10/22 19:17, Tomas Vondra wrote:
>>>>>> I've polished & pushed the first part adding sequence decoding
>>>>>> infrastructure etc. Attached are the two remaining parts.
>>>>>>
>>>>>> I plan to wait a day or two and then push the test_decoding part. The
>>>>>> last part (for built-in replication) will need more work and maybe
>>>>>> rethinking the grammar etc.
>>>>>>
>>>>>
>>>>> I've pushed the second part, adding sequences to test_decoding.
>>>>>
>>>>
>>>> The test_decoding is failing randomly in the last few days. I am not
>>>> completely sure but they might be related to this work. The two of
>>>> these appears to be due to the same reason:
>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09
>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-17%2015%3A17%3A07
>>>>
>>>> TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
>>>> "reorderbuffer.c", Line: 1173, PID: 35013)
>>>> 0 postgres 0x00593de0 ExceptionalCondition +
>>>> 160\\0
>>>>
>>>
>>> While reviewing the code for this, I noticed that in
>>> sequence_decode(), we don't call ReorderBufferProcessXid to register
>>> the first known lsn in WAL for the current xid. The similar functions
>>> logicalmsg_decode() or heap_decode() do call ReorderBufferProcessXid
>>> even if they decide not to queue or send the change. Is there a reason
>>> for not doing the same here? However, I am not able to deduce any
>>> scenario where lack of this will lead to such an Assertion failure.
>>> Any thoughts?
>>>
>>
>> Thanks, that seems like an omission. Will fix.
>>
>
> I've pushed this simple fix. Not sure it'll fix the assert failures on
> skink/locust, though. Given the lack of information it'll be difficult
> to verify. So let's wait a bit.
>

I've done about 5000 runs of 'make check' in test_decoding, on two rpi
machines (one armv7, one aarch64). Not a single assert failure :-(

How come skink/locust hit that in just a couple runs?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company