On Wed, Mar 30, 2022 at 2:20 AM Anton A. Melnikov <aamelni...@inbox.ru> wrote:
> > Can the test failures be encountered without such an elaborate setup? If
> > not, then I don't really see why we need to do anything here?
>
> There was a real bug report from our test department. They do long-time
> repetitive tests and sometimes meet this failure.
> So I suppose there is a non-zero probability that such an error can occur
> in a one-shot test as well.
> The sequence given in the first letter helps to catch this failure quickly.
I don't think that the idea of "extra" WAL records is very principled. It's pretty vague what "extra" means, and your definition seems to be basically "whatever would be needed to make this test case pass."

I think the problem is basically with the test case's idea that the number of WAL records and the number of table rows ought to be equal. I think that's just false. In general, we'd also have to worry about index insertions, which would provoke variable numbers of WAL records depending on whether they cause a page split. And we'd have to worry about TOAST table insertions, which could produce different numbers of records depending on the size of the data, the configured block size and TOAST threshold, and whether the TOAST table index incurs a page split. So even if we added a mechanism like what you propose here, we would only be fixing this particular test case, not creating infrastructure of any general utility.

If it's true that this test case sometimes randomly fails, then we ought to fix that somehow, maybe by just removing this particular check from the test case, or changing it to >=, or something like that. But I don't think adding a new counter is the right idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
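[Editor's note: the suggestion above, to weaken the test's equality check to >=, can be sketched as follows. This is a hypothetical illustration, not the actual PostgreSQL TAP test; the variable names and counts are invented for the example. The point is that inserts can legitimately emit more WAL records than rows, e.g. due to index page splits or TOAST chunks, so only a lower bound is a stable assertion.]

```python
# Illustrative values, not taken from any real test run.
rows_inserted = 100

# Suppose the test counted this many WAL records attributable to the
# insert. Page splits and TOAST insertions can push it above the row
# count, so the exact value is not deterministic.
wal_records_counted = 103

# The fragile check the email argues against:
#   assert wal_records_counted == rows_inserted   # fails intermittently

# The more robust check, per the suggestion in the email:
assert wal_records_counted >= rows_inserted
```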