On Wed, May 13, 2020 at 3:10 PM Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> Hmm. I think we should (try to?) write code that avoids all crashes
> with production builds, but not extend that to assertion failures.

Assertions are only a problem at all because Mark would like to write
tests that involve a selection of truly corrupt data. That's a new
requirement, and one that I have my doubts about.

> > I'll stick with your example. You're calling
> > TransactionIdDidCommit() from check_tuphdr_xids(), which will
> > interrogate the commit log and pg_subtrans. It's just not under your
> > control.
>
> in a production build this would just fail with an error that the
> pg_xact file cannot be found, which is fine -- if this happens in a
> production system, you're not disturbing any other sessions. Or maybe
> the file is there and the byte can be read, in which case you would get
> the correct response; but that's fine too.

I think that this is fine, too, since I don't consider assertion
failures with corrupt data all that important. I'd make some effort to
avoid it, but not too much, and not at the expense of a useful general
purpose assertion that could catch bugs in many different contexts.

I would be willing to make a larger effort to avoid crashing a
backend, since that affects production. I might go to some effort to
not crash with downright adversarial inputs, for example. But it seems
inappropriate to take extreme measures just to avoid a crash with
extremely contrived inputs that will probably never occur.

My sense is that this is subject to sharply diminishing returns.
Completely nailing down hard crashes from corrupt data seems like the
wrong priority, at the very least. Pursuing that objective over other
objectives sounds like zero-risk bias.

--
Peter Geoghegan