On Fri, Aug 11, 2023 at 11:23 AM Andres Freund <and...@anarazel.de> wrote: > > Couldn't you say the same thing about defensive "can't happen" ERRORs? > > They are essentially a form of assertion that isn't limited to > > assert-enabled builds. > > Yes. A lot of them I hate them with the passion of a thousand suns ;). "Oh, > our transaction state machinery is confused. Yes, let's just continue going > through the same machinery again, that'll resolve it.".
I am not unsympathetic to Ashutosh's point about conventional ERRORs being easier to deal with when debugging your own code, during initial development work. But that seems like a problem with the tooling in other areas. For example, dealing with core dumps left behind by the regression tests can be annoying. Don't you also hate it when there's a regression.diffs that just shows 20k lines of subtractions? Perhaps you don't -- perhaps your custom setup makes it quick and easy to get relevant information about what actually went wrong. But it seems like that sort of thing could be easier to deal with by default, without using custom shell scripts or anything -- particularly for those of us that haven't been Postgres hackers for eons. > There've been people arguing in the past that it's good for robustness to do > stuff like that. I think it's exactly the opposite. > > Now, don't get me wrong, there are plenty cases where a "this shouldn't > happen" elog(ERROR) is appropriate... Right. They're not bad -- they just need to be backed up by some kind of reasoning, which will be particular to each case. The default approach should be to crash whenever any invariants are violated, because all bets are off at that point. > What are you imagining? Basically something that generates an elog(ERROR) with > the stringified expression as part of the error message? I'd probably start with a new elevel, that implied an assertion failure in assert-enabled builds but otherwise acted just like ERROR. I remember multiple cases where I added an "Assert(false)" right after a "can't happen" error, which isn't a great approach. It might also be useful to offer something along the lines you've described, which I was sort of thinking of myself. But now that I've thought about it a little more, I think that such an approach might end up being overused. If you're going to add what amounts to a "can't happen" ERROR then you should really be obligated to write a halfway reasonable error message. As I said, you should have to own the fact that you think it's better to not just crash for this one particular callsite, based on some fairly specific chain of reasoning. Ideally it'll be backed up by practical experience/user reports. -- Peter Geoghegan