On Fri, Apr 12, 2024 at 3:33 PM Andres Freund <and...@anarazel.de> wrote:
> Here's a patch implementing this approach. I confirmed that before we trigger
> the stuck spinlock logic very quickly and after we don't. However, if most
> sleeps are interrupted, it can delay the stuck spinlock detection a good
> bit. But that seems much better than triggering it too quickly.

+1 for doing something about this. I'm not sure if it goes far enough,
but it definitely seems much better than doing nothing. Given your
findings, I'm honestly kind of surprised that I haven't seen problems
of this type more frequently. And I think the general idea of not
counting the waits if they're interrupted makes sense. Sure, it's not
going to be 100% accurate, but it's got to be way better for the timer
to trigger too slowly than too quickly. Perhaps that's too glib of me,
given that I'm not sure we should even have a timer, but even if we
stipulate that the panic is useful in some cases, spurious panics are
still really bad.
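
To make the "don't count interrupted waits" idea concrete, here is a
minimal sketch. It is not the patch under discussion: SpinWait,
sleep_uninterrupted(), spin_wait_record() and the MAX_COUNTED_DELAYS
threshold are hypothetical names chosen for illustration, and it
assumes a POSIX system where nanosleep() returns nonzero when a signal
cuts the sleep short.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdbool.h>
#include <time.h>

/* Hypothetical threshold: full-length sleeps allowed before giving up. */
#define MAX_COUNTED_DELAYS 1000

/* Hypothetical per-wait state: only sleeps that ran to completion count. */
typedef struct SpinWait
{
    int counted_delays;
} SpinWait;

/*
 * Sleep for usec microseconds.  Returns true only if the whole interval
 * elapsed; a signal (EINTR) or any other failure returns false.
 */
static bool
sleep_uninterrupted(long usec)
{
    struct timespec ts;

    ts.tv_sec = usec / 1000000L;
    ts.tv_nsec = (usec % 1000000L) * 1000L;

    return nanosleep(&ts, NULL) == 0;
}

/*
 * Record the outcome of one sleep.  Interrupted sleeps are ignored, so a
 * burst of signals cannot push the counter toward the threshold.  Returns
 * true once enough full sleeps have accumulated to call the lock stuck.
 */
static bool
spin_wait_record(SpinWait *w, bool slept_fully)
{
    if (slept_fully)
        w->counted_delays++;

    return w->counted_delays >= MAX_COUNTED_DELAYS;
}
```

A caller would do something like
spin_wait_record(&w, sleep_uninterrupted(delay)) on each iteration. The
tradeoff is the one already mentioned: if most sleeps get interrupted,
the counter advances slowly and detection is delayed, but it can no
longer fire after far less real waiting than intended.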

-- 
Robert Haas
EDB: http://www.enterprisedb.com

