On Fri, Apr 12, 2024 at 3:33 PM Andres Freund <and...@anarazel.de> wrote:
> Here's a patch implementing this approach. I confirmed that before we trigger
> the stuck spinlock logic very quickly and after we don't. However, if most
> sleeps are interrupted, it can delay the stuck spinlock detection a good
> bit. But that seems much better than triggering it too quickly.
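
For anyone skimming the thread, the idea in the quoted text can be sketched roughly as follows. This is only an illustration of "don't count interrupted sleeps", not the patch itself: the names DelayState, record_sleep and MAX_FULL_SLEEPS are hypothetical, and the threshold of 1000 is an arbitrary value picked for the example.

```c
#include <stdbool.h>

/* Hypothetical threshold: full-length sleeps tolerated before giving up. */
#define MAX_FULL_SLEEPS 1000

/* Hypothetical bookkeeping for one spinlock wait. */
typedef struct
{
    int full_sleeps;    /* sleeps that ran for their whole duration */
    int interrupted;    /* sleeps cut short, e.g. by a signal */
} DelayState;

/*
 * Record the outcome of one sleep.  Returns true once the lock should be
 * treated as stuck.  Interrupted sleeps are tallied separately and never
 * advance the stuck counter, so a burst of signals cannot trip it early.
 */
static bool
record_sleep(DelayState *st, bool was_interrupted)
{
    if (was_interrupted)
    {
        st->interrupted++;
        return false;
    }

    st->full_sleeps++;
    return st->full_sleeps > MAX_FULL_SLEEPS;
}
```

The trade-off mentioned in the quote falls out directly: if nearly every sleep is interrupted, full_sleeps barely moves and detection is delayed, but it can no longer fire after only a short amount of real waiting.
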

+1 for doing something about this. I'm not sure if it goes far enough, but it
definitely seems much better than doing nothing. Given your findings, I'm
honestly kind of surprised that I haven't seen problems of this type more
frequently.

And I think the general idea of not counting the waits if they're interrupted
makes sense. Sure, it's not going to be 100% accurate, but it's got to be way
better for the timer to trigger too slowly than too quickly. Perhaps that's
too glib of me, given that I'm not sure we should even have a timer, but even
if we stipulate that the panic is useful in some cases, spurious panics are
still really bad.

--
Robert Haas
EDB: http://www.enterprisedb.com