Hi, Given that I found several ways to spuriously trigger the stuck spinlock logic, I think we need to backpatch a fix.
On 2024-04-11 21:41:39 -0700, Andres Freund wrote: > Tracking the total amount of time spent sleeping doesn't require any > additional syscalls, due to the nanosleep()... I did quickly implement this, and it works nicely on unix. Unfortunately I couldn't find something similar for windows. Which would mean we'd need to record the time before every sleep. Another issue with that approach is that we'd need to add a field to SpinDelayStatus, which'd be an ABI break. Not sure there's any external code using it, but if there were, it'd be a fairly nasty, rarely hit, path. Afaict there's no padding that could be reused on 32 bit platforms [1]. I wonder if, for easy backpatching, the easiest solution is to just reset errno before calling pg_usleep(), and only increment status->delays if errno != EINTR. Given our pg_usleep() implementations, that'd preven the stuck spinlock logic from triggering too fast. If the ABI issue weren't there, we could record the current time the first time we sleep during a spinlock acquisition (there's an existing branch for that) and then only trigger the stuck spinlock detection when enough time has passed. That'd be fine from an efficiency POV. Even if we decide to do so, I don't think it'd be a good idea to remove the stuck logic alltogether in the backbranches. That might take out services that have been running ok-ish for a long time. Greetings, Andres Freund [1] But we are wasting a bunch of space on 64bit platforms due to switching between pointers and 32bit integers, should probably fix that.