Hi Andres, hi all,

Thanks a lot for the advice.

> My position is basically:
>
> 1) We should *never* add new long-duration polling loops to postgres. We've
>    regretted it every time. It just ends up masking bugs and biting us in
>    scenarios we didn't predict (increased wakeups increasing power usage,
>    increased latency because our more eager wakeup mechanisms were racy).
>
> 2) We should try rather hard to not even have any new very short lived polling
>    code. The existing code in XactLockTableWait() isn't great, even on the
>    primary, but the window during the polling addresses is really short, so
>    it's *kinda* acceptable.

I’m not familiar with the historical problems that polling has caused, but
explicit waiting does seem generally more efficient in its own right.

> 3) There are many ways to address the XactLockTableWait() issue here. One way
>    would be to simply make XactLockTableWait() work on standbys, by
>    maintaining the lock table. Another would be to teach it to add some
>    helper to procarray.c that allows XactLockTableWait() to work based on the
>    KnownAssignedXids machinery.
>
> I don't have a clear preference for how to make this work in a non-polling
> way. But it's clear to me that making it poll smarter is the completely wrong
> direction.
>
> Greetings,
>
> Andres Freund

I’ve tried to replace the polling with waiting, built on the KnownAssignedXids
machinery.

What changed:

1. Each XID that is being waited on now has a small hash-table entry with a
   condition variable.
2. XactLockTableWait() on a standby registers on that CV instead of polling.
3. Whenever a transaction (or sub-xid) is pruned from KnownAssignedXids, we
   call WakeXidWaiters(), which broadcasts on that exact XID’s CV.

Feedback welcome.

Best,
Xuneng
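
For anyone skimming the thread, the wait/wake flow in the three points above can be sketched roughly as below. This is a standalone illustration and not the code in the patch: pthread mutexes and condition variables stand in for PostgreSQL's ConditionVariable API, a small linearly scanned array stands in for the shared hash table, and a plain bool array stands in for KnownAssignedXids. Every name ending in `Sketch`, plus `XID_WAIT_SLOTS`, `MAX_XID`, `find_entry` and `demo_wait_and_wake`, is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;      /* 0 plays the role of "invalid XID" */

#define XID_WAIT_SLOTS 64            /* hypothetical table size */
#define MAX_XID        1024          /* hypothetical bound for this sketch */

typedef struct XidWaitEntry
{
    TransactionId  xid;              /* 0 means the slot is unused */
    int            nwaiters;         /* waiters currently registered */
    pthread_cond_t cv;               /* one condition variable per XID */
} XidWaitEntry;

static pthread_mutex_t wait_lock = PTHREAD_MUTEX_INITIALIZER;
static XidWaitEntry    wait_table[XID_WAIT_SLOTS];
static bool            xid_running[MAX_XID]; /* stand-in for KnownAssignedXids */

/* Find the entry for xid; optionally claim a free slot. Caller holds wait_lock. */
static XidWaitEntry *find_entry(TransactionId xid, bool create)
{
    XidWaitEntry *free_slot = NULL;

    assert(xid != 0 && xid < MAX_XID);
    for (int i = 0; i < XID_WAIT_SLOTS; i++)
    {
        if (wait_table[i].xid == xid)
            return &wait_table[i];
        if (wait_table[i].xid == 0 && free_slot == NULL)
            free_slot = &wait_table[i];
    }
    if (!create)
        return NULL;
    assert(free_slot != NULL);       /* sketch: no overflow handling */
    free_slot->xid = xid;
    free_slot->nwaiters = 0;
    pthread_cond_init(&free_slot->cv, NULL);
    return free_slot;
}

void MarkXidRunningSketch(TransactionId xid)
{
    pthread_mutex_lock(&wait_lock);
    assert(xid != 0 && xid < MAX_XID);
    xid_running[xid] = true;
    pthread_mutex_unlock(&wait_lock);
}

/* Block until xid is no longer running; sleeps on the CV instead of polling. */
void XidWaitSketch(TransactionId xid)
{
    pthread_mutex_lock(&wait_lock);
    assert(xid != 0 && xid < MAX_XID);
    if (xid_running[xid])
    {
        XidWaitEntry *e = find_entry(xid, true);

        e->nwaiters++;
        while (xid_running[xid])     /* loop guards against spurious wakeups */
            pthread_cond_wait(&e->cv, &wait_lock);
        if (--e->nwaiters == 0)      /* last waiter releases the slot */
        {
            pthread_cond_destroy(&e->cv);
            e->xid = 0;
        }
    }
    pthread_mutex_unlock(&wait_lock);
}

/* Called when xid is pruned: clear its state and wake only its waiters. */
void WakeXidWaitersSketch(TransactionId xid)
{
    XidWaitEntry *e;

    pthread_mutex_lock(&wait_lock);
    assert(xid != 0 && xid < MAX_XID);
    xid_running[xid] = false;
    e = find_entry(xid, false);
    if (e != NULL)
        pthread_cond_broadcast(&e->cv);
    pthread_mutex_unlock(&wait_lock);
}

static void *waiter_thread(void *arg)
{
    XidWaitSketch(*(TransactionId *) arg);
    return NULL;
}

/* Returns 1 if a waiter on XID 42 came back after the wake call. */
int demo_wait_and_wake(void)
{
    TransactionId xid = 42;
    pthread_t     th;

    MarkXidRunningSketch(xid);
    if (pthread_create(&th, NULL, waiter_thread, &xid) != 0)
        return 0;
    WakeXidWaitersSketch(xid);
    pthread_join(th, NULL);
    return !xid_running[xid] && find_entry(xid, false) == NULL;
}
```

The waiter rechecks the running state under the lock after every wakeup, so it does not matter whether the wake call arrives before or after the waiter has gone to sleep. A real implementation would also need to handle table overflow and interrupts, which this sketch leaves out.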
v5-0001-Replace-polling-with-waiting-in-XactLockTableWait.patch