Hi Andres, hi all,

Thanks a lot for the advice.

> My position is basically:
>
> 1) We should *never* add new long-duration polling loops to postgres. We've
>    regretted it every time. It just ends up masking bugs and biting us in
>    scenarios we didn't predict (increased wakeups increasing power usage,
>    increased latency because our more eager wakeup mechanisms were racy).
>
> 2) We should try rather hard to not even have any new very short lived polling
>    code.  The existing code in XactLockTableWait() isn't great, even on the
>    primary, but the window during the polling addresses is really short, so
>    it's *kinda* acceptable.

I'm not familiar with the historical problems that polling has caused,
but I agree that explicit waiting is generally more efficient in its
own right.

> 3) There are many ways to address the XactLockTableWait() issue here. One way
>    would be to simply make XactLockTableWait() work on standbys, by
>    maintaining the lock table.  Another would be to teach it to add some
>    helper to procarray.c that allows XactLockTableWait() to work based on the
>    KnownAssignedXids machinery.
>
> I don't have a clear preference for how to make this work in a non-polling
> way. But it's clear to me that making it poll smarter is the completely wrong
> direction.
>
> Greetings,
>
> Andres Freund

I've tried to replace the polling with waiting, based on the
KnownAssignedXids machinery.

What changed:
1. Each XID now has a small hash-table entry with a condition variable.
2. XactLockTableWait() on a standby registers on that CV instead of polling.
3. Whenever a transaction (or sub-xid) is pruned from
KnownAssignedXids we call WakeXidWaiters(), which broadcasts to the
exact XID’s CV.
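
To make the intended flow easier to follow, here is a minimal standalone
sketch of the wait/wake pattern described above. It is not the patch code:
it substitutes pthread mutexes and condition variables for PostgreSQL's
own primitives, and a small fixed array for the hash table. All names
(sketch_add_xid, sketch_wait_for_xid, sketch_wake_xid_waiters, sketch_demo)
are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_XIDS 16

typedef uint32_t Xid;

/* One entry per tracked XID (hypothetical layout, not the patch's). */
typedef struct XidWaitSlot
{
    Xid             xid;        /* 0 marks an unused slot */
    bool            running;    /* still "known assigned"? */
    pthread_cond_t  cv;         /* waiters for this XID sleep here */
} XidWaitSlot;

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static XidWaitSlot slots[MAX_XIDS];

/* Caller must hold table_lock. Returns NULL if the XID is not tracked. */
static XidWaitSlot *find_slot(Xid xid)
{
    for (int i = 0; i < MAX_XIDS; i++)
        if (slots[i].xid == xid)
            return &slots[i];
    return NULL;
}

/* Start tracking an XID; stands in for adding it to KnownAssignedXids.
 * Slots are never reused in this sketch, to keep lifetimes trivial. */
static bool sketch_add_xid(Xid xid)
{
    bool ok = false;

    assert(xid != 0);
    pthread_mutex_lock(&table_lock);
    for (int i = 0; i < MAX_XIDS && !ok; i++)
    {
        if (slots[i].xid == 0)
        {
            slots[i].xid = xid;
            slots[i].running = true;
            pthread_cond_init(&slots[i].cv, NULL);
            ok = true;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return ok;
}

/* Block until the XID is no longer running. The predicate is rechecked
 * under the lock, so a wakeup sent before we sleep is not lost. */
static void sketch_wait_for_xid(Xid xid)
{
    pthread_mutex_lock(&table_lock);
    XidWaitSlot *slot = find_slot(xid);
    while (slot != NULL && slot->running)
        pthread_cond_wait(&slot->cv, &table_lock);
    pthread_mutex_unlock(&table_lock);
}

/* Called when the XID is pruned: wake only the waiters on that XID. */
static void sketch_wake_xid_waiters(Xid xid)
{
    pthread_mutex_lock(&table_lock);
    XidWaitSlot *slot = find_slot(xid);
    if (slot != NULL)
    {
        slot->running = false;
        pthread_cond_broadcast(&slot->cv);
    }
    pthread_mutex_unlock(&table_lock);
}

static void *sketch_waiter(void *arg)
{
    sketch_wait_for_xid(*(Xid *) arg);
    return NULL;
}

/* Usage: one waiter thread, one wakeup. Returns 1 once the waiter exits. */
static int sketch_demo(void)
{
    Xid         xid = 100;
    pthread_t   th;

    if (!sketch_add_xid(xid))
        return 0;
    if (pthread_create(&th, NULL, sketch_waiter, &xid) != 0)
        return 0;
    sketch_wake_xid_waiters(xid);
    pthread_join(th, NULL);
    return 1;
}
```

The point of the sketch is that the waiter sleeps until the one XID it
cares about is removed, with no timeout and no retry loop. Whether the
wakeup arrives before or after the waiter goes to sleep, it returns,
because the "running" flag is tested while holding the lock.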

Feedback welcome.

Best,
Xuneng

Attachment: v5-0001-Replace-polling-with-waiting-in-XactLockTableWait.patch