Hi, On 2023-08-12 07:51:09 +1200, Thomas Munro wrote: > Oh, I see what's happening. Maybe commit b91dd9de wasn't the best > idea, but bc971f4025c broke an assumption, since it doesn't use > ConditionVariableSleep(). That is confusing the signal forwarding > logic which expects to find our entry in the wait list in the common > case.
Hm, I guess I got confused by the cv code once more. I thought that ConditionVariableCancelSleep() would wake us up anyway, because once we return from ConditionVariableSleep(), we'd be off the list. But I now realize (and I think not for the first time), that ConditionVariableSleep() always puts us *back* on the list. Leaving aside the issue in this thread, isn't always adding us back into the list bad from a contention POV alone - it doubles the write traffic on the CV and is guaranteed to cause contention for ConditionVariableBroadcast(). I wonder if this explains some performance issues I've seen in the past. What if we instead reset cv_sleep_target once we've been taken off the list? I think it'd not be too hard to make it safe to do the proclist_contains() without the spinlock. Lwlocks have something similar, there we solve it by this sequence: proclist_delete(&wakeup, iter.cur, lwWaitLink); /* * Guarantee that lwWaiting being unset only becomes visible once the * unlink from the link has completed. Otherwise the target backend * could be woken up for other reason and enqueue for a new lock - if * that happens before the list unlink happens, the list would end up * being corrupted. * * The barrier pairs with the LWLockWaitListLock() when enqueuing for * another lock. */ pg_write_barrier(); waiter->lwWaiting = LW_WS_NOT_WAITING; PGSemaphoreUnlock(waiter->sem); I guess this means we'd need something like lwWaiting for CVs as well. > From a85b2827f4500bc2e7c533feace474bc47086dfa Mon Sep 17 00:00:00 2001 > From: Thomas Munro <thomas.mu...@gmail.com> > Date: Sat, 12 Aug 2023 07:06:08 +1200 > Subject: [PATCH] De-pessimize ConditionVariableCancelSleep(). > > Commit b91dd9de was concerned with a theoretical problem with our > non-atomic condition variable operations. If you stop sleeping, and > then cancel the sleep in a separate step, you might be signaled in > between, and that could be lost. That doesn't matter for callers of > ConditionVariableBroadcast(), but callers of ConditionVariableSignal() > might be upset if a signal went missing like this. FWIW I suspect at least some of the places that'd have a problem without that forwarding, might also be racy with it.... > New idea: ConditionVariableCancelSleep() can just return true if you've > been signaled. Hypothetical users of ConditionVariableSignal() would > then still have a way to deal with rare lost signals if they are > concerned about that problem. Sounds like a plan to me. Greetings, Andres Freund