Thanks again Taylor for pointing that out -- even if some aspects are beyond my use-case and more related to the condvar API itself.
I did not make too much progress but I have added a note in the code referring to this thread, explaining the issue. I am also transitioning the code to managing the waits as discussed here. The wakeup issue as it is now may also introduce erroneous behaviour to the semaphore API, but I am pretty sure this would be good on the current codebase. Stephan Am Di., 11. Feb. 2025 um 15:52 Uhr schrieb Taylor R Campbell <riastr...@netbsd.org>: > > > Date: Tue, 11 Feb 2025 13:58:41 +0100 > > From: Stephan <stephan...@googlemail.com> > > > > The only thing left to mention is that I am working on a 2 years old > > snapshot of the source as syncing to upstream does not seem to be > > easy. > > Aha! It looks like netbsd-10 shipped with this bug, which was in HEAD > between April 2020 and October 2023. > > > The bug -- which is more a performance bug than a correctness bug, > because it never loses wakeups but does cause more spurious wakeups -- > was introduced in this commit in April 2020: > > https://mail-index.netbsd.org/source-changes/2020/04/10/msg116017.html > > That changed cv_signal so that it may wake more than one > _interruptible_ waiter (cv_timedwait, cv_wait_sig, cv_timedwait_sig). > It will still wake at most one _noninterruptible_ waiter (cv_wait). > > If you change your code to use cv_wait instead of cv_timedwait, I bet > you'll observe that (just as an illustration -- obviously I expect you > really need the timeout). > > > Why would we have done this? The motivation for that commit was to > allow users like zfs to do > > cv_broadcast(cv); > cv_destroy(cv); > > and not crash. The reason this used to crash is that the cv_*wait* > calls that can fail, like cv_timedwait, i.e., the interruptible ones, > used to do something like this to avoid _losing_ explicit wakeups in > case of interruption: > > error = sleepq_block(...); > if (error) { > /* pass the explicit wakeup along to the next thread */ > cv_signal(cv); > } > > Having cv_signal wake all the interruptible waiters (at least until it > finds a noninterruptible one) obviates the need for interruptible > waiters to touch the cv by calling cv_signal after wakeup to pass the > explicit wakeup along. So, with that change, you can safely > cv_destroy immediately after cv_broadcast. But the change has the > side effect of waking more waiters than needed in some cases. > > > This is another unfortunate consequence of the behaviour of the > cv_*wait* routines where they can release the lock, sleep, re-acquire > the lock, and then fail with a nonzero error -- leaving the caller > with the confusing situation of having to re-check the condition _and_ > consider the error code. > > If the cv_*wait* routines were all guaranteed to return zero in the > event they had released and re-acquired the lock, this problem would > go away. Unfortunately, while cv_wait_sig could work that way because > signal-pending state is persistent (so when you call it in a loop, > it'll notice the signal-pending state on the next iteration), > cv_timedwait can't because it doesn't keep state. So we're stuck with > complex internal logic to work around the hazards of the bad API of > cv_timedwait. > > > (A long time previously, I had locally patched zfs not > to cv_broadcast/cv_destroy: > > https://mail-index.netbsd.org/source-changes/2012/10/15/msg037924.html > > But that local patch got lost at some point and it's a hefty > maintenance burden.) > > > The fix, committed in October 2023, was to do some careful bookkeeping > in sleepq_remove so that sleepq_block never returns an error code from > a legitimate wakeup: > > https://mail-index.netbsd.org/source-changes/2023/10/08/msg147974.html > > But that fix wasn't pulled up to netbsd-10 (and it might be hard, lots > of other nearby changes to disentangle), so netbsd-10 still issues a > lot of spurious wakeups for interruptible waits like cv_timedwait. > > This would all be easier if we ditched the awful cv_timedwait API! > Even zfs mostly ignores the return value. > > > The attached module is a simpler test for the bug (build with > bsd.kmodule.mk, KMOD=cvsignal SRCS=cvsignal.c; change cv_wait to > cv_timedwait or cv_wait_sig to taste).