> Date: Tue, 11 Feb 2025 13:58:41 +0100 > From: Stephan <stephan...@googlemail.com> > > The only thing left to mention is that I am working on a 2 years old > snapshot of the source as syncing to upstream does not seem to be > easy.
Aha! It looks like netbsd-10 shipped with this bug, which was in HEAD between April 2020 and October 2023. The bug -- which is more a performance bug than a correctness bug, because it never loses wakeups but does cause more spurious wakeups -- was introduced in this commit in April 2020: https://mail-index.netbsd.org/source-changes/2020/04/10/msg116017.html That changed cv_signal so that it may wake more than one _interruptible_ waiter (cv_timedwait, cv_wait_sig, cv_timedwait_sig). It will still wake at most one _noninterruptible_ waiter (cv_wait). If you change your code to use cv_wait instead of cv_timedwait, I bet you'll observe that (just as an illustration -- obviously I expect you really need the timeout). Why would we have done this? The motivation for that commit was to allow users like zfs to do cv_broadcast(cv); cv_destroy(cv); and not crash. The reason this used to crash is that the cv_*wait* calls that can fail, like cv_timedwait, i.e., the interruptible ones, used to do something like this to avoid _losing_ explicit wakeups in case of interruption: error = sleepq_block(...); if (error) { /* pass the explicit wakeup along to the next thread */ cv_signal(cv); } Having cv_signal wake all the interruptible waiters (at least until it finds a noninterruptible one) obviates the need for interruptible waiters to touch the cv by calling cv_signal after wakeup to pass the explicit wakeup along. So, with that change, you can safely cv_destroy immediately after cv_broadcast. But the change has the side effect of waking more waiters than needed in some cases. This is another unfortunate consequence of the behaviour of the cv_*wait* routines where they can release the lock, sleep, re-acquire the lock, and then fail with a nonzero error -- leaving the caller with the confusing situation of having to re-check the condition _and_ consider the error code. If the cv_*wait* routines were all guaranteed to return zero in the event they had released and re-acquired the lock, this problem would go away. Unfortunately, while cv_wait_sig could work that way because signal-pending state is persistent (so when you call it in a loop, it'll notice the signal-pending state on the next iteration), cv_timedwait can't because it doesn't keep state. So we're stuck with complex internal logic to work around the hazards of the bad API of cv_timedwait. (A long time previously, I had locally patched zfs not to cv_broadcast/cv_destroy: https://mail-index.netbsd.org/source-changes/2012/10/15/msg037924.html But that local patch got lost at some point and it's a hefty maintenance burden.) The fix, committed in October 2023, was to do some careful bookkeeping in sleepq_remove so that sleepq_block never returns an error code from a legitimate wakeup: https://mail-index.netbsd.org/source-changes/2023/10/08/msg147974.html But that fix wasn't pulled up to netbsd-10 (and it might be hard, lots of other nearby changes to disentangle), so netbsd-10 still issues a lot of spurious wakeups for interruptible waits like cv_timedwait. This would all be easier if we ditched the awful cv_timedwait API! Even zfs mostly ignores the return value. The attached module is a simpler test for the bug (build with bsd.kmodule.mk, KMOD=cvsignal SRCS=cvsignal.c; change cv_wait to cv_timedwait or cv_wait_sig to taste).
/* $NetBSD$ */ /*- * Copyright (c) 2025 The NetBSD Foundation, Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ #include <sys/cdefs.h> __KERNEL_RCSID(0, "$NetBSD$"); #include <sys/param.h> #include <sys/condvar.h> #include <sys/kthread.h> #include <sys/module.h> #include <sys/mutex.h> #include <sys/proc.h> MODULE(MODULE_CLASS_MISC, cvsignal, NULL); static bool done; static bool woken; static kmutex_t lock; static kcondvar_t cv; static struct lwp *sleeper[3]; static void doit(void *cookie) { mutex_enter(&lock); while (!done) cv_wait(&cv, &lock); printf("sleeper %ld: %swakeup\n", (long)curlwp->l_lid, woken ? "" : "first "); if (!woken) { woken = true; kpause("slppause", /*intr*/false, 2*hz, &lock); cv_broadcast(&cv); } mutex_exit(&lock); kthread_exit(0); } static int cvsignal_modcmd(modcmd_t cmd, void *arg __unused) { unsigned i; int error; switch (cmd) { case MODULE_CMD_INIT: mutex_init(&lock, MUTEX_DEFAULT, IPL_NONE); cv_init(&cv, "slpwait"); for (i = 0; i < __arraycount(sleeper); i++) { error = kthread_create(PRI_NONE, KTHREAD_MPSAFE|KTHREAD_MUSTJOIN, /*cpu*/NULL, &doit, NULL, &sleeper[i], "sleeper/%u", i); if (error) { printf("%s: kthread_create: %d\n", __func__, error); break; } } kpause("slpinit", /*intr*/false, hz, NULL); printf("waking one thread\n"); mutex_enter(&lock); done = true; cv_signal(&cv); mutex_exit(&lock); break; case MODULE_CMD_FINI: for (i = __arraycount(sleeper); i --> 0;) { if (sleeper[i]) (void)kthread_join(sleeper[i]); } cv_destroy(&cv); mutex_destroy(&lock); break; default: return ENOTTY; } return 0; }