Re: How to wait on cv_timedwait() correctly

Taylor R Campbell Tue, 11 Feb 2025 06:52:36 -0800

> Date: Tue, 11 Feb 2025 13:58:41 +0100
> From: Stephan <stephan...@googlemail.com>
> 
> The only thing left to mention is that I am working on a 2 years old
> snapshot of the source as syncing to upstream does not seem to be
> easy.


Aha!  It looks like netbsd-10 shipped with this bug, which was in HEAD
between April 2020 and October 2023.


The bug -- which is more a performance bug than a correctness bug,
because it never loses wakeups but does cause more spurious wakeups --
was introduced in this commit in April 2020:

https://mail-index.netbsd.org/source-changes/2020/04/10/msg116017.html

That changed cv_signal so that it may wake more than one
_interruptible_ waiter (cv_timedwait, cv_wait_sig, cv_timedwait_sig).
It will still wake at most one _noninterruptible_ waiter (cv_wait).

If you change your code to use cv_wait instead of cv_timedwait, I bet
you'll observe that (just as an illustration -- obviously I expect you
really need the timeout).


Why would we have done this?  The motivation for that commit was to
allow users like zfs to do

        cv_broadcast(cv);
        cv_destroy(cv);

and not crash.  The reason this used to crash is that the cv_*wait*
calls that can fail, like cv_timedwait, i.e., the interruptible ones,
used to do something like this to avoid _losing_ explicit wakeups in
case of interruption:

        error = sleepq_block(...);
        if (error) {
                /* pass the explicit wakeup along to the next thread */
                cv_signal(cv);
        }

Having cv_signal wake all the interruptible waiters (at least until it
finds a noninterruptible one) obviates the need for interruptible
waiters to touch the cv by calling cv_signal after wakeup to pass the
explicit wakeup along.  So, with that change, you can safely
cv_destroy immediately after cv_broadcast.  But the change has the
side effect of waking more waiters than needed in some cases.


This is another unfortunate consequence of the behaviour of the
cv_*wait* routines where they can release the lock, sleep, re-acquire
the lock, and then fail with a nonzero error -- leaving the caller
with the confusing situation of having to re-check the condition _and_
consider the error code.

If the cv_*wait* routines were all guaranteed to return zero in the
event they had released and re-acquired the lock, this problem would
go away.  Unfortunately, while cv_wait_sig could work that way because
signal-pending state is persistent (so when you call it in a loop,
it'll notice the signal-pending state on the next iteration),
cv_timedwait can't because it doesn't keep state.  So we're stuck with
complex internal logic to work around the hazards of the bad API of
cv_timedwait.


(A long time previously, I had locally patched zfs not
to cv_broadcast/cv_destroy:

https://mail-index.netbsd.org/source-changes/2012/10/15/msg037924.html

But that local patch got lost at some point and it's a hefty
maintenance burden.)


The fix, committed in October 2023, was to do some careful bookkeeping
in sleepq_remove so that sleepq_block never returns an error code from
a legitimate wakeup:

https://mail-index.netbsd.org/source-changes/2023/10/08/msg147974.html

But that fix wasn't pulled up to netbsd-10 (and it might be hard, lots
of other nearby changes to disentangle), so netbsd-10 still issues a
lot of spurious wakeups for interruptible waits like cv_timedwait.

This would all be easier if we ditched the awful cv_timedwait API!
Even zfs mostly ignores the return value.


The attached module is a simpler test for the bug (build with
bsd.kmodule.mk, KMOD=cvsignal SRCS=cvsignal.c; change cv_wait to
cv_timedwait or cv_wait_sig to taste).

/*      $NetBSD$        */

/*-
 * Copyright (c) 2025 The NetBSD Foundation, Inc.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
 * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
 * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
 * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */

#include <sys/cdefs.h>
__KERNEL_RCSID(0, "$NetBSD$");

#include <sys/param.h>

#include <sys/condvar.h>
#include <sys/kthread.h>
#include <sys/module.h>
#include <sys/mutex.h>
#include <sys/proc.h>

MODULE(MODULE_CLASS_MISC, cvsignal, NULL);

static bool done;
static bool woken;
static kmutex_t lock;
static kcondvar_t cv;
static struct lwp *sleeper[3];

static void
doit(void *cookie)
{

        mutex_enter(&lock);
        while (!done)
                cv_wait(&cv, &lock);
        printf("sleeper %ld: %swakeup\n", (long)curlwp->l_lid,
            woken ? "" : "first ");
        if (!woken) {
                woken = true;
                kpause("slppause", /*intr*/false, 2*hz, &lock);
                cv_broadcast(&cv);
        }
        mutex_exit(&lock);

        kthread_exit(0);
}

static int
cvsignal_modcmd(modcmd_t cmd, void *arg __unused)
{
        unsigned i;
        int error;

        switch (cmd) {
        case MODULE_CMD_INIT:
                mutex_init(&lock, MUTEX_DEFAULT, IPL_NONE);
                cv_init(&cv, "slpwait");
                for (i = 0; i < __arraycount(sleeper); i++) {
                        error = kthread_create(PRI_NONE,
                            KTHREAD_MPSAFE|KTHREAD_MUSTJOIN, /*cpu*/NULL,
                            &doit, NULL, &sleeper[i], "sleeper/%u", i);
                        if (error) {
                                printf("%s: kthread_create: %d\n", __func__,
                                    error);
                                break;
                        }
                }
                kpause("slpinit", /*intr*/false, hz, NULL);
                printf("waking one thread\n");
                mutex_enter(&lock);
                done = true;
                cv_signal(&cv);
                mutex_exit(&lock);
                break;
        case MODULE_CMD_FINI:
                for (i = __arraycount(sleeper); i --> 0;) {
                        if (sleeper[i])
                                (void)kthread_join(sleeper[i]);
                }
                cv_destroy(&cv);
                mutex_destroy(&lock);
                break;
        default:
                return ENOTTY;
        }

        return 0;
}

Re: How to wait on cv_timedwait() correctly

Reply via email to