Hi,
This patch adds support for passing NULL as the ident when calling
tsleep(9) etc. When this happens, sleep_setup() will use the address
of the sleep_state struct as the value for p_wchan. This address is
basically always a private value so the thread should never receive a
wakeup(9) broadcast.
Why do we want this? Sometimes there is no logical ident to sleep on.
Sometimes there is legitimately no reason to receive a wakeup(9).
In the past, people have handrolled private channels to work around
this situation. The code often looks like this:
void
foo(void)
{
int chan;
tsleep(&chan, ...);
}
Permitting the use of NULL and letting the implementation choose a
private channel is better than handrolling a private channel for two
reasons:
1. We save a bit of stack space. tsleep(9) etc. already have a
sleep_state struct on the stack and it's at a private address
so there is no space cost to use it.
2. The NULL clearly communicates the author's intent to the reader.
It indicates the author had no wakeup channel in mind when they
wrote the code. The reader then doesn't need to reason about
whether or not the ident value is superfluous. Poring over
a file (or several) to determine whether any thread ever calls
wakeup(9) on a given ident sucks.
FreeBSD/NetBSD have a dedicated interface for this "sleep without a
wakeup channel" operation. They call it "pause". I proposed adding
it but I got mixed feedback on the patch. Then mpi@ proposed this
idea. I think this is simpler and better.
The actual implementation requires just a few small changes to
sleep_setup().
I've added an additional KASSERT to each of tsleep(9), msleep(9), and
rwsleep(9). You now need at least one of (a) an ident or (b) PCATCH
or (c) a timeout, otherwise there is no way to get the thread started
again. This would indicate a programmer error and we should panic if
it ever happens.
I've documented the new NULL ident behavior in tsleep.9.
Also included here is a sample user, sys_nanosleep(). nanosleep(2)
wakes up due to interruption by signal or timeout. It should never be
awoken with wakeup(9). Up until now we had a private channel on the
stack. Now we can just pass NULL. It's simpler.
There are a bunch of other potential users but they can wait until a
later patch.
I'm running with this now so I'm pretty sure this is a sound change.
Feel free to test it out. nanosleep(2) gets called all the time so if
there was an issue I imagine it'd show up pretty quickly.
Thoughts? ok?
Index: share/man/man9/tsleep.9
===================================================================
RCS file: /cvs/src/share/man/man9/tsleep.9,v
retrieving revision 1.15
diff -u -p -r1.15 tsleep.9
--- share/man/man9/tsleep.9 20 Mar 2020 03:37:09 -0000 1.15
+++ share/man/man9/tsleep.9 18 Dec 2020 19:40:04 -0000
@@ -144,8 +144,11 @@ to the resource for which the process is
The same identifier must be used in a call to
.Fn wakeup
to get the process going again.
+If the thread does not want to receive any
+.Fn wakeup
+broadcasts,
.Fa ident
-should not be
+should be
.Dv NULL .
.It Fa priority
The process priority to be used when the process is awakened and put on
Index: sys/kern/kern_synch.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.172
diff -u -p -r1.172 kern_synch.c
--- sys/kern/kern_synch.c 7 Dec 2020 16:55:29 -0000 1.172
+++ sys/kern/kern_synch.c 18 Dec 2020 19:40:04 -0000
@@ -119,6 +119,7 @@ tsleep(const volatile void *ident, int p
#endif
KASSERT((priority & ~(PRIMASK | PCATCH)) == 0);
+ KASSERT(ident != NULL || ISSET(priority, PCATCH) || timo != 0);
#ifdef MULTIPROCESSOR
KASSERT(timo || _kernel_lock_held());
@@ -214,6 +215,7 @@ msleep(const volatile void *ident, struc
KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
KASSERT(mtx != NULL);
+ KASSERT(ident != NULL || ISSET(priority, PCATCH) || timo != 0);
if (priority & PCATCH)
KERNEL_ASSERT_LOCKED();
@@ -301,6 +303,7 @@ rwsleep(const volatile void *ident, stru
int error, status;
KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
+ KASSERT(ident != NULL || ISSET(priority, PCATCH) || timo != 0);
rw_assert_anylock(rwl);
status = rw_status(rwl);
@@ -351,8 +354,6 @@ sleep_setup(struct sleep_state *sls, con
#ifdef DIAGNOSTIC
if (p->p_flag & P_CANTSLEEP)
panic("sleep: %s failed insomnia", p->p_p->ps_comm);
- if (ident == NULL)
- panic("tsleep: no ident");
if (p->p_stat != SONPROC)
panic("tsleep: not SONPROC");
#endif
@@ -378,11 +379,23 @@ sleep_setup(struct sleep_state *sls, con
TRACEPOINT(sched, sleep, NULL);
- p->p_wchan = ident;
+ /*
+ * If ident is NULL the caller does not want to receive
+ * wakeup(9) broadcasts. However, p_wchan cannot actually
+ * be NULL until the thread is awoken or the sleep is aborted.
+ * As a workaround, we use the address of the sleep_state
+ * struct as the wakeup channel. It is effectively always
+ * a private value so the thread should never receive any
+ * broadcasts.
+ */
+ p->p_wchan = (ident != NULL) ? ident : sls;
+
p->p_wmesg = wmesg;
p->p_slptime = 0;
p->p_slppri = prio & PRIMASK;
- TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_runq);
+
+ /* NB: The queue is chosen based on wchan, not ident. */
+ TAILQ_INSERT_TAIL(&slpque[LOOKUP(p->p_wchan)], p, p_runq);
}
int
Index: sys/kern/kern_time.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_time.c,v
retrieving revision 1.150
diff -u -p -r1.150 kern_time.c
--- sys/kern/kern_time.c 10 Nov 2020 17:26:54 -0000 1.150
+++ sys/kern/kern_time.c 18 Dec 2020 19:40:04 -0000
@@ -269,7 +269,6 @@ sys_clock_getres(struct proc *p, void *v
int
sys_nanosleep(struct proc *p, void *v, register_t *retval)
{
- static int chan;
struct sys_nanosleep_args/* {
syscallarg(const struct timespec *) rqtp;
syscallarg(struct timespec *) rmtp;
@@ -294,7 +293,7 @@ sys_nanosleep(struct proc *p, void *v, r
do {
getnanouptime(&start);
nsecs = MAX(1, MIN(TIMESPEC_TO_NSEC(&request), MAXTSLP));
- error = tsleep_nsec(&chan, PWAIT | PCATCH, "nanosleep", nsecs);
+ error = tsleep_nsec(NULL, PWAIT | PCATCH, "nanosleep", nsecs);
getnanouptime(&stop);
timespecsub(&stop, &start, &elapsed);
timespecsub(&request, &elapsed, &request);