Module Name:	src
Committed By:	thorpej
Date:		Sun Oct 10 18:07:52 UTC 2021
Modified Files:
	src/sys/kern: kern_event.c kern_exec.c kern_exit.c kern_fork.c
	src/sys/sys: event.h eventvar.h proc.h

Log Message:
Changes to make EVFILT_PROC MP-safe:

Because the locking protocol around processes is somewhat complex compared to other events that can be posted on kqueues, introduce new functions for posting NOTE_EXEC, NOTE_EXIT, and NOTE_FORK, rather than just using the generic knote() function. These functions KASSERT() their locking expectations, and deal with the other complexities of each situation.

knote_proc_fork(), in particular, needs to handle NOTE_TRACK, which requires allocation of a new knote to attach to the child process. We don't want to be allocating memory while holding the parent's p_lock. Furthermore, we also have to attach the tracking note to the child process, which means we have to acquire the child's p_lock. So, to handle all of this, we introduce some additional synchronization infrastructure around the 'knote' structure:

- Add the ability to mark a knote as being in a state of flux. Knotes in this state are guaranteed not to be detached/deleted, thus allowing a code path to drop other locks after putting a knote in this state.

- Code paths that wish to detach/delete a knote must first check if the knote is in-flux. If so, they must wait for it to quiesce. Because multiple threads of execution may attempt this concurrently, a mechanism exists for a single LWP to claim the detach responsibility; all other threads simply wait for the knote to disappear before they can make further progress.

- When kqueue_scan() encounters an in-flux knote, it simply treats the situation just like encountering another thread's queue marker -- wait for the flux to settle and continue on.

(The "in-flux knote" idea was inspired by FreeBSD, but this works differently from their implementation, as the two kqueue implementations have diverged quite a bit.)

knote_proc_fork() uses this infrastructure to implement NOTE_TRACK like so:

- Attempt to put the original tracking knote into a state of flux; if this fails (because the note has a detach pending), we skip all processing (the original process has lost interest, and we simply won the race).

- Once the note is in-flux, drop the kq and forking process's locks, and allocate 2 knotes: one to post the NOTE_CHILD event, and one to attach a new NOTE_TRACK to the child process. Notably, we do NOT go through kqueue_register() to do this, but rather do all of the work directly and KASSERT() our assumptions; this allows us to directly control our interaction with locks. All memory allocations here are performed with KM_NOSLEEP, in order to prevent holding the original knote in-flux indefinitely.

- Because the NOTE_TRACK use case adds knotes to kqueues through a sort of back-door mechanism, we must serialize with the closing of the destination kqueue's file descriptor, so we steal another bit from the kq_count field to notify other threads that a kqueue is on its way out, preventing new knotes from being enqueued while the close path detaches them.

In addition to fixing EVFILT_PROC's reliance on KERNEL_LOCK, this also fixes a long-standing bug whereby a NOTE_CHILD event could be dropped if the child process exited before the interested process received it (the same knote would be used to deliver the NOTE_EXIT event, clobbering the NOTE_CHILD's 'data' field).

Add a bunch of comments to explain what's going on in various critical sections, and sprinkle additional KASSERT()s to validate assumptions in several more locations.
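To make the in-flux protocol described above concrete, here is a rough userland C analogue of the pattern (illustrative only, not the kernel code: a pthread mutex and condition variable stand in for kq_lock and kq_cv, and the fakenote/flux_* names are invented for this sketch):

#include <pthread.h>
#include <stdbool.h>

struct fakenote {
	pthread_mutex_t	lock;		/* stands in for kq_lock */
	pthread_cond_t	cv;		/* stands in for kq_cv */
	unsigned int	influx;		/* stands in for kn_influx */
	bool		willdetach;	/* stands in for KN_WILLDETACH */
};

/* Caller holds lock.  Fails if the note is already doomed. */
static bool
flux_enter(struct fakenote *kn)
{
	if (kn->willdetach)
		return false;
	kn->influx++;
	return true;
}

/* Caller holds lock.  Wakes any waiting detacher when the flux drains. */
static void
flux_leave(struct fakenote *kn)
{
	if (--kn->influx == 0)
		pthread_cond_broadcast(&kn->cv);
}

/* Claim the detach, then wait for any in-flux users to finish. */
static void
flux_detach_wait(struct fakenote *kn)
{
	pthread_mutex_lock(&kn->lock);
	kn->willdetach = true;		/* no new flux after this point */
	while (kn->influx != 0)
		pthread_cond_wait(&kn->cv, &kn->lock);
	pthread_mutex_unlock(&kn->lock);
	/* Now safe to detach and free the note. */
}

The essential invariant is the same as in the commit: once the will-detach mark is set, the flux count can only drain, so the detacher's wait is bounded by how long in-flux code paths hold their references -- which is why the log message insists on KM_NOSLEEP allocations while a knote is in flux.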
To generate a diff of this commit:
cvs rdiff -u -r1.128 -r1.129 src/sys/kern/kern_event.c
cvs rdiff -u -r1.509 -r1.510 src/sys/kern/kern_exec.c
cvs rdiff -u -r1.291 -r1.292 src/sys/kern/kern_exit.c
cvs rdiff -u -r1.226 -r1.227 src/sys/kern/kern_fork.c
cvs rdiff -u -r1.43 -r1.44 src/sys/sys/event.h
cvs rdiff -u -r1.9 -r1.10 src/sys/sys/eventvar.h
cvs rdiff -u -r1.368 -r1.369 src/sys/sys/proc.h

Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files.
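One detail worth a standalone illustration before the diffs: the "steal another bit from the kq_count field" trick. The eventvar.h hunk below packs two flag bits and a 30-bit pending-event count into one uint32_t; with the NetBSD __BIT()/__BITS() macros expanded to plain masks, the packing behaves like this (a hypothetical self-contained test, not kernel code):

#include <assert.h>
#include <stdint.h>

#define KQ_RESTART	0x80000000u	/* __BIT(31): force ERESTART */
#define KQ_CLOSING	0x40000000u	/* __BIT(30): kqueue is closing */
#define KQ_MAXCOUNT	0x3fffffffu	/* __BITS(0,29): the count itself */
#define KQ_COUNT(c)	((unsigned int)((c) & KQ_MAXCOUNT))

int
main(void)
{
	uint32_t kq_count = 0;

	kq_count++;			/* a knote is enqueued */
	kq_count |= KQ_CLOSING;		/* the close path marks the kqueue */

	assert(KQ_COUNT(kq_count) == 1);	/* count is unaffected... */
	assert((kq_count & KQ_CLOSING) != 0);	/* ...and the flag is visible */
	return 0;
}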
Modified files: Index: src/sys/kern/kern_event.c diff -u src/sys/kern/kern_event.c:1.128 src/sys/kern/kern_event.c:1.129 --- src/sys/kern/kern_event.c:1.128 Thu Sep 30 01:20:53 2021 +++ src/sys/kern/kern_event.c Sun Oct 10 18:07:51 2021 @@ -1,7 +1,7 @@ -/* $NetBSD: kern_event.c,v 1.128 2021/09/30 01:20:53 thorpej Exp $ */ +/* $NetBSD: kern_event.c,v 1.129 2021/10/10 18:07:51 thorpej Exp $ */ /*- - * Copyright (c) 2008, 2009 The NetBSD Foundation, Inc. + * Copyright (c) 2008, 2009, 2021 The NetBSD Foundation, Inc. * All rights reserved. * * This code is derived from software contributed to The NetBSD Foundation @@ -58,8 +58,10 @@ * FreeBSD: src/sys/kern/kern_event.c,v 1.27 2001/07/05 17:10:44 rwatson Exp */ +#include "opt_ddb.h" + #include <sys/cdefs.h> -__KERNEL_RCSID(0, "$NetBSD: kern_event.c,v 1.128 2021/09/30 01:20:53 thorpej Exp $"); +__KERNEL_RCSID(0, "$NetBSD: kern_event.c,v 1.129 2021/10/10 18:07:51 thorpej Exp $"); #include <sys/param.h> #include <sys/systm.h> @@ -134,7 +136,7 @@ static const struct filterops kqread_fil }; static const struct filterops proc_filtops = { - .f_flags = 0, + .f_flags = FILTEROP_MPSAFE, .f_attach = filt_procattach, .f_detach = filt_procdetach, .f_event = filt_proc, @@ -177,8 +179,6 @@ static int kq_calloutmax = (4 * 1024); extern const struct filterops fs_filtops; /* vfs_syscalls.c */ extern const struct filterops sig_filtops; /* kern_sig.c */ -#define KQ_FLUX_WAKEUP(kq) cv_broadcast(&kq->kq_cv) - /* * Table for for all system-defined filters. * These should be listed in the numeric order of the EVFILT_* defines. @@ -234,10 +234,189 @@ static size_t user_kfiltersz; /* size * Typically, f_event(NOTE_SUBMIT) via knote: object lock * f_event(!NOTE_SUBMIT) via knote: nothing, * acquires/releases object lock inside. + * + * Locking rules when detaching knotes: + * + * There are some situations where knote submission may require dropping + * locks (see knote_proc_fork()). In order to support this, it's possible + * to mark a knote as being 'in-flux'. Such a knote is guaranteed not to + * be detached while it remains in-flux. Because it will not be detached, + * locks can be dropped so e.g. memory can be allocated, locks on other + * data structures can be acquired, etc. During this time, any attempt to + * detach an in-flux knote must wait until the knote is no longer in-flux. + * When this happens, the knote is marked for death (KN_WILLDETACH) and the + * LWP who gets to finish the detach operation is recorded in the knote's + * 'udata' field (which is no longer required for its original purpose once + * a knote is so marked). Code paths that lead to knote_detach() must ensure + * that their LWP is the one tasked with its final demise after waiting for + * the in-flux status of the knote to clear. Note that once a knote is + * marked KN_WILLDETACH, no code paths may put it into an in-flux state. + * + * Once the special circumstances have been handled, the locks are re- + * acquired in the proper order (object lock -> kq_lock), the knote taken + * out of flux, and any waiters are notified. Because waiters must have + * also dropped *their* locks in order to safely block, they must re- + * validate all of their assumptions; see knote_detach_quiesce(). See also + * the kqueue_register() (EV_ADD, EV_DELETE) and kqueue_scan() (EV_ONESHOT) + * cases. + * + * When kqueue_scan() encounters an in-flux knote, the situation is + * treated like another LWP's list marker. + * + * LISTEN WELL: It is important to not hold knotes in flux for an + * extended period of time! 
In-flux knotes effectively block any + progress of the kqueue_scan() operation. Any code paths that place + knotes in-flux should be careful to not block for indefinite periods + of time, such as for memory allocation (i.e. KM_NOSLEEP is OK, but + KM_SLEEP is not). */ static krwlock_t kqueue_filter_lock; /* lock on filter lists */ static kmutex_t kqueue_timer_lock; /* for EVFILT_TIMER */ +#define KQ_FLUX_WAIT(kq) (void)cv_wait(&kq->kq_cv, &kq->kq_lock) +#define KQ_FLUX_WAKEUP(kq) cv_broadcast(&kq->kq_cv) + +static inline bool +kn_in_flux(struct knote *kn) +{ + KASSERT(mutex_owned(&kn->kn_kq->kq_lock)); + return kn->kn_influx != 0; +} + +static inline bool +kn_enter_flux(struct knote *kn) +{ + KASSERT(mutex_owned(&kn->kn_kq->kq_lock)); + + if (kn->kn_status & KN_WILLDETACH) { + return false; + } + + KASSERT(kn->kn_influx < UINT_MAX); + kn->kn_influx++; + + return true; +} + +static inline bool +kn_leave_flux(struct knote *kn) +{ + KASSERT(mutex_owned(&kn->kn_kq->kq_lock)); + KASSERT(kn->kn_influx > 0); + kn->kn_influx--; + return kn->kn_influx == 0; +} + +static void +kn_wait_flux(struct knote *kn, bool can_loop) +{ + bool loop; + + KASSERT(mutex_owned(&kn->kn_kq->kq_lock)); + + /* + * It may not be safe for us to touch the knote again after + * dropping the kq_lock. The caller has let us know in + * 'can_loop'. + */ + for (loop = true; loop && kn->kn_influx != 0; loop = can_loop) { + KQ_FLUX_WAIT(kn->kn_kq); + } +} + +#define KNOTE_WILLDETACH(kn) \ +do { \ + (kn)->kn_status |= KN_WILLDETACH; \ + (kn)->kn_kevent.udata = curlwp; \ +} while (/*CONSTCOND*/0) + +/* + * Wait until the specified knote is in a quiescent state and + * safe to detach. Returns true if we potentially blocked (and + * thus dropped our locks). + */ +static bool +knote_detach_quiesce(struct knote *kn) +{ + struct kqueue *kq = kn->kn_kq; + filedesc_t *fdp = kq->kq_fdp; + + KASSERT(mutex_owned(&fdp->fd_lock)); + + mutex_spin_enter(&kq->kq_lock); + /* + * There are two cases where we might see KN_WILLDETACH here: + * + * 1. Someone else has already started detaching the knote but + * had to wait for it to settle first. + * + * 2. We had to wait for it to settle, and had to come back + * around after re-acquiring the locks. + * + * When KN_WILLDETACH is set, we also set the LWP that claimed + * the prize of finishing the detach in the 'udata' field of the + * knote (which will never be used again for its usual purpose + * once the note is in this state). If it doesn't point to us, + * we must drop the locks and let them in to finish the job. + * + * Otherwise, once we have claimed the knote for ourselves, we + * can finish waiting for it to settle. This is the only scenario + * where touching a detaching knote is safe after dropping the + * locks. + */ + if ((kn->kn_status & KN_WILLDETACH) != 0 && + kn->kn_kevent.udata != curlwp) { + /* + * N.B. it is NOT safe for us to touch the knote again + * after dropping the locks here. The caller must go + * back around and re-validate everything. However, if + * the knote is in-flux, we want to block to minimize + * busy-looping. + */ + mutex_exit(&fdp->fd_lock); + if (kn_in_flux(kn)) { + kn_wait_flux(kn, false); + mutex_spin_exit(&kq->kq_lock); + return true; + } + mutex_spin_exit(&kq->kq_lock); + preempt_point(); + return true; + } + /* + * If we get here, we know that we will be claiming the + * detach responsibilities, or that we already have and + * this is the second attempt after re-validation. 
+ */ + KASSERT((kn->kn_status & KN_WILLDETACH) == 0 || + kn->kn_kevent.udata == curlwp); + /* + * Similarly, if we get here, either we are just claiming it + * and may have to wait for it to settle, or if this is the + * second attempt after re-validation that no other code paths + * have put it in-flux. + */ + KASSERT((kn->kn_status & KN_WILLDETACH) == 0 || + kn_in_flux(kn) == false); + KNOTE_WILLDETACH(kn); + if (kn_in_flux(kn)) { + mutex_exit(&fdp->fd_lock); + kn_wait_flux(kn, true); + /* + * It is safe for us to touch the knote again after + * dropping the locks, but the caller must still + * re-validate everything because other aspects of + * the environment may have changed while we blocked. + */ + KASSERT(kn_in_flux(kn) == false); + mutex_spin_exit(&kq->kq_lock); + return true; + } + mutex_spin_exit(&kq->kq_lock); + + return false; +} + static int filter_attach(struct knote *kn) { @@ -577,24 +756,9 @@ static int filt_procattach(struct knote *kn) { struct proc *p; - struct lwp *curl; - - curl = curlwp; mutex_enter(&proc_lock); - if (kn->kn_flags & EV_FLAG1) { - /* - * NOTE_TRACK attaches to the child process too early - * for proc_find, so do a raw look up and check the state - * explicitly. - */ - p = proc_find_raw(kn->kn_id); - if (p != NULL && p->p_stat != SIDL) - p = NULL; - } else { - p = proc_find(kn->kn_id); - } - + p = proc_find(kn->kn_id); if (p == NULL) { mutex_exit(&proc_lock); return ESRCH; @@ -606,7 +770,7 @@ filt_procattach(struct knote *kn) */ mutex_enter(p->p_lock); mutex_exit(&proc_lock); - if (kauth_authorize_process(curl->l_cred, + if (kauth_authorize_process(curlwp->l_cred, KAUTH_PROCESS_KEVENT_FILTER, p, NULL, NULL, NULL) != 0) { mutex_exit(p->p_lock); return EACCES; @@ -616,13 +780,11 @@ filt_procattach(struct knote *kn) kn->kn_flags |= EV_CLEAR; /* automatically set */ /* - * internal flag indicating registration done by kernel + * NOTE_CHILD is only ever generated internally; don't let it + * leak in from user-space. See knote_proc_fork_track(). */ - if (kn->kn_flags & EV_FLAG1) { - kn->kn_data = kn->kn_sdata; /* ppid */ - kn->kn_fflags = NOTE_CHILD; - kn->kn_flags &= ~EV_FLAG1; - } + kn->kn_sfflags &= ~NOTE_CHILD; + SLIST_INSERT_HEAD(&p->p_klist, kn, kn_selnext); mutex_exit(p->p_lock); @@ -642,91 +804,350 @@ filt_procattach(struct knote *kn) static void filt_procdetach(struct knote *kn) { + struct kqueue *kq = kn->kn_kq; struct proc *p; - if (kn->kn_status & KN_DETACHED) - return; - - p = kn->kn_obj; - - mutex_enter(p->p_lock); - SLIST_REMOVE(&p->p_klist, kn, knote, kn_selnext); - mutex_exit(p->p_lock); + /* + * We have to synchronize with knote_proc_exit(), but we + * are forced to acquire the locks in the wrong order here + * because we can't be sure kn->kn_obj is valid unless + * KN_DETACHED is not set. + */ + again: + mutex_spin_enter(&kq->kq_lock); + if ((kn->kn_status & KN_DETACHED) == 0) { + p = kn->kn_obj; + if (!mutex_tryenter(p->p_lock)) { + mutex_spin_exit(&kq->kq_lock); + preempt_point(); + goto again; + } + kn->kn_status |= KN_DETACHED; + SLIST_REMOVE(&p->p_klist, kn, knote, kn_selnext); + mutex_exit(p->p_lock); + } + mutex_spin_exit(&kq->kq_lock); } /* * Filter event method for EVFILT_PROC. + * + * Due to some of the complexities of process locking, we have special + * entry points for delivering knote submissions. filt_proc() is used + * only to check for activation from kqueue_register() and kqueue_scan(). 
*/ static int filt_proc(struct knote *kn, long hint) { - u_int event, fflag; - struct kevent kev; - struct kqueue *kq; - int error; + struct kqueue *kq = kn->kn_kq; + uint32_t fflags; - event = (u_int)hint & NOTE_PCTRLMASK; - kq = kn->kn_kq; - fflag = 0; + /* + * Because we share the same klist with signal knotes, just + * ensure that we're not being invoked for the proc-related + * submissions. + */ + KASSERT((hint & (NOTE_EXEC | NOTE_EXIT | NOTE_FORK)) == 0); - /* If the user is interested in this event, record it. */ - if (kn->kn_sfflags & event) - fflag |= event; + mutex_spin_enter(&kq->kq_lock); + fflags = kn->kn_fflags; + mutex_spin_exit(&kq->kq_lock); - if (event == NOTE_EXIT) { - struct proc *p = kn->kn_obj; + return fflags != 0; +} - if (p != NULL) - kn->kn_data = P_WAITSTATUS(p); - /* - * Process is gone, so flag the event as finished. - * - * Detach the knote from watched process and mark - * it as such. We can't leave this to kqueue_scan(), - * since the process might not exist by then. And we - * have to do this now, since psignal KNOTE() is called - * also for zombies and we might end up reading freed - * memory if the kevent would already be picked up - * and knote g/c'ed. - */ - filt_procdetach(kn); +void +knote_proc_exec(struct proc *p) +{ + struct knote *kn, *tmpkn; + struct kqueue *kq; + uint32_t fflags; + + mutex_enter(p->p_lock); + SLIST_FOREACH_SAFE(kn, &p->p_klist, kn_selnext, tmpkn) { + /* N.B. EVFILT_SIGNAL knotes are on this same list. */ + if (kn->kn_fop == &sig_filtops) { + continue; + } + KASSERT(kn->kn_fop == &proc_filtops); + + kq = kn->kn_kq; mutex_spin_enter(&kq->kq_lock); - kn->kn_status |= KN_DETACHED; - /* Mark as ONESHOT, so that the knote it g/c'ed when read */ - kn->kn_flags |= (EV_EOF | EV_ONESHOT); - kn->kn_fflags |= fflag; + fflags = (kn->kn_fflags |= (kn->kn_sfflags & NOTE_EXEC)); mutex_spin_exit(&kq->kq_lock); + if (fflags) { + knote_activate(kn); + } + } + + mutex_exit(p->p_lock); +} + +static int __noinline +knote_proc_fork_track(struct proc *p1, struct proc *p2, struct knote *okn) +{ + struct kqueue *kq = okn->kn_kq; + + KASSERT(mutex_owned(&kq->kq_lock)); + KASSERT(mutex_owned(p1->p_lock)); + + /* + * We're going to put this knote into flux while we drop + * the locks and create and attach a new knote to track the + * child. If we are not able to enter flux, then this knote + * is about to go away, so skip the notification. + */ + if (!kn_enter_flux(okn)) { + return 0; + } + + mutex_spin_exit(&kq->kq_lock); + mutex_exit(p1->p_lock); - return 1; + /* + * We actually have to register *two* new knotes: + * + * ==> One for the NOTE_CHILD notification. This is a forced + * ONESHOT note. + * + * ==> One to actually track the child process as it subsequently + * forks, execs, and, ultimately, exits. + * + * If we only register a single knote, then it's possible + * for the NOTE_CHILD and NOTE_EXIT to be collapsed into a single + * notification if the child exits before the tracking process + * has received the NOTE_CHILD notification, which applications + * aren't expecting (the event's 'data' field would be clobbered, + * for example). + * + * To do this, what we have here is an **extremely** stripped-down + * version of kqueue_register() that has the following properties: + * + * ==> Does not block to allocate memory. If we are unable + * to allocate memory, we return ENOMEM. + * + * ==> Does not search for existing knotes; we know there + * are not any because this is a new process that isn't + * even visible to other processes yet. 
+ * + * ==> Assumes that the knhash for our kq's descriptor table + * already exists (after all, we're already tracking + * processes with knotes if we got here). + * + * ==> Directly attaches the new tracking knote to the child + * process. + * + * The whole point is to do the minimum amount of work while the + * knote is held in-flux, and to avoid doing extra work in general + * (we already have the new child process; why bother looking it + * up again?). + */ + filedesc_t *fdp = kq->kq_fdp; + struct knote *knchild, *kntrack; + int error = 0; + + knchild = kmem_zalloc(sizeof(*knchild), KM_NOSLEEP); + kntrack = kmem_zalloc(sizeof(*knchild), KM_NOSLEEP); + if (__predict_false(knchild == NULL || kntrack == NULL)) { + error = ENOMEM; + goto out; + } + + kntrack->kn_obj = p2; + kntrack->kn_id = p2->p_pid; + kntrack->kn_kq = kq; + kntrack->kn_fop = okn->kn_fop; + kntrack->kn_kfilter = okn->kn_kfilter; + kntrack->kn_sfflags = okn->kn_sfflags; + kntrack->kn_sdata = p1->p_pid; + + kntrack->kn_kevent.ident = p2->p_pid; + kntrack->kn_kevent.filter = okn->kn_filter; + kntrack->kn_kevent.flags = + okn->kn_flags | EV_ADD | EV_ENABLE | EV_CLEAR; + kntrack->kn_kevent.fflags = 0; + kntrack->kn_kevent.data = 0; + kntrack->kn_kevent.udata = okn->kn_kevent.udata; /* preserve udata */ + + /* + * The child note does not need to be attached to the + * new proc's klist at all. + */ + *knchild = *kntrack; + knchild->kn_status = KN_DETACHED; + knchild->kn_sfflags = 0; + knchild->kn_kevent.flags |= EV_ONESHOT; + knchild->kn_kevent.fflags = NOTE_CHILD; + knchild->kn_kevent.data = p1->p_pid; /* parent */ + + mutex_enter(&fdp->fd_lock); + + /* + * We need to check to see if the kq is closing, and skip + * attaching the knote if so. Normally, this isn't necessary + * when coming in the front door because the file descriptor + * layer will synchronize this. + * + * It's safe to test KQ_CLOSING without taking the kq_lock + * here because that flag is only ever set when the fd_lock + * is also held. + */ + if (__predict_false(kq->kq_count & KQ_CLOSING)) { + mutex_exit(&fdp->fd_lock); + goto out; } + /* + * We do the "insert into FD table" and "attach to klist" steps + * in the opposite order of kqueue_register() here to avoid + * having to take p2->p_lock twice. But this is OK because we + * hold fd_lock across the entire operation. + */ + + mutex_enter(p2->p_lock); + error = kauth_authorize_process(curlwp->l_cred, + KAUTH_PROCESS_KEVENT_FILTER, p2, NULL, NULL, NULL); + if (__predict_false(error != 0)) { + mutex_exit(p2->p_lock); + mutex_exit(&fdp->fd_lock); + error = EACCES; + goto out; + } + SLIST_INSERT_HEAD(&p2->p_klist, kntrack, kn_selnext); + mutex_exit(p2->p_lock); + + KASSERT(fdp->fd_knhashmask != 0); + KASSERT(fdp->fd_knhash != NULL); + struct klist *list = &fdp->fd_knhash[KN_HASH(kntrack->kn_id, + fdp->fd_knhashmask)]; + SLIST_INSERT_HEAD(list, kntrack, kn_link); + SLIST_INSERT_HEAD(list, knchild, kn_link); + + /* This adds references for knchild *and* kntrack. 
*/ + atomic_add_int(&kntrack->kn_kfilter->refcnt, 2); + + knote_activate(knchild); + + kntrack = NULL; + knchild = NULL; + + mutex_exit(&fdp->fd_lock); + + out: + if (__predict_false(knchild != NULL)) { + kmem_free(knchild, sizeof(*knchild)); + } + if (__predict_false(kntrack != NULL)) { + kmem_free(kntrack, sizeof(*kntrack)); + } + mutex_enter(p1->p_lock); mutex_spin_enter(&kq->kq_lock); - if ((event == NOTE_FORK) && (kn->kn_sfflags & NOTE_TRACK)) { + + if (kn_leave_flux(okn)) { + KQ_FLUX_WAKEUP(kq); + } + + return error; +} + +void +knote_proc_fork(struct proc *p1, struct proc *p2) +{ + struct knote *kn; + struct kqueue *kq; + uint32_t fflags; + + mutex_enter(p1->p_lock); + + /* + * N.B. We DO NOT use SLIST_FOREACH_SAFE() here because we + * don't want to pre-fetch the next knote; in the event we + * have to drop p_lock, we will have put the knote in-flux, + * meaning that no one will be able to detach it until we + * have taken the knote out of flux. However, that does + * NOT stop someone else from detaching the next note in the + * list while we have it unlocked. Thus, we want to fetch + * the next note in the list only after we have re-acquired + * the lock, and using SLIST_FOREACH() will satisfy that. + */ + SLIST_FOREACH(kn, &p1->p_klist, kn_selnext) { + /* N.B. EVFILT_SIGNAL knotes are on this same list. */ + if (kn->kn_fop == &sig_filtops) { + continue; + } + KASSERT(kn->kn_fop == &proc_filtops); + + kq = kn->kn_kq; + mutex_spin_enter(&kq->kq_lock); + kn->kn_fflags |= (kn->kn_sfflags & NOTE_FORK); + if (__predict_false(kn->kn_sfflags & NOTE_TRACK)) { + /* + * This will drop kq_lock and p_lock and + * re-acquire them before it returns. + */ + if (knote_proc_fork_track(p1, p2, kn)) { + kn->kn_fflags |= NOTE_TRACKERR; + } + KASSERT(mutex_owned(p1->p_lock)); + KASSERT(mutex_owned(&kq->kq_lock)); + } + fflags = kn->kn_fflags; + mutex_spin_exit(&kq->kq_lock); + if (fflags) { + knote_activate(kn); + } + } + + mutex_exit(p1->p_lock); +} + +void +knote_proc_exit(struct proc *p) +{ + struct knote *kn; + struct kqueue *kq; + + KASSERT(mutex_owned(p->p_lock)); + + while (!SLIST_EMPTY(&p->p_klist)) { + kn = SLIST_FIRST(&p->p_klist); + kq = kn->kn_kq; + + KASSERT(kn->kn_obj == p); + + mutex_spin_enter(&kq->kq_lock); + kn->kn_data = P_WAITSTATUS(p); + /* + * Mark as ONESHOT, so that the knote is g/c'ed + * when read. + */ + kn->kn_flags |= (EV_EOF | EV_ONESHOT); + kn->kn_fflags |= kn->kn_sfflags & NOTE_EXIT; + /* - * Process forked, and user wants to track the new process, - * so attach a new knote to it, and immediately report an - * event with the parent's pid. Register knote with new - * process. + * Detach the knote from the process and mark it as such. + * N.B. EVFILT_SIGNAL are also on p_klist, but by the + * time we get here, all open file descriptors for this + * process have been released, meaning that signal knotes + * will have already been detached. + * + * We need to synchronize this with filt_procdetach(). 
*/ - memset(&kev, 0, sizeof(kev)); - kev.ident = hint & NOTE_PDATAMASK; /* pid */ - kev.filter = kn->kn_filter; - kev.flags = kn->kn_flags | EV_ADD | EV_ENABLE | EV_FLAG1; - kev.fflags = kn->kn_sfflags; - kev.data = kn->kn_id; /* parent */ - kev.udata = kn->kn_kevent.udata; /* preserve udata */ + KASSERT(kn->kn_fop == &proc_filtops); + if ((kn->kn_status & KN_DETACHED) == 0) { + kn->kn_status |= KN_DETACHED; + SLIST_REMOVE_HEAD(&p->p_klist, kn_selnext); + } mutex_spin_exit(&kq->kq_lock); - error = kqueue_register(kq, &kev); - mutex_spin_enter(&kq->kq_lock); - if (error != 0) - kn->kn_fflags |= NOTE_TRACKERR; - } - kn->kn_fflags |= fflag; - fflag = kn->kn_fflags; - mutex_spin_exit(&kq->kq_lock); - return fflag != 0; + /* + * Always activate the knote for NOTE_EXIT regardless + * of whether or not the listener cares about it. + * This matches historical behavior. + */ + knote_activate(kn); + } } static void @@ -1220,6 +1641,10 @@ kqueue_register(struct kqueue *kq, struc } } + /* It's safe to test KQ_CLOSING while holding only the fd_lock. */ + KASSERT(mutex_owned(&fdp->fd_lock)); + KASSERT((kq->kq_count & KQ_CLOSING) == 0); + /* * kn now contains the matching knote, or NULL if no match */ @@ -1285,7 +1710,17 @@ kqueue_register(struct kqueue *kq, struc ft ? ft->f_ops->fo_name : "?", error); #endif - /* knote_detach() drops fdp->fd_lock */ + /* + * N.B. no need to check for this note to + * be in-flux, since it was never visible + * to the monitored object. + * + * knote_detach() drops fdp->fd_lock + */ + mutex_enter(&kq->kq_lock); + KNOTE_WILLDETACH(kn); + KASSERT(kn_in_flux(kn) == false); + mutex_exit(&kq->kq_lock); knote_detach(kn, fdp, false); goto done; } @@ -1299,6 +1734,36 @@ kqueue_register(struct kqueue *kq, struc } if (kev->flags & EV_DELETE) { + /* + * Let the world know that this knote is about to go + * away, and wait for it to settle if it's currently + * in-flux. + */ + mutex_spin_enter(&kq->kq_lock); + if (kn->kn_status & KN_WILLDETACH) { + /* + * This knote is already on its way out, + * so just be done. + */ + mutex_spin_exit(&kq->kq_lock); + goto doneunlock; + } + KNOTE_WILLDETACH(kn); + if (kn_in_flux(kn)) { + mutex_exit(&fdp->fd_lock); + /* + * It's safe for us to conclusively wait for + * this knote to settle because we know we'll + * be completing the detach. + */ + kn_wait_flux(kn, true); + KASSERT(kn_in_flux(kn) == false); + mutex_spin_exit(&kq->kq_lock); + mutex_enter(&fdp->fd_lock); + } else { + mutex_spin_exit(&kq->kq_lock); + } + /* knote_detach() drops fdp->fd_lock */ knote_detach(kn, fdp, true); goto done; @@ -1355,10 +1820,46 @@ doneunlock: return (error); } -#if defined(DEBUG) #define KN_FMT(buf, kn) \ (snprintb((buf), sizeof(buf), __KN_FLAG_BITS, (kn)->kn_status), buf) +#if defined(DDB) +void +kqueue_printit(struct kqueue *kq, bool full, void (*pr)(const char *, ...)) +{ + const struct knote *kn; + u_int count; + int nmarker; + char buf[128]; + + count = 0; + nmarker = 0; + + (*pr)("kqueue %p (restart=%d count=%u):\n", kq, + !!(kq->kq_count & KQ_RESTART), KQ_COUNT(kq)); + (*pr)(" Queued knotes:\n"); + TAILQ_FOREACH(kn, &kq->kq_head, kn_tqe) { + if (kn->kn_status & KN_MARKER) { + nmarker++; + } else { + count++; + } + (*pr)(" knote %p: kq=%p status=%s\n", + kn, kn->kn_kq, KN_FMT(buf, kn)); + (*pr)(" id=0x%lx (%lu) filter=%d\n", + (u_long)kn->kn_id, (u_long)kn->kn_id, kn->kn_filter); + if (kn->kn_kq != kq) { + (*pr)(" !!! kn->kn_kq != kq\n"); + } + } + if (count != KQ_COUNT(kq)) { + (*pr)(" !!! 
count(%u) != KQ_COUNT(%u)\n", + count, KQ_COUNT(kq)); + } +} +#endif /* DDB */ + +#if defined(DEBUG) static void kqueue_check(const char *func, size_t line, const struct kqueue *kq) { @@ -1368,7 +1869,6 @@ kqueue_check(const char *func, size_t li char buf[128]; KASSERT(mutex_owned(&kq->kq_lock)); - KASSERT(KQ_COUNT(kq) < UINT_MAX / 2); count = 0; nmarker = 0; @@ -1389,7 +1889,7 @@ kqueue_check(const char *func, size_t li } count++; if (count > KQ_COUNT(kq)) { - panic("%s,%zu: kq=%p kq->kq_count(%d) != " + panic("%s,%zu: kq=%p kq->kq_count(%u) != " "count(%d), nmarker=%d", func, line, kq, KQ_COUNT(kq), count, nmarker); @@ -1461,6 +1961,7 @@ kqueue_scan(file_t *fp, size_t maxevents memset(&morker, 0, sizeof(morker)); marker = &morker; + marker->kn_kq = kq; marker->kn_status = KN_MARKER; mutex_spin_enter(&kq->kq_lock); retry: @@ -1498,21 +1999,47 @@ kqueue_scan(file_t *fp, size_t maxevents * Acquire the fdp->fd_lock interlock to avoid races with * file creation/destruction from other threads. */ -relock: mutex_spin_exit(&kq->kq_lock); +relock: mutex_enter(&fdp->fd_lock); mutex_spin_enter(&kq->kq_lock); while (count != 0) { - kn = TAILQ_FIRST(&kq->kq_head); /* get next knote */ + /* + * Get next knote. We are guaranteed this will never + * be NULL because of the marker we inserted above. + */ + kn = TAILQ_FIRST(&kq->kq_head); - if ((kn->kn_status & KN_MARKER) != 0 && kn != marker) { + bool kn_is_other_marker = + (kn->kn_status & KN_MARKER) != 0 && kn != marker; + bool kn_is_detaching = (kn->kn_status & KN_WILLDETACH) != 0; + bool kn_is_in_flux = kn_in_flux(kn); + + /* + * If we found a marker that's not ours, or this knote + * is in a state of flux, then wait for everything to + * settle down and go around again. + */ + if (kn_is_other_marker || kn_is_detaching || kn_is_in_flux) { if (influx) { influx = 0; KQ_FLUX_WAKEUP(kq); } mutex_exit(&fdp->fd_lock); - (void)cv_wait(&kq->kq_cv, &kq->kq_lock); + if (kn_is_other_marker || kn_is_in_flux) { + KQ_FLUX_WAIT(kq); + mutex_spin_exit(&kq->kq_lock); + } else { + /* + * Detaching but not in-flux? Someone is + * actively trying to finish the job; just + * go around and try again. + */ + KASSERT(kn_is_detaching); + mutex_spin_exit(&kq->kq_lock); + preempt_point(); + } goto relock; } @@ -1553,14 +2080,22 @@ relock: } if (rv == 0) { /* - * non-ONESHOT event that hasn't - * triggered again, so de-queue. + * non-ONESHOT event that hasn't triggered + * again, so it will remain de-queued. */ kn->kn_status &= ~(KN_ACTIVE|KN_BUSY); kq->kq_count--; influx = 1; continue; } + } else { + /* + * This ONESHOT note is going to be detached + * below. Mark the knote as not long for this + * world before we release the kq lock so that + * no one else will put it in a state of flux. 
+ */ + KNOTE_WILLDETACH(kn); } KASSERT(kn->kn_fop != NULL); touch = (!(kn->kn_fop->f_flags & FILTEROP_ISFD) && @@ -1578,6 +2113,9 @@ relock: /* delete ONESHOT events after retrieval */ kn->kn_status &= ~KN_BUSY; kq->kq_count--; + KASSERT(kn_in_flux(kn) == false); + KASSERT((kn->kn_status & KN_WILLDETACH) != 0 && + kn->kn_kevent.udata == curlwp); mutex_spin_exit(&kq->kq_lock); knote_detach(kn, fdp, true); mutex_enter(&fdp->fd_lock); @@ -1773,18 +2311,22 @@ kqueue_doclose(struct kqueue *kq, struct KASSERT(mutex_owned(&fdp->fd_lock)); + again: for (kn = SLIST_FIRST(list); kn != NULL;) { if (kq != kn->kn_kq) { kn = SLIST_NEXT(kn, kn_link); continue; } + if (knote_detach_quiesce(kn)) { + mutex_enter(&fdp->fd_lock); + goto again; + } knote_detach(kn, fdp, true); mutex_enter(&fdp->fd_lock); kn = SLIST_FIRST(list); } } - /* * fileops close method for a kqueue descriptor. */ @@ -1801,7 +2343,27 @@ kqueue_close(file_t *fp) fp->f_type = 0; fdp = curlwp->l_fd; + KASSERT(kq->kq_fdp == fdp); + mutex_enter(&fdp->fd_lock); + + /* + * We're going to drop the fd_lock multiple times while + * we detach knotes. During this time, attempts to register + * knotes via the back door (e.g. knote_proc_fork_track()) + * need to fail, lest they sneak in to attach a knote after + * we've already drained the list it's destined for. + * + * We must acquire kq_lock here to set KQ_CLOSING (to serialize + * with other code paths that modify kq_count without holding + * the fd_lock), but once this bit is set, it's only safe to + * test it while holding the fd_lock, and holding kq_lock while + * doing so is not necessary. + */ + mutex_enter(&kq->kq_lock); + kq->kq_count |= KQ_CLOSING; + mutex_exit(&kq->kq_lock); + for (i = 0; i <= fdp->fd_lastkqfile; i++) { if ((ff = fdp->fd_dt->dt_ff[i]) == NULL) continue; @@ -1812,8 +2374,15 @@ kqueue_close(file_t *fp) kqueue_doclose(kq, &fdp->fd_knhash[i], -1); } } + mutex_exit(&fdp->fd_lock); +#if defined(DEBUG) + mutex_enter(&kq->kq_lock); + kq_check(kq); + mutex_exit(&kq->kq_lock); +#endif /* DEBUG */ + KASSERT(TAILQ_EMPTY(&kq->kq_head)); KASSERT(KQ_COUNT(kq) == 0); mutex_destroy(&kq->kq_lock); cv_destroy(&kq->kq_cv); @@ -1875,10 +2444,14 @@ knote_fdclose(int fd) struct knote *kn; filedesc_t *fdp; + again: fdp = curlwp->l_fd; mutex_enter(&fdp->fd_lock); list = (struct klist *)&fdp->fd_dt->dt_ff[fd]->ff_knlist; while ((kn = SLIST_FIRST(list)) != NULL) { + if (knote_detach_quiesce(kn)) { + goto again; + } knote_detach(kn, fdp, true); mutex_enter(&fdp->fd_lock); } @@ -1898,9 +2471,10 @@ knote_detach(struct knote *kn, filedesc_ kq = kn->kn_kq; KASSERT((kn->kn_status & KN_MARKER) == 0); + KASSERT((kn->kn_status & KN_WILLDETACH) != 0); + KASSERT(kn->kn_fop != NULL); KASSERT(mutex_owned(&fdp->fd_lock)); - KASSERT(kn->kn_fop != NULL); /* Remove from monitored object. */ if (dofop) { filter_detach(kn); @@ -1917,8 +2491,10 @@ knote_detach(struct knote *kn, filedesc_ /* Remove from kqueue. */ again: mutex_spin_enter(&kq->kq_lock); + KASSERT(kn_in_flux(kn) == false); if ((kn->kn_status & KN_QUEUED) != 0) { kq_check(kq); + KASSERT(KQ_COUNT(kq) != 0); kq->kq_count--; TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); kn->kn_status &= ~KN_QUEUED; @@ -1949,6 +2525,10 @@ knote_enqueue(struct knote *kn) kq = kn->kn_kq; mutex_spin_enter(&kq->kq_lock); + if (__predict_false(kn->kn_status & KN_WILLDETACH)) { + /* Don't bother enqueueing a dying knote. 
*/ + goto out; + } if ((kn->kn_status & KN_DISABLED) != 0) { kn->kn_status &= ~KN_DISABLED; } @@ -1956,11 +2536,13 @@ knote_enqueue(struct knote *kn) kq_check(kq); kn->kn_status |= KN_QUEUED; TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe); + KASSERT(KQ_COUNT(kq) < KQ_MAXCOUNT); kq->kq_count++; kq_check(kq); cv_broadcast(&kq->kq_cv); selnotify(&kq->kq_sel, 0, NOTE_SUBMIT); } + out: mutex_spin_exit(&kq->kq_lock); } /* @@ -1976,15 +2558,21 @@ knote_activate(struct knote *kn) kq = kn->kn_kq; mutex_spin_enter(&kq->kq_lock); + if (__predict_false(kn->kn_status & KN_WILLDETACH)) { + /* Don't bother enqueueing a dying knote. */ + goto out; + } kn->kn_status |= KN_ACTIVE; if ((kn->kn_status & (KN_QUEUED | KN_DISABLED)) == 0) { kq_check(kq); kn->kn_status |= KN_QUEUED; TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe); + KASSERT(KQ_COUNT(kq) < KQ_MAXCOUNT); kq->kq_count++; kq_check(kq); cv_broadcast(&kq->kq_cv); selnotify(&kq->kq_sel, 0, NOTE_SUBMIT); } + out: mutex_spin_exit(&kq->kq_lock); } Index: src/sys/kern/kern_exec.c diff -u src/sys/kern/kern_exec.c:1.509 src/sys/kern/kern_exec.c:1.510 --- src/sys/kern/kern_exec.c:1.509 Tue Sep 28 15:35:44 2021 +++ src/sys/kern/kern_exec.c Sun Oct 10 18:07:51 2021 @@ -1,4 +1,4 @@ -/* $NetBSD: kern_exec.c,v 1.509 2021/09/28 15:35:44 thorpej Exp $ */ +/* $NetBSD: kern_exec.c,v 1.510 2021/10/10 18:07:51 thorpej Exp $ */ /*- * Copyright (c) 2008, 2019, 2020 The NetBSD Foundation, Inc. @@ -62,7 +62,7 @@ */ #include <sys/cdefs.h> -__KERNEL_RCSID(0, "$NetBSD: kern_exec.c,v 1.509 2021/09/28 15:35:44 thorpej Exp $"); +__KERNEL_RCSID(0, "$NetBSD: kern_exec.c,v 1.510 2021/10/10 18:07:51 thorpej Exp $"); #include "opt_exec.h" #include "opt_execfmt.h" @@ -1367,8 +1367,17 @@ execve_runproc(struct lwp *l, struct exe pool_put(&exec_pool, data->ed_argp); - /* notify others that we exec'd */ - KNOTE(&p->p_klist, NOTE_EXEC); + /* + * Notify anyone who might care that we've exec'd. + * + * This is slightly racy; someone could sneak in and + * attach a knote after we've decided not to notify, + * or vice-versa, but that's not particularly bothersome. + * knote_proc_exec() will acquire p->p_lock as needed. + */ + if (!SLIST_EMPTY(&p->p_klist)) { + knote_proc_exec(p); + } kmem_free(epp->ep_hdr, epp->ep_hdrlen); Index: src/sys/kern/kern_exit.c diff -u src/sys/kern/kern_exit.c:1.291 src/sys/kern/kern_exit.c:1.292 --- src/sys/kern/kern_exit.c:1.291 Sat Dec 5 18:17:01 2020 +++ src/sys/kern/kern_exit.c Sun Oct 10 18:07:51 2021 @@ -1,4 +1,4 @@ -/* $NetBSD: kern_exit.c,v 1.291 2020/12/05 18:17:01 thorpej Exp $ */ +/* $NetBSD: kern_exit.c,v 1.292 2021/10/10 18:07:51 thorpej Exp $ */ /*- * Copyright (c) 1998, 1999, 2006, 2007, 2008, 2020 The NetBSD Foundation, Inc. @@ -67,7 +67,7 @@ */ #include <sys/cdefs.h> -__KERNEL_RCSID(0, "$NetBSD: kern_exit.c,v 1.291 2020/12/05 18:17:01 thorpej Exp $"); +__KERNEL_RCSID(0, "$NetBSD: kern_exit.c,v 1.292 2021/10/10 18:07:51 thorpej Exp $"); #include "opt_ktrace.h" #include "opt_dtrace.h" @@ -435,16 +435,6 @@ exit1(struct lwp *l, int exitcode, int s proc_finispecific(p); /* - * Notify interested parties of our demise. - */ - KNOTE(&p->p_klist, NOTE_EXIT); - - SDT_PROBE(proc, kernel, , exit, - ((p->p_sflag & PS_COREDUMP) ? CLD_DUMPED : - (p->p_xsig ? CLD_KILLED : CLD_EXITED)), - 0,0,0,0); - - /* * Reset p_opptr pointer of all former children which got * traced by another process and were reparented. 
We reset * it to NULL here; the trace detach code then reparents @@ -509,6 +499,15 @@ exit1(struct lwp *l, int exitcode, int s */ p->p_stat = SDEAD; + /* + * Let anyone watching this DTrace probe know that we're + * on our way out. + */ + SDT_PROBE(proc, kernel, , exit, + ((p->p_sflag & PS_COREDUMP) ? CLD_DUMPED : + (p->p_xsig ? CLD_KILLED : CLD_EXITED)), + 0,0,0,0); + /* Put in front of parent's sibling list for parent to collect it */ old_parent = p->p_pptr; old_parent->p_nstopchild++; @@ -559,6 +558,19 @@ exit1(struct lwp *l, int exitcode, int s pcu_discard_all(l); mutex_enter(p->p_lock); + /* + * Notify other processes tracking us with a knote that + * we're exiting. + * + * N.B. we do this here because the process is now SDEAD, + * and thus cannot have any more knotes attached. Also, + * knote_proc_exit() expects that p->p_lock is already + * held (and will assert so). + */ + if (!SLIST_EMPTY(&p->p_klist)) { + knote_proc_exit(p); + } + /* Free the LWP ID */ proc_free_lwpid(p, l->l_lid); lwp_drainrefs(l); Index: src/sys/kern/kern_fork.c diff -u src/sys/kern/kern_fork.c:1.226 src/sys/kern/kern_fork.c:1.227 --- src/sys/kern/kern_fork.c:1.226 Sat May 23 23:42:43 2020 +++ src/sys/kern/kern_fork.c Sun Oct 10 18:07:51 2021 @@ -1,4 +1,4 @@ -/* $NetBSD: kern_fork.c,v 1.226 2020/05/23 23:42:43 ad Exp $ */ +/* $NetBSD: kern_fork.c,v 1.227 2021/10/10 18:07:51 thorpej Exp $ */ /*- * Copyright (c) 1999, 2001, 2004, 2006, 2007, 2008, 2019 @@ -68,7 +68,7 @@ */ #include <sys/cdefs.h> -__KERNEL_RCSID(0, "$NetBSD: kern_fork.c,v 1.226 2020/05/23 23:42:43 ad Exp $"); +__KERNEL_RCSID(0, "$NetBSD: kern_fork.c,v 1.227 2021/10/10 18:07:51 thorpej Exp $"); #include "opt_ktrace.h" #include "opt_dtrace.h" @@ -547,7 +547,7 @@ fork1(struct lwp *l1, int flags, int exi */ if (!SLIST_EMPTY(&p1->p_klist)) { mutex_exit(&proc_lock); - KNOTE(&p1->p_klist, NOTE_FORK | p2->p_pid); + knote_proc_fork(p1, p2); mutex_enter(&proc_lock); } Index: src/sys/sys/event.h diff -u src/sys/sys/event.h:1.43 src/sys/sys/event.h:1.44 --- src/sys/sys/event.h:1.43 Sun Sep 26 21:29:39 2021 +++ src/sys/sys/event.h Sun Oct 10 18:07:51 2021 @@ -1,4 +1,4 @@ -/* $NetBSD: event.h,v 1.43 2021/09/26 21:29:39 thorpej Exp $ */ +/* $NetBSD: event.h,v 1.44 2021/10/10 18:07:51 thorpej Exp $ */ /*- * Copyright (c) 1999,2000,2001 Jonathan Lemon <jle...@freebsd.org> @@ -246,6 +246,7 @@ struct knote { struct kfilter *kn_kfilter; void *kn_hook; int kn_hookid; + unsigned int kn_influx; /* q: in-flux counter */ #define KN_ACTIVE 0x01U /* event has been triggered */ #define KN_QUEUED 0x02U /* event is on queue */ @@ -253,6 +254,7 @@ struct knote { #define KN_DETACHED 0x08U /* knote is detached */ #define KN_MARKER 0x10U /* is a marker */ #define KN_BUSY 0x20U /* is being scanned */ +#define KN_WILLDETACH 0x40U /* being detached imminently */ /* Toggling KN_BUSY also requires kn_kq->kq_fdp->fd_lock. 
*/ #define __KN_FLAG_BITS \ "\20" \ @@ -261,7 +263,8 @@ struct knote { "\3DISABLED" \ "\4DETACHED" \ "\5MARKER" \ - "\6BUSY" + "\6BUSY" \ + "\7WILLDETACH" #define kn_id kn_kevent.ident Index: src/sys/sys/eventvar.h diff -u src/sys/sys/eventvar.h:1.9 src/sys/sys/eventvar.h:1.10 --- src/sys/sys/eventvar.h:1.9 Sun May 2 19:13:43 2021 +++ src/sys/sys/eventvar.h Sun Oct 10 18:07:51 2021 @@ -1,4 +1,4 @@ -/* $NetBSD: eventvar.h,v 1.9 2021/05/02 19:13:43 jdolecek Exp $ */ +/* $NetBSD: eventvar.h,v 1.10 2021/10/10 18:07:51 thorpej Exp $ */ /*- * Copyright (c) 1999,2000 Jonathan Lemon <jle...@freebsd.org> @@ -51,9 +51,21 @@ struct kqueue { filedesc_t *kq_fdp; struct selinfo kq_sel; kcondvar_t kq_cv; - u_int kq_count; /* number of pending events */ -#define KQ_RESTART 0x80000000 /* force ERESTART */ -#define KQ_COUNT(kq) ((kq)->kq_count & ~KQ_RESTART) + uint32_t kq_count; /* number of pending events */ }; +#define KQ_RESTART __BIT(31) /* force ERESTART */ +#define KQ_CLOSING __BIT(30) /* kqueue is closing for good */ +#define KQ_MAXCOUNT __BITS(0,29) +#define KQ_COUNT(kq) ((unsigned int)((kq)->kq_count & KQ_MAXCOUNT)) + +#ifdef _KERNEL + +#if defined(DDB) +void kqueue_printit(struct kqueue *, bool, + void (*)(const char *, ...)); +#endif /* DDB */ + +#endif /* _KERNEL */ + #endif /* !_SYS_EVENTVAR_H_ */ Index: src/sys/sys/proc.h diff -u src/sys/sys/proc.h:1.368 src/sys/sys/proc.h:1.369 --- src/sys/sys/proc.h:1.368 Sat Dec 5 18:17:01 2020 +++ src/sys/sys/proc.h Sun Oct 10 18:07:51 2021 @@ -1,4 +1,4 @@ -/* $NetBSD: proc.h,v 1.368 2020/12/05 18:17:01 thorpej Exp $ */ +/* $NetBSD: proc.h,v 1.369 2021/10/10 18:07:51 thorpej Exp $ */ /*- * Copyright (c) 2006, 2007, 2008, 2020 The NetBSD Foundation, Inc. @@ -562,6 +562,15 @@ void proc_setspecific(struct proc *, spe int proc_compare(const struct proc *, const struct lwp *, const struct proc *, const struct lwp *); +/* + * Special handlers for delivering EVFILT_PROC notifications. These + * exist to handle some of the special locking considerations around + * processes. + */ +void knote_proc_exec(struct proc *); +void knote_proc_fork(struct proc *, struct proc *); +void knote_proc_exit(struct proc *); + int proclist_foreach_call(struct proclist *, int (*)(struct proc *, void *arg), void *);
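For reference, this is roughly what the EVFILT_PROC/NOTE_TRACK machinery rewired by this commit looks like from userland. A minimal sketch with abbreviated error handling; it papers over the race between fork() and knote registration with a sleep(), and the NOTE_CHILD event it reports is exactly the one the old implementation could clobber with NOTE_EXIT:

#include <sys/event.h>
#include <sys/time.h>
#include <sys/types.h>
#include <err.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	struct kevent ev;
	pid_t pid;
	int kq;

	if ((kq = kqueue()) == -1)
		err(1, "kqueue");

	if ((pid = fork()) == -1)
		err(1, "fork");
	if (pid == 0) {
		sleep(1);		/* let the parent register first */
		if (fork() == 0)	/* grandchild, seen via NOTE_TRACK */
			_exit(0);
		_exit(0);
	}

	/* Watch the child; NOTE_TRACK extends the watch to its children. */
	EV_SET(&ev, pid, EVFILT_PROC, EV_ADD | EV_ENABLE,
	    NOTE_EXIT | NOTE_FORK | NOTE_TRACK, 0, NULL);
	if (kevent(kq, &ev, 1, NULL, 0, NULL) == -1)
		err(1, "kevent: register");

	for (;;) {
		if (kevent(kq, NULL, 0, &ev, 1, NULL) == -1)
			err(1, "kevent: wait");
		if (ev.fflags & NOTE_FORK)
			printf("pid %d forked\n", (int)ev.ident);
		if (ev.fflags & NOTE_CHILD)	/* ident = child, data = parent */
			printf("pid %d is a new child of %d\n",
			    (int)ev.ident, (int)ev.data);
		if (ev.fflags & NOTE_EXIT) {
			printf("pid %d exited\n", (int)ev.ident);
			if ((pid_t)ev.ident == pid)
				break;		/* the tracked child is gone */
		}
	}
	return 0;
}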