Re: PROPOSAL: config_* with device_t references
> Date: Wed, 10 May 2023 01:08:27 + > From: Taylor R Campbell > > I propose to add new config_*_acquire/release functions: [...] Updated patch so that the legacy config_* operations require the caller to hold the kernel lock, while the new config_*_acquire/release ones do not (but take it internally until we make autoconf internals MP-safe). This way you can eliminate KERNEL_LOCKs around finding and detaching children, without introducing races with concurrent drvctl -d: // no KERNEL_LOCK(1, NULL); dev = config_found_acquire(...); mutex_enter(&sc->sc_childlock); sc->sc_dev = dev; mutex_exit(&sc->sc_childlock); device_release(dev); // no KERNEL_UNLOCK_ONE(NULL); // no KERNEL_LOCK(1, NULL); mutex_enter(&sc->sc_childlock); if ((dev = sc->sc_dev) != NULL) device_acquire(dev); mutex_exit(&sc->sc_childlock); config_detach_release(dev, ...); // no KERNEL_UNLOCK_ONE(NULL); Also included: - Conversion of dk(4) to config_*_acquire/release -- lightly tested. - Conversion of usb(4) to config_*_acquire/release -- compile-tested. (Coming up is a series of other changes to fix related races in dk(4) and simplify some of the logic. Also need to think about how devices like vnd(4), cgd(4), &c., can safely work -- these logical drivers sometimes do config_attach_pseudo in open, and sometimes do config_detach in close, so their needs are a bit different from drivers for physical hardware devices.) >From baba80e17fbb254e50b242d71dc7e1de9bdc5520 Mon Sep 17 00:00:00 2001 From: Taylor R Campbell Date: Fri, 26 Aug 2022 10:35:08 + Subject: [PATCH] autoconf(9): New functions for referenced attach/detach. New functions: - config_found_acquire(dev, aux, print, cfargs) - config_attach_acquire(parent, cf, aux, print, cfargs) - config_attach_pseudo_acquire(cf, aux) - config_detach_release(dev, flags) - device_acquire(dev) The config_*_acquire functions are like the non-acquire versions, but they return a referenced device_t, which is guaranteed to be safe to use until released. The device's detach function may run while it is referenced, but the device_t will not be freed and the parent's .ca_childdetached routine will not be called. => config_attach_pseudo_acquire additionally lets you pass an aux argument to the device's .ca_attach routine, unlike config_attach_pseudo which always passes NULL. => Eventually, config_found, config_attach, and config_attach_pseudo should be made to return void, because use of the device_t they return is unsafe without the kernel lock and difficult to use safely even with the kernel lock or in a UP system. For now, they require the caller to hold the kernel lock, while config_*_acquire do not. config_detach_release is like device_release and then config_detach, but avoids the race inherent with that sequence. => Eventually, config_detach should be eliminated, because getting at the device_t it needs is unsafe without the kernel lock and difficult to use safely even with the kernel lock or in a UP system. For now, it requires the caller to hold the kernel lock, while config_detach_release does not. device_acquire acquires a reference to a device. It never fails and can be used in thread context (but not interrupt context, hard or soft). Caller is responsible for ensuring that the device_t cannot be freed; in other words, the device_t must be made unavailable to any device_acquire callers before the .ca_detach function returns. Typically device_acquire will be used in a read section (mutex, rwlock, pserialize, &c.) in a data structure lookup, with corresponding logic in the .ca_detach function to remove the device from the data structure and wait for all read sections to complete. --- sys/kern/subr_autoconf.c | 185 --- sys/sys/device.h | 7 ++ 2 files changed, 181 insertions(+), 11 deletions(-) diff --git a/sys/kern/subr_autoconf.c b/sys/kern/subr_autoconf.c index e60be2d7aad8..55c04ff8ca9e 100644 --- a/sys/kern/subr_autoconf.c +++ b/sys/kern/subr_autoconf.c @@ -1259,17 +1259,21 @@ static const char * const msgs[] = { * not configured, call the given `print' function and return NULL. */ device_t -config_found(device_t parent, void *aux, cfprint_t print, +config_found_acquire(device_t parent, void *aux, cfprint_t print, const struct cfargs * const cfargs) { cfdata_t cf; struct cfargs_internal store; const struct cfargs_internal * const args = cfargs_canonicalize(cfargs, &store); + device_t dev; + + KERNEL_LOCK(1, NULL); cf = config_search_internal(parent, aux, args); if (cf != NULL) { - return config_attach_internal(parent, cf, aux, print, args); + dev = config_attach_internal(parent, cf, aux, print, args); + goto out; } if (print) { @@ -1283,7
Re: PROPOSAL: config_* with device_t references
On Wed, May 10, 2023 at 01:08:27AM +, Taylor R Campbell wrote: > I propose to add new config_*_acquire/release functions: > dev = config_found_acquire(...); > ... > config_detach_release(dev); Seems reasonable... (note, haven't read the diff carefully yet) -- David A. Holland dholl...@netbsd.org
PROPOSAL: config_* with device_t references
I propose to add new config_*_acquire/release functions: dev = config_found_acquire(...); is essentially dev = config_found(...); device_acquire(dev); and config_detach_release(dev); is essentially device_release(dev); config_detach(dev); but they work atomically without races with, e.g., concurrent drvctl or physical device removal in order to eliminate a class of driver bugs. Similarly config_attach_acquire, config_attach_pseudo_acquire. Longer-term, we can eliminate the non-acquire/release versions (or make config_attach/found return void since that's safe) once every driver is converted, but adding new names enables the compiler to assist with auditing the conversion until we're done. Thoughts? DETAILS The logic sc->sc_child = config_found(...); and config_detach(sc->sc_child, flags); found in many drivers, is fundamentally buggy on any system with MP or kpreemption. It's buggy because, for example, a parallel call to `drvctl -d childN' can detach the device and free it, before sc->sc_child has been initialized on return from config_found or after it has been read but before control has entered config_detach. This can lead to use-after-free. The kernel lock provides weak protection against this class of bugs via KERNEL_LOCK(1, NULL); sc->sc_child = config_found(...); KERNEL_UNLOCK_ONE(NULL); because it has the opportunity to exclude concurrent .ca_childdetached callbacks. But we need to get more things off the kernel lock, not add new dependencies on it. And it only helps as long as nothing in config_found blocks (e.g., on an adaptive mutex) before it returns the device_t. We can rewrite this instead as: child = config_found_acquire(...); mutex_enter(&sc->sc_childlock); sc->sc_child = child; mutex_exit(&sc->sc_childlock); device_release(child); and mutex_enter(&sc->sc_childlock); child = sc->sc_child; device_acquire(child); mutex_exit(&sc->sc_childlock); config_detach_release(child); Here sc->sc_childlock is also taken by the device's .ca_childdetached routine: mutex_enter(&sc->sc_childlock); if (dev == sc->sc_child) sc->sc_child = NULL; mutex_exit(&sc->sc_childlock); Other use of the child, e.g. in a keyboard interrupt routine calling wskbd_input, can safely use sc_childlock to coordinate access to sc_child without worrying about spinning for the kernel lock or sleeping while another thread runs a detach routine -- it's only needed to serialize access to sc->sc_child, so it can use a driver's own fine-grained locks. (It's the driver's responsibility to ensure that .ca_detach routine can't return while device_acquire is still possible, e.g. by quiescing interrupt handlers that could call device_acquire.) A minor additional change in this is to pass a cookie through config_attach_pseudo_acquire as the aux argument to the .ca_attach function. >From 066f934f6c0dd3c3884b827d1b78f02c4b50fc3e Mon Sep 17 00:00:00 2001 From: Taylor R Campbell Date: Fri, 26 Aug 2022 10:35:08 + Subject: [PATCH] autoconf(9): New functions for referenced attach/detach. New functions: - config_found_acquire(dev, aux, print, cfargs) - config_attach_acquire(parent, cf, aux, print, cfargs) - config_attach_pseudo_acquire(cf, aux) - config_detach_release(dev, flags) - device_acquire(dev) The config_*_acquire functions are like the non-acquire versions, but they return a referenced device_t, which is guaranteed to be safe to use until released. The device's detach function may run while it is referenced, but the device_t will not be freed and the parent's .ca_childdetached routine will not be called. => config_attach_pseudo_acquire additionally lets you pass an aux argument to the device's .ca_attach routine, unlike config_attach_pseudo which always passes NULL. => Eventually, config_found, config_attach, and config_attach_pseudo should be made to return void, because use of the device_t they return is fundamentally unsafe. config_detach_release is like device_release and then config_detach, but avoids the race inherent with that sequence. => Eventually, config_detach should be eliminated, because it's fundamentally MP-unsafe (and difficult to use safely even in a UP system or with the kernel lock). device_acquire acquires a reference to a device. It never fails. The caller is responsible for ensuring that the device_t cannot be freed. Typically this will be used in a read section (mutex, rwlock, pserialize, &c.) in a data structure lookup, with corresponding logic in the .ca_detach function to remove the device from the data structure and wait for all read sections to complete. --- sys/kern/subr_autoconf.c | 137 +-- sys/sys/device.h | 7 ++ 2 files changed, 137 insertions(+), 7 deletions(-) diff --git a/sys/kern/su