Date: Wed, 8 Jun 2016 17:34:33 +0000
   From: David Holland <[email protected]>

   On Tue, Jun 07, 2016 at 06:28:11PM +0800, Paul Goyette wrote:
    > Can anyone suggest a reliable way to ensure that a device-driver module 
can
    > be _really_ safely detached?

   There's a pserialize scheme for this; see e.g. an old thread called
   "kicking everybody out of the softc".

Skimming that thread prompted me to write a sketch -- attached -- of
the per-CPU counts I've mentioned several times in this thread, which
is a bit simpler than dyoung@'s proposal back in 2010.

The reason I use per-CPU counts here instead of psref(9) or per-CPU
reference lists is to allow the caller of device_lookup_acquire to
migrate to another CPU before device_release, which is fairly likely
for operations that wait for I/O, but which is forbidden for psref(9).

   The catch for arbitrary drivers is that you need to do the enter
   business using the softc *before* using [cb]devsw to jump into the
   driver, but the current way all of that works is the other way
   around. This really needs to be restructured...

This is not a catch -- this is just the distinction between autoconf
device state (device_lookup) and userland-exposed device nodes
(bdev/cdev).  Detach both before unmapping the module code and you're
safe.

I think even the order doesn't even matter: if you detach the autoconf
device first, then xyzread's call to device_lookup will fail; if you
detach the devsw first, then no more calls to xyzread can happen.  But
there may be other constraints that imply some order, which I don't
have in my head at the moment.
struct device {
        ...
        struct percpu   *dv_usecounts;          /* int64_t */
        int64_t         *dv_usecount_total;
        bool            dv_dying;
        ...
};

/*
 * device_diediedie:
 *
 *      Prevent new references to dv and wait for all existing
 *      references to drain.  Caller must hold alldevs_mtx, which will
 *      be released during the wait.
 *
 *      XXX To be called from config_devunlink, I think, as the first
 *      thing it does.
 */
device_t
device_diediedie(device_t dev)
{
        int64_t total = 0;

        KASSERT(mutex_owned(&alldevs_mtx));
        KASSERT(!dev->dv_dying);

        /* Mark it complete and record where the total tally goes.  */
        dev->dv_usecount_total = &total;
        dev->dv_dying = true;

        /*
         * Count up the total on all CPUs.  xc_broadcast serves as a
         * global memory barrier to ensure visibility of dv_dying = 1
         * on all CPUs before the cv_wait.
         */
        mutex_exit(&alldevs_mtx);
        xc_broadcast(0, &device_usecount_xc, dev, NULL);
        mutex_enter(&alldevs_mtx);

        KASSERTMSG((0 <= total), "negatively referenced device: %p, %"PRId64,
            dev, total);
        while (total != 0)
                cv_wait(&device_release_cv, &alldevs_mtx);

        /* Cause subsequent attempts to add to the total to crash.  */
        dev->dv_usecount_total = NULL;
}

static void
device_usecount_xc(void *cookie0, void *cookie1 __unused)
{
        device_t dev = cookie0;
        int64_t *usecountp;

        mutex_enter(&alldevs_mtx);
        usecountp = percpu_getref(dev->dv_usecounts);
        *dev->dv_usecount_total += *usecountp;
        percpu_putref(dev->dv_usecounts);
        mutex_exit(&alldevs_mtx);
}

/*
 * device_lookup_acquire:
 *
 *      Look up a device instance for a given driver and add a
 *      reference to it.
 *
 *      If found, return the device private, and store the device
 *      pointer in *devp, which the caller must later release with
 *      device_release.  Otherwise, return NULL and store NULL in
 *      *devp.
 *
 *      XXX In principle, this could be done with pserialize(9) and
 *      involve no interprocessor synchronization at all, by changing
 *      the management of cd->cd_ndevs and cd->cd_devs.
 */
void *
device_lookup_acquire(cfdriver_t cd, int unit, device_t *devp)
{
        device_t dev;

        mutex_enter(&alldevs_mtx);
        if (unit < 0 || unit >= cd->cd_ndevs) {
                dev = NULL;
        } else if ((dev = cd->cd_devs[unit]) != NULL && dev->dv_del_gen != 0) {
                dev = NULL;
        } else if (dev->dv_dying) {
                dev = NULL;
        } else {
                int64_t *usecountp = percpu_getref(dev->dv_usecounts);
                *usecountp += 1;
                percpu_putref(dev->dv_usecounts);
        }
        mutex_exit(&alldevs_mtx);

        *devp = dev;
        return dev ? device_private(dev) : NULL;
}

/*
 * device_release:
 *
 *      Release a device acquired with device_lookup_acquire.  If dev
 *      is not in the process of dying, involves no interprocessor
 *      synchronization.
 */
void
device_release(device_t dev)
{
        int64_t *usecountp;

        if (__predict_false(dv->dv_dying)) {
                /*
                 * Slow path: adjust local delta and global total in
                 * serial with respect to device_usecount_xc so we
                 * don't inadvertently double-count.
                 *
                 * Broadcast unconditionally, not conditional on
                 * whether the local delta is zero, because our
                 * reference may have migrated from another CPU.
                 */
                mutex_enter(&alldevs_mtx);
                usecountp = percpu_getref(dev->dv_usecounts);
                *usecountp -= 1;
                percpu_putref(dev->dv_usecounts);
                *dev->dv_usecount_total -= 1;
                cv_broadcast(&device_release_cv);
                mutex_exit(&alldevs_mtx);
        } else {
                /* Fast path: adjust only local delta.  */
                usecountp = percpu_getref(dev->dv_usecounts);
                *usecountp -= 1;
                percpu_putref(dev->dv_usecounts);
        }
}

Reply via email to