Re: Add a mountlist iterator

Taylor R Campbell Sun, 02 Apr 2017 07:34:55 -0700

> Date: Sun, 2 Apr 2017 11:09:49 +0200
> From: "J. Hannken-Illjes" <hann...@eis.cs.tu-bs.de>
> 
> > On 1. Apr 2017, at 23:03, Taylor R Campbell 
> > <campbell+netbsd-tech-k...@mumble.net> wrote:
> > 
> > Any particular reason to use a pointer-to-opaque-pointer here instead
> > of a caller-allocated struct mountlist_iterator object?
> 
> Just to make it opaque to the caller.  I see no reason to make the
> mountlist internals part of the kernel API and expose them down to
> the file system implementations.


The caller-allocated struct mountlist_iterator could be made to
contain just a single pointer as the current API uses.

*shrug* Not a big deal either way.

> As _mountlist_next() exists for DDB only (no locks, no allocs) I see
> no problems here.

One might be concerned in ddb with having to chase many pointers in a
possibly corrupt kernel memory state, so there is some small advantage
to avoiding list traversal.

>                    Embedding the mountlist_entry in "struct mount"
> forces us to use a "struct mount" as a marker and this struct
> has a size of ~2.5 kBytes.

By `embed' I didn't mean that the marker has to be a whole struct
mount.  Rather, something more like:

        struct mountlist_entry {
                TAILQ_ENTRY(mountlist_entry)            mle_tqe;
                enum { MNTENT_MARKER, MNTENT_MOUNT }    mle_tag;
        };

        struct mount {
                ...
                struct mountlist_entry mnt_entry;
                ...
        };

        TAILQ_HEAD(, mountlist_entry) mount_list;

To recover the struct mount from a struct mountlist_entry that is
tagged as a mount, simply do

        struct mountlist_entry *mle = ...;
        struct mount *mp = container_of(mle, struct mount, mnt_entry);

> > Candidate for cacheline-aligned struct?  Why do we need a new lock
> > mount_list_lock for this instead of using the existing mountlist_lock?
> 
> For the transition it is better to use a separate lock.  After the
> transition "mountlist_lock" gets removed.

OK, fair enough.  Can you add a comment over the declaration stating
that plan?

> > Beware: TAILQ_PREV (and TAILQ_LAST) violates strict aliasing rules, so
> > we should probably try to phase it out rather than add new users.
> > Maybe we should add a new kind of queue(3) that doesn't violate strict
> > aliasing but supports *_PREV and *_LAST.
> > 
> > All that said, it is not clear to me why you need reverse iteration
> > here.  None of your patches seem to use it.  Are you planning to use
> > it for some future purpose?
> 
> Historical the mountlist gets traversed in mount order.  Some syscalls
> return in this order so it should be kept.
> 
> Unmounting all file system on shutdown has to be done in reverse mount
> order so here (vfs_unmount_forceone / vfs_unmountall1) we need it.

I see -- your patch just hasn't addressed those yet.  Maybe Someone^TM
should just write a replacement for TAILQ so we can use it here now.

> > Why do we need a new refcnt dance here?  Why isn't it enough to just
> > use vfs_busy like all the existing callers?
> 
> Because vfs_busy() expects the caller to already hold a reference and
> one of the goals here is to drop the mount_list_lock as early as possible.

I suppose you want to avoid setting down a lock order for
mount_list_lock and mp->mnt_unmounting.  I guess this doesn't matter
much since it's limited to vfs internals.

> > Can you make another separate commit to drop the keepref parameter to
> > vfs_unbusy, and change all the callers with keepref=true to just use
> > vfs_destroy themselves?
> 
> You mean the opposite (keepref=false -> caller does vfs_destroy), yes?
> 
> Of the 81 usages of vfs_unbusy() only 6 have keepref=true, all others
> would have to be changed to "vfs_unbusy(mp); vfs_destroy(mp);".

Heh.  This confusion on my part suggest two things:

1. Perhaps in this change, we ought to add a different function for
use with mount iteration, say mountlist_iterator_next_done, so that
any legacy confusion about the vfs_busy/unbusy API is hidden:

        mountlist_iterator_init(&it);
        while ((mp = mountlist_iterator_next(&it)) != NULL) {
                ...
                mountlist_iterator_next_done(&it, mp);
        }
        mountlist_iterator_destroy(&it);

2. Perhaps in another change, we ought to replace vfs_unbusy(mp,
false) by vfs_unbusy(mp), and vfs_unbusy(mp, true) by

vfs_acquire(mp);
vfs_unbusy(mp);

so that there is less confusing boolean-flagged asymmetry between
vfs_busy/vfs_unbusy.

Thus, the busy count (determining whether a mount can be unmounted
from the file system namespace) and the reference count (determining
whether the struct mount kernel data structures can be freed) would be
maintained in parallel:

        /* file system mount point busy count (implies reference count too) */
        error = vfs_busy(mp);
        if (error)
                goto fail;
        ...
        vfs_unbusy(mp);

        /* data structure reference count */
        vfs_acquire(mp);
        ...
        vfs_destroy(mp);

(Meanwhile, vfs_release would be a better name than vfs_destroy, in a
subsequent change.)

Re: Add a mountlist iterator

Reply via email to