> On 2. Apr 2017, at 16:34, Taylor R Campbell > <campbell+netbsd-tech-k...@mumble.net> wrote: > >> Date: Sun, 2 Apr 2017 11:09:49 +0200 >> From: "J. Hannken-Illjes" <hann...@eis.cs.tu-bs.de> >> >>> On 1. Apr 2017, at 23:03, Taylor R Campbell >>> <campbell+netbsd-tech-k...@mumble.net> wrote: >>> >>> Any particular reason to use a pointer-to-opaque-pointer here instead >>> of a caller-allocated struct mountlist_iterator object? >> >> Just to make it opaque to the caller. I see no reason to make the >> mountlist internals part of the kernel API and expose them down to >> the file system implementations. > > The caller-allocated struct mountlist_iterator could be made to > contain just a single pointer as the current API uses. > > *shrug* Not a big deal either way. > >> As _mountlist_next() exists for DDB only (no locks, no allocs) I see >> no problems here. > > One might be concerned in ddb with having to chase many pointers in a > possibly corrupt kernel memory state, so there is some small advantage > to avoiding list traversal.
Not from the DDB prompt but from operations like "printlockedvnodes()" or "fstrans_dump()". If the mountlist is corrupt you're lost. > >> Embedding the mountlist_entry in "struct mount" >> forces us to use a "struct mount" as a marker and this struct >> has a size of ~2.5 kBytes. > > By `embed' I didn't mean that the marker has to be a whole struct > mount. Rather, something more like: > > struct mountlist_entry { > TAILQ_ENTRY(mountlist_entry) mle_tqe; > enum { MNTENT_MARKER, MNTENT_MOUNT } mle_tag; > }; > > struct mount { > ... > struct mountlist_entry mnt_entry; > ... > }; > > TAILQ_HEAD(, mountlist_entry) mount_list; > > To recover the struct mount from a struct mountlist_entry that is > tagged as a mount, simply do > > struct mountlist_entry *mle = ...; > struct mount *mp = container_of(mle, struct mount, mnt_entry); Any reason you want "mle_tqe" and "mle_tag" public? >>> Candidate for cacheline-aligned struct? Why do we need a new lock >>> mount_list_lock for this instead of using the existing mountlist_lock? >> >> For the transition it is better to use a separate lock. After the >> transition "mountlist_lock" gets removed. > > OK, fair enough. Can you add a comment over the declaration stating > that plan? > >>> Beware: TAILQ_PREV (and TAILQ_LAST) violates strict aliasing rules, so >>> we should probably try to phase it out rather than add new users. >>> Maybe we should add a new kind of queue(3) that doesn't violate strict >>> aliasing but supports *_PREV and *_LAST. >>> >>> All that said, it is not clear to me why you need reverse iteration >>> here. None of your patches seem to use it. Are you planning to use >>> it for some future purpose? >> >> Historical the mountlist gets traversed in mount order. Some syscalls >> return in this order so it should be kept. >> >> Unmounting all file system on shutdown has to be done in reverse mount >> order so here (vfs_unmount_forceone / vfs_unmountall1) we need it. > > I see -- your patch just hasn't addressed those yet. Maybe Someone^TM > should just write a replacement for TAILQ so we can use it here now. File 002_ml_iterator_switch around line 453 and 487. >>> Why do we need a new refcnt dance here? Why isn't it enough to just >>> use vfs_busy like all the existing callers? >> >> Because vfs_busy() expects the caller to already hold a reference and >> one of the goals here is to drop the mount_list_lock as early as possible. > > I suppose you want to avoid setting down a lock order for > mount_list_lock and mp->mnt_unmounting. I guess this doesn't matter > much since it's limited to vfs internals. > >>> Can you make another separate commit to drop the keepref parameter to >>> vfs_unbusy, and change all the callers with keepref=true to just use >>> vfs_destroy themselves? >> >> You mean the opposite (keepref=false -> caller does vfs_destroy), yes? >> >> Of the 81 usages of vfs_unbusy() only 6 have keepref=true, all others >> would have to be changed to "vfs_unbusy(mp); vfs_destroy(mp);". > > Heh. This confusion on my part suggest two things: > > 1. Perhaps in this change, we ought to add a different function for > use with mount iteration, say mountlist_iterator_next_done, so that > any legacy confusion about the vfs_busy/unbusy API is hidden: > > mountlist_iterator_init(&it); > while ((mp = mountlist_iterator_next(&it)) != NULL) { > ... > mountlist_iterator_next_done(&it, mp); > } > mountlist_iterator_destroy(&it); Will think about it -- looks ok so far. > 2. Perhaps in another change, we ought to replace vfs_unbusy(mp, > false) by vfs_unbusy(mp), and vfs_unbusy(mp, true) by > > vfs_acquire(mp); > vfs_unbusy(mp); > > so that there is less confusing boolean-flagged asymmetry between > vfs_busy/vfs_unbusy. > > Thus, the busy count (determining whether a mount can be unmounted > from the file system namespace) and the reference count (determining > whether the struct mount kernel data structures can be freed) would be > maintained in parallel: > > /* file system mount point busy count (implies reference count too) */ > error = vfs_busy(mp); > if (error) > goto fail; > ... > vfs_unbusy(mp); > > /* data structure reference count */ > vfs_acquire(mp); > ... > vfs_destroy(mp); dito. > (Meanwhile, vfs_release would be a better name than vfs_destroy, in a > subsequent change.) -- J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)