> Date: Sun, 2 Apr 2017 11:09:49 +0200 > From: "J. Hannken-Illjes" <hann...@eis.cs.tu-bs.de> > > > On 1. Apr 2017, at 23:03, Taylor R Campbell > > <campbell+netbsd-tech-k...@mumble.net> wrote: > > > > Any particular reason to use a pointer-to-opaque-pointer here instead > > of a caller-allocated struct mountlist_iterator object? > > Just to make it opaque to the caller. I see no reason to make the > mountlist internals part of the kernel API and expose them down to > the file system implementations.
The caller-allocated struct mountlist_iterator could be made to contain just a single pointer as the current API uses. *shrug* Not a big deal either way. > As _mountlist_next() exists for DDB only (no locks, no allocs) I see > no problems here. One might be concerned in ddb with having to chase many pointers in a possibly corrupt kernel memory state, so there is some small advantage to avoiding list traversal. > Embedding the mountlist_entry in "struct mount" > forces us to use a "struct mount" as a marker and this struct > has a size of ~2.5 kBytes. By `embed' I didn't mean that the marker has to be a whole struct mount. Rather, something more like: struct mountlist_entry { TAILQ_ENTRY(mountlist_entry) mle_tqe; enum { MNTENT_MARKER, MNTENT_MOUNT } mle_tag; }; struct mount { ... struct mountlist_entry mnt_entry; ... }; TAILQ_HEAD(, mountlist_entry) mount_list; To recover the struct mount from a struct mountlist_entry that is tagged as a mount, simply do struct mountlist_entry *mle = ...; struct mount *mp = container_of(mle, struct mount, mnt_entry); > > Candidate for cacheline-aligned struct? Why do we need a new lock > > mount_list_lock for this instead of using the existing mountlist_lock? > > For the transition it is better to use a separate lock. After the > transition "mountlist_lock" gets removed. OK, fair enough. Can you add a comment over the declaration stating that plan? > > Beware: TAILQ_PREV (and TAILQ_LAST) violates strict aliasing rules, so > > we should probably try to phase it out rather than add new users. > > Maybe we should add a new kind of queue(3) that doesn't violate strict > > aliasing but supports *_PREV and *_LAST. > > > > All that said, it is not clear to me why you need reverse iteration > > here. None of your patches seem to use it. Are you planning to use > > it for some future purpose? > > Historical the mountlist gets traversed in mount order. Some syscalls > return in this order so it should be kept. > > Unmounting all file system on shutdown has to be done in reverse mount > order so here (vfs_unmount_forceone / vfs_unmountall1) we need it. I see -- your patch just hasn't addressed those yet. Maybe Someone^TM should just write a replacement for TAILQ so we can use it here now. > > Why do we need a new refcnt dance here? Why isn't it enough to just > > use vfs_busy like all the existing callers? > > Because vfs_busy() expects the caller to already hold a reference and > one of the goals here is to drop the mount_list_lock as early as possible. I suppose you want to avoid setting down a lock order for mount_list_lock and mp->mnt_unmounting. I guess this doesn't matter much since it's limited to vfs internals. > > Can you make another separate commit to drop the keepref parameter to > > vfs_unbusy, and change all the callers with keepref=true to just use > > vfs_destroy themselves? > > You mean the opposite (keepref=false -> caller does vfs_destroy), yes? > > Of the 81 usages of vfs_unbusy() only 6 have keepref=true, all others > would have to be changed to "vfs_unbusy(mp); vfs_destroy(mp);". Heh. This confusion on my part suggest two things: 1. Perhaps in this change, we ought to add a different function for use with mount iteration, say mountlist_iterator_next_done, so that any legacy confusion about the vfs_busy/unbusy API is hidden: mountlist_iterator_init(&it); while ((mp = mountlist_iterator_next(&it)) != NULL) { ... mountlist_iterator_next_done(&it, mp); } mountlist_iterator_destroy(&it); 2. Perhaps in another change, we ought to replace vfs_unbusy(mp, false) by vfs_unbusy(mp), and vfs_unbusy(mp, true) by vfs_acquire(mp); vfs_unbusy(mp); so that there is less confusing boolean-flagged asymmetry between vfs_busy/vfs_unbusy. Thus, the busy count (determining whether a mount can be unmounted from the file system namespace) and the reference count (determining whether the struct mount kernel data structures can be freed) would be maintained in parallel: /* file system mount point busy count (implies reference count too) */ error = vfs_busy(mp); if (error) goto fail; ... vfs_unbusy(mp); /* data structure reference count */ vfs_acquire(mp); ... vfs_destroy(mp); (Meanwhile, vfs_release would be a better name than vfs_destroy, in a subsequent change.)