On Thu, Jan 09, 2020 at 01:22:21PM -0500, Ted Unangst wrote:
> Martin Pieuchot wrote:
> > On 09/01/20(Thu) 10:46, Alexander Bluhm wrote:
> > > Without this diff my regress machines became so unstable, that they
> > > often do not finish the test run.
> > >
> > > With this diff, they run fine.  No regressions.
> >
> > Sadly this is a workaround.  What is the issue?  We're missing a barrier
> > during umount?  How did other BSDs solve that?
> >
> > If your diff works because you changed a TAIL vs a HEAD insert and you
> > know how to reproduce it, you could capture stack traces from the function
> > doing the insert.  Is it insmntque()?  Could we prevent adding the vnode
> > to the list if a dying flag is on the mount point?
>
> I don't think that's possible. syncing the filesystem can cause more work to
> be queued. If you prevent adding the vnode to the list, how will that work be
> done?

Exactly!  My first approach was to add a flag to prevent list
insertion during unmount.  I have added the diff below for reference.
But it does not work.

The first problem is that checkalias() calls insmntque(vp, mp).  It
does not provide an error path, but that could be fixed by changing
all callers in the file system specific code.  Also, creating a
device alias during unmount is unlikely.

But it fails miserably with softdeps.  When a buffer gets flushed,
new vnodes are allocated and added to the list.  As my diff prevents
vnode allocation during unmount, the metadata is not updated.  So
you end up with a clean file system where fsck -f finds wrong
reference counts.

That is why I have to keep the list alive while flushing it out.
So I came to the TAILQ solution.

> The alternative would be to add another loop around the list processing. I
> think that's worse because we don't know how many times to keep looping.

There might be a race when TAILQ_FOREACH_SAFE reaches the end: next
is NULL, but during the final sleep a new vnode is inserted.  Then
we may fail to flush it.

I think vflush() should use a while (!TAILQ_EMPTY()) logic.  But I
don't want to change that in a huge diff.  There are other _SAFE
macros that are not needed or even unsafe.  I want to convert to
TAILQ first to be more flexible.  Reversing the insertion order is
enough to fix my existing panic.

> bluhm's diff is the best approach imo. ok me.

I also think this is the easiest step in the right direction.

bluhm

Index: kern/vfs_subr.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.297
diff -u -p -r1.297 vfs_subr.c
--- kern/vfs_subr.c     30 Dec 2019 13:49:40 -0000      1.297
+++ kern/vfs_subr.c     3 Jan 2020 13:13:42 -0000
@@ -401,6 +401,11 @@ getnewvnode(enum vtagtype tag, struct mo
                toggle = 0;

        s = splbio();
+       if (mp != NULL && mp->mnt_flag & MNT_FORCE) {
+               splx(s);
+               *vpp = NULLVP;
+               return (EIO);
+       }
        if ((numvnodes < maxvnodes) ||
            ((TAILQ_FIRST(listhd = &vnode_free_list) == NULL) &&
            ((TAILQ_FIRST(listhd = &vnode_hold_list) == NULL) || toggle))) {
@@ -425,7 +430,7 @@ getnewvnode(enum vtagtype tag, struct mo
                if (vp == NULL) {
                        splx(s);
                        tablefull("vnode");
-                       *vpp = 0;
+                       *vpp = NULLVP;
                        return (ENFILE);
                }

@@ -480,8 +485,12 @@ insmntque(struct vnode *vp, struct mount
        /*
         * Insert into list of vnodes for the new mount point, if available.
         */
-       if ((vp->v_mount = mp) != NULL)
+       if ((vp->v_mount = mp) != NULL) {
+               /* Do not insert new vnodes while forcefully unmounting. */
+               if (mp->mnt_flag & MNT_FORCE)
+                       printf("%s: insert vnode during unmount\n", __func__);
                LIST_INSERT_HEAD(&mp->mnt_vnodelist, vp, v_mntvnodes);
+       }
 }

 /*
Index: kern/vfs_syscalls.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/vfs_syscalls.c,v
retrieving revision 1.338
diff -u -p -r1.338 vfs_syscalls.c
--- kern/vfs_syscalls.c 29 Nov 2019 20:58:17 -0000      1.338
+++ kern/vfs_syscalls.c 1 Jan 2020 17:49:55 -0000
@@ -477,6 +477,9 @@ dounmount_leaf(struct mount *mp, int fla
        int error;
        int hadsyncer = 0;

+       KASSERT(vfs_isbusy(mp));
+       if (flags & MNT_FORCE)
+               mp->mnt_flag |= MNT_FORCE;
        mp->mnt_flag &=~ MNT_ASYNC;
        cache_purgevfs(mp);     /* remove cache entries for this file sys */
        if (mp->mnt_syncer != NULL) {
@@ -501,6 +504,7 @@ dounmount_leaf(struct mount *mp, int fla
        if (error && !(flags & MNT_DOOMED)) {
                if ((mp->mnt_flag & MNT_RDONLY) == 0 && hadsyncer)
                        (void) vfs_allocate_syncvnode(mp);
+               mp->mnt_flag &=~ MNT_FORCE;
                vfs_unbusy(mp);
                return (error);
        }
