Re: dangling vnode panic

2020-01-09 Thread Todd C. Miller
On Thu, 09 Jan 2020 23:03:37 +0100, Alexander Bluhm wrote:

> That is why I have to keep the list alive while flushing it out.
> So I came to the TAILQ solution.

It seems like the best solution right now.

> > The alternative would be to add another loop around the list processing. I
> > think that's worse because we don't know how many times to keep looping.
>
> There might be a race when TAILQ_FOREACH_SAFE reaches the end, next
> is NULL, but during the final sleep a new vnode is inserted.  Then
> we may fail to flush it.
>
> I think vflush() should use a while(!TAILQ_EMPTY()) logic.  But I
> don't want to change that in a huge diff.  There are other _SAFE
> macros that are not needed or even unsafe.  I want to convert to
> TAILQ first to be more flexible.  Reversing the list is enough to
> fix my existing panic.

That makes sense.

> > bluhm's diff is the best approach imo. ok me.
>
> I also think this is the easiest step in the right direction.

OK millert@

 - todd



Re: dangling vnode panic

2020-01-09 Thread Alexander Bluhm
On Thu, Jan 09, 2020 at 01:22:21PM -0500, Ted Unangst wrote:
> Martin Pieuchot wrote:
> > On 09/01/20(Thu) 10:46, Alexander Bluhm wrote:
> > > Without this diff my regress machines became so unstable that they
> > > often do not finish the test run.
> > >
> > > With this diff, they run fine.  No regressions.
> >
> > Sadly this is a workaround.  What is the issue?  We're missing a barrier
> > during umount?  How do other BSDs solve that?
> >
> > If your diff works because you changed a TAIL vs. a HEAD insert, and
> > you now know how to reproduce it, you could capture stack traces from
> > the function doing the insert.  Is it insmntque()?  Could we prevent
> > adding the vnode to the list if a dying flag is on the mount point?
>
> I don't think that's possible.  Syncing the filesystem can cause more work to
> be queued. If you prevent adding the vnode to the list, how will that work be
> done?

Exactly!  My first approach was to add a flag to prevent list
insertion during unmount.  I have added the diff below for reference.
But it does not work.

The first problem is that checkalias() calls insmntque(vp, mp).  It
does not provide an error path, but that could be fixed by changing
all callers in the file system specific code.  Also, creating a
device alias during unmount is unlikely.
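
For illustration, a minimal sketch of what an error-returning
insmntque() could have looked like.  The list operations match the
pre-TAILQ code; the error path is hypothetical and was never
committed:

/*
 * Hypothetical sketch: refuse list insertion during a forced
 * unmount.  Every caller, including checkalias(), would have to
 * check the return value for this to be safe.
 */
int
insmntque(struct vnode *vp, struct mount *mp)
{
	if (mp != NULL && (mp->mnt_flag & MNT_FORCE))
		return (EBUSY);	/* hypothetical error path */
	/* Delete from old mount point vnode list, if on one. */
	if (vp->v_mount != NULL)
		LIST_REMOVE(vp, v_mntvnodes);
	/* Insert into list of vnodes for the new mount point. */
	if ((vp->v_mount = mp) != NULL)
		LIST_INSERT_HEAD(&mp->mnt_vnodelist, vp, v_mntvnodes);
	return (0);
}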

But it fails miserably with softdeps.  When a buffer gets flushed,
new vnodes are allocated and added to the list.  As my diff prevents
vnode allocation during unmount, the metadata is not updated.  So
you end up with a clean file system where fsck -f finds wrong
reference counters.

That is why I have to keep the list alive while flushing it out.
So I came to the TAILQ solution.

> The alternative would be to add another loop around the list processing. I
> think that's worse because we don't know how many times to keep looping.

There might be a race when TAILQ_FOREACH_SAFE reaches the end, next
is NULL, but during the final sleep a new vnode is inserted.  Then
we may fail to flush it.
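
To make the window concrete, a sketch of the loop shape, simplified
from vfs_mount_foreach_vnode(); func and arg stand for the callback
that function runs, and where exactly the sleep happens is an
assumption:

TAILQ_FOREACH_SAFE(vp, &mp->mnt_vnodelist, v_mntvnodes, nvp) {
	/*
	 * nvp was latched before the body ran; on the last element
	 * it is NULL and the loop is committed to terminating.
	 */
	error = func(vp, arg);	/* may sleep while flushing */
	/*
	 * A vnode appended to the list during that sleep sits
	 * behind the latched NULL and is never visited.
	 */
}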

I think vflush() should use a while(!TAILQ_EMPTY()) logic.  But I
don't want to change that in a huge diff.  There are other _SAFE
macros that are not needed or even unsafe.  I want to convert to
TAILQ first to be more flexible.  Reversing the list is enough to
fix my existing panic.
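
A sketch of that shape (assumptions: each pass removes the vnode from
mnt_vnodelist, as vgonel() does today, otherwise this spins forever;
vflush_vnode() and va are as in the current vflush()):

while ((vp = TAILQ_FIRST(&mp->mnt_vnodelist)) != NULL) {
	if ((error = vflush_vnode(vp, &va)) != 0)	/* may sleep */
		break;
}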

> bluhm's diff is the best approach imo. ok me.

I also think this is the easiest step in the right direction.

bluhm

Index: kern/vfs_subr.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.297
diff -u -p -r1.297 vfs_subr.c
--- kern/vfs_subr.c 30 Dec 2019 13:49:40 -0000  1.297
+++ kern/vfs_subr.c 3 Jan 2020 13:13:42 -0000
@@ -401,6 +401,11 @@ getnewvnode(enum vtagtype tag, struct mo
toggle = 0;

s = splbio();
+   if (mp != NULL && mp->mnt_flag & MNT_FORCE) {
+   splx(s);
+   *vpp = NULLVP;
+   return (EIO);
+   }
if ((numvnodes < maxvnodes) ||
((TAILQ_FIRST(listhd = &vnode_free_list) == NULL) &&
((TAILQ_FIRST(listhd = &vnode_hold_list) == NULL) || toggle))) {
@@ -425,7 +430,7 @@ getnewvnode(enum vtagtype tag, struct mo
if (vp == NULL) {
splx(s);
tablefull("vnode");
-   *vpp = 0;
+   *vpp = NULLVP;
return (ENFILE);
}

@@ -480,8 +485,12 @@ insmntque(struct vnode *vp, struct mount
/*
 * Insert into list of vnodes for the new mount point, if available.
 */
-   if ((vp->v_mount = mp) != NULL)
+   if ((vp->v_mount = mp) != NULL) {
+   /* Do not insert new vnodes while forcefully unmounting. */
+   if (mp->mnt_flag & MNT_FORCE)
+   printf("%s: insert vnode during unmount\n", __func__);
LIST_INSERT_HEAD(&mp->mnt_vnodelist, vp, v_mntvnodes);
+   }
 }

 /*
Index: kern/vfs_syscalls.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/vfs_syscalls.c,v
retrieving revision 1.338
diff -u -p -r1.338 vfs_syscalls.c
--- kern/vfs_syscalls.c 29 Nov 2019 20:58:17 -0000  1.338
+++ kern/vfs_syscalls.c 1 Jan 2020 17:49:55 -0000
@@ -477,6 +477,9 @@ dounmount_leaf(struct mount *mp, int fla
int error;
int hadsyncer = 0;

+   KASSERT(vfs_isbusy(mp));
+   if (flags & MNT_FORCE)
+   mp->mnt_flag |= MNT_FORCE;
mp->mnt_flag &=~ MNT_ASYNC;
cache_purgevfs(mp); /* remove cache entries for this file sys */
if (mp->mnt_syncer != NULL) {
@@ -501,6 +504,7 @@ dounmount_leaf(struct mount *mp, int fla
if (error && !(flags & MNT_DOOMED)) {
if ((mp->mnt_flag & MNT_RDONLY) == 0 && hadsyncer)
(void) vfs_allocate_syncvnode(mp);
+   mp->mnt_flag &=~ MNT_FORCE;
vfs_unbusy(mp);

Re: dangling vnode panic

2020-01-09 Thread Ted Unangst
Martin Pieuchot wrote:
> On 09/01/20(Thu) 10:46, Alexander Bluhm wrote:
> > Without this diff my regress machines became so unstable that they
> > often do not finish the test run.
> > 
> > With this diff, they run fine.  No regressions.
> 
> Sadly this is a workaround.  What is the issue?  We're missing a barrier
> during umount?  How do other BSDs solve that?
> 
> If your diff works because you changed a TAIL vs. a HEAD insert, and
> you now know how to reproduce it, you could capture stack traces from
> the function doing the insert.  Is it insmntque()?  Could we prevent
> adding the vnode to the list if a dying flag is on the mount point?

I don't think that's possible.  Syncing the filesystem can cause more work to
be queued. If you prevent adding the vnode to the list, how will that work be
done?

The alternative would be to add another loop around the list processing. I
think that's worse because we don't know how many times to keep looping.

bluhm's diff is the best approach imo. ok me.



Re: dangling vnode panic

2020-01-09 Thread Martin Pieuchot
On 09/01/20(Thu) 10:46, Alexander Bluhm wrote:
> Without this diff my regress machines became so unstable that they
> often do not finish the test run.
> 
> With this diff, they run fine.  No regressions.

Sadly this is a workaround.  What is the issue?  We're missing a barrier
during umount?  How do other BSDs solve that?

If your diff works because you changed a TAIL vs. a HEAD insert, and
you now know how to reproduce it, you could capture stack traces from
the function doing the insert.  Is it insmntque()?  Could we prevent
adding the vnode to the list if a dying flag is on the mount point?

> Any ok?
> 
> bluhm
> 
> On Sat, Jan 04, 2020 at 10:55:46PM +0100, Alexander Bluhm wrote:
> > On my amd64 test machine regress/sys/kern/mount run-regress-unmount-busy
> > triggers the dangling vnode panic regularly.  Something has changed
> > in the previous months that makes it more likely.
> >
> > Problem is that while flushing the mnt_vnodelist list, the unmount
> > process sleeps.  Then active processes can write new dirty vnodes
> > to the list.  This happens without softdep.
> >
> > My first attempt was to disable insertions while unmounting.  That
> > does not work.  A buffer flush can cause softdep to add new dirty
> > vnodes to the list.
> >
> > This diff adds the dirty vnodes to the end of the list.  vflush()
> > trverses it from the begining.  Then new vnodes, that are added
> > during syncing, will be flushed.  This fixes my test.
> >
> > Please test with and without softdep.
> >
> > ok?
> >
> > bluhm
> >
> > Index: sys/kern/vfs_subr.c
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/kern/vfs_subr.c,v
> > retrieving revision 1.297
> > diff -u -p -r1.297 vfs_subr.c
> > --- sys/kern/vfs_subr.c 30 Dec 2019 13:49:40 -0000  1.297
> > +++ sys/kern/vfs_subr.c 4 Jan 2020 18:42:25 -0000
> > @@ -178,7 +178,7 @@ vfs_mount_alloc(struct vnode *vp, struct
> > rw_init_flags(&mp->mnt_lock, "vfslock", RWL_IS_VNODE);
> > (void)vfs_busy(mp, VB_READ|VB_NOWAIT);
> >
> > -   LIST_INIT(&mp->mnt_vnodelist);
> > +   TAILQ_INIT(&mp->mnt_vnodelist);
> > mp->mnt_vnodecovered = vp;
> >
> > atomic_inc_int(&vfsp->vfc_refcount);
> > @@ -476,12 +476,12 @@ insmntque(struct vnode *vp, struct mount
> >  * Delete from old mount point vnode list, if on one.
> >  */
> > if (vp->v_mount != NULL)
> > -   LIST_REMOVE(vp, v_mntvnodes);
> > +   TAILQ_REMOVE(&vp->v_mount->mnt_vnodelist, vp, v_mntvnodes);
> > /*
> >  * Insert into list of vnodes for the new mount point, if available.
> >  */
> > if ((vp->v_mount = mp) != NULL)
> > -   LIST_INSERT_HEAD(&mp->mnt_vnodelist, vp, v_mntvnodes);
> > +   TAILQ_INSERT_TAIL(&mp->mnt_vnodelist, vp, v_mntvnodes);
> >  }
> >
> >  /*
> > @@ -872,7 +872,7 @@ vfs_mount_foreach_vnode(struct mount *mp
> > int error = 0;
> >
> >  loop:
> > -   LIST_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {
> > +   TAILQ_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {
> > if (vp->v_mount != mp)
> > goto loop;
> >
> > @@ -1299,7 +1299,7 @@ printlockedvnodes(void)
> > TAILQ_FOREACH(mp, &mountlist, mnt_list) {
> > if (vfs_busy(mp, VB_READ|VB_NOWAIT))
> > continue;
> > -   LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> > +   TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> > if (VOP_ISLOCKED(vp))
> > vprint(NULL, vp);
> > }
> > @@ -2287,7 +2287,7 @@ vfs_mount_print(struct mount *mp, int fu
> > (*pr)("locked vnodes:");
> > /* XXX would take mountlist lock, except ddb has no context */
> > cnt = 0;
> > -   LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> > +   TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> > if (VOP_ISLOCKED(vp)) {
> > if (cnt == 0)
> > (*pr)("\n  %p", vp);
> > @@ -2304,7 +2304,7 @@ vfs_mount_print(struct mount *mp, int fu
> > (*pr)("all vnodes:");
> > /* XXX would take mountlist lock, except ddb has no context */
> > cnt = 0;
> > -   LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> > +   TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> > if (cnt == 0)
> >

Re: dangling vnode panic

2020-01-09 Thread Alexander Bluhm
Without this diff my regress machines became so unstable that they
often do not finish the test run.

With this diff, they run fine.  No regressions.

Any ok?

bluhm

On Sat, Jan 04, 2020 at 10:55:46PM +0100, Alexander Bluhm wrote:
> On my amd64 test machine regress/sys/kern/mount run-regress-unmount-busy
> triggers the dangling vnode panic regularly.  Something has changed
> in the previous months that makes it more likely.
>
> Problem is that while flushing the mnt_vnodelist list, the unmount
> process sleeps.  Then active processes can write new dirty vnodes
> to the list.  This happens without softdep.
>
> My first attempt was to disable insertions while unmounting.  That
> does not work.  A buffer flush can cause softdep to add new dirty
> vnodes to the list.
>
> This diff adds the dirty vnodes to the end of the list.  vflush()
> traverses it from the beginning.  Then new vnodes that are added
> during syncing will be flushed.  This fixes my test.
>
> Please test with and without softdep.
>
> ok?
>
> bluhm
>
> Index: sys/kern/vfs_subr.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/kern/vfs_subr.c,v
> retrieving revision 1.297
> diff -u -p -r1.297 vfs_subr.c
> --- sys/kern/vfs_subr.c   30 Dec 2019 13:49:40 -0000  1.297
> +++ sys/kern/vfs_subr.c   4 Jan 2020 18:42:25 -0000
> @@ -178,7 +178,7 @@ vfs_mount_alloc(struct vnode *vp, struct
>   rw_init_flags(&mp->mnt_lock, "vfslock", RWL_IS_VNODE);
>   (void)vfs_busy(mp, VB_READ|VB_NOWAIT);
>
> - LIST_INIT(&mp->mnt_vnodelist);
> + TAILQ_INIT(&mp->mnt_vnodelist);
>   mp->mnt_vnodecovered = vp;
>
>   atomic_inc_int(&vfsp->vfc_refcount);
> @@ -476,12 +476,12 @@ insmntque(struct vnode *vp, struct mount
>* Delete from old mount point vnode list, if on one.
>*/
>   if (vp->v_mount != NULL)
> - LIST_REMOVE(vp, v_mntvnodes);
> + TAILQ_REMOVE(&vp->v_mount->mnt_vnodelist, vp, v_mntvnodes);
>   /*
>* Insert into list of vnodes for the new mount point, if available.
>*/
>   if ((vp->v_mount = mp) != NULL)
> - LIST_INSERT_HEAD(&mp->mnt_vnodelist, vp, v_mntvnodes);
> + TAILQ_INSERT_TAIL(&mp->mnt_vnodelist, vp, v_mntvnodes);
>  }
>
>  /*
> @@ -872,7 +872,7 @@ vfs_mount_foreach_vnode(struct mount *mp
>   int error = 0;
>
>  loop:
> - LIST_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {
> + TAILQ_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {
>   if (vp->v_mount != mp)
>   goto loop;
>
> @@ -1299,7 +1299,7 @@ printlockedvnodes(void)
>   TAILQ_FOREACH(mp, &mountlist, mnt_list) {
>   if (vfs_busy(mp, VB_READ|VB_NOWAIT))
>   continue;
> - LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> + TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
>   if (VOP_ISLOCKED(vp))
>   vprint(NULL, vp);
>   }
> @@ -2287,7 +2287,7 @@ vfs_mount_print(struct mount *mp, int fu
>   (*pr)("locked vnodes:");
>   /* XXX would take mountlist lock, except ddb has no context */
>   cnt = 0;
> - LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> + TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
>   if (VOP_ISLOCKED(vp)) {
>   if (cnt == 0)
>   (*pr)("\n  %p", vp);
> @@ -2304,7 +2304,7 @@ vfs_mount_print(struct mount *mp, int fu
>   (*pr)("all vnodes:");
>   /* XXX would take mountlist lock, except ddb has no context */
>   cnt = 0;
> - LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> + TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
>   if (cnt == 0)
>   (*pr)("\n  %p", vp);
>   else if ((cnt % (72 / (sizeof(void *) * 2 + 4))) == 0)
> Index: sys/kern/vfs_syscalls.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/kern/vfs_syscalls.c,v
> retrieving revision 1.338
> diff -u -p -r1.338 vfs_syscalls.c
> --- sys/kern/vfs_syscalls.c   29 Nov 2019 20:58:17 -0000  1.338
> +++ sys/kern/vfs_syscalls.c   4 Jan 2020 18:41:20 -0000
> @@ -489,7 +489,7 @@ dounmount_leaf(struct mount *mp, int fla
>* Before calling file system unmount, make sure
>* all unveils to vnodes in here are dropped.
>*/
> - LIST_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {

dangling vnode panic

2020-01-04 Thread Alexander Bluhm
Hi,

On my amd64 test machine regress/sys/kern/mount run-regress-unmount-busy
triggers the dangling vnode panic regularly.  Something has changed
in the previous months that makes it more likely.

Problem is that while flushing the mnt_vnodelist list, the unmount
process sleeps.  Then active processes can write new dirty vnodes
to the list.  This happens without softdep.

My first attempt was to disable insertions while unmounting.  That
does not work.  A buffer flush can cause softdep to add new dirty
vnodes to the list.

This diff adds the dirty vnodes to the end of the list.  vflush()
traverses it from the beginning.  Then new vnodes that are added
during syncing will be flushed.  This fixes my test.
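
For illustration, a small userspace demonstration of the ordering
argument (plain sys/queue.h with a hypothetical node type; not
kernel code): a node appended at the tail during the traversal is
still visited, which is why the tail insert closes the race.

#include <sys/queue.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>

struct node {
	int id;
	TAILQ_ENTRY(node) entry;
};
TAILQ_HEAD(nodehead, node);

static struct node *
addtail(struct nodehead *h, int id)
{
	struct node *n;

	if ((n = malloc(sizeof(*n))) == NULL)
		err(1, NULL);
	n->id = id;
	TAILQ_INSERT_TAIL(h, n, entry);
	return (n);
}

int
main(void)
{
	struct nodehead head = TAILQ_HEAD_INITIALIZER(head);
	struct node *n, *nn;
	int appended = 0;

	addtail(&head, 1);
	addtail(&head, 2);

	/* "flush": visit and remove every node; append node 3
	 * mid-traversal the way insmntque() adds a dirty vnode */
	TAILQ_FOREACH_SAFE(n, &head, entry, nn) {
		printf("flush %d\n", n->id);
		if (!appended++)
			addtail(&head, 3);
		TAILQ_REMOVE(&head, n, entry);
		free(n);
	}
	/* prints 1 2 3; a head insert would never be reached,
	 * leaving the list non-empty at the dangling vnode check */
	return (0);
}

Note that an insert while sleeping on the very last element is still
missed, because the saved next pointer is already NULL; that is the
residual race discussed elsewhere in the thread, which a
while(!TAILQ_EMPTY()) rework would close.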

Please test with and without softdep.

ok?

bluhm

Index: sys/kern/vfs_subr.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.297
diff -u -p -r1.297 vfs_subr.c
--- sys/kern/vfs_subr.c 30 Dec 2019 13:49:40 -0000  1.297
+++ sys/kern/vfs_subr.c 4 Jan 2020 18:42:25 -0000
@@ -178,7 +178,7 @@ vfs_mount_alloc(struct vnode *vp, struct
rw_init_flags(&mp->mnt_lock, "vfslock", RWL_IS_VNODE);
(void)vfs_busy(mp, VB_READ|VB_NOWAIT);

-   LIST_INIT(&mp->mnt_vnodelist);
+   TAILQ_INIT(&mp->mnt_vnodelist);
mp->mnt_vnodecovered = vp;

atomic_inc_int(&vfsp->vfc_refcount);
@@ -476,12 +476,12 @@ insmntque(struct vnode *vp, struct mount
 * Delete from old mount point vnode list, if on one.
 */
if (vp->v_mount != NULL)
-   LIST_REMOVE(vp, v_mntvnodes);
+   TAILQ_REMOVE(&vp->v_mount->mnt_vnodelist, vp, v_mntvnodes);
/*
 * Insert into list of vnodes for the new mount point, if available.
 */
if ((vp->v_mount = mp) != NULL)
-   LIST_INSERT_HEAD(&mp->mnt_vnodelist, vp, v_mntvnodes);
+   TAILQ_INSERT_TAIL(&mp->mnt_vnodelist, vp, v_mntvnodes);
 }

 /*
@@ -872,7 +872,7 @@ vfs_mount_foreach_vnode(struct mount *mp
int error = 0;

 loop:
-   LIST_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {
+   TAILQ_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {
if (vp->v_mount != mp)
goto loop;

@@ -1299,7 +1299,7 @@ printlockedvnodes(void)
TAILQ_FOREACH(mp, &mountlist, mnt_list) {
if (vfs_busy(mp, VB_READ|VB_NOWAIT))
continue;
-   LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
+   TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
if (VOP_ISLOCKED(vp))
vprint(NULL, vp);
}
@@ -2287,7 +2287,7 @@ vfs_mount_print(struct mount *mp, int fu
(*pr)("locked vnodes:");
/* XXX would take mountlist lock, except ddb has no context */
cnt = 0;
-   LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
+   TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
if (VOP_ISLOCKED(vp)) {
if (cnt == 0)
(*pr)("\n  %p", vp);
@@ -2304,7 +2304,7 @@ vfs_mount_print(struct mount *mp, int fu
(*pr)("all vnodes:");
/* XXX would take mountlist lock, except ddb has no context */
cnt = 0;
-   LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
+   TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
if (cnt == 0)
(*pr)("\n  %p", vp);
else if ((cnt % (72 / (sizeof(void *) * 2 + 4))) == 0)
Index: sys/kern/vfs_syscalls.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/vfs_syscalls.c,v
retrieving revision 1.338
diff -u -p -r1.338 vfs_syscalls.c
--- sys/kern/vfs_syscalls.c 29 Nov 2019 20:58:17 -0000  1.338
+++ sys/kern/vfs_syscalls.c 4 Jan 2020 18:41:20 -0000
@@ -489,7 +489,7 @@ dounmount_leaf(struct mount *mp, int fla
 * Before calling file system unmount, make sure
 * all unveils to vnodes in here are dropped.
 */
-   LIST_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {
+   TAILQ_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {
unveil_removevnode(vp);
}

@@ -511,7 +511,7 @@ dounmount_leaf(struct mount *mp, int fla
vrele(coveredvp);
}

-   if (!LIST_EMPTY(&mp->mnt_vnodelist))
+   if (!TAILQ_EMPTY(&mp->mnt_vnodelist))
panic("unmount: dangling vnode");

vfs_unbusy(mp);
Index: sys/nfs/nfs_subs.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/nfs/nfs_subs.c,v
retrieving revision 1.140
diff -u -p -r1.140 nfs_subs.c
--- sys/nfs/nfs_subs.c  25 Dec 2019 12:31:12 -0000  1.140