Re: dangling vnode panic
On Thu, 09 Jan 2020 23:03:37 +0100, Alexander Bluhm wrote:
> That is why I have to keep the list alive while flushing it out.
> So I came to the TAILQ solution.

It seems like the best solution right now.

> > The alternative would be to add another loop around the list
> > processing.  I think that's worse because we don't know how many
> > times to keep looping.
>
> There might be a race when TAILQ_FOREACH_SAFE reaches the end, next
> is NULL, but during the final sleep a new vnode is inserted.  Then
> we may miss flushing it.
>
> I think vflush() should use a while (!TAILQ_EMPTY()) logic.  But I
> don't want to change that in a huge diff.  There are other _SAFE
> macros that are not needed or even unsafe.  I want to convert to
> TAILQ first to be more flexible.  Reversing the list is enough to
> fix my existing panic.

That makes sense.

> > bluhm's diff is the best approach imo.  ok me.
>
> I also think this is the easiest step in the right direction.

OK millert@

 - todd
Re: dangling vnode panic
On Thu, Jan 09, 2020 at 01:22:21PM -0500, Ted Unangst wrote:
> Martin Pieuchot wrote:
> > On 09/01/20(Thu) 10:46, Alexander Bluhm wrote:
> > > Without this diff my regress machines became so unstable, that they
> > > often do not finish the test run.
> > >
> > > With this diff, they run fine.  No regressions.
> >
> > Sadly this is a workaround.  What is the issue?  We're missing a
> > barrier during umount?  How did other BSDs solve that?
> >
> > If your diff works because you changed a TAIL vs a HEAD insert and
> > you know how to reproduce it, you could capture stack traces from the
> > function doing the insert.  Is it insmntque()?  Could we prevent
> > adding the vnode to the list if a dying flag is on the mount point?
>
> I don't think that's possible.  Syncing the filesystem can cause more
> work to be queued.  If you prevent adding the vnode to the list, how
> will that work be done?

Exactly!  My first approach was to add a flag to prevent list insertion
during unmount.  I have added the diff below for reference.  But it
does not work.

The first problem is that checkalias() calls insmntque(vp, mp).  It
does not provide an error path, but that could be fixed by changing
all callers in the file system specific code.  Also creating a device
alias during unmount is unlikely.

But it fails miserably with softdeps.  When a buffer gets flushed, new
vnodes are allocated and added to the list.  As my diff prevents vnode
allocation during unmount, the metadata is not updated.  So you end up
with a clean file system where fsck -f finds wrong reference counters.

That is why I have to keep the list alive while flushing it out.  So I
came to the TAILQ solution.

> The alternative would be to add another loop around the list
> processing.  I think that's worse because we don't know how many
> times to keep looping.

There might be a race when TAILQ_FOREACH_SAFE reaches the end, next
is NULL, but during the final sleep a new vnode is inserted.  Then we
may miss flushing it.

I think vflush() should use a while (!TAILQ_EMPTY()) logic.  But I
don't want to change that in a huge diff.  There are other _SAFE
macros that are not needed or even unsafe.  I want to convert to TAILQ
first to be more flexible.  Reversing the list is enough to fix my
existing panic.

> bluhm's diff is the best approach imo.  ok me.

I also think this is the easiest step in the right direction.

bluhm

Index: kern/vfs_subr.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.297
diff -u -p -r1.297 vfs_subr.c
--- kern/vfs_subr.c	30 Dec 2019 13:49:40 -0000	1.297
+++ kern/vfs_subr.c	3 Jan 2020 13:13:42 -0000
@@ -401,6 +401,11 @@ getnewvnode(enum vtagtype tag, struct mo
 	toggle = 0;

 	s = splbio();
+	if (mp != NULL && mp->mnt_flag & MNT_FORCE) {
+		splx(s);
+		*vpp = NULLVP;
+		return (EIO);
+	}
 	if ((numvnodes < maxvnodes) ||
 	    ((TAILQ_FIRST(listhd = &vnode_free_list) == NULL) &&
 	    ((TAILQ_FIRST(listhd = &vnode_hold_list) == NULL) || toggle))) {
@@ -425,7 +430,7 @@ getnewvnode(enum vtagtype tag, struct mo
 	if (vp == NULL) {
 		splx(s);
 		tablefull("vnode");
-		*vpp = 0;
+		*vpp = NULLVP;
 		return (ENFILE);
 	}

@@ -480,8 +485,12 @@ insmntque(struct vnode *vp, struct mount
 	/*
 	 * Insert into list of vnodes for the new mount point, if available.
 	 */
-	if ((vp->v_mount = mp) != NULL)
+	if ((vp->v_mount = mp) != NULL) {
+		/* Do not insert new vnodes while forcefully unmounting. */
+		if (mp->mnt_flag & MNT_FORCE)
+			printf("%s: insert vnode during unmount\n", __func__);
 		LIST_INSERT_HEAD(&mp->mnt_vnodelist, vp, v_mntvnodes);
+	}
 }

 /*

Index: kern/vfs_syscalls.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/vfs_syscalls.c,v
retrieving revision 1.338
diff -u -p -r1.338 vfs_syscalls.c
--- kern/vfs_syscalls.c	29 Nov 2019 20:58:17 -0000	1.338
+++ kern/vfs_syscalls.c	1 Jan 2020 17:49:55 -0000
@@ -477,6 +477,9 @@ dounmount_leaf(struct mount *mp, int fla
 	int error;
 	int hadsyncer = 0;

+	KASSERT(vfs_isbusy(mp));
+	if (flags & MNT_FORCE)
+		mp->mnt_flag |= MNT_FORCE;
 	mp->mnt_flag &= ~MNT_ASYNC;
 	cache_purgevfs(mp);	/* remove cache entries for this file sys */
 	if (mp->mnt_syncer != NULL) {
@@ -501,6 +504,7 @@ dounmount_leaf(struct mount *mp, int fla
 	if (error && !(flags & MNT_DOOMED)) {
 		if ((mp->mnt_flag & MNT_RDONLY) == 0 && hadsyncer)
 			(void) vfs_allocate_syncvnode(mp);
+		mp->mnt_flag &= ~MNT_FORCE;
 		vfs_unbusy(mp);
Re: dangling vnode panic
Martin Pieuchot wrote:
> On 09/01/20(Thu) 10:46, Alexander Bluhm wrote:
> > Without this diff my regress machines became so unstable, that they
> > often do not finish the test run.
> >
> > With this diff, they run fine.  No regressions.
>
> Sadly this is a workaround.  What is the issue?  We're missing a
> barrier during umount?  How did other BSDs solve that?
>
> If your diff works because you changed a TAIL vs a HEAD insert and
> you know how to reproduce it, you could capture stack traces from the
> function doing the insert.  Is it insmntque()?  Could we prevent
> adding the vnode to the list if a dying flag is on the mount point?

I don't think that's possible.  Syncing the filesystem can cause more
work to be queued.  If you prevent adding the vnode to the list, how
will that work be done?

The alternative would be to add another loop around the list
processing.  I think that's worse because we don't know how many times
to keep looping.

bluhm's diff is the best approach imo.  ok me.
Re: dangling vnode panic
On 09/01/20(Thu) 10:46, Alexander Bluhm wrote:
> Without this diff my regress machines became so unstable, that they
> often do not finish the test run.
>
> With this diff, they run fine.  No regressions.

Sadly this is a workaround.  What is the issue?  We're missing a
barrier during umount?  How did other BSDs solve that?

If your diff works because you changed a TAIL vs a HEAD insert and you
know how to reproduce it, you could capture stack traces from the
function doing the insert.  Is it insmntque()?  Could we prevent
adding the vnode to the list if a dying flag is on the mount point?

> Any ok?
>
> bluhm
>
> On Sat, Jan 04, 2020 at 10:55:46PM +0100, Alexander Bluhm wrote:
> > On my amd64 test machine regress/sys/kern/mount run-regress-unmount-busy
> > triggers the dangling vnode panic regularly.  Something has changed
> > in the previous months that makes it more likely.
> >
> > Problem is that while flushing the mnt_vnodelist list, the unmount
> > process sleeps.  Then active processes can write new dirty vnodes
> > to the list.  This happens without softdep.
> >
> > My first attempt was to disable insertions while unmounting.  That
> > does not work.  A buffer flush can cause softdep to add new dirty
> > vnodes to the list.
> >
> > This diff adds the dirty vnodes to the end of the list.  vflush()
> > traverses it from the beginning.  Then new vnodes, that are added
> > during syncing, will be flushed.  This fixes my test.
> >
> > Please test with and without softdep.
> >
> > ok?
> >
> > bluhm
> >
> > Index: sys/kern/vfs_subr.c
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/kern/vfs_subr.c,v
> > retrieving revision 1.297
> > diff -u -p -r1.297 vfs_subr.c
> > --- sys/kern/vfs_subr.c	30 Dec 2019 13:49:40 -0000	1.297
> > +++ sys/kern/vfs_subr.c	4 Jan 2020 18:42:25 -0000
> > @@ -178,7 +178,7 @@ vfs_mount_alloc(struct vnode *vp, struct
> >  	rw_init_flags(&mp->mnt_lock, "vfslock", RWL_IS_VNODE);
> >  	(void)vfs_busy(mp, VB_READ|VB_NOWAIT);
> >
> > -	LIST_INIT(&mp->mnt_vnodelist);
> > +	TAILQ_INIT(&mp->mnt_vnodelist);
> >  	mp->mnt_vnodecovered = vp;
> >
> >  	atomic_inc_int(&vfsp->vfc_refcount);
> > @@ -476,12 +476,12 @@ insmntque(struct vnode *vp, struct mount
> >  	 * Delete from old mount point vnode list, if on one.
> >  	 */
> >  	if (vp->v_mount != NULL)
> > -		LIST_REMOVE(vp, v_mntvnodes);
> > +		TAILQ_REMOVE(&vp->v_mount->mnt_vnodelist, vp, v_mntvnodes);
> >  	/*
> >  	 * Insert into list of vnodes for the new mount point, if available.
> >  	 */
> >  	if ((vp->v_mount = mp) != NULL)
> > -		LIST_INSERT_HEAD(&mp->mnt_vnodelist, vp, v_mntvnodes);
> > +		TAILQ_INSERT_TAIL(&mp->mnt_vnodelist, vp, v_mntvnodes);
> >  }
> >
> >  /*
> > @@ -872,7 +872,7 @@ vfs_mount_foreach_vnode(struct mount *mp
> >  	int error = 0;
> >
> > loop:
> > -	LIST_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {
> > +	TAILQ_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {
> >  		if (vp->v_mount != mp)
> >  			goto loop;
> >
> > @@ -1299,7 +1299,7 @@ printlockedvnodes(void)
> >  	TAILQ_FOREACH(mp, &mountlist, mnt_list) {
> >  		if (vfs_busy(mp, VB_READ|VB_NOWAIT))
> >  			continue;
> > -		LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> > +		TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> >  			if (VOP_ISLOCKED(vp))
> >  				vprint(NULL, vp);
> >  		}
> > @@ -2287,7 +2287,7 @@ vfs_mount_print(struct mount *mp, int fu
> >  	(*pr)("locked vnodes:");
> >  	/* XXX would take mountlist lock, except ddb has no context */
> >  	cnt = 0;
> > -	LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> > +	TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> >  		if (VOP_ISLOCKED(vp)) {
> >  			if (cnt == 0)
> >  				(*pr)("\n  %p", vp);
> > @@ -2304,7 +2304,7 @@ vfs_mount_print(struct mount *mp, int fu
> >  	(*pr)("all vnodes:");
> >  	/* XXX would take mountlist lock, except ddb has no context */
> >  	cnt = 0;
> > -	LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> > +	TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> >  		if (cnt == 0)
Re: dangling vnode panic
Without this diff my regress machines became so unstable, that they
often do not finish the test run.

With this diff, they run fine.  No regressions.

Any ok?

bluhm

On Sat, Jan 04, 2020 at 10:55:46PM +0100, Alexander Bluhm wrote:
> On my amd64 test machine regress/sys/kern/mount run-regress-unmount-busy
> triggers the dangling vnode panic regularly.  Something has changed
> in the previous months that makes it more likely.
>
> Problem is that while flushing the mnt_vnodelist list, the unmount
> process sleeps.  Then active processes can write new dirty vnodes
> to the list.  This happens without softdep.
>
> My first attempt was to disable insertions while unmounting.  That
> does not work.  A buffer flush can cause softdep to add new dirty
> vnodes to the list.
>
> This diff adds the dirty vnodes to the end of the list.  vflush()
> traverses it from the beginning.  Then new vnodes, that are added
> during syncing, will be flushed.  This fixes my test.
>
> Please test with and without softdep.
>
> ok?
>
> bluhm
>
> Index: sys/kern/vfs_subr.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/kern/vfs_subr.c,v
> retrieving revision 1.297
> diff -u -p -r1.297 vfs_subr.c
> --- sys/kern/vfs_subr.c	30 Dec 2019 13:49:40 -0000	1.297
> +++ sys/kern/vfs_subr.c	4 Jan 2020 18:42:25 -0000
> @@ -178,7 +178,7 @@ vfs_mount_alloc(struct vnode *vp, struct
>  	rw_init_flags(&mp->mnt_lock, "vfslock", RWL_IS_VNODE);
>  	(void)vfs_busy(mp, VB_READ|VB_NOWAIT);
>
> -	LIST_INIT(&mp->mnt_vnodelist);
> +	TAILQ_INIT(&mp->mnt_vnodelist);
>  	mp->mnt_vnodecovered = vp;
>
>  	atomic_inc_int(&vfsp->vfc_refcount);
> @@ -476,12 +476,12 @@ insmntque(struct vnode *vp, struct mount
>  	 * Delete from old mount point vnode list, if on one.
>  	 */
>  	if (vp->v_mount != NULL)
> -		LIST_REMOVE(vp, v_mntvnodes);
> +		TAILQ_REMOVE(&vp->v_mount->mnt_vnodelist, vp, v_mntvnodes);
>  	/*
>  	 * Insert into list of vnodes for the new mount point, if available.
>  	 */
>  	if ((vp->v_mount = mp) != NULL)
> -		LIST_INSERT_HEAD(&mp->mnt_vnodelist, vp, v_mntvnodes);
> +		TAILQ_INSERT_TAIL(&mp->mnt_vnodelist, vp, v_mntvnodes);
>  }
>
>  /*
> @@ -872,7 +872,7 @@ vfs_mount_foreach_vnode(struct mount *mp
>  	int error = 0;
>
> loop:
> -	LIST_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {
> +	TAILQ_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {
>  		if (vp->v_mount != mp)
>  			goto loop;
>
> @@ -1299,7 +1299,7 @@ printlockedvnodes(void)
>  	TAILQ_FOREACH(mp, &mountlist, mnt_list) {
>  		if (vfs_busy(mp, VB_READ|VB_NOWAIT))
>  			continue;
> -		LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> +		TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
>  			if (VOP_ISLOCKED(vp))
>  				vprint(NULL, vp);
>  		}
> @@ -2287,7 +2287,7 @@ vfs_mount_print(struct mount *mp, int fu
>  	(*pr)("locked vnodes:");
>  	/* XXX would take mountlist lock, except ddb has no context */
>  	cnt = 0;
> -	LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> +	TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
>  		if (VOP_ISLOCKED(vp)) {
>  			if (cnt == 0)
>  				(*pr)("\n  %p", vp);
> @@ -2304,7 +2304,7 @@ vfs_mount_print(struct mount *mp, int fu
>  	(*pr)("all vnodes:");
>  	/* XXX would take mountlist lock, except ddb has no context */
>  	cnt = 0;
> -	LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
> +	TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
>  		if (cnt == 0)
>  			(*pr)("\n  %p", vp);
>  		else if ((cnt % (72 / (sizeof(void *) * 2 + 4))) == 0)
>
> Index: sys/kern/vfs_syscalls.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/kern/vfs_syscalls.c,v
> retrieving revision 1.338
> diff -u -p -r1.338 vfs_syscalls.c
> --- sys/kern/vfs_syscalls.c	29 Nov 2019 20:58:17 -0000	1.338
> +++ sys/kern/vfs_syscalls.c	4 Jan 2020 18:41:20 -0000
> @@ -489,7 +489,7 @@ dounmount_leaf(struct mount *mp, int fla
>  	 * Before calling file system unmount, make sure
>  	 * all unveils to vnodes in here are dropped.
>  	 */
> -	LIST_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {
dangling vnode panic
Hi,

On my amd64 test machine regress/sys/kern/mount run-regress-unmount-busy
triggers the dangling vnode panic regularly.  Something has changed
in the previous months that makes it more likely.

Problem is that while flushing the mnt_vnodelist list, the unmount
process sleeps.  Then active processes can write new dirty vnodes
to the list.  This happens without softdep.

My first attempt was to disable insertions while unmounting.  That
does not work.  A buffer flush can cause softdep to add new dirty
vnodes to the list.

This diff adds the dirty vnodes to the end of the list.  vflush()
traverses it from the beginning.  Then new vnodes, that are added
during syncing, will be flushed.  This fixes my test.

Please test with and without softdep.

ok?

bluhm

Index: sys/kern/vfs_subr.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.297
diff -u -p -r1.297 vfs_subr.c
--- sys/kern/vfs_subr.c	30 Dec 2019 13:49:40 -0000	1.297
+++ sys/kern/vfs_subr.c	4 Jan 2020 18:42:25 -0000
@@ -178,7 +178,7 @@ vfs_mount_alloc(struct vnode *vp, struct
 	rw_init_flags(&mp->mnt_lock, "vfslock", RWL_IS_VNODE);
 	(void)vfs_busy(mp, VB_READ|VB_NOWAIT);

-	LIST_INIT(&mp->mnt_vnodelist);
+	TAILQ_INIT(&mp->mnt_vnodelist);
 	mp->mnt_vnodecovered = vp;

 	atomic_inc_int(&vfsp->vfc_refcount);
@@ -476,12 +476,12 @@ insmntque(struct vnode *vp, struct mount
 	 * Delete from old mount point vnode list, if on one.
 	 */
 	if (vp->v_mount != NULL)
-		LIST_REMOVE(vp, v_mntvnodes);
+		TAILQ_REMOVE(&vp->v_mount->mnt_vnodelist, vp, v_mntvnodes);
 	/*
 	 * Insert into list of vnodes for the new mount point, if available.
 	 */
 	if ((vp->v_mount = mp) != NULL)
-		LIST_INSERT_HEAD(&mp->mnt_vnodelist, vp, v_mntvnodes);
+		TAILQ_INSERT_TAIL(&mp->mnt_vnodelist, vp, v_mntvnodes);
 }

 /*
@@ -872,7 +872,7 @@ vfs_mount_foreach_vnode(struct mount *mp
 	int error = 0;

loop:
-	LIST_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {
+	TAILQ_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {
 		if (vp->v_mount != mp)
 			goto loop;

@@ -1299,7 +1299,7 @@ printlockedvnodes(void)
 	TAILQ_FOREACH(mp, &mountlist, mnt_list) {
 		if (vfs_busy(mp, VB_READ|VB_NOWAIT))
 			continue;
-		LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
+		TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
 			if (VOP_ISLOCKED(vp))
 				vprint(NULL, vp);
 		}
@@ -2287,7 +2287,7 @@ vfs_mount_print(struct mount *mp, int fu
 	(*pr)("locked vnodes:");
 	/* XXX would take mountlist lock, except ddb has no context */
 	cnt = 0;
-	LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
+	TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
 		if (VOP_ISLOCKED(vp)) {
 			if (cnt == 0)
 				(*pr)("\n  %p", vp);
@@ -2304,7 +2304,7 @@ vfs_mount_print(struct mount *mp, int fu
 	(*pr)("all vnodes:");
 	/* XXX would take mountlist lock, except ddb has no context */
 	cnt = 0;
-	LIST_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
+	TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
 		if (cnt == 0)
 			(*pr)("\n  %p", vp);
 		else if ((cnt % (72 / (sizeof(void *) * 2 + 4))) == 0)

Index: sys/kern/vfs_syscalls.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/vfs_syscalls.c,v
retrieving revision 1.338
diff -u -p -r1.338 vfs_syscalls.c
--- sys/kern/vfs_syscalls.c	29 Nov 2019 20:58:17 -0000	1.338
+++ sys/kern/vfs_syscalls.c	4 Jan 2020 18:41:20 -0000
@@ -489,7 +489,7 @@ dounmount_leaf(struct mount *mp, int fla
 	 * Before calling file system unmount, make sure
 	 * all unveils to vnodes in here are dropped.
 	 */
-	LIST_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {
+	TAILQ_FOREACH_SAFE(vp , &mp->mnt_vnodelist, v_mntvnodes, nvp) {
 		unveil_removevnode(vp);
 	}

@@ -511,7 +511,7 @@ dounmount_leaf(struct mount *mp, int fla
 		vrele(coveredvp);
 	}

-	if (!LIST_EMPTY(&mp->mnt_vnodelist))
+	if (!TAILQ_EMPTY(&mp->mnt_vnodelist))
 		panic("unmount: dangling vnode");

 	vfs_unbusy(mp);

Index: sys/nfs/nfs_subs.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/nfs/nfs_subs.c,v
retrieving revision 1.140
diff -u -p -r1.140 nfs_subs.c
--- sys/nfs/nfs_subs.c	25 Dec 2019 12:31:12 -