On Fri, Jan 15, 2010 at 12:14:56PM +0100, Manuel Bouyer wrote: > > > Log Message: > > > Remove KASSERT(vp->v_usecount == 1) in getnewvnode() and ungetnewvnode(). > > > Another process could be vget()ing the vnode and bump v_usecount while > > > getcleanvnode() is vclean()ing it (as vclean drops the interlock). > > > vget() will then wait for VI_XLOCK or VI_FREEING to clear; and we could > > > test > > > this assertion while the other process is still slepping. We could even > > > end up in ungetnewvnode() before this other process got a chance to run. > > > > Why doesn't the v_usecount == 1 check in getcleanvnode() work? > > You're right; it's not while the first thread is in vclean() that the > second can grab a reference which will cause this issue. > > Then it's probably when getcleanvnode() drops the interlock after > vclean(). This means something can add a reference to a clean vnode, > this is bad. The KASSERT() is probably right. > > ufs_ihashget() looks safe for this, it drops the ufs_ihash_lock after > getting the interlock. > cache_lookup() also grabs the interlock before releasing ncp->nc_lock, > so it should be OK. > Any other place where a vnode could be cached without holding a > reference count ?
There is the mount list. getnewvnode() will remove it from the mount list after the KASSERT() so it looks like a good candidate. Maybe the issue is in qsync() and is related to kern/42205 (a DIAGNOSTIC kernel will panic on the KASSERT, a non-DIAGNOSTIC one would dereference a NULL pointer in the qsync() thread). vlean() sets vp->v_tag to VT_NON but doesn't clear v_type. The qsync() thread could find the vnode in the mount list, think it's valid and vget it while the getcleanvnode() thread has just released v_interlock and not set v_type to VNON yet. The last patch I propose in kern/42205 would fix this as well. I did a quick check of other 'v_type == VNON', they look safe. -- Manuel Bouyer <bou...@antioche.eu.org> NetBSD: 26 ans d'experience feront toujours la difference --