> Date: Wed, 23 Nov 2022 10:52:32 +0100
> From: Martin Pieuchot <m...@openbsd.org>
> 
> On 22/11/22(Tue) 23:40, Mark Kettenis wrote:
> > > Date: Tue, 22 Nov 2022 17:47:44 +0000
> > > From: Miod Vallat <m...@online.fr>
> > > 
> > > > Here is a diff.  Maybe bluhm@ can try this on the macppc machine that
> > > > triggered the original "vref used where vget required" problem?
> > > 
> > > On a similar machine it panics after a few hours with:
> > > 
> > > panic: uvn_flush: PGO_SYNCIO return 'try again' error (impossible)
> > > 
> > > The trace (transcribed by hand) is
> > > uvn_flush+0x820
> > > uvm_vnp_terminate+0x79
> > > vclean+0xdc
> > > vgonel+0x70
> > > getnewvnode+0x240
> > > ffs_vget+0xcc
> > > ffs_inode_alloc+0x13c
> > > ufs_makeinode+0x94
> > > ufs_create+0x58
> > > VOP_CREATE+0x48
> > > vn_open+0x188
> > > doopenat+0x1b4
> > 
> > Ah right, there is another path where we end up with a refcount of
> > zero.  Should be fixable, but I need to think about this for a bit.
> 
> Not sure to understand what you mean with refcount of 0.  Could you
> elaborate?

Sorry, I was thinking ahead a bit.  I'm pretty much convinced that the
issue we're dealing with is a race between a vnode being
recycled/cleaned and the pagedaemon paging out pages associated with
that same vnode.

The crashes we've seen before were all in the pagedaemon path where we
end up calling into the VFS layer with a vnode that has v_usecount ==
0.  My "fix" avoids that, but hits the issue that when we are in the
codepath that is recycling/cleaning the vnode, we can't use vget() to
get a reference to the vnode since it checks that the vnode isn't in
the process of being cleaned.

But if we avoid that issue (by for example) skipping the vget() call
if the UVM_VNODE_DYING flag is set, we run into the same scenario
where we call into the VFS layer with v_usecount == 0.  Now that may
not actually be a problem, but I need to investigate this a bit more.

Or maybe calling into the VFS layer with a vnode that has v_usecount
== 0 is perfectly fine and we should do the vget() dance I propose in
uvm_vnp_unache() instead of in uvn_put().

> My understanding of the panic reported is that the proposed diff creates
> a complicated relationship between the vnode and the UVM vnode layer.
> The above problem occurs because VXLOCK is set on a vnode when it is
> being recycled *before* calling uvm_vnp_terminate().   Now that uvn_io()
> calls vget(9) it will fail because VXLOCK is set, which is what we want
> during vclean(9).

I don't think my diff make the relationship between the vnode and the
UVM vnode layer any more complicated than it already is.  But yes, the
VXLOCK thing you describe is what I didn't foresee.

Reply via email to