Re: NFS lockup after UDP fragments getting lost

Greg Troxel Wed, 31 Jul 2019 05:54:42 -0700

Edgar Fuß <[email protected]> writes:

> Thanks to riastradh@, this turned out to be caused by a (UDP, hard)
> NFS mount combined with a mis-configured IPFilter that blocked all but
> the first fragment of a fragmented NFS reply (e.g., readdir) combined
> with a NetBSD design error (or so Taylor says) that a vnode lock may
> be held across I/O, in this case, network I/O.


Holding a vnode lock across I/O seems like a bug to me too.  Marking the
vnode as having an operation in progress, so that others can
lock/read/report-that-status/unlock, seems ok.  But I'm sure you already
know that vnode locking is hard.
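
A rough sketch of that "mark it busy" idea, in Python rather than kernel
C.  This is not NetBSD's vnode code; Node, begin_io, end_io, is_busy and
slow_operation are all made-up names, and it assumes a plain mutex plus a
condition variable is enough to illustrate the point.  The lock is held
only long enough to flip or read the flag, never across the slow
operation itself:

```python
import threading


class Node:
    """Hypothetical stand-in for a vnode; only models the busy flag."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cond = threading.Condition(self._lock)
        self.busy = False

    def begin_io(self):
        # Take the lock briefly, wait out any other operation, set the flag.
        with self._cond:
            while self.busy:
                self._cond.wait()
            self.busy = True

    def end_io(self):
        # Take the lock briefly again to clear the flag and wake waiters.
        with self._cond:
            self.busy = False
            self._cond.notify_all()

    def is_busy(self):
        # Others can lock / read / report the status / unlock at any time.
        with self._lock:
            return self.busy


def slow_operation(node, do_io):
    """Run do_io() with the node marked busy but with the lock released."""
    node.begin_io()
    try:
        return do_io()  # the lock is NOT held here, so a stall blocks nobody
    finally:
        node.end_io()
```

If do_io() hangs forever (say, on a reply that never arrives), other
threads calling is_busy() still return promptly; only threads that want
to start their own operation on the same node wait.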

> It looks like the operation to which the reply was lost sometimes
> doesn't get retried. Do we have some weird bug where the first
> fragment arriving stops the timeout but the blocking of the remaining
> fragments causes it to wedge?

Probably not.  Fragments sit in the reassembly queue until the whole
packet has arrived, and only then is the packet passed up the stack.  So
the NFS code is almost certainly totally unaware of the arrival of the
first fragment.
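
To make that concrete, here is a toy model of fragment reassembly in
Python.  It is a sketch under simplifying assumptions, not the NetBSD
implementation: Reassembler and its method names are invented, fragments
are keyed by a single datagram id, overlapping fragments are not
handled, and the reassembly timeout is not modelled.  The deliver
callback stands in for the upper layers (UDP, then NFS):

```python
class Reassembler:
    """Toy IP fragment reassembly; hypothetical, for illustration only."""

    def __init__(self, deliver):
        self.deliver = deliver  # called only with complete datagrams
        self.pending = {}       # datagram id -> {"parts": {offset: bytes}, "total": int or None}

    def receive(self, dgram_id, offset, data, more_fragments):
        entry = self.pending.setdefault(dgram_id, {"parts": {}, "total": None})
        entry["parts"][offset] = data
        if not more_fragments:
            # The last fragment tells us the total datagram length.
            entry["total"] = offset + len(data)
        if entry["total"] is None:
            return  # last fragment not seen yet; upper layer hears nothing

        # Walk from offset 0 and stop at the first hole.
        pos = 0
        chunks = []
        while pos < entry["total"]:
            chunk = entry["parts"].get(pos)
            if not chunk:
                return  # still incomplete; upper layer hears nothing
            chunks.append(chunk)
            pos += len(chunk)

        del self.pending[dgram_id]
        self.deliver(b"".join(chunks))
```

With a filter that drops everything but the first fragment, receive() is
called once per reply, the entry just sits in pending, and deliver() is
never called, so the code above it has nothing to react to.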
