NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)

Edgar Fuß Wed, 31 Jul 2019 01:45:38 -0700

Thanks to riastradh@, this turned out to be caused by a (UDP, hard) NFS mount 
combined with a mis-configured IPFilter that blocked all but the first fragment 
of a fragmented NFS reply (e.g., readdir), combined with a NetBSD design error 
(or so Taylor says) that a vnode lock may be held across I/O, in this case, 
network I/O.
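
To illustrate why only the larger replies were affected, here is a small toy 
model in Python. It is not NetBSD or IPFilter code: the 1480-byte fragment 
payload and all function names are my assumptions, and the filter is modelled 
simply as "drop every fragment with a non-zero offset", which is how the rule's 
effect is described above.

```python
from dataclasses import dataclass

# Assumption: 1500-byte Ethernet MTU minus a 20-byte IPv4 header.
MTU_PAYLOAD = 1480


@dataclass
class Fragment:
    offset: int   # byte offset of this piece within the original datagram
    more: bool    # True if further fragments follow this one
    data: bytes


def fragment(datagram, size=MTU_PAYLOAD):
    """Split a datagram into fragments of at most `size` bytes."""
    frags = []
    for off in range(0, len(datagram), size):
        chunk = datagram[off:off + size]
        frags.append(Fragment(off, off + size < len(datagram), chunk))
    return frags


def block_frag_body(frags):
    """Toy model of the filter: only the first fragment gets through."""
    return [f for f in frags if f.offset == 0]


def reassemble(frags):
    """Return the full datagram, or None if any piece is missing."""
    expected = 0
    out = bytearray()
    for f in sorted(frags, key=lambda f: f.offset):
        if f.offset != expected:
            return None          # hole in the middle
        out += f.data
        expected += len(f.data)
        if not f.more:
            return bytes(out)    # last fragment seen, datagram complete
    return None                  # last fragment never arrived


small_reply = bytes(200)     # fits in one packet, e.g. a getattr reply
large_reply = bytes(8192)    # needs several fragments, e.g. a readdir reply

print(reassemble(block_frag_body(fragment(small_reply))) is not None)
print(reassemble(block_frag_body(fragment(large_reply))) is not None)
```

In this model the small reply survives the filter while the large one can never 
be reassembled, so the receiver's RPC layer never sees it at all.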


It should be reproducible with a default NFS mount and a
        block in all with frag-body
IPFilter rule and then trying to readdir.

Now, in some cases, the machine in question recovered after fixing the filter 
rules; in others, it didn't, forcing a reboot. This strikes me as a bug because 
the same lock-up could as well have been caused by network problems instead of 
ipfilter mis-configuration.

It looks like the operation to which the reply was lost sometimes doesn't get 
retried. Do we have some weird bug where the first fragment arriving stops the 
timeout but the blocking of the remaining fragments causes it to wedge?
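
For comparison, this is roughly the behaviour I would naively expect from a 
hard mount, written as a toy Python sketch. It is not taken from the NetBSD NFS 
client; `hard_mount_call`, `send` and `recv_complete_reply` are hypothetical 
names, and the key assumption is that only a completely reassembled reply 
counts as an answer, so a lone first fragment must not stop the retries.

```python
def hard_mount_call(send, recv_complete_reply, max_tries=None):
    """Retry a request until a complete reply arrives.

    send() transmits the request once.
    recv_complete_reply() returns the reply, or None if the timeout expired
    without a fully reassembled datagram (partial fragments do not count).
    max_tries=None models a hard mount, i.e. retry forever.
    """
    tries = 0
    while max_tries is None or tries < max_tries:
        tries += 1
        send()
        reply = recv_complete_reply()
        if reply is not None:
            return reply, tries
    return None, tries


# Two replies are lost (filter still wrong), the third gets through
# (filter fixed in the meantime).
outcomes = iter([None, None, b"readdir reply"])
reply, tries = hard_mount_call(lambda: None, lambda: next(outcomes))
print(reply, tries)
```

With retries like these, the call should complete once the filter is fixed, 
which is what happened in some cases but not in others.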
