On Thu, Sep 27, 2018 at 01:03:40AM +0000, Emmanuel Dreyfus wrote:
> Module Name: src
> Committed By: manu
> Date: Thu Sep 27 01:03:40 UTC 2018
>
> Modified Files:
> src/sys/kern: vfs_trans.c
>
> Log Message:
> Work around deadlock between fstchg and fstcnt
>
> When suspending a filesystem in fstrans_setstate(), we wait on
> fstcnt for threads to finish transactions. While we do this, any
> thread trying to start a filesystem transaction will wait on fstchg
> in fstrans_start(), a situation which can deadlock.
>
> The wait for fstcnt in fstrans_setstate() can be interrupted by
> a signal, but the wait for fstchg in fstrans_start() cannot. Once
> most processes are stuck in fstchg, it is impossible to send a
> signal to the thread that waits on fstcnt, because no process
> respond anymore to user input.
>
> We fix that by adding a timeout to the wait on fstcnt in
> fstrans_setstate(). This means suspending a filesystem may fail,
> but it was already the case when the sleep was interupted by
> a signal, hence calling function must already handle a possible
> failure.

Actually, callers do not; they just forward the failure. This means
that operations such as umount or snapshots will fail randomly.
(Whether fstrans_setstate() can actually take a long time without a
deadlock remains to be determined, but I suspect it can.) I think
fstrans_setstate() should pause and retry in this case.

--
Manuel Bouyer <bou...@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--
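
For illustration only, the pause-and-retry idea could look roughly like the user-space sketch below. It is not the vfs_trans.c code: try_suspend(), pause_ms(), suspend_with_retry(), the two counters, and the use of EWOULDBLOCK as the "wait timed out" result are all hypothetical stand-ins chosen for this example. The retry count is bounded, so a real deadlock still ends in an error instead of hanging forever.

```c
#define _POSIX_C_SOURCE 199309L

#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical test knobs: the stand-in suspend attempt "times out"
 * until it has been called attempts_until_success times. */
static int attempts_until_success = 3;
static int attempts_made = 0;

/* Hypothetical stand-in for one suspend attempt whose wait has a
 * timeout. Returns 0 on success, or EWOULDBLOCK (an assumption made
 * for this sketch) when the wait timed out. */
static int
try_suspend(void)
{
	attempts_made++;
	if (attempts_made < attempts_until_success)
		return EWOULDBLOCK;
	return 0;
}

/* Sleep for roughly ms milliseconds. */
static void
pause_ms(long ms)
{
	struct timespec ts;

	ts.tv_sec = ms / 1000;
	ts.tv_nsec = (ms % 1000) * 1000000L;
	nanosleep(&ts, NULL);
}

/* Retry a timed-out suspend up to max_attempts times, pausing between
 * attempts. Any result other than a timeout (success, or another
 * error such as an interrupting signal) is returned immediately. */
static int
suspend_with_retry(int max_attempts, long delay_ms)
{
	int error = EWOULDBLOCK;
	int i;

	assert(max_attempts > 0);
	for (i = 0; i < max_attempts; i++) {
		error = try_suspend();
		if (error != EWOULDBLOCK)
			break;
		if (i + 1 < max_attempts)
			pause_ms(delay_ms);
	}
	return error;
}
```

With something of this shape inside the suspend path, callers that merely forward the error would only see a failure after every attempt has timed out, not after the first one.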