On Thu, Sep 27, 2018 at 01:03:40AM +0000, Emmanuel Dreyfus wrote:
> Module Name:  src
> Committed By: manu
> Date:         Thu Sep 27 01:03:40 UTC 2018
> 
> Modified Files:
>       src/sys/kern: vfs_trans.c
> 
> Log Message:
> Work around deadlock between fstchg and fstcnt
> 
> When suspending a filesystem in fstrans_setstate(), we wait on
> fstcnt for threads to finish transactions. While we do this, any
> thread trying to start a filesystem transaction will wait on fstchg
> in fstrans_start(), a situation which can deadlock.
> 
> The wait for fstcnt in fstrans_setstate() can be interrupted by
> a signal, but the wait for fstchg in fstrans_start() cannot. Once
> most processes are stuck in fstchg, it is impossible to send a
> signal to the thread that waits on fstcnt, because no process
> respond anymore to user input.
> 
> We fix that by adding a timeout to the wait on fstcnt in
> fstrans_setstate(). This means suspending a filesystem may fail,
> but it was already the case when the sleep was interupted by
> a signal, hence calling function must already handle a possible
> failure.

Actually callers do not, they just forward the failure.
This means that things like e.g. umount or snapshots will randomly
fail (whenever fstrans_setstate() can actually take a long
time without deadlock has to be determined, but I suspect it can). 

I think fstrans_setstate() should pause and retry in this case.

-- 
Manuel Bouyer <bou...@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--

Reply via email to