> On 27. Sep 2018, at 09:44, Manuel Bouyer <bou...@antioche.eu.org> wrote:
> 
> On Thu, Sep 27, 2018 at 01:03:40AM +0000, Emmanuel Dreyfus wrote:
>> Module Name: src
>> Committed By:        manu
>> Date:                Thu Sep 27 01:03:40 UTC 2018
>> 
>> Modified Files:
>>      src/sys/kern: vfs_trans.c
>> 
>> Log Message:
>> Work around deadlock between fstchg and fstcnt
>> 
>> When suspending a filesystem in fstrans_setstate(), we wait on
>> fstcnt for threads to finish transactions. While we do this, any
>> thread trying to start a filesystem transaction will wait on fstchg
>> in fstrans_start(), a situation which can deadlock.
>> 
>> The wait for fstcnt in fstrans_setstate() can be interrupted by
>> a signal, but the wait for fstchg in fstrans_start() cannot. Once
>> most processes are stuck in fstchg, it is impossible to send a
>> signal to the thread that waits on fstcnt, because no process
>> respond anymore to user input.
>> 
>> We fix that by adding a timeout to the wait on fstcnt in
>> fstrans_setstate(). This means suspending a filesystem may fail,
>> but it was already the case when the sleep was interupted by
>> a signal, hence calling function must already handle a possible
>> failure.
> 
> Actually callers do not, they just forward the failure.
> This means that things like e.g. umount or snapshots will randomly
> fail (whenever fstrans_setstate() can actually take a long
> time without deadlock has to be determined, but I suspect it can).
> 
> I think fstrans_setstate() should pause and retry in this case.

As Manuel said ...

This change is wrong, there is no upper time limit for a successfull
file system suspension.

Please revert ASAP.

I will take a look into this PR next week.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig

Attachment: signature.asc
Description: Message signed with OpenPGP

Reply via email to