On 12/11/15 15:23, Anton Ivanov wrote:
> [snip]
>
>>> Hmmm, UML is UP and does not support PREEMPT, so all spinlocks
>>> should be a no-op.
>> In that case, if I understand correctly what is going on, there are a
>> couple of places - the free_irqs(), activate_fd and the sigio handler
>> itself, where it should not be a mutex, not a spinlock. It is there to
>> ensure that you cannot use it in an interrupt context while it is being
>> modified.
>>
>> If spinlock is a NOP it fails to perform this duty. The code should also
>> be different - it should return on try_lock so it does not deadlock so
>> spinlock_irqsave is the wrong locking primitive as it does not have this
>> functionality.
>>
>> That is an issue both with this patch and with the original poll based
>> controller - there free_irq, add_fd, reactivate_fd can all theoretically
>> produce a race if you are adding/removing devices while under high IO load.
> We, however cannot use mutex here as it is interrupt.
>
> I tried with spin_trylock and finally got the correct behaviour. It
> throws an occasional warning here and there while inserting/removing
> devices, but works correctly with either config. No more BUGs.
>
> A bare (not try) spinlock with UP/PREEMPT set as they are in UML
> actually does not guard anything effectively - it is a NOP. The try form
> is an exemption - if you look at spinlock.h it is actually "viable" even
> on UP. It will however throw a warning that it is activated in an
> inappropriate context if it hits an existing lock.
>
> In theory - the code in signal.c should guard against nested interrupt
> invocation. I am still struggling to understand why it fails to work in
> practice.
>
> This also leaves open the question on how to add/remove interrupts. If
> the spinlock does not actually guard the irq data structures properly
> modifying them in a safe manner becomes a very interesting exercise. I
> have it working with the try form, but that throws the occasional warning.
>
> I am going to clean it up and re-submit so we have a "working version"
> which people can comment on.

Putting an extra guard around the signal handler in signal.c which 
prevents recursive invocation solves it completely.

I am going to clean it up, see if we need a similar guard around the 
timer interrupt and re-submit tomorrow morning.

>
> A.
>
>> A.
>>
>>> Do you have lock debugging enabled?
>>>
>>> I this case I'd start gdb and inspect the memory. Maybe a stack corruption.
>>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> User-mode-linux-devel mailing list
>> User-mode-linux-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
>>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> User-mode-linux-devel mailing list
> User-mode-linux-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
>

------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

Reply via email to