On 12/11/15 15:23, Anton Ivanov wrote: > [snip] > >>> Hmmm, UML is UP and does not support PREEMPT, so all spinlocks >>> should be a no-op. >> In that case, if I understand correctly what is going on, there are a >> couple of places - the free_irqs(), activate_fd and the sigio handler >> itself, where it should not be a mutex, not a spinlock. It is there to >> ensure that you cannot use it in an interrupt context while it is being >> modified. >> >> If spinlock is a NOP it fails to perform this duty. The code should also >> be different - it should return on try_lock so it does not deadlock so >> spinlock_irqsave is the wrong locking primitive as it does not have this >> functionality. >> >> That is an issue both with this patch and with the original poll based >> controller - there free_irq, add_fd, reactivate_fd can all theoretically >> produce a race if you are adding/removing devices while under high IO load. > We, however cannot use mutex here as it is interrupt. > > I tried with spin_trylock and finally got the correct behaviour. It > throws an occasional warning here and there while inserting/removing > devices, but works correctly with either config. No more BUGs. > > A bare (not try) spinlock with UP/PREEMPT set as they are in UML > actually does not guard anything effectively - it is a NOP. The try form > is an exemption - if you look at spinlock.h it is actually "viable" even > on UP. It will however throw a warning that it is activated in an > inappropriate context if it hits an existing lock. > > In theory - the code in signal.c should guard against nested interrupt > invocation. I am still struggling to understand why it fails to work in > practice. > > This also leaves open the question on how to add/remove interrupts. If > the spinlock does not actually guard the irq data structures properly > modifying them in a safe manner becomes a very interesting exercise. I > have it working with the try form, but that throws the occasional warning. > > I am going to clean it up and re-submit so we have a "working version" > which people can comment on.
Putting an extra guard around the signal handler in signal.c which prevents recursive invocation solves it completely. I am going to clean it up, see if we need a similar guard around the timer interrupt and re-submit tomorrow morning. > > A. > >> A. >> >>> Do you have lock debugging enabled? >>> >>> I this case I'd start gdb and inspect the memory. Maybe a stack corruption. >>> >> ------------------------------------------------------------------------------ >> _______________________________________________ >> User-mode-linux-devel mailing list >> User-mode-linux-devel@lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel >> > > ------------------------------------------------------------------------------ > _______________________________________________ > User-mode-linux-devel mailing list > User-mode-linux-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel > ------------------------------------------------------------------------------ _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel