On Mon, 2010-10-25 at 23:22 +0200, Jan Kiszka wrote:
> Am 25.10.2010 23:12, Philippe Gerum wrote:
> > On Mon, 2010-10-25 at 21:22 +0200, Jan Kiszka wrote:
> >> Am 25.10.2010 21:20, Philippe Gerum wrote:
> >>> On Mon, 2010-10-25 at 21:15 +0200, Jan Kiszka wrote:
> >>>> Am 25.10.2010 21:08, Philippe Gerum wrote:
> >>>>> On Mon, 2010-10-25 at 20:10 +0200, Jan Kiszka wrote:
> >>>>>> Am 25.10.2010 18:48, Philippe Gerum wrote:
> >>>>>>> On Wed, 2010-10-13 at 16:52 +0200, Philippe Gerum wrote:
> >>>>>>>>>
> >>>>>>>>> Should we test IPIPE_STALL_FLAG on all but current CPUs?
> >>>>>>>>
> >>>>>>>> That would solve this particular issue, but we should drain the
> >>>>>>>> pipeline out of any Xenomai critical section. The way it is done
> >>>>>>>> now may induce a deadlock (e.g. CPU0 waiting for CPU1 to
> >>>>>>>> acknowledge critical entry in ipipe_enter_critical when getting
> >>>>>>>> some IPI, and CPU1 waiting hw IRQs off for CPU0 to release the
> >>>>>>>> Xenomai lock that annoys us right now).
> >>>>>>>>
> >>>>>>>> I'll come up with something hopefully better and tested in the
> >>>>>>>> next days.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Sorry for the lag. In case that helps, here is another approach,
> >>>>>>> based on telling the pipeline to ignore the irq about to be
> >>>>>>> detached, so that it passes all further occurrences down to the
> >>>>>>> next domain, without
> >>>>>>
> >>>>>> Err, won't this irritate that next domain, ie. won't Linux dump
> >>>>>> warnings about a spurious/unhandled IRQ? I think either the old
> >>>>>> handler shall receive the last event or no one.
> >>>>>
> >>>>> Flipping the IRQ modes within a ipipe_critical_enter/exit section
> >>>>> gives you that guarantee. You are supposed to have disabled the irq
> >>>>> line before detaching, and critical IPIs cannot be acknowledged
> >>>>> until all CPUs have re-enabled interrupts at some point.
> >>>>> Therefore, there are only two scenarios:
> >>>>>
> >>>>> - irq was disabled before delivery, and a pending interrupt is
> >>>>> masked by the PIC and never delivered to the CPU.
> >>>>>
> >>>>> - an interrupt sneaked in before disabling; it is currently
> >>>>> processed by the pipeline in the low-level handler on some CPU, in
> >>>>> which case interrupts are off, so a critical IPI could not be acked
> >>>>> yet, and the irq mode bits still allow dispatching to the target
> >>>>> domain on that CPU. The assumption which is happily made is that
> >>>>> only head domains are interested in un-virtualizing irqs, so the
> >>>>> dispatch will happen immediately, while the handler is still valid
> >>>>> (actually, we are not allowed to un-virtualize root irqs, and
> >>>>> intermediate Adeos domains are already considered an endangered
> >>>>> species, so this is fine).
> >>>>>
> >>>>>>
> >>>>>> Why this complex solution, why not simply draining (via
> >>>>>> critical_enter or whatever) - but _after_ xnintr_irq_detach, ie.
> >>>>>> while the related resources are still valid?
> >>>>>>
> >>>>>
> >>>>> Because it's already too late. You have cleared the handler pointer
> >>>>> when un-virtualizing via xnarch_release_irq, and the wired irq
> >>>>> dispatcher or the log syncer on another CPU could then branch to
> >>>>> eip $0.
> >>>>
> >>>> Just make ipipe_virtualize_irq install a nop handler instead of NULL.
> >>>
> >>> This does not solve the issue of the last interrupt, which should be
> >>> processed. You don't want to miss it.
> >>
> >> Don't understand. No interrupt is supposed to arrive anymore on
> >> deregistration; the last user is supposed to be down by now. We just
> >> need to catch those that slipped through.
> >
> > No, we are handling the case when an interrupt is currently handled on
> > a CPU which is not the one that unregisters the IRQ, and which managed
> > to sneak in while the irq source was about to be masked in the PIC.
> > This is purely asynchronous for us in SMP, since we don't have irq
> > descriptor locks for the pipeline; we only have them at Xenomai level,
> > which is one step too far to protect our low-level Adeos handler
> > against that kind of race. Logically speaking, there is no reason why
> > you would accept to leave that irq unhandled if it is there and known
> > to a CPU (it was actually the first point you raised).
>
> If that handler is already running, the IRQ will get handled, we just
> need to wait for the handler to finish after we returned from
> ipipe_virtualize_irq => thus we need a barrier here.
>
So, we let ipipe_virtualize_irq clear the handler pointer any remote CPU
could use in parallel, and we... synchronize on the outcome? The notion
of "handler" may explain why we don't sync yet: I'm not talking about the
Xenomai entry for interrupts, I'm dealing with interrupts in the early
code of ipipe_handle_irq, before you hit the wired irq dispatcher or the
sync_stage loop.

> If the handler was about to run but we deregistered it a bit quicker,
> the IRQ need not be addressed at device level anymore. Reason: we
> already switched off any IRQ assertions in the device before we entered
> ipipe_virtualize_irq. So no harm is caused, that IRQ line is deasserted
> already (IOW: the IRQ became a spurious one while cleaning up).

You can attempt to disable the irq line on one CPU, and have a relevant
irq entering the low-level pipeline handler on another one at exactly
the same time. We have _no_ sync here.

> >
> > The proper way to fix this issue would have been to fix xnintr_detach
> > in the first place, because calling ipipe_virtualize_irq() while
> > holding a lock with irqs off is wrong. We could have then drained the
> > pipeline from the unregistering code. I'm rather going for a decent
> > solution which is not prone to regression for 2.x.
> >
>
> Again, I see no reason for a more complex solution than avoiding that
> NULL pointer dereference at ipipe level as I suggested and adding a very
> simple system-wide barrier right after dropping nklock in xnintr_detach.

You cannot drop that lock without rewriting the xnintr layer in rather
touchy areas, that is the point.

> Jan
>

--
Philippe.

_______________________________________________
Xenomai-help mailing list
[email protected]
https://mail.gna.org/listinfo/xenomai-help
