On Sat, 2010-01-23 at 11:09 +0100, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > On Fri, 2010-01-22 at 19:08 +0100, Philippe Gerum wrote:
> >> On Fri, 2010-01-22 at 19:03 +0100, Jan Kiszka wrote:
> >>> Philippe Gerum wrote:
> >>>> On Fri, 2010-01-22 at 18:41 +0100, Jan Kiszka wrote:
> >>>>> Gilles Chanteperdrix wrote:
> >>>>>> Philippe Gerum wrote:
> >>>>>>> On Fri, 2010-01-22 at 17:58 +0100, Jan Kiszka wrote:
> >>>>>>>> Hi guys,
> >>>>>>>>
> >>>>>>>> we are currently trying to catch an ugly Linux pipeline state
> >>>>>>>> corruption on x86-64.
> >>>>>>>>
> >>>>>>>> Conceptual question: If a Xenomai task causes a fault, we enter
> >>>>>>>> ipipe_trap_notify over the primary domain and leave it over the
> >>>>>>>> root domain, right? Now, if the root domain happened to be
> >>>>>>>> stalled when the exception happened, where should it normally be
> >>>>>>>> unstalled again, *for_that_task*? Our problem is that we
> >>>>>>>> generate a code path where this does not happen.
> >>>>>>>
> >>>>>>> xnshadow_relax -> ipipe_reenter_root -> finish_task_switch ->
> >>>>>>> finish_lock_switch -> unstall
> >>>>>>>
> >>>>>>> Since xnshadow_relax is called on behalf of the event dispatcher,
> >>>>>>> we should expect it to return with the root domain unstalled
> >>>>>>> after a domain downgrade, from primary to root.
> >>>>>>
> >>>>>> Ok, but what about local_irq_restore_nosync at the end of the
> >>>>>> function?
> >>>>>>
> >>>>> That is, IMO, our problem: It replays the root state on fault
> >>>>> entry, but that one is totally unrelated to the (Xenomai) task
> >>>>> that caused the fault.
> >>>>
> >>>> Indeed. The code seems fishy. Try restoring only when the incoming
> >>>> domain was the root one.
> >>>>
> >>> Something like this?
> >>>
> >>> diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c
> >>> index 4442d96..0558ea3 100644
> >>> --- a/arch/x86/kernel/ipipe.c
> >>> +++ b/arch/x86/kernel/ipipe.c
> >>> @@ -702,19 +702,21 @@ static int __ipipe_xlate_signo[] = {
> >>>  
> >>>  int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
> >>>  {
> >>> +	bool restore_flags = false;
> >>>  	unsigned long flags;
> >>>  
> >>> -	/* Pick up the root domain state of the interrupted context. */
> >>> -	local_save_flags(flags);
> >>> +	if (ipipe_root_domain_p && irqs_disabled_hw()) {
> >>> +		/* Pick up the root domain state of the interrupted context. */
> >>> +		local_save_flags(flags);
> >>>  
> >>> -	if (ipipe_root_domain_p) {
> >>>  		/*
> >>>  		 * Replicate hw interrupt state into the virtual mask before
> >>>  		 * calling the I-pipe event handler over the root domain. Also
> >>>  		 * required later when calling the Linux exception handler.
> >>>  		 */
> >>> -		if (irqs_disabled_hw())
> >>> -			local_irq_disable();
> >>> +		local_irq_disable();
> >>> +
> >>> +		restore_flags = true;
> >>>  	}
> >>>  #ifdef CONFIG_KGDB
> >>>  	/* catch exception KGDB is interested in over non-root domains */
> >>> @@ -725,7 +727,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
> >>>  #endif /* CONFIG_KGDB */
> >>>  
> >>>  	if (unlikely(ipipe_trap_notify(vector, regs))) {
> >>> -		local_irq_restore_nosync(flags);
> >>> +		if (restore_flags)
> >>> +			local_irq_restore_nosync(flags);
> >>>  		return 1;
> >>>  	}
> >>>  
> >>> @@ -770,7 +773,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
> >>>  	 * Relevant for 64-bit: Restore root domain state as the low-level
> >>>  	 * return code will not align it to regs.flags.
> >>>  	 */
> >>> -	local_irq_restore_nosync(flags);
> >>> +	if (restore_flags)
> >>> +		local_irq_restore_nosync(flags);
> >>>  
> >>>  	return 0;
> >>>  }
> >>>
> >>>
> >>> We are currently not able to test this on the system that triggers
> >>> it, but we'll do so tomorrow (yeah...).
> >>>
> >> Should work. Famous last words.
> >>
> >
> > Strike that. This won't work, because the fixup code will use the saved
> > flags even when root is not the incoming domain and/or hw IRQs are on
> > on entry. In short, local_save_flags() must be done unconditionally, as
> > previously.
>
> It will accidentally work for 64-bit where __fixup_if is empty. And for
> 32-bit, I would say we need to make it depend on restore_flags as well.
>
AFAIK, Gilles is working on this. We just need to avoid stepping on
32-bit toes to fix 64-bit.

> Jan

-- 
Philippe.

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core