Philippe Gerum wrote: > On Sat, 2010-01-23 at 11:09 +0100, Jan Kiszka wrote: >> Philippe Gerum wrote: >>> On Fri, 2010-01-22 at 19:08 +0100, Philippe Gerum wrote: >>>> On Fri, 2010-01-22 at 19:03 +0100, Jan Kiszka wrote: >>>>> Philippe Gerum wrote: >>>>>> On Fri, 2010-01-22 at 18:41 +0100, Jan Kiszka wrote: >>>>>>> Gilles Chanteperdrix wrote: >>>>>>>> Philippe Gerum wrote: >>>>>>>>> On Fri, 2010-01-22 at 17:58 +0100, Jan Kiszka wrote: >>>>>>>>>> Hi guys, >>>>>>>>>> >>>>>>>>>> we are currently trying to catch an ugly Linux pipeline state >>>>>>>>>> corruption >>>>>>>>>> on x86-64. >>>>>>>>>> >>>>>>>>>> Conceptual question: If a Xenomai task causes a fault, we enter >>>>>>>>>> ipipe_trap_notify over the primary domain and leave it over the root >>>>>>>>>> domain, right? Now, if the root domain happened to be stalled when >>>>>>>>>> the >>>>>>>>>> exception happened, where should it normally be unstalled again, >>>>>>>>>> *for_that_task*? Our problem is that we generate a code path where >>>>>>>>>> this >>>>>>>>>> does not happen. >>>>>>>>> xnhadow_relax -> ipipe_reenter_root -> finish_task_switch -> >>>>>>>>> finish_lock_switch -> unstall >>>>>>>>> >>>>>>>>> Since xnshadow_relax is called on behalf the event dispatcher, we >>>>>>>>> should >>>>>>>>> expect it to return with the root domain unstalled after a domain >>>>>>>>> downgrade, from primary to root. >>>>>>>> Ok, but what about local_irq_restore_nosync at the end of the function >>>>>>>> ? >>>>>>>> >>>>>>> That is, IMO, our problem: It replays the root state on fault entry, but >>>>>>> that one is totally unrelated to the (Xenomai) task that caused the >>>>>>> fault. >>>>>> The code seems fishy. Try restoring only when the incoming domain was >>>>>> the root one. Indeed. >>>>>> >>>>> Something like this? >>>>> >>>>> diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c >>>>> index 4442d96..0558ea3 100644 >>>>> --- a/arch/x86/kernel/ipipe.c >>>>> +++ b/arch/x86/kernel/ipipe.c >>>>> @@ -702,19 +702,21 @@ static int __ipipe_xlate_signo[] = { >>>>> >>>>> int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int >>>>> vector) >>>>> { >>>>> + bool restore_flags = false; >>>>> unsigned long flags; >>>>> >>>>> - /* Pick up the root domain state of the interrupted context. */ >>>>> - local_save_flags(flags); >>>>> + if (ipipe_root_domain_p && irqs_disabled_hw()) { >>>>> + /* Pick up the root domain state of the interrupted context. */ >>>>> + local_save_flags(flags); >>>>> >>>>> - if (ipipe_root_domain_p) { >>>>> /* >>>>> * Replicate hw interrupt state into the virtual mask before >>>>> * calling the I-pipe event handler over the root domain. Also >>>>> * required later when calling the Linux exception handler. >>>>> */ >>>>> - if (irqs_disabled_hw()) >>>>> - local_irq_disable(); >>>>> + local_irq_disable(); >>>>> + >>>>> + restore_flags = true; >>>>> } >>>>> #ifdef CONFIG_KGDB >>>>> /* catch exception KGDB is interested in over non-root domains */ >>>>> @@ -725,7 +727,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, >>>>> long error_code, int vector) >>>>> #endif /* CONFIG_KGDB */ >>>>> >>>>> if (unlikely(ipipe_trap_notify(vector, regs))) { >>>>> - local_irq_restore_nosync(flags); >>>>> + if (restore_flags) >>>>> + local_irq_restore_nosync(flags); >>>>> return 1; >>>>> } >>>>> >>>>> @@ -770,7 +773,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, >>>>> long error_code, int vector) >>>>> * Relevant for 64-bit: Restore root domain state as the low-level >>>>> * return code will not align it to regs.flags. >>>>> */ >>>>> - local_irq_restore_nosync(flags); >>>>> + if (restore_flags) >>>>> + local_irq_restore_nosync(flags); >>>>> >>>>> return 0; >>>>> } >>>>> >>>>> >>>>> We are currently not able to test this on the system that triggers it, >>>>> but we'll do so tomorrow (yeah...). >>>>> >>>> Should work. Famous last words. >>>> >>> Strike that. This won't work, because the fixup code will use the saved >>> flags even when root is not the incoming domain and/or hw IRQs are on on >>> entry. In short, local_save_flags() must be done unconditionally, as >>> previously. >> It will accidentally work for 64-bit where __fixup_if is empty. And for >> 32-bit, I would say we need to make it depend on restore_flags as well. >> > > AFAIK, Gilles is working on this. We just need to avoid stepping on > 32bit toes to fix 64.
The arguably less ambitious following patch works for me on x86_32: diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c index 4442d96..a7e1241 100644 --- a/arch/x86/kernel/ipipe.c +++ b/arch/x86/kernel/ipipe.c @@ -703,6 +703,7 @@ static int __ipipe_xlate_signo[] = { int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector) { unsigned long flags; + unsigned root_on_entry = 0; /* Pick up the root domain state of the interrupted context. */ local_save_flags(flags); @@ -715,6 +716,7 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector) */ if (irqs_disabled_hw()) local_irq_disable(); + root_on_entry = 1; } #ifdef CONFIG_KGDB /* catch exception KGDB is interested in over non-root domains */ @@ -725,7 +727,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector) #endif /* CONFIG_KGDB */ if (unlikely(ipipe_trap_notify(vector, regs))) { - local_irq_restore_nosync(flags); + if (root_on_entry) + local_irq_restore_nosync(flags); return 1; } @@ -734,7 +737,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector) * handler, restore the original IF from exception entry as the * low-level return code will evaluate it. */ - __fixup_if(raw_irqs_disabled_flags(flags), regs); + if (root_on_entry) + __fixup_if(raw_irqs_disabled_flags(flags), regs); if (unlikely(!ipipe_root_domain_p)) { /* Detect unhandled faults over non-root domains. */ @@ -770,7 +774,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector) * Relevant for 64-bit: Restore root domain state as the low-level * return code will not align it to regs.flags. */ - local_irq_restore_nosync(flags); + if (root_on_entry) + local_irq_restore_nosync(flags); return 0; } -- Gilles. _______________________________________________ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core