Re: [Xenomai-core] Domain switch during page fault handling
On Sun, 2010-01-24 at 11:45 +0100, Jan Kiszka wrote:

Jan Kiszka wrote (Fri, 2010-01-22 17:58):
Hi guys,

we are currently trying to catch an ugly Linux pipeline state corruption
on x86-64.

Conceptual question: If a Xenomai task causes a fault, we enter
ipipe_trap_notify over the primary domain and leave it over the root
domain, right? Now, if the root domain happened to be stalled when the
exception happened, where should it normally be unstalled again,
*for_that_task*? Our problem is that we generate a code path where this
does not happen.

Philippe Gerum wrote:
xnshadow_relax -> ipipe_reenter_root -> finish_task_switch ->
finish_lock_switch -> unstall

Since xnshadow_relax is called on behalf of the event dispatcher, we
should expect it to return with the root domain unstalled after a domain
downgrade, from primary to root.

Gilles Chanteperdrix wrote:
Ok, but what about local_irq_restore_nosync at the end of the function?

Jan Kiszka wrote:
That is, IMO, our problem: It replays the root state on fault entry, but
that one is totally unrelated to the (Xenomai) task that caused the
fault.

Philippe Gerum wrote:
The code seems fishy. Try restoring only when the incoming domain was
the root one.

Jan Kiszka wrote:
Indeed. Something like this?

diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c
index 4442d96..0558ea3 100644
--- a/arch/x86/kernel/ipipe.c
+++ b/arch/x86/kernel/ipipe.c
@@ -702,19 +702,21 @@ static int __ipipe_xlate_signo[] = {
 
 int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
 {
+	bool restore_flags = false;
 	unsigned long flags;
 
-	/* Pick up the root domain state of the interrupted context. */
-	local_save_flags(flags);
+	if (ipipe_root_domain_p && irqs_disabled_hw()) {
+		/* Pick up the root domain state of the interrupted context. */
+		local_save_flags(flags);
 
-	if (ipipe_root_domain_p) {
 		/*
 		 * Replicate hw interrupt state into the virtual mask before
 		 * calling the I-pipe event handler over the root domain. Also
 		 * required later when calling the Linux exception handler.
 		 */
-		if (irqs_disabled_hw())
-			local_irq_disable();
+		local_irq_disable();
+
+		restore_flags = true;
 	}
 #ifdef CONFIG_KGDB
 	/* catch exception KGDB is interested in over non-root domains */
@@ -725,7 +727,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
 #endif /* CONFIG_KGDB */
 
 	if (unlikely(ipipe_trap_notify(vector, regs))) {
-		local_irq_restore_nosync(flags);
+		if (restore_flags)
+			local_irq_restore_nosync(flags);
 		return 1;
 	}
 
@@ -770,7 +773,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
 	 * Relevant for 64-bit: Restore root domain state as the low-level
 	 * return code will not align it to regs.flags.
 	 */
-	local_irq_restore_nosync(flags);
+	if (restore_flags)
+		local_irq_restore_nosync(flags);
 
 	return 0;
 }

We are currently not able to test this on the system that triggers it,
but we'll do so tomorrow (yeah...).

Philippe Gerum wrote:
Should work. Famous last words.

Strike that. This won't work, because the fixup code will use the saved
flags even when root is not the incoming domain and/or hw IRQs are on on
entry. In short, local_save_flags() must be done unconditionally, as
previously.
Re: [Xenomai-core] Domain switch during page fault handling
Gilles Chanteperdrix wrote:
> The arguably less ambitious following patch works for me on x86_32:
>
> diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c
> index 4442d96..a7e1241 100644
> --- a/arch/x86/kernel/ipipe.c
> +++ b/arch/x86/kernel/ipipe.c
> @@ -703,6 +703,7 @@ static int __ipipe_xlate_signo[] = {
>  int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
>  {
>  	unsigned long flags;
> +	unsigned root_on_entry = 0;
>  
>  	/* Pick up the root domain state of the interrupted context. */
>  	local_save_flags(flags);
> @@ -715,6 +716,7 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
>  		 */
>  		if (irqs_disabled_hw())
>  			local_irq_disable();
> +		root_on_entry = 1;
>  	}
> #ifdef CONFIG_KGDB
>  	/* catch exception KGDB is interested in over non-root domains */
> @@ -725,7 +727,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
> #endif /* CONFIG_KGDB */
>  
>  	if (unlikely(ipipe_trap_notify(vector, regs))) {
> -		local_irq_restore_nosync(flags);
> +		if (root_on_entry)
> +			local_irq_restore_nosync(flags);
>  		return 1;
>  	}
>  
> @@ -734,7 +737,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
>  	 * handler, restore the original IF from exception entry as the
>  	 * low-level return code will evaluate it.
>  	 */
> -	__fixup_if(raw_irqs_disabled_flags(flags), regs);
> +	if (root_on_entry)
> +		__fixup_if(raw_irqs_disabled_flags(flags), regs);

See my other mail, this is conflicting with what was once documented as
the purpose of this fixup. I think we need to pick up the current root
state here and push it into regs - if we are running over root.

What about this (draft, not yet tested)?

diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c
index 4442d96..99c5346 100644
--- a/arch/x86/kernel/ipipe.c
+++ b/arch/x86/kernel/ipipe.c
@@ -702,19 +702,17 @@ static int __ipipe_xlate_signo[] = {
 
 int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
 {
+	bool restore_flags = false;
 	unsigned long flags;
 
-	/* Pick up the root domain state of the interrupted context. */
-	local_save_flags(flags);
-
-	if (ipipe_root_domain_p) {
+	if (ipipe_root_domain_p && irqs_disabled_hw()) {
 		/*
 		 * Replicate hw interrupt state into the virtual mask before
 		 * calling the I-pipe event handler over the root domain. Also
 		 * required later when calling the Linux exception handler.
 		 */
-		if (irqs_disabled_hw())
-			local_irq_disable();
+		local_irq_save(flags);
+		restore_flags = true;
 	}
 #ifdef CONFIG_KGDB
 	/* catch exception KGDB is interested in over non-root domains */
@@ -725,18 +723,20 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
 #endif /* CONFIG_KGDB */
 
 	if (unlikely(ipipe_trap_notify(vector, regs))) {
-		local_irq_restore_nosync(flags);
+		if (restore_flags)
+			local_irq_restore_nosync(flags);
 		return 1;
 	}
 
-	/*
-	 * 32-bit: In case we migrated to root domain inside the event
-	 * handler, restore the original IF from exception entry as the
-	 * low-level return code will evaluate it.
-	 */
-	__fixup_if(raw_irqs_disabled_flags(flags), regs);
-
-	if (unlikely(!ipipe_root_domain_p)) {
+	if (likely(ipipe_root_domain_p)) {
+		/*
+		 * 32-bit: In case we migrated to root domain inside the event
+		 * handler, align regs.flags with the root domain state as the
+		 * low-level return code will evaluate it.
+		 */
+		__fixup_if(test_bit(IPIPE_STALL_FLAG,
+				    &ipipe_root_cpudom_var(status)), regs);
+	} else {
 		/* Detect unhandled faults over non-root domains. */
 		struct ipipe_domain *ipd = ipipe_current_domain;
@@ -770,21 +770,26 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
 	 * Relevant for 64-bit: Restore root domain state as the low-level
 	 * return code will not align it to regs.flags.
 	 */
-	local_irq_restore_nosync(flags);
+	if (restore_flags)
+		local_irq_restore_nosync(flags);
 
 	return 0;
 }
 
 int __ipipe_divert_exception(struct pt_regs *regs, int vector)
 {
+	bool restore_flags = false;
 	unsigned long flags;
 
-	/* Same root state handling as in __ipipe_handle_exception. */
-	local_save_flags(flags);
Re: [Xenomai-core] Domain switch during page fault handling
Philippe Gerum wrote:
> AFAIK, Gilles is working on this. We just need to avoid stepping on
> 32bit toes to fix 64.

OK. Just realized that my suggestion would conflict with the comment
above __fixup_if. So I think we first need to clarify the various
scenarios again to avoid breaking one while fixing another.

Entry over non-root, exit over non-root:
- no need to fiddle with the root state

Entry over root, exit over root, !irqs_disabled_hw():
- no need to fiddle with the root state
- 32 bit: regs fixup required?

Entry over
Re: [Xenomai-core] Domain switch during page fault handling
On Sat, 2010-01-23 at 11:09 +0100, Jan Kiszka wrote: > Philippe Gerum wrote: > > On Fri, 2010-01-22 at 19:08 +0100, Philippe Gerum wrote: > >> On Fri, 2010-01-22 at 19:03 +0100, Jan Kiszka wrote: > >>> Philippe Gerum wrote: > On Fri, 2010-01-22 at 18:41 +0100, Jan Kiszka wrote: > > Gilles Chanteperdrix wrote: > >> Philippe Gerum wrote: > >>> On Fri, 2010-01-22 at 17:58 +0100, Jan Kiszka wrote: > Hi guys, > > we are currently trying to catch an ugly Linux pipeline state > corruption > on x86-64. > > Conceptual question: If a Xenomai task causes a fault, we enter > ipipe_trap_notify over the primary domain and leave it over the root > domain, right? Now, if the root domain happened to be stalled when > the > exception happened, where should it normally be unstalled again, > *for_that_task*? Our problem is that we generate a code path where > this > does not happen. > >>> xnhadow_relax -> ipipe_reenter_root -> finish_task_switch -> > >>> finish_lock_switch -> unstall > >>> > >>> Since xnshadow_relax is called on behalf the event dispatcher, we > >>> should > >>> expect it to return with the root domain unstalled after a domain > >>> downgrade, from primary to root. > >> Ok, but what about local_irq_restore_nosync at the end of the function > >> ? > >> > > That is, IMO, our problem: It replays the root state on fault entry, but > > that one is totally unrelated to the (Xenomai) task that caused the > > fault. > The code seems fishy. Try restoring only when the incoming domain was > the root one. Indeed. > > >>> Something like this? 
> >>> > >>> diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c > >>> index 4442d96..0558ea3 100644 > >>> --- a/arch/x86/kernel/ipipe.c > >>> +++ b/arch/x86/kernel/ipipe.c > >>> @@ -702,19 +702,21 @@ static int __ipipe_xlate_signo[] = { > >>> > >>> int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int > >>> vector) > >>> { > >>> + bool restore_flags = false; > >>> unsigned long flags; > >>> > >>> - /* Pick up the root domain state of the interrupted context. */ > >>> - local_save_flags(flags); > >>> + if (ipipe_root_domain_p && irqs_disabled_hw()) { > >>> + /* Pick up the root domain state of the interrupted context. */ > >>> + local_save_flags(flags); > >>> > >>> - if (ipipe_root_domain_p) { > >>> /* > >>>* Replicate hw interrupt state into the virtual mask before > >>>* calling the I-pipe event handler over the root domain. Also > >>>* required later when calling the Linux exception handler. > >>>*/ > >>> - if (irqs_disabled_hw()) > >>> - local_irq_disable(); > >>> + local_irq_disable(); > >>> + > >>> + restore_flags = true; > >>> } > >>> #ifdef CONFIG_KGDB > >>> /* catch exception KGDB is interested in over non-root domains */ > >>> @@ -725,7 +727,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, > >>> long error_code, int vector) > >>> #endif /* CONFIG_KGDB */ > >>> > >>> if (unlikely(ipipe_trap_notify(vector, regs))) { > >>> - local_irq_restore_nosync(flags); > >>> + if (restore_flags) > >>> + local_irq_restore_nosync(flags); > >>> return 1; > >>> } > >>> > >>> @@ -770,7 +773,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, > >>> long error_code, int vector) > >>>* Relevant for 64-bit: Restore root domain state as the low-level > >>>* return code will not align it to regs.flags. 
> >>>*/ > >>> - local_irq_restore_nosync(flags); > >>> + if (restore_flags) > >>> + local_irq_restore_nosync(flags); > >>> > >>> return 0; > >>> } > >>> > >>> > >>> We are currently not able to test this on the system that triggers it, > >>> but we'll do so tomorrow (yeah...). > >>> > >> Should work. Famous last words. > >> > > > > Strike that. This won't work, because the fixup code will use the saved > > flags even when root is not the incoming domain and/or hw IRQs are on on > > entry. In short, local_save_flags() must be done unconditionally, as > > previously. > > It will accidentally work for 64-bit where __fixup_if is empty. And for > 32-bit, I would say we need to make it depend on restore_flags as well. > AFAIK, Gilles is working on this. We just need to avoid stepping on 32bit toes to fix 64. > Jan > -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
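Putting the thread's two conclusions together — local_save_flags() done unconditionally so the 32-bit __fixup_if path still has valid flags, and the restore gated on restore_flags — the handler would be shaped roughly like this (an untested sketch of the direction being discussed, not a final patch; the elided middle is the KGDB hook, ipipe_trap_notify() and the Linux fault handling from the quoted code):

```c
int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
{
	bool restore_flags = false;
	unsigned long flags;

	/*
	 * Always pick up the root domain state of the interrupted
	 * context: the 32-bit __fixup_if code needs it even when we
	 * are not going to restore it here.
	 */
	local_save_flags(flags);

	if (ipipe_root_domain_p && irqs_disabled_hw()) {
		/*
		 * Replicate the hw interrupt state into the virtual
		 * mask before calling handlers over the root domain.
		 */
		local_irq_disable();
		restore_flags = true;
	}

	/* ... KGDB hook, ipipe_trap_notify(), Linux exception path ... */

	if (restore_flags)
		local_irq_restore_nosync(flags);

	return 0;
}
```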
Re: [Xenomai-core] Domain switch during page fault handling
Philippe Gerum wrote: > On Fri, 2010-01-22 at 19:08 +0100, Philippe Gerum wrote: >> On Fri, 2010-01-22 at 19:03 +0100, Jan Kiszka wrote: >>> Philippe Gerum wrote: On Fri, 2010-01-22 at 18:41 +0100, Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> Philippe Gerum wrote: >>> On Fri, 2010-01-22 at 17:58 +0100, Jan Kiszka wrote: Hi guys, we are currently trying to catch an ugly Linux pipeline state corruption on x86-64. Conceptual question: If a Xenomai task causes a fault, we enter ipipe_trap_notify over the primary domain and leave it over the root domain, right? Now, if the root domain happened to be stalled when the exception happened, where should it normally be unstalled again, *for_that_task*? Our problem is that we generate a code path where this does not happen. >>> xnhadow_relax -> ipipe_reenter_root -> finish_task_switch -> >>> finish_lock_switch -> unstall >>> >>> Since xnshadow_relax is called on behalf the event dispatcher, we should >>> expect it to return with the root domain unstalled after a domain >>> downgrade, from primary to root. >> Ok, but what about local_irq_restore_nosync at the end of the function ? >> > That is, IMO, our problem: It replays the root state on fault entry, but > that one is totally unrelated to the (Xenomai) task that caused the fault. The code seems fishy. Try restoring only when the incoming domain was the root one. Indeed. >>> Something like this? >>> >>> diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c >>> index 4442d96..0558ea3 100644 >>> --- a/arch/x86/kernel/ipipe.c >>> +++ b/arch/x86/kernel/ipipe.c >>> @@ -702,19 +702,21 @@ static int __ipipe_xlate_signo[] = { >>> >>> int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int >>> vector) >>> { >>> + bool restore_flags = false; >>> unsigned long flags; >>> >>> - /* Pick up the root domain state of the interrupted context. 
*/ >>> - local_save_flags(flags); >>> + if (ipipe_root_domain_p && irqs_disabled_hw()) { >>> + /* Pick up the root domain state of the interrupted context. */ >>> + local_save_flags(flags); >>> >>> - if (ipipe_root_domain_p) { >>> /* >>> * Replicate hw interrupt state into the virtual mask before >>> * calling the I-pipe event handler over the root domain. Also >>> * required later when calling the Linux exception handler. >>> */ >>> - if (irqs_disabled_hw()) >>> - local_irq_disable(); >>> + local_irq_disable(); >>> + >>> + restore_flags = true; >>> } >>> #ifdef CONFIG_KGDB >>> /* catch exception KGDB is interested in over non-root domains */ >>> @@ -725,7 +727,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long >>> error_code, int vector) >>> #endif /* CONFIG_KGDB */ >>> >>> if (unlikely(ipipe_trap_notify(vector, regs))) { >>> - local_irq_restore_nosync(flags); >>> + if (restore_flags) >>> + local_irq_restore_nosync(flags); >>> return 1; >>> } >>> >>> @@ -770,7 +773,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long >>> error_code, int vector) >>> * Relevant for 64-bit: Restore root domain state as the low-level >>> * return code will not align it to regs.flags. >>> */ >>> - local_irq_restore_nosync(flags); >>> + if (restore_flags) >>> + local_irq_restore_nosync(flags); >>> >>> return 0; >>> } >>> >>> >>> We are currently not able to test this on the system that triggers it, >>> but we'll do so tomorrow (yeah...). >>> >> Should work. Famous last words. >> > > Strike that. This won't work, because the fixup code will use the saved > flags even when root is not the incoming domain and/or hw IRQs are on on > entry. In short, local_save_flags() must be done unconditionally, as > previously. It will accidentally work for 64-bit where __fixup_if is empty. And for 32-bit, I would say we need to make it depend on restore_flags as well. 
Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Xenomai-core] Domain switch during page fault handling
On Fri, 2010-01-22 at 19:08 +0100, Philippe Gerum wrote: > On Fri, 2010-01-22 at 19:03 +0100, Jan Kiszka wrote: > > Philippe Gerum wrote: > > > On Fri, 2010-01-22 at 18:41 +0100, Jan Kiszka wrote: > > >> Gilles Chanteperdrix wrote: > > >>> Philippe Gerum wrote: > > On Fri, 2010-01-22 at 17:58 +0100, Jan Kiszka wrote: > > > Hi guys, > > > > > > we are currently trying to catch an ugly Linux pipeline state > > > corruption > > > on x86-64. > > > > > > Conceptual question: If a Xenomai task causes a fault, we enter > > > ipipe_trap_notify over the primary domain and leave it over the root > > > domain, right? Now, if the root domain happened to be stalled when the > > > exception happened, where should it normally be unstalled again, > > > *for_that_task*? Our problem is that we generate a code path where > > > this > > > does not happen. > > xnhadow_relax -> ipipe_reenter_root -> finish_task_switch -> > > finish_lock_switch -> unstall > > > > Since xnshadow_relax is called on behalf the event dispatcher, we > > should > > expect it to return with the root domain unstalled after a domain > > downgrade, from primary to root. > > >>> Ok, but what about local_irq_restore_nosync at the end of the function ? > > >>> > > >> That is, IMO, our problem: It replays the root state on fault entry, but > > >> that one is totally unrelated to the (Xenomai) task that caused the > > >> fault. > > > > > > The code seems fishy. Try restoring only when the incoming domain was > > > the root one. Indeed. > > > > > > > Something like this? 
> > > > diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c > > index 4442d96..0558ea3 100644 > > --- a/arch/x86/kernel/ipipe.c > > +++ b/arch/x86/kernel/ipipe.c > > @@ -702,19 +702,21 @@ static int __ipipe_xlate_signo[] = { > > > > int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int > > vector) > > { > > + bool restore_flags = false; > > unsigned long flags; > > > > - /* Pick up the root domain state of the interrupted context. */ > > - local_save_flags(flags); > > + if (ipipe_root_domain_p && irqs_disabled_hw()) { > > + /* Pick up the root domain state of the interrupted context. */ > > + local_save_flags(flags); > > > > - if (ipipe_root_domain_p) { > > /* > > * Replicate hw interrupt state into the virtual mask before > > * calling the I-pipe event handler over the root domain. Also > > * required later when calling the Linux exception handler. > > */ > > - if (irqs_disabled_hw()) > > - local_irq_disable(); > > + local_irq_disable(); > > + > > + restore_flags = true; > > } > > #ifdef CONFIG_KGDB > > /* catch exception KGDB is interested in over non-root domains */ > > @@ -725,7 +727,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long > > error_code, int vector) > > #endif /* CONFIG_KGDB */ > > > > if (unlikely(ipipe_trap_notify(vector, regs))) { > > - local_irq_restore_nosync(flags); > > + if (restore_flags) > > + local_irq_restore_nosync(flags); > > return 1; > > } > > > > @@ -770,7 +773,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long > > error_code, int vector) > > * Relevant for 64-bit: Restore root domain state as the low-level > > * return code will not align it to regs.flags. > > */ > > - local_irq_restore_nosync(flags); > > + if (restore_flags) > > + local_irq_restore_nosync(flags); > > > > return 0; > > } > > > > > > We are currently not able to test this on the system that triggers it, > > but we'll do so tomorrow (yeah...). > > > > Should work. Famous last words. > Strike that. 
This won't work, because the fixup code will use the saved flags even when root is not the incoming domain and/or hw IRQs are on upon entry. In short, local_save_flags() must be done unconditionally, as previously. > > Jan > > > > -- Philippe.
Re: [Xenomai-core] Domain switch during page fault handling
On Fri, 2010-01-22 at 19:03 +0100, Jan Kiszka wrote: > Philippe Gerum wrote: > > On Fri, 2010-01-22 at 18:41 +0100, Jan Kiszka wrote: > >> Gilles Chanteperdrix wrote: > >>> Philippe Gerum wrote: > On Fri, 2010-01-22 at 17:58 +0100, Jan Kiszka wrote: > > Hi guys, > > > > we are currently trying to catch an ugly Linux pipeline state corruption > > on x86-64. > > > > Conceptual question: If a Xenomai task causes a fault, we enter > > ipipe_trap_notify over the primary domain and leave it over the root > > domain, right? Now, if the root domain happened to be stalled when the > > exception happened, where should it normally be unstalled again, > > *for_that_task*? Our problem is that we generate a code path where this > > does not happen. > xnhadow_relax -> ipipe_reenter_root -> finish_task_switch -> > finish_lock_switch -> unstall > > Since xnshadow_relax is called on behalf the event dispatcher, we should > expect it to return with the root domain unstalled after a domain > downgrade, from primary to root. > >>> Ok, but what about local_irq_restore_nosync at the end of the function ? > >>> > >> That is, IMO, our problem: It replays the root state on fault entry, but > >> that one is totally unrelated to the (Xenomai) task that caused the fault. > > > > The code seems fishy. Try restoring only when the incoming domain was > > the root one. Indeed. > > > > Something like this? > > diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c > index 4442d96..0558ea3 100644 > --- a/arch/x86/kernel/ipipe.c > +++ b/arch/x86/kernel/ipipe.c > @@ -702,19 +702,21 @@ static int __ipipe_xlate_signo[] = { > > int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int > vector) > { > + bool restore_flags = false; > unsigned long flags; > > - /* Pick up the root domain state of the interrupted context. */ > - local_save_flags(flags); > + if (ipipe_root_domain_p && irqs_disabled_hw()) { > + /* Pick up the root domain state of the interrupted context. 
*/ > + local_save_flags(flags); > > - if (ipipe_root_domain_p) { > /* >* Replicate hw interrupt state into the virtual mask before >* calling the I-pipe event handler over the root domain. Also >* required later when calling the Linux exception handler. >*/ > - if (irqs_disabled_hw()) > - local_irq_disable(); > + local_irq_disable(); > + > + restore_flags = true; > } > #ifdef CONFIG_KGDB > /* catch exception KGDB is interested in over non-root domains */ > @@ -725,7 +727,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long > error_code, int vector) > #endif /* CONFIG_KGDB */ > > if (unlikely(ipipe_trap_notify(vector, regs))) { > - local_irq_restore_nosync(flags); > + if (restore_flags) > + local_irq_restore_nosync(flags); > return 1; > } > > @@ -770,7 +773,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long > error_code, int vector) >* Relevant for 64-bit: Restore root domain state as the low-level >* return code will not align it to regs.flags. >*/ > - local_irq_restore_nosync(flags); > + if (restore_flags) > + local_irq_restore_nosync(flags); > > return 0; > } > > > We are currently not able to test this on the system that triggers it, > but we'll do so tomorrow (yeah...). > Should work. Famous last words. > Jan > -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Domain switch during page fault handling
Philippe Gerum wrote: > On Fri, 2010-01-22 at 18:41 +0100, Jan Kiszka wrote: >> Gilles Chanteperdrix wrote: >>> Philippe Gerum wrote: On Fri, 2010-01-22 at 17:58 +0100, Jan Kiszka wrote: > Hi guys, > > we are currently trying to catch an ugly Linux pipeline state corruption > on x86-64. > > Conceptual question: If a Xenomai task causes a fault, we enter > ipipe_trap_notify over the primary domain and leave it over the root > domain, right? Now, if the root domain happened to be stalled when the > exception happened, where should it normally be unstalled again, > *for_that_task*? Our problem is that we generate a code path where this > does not happen. xnhadow_relax -> ipipe_reenter_root -> finish_task_switch -> finish_lock_switch -> unstall Since xnshadow_relax is called on behalf the event dispatcher, we should expect it to return with the root domain unstalled after a domain downgrade, from primary to root. >>> Ok, but what about local_irq_restore_nosync at the end of the function ? >>> >> That is, IMO, our problem: It replays the root state on fault entry, but >> that one is totally unrelated to the (Xenomai) task that caused the fault. > > The code seems fishy. Try restoring only when the incoming domain was > the root one. Indeed. > Something like this? diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c index 4442d96..0558ea3 100644 --- a/arch/x86/kernel/ipipe.c +++ b/arch/x86/kernel/ipipe.c @@ -702,19 +702,21 @@ static int __ipipe_xlate_signo[] = { int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector) { + bool restore_flags = false; unsigned long flags; - /* Pick up the root domain state of the interrupted context. */ - local_save_flags(flags); + if (ipipe_root_domain_p && irqs_disabled_hw()) { + /* Pick up the root domain state of the interrupted context. 
*/ + local_save_flags(flags); - if (ipipe_root_domain_p) { /* * Replicate hw interrupt state into the virtual mask before * calling the I-pipe event handler over the root domain. Also * required later when calling the Linux exception handler. */ - if (irqs_disabled_hw()) - local_irq_disable(); + local_irq_disable(); + + restore_flags = true; } #ifdef CONFIG_KGDB /* catch exception KGDB is interested in over non-root domains */ @@ -725,7 +727,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector) #endif /* CONFIG_KGDB */ if (unlikely(ipipe_trap_notify(vector, regs))) { - local_irq_restore_nosync(flags); + if (restore_flags) + local_irq_restore_nosync(flags); return 1; } @@ -770,7 +773,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector) * Relevant for 64-bit: Restore root domain state as the low-level * return code will not align it to regs.flags. */ - local_irq_restore_nosync(flags); + if (restore_flags) + local_irq_restore_nosync(flags); return 0; } We are currently not able to test this on the system that triggers it, but we'll do so tomorrow (yeah...). Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Domain switch during page fault handling
On Fri, 2010-01-22 at 18:41 +0100, Jan Kiszka wrote: > Gilles Chanteperdrix wrote: > > Philippe Gerum wrote: > >> On Fri, 2010-01-22 at 17:58 +0100, Jan Kiszka wrote: > >>> Hi guys, > >>> > >>> we are currently trying to catch an ugly Linux pipeline state corruption > >>> on x86-64. > >>> > >>> Conceptual question: If a Xenomai task causes a fault, we enter > >>> ipipe_trap_notify over the primary domain and leave it over the root > >>> domain, right? Now, if the root domain happened to be stalled when the > >>> exception happened, where should it normally be unstalled again, > >>> *for_that_task*? Our problem is that we generate a code path where this > >>> does not happen. > >> xnshadow_relax -> ipipe_reenter_root -> finish_task_switch -> > >> finish_lock_switch -> unstall > >> > >> Since xnshadow_relax is called on behalf of the event dispatcher, we should > >> expect it to return with the root domain unstalled after a domain > >> downgrade, from primary to root. > > > > Ok, but what about local_irq_restore_nosync at the end of the function ? > > > > That is, IMO, our problem: It replays the root state on fault entry, but > that one is totally unrelated to the (Xenomai) task that caused the fault. The code seems fishy. Try restoring only when the incoming domain was the root one. Indeed. > > Jan > -- Philippe.
Re: [Xenomai-core] Domain switch during page fault handling
Jan Kiszka wrote: > Philippe Gerum wrote: >> On Fri, 2010-01-22 at 17:58 +0100, Jan Kiszka wrote: >>> Hi guys, >>> >>> we are currently trying to catch an ugly Linux pipeline state corruption >>> on x86-64. >>> >>> Conceptual question: If a Xenomai task causes a fault, we enter >>> ipipe_trap_notify over the primary domain and leave it over the root >>> domain, right? Now, if the root domain happened to be stalled when the >>> exception happened, where should it normally be unstalled again, >>> *for_that_task*? Our problem is that we generate a code path where this >>> does not happen. >> xnhadow_relax -> ipipe_reenter_root -> finish_task_switch -> >> finish_lock_switch -> unstall >> >> Since xnshadow_relax is called on behalf the event dispatcher, we should >> expect it to return with the root domain unstalled after a domain >> downgrade, from primary to root. > > That all happens as expected. But then __ipipe_handle_exception - as it > last duty - ruins our day by replaying an unrelated root state. > To illustrate what goes wrong (maybe it already fixes the issue, but it may have side effects and is surely not clean): diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c index 4442d96..b6a3720 100644 --- a/arch/x86/kernel/ipipe.c +++ b/arch/x86/kernel/ipipe.c @@ -702,11 +702,14 @@ static int __ipipe_xlate_signo[] = { int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector) { - unsigned long flags; + struct ipipe_domain *entry_domain = ipipe_current_domain; + unsigned long flags, entry_dom_flags; /* Pick up the root domain state of the interrupted context. */ local_save_flags(flags); + entry_dom_flags = ipipe_test_pipeline() ? 
0 : X86_EFLAGS_IF; + if (ipipe_root_domain_p) { /* * Replicate hw interrupt state into the virtual mask before @@ -729,6 +732,9 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector) return 1; } + if (entry_domain != ipipe_current_domain) + flags = entry_dom_flags; + /* * 32-bit: In case we migrated to root domain inside the event * handler, restore the original IF from exception entry as the Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Domain switch during page fault handling
Philippe Gerum wrote: > On Fri, 2010-01-22 at 17:58 +0100, Jan Kiszka wrote: >> Hi guys, >> >> we are currently trying to catch an ugly Linux pipeline state corruption >> on x86-64. >> >> Conceptual question: If a Xenomai task causes a fault, we enter >> ipipe_trap_notify over the primary domain and leave it over the root >> domain, right? Now, if the root domain happened to be stalled when the >> exception happened, where should it normally be unstalled again, >> *for_that_task*? Our problem is that we generate a code path where this >> does not happen. > > xnshadow_relax -> ipipe_reenter_root -> finish_task_switch -> > finish_lock_switch -> unstall > > Since xnshadow_relax is called on behalf of the event dispatcher, we should > expect it to return with the root domain unstalled after a domain > downgrade, from primary to root. That all happens as expected. But then __ipipe_handle_exception - as its last duty - ruins our day by replaying an unrelated root state. Jan
Re: [Xenomai-core] Domain switch during page fault handling
Gilles Chanteperdrix wrote: > Philippe Gerum wrote: >> On Fri, 2010-01-22 at 17:58 +0100, Jan Kiszka wrote: >>> Hi guys, >>> >>> we are currently trying to catch an ugly Linux pipeline state corruption >>> on x86-64. >>> >>> Conceptual question: If a Xenomai task causes a fault, we enter >>> ipipe_trap_notify over the primary domain and leave it over the root >>> domain, right? Now, if the root domain happened to be stalled when the >>> exception happened, where should it normally be unstalled again, >>> *for_that_task*? Our problem is that we generate a code path where this >>> does not happen. >> xnshadow_relax -> ipipe_reenter_root -> finish_task_switch -> >> finish_lock_switch -> unstall >> >> Since xnshadow_relax is called on behalf of the event dispatcher, we should >> expect it to return with the root domain unstalled after a domain >> downgrade, from primary to root. > > Ok, but what about local_irq_restore_nosync at the end of the function ? > That is, IMO, our problem: It replays the root state on fault entry, but that one is totally unrelated to the (Xenomai) task that caused the fault. Jan
Re: [Xenomai-core] Domain switch during page fault handling
Jan Kiszka wrote: > Hi guys, > > we are currently trying to catch an ugly Linux pipeline state corruption > on x86-64. > > Conceptual question: If a Xenomai task causes a fault, we enter > ipipe_trap_notify over the primary domain and leave it over the root > domain, right? Now, if the root domain happened to be stalled when the > exception happened, where should it normally be unstalled again, > *for_that_task*? Our problem is that we generate a code path where this > does not happen. I have spent a few hours on a similar problem on x86_32. The difference on x86_32 is that the stall bit is used as the user-space interrupt flag, so the effect is visible on latencies. I have to say that understanding __ipipe_handle_exception requires more time than I have currently spent, but I intend to elucidate this sooner or later. Here are the kinds of traces I get:

:| #*event t...@-101 -188+ 3.035 xntimer_next_local_shot+0x85 (xntimer_t (...)
: +func -170+ 1.797 up_read+0x3 (do_page_fault+0x136)
: #func -168 0.279 __ipipe_unstall_iret_root+0x4 (restore_ret+0x0)
:| #begin 0x8000 -168 0.326 __ipipe_unstall_iret_root+0x7b (restore_ret+0x0)
:| #end 0x800d -167+ 1.572 __ipipe_unstall_iret_root+0x64 (restore_ret+0x0)
:| #func -166 0.239 __ipipe_syscall_root+0xf (system_call+0x2d)
:| #func -166 0.323 __ipipe_dispatch_event+0x9 (__ipipe_syscall_root+0x40)
:| +*func -165 0.859 hisyscall_event+0xf (__ipipe_dispatch_event+0xd3)
:| #func -164+ 1.388 losyscall_event+0x9 (__ipipe_dispatch_event+0xd3)
:| #func -163 0.870 sys_time+0x11 (syscall_call+0x7)
:| #func -162 0.278 get_seconds+0x3 (sys_time+0x1e)
: #func -162 0.271 __ipipe_unstall_iret_root+0x4 (restore_ret+0x0)
:| #begin 0x8000 -162 0.575 __ipipe_unstall_iret_root+0x7b (restore_ret+0x0)
:| #end 0x800d -161! 64.984 __ipipe_unstall_iret_root+0x64 (restore_ret+0x0)
:| #func -96 0.275 __ipipe_syscall_root+0xf (system_call+0x2d)
:| #func -96 0.304 __ipipe_dispatch_event+0x9 (__ipipe_syscall_root+0x40)
:| +*func -95 0.482 hisyscall_event+0xf (__ipipe_dispatch_event+0xd3)
:| #func -95 0.758 losyscall_event+0x9 (__ipipe_dispatch_event+0xd3)
:| #func -94 0.971 sys_write+0x8 (syscall_call+0x7)

:| #*event t...@-134 -228 0.546 xntimer_next_local_shot+0x89 (xntimer_t (...)
: +func -181 0.647 up_read+0x3 (do_page_fault+0x126)
: #func -181+ 1.282 __ipipe_unstall_iret_root+0x3 (restore_ret+0x0)
:| #func -179 0.356 __ipipe_syscall_root+0x11 (system_call+0x2d)
:| #func -179 0.317 __ipipe_dispatch_event+0x9 (__ipipe_syscall_root+0x42)
:| +*func -179 0.712 hisyscall_event+0xf (__ipipe_dispatch_event+0xb7)
:| #func -178+ 1.415 losyscall_event+0x9 (__ipipe_dispatch_event+0xb7)
:| #func -177 0.763 sys_time+0x11 (syscall_call+0x7)
:| #func -176 0.350 get_seconds+0x3 (sys_time+0x1e)
: #func -176! 71.098 __ipipe_unstall_iret_root+0x3 (restore_ret+0x0)
:| #func -104 0.309 __ipipe_syscall_root+0x11 (system_call+0x2d)
:| #func -104 0.839 __ipipe_dispatch_event+0x9 (__ipipe_syscall_root+0x42)
:| +*func -103 0.400 hisyscall_event+0xf (__ipipe_dispatch_event+0xb7)
:| #func -103+ 1.362 losyscall_event+0x9 (__ipipe_dispatch_event+0xb7)
:| #func -102 0.520 sys_write+0x8 (syscall_call+0x7)

-- Gilles.
Re: [Xenomai-core] Domain switch during page fault handling
Philippe Gerum wrote: > On Fri, 2010-01-22 at 17:58 +0100, Jan Kiszka wrote: >> Hi guys, >> >> we are currently trying to catch an ugly Linux pipeline state corruption >> on x86-64. >> >> Conceptual question: If a Xenomai task causes a fault, we enter >> ipipe_trap_notify over the primary domain and leave it over the root >> domain, right? Now, if the root domain happened to be stalled when the >> exception happened, where should it normally be unstalled again, >> *for_that_task*? Our problem is that we generate a code path where this >> does not happen. > > xnshadow_relax -> ipipe_reenter_root -> finish_task_switch -> > finish_lock_switch -> unstall > > Since xnshadow_relax is called on behalf of the event dispatcher, we should > expect it to return with the root domain unstalled after a domain > downgrade, from primary to root. Ok, but what about local_irq_restore_nosync at the end of the function ? -- Gilles.
Re: [Xenomai-core] Domain switch during page fault handling
On Fri, 2010-01-22 at 17:58 +0100, Jan Kiszka wrote: > Hi guys, > > we are currently trying to catch an ugly Linux pipeline state corruption > on x86-64. > > Conceptual question: If a Xenomai task causes a fault, we enter > ipipe_trap_notify over the primary domain and leave it over the root > domain, right? Now, if the root domain happened to be stalled when the > exception happened, where should it normally be unstalled again, > *for_that_task*? Our problem is that we generate a code path where this > does not happen. xnshadow_relax -> ipipe_reenter_root -> finish_task_switch -> finish_lock_switch -> unstall Since xnshadow_relax is called on behalf of the event dispatcher, we should expect it to return with the root domain unstalled after a domain downgrade, from primary to root. Normally. > > Jan > -- Philippe.
Re: [Xenomai-core] Domain switch during page fault handling
Jan Kiszka wrote: > Hi guys, > > we are currently trying to catch an ugly Linux pipeline state corruption > on x86-64. > > Conceptual question: If a Xenomai task causes a fault, we enter > ipipe_trap_notify over the primary domain and leave it over the root > domain, right? Now, if the root domain happened to be stalled when the > exception happened, where should it normally be unstalled again, > *for_that_task*? Our problem is that we generate a code path where this > does not happen. In other words, and now it's starting to become a pure ipipe topic: If we migrated during ipipe_trap_notify, shouldn't we restore the root domain state afterwards from the state of the previously active domain on entry of __ipipe_handle_exception? I bet that would fix our case, but I'm still unsure about the side effects and the intended semantics behind this. Jan