Philippe Gerum wrote:
> On Sat, 2010-01-23 at 11:09 +0100, Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> On Fri, 2010-01-22 at 19:08 +0100, Philippe Gerum wrote:
>>>> On Fri, 2010-01-22 at 19:03 +0100, Jan Kiszka wrote:
>>>>> Philippe Gerum wrote:
>>>>>> On Fri, 2010-01-22 at 18:41 +0100, Jan Kiszka wrote:
>>>>>>> Gilles Chanteperdrix wrote:
>>>>>>>> Philippe Gerum wrote:
>>>>>>>>> On Fri, 2010-01-22 at 17:58 +0100, Jan Kiszka wrote: 
>>>>>>>>>> Hi guys,
>>>>>>>>>>
>>>>>>>>>> we are currently trying to catch an ugly Linux pipeline state 
>>>>>>>>>> corruption
>>>>>>>>>> on x86-64.
>>>>>>>>>>
>>>>>>>>>> Conceptual question: If a Xenomai task causes a fault, we enter
>>>>>>>>>> ipipe_trap_notify over the primary domain and leave it over the root
>>>>>>>>>> domain, right? Now, if the root domain happened to be stalled when 
>>>>>>>>>> the
>>>>>>>>>> exception happened, where should it normally be unstalled again,
>>>>>>>>>> *for_that_task*? Our problem is that we generate a code path where 
>>>>>>>>>> this
>>>>>>>>>> does not happen.
>>>>>>>>> xnhadow_relax -> ipipe_reenter_root -> finish_task_switch ->
>>>>>>>>> finish_lock_switch -> unstall
>>>>>>>>>
>>>>>>>>> Since xnshadow_relax is called on behalf the event dispatcher, we 
>>>>>>>>> should
>>>>>>>>> expect it to return with the root domain unstalled after a domain
>>>>>>>>> downgrade, from primary to root.
>>>>>>>> Ok, but what about local_irq_restore_nosync at the end of the function 
>>>>>>>> ?
>>>>>>>>
>>>>>>> That is, IMO, our problem: It replays the root state on fault entry, but
>>>>>>> that one is totally unrelated to the (Xenomai) task that caused the 
>>>>>>> fault.
>>>>>> The code seems fishy. Try restoring only when the incoming domain was
>>>>>> the root one. Indeed.
>>>>>>
>>>>> Something like this?
>>>>>
>>>>> diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c
>>>>> index 4442d96..0558ea3 100644
>>>>> --- a/arch/x86/kernel/ipipe.c
>>>>> +++ b/arch/x86/kernel/ipipe.c
>>>>> @@ -702,19 +702,21 @@ static int __ipipe_xlate_signo[] = {
>>>>>  
>>>>>  int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int 
>>>>> vector)
>>>>>  {
>>>>> + bool restore_flags = false;
>>>>>   unsigned long flags;
>>>>>  
>>>>> - /* Pick up the root domain state of the interrupted context. */
>>>>> - local_save_flags(flags);
>>>>> + if (ipipe_root_domain_p && irqs_disabled_hw()) {
>>>>> +         /* Pick up the root domain state of the interrupted context. */
>>>>> +         local_save_flags(flags);
>>>>>  
>>>>> - if (ipipe_root_domain_p) {
>>>>>           /*
>>>>>            * Replicate hw interrupt state into the virtual mask before
>>>>>            * calling the I-pipe event handler over the root domain. Also
>>>>>            * required later when calling the Linux exception handler.
>>>>>            */
>>>>> -         if (irqs_disabled_hw())
>>>>> -                 local_irq_disable();
>>>>> +         local_irq_disable();
>>>>> +
>>>>> +         restore_flags = true;
>>>>>   }
>>>>>  #ifdef CONFIG_KGDB
>>>>>   /* catch exception KGDB is interested in over non-root domains */
>>>>> @@ -725,7 +727,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, 
>>>>> long error_code, int vector)
>>>>>  #endif /* CONFIG_KGDB */
>>>>>  
>>>>>   if (unlikely(ipipe_trap_notify(vector, regs))) {
>>>>> -         local_irq_restore_nosync(flags);
>>>>> +         if (restore_flags)
>>>>> +                 local_irq_restore_nosync(flags);
>>>>>           return 1;
>>>>>   }
>>>>>  
>>>>> @@ -770,7 +773,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, 
>>>>> long error_code, int vector)
>>>>>    * Relevant for 64-bit: Restore root domain state as the low-level
>>>>>    * return code will not align it to regs.flags.
>>>>>    */
>>>>> - local_irq_restore_nosync(flags);
>>>>> + if (restore_flags)
>>>>> +         local_irq_restore_nosync(flags);
>>>>>  
>>>>>   return 0;
>>>>>  }
>>>>>
>>>>>
>>>>> We are currently not able to test this on the system that triggers it,
>>>>> but we'll do so tomorrow (yeah...).
>>>>>
>>>> Should work. Famous last words.
>>>>
>>> Strike that. This won't work, because the fixup code will use the saved
>>> flags even when root is not the incoming domain and/or hw IRQs are on on
>>> entry. In short, local_save_flags() must be done unconditionally, as
>>> previously.
>> It will accidentally work for 64-bit where __fixup_if is empty. And for
>> 32-bit, I would say we need to make it depend on restore_flags as well.
>>
> 
> AFAIK, Gilles is working on this. We just need to avoid stepping on
> 32bit toes to fix 64.

The arguably less ambitious following patch works for me on x86_32:

diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c
index 4442d96..a7e1241 100644
--- a/arch/x86/kernel/ipipe.c
+++ b/arch/x86/kernel/ipipe.c
@@ -703,6 +703,7 @@ static int __ipipe_xlate_signo[] = {
 int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
 {
        unsigned long flags;
+       unsigned root_on_entry = 0;
 
        /* Pick up the root domain state of the interrupted context. */
        local_save_flags(flags);
@@ -715,6 +716,7 @@ int __ipipe_handle_exception(struct pt_regs *regs, long 
error_code, int vector)
                 */
                if (irqs_disabled_hw())
                        local_irq_disable();
+               root_on_entry = 1;
        }
 #ifdef CONFIG_KGDB
        /* catch exception KGDB is interested in over non-root domains */
@@ -725,7 +727,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long 
error_code, int vector)
 #endif /* CONFIG_KGDB */
 
        if (unlikely(ipipe_trap_notify(vector, regs))) {
-               local_irq_restore_nosync(flags);
+               if (root_on_entry)
+                       local_irq_restore_nosync(flags);
                return 1;
        }
 
@@ -734,7 +737,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long 
error_code, int vector)
         * handler, restore the original IF from exception entry as the
         * low-level return code will evaluate it.
         */
-       __fixup_if(raw_irqs_disabled_flags(flags), regs);
+       if (root_on_entry)
+               __fixup_if(raw_irqs_disabled_flags(flags), regs);
 
        if (unlikely(!ipipe_root_domain_p)) {
                /* Detect unhandled faults over non-root domains. */
@@ -770,7 +774,8 @@ int __ipipe_handle_exception(struct pt_regs *regs, long 
error_code, int vector)
         * Relevant for 64-bit: Restore root domain state as the low-level
         * return code will not align it to regs.flags.
         */
-       local_irq_restore_nosync(flags);
+       if (root_on_entry)
+               local_irq_restore_nosync(flags);
 
        return 0;
 }


-- 
                                            Gilles.

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core

Reply via email to