Philippe Gerum wrote:
> On Mon, 2006-12-04 at 22:06 +0100, Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> ...
>>> This indicates that we face an I-pipe bug: the scheduled Linux call on
>>> relaxation of TASK2 and then later TASK1 somehow gets lost (there is no
>>> rthal_apc_handler in the remaining trace).
>> I think I got it. No I-pipe bug, but one in the HAL.
>>
>> What happened? A weird race caused by the unprotected optimisation to
>> only call rthal_schedule_irq() if there is no APC pending yet. This is
>> the constellation I finally worked out via instrumenting and tracing:
>>
>> PRIO 1:
>> rthal_apc_schedule()
>>   test&set rthal_apc_pending
>>   (but no rthal_schedule_irq() yet)
>>
>>                         -PREEMPTION-
>>
>>                                         PRIO 99:
>>                                         ...
>>                                         rthal_apc_schedule()
>>                                           test rthal_apc_pending
>>                                           (already set => no
>>                                           rthal_schedule_irq()!)
>>
>> So, no one reported the ACP to I-pipe, and no one ever will in Nicolas
>> scenario - soft lock-up!
>>
>> Nicolas, please give the attached patch a try. Your test is running fine
>> for me now.
>>
>>
>> At this chance: do we need rthal_apc_schedule() returning the previous
>> state at all? No current caller checks the return value. If it's OK to
>> clean this up, I will post a combined patch.
> 
> It's pointless to return this info, the caller would not be able to
> exploit it in any way.
> 
> Could you please try this other form of your fix? This would prevent the
> virq to be fired twice (or more) uselessly, with no request pending.
> Masking interrupts to enforce atomicity is ok here, since
> rthal_schedule_irq() will run this way in any case. Additionally, my
> comment about migration was suspicious, I do see a case where CPU
> migration would leave this code in limbos. But to get the cpuid
> properly, we need the local interrupts to be masked first. TIA,

Yes, works fine as well.

Then go for the attached fix+cleanup?

Jan
Index: ksrc/arch/generic/hal.c
===================================================================
--- ksrc/arch/generic/hal.c	(Revision 1920)
+++ ksrc/arch/generic/hal.c	(Arbeitskopie)
@@ -565,7 +565,7 @@ int rthal_apc_free(int apc)
 
 /**
  * @fn int rthal_apc_schedule (int apc)
- *                           
+ *
  * @brief Schedule an APC invocation.
  *
  * This service marks the APC as pending for the Linux domain, so that
@@ -580,8 +580,7 @@ int rthal_apc_free(int apc)
  *
  * @param apc The APC id. to schedule.
  *
- * @return 0 or 1 are returned upon success, the former meaning that
- * the APC was already pending. Otherwise:
+ * @return 0 is returned upon success. Otherwise:
  *
  * - -EINVAL is returned if @a apc is invalid.
  *
@@ -596,18 +595,21 @@ int rthal_apc_free(int apc)
 int rthal_apc_schedule(int apc)
 {
     rthal_declare_cpuid;
+    unsigned long flags;
 
     if (apc < 0 || apc >= RTHAL_NR_APCS)
         return -EINVAL;
 
-    rthal_load_cpuid();         /* Migration would be harmless here. */
+    rthal_local_irq_save(flags);
+
+    rthal_load_cpuid();
 
-    if (!test_and_set_bit(apc, &rthal_apc_pending[cpuid])) {
+    if (!__test_and_set_bit(apc, &rthal_apc_pending[cpuid]))
         rthal_schedule_irq(rthal_apc_virq);
-        return 1;
-    }
 
-    return 0;                   /* Already pending. */
+    rthal_local_irq_restore(flags);
+
+    return 0;
 }
 
 #ifdef CONFIG_PROC_FS

Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________
Xenomai-help mailing list
[email protected]
https://mail.gna.org/listinfo/xenomai-help

Reply via email to