On 07/12/2011 07:43 PM, Jan Kiszka wrote:
> On 2011-07-12 19:38, Gilles Chanteperdrix wrote:
>> On 07/12/2011 07:34 PM, Jan Kiszka wrote:
>>> On 2011-07-12 19:31, Gilles Chanteperdrix wrote:
>>>> On 07/12/2011 02:57 PM, Jan Kiszka wrote:
>>>>>                   xnlock_put_irqrestore(&nklock, s);
>>>>>                   xnpod_schedule();
>>>>>           }
>>>>> @@ -1036,6 +1043,7 @@ redo:
>>>>>    * to process this signal anyway.
>>>>>    */
>>>>>   if (rthal_current_domain == rthal_root_domain) {
>>>>> +         XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC));
>>>> Misleading dead code again, XNATOMIC is cleared not ten lines above.
>>> Nope, I forgot to remove that line.
>>>>>           if (XENO_DEBUG(NUCLEUS) && (!signal_pending(this_task)
>>>>>               || this_task->state != TASK_RUNNING))
>>>>>                   xnpod_fatal
>>>>> @@ -1044,6 +1052,8 @@ redo:
>>>>>           return -ERESTARTSYS;
>>>>>   }
>>>>> + xnthread_clear_info(thread, XNATOMIC);
>>>> Why this? I find the xnthread_clear_info(XNATOMIC) right at the right
>>>> place at the point it currently is.
>>> Nope. Now we either clear XNATOMIC after successful migration or when
>>> the signal is about to be sent (ie. in the hook). That way we can test
>>> more reliably (TM) in the gatekeeper if the thread can be migrated.
>> Ok for adding the XNATOMIC test, because it improves the robustness, but
>> why changing the way XNATOMIC is set and clear? Chances of breaking
>> thing while changing code in this area are really high...
> The current code is (most probably) broken as it does not properly
> synchronizes the gatekeeper against a signaled and "runaway" target
> Linux task.
> We need an indication if a Linux signal will (or already has) woken up
> the to-be-migrated task. That task may have continued over its context,
> potentially on a different CPU. Providing this indication is the purpose
> of changing where XNATOMIC is cleared.

What about synchronizing with the gatekeeper with a semaphore, as done
in the first patch you sent, but doing it in xnshadow_harden, as soon as
we detect that we are not back from schedule in primary mode? It seems
it would avoid any further issue, as we would then be guaranteed that
the thread could not switch to TASK_INTERRUPTIBLE again before the
gatekeeper is finished.

What worries me is the comment in xnshadow_harden:

         * gatekeeper sent us to primary mode. Since
         * TASK_UNINTERRUPTIBLE is unavailable to us without wrecking
         * the runqueue's count of uniniterruptible tasks, we just
         * notice the issue and gracefully fail; the caller will have
         * to process this signal anyway.

Does this mean that we can not switch to TASK_UNINTERRUPTIBLE at this
point? Or simply that TASK_UNINTERRUPTIBLE is not available for the
business of xnshadow_harden?


Xenomai-core mailing list

Reply via email to