On 07/12/2011 01:06 PM, Jan Kiszka wrote:
> On 2011-07-12 13:04, Gilles Chanteperdrix wrote:
>> On 07/12/2011 01:00 PM, Jan Kiszka wrote:
>>> On 2011-07-12 12:59, Gilles Chanteperdrix wrote:
>>>> On 07/12/2011 09:22 AM, Jan Kiszka wrote:
>>>>> On 2011-07-12 08:41, Gilles Chanteperdrix wrote:
>>>>>> On 07/11/2011 10:12 PM, Jan Kiszka wrote:
>>>>>>> On 2011-07-11 22:09, Gilles Chanteperdrix wrote:
>>>>>>>> On 07/11/2011 10:06 PM, Jan Kiszka wrote:
>>>>>>>>> On 2011-07-11 22:02, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 07/11/2011 09:59 PM, Jan Kiszka wrote:
>>>>>>>>>>> On 2011-07-11 21:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> On 07/11/2011 09:16 PM, Jan Kiszka wrote:
>>>>>>>>>>>>> On 2011-07-11 21:10, Jan Kiszka wrote:
>>>>>>>>>>>>>> On 2011-07-11 20:53, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>> On 07/08/2011 06:29 PM, GIT version control wrote:
>>>>>>>>>>>>>>>> @@ -2528,6 +2534,22 @@ static inline void do_taskexit_event(struct task_struct *p)
>>>>>>>>>>>>>>>>        magic = xnthread_get_magic(thread);
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>        xnlock_get_irqsave(&nklock, s);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +      gksched = thread->gksched;
>>>>>>>>>>>>>>>> +      if (gksched) {
>>>>>>>>>>>>>>>> +              xnlock_put_irqrestore(&nklock, s);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Are we sure irqs are on here? Are you sure that what is needed
>>>>>>>>>>>>>>> is not an xnlock_clear_irqon?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We are in the context of do_exit. Not only are IRQs on, so is
>>>>>>>>>>>>>> preemption. And surely no nklock is held.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Furthermore, I do not understand how we "synchronize" with the
>>>>>>>>>>>>>>> gatekeeper; how is the gatekeeper guaranteed to wait for this
>>>>>>>>>>>>>>> assignment?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The gatekeeper holds the gksync token while it's active. We
>>>>>>>>>>>>>> request it, thus we wait for the gatekeeper to become idle again.
>>>>>>>>>>>>>> While it is idle, we reset the queued reference - but I just
>>>>>>>>>>>>>> realized that this may trample on other tasks' values. I need to
>>>>>>>>>>>>>> add a check that the value to be nullified is actually still ours.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thinking again, that's actually not a problem: gktarget is only
>>>>>>>>>>>>> needed while gksync is zero - but then we won't get hold of it
>>>>>>>>>>>>> anyway and, thus, can't cause any damage.
>>>>>>>>>>>>
>>>>>>>>>>>> Well, you make it look like it does not work. From what I
>>>>>>>>>>>> understand, what you want is to set gktarget to NULL if a task
>>>>>>>>>>>> being hardened is destroyed. But by waiting for the semaphore, you
>>>>>>>>>>>> actually wait for the hardening to be complete, so setting it to
>>>>>>>>>>>> NULL is useless. Or am I missing something else?
>>>>>>>>>>>
>>>>>>>>>>> Setting to NULL is probably unneeded, but still better than relying
>>>>>>>>>>> on the gatekeeper never waking up spuriously and then dereferencing
>>>>>>>>>>> a stale pointer.
>>>>>>>>>>>
>>>>>>>>>>> The key element of this fix is waiting on gksync, thus on the
>>>>>>>>>>> completion of the non-RT part of the hardening. Actually, this part
>>>>>>>>>>> usually fails, as the target task has received a termination signal
>>>>>>>>>>> at this point.
>>>>>>>>>>
>>>>>>>>>> Yes, but since you wait on the completion of the hardening, the test
>>>>>>>>>> if (target && ...) in the gatekeeper code will always be true,
>>>>>>>>>> because at this point the cleanup code will still be waiting for the
>>>>>>>>>> semaphore.
>>>>>>>>>
>>>>>>>>> Yes, except if we ever wake up the gatekeeper later on without an
>>>>>>>>> updated gktarget, i.e. spuriously. Better safe than sorry; this is
>>>>>>>>> hairy code anyway (hopefully obsolete one day).
>>>>>>>>
>>>>>>>> The gatekeeper is not woken up by posting the semaphore; the
>>>>>>>> gatekeeper is woken up by the thread which is going to be hardened
>>>>>>>> (and this thread is the one which waits for the semaphore).
>>>>>>>
>>>>>>> All true. And what is the point?
>>>>>>
>>>>>> The point being, would not something like this patch be sufficient?
>>>>>>
>>>>>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
>>>>>> index 01f4200..4742c02 100644
>>>>>> --- a/ksrc/nucleus/shadow.c
>>>>>> +++ b/ksrc/nucleus/shadow.c
>>>>>> @@ -2527,6 +2527,18 @@ static inline void do_taskexit_event(struct task_struct *p)
>>>>>>          magic = xnthread_get_magic(thread);
>>>>>>
>>>>>>          xnlock_get_irqsave(&nklock, s);
>>>>>> +        if (xnthread_test_info(thread, XNATOMIC)) {
>>>>>> +                struct xnsched *gksched = xnpod_sched_slot(task_cpu(p));
>>>>>
>>>>> That's not reliable, the task might have been migrated by Linux in the
>>>>> meantime. We must use the stored gksched.
>>>>>
>>>>>> +                xnlock_put_irqrestore(&nklock, s);
>>>>>> +
>>>>>> +                /* Thread is in flight to primary mode, wait for the
>>>>>> +                   gatekeeper to be done with it. */
>>>>>> +                down(&gksched->gksync);
>>>>>> +                up(&gksched->gksync);
>>>>>> +
>>>>>> +                xnlock_get_irqsave(&nklock, s);
>>>>>> +        }
>>>>>> +
>>>>>>          /* Prevent wakeup call from xnshadow_unmap(). */
>>>>>>          xnshadow_thrptd(p) = NULL;
>>>>>>          xnthread_archtcb(thread)->user_task = NULL;
>>>>>>
>>>>>
>>>>> Again, setting gktarget to NULL and testing for NULL is simply safer,
>>>>> and I see no gain in skipping that. But if you prefer the
>>>>> micro-optimization, I'll drop it.
>>>>
>>>> Could not we use an info bit instead of adding a pointer?
>>>>
>>>
>>> "That's not reliable, the task might have been migrated by Linux in the
>>> meantime. We must use the stored gksched."
>>
>> I mean add another info bit to mean that the task is queued for wakeup
>> by the gatekeeper.
>>
>> XNGKQ, or something.
> 
> What additional value does it provide to gksched != NULL? We need that
> pointer anyway to identify the gatekeeper that holds a reference.

No, the scheduler which holds the reference is xnpod_sched_slot(task_cpu(p)).

-- 
                                                                Gilles.

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
