Re: [Xenomai-core] [RFC] Fixes for domain migration races

2011-07-28 Jan Kiszka
On 2011-07-27 20:44, Gilles Chanteperdrix wrote:
> On 07/19/2011 08:44 AM, Jan Kiszka wrote:
>> Hi,
>>
>> I've just uploaded my upstream queue, which mostly deals with the various
>> races I found in the domain migration code.
>>
>> One of the concerns I raised earlier turned out to be unfounded: we do
>> not allow Linux to wake up a task that has TASK_ATOMICSWITCH set, so the
>> deletion race can indeed be fixed by the patch I sent earlier.
>
> So, I still have the same question: isn't synchronizing with the
> gatekeeper as soon as we return from schedule() in secondary mode a
> better solution than waiting for the task_exit callback? It looks more
> correct, and it avoids gksched.

Yes, I was on the wrong track w.r.t. wakeup races during the early
migration phase.

It is a valid scenario for the task to return from schedule() without
having been migrated. That can only happen if a signal was queued in
the meantime. The task will not be woken up again (ATOMICSWITCH
prevents that), but it checks for pending signals itself before falling
asleep. In that case it enters TASK_RUNNING again and returns from
schedule(), either before the gatekeeper had a chance to run or, on
SMP, in parallel with it on a different CPU.
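To make the scenario above concrete, here is a deliberately simplified, single-threaded C model of it. Everything in it (the model_* names, the two-value state enum, the boolean standing in for TASK_ATOMICSWITCH) is a hypothetical illustration, not Xenomai or Linux kernel code. External wakeups are refused while the atomic-switch flag is set, but the task's own pending-signal check before sleeping can still send it back to running without a migration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified task states; names and values are
 * illustrative and do not match the Linux or Xenomai definitions. */
enum model_state { MODEL_RUNNING, MODEL_INTERRUPTIBLE };

struct model_task {
    enum model_state state;
    bool atomicswitch;   /* stands in for TASK_ATOMICSWITCH */
    bool signal_pending; /* a Linux signal was queued during migration */
};

/* External wakeup attempt (e.g. by Linux): refused while the
 * atomic-switch flag is set, as described in the thread. */
static bool model_try_wake(struct model_task *t)
{
    if (t->atomicswitch || t->state != MODEL_INTERRUPTIBLE)
        return false;
    t->state = MODEL_RUNNING;
    return true;
}

/* Model of schedule(): before sleeping, the task itself checks for
 * pending signals. If one is found, it goes back to RUNNING and returns
 * at once, i.e. without having been migrated. Returns true if the task
 * actually went to sleep and the gatekeeper can take over. */
static bool model_schedule(struct model_task *t)
{
    if (t->state == MODEL_INTERRUPTIBLE && t->signal_pending) {
        t->state = MODEL_RUNNING;
        return false;
    }
    return true;
}

/* Start of a migration attempt: mark the task, then "sleep". */
static bool model_begin_migration(struct model_task *t)
{
    assert(t->state == MODEL_RUNNING);
    t->state = MODEL_INTERRUPTIBLE;
    t->atomicswitch = true;
    /* ...the gatekeeper would be woken up here... */
    return model_schedule(t);
}
```

In this sketch a task without a pending signal stays asleep in MODEL_INTERRUPTIBLE and rejects model_try_wake(), while a signalled task comes back from model_schedule() in MODEL_RUNNING with the migration not done.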

What currently saves us from the fatal scenario in which the task keeps
running while the gatekeeper resumes its Xenomai side is that the task
has already left TASK_INTERRUPTIBLE state. And if we wait for the
gatekeeper to notice this, as you suggested, we ensure both that the
object is not deleted too early and that the task does not re-enter
TASK_INTERRUPTIBLE by doing Linux work in the meantime.
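The handshake discussed above can be sketched as a small pthreads model. It assumes, as a simplification, that one mutex serializes the task's state change against the gatekeeper's check; the kernel code does not work that way, and all names here are hypothetical. The gatekeeper resumes the real-time side only if the task is still in the interruptible state, and the task, whether or not its modelled schedule() returned early, waits until the gatekeeper has looked before going on.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are Xenomai APIs. */
enum model_state { MODEL_RUNNING, MODEL_INTERRUPTIBLE };

struct model_migration {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    enum model_state state;
    bool signal_pending; /* a Linux signal arrived during migration */
    bool gk_requested;   /* task has asked the gatekeeper to run */
    bool gk_done;        /* gatekeeper has inspected the task state */
    bool resumed_rt;     /* gatekeeper resumed the real-time side */
};

/* Gatekeeper: resume the real-time side only if the task is still
 * asleep in INTERRUPTIBLE state; otherwise abandon the migration. */
static void *model_gatekeeper(void *arg)
{
    struct model_migration *m = arg;

    pthread_mutex_lock(&m->lock);
    while (!m->gk_requested)
        pthread_cond_wait(&m->cond, &m->lock);
    m->resumed_rt = (m->state == MODEL_INTERRUPTIBLE);
    m->gk_done = true;
    pthread_cond_broadcast(&m->cond);
    pthread_mutex_unlock(&m->lock);
    return NULL;
}

/* Migrating task. Returns true if the gatekeeper took it over. */
static bool model_task_migrate(struct model_migration *m)
{
    bool migrated;

    pthread_mutex_lock(&m->lock);
    m->state = MODEL_INTERRUPTIBLE;
    m->gk_requested = true;
    pthread_cond_broadcast(&m->cond);
    /* Modelled schedule(): a pending signal puts us back to RUNNING. */
    if (m->signal_pending)
        m->state = MODEL_RUNNING;
    /* Synchronize with the gatekeeper before any further Linux work,
     * migrated or not (without a signal, this wait also stands in for
     * the actual sleep). Only after this point may the task re-enter
     * INTERRUPTIBLE for unrelated Linux sleeps. */
    while (!m->gk_done)
        pthread_cond_wait(&m->cond, &m->lock);
    migrated = m->resumed_rt;
    pthread_mutex_unlock(&m->lock);
    return migrated;
}

/* Runs one migration attempt; returns the task's view and stores the
 * gatekeeper's decision in *resumed_rt. */
static bool model_run(bool signal_pending, bool *resumed_rt)
{
    struct model_migration m = { .state = MODEL_RUNNING,
                                 .signal_pending = signal_pending };
    pthread_t gk;
    bool migrated;

    *resumed_rt = false;
    pthread_mutex_init(&m.lock, NULL);
    pthread_cond_init(&m.cond, NULL);
    if (pthread_create(&gk, NULL, model_gatekeeper, &m) != 0) {
        pthread_cond_destroy(&m.cond);
        pthread_mutex_destroy(&m.lock);
        return false;
    }
    migrated = model_task_migrate(&m);
    /* The descriptor lives on this stack frame, so it must outlive
     * every access by the gatekeeper. */
    pthread_join(gk, NULL);
    *resumed_rt = m.resumed_rt;
    /* Invariant: never "task runs on" and "RT side resumed" at once. */
    assert(migrated == m.resumed_rt);
    pthread_cond_destroy(&m.cond);
    pthread_mutex_destroy(&m.lock);
    return migrated;
}
```

Because the task always waits for gk_done, the model never lets it continue with Linux work while the gatekeeper resumes the real-time side, which the assert in model_run() checks. Joining the gatekeeper before the descriptor goes out of scope mirrors the early-deletion concern.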

I've cleaned up my queue accordingly and just pushed it.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [RFC] Fixes for domain migration races

2011-07-27 Gilles Chanteperdrix
On 07/19/2011 08:44 AM, Jan Kiszka wrote:
> Hi,
>
> I've just uploaded my upstream queue, which mostly deals with the various
> races I found in the domain migration code.
>
> One of the concerns I raised earlier turned out to be unfounded: we do
> not allow Linux to wake up a task that has TASK_ATOMICSWITCH set, so the
> deletion race can indeed be fixed by the patch I sent earlier.

So, I still have the same question: isn't synchronizing with the
gatekeeper as soon as we return from schedule() in secondary mode a
better solution than waiting for the task_exit callback? It looks more
correct, and it avoids gksched.

-- 
Gilles.


[Xenomai-core] [RFC] Fixes for domain migration races

2011-07-19 Jan Kiszka
Hi,

I've just uploaded my upstream queue, which mostly deals with the various
races I found in the domain migration code.

One of the concerns I raised earlier turned out to be unfounded: we do
not allow Linux to wake up a task that has TASK_ATOMICSWITCH set, so the
deletion race can indeed be fixed by the patch I sent earlier. However,
we do not synchronize setting and testing of TASK_ATOMICSWITCH (because
we cannot hold the rq lock), so a small race window that allows
premature wakeups still remains, at least in theory. Patch 3 now
addresses that.

Besides another race around set/clear_task_nowakeup, there was likely a
window during early migration to RT in which we silently swallowed Linux
signals. Patch 4 closes it and will hopefully also fix our spurious gdb
lockups on SMP boxes; time will tell.

Please review carefully.

Jan


