Jan Kiszka wrote:
> Jan Kiszka wrote:
>> Hi,
>>
>> I've been banging my head against this issue for several days now, first
>> sorting out an unrelated bug I came across along the way, then trying
>> to understand what happens, and finally getting mad about why this may
>> only happen with Xenomai:
>>
>> One process, two threads, running under gdb control (no breakpoints,
>> just the automatically set ones that track thread creation/destruction).
>> It already happens with only one CPU. The first thread decides to issue
>> exit() exactly while the second one is on its way from primary to
>> secondary mode after hitting a breakpoint (int3 -> xnpod_trap_fault
>> -> xnshadow_relax...). The group exit of thread A causes SIGKILL to be
>> set in thread B, but triggers no further actions due to B already being
>> awake and on its way to queue and handle the other signal (SIGTRAP). Now
>> when B comes to dequeue the next signal it finds SIGTRAP and SIGKILL
>> set, but picks up SIGTRAP due to its lower number. Now ptrace causes B
>> to stop, gdb gets confused, sends A, which is already a zombie, a
>> SIGSTOP and waits on it to confirm this stop - which never happens. If
>> someone is interested, I can provide an LTTng dump of this scenario.
>>
>> My problem is now that I still don't understand what prevents this
>> deadlock on vanilla Linux. Does Xenomai create a thread schedule here
>> that is impossible there? Or does it only widen an otherwise very
>> small race window that also exists with mainline? Before making a fool
>> of myself on LKML, I would like to collect some further ideas on the
>> workaround or fix(?) below that cures this deadlock for me.
> 
> After reading this comment
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3d749b9e676b26584a47e75c235aa6f69d0697ae
> 
> I'm now about to escalate the issue to LKML. This really looks like a
> mainline bug, probably just triggered more quickly by the large latency
> between signal queuing and receiver scheduling that the
> primary->secondary mode switch introduces.

That said, I think gdb is buggy too: the kill function probably returns
an error saying that the thread no longer exists, which gdb presumably
ignores because it is still waiting for a signal from that killed thread.

-- 
                                                 Gilles.
