Jan Kiszka wrote:
> Jan Kiszka wrote:
>> I've been banging my head against this issue for several days now, first
>> trying to sort out an unrelated bug I came across along the way,
>> then trying to understand what happens, and finally going mad over
>> why this only seems to happen with Xenomai:
>> One process, two threads, running under gdb control (no breakpoints,
>> just the automatically set ones that track thread creation/destruction).
>> It already happens with only one CPU. The first thread decides to issue
>> exit() exactly while the second one is on its way from primary to
>> secondary mode because it hit a breakpoint (int3 -> xnpod_trap_fault
>> -> xnshadow_relax...). The group exit of thread A causes SIGKILL to be
>> set in thread B, but triggers no further actions due to B already being
>> awake and on its way to queue and handle the other signal (SIGTRAP). Now
>> when B comes to dequeue the next signal it finds SIGTRAP and SIGKILL
>> set, but picks up SIGTRAP due to its lower number. Now ptrace causes B
>> to stop, gdb gets confused, sends A, which is already a zombie, a
>> SIGSTOP and waits on it to confirm this stop - which never happens. If
>> someone is interested, I can provide an LTTng dump of this scenario.
>> My problem is now that I still don't understand what prevents this
>> deadlock on vanilla Linux. Does Xenomai create a thread schedule here
>> that is impossible there? Or does it only widen an otherwise very
>> small race window that also exists with mainline? Before making a fool
>> of myself on LKML, I would like to collect some further ideas on the
>> workaround or fix(?) below that cures this deadlock for me.
> After reading this comment
> I'm now about to escalate the issue to LKML. This really looks like a
> mainline bug, probably just triggered more quickly by the large latency
> between signal queuing and receiver scheduling that the
> primary->secondary mode switch introduces.
That said, I think gdb is buggy too: kill() probably returns an error
saying that the thread no longer exists, which gdb probably ignores,
since it keeps waiting for a signal from that killed thread.
Xenomai-core mailing list