Jan Kiszka wrote:
> Jan Kiszka wrote:
>> Hi,
>>
>> I have been banging my head against this issue for several days now,
>> first trying to sort out an unrelated bug I also came across along the
>> way, then trying to understand what happens, and finally getting mad
>> about why this may only happen with Xenomai:
>>
>> One process, two threads, running under gdb control (no breakpoints,
>> just the automatically set ones that track thread creation/destruction).
>> Everything already happens with only one CPU. The first thread decides
>> to issue exit() exactly while the second one is on its way from primary
>> to secondary mode due to hitting a breakpoint (int3 -> xnpod_trap_fault
>> -> xnshadow_relax...). The group exit of thread A causes SIGKILL to be
>> set in thread B, but triggers no further action because B is already
>> awake and on its way to queue and handle the other signal (SIGTRAP).
>> Now, when B comes to dequeue the next signal, it finds both SIGTRAP and
>> SIGKILL set, but picks up SIGTRAP due to its lower number. ptrace then
>> causes B to stop, gdb gets confused, sends A (which is already a
>> zombie) a SIGSTOP and waits on it to confirm this stop, which never
>> happens. If someone is interested, I can provide an LTTng dump of this
>> scenario.
>>
>> My problem now is that I still don't understand what prevents this
>> deadlock on vanilla Linux. Does Xenomai create a thread schedule here
>> that is impossible there? Or does it only widen an otherwise very small
>> race window that also exists with mainline? Before making a fool of
>> myself on LKML, I would like to collect some further ideas on the
>> workaround or fix(?) below that cures this deadlock for me.
>
> After reading this comment
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3d749b9e676b26584a47e75c235aa6f69d0697ae
>
> I'm now about to escalate the issue to LKML. This really looks like a
> mainline bug, probably just triggered more quickly by the large latency
> between signal queuing and receiver scheduling that the
> primary->secondary mode switch introduces.

That said, I think gdb is buggy too: the kill function probably returns
an error saying that the thread no longer exists, which gdb probably
ignores since it is waiting for a signal from that killed thread.

-- 
Gilles.