Philippe Gerum wrote:
> On Mon, 2007-05-21 at 16:43 +0200, Jan Kiszka wrote:
>> Hi Philippe,
>>
>> Jan Kiszka wrote:
>>> Just ran into this with CONFIG_IPIPE_DEBUG_CONTEXT (maybe due to some
>>> bug of my own):
>>
>> Here is some code to trigger the issue reliably:
>>
>> #include <sys/mman.h>
>> #include <native/task.h>
>>
>> void task_fnct(void *arg)
>> {
>>     rt_task_delete(NULL);
>> }
>>
>> int main(void)
>> {
>>     RT_TASK task;
>>
>>     mlockall(MCL_CURRENT|MCL_FUTURE);
>>     rt_task_spawn(&task, "task", 0, 10, 0, task_fnct, NULL);
>>     return 0;
>> }
>>
>>> [  102.616000] I-pipe: Detected illicit call from domain 'Xenomai'
>>> [  102.616000] into a service reserved for domain 'Linux' and below.
>>> [  102.616000]  c741bdc8 00000000 00000000 c8860ef8 c741bdec c0105683 c032c200 c13fe22c
>>> [  102.616000]  c0361f00 c741be08 c01519ed c032f5b8 c032c742 c03380b3 c8885100 c78beac0
>>> [  102.616000]  c741be14 c0142ce9 c7a80b30 c741be3c c884d075 c885f150 c8860ef8 c741be3c
>>> [  102.616000] Call Trace:
>>> [  102.616000]  [<c0104d9f>] show_trace_log_lvl+0x1f/0x40
>>> [  102.616000]  [<c0104e71>] show_stack_log_lvl+0xb1/0xe0
>>> [  102.616000]  [<c0105683>] show_stack+0x33/0x40
>>> [  102.616000]  [<c01519ed>] ipipe_check_context+0xad/0xc0
>>> [  102.616000]  [<c0142ce9>] module_put+0x19/0x90
>>> [  102.616000]  [<c884d075>] xnshadow_unmap+0xb5/0x130 [xeno_nucleus]
>>> [  102.616000]  [<c8871dc5>] __shadow_delete_hook+0x25/0x30 [xeno_native]
>>> [  102.616000]  [<c8842f78>] xnpod_schedule+0xb58/0x12f0 [xeno_nucleus]
>>> [  102.616000]  [<c8844bfb>] xnpod_delete_thread+0x2cb/0x3d0 [xeno_nucleus]
>>> [  102.616000]  [<c886f5bd>] rt_task_delete+0x20d/0x220 [xeno_native]
>>>
>>> I would dare to say that module_put in xnshadow_unmap is not well
>>> placed, as it can wake up a Linux process. The module ref-counter
>>> maintenance needs some postponing, I guess.
>
> Mm, I should definitely read mails entirely.
>
>> Attached is a patch proposal. It solves the issue by postponing the
>> module_put via a new schedule_linux_call. Note that this approach
>> issues LO_WAKEUP_REQ where the old test (p->state != TASK_RUNNING)
>> would not have done so. I don't see negative side effects yet,
>
> This code has to work with an awful lot of runtime situations, including
> those raised by ptracing/GDB, and thread signaling in general, so test
> harnessing is key here.
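For illustration, here is a minimal userspace model of the deferral
pattern the patch relies on: the atomic deletion hook only records the
pending module_put() in a fixed-size, lock-free single-producer /
single-consumer ring, and the real call runs later in Linux context when
the request queue is drained (in the patch itself, schedule_linux_call()
plays this role). Everything below, from defer_module_put() to the ring
layout, is a hypothetical sketch, not Xenomai code:

#include <stdatomic.h>
#include <stdio.h>

/* Illustrative stand-ins: in the kernel these would be the real
 * struct module and module_put(); here they only mark where the
 * deferred work finally lands. */
struct module { const char *name; };

static void module_put(struct module *m)
{
    printf("module_put(%s), now in Linux context\n", m->name);
}

#define LO_MAX_REQUESTS 64   /* power of two, so indices can be masked */

static struct module *put_req[LO_MAX_REQUESTS];
static atomic_uint put_head;  /* producer index (deletion hook side) */
static atomic_uint put_tail;  /* consumer index (Linux context side) */

/* Called from the atomic deletion hook: queue only, never wake anyone. */
static int defer_module_put(struct module *m)
{
    unsigned int h = atomic_load_explicit(&put_head, memory_order_relaxed);
    unsigned int t = atomic_load_explicit(&put_tail, memory_order_acquire);

    if (h - t >= LO_MAX_REQUESTS)
        return -1;   /* queue exhausted, see the overflow discussion below */

    put_req[h & (LO_MAX_REQUESTS - 1)] = m;
    atomic_store_explicit(&put_head, h + 1, memory_order_release);
    /* The real code would now trigger the deferred-call mechanism so
     * that Linux drains the ring as soon as it resumes. */
    return 0;
}

/* Drain side, running in Linux context. */
static void drain_module_puts(void)
{
    unsigned int t = atomic_load_explicit(&put_tail, memory_order_relaxed);

    while (t != atomic_load_explicit(&put_head, memory_order_acquire)) {
        module_put(put_req[t & (LO_MAX_REQUESTS - 1)]);
        atomic_store_explicit(&put_tail, ++t, memory_order_release);
    }
}

int main(void)
{
    struct module xeno_native = { "xeno_native" };

    defer_module_put(&xeno_native);  /* from the deletion hook... */
    drain_module_puts();             /* ...later, from Linux context */
    return 0;
}

The producer side never blocks, never takes a lock, and never wakes a
Linux task, which is exactly the constraint the deletion hook is under;
the price is the fixed capacity that the LO_MAX_REQUESTS discussion
below is about.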
Well, what about listing those scenarios and their requirements in a
comment if this is so critical? That may make /me feel better, as I could
more easily go through them and check changes like today's against them.

>> and I'm furthermore not sure the old code was handling SMP scenarios
>> safely (What if the thread to be unmapped was running on a different
>> CPU than xnshadow_unmap? How to ensure test-atomicity then?).
>
> This wakeup call is there to release emerging shadows waiting on their
> start barrier that get killed before start. Other regular cases imply
> that either the current thread is exiting - in which case this is a
> no-brainer - or it has been sent a group kill signal, in which case it
> has to wake up anyway.

I didn't _remove_ cases where the wakeup takes place, I _added_ some
more by removing the test.

>> Well, this thing seems to work,
>
> It does, actually.
>
>> but it doesn't leave me with a good feeling. IMHO, there are far too
>> many cleanup cases with all the combinations of non-shadow
>> termination, shadow self-termination, termination on task exit,
>> various SMP scenarios, etc. Moreover, quite a lot of stuff is executed
>> in atomic scheduling hooks (namely on cleanup). At least for shadow
>> threads, I don't think all of this is necessary. Actually, deferring
>> most of the uncritical cleanup work into Linux context would make
>> things more regular, easier to review, and likely less disturbing for
>> overall system latencies. I know that atomic scheduling hooks are
>> required to implement certain skins, but I don't see why janitor work
>> should be done there as well. And we also still have that tiny
>> uncleanness on shadow thread termination around TCB release vs. hook
>> execution.
>
> Talking about xnshadow_unmap, and as your patch illustrates, what you
> could postpone is basically the first part of the routine, which
> updates the reference counts. The rest _must_ run over the deletion
> hook in atomic context (it already runs fine with all Linux debug
> switches on).

The rpi_pop()? No chance to call it before entering the scheduler and
then the hooks?

> The latency hit is certainly not seen there; the real issues are left
> in the native and POSIX skin thread deletion routines (heap management
> and other housekeeping calls under nklock and so on).

That's surely true. Besides RPI manipulation, there is really no
problematic code left here latency-wise. Other hooks are trickier. But
my point is that we first need some infrastructure to provide an
alternative cleanup context before we can start moving things. My patch
today is a hack...err...workaround from this perspective.

>> While we are at it: LO_MAX_REQUESTS is a hidden limitation that may
>> (theoretically) bite large setups when e.g. too many threads get
>> killed from the "wrong" context. At least some overflow detection is
>> required here to warn the unfortunate user that they have reached
>> hard-coded scalability limits.
>
> Yes.

CONFIG_XENO_OPT_DEBUG[_NUCLEUS] or unconditionally (error return code)?

Jan
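To make that closing question concrete (debug-only warning vs.
unconditional error code), here is one possible shape of the overflow
detection, again as a hedged sketch: lostage_reserve_slot() and both
counters are invented names for illustration, not actual nucleus code.
It returns an error unconditionally and additionally warns once, so
callers always see the failure while the log is not flooded:

#include <stdatomic.h>
#include <stdio.h>

#define LO_MAX_REQUESTS 64

static atomic_uint lostage_fill;        /* current queue occupancy */
static atomic_ulong lostage_overflows;  /* dropped-request counter */

/* Producer side: returns 0 on success, -1 if the fixed-size queue
 * is exhausted. */
static int lostage_reserve_slot(void)
{
    if (atomic_fetch_add(&lostage_fill, 1) >= LO_MAX_REQUESTS) {
        atomic_fetch_sub(&lostage_fill, 1);
        /* Warn only once; the error code itself is unconditional. */
        if (atomic_fetch_add(&lostage_overflows, 1) == 0)
            fprintf(stderr, "lostage queue overflow, request dropped "
                    "(LO_MAX_REQUESTS=%d too small)\n", LO_MAX_REQUESTS);
        return -1;
    }
    return 0;
}

/* Consumer side, running in Linux context. */
static void lostage_release_slot(void)
{
    atomic_fetch_sub(&lostage_fill, 1);
}

int main(void)
{
    /* Exhaust the queue, then free one slot: the warning fires once. */
    for (int i = 0; i < LO_MAX_REQUESTS + 2; i++)
        (void)lostage_reserve_slot();
    lostage_release_slot();
    return 0;
}

A debug build (CONFIG_XENO_OPT_DEBUG_NUCLEUS) could upgrade the one-time
message to a full stack dump, but the error code should arguably stay
unconditional: a silently dropped wakeup request is exactly the kind of
failure that is miserable to diagnose after the fact.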