Philippe Gerum wrote:
> On Mon, 2007-05-21 at 16:43 +0200, Jan Kiszka wrote:
>> Hi Philippe,
>>
>> Jan Kiszka wrote:
>>> Just ran into this with CONFIG_IPIPE_DEBUG_CONTEXT (maybe due to some
>>> bug of my own):
>> Here is some code to trigger the issue reliably:
>>
>> #include <sys/mman.h>
>> #include <native/task.h>
>>
>> void task_fnct(void *arg)
>> {
>>     rt_task_delete(NULL);    /* self-deletion triggers the unmap path */
>> }
>>
>> int main(void)
>> {
>>     RT_TASK task;
>>
>>     mlockall(MCL_CURRENT|MCL_FUTURE);
>>     rt_task_spawn(&task, "task", 0, 10, 0, task_fnct, NULL);
>>     return 0;
>> }
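
For reference, a reproducer like the one above can typically be built
against the native skin roughly as follows, assuming a standard
Xenomai 2.x installation that provides xeno-config (the exact flags
depend on the setup):

  gcc -o unmap-test unmap-test.c $(xeno-config --xeno-cflags) \
      $(xeno-config --xeno-ldflags) -lnative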
>>
>>
>>> [ 102.616000] I-pipe: Detected illicit call from domain 'Xenomai'
>>> [ 102.616000] into a service reserved for domain 'Linux' and below.
>>> [ 102.616000] c741bdc8 00000000 00000000 c8860ef8 c741bdec c0105683 c032c200 c13fe22c
>>> [ 102.616000] c0361f00 c741be08 c01519ed c032f5b8 c032c742 c03380b3 c8885100 c78beac0
>>> [ 102.616000] c741be14 c0142ce9 c7a80b30 c741be3c c884d075 c885f150 c8860ef8 c741be3c
>>> [ 102.616000] Call Trace:
>>> [ 102.616000] [<c0104d9f>] show_trace_log_lvl+0x1f/0x40
>>> [ 102.616000] [<c0104e71>] show_stack_log_lvl+0xb1/0xe0
>>> [ 102.616000] [<c0105683>] show_stack+0x33/0x40
>>> [ 102.616000] [<c01519ed>] ipipe_check_context+0xad/0xc0
>>> [ 102.616000] [<c0142ce9>] module_put+0x19/0x90
>>> [ 102.616000] [<c884d075>] xnshadow_unmap+0xb5/0x130 [xeno_nucleus]
>>> [ 102.616000] [<c8871dc5>] __shadow_delete_hook+0x25/0x30 [xeno_native]
>>> [ 102.616000] [<c8842f78>] xnpod_schedule+0xb58/0x12f0 [xeno_nucleus]
>>> [ 102.616000] [<c8844bfb>] xnpod_delete_thread+0x2cb/0x3d0 [xeno_nucleus]
>>> [ 102.616000] [<c886f5bd>] rt_task_delete+0x20d/0x220 [xeno_native]
>>>
>>> I would dare to say that module_put in xnshadow_unmap is not well placed,
>>> as it can wake up a Linux process. The module ref-counter maintenance
>>> needs some postponing, I guess.
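
As a rough model of that postponing idea (purely illustrative
user-space C, not the actual nucleus code: schedule_linux_call() and
LO_WAKEUP_REQ are names that appear later in this thread, while
LO_UNMAP_REQ and the ring layout are assumptions), deferral means
queueing the Linux-only work from the atomic path and replaying it
later from Linux context:

#include <stdio.h>

/*
 * Model of a low-stage request ring: the Xenomai-domain side may not
 * call Linux services, so it only queues requests; a handler running
 * in Linux context drains them, where module_put()/wakeups are legal.
 * Overflow handling is deliberately left out here; that is the
 * LO_MAX_REQUESTS question raised further down.
 */

#define LO_MAX_REQUESTS 64      /* power of 2, so cursors may wrap */
#define LO_WAKEUP_REQ   0
#define LO_UNMAP_REQ    1       /* hypothetical request type */

struct lostage_req {
        int type;
        void *task;             /* stand-in for struct task_struct * */
};

static struct lostage_req lo_ring[LO_MAX_REQUESTS];
static unsigned int lo_in, lo_out;

/* Atomic-context side: must not sleep, so it only enqueues. */
static void schedule_linux_call(int type, void *task)
{
        lo_ring[lo_in++ % LO_MAX_REQUESTS] =
                (struct lostage_req){ type, task };
}

/* Linux-context side: replays the deferred work. */
static void lostage_handler(void)
{
        while (lo_out != lo_in) {
                struct lostage_req *req =
                        &lo_ring[lo_out++ % LO_MAX_REQUESTS];

                switch (req->type) {
                case LO_UNMAP_REQ:
                        printf("deferred module_put() for %p\n", req->task);
                        break;
                case LO_WAKEUP_REQ:
                        printf("deferred wakeup for %p\n", req->task);
                        break;
                }
        }
}

int main(void)
{
        int t;

        schedule_linux_call(LO_UNMAP_REQ, &t);  /* from the unmap path */
        schedule_linux_call(LO_WAKEUP_REQ, &t);
        lostage_handler();                      /* later, in Linux context */
        return 0;
}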
>
> Mm, I should definitely read mails entirely.
>
>> Attached is a patch proposal. It solves the issue by postponing the
>> module_put via a new schedule_linux_call. Note that this approach issues
>> LO_WAKEUP_REQ where the old test (p->state != TASK_RUNNING) would not
>> have done so. I don't see negative side effects yet,
>
> This code has to work with an awful lot of runtime situations, including
> those raised by ptracing/GDB, and thread signaling in general, so test
> harnessing is key here.

Well, what about listing those scenarios and their requirements in a
comment if this is so critical? That may make /me feel better as I could
more easily go through them and check changes like those of today
against them.

>
>> and I'm furthermore
>> not sure the old code was handling SMP scenarios safely (What if the
>> thread to be unmapped was running on a different CPU than
>> xnshadow_unmap? How to ensure test-atomicity then?).
>>
>
> This wakeup call is there to release emerging shadows waiting on their
> start barrier that get killed before start. Other regular cases imply
> that either the current thread is exiting - in which case this is a
> no-brainer, or it has been sent a group kill signal, in which case it
> has to wake up anyway.

I didn't _remove_ cases where the wakeup takes place, I _added_ some
more by removing the test.

>
>> Well, this thing seems to work,
>
> It does, actually.
>
>> but it doesn't leave me with a good
>> feeling. IMHO, there are far too many cleanup cases with all the
>> combinations of non-shadow termination, shadow self-termination,
>> termination on task exit, various SMP scenarios, etc. Moreover, quite a
>> lot of stuff is executed in atomic scheduling hooks (namely on
>> cleanup). At least for shadow threads, I don't think all of this is
>> necessary. Actually, deferring most of the uncritical cleanup work
>> into Linux context would make things more regular, easier to review,
>> and likely less disturbing for overall system latencies. I know that
>> atomic scheduling hooks are required to implement certain skins, but I
>> don't see why janitor work should be done there as well. And we also
>> still have that tiny uncleanness on shadow thread termination around
>> TCB release vs. hook execution.
>
> Talking about xnshadow_unmap, and as your patch illustrates, what you
> could postpone is basically the first part of the routine, which
> updates the reference counts. The rest _must_ run over the deletion
> hook in atomic context (it already runs fine with all Linux debug
> switches on).

The rpi_pop()? No chance to call it before entering the scheduler and
then the hooks?

> The latency hit is certainly not seen there; the real issues are left
> in the native and POSIX skin thread deletion routines (heap management
> and other housekeeping calls under nklock and so on).

That's surely true. Besides RPI manipulation, there is really no
problematic code left here latency-wise. Other hooks are trickier. But
my point is that we first need some infrastructure to provide an
alternative cleanup context before we can start moving things. My patch
today is a hack... err... workaround from this perspective.

>
>> As we are already at it: LO_MAX_REQUESTS is a hidden limitation that
>> may (theoretically) bite large setups when e.g. too many threads get
>> killed from the "wrong" context. At least some overflow detection is
>> required here to warn the unfortunate user that he reached hard-coded
>> scalability limits.
>
> Yes.

CONFIG_XENO_OPT_DEBUG[_NUCLEUS] or unconditionally (error return code)?

Jan
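
On that last question, both options could look roughly like this (an
illustrative sketch: the config symbol is spelled as in the mail above,
while the function and queue layout are assumptions carried over from
the model sketched earlier in this thread):

#include <errno.h>
#include <stdio.h>

#define LO_MAX_REQUESTS 64

static unsigned int lo_in, lo_out;

/* Option 1: detect overflow unconditionally and return an error code,
 * leaving the caller to decide how to react. */
static int schedule_linux_call(int type, void *task)
{
        (void)type;
        (void)task;

        if (lo_in - lo_out == LO_MAX_REQUESTS)
                return -EAGAIN; /* hard-coded limit reached */

        lo_in++;                /* actual enqueue elided */
        return 0;
}

/* Option 2: complain only when the nucleus debug switch is enabled. */
static void lo_overflow_warn(void)
{
#ifdef CONFIG_XENO_OPT_DEBUG_NUCLEUS
        fprintf(stderr,
                "lostage request queue overflow, raise LO_MAX_REQUESTS\n");
#endif
}

int main(void)
{
        int i, dummy;

        for (i = 0; i <= LO_MAX_REQUESTS; i++)
                if (schedule_linux_call(0, &dummy) < 0)
                        lo_overflow_warn();
        return 0;
}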
