On 16/07/21 16:58, Philippe Gerum wrote:
Mauro S. <mau.sa...@tin.it> writes:
On 16/07/21 14:06, Philippe Gerum wrote:
Mauro S. via Xenomai <xenomai@xenomai.org> writes:
Hi,
I'm using Xenomai 3 (master branch, commit
bca41678742be80c3a0d5a01935c671c385a95a1) on an x86_64 Intel Atom
x5-E8000 with 2GB RAM, using the kernel from the Xenomai repos, in a
Cobalt configuration. The SMI workaround is enabled and all latency
tests are good.
I'm facing a very weird problem in my application. I have some
tasks with priority < 90 that call rt_task_suspend() on themselves.
Then, I have a task with priority 99 that resumes all the other tasks
with rt_task_resume() once they are suspended.
Sometimes a task does not get resumed.
In /proc/xenomai/sched/stat I have this status for my suspended task:
CPU  PID  MSW  CSW  XSC  PF  STAT      %CPU  NAME
1    620  3    8    13   0   00048041  0.0   t12
                                    ^
That bit (XNSUSP) indicates that the core thinks the task is still in
suspended state.
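
To make the STAT reading above easier to reproduce, here is a minimal sketch that decodes such a bitmask into flag names. The bit values in the table are an assumption, quoted from memory of the Cobalt thread state flags; please check them against the thread.h header of your own tree before relying on them. The names demo_stat_bits and demo_decode_stat are hypothetical helpers, not part of Xenomai.

```c
#include <stddef.h>
#include <stdio.h>

struct demo_stat_bit {
    unsigned long mask;
    const char *name;
};

/*
 * ASSUMED values, recalled from the Cobalt thread state flags.
 * Only the bits needed for the value in this thread are listed.
 */
static const struct demo_stat_bit demo_stat_bits[] = {
    { 0x00000001UL, "XNSUSP" },   /* suspended */
    { 0x00000040UL, "XNMAPPED" }, /* mapped to a Linux task */
    { 0x00008000UL, "XNFPU" },    /* uses the FPU */
    { 0x00040000UL, "XNUSER" },   /* shadow thread running in userland */
};

/*
 * Write the space-separated names of the known bits set in 'stat' into
 * 'buf' (at most 'len' bytes including the terminator). Returns the bits
 * that were not decoded, so 0 means every set bit was recognized.
 */
unsigned long demo_decode_stat(unsigned long stat, char *buf, size_t len)
{
    size_t used = 0;
    size_t i;

    if (len > 0)
        buf[0] = '\0';

    for (i = 0; i < sizeof(demo_stat_bits) / sizeof(demo_stat_bits[0]); i++) {
        int n;

        if (!(stat & demo_stat_bits[i].mask))
            continue;

        n = snprintf(buf + used, len - used, "%s%s",
                     used ? " " : "", demo_stat_bits[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            /* Not enough room: drop the partial name and stop. */
            if (used < len)
                buf[used] = '\0';
            break;
        }
        used += (size_t)n;
        stat &= ~demo_stat_bits[i].mask;
    }
    return stat;
}
```

Under those assumed values, calling demo_decode_stat(0x00048041UL, buf, sizeof(buf)) with a 64-byte buffer should leave no undecoded bits, with XNSUSP being the lowest one, which matches the explanation given above.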
Analyzing the scenario by attaching gdb to the application, I observe
that the task that does not resume has this backtrace:
#0 0x00007f0ec14c9b38 in __cobalt_kill (pid=620, sig=65) at signal.c:100
100 signal.c: No such file or directory.
(gdb) bt
#0 0x00007f0ec14c9b38 in __cobalt_kill (pid=620, sig=65) at signal.c:100
#1 0x00007f0ec14eb379 in threadobj_suspend (thobj=thobj@entry=0x7f0ec06bfcd0) at threadobj.c:335
#2 0x00007f0ec15013dc in rt_task_suspend (task=task@entry=0x0) at task.c:1154
That seems OK to me. If I understood correctly, it is blocked in its
SIGSUSP handler, which calls sigsuspend() and waits for SIGRESM to "restart".
Then, I placed some breakpoints where rt_task_resume() was called,
and in rt_task_resume() itself. I set tcb->suspends=1 with GDB and
followed the subsequent call to threadobj_resume(). Then, I placed
some breakpoints in threadobj_resume(), I forced the __THREAD_S_SUSPENDED
bit in thobj->status, and I observed that __RT(kill(thobj->pid,
SIGRESM)) got called and returned 0. thobj->pid has the right value.
But the suspended task does not get resumed.
Any idea/suggestion?
You may want to check what happens in __cobalt_kill()
(kernel/cobalt/posix/signal.c), and in COBALT_SYSCALL(kill, ...), to
make sure the call actually succeeds.
I traversed the code down to lib/cobalt/signal.c (__cobalt_kill(),
library side) and I checked the return value of
XENOMAI_SYSCALL2(sc_cobalt_kill, pid, sig), which is 0.
If I understand correctly, kernel/cobalt/posix/signal.c is kernel
code. Should I add printks? Or would I have to attach gdb to the kernel
(I don't think I am able to do that)?
printk() would be fine.
Hi Philippe and Jan,
I have to check my application more deeply, because there are some other
strange behaviors that make me think that there could be memory leaks
somewhere (and a simple standalone test program does not show the
problem I have seen).
I will be back if I find something regarding Xenomai.
Thanks again for your help.
Regards
--
Mauro