On 2015-03-16 17:09, Philippe Gerum wrote:
> On 03/16/2015 05:00 PM, Jan Kiszka wrote:
>> On 2015-03-16 16:31, Jan Kiszka wrote:
>>> On 2015-03-16 15:43, Philippe Gerum wrote:
>>>> On 03/11/2015 03:47 PM, Jan Kiszka wrote:
>>>>> Hi Philippe,
>>>>>
>>>>> just happened to trigger the oops below by running
>>>>>
>>>>> gdb --args smokey --run=8
>>>>>
>>>>> That run already has troubles and generates different output than
>>>>> running the test without gdb surveillance, probably due to unexpected
>>>>> mode switches.
>>>>
>>>> Clearly, yes. GDB causes the test program to leave primary mode, which
>>>> changes the scheduling order, and therefore the output which depends on it.
>>>>
>>>> But the real problem is that running the test again
>>>>> afterwards, with or without gdb, causes the oops. Registers contain
>>>>> suspicious "dead" patterns, thus we access invalid list elements. Do we
>>>>> miss a cleanup when terminating smokey in the gdb session?
>>>>>
>>>>
>>>> I could not reproduce this bug yet.
>>>>
>>>> There is no reason for ptracing the application to have any impact on
>>>> the housekeeping chores when it exits. The backtrace shows that
>>>> xnsched_tp_set_schedule() is walking through tp->threads, which seems to
>>>> link to a stale tcb. xnsched_tp_forget() would then be called twice,
>>>> leading to the fault.
>>>>
>>>> Normally, a thread that undergoes TP scheduling should be automatically
>>>> removed from tp->threads upon exit after this sequence took place:
>>>>
>>>> handle_taskexit_event -> __xnthread_cleanup -> cleanup_tcb ->
>>>> xnsched_forget -> xnsched_tp_forget
>>>>
>>>> For that bug to happen, either this assumption has to be wrong, or
>>>> xnsched_set_policy() is being silly at some point.
>>>>
>>>> Is this 100% reproducible on your end, and does this require the initial
>>>> gdb run to show up, or would that break even when running the sched_tp
>>>> twice without gdb?
>>>
>>> It is always reproducible, also with current next branch. And you need
>>> to run gdb beforehand, yes.
>>>
>>> I'll see if I can look into details.
>>
>> During cleanup of the first run under gdb, I get this one as expected
>> (and two more hits for thread and C):
>>
>> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at
>> ../kernel/xenomai/sched-tp.c:175
>> 175 {
>> (gdb) p thread->name
>> $3 = "threadA", '\000' <repeats 24 times>
>> (gdb) bt
>> #0 xnsched_tp_forget (thread=0xffff88003ad07040) at
>> ../kernel/xenomai/sched-tp.c:175
>> #1 0xffffffff8114b19f in xnsched_forget (thread=<optimized out>) at
>> ../include/xenomai/cobalt/kernel/sched.h:603
>> #2 cleanup_tcb (thread=<optimized out>) at ../kernel/xenomai/thread.c:467
>> #3 __xnthread_cleanup (curr=0xffff88003ad07040) at
>> ../kernel/xenomai/thread.c:486
>> #4 0xffffffff811794fd in handle_taskexit_event (p=<optimized out>) at
>> ../kernel/xenomai/posix/process.c:1028
>> #5 0xffffffff8117b49d in ipipe_kevent_hook (kevent=<optimized out>,
>> data=0xffff88003cfcb870) at ../kernel/xenomai/posix/process.c:1228
>> #6 0xffffffff810fc6d1 in __ipipe_notify_kevent (kevent=<optimized out>,
>> data=0xffff88003cfcb870) at ../kernel/ipipe/core.c:1092
>> #7 0xffffffff81050702 in do_exit (code=0) at ../kernel/exit.c:717
>> #8 0xffffffff810518a7 in SYSC_exit (error_code=<optimized out>) at
>> ../kernel/exit.c:855
>> #9 SyS_exit (error_code=<optimized out>) at ../kernel/exit.c:853
>> #10 <signal handler called>
>> #11 0x00007ffff7354146 in ?? ()
>> #12 0xffff88003cfcde10 in ?? ()
>> #13 0xffffffff81a09260 in ?? ()
>> #14 0x0000000000000000 in ?? ()
>> (gdb) c
>> Continuing.
>>
>>
>> But then, when I start the test again (with or without gdb), I also get
>> this right at the beginning:
>>
>>
>> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at
>> ../kernel/xenomai/sched-tp.c:175
>> 175 {
>> (gdb) bt
>> #0 xnsched_tp_forget (thread=0xffff88003ad07040) at
>> ../kernel/xenomai/sched-tp.c:175
>> #1 0xffffffff8113ebae in xnsched_forget (thread=<optimized out>) at
>> ../include/xenomai/cobalt/kernel/sched.h:603
>> #2 xnsched_set_policy (thread=0xffff88003ad07040,
>> sched_class=0xffffffff81a2bbe0 <xnsched_class_rt>, p=0xffff88003b813e00) at
>> ../kernel/xenomai/sched.c:403
>> #3 0xffffffff8115184f in xnsched_tp_set_schedule (sched=0xffff88003ad07040,
>> gps=0xffff88003ad08080) at ../kernel/xenomai/sched-tp.c:260
>> #4 0xffffffff8117c5df in set_tp_config (len=<optimized out>,
>> config=<optimized out>, cpu=<optimized out>) at
>> ../kernel/xenomai/posix/sched.c:284
>
> Yes, this one is the weird one. Normally, we should not find any TCB
> lingering in tp->threads, once threads A, B and C have exited and
> unlinked from it via xnsched_forget().
>
> That call on behalf of xnsched_tp_set_schedule() is aimed at moving all
> threads currently undergoing a TP schedule to the RT class, since we are
> about to change the scheduling data (i.e. time windows and partitions).
> Why tp->threads is not empty when running xnsched_tp_set_schedule() at
> the next program invocation is what needs to be explained.
Under gdb supervision, some threads are added to the tp pool multiple
times:
Breakpoint 1, xnsched_tp_declare (thread=0xffff88003ad07040,
p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
160 {
(gdb) bt
#0 xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at
../kernel/xenomai/sched-tp.c:160
#1 0xffffffff8113eb45 in xnsched_declare (p=<optimized out>, thread=<optimized
out>, sched_class=<optimized out>) at
../include/xenomai/cobalt/kernel/sched.h:540
#2 xnsched_set_policy (thread=0xffff88003ad07040,
sched_class=0xffff88003a7b7e20, p=0xffff88003a7b7e20) at
../kernel/xenomai/sched.c:390
#3 0xffffffff8114b7cf in __xnthread_set_schedparam (thread=0xffff88003ad07040,
sched_class=<optimized out>, sched_param=<optimized out>) at
../kernel/xenomai/thread.c:1802
#4 0xffffffff8114b8cc in xnthread_set_schedparam (thread=0xffff88003ad07040,
sched_class=0xffffffff81a2e400 <xnsched_class_tp>,
sched_param=0xffff88003a7b7e20) at ../kernel/xenomai/thread.c:1772
#5 0xffffffff8118c2c3 in pthread_setschedparam_ex (param_ex=<optimized out>,
policy=<optimized out>, thread=<optimized out>) at
../kernel/xenomai/posix/thread.c:271
#6 __cobalt_thread_setschedparam_ex (pth=<optimized out>, policy=<optimized
out>, param_ex=<optimized out>, u_winoff=<optimized out>, u_promoted=<optimized
out>) at ../kernel/xenomai/posix/thread.c:481
#7 0xffffffff8118c614 in cobalt_thread_setschedparam_ex (pth=140737352652544,
policy=11, u_param=<optimized out>, u_winoff=0x7ffff7e94bec,
u_promoted=0x7ffff7e94be8) at ../kernel/xenomai/posix/thread.c:509
#8 0xffffffff8118475a in handle_root_syscall (ipd=<optimized out>,
regs=<optimized out>) at ../kernel/xenomai/posix/syscall.c:1058
#9 ipipe_syscall_hook (ipd=<optimized out>, regs=0xffff88003a7b7f58) at
../kernel/xenomai/posix/syscall.c:1107
#10 0xffffffff810fde9f in __ipipe_notify_syscall (regs=<optimized out>) at
../kernel/ipipe/core.c:1006
#11 <signal handler called>
#12 0x00007ffff77b7d25 in ?? ()
#13 0x5a5a5a5a00000001 in ?? ()
#14 0x0000000000000000 in ?? ()
(gdb) c
Continuing.
Breakpoint 1, xnsched_tp_declare (thread=0xffff88003ad07040,
p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
160 {
(gdb) bt
#0 xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at
../kernel/xenomai/sched-tp.c:160
#1 0xffffffff8113eb45 in xnsched_declare (p=<optimized out>, thread=<optimized
out>, sched_class=<optimized out>) at
../include/xenomai/cobalt/kernel/sched.h:540
#2 xnsched_set_policy (thread=0xffff88003ad07040,
sched_class=0xffff88003a7b7e20, p=0xffff88003a7b7e20) at
../kernel/xenomai/sched.c:390
#3 0xffffffff8114b7cf in __xnthread_set_schedparam (thread=0xffff88003ad07040,
sched_class=<optimized out>, sched_param=<optimized out>) at
../kernel/xenomai/thread.c:1802
#4 0xffffffff8114b8cc in xnthread_set_schedparam (thread=0xffff88003ad07040,
sched_class=0xffffffff81a2e400 <xnsched_class_tp>,
sched_param=0xffff88003a7b7e20) at ../kernel/xenomai/thread.c:1772
#5 0xffffffff8118c2c3 in pthread_setschedparam_ex (param_ex=<optimized out>,
policy=<optimized out>, thread=<optimized out>) at
../kernel/xenomai/posix/thread.c:271
#6 __cobalt_thread_setschedparam_ex (pth=<optimized out>, policy=<optimized
out>, param_ex=<optimized out>, u_winoff=<optimized out>, u_promoted=<optimized
out>) at ../kernel/xenomai/posix/thread.c:481
#7 0xffffffff8118c614 in cobalt_thread_setschedparam_ex (pth=140737352652544,
policy=11, u_param=<optimized out>, u_winoff=0x7ffff7e94bec,
u_promoted=0x7ffff7e94be8) at ../kernel/xenomai/posix/thread.c:509
#8 0xffffffff8118475a in handle_root_syscall (ipd=<optimized out>,
regs=<optimized out>) at ../kernel/xenomai/posix/syscall.c:1058
#9 ipipe_syscall_hook (ipd=<optimized out>, regs=0xffff88003a7b7f58) at
../kernel/xenomai/posix/syscall.c:1107
#10 0xffffffff810fde9f in __ipipe_notify_syscall (regs=<optimized out>) at
../kernel/ipipe/core.c:1006
#11 <signal handler called>
#12 0x00007ffff77b7d25 in ?? ()
#13 0x5a5a5a5a00000001 in ?? ()
#14 0x0000000000000000 in ?? ()
Could it be that some syscall is restarted after being interrupted by
ptrace?
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
_______________________________________________
Xenomai mailing list
[email protected]
http://www.xenomai.org/mailman/listinfo/xenomai