Re: [Xenomai-core] [BUG] module usage counter of xenomai native corrupted (version 2.2.0 and 2.2.5)
On Fri, 2006-12-08 at 22:35 +0100, Philippe Gerum wrote:
> On Fri, 2006-12-08 at 20:05 +0100, Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> On Fri, 2006-12-08 at 19:02 +0100, Gilles Chanteperdrix wrote:
>>>> Jan Kiszka wrote:
>>>>> Gilles Chanteperdrix wrote:
>>>>>> Index: ksrc/nucleus/shadow.c
>>>>>> ===================================================================
>>>>>> --- ksrc/nucleus/shadow.c	(revision 1930)
>>>>>> +++ ksrc/nucleus/shadow.c	(working copy)
>>>>>> @@ -888,6 +888,9 @@
>>>>>>  
>>>>>>  	p = xnthread_archtcb(thread)->user_task;	/* May be != current */
>>>>>>  
>>>>>> +	if (!xnshadow_thrptd(p))
>>>>>> +		return;
>>>>>> +
>>>>>>  	magic = xnthread_get_magic(thread);
>>>>>>  
>>>>>>  	for (muxid = 0; muxid < XENOMAI_MUX_NR; muxid++) {
>>>>>> @@ -1639,8 +1642,6 @@
>>>>>>  	xnshadow_relax(0);
>>>>>>  
>>>>>>  	xnlock_get_irqsave(&nklock, s);
>>>>>> -	/* Prevent wakeup call from xnshadow_unmap(). */
>>>>>> -	xnshadow_thrptd(p) = NULL;
>>>>>>  	xnthread_archtcb(thread)->user_task = NULL;
>>>>>>  	/* xnpod_delete_thread() -> hook -> xnshadow_unmap(). */
>>>>>>  	xnpod_delete_thread(thread);
>>>>>
>>>>> Can't comment on the correctness of the second hunk, but it
>>>>> unfortunately doesn't change the fact that the test case no longer
>>>>> terminates with the first hunk applied. May look like a trivial
>>>>> issue - but it isn't. :-
>>>>
>>>> Indeed. And xnshadow_thrptd(current) == NULL is used by
>>>> xnpod_schedule, so the patch is probably completely incorrect.
>>>
>>> We should rather check the TCB backlink to the Linux task. Could
>>> someone who can reproduce this issue test the following patch?
>>>
>>> TIA,
>>>
>>> --- ksrc/nucleus/shadow.c	(revision 1931)
>>> +++ ksrc/nucleus/shadow.c	(working copy)
>>> @@ -888,6 +888,10 @@
>>>  
>>>  	p = xnthread_archtcb(thread)->user_task;	/* May be != current */
>>>  
>>> +	xnltt_log_event(xeno_ev_shadowunmap, thread->name, p ? p->pid : -1);
>>> +	if (!p)
>>> +		goto renice_and_exit;
>>> +
>>>  	magic = xnthread_get_magic(thread);
>>>  
>>>  	for (muxid = 0; muxid < XENOMAI_MUX_NR; muxid++) {
>>> @@ -907,10 +911,6 @@
>>>  		}
>>>  	}
>>>  
>>> -	xnltt_log_event(xeno_ev_shadowunmap, thread->name, p ? p->pid : -1);
>>> -	if (!p)
>>> -		goto renice_and_exit;
>>> -
>>>  	xnshadow_thrptd(p) = NULL;
>>>  
>>>  	if (p->state != TASK_RUNNING) {
>>
>> Doesn't work, usage counter is now incremented. BTW, this patch slipped
>
> Ok, I will try to reproduce it here before fixing.

Now fixed in both branches, v2.2.x and trunk.

-- 
Philippe.

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [BUG] module usage counter of xenomai native corrupted (version 2.2.0 and 2.2.5)
Jan Kiszka wrote:
> Thomas Wiedemann wrote:
>> Hi,
>>
>> there seems to be a bug in rt_task_create(). When no more memory is
>> available, the module usage counter of xeno_native is decremented. I
>> guess it is not incremented beforehand, however, so the counter
>> reaches 0 and then wraps to a negative number. It is therefore not
>> possible to remove the module.
>>
>> I appended a small program to demonstrate this. It simply eats up all
>> memory from xenomai by registering as many mutexes as possible, and
>> then tries to execute rt_task_create(), which fails. When started
>> again, the bug occurs at rt_task_shadow(), as the mutexes have never
>> been deleted.
>>
>> Compile with
>>
>>   gcc -O2 -Wall `xeno-config --xeno-cflags` `xeno-config --xeno-ldflags` -lrtdm -lnative -o rttest rttest.c
>>
>> then simply run it, and watch the output of lsmod before and after.
>>
>> Tested with xenomai 2.2.{0,5} and linux 2.6.17.8, modules loaded:
>> xeno_native and xeno_nucleus.
>
> Confirmed. Requires a closer look to find the leak path.

Here is what happens: the task is created with the XNSHADOW bit, and
destroyed before it was xnshadow_mapped, but the deletion hook calls
xnshadow_unmap because the task has the XNSHADOW bit. And
xnshadow_unmap decrements the module count.

-- 
Gilles Chanteperdrix
Re: [Xenomai-core] [BUG] module usage counter of xenomai native corrupted (version 2.2.0 and 2.2.5)
Gilles Chanteperdrix wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Thomas Wiedemann wrote:
>>>> Hi,
>>>>
>>>> there seems to be a bug in rt_task_create(). When no more memory is
>>>> available, the module usage counter of xeno_native is decremented.
>>>> I guess it is not incremented beforehand, however, so the counter
>>>> reaches 0 and then wraps to a negative number. It is therefore not
>>>> possible to remove the module.
>>>>
>>>> I appended a small program to demonstrate this. It simply eats up
>>>> all memory from xenomai by registering as many mutexes as possible,
>>>> and then tries to execute rt_task_create(), which fails. When
>>>> started again, the bug occurs at rt_task_shadow(), as the mutexes
>>>> have never been deleted.
>>>>
>>>> Compile with
>>>>
>>>>   gcc -O2 -Wall `xeno-config --xeno-cflags` `xeno-config --xeno-ldflags` -lrtdm -lnative -o rttest rttest.c
>>>>
>>>> then simply run it, and watch the output of lsmod before and after.
>>>>
>>>> Tested with xenomai 2.2.{0,5} and linux 2.6.17.8, modules loaded:
>>>> xeno_native and xeno_nucleus.
>>>
>>> Confirmed. Requires a closer look to find the leak path.
>>
>> Here is what happens: the task is created with the XNSHADOW bit, and
>> destroyed before it was xnshadow_mapped, but the deletion hook calls
>> xnshadow_unmap because the task has the XNSHADOW bit. And
>> xnshadow_unmap decrements the module count.
>
> Here is an untested quick fix.
>
> Index: ksrc/nucleus/shadow.c
> ===================================================================
> --- ksrc/nucleus/shadow.c	(revision 1930)
> +++ ksrc/nucleus/shadow.c	(working copy)
> @@ -888,6 +888,9 @@
>  
>  	p = xnthread_archtcb(thread)->user_task;	/* May be != current */
>  
> +	if (!xnshadow_thrptd(p))
> +		return;
> +
>  	magic = xnthread_get_magic(thread);
>  
>  	for (muxid = 0; muxid < XENOMAI_MUX_NR; muxid++) {

Nope, shows unwanted side effects, probably because xnshadow_thrptd is
already NULL'ed in do_taskexit_event. Looks like it takes an extra
flag, no?
Re: [Xenomai-core] [BUG] module usage counter of xenomai native corrupted (version 2.2.0 and 2.2.5)
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Gilles Chanteperdrix wrote:
>>>> Jan Kiszka wrote:
>>>>> Thomas Wiedemann wrote:
>>>>>> Hi,
>>>>>>
>>>>>> there seems to be a bug in rt_task_create(). When no more memory
>>>>>> is available, the module usage counter of xeno_native is
>>>>>> decremented. I guess it is not incremented beforehand, however,
>>>>>> so the counter reaches 0 and then wraps to a negative number. It
>>>>>> is therefore not possible to remove the module.
>>>>>>
>>>>>> I appended a small program to demonstrate this. It simply eats up
>>>>>> all memory from xenomai by registering as many mutexes as
>>>>>> possible, and then tries to execute rt_task_create(), which
>>>>>> fails. When started again, the bug occurs at rt_task_shadow(), as
>>>>>> the mutexes have never been deleted.
>>>>>>
>>>>>> Compile with
>>>>>>
>>>>>>   gcc -O2 -Wall `xeno-config --xeno-cflags` `xeno-config --xeno-ldflags` -lrtdm -lnative -o rttest rttest.c
>>>>>>
>>>>>> then simply run it, and watch the output of lsmod before and
>>>>>> after.
>>>>>>
>>>>>> Tested with xenomai 2.2.{0,5} and linux 2.6.17.8, modules loaded:
>>>>>> xeno_native and xeno_nucleus.
>>>>>
>>>>> Confirmed. Requires a closer look to find the leak path.
>>>>
>>>> Here is what happens: the task is created with the XNSHADOW bit,
>>>> and destroyed before it was xnshadow_mapped, but the deletion hook
>>>> calls xnshadow_unmap because the task has the XNSHADOW bit. And
>>>> xnshadow_unmap decrements the module count.
>>>
>>> Here is an untested quick fix.
>>>
>>> Index: ksrc/nucleus/shadow.c
>>> ===================================================================
>>> --- ksrc/nucleus/shadow.c	(revision 1930)
>>> +++ ksrc/nucleus/shadow.c	(working copy)
>>> @@ -888,6 +888,9 @@
>>>  
>>>  	p = xnthread_archtcb(thread)->user_task;	/* May be != current */
>>>  
>>> +	if (!xnshadow_thrptd(p))
>>> +		return;
>>> +
>>>  	magic = xnthread_get_magic(thread);
>>>  
>>>  	for (muxid = 0; muxid < XENOMAI_MUX_NR; muxid++) {
>>
>> Nope, shows unwanted side effects, probably because xnshadow_thrptd
>> is already NULL'ed in do_taskexit_event. Looks like it takes an extra
>> flag, no?
>
> Setting xnshadow_thrptd to NULL in do_taskexit_event does not seem to
> be that useful. Here comes version 2.
>
> Index: ksrc/nucleus/shadow.c
> ===================================================================
> --- ksrc/nucleus/shadow.c	(revision 1930)
> +++ ksrc/nucleus/shadow.c	(working copy)
> @@ -888,6 +888,9 @@
>  
>  	p = xnthread_archtcb(thread)->user_task;	/* May be != current */
>  
> +	if (!xnshadow_thrptd(p))
> +		return;
> +
>  	magic = xnthread_get_magic(thread);
>  
>  	for (muxid = 0; muxid < XENOMAI_MUX_NR; muxid++) {
> @@ -1639,8 +1642,6 @@
>  	xnshadow_relax(0);
>  
>  	xnlock_get_irqsave(&nklock, s);
> -	/* Prevent wakeup call from xnshadow_unmap(). */
> -	xnshadow_thrptd(p) = NULL;
>  	xnthread_archtcb(thread)->user_task = NULL;
>  	/* xnpod_delete_thread() -> hook -> xnshadow_unmap(). */
>  	xnpod_delete_thread(thread);

Can't comment on the correctness of the second hunk, but it
unfortunately doesn't change the fact that the test case no longer
terminates with the first hunk applied. May look like a trivial issue -
but it isn't. :-
Re: [Xenomai-core] [BUG] module usage counter of xenomai native corrupted (version 2.2.0 and 2.2.5)
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Index: ksrc/nucleus/shadow.c
>> ===================================================================
>> --- ksrc/nucleus/shadow.c	(revision 1930)
>> +++ ksrc/nucleus/shadow.c	(working copy)
>> @@ -888,6 +888,9 @@
>>  
>>  	p = xnthread_archtcb(thread)->user_task;	/* May be != current */
>>  
>> +	if (!xnshadow_thrptd(p))
>> +		return;
>> +
>>  	magic = xnthread_get_magic(thread);
>>  
>>  	for (muxid = 0; muxid < XENOMAI_MUX_NR; muxid++) {
>> @@ -1639,8 +1642,6 @@
>>  	xnshadow_relax(0);
>>  
>>  	xnlock_get_irqsave(&nklock, s);
>> -	/* Prevent wakeup call from xnshadow_unmap(). */
>> -	xnshadow_thrptd(p) = NULL;
>>  	xnthread_archtcb(thread)->user_task = NULL;
>>  	/* xnpod_delete_thread() -> hook -> xnshadow_unmap(). */
>>  	xnpod_delete_thread(thread);
>
> Can't comment on the correctness of the second hunk, but it
> unfortunately doesn't change the fact that the test case no longer
> terminates with the first hunk applied. May look like a trivial
> issue - but it isn't. :-

Indeed. And xnshadow_thrptd(current) == NULL is used by xnpod_schedule,
so the patch is probably completely incorrect.

-- 
Gilles Chanteperdrix
Re: [Xenomai-core] [BUG] module usage counter of xenomai native corrupted (version 2.2.0 and 2.2.5)
Thomas Wiedemann wrote:
> Hi,
>
> there seems to be a bug in rt_task_create(). When no more memory is
> available, the module usage counter of xeno_native is decremented. I
> guess it is not incremented beforehand, however, so the counter
> reaches 0 and then wraps to a negative number. It is therefore not
> possible to remove the module.
>
> I appended a small program to demonstrate this. It simply eats up all
> memory from xenomai by registering as many mutexes as possible, and
> then tries to execute rt_task_create(), which fails. When started
> again, the bug occurs at rt_task_shadow(), as the mutexes have never
> been deleted.
>
> Compile with
>
>   gcc -O2 -Wall `xeno-config --xeno-cflags` `xeno-config --xeno-ldflags` -lrtdm -lnative -o rttest rttest.c
>
> then simply run it, and watch the output of lsmod before and after.
>
> Tested with xenomai 2.2.{0,5} and linux 2.6.17.8, modules loaded:
> xeno_native and xeno_nucleus.

Confirmed. Requires a closer look to find the leak path.

Thanks for reporting so thoroughly,
Jan