Re: [Xenomai-core] hanging in Xenomai 2.5.5
Stefan Schaal wrote:
> Hi Philippe,
>
> thanks a lot for the hint. I configured my kernel from scratch and got rid of the Linux compile problems. I could thus verify that the commit you mentioned below DOES NOT have the problem I described, i.e., semaphores used by multiple processes running on different cores DID NOT hang anymore.
>
> Then I thought I would try to bisect the problem with git, and I pulled the latest version of the 2.5 repository. Interestingly, with the very latest commits, my problem has gone away. I confirmed this by switching back to Alexis' analogy branch, which I need for my development. This branch is not quite as up-to-date as the 2.5 branch, and the hanging problem still exists there. I merged the analogy branch with the latest 2.5 branch, and now nothing hangs anymore.
>
> I guess I will stop investigating at this point, unless the problem reappears.

2.5.6 should be out soon, which should allow you to avoid doing this. But in the meantime, you can probably merge the two branches; they should be fairly orthogonal.

--
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] hanging in Xenomai 2.5.5
Hi Philippe,

thanks a lot for the hint. I configured my kernel from scratch and got rid of the Linux compile problems. I could thus verify that the commit you mentioned below DOES NOT have the problem I described, i.e., semaphores used by multiple processes running on different cores DID NOT hang anymore.

Then I thought I would try to bisect the problem with git, and I pulled the latest version of the 2.5 repository. Interestingly, with the very latest commits, my problem has gone away. I confirmed this by switching back to Alexis' analogy branch, which I need for my development. This branch is not quite as up-to-date as the 2.5 branch, and the hanging problem still exists there. I merged the analogy branch with the latest 2.5 branch, and now nothing hangs anymore.

I guess I will stop investigating at this point, unless the problem reappears.

Thanks so much for your help!

Best wishes,

-Stefan

On Jan 5, 2011, at 7:53, Philippe Gerum wrote:

> On Wed, 2011-01-05 at 07:41 -0800, Stefan Schaal wrote:
>> Hi Philippe,
>>
>> sorry, I must have miscommunicated. This was, of course, a Xenomai commit that I tried, and the errors I sent you resulted when recompiling the Linux kernel with this Xenomai version.
>
> Those errors are not related to Xenomai; they happen on basic Linux code. Make sure to work from a fresh build tree, using a proper toolchain. It looks like something is severely broken in your build env.
>
>> -Stefan
>>
>> On Jan 5, 2011, at 6:07, Philippe Gerum wrote:
>>
>>> On Sat, 2010-12-25 at 11:02 -0800, Stefan Schaal wrote:
>>>> 6a020f5
>>>
>>> I don't see how these messages could be related to Xenomai. I was mentioning a Xenomai commit, not a Linux one. You should reset to this commit:
>>>
>>> commit 6a020f5a89955a42f1e03621ae6c63a587e9c75c
>>> Author: Philippe Gerum r...@xenomai.org
>>> Date: Sat Aug 28 13:04:45 2010 +0200
>>>
>>>     nucleus, posix: use fast APC scheduling call
>>>
>>> --
>>> Philippe.
>
> --
> Philippe.
Re: [Xenomai-core] hanging in Xenomai 2.5.5
Hi Philippe,

thanks so much for your reply -- it took me a moment to get back to this problem. Here are some first observations:

1) The problem only occurs when I distribute the communicating processes over multiple cores -- in Xenomai 2.5.4, this has never been a problem.

2) The /proc/xenomai/stat output looks like:

    CPU PID MSW CSW PF STAT %CPU NAME
    0 0 0 23492290 00500080 100.0 ROOT/0
    1 0 0 20328410 0 00500080 100.0 ROOT/1
    2 0 0 10403210 00500080 100.0 ROOT/2
    3 0 0 445786 0 00500080 100.0 ROOT/3
    4 0 0 71162 0 00500080 100.0 ROOT/4
    5 0 0 0 0 00500080 100.0 ROOT/5
    6 0 0 0 0 00500080 100.0 ROOT/6
    7 0 0 0 0 00500080 100.0 ROOT/7
    1 3128 0 91261 0 00300182 0.0 sem1_task
    2 3166 0 90470 0 00300188 0.0 sem2_task
    3 3195 0 45237 0 00300182 0.0 sem3_task
    0 0 0 0 0 0.0 IRQ56: Analogy device
    1 0 0 0 0 0.0 IRQ56: Analogy device
    2 0 0 0 0 0.0 IRQ56: Analogy device
    3 0 0 0 0 0.0 IRQ56: Analogy device
    4 0 0 0 0 0.0 IRQ56: Analogy device
    5 0 0 0 0 0.0 IRQ56: Analogy device
    6 0 0 0 0 0.0 IRQ56: Analogy device
    7 0 0 0 0 0.0 IRQ56: Analogy device
    1 0 0 39326230 0 0.0 IRQ521: [timer]
    2 0 0 16415320 0.0 IRQ521: [timer]
    3 0 0 12585710 0.0 IRQ521: [timer]
    4 0 0 722843 0 0.0 IRQ521: [timer]
    5 0 0 780591 0 0.0 IRQ521: [timer]
    6 0 0 764817 0 0.0 IRQ521: [timer]
    7 0 0 385421 0 0.0 IRQ521: [timer]

The three communicating processes are sem1_task, sem2_task and sem3_task -- they are currently hanging with 0% CPU.

3) The /proc/xenomai/sched output looks like:

    CPU  PID   CLASS  PRI  TIMEOUT  TIMEBASE  STAT  NAME
    0    0     idle   -1   -        master    R     ROOT/0
    1    0     idle   -1   -        master    R     ROOT/1
    2    0     idle   -1   -        master    R     ROOT/2
    3    0     idle   -1   -        master    R     ROOT/3
    4    0     idle   -1   -        master    R     ROOT/4
    5    0     idle   -1   -        master    R     ROOT/5
    6    0     idle   -1   -        master    R     ROOT/6
    7    0     idle   -1   -        master    R     ROOT/7
    1    3128  rt     50   -        master    W     sem1_task
    2    3166  rt     50   -        master    R     sem2_task
    3    3195  rt     50   -        master    W     sem3_task

Interestingly, although sem2_task is reported as running (R), it is not actually running.

4) When I try to terminate the three processes, sem2_task hangs and I cannot kill it. Interestingly, if I start another program that does a similar semaphore communication, sem2_task is finally released. Indeed, when I start this other program, the three processes (sem1_task, sem2_task, sem3_task) start running again, until they hang again.

5) I appended the little test program I used -- it is called xtest_xeno_sem.c. I compile with:

    gcc -o xtest -I/usr/xenomai/include -D_GNU_SOURCE -D_REENTRANT -Wall -pipe -D__XENO__ -lnative -L/usr/xenomai/lib -lxenomai -lpthread -lrt -lrtdk -lnative test_xeno_sem.c

To create three communicating processes on different cores, I execute:

    terminal1 xtest 1 1
    terminal2 xtest 2 1
    terminal3 xtest 3 1

To create three communicating processes on ONE core, I execute:

    terminal1 xtest 1 0
    terminal2 xtest 2 0
    terminal3 xtest 3 0

6) I haven't tested the other commits yet -- this comes next. But maybe the information above already tells you all you need to know.

Best wishes, and, as always, a thousand thanks for your kind help!

-Stefan

Attachment: test_xeno_sem.c (Description: Binary data)

On Oct 16, 2010, at 1:48, Philippe Gerum wrote:

> On Fri, 2010-10-15 at 22:43 -0700, Stefan Schaal wrote:
>> Hi everybody, here is a quick first report on an issue that appeared with Xenomai 2.5.5 --- NOTE: 2.5.4 (and earlier) DOES NOT have this issue. We run multiple real-time
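The symptom reported in the /proc/xenomai/sched listing of this message (a real-time thread shown in state R that is not making progress) can be picked out mechanically. The following is a minimal sketch, assuming the eight-column layout named in the listing header (CPU, PID, CLASS, PRI, TIMEOUT, TIMEBASE, STAT, NAME); the function name `runnable_rt_threads` is made up for illustration and is not a Xenomai tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xenomai): read /proc/xenomai/sched-style
# text on stdin and print the names of rt-class threads whose STAT column
# is exactly "R".
# Assumed column order: CPU PID CLASS PRI TIMEOUT TIMEBASE STAT NAME.
runnable_rt_threads() {
    awk '$3 == "rt" && $7 == "R" {
        # NAME may contain spaces, so rebuild it from field 8 onwards
        name = $8
        for (i = 9; i <= NF; i++) name = name " " $i
        print name
    }'
}
```

On a live system this would be called as `runnable_rt_threads < /proc/xenomai/sched`. Any name it prints can then be checked against the %CPU column of /proc/xenomai/stat: a thread listed as R with 0.0 %CPU matches the behaviour described for sem2_task.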
Re: [Xenomai-core] hanging in Xenomai 2.5.5
On Fri, 2010-10-15 at 22:43 -0700, Stefan Schaal wrote:
> Hi everybody,
>
> here is a quick first report on an issue that appeared with Xenomai 2.5.5 --- NOTE: 2.5.4 (and earlier) DOES NOT have this issue.
>
> We run multiple real-time processes, synchronized by semaphores, with interprocess communication using shared memory. All is cleanly implemented using the Xenomai real-time functions, no mode switches. The different processes are distributed on different processors of our multi-core machine using rt_task_spawn() with the T_CPU directive. Up to version 2.5.4, this worked fine. With version 2.5.5 (and 2.5.5.1), the processes hang after a few seconds of running (CPU consumption goes to zero), and usually one of them hangs so badly that it cannot be killed anymore with kill -9 -- thus a reboot is required.
>
> The problem happens on BOTH our i386 machine (Dell 8-core, Ubuntu 9.04, kernel 2.6.29.5) AND our x86_64 machine (Dell 8-core, Ubuntu 9.10, kernel 2.6.31.4). Thus, this seems to be specific to the Xenomai release 2.5.5 and higher.
>
> No dmesg print-outs when this error occurs.
>
> We will try to create a simple test program to illustrate the problem, but maybe the issue is already obvious to some of the experts on this list.

The output of

    $ cat /proc/xenomai/stat
    $ cat /proc/xenomai/sched

when the threads hang would help.

Additionally, please clone the -stable repo from there:

    git://git.xenomai.org/xenomai-2.5.git

then branch+build and test from these commits:

- 6a020f5 first; if the bug does not show up anymore, check the next one
- 5e7cfa5; if the bug is still there, try disabling CONFIG_XENO_OPT_PRIOCPL to test the basic system and re-check.

> Best wishes,
>
> -Stefan

--
Philippe.
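The branch-and-test steps suggested in this message can be sketched as a small shell helper. This is only an illustration: the function name `checkout_for_test` and the `test-<commit>` branch naming are assumptions rather than anything from the Xenomai tree, and the kernel/Xenomai build and the test run itself are left out.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xenomai): create a throwaway branch at
# a given commit so that this exact revision can be built and tested.
#   $1 = path to a local clone of the repository
#   $2 = commit to test (e.g. an abbreviated hash)
checkout_for_test() {
    repo_dir=$1
    commit=$2
    # -b creates the branch "test-<commit>" and switches to it
    git -C "$repo_dir" checkout -q -b "test-$commit" "$commit"
}
```

Assuming the repository was cloned with `git clone git://git.xenomai.org/xenomai-2.5.git`, calling `checkout_for_test xenomai-2.5 6a020f5` and, after building and testing, `checkout_for_test xenomai-2.5 5e7cfa5` would give one branch per commit named in the message.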
[Xenomai-core] hanging in Xenomai 2.5.5
Hi everybody,

here is a quick first report on an issue that appeared with Xenomai 2.5.5 --- NOTE: 2.5.4 (and earlier) DOES NOT have this issue.

We run multiple real-time processes, synchronized by semaphores, with interprocess communication using shared memory. All is cleanly implemented using the Xenomai real-time functions, no mode switches. The different processes are distributed on different processors of our multi-core machine using rt_task_spawn() with the T_CPU directive. Up to version 2.5.4, this worked fine. With version 2.5.5 (and 2.5.5.1), the processes hang after a few seconds of running (CPU consumption goes to zero), and usually one of them hangs so badly that it cannot be killed anymore with kill -9 -- thus a reboot is required.

The problem happens on BOTH our i386 machine (Dell 8-core, Ubuntu 9.04, kernel 2.6.29.5) AND our x86_64 machine (Dell 8-core, Ubuntu 9.10, kernel 2.6.31.4). Thus, this seems to be specific to the Xenomai release 2.5.5 and higher.

No dmesg print-outs when this error occurs.

We will try to create a simple test program to illustrate the problem, but maybe the issue is already obvious to some of the experts on this list.

Best wishes,

-Stefan