Re: [Xenomai-core] hanging in Xenomai 2.5.5

2011-01-07 Thread Gilles Chanteperdrix
Stefan Schaal wrote:
 Hi Philippe,
 
 thanks a lot for the hint. I configured my kernel from scratch, and
 got rid of the linux compile problems. I could thus verify that the
 commit you mentioned below DOES NOT have the problem I described,
 i.e., semaphores used by multiple processes which are running on
 different cores DID NOT hang anymore.
 
 Then I thought I would try to bisect the problem with git, and I pulled
 the latest version of the 2.5 repository. Interestingly, with the
 very latest commits, my problem has gone away. I confirmed this by
 switching back to Alexis' analogy branch, which I need for my
 development. This branch is not quite as up-to-date as the 2.5
 branch, and the hanging problem still exists. I merged the analogy
 branch with the latest 2.5 branch, and now nothing hangs anymore.
 
 I guess I will stop investigating at this point, unless the problem
 reappears.

2.5.6 should be out soon, which should allow you to avoid doing this.

But in the meantime, you can probably merge the two branches; they
should be fairly orthogonal.
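
For readers following along, here is a minimal sketch of such a merge. The thread does not name the branches, so the branch names in the usage comment are assumptions; substitute whatever your clones of the analogy tree and of xenomai-2.5 actually use. The helper name merge_upstream is made up for this illustration.

```shell
#!/bin/sh
# Sketch only: merge an upstream branch into a local topic branch.
# If the merge conflicts, abort it so the topic branch is left as it was.
merge_upstream() {
    repo=$1      # path to the git work tree
    topic=$2     # local branch to update (e.g. the analogy branch)
    upstream=$3  # branch to merge in (e.g. the latest 2.5 branch)

    git -C "$repo" checkout -q "$topic" || return 1
    if ! git -C "$repo" merge -q --no-edit "$upstream"; then
        git -C "$repo" merge --abort
        return 1
    fi
}

# Hypothetical usage (branch names are assumptions):
#   merge_upstream ~/src/xenomai-2.5 analogy origin/master
```

If the two branches really are orthogonal, the merge should complete without conflicts and the function returns 0.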

-- 
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] hanging in Xenomai 2.5.5

2011-01-06 Thread Stefan Schaal
Hi Philippe,

  thanks a lot for the hint. I configured my kernel from scratch, and got rid 
of the linux compile problems. I could thus verify that the commit you 
mentioned below DOES NOT have the problem I described, i.e., semaphores used by 
multiple processes which are running on different cores DID NOT hang anymore.

   Then I thought I would try to bisect the problem with git, and I pulled the 
latest version of the 2.5 repository. Interestingly, with the very latest 
commits, my problem has gone away. I confirmed this by switching back to 
Alexis' analogy branch, which I need for my development. This branch is not 
quite as up-to-date as the 2.5 branch, and the hanging problem still exists. I 
merged the analogy branch with the latest 2.5 branch, and now nothing hangs 
anymore.
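
For the record, a bisect of the kind mentioned above could be sketched as follows. This is an illustration under assumptions, not what was actually run: the helper name first_bad_commit is made up, and the automated "git bisect run" form is used only to keep the sketch self-contained. For a kernel-level hang, each step would in practice mean a rebuild and reboot, with the result marked by hand using "git bisect good" or "git bisect bad".

```shell
#!/bin/sh
# Sketch only: print the first commit between a known-good and a known-bad
# revision for which a test command fails.
first_bad_commit() {
    repo=$1; good=$2; bad=$3
    shift 3    # the remaining arguments are the test command

    git -C "$repo" bisect start "$bad" "$good" >/dev/null || return 1
    # "git bisect run" treats exit status 0 as good and 1..127 (except 125)
    # as bad; 125 means "cannot test this commit, skip it".
    if ! git -C "$repo" bisect run "$@" >/dev/null; then
        git -C "$repo" bisect reset >/dev/null 2>&1
        return 1
    fi
    # After a completed bisect, refs/bisect/bad names the first bad commit.
    git -C "$repo" rev-parse refs/bisect/bad
    git -C "$repo" bisect reset >/dev/null 2>&1
}

# Hypothetical usage (the bad revision and the test script are placeholders):
#   first_bad_commit ~/src/xenomai-2.5 6a020f5 <bad-revision> ./run-hang-test.sh
```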

  I guess I will stop investigating at this point, unless the problem reappears.

Thanks so much for your help!

Best wishes,

-Stefan



On Jan 5, 2011, at 7:53, Philippe Gerum wrote:

 On Wed, 2011-01-05 at 07:41 -0800, Stefan Schaal wrote:
 Hi Philippe,
 
  sorry, I must have mis-communicated. This was, of course, a xenomai commit 
 that I tried, and the errors I sent you resulted when recompiling the linux 
 kernel with this xenomai version.
 
 
 Those errors are not related to Xenomai, they happen on basic linux
 code. Make sure to work from a fresh build tree, using a proper
 toolchain. It looks like something is severely broken in your build env.
 
 -Stefan
 
 
 On Jan 5, 2011, at 6:07, Philippe Gerum wrote:
 
 On Sat, 2010-12-25 at 11:02 -0800, Stefan Schaal wrote:
 6a020f5
 
 I don't see how these messages could be related to Xenomai. I was
 mentioning a Xenomai commit, not a linux one. You should reset to this
 commit:
 
 commit 6a020f5a89955a42f1e03621ae6c63a587e9c75c
 Author: Philippe Gerum r...@xenomai.org
 Date:   Sat Aug 28 13:04:45 2010 +0200
 
   nucleus, posix: use fast APC scheduling call
 
 -- 
 Philippe.
 
 
 
 
 -- 
 Philippe.
 
 




Re: [Xenomai-core] hanging in Xenomai 2.5.5

2010-12-25 Thread Stefan Schaal
Hi Philippe,

  thanks so much for your reply -- it took me a moment to get back to this 
problem. Here are some first observations:

1) the problem only occurs when I distribute the communicating processes over 
multiple cores -- in Xenomai 2.5.4, this has never been a problem.

2) The /proc/xenomai/stat looks like:

CPU  PID    MSW        CSW        PF    STAT       %CPU  NAME
  0  0      0          2349229    0     00500080  100.0  ROOT/0
  1  0      0          20328410   0     00500080  100.0  ROOT/1
  2  0      0          1040321    0     00500080  100.0  ROOT/2
  3  0      0          445786     0     00500080  100.0  ROOT/3
  4  0      0          71162      0     00500080  100.0  ROOT/4
  5  0      0          0          0     00500080  100.0  ROOT/5
  6  0      0          0          0     00500080  100.0  ROOT/6
  7  0      0          0          0     00500080  100.0  ROOT/7
  1  3128   0          91261      0     00300182    0.0  sem1_task
  2  3166   0          90470      0     00300188    0.0  sem2_task
  3  3195   0          45237      0     00300182    0.0  sem3_task
  0  0      0          0          0                 0.0  IRQ56: Analogy device
  1  0      0          0          0                 0.0  IRQ56: Analogy device
  2  0      0          0          0                 0.0  IRQ56: Analogy device
  3  0      0          0          0                 0.0  IRQ56: Analogy device
  4  0      0          0          0                 0.0  IRQ56: Analogy device
  5  0      0          0          0                 0.0  IRQ56: Analogy device
  6  0      0          0          0                 0.0  IRQ56: Analogy device
  7  0      0          0          0                 0.0  IRQ56: Analogy device
  1  0      0          39326230   0                 0.0  IRQ521: [timer]
  2  0      0          1641532    0                 0.0  IRQ521: [timer]
  3  0      0          1258571    0                 0.0  IRQ521: [timer]
  4  0      0          722843     0                 0.0  IRQ521: [timer]
  5  0      0          780591     0                 0.0  IRQ521: [timer]
  6  0      0          764817     0                 0.0  IRQ521: [timer]
  7  0      0          385421     0                 0.0  IRQ521: [timer]

The three communicating processes are sem1_task, sem2_task, sem3_task -- they 
are currently hanging with 0% CPU

3) The /proc/xenomai/sched output looks like:

CPU  PID    CLASS  PRI  TIMEOUT   TIMEBASE   STAT   NAME
  0  0      idle   -1   -         master     R      ROOT/0
  1  0      idle   -1   -         master     R      ROOT/1
  2  0      idle   -1   -         master     R      ROOT/2
  3  0      idle   -1   -         master     R      ROOT/3
  4  0      idle   -1   -         master     R      ROOT/4
  5  0      idle   -1   -         master     R      ROOT/5
  6  0      idle   -1   -         master     R      ROOT/6
  7  0      idle   -1   -         master     R      ROOT/7
  1  3128   rt     50   -         master     W      sem1_task
  2  3166   rt     50   -         master     R      sem2_task
  3  3195   rt     50   -         master     W      sem3_task

Interestingly, although sem2_task is shown as running (STAT R), it is not actually running.
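
The same cross-check (a thread marked R in sched that shows 0.0 %CPU in stat) can be done mechanically. The sketch below is an illustration only: the helper name stalled_threads is made up, the column positions are taken from the two listings in this message, and matching STAT exactly against "R" is a simplification that may not cover every flag combination Xenomai can print.

```shell
#!/bin/sh
# Sketch only: list threads that /proc/xenomai/sched marks as R while
# /proc/xenomai/stat reports 0 %CPU for the same PID.
# Usage: stalled_threads SCHED_FILE STAT_FILE
stalled_threads() {
    awk '
        # First file (stat): remember %CPU (field 7) per PID, real threads only.
        NR == FNR { if ($2 + 0 > 0) pct[$2] = $7; next }
        # Second file (sched): field 7 is STAT, field 8 is NAME.
        $2 + 0 > 0 && $7 == "R" && ($2 in pct) && pct[$2] + 0 == 0 { print $8 }
    ' "$2" "$1"
}

# Hypothetical usage on a live system:
#   stalled_threads /proc/xenomai/sched /proc/xenomai/stat
```

Rows with PID 0 (the ROOT and IRQ entries) are skipped, so the missing STAT column on IRQ rows does not disturb the field positions.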


4) When I try to terminate the three processes, sem2_task hangs and I 
cannot kill it. Interestingly, if I start another program that does a similar 
semaphore communication, sem2_task is finally released. Indeed, when I start 
this other program, the three processes (sem1_task, sem2_task, sem3_task) start 
running again, until they hang again.


5) I appended the little test program I used -- it is called test_xeno_sem.c

I compile with:

gcc -o xtest -I/usr/xenomai/include -D_GNU_SOURCE -D_REENTRANT -Wall -pipe \
    -D__XENO__ -lnative -L/usr/xenomai/lib -lxenomai -lpthread -lrt -lrtdk \
    -lnative test_xeno_sem.c

To create three communicating processes on different cores, I execute:

terminal1  xtest 1 1
terminal2  xtest 2 1
terminal3  xtest 3 1


To create three communicating processes on ONE core, I execute:

terminal1  xtest 1 0
terminal2  xtest 2 0
terminal3  xtest 3 0
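
If opening three terminals is inconvenient, the three instances can also be started from one shell. This is a sketch under assumptions: the helper name launch_trio is made up, and the meaning of the two arguments (process number, then 1 for separate cores or 0 for one core) is inferred from the commands above, since the attached source is not readable here.

```shell
#!/bin/sh
# Sketch only: start the three communicating test processes from one shell
# and wait for all of them to exit.
launch_trio() {
    prog=$1     # path to the test binary, e.g. ./xtest
    spread=$2   # 1 = processes on different cores, 0 = all on one core
    for id in 1 2 3; do
        "$prog" "$id" "$spread" &
    done
    wait
}

# Hypothetical usage:
#   launch_trio ./xtest 1
```

Note that wait only returns once all three processes exit, so in the hanging case described above it will block until the processes are stopped some other way.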


6) I haven't tested the other commits yet --  this comes next. But maybe the 
information above already tells you all you need to know.

Best wishes, and, as always, a thousand thanks for your kind help!

-Stefan

Attachment: test_xeno_sem.c (Description: Binary data)





On Oct 16, 2010, at 1:48, Philippe Gerum wrote:

 On Fri, 2010-10-15 at 22:43 -0700, Stefan Schaal wrote:
 Hi everybody,
 
  here is a quick first report on an issue that appeared with Xenomai 2.5.5 
 --- NOTE: 2.5.4 (and earlier) DOES NOT have this issue.
 
 We run multiple real-time 

Re: [Xenomai-core] hanging in Xenomai 2.5.5

2010-10-16 Thread Philippe Gerum
On Fri, 2010-10-15 at 22:43 -0700, Stefan Schaal wrote:
 Hi everybody,
 
   here is a quick first report on an issue that appeared with Xenomai 2.5.5 
 --- NOTE: 2.5.4 (and earlier) DOES NOT have this issue.
 
 We run multiple real-time processes, synchronized by semaphores and 
 interprocess communication using shared memory. All is cleanly implemented 
 using the xenomai real-time functions, no mode switches. The different 
 processes are distributed on different processors of our multi-core machine 
 using rt_task_spawn() with the T_CPU directive. 
 
 Up to version 2.5.4, this worked fine.
 
 With version 2.5.5 (and 2.5.5.1), the processes hang after a few seconds of 
 running (CPU consumption goes to zero), and usually one of them hangs so 
 badly that it cannot be killed anymore with kill -9 -- thus reboot is 
 required.
 
 The problem happens on BOTH our i386 machine (Dell 8-core, Ubuntu 9.04, 
 kernel 2.6.29.5) AND x86_64 machine (Dell 8-core, Ubuntu 9.10, kernel 
 2.6.31.4). Thus, this seems to be specific to Xenomai release 2.5.5 and 
 higher.
 
 No dmesg print-outs when this error occurs.
 
 We will try to create a simple test program to illustrate the problem, but 
 maybe the issue is already obvious to some of the experts on this list.
 

$ cat /proc/xenomai/stat
$ cat /proc/xenomai/sched

when the threads hang would help.
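
Since a hung system may need a reboot, it can help to save both files to disk as soon as the hang is noticed. A minimal sketch, assuming nothing beyond the two /proc paths named above; the helper name xeno_snapshot and the output file naming are made up for illustration.

```shell
#!/bin/sh
# Sketch only: copy /proc/xenomai/stat and /proc/xenomai/sched into a
# directory, tagging both copies with the same timestamp.
# Usage: xeno_snapshot OUTDIR [PROCDIR]
xeno_snapshot() {
    outdir=$1
    procdir=${2:-/proc/xenomai}
    stamp=$(date +%Y%m%d-%H%M%S)

    mkdir -p "$outdir" || return 1
    for f in stat sched; do
        cat "$procdir/$f" > "$outdir/$f.$stamp" || return 1
    done
    echo "$stamp"   # print the tag so the caller can find the files
}

# Hypothetical usage:
#   xeno_snapshot ~/xeno-hang-reports
```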

Additionally, please clone the -stable repo from there:
git://git.xenomai.org/xenomai-2.5.git

then branch+build and test from these commits:

- 6a020f5 first; if the bug does not show up anymore, check the next one
- 5e7cfa5; if the bug is still there, try disabling
CONFIG_XENO_OPT_PRIOCPL to test the basic system and re-check.
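
The clone and branch steps above can be sketched like this. The repository URL and the two commit ids come from this message; the helper name branch_at and the "test-" branch prefix are made up for illustration, and the rebuild and reboot between steps are left out.

```shell
#!/bin/sh
# Sketch only: create and switch to a throwaway branch at a given commit,
# so each candidate commit can be built and tested separately.
branch_at() {
    repo=$1     # path to the cloned repository
    commit=$2   # commit id to test, e.g. 6a020f5
    git -C "$repo" checkout -q -b "test-$commit" "$commit"
}

# Hypothetical usage:
#   git clone git://git.xenomai.org/xenomai-2.5.git
#   branch_at xenomai-2.5 6a020f5   # rebuild, reboot, run the failing test
#   branch_at xenomai-2.5 5e7cfa5   # only if 6a020f5 no longer shows the bug
```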

 Best wishes,
 
 -Stefan

-- 
Philippe.





[Xenomai-core] hanging in Xenomai 2.5.5

2010-10-15 Thread Stefan Schaal
Hi everybody,

  here is a quick first report on an issue that appeared with Xenomai 2.5.5 --- 
NOTE: 2.5.4 (and earlier) DOES NOT have this issue.

We run multiple real-time processes, synchronized by semaphores and 
interprocess communication using shared memory. All is cleanly implemented 
using the xenomai real-time functions, no mode switches. The different 
processes are distributed on different processors of our multi-core machine 
using rt_task_spawn() with the T_CPU directive. 

Up to version 2.5.4, this worked fine.

With version 2.5.5 (and 2.5.5.1), the processes hang after a few seconds of 
running (CPU consumption goes to zero), and usually one of them hangs so badly 
that it cannot be killed anymore with kill -9 -- thus reboot is required.

The problem happens on BOTH our i386 machine (Dell 8-core, Ubuntu 9.04, kernel 
2.6.29.5) AND x86_64 machine (Dell 8-core, Ubuntu 9.10, kernel 2.6.31.4). Thus, 
this seems to be specific to Xenomai release 2.5.5 and higher.

No dmesg print-outs when this error occurs.

We will try to create a simple test program to illustrate the problem, but 
maybe the issue is already obvious to some of the experts on this list.

Best wishes,

-Stefan