Hi Philippe,

  Thanks so much for your reply; it took me a moment to get back to this
problem. Here are some first observations:

1) The problem only occurs when I distribute the communicating processes over
multiple cores; with Xenomai 2.5.4, this was never a problem.

2) The output of /proc/xenomai/stat looks like this:

CPU  PID    MSW        CSW        PF    STAT       %CPU  NAME
  0  0      0          2349229    0     00500080  100.0  ROOT/0
  1  0      0          20328410   0     00500080  100.0  ROOT/1
  2  0      0          1040321    0     00500080  100.0  ROOT/2
  3  0      0          445786     0     00500080  100.0  ROOT/3
  4  0      0          71162      0     00500080  100.0  ROOT/4
  5  0      0          0          0     00500080  100.0  ROOT/5
  6  0      0          0          0     00500080  100.0  ROOT/6
  7  0      0          0          0     00500080  100.0  ROOT/7
  1  3128   0          91261      0     00300182    0.0  sem1_task
  2  3166   0          90470      0     00300188    0.0  sem2_task
  3  3195   0          45237      0     00300182    0.0  sem3_task
  0  0      0          0          0     00000000    0.0  IRQ56: Analogy device
  1  0      0          0          0     00000000    0.0  IRQ56: Analogy device
  2  0      0          0          0     00000000    0.0  IRQ56: Analogy device
  3  0      0          0          0     00000000    0.0  IRQ56: Analogy device
  4  0      0          0          0     00000000    0.0  IRQ56: Analogy device
  5  0      0          0          0     00000000    0.0  IRQ56: Analogy device
  6  0      0          0          0     00000000    0.0  IRQ56: Analogy device
  7  0      0          0          0     00000000    0.0  IRQ56: Analogy device
  1  0      0          39326230   0     00000000    0.0  IRQ521: [timer]
  2  0      0          1641532    0     00000000    0.0  IRQ521: [timer]
  3  0      0          1258571    0     00000000    0.0  IRQ521: [timer]
  4  0      0          722843     0     00000000    0.0  IRQ521: [timer]
  5  0      0          780591     0     00000000    0.0  IRQ521: [timer]
  6  0      0          764817     0     00000000    0.0  IRQ521: [timer]
  7  0      0          385421     0     00000000    0.0  IRQ521: [timer]

The three communicating processes are sem1_task, sem2_task and sem3_task;
they are currently hanging at 0% CPU.

3) The output of /proc/xenomai/sched looks like this:

CPU  PID    CLASS  PRI      TIMEOUT   TIMEBASE   STAT       NAME
  0  0      idle    -1      -         master     R          ROOT/0
  1  0      idle    -1      -         master     R          ROOT/1
  2  0      idle    -1      -         master     R          ROOT/2
  3  0      idle    -1      -         master     R          ROOT/3
  4  0      idle    -1      -         master     R          ROOT/4
  5  0      idle    -1      -         master     R          ROOT/5
  6  0      idle    -1      -         master     R          ROOT/6
  7  0      idle    -1      -         master     R          ROOT/7
  1  3128   rt      50      -         master     W          sem1_task
  2  3166   rt      50      -         master     R          sem2_task
  3  3195   rt      50      -         master     W          sem3_task

Interestingly, although sem2_task is listed as runnable (STAT R), it never
actually runs.


4) When I try to terminate the three processes, sem2_task hangs and I
cannot kill it. Interestingly, if I start another program that does similar
semaphore communication, sem2_task is finally released. In fact, when I start
this other program, the three processes (sem1_task, sem2_task, sem3_task) start
running again, until they hang again.


5) I have attached the small test program I used; it is called test_xeno_sem.c.

I compile with:

gcc -o xtest test_xeno_sem.c -I/usr/xenomai/include -D_GNU_SOURCE -D_REENTRANT \
    -D__XENO__ -Wall -pipe -L/usr/xenomai/lib -lnative -lrtdk -lxenomai \
    -lpthread -lrt

To create three communicating processes on different cores, I execute:

terminal1>  ./xtest 1 1
terminal2>  ./xtest 2 1
terminal3>  ./xtest 3 1


To create three communicating processes on ONE core, I execute:

terminal1>  ./xtest 1 0
terminal2>  ./xtest 2 0
terminal3>  ./xtest 3 0
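Because the archive does not reproduce the source of the attached program, here is a minimal sketch of the pattern described: three tasks passing a token around a ring of semaphores, either spread over several CPUs or all pinned to one. It is a plain POSIX-threads analogue written only for illustration; it is not the contents of test_xeno_sem.c and does not use the Xenomai native API (rt_task_spawn() with T_CPU). The names NTASKS, ROUNDS, ring_task() and run_ring() are hypothetical, and treating the `multicore` flag as the counterpart of the second xtest argument is an assumption.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for pthread_setaffinity_np() and CPU_SET() */
#endif
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <stdlib.h>
#include <unistd.h>

#define NTASKS 3    /* hypothetical stand-ins for sem1_task..sem3_task */
#define ROUNDS 1000 /* hand-offs performed by each task */

static sem_t sems[NTASKS];  /* sems[i] is posted when it is task i's turn */
static long counts[NTASKS]; /* hand-offs completed by each task */

struct task_arg {
    int id;  /* position in the ring */
    int cpu; /* CPU this thread should be pinned to */
};

static void *ring_task(void *p)
{
    const struct task_arg *a = p;
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(a->cpu, &set);
    /* Best effort: the ring still works if the kernel refuses the pinning. */
    (void)pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    for (int r = 0; r < ROUNDS; r++) {
        sem_wait(&sems[a->id]);                /* block until it is our turn */
        counts[a->id]++;
        sem_post(&sems[(a->id + 1) % NTASKS]); /* hand the token to the next task */
    }
    return NULL;
}

/* Run the ring once. multicore != 0 pins task i to CPU (i + 1) % ncpu
 * (CPUs 1, 2, 3 on a machine with four or more CPUs); otherwise all tasks
 * share CPU 0. Returns the total number of completed hand-offs. */
static long run_ring(int multicore)
{
    pthread_t th[NTASKS];
    struct task_arg args[NTASKS];
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
    long total = 0;

    if (ncpu < 1)
        ncpu = 1;
    for (int i = 0; i < NTASKS; i++) {
        /* Task 0 starts out holding the token. */
        if (sem_init(&sems[i], 0, i == 0 ? 1 : 0) != 0)
            abort(); /* setup failure is fatal in this sketch */
        counts[i] = 0;
    }
    for (int i = 0; i < NTASKS; i++) {
        args[i].id = i;
        args[i].cpu = multicore ? (int)((i + 1) % ncpu) : 0;
        if (pthread_create(&th[i], NULL, ring_task, &args[i]) != 0)
            abort();
    }
    for (int i = 0; i < NTASKS; i++)
        pthread_join(th[i], NULL);
    for (int i = 0; i < NTASKS; i++) {
        total += counts[i];
        sem_destroy(&sems[i]);
    }
    assert(total == (long)NTASKS * ROUNDS); /* every task finished every round */
    return total;
}
```

Calling run_ring(1) and run_ring(0) corresponds roughly to the multi-core and single-core runs above. Since every task blocks until its predecessor posts, a single lost wake-up anywhere in the ring would leave all three tasks blocked, which is how a hang at 0% CPU would look from the outside.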


6) I haven't tested the suggested commits yet; that comes next. But maybe the
information above already tells you what you need to know.

Best wishes, and, as always, a thousand thanks for your kind help!

-Stefan

Attachment: test_xeno_sem.c



--------------------------------------------------------------------------------------------------------------------------------------------

On Oct 16, 2010, at 1:48, Philippe Gerum wrote:

> On Fri, 2010-10-15 at 22:43 -0700, Stefan Schaal wrote:
>> Hi everybody,
>> 
>> Here is a quick first report on an issue that appeared with Xenomai 2.5.5.
>> NOTE: 2.5.4 (and earlier) DOES NOT have this issue.
>> 
>> We run multiple real-time processes, synchronized by semaphores and 
>> interprocess communication using shared memory. Everything is cleanly
>> implemented using the Xenomai real-time functions, with no mode switches. The different
>> processes are distributed on different processors of our multi-core machine 
>> using rt_task_spawn() with the T_CPU directive. 
>> 
>> Up to version 2.5.4, this worked fine.
>> 
>> With version 2.5.5 (and 2.5.5.1), the processes hang after a few seconds of
>> running (CPU consumption goes to zero), and usually one of them hangs so
>> badly that it can no longer be killed with kill -9, so a reboot is
>> required.
>> 
>> The problem happens on BOTH our i386 machine (Dell 8-core, Ubuntu 9.04,
>> kernel 2.6.29.5) AND our x86_64 machine (Dell 8-core, Ubuntu 9.10, kernel
>> 2.6.31.4). Thus, this seems to be specific to Xenomai release 2.5.5 and
>> higher.
>> 
>> There is no "dmesg" output when this error occurs.
>> 
>> We will try to create a simple test program to illustrate the problem, but 
>> maybe the issue is already obvious to some of the experts on this list.
>> 
> 
> The output of
>
> $ cat /proc/xenomai/stat
> $ cat /proc/xenomai/sched
>
> captured when the threads hang would help.
> 
> Additionally, please clone the -stable repo from there:
> git://git.xenomai.org/xenomai-2.5.git
> 
> then branch+build and test from these commits:
> 
> - 6a020f5 first; if the bug does not show up anymore, check the next one
> - 5e7cfa5; if the bug is still there, try disabling
> CONFIG_XENO_OPT_PRIOCPL to test the basic system and re-check.
> 
>> Best wishes,
>> 
>> -Stefan
>> _______________________________________________
>> Xenomai-core mailing list
>> Xenomai-core@gna.org
>> https://mail.gna.org/listinfo/xenomai-core
> 
> -- 
> Philippe.
> 
> 
