[Xenomai-core] [REMINDER] Migrating Xenomai mailing lists
We will soon be moving all our mailing lists out of gna.org to host them on xenomai.org instead. On this occasion, xenomai-h...@gna.org, xenomai-core@gna.org and adeos-m...@gna.org will be merged into a single list named xeno...@xenomai.org. These are low-traffic lists, so we want to group all Xenomai-related discussions in one place. Commits to the development trees will be sent to xenomai-...@xenomai.org.

The migration is scheduled for May 19; all current subscribers of the former lists will be automatically subscribed to xeno...@xenomai.org. You will receive an automated mail from our Mailman when this happens. The Mailman interface to the new lists is available at: http://www.xenomai.org/mailman/listinfo/xenomai. Please drop a mail to mail...@xenomai.org in case of issues.

Thanks,
-- Philippe.

___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] rt_task_create and rt_task delete re-scheduling calling task
On 05/14/2012 09:55 AM, Roberto Bielli wrote:
> Hi, I saw in the documentation that rt_task_create and rt_task_delete should reschedule the calling task. So I lose the CPU if a task calls rt_task_create or rt_task_delete. Do I understand correctly? Is there a way to avoid this behaviour? Or what are all the cases of rescheduling when calling rt_task_create/rt_task_delete?

There is no way to avoid rescheduling (assuming you are currently using the user-space API). Creating and deleting tasks involves switching to secondary mode to get/release linux resources which are impossible to access from a primary context.

> Thanks for all. P.S. the i.MX25 is now perfect. It was only the reentrant interrupt.

-- Philippe.
Re: [Xenomai-core] Scheduler extensions
On 05/08/2012 09:23 AM, Jonas Flodin wrote:
> Hi! I'm a PhD student who is currently doing research on multicore real-time scheduling. I'm considering using Xenomai as a base for my research experiments, but this would require me to replace or extend the current scheduler. So far I have found no documents detailing how the scheduler is implemented or how to extend it (if possible). Could you point me to information regarding the scheduler?

There is no documentation on the scheduling core. You should probably start with ksrc/nucleus/sched*.c and include/nucleus/sched*.h, having a look at the files implementing the plain FIFO policy in sched-rt*.

Hint: the scheduling core is meant to be extensible; adding a new policy entails providing an implementation of a new struct xnsched_class object. Make sure to read the comments in the files implementing the existing policies (-rt, -sporadic, -tp); they usually mention details on the calling context and requirements for the handlers defined by the xnsched_class type.

> Thank you in advance. BR Jonas Flodin

-- Philippe.
Re: [Xenomai-core] xenomai-forge: round-robin scheduling in pSOS skin
On 03/08/2012 03:30 PM, Ronny Meeus wrote:
> Hello, I am using the xenomai-forge pSOS skin (Mercury). My application is running on a P4040 (Freescale PPC with 4 cores). Some code snippets are put in this mail, but the complete test code is also attached. I have a test task that just consumes the CPU:
>
> int run_test = 1;
>
> static void perform_work(u_long counter, u_long b, u_long c, u_long d)
> {
>     int i;
>     while (run_test) {
>         for (i = 0; i < 10; i++);
>         (*(unsigned long *)counter)++;
>     }
>     while (1)
>         tm_wkafter(1000);
> }
>
> If I create 2 instances of this task with the T_TSLICE option set:
>
>     t_create("WORK", 10, 0, 0, 0, &tid);
>     t_start(tid, T_TSLICE, perform_work, args);
>
> I see that only 1 task is consuming CPU:
>
> # taskset 1 ./roundrobin.exe
> .543| [main] SCHED_RT priorities = [1 .. 99]
> .656| [main] SCHED_RT.99 reserved for IRQ emulation
> .692| [main] SCHED_RT.98 reserved for scheduler-lock emulation
> 0 - 6602
> 1 - 0
>
> If I adapt the code so that I call the threadobj_start_rr function in my init, I see that the load is equally distributed over the 2 threads:
>
> # taskset 1 ./roundrobin.exe
> .557| [main] SCHED_RT priorities = [1 .. 99]
> .672| [main] SCHED_RT.99 reserved for IRQ emulation
> .708| [main] SCHED_RT.98 reserved for scheduler-lock emulation
> 0 - 3290
> 1 - 3291
>
> Here are the questions:
> - Why is the threadobj_start_rr function not called from the context of the init of the pSOS layer?

Because threadobj_start_rr() was originally designed to activate round-robin for all threads (some RTOSes like VxWorks expose that kind of API), not on a per-thread basis. This is not what pSOS wants. The round-robin API is in a state of flux for mercury; only the cobalt one is stable. This is why RR is not yet activated although T_TSLICE is recognized.

> - Why is the round-robin implemented in this way? If the tasks were mapped on SCHED_RR instead of SCHED_FIFO, the Linux scheduler would take care of this.

Nope. We need per-thread RR intervals to manage multiple priority groups concurrently, and we also want to define that interval as we see fit for proper RTOS emulation. POSIX does not define anything like sched_set_rr_interval(), and the linux kernel applies a default fixed interval to all threads of the SCHED_RR class (100ms IIRC). So we have to emulate SCHED_RR over SCHED_FIFO plus a per-thread virtual timer.

> - On the other hand, once the threadobj_start_rr function is called from my init and I create the tasks in T_NOTSLICE mode, the time-slicing is still done.

Because you called threadobj_start_rr().

> Thanks. --- Ronny

-- Philippe.
Re: [Xenomai-core] [PATCH forge] Fix build for relative invocations of configure
On 02/07/2012 04:43 PM, Jan Kiszka wrote:
> This fixes build setups like '../configure'.

Merged, thanks.

> Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
> ---
>  configure.in | 2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/configure.in b/configure.in
> index c0a7d17..0bdced8 100644
> --- a/configure.in
> +++ b/configure.in
> @@ -547,7 +547,7 @@ LD_FILE_OPTION=$ac_cv_ld_file_option
>  AC_SUBST(LD_FILE_OPTION)
>  
>  if test x$rtcore_type = xcobalt; then
> -   XENO_USER_CFLAGS=-I$srcdir/include/cobalt $XENO_USER_CFLAGS
> +   XENO_USER_CFLAGS=-I`cd $srcdir && pwd`/include/cobalt $XENO_USER_CFLAGS
>     if [[ $ac_cv_ld_file_option = yes ]]; then
>         XENO_POSIX_WRAPPERS=-Wl,@`cd $srcdir && pwd`/lib/cobalt/posix.wrappers
>     else

-- Philippe.
Re: [Xenomai-core] [PATCH] Add sigdebug unit test
On 01/26/2012 11:36 AM, Jan Kiszka wrote: On 2012-01-25 19:05, Jan Kiszka wrote: On 2012-01-25 18:44, Gilles Chanteperdrix wrote: On 01/25/2012 06:10 PM, Jan Kiszka wrote: On 2012-01-25 18:02, Gilles Chanteperdrix wrote: On 01/25/2012 05:52 PM, Jan Kiszka wrote: On 2012-01-25 17:47, Jan Kiszka wrote: On 2012-01-25 17:35, Gilles Chanteperdrix wrote: On 01/25/2012 05:21 PM, Jan Kiszka wrote:

We had two regressions in this code recently. So test all 6 possible SIGDEBUG reasons, or 5 if the watchdog is not available.

OK for this test, with a few remarks:

- This is a regression test, so it should go to src/testsuite/regression(/native), and should be added to the xeno-regression-test.

What are unit tests for (as they are defined here)? Looks a bit inconsistent.

I put under regression all the tests I have which corresponded to things that failed at one time or another in Xenomai's past. Maybe we could move the unit tests under regression.

- We already have a regression test for the watchdog called mayday.c, which tests the second watchdog action; please merge mayday.c with sigdebug.c (mayday.c also allows checking the disassembly of the code in the mayday page, a nice feature).

It seems to have failed in that important last discipline. Need to check why.

Because it didn't check the page content for correctness. But that's now done via the new watchdog test. I can keep the debug output, but the watchdog test of mayday looks obsolete to me. Am I missing something?

The watchdog does two things: it first sends a SIGDEBUG, then, if the application is still spinning, it sends a SIGSEGV. As far as I understood, your test tests the first case and mayday tests the second case, so I agree that mayday should be removed, but whatever it tests should be integrated in the sigdebug test.

Err... SIGSEGV is not a feature, it was the bug I fixed today. :) So the test case actually specified a bug as correct behavior. The fallback case is in fact killing the RT task as before.
But I'm unsure right now: will this always leave the system in a clean state behind?

The test case being a test case and doing nothing particular, I do not see what could go wrong. And if something goes wrong, then it needs fixing.

Well, if you kill an RT task while it's running in the kernel, you risk inconsistent system states (held mutexes etc.).

In this case the task is supposed to spin in user space. If that is always safe, let's implement the test.

Had a closer look: these days the two-stage killing is only useful to catch endless loops in the kernel. User-space tasks can't get around being migrated on watchdog events, even when SIGDEBUG is ignored. To trigger the enforced task termination without leaving any broken states behind, there is one option: rt_task_spin. Surprisingly for me, it actually spins in the kernel, thus triggers the second level if waiting long enough. I wonder, though, if that behavior shouldn't be improved, i.e. the spinning loop be closed in user space - which would take away that option again. Thoughts?

Tick-based timing is going to be the problem for determining the spinning delay, unless we expose it in the vdso on a per-skin basis, which won't be pretty.

Jan

-- Philippe.
Re: [Xenomai-core] realtime pipes
On 01/16/2012 03:25 PM, Makarand Pradhan wrote:
> Hi,
>> Real-time pipes are deprecated.
> We use a lot of rt pipes, so can you please elaborate on this? I would highly appreciate it if you could comment on the following.
>
> 1. When will the rt pipe interface be removed? Any time frame?

Xenomai 3. Xenomai 2.x will keep them forever.

> 2. I would like to understand the reason for deprecating the interface.

- Because there is a better socket-based API implemented by the RTIPC driver w/ the XDDP protocol, which does not require running application-level code in kernel space (RT_PIPE is definitely an application-level API). This new interface has been available since Xenomai 2.5.x. It is functionally 100% equivalent to the legacy RT_PIPE API.

- Because no support will be provided in Xenomai 3 for running application-level code in kernel space, so RT_PIPE has to go from kernel space. However, RT_PIPE is still part of the user-space API of Xenomai 3, interfacing with XDDP endpoints in kernel space. I'm really referring to application-level code, by contrast to RTDM driver-level code, which will obviously remain a first-class citizen in kernel space.

See:
o http://www.xenomai.org/index.php/Xenomai:Roadmap
o http://www.xenomai.org/documentation/xenomai-2.6/html/api/group__rtipc.html
o examples/rtdm/profiles/ipc in the Xenomai distro

> Thanks and Rgds, Mak.
> On 15/01/12 12:37 PM, Gilles Chanteperdrix wrote:
>> Real-time pipes are deprecated.

-- Philippe.
Re: [Xenomai-core] realtime pipes
On 01/16/2012 04:09 PM, Makarand Pradhan wrote:
> Thanks Philippe. To ensure that I understand correctly, let me rephrase my understanding: in 3.0, rt_pipe_create and friends will cease to exist. We have to start using sockets with domain AF_RTIPC and protocol IPCPROTO_XDDP instead. Is that a correct statement?

Basically, yes. In addition, X3 will keep the RT_PIPE interface for the -rt endpoint available on the application side, by wrapping an XDDP socket to a RT_PIPE descriptor under the hood. In kernel space, however, the RT_PIPE API to create -rt endpoints won't be available anymore; one will have to create them via the rtdm_socket/rt_dev_socket calls. In any case, the API for the non-rt side does not change, i.e. POSIX file I/O calls will still be the way to interface with the -rt endpoint.

> Rgds, Mak.

-- Philippe.
Re: [Xenomai-core] Synchronization of shared memory with mutexes
On 01/10/2012 04:04 PM, Jan-Erik Lange wrote:
> Hello, I have a question about the basics of synchronizing shared memory with mutexes. The situation: the sender is an RT task (primary domain) and the recipient is a non-RT task (usually in the secondary domain). Namely, the receiver is used to interact with a web server; it issues syscalls and the like, and because of that it usually runs in secondary mode. Suppose the sender has written something to the shared memory. It uses a mutex for synchronization, so it calls the rt_mutex_release() function. The receiver now gets time to work from the scheduler. It calls the rt_mutex_acquire() function to lock the shared memory; a context switch then occurs from secondary mode to primary mode, and it has the resource for itself. Now the scheduler lets the sender task work, and it wants to write something, so it calls rt_mutex_acquire(). And here comes my question: does rt_mutex_acquire() provide a mechanism to signal the scheduler to immediately continue with the recipient task? If so, how does rt_mutex_acquire() tell the scheduler that?

There are two tasks controlled by the same (Xenomai) scheduler. One is trying to grab a mutex the other one holds, so it is put to sleep on that mutex. The scheduler will simply switch to the next ready-to-run task since the sender task cannot run anymore, and that next task may be the receiver task. There is no special signaling magic required.

> I ask because in the documentation I read the term "Rescheduling: always". The documentation for rt_mutex_acquire says "Rescheduling: always unless the request is immediately satisfied or timeout specifies a non-blocking operation."
>
> Best regards, Jan

-- Philippe.
Re: [Xenomai-core] Ipipe breaks my MPC8541 board boot
On 01/03/2012 06:58 PM, Gilles Chanteperdrix wrote: On 01/03/2012 06:49 PM, Jean-Michel Hautbois wrote:
> cpm2_cascade is dedicated to my board, but has nothing impressive:
>
> static void cpm2_cascade(unsigned int irq, struct irq_desc *desc)
> {
>     int cascade_irq;
>
>     while ((cascade_irq = cpm2_get_irq()) >= 0)
>         generic_handle_irq(cascade_irq);
> }

Replace generic_handle_irq with ipipe_handle_chained_irq. You have to fix up the eoi handling as well; check how this is done in arch/powerpc/platforms/85xx/sbc8560.c.

-- Philippe.
Re: [Xenomai-core] general questions
On 12/29/2011 01:04 PM, Jan-Erik Lange wrote:
> Hello, I'm new to the topic of the Xenomai co-kernel approach and I have some questions about primary mode and secondary mode. I have trouble imagining, in general, how one task (process or thread) can be processed by two kernels (the Xenomai nucleus and the standard kernel), and be treated by one in real time and by the other in non-real-time.
>
> 1. As far as I understood this approach, the primary and secondary modes are an abstract description of the fact that threads or processes can be scheduled by the Xenomai nucleus or by the standard Linux kernel scheduler. Is this correct?

Yes. A Xenomai thread in user space has a shadow control area attached in addition to the regular linux context data, which enables both linux and the nucleus to schedule it in a mutually exclusive manner.

> 2. Now suppose that I have chosen the VxWorks skin and I started a task in primary mode. Is it correct that when this task calls a non-VxWorks-API function, there will be a change of context from primary to secondary mode? Or what is the exact condition for the switch of context?

- invoking a regular linux syscall
- receiving a linux signal (e.g. kill(2) and GDB)
- causing a CPU trap (e.g. invalid memory access), hitting a breakpoint (e.g. GDB)

All these situations cause the switch from primary to secondary mode. We say that such a thread "relaxes" in Xenomai parlance. A common caveat is to call a glibc routine which eventually issues a linux syscall under the hood. Think of malloc() detecting a process memory shortage, which then calls mmap or sbrk to extend the process data. Or running into a mutex contention once in a while, forcing the calling thread to issue a syscall for sleeping on the mutex. Fortunately, we have a tool to detect these situations.

> It would be very nice if you could tell me a little bit about these questions.
>
> Best regards, Jan

-- Philippe.
Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
On 12/23/2011 06:33 PM, Lennart Sorensen wrote:
> After spending quite a while trying to explain how things like /bin/echo could possibly segfault, I finally discovered that the new feature in xenomai 2.6.0 (new when moving from 2.4.10, that is) of having preemptible context switches is what is corrupting the state of random linux processes once in a while. After turning the option off, I haven't seen a single crash, just like on 2.4.10. So something subtle is wrong with this option. It appears to be most likely to occur (possibly only likely) when xenomai is handling interrupts. It seems that getting an interrupt in the middle of a context switch at the wrong time corrupts the process that is being switched to or from (no idea which it is). Unless someone can think of a way to track down and fix this, I would certainly suggest making the option off by default instead of on.

Papering over a bug this way is certainly not an option.

> With CONFIG_XENO_HW_UNLOCKED_SWITCH=n I don't have any problems anymore.

Which kernel version, what ppc hardware?

-- Philippe.
Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
On 12/23/2011 07:32 PM, Lennart Sorensen wrote:
> On Fri, Dec 23, 2011 at 07:17:09PM +0100, Philippe Gerum wrote:
>> Papering over a bug this way is certainly not an option.
> Long term it certainly isn't.
>> Which kernel version, what ppc hardware?
> 3.0.13, 3.0.9, 3.0.8. mpc8360e. xenomai 2.6.0 with ipipe 3.0.8-powerpc-2.13-04.

Do you have a typical test scenario which triggers this bug?

-- Philippe.
Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
On 12/23/2011 09:25 PM, Lennart Sorensen wrote:
> On Fri, Dec 23, 2011 at 09:08:11PM +0100, Philippe Gerum wrote:
>> Do you have a typical test scenario which triggers this bug?
> It can take a couple of hours under pretty heavy load to get one occurrence. But with preemptible context switches off we haven't seen any in a week. For sure xenomai tasks are handling interrupts quite a lot at the time. I wish we had a simple test case to show it, but it seems to require triggering an interrupt in the middle of a context switch at exactly the wrong place.

Is it reproducible with the basic latency or cyclic tests if waiting for long enough? Running ltp in parallel would trigger a decent load, but sometimes two shell loops forking commands in the background are enough to trigger a variety of issues when something fragile exists in the mmu layer as modified by the I-pipe.

-- Philippe.
Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
On 12/23/2011 10:55 PM, Lennart Sorensen wrote:
> On Fri, Dec 23, 2011 at 10:48:29PM +0100, Philippe Gerum wrote:
>> Is it reproducible with the basic latency or cyclic tests if waiting for long enough?
> Well, we can try after I come back from vacation in a couple of weeks.

OK. I will try to reproduce on my side as well.

-- Philippe.
Re: [Xenomai-core] Usage of Xenomai name
On Mon, 2011-10-17 at 11:15 +0100, Jorge Amado Azevedo wrote:
> Hello, I'm currently finishing a small application that allows users to draw block diagrams of control systems and execute them in real time using Xenomai. Technically, each block is a Xenomai task, and users can easily make their own as long as they adhere to a specific interface. I'm a student at the University of Aveiro (Portugal) and this work is part of my master's thesis. My original idea was to call my application Xenomai Lab, but I'm not sure if I can use the Xenomai name like that. Am I violating any trademarks, copyrights or other legal restrictions by using the Xenomai name for my application?

There would be no objection from the Xenomai project, provided this is and remains LGPL/GPL software.

> Regards, Jorge Azevedo

-- Philippe.
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc4
On Wed, 2011-09-28 at 20:34 +0200, Gilles Chanteperdrix wrote:
> Hi, here is the 4th release candidate for Xenomai 2.6.0:
> http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc4.tar.bz2
>
> Novelties since -rc3 include:
> - a fix for the long names issue on psos+
> - a fix for the build issue of mscan on mpc52xx (please Wolfgang, have a look at the patch, to see if you like it):
>   http://git.xenomai.org/?p=xenomai-head.git;a=commitdiff;h=d22fd231db7eb0af8e77ec570efb89e578e13781;hp=4a2188f049e96fc59aa7c4a7a9d058075f3d79e8
> - a new version of the I-pipe patch for linux 3.0 on ppc. People running 2.13-02/powerpc over linux 3.0.4 should definitely upgrade to 2.13-03, or apply this:
>   http://git.denx.de/?p=ipipe-2.6.git;a=commit;h=7c28eb2dea86366bf721663bb8d28ce89cf2806c
>
> This should be the last release candidate.
> Regards.

-- Philippe.
Re: [Xenomai-core] Policy switching and XNOTHER maintenance
On Sun, 2011-09-18 at 16:34 +0200, Jan Kiszka wrote: On 2011-09-18 16:02, Philippe Gerum wrote: On Fri, 2011-09-16 at 22:39 +0200, Gilles Chanteperdrix wrote: On 09/16/2011 10:13 PM, Gilles Chanteperdrix wrote: On 09/11/2011 04:29 PM, Jan Kiszka wrote: On 2011-09-11 16:24, Gilles Chanteperdrix wrote: On 09/11/2011 12:50 PM, Jan Kiszka wrote:

Hi all, I just looked into the hrescnt issue again, specifically the corner case of a shadow thread switching from a real-time policy to SCHED_OTHER. Doing this while holding a mutex looks invalid.

Looking at POSIX e.g., is there anything in the spec that makes this invalid? If the kernel preserves or establishes proper priority boosting, I do not see what could break in principle. It is nothing I would design into some app, but we should somehow handle it (doc update or code adjustments).

If we do not do it, the current code is valid. Except for its dependency on XNOTHER, which is not updated on RT-to-NORMAL transitions.

The fact that this update did not take place made the code work. No negative rescnt could happen with that code.

Anyway, here is a patch to allow switching back from RT to NORMAL, but send a SIGDEBUG to a thread attempting to release a mutex while its counter is already 0. We end up avoiding a big chunk of code that would have been useful for a really strange corner case.
Here comes version 2:

diff --git a/include/nucleus/sched-idle.h b/include/nucleus/sched-idle.h
index 6399a17..417170f 100644
--- a/include/nucleus/sched-idle.h
+++ b/include/nucleus/sched-idle.h
@@ -39,6 +39,8 @@ extern struct xnsched_class xnsched_class_idle;
 static inline void __xnsched_idle_setparam(struct xnthread *thread,
 					   const union xnsched_policy_param *p)
 {
+	if (xnthread_test_state(thread, XNSHADOW))
+		xnthread_clear_state(thread, XNOTHER);
 	thread->cprio = p->idle.prio;
 }
diff --git a/include/nucleus/sched-rt.h b/include/nucleus/sched-rt.h
index 71f655c..cc1cefa 100644
--- a/include/nucleus/sched-rt.h
+++ b/include/nucleus/sched-rt.h
@@ -86,6 +86,12 @@ static inline void __xnsched_rt_setparam(struct xnthread *thread,
 					 const union xnsched_policy_param *p)
 {
 	thread->cprio = p->rt.prio;
+	if (xnthread_test_state(thread, XNSHADOW)) {
+		if (thread->cprio)
+			xnthread_clear_state(thread, XNOTHER);
+		else
+			xnthread_set_state(thread, XNOTHER);
+	}
 }
 
 static inline void __xnsched_rt_getparam(struct xnthread *thread,
diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 9a02e80..d1f 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -1896,16 +1896,6 @@ int __xnpod_set_thread_schedparam(struct xnthread *thread,
 		xnsched_putback(thread);
 
 #ifdef CONFIG_XENO_OPT_PERVASIVE
-	/*
-	 * A non-real-time shadow may upgrade to real-time FIFO
-	 * scheduling, but the latter may never downgrade to
-	 * SCHED_NORMAL Xenomai-wise. In the valid case, we clear
-	 * XNOTHER to reflect the change. Note that we keep handling
-	 * non real-time shadow specifics in higher code layers, not
-	 * to pollute the core scheduler with peculiarities.
-	 */
-	if (sched_class == &xnsched_class_rt && sched_param->rt.prio > 0)
-		xnthread_clear_state(thread, XNOTHER);
 	if (propagate) {
 		if (xnthread_test_state(thread, XNRELAX))
 			xnshadow_renice(thread);
diff --git a/ksrc/nucleus/sched-sporadic.c b/ksrc/nucleus/sched-sporadic.c
index fd37c21..ffc9bab 100644
--- a/ksrc/nucleus/sched-sporadic.c
+++ b/ksrc/nucleus/sched-sporadic.c
@@ -258,6 +258,8 @@ static void xnsched_sporadic_setparam(struct xnthread *thread,
 		}
 	}
 
+	if (xnthread_test_state(thread, XNSHADOW))
+		xnthread_clear_state(thread, XNOTHER);
 	thread->cprio = p->pss.current_prio;
 }
diff --git a/ksrc/nucleus/sched-tp.c b/ksrc/nucleus/sched-tp.c
index 43a548e..a2af1d3 100644
--- a/ksrc/nucleus/sched-tp.c
+++ b/ksrc/nucleus/sched-tp.c
@@ -100,6 +100,8 @@ static void xnsched_tp_setparam(struct xnthread *thread,
 {
 	struct xnsched *sched = thread->sched;
 
+	if (xnthread_test_state(thread, XNSHADOW))
+		xnthread_clear_state(thread, XNOTHER);
 	thread->tps = &sched->tp.partitions[p->tp.ptid];
 	thread->cprio = p->tp.prio;
 }
diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
index b956e46..47bc0c5 100644
--- a/ksrc/nucleus/synch.c
+++ b/ksrc/nucleus/synch.c
@@ -684,9 +684,13 @@ xnsynch_release_thread(struct xnsynch *synch, struct xnthread *lastowner)
 
 	XENO_BUGON(NUCLEUS, !testbits(synch->status, XNSYNCH_OWNER));
 
-	if (xnthread_test_state(lastowner, XNOTHER))
-		xnthread_dec_rescnt(lastowner);
-	XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote:
> On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote:
>> Hi, the first release candidate for the 2.6.0 version may be downloaded here:
>> http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2
> Currently 2.6.0-rc1 fails to build on 2.4 kernels, with errors related to vfile support. Do we really want to still support 2.4 kernels?

That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without an upgrade path to xenomai 2.6, since this will be the last maintained version of the Xenomai 2.x architecture.

That stuff likely does not compile because the Config.in bits are not up to date; blame it on me. I'll make this build over linux 2.4 and commit the result today.

-- Philippe.
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote:
> No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file.

OK, I'll check this.

-- Philippe.
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 03:27 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file. HEAD builds fine based on the attached .config. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 03:27 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file. HEAD builds fine based on the attached .config. Mmmfff... -- Philippe. 
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.35
# Tue Sep 6 16:49:25 2011
#

#
# Linux/NiosII Configuration
#
CONFIG_NIOS2=y
CONFIG_MMU=y
# CONFIG_FPU is not set
# CONFIG_SWAP is not set
CONFIG_RWSEM_GENERIC_SPINLOCK=y

#
# NiosII board configuration
#
# CONFIG_3C120 is not set
CONFIG_NEEK=y
CONFIG_NIOS2_CUSTOM_FPGA=y
# CONFIG_NIOS2_NEEK_OCM is not set

#
# NiosII specific compiler options
#
CONFIG_NIOS2_HW_MUL_SUPPORT=y
# CONFIG_NIOS2_HW_MULX_SUPPORT is not set
# CONFIG_NIOS2_HW_DIV_SUPPORT is not set
# CONFIG_OF is not set
CONFIG_ALIGNMENT_TRAP=y
CONFIG_RAMKERNEL=y

#
# Boot options
#
CONFIG_CMDLINE=""
CONFIG_PASS_CMDLINE=y
CONFIG_BOOT_LINK_OFFSET=0x0100

#
# Platform driver options
#
# CONFIG_AVALON_DMA is not set

#
# Additional NiosII Device Drivers
#
# CONFIG_PCI_ALTPCI is not set
# CONFIG_ALTERA_REMOTE_UPDATE is not set
# CONFIG_PIO_DEVICES is not set
# CONFIG_NIOS2_GPIO is not set
# CONFIG_ALTERA_PIO_GPIO is not set
CONFIG_UID16=y
CONFIG_GENERIC_CSUM=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_NO_IOPORT=y
CONFIG_ZONE_DMA=y
CONFIG_BINFMT_ELF=y
# CONFIG_NOT_COHERENT_CACHE is not set
CONFIG_HZ=100
# CONFIG_TRACE_IRQFLAGS_SUPPORT is not set
CONFIG_IPIPE=y
CONFIG_IPIPE_DOMAINS=4
CONFIG_IPIPE_DELAYED_ATOMICSW=y
# CONFIG_IPIPE_UNMASKED_CONTEXT_SWITCH is not set
CONFIG_IPIPE_HAVE_PREEMPTIBLE_SWITCH=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
# CONFIG_PHYS_ADDR_T_64BIT is not set
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
# CONFIG_KSM is not set
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_AUDIT is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_TREE_PREEMPT_RCU is not set
# CONFIG_TINY_RCU is not set
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_FANOUT=32
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=14
# CONFIG_SYSFS_DEPRECATED_V2 is not set
# CONFIG_RELAY is not set
# CONFIG_NAMESPACES is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
# CONFIG_RD_LZO is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
# CONFIG_ELF_CORE is not set
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
# CONFIG_EPOLL is not set
# CONFIG_SIGNALFD is not set
# CONFIG_TIMERFD is not set
# CONFIG_EVENTFD is not set
# CONFIG_SHMEM is not set
CONFIG_AIO=y

#
# Kernel Performance Events And Counters
#
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_COMPAT_BRK=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
# CONFIG_PROFILING
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 03:27 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file. HEAD builds fine based on the attached .config. Btw we now only support the MMU version (2.6.35.2) of this kernel over Xenomai 2.6. Reference tree is available there: url = git://sopc.et.ntust.edu.tw/git/linux-2.6.git branch = nios2mmu nommu support is discontinued for nios2 - people who depend on it should stick with Xenomai 2.5.x. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 21:42 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 08:19 PM, Gilles Chanteperdrix wrote: On 09/06/2011 05:10 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 03:27 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file. HEAD builds fine based on the attached .config. Btw we now only support the MMU version (2.6.35.2) of this kernel over Xenomai 2.6. Reference tree is available there: url = git://sopc.et.ntust.edu.tw/git/linux-2.6.git branch = nios2mmu nommu support is discontinued for nios2 - people who depend on it should stick with Xenomai 2.5.x. Ok, still not building, maybe the commit number mentioned in the README is not up-to-date? More build failures for kernel 3.0 and ppc... 
http://sisyphus.hd.free.fr/~gilles/bx/index.html#powerpc I've fixed most of these, however the platform driver interface changed once again circa 2.6.39, and AFAICT, picking the right approach to cope with this never ending mess for the mscan driver requires some thoughts from educated people. Since I don't qualify for the job, I'm shamelessly passing the buck to Wolfgang: http://sisyphus.hd.free.fr/~gilles/bx/lite5200/3.0.4-ppc_6xx-gcc-4.2.2/log.html#1 PS: I guess this fix can wait until 2.6.0 final, this is not critical for -rc2. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-help] Xenomai 2.6.0-rc1
On Tue, 2011-09-06 at 20:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 05:10 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:53 +0200, Philippe Gerum wrote: On Tue, 2011-09-06 at 16:19 +0200, Gilles Chanteperdrix wrote: On 09/06/2011 03:27 PM, Philippe Gerum wrote: On Tue, 2011-09-06 at 13:31 +0200, Gilles Chanteperdrix wrote: On 09/04/2011 10:52 PM, Gilles Chanteperdrix wrote: Hi, The first release candidate for the 2.6.0 version may be downloaded here: http://download.gna.org/xenomai/testing/xenomai-2.6.0-rc1.tar.bz2 Hi, currently 2.6.0-rc1 fails to build on 2.4 kernel, with errors related to vfile support. Do we really want to still support 2.4 kernels? That would not be a massive loss, but removing linux 2.4 support is more than a few hunks here and there, so this may not be the right thing to do ATM. Besides, it would be better not to leave the few linux 2.4 users out there without upgrade path to xenomai 2.6, since this will be the last maintained version from the Xenomai 2.x architecture. That stuff does not compile likely because the Config.in bits are not up to date, blame it on me. I'll make this build over linux 2.4 and commit the result today. No problem, I was not looking for someone to blame... Since you are at it, I have problems compiling the nios2 kernel too, but I am not sure I got the proper configuration file. HEAD builds fine based on the attached .config. Btw we now only support the MMU version (2.6.35.2) of this kernel over Xenomai 2.6. Reference tree is available there: url = git://sopc.et.ntust.edu.tw/git/linux-2.6.git branch = nios2mmu nommu support is discontinued for nios2 - people who depend on it should stick with Xenomai 2.5.x. Ok, still not building, maybe the commit number mentioned in the README is not up-to-date? 
The commit # is correct, but I suspect that your kernel tree no longer has the files normally created by the SOPC builder; these can't (may not, actually) be included in the pipeline patch. In short, your tree might be missing the bits corresponding to the fpga design you build for, so basic symbols like HRCLOCK* and HRTIMER* are undefined. I'm building for a cyclone 3c25 from the NEEK kit, with SOPC files available from arch/nios2/boards/neek. Any valuable files in there on your side? (typically, include/asm/custom_fpga.h should contain definitions for our real-time clocks and timers) -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Xenomai 2.6.0, or -rc1?
On Fri, 2011-08-26 at 14:34 +0200, Gilles Chanteperdrix wrote: Hi, I think it is about time we release Xenomai 2.6.0. Has anyone anything pending (maybe Alex)? Should we release an -rc first? Thanks in advance for your input. Nothing pending for 2.6, I'm focusing on 3.x now. However let's go for -rc1 first, this is a major release anyway. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [RFC] heap: rename sys_sem_heap syscall to sys_heap_info
On Tue, 2011-08-02 at 21:16 +0200, Gilles Chanteperdrix wrote: On 08/01/2011 10:20 PM, Gilles Chanteperdrix wrote: And add the count of used bytes to the xnheap_desc structure. This allows for checking for leaks in unit tests. --- No comments? Fine with me. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Exception #14
On Wed, 2011-07-27 at 20:52 +0200, Gilles Chanteperdrix wrote: On 07/26/2011 09:36 AM, zenati wrote:

Dear, I'm developing the Arinc 653 skin for Xenomai. I'm trying to run a process with my skin, but I get an exception: Xenomai: suspending kernel thread d8824c40 ('�') at 0xb76dbdfc after exception #14 What is exception 14? Do you have an idea how I can solve it? Thank you for your attention and for your help. Sincerely,

The meaning of the fault number depends on the platform you are using; see /proc/xenomai/faults for human-readable messages for your platform. I guess this is PF on x86, and this thread's TCB in kernel space looks badly trashed. You should probably check the behavior of your Xenomai kernel threads wrt memory writes, and possibly for stack overflows as well. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] HOW CAN I KNOW WHICH LINUX SYSTEM CALLS SWITCH TASK IN SECONDARY MODE ?
On Thu, 2011-07-21 at 12:38 +0200, Roberto Bielli wrote:

Hi, how can I know with assurance which Linux system calls switch to secondary mode and which do not?

All do.

Thanks for all

-- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [RFC] Waitqueue-free gatekeeper wakeup
On Mon, 2011-07-18 at 13:52 +0200, Jan Kiszka wrote:

Hi Philippe, trying to decouple the PREEMPT-RT gatekeeper wakeup path from XNATOMIC (to fix the remaining races there), I wondered why we need a waitqueue here at all. What about an approach like below, i.e. waking up the gatekeeper directly via wake_up_process? That could even be called from interrupt context. We should be able to avoid missing a wakeup by setting the task state to INTERRUPTIBLE before signaling the semaphore. Am I missing something?

No, I think this should work. IIRC, the wait queue dates back to when we did not have a strong synchro between the hardening code and the gk via the request token, i.e. the initial implementation over 2.4 kernels. So it is about time to question this.

Jan

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index e251329..df8853b 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -111,7 +111,6 @@ typedef struct xnsched {
 #ifdef CONFIG_XENO_OPT_PERVASIVE
 	struct task_struct *gatekeeper;
-	wait_queue_head_t gkwaitq;
 	struct semaphore gksync;
 	struct xnthread *gktarget;
 #endif
diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
index f6b1e16..238317a 100644
--- a/ksrc/nucleus/shadow.c
+++ b/ksrc/nucleus/shadow.c
@@ -92,7 +92,6 @@ static struct __lostagerq {
 #define LO_SIGGRP_REQ 2
 #define LO_SIGTHR_REQ 3
 #define LO_UNMAP_REQ 4
-#define LO_GKWAKE_REQ 5
 	int type;
 	struct task_struct *task;
 	int arg;
@@ -759,9 +758,6 @@ static void lostage_handler(void *cookie)
 	int cpu, reqnum, type, arg, sig, sigarg;
 	struct __lostagerq *rq;
 	struct task_struct *p;
-#ifdef CONFIG_PREEMPT_RT
-	struct xnsched *sched;
-#endif
 	cpu = smp_processor_id();
 	rq = lostagerq[cpu];
@@ -819,13 +815,6 @@
 		case LO_SIGGRP_REQ:
 			kill_proc(p->pid, arg, 1);
 			break;
-
-#ifdef CONFIG_PREEMPT_RT
-		case LO_GKWAKE_REQ:
-			sched = xnpod_sched_slot(cpu);
-			wake_up_interruptible_sync(&sched->gkwaitq);
-			break;
-#endif
 		}
 	}
 }
@@ -873,7 +862,6 @@ static inline int normalize_priority(int prio)
 static int gatekeeper_thread(void *data)
 {
 	struct task_struct *this_task = current;
-	DECLARE_WAITQUEUE(wait, this_task);
 	int cpu = (long)data;
 	struct xnsched *sched = xnpod_sched_slot(cpu);
 	struct xnthread *target;
@@ -886,12 +874,10 @@ static int gatekeeper_thread(void *data)
 	set_cpus_allowed(this_task, cpumask);
 	set_linux_task_priority(this_task, MAX_RT_PRIO - 1);
-	init_waitqueue_head(&sched->gkwaitq);
-	add_wait_queue_exclusive(&sched->gkwaitq, &wait);
+	set_current_state(TASK_INTERRUPTIBLE);
 	up(&sched->gksync); /* Sync with xnshadow_mount(). */
 	for (;;) {
-		set_current_state(TASK_INTERRUPTIBLE);
 		up(&sched->gksync); /* Make the request token available. */
 		schedule();
@@ -937,6 +923,7 @@ static int gatekeeper_thread(void *data)
 			xnlock_put_irqrestore(&nklock, s);
 			xnpod_schedule();
 		}
+		set_current_state(TASK_INTERRUPTIBLE);
 	}
 	return 0;
@@ -1014,23 +1001,9 @@ redo:
 	thread->gksched = sched;
 	xnthread_set_info(thread, XNATOMIC);
 	set_current_state(TASK_INTERRUPTIBLE | TASK_ATOMICSWITCH);
-#ifndef CONFIG_PREEMPT_RT
-	/*
-	 * We may not hold the preemption lock across calls to
-	 * wake_up_*() services over fully preemptible kernels, since
-	 * tasks might sleep when contending for spinlocks. The wake
-	 * up call for the gatekeeper will happen later, over an APC
-	 * we kick in do_schedule_event() on the way out for the
-	 * hardening task.
-	 *
-	 * We could delay the wake up call over non-RT 2.6 kernels as
-	 * well, but not when running over 2.4 (scheduler innards
-	 * would not allow this, causing weirdnesses when hardening
-	 * tasks). So we always do the early wake up when running
-	 * non-RT, which includes 2.4.
-	 */
-	wake_up_interruptible_sync(&sched->gkwaitq);
-#endif
+
+	wake_up_process(sched->gatekeeper);
+
 	schedule();
 	/*

-- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On Sat, 2011-07-16 at 11:15 +0200, Jan Kiszka wrote: On 2011-07-16 10:52, Philippe Gerum wrote: On Sat, 2011-07-16 at 10:13 +0200, Jan Kiszka wrote: On 2011-07-15 15:10, Jan Kiszka wrote:

But... right now it looks like we found our primary regression: "nucleus/shadow: shorten the uninterruptible path to secondary mode". It opens a short window during relax where the migrated task may be active under both schedulers. We are currently evaluating a revert (looks good so far), and I need to work out my theory in more detail.

Looks like this commit just made a long-standing flaw in Xenomai's interrupt handling more visible: we reschedule over the interrupt stack in the Xenomai interrupt handler tails, at least on x86-64. Not sure if other archs have interrupt stacks; the point is Xenomai's design wrongly assumes there are no such things.

Fortunately, no, this is not a design issue, no such assumption was ever made, but the Xenomai core expects this to be handled on a per-arch basis with the interrupt pipeline.

And that's already the problem: if Linux uses interrupt stacks, relying on ipipe to disable this during Xenomai interrupt handler execution is at best a workaround. A fragile one, unless you increase the per-thread stack size by the size of the interrupt stack. Lacking support for a generic rescheduling hook became a problem by the time Linux introduced interrupt threads.

Don't assume too much. What was done for ppc64 was not meant as a general policy. Again, this is a per-arch decision. As you pointed out, there is no way to handle this via some generic Xenomai-only support. ppc64 now has separate interrupt stacks, which is why I disabled IRQSTACKS, which became the builtin default at some point. Blackfin goes through a Xenomai-defined irq tail handler as well, because it may not reschedule over nested interrupt stacks.

How does this arch prevent xnpod_schedule() in the generic interrupt handler tail from doing its normal work?
It polls some hw status to know whether a rescheduling would be safe. See xnarch_escalate(). Fact is that this pending problem with x86_64 was overlooked since day #1 by /me. We were lucky so far that the values saved on this shared stack were apparently compatible, meaning we were overwriting them with identical or harmless values. But that's no longer true when interrupts are hitting us in the xnpod_suspend_thread path of a relaxing shadow.

Makes sense. It would be better to find a solution that does not make the relax path uninterruptible again for a significant amount of time. On low-end platforms we support (i.e. non-x86* mainly), this causes obvious latency spots.

I agree. Conceptually, the interruptible relaxation should be safe now after recent fixes.

Likely the only possible fix is establishing a reschedule hook for Xenomai in the interrupt exit path after the original stack is restored -- just like Linux works. Requires changes to both ipipe and Xenomai, unfortunately.

__ipipe_run_irqtail() is in the I-pipe core for such purpose. If instantiated properly for x86_64, and paired with xnarch_escalate() for that arch as well, it could be an option for running the rescheduling procedure when safe.

Nope, that doesn't work. The stack is switched later in the return path in entry_64.S. We need a hook there, ideally a conditional one, controlled by some per-cpu variable that is set by Xenomai on return from its interrupt handlers to signal the rescheduling need.

Yes, makes sense. The way to make it conditional without dragging bits of Xenomai logic into the kernel innards is not obvious, though. It is probably time to officially introduce exo-kernel oriented bits into the Linux thread info. PTDs have too loose semantics to be practical if we want to avoid trashing the I-cache by calling probe hooks within the dual kernel each time we want to check some basic condition (e.g. resched needed). A backlink to a foreign TCB there would help too.
Which leads us to killing the ad hoc kernel threads (and stacks) at some point, which are an absolute pain. Jan -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On Wed, 2011-07-13 at 20:39 +0200, Gilles Chanteperdrix wrote: On 07/12/2011 07:43 PM, Jan Kiszka wrote: On 2011-07-12 19:38, Gilles Chanteperdrix wrote: On 07/12/2011 07:34 PM, Jan Kiszka wrote: On 2011-07-12 19:31, Gilles Chanteperdrix wrote: On 07/12/2011 02:57 PM, Jan Kiszka wrote: xnlock_put_irqrestore(nklock, s); xnpod_schedule(); } @@ -1036,6 +1043,7 @@ redo: * to process this signal anyway. */ if (rthal_current_domain == rthal_root_domain) { + XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC)); Misleading dead code again, XNATOMIC is cleared not ten lines above. Nope, I forgot to remove that line. if (XENO_DEBUG(NUCLEUS) (!signal_pending(this_task) || this_task-state != TASK_RUNNING)) xnpod_fatal @@ -1044,6 +1052,8 @@ redo: return -ERESTARTSYS; } + xnthread_clear_info(thread, XNATOMIC); Why this? I find the xnthread_clear_info(XNATOMIC) right at the right place at the point it currently is. Nope. Now we either clear XNATOMIC after successful migration or when the signal is about to be sent (ie. in the hook). That way we can test more reliably (TM) in the gatekeeper if the thread can be migrated. Ok for adding the XNATOMIC test, because it improves the robustness, but why changing the way XNATOMIC is set and clear? Chances of breaking thing while changing code in this area are really high... The current code is (most probably) broken as it does not properly synchronizes the gatekeeper against a signaled and runaway target Linux task. We need an indication if a Linux signal will (or already has) woken up the to-be-migrated task. That task may have continued over its context, potentially on a different CPU. Providing this indication is the purpose of changing where XNATOMIC is cleared. What about synchronizing with the gatekeeper with a semaphore, as done in the first patch you sent, but doing it in xnshadow_harden, as soon as we detect that we are not back from schedule in primary mode? 
It seems it would avoid any further issue, as we would then be guaranteed that the thread could not switch to TASK_INTERRUPTIBLE again before the gatekeeper is finished. What worries me is the comment in xnshadow_harden:

 * gatekeeper sent us to primary mode. Since
 * TASK_UNINTERRUPTIBLE is unavailable to us without wrecking
 * the runqueue's count of uninterruptible tasks, we just
 * notice the issue and gracefully fail; the caller will have
 * to process this signal anyway.
 */

Does this mean that we cannot switch to TASK_UNINTERRUPTIBLE at this point? Or simply that TASK_UNINTERRUPTIBLE is not available for the business of xnshadow_harden?

The second interpretation is correct.

-- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] ESA SOCIS initiative
On Tue, 2011-07-12 at 09:20 +0200, julien.dela...@esa.int wrote: Dear all, The European Space Agency started a program called SOCIS. It aims at supporting free-software projects by providing funds to students that are willing to contribute to free-software projects. It works like the summer of code : mentoring organization subscribe to the program and propose projects. Then, the SOCIS committee selects the projects that are accepted. Finally, students apply to the selected projects, the mentoring organization choose the students for each project and the student has to complete the project in some weeks. Finally, if the projects is successfully finished, the student receives money. It is a nice way to improve free software and help some student ! As a Xenomai user, I was wondering if the project would like to apply to SOCIS. I think Xenomai developers may have several ideas of projects for students. I contacted Gilles Chanteperdrix to inform him about the initiative, he told me to post on this list because more people could be interested. You can have more information about the program on http://sophia.estec.esa.int/socis2011/?q=about . Subscription deadline is next Friday so that if you want to apply, you have to do that quickly. If you have any question regarding the program, do not hesitate to post on the SOCIS mailing list or to contact me. This is interesting, and the ESA running this program makes the latter even more attractive. However, the tasks of a mentoring organization proposing a project described here http://sophia.estec.esa.int/socis2011/faq seem way too heavy for us, especially in the short term. Things we commit to do should be done right, and unless I'm mistaken, I'm unsure anyone from the core team would be able to dedicate the required workload to handle this task properly. I would welcome any suggestion to make this possible nevertheless, because there is no shortage of interesting stuff that remains to be done on the Xenomai code base. 
Thanks for suggesting this anyway. Best regards, ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion
On Tue, 2011-07-12 at 14:57 +0200, Jan Kiszka wrote: On 2011-07-12 14:13, Jan Kiszka wrote: On 2011-07-12 14:06, Gilles Chanteperdrix wrote: On 07/12/2011 01:58 PM, Jan Kiszka wrote: On 2011-07-12 13:56, Jan Kiszka wrote:

However, this parallel unsynchronized execution of the gatekeeper and its target thread leaves an increasingly bad feeling on my side. Did we really catch all corner cases now? I wouldn't guarantee that yet. Specifically as I still have an obscure crash of a Xenomai thread on Linux schedule() on my table. What if the target thread woke up due to a signal, continued much further on a different CPU, blocked in TASK_INTERRUPTIBLE, and then the gatekeeper continued? I wish we could already eliminate this complexity and do the migration directly inside schedule()...

BTW, why do we mask out TASK_ATOMICSWITCH when checking the task state in the gatekeeper? What would happen if we included it (state == (TASK_ATOMICSWITCH | TASK_INTERRUPTIBLE))?

I would tend to think that what we should check is xnthread_test_info(XNATOMIC). Or maybe check both, the interruptible state and the XNATOMIC info bit.

Actually, neither the info bits nor the task state is sufficiently synchronized against the gatekeeper yet. We need to hold a shared lock when testing and resetting the state. I'm not sure yet if that is fixable given the gatekeeper architecture.

This may work (on top of the exit-race fix):

diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
index 50dcf43..90feb16 100644
--- a/ksrc/nucleus/shadow.c
+++ b/ksrc/nucleus/shadow.c
@@ -913,20 +913,27 @@ static int gatekeeper_thread(void *data)
 		if ((xnthread_user_task(target)->state & ~TASK_ATOMICSWITCH)
 		    == TASK_INTERRUPTIBLE) {
 			rpi_pop(target);
 			xnlock_get_irqsave(&nklock, s);
-#ifdef CONFIG_SMP
+
 			/*
-			 * If the task changed its CPU while in
-			 * secondary mode, change the CPU of the
-			 * underlying Xenomai shadow too. We do not
-			 * migrate the thread timers here, it would
-			 * not work. For a full migration comprising
-			 * timers, using xnpod_migrate_thread is
-			 * required.
+			 * Recheck XNATOMIC to avoid waking the shadow if the
+			 * Linux task received a signal meanwhile.
 			 */
-			if (target->sched != sched)
-				xnsched_migrate_passive(target, sched);
+			if (xnthread_test_info(target, XNATOMIC)) {
+#ifdef CONFIG_SMP
+				/*
+				 * If the task changed its CPU while in
+				 * secondary mode, change the CPU of the
+				 * underlying Xenomai shadow too. We do not
+				 * migrate the thread timers here, it would
+				 * not work. For a full migration comprising
+				 * timers, using xnpod_migrate_thread is
+				 * required.
+				 */
+				if (target->sched != sched)
+					xnsched_migrate_passive(target, sched);
 #endif /* CONFIG_SMP */
-			xnpod_resume_thread(target, XNRELAX);
+				xnpod_resume_thread(target, XNRELAX);
+			}
 			xnlock_put_irqrestore(&nklock, s);
 			xnpod_schedule();
 		}
@@ -1036,6 +1043,7 @@ redo:
 	 * to process this signal anyway.
 	 */
 	if (rthal_current_domain == rthal_root_domain) {
+		XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC));
 		if (XENO_DEBUG(NUCLEUS) &&
 		    (!signal_pending(this_task)
 		     || this_task->state != TASK_RUNNING))
 			xnpod_fatal
@@ -1044,6 +1052,8 @@ redo:
 		return -ERESTARTSYS;
 	}

+	xnthread_clear_info(thread, XNATOMIC);
+
 	/* current is now running into the Xenomai domain. */
 	thread->gksched = NULL;
 	sched = xnsched_finish_unlocked_switch(thread->sched);
@@ -2650,6 +2660,8 @@ static inline void do_sigwake_event(struct task_struct *p)

 	xnlock_get_irqsave(&nklock, s);

+	xnthread_clear_info(thread, XNATOMIC);
+
 	if ((p->ptrace & PT_PTRACED) &&
 	    !xnthread_test_state(thread, XNDEBUG)) {
 		sigset_t pending;

It totally ignores RPI and PREEMPT_RT for now.

RPI is broken anyway, I want to drop RPI in v3 for sure because it is misleading people. I'm still pondering whether we should do that earlier during the 2.6 timeframe. Ripping it out would allow using solely XNATOMIC as condition in the gatekeeper.

/me is now looking to get
Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage
On Thu, 2011-06-23 at 19:32 +0200, Gilles Chanteperdrix wrote: On 06/23/2011 01:15 PM, Jan Kiszka wrote: On 2011-06-23 13:11, Gilles Chanteperdrix wrote: On 06/23/2011 11:37 AM, Jan Kiszka wrote: On 2011-06-20 19:07, Jan Kiszka wrote: On 2011-06-19 15:00, Gilles Chanteperdrix wrote: On 06/19/2011 01:17 PM, Gilles Chanteperdrix wrote: On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote: I am working on this ppd cleanup issue again, and I am asking for help to find a fix in -head for all cases where the sys_ppd is needed during some cleanup. The problem is that when the ppd cleanup is invoked: - we have no guarantee that current is a thread from the Xenomai application; - if it is, current->mm is NULL. So, associating the sys_ppd with either current or current->mm does not work. What we could do is pass the sys_ppd to all the other ppd cleanup handlers; this would fix cases such as freeing mutex fastlocks, but it does not help when the sys_ppd is needed during a thread deletion hook. I would like to find a solution where simply calling xnsys_ppd_get() will work, rather than having an xnsys_ppd_get for each context, such as xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct, because that would be too error-prone. Any idea anyone? The best I could come up with: use a ptd to store the mm currently being cleaned up, so that xnshadow_ppd_get continues to work, even in the middle of a cleanup. In order to also get xnshadow_ppd_get to work in task deletion hooks (which is needed to avoid the issue at the origin of this thread), we also need to set this ptd upon shadow mapping, so it is still there when reaching the task deletion hook (where current->mm may be NULL). Hence the patch:

diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
index b243600..6bc4210 100644
--- a/ksrc/nucleus/shadow.c
+++ b/ksrc/nucleus/shadow.c
@@ -65,6 +65,11 @@ int nkthrptd;
 EXPORT_SYMBOL_GPL(nkthrptd);
 int nkerrptd;
 EXPORT_SYMBOL_GPL(nkerrptd);
+int nkmmptd;
+EXPORT_SYMBOL_GPL(nkmmptd);
+
+#define xnshadow_mmptd(t) ((t)->ptd[nkmmptd])
+#define xnshadow_mm(t) ((struct mm_struct *)xnshadow_mmptd(t))

xnshadow_mm() can now return a no longer existing mm. So no user of xnshadow_mm() should ever dereference that pointer. Thus we had better change all those users to treat the return value as a void pointer, e.g.

 struct xnskin_slot {
 	struct xnskin_props *props;
@@ -1304,6 +1309,8 @@ int xnshadow_map(xnthread_t *thread, xncompletion_t __user *u_completion,
 	 * friends.
 	 */
 	xnshadow_thrptd(current) = thread;
+	xnshadow_mmptd(current) = current->mm;
+
 	rthal_enable_notifier(current);

 	if (xnthread_base_priority(thread) == 0
@@ -2759,7 +2766,15 @@ static void detach_ppd(xnshadow_ppd_t * ppd)

 static inline void do_cleanup_event(struct mm_struct *mm)
 {
+	struct task_struct *p = current;
+	struct mm_struct *old;
+
+	old = xnshadow_mm(p);
+	xnshadow_mmptd(p) = mm;
+
 	ppd_remove_mm(mm, &detach_ppd);
+
+	xnshadow_mmptd(p) = old;

I don't have the full picture yet, but that feels racy: if the context over which we clean up that foreign mm is also using xnshadow_mmptd, other threads in that process may dislike this temporary change.

 }

 RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event);
@@ -2925,7 +2940,7 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interface);
 xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid)
 {
 	if (xnpod_userspace_p())
-		return ppd_lookup(muxid, current->mm);
+		return ppd_lookup(muxid, xnshadow_mm(current) ?: current->mm);

 	return NULL;
 }
@@ -2960,8 +2975,9 @@ int xnshadow_mount(void)
 	sema_init(&completion_mutex, 1);
 	nkthrptd = rthal_alloc_ptdkey();
 	nkerrptd = rthal_alloc_ptdkey();
+	nkmmptd = rthal_alloc_ptdkey();

-	if (nkthrptd < 0 || nkerrptd < 0) {
+	if (nkthrptd < 0 || nkerrptd < 0 || nkmmptd < 0) {
 		printk(KERN_ERR "Xenomai: cannot allocate PTD slots\n");
 		return -ENOMEM;
 	}
diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
index 6ce75e5..cc86852 100644
--- a/ksrc/skins/posix/mutex.c
+++ b/ksrc/skins/posix/mutex.c
@@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
 	xnlock_put_irqrestore(&nklock, s);

 #ifdef CONFIG_XENO_FASTSYNCH
-	/* We call xnheap_free even if the mutex is not pshared; when
-	   this function is called from pse51_mutexq_cleanup, the
-	   sem_heap is destroyed, or not the one to which the fastlock
-	   belongs, xnheap will simply return an error. */

I think this comment is not completely obsolete. It still applies /wrt shared/non-shared.
Re: [Xenomai-core] [RFC] Getting rid of the NMI latency watchdog
On Wed, 2011-06-22 at 19:16 +0200, Gilles Chanteperdrix wrote: On 05/19/2011 10:29 PM, Philippe Gerum wrote: On Thu, 2011-05-19 at 20:36 +0200, Jan Kiszka wrote: On 2011-05-19 20:15, Gilles Chanteperdrix wrote: On 05/19/2011 03:58 PM, Philippe Gerum wrote: For this reason, I'm considering issuing a patch for a complete removal of the NMI latency watchdog code in Xenomai 2.6.x, disabling the feature for 2.6.38 kernels and above in 2.5.x. Comments welcome. I am in the same case as you: I no longer use Xeno's NMI watchdog, so I agree to get rid of it. Yeah. The last time we wanted to use it to get more information about a hard hang, the CPU we used was not supported. Philippe, did you already test whether the Linux watchdog generates proper results on artificial Xenomai lockups on a single core? This works provided we tell the pipeline to enter printk-sync mode when the watchdog kicks. So I'd say that we could probably do a better job making the pipeline core smarter wrt NMI watchdog context handling than asking Xenomai to dup the mainline code for having its own NMI handling. If nobody disagrees, I am removing this code from -head. Now. Ack. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Number of arguments
On Tue, 2011-06-21 at 17:33 +0200, zenati wrote: Dear, As you know, the ARINC 653 standard is very strict, and the API it specifies must be respected. Some functions of that API need more than five arguments. However, skin calls are limited to 5. If I want to increase that limit, I have to modify the following files: - ./xenomai-2.5.6/include/asm-arm/syscall.h - ./xenomai-2.5.6/include/asm-blackfin/syscall.h - ./xenomai-2.5.6/include/asm-x86/syscall.h - ./xenomai-2.5.6/include/asm-powerpc/syscall.h - ./xenomai-2.5.6/include/asm-nios2/syscall.h Is it possible? Is it a good idea? No, dead end. You should group arguments in a struct and pass the address of such a struct to kernel land for decoding. Thank you for your attention and your help. Sincerely, Omar ZENATI ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [RFC][PATCH] nucleus: Prevent rescheduling while in xntbase_tick
On Fri, 2011-06-17 at 13:03 +0200, Jan Kiszka wrote: On 2011-06-17 12:58, Gilles Chanteperdrix wrote: On 06/17/2011 11:27 AM, Jan Kiszka wrote: Based on code inspection, it looks like a timer handler triggering a reschedule in the path xntbase_tick -> xntimer_tick_aperiodic / xntimer_tick_periodic_inner -> handler can cause problems, e.g. a reschedule before all expired timers were processed. The timer core is usually run atomically from an interrupt handler, so better emulate an IRQ context inside xntbase_tick by setting XNINIRQ. I do not understand this one either: if we are inside xntimer_tick_aperiodic, XNINIRQ is already set. Not if you come via xntbase_tick, which is called by the mentioned skins also outside a timer IRQ (at least based on my understanding of those skin APIs). But I might be wrong; I just came across this while checking for potentially invalid cached xnpod_current_sched values. That is ok: ui_timer(), tickAnnounce() and tm_tick() are designed by the respective RTOS to be called from IRQ context. Jan -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [RFC][PATCH] nucleus: Prevent rescheduling while in xntbase_tick
On Fri, 2011-06-17 at 14:11 +0200, Jan Kiszka wrote: On 2011-06-17 13:58, Philippe Gerum wrote: On Fri, 2011-06-17 at 13:03 +0200, Jan Kiszka wrote: On 2011-06-17 12:58, Gilles Chanteperdrix wrote: On 06/17/2011 11:27 AM, Jan Kiszka wrote: Based on code inspection, it looks like a timer handler triggering a reschedule in the path xntbase_tick -> xntimer_tick_aperiodic / xntimer_tick_periodic_inner -> handler can cause problems, e.g. a reschedule before all expired timers were processed. The timer core is usually run atomically from an interrupt handler, so better emulate an IRQ context inside xntbase_tick by setting XNINIRQ. I do not understand this one either: if we are inside xntimer_tick_aperiodic, XNINIRQ is already set. Not if you come via xntbase_tick, which is called by the mentioned skins also outside a timer IRQ (at least based on my understanding of those skin APIs). But I might be wrong; I just came across this while checking for potentially invalid cached xnpod_current_sched values. That is ok: ui_timer(), tickAnnounce() and tm_tick() are designed by the respective RTOS to be called from IRQ context. Fine. Should we add a XENO_ASSERT to set this in stone and for documentation purposes? I think so. Jan -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Oops with synchronous message passing support
On Thu, 2011-06-09 at 14:42 +0200, Wolfgang Grandegger wrote: Hello, I just realized a problem with synchronous message passing support. When rt_task_send() times out, I get the oops below from line: Does this help?

diff --git a/ksrc/skins/native/task.c b/ksrc/skins/native/task.c
index b822fd0..b0e99a7 100644
--- a/ksrc/skins/native/task.c
+++ b/ksrc/skins/native/task.c
@@ -1988,21 +1988,28 @@ int rt_task_receive(RT_TASK_MCB *mcb_r, RTIME timeout)
 	}

 	/*
-	 * Wait on our receive slot for some client to enqueue itself
-	 * in our send queue.
+	 * We loop to care for spurious wakeups, in case the
+	 * client times out before we unblock.
 	 */
-	info = xnsynch_sleep_on(&server->mrecv, timeout, XN_RELATIVE);
-	/*
-	 * XNRMID cannot happen, since well, the current task would be the
-	 * deleted object, so...
-	 */
-	if (info & XNTIMEO) {
-		err = -ETIMEDOUT;	/* Timeout. */
-		goto unlock_and_exit;
-	} else if (info & XNBREAK) {
-		err = -EINTR;	/* Unblocked. */
-		goto unlock_and_exit;
-	}
+	do {
+		/*
+		 * Wait on our receive slot for some client to enqueue
+		 * itself in our send queue.
+		 */
+		info = xnsynch_sleep_on(&server->mrecv, timeout, XN_RELATIVE);
+		/*
+		 * XNRMID cannot happen, since well, the current task
+		 * would be the deleted object, so...
+		 */
+		if (info & XNTIMEO) {
+			err = -ETIMEDOUT;	/* Timeout. */
+			goto unlock_and_exit;
+		}
+		if (info & XNBREAK) {
+			err = -EINTR;	/* Unblocked. */
+			goto unlock_and_exit;
+		}
+	} while (!xnsynch_pended_p(&server->mrecv));

 	holder = getheadpq(xnsynch_wait_queue(&server->msendq));
 	/* There must be a valid holder since we waited for it.
*/ http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/skins/native/task.c#1976 -bash-3.2# ./oops_sender pre-rt_task_receive() [ 662.423571] Unable to handle kernel paging request for data at address 0x024c [ 662.515607] Faulting instruction address: 0xc0070124 [ 662.576614] Oops: Kernel access of bad area, sig: 11 [#2] [ 662.642806] mpc5200-simple-platform [ 662.685493] last sysfs file: [ 662.721775] Modules linked in: [ 662.759127] NIP: c0070124 LR: c00701c8 CTR: [ 662.819974] REGS: c7b8bd40 TRAP: 0300 Tainted: G D (2.6.36.4-3-g1af23a4-dirty) [ 662.925684] MSR: 3032 FP,ME,IR,DR CR: 24008482 XER: 2000 [ 663.003613] DAR: 024c, DSISR: 2000 [ 663.053780] TASK = c7b923f0[1227] 'oops_test_main' THREAD: c7b8a000 [ 663.128525] GPR00: c7b8bdf0 c7b923f0 c902a69c c042ea60 36291b28 c042ea60 [ 663.231015] GPR08: c902a210 fe0c c902a210 c9029df8 24008422 1004a118 [ 663.333503] GPR16: c040dc54 c0425ba0 fff0 c7b8bf50 c0425ba0 c042f058 0010 c902a210 [ 663.435993] GPR24: c03f9af8 c0425ba0 c03fb678 fdfc c902a200 c7b8be20 fffc [ 663.540611] NIP [c0070124] rt_task_receive+0xc8/0x1ac [ 663.602534] LR [c00701c8] rt_task_receive+0x16c/0x1ac [ 663.664436] Call Trace: [ 663.694322] [c7b8bdf0] [c00701c8] rt_task_receive+0x16c/0x1ac (unreliable) [ 663.778687] [c7b8be10] [c0072de4] __rt_task_receive+0xd0/0x1b0 [ 663.850245] [c7b8be90] [c0068cd0] losyscall_event+0xc8/0x328 [ 663.919654] [c7b8bed0] [c00587c8] __ipipe_dispatch_event+0xa4/0x200 [ 663.996532] [c7b8bf20] [c000ae78] __ipipe_syscall_root+0x58/0x164 [ 664.071289] [c7b8bf40] [c00104b8] DoSyscall+0x20/0x5c [ 664.133213] --- Exception: c01 at 0xffaca7c [ 664.133222] LR = 0xffaca14 [ 664.221809] Instruction dump: [ 664.258092] 557b07fe 90091b48 813d049c 7f9c4800 419e007c 2f89 3929fe0c 419e0070 [ 664.353104] 2f89 3b80 419e0008 3b89fff0 83bc0450 3be0ff97 801e000c 7f9d0040 [ 664.452834] ---[ end trace 07ae98a3f6576a96 ]--- Message from syslogd@ at Sat Feb 21 08:02:43 1970 ... 
CPUP0 kernel: [ 662.685493] last sysfs file: rt_task_send() failed: -110 (Connection timed out) Killing child The oops is *not* triggered if the timeout is long enough and rt_task_send() returns successfully. I'm using a PowerPC MPC5200-based system: -bash-3.2# cat /proc/ipipe/version 2.12-03 -bash-3.2# cat /proc/xenomai/version 2.5.6 -bash-3.2# uname -a Linux CPUP0 2.6.36.4-3-g1af23a4-dirty #9 Thu Jun 9 11:56:54 CEST 2011 ppc ppc ppc GNU/Linux Any idea what could go wrong? I have attached my little test programs, including a Makefile. Just start it with ./oops_sender. Thanks, Wolfgang. ___ Xenomai-core mailing list Xenomai-core@gna.org
Re: [Xenomai-core] Oops with synchronous message passing support
On Thu, 2011-06-09 at 15:34 +0200, Wolfgang Grandegger wrote: Hi Philippe, On 06/09/2011 03:05 PM, Philippe Gerum wrote: On Thu, 2011-06-09 at 14:42 +0200, Wolfgang Grandegger wrote: Hello, I just realized a problem with synchronous message passing support. When rt_task_send() times out, I get the oops below from line: Does this help?

diff --git a/ksrc/skins/native/task.c b/ksrc/skins/native/task.c
index b822fd0..b0e99a7 100644
--- a/ksrc/skins/native/task.c
+++ b/ksrc/skins/native/task.c
@@ -1988,21 +1988,28 @@ int rt_task_receive(RT_TASK_MCB *mcb_r, RTIME timeout)
 	}

 	/*
-	 * Wait on our receive slot for some client to enqueue itself
-	 * in our send queue.
+	 * We loop to care for spurious wakeups, in case the
+	 * client times out before we unblock.
 	 */
-	info = xnsynch_sleep_on(&server->mrecv, timeout, XN_RELATIVE);
-	/*
-	 * XNRMID cannot happen, since well, the current task would be the
-	 * deleted object, so...
-	 */
-	if (info & XNTIMEO) {
-		err = -ETIMEDOUT;	/* Timeout. */
-		goto unlock_and_exit;
-	} else if (info & XNBREAK) {
-		err = -EINTR;	/* Unblocked. */
-		goto unlock_and_exit;
-	}
+	do {
+		/*
+		 * Wait on our receive slot for some client to enqueue
+		 * itself in our send queue.
+		 */
+		info = xnsynch_sleep_on(&server->mrecv, timeout, XN_RELATIVE);
+		/*
+		 * XNRMID cannot happen, since well, the current task
+		 * would be the deleted object, so...
+		 */
+		if (info & XNTIMEO) {
+			err = -ETIMEDOUT;	/* Timeout. */
+			goto unlock_and_exit;
+		}
+		if (info & XNBREAK) {
+			err = -EINTR;	/* Unblocked. */
+			goto unlock_and_exit;
+		}
+	} while (!xnsynch_pended_p(&server->mrecv));

 	holder = getheadpq(xnsynch_wait_queue(&server->msendq));
 	/* There must be a valid holder since we waited for it. */

Yes, it does help: -bash-3.2# ./oops_sender pre-rt_task_receive() rt_task_send() failed: -110 (Connection timed out) Killing child No more oops, thanks for your quick help. Ok, thanks for reporting. Patch queued. Wolfgang. -- Philippe.
___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Fragile lock usage tracking for auto-relax
On Tue, 2011-05-31 at 13:37 +0200, Jan Kiszka wrote: Hi Philippe, enabling XENO_OPT_DEBUG_NUCLEUS reveals some shortcomings of the in-kernel lock usage tracking via xnthread_t::hrescnt. This BUGON in xnsynch_release triggers for RT threads: XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0); RT threads do not balance their lock and unlock syscalls, so their counter goes wild quite quickly. But just limiting the bug check to XNOTHER threads is not a solution either. How to deal with the counter on scheduling policy changes? So my suggestion is to convert the auto-relax feature into a service that user space can request, based on a counter that user space maintains independently. I.e. we should create another shared word that user space increments and decrements on lock acquisitions/releases on its own. The nucleus just tests it when deciding about the relax on return to user space. But before hacking into that direction, I'd like to hear if it makes sense to you. At first glance, this does not seem to address the root issue. The bottom line is that we should not have any thread release an owned lock it does not hold, kthread or not. In that respect, xnsynch_release() looks fishy, because it may be called over a context which is _not_ the lock owner, but the thread which is deleting the lock owner, so assuming lastowner == current_thread when releasing is wrong. At the very least, the following patch would prevent xnsynch_release_all_ownerships() from breaking badly. The same way, the fastlock stuff does not track the owner properly in the synchro object. We should fix those issues before going further; they may be related to the bug described. Totally, genuinely, 100% untested.
diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
index 3a53527..0785533 100644
--- a/ksrc/nucleus/synch.c
+++ b/ksrc/nucleus/synch.c
@@ -424,6 +424,7 @@ xnflags_t xnsynch_acquire(struct xnsynch *synch, xnticks_t timeout,
 			      XN_NO_HANDLE, threadh);

 	if (likely(fastlock == XN_NO_HANDLE)) {
+		xnsynch_set_owner(synch, thread);
 		xnthread_inc_rescnt(thread);
 		xnthread_clear_info(thread,
 				    XNRMID | XNTIMEO | XNBREAK);
@@ -718,7 +719,7 @@ struct xnthread *xnsynch_release(struct xnsynch *synch)

 	XENO_BUGON(NUCLEUS, !testbits(synch->status, XNSYNCH_OWNER));

-	lastowner = xnpod_current_thread();
+	lastowner = synch->owner ?: xnpod_current_thread();
 	xnthread_dec_rescnt(lastowner);
 	XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);
 	lastownerh = xnthread_handle(lastowner);

-- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Fragile lock usage tracking for auto-relax
On Tue, 2011-05-31 at 18:38 +0200, Gilles Chanteperdrix wrote: On 05/31/2011 06:29 PM, Philippe Gerum wrote: On Tue, 2011-05-31 at 13:37 +0200, Jan Kiszka wrote: Hi Philippe, enabling XENO_OPT_DEBUG_NUCLEUS reveals some shortcomings of the in-kernel lock usage tracking via xnthread_t::hrescnt. This BUGON in xnsynch_release triggers for RT threads: XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0); RT threads do not balance their lock and unlock syscalls, so their counter goes wild quite quickly. But just limiting the bug check to XNOTHER threads is not a solution either. How to deal with the counter on scheduling policy changes? So my suggestion is to convert the auto-relax feature into a service that user space can request, based on a counter that user space maintains independently. I.e. we should create another shared word that user space increments and decrements on lock acquisitions/releases on its own. The nucleus just tests it when deciding about the relax on return to user space. But before hacking into that direction, I'd like to hear if it makes sense to you. At first glance, this does not seem to address the root issue. The bottom line is that we should not have any thread release an owned lock it does not hold, kthread or not. In that respect, xnsynch_release() looks fishy, because it may be called over a context which is _not_ the lock owner, but the thread which is deleting the lock owner, so assuming lastowner == current_thread when releasing is wrong. At the very least, the following patch would prevent xnsynch_release_all_ownerships() from breaking badly. The same way, the fastlock stuff does not track the owner properly in the synchro object. We should fix those issues before going further; they may be related to the bug described. It looks to me like xnsynch_fast_release uses cmpxchg, so it will not set the owner to NULL if the current owner is not the thread releasing the mutex. Is that not sufficient?
Yes, we need to move that swap to the irq off section to clear the owner there as well. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Fragile lock usage tracking for auto-relax
On Tue, 2011-05-31 at 18:38 +0200, Jan Kiszka wrote: On 2011-05-31 18:29, Philippe Gerum wrote: On Tue, 2011-05-31 at 13:37 +0200, Jan Kiszka wrote: Hi Philippe, enabling XENO_OPT_DEBUG_NUCLEUS reveals some shortcomings of the in-kernel lock usage tracking via xnthread_t::hrescnt. This BUGON in xnsynch_release triggers for RT threads: XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0); RT threads do not balance their lock and unlock syscalls, so their counter goes wild quite quickly. But just limiting the bug check to XNOTHER threads is not a solution either. How to deal with the counter on scheduling policy changes? So my suggestion is to convert the auto-relax feature into a service that user space can request, based on a counter that user space maintains independently. I.e. we should create another shared word that user space increments and decrements on lock acquisitions/releases on its own. The nucleus just tests it when deciding about the relax on return to user space. But before hacking into that direction, I'd like to hear if it makes sense to you. At first glance, this does not seem to address the root issue. The bottom line is that we should not have any thread release an owned lock it does not hold, kthread or not. In that respect, xnsynch_release() looks fishy, because it may be called over a context which is _not_ the lock owner, but the thread which is deleting the lock owner, so assuming lastowner == current_thread when releasing is wrong. At the very least, the following patch would prevent xnsynch_release_all_ownerships() from breaking badly. The same way, the fastlock stuff does not track the owner properly in the synchro object. We should fix those issues before going further; they may be related to the bug described. Totally, genuinely, 100% untested.
diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
index 3a53527..0785533 100644
--- a/ksrc/nucleus/synch.c
+++ b/ksrc/nucleus/synch.c
@@ -424,6 +424,7 @@ xnflags_t xnsynch_acquire(struct xnsynch *synch, xnticks_t timeout,
 			      XN_NO_HANDLE, threadh);

 	if (likely(fastlock == XN_NO_HANDLE)) {
+		xnsynch_set_owner(synch, thread);
 		xnthread_inc_rescnt(thread);
 		xnthread_clear_info(thread,
 				    XNRMID | XNTIMEO | XNBREAK);
@@ -718,7 +719,7 @@ struct xnthread *xnsynch_release(struct xnsynch *synch)

 	XENO_BUGON(NUCLEUS, !testbits(synch->status, XNSYNCH_OWNER));

-	lastowner = xnpod_current_thread();
+	lastowner = synch->owner ?: xnpod_current_thread();
 	xnthread_dec_rescnt(lastowner);
 	XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);
 	lastownerh = xnthread_handle(lastowner);

That's maybe another problem, need to check. Back to the original issue: with fastlock, kernel space has absolutely no clue about how many locks user space may hold - unless someone is contending for all those locks. IOW, you can't reliably track resource ownership at kernel level without user-space help. The current way it helps (enforced syscalls of XNOTHER threads) is insufficient. The thing is: we don't care about knowing how many locks some non-current thread owns. What the nucleus wants to know is whether the _current user-space_ thread owns a lock, which is enough for the autorelax management. This restricted scope makes the logic fine. The existing resource counter is by no means a resource-tracking tool that could be used from whatever context to query the number of locks an arbitrary thread holds; it has never been intended that way at all. It only answers the simple question: do I hold any lock, as an XNOTHER thread? Alternatively to plain counting of ownership in user space, we could adopt mainline's robust mutex mechanism (a user-space-maintained list) that solves the release-all-ownerships issue. But I haven't looked into the details yet.
Would be nice, but still overkill for the purpose of autorelax management. Jan -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] I suspect the shared memory of Xenomai has bug.
On Thu, 2011-05-19 at 15:37 +0800, arethe.rtai wrote: I solved the problem of the shared memory that could not be allocated in user space. The shared memory is allocated from the kheap, but the kheap is not a mapped heap, i.e. its pages are not reserved. I now init the kheap with xnheap_init_mapped rather than xnheap_init, and the problem is solved. You have just turned the global system heap into a shared heap, which is badly wrong. If anywhere, the issue is in create_new_heap(), or in the _compat_shm_alloc() interface in userland, or a combination of both. As Gilles told you already, such a 100% reproducible allocation/mapping issue cannot be a generic one involving the core heap system, otherwise no skin would ever work. We do depend on the system heap internally for almost everything in the system. It is much more likely a local RTAI skin bug, because this code has bit-rotted over time, due to lack of interest and users. PS: Please keep the list CCed. Qin Chenggang 2011-05-19 __ arethe.rtai __ From: Philippe Gerum Sent: 2011-05-13 14:50:06 To: arethe rtai Cc: Xenomai-core Subject: Re: [Xenomai-core] I suspect the shared memory of Xenomai has bug. On Fri, 2011-05-13 at 09:25 +0800, arethe rtai wrote: Hi all: I always got null when I requested shared memory through the RTAI skin in user space. Some bugs may exist in the sub-system. I traced the execution stream of rt_shm_alloc, and found the mmap() operation always returns -22. I suspect the problem would be the same with other skins, because the mmap operation is implemented in /ksrc/nucleus/heap.c. Has anyone encountered this problem? No. The fact is that the RTAI skin has not been actively maintained for years now, so a local bug there is possible. Due to the lack of users and interest, this skin was removed from the upcoming 2.6.x series. Regards Arethe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. -- Philippe.
___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
[Xenomai-core] [RFC] Getting rid of the NMI latency watchdog
The NMI latency watchdog is a feature Xenomai supports when proper hardware is available, which triggers a stack backtrace dump, then panics, when a real-time timer tick is late by a given amount of time. We used it in the early times to chase pathological latencies, particularly when debugging the original SMP port. We currently have two architectures supporting that watchdog, namely x86 and blackfin. x86-wise, the rebasing of the NMI support in mainline over the perf sub-system just obsoleted our NMI hijacking badly, making it unusable since 2.6.38. As I was diving into our NMI support code to adapt it once again for 2.6.38 - with a vague feeling of seasickness coming - I felt maybe the time has come to question the very presence of that feature in our code base: - the NMI watchdog predated the latency tracer. AFAIC, I stopped using the former long ago, preferring the latter for debugging latency issues. - the non-maskable nature of the interrupt trigger does not help us nowadays compared to using the I-pipe tracer: the mainline NMI support would catch hard lockups with irqs off and panic the same way, and the tracer would help spot the issue with a much finer level of detail in case the latency spot leaves the machine in a sane state, i.e. when the board remains usable and allows for inspection of /proc/ipipe/trace files. - hijacking the mainline NMI code the way we do has always been a massive pain on x86, prone to triggering conflicts with later kernel releases. For this reason, I'm considering issuing a patch for a complete removal of the NMI latency watchdog code in Xenomai 2.6.x, disabling the feature for 2.6.38 kernels and above in 2.5.x. Comments welcome. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [RFC] Getting rid of the NMI latency watchdog
On Thu, 2011-05-19 at 20:36 +0200, Jan Kiszka wrote: On 2011-05-19 20:15, Gilles Chanteperdrix wrote: On 05/19/2011 03:58 PM, Philippe Gerum wrote: For this reason, I'm considering issuing a patch for a complete removal of the NMI latency watchdog code in Xenomai 2.6.x, disabling the feature for 2.6.38 kernels and above in 2.5.x. Comments welcome. I am in the same case as you: I no longer use Xeno's NMI watchdog, so I agree to get rid of it. Yeah. The last time we wanted to use it to get more information about a hard hang, the CPU we used was not supported. Philippe, did you already test whether the Linux watchdog generates proper results on artificial Xenomai lockups on a single core? This works provided we tell the pipeline to enter printk-sync mode when the watchdog kicks in. So I'd say that we could probably do a better job making the pipeline core smarter wrt NMI watchdog context handling than asking Xenomai to duplicate the mainline code just to have its own NMI handling. Jan -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] I suspect the shared memory of Xenomai has bug.
On Fri, 2011-05-13 at 09:25 +0800, arethe rtai wrote: Hi all: I always get NULL when I request shared memory through the RTAI skin in user space. Some bug may exist in the sub-system. I traced the execution path of rt_shm_alloc() and found that the mmap() operation always returns -22. I suspect the problem would be the same with other skins, because the mmap operation is implemented in ksrc/nucleus/heap.c. Has anyone encountered this problem? No. The fact is that the RTAI skin has not been actively maintained for years now, so a local bug there is possible. Due to the lack of users and interest, this skin was removed from the upcoming 2.6.x series. Regards Arethe. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Integration of new drivers in xenomai
On Tue, 2011-05-10 at 10:42 +0200, julien.dela...@esa.int wrote: Dear all, For my work, I had to develop drivers for the 6052 and 6701 boards from National Instruments. These boards are already supported by the Comedi drivers but were not supported by Xenomai yet. So, I took the code from Comedi and adapted it to Xenomai with the Analogy layer (a4l* functions and so on). This introduces two new drivers: analogy_ni_670x and analogy_ni_660x. At this time, the drivers compile and load correctly. I will have the hardware in ten days to test the code and make sure it works from a functional point of view. In order to contribute to Xenomai, I would like to know whether this code could be integrated into the Xenomai repository, and in particular, what the conditions are for integrating third-party code, especially driver code. Simple and straightforward: - free software license compatible with the linux kernel licensing terms - proper credits and copyrights retained from the original code - the code should solve a problem, instead of introducing one - standard linux kernel coding style Then, if it seems interesting to you, I will submit a patch as soon as I have checked that it works correctly with the physical boards. Generally speaking, any sound contribution is welcome. Technically, Alex has the final cut for Analogy stuff. Thanks for any suggestion, Best regards, -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] SWITCH TASK TO SECONDARY MODE DURING GDB SESSION DOESN'T WORK
On Tue, 2011-05-10 at 18:01 +0200, Roberto Bielli wrote: Hi, I tried the C code below in a gdb session on an ARM target with breakpoints, and I found a strange behaviour. I expected that task 'tsk1' would never execute, but if I put a breakpoint on the instruction 'varInt += 1;' I see that it is executed. I suspect the following is happening: 1. I start gdb with the application. The application is not running. 2. I enable a breakpoint on the instruction 'err = rt_task_start(&tsk1, test_tsk1, NULL);'. 3. I enable a breakpoint on the instruction 'varInt += 1;' in tsk1. 4. I run the application, which stops on the instruction 'err = rt_task_start(&tsk1, test_tsk1, NULL);', so the task is in secondary mode because of the breakpoint hit, if I understand correctly. 5. I single-step over the rt_task_start in main, and I see that I break in the task on the instruction 'varInt += 1;' in tsk1. This is strange, because main has a shadow with priority 99, whereas tsk1 has priority 49. 6. The program executes the instruction 'varInt += 1;' and then returns to main. 7. Then main always has control. The question is: why is the priority not respected? My guess is this: main is in secondary mode because of the breakpoint, and when it calls rt_task_create the new task starts in primary mode for its initial instructions. Then it receives the signal (verified with rt_task_set_mode(0, T_WARNSW, NULL);) and switches to secondary mode, but initially tsk1 is in primary mode and executes before main. Is this a known problem? This is a known restriction imposed on us by the dual kernel design. - gdb means ptrace(), and ptrace() means lots of linux signals - receiving a linux signal in primary mode causes a switch to secondary mode, so that we can handle it safely from a sane linux context - receiving a linux signal in secondary mode prevents the root priority from being boosted (no PIP), so that lengthy kernel code handling lethal signals does not steal the CPU away from lively real-time tasks.
In short, gdb will surely break the expected priority order because it depends on ptrace(), and ptrace() makes heavy use of linux signals. Only explicit synchronization between threads (sems, mutexes, whatever) can still guarantee proper serialization in this context. Best Regards CODE-----

int varInt = 0;
RT_TASK tsk1, tskMain;

void test_tsk1(void *args)
{
	for (;;) {
		varInt += 1;
		rt_task_sleep(1);
	}
}

int main(int argc, char *argv[])
{
	int err;

	mlockall(MCL_CURRENT|MCL_FUTURE);
	rt_task_shadow(&tskMain, "main", 99, 0);
	err = rt_task_create(&tsk1, "tsk1", 0, 49, T_FPU);
	err = rt_task_start(&tsk1, test_tsk1, NULL);
	for (;;) {
		if (varInt >= 1)	/* comparison operator lost in the archive; '>=' restored */
			break;
	}
	printf("Task started\n");
	return 0;
}

-- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [PowerPC]Badness at mmu_context_nohash
On Fri, 2011-04-29 at 18:08 +0200, Jean-Michel Hautbois wrote: 2011/4/29 Philippe Gerum r...@xenomai.org: On Thu, 2011-04-28 at 10:33 +0200, Jean-Michel Hautbois wrote: 2011/4/27 Philippe Gerum r...@xenomai.org: On Wed, 2011-04-27 at 20:42 +0200, Jean-Michel Hautbois wrote: Hi list, I am currently using a Xenomai port on a linux 2.6.35.11 linux kernel and the adeos-ipipe-2.6.35.7-powerpc-2.12-01.patch. I am facing a scheduling issue on a P2020 (dual core PowerPC), and I get the following message : Badness at arch/powerpc/mm/mmu_context_nohash.c:209 NIP: c0018d20 LR: c039b94c CTR: c00343e4 REGS: ecfadce0 TRAP: 0700 Tainted: GW(2.6.35.11) MSR: 00021000 ME,CE CR: 24000488 XER: TASK = ec5220d0[496] 'sipaq' THREAD: ecfac000 CPU: 1 GPR00: 0001 ecfadd90 ec5220d0 ec5df340 ec58a700 0003 GPR08: c04a2d98 0007 c04a2d98 0067e000 0002f385 1007f1f8 c04a5b40 ecfac040 GPR16: c04a5b40 c04deb80 c04a2120 c04a2d98 c04a5b40 c04d008c ecfac000 00029000 GPR24: c04d c04d1e6c 0001 ec58a700 eceaf390 c04d1e78 c0b23b40 ec5df340 NIP [c0018d20] switch_mmu_context+0x80/0x438 LR [c039b94c] schedule+0x774/0x7dc Call Trace: [ecfadd90] [44000484] 0x44000484 (unreliable) [ecfadde0] [c039b94c] schedule+0x774/0x7dc [ecfade50] [c039cb98] do_nanosleep+0xc8/0x114 [ecfade80] [c0059bf8] hrtimer_nanosleep+0xd8/0x158 [ecfadf10] [c0059d48] sys_nanosleep+0xd0/0xd4 [ecfadf40] [c0013c0c] ret_from_syscall+0x0/0x3c --- Exception: c01 at 0xffa6cc4 LR = 0xffa6cb0 Instruction dump: 40a2fff0 4c00012c 2f80 409e0128 813b018c 2f83 39290001 913b018c 419e0020 8003018c 7c34 5400d97e 0f00 8123018c 3929 9123018c Do you have a clue on how to start debugging it ? Yes, but that can't be easily summarized here. In short, we have a serious problem with the sharing of the MMU context between the Linux and Xenomai schedulers in the SMP case on powerpc. OK, good to know that it is a known issue. If there is a thread with some thoughts about it, I am interested ;). It is happening quite randomly... :). 
Does disabling CONFIG_XENO_HW_UNLOCKED_SWITCH clear this issue? Well, yes and no. It starts well, but when booting the kernel I get: The mm switch issue was specifically addressed by this patch, which is part of 2.12-01: http://git.denx.de/?p=ipipe-2.6.git;a=commit;h=c14a47630d62d0328de1957636dceb1d498f7048 However, the last 2.6.35 patch issued was based on 2.6.35.7, not 2.6.35.11, so there is still the possibility that something went wrong while you forward-ported this code. - Please check that mmu_context_nohash.c does contain the fix above as it should It is ok, I have the fix. Does 2.6.35.7-2.12-02 exhibit the issue as well? - Please try Richard's suggestion, i.e. moving to 2.6.36, which may give us more hints. It is better. I don't have the badness on mmu context anymore. This gives some hints ;). Yes and no. The mmu management code involved was untouched between 2.6.35 and 2.6.36, so I still don't get why this activity counter gets trashed. Badness at kernel/lockdep.c:2327 NIP: c006e554 LR: c006e53c CTR: 000186a0 Adeos sometimes conflicts with the vanilla IRQ state tracer. I'll have a look at this. Disable CONFIG_TRACE_IRQFLAGS. Yes, but I *want* to have CONFIG_TRACE_IRQFLAGS on. I just wanted to mention that I had the problem, in order to be sure it is known ;). Sure, but one issue at a time. JM -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [PowerPC]Badness at mmu_context_nohash
On Thu, 2011-04-28 at 10:33 +0200, Jean-Michel Hautbois wrote: 2011/4/27 Philippe Gerum r...@xenomai.org: On Wed, 2011-04-27 at 20:42 +0200, Jean-Michel Hautbois wrote: Hi list, I am currently using a Xenomai port on a linux 2.6.35.11 linux kernel and the adeos-ipipe-2.6.35.7-powerpc-2.12-01.patch. I am facing a scheduling issue on a P2020 (dual core PowerPC), and I get the following message : Badness at arch/powerpc/mm/mmu_context_nohash.c:209 NIP: c0018d20 LR: c039b94c CTR: c00343e4 REGS: ecfadce0 TRAP: 0700 Tainted: GW(2.6.35.11) MSR: 00021000 ME,CE CR: 24000488 XER: TASK = ec5220d0[496] 'sipaq' THREAD: ecfac000 CPU: 1 GPR00: 0001 ecfadd90 ec5220d0 ec5df340 ec58a700 0003 GPR08: c04a2d98 0007 c04a2d98 0067e000 0002f385 1007f1f8 c04a5b40 ecfac040 GPR16: c04a5b40 c04deb80 c04a2120 c04a2d98 c04a5b40 c04d008c ecfac000 00029000 GPR24: c04d c04d1e6c 0001 ec58a700 eceaf390 c04d1e78 c0b23b40 ec5df340 NIP [c0018d20] switch_mmu_context+0x80/0x438 LR [c039b94c] schedule+0x774/0x7dc Call Trace: [ecfadd90] [44000484] 0x44000484 (unreliable) [ecfadde0] [c039b94c] schedule+0x774/0x7dc [ecfade50] [c039cb98] do_nanosleep+0xc8/0x114 [ecfade80] [c0059bf8] hrtimer_nanosleep+0xd8/0x158 [ecfadf10] [c0059d48] sys_nanosleep+0xd0/0xd4 [ecfadf40] [c0013c0c] ret_from_syscall+0x0/0x3c --- Exception: c01 at 0xffa6cc4 LR = 0xffa6cb0 Instruction dump: 40a2fff0 4c00012c 2f80 409e0128 813b018c 2f83 39290001 913b018c 419e0020 8003018c 7c34 5400d97e 0f00 8123018c 3929 9123018c Do you have a clue on how to start debugging it ? Yes, but that can't be easily summarized here. In short, we have a serious problem with the sharing of the MMU context between the Linux and Xenomai schedulers in the SMP case on powerpc. OK, good to know that it is a known issue. If there is a thread with some thoughts about it, I am interested ;). It is happening quite randomly... :). Does disabling CONFIG_XENO_HW_UNLOCKED_SWITCH clear this issue? Well, yes and no. 
It starts well, but when booting the kernel I get: The mm switch issue was specifically addressed by this patch, which is part of 2.12-01: http://git.denx.de/?p=ipipe-2.6.git;a=commit;h=c14a47630d62d0328de1957636dceb1d498f7048 However, the last 2.6.35 patch issued was based on 2.6.35.7, not 2.6.35.11, so there is still the possibility that something went wrong while you forward-ported this code. - Please check that mmu_context_nohash.c does contain the fix above as it should - Please try Richard's suggestion, i.e. moving to 2.6.36, which may give us more hints. Badness at kernel/lockdep.c:2327 NIP: c006e554 LR: c006e53c CTR: 000186a0 Adeos sometimes conflicts with the vanilla IRQ state tracer. I'll have a look at this. Disable CONFIG_TRACE_IRQFLAGS. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [PowerPC]Badness at mmu_context_nohash
On Wed, 2011-04-27 at 20:42 +0200, Jean-Michel Hautbois wrote: Hi list, I am currently using a Xenomai port on a linux 2.6.35.11 linux kernel and the adeos-ipipe-2.6.35.7-powerpc-2.12-01.patch. I am facing a scheduling issue on a P2020 (dual core PowerPC), and I get the following message : Badness at arch/powerpc/mm/mmu_context_nohash.c:209 NIP: c0018d20 LR: c039b94c CTR: c00343e4 REGS: ecfadce0 TRAP: 0700 Tainted: GW(2.6.35.11) MSR: 00021000 ME,CE CR: 24000488 XER: TASK = ec5220d0[496] 'sipaq' THREAD: ecfac000 CPU: 1 GPR00: 0001 ecfadd90 ec5220d0 ec5df340 ec58a700 0003 GPR08: c04a2d98 0007 c04a2d98 0067e000 0002f385 1007f1f8 c04a5b40 ecfac040 GPR16: c04a5b40 c04deb80 c04a2120 c04a2d98 c04a5b40 c04d008c ecfac000 00029000 GPR24: c04d c04d1e6c 0001 ec58a700 eceaf390 c04d1e78 c0b23b40 ec5df340 NIP [c0018d20] switch_mmu_context+0x80/0x438 LR [c039b94c] schedule+0x774/0x7dc Call Trace: [ecfadd90] [44000484] 0x44000484 (unreliable) [ecfadde0] [c039b94c] schedule+0x774/0x7dc [ecfade50] [c039cb98] do_nanosleep+0xc8/0x114 [ecfade80] [c0059bf8] hrtimer_nanosleep+0xd8/0x158 [ecfadf10] [c0059d48] sys_nanosleep+0xd0/0xd4 [ecfadf40] [c0013c0c] ret_from_syscall+0x0/0x3c --- Exception: c01 at 0xffa6cc4 LR = 0xffa6cb0 Instruction dump: 40a2fff0 4c00012c 2f80 409e0128 813b018c 2f83 39290001 913b018c 419e0020 8003018c 7c34 5400d97e 0f00 8123018c 3929 9123018c Do you have a clue on how to start debugging it ? Yes, but that can't be easily summarized here. In short, we have a serious problem with the sharing of the MMU context between the Linux and Xenomai schedulers in the SMP case on powerpc. It is happening quite randomly... :). Does disabling CONFIG_XENO_HW_UNLOCKED_SWITCH clear this issue? Thanks in advance ! JM ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash
On Tue, 2011-04-19 at 09:26 +0200, Jesper Christensen wrote: If i run switchtest i get the following output: If still talking about the cpci6200, this patch should apply: http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=3d6fa118ef282c60dfeb0e690a579e8357bb7d13 [root@slot6 /bin]# switchtest == Testing FPU check routines... r0: 1 != 2 r1: 1 != 2 r2: 1 != 2 r3: 1 != 2 r4: 1 != 2 r5: 1 != 2 r6: 1 != 2 r7: 1 != 2 r8: 1 != 2 r9: 1 != 2 r10: 1 != 2 r11: 1 != 2 r12: 1 != 2 r13: 1 != 2 r14: 1 != 2 r15: 1 != 2 r16: 1 != 2 r17: 1 != 2 r18: 1 != 2 r19: 1 != 2 r20: 1 != 2 r21: 1 != 2 r22: 1 != 2 r23: 1 != 2 r24: 1 != 2 r25: 1 != 2 r26: 1 != 2 r27: 1 != 2 r28: 1 != 2 r29: 1 != 2 r30: 1 != 2 r31: 1 != 2 == FPU check routines: OK. == Threads: sleeper_ufps0-0 rtk0-1 rtk0-2 rtk_fp0-3 rtk_fp0-4 rtk_fp_ufpp0-5 rtk_fp_ufpp0-6 rtup0-7 rtup0-8 rtup_ufpp0-9 rtup_ufpp0-10 rtus0-11 rtus0-12 rtus_ufps0-13 rtus_ufps0-14 rtuo0-15 rtuo0-16 rtuo_ufpp0-17 rtuo_ufpp0-18 rtuo_ufps0-19 rtuo_ufps0-20 rtuo_ufpp_ufps0-21 rtuo_ufpp_ufps0-22 And then it halts. dmesg shows: Xenomai: suspending kernel thread ae819678 ('rtk5/0') at nip=0x80319aa0, lr=0x80319a70, r1=0xafa90510 after exception #1792 switchtest -n runs normally, should i use some sort of soft float flag in my compilations? /Jesper ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash
On Tue, 2011-04-19 at 09:58 +0200, Jesper Christensen wrote: Great thanks, but i can't help wondering if the problems i'm seeing are related to some of my userspace programs using fp. I don't think so. The switchtest programs exercises the FPU hardware in a certain way to make sure it is available in real-time mode from kernel space (which is an utterly crappy legacy, but we will have to deal with it until Xenomai 3.x). As far as I can see from your .config, you can't have such support, so switchtest was basically trying to test an inexistent feature. /Jesper On 2011-04-19 09:39, Philippe Gerum wrote: On Tue, 2011-04-19 at 09:26 +0200, Jesper Christensen wrote: If i run switchtest i get the following output: If still talking about the cpci6200, this patch should apply: http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=3d6fa118ef282c60dfeb0e690a579e8357bb7d13 [root@slot6 /bin]# switchtest == Testing FPU check routines... r0: 1 != 2 r1: 1 != 2 r2: 1 != 2 r3: 1 != 2 r4: 1 != 2 r5: 1 != 2 r6: 1 != 2 r7: 1 != 2 r8: 1 != 2 r9: 1 != 2 r10: 1 != 2 r11: 1 != 2 r12: 1 != 2 r13: 1 != 2 r14: 1 != 2 r15: 1 != 2 r16: 1 != 2 r17: 1 != 2 r18: 1 != 2 r19: 1 != 2 r20: 1 != 2 r21: 1 != 2 r22: 1 != 2 r23: 1 != 2 r24: 1 != 2 r25: 1 != 2 r26: 1 != 2 r27: 1 != 2 r28: 1 != 2 r29: 1 != 2 r30: 1 != 2 r31: 1 != 2 == FPU check routines: OK. == Threads: sleeper_ufps0-0 rtk0-1 rtk0-2 rtk_fp0-3 rtk_fp0-4 rtk_fp_ufpp0-5 rtk_fp_ufpp0-6 rtup0-7 rtup0-8 rtup_ufpp0-9 rtup_ufpp0-10 rtus0-11 rtus0-12 rtus_ufps0-13 rtus_ufps0-14 rtuo0-15 rtuo0-16 rtuo_ufpp0-17 rtuo_ufpp0-18 rtuo_ufps0-19 rtuo_ufps0-20 rtuo_ufpp_ufps0-21 rtuo_ufpp_ufps0-22 And then it halts. dmesg shows: Xenomai: suspending kernel thread ae819678 ('rtk5/0') at nip=0x80319aa0, lr=0x80319a70, r1=0xafa90510 after exception #1792 switchtest -n runs normally, should i use some sort of soft float flag in my compilations? /Jesper ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. 
___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash
On Tue, 2011-04-19 at 10:42 +0200, Gilles Chanteperdrix wrote: Philippe Gerum wrote: On Tue, 2011-04-19 at 09:58 +0200, Jesper Christensen wrote: Great thanks, but I can't help wondering if the problems I'm seeing are related to some of my userspace programs using FP. I don't think so. The switchtest program exercises the FPU hardware in a certain way to make sure it is available in real-time mode from kernel space (which is an utterly crappy legacy, but we will have to deal with it until Xenomai 3.x). As far as I can see from your .config, you can't have such support, so switchtest was basically trying to test a nonexistent feature. In fact, switchtest checks whether the Xenomai FPU switch routines work when the Linux kernel itself uses the FPU in kernel space. Currently, the only place where this happens is in the RAID code: x86 uses mmx/sse, and some PowerPCs use AltiVec. Some PowerPC kernels also fix up unaligned accesses to floating point data in kernel space; I do not know if this may interfere, which is why the powerpc code is compiled even without RAID. AFAICS, fp_regs_set() on ppc issues a load float instruction in kernel space which could be unaligned, and therefore trap. Looking at the .config for the target system, hw FPU support is disabled in the alignment code, so basically, this would beget a nop. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash
On Tue, 2011-04-19 at 11:29 +0200, Philippe Gerum wrote: On Tue, 2011-04-19 at 10:42 +0200, Gilles Chanteperdrix wrote: Philippe Gerum wrote: On Tue, 2011-04-19 at 09:58 +0200, Jesper Christensen wrote: Great thanks, but i can't help wondering if the problems i'm seeing are related to some of my userspace programs using fp. I don't think so. The switchtest programs exercises the FPU hardware in a certain way to make sure it is available in real-time mode from kernel space (which is an utterly crappy legacy, but we will have to deal with it until Xenomai 3.x). As far as I can see from your .config, you can't have such support, so switchtest was basically trying to test an inexistent feature. In fact, switchtest whether Xenomai FPU switch routines work when the Linux kernel itself uses FPU in kernel-space. Currently, the only place when this happens is in the RAID code: x86 uses mmx/sse, and some power pcs use altivec. Some powerpc also fix unaligned accesses to floating point data in kernel-space, I do not know if this may interfere, which is why the powerpc code is compiled even without RAID. AFAICS, fp_regs_set() on ppc is issuing a load float instruction in kernel space which could be unaligned, and therefore trap. Looking at the .config for the target system, hw FPU support is disabled in the alignment code, so basically, this would beget a nop. A nop in fixing the issue, I mean. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash - possible race condition?
On Thu, 2011-04-14 at 15:46 +0200, Jesper Christensen wrote: Actually I have been running with CONFIG_XENO_HW_UNLOCKED_SWITCH the whole time You mean enabled? and I also raised the stack size from 4k to 8k. I do however think there could be some fishiness in entry_32.S. In transfer_to_handler, SPRN_SPRG3 is used to check for stack overflow (at least in my kernel, 2.6.29.6), but I must admit I haven't seen any of that in the kernel log. Mmm, you are right. In any case, what we want with the unmasked switch feature is to allow interrupts while we flush the tlb and set the new mm context, which may be lengthy on some low-end platforms. Allowing the switch code to be preempted during the register swap is of no use wrt latency. Do you have a patch at hand which you could post that flips MSR_EE in rthal_thread_switch already? /Jesper On 2011-04-14 15:31, Philippe Gerum wrote: On Thu, 2011-04-14 at 15:04 +0200, Jesper Christensen wrote: I wrote about some problems concerning stack corruption when running xenomai on ppc. I have found out that if I disable hardware interrupts while running rthal_thread_switch the problem seems to disappear somewhat. I saw a crash yesterday after running for 3 hours, and I'm currently running a test (it has been running for 3 hours). Usually it would fail after 30-40 minutes. My question is: could there be a problem if we receive an interrupt between updating the stack pointer and the sprg3 register with the new thread pointer? Normally, there should not be any issue (famous last words), since we would run Xenomai-only code over the preempted context, and we don't depend on SPRG3 to fetch the current phys address. In fact, at this stage we simply don't care about the linux context, only referring to the current Xenomai thread, which is obtained differently. Try switching off CONFIG_XENO_HW_UNLOCKED_SWITCH in the machine config area; if this ends up being rock-solid, then this would be a hint that something may be fishy in this area.
Raising your k-thread stack sizes in a separate test may be interesting to check too, if not already done. /Jesper ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash
On Mon, 2011-04-11 at 16:18 +0200, Philippe Gerum wrote: On Mon, 2011-04-11 at 16:13 +0200, Jesper Christensen wrote: I have updated to xenomai 2.5.6, but i'm still seeing exceptions (considerably less often though): Xenomai: suspending kernel thread b92a39d0 ('tt_upgw_0') at 0xb92a39d0 after exception #1792 You should build your code statically into the kernel, not as a module, and find out which code raises the MCE. It's a program check exception, not a machine check, but the rest remains applicable. CONFIG_DEBUG_INFO=y, then objdump -dl vmlinux, looking for the NIP mentioned. /Jesper On 2011-04-08 15:12, Philippe Gerum wrote: On Fri, 2011-04-08 at 14:58 +0200, Jesper Christensen wrote: Hi I'm trying to implement some gateway functionality in the kernel on a emerson CPCI6200 board, but have run into some strange errors. The kernel module is made up of two threads that run every 1 ms. I have also made use of the rtpc dispatcher in rtnet to dispatch control messages from a netlink socket to the RT part of my kernel module. The problem is that when loaded the threads get suspended due to exceptions: Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0xb929cbc0 after exception #1792 or Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0x0 after exception #1025 or Xenomai: suspending kernel thread b911f518 ('rtnet-rtpc') at 0xb911f940 after exception #1792 I have ported the gianfar driver from linux to rtnet. The versions and hardware are listed below. The errors are most likely due to faulty software on my part, but i would like to ask if there are any known issues with the versions or hardware i'm using. I would also like to ask if there are any ways of further debugging the errors as i am not getting very far with the above messages. A severe bug at kthread init was fixed in the 2.5.5.2 - 2.5.6 timeframe, which would cause exactly the kind of weird behavior you are seeing right now. 
The bug triggered random code execution due to stack memory pollution at init on powerpc for Xenomai kthreads: http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=90699565cbce41f2cec193d57857bb5817efc19a http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=da20c20d4b4d892d40c657ad1d32ddb6d0ceb47c http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=a5886b354dc18f054b187b58cfbacfb60bccaf47 You need at the very least those three patches (from the top of my head), but it would be much better to upgrade to 2.5.6. System info: Linux kernel: 2.6.29.6 i-pipe version: 2.7-04 processor: powerpc mpc8572 xenomai version: 2.5.3 rtnet version: 0.9.12 ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash
On Mon, 2011-04-11 at 16:20 +0200, Jesper Christensen wrote: Problem is the NIP in question is the address of the thread structure as seen in the error message. LR? /Jesper On 2011-04-11 16:18, Philippe Gerum wrote: On Mon, 2011-04-11 at 16:13 +0200, Jesper Christensen wrote: I have updated to xenomai 2.5.6, but i'm still seeing exceptions (considerably less often though): Xenomai: suspending kernel thread b92a39d0 ('tt_upgw_0') at 0xb92a39d0 after exception #1792 You should build your code statically into the kernel, not as a module, and find out which code raises the MCE. CONFIG_DEBUG_INFO=y, then objdump -dl vmlinux, looking for the NIP mentioned. /Jesper On 2011-04-08 15:12, Philippe Gerum wrote: On Fri, 2011-04-08 at 14:58 +0200, Jesper Christensen wrote: Hi I'm trying to implement some gateway functionality in the kernel on a emerson CPCI6200 board, but have run into some strange errors. The kernel module is made up of two threads that run every 1 ms. I have also made use of the rtpc dispatcher in rtnet to dispatch control messages from a netlink socket to the RT part of my kernel module. The problem is that when loaded the threads get suspended due to exceptions: Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0xb929cbc0 after exception #1792 or Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0x0 after exception #1025 or Xenomai: suspending kernel thread b911f518 ('rtnet-rtpc') at 0xb911f940 after exception #1792 I have ported the gianfar driver from linux to rtnet. The versions and hardware are listed below. The errors are most likely due to faulty software on my part, but i would like to ask if there are any known issues with the versions or hardware i'm using. I would also like to ask if there are any ways of further debugging the errors as i am not getting very far with the above messages. 
A severe bug at kthread init was fixed in the 2.5.5.2 - 2.5.6 timeframe, which would cause exactly the kind of weird behavior you are seeing right now. The bug triggered random code execution due to stack memory pollution at init on powerpc for Xenomai kthreads: http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=90699565cbce41f2cec193d57857bb5817efc19a http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=da20c20d4b4d892d40c657ad1d32ddb6d0ceb47c http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=a5886b354dc18f054b187b58cfbacfb60bccaf47 You need at the very least those three patches (from the top of my head), but it would be much better to upgrade to 2.5.6. System info: Linux kernel: 2.6.29.6 i-pipe version: 2.7-04 processor: powerpc mpc8572 xenomai version: 2.5.3 rtnet version: 0.9.12 ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash
On Mon, 2011-04-11 at 16:20 +0200, Jesper Christensen wrote: Problem is the NIP in question is the address of the thread structure as seen in the error message. Is your code spawning -rt kernel threads frequently/periodically, or only when the application initializes? /Jesper On 2011-04-11 16:18, Philippe Gerum wrote: On Mon, 2011-04-11 at 16:13 +0200, Jesper Christensen wrote: I have updated to xenomai 2.5.6, but i'm still seeing exceptions (considerably less often though): Xenomai: suspending kernel thread b92a39d0 ('tt_upgw_0') at 0xb92a39d0 after exception #1792 You should build your code statically into the kernel, not as a module, and find out which code raises the MCE. CONFIG_DEBUG_INFO=y, then objdump -dl vmlinux, looking for the NIP mentioned. /Jesper On 2011-04-08 15:12, Philippe Gerum wrote: On Fri, 2011-04-08 at 14:58 +0200, Jesper Christensen wrote: Hi I'm trying to implement some gateway functionality in the kernel on a emerson CPCI6200 board, but have run into some strange errors. The kernel module is made up of two threads that run every 1 ms. I have also made use of the rtpc dispatcher in rtnet to dispatch control messages from a netlink socket to the RT part of my kernel module. The problem is that when loaded the threads get suspended due to exceptions: Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0xb929cbc0 after exception #1792 or Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0x0 after exception #1025 or Xenomai: suspending kernel thread b911f518 ('rtnet-rtpc') at 0xb911f940 after exception #1792 I have ported the gianfar driver from linux to rtnet. The versions and hardware are listed below. The errors are most likely due to faulty software on my part, but i would like to ask if there are any known issues with the versions or hardware i'm using. I would also like to ask if there are any ways of further debugging the errors as i am not getting very far with the above messages. 
A severe bug at kthread init was fixed in the 2.5.5.2 - 2.5.6 timeframe, which would cause exactly the kind of weird behavior you are seeing right now. The bug triggered random code execution due to stack memory pollution at init on powerpc for Xenomai kthreads: http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=90699565cbce41f2cec193d57857bb5817efc19a http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=da20c20d4b4d892d40c657ad1d32ddb6d0ceb47c http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=a5886b354dc18f054b187b58cfbacfb60bccaf47 You need at the very least those three patches (from the top of my head), but it would be much better to upgrade to 2.5.6. System info: Linux kernel: 2.6.29.6 i-pipe version: 2.7-04 processor: powerpc mpc8572 xenomai version: 2.5.3 rtnet version: 0.9.12 ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] kernel threads crash
On Fri, 2011-04-08 at 14:58 +0200, Jesper Christensen wrote: Hi, I'm trying to implement some gateway functionality in the kernel on an Emerson CPCI6200 board, but have run into some strange errors. The kernel module is made up of two threads that run every 1 ms. I have also made use of the rtpc dispatcher in rtnet to dispatch control messages from a netlink socket to the RT part of my kernel module. The problem is that, when loaded, the threads get suspended due to exceptions:

Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0xb929cbc0 after exception #1792
or
Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0x0 after exception #1025
or
Xenomai: suspending kernel thread b911f518 ('rtnet-rtpc') at 0xb911f940 after exception #1792

I have ported the gianfar driver from Linux to rtnet. The versions and hardware are listed below. The errors are most likely due to faulty software on my part, but I would like to ask if there are any known issues with the versions or hardware I'm using. I would also like to ask if there are any ways of further debugging the errors, as I am not getting very far with the above messages. A severe bug at kthread init was fixed in the 2.5.5.2 - 2.5.6 timeframe, which would cause exactly the kind of weird behavior you are seeing right now. The bug triggered random code execution due to stack memory pollution at init on powerpc for Xenomai kthreads:

http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=90699565cbce41f2cec193d57857bb5817efc19a
http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=da20c20d4b4d892d40c657ad1d32ddb6d0ceb47c
http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=a5886b354dc18f054b187b58cfbacfb60bccaf47

You need at the very least those three patches (from the top of my head), but it would be much better to upgrade to 2.5.6.

System info:
Linux kernel: 2.6.29.6
i-pipe version: 2.7-04
processor: powerpc mpc8572
xenomai version: 2.5.3
rtnet version: 0.9.12

-- Philippe.
Re: [Xenomai-core] kernel threads crash
On Fri, 2011-04-08 at 15:20 +0200, Jesper Christensen wrote: Thanks, I'll give 2.5.6 a shot. Also it has come to my attention that there are some source files (arch/powerpc/platforms/85xx/cpci6200.c, arch/powerpc/platforms/85xx/cpci6200.h, arch/powerpc/platforms/85xx/cpci6200_timer.c) that are probably not covered by the adeos patch. Am I correct in assuming these need some work to support i-pipe? I can't tell since I have no access to them; this is probably not a mainline port. In any case, if any of those files implements the support for the programmable interrupt controller, hw timer, gpios and/or any form of cascaded interrupt handling, this is correct: they should be made I-pipe aware. /Jesper On 2011-04-08 15:12, Philippe Gerum wrote: On Fri, 2011-04-08 at 14:58 +0200, Jesper Christensen wrote: Hi, I'm trying to implement some gateway functionality in the kernel on an Emerson CPCI6200 board, but have run into some strange errors. The kernel module is made up of two threads that run every 1 ms. I have also made use of the rtpc dispatcher in rtnet to dispatch control messages from a netlink socket to the RT part of my kernel module. The problem is that, when loaded, the threads get suspended due to exceptions:

Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0xb929cbc0 after exception #1792
or
Xenomai: suspending kernel thread b929cbc0 ('tt_upgw_0') at 0x0 after exception #1025
or
Xenomai: suspending kernel thread b911f518 ('rtnet-rtpc') at 0xb911f940 after exception #1792

I have ported the gianfar driver from Linux to rtnet. The versions and hardware are listed below. The errors are most likely due to faulty software on my part, but I would like to ask if there are any known issues with the versions or hardware I'm using. I would also like to ask if there are any ways of further debugging the errors, as I am not getting very far with the above messages.
A severe bug at kthread init was fixed in the 2.5.5.2 - 2.5.6 timeframe, which would cause exactly the kind of weird behavior you are seeing right now. The bug triggered random code execution due to stack memory pollution at init on powerpc for Xenomai kthreads: http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=90699565cbce41f2cec193d57857bb5817efc19a http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=da20c20d4b4d892d40c657ad1d32ddb6d0ceb47c http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=a5886b354dc18f054b187b58cfbacfb60bccaf47 You need at the very least those three patches (from the top of my head), but it would be much better to upgrade to 2.5.6. System info: Linux kernel: 2.6.29.6 i-pipe version: 2.7-04 processor: powerpc mpc8572 xenomai version: 2.5.3 rtnet version: 0.9.12 -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Would Xenomai adopt RTL after the patent is expired?
On Wed, 2011-04-06 at 09:19 +0800, arethe.rtai wrote: HI: The patents that cover RT-Linux are set to expire in a few years, then, would Xenomai adopt the RTL technology? As known, the RTL idea is clean and minimalistic, it may improve the determinism of Xenomai. The trend is rather to blur the distinction between native real-time and dual kernel approaches these days, not to downgrade to a kernel-only interrupt handler with co-routines on top. So no, there would be no rational reason to do that, not to mention the fact that if we can assess the typical latency of Xenomai over the seven architectures it runs on with the latest mainline kernels, we would be unable to compare this to anything else than x86 over a legacy kernel AFAIK. And no, I don't think that I'm going to send an inquiry to WRS for information regarding how RTL performs on other architectures. Incidentally, maybe you should ask yourself why they ship WR-Linux with PREEMPT_RT. Regards arethe 2011-04-06 __ arethe.rtai ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Backfire: User - Kernel latancy mesurement tool on Xenomai
On Wed, 2011-04-06 at 19:31 +0100, krishna murthy j s wrote: Thanks for the reply. Can you please tell me why my questions are useless? Such an attitude will take the Xenomai user group nowhere. It is often better to go nowhere than to go to the wrong place. Anyway, some reasons for the flak you received could be:

- no specification of your target hardware. I understand there is a long-standing trend to throw results into the latency debate without a single bit of information regarding the hardware configuration under test, the actual code being used (and not a vague description of what it eventually does), how it has been changed and how it has been used, but well, we are old-fashioned folks: we do prefer facts. Besides, we do think there is life beyond x86, so it is always better to be specific in this area when sending us inquiries.

- backfire comes from the PREEMPT_RT test suite. As such, it does not care about any dual kernel issues. We do, when writing an application. So, unless you also wrote an RTDM driver to replace the original backfire driver, what you are testing is actually plain vanilla Linux, with the additional overhead of moving your task back and forth between the Xenomai scheduler and the Linux scheduler at a high rate. If so, no wonder you get some extra latency with Xenomai. It's a bit like driving on a racetrack paved with speed bumps.

- measuring the latency of Linux signal delivery like backfire does is interestingly totally off-base wrt Xenomai, because Linux signals are delivered to Xenomai tasks ... in Linux mode (yes, we have runtime modes like dual kernel systems may have). So, no real-time here either. Since Xenomai does not implement signal delivery in real-time mode yet, what you are testing still remains a mystery. But maybe you could explain better?

To sum up, each RT enabler comes with a test suite which has been written carefully to illustrate a particular behavior or performance aspect, and Xenomai follows this common rule.
Before issuing any claims, maybe you could have posted your code, a detailed description of your setup, and your test scenario. Asking people to reverse-engineer what you might have done, based on a couple of loose details placed side-by-side with strong claims and conclusions, is not the best way to draw attention. So, don't take what was said earlier personally. It is just that sometimes, people may have tuned their bullshit deflector a bit eagerly. Mine is totally busted btw, so you never know. On Wed, Apr 6, 2011 at 6:56 PM, Gilles Chanteperdrix gilles.chanteperd...@xenomai.org wrote: krishna m wrote: I ported the backfire tool on the OSADL site [https://www.osadl.org/backfire-4.backfire.0.html] to measure the user to/from kernel latency. I wanted to measure the difference between the RT_PREEMPT kernel and the Xenomai kernel. Surprisingly, I see RT_PREEMPT performing better than Xenomai. Here are a few points to note: 1. The thread priority of the sendme tool of backfire in RT_PREEMPT is 99 [highest] 2. I have made the thread priority 99 for the rt_task that I spawn [part of the ported sendme]: ret = rt_task_shadow(rt_task_desc, NULL, 99, 0); My questions: * I wanted to know if anyone has done such measurements using backfire, and how does Xenomai fare against RT_PREEMPT? * Is there any similar tool like backfire in the Xenomai tool set that does similar measurements? * Do I need to do more Xenomai-specific optimization in the sendme and backfire code to get better performance? Useless notes, useless questions. Show us the ported code. -- Gilles. -- Philippe.
Re: [Xenomai-core] Bug in Linux kernel 2.6.37 blocks xenomai threads
On Mon, 2011-04-04 at 16:41 +0200, Sebastian Smolorz wrote: Hi, there is a bug in kernel 2.6.37 (fixed in 2.6.37.1, see commit 1cdc65e1400d863f28af868ee1e645485b04f5ed) which blocks RT threads during creation. They stick to a certain CPU core for a certain amount of time (sometimes minutes ...) before they are migrated to the proper core and run as expected. Philippe, Gilles, maybe you could generate a new i-pipe patch based on the newest 2.6.37-series kernel. I patched a 2.6.37.6 kernel with adeos- ipipe-2.6.37-x86-2.9-00.patch and the problem was gone. Ok, I'll handle this. I have some patches from Jan which have been pending for too long in my tree to add to this one. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Bug in Linux kernel 2.6.37 blocks xenomai threads
On Mon, 2011-04-04 at 16:56 +0200, Gilles Chanteperdrix wrote: Philippe Gerum wrote: On Mon, 2011-04-04 at 16:41 +0200, Sebastian Smolorz wrote: Hi, there is a bug in kernel 2.6.37 (fixed in 2.6.37.1, see commit 1cdc65e1400d863f28af868ee1e645485b04f5ed) which blocks RT threads during creation. They stick to a certain CPU core for a certain amount of time (sometimes minutes ...) before they are migrated to the proper core and run as expected. Philippe, Gilles, maybe you could generate a new i-pipe patch based on the newest 2.6.37-series kernel. I patched a 2.6.37.6 kernel with adeos- ipipe-2.6.37-x86-2.9-00.patch and the problem was gone. Ok, I'll handle this. I have some patches from Jan which have been pending for too long in my tree to add to this one. Since you are at it, could you have a look at: https://mail.gna.org/public/adeos-main/2011-04/msg1.html Ok, queued. Thanks. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Whether can Xenomai bypass the cache?
On Sun, 2011-03-06 at 14:05 +0800, arethe rtai wrote: Hello: As known, cache can accelerate the memory access, but unfortunately, it would decrease the predictability of real-time tasks' temporal behaviour. Many tasks of our application prefer predictability to the speed of execution. Intel's processors after P6 include MTRR and PAT, both the two units can be used to bypass the cache. I wonder whether Xenomai can bypass the cache, and whether Xenomai can manage the MTRR or PAT? No. If the answers are true, how to use the function? 3x arethe ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Is anybody using the pSOS skin in userland?
On Wed, 2010-11-03 at 22:42 +0100, ronny meeus wrote: Hello, we are investigating the usage of the pSOS+ skin to port a large legacy pSOS application to Linux. The application model consists of several processes in which the application lives. All processes will make use of the pSOS library. After playing around with the library for some time we have observed several missing service calls, bugs and differences in behaviour compared to a real pSOS implementation:

- missing sm_ident
- missing t_getreg / t_setreg in userland (patch already included in 2.5.5)
- not possible to use the skin from the context of different processes (patch already included in 2.5.5)
- added support for identical task/queue/semaphore/region names by making names unique.
- strange behaviour in the pSOS message queue (see post Possible memory leak in psos skin message queue handling).

I can (and will) deliver patches for all issues I have found, but I'm wondering whether there are other people using the pSOS skin (in userland) in a real-life application. The target for my project would be an embedded system with strong reliability requirements (very stable / long running etc). Any feedback is welcome and appreciated. It is not clear to me either which tests are executed before a new version is released. T-e-s-t? What's this? We are proud to deliver the greatest uncertainty, where the deepest fears about upgrading may turn into the highest hopes. And vice-versa. Is there any test-suite available for the pSOS skin? This is a good start, used to validate the Xenomai SOLO implementation:

http://git.denx.de/?p=xenomai-solo.git;a=tree;f=psos/testsuite;h=54411570e19dec40e14a1226084024c05c0f3e53;hb=ee9c11895ac7cf2d72b1158a4836a4465f478a0b

This needs to be slightly adapted to run over the current Xenomai 2.x architecture, but the test logic of course is the same. Best regards, Ronny -- Philippe.
Re: [Xenomai-core] Is anybody using the pSOS skin in userland?
On Wed, 2010-11-03 at 22:42 +0100, ronny meeus wrote: Hello, we are investigating the usage of the pSOS+ skin to port a large legacy pSOS application to Linux. The application model consists of several processes in which the application lives. All processes will make use of the pSOS library. After playing around with the library for some time we have observed several missing service calls, bugs and differences in behaviour compared to a real pSOS implementation:

- missing sm_ident
http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=26e916ecc3f8b71cd8ce4c4194555ee0cc4aa018
- missing t_getreg / t_setreg in userland (patch already included in 2.5.5)
- not possible to use the skin from the context of different processes (patch already included in 2.5.5)
- added support for identical task/queue/semaphore/region names by making names unique.
- strange behaviour in the pSOS message queue (see post Possible memory leak in psos skin message queue handling).

I can (and will) deliver patches for all issues I have found, but I'm wondering whether there are other people using the pSOS skin (in userland) in a real-life application. The target for my project would be an embedded system with strong reliability requirements (very stable / long running etc). Any feedback is welcome and appreciated. It is not clear to me either which tests are executed before a new version is released. Is there any test-suite available for the pSOS skin? Best regards, Ronny -- Philippe.
Re: [Xenomai-core] Potential problem with rt_eepro100
On Sun, 2010-11-07 at 02:00 +0100, Jan Kiszka wrote: Am 06.11.2010 23:49, Philippe Gerum wrote: On Sat, 2010-11-06 at 21:37 +0100, Gilles Chanteperdrix wrote: Anders Blomdell wrote: Gilles Chanteperdrix wrote: Anders Blomdell wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 05.11.2010 00:24, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 04.11.2010 23:06, Gilles Chanteperdrix wrote: Jan Kiszka wrote: At first sight, here you are more breaking things than cleaning them. Still, it has the SMP record for my test program, still runs with ftrace on (after 2 hours, where it previously failed after maximum 23 minutes). My version was indeed still buggy, I'm reworking it ATM. If I get the gist of Jan's changes, they are (using the IPI to transfer one bit of information: your cpu needs to reschedule): xnsched_set_resched: - setbits((__sched__)-status, XNRESCHED); xnpod_schedule_handler: +xnsched_set_resched(sched); If you (we?) decide to keep the debug checks, under what circumstances would the current check trigger (in laymans language, that I'll be able to understand)? That's actually what /me is wondering as well. I do not see yet how you can reliably detect a missed reschedule reliably (that was the purpose of the debug check) given the racy nature between signaling resched and processing the resched hints. The purpose of the debugging change is to detect a change of the scheduler state which was not followed by setting the XNRESCHED bit. But that is nucleus business, nothing skins can screw up (as long as they do not misuse APIs). Yes, but it happens that we modify the nucleus from time to time. Getting it to work is relatively simple: we add a scheduler change set remotely bit to the sched structure which is NOT in the status bit, set this bit when changing a remote sched (under nklock). In the debug check code, if the scheduler state changed, and the XNRESCHED bit is not set, only consider this a but if this new bit is not set. 
All this is compiled out if the debug is not enabled. I still see no benefit in this check. Where do you want to place the bit set? Aren't those just the same locations where xnsched_set_[self_]resched already is today? Well no, that would be another bit in the sched structure which would allow us to manipulate the status bits from the local cpu. That supplementary bit would only be changed from a distant CPU, and serve to detect the race which causes the false positive. The resched bits are set on the local cpu to get xnpod_schedule to trigger a rescheduling on the distant cpu. That bit would be set on the remote cpu's sched. Only when debugging is enabled. But maybe you can provide some motivating bug scenarios, real ones of the past or realistic ones of the future. Of course. The bug is anything which changes the scheduler state but does not set the XNRESCHED bit. This happened when we started the SMP port. New scheduling policies would be good candidates for a revival of this bug. You don't gain any worthwhile check if you cannot make the instrumentation required for a stable detection simpler than the proper problem solution itself. And this is what I'm still skeptical of. The solution is simple, but finding the problem without the instrumentation is way harder than with the instrumentation, so the instrumentation is worth something. Reproducing the false positive is surprisingly easy with a simple dual-cpu semaphore ping-pong test.
So, here is the (tested) patch, using a ridiculously long variable name to illustrate what I was thinking about:

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index cf4..454b8e8 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -108,6 +108,9 @@ typedef struct xnsched {
 	struct xnthread *gktarget;
 #endif
+#ifdef CONFIG_XENO_OPT_DEBUG_NUCLEUS
+	int debug_resched_from_remote;
+#endif
 } xnsched_t;

 union xnsched_policy_param;
@@ -185,6 +188,8 @@ static inline int xnsched_resched_p(struct xnsched *sched)
 	xnsched_t *current_sched = xnpod_current_sched();	\
 	__setbits(current_sched->status, XNRESCHED);		\
 	if (current_sched != (__sched__)) {			\
+		if (XENO_DEBUG(NUCLEUS))			\
+			__sched__->debug_resched_from_remote = 1; \
 		xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \
 	}							\
 } while (0)
diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 4cb707a..50b0f49 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -2177,6 +2177,10 @@ static
Re: [Xenomai-core] Potential problem with rt_eepro100
On Sun, 2010-11-07 at 09:31 +0100, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Anyway, after some thoughts, I think we are going to try and make the current situation work instead of going back to the old way. You can find the patch which attempts to do so here: http://sisyphus.hd.free.fr/~gilles/sched_status.txt Ack. At last, this addresses the real issues without asking for regression funkiness: fix the lack of barrier before testing XNSCHED in the xnpod_schedule pre-test, and stop sched->status trashing due to XNINIRQ/XNHTICK/XNRPICK ops done un-synced on nklock. In short, this patch looks like moving the local-only flags where they belong, i.e. anywhere you want but *outside* of the status with remotely accessed bits. Check the kernel, we actually need it on both sides. Wherever the final barriers will be, we should leave a comment behind why they are there. Could be picked up from kernel/smp.c. We have it on both sides: the non-local flags are modified while holding the nklock. Unlocking the nklock implies a barrier. I think we may have an issue with this kind of construct:

xnlock_get_irq*(nklock)
    xnpod_resume/suspend/whatever_thread()
        xnlock_get_irq*(nklock)
        ...
        xnlock_put_irq*(nklock)
    xnpod_schedule()
        xnlock_get_irq*(nklock)
        send_ipi = xnpod_schedule_handler on dest CPU
        xnlock_put_irq*(nklock)
xnlock_put_irq*(nklock)

The issue would be triggered by the use of recursive locking. In that case, the source CPU would only sync its cache when the lock is actually dropped by the outer xnlock_put_irq* call and the inner xnlock_get/put_irq* would not act as barriers, so the remote rescheduling handler won't always see the XNSCHED update done remotely, and may lead to a no-op. So we need a barrier before sending the IPI in __xnpod_test_resched(). This could not happen if all schedule state changes were clearly isolated from rescheduling calls in different critical sections, but it's sometimes not an option not to group them for consistency reasons.
XNRPICK seems to be handled differently, but it makes sense to group it with other RPI data as you did, so fine with me. I just hope we finally converge on a solution. Looks like all possibilities have been explored now. A few more comments on this one: It probably makes sense to group the status bits accordingly (both their values and definitions) and briefly document on which status field they are supposed to be applied. Ok, but I wanted them to not use the same values, so that we can use the sched->status | sched->lstatus trick in xnpod_schedule. Something is lacking too: we probably need to use sched->status | sched->lstatus for display in /proc. I do not understand the split logic - or some bits are simply not yet migrated. XNHDEFER, XNSWLOCK, XNKCOUT are all local-only as well, no? Then better put them in the _local_ status field, that's more consistent (and would help if we once wanted to optimize their cache line usage). Maybe the naming is not good then. ->status is everything which is modified under nklock, ->lstatus is for XNINIRQ and XNHTICK, which are modified without holding the nklock. The naming is unfortunate: status vs. lstatus. This is asking for confusion and typos. They must be better distinguishable, e.g. local_status. Or we need accessors that have debug checks built in, catching wrong bits for their target fields. I agree. Good catch of the RPI breakage, Gilles! -- Philippe.
Re: [Xenomai-core] Potential problem with rt_eepro100
On Sun, 2010-11-07 at 11:14 +0100, Jan Kiszka wrote: Am 07.11.2010 11:12, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 07.11.2010 11:03, Philippe Gerum wrote: On Sun, 2010-11-07 at 09:31 +0100, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Anyway, after some thoughts, I think we are going to try and make the current situation work instead of going back to the old way. You can find the patch which attempts to do so here: http://sisyphus.hd.free.fr/~gilles/sched_status.txt Ack. At last, this addresses the real issues without asking for regression funkiness: fix the lack of barrier before testing XNSCHED in Check the kernel, we actually need it on both sides. Wherever the final barriers will be, we should leave a comment behind why they are there. Could be picked up from kernel/smp.c. We have it on both sides: the non-local flags are modified while holding the nklock. Unlocking the nklock implies a barrier. I think we may have an issue with this kind of construct: xnlock_get_irq*(nklock) xnpod_resume/suspend/whatever_thread() xnlock_get_irq*(nklock) ... xnlock_put_irq*(nklock) xnpod_schedule() xnlock_get_irq*(nklock) send_ipi = xnpod_schedule_handler on dest CPU xnlock_put_irq*(nklock) xnlock_put_irq*(nklock) The issue would be triggered by the use of recursive locking. In that case, the source CPU would only sync its cache when the lock is actually dropped by the outer xnlock_put_irq* call and the inner xnlock_get/put_irq* would not act as barriers, so the remote rescheduling handler won't always see the XNSCHED update done remotely, and may lead to a no-op. So we need a barrier before sending the IPI in __xnpod_test_resched(). That's what I said. And we need it on the reader side as an rmb(). This one we have, in xnpod_schedule_handler. Right, with your patch (the above sounded like we only need it on writer side). C'mon... -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Potential problem with rt_eepro100
On Sat, 2010-11-06 at 21:37 +0100, Gilles Chanteperdrix wrote: Anders Blomdell wrote: Gilles Chanteperdrix wrote: Anders Blomdell wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 05.11.2010 00:24, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 04.11.2010 23:06, Gilles Chanteperdrix wrote: Jan Kiszka wrote: At first sight, here you are more breaking things than cleaning them. Still, it has the SMP record for my test program, still runs with ftrace on (after 2 hours, where it previously failed after maximum 23 minutes). My version was indeed still buggy, I'm reworking it ATM. If I get the gist of Jan's changes, they are (using the IPI to transfer one bit of information: your cpu needs to reschedule): xnsched_set_resched: - setbits((__sched__)-status, XNRESCHED); xnpod_schedule_handler: + xnsched_set_resched(sched); If you (we?) decide to keep the debug checks, under what circumstances would the current check trigger (in laymans language, that I'll be able to understand)? That's actually what /me is wondering as well. I do not see yet how you can reliably detect a missed reschedule reliably (that was the purpose of the debug check) given the racy nature between signaling resched and processing the resched hints. The purpose of the debugging change is to detect a change of the scheduler state which was not followed by setting the XNRESCHED bit. But that is nucleus business, nothing skins can screw up (as long as they do not misuse APIs). Yes, but it happens that we modify the nucleus from time to time. Getting it to work is relatively simple: we add a scheduler change set remotely bit to the sched structure which is NOT in the status bit, set this bit when changing a remote sched (under nklock). In the debug check code, if the scheduler state changed, and the XNRESCHED bit is not set, only consider this a but if this new bit is not set. All this is compiled out if the debug is not enabled. I still see no benefit in this check. 
Where do you want to place the bit set? Aren't those just the same locations where xnsched_set_[self_]resched already is today? Well no, that would be another bit in the sched structure which would allow us to manipulate the status bits from the local cpu. That supplementary bit would only be changed from a distant CPU, and serve to detect the race which causes the false positive. The resched bits are set on the local cpu to get xnpod_schedule to trigger a rescheduling on the distant cpu. That bit would be set on the remote cpu's sched. Only when debugging is enabled. But maybe you can provide some motivating bug scenarios, real ones of the past or realistic ones of the future. Of course. The bug is anything which changes the scheduler state but does not set the XNRESCHED bit. This happened when we started the SMP port. New scheduling policies would be good candidates for a revival of this bug. You don't gain any worthwhile check if you cannot make the instrumentation required for a stable detection simpler than the proper problem solution itself. And this is what I'm still skeptical of. The solution is simple, but finding the problem without the instrumentation is way harder than with the instrumentation, so the instrumentation is worth something. Reproducing the false positive is surprisingly easy with a simple dual-cpu semaphore ping-pong test.
So, here is the (tested) patch, using a ridiculously long variable name to illustrate what I was thinking about:

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index cf4..454b8e8 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -108,6 +108,9 @@ typedef struct xnsched {
 	struct xnthread *gktarget;
 #endif
+#ifdef CONFIG_XENO_OPT_DEBUG_NUCLEUS
+	int debug_resched_from_remote;
+#endif
 } xnsched_t;

 union xnsched_policy_param;
@@ -185,6 +188,8 @@ static inline int xnsched_resched_p(struct xnsched *sched)
 	xnsched_t *current_sched = xnpod_current_sched();	\
 	__setbits(current_sched->status, XNRESCHED);		\
 	if (current_sched != (__sched__)) {			\
+		if (XENO_DEBUG(NUCLEUS))			\
+			__sched__->debug_resched_from_remote = 1; \
 		xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \
 	}							\
 } while (0)
diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 4cb707a..50b0f49 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -2177,6 +2177,10 @@ static inline int __xnpod_test_resched(struct xnsched *sched)
 		xnarch_cpus_clear(sched->resched);
 	}
 #endif
+	if
Re: [Xenomai-core] Potential problem with rt_eepro100
On Wed, 2010-11-03 at 20:38 +0100, Anders Blomdell wrote: Jan Kiszka wrote: Am 03.11.2010 17:46, Anders Blomdell wrote: Anders Blomdell wrote: Anders Blomdell wrote: Jan Kiszka wrote: additional barrier. Can you check this?

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index df56417..66b52ad 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -187,6 +187,7 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
 	if (current_sched != (__sched__)) {			\
 		xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \
 		setbits((__sched__)->status, XNRESCHED);	\
+		xnarch_memory_barrier();			\
 	}							\
 } while (0)

In progress, if nothing breaks before, I'll report status tomorrow morning. It still breaks (in approximately the same way). I'm currently putting a barrier in the other macro doing a RESCHED, also adding some tracing to see if a read barrier is needed. Nope, no luck there either. Will start interesting tracepoint adding/conversion :-( Strange. But it was too easy anyway... Any reason why xn_nucleus_sched_remote should ever report status = 0? Really don't know yet. You could trigger on this state and call ftrace_stop() then. Provided you had the function tracer enabled, that should give a nice picture of what happened before. Isn't there a race between these two (still waiting for compilation to be finished)? We always hold the nklock in both contexts.

static inline int __xnpod_test_resched(struct xnsched *sched)
{
	int resched = testbits(sched->status, XNRESCHED);
#ifdef CONFIG_SMP
	/* Send resched IPI to remote CPU(s). */
	if (unlikely(xnsched_resched_p(sched))) {
		xnarch_send_ipi(sched->resched);
		xnarch_cpus_clear(sched->resched);
	}
#endif
	clrbits(sched->status, XNRESCHED);
	return resched;
}

#define xnsched_set_resched(__sched__) do {				\
	xnsched_t *current_sched = xnpod_current_sched();		\
	setbits(current_sched->status, XNRESCHED);			\
	if (current_sched != (__sched__)) {				\
		xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \
		setbits((__sched__)->status, XNRESCHED);		\
		xnarch_memory_barrier();				\
	}								\
} while (0)

I would suggest (if I have got all the macros right):

static inline int __xnpod_test_resched(struct xnsched *sched)
{
	int resched = testbits(sched->status, XNRESCHED);

	if (unlikely(resched)) {
#ifdef CONFIG_SMP
		/* Send resched IPI to remote CPU(s). */
		xnarch_send_ipi(sched->resched);
		xnarch_cpus_clear(sched->resched);
#endif
		clrbits(sched->status, XNRESCHED);
	}
	return resched;
}

/Anders -- Philippe.
Re: [Xenomai-core] arm: Unprotected access to irq_desc field?
On Fri, 2010-10-29 at 09:00 +0200, Jan Kiszka wrote:

Am 28.10.2010 21:34, Philippe Gerum wrote:
On Thu, 2010-10-28 at 21:15 +0200, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Gilles, I happened to come across rthal_mark_irq_disabled/enabled on arm. On first glance, it looks like these helpers manipulate irq_desc::status non-atomically, i.e. without holding irq_desc::lock. Isn't this fragile?

I have no idea. How do the other architectures do it? As far as I know, this code has been copied from there.

Other archs do the same, simply because once an irq is managed by the hal, it may not be shared in any way with the regular kernel. So locking is pointless.

Indeed, I missed that all the other archs have this uninlined in hal.c. However, this leaves at least a race between xnintr_disable/enable and XN_ISR_PROPAGATE (i.e. the related Linux path) behind.

I can't see why XN_ISR_PROPAGATE would be involved here. This service pends an interrupt in the pipeline log.

Not sure if it matters practically - but risking silent breakage for this micro optimization?

It was not meant as an optimization; we may not grab the linux descriptor lock in this context because we may enter it in primary mode.

Is disabling/enabling really that latency-critical anywhere? Otherwise, I would suggest to just plug this by adding the intended lock for this field.

The caller is expected to manage locking; AFAICS the only one who does not is the RTAI skin, which is obsolete and removed in 2.6.x, so no big deal.

Jan

--
Philippe.
Re: [Xenomai-core] arm: Unprotected access to irq_desc field?
On Fri, 2010-10-29 at 11:05 +0200, Jan Kiszka wrote:

Am 29.10.2010 10:27, Philippe Gerum wrote:
On Fri, 2010-10-29 at 09:00 +0200, Jan Kiszka wrote:
Am 28.10.2010 21:34, Philippe Gerum wrote:
On Thu, 2010-10-28 at 21:15 +0200, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Gilles, I happened to come across rthal_mark_irq_disabled/enabled on arm. On first glance, it looks like these helpers manipulate irq_desc::status non-atomically, i.e. without holding irq_desc::lock. Isn't this fragile?

I have no idea. How do the other architectures do it? As far as I know, this code has been copied from there.

Other archs do the same, simply because once an irq is managed by the hal, it may not be shared in any way with the regular kernel. So locking is pointless.

Indeed, I missed that all the other archs have this uninlined in hal.c. However, this leaves at least a race between xnintr_disable/enable and XN_ISR_PROPAGATE (i.e. the related Linux path) behind.

I can't see why XN_ISR_PROPAGATE would be involved here. This service pends an interrupt in the pipeline log.

And this finally lets Linux code run that fiddles with irq_desc::status as well - potentially in parallel to an unsynchronized xnintr_irq_disable in a different context. That's the problem.

Propagation happens in primary domain. When is this supposed to conflict on the same CPU with linux?

Not sure if it matters practically - but risking silent breakage for this micro optimization?

It was not meant as an optimization; we may not grab the linux descriptor lock in this context because we may enter it in primary mode.

Oh, that lock isn't hardened as I somehow assumed. This of course complicates things.

Is disabling/enabling really that latency-critical anywhere? Otherwise, I would suggest to just plug this by adding the intended lock for this field.

The caller is expected to manage locking; AFAICS the only one who does not is the RTAI skin, which is obsolete and removed in 2.6.x, so no big deal.

The problem is that IRQ forwarding to Linux may let this manipulation race with plain Linux code, thus it has to synchronize with it. It is a corner case (no one is supposed to pass IRQs down blindly anyway - if at all), but it should at least be documented ("Don't use disable/enable together with IRQ forwarding unless you acquire the descriptor lock properly!"). BTW, do we need to track the descriptor state in primary mode at all?

That is the real issue. I don't see the point of doing this with the current kernel code.

Jan

--
Philippe.
Re: [Xenomai-core] arm: Unprotected access to irq_desc field?
On Fri, 2010-10-29 at 14:09 +0200, Jan Kiszka wrote:

Am 29.10.2010 14:00, Philippe Gerum wrote:
On Fri, 2010-10-29 at 11:05 +0200, Jan Kiszka wrote:
Am 29.10.2010 10:27, Philippe Gerum wrote:
On Fri, 2010-10-29 at 09:00 +0200, Jan Kiszka wrote:
Am 28.10.2010 21:34, Philippe Gerum wrote:
On Thu, 2010-10-28 at 21:15 +0200, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Gilles, I happened to come across rthal_mark_irq_disabled/enabled on arm. On first glance, it looks like these helpers manipulate irq_desc::status non-atomically, i.e. without holding irq_desc::lock. Isn't this fragile?

I have no idea. How do the other architectures do it? As far as I know, this code has been copied from there.

Other archs do the same, simply because once an irq is managed by the hal, it may not be shared in any way with the regular kernel. So locking is pointless.

Indeed, I missed that all the other archs have this uninlined in hal.c. However, this leaves at least a race between xnintr_disable/enable and XN_ISR_PROPAGATE (i.e. the related Linux path) behind.

I can't see why XN_ISR_PROPAGATE would be involved here. This service pends an interrupt in the pipeline log.

And this finally lets Linux code run that fiddles with irq_desc::status as well - potentially in parallel to an unsynchronized xnintr_irq_disable in a different context. That's the problem.

Propagation happens in primary domain. When is this supposed to conflict on the same CPU with linux?

The propagation triggers the delivery of this IRQ to the Linux domain, thus at some point there will be Linux accessing the descriptor while there might be xnintr_irq_enable/disable running on some other CPU (or it was preempted at the wrong point on the very same CPU).

The point is that XN_ISR_PROPAGATE, as a means to force sharing of an IRQ between both domains, is plain wrong. Remove this, and no conflict remains; this is what needs to be addressed. The potential issue between xnintr_enable/disable and the hal routines does not exist if those callers handle locking properly.

Not sure if it matters practically - but risking silent breakage for this micro optimization?

It was not meant as an optimization; we may not grab the linux descriptor lock in this context because we may enter it in primary mode.

Oh, that lock isn't hardened as I somehow assumed. This of course complicates things.

Is disabling/enabling really that latency-critical anywhere? Otherwise, I would suggest to just plug this by adding the intended lock for this field.

The caller is expected to manage locking; AFAICS the only one who does not is the RTAI skin, which is obsolete and removed in 2.6.x, so no big deal.

The problem is that IRQ forwarding to Linux may let this manipulation race with plain Linux code, thus it has to synchronize with it. It is a corner case (no one is supposed to pass IRQs down blindly anyway - if at all), but it should at least be documented ("Don't use disable/enable together with IRQ forwarding unless you acquire the descriptor lock properly!"). BTW, do we need to track the descriptor state in primary mode at all?

That is the real issue. I don't see the point of doing this with the current kernel code.

Do we need to keep the status in sync with the hardware state for the case Linux may take over the descriptor again? Or will Linux test the state when processing a forwarded IRQ? These are the two potential scenarios that come to my mind. The former could be deferred, but the latter would be critical again.

Jan

--
Philippe.
Re: [Xenomai-core] arm: Unprotected access to irq_desc field?
On Fri, 2010-10-29 at 14:46 +0200, Philippe Gerum wrote:

On Fri, 2010-10-29 at 14:09 +0200, Jan Kiszka wrote:
Am 29.10.2010 14:00, Philippe Gerum wrote:
On Fri, 2010-10-29 at 11:05 +0200, Jan Kiszka wrote:
Am 29.10.2010 10:27, Philippe Gerum wrote:
On Fri, 2010-10-29 at 09:00 +0200, Jan Kiszka wrote:
Am 28.10.2010 21:34, Philippe Gerum wrote:
On Thu, 2010-10-28 at 21:15 +0200, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Gilles, I happened to come across rthal_mark_irq_disabled/enabled on arm. On first glance, it looks like these helpers manipulate irq_desc::status non-atomically, i.e. without holding irq_desc::lock. Isn't this fragile?

I have no idea. How do the other architectures do it? As far as I know, this code has been copied from there.

Other archs do the same, simply because once an irq is managed by the hal, it may not be shared in any way with the regular kernel. So locking is pointless.

Indeed, I missed that all the other archs have this uninlined in hal.c. However, this leaves at least a race between xnintr_disable/enable and XN_ISR_PROPAGATE (i.e. the related Linux path) behind.

I can't see why XN_ISR_PROPAGATE would be involved here. This service pends an interrupt in the pipeline log.

And this finally lets Linux code run that fiddles with irq_desc::status as well - potentially in parallel to an unsynchronized xnintr_irq_disable in a different context. That's the problem.

Propagation happens in primary domain. When is this supposed to conflict on the same CPU with linux?

The propagation triggers the delivery of this IRQ to the Linux domain, thus at some point there will be Linux accessing the descriptor while there might be xnintr_irq_enable/disable running on some other CPU (or it was preempted at the wrong point on the very same CPU).

The point is that XN_ISR_PROPAGATE, as a means to force sharing of an IRQ between both domains, is plain wrong. Remove this, and no conflict remains; this is what needs to be addressed. The potential issue between xnintr_enable/disable and the hal routines does not exist if those callers handle locking properly.

In any case, I don't think we could accept that sharing, so flipping the bits in the hal is in fact pointless. To match the linux locking, we should hold the irq_desc::lock, which we won't, since this would cause massive jitter. We should stick to the basic logic: no sharing, therefore no need to track the irqflags. I'll kill XN_ISR_PROPAGATE in forge at some point, for sure.

Not sure if it matters practically - but risking silent breakage for this micro optimization?

It was not meant as an optimization; we may not grab the linux descriptor lock in this context because we may enter it in primary mode.

Oh, that lock isn't hardened as I somehow assumed. This of course complicates things.

Is disabling/enabling really that latency-critical anywhere? Otherwise, I would suggest to just plug this by adding the intended lock for this field.

The caller is expected to manage locking; AFAICS the only one who does not is the RTAI skin, which is obsolete and removed in 2.6.x, so no big deal.

The problem is that IRQ forwarding to Linux may let this manipulation race with plain Linux code, thus it has to synchronize with it. It is a corner case (no one is supposed to pass IRQs down blindly anyway - if at all), but it should at least be documented ("Don't use disable/enable together with IRQ forwarding unless you acquire the descriptor lock properly!"). BTW, do we need to track the descriptor state in primary mode at all?

That is the real issue. I don't see the point of doing this with the current kernel code.

Do we need to keep the status in sync with the hardware state for the case Linux may take over the descriptor again? Or will Linux test the state when processing a forwarded IRQ? These are the two potential scenarios that come to my mind. The former could be deferred, but the latter would be critical again.

Jan

--
Philippe.
Re: [Xenomai-core] arm: Unprotected access to irq_desc field?
On Thu, 2010-10-28 at 21:15 +0200, Gilles Chanteperdrix wrote:

Jan Kiszka wrote:
Gilles, I happened to come across rthal_mark_irq_disabled/enabled on arm. On first glance, it looks like these helpers manipulate irq_desc::status non-atomically, i.e. without holding irq_desc::lock. Isn't this fragile?

I have no idea. How do the other architectures do it? As far as I know, this code has been copied from there.

Other archs do the same, simply because once an irq is managed by the hal, it may not be shared in any way with the regular kernel. So locking is pointless.

--
Philippe.
Re: [Xenomai-core] hanging in Xenomai 2.5.5
On Fri, 2010-10-15 at 22:43 -0700, Stefan Schaal wrote:

Hi everybody, here is a quick first report on an issue that appeared with Xenomai 2.5.5 --- NOTE: 2.5.4 (and earlier) DOES NOT have this issue.

We run multiple real-time processes, synchronized by semaphores and interprocess communication using shared memory. All is cleanly implemented using the xenomai real-time functions, no mode switches. The different processes are distributed on different processors of our multi-core machine using rt_task_spawn() with the T_CPU directive. Up to version 2.5.4, this worked fine. With version 2.5.5 (and 2.5.5.1), the processes hang after a few seconds of running (CPU consumption goes to zero), and usually one of them hangs so badly that it cannot be killed anymore with kill -9 -- thus a reboot is required. The problem happens on BOTH our i386 machine (Dell 8-core, ubuntu 9.04, kernel 2.6.29.5) AND our x86_64 machine (Dell 8-core, ubuntu 9.10, kernel 2.6.31.4). Thus, this seems to be specific to the xenomai release 2.5.5 and higher. No dmesg print-outs when this error occurs. We will try to create a simple test program to illustrate the problem, but maybe the issue is already obvious to some of the experts on this list.

$ cat /proc/xenomai/stat
$ cat /proc/xenomai/sched

when the threads hang would help. Additionally, please clone the -stable repo from there:

git://git.xenomai.org/xenomai-2.5.git

then branch+build and test from these commits:

- 6a020f5 first; if the bug does not show up anymore, check the next one
- 5e7cfa5; if the bug is still there, try disabling CONFIG_XENO_OPT_PRIOCPL to test the basic system and re-check.

Best wishes,
-Stefan

--
Philippe.
Re: [Xenomai-core] [forge] irqbench removal
On Sat, 2010-10-09 at 15:23 +0200, Jan Kiszka wrote:

Philippe, irqbench does not inherently depend on a third I-pipe domain. It is a useful testcase, the only one in our portfolio that targets a peripheral device use case. In fact, it was one of the first test cases for Native RTDM IIRC. Please revert the removal and then cut out only the few parts that actually instantiate an additional domain (i.e. mode 3).

So, what do we do with this? Any chance we move to arch-neutral code for this test?

Thanks, Jan

--
Philippe.
[Xenomai-core] Xenomai forge
We need a playground for experimenting with the 3.x architecture. I have set up a GIT tree for this purpose, which currently contains legacy removal and preliminary cleanup work I've been doing lazily during the past months, periodically rebasing on -head.

This tree is there for Xenomai hackers to work on radical changes toward Xenomai 3.x; this is NOT for production use. It is expected to be in a severe state of flux for several months from now on, until the updates on the infrastructure calm down. The plan is to work on this tree until it makes sense to turn it into the official xenomai-3.0 tree eventually.

Some CPU architectures currently supported in Xenomai 2.5.x may not be supported in this tree yet, until the dust settles at some point (we do plan to support all of them eventually, though). The bottom line is to have powerpc (32/64), arm and x86 (32/64) available early; blackfin may be there early too, since their reference kernel tracks mainline closely as well. So this may leave us with nios2 lagging behind for a while.

The same goes for RTOS emulators such as VxWorks, pSOS and friends. They have to be rebased on a new emulation core fully running in user-space, which we experimented with in Xenomai/SOLO, so their legacy 2.x incarnations have been removed from the tree. This tree only features the POSIX, native and RTDM skins for now.

The 3.x roadmap was published many moons ago on our web site [1], so I won't rehash the final goals for this architecture. However, the major development milestones can be outlined here:

* legacy support removal (mainly: kernel 2.4 support and in-kernel skin APIs are being phased out, except the RTDM driver development API).

* introduction of a new RTOS emulation core, which can run on top of the POSIX skin, or over the regular nptl.

* port of the existing Xenomai/SOLO emulators (VxWorks, pSOS) over the new core. At some point, we shall decide whether it still makes sense to provide VRTX and uITRON emulators on this new core, given the lack of useful feedback we got for those for the past eight years. It seems that nobody actually cares for them.

* integration of the missing bits to fully support our current dual kernel software stack over -rt kernels as well (i.e. no I-pipe), typically RTDM native.

For sure, all these tasks will entail various cleanup, streamlining, and sanitization activities all over the place, over time.

The forge can be found at: git://git.xenomai.org/xenomai-forge.git

Ok, just go wild now.

[1] http://www.xenomai.org/index.php/Xenomai:Roadmap#Toward_Xenomai_3

--
Philippe.
Re: [Xenomai-core] Overcoming the foreign stack
On Wed, 2010-10-06 at 11:20 +0200, Jan Kiszka wrote:

Am 05.10.2010 16:21, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Am 05.10.2010 15:50, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Am 05.10.2010 15:42, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:
Am 05.10.2010 15:15, Gilles Chanteperdrix wrote:
Jan Kiszka wrote:

Hi, quite a few limitations and complications of using Linux services over non-Linux domains relate to potentially invalid current and thread_info. The non-Linux domains could maintain their own kernel stacks while Linux tends to derive current and thread_info from the stack pointer. This is not an issue anymore on x86-64 (both states are stored in per-cpu variables) but other archs (e.g. x86-32 or ARM) still use the stack and may continue to do so.

I just looked into this thing again as I'm evaluating ways to exploit the kernel's tracing framework also under Xenomai. Unfortunately, it does a lot of fiddling with preempt_count and need_resched, so patching it for Xenomai use would become a maintenance nightmare. An alternative, also for other use cases like kgdb and probably perf, is to get rid of our dependency on home-grown stacks.

I think we are on that way already as in-kernel skins have been deprecated. The only remaining user after them will be RTDM driver tasks.

But I think those could simply become in-kernel shadows of kthreads, which would bind their stacks to what Linux provides. Moreover, Xenomai could start updating current and thread_info on context switches (unless this already happens implicitly). That would give us proper contexts for system-level tracing and profiling.

My key question is currently if and how much of this could be realized in 2.6. Could we drop in-kernel skins in that version? If not, what about disabling them by default, converting RTDM tasks to a kthread-based approach, and enabling tracing etc. only in that case? However, this might be a bit fragile unless we can establish compile-time or run-time requirements negotiation between Adeos and its users (Xenomai) about the stack model.

A stupid question: why not make things the other way around: patch the current and current_thread_info functions to be made I-pipe aware, and use an ipipe_current pointer to the current thread's task_struct. Of course, there are places where the current or current_thread_info macros are implemented in assembly, so it may not be as simple as it sounds, but it would allow us to keep 128 Kb stacks if we want. This also means that we would have to put a task_struct at the bottom of every Xenomai task.

First of all, overhead vs. maintenance. Either every access to preempt_count() would require a check for the current domain and its foreign stack flag, or I would have to patch dozens (if that is enough) of code sites in the tracer framework.

No. I mean we would dereference a pointer named ipipe_current. That is all, no other check. This pointer would be maintained elsewhere. And we modify the current macro, like:

#ifdef CONFIG_IPIPE
extern struct task_struct *ipipe_current;
#define current ipipe_current
#endif

Any call site gets modified automatically. Or current_thread_info, if it is current_thread_info which is obtained using the stack pointer mask trick.

The stack pointer mask trick only works with fixed-sized stacks, not a guaranteed property of in-kernel Xenomai threads.

Precisely the reason why I propose to replace it with a global variable reference, or a per-cpu variable for SMP systems.

Then why is Linux not using this in favor of the stack pointer approach on, say, ARM? For sure, we can patch all Adeos-supported archs away from stack-based to per-cpu current thread_info, but I don't feel comfortable with this in some way invasive approach as well. Well, maybe it's just my personal misperception.

It is as invasive as modifying local_irq_save/local_irq_restore.
The real question about the global pointer approach is: if it is so much less efficient, how does Xenomai, which uses this scheme, manage to have good performance on ARM?

Xenomai has no heavily-used preempt_disable/enable that is built on top of thread_info. But I also have no numbers on this.

I looked closer at the kernel dependencies on a fixed stack size. Besides current and thread_info, further features that make use of this are stack unwinding (boundary checks) and overflow checking. So while we can work around the dependency for some tracing requirements, I really see no point in heading for this long-term. It just creates more subtle patching needs in Adeos, and it also requires work on the Xenomai side. I really think it's better to provide a compatible context to reduce maintenance efforts.

So I played a bit with converting RTDM tasks to in-kernel shadows. It works but needs more fine-tuning. My proposal for
Re: [Xenomai-core] [Adeos-main] enable_kernel_fp broken with IPIPE on PowerPC
On Fri, 2010-10-01 at 19:51 +, Steve Deiters wrote:

I'm getting a thread crash where an unaligned floating point access occurs. I tracked the cause down to enable_kernel_fp within the fix_alignment routine. The enable_kernel_fp routine is as follows:

void enable_kernel_fp(void)
{
	unsigned long flags;

	WARN_ON(preemptible());

	local_irq_save_hw_cond(flags);
#ifdef CONFIG_SMP
	if (current->thread.regs && (current->thread.regs->msr & MSR_FP))
		giveup_fpu(current);
	else
		giveup_fpu(NULL);	/* just enables FP for kernel */
#else
	giveup_fpu(last_task_used_math);
#endif /* CONFIG_SMP */
	local_irq_restore_hw_cond(flags);
}

The local_irq_save_hw_cond saves the old MSR value in flags. When this value is restored with local_irq_restore_hw_cond, it loses the MSR[FP] bit that was set in giveup_fpu. If the MSR[FP] was not previously set before it saved the flags, I get an FPU exception a bit later in the alignment handling. As a quick fix I changed the restore line to:

local_irq_restore_hw_cond(flags | MSR_FP);

I'm not sure this is a correct fix. I don't know where else there might be code that is modifying the MSR in a similar fashion. It seems any such case would be broken. I'm using the ipipe version 2.10-03 patch that was bundled with Xenomai 2.5.4 on Linux 2.6.33.5. I noticed that this is still the same in the 2.11-00 ipipe patch.

Actually, giveup_fpu already handles the interrupt state properly, so the protection code in enable_kernel_fp is buggy and useless as well. I did not see any other spot where calling assembly code which may touch the MSR would conflict with interrupt protection in the caller. Could you try this patch instead?

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index e4eaca4..3743b27 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -98,12 +98,8 @@ void flush_fp_to_thread(struct task_struct *tsk)
 void enable_kernel_fp(void)
 {
-	unsigned long flags;
-
 	WARN_ON(preemptible());
-	local_irq_save_hw_cond(flags);
-
 #ifdef CONFIG_SMP
 	if (current->thread.regs && (current->thread.regs->msr & MSR_FP))
 		giveup_fpu(current);
@@ -112,7 +108,6 @@ void enable_kernel_fp(void)
 #else
 	giveup_fpu(last_task_used_math);
 #endif /* CONFIG_SMP */
-	local_irq_restore_hw_cond(flags);
 }
 EXPORT_SYMBOL(enable_kernel_fp);

___ Adeos-main mailing list adeos-m...@gna.org https://mail.gna.org/listinfo/adeos-main

--
Philippe.
Re: [Xenomai-core] RFC: /proc/xenomai/latency change
On Sat, 2010-09-25 at 19:27 +0200, Gilles Chanteperdrix wrote:

Gilles Chanteperdrix wrote:
Hi, I have been working on omap3 performance, and during this, I noticed one flaw in /proc/xenomai/latency: it displays the whole timer subsystem anticipation whereas it should probably only allow setting the scheduler latency. The reason is that when issuing the customary:

echo 0 > /proc/xenomai/latency

we were in fact also disabling any account of the timer programming latency. This is probably almost invisible on systems with low timer programming latencies, but this turned out to account for around 5us error on timer programming on omap. Now, the timer programming latency is back to a more reasonable 1us on omap, but I still think we should change this. However, since it may break some users' settings, I wonder if we should apply it now or only in the 2.6 branch. Here is the patch I am talking about:

Better:

diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 7db0ccf..2297b74 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -3164,7 +3164,7 @@ static int latency_read_proc(char *page,
 	int len;

-	len = sprintf(page, "%Lu\n", xnarch_tsc_to_ns(nklatency));
+	len = sprintf(page, "%Lu\n", xnarch_tsc_to_ns(nklatency - nktimerlat));
 	len -= off;
 	if (len <= off + count)
 		*eof = 1;
@@ -3196,7 +3196,7 @@ static int latency_write_proc(struct file *file,
 	if ((*end != '\0' && !isspace(*end)) || ns < 0)
 		return -EINVAL;

-	nklatency = xnarch_ns_to_tsc(ns);
+	nklatency = xnarch_ns_to_tsc(ns) + nktimerlat;

 	return count;
 }

Fine with me. The nucleus should always know better regarding the timer setup latency, so leaving it untouched by the /proc knob makes sense.

--
Philippe.
Re: [Xenomai-core] RFC: /proc/xenomai/latency change
On Mon, 2010-09-27 at 14:37 +0200, Gilles Chanteperdrix wrote:

Philippe Gerum wrote:
Fine with me. The nucleus should always know better regarding the timer setup latency, so leaving it untouched by the /proc knob makes sense.

Ok. My concern was about user settings, but guaranteeing an ABI never meant we had to maintain the latency over Xenomai revisions, that was kind of silly.

It is even recommended to make it shorter over time.

--
Philippe.
Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?
On Wed, 2010-09-01 at 10:39 +0200, Gilles Chanteperdrix wrote:

Philippe Gerum wrote:
On Mon, 2010-08-30 at 17:39 +0200, Jan Kiszka wrote:
Philippe Gerum wrote:
Ok, Gilles did not grumble at you, so I'm daring the following patch, since I agree with you here. Totally untested, not even compiled, just for the fun of getting lockups and/or threads in limbo. Nah, just kidding, your shiny SMP box should be bricked even before that:

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index f75c6f6..6ad66ba 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -184,10 +184,9 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
 #define xnsched_set_resched(__sched__) do {				\
 	xnsched_t *current_sched = xnpod_current_sched();		\
 	xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \

To increase the probability of regressions: What about moving the above line...

-	if (unlikely(current_sched != (__sched__)))			\
-		xnarch_cpu_set(xnsched_cpu(__sched__), (__sched__)->resched); \
 	setbits(current_sched->status, XNRESCHED);			\
-	/* remote will set XNRESCHED locally in the IPI handler */	\
+	if (current_sched != (__sched__))				\
+		setbits((__sched__)->status, XNRESCHED);		\

...into this conditional block? Then you should be able to...

 } while (0)

 void xnsched_zombie_hooks(struct xnthread *thread);

diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 623bdff..cff76c2 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -285,13 +285,6 @@ void xnpod_schedule_handler(void) /* Called with hw interrupts off. */
 		xnshadow_rpi_check();
 	}
 #endif /* CONFIG_SMP && CONFIG_XENO_OPT_PRIOCPL */
-	/*
-	 * xnsched_set_resched() did set the resched mask remotely. We
-	 * just need to make sure that our rescheduling request won't
-	 * be filtered out locally when testing for XNRESCHED
-	 * presence.
-	 */
-	setbits(sched->status, XNRESCHED);
 	xnpod_schedule();
 }

@@ -2167,10 +2160,10 @@ static inline int __xnpod_test_resched(struct xnsched *sched)
 {
 	int cpu = xnsched_cpu(sched), resched;

-	resched = xnarch_cpu_isset(cpu, sched->resched);
-	xnarch_cpu_clear(cpu, sched->resched);
+	resched = testbits(sched->status, XNRESCHED);
 #ifdef CONFIG_SMP
 	/* Send resched IPI to remote CPU(s). */
+	xnarch_cpu_clear(cpu, sched->resched);

...drop the line above as well.

 	if (unlikely(xnsched_resched_p(sched))) {
 		xnarch_send_ipi(sched->resched);
 		xnarch_cpus_clear(sched->resched);

Yes, I do think that we are way too stable on SMP boxes these days. Let's merge this as well to bring the fun back.

The current cpu bit in the resched cpu mask allowed us to know whether the local cpu actually needed rescheduling, at least on SMP. It may happen that only remote cpus were set; in that case, we were only sending the IPI, then exiting __xnpod_schedule. So the choice here is, in SMP non-debug mode only, between:

- setting and clearing a bit at each local rescheduling unconditionally
- peeking at the runqueue uselessly at each rescheduling only involving remote threads

The answer does not seem obvious.

--
Philippe.
Re: [Xenomai-core] [PULL-REQUEST] assorted fixes and updates for 2.5.x
On Wed, 2010-09-01 at 07:14 +0200, Philippe Gerum wrote: On Tue, 2010-08-31 at 17:17 +0200, Philippe Gerum wrote:

The following changes since commit 004f652d31d2e3b9b995850dbefcf12bc6dbd96d:

  Gilles Chanteperdrix (1):
        Fix typo in edaf1e2e54343b6e4bf5cf6ece9175ec0ab21cad

are available in the git repository at:

  ssh+git://g...@xenomai.org/xenomai-rpm.git for-upstream

Philippe Gerum (16):
      powerpc: upgrade I-pipe support to 2.6.34.4-powerpc-2.10-04
      nucleus: demote RPI boost upon linux-originated signal
      blackfin: upgrade I-pipe support to 2.6.35.2-blackfin-1.15-00
      nucleus: requeue blocked non-periodic timers properly
      x86: upgrade I-pipe support to 2.6.32.20-x86-2.7-02, 2.6.34.5-x86-2.7-03
      arm: force enable preemptible switch support in SMP mode
      arm: enable VFP support in SMP
      arm: use rthal_processor_id() over non-linux contexts
      powerpc: resync thread switch code with mainline >= 2.6.32
      x86: increase SMP calibration value
      nucleus/sched: move locking to resume_rpi/suspend_rpi
      hal/generic: inline APC scheduling code
      nucleus, posix: use fast APC scheduling call
      nucleus/shadow: shorten the uninterruptible path to secondary mode

This one causes the now famous need_resched debug assertion to trigger on UP. I'll have a look at this asap. It does not depend on 56ff4329f though.

Fixed by the following commit:
http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=47dac49c71e89b684203e854d1b0172ecacbc555

      nucleus/sched: prevent remote wakeup from triggering a debug assertion
      powerpc: upgrade I-pipe support to 2.6.35.4-powerpc-2.11-00

-- Philippe.
[Xenomai-core] [PULL-REQUEST] urgent scheduler fix for 2.5.x head
The following changes since commit afc0eac7e4989f4134b18a256b5c5e1ca1c56a39:

  Gilles Chanteperdrix (1):
        posix: add a magic to internal structures.

are available in the git repository at:

  ssh+git://g...@xenomai.org/xenomai-rpm.git for-upstream

Philippe Gerum (2):
      nucleus/sched: fix race in non-atomic suspend path
      nucleus/sched: raise self-resched condition when unlocking scheduler

 ksrc/nucleus/pod.c |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

-- Philippe.
Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?
On Mon, 2010-08-30 at 17:39 +0200, Jan Kiszka wrote: Philippe Gerum wrote:

Ok, Gilles did not grumble at you, so I'm daring the following patch, since I agree with you here. Totally untested, not even compiled, just for the fun of getting lockups and/or threads in limbos. Nah, just kidding, your shiny SMP box should be bricked even before that:

[patch with Jan's inline comments snipped; quoted in full in the first message above]

Yes, I do think that we are way too stable on SMP boxes these days. Let's merge this as well to bring the fun back.

-- Philippe.
Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?
On Tue, 2010-08-31 at 09:09 +0200, Philippe Gerum wrote: On Mon, 2010-08-30 at 17:39 +0200, Jan Kiszka wrote: Philippe Gerum wrote:

[quoted patch snipped; see the first message above]

All worked according to plan, this introduced a nice lockup under switchtest load. Unfortunately, a solution exists to fix it:

--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -176,17 +176,17 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)

 /* Set self resched flag for the given scheduler. */
 #define xnsched_set_self_resched(__sched__) do {		\
-	xnarch_cpu_set(xnsched_cpu(__sched__), (__sched__)->resched);	\
 	setbits((__sched__)->status, XNRESCHED);		\
 } while (0)

-- Philippe.
[Xenomai-core] [PULL-REQUEST] assorted fixes and updates for 2.5.x
The following changes since commit 004f652d31d2e3b9b995850dbefcf12bc6dbd96d:

  Gilles Chanteperdrix (1):
        Fix typo in edaf1e2e54343b6e4bf5cf6ece9175ec0ab21cad

are available in the git repository at:

  ssh+git://g...@xenomai.org/xenomai-rpm.git for-upstream

Philippe Gerum (16):
      powerpc: upgrade I-pipe support to 2.6.34.4-powerpc-2.10-04
      nucleus: demote RPI boost upon linux-originated signal
      blackfin: upgrade I-pipe support to 2.6.35.2-blackfin-1.15-00
      nucleus: requeue blocked non-periodic timers properly
      x86: upgrade I-pipe support to 2.6.32.20-x86-2.7-02, 2.6.34.5-x86-2.7-03
      arm: force enable preemptible switch support in SMP mode
      arm: enable VFP support in SMP
      arm: use rthal_processor_id() over non-linux contexts
      powerpc: resync thread switch code with mainline >= 2.6.32
      x86: increase SMP calibration value
      nucleus/sched: move locking to resume_rpi/suspend_rpi
      hal/generic: inline APC scheduling code
      nucleus, posix: use fast APC scheduling call
      nucleus/shadow: shorten the uninterruptible path to secondary mode
      nucleus/sched: prevent remote wakeup from triggering a debug assertion
      powerpc: upgrade I-pipe support to 2.6.35.4-powerpc-2.11-00

-- Philippe.
Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?
On Mon, 2010-08-30 at 10:51 +0200, Jan Kiszka wrote: Philippe Gerum wrote: On Fri, 2010-08-27 at 20:09 +0200, Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote:

Hi, I'm hitting that bug check in __xnpod_schedule after xnintr_clock_handler issued a xnpod_schedule like this:

	if (--sched->inesting == 0) {
		__clrbits(sched->status, XNINIRQ);
		xnpod_schedule();
	}

Either the assumption behind the bug check is no longer correct (no call to xnpod_schedule() without a real need), or we should check for __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock then). Comments?

You probably have a real bug. This BUG_ON means that the scheduler is about to switch context for real, whereas the resched bit is not set, which is wrong.

This happened over my 2.6.35 port - maybe some spurious IRQ enabling. Debugging further...

You should look for something which changes the scheduler state without setting the resched bit, or for something which clears the bit without taking the scheduler changes into account.

It looks like a generic Xenomai issue on SMP boxes, though a mostly harmless one: The task that was scheduled in without XNRESCHED set locally has been woken up by a remote CPU. The waker requeued the task and set the resched condition for itself and in the resched proxy mask for the remote CPU. But there is at least one place in the Xenomai code where we drop the nklock between xnsched_set_resched and xnpod_schedule: do_taskexit_event (I bet there are even more). Now the resched target CPU runs into a timer handler, issues xnpod_schedule unconditionally, and happens to find the woken-up task before it is actually informed via an IPI. I think this is a harmless race, but it ruins the debug assertion need_resched != 0.
Not that harmless, since without the debugging code, we would miss the reschedule too... Ok. But we would finally reschedule when handling the IPI. So, the effect we see is a useless delay in the rescheduling. Depends on the POV: The interrupt or context switch between set_resched and xnpod_reschedule that may defer rescheduling may also hit us before we were able to wake up the thread at all. The worst case should not differ significantly. Yes, and whether we set the bit and call xnpod_schedule atomically does not really matter either: the IPI takes time to propagate, and since xnarch_send_ipi does not wait for the IPI to have been received on the remote CPU, there is no guarantee that xnpod_schedule could not have been called in the mean time. Indeed. More importantly, since in order to do an action on a remote xnsched_t, we need to hold the nklock, is there any point in not setting the XNRESCHED bit on that distant structure, at the same time as when we set the cpu bit on the local sched structure mask and send the IPI? This way, setting the XNRESCHED bit in the IPI handler would no longer be necessary, and we would avoid the race. I guess so. The IPI isn't more than a hint that something /may/ have changed in the schedule anyway. This makes sense. I'm currently testing the patch below which implements a close variant of Gilles's proposal. Could you try it as well, to see if things improve? http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=3200660065146915976c193387bf0851be10d0cc Will test ASAP. The logic makes sure that we can keep calling xnsched_set_resched() then xnpod_schedule() outside of the same critical section, which is something we need. Otherwise this requirement would extend to xnpod_suspend/resume_thread(), which is not acceptable. I still wonder if things can't be even simpler. What is the purpose of xnsched_t::resched? 
I first thought it's just there to coalesce multiple remote reschedule requests, thus IPIs triggered by one CPU over successive wakeups etc. If that is true, why go through resched for local changes, why not set XNRESCHED directly? And why not set the remote XNRESCHED instead of the remote's xnsched_t::resched?

Ok, Gilles did not grumble at you, so I'm daring the following patch, since I agree with you here. Totally untested, not even compiled, just for the fun of getting lockups and/or threads in limbos. Nah, just kidding, your shiny SMP box should be bricked even before that:

[patch snipped; quoted in full earlier in this thread]
Re: [Xenomai-core] xenomai 2.5.3/native, kernel 2.6.31.8 and fork()
On Sat, 2010-08-21 at 19:36 +0200, Gilles Chanteperdrix wrote: Gilles Chanteperdrix wrote:

There are other issues to consider, such as detecting that a private mutex created in the father continues to be used in the child. A simple fix for this would be to keep a list of mutexes in the native and posix skins, and nullify their magic/opaque pointer at fork. The problem is that there is no more room in pthread_mutex_t, so we will have to malloc at pthread_mutex_init time.

Please simply issue a warning, as you suggested, once when a potentially dangerous situation arises upon fork regarding mutexes. Piling up non-trivial code to prevent an obviously broken application from misbehaving even more is way too expensive if such code could introduce more overhead, and potentially secondary mode switches.

IIUC, we are discussing apps using, in a child context, some private mutexes which were initially created in the parent context, right? If so, then a warning upon detection should suffice to have the author go back to the drawing board, and optionally run man pthread_mutex_init as well.

-- Philippe.
Re: [Xenomai-core] [PATCH] Mayday support
On Fri, 2010-08-20 at 14:32 +0200, Jan Kiszka wrote: Jan Kiszka wrote: Philippe Gerum wrote:

I've toyed a bit to find a generic approach for the nucleus to regain complete control over a userland application running in a syscall-less loop. The original issue was about recovering gracefully from a runaway situation detected by the nucleus watchdog, where a thread would spin in primary mode without issuing any syscall, but this would also apply to real-time signals pending for such a thread. Currently, Xenomai rt signals cannot preempt syscall-less code running in primary mode either.

The major difference between the previous approaches we discussed and this one is the fact that we now force the runaway thread to run a piece of valid code that calls into the nucleus. We do not force the thread to run faulty code or at a faulty address anymore. Therefore, we can reuse this feature to improve the rt signal management, without having to forge yet-another signal stack frame for this. The code introduced only fixes the watchdog related issue, but also does some groundwork for enhancing the rt signal support later.

The implementation details can be found here:
http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=4cf21a2ae58354819da6475ae869b96c2defda0c

The current mayday support is only available for powerpc and x86 for now, more will come in the next days. To have it enabled, you have to upgrade your I-pipe patch to 2.6.32.15-2.7-00 or 2.6.34-2.7-00 for x86, 2.6.33.5-2.10-01 or 2.6.34-2.10-00 for powerpc. That feature relies on a new interface available from those latest patches. The current implementation does not break the 2.5.x ABI on purpose, so we could merge it into the stable branch.

We definitely need user feedback on this. Typically, does arming the nucleus watchdog with that patch support in properly recover from your favorite "get me out of here" situation? TIA,

You can pull this stuff from git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.

I've retested the feature as it's now in master, and it has one remaining problem: If you run the cpu hog under gdb control and try to break out of the while(1) loop, this doesn't work before the watchdog expired - of course. But if you send the break before the expiry (or hit a breakpoint), something goes wrong. The Xenomai task continues to spin, and there is no chance to kill its process (only gdb).

# cat /proc/xenomai/sched
CPU  PID   CLASS  PRI  TIMEOUT  TIMEBASE  STAT  NAME
  0  0     idle   -1   -        master    RR    ROOT/0

Eeek, we really need to have a look at this funky STAT output.

  1  0     idle   -1   -        master    R     ROOT/1
  0  6120  rt     99   -        master    Tt    cpu-hog

# cat /proc/xenomai/stat
CPU  PID   MSW  CSW    PF  STAT      %CPU   NAME
  0  0     0    0      0   00500088    0.0  ROOT/0
  1  0     0    0      0   00500080   99.7  ROOT/1
  0  6120  0    1      0   00342180  100.0  cpu-hog
  0  0     0    21005  0               0.0  IRQ3340: [timer]
  1  0     0    35887  0               0.3  IRQ3340: [timer]

Fixable by this tiny change:

diff --git a/ksrc/nucleus/sched.c b/ksrc/nucleus/sched.c
index 5242d9f..04a344e 100644
--- a/ksrc/nucleus/sched.c
+++ b/ksrc/nucleus/sched.c
@@ -175,7 +175,8 @@ void xnsched_init(struct xnsched *sched, int cpu)
 		     xnthread_name(&sched->rootcb));
 #ifdef CONFIG_XENO_OPT_WATCHDOG
-	xntimer_init(&sched->wdtimer, &nktbase, xnsched_watchdog_handler);
+	xntimer_init_noblock(&sched->wdtimer, &nktbase,
+			     xnsched_watchdog_handler);
 	xntimer_set_name(&sched->wdtimer, "[watchdog]");
 	xntimer_set_priority(&sched->wdtimer, XNTIMER_LOPRIO);
 	xntimer_set_sched(&sched->wdtimer, sched);

I.e. the watchdog timer should not be stopped by any ongoing debug session of a Xenomai app. Will queue this for upstream.

Yes, that makes a lot of sense now. The watchdog would not fire if the task was single-stepped anyway, since the latter would have been moved to secondary mode first. Did you see this bug happening in a uniprocessor context as well?

Jan

-- Philippe.
Re: [Xenomai-core] [PATCH] Mayday support
On Fri, 2010-08-20 at 16:06 +0200, Jan Kiszka wrote: Philippe Gerum wrote:

[full quote of the previous message snipped]

> Eeek, we really need to have a look at this funky STAT output.

I've a patch for this queued as well. Was only a cosmetic thing.

> Yes, that makes a lot of sense now. The watchdog would not fire if the task was single-stepped anyway, since the latter would have been moved to secondary mode first.

Yep.

> Did you see this bug happening in a uniprocessor context as well?

No, as it is impossible on a uniprocessor to interact with gdb while a cpu hog runs - the only existing CPU is simply not available. :)

I was rather thinking of your hit-a-breakpoint-or-^C-early scenario... I thought you did see this on UP as well, and scratched my head to understand how this would have been possible. Ok, so let's merge this.

Jan

-- Philippe.
Re: [Xenomai-core] rt timer jitter
On Fri, 2010-08-20 at 18:20 +0200, Krzysztof Błaszkowski wrote: On Fri, 2010-08-20 at 18:06 +0200, Philippe Gerum wrote: On Fri, 2010-08-20 at 17:55 +0200, Krzysztof Błaszkowski wrote:

Do you have any idea about reducing rt timer jitter? I experience annoyingly big jitter in a thread which is supposed to run at 400us (I reckon this is nothing extra demanding for an Atom @ 1.6GHz). The thread's loop looks like:

	{ function1() ..2() ..3() ..4() rt_task_wait_period() }

(^yet another simplified model^)

This is the typical pattern of the latency test. What figures do you get with:

# /usr/xenomai/bin/latency -t0
...
# /usr/xenomai/bin/latency -t1

t0: RTS| -1.337| -0.039| 13.285| 0| 0| 00:02:13/00:02:13

Those are common figures for user-space latency on the kind of hw you run this test on.

I can't run t1 because of the missing xeno_timerbench.ko (I have no idea how to find a config option which would build it).

Did you consider using the Search feature from xconfig/gconfig/whatever, looking for "timerbench"?

config XENO_DRIVERS_TIMERBENCH
	depends on XENO_SKIN_RTDM
	tristate "Timer benchmark driver"
	default y
	help
	  Kernel-based benchmark driver for timer latency evaluation.
	  See testsuite/latency for a possible front-end.

If you run your app in kernel space, then -t1 is what you want to run.

-- Philippe.