[Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.

2011-12-23 Thread Lennart Sorensen
After spending quite a while trying to explain how things like /bin/echo
could possibly segfault, I finally discovered that the new feature in
Xenomai 2.6.0 (new relative to 2.4.10, that is) of preemptible
context switches is what is corrupting the state of random Linux processes
once in a while.

After turning the option off, I haven't seen a single crash, just like with 2.4.10.

So something subtle is wrong with this option.

It appears most likely to occur (possibly only) when
Xenomai is handling interrupts.

It seems that an interrupt arriving at exactly the wrong moment in a
context switch corrupts the process being switched to or from
(I have no idea which).

Unless someone can think of a way to track down and fix this, I would
certainly suggest making the option default to off instead of on.

With CONFIG_XENO_HW_UNLOCKED_SWITCH=n I don't have any problems anymore.

-- 
Len Sorensen

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.

2011-12-23 Thread Philippe Gerum

On 12/23/2011 06:33 PM, Lennart Sorensen wrote:

> After spending quite a while trying to explain how things like /bin/echo
> could possibly segfault, I finally discovered that the new feature in
> Xenomai 2.6.0 (new relative to 2.4.10, that is) of preemptible
> context switches is what is corrupting the state of random Linux processes
> once in a while.
>
> After turning the option off, I haven't seen a single crash, just like with 2.4.10.
>
> So something subtle is wrong with this option.
>
> It appears most likely to occur (possibly only) when
> Xenomai is handling interrupts.
>
> It seems that an interrupt arriving at exactly the wrong moment in a
> context switch corrupts the process being switched to or from
> (I have no idea which).
>
> Unless someone can think of a way to track down and fix this, I would
> certainly suggest making the option default to off instead of on.

Papering over a bug this way is certainly not an option.

> With CONFIG_XENO_HW_UNLOCKED_SWITCH=n I don't have any problems anymore.

Which kernel version, what ppc hardware?

--
Philippe.



Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.

2011-12-23 Thread Lennart Sorensen
On Fri, Dec 23, 2011 at 07:17:09PM +0100, Philippe Gerum wrote:
> Papering over a bug this way is certainly not an option.

Long term it certainly isn't.

> Which kernel version, what ppc hardware?

Kernels 3.0.13, 3.0.9 and 3.0.8, on an MPC8360E.

Xenomai 2.6.0 with ipipe 3.0.8-powerpc-2.13-04.

-- 
Len Sorensen



Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.

2011-12-23 Thread Philippe Gerum

On 12/23/2011 07:32 PM, Lennart Sorensen wrote:

> On Fri, Dec 23, 2011 at 07:17:09PM +0100, Philippe Gerum wrote:
>> Papering over a bug this way is certainly not an option.
>
> Long term it certainly isn't.
>
>> Which kernel version, what ppc hardware?
>
> Kernels 3.0.13, 3.0.9 and 3.0.8, on an MPC8360E.
>
> Xenomai 2.6.0 with ipipe 3.0.8-powerpc-2.13-04.

Do you have a typical test scenario which triggers this bug?

--
Philippe.



Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.

2011-12-23 Thread Lennart Sorensen
On Fri, Dec 23, 2011 at 09:08:11PM +0100, Philippe Gerum wrote:
> Do you have a typical test scenario which triggers this bug?

It can take a couple of hours under pretty heavy load to get one
occurrence. But with preemptible context switches off we haven't seen
any in a week.

What is certain is that Xenomai tasks are handling a lot of interrupts at the time.

I wish we had a simple test case to show it, but it seems to require
an interrupt arriving at exactly the wrong point in the middle of a
context switch.

-- 
Len Sorensen



Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.

2011-12-23 Thread Philippe Gerum

On 12/23/2011 09:25 PM, Lennart Sorensen wrote:

> On Fri, Dec 23, 2011 at 09:08:11PM +0100, Philippe Gerum wrote:
>> Do you have a typical test scenario which triggers this bug?
>
> It can take a couple of hours under pretty heavy load to get one
> occurrence. But with preemptible context switches off we haven't seen
> any in a week.
>
> What is certain is that Xenomai tasks are handling a lot of interrupts at the time.
>
> I wish we had a simple test case to show it, but it seems to require
> an interrupt arriving at exactly the wrong point in the middle of a
> context switch.

Is it reproducible with the basic latency or cyclic tests if you wait
long enough? Running LTP in parallel would generate a decent load, but
sometimes two shell loops forking commands in the background are enough
to trigger a variety of issues when something fragile exists in the MMU
layer as modified by the I-Pipe.


--
Philippe.



Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.

2011-12-23 Thread Lennart Sorensen
On Fri, Dec 23, 2011 at 10:48:29PM +0100, Philippe Gerum wrote:
> Is it reproducible with the basic latency or cyclic tests if you wait
> long enough? Running LTP in parallel would generate a decent load, but
> sometimes two shell loops forking commands in the background are enough
> to trigger a variety of issues when something fragile exists in the MMU
> layer as modified by the I-Pipe.

Well, we can try after I come back from vacation in a couple of weeks.

-- 
Len Sorensen



Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.

2011-12-23 Thread Philippe Gerum

On 12/23/2011 10:55 PM, Lennart Sorensen wrote:

> On Fri, Dec 23, 2011 at 10:48:29PM +0100, Philippe Gerum wrote:
>> Is it reproducible with the basic latency or cyclic tests if you wait
>> long enough? Running LTP in parallel would generate a decent load, but
>> sometimes two shell loops forking commands in the background are enough
>> to trigger a variety of issues when something fragile exists in the MMU
>> layer as modified by the I-Pipe.
>
> Well, we can try after I come back from vacation in a couple of weeks.

OK. I will try to reproduce it on my side as well.

--
Philippe.
