[Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
After spending quite a while trying to explain how things like /bin/echo
could possibly segfault, I finally discovered that the new feature in
xenomai 2.6.0 (new when moving from 2.4.10, that is) of having preemptible
context switches is what is corrupting the state of random linux processes
once in a while. After turning the option off, I haven't seen a single
crash, just as with 2.4.10. So something subtle is wrong with this option.

It appears to be most likely to occur (possibly it only occurs) when
xenomai is handling interrupts. It seems that getting an interrupt in the
middle of a context switch at the wrong time corrupts the process that is
being switched to or from (no idea which it is).

Unless someone can think of a way to track down and fix this, I would
certainly suggest making the option off by default instead of on.

With CONFIG_XENO_HW_UNLOCKED_SWITCH=n I don't have any problems anymore.

-- 
Len Sorensen
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
On 12/23/2011 06:33 PM, Lennart Sorensen wrote:
> After spending quite a while trying to explain how things like /bin/echo
> could possibly segfault, I finally discovered that the new feature in
> xenomai 2.6.0 (new when moving from 2.4.10, that is) of having preemptible
> context switches is what is corrupting the state of random linux processes
> once in a while. After turning the option off, I haven't seen a single
> crash, just as with 2.4.10. So something subtle is wrong with this option.
>
> It appears to be most likely to occur (possibly it only occurs) when
> xenomai is handling interrupts. It seems that getting an interrupt in the
> middle of a context switch at the wrong time corrupts the process that is
> being switched to or from (no idea which it is).
>
> Unless someone can think of a way to track down and fix this, I would
> certainly suggest making the option off by default instead of on.

Papering over a bug this way is certainly not an option.

> With CONFIG_XENO_HW_UNLOCKED_SWITCH=n I don't have any problems anymore.

Which kernel version, what ppc hardware?

-- 
Philippe.
Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
On Fri, Dec 23, 2011 at 07:17:09PM +0100, Philippe Gerum wrote:
> Papering over a bug this way is certainly not an option.

Long term it certainly isn't.

> Which kernel version, what ppc hardware?

3.0.13, 3.0.9, 3.0.8. mpc8360e. xenomai 2.6.0 with ipipe
3.0.8-powerpc-2.13-04

-- 
Len Sorensen
Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
On 12/23/2011 07:32 PM, Lennart Sorensen wrote:
> On Fri, Dec 23, 2011 at 07:17:09PM +0100, Philippe Gerum wrote:
>> Papering over a bug this way is certainly not an option.
>
> Long term it certainly isn't.
>
>> Which kernel version, what ppc hardware?
>
> 3.0.13, 3.0.9, 3.0.8. mpc8360e. xenomai 2.6.0 with ipipe
> 3.0.8-powerpc-2.13-04

Do you have a typical test scenario which triggers this bug?

-- 
Philippe.
Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
On Fri, Dec 23, 2011 at 09:08:11PM +0100, Philippe Gerum wrote:
> Do you have a typical test scenario which triggers this bug?

It can take a couple of hours under pretty heavy load to get one
occurrence. But with preemptible context switches off we haven't seen any
in a week. For sure xenomai tasks are handling interrupts quite a lot at
the time. I wish we had a simple test case to show it, but it seems to
require triggering an interrupt in the middle of a context switch at
exactly the wrong place.

-- 
Len Sorensen
Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
On 12/23/2011 09:25 PM, Lennart Sorensen wrote:
> On Fri, Dec 23, 2011 at 09:08:11PM +0100, Philippe Gerum wrote:
>> Do you have a typical test scenario which triggers this bug?
>
> It can take a couple of hours under pretty heavy load to get one
> occurrence. But with preemptible context switches off we haven't seen any
> in a week. For sure xenomai tasks are handling interrupts quite a lot at
> the time. I wish we had a simple test case to show it, but it seems to
> require triggering an interrupt in the middle of a context switch at
> exactly the wrong place.

Is it reproducible with the basic latency or cyclic tests if waiting for
long enough? Running ltp in parallel would trigger a decent load, but
sometimes two shell loops forking commands in the background are enough to
trigger a variety of issues when something fragile exists in the mmu layer
as modified by the I-Pipe.

-- 
Philippe.
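
[Editor's sketch, not part of the original message.] The "two loops forking commands in the background" idea could be approximated as below. This is a Python stand-in for the shell loops mentioned above, not a test anyone in the thread actually ran. The names `fork_loop` and `run_load` and the choice of commands are assumptions made for illustration. It only generates Linux-side process churn and records children that die from a signal (such as the /bin/echo segfaults reported earlier); the real-time load (for example the latency test) would still have to run alongside it.

```python
import subprocess
import sys
import threading


def fork_loop(cmd, iterations, failures):
    """Spawn cmd repeatedly and record every run that was killed by a signal."""
    for i in range(iterations):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, a negative return code -N means the child died from signal N
        # (for example -11 for SIGSEGV on Linux).
        if result.returncode < 0:
            failures.append((" ".join(cmd), i, -result.returncode))


def run_load(commands, iterations):
    """Run one spawning loop per command in parallel; return recorded failures."""
    failures = []  # list.append is safe to call from several threads in CPython
    threads = [
        threading.Thread(target=fork_loop, args=(cmd, iterations, failures))
        for cmd in commands
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures
```

A hypothetical invocation would be `run_load([["/bin/echo", "hello"], ["/bin/ls", "/"]], 100000)`, printing each returned `(command, iteration, signal)` tuple. An empty result after a long run says nothing by itself, since the reported failure takes hours of heavy load to appear.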
Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
On Fri, Dec 23, 2011 at 10:48:29PM +0100, Philippe Gerum wrote:
> Is it reproducible with the basic latency or cyclic tests if waiting for
> long enough? Running ltp in parallel would trigger a decent load, but
> sometimes two shell loops forking commands in the background are enough to
> trigger a variety of issues when something fragile exists in the mmu layer
> as modified by the I-Pipe.

Well we can try after I come back from vacation in a couple of weeks.

-- 
Len Sorensen
Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
On 12/23/2011 10:55 PM, Lennart Sorensen wrote:
> On Fri, Dec 23, 2011 at 10:48:29PM +0100, Philippe Gerum wrote:
>> Is it reproducible with the basic latency or cyclic tests if waiting for
>> long enough? Running ltp in parallel would trigger a decent load, but
>> sometimes two shell loops forking commands in the background are enough to
>> trigger a variety of issues when something fragile exists in the mmu layer
>> as modified by the I-Pipe.
>
> Well we can try after I come back from vacation in a couple of weeks.

Ok. I will try to reproduce on my side as well.

-- 
Philippe.