[Xenomai-core] Summary: Xenomai 2.3.2 and 2.4 lock-ups and OOPSes

2007-09-07 Thread Peter Soetens

Just in case you dropped out of the long discussion about the issues we have
found from Xenomai 2.3.2 on:

  o We are using the xeno_native skin to create Xenomai tasks and semaphores,
    and have strong indications that the crashes are caused by the memory
    allocation scheme of Xenomai in combination with task creation/deletion.
  o We found two ways to break Xenomai: one gets the program 'Killed'
    (rt_task_delete), the other causes an OOPS (rt_task_join).
  o They happen on 2.6.20 and 2.6.22 kernels.
  o On the 2.3 branch, r2429 works and r2433 causes the faults. The patch is
    small, and in the ChangeLog:

2007-05-11  Philippe Gerum  [EMAIL PROTECTED]

* include/nucleus/heap.h (xnfreesafe): Use xnpod_current_p() when
checking for deferral.

* include/nucleus/pod.h (xnpod_current_p): Give exec mode
awareness to this predicate, checking for primary/secondary mode
of shadows.

2007-05-11  Gilles Chanteperdrix  [EMAIL PROTECTED]

* ksrc/skins: Always defer thread memory release in deletion hook
by calling xnheap_schedule_free() instead of xnfreesafe().

  o We reverted this patch on HEAD of the 2.3 branch, but got -ENOMEM errors
    during Xenomai resource allocations, indicating that later changes depend
    on this patch. So we went back to a clean HEAD to look for the causes:
     o A first test (in Orocos) creates one thread and two semaphores, lets
       the thread wait on them, and cleans up the thread.
     o During rt_task_delete, our program gets 'Killed' (when the thread is
       not joinable), hence a user space problem. However, gdb is of no use:
       all thread info is lost.
     o We made the thread joinable (T_JOINABLE), and then joined it. This
       bypassed the 'Killed' on the first run, but causes an OOPS the second
       time the same application is started:

Oops:  [#1]
PREEMPT
CPU:0
EIP:0060:[fef4a1f3]Not tainted VLI
EFLAGS: 00010002   (2.6.20.9-ipipe-1.8-08 #2)
EIP is at get_free_range+0x56/0x160 [xeno_nucleus]
eax: f3a81d01   ebx: 0200   ecx: 0101   edx: fef62b00
esi: 0101   edi: 0200   ebp: f0f33ec4   esp: f0f33e98
ds: 007b   es: 007b   ss: 0068
Process NonPeriodicActi (pid: 3020, ti=f0f32000 task=f7ce61b0 
task.ti=f0f32000)
Stack:  0600 fef62b80 f3a81b24 f3a8 fef62ba4 f3a80720 0101
   0600 f0f33f18 f7ce6360 f0f33ee4 fef4a948 fef62b80 f0f33f08 
   0400 f0f33f18 f7ce6360 f0f33f50 ff13e1de 0282 0282 bfab6350
Call Trace:
 [c0103ffb] show_trace_log_lvl+0x1f/0x35
 [c01040bb] show_stack_log_lvl+0xaa/0xcf
 [c01042a9] show_registers+0x1c9/0x392
 [c0104588] die+0x116/0x245
 [c0110fca] do_page_fault+0x287/0x61d
 [c010ea35] __ipipe_handle_exception+0x63/0x136
 [c029466d] error_code+0x79/0x88
 [fef4a948] xnheap_alloc+0x15b/0x17d [xeno_nucleus]
 [ff13e1de] __rt_task_create+0xe0/0x171 [xeno_native]
 [fef5655f] losyscall_event+0xaf/0x170 [xeno_nucleus]
 [c0138804] __ipipe_dispatch_event+0xc0/0x1da
 [c010e90b] __ipipe_syscall_root+0x43/0x10a
 [c0102e79] system_call+0x29/0x41
 ===
Code: 74 61 85 c0 74 5d c7 45 e0 00 00 00 00 8b 4d e4 8b 49 10 89 4d ec 85 c9 
74 38 8b 45 dc 8b 78 0c 89 4d f0 89 ce 89 fb eb 02 89 ce 8b 09 8d 04 3e 39 
c1 0f 94 c2 3b 5d d8 0f 92 c0 01 fb 84 c2 75
EIP: [fef4a1f3] get_free_range+0x56/0x160 [xeno_nucleus] SS:ESP 
0068:f0f33e98
[hard lockup]

  o Our application also mixes the original RT_TASK struct and the return
    value of the rt_task_self() function call when calling rt_ functions.
    Switching from one to the other influences the crashing behaviour as
    well; we have not investigated this further.

  o This was reproduced on two different systems (one with SMI workaround 
working)
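
To make the two ChangeLog entries above more concrete, here is a toy model in
plain C of the difference between releasing a deleted thread's memory on the
spot and deferring the release to a later safe point. It is only a sketch of
the idea as the ChangeLog words it, not the nucleus code: every toy_* name is
made up for illustration, and the rule "defer exactly when the thread being
deleted is the current one" is an assumption read from the
xnfreesafe()/xnpod_current_p() entry.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins only; none of these are Xenomai symbols. */
struct toy_block {
    struct toy_block *next;  /* link in the deferred-release queue */
    int *freed_flag;         /* set to 1 once the block is really released */
};

static struct toy_block *toy_deferred; /* blocks waiting for a safe point */

/* Allocate a block and remember where to report its release. */
static struct toy_block *toy_alloc(int *freed_flag)
{
    struct toy_block *b = malloc(sizeof(*b));
    if (b == NULL)
        abort();
    b->next = NULL;
    b->freed_flag = freed_flag;
    *freed_flag = 0;
    return b;
}

/* Immediate release. */
static void toy_release(struct toy_block *b)
{
    *b->freed_flag = 1;
    free(b);
}

/* Always defer: queue the block, release it later (the role the ChangeLog
 * gives to xnheap_schedule_free() in the deletion hooks). */
static void toy_schedule_free(struct toy_block *b)
{
    b->next = toy_deferred;
    toy_deferred = b;
}

/* Conditional variant (assumed role of xnfreesafe()): defer only when the
 * thread being deleted is the one currently running, since it may still be
 * using that memory. */
static void toy_free_safe(int deleting_current, struct toy_block *b)
{
    if (deleting_current)
        toy_schedule_free(b);
    else
        toy_release(b);
}

/* Safe point: release everything that was deferred; returns the count. */
static int toy_flush_deferred(void)
{
    int n = 0;
    while (toy_deferred != NULL) {
        struct toy_block *b = toy_deferred;
        toy_deferred = b->next;
        toy_release(b);
        n++;
    }
    return n;
}
```

In this model, toy_free_safe(0, b) releases b at once, while
toy_free_safe(1, b) and toy_schedule_free(b) leave it allocated until
toy_flush_deferred() runs. If the "current thread" test gave a different
answer than before, the conditional variant would change when memory is
released. That is the kind of behaviour change the r2433 patch touches.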
 
You have the patch that broke things; I hope this gives you a hint about what
causes our crashes. Note that Orocos as-is has worked with Xenomai from
Xenomai 2.0 on.

Peter

-- 
Peter Soetens -- FMTC -- http://www.fmtc.be

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core

Re: [Xenomai-core] Summary: Xenomai 2.3.2 and 2.4 lock-ups and OOPSes

2007-09-07 Thread Gilles Chanteperdrix

Philippe Gerum wrote:
> On Fri, 2007-09-07 at 11:27 +0200, Peter Soetens wrote:
> > [...]
> >  o A first test (in Orocos) creates one thread, two semaphores, lets it
> >    wait on them and cleans up the thread.
>
> Please point me at the actual Orocos test code that breaks, with the
> hope to get a fairly standalone test case from it; if you do have a
> standalone test case already, this would be even better. I intend to
> address this issue asap.

While waiting for a piece of code that causes the crash, I had a look at
the code involved. The only suspicious thing I see is that the correct
working of the native skin's thread termination depends on the execution
order of the two deletion hooks, the one in task.c and the one in
syscall.c. As a matter of fact, if the one in task.c is executed before
the one in syscall.c, the task magic is changed and xnshadow_unmap will
never be called. I suspect this is true for all skins, but I do not know
if this could cause a crash.

-- 


Gilles Chanteperdrix.


Re: [Xenomai-core] Summary: Xenomai 2.3.2 and 2.4 lock-ups and OOPSes

2007-09-07 Thread Gilles Chanteperdrix

On 9/7/07, Gilles Chanteperdrix [EMAIL PROTECTED] wrote:
> [...]
> While waiting for a piece of code that causes the crash, I had a look at
> the code involved. The only suspicious thing I see is that the correct
> working of the native skin's thread termination depends on the execution
> order of the two deletion hooks, the one in task.c and the one in
> syscall.c. As a matter of fact, if the one in task.c is executed before
> the one in syscall.c, the task magic is changed and xnshadow_unmap will
> never be called. I suspect this is true for all skins, but I do not know
> if this could cause a crash.

There are two magics involved, so this supposition is wrong.
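
For readers following the argument, a toy model in plain C may help. It is not
taken from task.c or syscall.c: all toy_* names, the magic values, and the
assignment of one magic to each hook are assumptions made for illustration.
It only shows why hook order would matter if both hooks relied on a single
magic, and why it stops mattering once the second hook checks a magic of its
own.

```c
#include <assert.h>

/* Hypothetical magic values; not the real Xenomai constants. */
#define TOY_SKIN_MAGIC   0x1111u
#define TOY_SHADOW_MAGIC 0x2222u

struct toy_thread {
    unsigned skin_magic;    /* cleared by the first deletion hook */
    unsigned shadow_magic;  /* left untouched by the first hook */
    int unmapped;           /* 1 once the "unmap" step was reached */
};

/* Stand-in for the first deletion hook: invalidates the skin object. */
static void toy_task_hook(struct toy_thread *t)
{
    t->skin_magic = 0;
}

/* Feared case: the second hook tests the magic the first hook clears. */
static void toy_syscall_hook_shared(struct toy_thread *t)
{
    if (t->skin_magic == TOY_SKIN_MAGIC)
        t->unmapped = 1;
}

/* Two-magic case: the second hook tests a magic of its own. */
static void toy_syscall_hook_separate(struct toy_thread *t)
{
    if (t->shadow_magic == TOY_SHADOW_MAGIC)
        t->unmapped = 1;
}

/* Run both hooks in the requested order; report whether unmap happened. */
static int toy_run(int task_hook_first, int separate_magics)
{
    struct toy_thread t = { TOY_SKIN_MAGIC, TOY_SHADOW_MAGIC, 0 };
    void (*syscall_hook)(struct toy_thread *) =
        separate_magics ? toy_syscall_hook_separate
                        : toy_syscall_hook_shared;

    if (task_hook_first) {
        toy_task_hook(&t);
        syscall_hook(&t);
    } else {
        syscall_hook(&t);
        toy_task_hook(&t);
    }
    return t.unmapped;
}
```

With a single shared magic, toy_run(1, 0) never reaches the unmap step, which
is the scenario suspected at first. With two separate magics, toy_run(1, 1)
and toy_run(0, 1) both reach it, so the order of the hooks no longer matters.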

-- 
   Gilles Chanteperdrix
