[Xenomai-core] Houston, we have a circular problem

2008-05-05 Thread Jan Kiszka
Hi,

after hacking away the barriers I-pipe erected in front of lockdep
(patches will follow on adeos-main), I was finally able to visualize a
bit more what our colleagues see in reality on SMP: some ugly, not yet
understood circular dependency when running some Xenomai app under gdb.
What lockdep tries to tell us remains unclear, unfortunately:

[  874.356703]
[  874.356957] ===

Here it hangs because of this (caught via QEMU):

(gdb) bt
#0  __delay (loops=1) at arch/x86/lib/delay_64.c:34
#1  0x80372712 in _raw_spin_lock (lock=0x81000232c6c0) at 
lib/spinlock_debug.c:111
#2  0x80479d3d in _spin_lock (lock=0x81000232c6c0) at 
kernel/spinlock.c:182
#3  0x8022e546 in task_rq_lock (p=0x81002e792000, 
flags=0x81002f487910) at kernel/sched.c:615
#4  0x8022e6b6 in try_to_wake_up (p=0x1, state=value optimized out, 
sync=341) at kernel/sched.c:1562
#5  0x8022e9a5 in default_wake_function (curr=value optimized out, 
mode=0, sync=341, key=0xf48791c8) at kernel/sched.c:3840
#6  0x8024ae51 in autoremove_wake_function (wait=0x1, mode=0, sync=341, 
key=0xf48791c8) at kernel/wait.c:132
#7  0x8022bdc7 in __wake_up_common (q=value optimized out, mode=1, 
nr_exclusive=1, sync=0, key=0x0) at kernel/sched.c:3861
#8  0x8022df43 in __wake_up (q=0x805a6240, mode=1, 
nr_exclusive=1, key=0x0) at kernel/sched.c:3880
#9  0x80235838 in wake_up_klogd () at kernel/printk.c:1013
#10 0x80235a30 in release_console_sem () at kernel/printk.c:1059
#11 0x802360be in vprintk (fmt=0x12 Address 0x12 out of bounds, 
args=0x81002f487a72) at kernel/printk.c:807
#12 0x802361e5 in printk (fmt=0x8054c0fd \n, '=' repeats 55 
times, \n) at kernel/printk.c:664
#13 0x80256268 in print_circular_bug_header (entry=0x809cb8c0, 
depth=2) at kernel/lockdep.c:902
#14 0x80256f84 in check_noncircular (source=value optimized out, 
depth=1) at kernel/lockdep.c:973
#15 0x80256f8e in check_noncircular (source=value optimized out, 
depth=0) at kernel/lockdep.c:975
#16 0x80257a45 in __lock_acquire (lock=0x81002e9ff960, subclass=0, 
trylock=0, read=0, check=2, hardirqs_off=1, ip=18446744071564715933) at 
kernel/lockdep.c:1324
#17 0x80258500 in lock_acquire (lock=0x1, subclass=0, 
trylock=-2140427232, read=-192441912, check=2, ip=value optimized out) at 
kernel/lockdep.c:2703
#18 0x80479d35 in _spin_lock (lock=0x81002e9ff948) at 
kernel/spinlock.c:181
#19 0x8028679d in schedule_event (event=value optimized out, ipd=0x0, 
data=0x81002e198000) at kernel/xenomai/nucleus/shadow.c:2197
#20 0x80274bfc in __ipipe_dispatch_event (event=33, 
data=0x81002e198000) at kernel/ipipe/core.c:828
#21 0x80477637 in schedule () at kernel/sched.c:1897
#22 0x80247598 in worker_thread (__cwq=value optimized out) at 
kernel/workqueue.c:314
#23 0x8024ad16 in kthread (_create=value optimized out) at 
kernel/kthread.c:78
#24 0x8020d238 in child_rip ()
#25 0x in ?? ()

The lock in question should be task->sighand->siglock, but as we hit the
bug inside the scheduler, printk deadlocks now :(. Need to dig out some
patch of Steven Rostedt (IIRC) that may overcome the second deadlock.

But maybe someone already hears a bell ringing. Would be highly
appreciated as gdb is effectively unusable here.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Houston, we have a circular problem

2008-05-05 Thread Jan Kiszka
Jan Kiszka wrote:
 Hi,
 
 after hacking away the barriers I-pipe erected in front of lockdep
 (patches will follow on adeos-main), I was finally able to visualize a
 bit more what our colleagues see in reality on SMP: some ugly, not yet
 understood circular dependency when running some Xenomai app under gdb.
 What lockdep tries to tell us remains unclear, unfortunately:
 
 [  874.356703]
 [  874.356957] ===

Got it!

[0.00]
[0.00] ===
[0.00] [ INFO: possible circular locking dependency detected ]
[0.00] 2.6.24.6-xeno_64 #313
[0.00] ---
[0.00] gdb/4385 is trying to acquire lock:
[0.00]  ((spinlock_t *)sighand->siglock){}, at: 
[802867c1] schedule_event+0x7f/0x578
[0.00]
[0.00] but task is already holding lock:
[0.00]  (rq->rq_lock_key){++..}, at: [80477235] 
schedule+0x176/0x7ff
[0.00]
[0.00] which lock already depends on the new lock.
[0.00]
[0.00]
[0.00] the existing dependency chain (in reverse order) is:
[0.00]
[0.00] -> #2 (rq->rq_lock_key){++..}:
[0.00][80257b70] __lock_acquire+0xb91/0xd80
[0.00][80258524] lock_acquire+0x9d/0xbc
[0.00][8022e546] task_rq_lock+0x7f/0xb8
[0.00][80479d65] _spin_lock+0x2a/0x36
[0.00][8022e546] task_rq_lock+0x7f/0xb8
[0.00][8022e6b6] try_to_wake_up+0x29/0x306
[0.00][8022e9a5] default_wake_function+0x12/0x14
[0.00][8022bdc7] __wake_up_common+0x4b/0x7a
[0.00][8022de9f] complete+0x3d/0x51
[0.00][80231321] migration_thread+0x0/0x22b
[0.00][8024ad18] kthread+0x2c/0x7c
[0.00][8020d238] child_rip+0xa/0x12
[0.00][8020c8e8] restore_args+0x0/0x30
[0.00][8024acec] kthread+0x0/0x7c
[0.00][8020d22e] child_rip+0x0/0x12
[0.00][] 0x
[0.00]
[0.00] -> #1 ((spinlock_t *)q->lock){++..}:
[0.00][80257b70] __lock_acquire+0xb91/0xd80
[0.00][80258524] lock_acquire+0x9d/0xbc
[0.00][8022ded6] __wake_up_sync+0x23/0x53
[0.00][8047a07b] _spin_lock_irqsave+0x69/0x79
[0.00][8022ded6] __wake_up_sync+0x23/0x53
[0.00][802419f1] do_notify_parent+0x1ea/0x207
[0.00][802db7a6] kmem_cache_free+0xc6/0xcf
[0.00][803727b6] _raw_write_lock+0xe/0x90
[0.00][80239451] do_exit+0x5fd/0x7e0
[0.00][8023955b] do_exit+0x707/0x7e0
[0.00][80246135] __call_usermodehelper+0x0/0x61
[0.00][802464fe] request_module+0x0/0x166
[0.00][8020d238] child_rip+0xa/0x12
[0.00][8020c8e8] restore_args+0x0/0x30
[0.00][8024637b] call_usermodehelper+0x0/0x183
[0.00][8020d22e] child_rip+0x0/0x12
[0.00][] 0x
[0.00]
[0.00] -> #0 ((spinlock_t *)sighand->siglock){}:
[0.00][802556db] print_circular_bug_entry+0x4d/0x54
[0.00][80257a72] __lock_acquire+0xa93/0xd80
[0.00][80258524] lock_acquire+0x9d/0xbc
[0.00][802867c1] schedule_event+0x7f/0x578
[0.00][80479d65] _spin_lock+0x2a/0x36
[0.00][802867c1] schedule_event+0x7f/0x578
[0.00][80274c20] __ipipe_dispatch_event+0xe4/0x1db
[0.00][80477667] schedule+0x5a8/0x7ff
[0.00][80238cd1] do_wait+0xb5e/0xc4c
[0.00][80372466] _raw_read_unlock+0xe/0x2d
[0.00][80238d00] do_wait+0xb8d/0xc4c
[0.00][8022e993] default_wake_function+0x0/0x14
[0.00][8022211c] mcount+0x4c/0x72
[0.00][80238dec] sys_wait4+0x2d/0x2f
[0.00][8020c1a2] system_call+0x92/0x97
[0.00][] 0x
[0.00]
[0.00] other info that might help us debug this:
[0.00]
[0.00] 1 lock held by gdb/4385:
[0.00]  #0:  (rq->rq_lock_key){++..}, at: [80477235] 
schedule+0x176/0x7ff
[0.00]
[0.00] stack backtrace:
[0.00] Pid: 4385, comm: gdb Not tainted 2.6.24.6-xeno_64 #313
[0.00]
[0.00] Call Trace:
[0.00]  

Re: [Xenomai-core] Houston, we have a circular problem

2008-05-05 Thread Philippe Gerum
Jan Kiszka wrote:
 Hi,
 
 after hacking away the barriers I-pipe erected in front of lockdep
 (patches will follow on adeos-main), I was finally able to visualize a
 bit more what our colleagues see in reality on SMP: some ugly, not yet
 understood circular dependency when running some Xenomai app under gdb.
 What lockdep tries to tell us remains unclear, unfortunately:
 
 [  874.356703]
 [  874.356957] ===
 
 Here it hangs because of this (caught via QEMU):
 
 (gdb) bt
 #0  __delay (loops=1) at arch/x86/lib/delay_64.c:34
 #1  0x80372712 in _raw_spin_lock (lock=0x81000232c6c0) at 
 lib/spinlock_debug.c:111
 #2  0x80479d3d in _spin_lock (lock=0x81000232c6c0) at 
 kernel/spinlock.c:182
 #3  0x8022e546 in task_rq_lock (p=0x81002e792000, 
 flags=0x81002f487910) at kernel/sched.c:615
 #4  0x8022e6b6 in try_to_wake_up (p=0x1, state=value optimized out, 
 sync=341) at kernel/sched.c:1562
 #5  0x8022e9a5 in default_wake_function (curr=value optimized out, 
 mode=0, sync=341, key=0xf48791c8) at kernel/sched.c:3840
 #6  0x8024ae51 in autoremove_wake_function (wait=0x1, mode=0, 
 sync=341, key=0xf48791c8) at kernel/wait.c:132
 #7  0x8022bdc7 in __wake_up_common (q=value optimized out, mode=1, 
 nr_exclusive=1, sync=0, key=0x0) at kernel/sched.c:3861
 #8  0x8022df43 in __wake_up (q=0x805a6240, mode=1, 
 nr_exclusive=1, key=0x0) at kernel/sched.c:3880
 #9  0x80235838 in wake_up_klogd () at kernel/printk.c:1013
 #10 0x80235a30 in release_console_sem () at kernel/printk.c:1059
 #11 0x802360be in vprintk (fmt=0x12 Address 0x12 out of bounds, 
 args=0x81002f487a72) at kernel/printk.c:807
 #12 0x802361e5 in printk (fmt=0x8054c0fd \n, '=' repeats 
 55 times, \n) at kernel/printk.c:664
 #13 0x80256268 in print_circular_bug_header 
 (entry=0x809cb8c0, depth=2) at kernel/lockdep.c:902
 #14 0x80256f84 in check_noncircular (source=value optimized out, 
 depth=1) at kernel/lockdep.c:973
 #15 0x80256f8e in check_noncircular (source=value optimized out, 
 depth=0) at kernel/lockdep.c:975
 #16 0x80257a45 in __lock_acquire (lock=0x81002e9ff960, 
 subclass=0, trylock=0, read=0, check=2, hardirqs_off=1, 
 ip=18446744071564715933) at kernel/lockdep.c:1324
 #17 0x80258500 in lock_acquire (lock=0x1, subclass=0, 
 trylock=-2140427232, read=-192441912, check=2, ip=value optimized out) at 
 kernel/lockdep.c:2703
 #18 0x80479d35 in _spin_lock (lock=0x81002e9ff948) at 
 kernel/spinlock.c:181
 #19 0x8028679d in schedule_event (event=value optimized out, 
 ipd=0x0, data=0x81002e198000) at kernel/xenomai/nucleus/shadow.c:2197
 #20 0x80274bfc in __ipipe_dispatch_event (event=33, 
 data=0x81002e198000) at kernel/ipipe/core.c:828
 #21 0x80477637 in schedule () at kernel/sched.c:1897
 #22 0x80247598 in worker_thread (__cwq=value optimized out) at 
 kernel/workqueue.c:314
 #23 0x8024ad16 in kthread (_create=value optimized out) at 
 kernel/kthread.c:78
 #24 0x8020d238 in child_rip ()
 #25 0x in ?? ()
 
 The lock in question should be task->sighand->siglock, but as we hit the
 bug inside the scheduler, printk deadlocks now :(. Need to dig out some
 patch of Steven Rostedt (IIRC) that may overcome the second deadlock.
 
 But maybe someone already hears a bell ringing. Would be highly
 appreciated as gdb is effectively unusable here.
 

do_schedule_event() is the culprit when it reads the pending signals on the
shared queue (XNDEBUG check for rearming the timers), but we really need to know
who is the first locker to fix that properly. Any chance that busting the spinlocks
and running with printk_sync() mode on the current domain would get us the
traces out?


-- 
Philippe.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Houston, we have a circular problem

2008-05-05 Thread Gilles Chanteperdrix
On Mon, May 5, 2008 at 6:08 PM, Philippe Gerum [EMAIL PROTECTED] wrote:
  do_schedule_event() is the culprit when it reads the pending signals on the
  shared queue (XNDEBUG check for rearming the timers),

A stupid suggestion: if we know that the spinlock is always locked
when calling do_schedule_event, maybe we can simply avoid the lock
there ?

-- 
 Gilles

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


[Xenomai-core] [HACK] x86-64: Reenable lockdep over i-pipe

2008-05-05 Thread Jan Kiszka
Hi,

not worth calling it a patch, at least not when looking at the second
file.

Lockdep gets unhappy when I-pipe is enabled because

 a) TRACE_IRQS_ON/OFF instrumentations in entry_64.S do not correlate
with the root's pipeline state. Instead of fixing the latter (which
costs extra cycles), the fix relaxes the sanity check.

 b) The exit path of __ipipe_handle_exception reenables Linux IRQs when
they were not disabled. I do not understand that case fully yet, so
I hacked the alarm away with that cli.

Jan

---
 arch/x86/kernel/ipipe.c |1 +
 kernel/lockdep.c|4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

Index: b/kernel/lockdep.c
===
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -2028,7 +2028,7 @@ void trace_hardirqs_on(void)
 	curr->hardirqs_enabled = 1;
 	ip = (unsigned long) __builtin_return_address(0);
 
-	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled() && !irqs_disabled_hw()))
 		return;
 	if (DEBUG_LOCKS_WARN_ON(current->hardirq_context))
 		return;
@@ -2064,7 +2064,7 @@ void trace_hardirqs_off(void)
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
-	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled() && !irqs_disabled_hw()))
 		return;
 
 	if (curr->hardirqs_enabled) {
Index: b/arch/x86/kernel/ipipe.c
===
--- a/arch/x86/kernel/ipipe.c
+++ b/arch/x86/kernel/ipipe.c
@@ -799,6 +799,7 @@ int __ipipe_handle_exception(struct pt_r
 	}
 
 	__ipipe_std_extable[vector](regs, error_code);
+	asm volatile ("cli");
 	local_irq_restore(flags);
 	__fixup_if(regs);
 

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Houston, we have a circular problem

2008-05-05 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
 On Mon, May 5, 2008 at 6:08 PM, Philippe Gerum [EMAIL PROTECTED] wrote:
  do_schedule_event() is the culprit when it reads the pending signals on the
  shared queue (XNDEBUG check for rearming the timers),
 
 A stupid suggestion: if we know that the spinlock is always locked
 when calling do_schedule_event, maybe we can simply avoid the lock
 there ?

Would be the best solution - but I don't think so. After reading a bit
more into the lockdep output, I think the issue is that _some_other_
task may hold the siglock and then acquire our rq_lock, but not
necessarily along a similar code path we took to acquire the siglock now.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Houston, we have a circular problem

2008-05-05 Thread Philippe Gerum
Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 On Mon, May 5, 2008 at 6:08 PM, Philippe Gerum [EMAIL PROTECTED] wrote:
  do_schedule_event() is the culprit when it reads the pending signals on the
  shared queue (XNDEBUG check for rearming the timers),
 A stupid suggestion: if we know that the spinlock is always locked
 when calling do_schedule_event, maybe we can simply avoid the lock
 there ?
 
 Would be the best solution - but I don't think so. After reading a bit
 more into the lockdep output, I think the issue is that _some_other_
 task may hold the siglock and then acquire our rq_lock, but not
 necessarily along a similar code path we took to acquire the siglock now.
 

Mm, indeed. Grabbing any further kernel lock when we hold the runqueue lock is
probably a very bad idea, since any normal locking sequence would exactly go the
opposite way...
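
For the record, here is a minimal user-space model of the cycle lockdep reports,
with pthread mutexes standing in for the two ends of the chain (the wait-queue
lock sitting in the middle of the real report is collapsed away). Illustration
only, not kernel code:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER; /* stands for the runqueue lock */
static pthread_mutex_t siglock = PTHREAD_MUTEX_INITIALIZER; /* stands for sighand->siglock */

/* Our path: schedule() -> __ipipe_dispatch_event() -> do_schedule_event()
 * runs with the runqueue lock held, then grabs the siglock. */
static void path_a(void)
{
	pthread_mutex_lock(&rq_lock);
	pthread_mutex_lock(&siglock);	/* order: rq_lock -> siglock */
	pthread_mutex_unlock(&siglock);
	pthread_mutex_unlock(&rq_lock);
}

/* The established path from the report: do_notify_parent() holds the siglock,
 * goes through __wake_up_sync()/try_to_wake_up() and ends up in task_rq_lock(). */
static void path_b(void)
{
	pthread_mutex_lock(&siglock);
	pthread_mutex_lock(&rq_lock);	/* order: siglock -> rq_lock */
	pthread_mutex_unlock(&rq_lock);
	pthread_mutex_unlock(&siglock);
}

static void *thread_a(void *arg) { (void)arg; path_a(); return NULL; }
static void *thread_b(void *arg) { (void)arg; path_b(); return NULL; }

int main(void)
{
	pthread_t a, b;

	/* Run both orders concurrently: each side may end up waiting for the
	 * lock the other one already holds, which is what lockdep flags. */
	pthread_create(&a, NULL, thread_a, NULL);
	pthread_create(&b, NULL, thread_b, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	puts("no deadlock this time");
	return 0;
}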

-- 
Philippe.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Houston, we have a circular problem

2008-05-05 Thread Philippe Gerum
Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 On Mon, May 5, 2008 at 6:08 PM, Philippe Gerum [EMAIL PROTECTED] wrote:
  do_schedule_event() is the culprit when it reads the pending signals on the
  shared queue (XNDEBUG check for rearming the timers),
 A stupid suggestion: if we know that the spinlock is always locked
 when calling do_schedule_event, maybe we can simply avoid the lock
 there ?
 
 Would be the best solution - but I don't think so. After reading a bit
 more into the lockdep output, I think the issue is that _some_other_
 task may hold the siglock and then acquire our rq_lock, but not
 necessarily along a similar code path we took to acquire the siglock now.
 

Actually, this locking around the sigmask retrieval looks overkill, since we
only address ptracing signals here, and those should go through the shared
pending set, not through the task's private one. I.e. There should be no way to
get fooled by any asynchronous update of that mask.

-- 
Philippe.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Patch 4/7] Define ARM atomic operations in user-space

2008-05-05 Thread Gilles Chanteperdrix
On Sat, May 3, 2008 at 12:34 AM, Gilles Chanteperdrix
[EMAIL PROTECTED] wrote:

  The include/asm-arm/atomic.h header now defines the xnarch_memory_barrier in
  addition to user-space atomic operations. The pxa3xx deserves a special
  treatment since it uses the ARMv6 memory barrier operation whereas being an
  ARMv5 for other operations, hence a special --enable-arm-mach=pxa3xx option 
 in
  configure.in.

After a quick look at all architectures xenomai supports, it seems
that we can implement atomic_cmpxchg for most of them in user-space.
The only one for which I have a doubt is blackfin. Since linux for
blackfin is uclinux, can we call cli/sti from user-space ?
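
Just to illustrate what a user-space atomic_cmpxchg means on the archs that have
a native instruction, something along these lines would do without any syscall
(names are made up for the example, this is not the actual xnarch API; the GCC
__sync builtin expands to cmpxchg/ll-sc/lwarx-stwcx. depending on the target):

#include <stdio.h>

typedef struct { volatile unsigned long counter; } my_atomic_t; /* illustrative type */

/* Compare-and-swap entirely in user space; returns the value found in
 * *v before the operation, so the caller can tell whether it won. */
static inline unsigned long
my_atomic_cmpxchg(my_atomic_t *v, unsigned long old, unsigned long new)
{
	return __sync_val_compare_and_swap(&v->counter, old, new);
}

int main(void)
{
	my_atomic_t lock_owner = { 0 };

	/* Typical fast mutex acquisition path: claim the lock if it is free. */
	if (my_atomic_cmpxchg(&lock_owner, 0, 42) == 0)
		printf("acquired, owner=%lu\n", lock_owner.counter);

	return 0;
}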

-- 
 Gilles

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Patch 4/7] Define ARM atomic operations in user-space

2008-05-05 Thread Gilles Chanteperdrix
On Mon, May 5, 2008 at 6:39 PM, Philippe Gerum [EMAIL PROTECTED] wrote:

 Gilles Chanteperdrix wrote:
   On Sat, May 3, 2008 at 12:34 AM, Gilles Chanteperdrix
   [EMAIL PROTECTED] wrote:
The include/asm-arm/atomic.h header now defines the 
 xnarch_memory_barrier in
addition to user-space atomic operations. The pxa3xx deserves a special
treatment since it uses the ARMv6 memory barrier operation whereas being 
 an
ARMv5 for other operations, hence a special --enable-arm-mach=pxa3xx 
 option in
configure.in.
  
   After a quick look at all architectures xenomai supports, it seems
   that we can implement atomic_cmpxchg for most of them in user-space.
   The only one for which I have a doubt is blackfin. Since linux for
   blackfin is uclinux, can we call cli/sti from user-space ?
  

  Nope, these are supervisor's toys on this arch. IIRC, recent Blackfin 
 releases
  should be providing fast atomic ops from userland despite this, by using
  protected jumps to kernel space for that purpose. I need to check that in the
  kernel code, this was a discussion running on the Blackfin dev-list months 
 ago.

It looks like the same technique as used on pre-v6 ARM.
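
(For reference, the pre-v6 ARM trick boils down to calling a cmpxchg helper that
the kernel maps into every process at a fixed address in the vector page. A
minimal sketch, based on the helper documented in the kernel sources - not
Xenomai code:)

/* Pre-ARMv6 user-space cmpxchg via the kernel-provided helper: the kernel
 * performs the compare-and-swap on our behalf and restarts it transparently
 * if the caller is preempted in the middle of the sequence. */
typedef int (kuser_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
#define kuser_cmpxchg (*(kuser_cmpxchg_t *)0xffff0fc0)

static inline int user_cmpxchg(volatile int *ptr, int oldval, int newval)
{
	/* Returns 0 if *ptr was updated from oldval to newval,
	 * non-zero if no exchange happened. */
	return kuser_cmpxchg(oldval, newval, ptr);
}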

  Anyway, I would suggest to exclude Blackfin from the mutex optimization set
  right now, then go back to it when we support their latest kernel.

Ok. I was dreaming about removing all the #ifdef
XNARCH_HAVE_US_ATOMIC_CMPXCHG. But it is probably better to leave it
after all. If nothing else, it will help people doing ports to new
architectures proceed gradually.

-- 
 Gilles

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Patch 4/7] Define ARM atomic operations in user-space

2008-05-05 Thread Philippe Gerum
Gilles Chanteperdrix wrote:
 On Sat, May 3, 2008 at 12:34 AM, Gilles Chanteperdrix
 [EMAIL PROTECTED] wrote:
  The include/asm-arm/atomic.h header now defines the xnarch_memory_barrier in
  addition to user-space atomic operations. The pxa3xx deserves a special
  treatment since it uses the ARMv6 memory barrier operation whereas being an
  ARMv5 for other operations, hence a special --enable-arm-mach=pxa3xx option 
 in
  configure.in.
 
 After a quick look at all architectures xenomai supports, it seems
 that we can implement atomic_cmpxchg for most of them in user-space.
 The only one for which I have a doubt is blackfin. Since linux for
 blackfin is uclinux, can we call cli/sti from user-space ?
 

Nope, these are supervisor's toys on this arch. IIRC, recent Blackfin releases
should be providing fast atomic ops from userland despite this, by using
protected jumps to kernel space for that purpose. I need to check that in the
kernel code, this was a discussion running on the Blackfin dev-list months ago.
Anyway, I would suggest to exclude Blackfin from the mutex optimization set
right now, then go back to it when we support their latest kernel.

-- 
Philippe.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Houston, we have a circular problem

2008-05-05 Thread Jan Kiszka
Philippe Gerum wrote:
 Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 On Mon, May 5, 2008 at 6:08 PM, Philippe Gerum [EMAIL PROTECTED] wrote:
  do_schedule_event() is the culprit when it reads the pending signals on 
 the
  shared queue (XNDEBUG check for rearming the timers),
 A stupid suggestion: if we know that the spinlock is always locked
 when calling do_schedule_event, maybe we can simply avoid the lock
 there ?
 Would be the best solution - but I don't think so. After reading a bit
 more into the lockdep output, I think the issue is that _some_other_
 task may hold the siglock and then acquire our rq_lock, but not
 necessarily along a similar code path we took to acquire the siglock now.

 
 Actually, this locking around the sigmask retrieval looks overkill, since we
 only address ptracing signals here, and those should go through the shared
 pending set, not through the task's private one. I.e. There should be no way 
 to
 get fooled by any asynchronous update of that mask.

This is a debug helper anyway, so we risk (if I got this right) at worst
a spurious unfreeze of the Xenomai timers. Does not really compare to
the current deadlock...

I will let my colleagues run the hunk below tomorrow (which works for me) -
let's see if they manage to crash it again :P (they are experts in this!).

Jan

Index: xenomai-2.4.x/ksrc/nucleus/shadow.c
===
--- xenomai-2.4.x/ksrc/nucleus/shadow.c (Revision 3734)
+++ xenomai-2.4.x/ksrc/nucleus/shadow.c (Arbeitskopie)
@@ -2194,9 +2194,7 @@ static inline void do_schedule_event(str
 	if (signal_pending(next)) {
 		sigset_t pending;
 
-		spin_lock(&wrap_sighand_lock(next));	/* Already interrupt-safe. */
 		wrap_get_sigpending(&pending, next);
-		spin_unlock(&wrap_sighand_lock(next));
 
 		if (sigismember(&pending, SIGSTOP) ||
 		    sigismember(&pending, SIGINT))



___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core