Re: [Xenomai-core] Support for 2.6.22/x86

2007-06-30 Thread Jan Kiszka
Philippe Gerum wrote:
 Our development trunk now contains the necessary support for running
 Xenomai over 2.6.22/x86. This work boils down to enabling Xenomai to use
 the generic clock event device abstraction that comes with newest
 kernels. Other archs / kernel versions still work the older way, until
 all archs eventually catch up with clockevents upstream.
 
 This support won't be backported to 2.3.x, because it has some
 significant impact on the nucleus. Tested as thoroughly as possible here
 on low-end and mid-range x86 boxen, including SMP.
 
 Please give this hell.
 
 http://download.gna.org/adeos/patches/v2.6/i386/adeos-ipipe-2.6.22-rc6-i386-1.9-00.patch
 

Running some tests, the gate to hell just opened:

[  210.247006] BUG: sleeping function called from invalid context at
kernel/sched.c:3941
[  210.248171] in_atomic():1, irqs_disabled():1
[  210.248828] no locks held by frag-ip/881.
[  210.249494]  [c01040e9] show_trace_log_lvl+0x1f/0x34
[  210.250523]  [c0104d6c] show_trace+0x17/0x19
[  210.257778]  [c0104e6a] dump_stack+0x1b/0x1d
[  210.258070]  [c0112030] __might_sleep+0xda/0xe1
[  210.258365]  [c028bacf] wait_for_completion+0x1f/0xc3
[  210.258688]  [c01143d8] set_cpus_allowed+0x77/0x95
[  210.258992]  [c89cc202] lostage_handler+0x75/0x201 [xeno_nucleus]
[  210.259551]  [c0146fe2] rthal_apc_handler+0x5c/0x89
[  210.259869]  [c0143ba9] __ipipe_sync_stage+0x13a/0x147
[  210.260204]  [c010e6b6] __ipipe_syscall_root+0x1a6/0x1c8
[  210.260536]  [c0102809] system_call+0x29/0x41

Setup is latest SVN + a few patches (the well-known ones), CONFIG_SMP,
qemu -smp 2, RTnet in loopback mode, just terminating the frag-ip example.

However, this gremlin looks like it is /far/ older than 2.6.22 support.
Calling set_cpus_allowed() from atomic lostage_handler is simply bogus,
I'm afraid. :-/

Jan



___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [RFC][PATCH] shirq locking rework

2007-06-30 Thread Dmitry Adamushko
Hello Jan,

I apologize for the long reply latency.


 Yeah, that might explain why already trying to parse it manually
 failed: what is xnintr_sync_stat_references? :)

yeah.. it was supposed to be xnintr_sync_stat_refs()


  The 'prev = xnstat_get_current()' reference is also tracked, as reference
  accounting becomes part of the xnstat interface (not sure we need it,
  though).

 Mind to elaborate on _why_ you think we need this, specifically if it
 adds new atomic counters?

Forget about it, it was a wrong approach. We do reschedule in
xnintr_*_handler(), and if 'prev->refs' is non-zero and a newly
scheduled thread calls xnstat_runtime_synch() (well, as it could in
theory with this interface) before the first thread is deleted...
oops. So this 'referencing' scheme is bad anyway.

Note that if the real reschedule took place in xnpod_schedule(), we
actually don't need to _restore_ 'prev' when we get control back; it
must already have been restored by xnpod_schedule() when the preempted
thread ('prev' is normally the thread in whose context the interrupt
occurred) gets the CPU back. If I'm not missing something. hum?

...
if (--sched->inesting == 0 && xnsched_resched_p())
    xnpod_schedule();

(*) 'sched->current_account' should already be == 'prev' in case
xnpod_schedule() took place

xnltt_log_event(xeno_ev_iexit, irq);
xnstat_runtime_switch(sched, prev);
...

The simpler scheme for xnstat_ accounting would be to account only the
time spent in intr->isr() to the corresponding intr->stat[cpu].account...
This way, all accesses to the latter would be inside
xnlock_{get,put}(&xnirqs[irq].lock) sections [*].

It's preciseness (although that's arguable to some extent) vs.
simplicity (e.g. no need for any xnintr_sync_stat_references()). I
would still prefer this approach :-)

Otherwise, so far I don't see any much nicer solution that the one
illustrated by your first patch.


 Uhh, be careful, I burned my fingers with similar things recently as
 well. You have to make sure that all types are resolvable for _all_
 includers of that header. Otherwise, I'm fine with cleanups like this.
 But I think there was once a reason for #define.

yeah.. now I recall it as well :-)



 Thanks,
 Jan


-- 
Best regards,
Dmitry Adamushko



Re: [Xenomai-core] Support for 2.6.22/x86

2007-06-30 Thread Philippe Gerum
On Sat, 2007-06-30 at 09:48 +0200, Jan Kiszka wrote:
 Philippe Gerum wrote:
  [...]
 However, this gremlin looks like it is /far/ older than 2.6.22 support.
 Calling set_cpus_allowed() from atomic lostage_handler is simply bogus,
 I'm afraid. :-/

Why did we never get this migration case before? I'm running with all
debug knobs on too, and never hit this issue. Anyway... The APC
dispatcher does explicitly unlock the APC serialization lock. However,
the I-pipe syncer would stall the stage before calling the dispatcher,
so we need to bracket the dispatch loop within an unstall/stall block.
That said, I'm still wondering why preemption is disabled here.

Do you happen to run with the tracer on when testing?

 
 Jan
 
-- 
Philippe.





Re: [Xenomai-core] Support for 2.6.22/x86

2007-06-30 Thread Philippe Gerum
On Sat, 2007-06-30 at 09:48 +0200, Jan Kiszka wrote:
 Philippe Gerum wrote:
  [...]
 However, this gremlin looks like it is /far/ older than 2.6.22 support.
 Calling set_cpus_allowed() from atomic lostage_handler is simply bogus,
 I'm afraid. :-/

Btw, you should have a look at a critical change in the way raw I-pipe
spinlocks are now manipulated (include/linux/spinlock.h wrappers).
In short, to solve a deadly bug in all previous implementations, a set
of dedicated helpers is now used to stall/unstall the current stage for
the spin_lock_irq* forms, the way it has to be, i.e. touching both the
real and virtual IRQ masks.

Such a bug could accidentally clear the hardware IRQ mask, which would
lead to a recursive lock attempt whenever an interrupt is caught at the
wrong time on the same CPU, e.g.:

mask_and_ack_8259A
  local_irq_save_hw() + spinlock
  printk("spurious IRQ #...")
  printk() -> vprintk()
  ...
  spin_lock_irqsave()
  spin_unlock_irqrestore()
  local_irq_enable_hw()
  IRQ -> mask_and_ack_8259A

The way to solve this is to make sure that the stall bit for the current
domain always reflects the state of the hardware mask when operating raw
I-pipe locks.

As a consequence, you may no longer assume that calling
spin_unlock() + local_irq_restore_hw() in sequence has the same effect
as calling spin_unlock_irqrestore() on any ipipe_spinlock_t lock. The
former would have the very undesirable side-effect of leaving the
virtual IRQ mask in stalled mode. I already fixed an issue of this kind
in the tracer code (__ipipe_global_path_unlock), caught precisely after
getting a might_sleep() warning when reading /proc/ipipe/trace/{max,
frozen}.

So you may want to double-check whether some constructs of this kind
might exist in any of your local patches. I did not find any in the
vanilla code, but another round of verifications may be useful.

-- 
Philippe.





Re: [Xenomai-core] Support for 2.6.22/x86

2007-06-30 Thread Philippe Gerum
On Sat, 2007-06-30 at 09:48 +0200, Jan Kiszka wrote:
 Philippe Gerum wrote:
  [...]
 However, this gremlin looks like it is /far/ older than 2.6.22 support.
 Calling set_cpus_allowed() from atomic lostage_handler is simply bogus,
 I'm afraid. :-/
 

Confirmed, this is an old bug. Just adding a might_sleep() statement
inside the lostage handler, even in a UP config, would trigger the warning.

 Jan
 
-- 
Philippe.





Re: [Xenomai-core] Support for 2.6.22/x86

2007-06-30 Thread Philippe Gerum
On Sat, 2007-06-30 at 13:02 +0200, Philippe Gerum wrote:
 On Sat, 2007-06-30 at 09:48 +0200, Jan Kiszka wrote:
  Philippe Gerum wrote:
   [...]
 
 Confirmed, this is an old bug. Just adding a might_sleep() statement
 even in UP config inside the lostage handler would trigger the warning.

Ok, found it. It's an I-pipe issue. Working on a fix.

 
  Jan
  
-- 
Philippe.


