Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()

2007-11-12 Thread Jan Kiszka
Jeroen Van den Keybus wrote:
 I have a setup which apparently suffers from long interrupt-disabled
 sections during heavy load (make -j16 of a kernel). I have instrumented the
 __ipipe_unstall_iret_root() function with the tracer as follows:
 
 
 asmlinkage void __ipipe_unstall_iret_root(struct pt_regs regs)
 {
     ipipe_trace_special(0x20, 0);
     ipipe_declare_cpuid;

     /* Emulate IRET's handling of the interrupt flag. */

     ipipe_trace_special(0x21, 0);
     local_irq_disable_hw();

     ipipe_trace_special(0x22, 0);
     ipipe_load_cpuid();

     /* Restore the software state as it used to be on kernel
        entry. CAUTION: NMIs must *not* return through this
        emulation. */

     ipipe_trace_special(0x23, 0);
     if (!(regs.eflags & X86_EFLAGS_IF)) {
         ipipe_trace_special(0x24, 0);
         if (!__test_and_set_bit(IPIPE_STALL_FLAG,
                                 ipipe_root_domain->cpudata[cpuid].status))
             trace_hardirqs_off();
         regs.eflags |= X86_EFLAGS_IF;
         ipipe_trace_special(0x25, 0);
     } else {
         ipipe_trace_special(0x26, 0);
         if (test_bit(IPIPE_STALL_FLAG,
                      ipipe_root_domain->cpudata[cpuid].status)) {
             trace_hardirqs_on();
             __clear_bit(IPIPE_STALL_FLAG,
                         ipipe_root_domain->cpudata[cpuid].status);
         }
         ipipe_trace_special(0x27, 0);

         /* Only sync virtual IRQs here, so that we don't recurse
            indefinitely in case of an external interrupt flood. */

         if ((ipipe_root_domain->cpudata[cpuid].irq_pending_hi &
              IPIPE_IRQMASK_VIRT) != 0) {
             ipipe_trace_special(0x28, 0);
             __ipipe_sync_pipeline(IPIPE_IRQMASK_VIRT);
         }
         ipipe_trace_special(0x29, 0);
     }
 #ifdef CONFIG_IPIPE_TRACE_IRQSOFF
     ipipe_trace_end(0x800D);
 #endif /* CONFIG_IPIPE_TRACE_IRQSOFF */
 }
 
 While under heavy load, the following trace log is produced:
 
 ...
 
 
 
 + func                  -116    0.124  xnarch_get_cpu_time+0x8 (rtdm_toseq_init+0x21)
 + func                  -116    0.169  xnarch_tsc_to_ns+0x12 (xnarch_get_cpu_time+0xf)
 + func                  -116    0.284  rtdm_event_timedwait+0xe (rt_e1000_read+0xf5 [rt_e1000])
 |  # func               -115    0.189  xnsynch_sleep_on+0xe (rtdm_event_timedwait+0x141)
 |  # func               -115    0.184  xnpod_suspend_thread+0xe (xnsynch_sleep_on+0x3bf)
 |  # func               -115    0.194  xntimer_migrate+0xe (xnpod_suspend_thread+0xa2)
 |  # func               -115    0.144  xntimer_start_aperiodic+0xe (xnpod_suspend_thread+0xc5)
 |  # func               -115    0.334  xnarch_ns_to_tsc+0x14 (xntimer_start_aperiodic+0x63)
 |  # func               -114    0.199  xnpod_schedule+0xe (xnpod_suspend_thread+0x237)
 |  # [18006] -?-   99   -114    0.814  xnpod_schedule+0x176 (xnpod_suspend_thread+0x237)
 |  # func               -113    0.874  __switch_to+0xe (xnpod_schedule+0x77c)
 |  # [18019] -?-   -1   -112    0.799  xnpod_schedule+0x887 (xnintr_irq_handler+0x18e)
 |   +func               -112    0.319  __ipipe_walk_pipeline+0xb (__ipipe_handle_irq+0x87)
 |   #func               -111    0.204  __ipipe_stall_root+0x8 (resume_userspace+0x5)
     #func               -111    0.134  __ipipe_unstall_iret_root+0xa (restore_nocheck_notrace+0x0)
     #(0x20)  0x         -111    0.119  __ipipe_unstall_iret_root+0x19 (restore_nocheck_notrace+0x0)
     #(0x21)  0x         -111    0.124  __ipipe_unstall_iret_root+0x25 (restore_nocheck_notrace+0x0)
 |   #(0x22)  0x         -111    0.119  __ipipe_unstall_iret_root+0x32 (restore_nocheck_notrace+0x0)
 |   #(0x23)  0x         -111    0.119  __ipipe_unstall_iret_root+0x45 (restore_nocheck_notrace+0x0)
 |   #(0x26)  0x         -110    0.134  __ipipe_unstall_iret_root+0x88 (restore_nocheck_notrace+0x0)
 |   +(0x27)  0x         -110    0.109  __ipipe_unstall_iret_root+0x9e (restore_nocheck_notrace+0x0)
 |   +(0x29)  0x         -110   57.850  __ipipe_unstall_iret_root+0xb6 (restore_nocheck_notrace+0x0)

The tracer accounts the full time between the previous and the next kernel
trace point to the point above - although we most likely returned to user
land here (iret re-enables IRQs for us). So, unless the IRQ below should
have been raised much earlier, there is no issue here.

If there is a real delay, evil user land may have switched off IRQs (given
root privileges), or there is some hardware-related IRQ delivery latency.
But this cannot be determined via the tracer, as it only sees kernel land.
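
For illustration only (not from the thread): with root privileges, iopl(3)
permits executing cli/sti from ring 3, so a plain busy loop can keep hardware
IRQs masked without ever entering the kernel - exactly the case the tracer
cannot see. A minimal sketch (x86/Linux only, loop count arbitrary):

/* Hypothetical demonstrator: mask hardware IRQs from user land (needs root). */
#include <stdio.h>
#include <sys/io.h>

int main(void)
{
    volatile unsigned long i;

    if (iopl(3) < 0) {              /* raise the I/O privilege level to 3 */
        perror("iopl");
        return 1;
    }

    asm volatile("cli");            /* hardware interrupt delivery off ... */
    for (i = 0; i < 50000000UL; i++)
        ;                           /* ... and held off while we spin here */
    asm volatile("sti");            /* restore interrupt delivery */

    return 0;
}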

 |   +func                -52    0.289  __ipipe_handle_irq+0xe (ipipe_ipi3+0x26)
 |   +func                -52    0.169  __ipipe_ack_apic+0x8 (__ipipe_handle_irq+0xbf)
 |   +func                -52    0.154  __ipipe_dispatch_wired+0x14 (__ipipe_handle_irq+0x5f)
 |  # func                -52    0.309  xnintr_clock_handler+0xe (__ipipe_dispatch_wired+0xbe)
 |  # func

Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()

2007-11-12 Thread Jeroen Van den Keybus
 So, unless the IRQ below
 should have been raised much earlier, there is no issue here.


The point is: I actually do have interrupts (not the IPI that is visible
here) that are being withheld (on average 80 us late) during this period. On
this particular setup, I have seen numbers exceeding 300 us. Strangely
enough, the latency test does not show these hiccups. I'm suspecting cache
issues, but I would think these numbers are a bit large for that.

What you say is that between tracepoints 0x27 and 0x29, we may have entered
userland. But where does this happen, given that point 0x28 is not executed?

I also understand that if a cli or sti were still lurking somewhere in this
configuration, I would be in for a really bad time... Although I doubt it,
since the unexpected latencies always happen at this particular point.

Jeroen.






Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()

2007-11-12 Thread Jan Kiszka
Jeroen Van den Keybus wrote:
 So, unless the IRQ below
 should have been raised much earlier, there is no issue here.
 
 
 The point is: I actually do have interrupts (not the IPI that is visible
 here) that are being withheld (on average 80 us late) during this period. On
 this particular setup, I have seen numbers exceeding 300 us. Strangely
 enough, the latency test does not show these hiccups. I'm suspecting cache
 issues, but I would think these numbers are a bit large for that.

Indeed, 300 us is too long. Maybe long DMA bursts on your PCI bus? Can you
exclude certain devices from your load to check for this (e.g. no hard disk
load, use NFS instead; then no network load, etc.)?

 
 What you say is that between tracepoints 0x27 and 0x29, we may have entered
 userland. But where does this happen, given that point 0x28 is not executed?

Nope, user land is entered _after_ 0x29 (you need to read the numbers
with an offset of one: the delay is between
__ipipe_unstall_iret_root+0xb6 and __ipipe_handle_irq+0xe).

 
 I also do understand that, if a cli or sti would still lurk somewhere in
 this configuration, I'm going to have a real bad time... Although I doubt
 it, since the unexpected latencies always happen at this particular point.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()

2007-11-12 Thread Jan Kiszka
[Save the CCs!]

Jeroen Van den Keybus wrote:
 Indeed, 300 us is too long. Maybe long DMA bursts on your PCI bus? Can you
 exclude certain devices from your load to check for this (e.g. no hard disk
 load, use NFS instead; then no network load, etc.)?
 
 
 I already tried setting PCI latency timers of everything except my card to
 e.g. 0x08. No difference though.
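
For reference (not something done in the thread): on Linux the latency timer
is byte 0x0d of a device's standard PCI config header, so it can also be
written through sysfs as root; setpci achieves the same. A minimal sketch,
with a placeholder device address:

/* Hypothetical helper: set one device's PCI latency timer via sysfs. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    const char *cfg = "/sys/bus/pci/devices/0000:00:1f.2/config"; /* placeholder BDF */
    unsigned char latency = 0x08;   /* the value mentioned above */
    int fd = open(cfg, O_WRONLY);

    if (fd < 0) {
        perror(cfg);
        return 1;
    }
    /* Offset 0x0d of the standard PCI header holds the latency timer. */
    if (pwrite(fd, &latency, 1, 0x0d) != 1) {
        perror("pwrite");
        close(fd);
        return 1;
    }
    close(fd);
    return 0;
}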
 
 What you say is that between tracepoints 0x27 and 0x29, we may have entered
 userland. But where does this happen, given that point 0x28 is not executed?
 Nope, user land is entered _after_ 0x29 (you need to read the numbers
 with an offset of one: the delay is between
 __ipipe_unstall_iret_root+0xb6 and __ipipe_handle_irq+0xe).
 
 
 Ah!  Now I see. So basically I would have some user space code that is doing
 something that would delay my interrupts considerably, irrespective of the
 fact that IRQs are enabled.
 
 I once tried a WBINVD in a user-space task and it had a really devastating
 effect on RT (it basically locks the CPU for 200 us). If only there were a
 way of preventing this kind of instruction...
 

Ouch, this shouldn't be allowed in user space! WBINVD is a privileged
instruction. Do we leak privileges to user land??? Please check if your
execution mode (privilege ring) is correct there.

But indeed, wbinvd is devastating if you can execute it, causing
typically around 300 us latencies, at worst even milliseconds
(cache-size and state dependent)!
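
To get a feel for such numbers, here is a minimal, purely hypothetical
kernel-module sketch (not from this thread; all names are made up) that times
a single WBINVD with the TSC while interrupts are masked:

/* Hypothetical test module: measure the cost of one WBINVD in TSC cycles. */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/irqflags.h>
#include <linux/timex.h>                     /* cycles_t, get_cycles() */

static int __init wbinvd_cost_init(void)
{
    unsigned long flags;
    cycles_t t0, t1;

    local_irq_save(flags);                   /* keep the measurement atomic */
    t0 = get_cycles();
    asm volatile("wbinvd" ::: "memory");     /* write back + invalidate all caches */
    t1 = get_cycles();
    local_irq_restore(flags);

    printk(KERN_INFO "wbinvd took %llu TSC cycles\n",
           (unsigned long long)(t1 - t0));
    return 0;
}

static void __exit wbinvd_cost_exit(void)
{
}

module_init(wbinvd_cost_init);
module_exit(wbinvd_cost_exit);
MODULE_LICENSE("GPL");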

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()

2007-11-12 Thread Jeroen Van den Keybus

 Ouch, this shouldn't be allowed in user space! WBINVD is a privileged
 instruction. Do we leak privileges to user land??? Please check if your
 execution mode (privilege ring) is correct there.


No, I rather meant a kernel-mode program that was controlled from user
space. Sorry for upsetting you.

But indeed, wbinvd is devastating if you can execute it, causing
 typically around 300 us latencies, at worst even milliseconds
 (cache-size and state dependent)!


If I recall correctly, some of the Linux AGP GART drivers use(d?) it.

This raises another interesting question: to what extent is the x86 actually
a viable and dependable realtime platform, with its SMIs and highly
uncontrollable caching architecture? How would VT-based solutions compare?
(BTW, Intel should really have implemented a feature to use parts of the
cache as SRAM.)


Jeroen.


Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()

2007-11-12 Thread Jan Kiszka
Jeroen Van den Keybus wrote:
 Ouch, this shouldn't be allowed in user space! WBINVD is a privileged
 instruction. Do we leak privileges to user land??? Please check if your
 execution mode (privilege ring) is correct there.
 
 
 No, I rather meant a kernel-mode program that was controlled from user
 space. Sorry for upsetting you.

Phew... fine, the world is round again. :)

 
 But indeed, wbinvd is devastating if you can execute it, causing
 typically around 300 us latencies, at worst even milliseconds
 (cache-size and state dependent)!
 
 
 If I recall correctly, some of the Linux AGP GART drivers use(d?) it.

Yep, some only require this during setup (during MTRR reconfiguration,
e.g.), others also use it while X is running - making your graphical system
a real-time killer.

 
 This raises another interesting question: to what extent is the x86 actually
 a viable and dependable realtime platform, with its SMIs and highly
 uncontrollable caching architecture? How would VT-based solutions compare?

No surprise: x86 is a real-time minefield. Actually, I have been told
that PCI Express also requires cache flushes before starting DMA
transactions, but I haven't checked this yet.

Regarding fun with virtualisation over RT:

http://thread.gmane.org/gmane.comp.emulators.kvm.devel/8520/focus=8540
(There are even more threads about this, and some light at the end of
this tunnel, also for Intel. Check the archive.)

This reminds me that I still need to post my experimental kvm-over-ipipe
patch for VMX (Intel only so far, no AMD at hand :( ).

 (BTW Intel should really have implemented a feature to use parts of the
 cache as SRAM.)

Ack.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core