Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()

2007-11-12 Thread Jan Kiszka
Jeroen Van den Keybus wrote:
>> Ouch, this shouldn't be allowed in user space! WBINVD is a privileged
>> instruction. Do we leak privileges to user land??? Please check if your
>> execution mode (privilege ring) is correct there.
> 
> 
> No, I rather meant a kernel-mode program that was controlled from user
> space. Sorry for upsetting you.

Phew... fine, the world is round again. :)

> 
>> But indeed, wbinvd is devastating if you can execute it, causing
>> typically around 300 us latencies, at worst even milliseconds
>> (cache-size and state dependent)!
> 
> 
> If I recall correctly, some of the Linux AGP GART drivers use(d?) it.

Yep, some only need it during setup (e.g. while reconfiguring MTRRs),
others also use it while X is running - turning your graphical system into
a real-time killer.

> 
> This raises another interesting question: to what extent is the x86 actually
> a viable and dependable real-time platform, with its SMIs and highly
> uncontrollable caching architecture? How would VT-based solutions compare?

No surprise: x86 is a real-time minefield. Actually, I have been told
that PCI Express also requires cache flushes before starting DMA
transactions, but I haven't checked this yet.

Regarding fun with virtualisation over RT:

http://thread.gmane.org/gmane.comp.emulators.kvm.devel/8520/focus=8540
(There are more threads about this, and some light at the end of the
tunnel, also for Intel - check the archive.)

This reminds me that I still need to post my experimental kvm-over-ipipe
patch for VMX (so far Intel only, no AMD box at hand :( ).

> (BTW Intel should really have implemented a feature to use parts of the
> cache as SRAM.)

Ack.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()

2007-11-12 Thread Jeroen Van den Keybus
>
> Ouch, this shouldn't be allowed in user space! WBINVD is a privileged
> instruction. Do we leak privileges to user land??? Please check if your
> execution mode (privilege ring) is correct there.


No, I rather meant a kernel-mode program that was controlled from user
space. Sorry for upsetting you.
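
Roughly like this at the kernel end - a sketch from memory, not the exact
module I used (the real test was triggered from user space rather than at
module load, and the TSC timing is only there to get a ballpark figure):

/* wbinvd_probe.c - from-memory sketch of the kernel-mode test, not the
 * exact code: time a single WBINVD with the TSC and log the result. */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/irqflags.h>     /* local_irq_save/restore */
#include <asm/timex.h>          /* get_cycles(), cycles_t */

static int __init wbinvd_probe_init(void)
{
        unsigned long flags;
        cycles_t t0, t1;

        local_irq_save(flags);  /* keep the measurement undisturbed */
        t0 = get_cycles();
        asm volatile("wbinvd" ::: "memory");  /* write back + invalidate all caches */
        t1 = get_cycles();
        local_irq_restore(flags);

        printk(KERN_INFO "wbinvd took %llu cycles\n",
               (unsigned long long)(t1 - t0));
        return 0;
}

static void __exit wbinvd_probe_exit(void)
{
}

module_init(wbinvd_probe_init);
module_exit(wbinvd_probe_exit);
MODULE_LICENSE("GPL");

Dividing the reported cycle count by the CPU clock gives the stall time.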

> But indeed, wbinvd is devastating if you can execute it, causing
> typically around 300 us latencies, at worst even milliseconds
> (cache-size and state dependent)!


If I recall correctly, some of the Linux AGP GART drivers use(d?) it.

This raises another interesting question: to what extent is the x86 actually
a viable and dependable real-time platform, with its SMIs and highly
uncontrollable caching architecture? How would VT-based solutions compare?
(BTW Intel should really have implemented a feature to use parts of the
cache as SRAM.)


Jeroen.


Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()

2007-11-12 Thread Jan Kiszka
[Save the CCs!]

Jeroen Van den Keybus wrote:
>> Indeed, 300 us is too long. Maybe long DMA bursts on your PCI bus? Can
>> you exclude certain devices from your load to check for this (e.g. no
>> hard-disk load - use NFS instead - then no network load, etc.)?
> 
> 
> I already tried setting PCI latency timers of everything except my card to
> e.g. 0x08. No difference though.
> 
>>> What you say is that between tracepoint 0x27 and 0x29, we may have entered
>>> userland. But where does this happen, given that point 0x28 is not executed?
>> Nope, user land is entered _after_ 0x29 (you need to read the numbers
>> with an offset of one: the delay is between
>> __ipipe_unstall_iret_root+0xb6 and __ipipe_handle_irq+0xe).
> 
> 
> Ah!  Now I see. So basically I would have some user space code that is doing
> something that would delay my interrupts considerably, irrespective of the
> fact that IRQs are enabled.
> 
> I once tried a WBINVD in a user space task and it had a really devastating
> effect on RT (basically locks the CPU for 200 us). If only there were a way
> of preventing this kind of instruction...
> 

Ouch, this shouldn't be allowed in user space! WBINVD is a privileged
instruction. Do we leak privileges to user land??? Please check if your
execution mode (privilege ring) is correct there.
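
A trivial sanity check - just a sketch, assuming the usual x86 Linux
behaviour of delivering the resulting #GP as SIGSEGV - is to run something
like this as a plain user; it must get killed before reaching the last
printf, otherwise privileges really do leak:

/* wbinvd-test.c - hypothetical check, not from this thread.
 * Build: gcc -O2 -o wbinvd-test wbinvd-test.c
 * Expected outcome in ring 3: WBINVD raises #GP and the process dies
 * with SIGSEGV before "done" is printed. */
#include <stdio.h>

int main(void)
{
        printf("executing wbinvd from user space...\n");
        fflush(stdout);

        asm volatile("wbinvd");    /* privileged (CPL 0 only) instruction */

        printf("done - this line must never be reached from ring 3\n");
        return 0;
}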

But indeed, wbinvd is devastating if you can execute it, causing
typically around 300 us latencies, at worst even milliseconds
(cache-size and state dependent)!

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()

2007-11-12 Thread Jan Kiszka
Jeroen Van den Keybus wrote:
>> So, unless the IRQ below
>> should have been raised much earlier, there is no issue here.
> 
> 
> The point is: I actually do have interrupts (not the IPI that is visible
> here) that are being withheld (on average 80 us late) during this period. On
> this particular setup, I have seen numbers exceeding 300 us. Strangely
> enough, the latency test does not show these hiccups. I'm suspecting cache
> issues, but I would think these numbers are a bit large for that.

Indeed, 300 us is too long. Maybe long DMA bursts on your PCI bus? Can
you exclude certain devices from your load to check for this (e.g. no
hard-disk load - use NFS instead - then no network load, etc.)?
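
Long bursts are normally bounded by each bus master's latency timer
(offset 0x0D in the PCI config header), so dumping those values may also
be instructive. A quick user-space sketch via sysfs (the device address
below is only an example - adjust it to your board):

/* latency-dump.c - hypothetical helper, reads the master latency timer
 * of one PCI device from its sysfs config-space file. */
#include <stdio.h>

int main(int argc, char **argv)
{
        const char *bdf = (argc > 1) ? argv[1] : "0000:00:1d.0"; /* example */
        char path[128];
        unsigned char cfg[64];
        FILE *f;

        snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/config", bdf);
        f = fopen(path, "rb");
        if (!f) {
                perror(path);
                return 1;
        }
        if (fread(cfg, 1, sizeof(cfg), f) < 0x0e) {
                fprintf(stderr, "short read from %s\n", path);
                fclose(f);
                return 1;
        }
        fclose(f);

        /* Offset 0x0D of a type-0 header: how long this master may keep
           bursting on the bus once it has been granted. */
        printf("%s: latency timer = 0x%02x\n", bdf, cfg[0x0d]);
        return 0;
}

(lspci -v shows the same value in its "latency" field, this is just the
programmatic way.)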

> 
> What you say is that between tracepoint 0x27 and 0x29, we may have entered
> userland. But where does this happen, given that point 0x28 is not executed?

Nope, user land is entered _after_ 0x29 (you need to read the numbers
with an offset of one: the delay is between
__ipipe_unstall_iret_root+0xb6 and __ipipe_handle_irq+0xe).

> 
> I also do understand that, if a cli or sti would still lurk somewhere in
> this configuration, I'm going to have a real bad time... Although I doubt
> it, since the unexpected latencies always happen at this particular point.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()

2007-11-12 Thread Jeroen Van den Keybus
> So, unless the IRQ below
> should have been raised much earlier, there is no issue here.


The point is: I actually do have interrupts (not the IPI that is visible
here) that are being withheld (on average 80 us late) during this period. On
this particular setup, I have seen numbers exceeding 300 us. Strangely
enough, the latency test does not show these hiccups. I'm suspecting cache
issues, but I would think these numbers are a bit large for that.

What you say is that between tracepoint 0x27 and 0x29, we may have entered
userland. But where does this happen, given that point 0x28 is not executed?

I also do understand that, if a cli or sti would still lurk somewhere in
this configuration, I'm going to have a real bad time... Although I doubt
it, since the unexpected latencies always happen at this particular point.

Jeroen.


> If there is a real delay, evil user land may have switched off IRQs
> (given root privileges), or there is some hardware related IRQ delivery
> latency. But this cannot be determined via the tracer as it only sees
> kernel land.


Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()

2007-11-12 Thread Jan Kiszka
Jeroen Van den Keybus wrote:
> I have a setup which apparently suffers from long interrupt-disabled
> sections during heavy load (make -j16 of a kernel). I have instrumented the
> __ipipe_unstall_iret_root() function with the tracer as follows:
> 
> 
> asmlinkage void __ipipe_unstall_iret_root(struct pt_regs regs)
> {
>         ipipe_trace_special(0x20, 0);
>         ipipe_declare_cpuid;
> 
>         /* Emulate IRET's handling of the interrupt flag. */
> 
>         ipipe_trace_special(0x21, 0);
>         local_irq_disable_hw();
> 
>         ipipe_trace_special(0x22, 0);
>         ipipe_load_cpuid();
> 
>         /* Restore the software state as it used to be on kernel
>            entry. CAUTION: NMIs must *not* return through this
>            emulation. */
> 
>         ipipe_trace_special(0x23, 0);
>         if (!(regs.eflags & X86_EFLAGS_IF)) {
>                 ipipe_trace_special(0x24, 0);
>                 if (!__test_and_set_bit(IPIPE_STALL_FLAG,
>                                         &ipipe_root_domain->cpudata[cpuid].status))
>                         trace_hardirqs_off();
>                 regs.eflags |= X86_EFLAGS_IF;
>                 ipipe_trace_special(0x25, 0);
>         } else {
>                 ipipe_trace_special(0x26, 0);
>                 if (test_bit(IPIPE_STALL_FLAG,
>                              &ipipe_root_domain->cpudata[cpuid].status)) {
>                         trace_hardirqs_on();
>                         __clear_bit(IPIPE_STALL_FLAG,
>                                     &ipipe_root_domain->cpudata[cpuid].status);
>                 }
>                 ipipe_trace_special(0x27, 0);
> 
>                 /* Only sync virtual IRQs here, so that we don't recurse
>                    indefinitely in case of an external interrupt flood. */
> 
>                 if ((ipipe_root_domain->cpudata[cpuid].
>                      irq_pending_hi & IPIPE_IRQMASK_VIRT) != 0) {
>                         ipipe_trace_special(0x28, 0);
>                         __ipipe_sync_pipeline(IPIPE_IRQMASK_VIRT);
>                 }
>                 ipipe_trace_special(0x29, 0);
>         }
> #ifdef CONFIG_IPIPE_TRACE_IRQSOFF
>         ipipe_trace_end(0x800D);
> #endif /* CONFIG_IPIPE_TRACE_IRQSOFF */
> }
> 
> While under heavy load, the following trace log is produced:
> 
> ...
> 
>  + func               -116    0.124  xnarch_get_cpu_time+0x8  (rtdm_toseq_init+0x21)
>  + func               -116    0.169  xnarch_tsc_to_ns+0x12  (xnarch_get_cpu_time+0xf)
>  + func               -116    0.284  rtdm_event_timedwait+0xe  (rt_e1000_read+0xf5 [rt_e1000])
>  |  # func            -115    0.189  xnsynch_sleep_on+0xe  (rtdm_event_timedwait+0x141)
>  |  # func            -115    0.184  xnpod_suspend_thread+0xe  (xnsynch_sleep_on+0x3bf)
>  |  # func            -115    0.194  xntimer_migrate+0xe  (xnpod_suspend_thread+0xa2)
>  |  # func            -115    0.144  xntimer_start_aperiodic+0xe  (xnpod_suspend_thread+0xc5)
>  |  # func            -115    0.334  xnarch_ns_to_tsc+0x14  (xntimer_start_aperiodic+0x63)
>  |  # func            -114    0.199  xnpod_schedule+0xe  (xnpod_suspend_thread+0x237)
>  |  # [18006] --  99  -114    0.814  xnpod_schedule+0x176  (xnpod_suspend_thread+0x237)
>  |  # func            -113    0.874  __switch_to+0xe  (xnpod_schedule+0x77c)
>  |  # [18019] --  -1  -112    0.799  xnpod_schedule+0x887  (xnintr_irq_handler+0x18e)
>  |   + func           -112    0.319  __ipipe_walk_pipeline+0xb  (__ipipe_handle_irq+0x87)
>  |   # func           -111    0.204  __ipipe_stall_root+0x8  (resume_userspace+0x5)
>      # func           -111    0.134  __ipipe_unstall_iret_root+0xa  (restore_nocheck_notrace+0x0)
>      # (0x20)  0x     -111    0.119  __ipipe_unstall_iret_root+0x19  (restore_nocheck_notrace+0x0)
>      # (0x21)  0x     -111    0.124  __ipipe_unstall_iret_root+0x25  (restore_nocheck_notrace+0x0)
>  |   # (0x22)  0x     -111    0.119  __ipipe_unstall_iret_root+0x32  (restore_nocheck_notrace+0x0)
>  |   # (0x23)  0x     -111    0.119  __ipipe_unstall_iret_root+0x45  (restore_nocheck_notrace+0x0)
>  |   # (0x26)  0x     -110    0.134  __ipipe_unstall_iret_root+0x88  (restore_nocheck_notrace+0x0)
>  |   + (0x27)  0x     -110    0.109  __ipipe_unstall_iret_root+0x9e  (restore_nocheck_notrace+0x0)
>  |   + (0x29)  0x     -110   57.850  __ipipe_unstall_iret_root+0xb6  (restore_nocheck_notrace+0x0)

The tracer accounts the full time between the previous and the next
kernel trace point to the above one - although we likely left to user
land here (iret will re-enable IRQs for us). Concretely: the 57.850 us
charged to 0x29 at -110 is simply the gap until the next kernel trace
point, __ipipe_handle_irq+0xe at -52. So, unless the IRQ below should
have been raised much earlier, there is no issue here.

If there is a real delay, evil user land may have switched off IRQs
(given root privileges), or there is some hardware related IRQ delivery
latency. But this cannot be determined via the tracer as it only sees
kernel land.

>  |   + func            -52    0.289  __ipipe_handle_irq+0xe  (ipipe_ipi3+0x26)
>  |   + func            -52    0.169  __ipipe_ack_apic+0x8  (__ipipe_handle_irq+0xbf)
>  |   + func            -52    0.154  __ipipe_dispatch_wired+0x14  (__ipipe_handle_irq+0x5f)
>  |