from:"kus Kusche Klaus"

[RT] Compile error with certain IRQ debug config option combinations

2008-02-14 Thread kus Kusche Klaus

I'm using 2.6.24.2 with -rt1 applied 
(applies cleanly with some offsets) on x86.

I've *not* configured PROVE_LOCKING (too expensive).
However, I selected all the latency tracings, timings and histograms,
among them CRITICAL_IRQSOFF_TIMING.
This implicitly configures TRACE_IRQFLAGS.
However, TRACE_IRQFLAGS depends on PROVE_LOCKING, which is off
(and not implicitly selected).

With TRACE_IRQFLAGS on and PROVE_LOCKING off,
compilation of kernel/lockdep.c fails miserably:
  CC  kernel/lockdep.o
kernel/lockdep.c: In function 'check_usage_forwards':
kernel/lockdep.c:1740: error: 'find_usage_bit' undeclared (first use in
this function)
kernel/lockdep.c:1740: error: (Each undeclared identifier is reported
only once
kernel/lockdep.c:1740: error: for each function it appears in.)
kernel/lockdep.c:1742: error: implicit declaration of function
'find_usage_forwards'
kernel/lockdep.c:1746: error: 'forwards_match' undeclared (first use in
this function)
kernel/lockdep.c: In function 'check_usage_backwards':
kernel/lockdep.c:1759: error: 'find_usage_bit' undeclared (first use in
this function)
kernel/lockdep.c:1761: error: implicit declaration of function
'find_usage_backwards'
kernel/lockdep.c:1765: error: 'backwards_match' undeclared (first use in
this function)
make[1]: *** [kernel/lockdep.o] Error 1

Looking at the source quickly, 
it seems that the #ifdefs do not cover that combination correctly, 
but I can't figure out how to fix that (the ifdef nesting in that file
seems to have grown over time...).


By the way: With all the tracing on, 
make also tries to compile scripts/trace-it.c (with HOSTCC),
which also fails if the kernel is cross-compiled in a non-linux
development environment 
(Interix in my case - don't ask me, I didn't vote for it...).

Many thanks in advance for any help!

Greetings

-- 
Klaus Kusche, DI. Dr.Software Development - Control
KEBA AGGewerbepark Urfahr, A-4041 Linz, Austria
www.keba.com Firmenbuchgericht Linz FN 184376 t
mailto:[EMAIL PROTECTED]  Phone: +43 732 7090-23120  Fax: +43 732 7090-63401


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RT] Compile error with certain IRQ debug config option combinations

2008-02-14 Thread kus Kusche Klaus

I'm using 2.6.24.2 with -rt1 applied 
(applies cleanly with some offsets) on x86.

I've *not* configured PROVE_LOCKING (too expensive).
However, I selected all the latency tracings, timings and histograms,
among them CRITICAL_IRQSOFF_TIMING.
This implicitly configures TRACE_IRQFLAGS.
However, TRACE_IRQFLAGS depends on PROVE_LOCKING, which is off
(and not implicitly selected).

With TRACE_IRQFLAGS on and PROVE_LOCKING off,
compilation of kernel/lockdep.c fails miserably:
  CC  kernel/lockdep.o
kernel/lockdep.c: In function 'check_usage_forwards':
kernel/lockdep.c:1740: error: 'find_usage_bit' undeclared (first use in
this function)
kernel/lockdep.c:1740: error: (Each undeclared identifier is reported
only once
kernel/lockdep.c:1740: error: for each function it appears in.)
kernel/lockdep.c:1742: error: implicit declaration of function
'find_usage_forwards'
kernel/lockdep.c:1746: error: 'forwards_match' undeclared (first use in
this function)
kernel/lockdep.c: In function 'check_usage_backwards':
kernel/lockdep.c:1759: error: 'find_usage_bit' undeclared (first use in
this function)
kernel/lockdep.c:1761: error: implicit declaration of function
'find_usage_backwards'
kernel/lockdep.c:1765: error: 'backwards_match' undeclared (first use in
this function)
make[1]: *** [kernel/lockdep.o] Error 1

Looking at the source quickly, 
it seems that the #ifdefs do not cover that combination correctly, 
but I can't figure out how to fix that (the ifdef nesting in that file
seems to have grown over time...).


By the way: With all the tracing on, 
make also tries to compile scripts/trace-it.c (with HOSTCC),
which also fails if the kernel is cross-compiled in a non-linux
development environment 
(Interix in my case - don't ask me, I didn't vote for it...).

Many thanks in advance for any help!

Greetings

-- 
Klaus Kusche, DI. Dr.Software Development - Control
KEBA AGGewerbepark Urfahr, A-4041 Linz, Austria
www.keba.com Firmenbuchgericht Linz FN 184376 t
mailto:[EMAIL PROTECTED]  Phone: +43 732 7090-23120  Fax: +43 732 7090-63401


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RT 45-01: CF Card read: High latency?

2005-04-11 Thread kus Kusche Klaus

>From the three sources of multi-millisecond latency in my experiments
(console messages to dead serial console, USB I/O, CF Card bulk read), 
I've analyzed one:

The latency of around 70 milliseconds in low-priority RT processes
when running a "dd if=/dev/hda of=/dev/null" in parallel (where hda
is a CF card) is due to readahead.

Two indications for that:
* Per default, maximum readahead is 128 KB. 70 ms is exactly the time
needed to transfer 128 KB 
(our CF cards have a sustained transfer rate of 1800 KB/sec).
* If the readahead constants in the kernel are changed,
the observed latencies scale almost linearly.

So obviously:
* The PIO mode IDE transfers keep the CPU 100 % busy.
* No other processes get scheduled between the data transfer of
individual "sectors", i.e. there are no holes between the IDE 
interrupts.

However, there is no need to change the readahead mechanism,
because even the transfer of a single sector would exceed our
latency requirements. So we either need to switch to DMA (which we
can't in the short term), or run the time-critical tasks above IDE 
priority. 

So the question is, what exactly is the IDE priority?
Is the PIO transfer done in the IRQ handler or in a bh?

-- 
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [EMAIL PROTECTED]WWW: www.keba.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

BUG in RT 45-01 when RT program dumps core

2005-04-11 Thread kus Kusche Klaus

When a process running with RT priority dumps core,
I get the following BUG:

Apr 11 13:44:23 OF455 kern.err kernel: BUG: rtc2:833 RT task
yield()-ing!
Apr 11 13:44:23 OF455 kern.warn kernel:  [] yield+0x61/0x70
(8)
Apr 11 13:44:23 OF455 kern.warn kernel:  []
coredump_wait+0x79/0xc0 (20)
Apr 11 13:44:23 OF455 kern.warn kernel:  []
do_coredump+0xf3/0x200 (92)
Apr 11 13:44:23 OF455 kern.warn kernel:  []
kmem_cache_free+0x49/0x120 (32)
Apr 11 13:44:23 OF455 kern.warn kernel:  []
atomic_dec_and_spin_lock+0x3b/0x50 (24)
Apr 11 13:44:23 OF455 kern.warn kernel:  []
__dequeue_signal+0x105/0x160 (20)
Apr 11 13:44:23 OF455 kern.warn kernel:  []
get_signal_to_deliver+0x334/0x350 (48)
Apr 11 13:44:23 OF455 kern.warn kernel:  []
do_signal+0x98/0x180 (44)
Apr 11 13:44:23 OF455 kern.warn kernel:  []
timer_interrupt+0x46/0x70 (108)
Apr 11 13:44:23 OF455 kern.warn kernel:  []
handle_IRQ_event+0x5b/0xe0 (8)
Apr 11 13:44:23 OF455 kern.warn kernel:  []
__do_IRQ+0x111/0x190 (48)
Apr 11 13:44:23 OF455 kern.warn kernel:  []
do_page_fault+0x0/0x530 (16)
Apr 11 13:44:23 OF455 kern.warn kernel:  []
do_notify_resume+0x37/0x3c (8)
Apr 11 13:44:23 OF455 kern.warn kernel:  []
work_notifysig+0x13/0x15 (8)



Klaus Kusche
> Entwicklung Software - Steuerung
> Software Development - Control
> 
> KEBA AG
> A-4041 Linz
> Gewerbepark Urfahr
> Tel +43 / 732 / 7090-3120
> Fax +43 / 732 / 7090-6301
> E-Mail: [EMAIL PROTECTED]
> www.keba.com
> 
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

BUG in RT 45-01 when RT program dumps core

2005-04-11 Thread kus Kusche Klaus

When a process running with RT priority dumps core,
I get the following BUG:

Apr 11 13:44:23 OF455 kern.err kernel: BUG: rtc2:833 RT task
yield()-ing!
Apr 11 13:44:23 OF455 kern.warn kernel:  [c026dad1] yield+0x61/0x70
(8)
Apr 11 13:44:23 OF455 kern.warn kernel:  [c0151e49]
coredump_wait+0x79/0xc0 (20)
Apr 11 13:44:23 OF455 kern.warn kernel:  [c0151f83]
do_coredump+0xf3/0x200 (92)
Apr 11 13:44:23 OF455 kern.warn kernel:  [c0136789]
kmem_cache_free+0x49/0x120 (32)
Apr 11 13:44:23 OF455 kern.warn kernel:  [c012abdb]
atomic_dec_and_spin_lock+0x3b/0x50 (24)
Apr 11 13:44:23 OF455 kern.warn kernel:  [c011c9a5]
__dequeue_signal+0x105/0x160 (20)
Apr 11 13:44:23 OF455 kern.warn kernel:  [c011e734]
get_signal_to_deliver+0x334/0x350 (48)
Apr 11 13:44:23 OF455 kern.warn kernel:  [c01027f8]
do_signal+0x98/0x180 (44)
Apr 11 13:44:23 OF455 kern.warn kernel:  [c0106b56]
timer_interrupt+0x46/0x70 (108)
Apr 11 13:44:23 OF455 kern.warn kernel:  [c012c9eb]
handle_IRQ_event+0x5b/0xe0 (8)
Apr 11 13:44:23 OF455 kern.warn kernel:  [c012cba1]
__do_IRQ+0x111/0x190 (48)
Apr 11 13:44:23 OF455 kern.warn kernel:  [c010d590]
do_page_fault+0x0/0x530 (16)
Apr 11 13:44:23 OF455 kern.warn kernel:  [c0102917]
do_notify_resume+0x37/0x3c (8)
Apr 11 13:44:23 OF455 kern.warn kernel:  [c0102ae6]
work_notifysig+0x13/0x15 (8)



Klaus Kusche
 Entwicklung Software - Steuerung
 Software Development - Control
 
 KEBA AG
 A-4041 Linz
 Gewerbepark Urfahr
 Tel +43 / 732 / 7090-3120
 Fax +43 / 732 / 7090-6301
 E-Mail: [EMAIL PROTECTED]
 www.keba.com
 
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RT 45-01: CF Card read: High latency?

2005-04-11 Thread kus Kusche Klaus

From the three sources of multi-millisecond latency in my experiments
(console messages to dead serial console, USB I/O, CF Card bulk read), 
I've analyzed one:

The latency of around 70 milliseconds in low-priority RT processes
when running a dd if=/dev/hda of=/dev/null in parallel (where hda
is a CF card) is due to readahead.

Two indications for that:
* Per default, maximum readahead is 128 KB. 70 ms is exactly the time
needed to transfer 128 KB 
(our CF cards have a sustained transfer rate of 1800 KB/sec).
* If the readahead constants in the kernel are changed,
the observed latencies scale almost linearly.

So obviously:
* The PIO mode IDE transfers keep the CPU 100 % busy.
* No other processes get scheduled between the data transfer of
individual sectors, i.e. there are no holes between the IDE 
interrupts.

However, there is no need to change the readahead mechanism,
because even the transfer of a single sector would exceed our
latency requirements. So we either need to switch to DMA (which we
can't in the short term), or run the time-critical tasks above IDE 
priority. 

So the question is, what exactly is the IDE priority?
Is the PIO transfer done in the IRQ handler or in a bh?

-- 
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [EMAIL PROTECTED]WWW: www.keba.com
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: 2.6.11, USB: High latency?

2005-04-04 Thread kus Kusche Klaus

> * kus Kusche Klaus <[EMAIL PROTECTED]> wrote:
> >IRQ 7-724   0d..11us : end_8259A_irq (do_hardirq)
> >IRQ 7-724   0d..11us!: enable_8259A_irq (do_hardirq)
> >IRQ 7-724   0d...  832us : do_hardirq (do_irqd)
> >IRQ 7-724   0d...  833us : trace_irqs_on (do_hardirq)
> 
> > mmap-1000  0d.h1   21us : end_8259A_irq (__do_IRQ)
> > mmap-1000  0d.h1   22us!: enable_8259A_irq (__do_IRQ)
> > mmap-1000  0d.h.  662us : irq_exit (do_IRQ)
> > mmap-1000  0d..1  662us : do_softirq (irq_exit)
> 
> > mmap-1000  0d.h.0us : do_IRQ (c012d6d5 7 0)
> > mmap-1000  0d.h12us!: mask_and_ack_8259A (__do_IRQ)
> > mmap-1000  0d.h1  938us : redirect_hardirq (__do_IRQ)
> > mmap-1000  0d.h1  939us : wake_up_process (redirect_hardirq)
> 
> such 'freezes' almost certainly signal some sort of hardware 
> latency - 
> some device holding the system bus up during DMA. There is no 
> algorithmic reason for any of those steps above to take 
> several hundreds 
> of microseconds.
> 
>   Ingo

I asked our hardware team. The hardware has two devices which are
in use and capable of busmaster/DMA transfers: 
The intel e100 ethernet controller and the intel PIIX4 USB 
controller. 
The IDE interface is also a busmaster, but there are only PIO IDE
devices.

I suspect the latter, as USB reads were running in parallel...
How many bytes are transferred at most by the USB controller 
for a single request? How long may this take?

Any experiences / opinitions / advices?

Moreover, we know from experience that the "WBINDV" instruction
(Write back and invalidate CPU cache) can cause such latencies. 

Does this instruction occur anywhere in Linux?

-- 
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]WWW: www.keba.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: 2.6.11, USB: High latency?

2005-04-04 Thread kus Kusche Klaus

 * kus Kusche Klaus [EMAIL PROTECTED] wrote:
 IRQ 7-724   0d..11us : end_8259A_irq (do_hardirq)
 IRQ 7-724   0d..11us!: enable_8259A_irq (do_hardirq)
 IRQ 7-724   0d...  832us : do_hardirq (do_irqd)
 IRQ 7-724   0d...  833us : trace_irqs_on (do_hardirq)
 
  mmap-1000  0d.h1   21us : end_8259A_irq (__do_IRQ)
  mmap-1000  0d.h1   22us!: enable_8259A_irq (__do_IRQ)
  mmap-1000  0d.h.  662us : irq_exit (do_IRQ)
  mmap-1000  0d..1  662us : do_softirq (irq_exit)
 
  mmap-1000  0d.h.0us : do_IRQ (c012d6d5 7 0)
  mmap-1000  0d.h12us!: mask_and_ack_8259A (__do_IRQ)
  mmap-1000  0d.h1  938us : redirect_hardirq (__do_IRQ)
  mmap-1000  0d.h1  939us : wake_up_process (redirect_hardirq)
 
 such 'freezes' almost certainly signal some sort of hardware 
 latency - 
 some device holding the system bus up during DMA. There is no 
 algorithmic reason for any of those steps above to take 
 several hundreds 
 of microseconds.
 
   Ingo

I asked our hardware team. The hardware has two devices which are
in use and capable of busmaster/DMA transfers: 
The intel e100 ethernet controller and the intel PIIX4 USB 
controller. 
The IDE interface is also a busmaster, but there are only PIO IDE
devices.

I suspect the latter, as USB reads were running in parallel...
How many bytes are transferred at most by the USB controller 
for a single request? How long may this take?

Any experiences / opinitions / advices?

Moreover, we know from experience that the WBINDV instruction
(Write back and invalidate CPU cache) can cause such latencies. 

Does this instruction occur anywhere in Linux?

-- 
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]WWW: www.keba.com
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: 2.6.11, USB: High latency?

2005-04-01 Thread kus Kusche Klaus

> > Even when the errors described in my previous mail does not occur, 
> > massive USB stick transfers cause latencies of 1 to 2 milliseconds, 
> > which is way too much for realtime control systems.
> 
> do these occur under PREEMPT_RT? If yes, do you get any 
> useful trace if 
> you enable all the tracing options but keep wakeup-timing off:
> 
>  # CONFIG_WAKEUP_TIMING is not set
>  CONFIG_PREEMPT_TRACE=y
>  CONFIG_CRITICAL_PREEMPT_TIMING=y
>  CONFIG_CRITICAL_IRQSOFF_TIMING=y
>  CONFIG_CRITICAL_TIMING=y
>  CONFIG_LATENCY_TIMING=y
>  CONFIG_LATENCY_TRACE=y
>  CONFIG_MCOUNT=y
> 
> this should catch any type of preempt-off section, irqs-off and 
> preempt_disable() alike. (unless the tracer has a bug.)

I tried with the trace settings above, but I didn't have good luck:

1.) Tracing slows things down *a lot* (even *before* turned on):
* My own rtc test program causes 97 % CPU load when tracing is on.
  On an RT kernel with tracing off, it consumes just below 20 %,
  on the standard kernel, it consumes less than 3 %.
* With -f 8192, rtc_wakeup does not even start:
  It blocks the whole system for about 1 minute at 
  "setting up consumer thread".
  After that, it starts running and produces output, but ends with
  "ringbuffer full" almost immediately.
  Surprisingly, when started under the control of "strace", it runs,
  but reports tons of missed interrupts.
  With "-f 1024", it works.

2.) Tracing still doesn't seem to catch everything:
rtc_wakeup sometimes reported a max. jitter of several milliseconds,
but the max trace from /proc/latency_trace showed around 50
microseconds...

3.) However, I got some interesting traces later (see below).
They seem to be related to USB read load.

4.) When running rtc_wakeup at low rtpri (e.g. 2(12)),
I can still generate very bad "missed interrupts" counts
(even on a kernel *without* latency tracing):
core dumps: Up to 10 missed interrupts, almost continuously
USB reads:  Also up to 10 missed interrupts, almost continuously
USB reads, mmap load in parallel: Up to 50, almost continuously
OOM: Up to 500
CF reads: Up to 600
Moreover, rtc_wakeup shows jitter up to 2500 microseconds

Even if there is no special test load running, 
rtc_wakeup now and then misses an interrupt!

All the load is generated at shell priority, not at rt priority!
(so, ideally, it should not influence rt threads *at all*)

5.) I still do not understand the effect of "chrt"-ing "IRQ 8" to 95:
"IRQ 8" is at 49 per default, and there is no process with rtpri>49
(except for rtc_wakeup itself).
Hence, to my understanding, there should not be *any* difference 
between running "IRQ 8" at 49 or 95!
However, when "IRQ 8" is at 95, a dd from the CF card doesn't hurt much.
When "IRQ 8" is at 49, the dd causes jitter of several milliseconds!
(both measured with rtc_wakeup at 89(99)).
Why? "IRQ 14" (IDE) is at 47! There is nothing except rtc_wakeup
which could block or preempt the RTC "IRQ 8"?



preemption latency trace v1.1.4 on 2.6.12-rc1-RT-V0.7.42-08

 latency: 833 us, #7/7, CPU#0 | (M:rt VP:0, KP:1, SP:1 HP:1 #P:1)
-
| task: IRQ 7-724 (uid:0 nice:-5 policy:1 rt_prio:46)
-

 _--=> CPU#
/ _-=> irqs-off
   | / _=> need-resched
   || / _---=> hardirq/softirq
   ||| / _--=> preempt-depth
    /
   | delay
   cmd pid | time  |   caller
  \   /|   \   |   /
   IRQ 7-724   0d...0us : do_hardirq (do_irqd)
   IRQ 7-724   0d..10us : note_interrupt (do_hardirq)
   IRQ 7-724   0d..11us : end_8259A_irq (do_hardirq)
   IRQ 7-724   0d..11us!: enable_8259A_irq (do_hardirq)
   IRQ 7-724   0d...  832us : do_hardirq (do_irqd)
   IRQ 7-724   0d...  833us : trace_irqs_on (do_hardirq)

Some more of those:

preemption latency trace v1.1.4 on 2.6.12-rc1-RT-V0.7.42-08

 latency: 675 us, #47/47, CPU#0 | (M:rt VP:0, KP:1, SP:1 HP:1 #P:1)
-
| task: ksoftirqd/0-2 (uid:0 nice:-10 policy:0 rt_prio:0)
-

 _--=> CPU#
/ _-=> irqs-off
   | / _=> need-resched
   || / _---=> hardirq/softirq
   ||| / _--=> preempt-depth
    /
   | delay
   cmd pid | time  |   caller
  \   /|   \   |   /
mmap-1000  0d...0us : common_interrupt ((0))
mmap-1000  0d.h.0us : do_IRQ (c028e90e 0 0)
mmap-1000  0d.h11us+: mask_and_ack_8259A (__do_IRQ)
mmap-1000  0d.h15us : redirect_hardirq (__do_IRQ)
mmap-1000  0d.h.5us : handle_IRQ_event (__do_IRQ)
mmap-1000  0d.h.6us : timer_interrupt (handle_IRQ_event)
mmap-1000  0d.h16us+: mark_offset_tsc (timer_interrupt)
mmap-1000  0d.h1   14us : do_timer

RE: 2.6.11, IDE: Strange scheduling behaviour: high-pri RT process not scheduled?

2005-04-01 Thread kus Kusche Klaus

> > The following tests are made with 'IRQ 8' at 95, rtc_wakeup 
> at 89(99):
> > * Heavy mmap load, no oom: max jitter: 42.1% (   51 usec)
> > * Heavy mmap load, oom:max jitter:  11989.2% (14635 usec)
> >   (but still "missed irqs: 0", so IRQ 8 was also blocked for 14 ms)
> 
> did you get any kernel messages in that time? (about missed 
> irqs, etc.)  
> Please do a 'dmesg -n 0' to minimize the effect of kernel messages.

Excellent, thanks!

It turned out that the latencies are not caused by the kernel
messages themselves, but by sending them to a serial console
(which was off), in all my high latency cases at rtpri 89(99).

After removing the serial console from the boot parameters,
* the OOM timings are back to normal (around 50 microseconds)
* the USB error and remove timings are back to normal
* the USB plugin timings are in the range of the USB read
  (which is up to 1 ms - still bad)

However, latencies at rtprio 2 are still very frustrating
(details will follow).

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: 2.6.11, IDE: Strange scheduling behaviour: high-pri RT process not scheduled?

2005-04-01 Thread kus Kusche Klaus

  The following tests are made with 'IRQ 8' at 95, rtc_wakeup 
 at 89(99):
  * Heavy mmap load, no oom: max jitter: 42.1% (   51 usec)
  * Heavy mmap load, oom:max jitter:  11989.2% (14635 usec)
(but still missed irqs: 0, so IRQ 8 was also blocked for 14 ms)
 
 did you get any kernel messages in that time? (about missed 
 irqs, etc.)  
 Please do a 'dmesg -n 0' to minimize the effect of kernel messages.

Excellent, thanks!

It turned out that the latencies are not caused by the kernel
messages themselves, but by sending them to a serial console
(which was off), in all my high latency cases at rtpri 89(99).

After removing the serial console from the boot parameters,
* the OOM timings are back to normal (around 50 microseconds)
* the USB error and remove timings are back to normal
* the USB plugin timings are in the range of the USB read
  (which is up to 1 ms - still bad)

However, latencies at rtprio 2 are still very frustrating
(details will follow).

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: 2.6.11, USB: High latency?

2005-04-01 Thread kus Kusche Klaus

  Even when the errors described in my previous mail does not occur, 
  massive USB stick transfers cause latencies of 1 to 2 milliseconds, 
  which is way too much for realtime control systems.
 
 do these occur under PREEMPT_RT? If yes, do you get any 
 useful trace if 
 you enable all the tracing options but keep wakeup-timing off:
 
  # CONFIG_WAKEUP_TIMING is not set
  CONFIG_PREEMPT_TRACE=y
  CONFIG_CRITICAL_PREEMPT_TIMING=y
  CONFIG_CRITICAL_IRQSOFF_TIMING=y
  CONFIG_CRITICAL_TIMING=y
  CONFIG_LATENCY_TIMING=y
  CONFIG_LATENCY_TRACE=y
  CONFIG_MCOUNT=y
 
 this should catch any type of preempt-off section, irqs-off and 
 preempt_disable() alike. (unless the tracer has a bug.)

I tried with the trace settings above, but I didn't have good luck:

1.) Tracing slows things down *a lot* (even *before* turned on):
* My own rtc test program causes 97 % CPU load when tracing is on.
  On an RT kernel with tracing off, it consumes just below 20 %,
  on the standard kernel, it consumes less than 3 %.
* With -f 8192, rtc_wakeup does not even start:
  It blocks the whole system for about 1 minute at 
  setting up consumer thread.
  After that, it starts running and produces output, but ends with
  ringbuffer full almost immediately.
  Surprisingly, when started under the control of strace, it runs,
  but reports tons of missed interrupts.
  With -f 1024, it works.

2.) Tracing still doesn't seem to catch everything:
rtc_wakeup sometimes reported a max. jitter of several milliseconds,
but the max trace from /proc/latency_trace showed around 50
microseconds...

3.) However, I got some interesting traces later (see below).
They seem to be related to USB read load.

4.) When running rtc_wakeup at low rtpri (e.g. 2(12)),
I can still generate very bad missed interrupts counts
(even on a kernel *without* latency tracing):
core dumps: Up to 10 missed interrupts, almost continuously
USB reads:  Also up to 10 missed interrupts, almost continuously
USB reads, mmap load in parallel: Up to 50, almost continuously
OOM: Up to 500
CF reads: Up to 600
Moreover, rtc_wakeup shows jitter up to 2500 microseconds

Even if there is no special test load running, 
rtc_wakeup now and then misses an interrupt!

All the load is generated at shell priority, not at rt priority!
(so, ideally, it should not influence rt threads *at all*)

5.) I still do not understand the effect of chrt-ing IRQ 8 to 95:
IRQ 8 is at 49 per default, and there is no process with rtpri49
(except for rtc_wakeup itself).
Hence, to my understanding, there should not be *any* difference 
between running IRQ 8 at 49 or 95!
However, when IRQ 8 is at 95, a dd from the CF card doesn't hurt much.
When IRQ 8 is at 49, the dd causes jitter of several milliseconds!
(both measured with rtc_wakeup at 89(99)).
Why? IRQ 14 (IDE) is at 47! There is nothing except rtc_wakeup
which could block or preempt the RTC IRQ 8?



preemption latency trace v1.1.4 on 2.6.12-rc1-RT-V0.7.42-08

 latency: 833 us, #7/7, CPU#0 | (M:rt VP:0, KP:1, SP:1 HP:1 #P:1)
-
| task: IRQ 7-724 (uid:0 nice:-5 policy:1 rt_prio:46)
-

 _--= CPU#
/ _-= irqs-off
   | / _= need-resched
   || / _---= hardirq/softirq
   ||| / _--= preempt-depth
    /
   | delay
   cmd pid | time  |   caller
  \   /|   \   |   /
   IRQ 7-724   0d...0us : do_hardirq (do_irqd)
   IRQ 7-724   0d..10us : note_interrupt (do_hardirq)
   IRQ 7-724   0d..11us : end_8259A_irq (do_hardirq)
   IRQ 7-724   0d..11us!: enable_8259A_irq (do_hardirq)
   IRQ 7-724   0d...  832us : do_hardirq (do_irqd)
   IRQ 7-724   0d...  833us : trace_irqs_on (do_hardirq)

Some more of those:

preemption latency trace v1.1.4 on 2.6.12-rc1-RT-V0.7.42-08

 latency: 675 us, #47/47, CPU#0 | (M:rt VP:0, KP:1, SP:1 HP:1 #P:1)
-
| task: ksoftirqd/0-2 (uid:0 nice:-10 policy:0 rt_prio:0)
-

 _--= CPU#
/ _-= irqs-off
   | / _= need-resched
   || / _---= hardirq/softirq
   ||| / _--= preempt-depth
    /
   | delay
   cmd pid | time  |   caller
  \   /|   \   |   /
mmap-1000  0d...0us : common_interrupt ((0))
mmap-1000  0d.h.0us : do_IRQ (c028e90e 0 0)
mmap-1000  0d.h11us+: mask_and_ack_8259A (__do_IRQ)
mmap-1000  0d.h15us : redirect_hardirq (__do_IRQ)
mmap-1000  0d.h.5us : handle_IRQ_event (__do_IRQ)
mmap-1000  0d.h.6us : timer_interrupt (handle_IRQ_event)
mmap-1000  0d.h16us+: mark_offset_tsc (timer_interrupt)
mmap-1000  0d.h1   14us : do_timer (timer_interrupt)
mmap-1000  0d.h1   14us : update_process_times

RE: 2.6.11, USB: High latency?

2005-03-31 Thread kus Kusche Klaus

> > > The latencies are almost certainly caused by the USB host 
> controller 
> > > driver.  I'm planning improvements to uhci-hcd which should 
> > > help reduce 
> > > the latency, but it will still be on the large side.  And I 
> > > won't have 
> > > time to write the changes to the driver for several months.
> > 
> > Any numbers about the expected "large side"? 
> > We would need <30 microseconds irq latency,
> > and <<1 milliseconds rt application latency.
> 
> The biggest advantage would come from using a bottom-half 
> handler to do 
> most of the work.  Right now the uhci-hcd driver does 
> everything in its 
> interrupt handler.  This would certainly help IRQ latency; it 
> might not 
> affect application latency very much.

Sounds very reasonable, thanks. Also helps application latency,
because with the RT patches, I can tune the rt prio of softirq
execution (that's where bottom-half goes, doesn't it?) w.r.t. the
rt prio of the application threads.

However, if I understand things correctly, if you really need 
to disable all interrupts while doing the USB work, it will not
make any difference if IRQs are disabled while you are in the
USB IRQ handler, or if they are disabled for the same amount of 
work/time in the bottom-half code.

> I'll try adding a bottom half in my next series of patches.  
> Maybe it will 
> be ready in time to appear in the -mm kernels before 2.6.12-final is 
> released.

> We'll see what happens with the upcoming changes.  Maybe 
> you'll be able to 
> test them for me?

Basically, yes (as long as our company doesn't decide to stop the
linux experiments).

However, I depend on Ingo's RT patch, which is against the -rc series,
not against the -mm series. So I will probably not be able to apply
patches created against -mm.

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [BUG] 2.6.11: Random SCSI/USB errors when reading from USB memory stick

2005-03-31 Thread kus Kusche Klaus

> > > Latency is the subject of a separate email.  Does this 
> > > increase in latency 
> > > occur only when you see the errors, or whenever you do a 
> large data 
> > > transfer?  In fact, I would suspect the errors to _decrease_ 
> > > the latency 
> > > with respect to a normal transfer.
> > 
> > I observe from <1 to 2 ms on successful transfers, and around 15 ms
> > latency when things go wrong.
> 
> I hate to ask this question; it sounds an awful lot like 
> "Monty Python and
> the Holy Grail", but...  Is that IRQ latency or application latency?

With Ingo's RT kernels, it is not possible for me to tell the 
difference, because IRQ handlers are also running as kernel threads,
scheduled by the scheduler according to their rt prio
(which is around 50 by default, 95 for the RTC IRQ in my case).

I can tell for sure that the RTC IRQ handler was not executed
in that time, but I can't tell if that's because IRQs were blocked
or because it didn't get scheduled.

> I can't think of any reason why IRQ latency should change 
> during the error 
> handling.  Application latency might change because the SCSI 
> error handler 
> could start using up a lot of CPU time.  I don't know what 
> priority it 
> runs at; you can check with ps.

Ingo just suggested that error handling (writing kernel messages
to console/disk) in general is causing the trouble. 
I'll check that and post the results.

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: 2.6.11, IDE: Strange scheduling behaviour: high-pri RT process not scheduled?

2005-03-31 Thread kus Kusche Klaus

> getting /dev/rtc handling right for latency measurement is 
> ... tricky.  
> The method i'm using under PREEMPT_RT is:
> 
>  chrt -f 84 -p `pidof 'IRQ 0'`
>  chrt -f 95 -p `pidof 'IRQ 8'`
>  ./rtc_wakeup -f 1024 -t 10

I tried it your way.

First impressions with realtime-preempt-2.6.12-rc1-V0.7.42-01, 
rtc_wakeup and chrt:


The main difference is not between rtc_wakeup and my test script
(the timings agree very well), but comes from chrt for 'IRQ 8':

System not loaded, original 'IRQ 8' prio 49:
* rtc_wakeup at 89(99): max jitter: 378.2% (461 usec), missed irqs:  0
* rtc_wakeup at 1(11):  max jitter:  97.1% (118 usec), missed irqs: 25
System not loaded, 'IRQ 8' prio at 95:
* rtc_wakeup at 89(99): max jitter:  19.1% ( 23 usec), missed irqs:  0
* rtc_wakeup at 1(11):  max jitter: 190.7% (232 usec), missed irqs: 52


The following tests are made with 'IRQ 8' at 95, rtc_wakeup at 89(99):
* Heavy mmap load, no oom: max jitter: 42.1% (   51 usec)
* Heavy mmap load, oom:max jitter:  11989.2% (14635 usec)
  (but still "missed irqs: 0", so IRQ 8 was also blocked for 14 ms)
* USB reads, no error: max jitter:707.7% (  863 usec)
  (happens rarely)
* USB reads, error:max jitter:   7247.2% ( 8846 usec)
  (again, no missed irqs)
* USB stick connect:   max jitter:   5776.7% ( 7051 usec)
* USB stick disconnect:max jitter:   2966.6% ( 3621 usec)
  (both without error!)


Notes:
* My system does not have a 'IRQ 0' kernel thread.
  It has IRQ 1, 3, 7, 8, 12 and 14.
  This is strange, because /proc/interrupts says
0:4805658  XT-PIC  timer  0/5658
2:  0  XT-PIC  cascade  0/0
3:  0  XT-PIC  serial  0/0
7:  17538  XT-PIC  uhci_hcd:usb1, eth0  0/17538
8: 838523  XT-PIC  rtc  0/38523
   14:   5775  XT-PIC  ide0  0/5775
* The syntax of chrt has changed:
  chrt -f -p  
* I run rtc_wakeup with -f 8192.
  This is more realistic in our case.

CF card (IDE) load does not hurt as long as the rtc test runs
with a very high rtpri: The results did not differ significantly
from no-load results.
Attempts with tracing on and results with running the test 
at low rtpri will follow tomorrow...

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: 2.6.11, USB: High latency?

2005-03-31 Thread kus Kusche Klaus

> I couldn't find that previous email in the MARC archives.
> 
> Regardless, you'd have to provide a small bit of information about
> your hardware configuration.  What device speed:  full or high?
> What controller:  EHCI, OHCI, UHCI, something else?  Which driver
> for the stick:  usb-storage, or ub?  What else was using memory
> and PCI bandwidth at the time?  SMP?

The error occurred on an intel Pentium 3 (500 MHz) embedded system with
440BX chipset and 192 MB RAM. USB is handled by the 440BX (intel 82371
PIIX4). The UHCI driver shares interrupt 7 with an intel 82559ER 100
Mbit ethernet controller (which is driven by the e100 driver and active:
As there is no local keyboard, I access the system by ssh). 

The system "disk" is a 128 MB CF card directly connected to the 440BX
primary IDE port and running in PIO mode 2 at about 2 MB/sec peak (but
it is idle most of the time). There is a SM712 VGA chip running in text
mode, a 1000 HZ std PC timer, and no other "interesting" device (nothing
else on the PCI bus or causing any interrupts).

The error was reproduced with statically linked (no modules)
vanilla-2.6.11, 2.6.11-gentoo-r3, and
realtime-preempt-2.6.12-rc1-V0.7.41-11 kernels, all built with gcc
3.4.3. No SMP. USB-storage for the sticks.

I tried with two different sticks (an old 64 MB USB 1.x and a 1 GB USB
2.x), both show the same problem on all USB interfaces on the target.
The same dd command works fine on both sticks on my office PC. 

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [BUG] 2.6.11: Random SCSI/USB errors when reading from USB memory stick

2005-03-31 Thread kus Kusche Klaus

> Latency is the subject of a separate email.  Does this 
> increase in latency 
> occur only when you see the errors, or whenever you do a large data 
> transfer?  In fact, I would suspect the errors to _decrease_ 
> the latency 
> with respect to a normal transfer.

I observe from <1 to 2 ms on successful transfers, and around 15 ms
latency when things go wrong.

I do not know what the kernel does internally, but I tried block
sizes from 512b to 8192kb with dd, and it did not make any
difference, neither w.r.t. timing nor w.r.t. errors.

> This indicates that the problem lies in the computer, or its USB 
> connectors and their electromagnetic environment.

Hmmm, hard to believe. It happens on all USB ports, on two
different controllers. The box is an industrial-grade
embedded control system, in a robust steel case. There is
no external USB cable involved, the stick is plugged directly
into the box. 

The USB ports of these boxes work fine in vxWorks
(don't know about Windows).

> > I suspect some kind of timing / bus / interrupt trouble, 
> either due to
> > the interrupt shared between uhci and e100, or due to the 
> extremely slow
> > PIO transfers (which seem to block interrupts).
> 
> Timing or interrupt trouble wouldn't cause the problems you see.  Bus
> utilization can cause problems, but they would be reported 
> differently.  
> I think your problem is due either to faulty hardware or to bad signal
> propagation.

If it's not caused by bus utilization or shared or lost irqs,
I think it is some kind of software problem, maybe in the driver,
maybe elsewhere: 
The dd command *always* runs fine for some time, up to now
the transfer never had any problems for the first 30 MB or so,
sometimes even for 500 MB on my 1 GB stick.

Is there any possibility for integer overflow?
Is there any data which may accumulate over time?

> > What can I do to solve or further analyze the problem?
> 
> Replace the USB cable.  Use a separate PCI USB controller 
> card, preferably 
> one with a high-speed USB controller

A cable is not involved, 
and the embedded system doesn't offer PCI slots.

> > Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: *** thread
> > awakened.
> > Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: 
> Command READ_10
> > (10 bytes)
> > Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage:  28 
> 00 00 01 bd c0
> > 00 00 f0 00
> > Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: Bulk Command S
> > 0x43425355 T 0x38e L 122880 F 128 Trg 0 LUN 0 CL 10
> > Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage:
> > usb_stor_bulk_transfer_buf: xfer 31 bytes
> > Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: Status code 0;
> > transferred 31/31
> > Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: -- transfer
> > complete
> > Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: Bulk command
> > transfer result=0
> > Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage:
> > usb_stor_bulk_transfer_sglist: xfer 122880 bytes, 30 entries
> > Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: 
> Status code -75;
> > transferred 19840/122880
> > Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: -- babble
> 
> This is not a "short read"; as the log says, it's a "babble". 
>  That means
> the computer's USB controller believed the device continued 
> transmitting
> past the time when it was supposed to stop.  Assuming the device is
> functioning correctly, either the USB controller is bad or 
> the signal was
> degraded.

But doesn't it say 19840 of 122880 transferred?
In all cases of trouble, the first number is smaller than the second.

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: 2.6.11, USB: High latency?

2005-03-31 Thread kus Kusche Klaus

> The latencies are almost certainly caused by the USB host controller 
> driver.  I'm planning improvements to uhci-hcd which should 
> help reduce 
> the latency, but it will still be on the large side.  And I 
> won't have 
> time to write the changes to the driver for several months.

Any numbers about the expected "large side"? 
We would need <30 microseconds irq latency,
and <<1 milliseconds rt application latency.

> The best solution is to stop using uhci-hcd.  Get a PCI card 
> with an OHCI 
> or EHCI (high-speed) controller.  They do much more work in hardware, 
> reducing the amount of time the driver needs to spend with interrupts 
> disabled.

The hardware is invariable. It is an embedded system with no PCI slots.

And it seems to be possible with UHCI, 
because vxWorks allows USB stick transfers in operation without
missing latency requirements.

I do not require rt on the USB, it may block its own irq as long as
it likes, but it should not affect other irqs.

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: 2.6.11, IDE: Strange scheduling behaviour: high-pri RT process not scheduled?

2005-03-31 Thread kus Kusche Klaus

> > The test system runs a 2.6.11 kernel (no SMP) on a Pentium3 500 MHz 
> > embedded hardware.
> 
> which probably has memory bandwidth of at most a couple hundred MB/s,
> which is really horrible by modern standards.

It's my job to find out if linux can be used for control systems 
on this (and smaller: ARM, embedded PPC) hardware. 
It's not my job to discuss the hardware.

This hardware is quite fast for embedded control system standards,
where product lifetimes exceed 10 years. AFAIK, intel does not
offer any P4 products with industrial availability and lifetime
gurantee (and they would run too hot anyway).

> > However, things break seriously when exercising the CF card 
> in parallel 
> > (e.g. with a dd if=/dev/hda of=/dev/null):
> > 
> > * The rtc *interrupt handler* is delayed for up to 250 
> *micro*seconds. 
> > This is very bad for my purpose, but easy to explain: It is 
> roughly the 
> > time needed to transfer 512 Bytes from a CF card which can 
> transfer 2 
> > Mbyte/sec, and obviously, the CPU blocks all interrupts 
> while making pio
> > 
> > transfers. (Why? Is this really necessary?)
> 
> even with -u1, isn't there still a softirq queue that will 
> delay the wakeup
> of your user-space tester?

I assume your comment belongs to the timings of my test program
(below), not to the timings of the rtc irq handler (above):
The softirq queue should not block the rtc irq handler I think.

The softirq queue should not even block user-space rt programs
for 60-300 ms. If something takes that long, it should perhaps 
be implemented as threads with a well-known rtprio, not in the 
softirq queue.

> > * The *test program* is regularly blocked for 63 *milli*seconds, 
> > sometimes for up to 300 *milli*seconds, which is absolutely
> > unacceptable.
> 
> guessing that's VM housekeeping.

I would feel better if I knew instead of guessing.
Moreover, VM housekeeping caused by non-rtpri programs should not
block memory-resident rt programs.
And the VM housekeeping in this case is just too simple to waste 
milliseconds on it: Just take one page of free mem (there is
plenty of it) and use it as a buffer...

> > Now the big question:
> > *** Why doesn't my rt test program get *any* CPU for 63 jiffies? ***
> > (the system ticks at 1000 HZ)
> 
> because it's user-space.  the 'rt' is a bit of a misnomer - 
> it's merely
> a higher priority, less preemptable job.
> 
> > * The dd program obviously gets some CPU regularly (because 
> it copies 2 
> > MB/s, and because no other program could cause the 1 % user 
> CPU load). 
> > The dd runs at normal shell scheduling priority, so it should be 
> > preempted immediately by my test program!
> 
> out of curiosity, does running it with "nice -n 19" change anything?

no.

> > 2.) Using a realtime-preempt-2.6.12-rc1-V0.7.41-11 kernel with
> > PREEMPT_RT:
> > If my test program runs at rtpri 99, the problem is gone: It is 
> > scheduled within 30 microseconds after the rtc interrupt.
> > If my test program runs at rtpri 1, it still suffers 
> from delays 
> > in the 30 to 300 millisecond range.
> 
> so your problem is solved, no?

No, running at an rtpri above the irq handlers is not an option 
in practice: My program would block irq handling...

> also, did you try a (plain) preemptable kernel?

What I called "vanilla" was configured with "plain" preemption.
 

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-25

2005-03-31 Thread kus Kusche Klaus

> i have released the -V0.7.41-25 Real-Time Preemption patch, 
> which can be 
> downloaded from the usual place:

1. Does not compile without RT_DEADLOCK_DETECT:
kernel/rt.c: In function `change_owner':
kernel/rt.c:556: error: structure has no member named `debug'
kernel/rt.c: In function `pi_setprio':
kernel/rt.c:576: error: structure has no member named `debug'
kernel/rt.c: In function `task_blocks_on_lock':
kernel/rt.c:677: error: structure has no member named `debug'
kernel/rt.c:687: error: structure has no member named `debug'
kernel/rt.c: In function `__up_mutex':
kernel/rt.c:1223: error: structure has no member named `debug'

2. My problem (see my LKML mails yesterday) is not yet solved:
The latency tracer shows latencies of at most 40 microseconds,
but my test program at rtprio 99 sometimes did not get any CPU 
for milliseconds...

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-25

2005-03-31 Thread kus Kusche Klaus

 i have released the -V0.7.41-25 Real-Time Preemption patch, 
 which can be 
 downloaded from the usual place:

1. Does not compile without RT_DEADLOCK_DETECT:
kernel/rt.c: In function `change_owner':
kernel/rt.c:556: error: structure has no member named `debug'
kernel/rt.c: In function `pi_setprio':
kernel/rt.c:576: error: structure has no member named `debug'
kernel/rt.c: In function `task_blocks_on_lock':
kernel/rt.c:677: error: structure has no member named `debug'
kernel/rt.c:687: error: structure has no member named `debug'
kernel/rt.c: In function `__up_mutex':
kernel/rt.c:1223: error: structure has no member named `debug'

2. My problem (see my LKML mails yesterday) is not yet solved:
The latency tracer shows latencies of at most 40 microseconds,
but my test program at rtprio 99 sometimes did not get any CPU 
for milliseconds...

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: 2.6.11, IDE: Strange scheduling behaviour: high-pri RT process not scheduled?

2005-03-31 Thread kus Kusche Klaus

  The test system runs a 2.6.11 kernel (no SMP) on a Pentium3 500 MHz 
  embedded hardware.
 
 which probably has memory bandwidth of at most a couple hundred MB/s,
 which is really horrible by modern standards.

It's my job to find out if linux can be used for control systems 
on this (and smaller: ARM, embedded PPC) hardware. 
It's not my job to discuss the hardware.

This hardware is quite fast for embedded control system standards,
where product lifetimes exceed 10 years. AFAIK, intel does not
offer any P4 products with industrial availability and lifetime
gurantee (and they would run too hot anyway).

  However, things break seriously when exercising the CF card 
 in parallel 
  (e.g. with a dd if=/dev/hda of=/dev/null):
  
  * The rtc *interrupt handler* is delayed for up to 250 
 *micro*seconds. 
  This is very bad for my purpose, but easy to explain: It is 
 roughly the 
  time needed to transfer 512 Bytes from a CF card which can 
 transfer 2 
  Mbyte/sec, and obviously, the CPU blocks all interrupts 
 while making pio
  
  transfers. (Why? Is this really necessary?)
 
 even with -u1, isn't there still a softirq queue that will 
 delay the wakeup
 of your user-space tester?

I assume your comment belongs to the timings of my test program
(below), not to the timings of the rtc irq handler (above):
The softirq queue should not block the rtc irq handler I think.

The softirq queue should not even block user-space rt programs
for 60-300 ms. If something takes that long, it should perhaps 
be implemented as threads with a well-known rtprio, not in the 
softirq queue.

  * The *test program* is regularly blocked for 63 *milli*seconds, 
  sometimes for up to 300 *milli*seconds, which is absolutely
  unacceptable.
 
 guessing that's VM housekeeping.

I would feel better if I knew instead of guessing.
Moreover, VM housekeeping caused by non-rtpri programs should not
block memory-resident rt programs.
And the VM housekeeping in this case is just too simple to waste 
milliseconds on it: Just take one page of free mem (there is
plenty of it) and use it as a buffer...

  Now the big question:
  *** Why doesn't my rt test program get *any* CPU for 63 jiffies? ***
  (the system ticks at 1000 HZ)
 
 because it's user-space.  the 'rt' is a bit of a misnomer - 
 it's merely
 a higher priority, less preemptable job.
 
  * The dd program obviously gets some CPU regularly (because 
 it copies 2 
  MB/s, and because no other program could cause the 1 % user 
 CPU load). 
  The dd runs at normal shell scheduling priority, so it should be 
  preempted immediately by my test program!
 
 out of curiosity, does running it with nice -n 19 change anything?

no.

  2.) Using a realtime-preempt-2.6.12-rc1-V0.7.41-11 kernel with
  PREEMPT_RT:
  If my test program runs at rtpri 99, the problem is gone: It is 
  scheduled within 30 microseconds after the rtc interrupt.
  If my test program runs at rtpri 1, it still suffers 
 from delays 
  in the 30 to 300 millisecond range.
 
 so your problem is solved, no?

No, running at an rtpri above the irq handlers is not an option 
in practice: My program would block irq handling...

 also, did you try a (plain) preemptable kernel?

What I called vanilla was configured with plain preemption.
 

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: 2.6.11, USB: High latency?

2005-03-31 Thread kus Kusche Klaus

 The latencies are almost certainly caused by the USB host controller 
 driver.  I'm planning improvements to uhci-hcd which should 
 help reduce 
 the latency, but it will still be on the large side.  And I 
 won't have 
 time to write the changes to the driver for several months.

Any numbers about the expected large side? 
We would need 30 microseconds irq latency,
and 1 milliseconds rt application latency.

 The best solution is to stop using uhci-hcd.  Get a PCI card 
 with an OHCI 
 or EHCI (high-speed) controller.  They do much more work in hardware, 
 reducing the amount of time the driver needs to spend with interrupts 
 disabled.

The hardware is invariable. It is an embedded system with no PCI slots.

And it seems to be possible with UHCI, 
because vxWorks allows USB stick transfers in operation without
missing latency requirements.

I do not require rt on the USB, it may block its own irq as long as
it likes, but it should not affect other irqs.

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [BUG] 2.6.11: Random SCSI/USB errors when reading from USB memory stick

2005-03-31 Thread kus Kusche Klaus

 Latency is the subject of a separate email.  Does this 
 increase in latency 
 occur only when you see the errors, or whenever you do a large data 
 transfer?  In fact, I would suspect the errors to _decrease_ 
 the latency 
 with respect to a normal transfer.

I observe from 1 to 2 ms on successful transfers, and around 15 ms
latency when things go wrong.

I do not know what the kernel does internally, but I tried block
sizes from 512b to 8192kb with dd, and it did not make any
difference, neither w.r.t. timing nor w.r.t. errors.

 This indicates that the problem lies in the computer, or its USB 
 connectors and their electromagnetic environment.

Hmmm, hard to believe. It happens on all USB ports, on two
different controllers. The box is an industrial-grade
embedded control system, in a robust steel case. There is
no external USB cable involved, the stick is plugged directly
into the box. 

The USB ports of these boxes work fine in vxWorks
(don't know about Windows).

  I suspect some kind of timing / bus / interrupt trouble, 
 either due to
  the interrupt shared between uhci and e100, or due to the 
 extremely slow
  PIO transfers (which seem to block interrupts).
 
 Timing or interrupt trouble wouldn't cause the problems you see.  Bus
 utilization can cause problems, but they would be reported 
 differently.  
 I think your problem is due either to faulty hardware or to bad signal
 propagation.

If it's not caused by bus utilization or shared or lost irqs,
I think it is some kind of software problem, maybe in the driver,
maybe elsewhere: 
The dd command *always* runs fine for some time, up to now
the transfer never had any problems for the first 30 MB or so,
sometimes even for 500 MB on my 1 GB stick.

Is there any possibility for integer overflow?
Is there any data which may accumulate over time?

  What can I do to solve or further analyze the problem?
 
 Replace the USB cable.  Use a separate PCI USB controller 
 card, preferably 
 one with a high-speed USB controller

A cable is not involved, 
and the embedded system doesn't offer PCI slots.

  Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: *** thread
  awakened.
  Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: 
 Command READ_10
  (10 bytes)
  Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage:  28 
 00 00 01 bd c0
  00 00 f0 00
  Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: Bulk Command S
  0x43425355 T 0x38e L 122880 F 128 Trg 0 LUN 0 CL 10
  Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage:
  usb_stor_bulk_transfer_buf: xfer 31 bytes
  Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: Status code 0;
  transferred 31/31
  Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: -- transfer
  complete
  Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: Bulk command
  transfer result=0
  Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage:
  usb_stor_bulk_transfer_sglist: xfer 122880 bytes, 30 entries
  Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: 
 Status code -75;
  transferred 19840/122880
  Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: -- babble
 
 This is not a short read; as the log says, it's a babble. 
  That means
 the computer's USB controller believed the device continued 
 transmitting
 past the time when it was supposed to stop.  Assuming the device is
 functioning correctly, either the USB controller is bad or 
 the signal was
 degraded.

But doesn't it say 19840 of 122880 transferred?
In all cases of trouble, the first number is smaller than the second.

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: 2.6.11, USB: High latency?

2005-03-31 Thread kus Kusche Klaus

 I couldn't find that previous email in the MARC archives.
 
 Regardless, you'd have to provide a small bit of information about
 your hardware configuration.  What device speed:  full or high?
 What controller:  EHCI, OHCI, UHCI, something else?  Which driver
 for the stick:  usb-storage, or ub?  What else was using memory
 and PCI bandwidth at the time?  SMP?

The error occurred on an intel Pentium 3 (500 MHz) embedded system with
440BX chipset and 192 MB RAM. USB is handled by the 440BX (intel 82371
PIIX4). The UHCI driver shares interrupt 7 with an intel 82559ER 100
Mbit ethernet controller (which is driven by the e100 driver and active:
As there is no local keyboard, I access the system by ssh). 

The system disk is a 128 MB CF card directly connected to the 440BX
primary IDE port and running in PIO mode 2 at about 2 MB/sec peak (but
it is idle most of the time). There is a SM712 VGA chip running in text
mode, a 1000 HZ std PC timer, and no other interesting device (nothing
else on the PCI bus or causing any interrupts).

The error was reproduced with statically linked (no modules)
vanilla-2.6.11, 2.6.11-gentoo-r3, and
realtime-preempt-2.6.12-rc1-V0.7.41-11 kernels, all built with gcc
3.4.3. No SMP. USB-storage for the sticks.

I tried with two different sticks (an old 64 MB USB 1.x and a 1 GB USB
2.x), both show the same problem on all USB interfaces on the target.
The same dd command works fine on both sticks on my office PC. 

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: 2.6.11, IDE: Strange scheduling behaviour: high-pri RT process not scheduled?

2005-03-31 Thread kus Kusche Klaus

 getting /dev/rtc handling right for latency measurement is 
 ... tricky.  
 The method i'm using under PREEMPT_RT is:
 
  chrt -f 84 -p `pidof 'IRQ 0'`
  chrt -f 95 -p `pidof 'IRQ 8'`
  ./rtc_wakeup -f 1024 -t 10

I tried it your way.

First impressions with realtime-preempt-2.6.12-rc1-V0.7.42-01, 
rtc_wakeup and chrt:


The main difference is not between rtc_wakeup and my test script
(the timings agree very well), but comes from chrt for 'IRQ 8':

System not loaded, original 'IRQ 8' prio 49:
* rtc_wakeup at 89(99): max jitter: 378.2% (461 usec), missed irqs:  0
* rtc_wakeup at 1(11):  max jitter:  97.1% (118 usec), missed irqs: 25
System not loaded, 'IRQ 8' prio at 95:
* rtc_wakeup at 89(99): max jitter:  19.1% ( 23 usec), missed irqs:  0
* rtc_wakeup at 1(11):  max jitter: 190.7% (232 usec), missed irqs: 52


The following tests are made with 'IRQ 8' at 95, rtc_wakeup at 89(99):
* Heavy mmap load, no oom: max jitter: 42.1% (   51 usec)
* Heavy mmap load, oom:max jitter:  11989.2% (14635 usec)
  (but still missed irqs: 0, so IRQ 8 was also blocked for 14 ms)
* USB reads, no error: max jitter:707.7% (  863 usec)
  (happens rarely)
* USB reads, error:max jitter:   7247.2% ( 8846 usec)
  (again, no missed irqs)
* USB stick connect:   max jitter:   5776.7% ( 7051 usec)
* USB stick disconnect:max jitter:   2966.6% ( 3621 usec)
  (both without error!)


Notes:
* My system does not have a 'IRQ 0' kernel thread.
  It has IRQ 1, 3, 7, 8, 12 and 14.
  This is strange, because /proc/interrupts says
0:4805658  XT-PIC  timer  0/5658
2:  0  XT-PIC  cascade  0/0
3:  0  XT-PIC  serial  0/0
7:  17538  XT-PIC  uhci_hcd:usb1, eth0  0/17538
8: 838523  XT-PIC  rtc  0/38523
   14:   5775  XT-PIC  ide0  0/5775
* The syntax of chrt has changed:
  chrt -f -p prio pid
* I run rtc_wakeup with -f 8192.
  This is more realistic in our case.

CF card (IDE) load does not hurt as long as the rtc test runs
with a very high rtpri: The results did not differ significantly
from no-load results.
Attempts with tracing on and results with running the test 
at low rtpri will follow tomorrow...

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [BUG] 2.6.11: Random SCSI/USB errors when reading from USB memory stick

2005-03-31 Thread kus Kusche Klaus

   Latency is the subject of a separate email.  Does this 
   increase in latency 
   occur only when you see the errors, or whenever you do a 
 large data 
   transfer?  In fact, I would suspect the errors to _decrease_ 
   the latency 
   with respect to a normal transfer.
  
  I observe from 1 to 2 ms on successful transfers, and around 15 ms
  latency when things go wrong.
 
 I hate to ask this question; it sounds an awful lot like 
 Monty Python and
 the Holy Grail, but...  Is that IRQ latency or application latency?

With Ingo's RT kernels, it is not possible for me to tell the 
difference, because IRQ handlers are also running as kernel threads,
scheduled by the scheduler according to their rt prio
(which is around 50 by default, 95 for the RTC IRQ in my case).

I can tell for sure that the RTC IRQ handler was not executed
in that time, but I can't tell if that's because IRQs were blocked
or because it didn't get scheduled.

 I can't think of any reason why IRQ latency should change 
 during the error 
 handling.  Application latency might change because the SCSI 
 error handler 
 could start using up a lot of CPU time.  I don't know what 
 priority it 
 runs at; you can check with ps.

Ingo just suggested that error handling (writing kernel messages
to console/disk) in general is causing the trouble. 
I'll check that and post the results.

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: 2.6.11, USB: High latency?

2005-03-31 Thread kus Kusche Klaus

   The latencies are almost certainly caused by the USB host 
 controller 
   driver.  I'm planning improvements to uhci-hcd which should 
   help reduce 
   the latency, but it will still be on the large side.  And I 
   won't have 
   time to write the changes to the driver for several months.
  
  Any numbers about the expected large side? 
  We would need 30 microseconds irq latency,
  and 1 milliseconds rt application latency.
 
 The biggest advantage would come from using a bottom-half 
 handler to do 
 most of the work.  Right now the uhci-hcd driver does 
 everything in its 
 interrupt handler.  This would certainly help IRQ latency; it 
 might not 
 affect application latency very much.

Sounds very reasonable, thanks. Also helps application latency,
because with the RT patches, I can tune the rt prio of softirq
execution (that's where bottom-half goes, doesn't it?) w.r.t. the
rt prio of the application threads.

However, if I understand things correctly, if you really need 
to disable all interrupts while doing the USB work, it will not
make any difference if IRQs are disabled while you are in the
USB IRQ handler, or if they are disabled for the same amount of 
work/time in the bottom-half code.

 I'll try adding a bottom half in my next series of patches.  
 Maybe it will 
 be ready in time to appear in the -mm kernels before 2.6.12-final is 
 released.

 We'll see what happens with the upcoming changes.  Maybe 
 you'll be able to 
 test them for me?

Basically, yes (as long as our company doesn't decide to stop the
linux experiments).

However, I depend on Ingo's RT patch, which is against the -rc series,
not against the -mm series. So I will probably not be able to apply
patches created against -mm.

-- 
Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

2.6.11, USB: High latency?

2005-03-30 Thread kus Kusche Klaus

I'm performing realtime latency tests (for details about the hardware
and software, see my mail "[BUG] 2.6.11: Random SCSI/USB errors when
reading from USB memory stick" erlier today).

Even when the errors described in my previous mail does not occur,
massive USB stick transfers cause latencies of 1 to 2 milliseconds,
which is way too much for realtime control systems. 

I observe these latencies on a vanilla 2.6.11 at any rtprio (even 99),
and on realtime-preempt-2.6.12-rc1-V0.7.41-11 at low rtprio (1). When
running the program on realtime-preempt-2.6.12-rc1-V0.7.41-11 with
rtprio 99, the latencies are gone, but using a rtprio higher than the
interrupt handlers is not realistic.

Is there anything which can be done about it?

Klaus Kusche
> Entwicklung Software - Steuerung
> Software Development - Control
> 
> KEBA AG
> A-4041 Linz
> Gewerbepark Urfahr
> Tel +43 / 732 / 7090-3120
> Fax +43 / 732 / 7090-8919
> E-Mail: [EMAIL PROTECTED]
> www.keba.com
> 
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

PREEMPT_RT: Undetected latencies?

2005-03-30 Thread kus Kusche Klaus

I'm experimenting with a realtime-preempt-2.6.12-rc1-V0.7.41-11 kernel,
preemption fully enabled, and a small RTC test program (see my previous
mail for details).

If my program runs at rtprio 99, most of the time everything is fine:
* My program records intervals of 122 +- 40 microseconds (which
corresponds nicely to the 8192 Hz RTC interrupt).
* Both the kernel's RTC histogram and the kernel's RT latency trace stay
below 40 microseconds.

However, during extensive testing, I found some cases which extend the
intervals unacceptably:
1.) When a test program performs massive mmaps (at base shell prio) and
activates the oom killer, my program measures intervals in the 15
millisecond range.
2.) When I get the USB errors described in a separate mail earlier
today, I also get intervals from 10 to 15 milliseconds.
(a combination of heavy USB and CF card traffic also seems to cause
blackouts in the 1 ms range even for rtprio 99, but I have to look into
that again)

Surprisingly, in both cases the worst latencies traced by the RT latency
trace and the RTC histogram are still less than 40 microseconds!
Moreover, the values read from /dev/rtc do not indicate any lost
interrupts. This seems to indicate that neither the RTC IRQ handler
itself nor timers were executed during these 15 millisecond period,
otherwise, either the RTC IRQ handler or the rtc_dropped_irq watchdog
would have counted lost interrupts! It seems that something locked up
the system completely for 15 ms: No interrupts at all, no scheduling,
...

Now, there are several questions:
* What is causing these 15 ms latencies in the two cases above, and how
can they be fixed?
* Why aren't they detected at all by any traces? 
Are these classical spinlocks which aren't instrumented? 
Are they explicit cli/sti in device drivers? 
Any other possibilities?
How can this be traced and analyzed?

A simple explanation would be that the TSC is reset. Is there any code
in the kernel messing with the TSC?

If my program runs at rtprio 1 instead of rtprio 99, many things cause
long intervals:
* normal working load (up to 1.2 ms)
* heavy mmap load without oom (many samples increased by 0.2 ms)
* heavy USB stick I/O without errors (up to 3 ms)
* core dumps (up to 10 ms)
* heavy USB stick I/O with errors (up to 15 ms)
* oom killer (up to 100 ms)
* heavy CF card I/O (up to 300 ms, see separate mail)
Even on an idle system, I see 1 ms intervals now and then!

Again, the latency traces are in the range of 40 to 50 microseconds
only, so these delays are caused by something not covered by the latency
tracer. How to find that out?

One more question: When my test program runs at rtprio 1, there are huge
delays (up to 100 ms) between the moment the rtc read gets ready and the
moment the program is actually scheduled. However, even then the wakeup
latency trace is not showing it. What is actually measured by the wakeup
latency trace?

Thanks for any help!

Klaus Kusche
> Entwicklung Software - Steuerung
> Software Development - Control
> 
> KEBA AG
> A-4041 Linz
> Gewerbepark Urfahr
> Tel +43 / 732 / 7090-3120
> Fax +43 / 732 / 7090-8919
> E-Mail: [EMAIL PROTECTED]
> www.keba.com
> 
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: 2.6.11, IDE: Strange scheduling behaviour: high-pri RT process not scheduled?

2005-03-30 Thread kus Kusche Klaus

> From: Bartlomiej Zolnierkiewicz [mailto:[EMAIL PROTECTED] 
>
> On Wed, 30 Mar 2005 13:52:05 +0200, kus Kusche Klaus 
> <[EMAIL PROTECTED]> wrote:
> > However, things break seriously when exercising the CF card 
> in parallel
> > (e.g. with a dd if=/dev/hda of=/dev/null):
> > 
> > * The rtc *interrupt handler* is delayed for up to 250 
> *micro*seconds.
> > This is very bad for my purpose, but easy to explain: It is 
> roughly the
> > time needed to transfer 512 Bytes from a CF card which can 
> transfer 2
> > Mbyte/sec, and obviously, the CPU blocks all interrupts 
> while making pio
> > 
> > transfers. (Why? Is this really necessary?)
> >  
> > (I know because I instrumented the rtc irq handler with 
> rdtscl(), too)
> 
> hdparm -u1 /dev/hda
> 
> should help

Hmmm, thanks, that sounds very reasonable, and I didn't know that flag.

Unfortunately, it doesn't help. The bad timings stay the same (still
delays in the 30-300 ms range), the number of context switches stays the
same, ...

The only thing which changes is the CPU load shown by vmstat:
* With -u0, I have 1 % user, ~50 % sys, ~50 % wa
* With -u1, I have 1 % user, ~98 % sys, 1 % wa

P.S.: Apologies for my badly formatted mails. The company forces us to
use outlook, we may not even change the settings... 

Klaus Kusche
>Entwicklung Software - Steuerung
>Software Development - Control
>
>KEBA AG
>A-4041 Linz
>Gewerbepark Urfahr
>Tel +43 / 732 / 7090-3120
>Fax +43 / 732 / 7090-8919
>E-Mail: [EMAIL PROTECTED]
>www.keba.com
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

2.6.11, IDE: Strange scheduling behaviour: high-pri RT process not scheduled?

2005-03-30 Thread kus Kusche Klaus

I've written a small test program which enables periodic RTC interrupts 
at 8192 Hz and then goes into a loop reading /dev/rtc and collecting 
timing statistics (using the rdtscl macro).

The program runs at highest realtime scheduling priority (99) with
memory 
locked. In the loop, it doesn't do any I/O except for /dev/rtc reads, no

memory allocations, nothing which could block or take time:

  rdtscl(old);

  for (i = 0; i < SAMPLES; ++i) {
if (read(fd, , sizeof (unsigned long)) != sizeof (unsigned
long)) {
  fprintf(stderr, "%s: reading rtc data failed: %s\n", argv[0],
  strerror(errno));
  exit(1);
}
rtcval >>= 8;
--rtcval;
if (rtcval >= INTERRUPTS)
  ++intr[INTERRUPTS];
else
  ++intr[rtcval];

rdtscl(new);
delta = new - old;
delta /= RESOLUTION;
if (delta >= INTERVALS)
  ++cnt[INTERVALS];
else
  ++cnt[delta];

old = new;
  }

The test system runs a 2.6.11 kernel (no SMP) on a Pentium3 500 MHz 
embedded hardware. All filesystems are on a CF Card connected to the 
primary IDE controller (intel 440BX chipset) in PIO mode 2. The CF
card can transfer up to 2 MB/s. 
There are 192 MB RAM, most of it free.
The system is configured without swap space.

As long as the system is idle otherwise, my program works as expected: 
Most of the time, after an rtc interrupt the program gets the data from 
/dev/rtc within less than 50 microseconds.

Even with the test program running (which causes >16000 context switches

per second), vmstat shows more than 95 % idle CPU. Hence, the CPU needed

by the rtc irq handler and the test program is minimal.

However, things break seriously when exercising the CF card in parallel 
(e.g. with a dd if=/dev/hda of=/dev/null):

* The rtc *interrupt handler* is delayed for up to 250 *micro*seconds. 
This is very bad for my purpose, but easy to explain: It is roughly the 
time needed to transfer 512 Bytes from a CF card which can transfer 2 
Mbyte/sec, and obviously, the CPU blocks all interrupts while making pio

transfers. (Why? Is this really necessary?)

(I know because I instrumented the rtc irq handler with rdtscl(), too)

* The *test program* is regularly blocked for 63 *milli*seconds, 
sometimes for up to 300 *milli*seconds, which is absolutely
unacceptable.

* vmstat shows around 50 % sys CPU and around 50 % I/O wait, and 0 or
1 % user CPU (zero idle CPU). context switches are down to around 
500 from the expected >16000, which also indicates that my program is 
scheduled much less often than it should. Most of the time, vmstat shows
one running and one blocked process.

Now the big question:
*** Why doesn't my rt test program get *any* CPU for 63 jiffies? ***
(the system ticks at 1000 HZ)

* There is around 50 % idle CPU (I/O wait). I/O wait should not block my

program, because my program neither performs any disk I/O nor swaps.

* The dd program obviously gets some CPU regularly (because it copies 2 
MB/s, and because no other program could cause the 1 % user CPU load). 
The dd runs at normal shell scheduling priority, so it should be 
preempted immediately by my test program!

Some things I tried:
1.) Using a mdma2 instead of a pio2 CF card: The CF card speed doubles
(4 MB/s instead of 2 MB/s), and the "blind time" of my test program
halves from 63 ms to 32 ms, but basically, the problem remains.
2.) Using a realtime-preempt-2.6.12-rc1-V0.7.41-11 kernel with
PREEMPT_RT:
If my test program runs at rtpri 99, the problem is gone: It is 
scheduled within 30 microseconds after the rtc interrupt.
If my test program runs at rtpri 1, it still suffers from delays 
in the 30 to 300 millisecond range.

Thanks for any help!

Klaus Kusche
> Entwicklung Software - Steuerung
> Software Development - Control
> 
> KEBA AG
> A-4041 Linz
> Gewerbepark Urfahr
> Tel +43 / 732 / 7090-3120
> Fax +43 / 732 / 7090-8919
> E-Mail: [EMAIL PROTECTED]
> www.keba.com
> 
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[BUG] 2.6.11: Random SCSI/USB errors when reading from USB memory stick

2005-03-30 Thread kus Kusche Klaus

When trying a "dd if=/dev/sda of=/dev/null" (where /dev/sda is an USB
memory stick), this works fine for some seconds (and actually transfers
data at around 700-1000 KB/s), but ends up with some I/O errors sooner
or later, which cause the device to go offline (the stick must be
replugged to make it accessible again). Sometimes, this causes kernel
bugchecks and hung "dd" processes (which cannot be terminated even with
kill -9), see second example.

Two examples (syslog message outputs) are appended below, in both cases,
short USB reads seem to initiate the trouble.

I also tried running a kernel latency test program simultaneously: When
the errors occur, the latency goes up from less than 1 millisec to about
15 millisec (very bad for embedded control systems...).

The error occurred on an intel Pentium 3 (500 MHz) embedded system with
440BX chipset and 192 MB RAM. USB is handled by the 440BX (intel 82371
PIIX4). The UHCI driver shares interrupt 7 with an intel 82559ER 100
Mbit ethernet controller (which is driven by the e100 driver and active:
As there is no local keyboard, I access the system by ssh). 

The system "disk" is a 128 MB CF card directly connected to the 440BX
primary IDE port and running in PIO mode 2 at about 2 MB/sec peak (but
it is idle most of the time). There is a SM712 VGA chip running in text
mode, a 1000 HZ std PC timer, and no other "interesting" device (nothing
else on the PCI bus or causing any interrupts).

The error was reproduced with statically linked (no modules)
vanilla-2.6.11, 2.6.11-gentoo-r3, and
realtime-preempt-2.6.12-rc1-V0.7.41-11 kernels, all built with gcc
3.4.3.

I tried with two different sticks (an old 64 MB USB 1.x and a 1 GB USB
2.x), both show the same problem on all USB interfaces on the target.
The same dd command works fine on both sticks on my office PC. 

I suspect some kind of timing / bus / interrupt trouble, either due to
the interrupt shared between uhci and e100, or due to the extremely slow
PIO transfers (which seem to block interrupts).

What can I do to solve or further analyze the problem?

Thanks!

*** First example:

... trouble starts here:
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: *** thread
awakened.
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: Command READ_10
(10 bytes)
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage:  28 00 00 01 bd c0
00 00 f0 00
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: Bulk Command S
0x43425355 T 0x38e L 122880 F 128 Trg 0 LUN 0 CL 10
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage:
usb_stor_bulk_transfer_buf: xfer 31 bytes
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: Status code 0;
transferred 31/31
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: -- transfer
complete
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: Bulk command
transfer result=0
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage:
usb_stor_bulk_transfer_sglist: xfer 122880 bytes, 30 entries
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: Status code -75;
transferred 19840/122880
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: -- babble
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: Bulk data transfer
result 0x3
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: Attempting to get
CSW...
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage:
usb_stor_bulk_transfer_buf: xfer 13 bytes
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: Status code -75;
transferred 13/13
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: -- babble
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: Bulk status result
= 3
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: -- transport
indicates error, resetting
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage:
usb_stor_Bulk_reset called
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage:
usb_stor_control_msg: rq=ff rqtype=21 value= index=00 len=0
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: Soft reset failed:
-32
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: scsi cmd done,
result=0x7
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: queuecommand
called
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: *** thread
sleeping.

... this error is repeated several times before being reported to the
SCSI layer...

Mar 30 10:47:17 OF455 kern.info kernel: SCSI error : <0 0 0 0> return
code = 0x7
Mar 30 10:47:17 OF455 kern.warn kernel: end_request: I/O error, dev sda,
sector 114112
Mar 30 10:47:17 OF455 kern.err kernel: Buffer I/O error on device sda,
logical block 57056

... the same for block 57057, 57058, 57059

Mar 30 10:47:18 OF455 kern.info kernel: SCSI error : <0 0 0 0> return
code = 0x7
Mar 30 10:47:18 OF455 kern.warn kernel: end_request: I/O error, dev sda,
sector 114118
Mar 30 10:47:18 OF455 kern.err kernel: Buffer I/O error on device sda,
logical block 57059

... now the device goes offline:

Mar 30 10:47:18 OF455 kern.debug kernel: usb-storage: *** thread
awakened.
Mar 30 10:47:18

[BUG] 2.6.11: Random SCSI/USB errors when reading from USB memory stick

2005-03-30 Thread kus Kusche Klaus

When trying a dd if=/dev/sda of=/dev/null (where /dev/sda is an USB
memory stick), this works fine for some seconds (and actually transfers
data at around 700-1000 KB/s), but ends up with some I/O errors sooner
or later, which cause the device to go offline (the stick must be
replugged to make it accessible again). Sometimes, this causes kernel
bugchecks and hung dd processes (which cannot be terminated even with
kill -9), see second example.

Two examples (syslog message outputs) are appended below, in both cases,
short USB reads seem to initiate the trouble.

I also tried running a kernel latency test program simultaneously: When
the errors occur, the latency goes up from less than 1 millisec to about
15 millisec (very bad for embedded control systems...).

The error occurred on an intel Pentium 3 (500 MHz) embedded system with
440BX chipset and 192 MB RAM. USB is handled by the 440BX (intel 82371
PIIX4). The UHCI driver shares interrupt 7 with an intel 82559ER 100
Mbit ethernet controller (which is driven by the e100 driver and active:
As there is no local keyboard, I access the system by ssh). 

The system disk is a 128 MB CF card directly connected to the 440BX
primary IDE port and running in PIO mode 2 at about 2 MB/sec peak (but
it is idle most of the time). There is a SM712 VGA chip running in text
mode, a 1000 HZ std PC timer, and no other interesting device (nothing
else on the PCI bus or causing any interrupts).

The error was reproduced with statically linked (no modules)
vanilla-2.6.11, 2.6.11-gentoo-r3, and
realtime-preempt-2.6.12-rc1-V0.7.41-11 kernels, all built with gcc
3.4.3.

I tried with two different sticks (an old 64 MB USB 1.x and a 1 GB USB
2.x), both show the same problem on all USB interfaces on the target.
The same dd command works fine on both sticks on my office PC. 

I suspect some kind of timing / bus / interrupt trouble, either due to
the interrupt shared between uhci and e100, or due to the extremely slow
PIO transfers (which seem to block interrupts).

What can I do to solve or further analyze the problem?

Thanks!

*** First example:

... trouble starts here:
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: *** thread
awakened.
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: Command READ_10
(10 bytes)
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage:  28 00 00 01 bd c0
00 00 f0 00
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: Bulk Command S
0x43425355 T 0x38e L 122880 F 128 Trg 0 LUN 0 CL 10
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage:
usb_stor_bulk_transfer_buf: xfer 31 bytes
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: Status code 0;
transferred 31/31
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: -- transfer
complete
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage: Bulk command
transfer result=0
Mar 30 10:47:16 OF455 kern.debug kernel: usb-storage:
usb_stor_bulk_transfer_sglist: xfer 122880 bytes, 30 entries
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: Status code -75;
transferred 19840/122880
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: -- babble
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: Bulk data transfer
result 0x3
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: Attempting to get
CSW...
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage:
usb_stor_bulk_transfer_buf: xfer 13 bytes
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: Status code -75;
transferred 13/13
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: -- babble
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: Bulk status result
= 3
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: -- transport
indicates error, resetting
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage:
usb_stor_Bulk_reset called
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage:
usb_stor_control_msg: rq=ff rqtype=21 value= index=00 len=0
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: Soft reset failed:
-32
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: scsi cmd done,
result=0x7
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: queuecommand
called
Mar 30 10:47:17 OF455 kern.debug kernel: usb-storage: *** thread
sleeping.

... this error is repeated several times before being reported to the
SCSI layer...

Mar 30 10:47:17 OF455 kern.info kernel: SCSI error : 0 0 0 0 return
code = 0x7
Mar 30 10:47:17 OF455 kern.warn kernel: end_request: I/O error, dev sda,
sector 114112
Mar 30 10:47:17 OF455 kern.err kernel: Buffer I/O error on device sda,
logical block 57056

... the same for block 57057, 57058, 57059

Mar 30 10:47:18 OF455 kern.info kernel: SCSI error : 0 0 0 0 return
code = 0x7
Mar 30 10:47:18 OF455 kern.warn kernel: end_request: I/O error, dev sda,
sector 114118
Mar 30 10:47:18 OF455 kern.err kernel: Buffer I/O error on device sda,
logical block 57059

... now the device goes offline:

Mar 30 10:47:18 OF455 kern.debug kernel: usb-storage: *** thread
awakened.
Mar 30 10:47:18 OF455 kern.debug

2.6.11, IDE: Strange scheduling behaviour: high-pri RT process not scheduled?

2005-03-30 Thread kus Kusche Klaus

I've written a small test program which enables periodic RTC interrupts 
at 8192 Hz and then goes into a loop reading /dev/rtc and collecting 
timing statistics (using the rdtscl macro).

The program runs at highest realtime scheduling priority (99) with
memory 
locked. In the loop, it doesn't do any I/O except for /dev/rtc reads, no

memory allocations, nothing which could block or take time:

  rdtscl(old);

  for (i = 0; i  SAMPLES; ++i) {
if (read(fd, rtcval, sizeof (unsigned long)) != sizeof (unsigned
long)) {
  fprintf(stderr, %s: reading rtc data failed: %s\n, argv[0],
  strerror(errno));
  exit(1);
}
rtcval = 8;
--rtcval;
if (rtcval = INTERRUPTS)
  ++intr[INTERRUPTS];
else
  ++intr[rtcval];

rdtscl(new);
delta = new - old;
delta /= RESOLUTION;
if (delta = INTERVALS)
  ++cnt[INTERVALS];
else
  ++cnt[delta];

old = new;
  }

The test system runs a 2.6.11 kernel (no SMP) on a Pentium3 500 MHz 
embedded hardware. All filesystems are on a CF Card connected to the 
primary IDE controller (intel 440BX chipset) in PIO mode 2. The CF
card can transfer up to 2 MB/s. 
There are 192 MB RAM, most of it free.
The system is configured without swap space.

As long as the system is idle otherwise, my program works as expected: 
Most of the time, after an rtc interrupt the program gets the data from 
/dev/rtc within less than 50 microseconds.

Even with the test program running (which causes 16000 context switches

per second), vmstat shows more than 95 % idle CPU. Hence, the CPU needed

by the rtc irq handler and the test program is minimal.

However, things break seriously when exercising the CF card in parallel 
(e.g. with a dd if=/dev/hda of=/dev/null):

* The rtc *interrupt handler* is delayed for up to 250 *micro*seconds. 
This is very bad for my purpose, but easy to explain: It is roughly the 
time needed to transfer 512 Bytes from a CF card which can transfer 2 
Mbyte/sec, and obviously, the CPU blocks all interrupts while making pio

transfers. (Why? Is this really necessary?)

(I know because I instrumented the rtc irq handler with rdtscl(), too)

* The *test program* is regularly blocked for 63 *milli*seconds, 
sometimes for up to 300 *milli*seconds, which is absolutely
unacceptable.

* vmstat shows around 50 % sys CPU and around 50 % I/O wait, and 0 or
1 % user CPU (zero idle CPU). context switches are down to around 
500 from the expected 16000, which also indicates that my program is 
scheduled much less often than it should. Most of the time, vmstat shows
one running and one blocked process.

Now the big question:
*** Why doesn't my rt test program get *any* CPU for 63 jiffies? ***
(the system ticks at 1000 HZ)

* There is around 50 % idle CPU (I/O wait). I/O wait should not block my

program, because my program neither performs any disk I/O nor swaps.

* The dd program obviously gets some CPU regularly (because it copies 2 
MB/s, and because no other program could cause the 1 % user CPU load). 
The dd runs at normal shell scheduling priority, so it should be 
preempted immediately by my test program!

Some things I tried:
1.) Using a mdma2 instead of a pio2 CF card: The CF card speed doubles
(4 MB/s instead of 2 MB/s), and the blind time of my test program
halves from 63 ms to 32 ms, but basically, the problem remains.
2.) Using a realtime-preempt-2.6.12-rc1-V0.7.41-11 kernel with
PREEMPT_RT:
If my test program runs at rtpri 99, the problem is gone: It is 
scheduled within 30 microseconds after the rtc interrupt.
If my test program runs at rtpri 1, it still suffers from delays 
in the 30 to 300 millisecond range.

Thanks for any help!

Klaus Kusche
 Entwicklung Software - Steuerung
 Software Development - Control
 
 KEBA AG
 A-4041 Linz
 Gewerbepark Urfahr
 Tel +43 / 732 / 7090-3120
 Fax +43 / 732 / 7090-8919
 E-Mail: [EMAIL PROTECTED]
 www.keba.com
 
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: 2.6.11, IDE: Strange scheduling behaviour: high-pri RT process not scheduled?

2005-03-30 Thread kus Kusche Klaus

 From: Bartlomiej Zolnierkiewicz [mailto:[EMAIL PROTECTED] 

 On Wed, 30 Mar 2005 13:52:05 +0200, kus Kusche Klaus 
 [EMAIL PROTECTED] wrote:
  However, things break seriously when exercising the CF card 
 in parallel
  (e.g. with a dd if=/dev/hda of=/dev/null):
  
  * The rtc *interrupt handler* is delayed for up to 250 
 *micro*seconds.
  This is very bad for my purpose, but easy to explain: It is 
 roughly the
  time needed to transfer 512 Bytes from a CF card which can 
 transfer 2
  Mbyte/sec, and obviously, the CPU blocks all interrupts 
 while making pio
  
  transfers. (Why? Is this really necessary?)
   
  (I know because I instrumented the rtc irq handler with 
 rdtscl(), too)
 
 hdparm -u1 /dev/hda
 
 should help

Hmmm, thanks, that sounds very reasonable, and I didn't know that flag.

Unfortunately, it doesn't help. The bad timings stay the same (still
delays in the 30-300 ms range), the number of context switches stays the
same, ...

The only thing which changes is the CPU load shown by vmstat:
* With -u0, I have 1 % user, ~50 % sys, ~50 % wa
* With -u1, I have 1 % user, ~98 % sys, 1 % wa

P.S.: Apologies for my badly formatted mails. The company forces us to
use outlook, we may not even change the settings... 

Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-8919
E-Mail: [EMAIL PROTECTED]
www.keba.com

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

PREEMPT_RT: Undetected latencies?

2005-03-30 Thread kus Kusche Klaus

I'm experimenting with a realtime-preempt-2.6.12-rc1-V0.7.41-11 kernel,
preemption fully enabled, and a small RTC test program (see my previous
mail for details).

If my program runs at rtprio 99, most of the time everything is fine:
* My program records intervals of 122 +- 40 microseconds (which
corresponds nicely to the 8192 Hz RTC interrupt).
* Both the kernel's RTC histogram and the kernel's RT latency trace stay
below 40 microseconds.

However, during extensive testing, I found some cases which extend the
intervals unacceptably:
1.) When a test program performs massive mmaps (at base shell prio) and
activates the oom killer, my program measures intervals in the 15
millisecond range.
2.) When I get the USB errors described in a separate mail earlier
today, I also get intervals from 10 to 15 milliseconds.
(a combination of heavy USB and CF card traffic also seems to cause
blackouts in the 1 ms range even for rtprio 99, but I have to look into
that again)

Surprisingly, in both cases the worst latencies traced by the RT latency
trace and the RTC histogram are still less than 40 microseconds!
Moreover, the values read from /dev/rtc do not indicate any lost
interrupts. This seems to indicate that neither the RTC IRQ handler
itself nor timers were executed during these 15 millisecond period,
otherwise, either the RTC IRQ handler or the rtc_dropped_irq watchdog
would have counted lost interrupts! It seems that something locked up
the system completely for 15 ms: No interrupts at all, no scheduling,
...

Now, there are several questions:
* What is causing these 15 ms latencies in the two cases above, and how
can they be fixed?
* Why aren't they detected at all by any traces? 
Are these classical spinlocks which aren't instrumented? 
Are they explicit cli/sti in device drivers? 
Any other possibilities?
How can this be traced and analyzed?

A simple explanation would be that the TSC is reset. Is there any code
in the kernel messing with the TSC?

If my program runs at rtprio 1 instead of rtprio 99, many things cause
long intervals:
* normal working load (up to 1.2 ms)
* heavy mmap load without oom (many samples increased by 0.2 ms)
* heavy USB stick I/O without errors (up to 3 ms)
* core dumps (up to 10 ms)
* heavy USB stick I/O with errors (up to 15 ms)
* oom killer (up to 100 ms)
* heavy CF card I/O (up to 300 ms, see separate mail)
Even on an idle system, I see 1 ms intervals now and then!

Again, the latency traces are in the range of 40 to 50 microseconds
only, so these delays are caused by something not covered by the latency
tracer. How to find that out?

One more question: When my test program runs at rtprio 1, there are huge
delays (up to 100 ms) between the moment the rtc read gets ready and the
moment the program is actually scheduled. However, even then the wakeup
latency trace is not showing it. What is actually measured by the wakeup
latency trace?

Thanks for any help!

Klaus Kusche
 Entwicklung Software - Steuerung
 Software Development - Control
 
 KEBA AG
 A-4041 Linz
 Gewerbepark Urfahr
 Tel +43 / 732 / 7090-3120
 Fax +43 / 732 / 7090-8919
 E-Mail: [EMAIL PROTECTED]
 www.keba.com
 
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

2.6.11, USB: High latency?

2005-03-30 Thread kus Kusche Klaus

I'm performing realtime latency tests (for details about the hardware
and software, see my mail [BUG] 2.6.11: Random SCSI/USB errors when
reading from USB memory stick erlier today).

Even when the errors described in my previous mail does not occur,
massive USB stick transfers cause latencies of 1 to 2 milliseconds,
which is way too much for realtime control systems. 

I observe these latencies on a vanilla 2.6.11 at any rtprio (even 99),
and on realtime-preempt-2.6.12-rc1-V0.7.41-11 at low rtprio (1). When
running the program on realtime-preempt-2.6.12-rc1-V0.7.41-11 with
rtprio 99, the latencies are gone, but using a rtprio higher than the
interrupt handlers is not realistic.

Is there anything which can be done about it?

Klaus Kusche
 Entwicklung Software - Steuerung
 Software Development - Control
 
 KEBA AG
 A-4041 Linz
 Gewerbepark Urfahr
 Tel +43 / 732 / 7090-3120
 Fax +43 / 732 / 7090-8919
 E-Mail: [EMAIL PROTECTED]
 www.keba.com
 
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

38 matches

Mail list logo