On Mon, Feb 11, 2008 at 2:41 PM, Jan Kiszka <[EMAIL PROTECTED]> wrote:
>
> Gilles Chanteperdrix wrote:
>  > Juan Antonio Garcia Redondo wrote:
>  >  > On 23/01/08 14:15, Gilles Chanteperdrix wrote:
>  >  > > On Jan 23, 2008 11:04 AM, Gilles Chanteperdrix
>  >  > > <[EMAIL PROTECTED]> wrote:
>  >  > > > On Jan 23, 2008 7:52 AM, Juan Antonio Garcia Redondo
>  >  > > > <[EMAIL PROTECTED]> wrote:
>  >  > > > > I see everything OK except for the first samples of cyclictests.
>  >  > > > > Any comments?
>  >  > > >
>  >  > > > The load you apply does not load the cache, which is a source of
>  >  > > > jitter. You should run the cache calibrator (I cannot find the cache
>  >  > > > calibrator URL, but it is somewhere in the Xenomai distribution or wiki).
>  >  > >
>  >  > > It is in the TROUBLESHOOTING guide, question "How do I adequately
>  >  > > stress test".
>  >  > >
>  >  > > --
>  >  > >                                                Gilles Chanteperdrix
>  >  >
>  >  > Thanks Gilles, I've done more tests using the cache calibrator from
>  >  > http://www.cwi.nl/~manegold/Calibrator. The latency numbers are very
>  >  > similar, although I've found a strange behaviour related to telnet
>  >  > sessions.
>  >  >
>  >  > Environment:
>  >  >    o Tests running from console over the Atmel serial port.
>  >  >    o A telnet session over on-chip Ethernet.
>  >  >    o System without load.
>  >  >
>  >  > ./latency -p 500 -t0
>  >  > == All results in microseconds
>  >  > warming up...
>  >  > RTT|  00:00:01  (periodic user-mode task, 500 us period, priority 99)
>  >  > RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
>  >  > RTD|      49.613|      52.190|      62.822|       0|      49.613| 62.822
>  >  > RTD|      42.203|      52.512|      66.365|       0|      42.203| 66.365
>  >  >
>  >  >
>  >  > Now, if I hit a key on the telnet session:
>  >  >
>  >  > RTD|      36.726|      57.989|     109.536|       0|      31.572| 109.536  <-------- Here I've hit the key.
>  >  > RTD|      36.404|      51.868|      69.587|       0|      31.572| 109.536
>  >  > RTD|      35.760|      51.868|      73.775|       0|      31.572| 109.536
>  >  >
>  >  > Now, I launch a script which executes four instances of the cache
>  >  > calibrator.
>  >  >
>  >  > RTD|      45.103|      57.667|      75.708|       0|      32.538| 122.422
>  >  > RTD|      45.425|      57.023|      76.030|       0|      32.538| 122.422
>  >  > RTD|      46.069|      57.023|      75.708|       0|      32.538| 122.422
>  >  >
>  >  > Now, I can hit a key on the telnet session without any effect on the
>  >  > latency numbers:
>  >  >
>  >  > RTD|      44.136|      57.989|      75.386|       0|      27.384| 128.221
>  >  > RTD|      46.713|      57.345|      76.353|       0|      27.384| 128.221
>  >  > RTD|      44.780|      57.345|      76.675|       0|      27.384| 128.221
>  >  > RTD|      43.492|      56.701|      76.997|       0|      27.384| 128.221
>  >  >
>  >  > Now I stop the calibrator process and launch 'ping -f -s2048 192.168.2.82'
>  >  > from an external machine.
>  >  >
>  >  > RTD|      40.270|      68.621|      90.850|       0|      27.384| 128.221
>  >  > RTD|      36.082|      68.621|      88.273|       0|      27.384| 128.221
>  >  > RTD|      40.592|      67.976|      91.494|       0|      27.384| 128.221
>  >  > RTD|      41.237|      68.298|      89.239|       0|      27.384| 128.221
>  >  >
>  >  >
>  >  > Now, if I hit a key on the telnet session:
>  >  >
>  >  > RTD|      42.203|      67.976|      88.273|       0|      27.384| 128.221
>  >  > RTD|      32.216|      93.427|     128.543|       0|      27.384| 128.543 <---------- Here I've hit the key.
>  >  > RTD|      42.203|      68.298|      87.628|       0|      27.384| 128.543
>  >  >
>  >  > And again, running the calibrator eliminates the strange behaviour
>  >  > with the telnet session.
>  >  >
>  >  > Any clues ?
>  >
>  > Here is an update, as a follow-up on xenomai-core. I was finally able to
>  > reproduce this behaviour: I run latency in the background, hit the
>  > "Enter" key on my serial console, and get high latency figures.
>  >
>  > I enabled the tracer, set the Xenomai nucleus latency to 300 us and
>  > managed to get a trace (220 us latency). However, I do not understand
>  > what is going wrong from reading the trace, so I post it here in case
>  > someone sees something.
>  >
>  > Ah, and I added an ipipe_trace_special call in __ipipe_grab_irq to log
>  > the number of the received IRQ: 1 is the serial interrupt, 18 (0x12) is
>  > the timer interrupt.
>  >
>  > Inline, so that Jan can comment on it.
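
For reference, the instrumentation mentioned above amounts to something
like the sketch below. This is a paraphrase of the idea rather than the
literal patch, and the exact prototype of the ARM IRQ entry hook may
differ:

/* Paraphrased sketch of the added tracepoint: log the incoming hardware
 * IRQ number as a "special" trace value, so that it shows up in the
 * frozen trace as "(0x2a)  0x000000NN" (0x12 = timer IRQ, 0x01 = serial
 * IRQ). */
asmlinkage void __exception __ipipe_grab_irq(int irq, struct pt_regs *regs)
{
	ipipe_trace_special(0x2a, irq);		/* added for this trace */

	__ipipe_handle_irq(irq, regs);		/* normal I-pipe dispatch */
}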
>
>  Thanks, but TB is too "smart" - it cuts off everything that is marked as
>  footer ("--"). :-/
>
>
>  > I-pipe frozen back-tracing service on 2.6.20/ipipe-1.8-04
>  > ------------------------------------------------------------
>  > CPU: 0, Freeze: 450692973 cycles, Trace Points: 1000 (+10)
>  > Calibrated minimum trace-point overhead: 1.000 us
>
>  That is interesting. It tells us that we might subtract 1 us
>  _per_tracepoint_ from the given latencies due to the inherent tracer
>  overhead. We have about 50 entries in the critical path, so 50 us
>  compared to 220 us that were measured - roughly 170 us real latency.
>
>  What is the clock resolution btw? 500 ns?

It should be around 1 us, a bit less. I think the frequency is 92 MHz /
64. Anyway, I checked once and the I-pipe rounding is really
approximate; it is equivalent to rounding the frequency to 2 MHz.
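
As a quick back-of-the-envelope check of those figures (a sketch only;
the 92 MHz master clock and the /64 prescaler are the values quoted
above, and 2 MHz is the rounded frequency mentioned for the I-pipe
arithmetic):

/* Tick period implied by the numbers discussed above. */
#include <stdio.h>

int main(void)
{
	double master_hz = 92e6;            /* AT91 master clock, as quoted above */
	double timer_hz  = master_hz / 64;  /* prescaled timer clock: 1.4375 MHz */

	printf("real tick period:  %.1f ns\n", 1e9 / timer_hz);  /* ~695.7 ns */
	printf("rounded to 2 MHz:  %.1f ns\n", 1e9 / 2e6);       /* 500.0 ns */
	return 0;
}

So the real resolution is a bit under 1 us, while the rounded 2 MHz
frequency corresponds to the 0.5 us steps visible in the trace below.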

>
>  So here is the interesting block, starting with the last larger IRQs-on
>  window.
>
>
>  > :   + func                -447+   2.500  xnshadow_relax+0x14 (hisyscall_event+0x210)
>  > :|  + begin   0x80000000  -445+   3.000  xnshadow_relax+0xd4 (hisyscall_event+0x210)
>  > :|  # func                -442+   5.000  schedule_linux_call+0x10 (xnshadow_relax+0x114)
>  > :|  # func                -437+   4.000  rthal_apc_schedule+0x10 (schedule_linux_call+0x1e8)
>  > :|  # func                -433+   5.000  __ipipe_schedule_irq+0x10 (rthal_apc_schedule+0xac)
>  > :|  # func                -428+   4.500  __ipipe_set_irq_pending+0x10 (__ipipe_schedule_irq+0xa4)
>  > :|  # func                -423+   3.500  rpi_push+0x14 (xnshadow_relax+0x11c)
>  > :|  # func                -420+   5.500  xnpod_suspend_thread+0x14 (xnshadow_relax+0x148)
>  > :|  # func                -414+   4.000  xnpod_schedule+0x14 (xnpod_suspend_thread+0x60c)
>  > :|  # [  752] -<?>-    0  -410+   7.000  xnpod_schedule+0xc8 (xnpod_suspend_thread+0x60c)
>  > :|  # func                -403!  56.000  xnheap_finalize_free_inner+0x10 (xnpod_schedule+0x82c)
>  > :|  # [    0] -<?>-   -1  -347!  20.000  xnpod_schedule+0xb14 (xnintr_clock_handler+0xa0)
>  > :|   +func                -327+   3.000  __ipipe_walk_pipeline+0x10 (__ipipe_handle_irq+0x124)
>  > :|   +func                -324+   7.500  __ipipe_sync_stage+0x14 (__ipipe_walk_pipeline+0xa8)
>  > :|   #end     0x80000000  -317+   7.000  __ipipe_sync_stage+0x250 (__ipipe_walk_pipeline+0xa8)
>
>  OK, the clock starts ticking...
>
>
>  > :|   #func                -310+   2.500  __ipipe_grab_irq+0x10 (__irq_svc+0x28)
>  > :|   #begin   0xffffffff  -307+   7.000  __ipipe_grab_irq+0x20 (__irq_svc+0x28)
>  > :|   #(0x2a)  0x00000012  -300+   5.000  __ipipe_grab_irq+0x2c (__irq_svc+0x28)
>  > :|   #func                -295+   4.000  __ipipe_handle_irq+0x10 (__ipipe_grab_irq+0x104)
>  > :|   #func                -291+   2.500  __ipipe_ack_timerirq+0x10 (__ipipe_handle_irq+0x74)
>  > :|   #func                -289+   3.000  __ipipe_ack_level_irq+0x10 (__ipipe_ack_timerirq+0x30)
>  > :|   #func                -286+   2.000  at91_aic_mask_irq+0x10 (__ipipe_ack_level_irq+0x3c)
>  > :|   #func                -284+   2.000  at91_aic_mask_irq+0x10 (__ipipe_ack_level_irq+0x4c)
>
>  (Without looking at the arm code: Is this double invocation of
>  at91_aic_mask_irq correct and required?)

No, and I do not see where it is coming from.

>
>
>  > :|   #func                -282+   4.000  __ipipe_mach_acktimer+0x10 (__ipipe_ack_timerirq+0x40)
>  > :|   #func                -278+   2.000  __ipipe_end_level_irq+0x10 (__ipipe_ack_timerirq+0x50)
>  > :|   #func                -276+   2.500  at91_aic_unmask_irq+0x10 (__ipipe_end_level_irq+0x28)
>  > :|   #func                -273+   3.500  __ipipe_dispatch_wired+0x14 (__ipipe_handle_irq+0x80)
>  > :|  #*func                -270+   3.500  xnintr_clock_handler+0x10 (__ipipe_dispatch_wired+0xe4)
>  > :|  #*func                -266+   9.500  xntimer_tick_aperiodic+0x14 (xnintr_clock_handler+0x34)
>  > :|  #*func                -257+   3.500  xnthread_periodic_handler+0x10 (xntimer_tick_aperiodic+0x354)
>  > :|  #*func                -253+   4.000  xnpod_resume_thread+0x14 (xnthread_periodic_handler+0x34)
>  > :|  #*[  753] -<?>-   99  -249!  15.000  xnpod_resume_thread+0x84 (xnthread_periodic_handler+0x34)
>
>  Hmm, comparably costly, this simple resume. Hope it's not the
>  instrumentation (ipipe_trace_pid?) itself.
>
>
>  > :|  #*func                -234+   6.500  xntimer_next_local_shot+0x10 (xntimer_tick_aperiodic+0x7c0)
>  > :|  #*func                -228+   4.000  __ipipe_mach_set_dec+0x10 (xntimer_next_local_shot+0xbc)
>  > :|  #*func                -224+   3.500  xnpod_schedule+0x14 (xnintr_clock_handler+0xa0)
>  > :|  #*[    0] -<?>-   -1  -220!  59.500  xnpod_schedule+0xc8 (xnintr_clock_handler+0xa0)
>
>  OK, this is the cache flushing thing, I guess. Expected.

Yes, I took care of disabling the (still recent) IRQ unmasking in
xnpod_schedule, so we see the cache flushing with IRQs disabled.

>
>
>  > :|  #*[  753] -<?>-   99  -161!  17.500  xnpod_schedule+0xb14 (xnpod_suspend_thread+0x60c)
>
>  Again this peak. Strange.
>
>
>  > :|  #*func                -143+   4.000  xntimer_get_overruns+0x14 (xnpod_wait_thread_period+0xe0)
>  > :|  #*func                -139+   4.000  __ipipe_restore_pipeline_head+0x10 (xnpod_wait_thread_period+0x114)
>  > :|  +*end     0x80000000  -135+   7.000  __ipipe_restore_pipeline_head+0x11c (xnpod_wait_thread_period+0x114)
>
>  So we woke up the thread and are on the way back to userland. But then
>  the serial IRQ hits us...
>
>
>  > :|  +*func                -128+   3.000  __ipipe_grab_irq+0x10 (__irq_svc+0x28)
>  > :|  +*begin   0xffffffff  -125+   7.000  __ipipe_grab_irq+0x20 (__irq_svc+0x28)
>  > :|  +*(0x2a)  0x00000001  -118+   4.000  __ipipe_grab_irq+0x2c (__irq_svc+0x28)
>  > :|  +*func                -114+   5.000  __ipipe_handle_irq+0x10 (__ipipe_grab_irq+0x104)
>  > :|  +*func                -109+   4.500  __ipipe_set_irq_pending+0x10 (__ipipe_handle_irq+0xdc)
>  > :|  +*func                -105+   2.000  __ipipe_ack_irq+0x10 (__ipipe_handle_irq+0xfc)
>  > :|  +*func                -103+   3.000  __ipipe_ack_level_irq+0x10 (__ipipe_ack_irq+0x28)
>  > :|  +*func                -100+   2.500  at91_aic_mask_irq+0x10 (__ipipe_ack_level_irq+0x3c)
>  > :|  +*func                 -97+   2.500  at91_aic_mask_irq+0x10 (__ipipe_ack_level_irq+0x4c)
>  > :|  +*func                 -95+   3.000  __ipipe_walk_pipeline+0x10 (__ipipe_handle_irq+0x124)
>
>  Serial done, the above 30 us (minus tracer overhead) are unavoidable.
>
>
>  > :|  +*end     0xffffffff   -92!  10.500  __ipipe_grab_irq+0x110 (__irq_svc+0x28)
>  > :|  +*begin   0x80000001   -81+   4.000  __ipipe_dispatch_event+0x108 (__ipipe_syscall_root+0x84)
>
>  This is the syscall epilogue related to wait_thread_period.

Is it normal to see the hw IRQs disabled here?

>
>
>  > :|  +*end     0x80000001   -77!  39.500  __ipipe_dispatch_event+0x1cc (__ipipe_syscall_root+0x84)
>  > :   +*func                 -38+   2.500  __ipipe_syscall_root+0x10 (vector_swi+0x68)
>
>  After doing some latency calculation in user land, we are back to freeze
>  the trace now.
>
>
>  > :   +*func                 -35+   2.000  __ipipe_dispatch_event+0x14 (__ipipe_syscall_root+0x84)
>  > :|  +*begin   0x80000001   -33+   4.000  __ipipe_dispatch_event+0x40 (__ipipe_syscall_root+0x84)
>  > :|  +*end     0x80000001   -29+   2.000  __ipipe_dispatch_event+0xc8 (__ipipe_syscall_root+0x84)
>  > :   +*func                 -27+   6.500  hisyscall_event+0x14 (__ipipe_dispatch_event+0xe0)
>  > :   +*func                 -21+   2.500  xnshadow_sys_trace+0x10 (hisyscall_event+0x150)
>  > :   +*func                 -18+   2.000  ipipe_trace_frozen_reset+0x10 (xnshadow_sys_trace+0x98)
>  > :   +*func                 -16+   2.000  __ipipe_global_path_lock+0x10 (ipipe_trace_frozen_reset+0x14)
>  > :   +*func                 -14+   2.000  __ipipe_spin_lock_irqsave+0x10 (__ipipe_global_path_lock+0x18)
>  > :|  +*begin   0x80000001   -12+   8.000  __ipipe_spin_lock_irqsave+0x34 (__ipipe_global_path_lock+0x18)
>  > :|  #*func                  -4+   2.500  __ipipe_spin_unlock_irqcomplete+0x10 (__ipipe_global_path_unlock+0x6c)
>  > :|  +*end     0x80000001    -2+   2.000  __ipipe_spin_unlock_irqcomplete+0x4c (__ipipe_global_path_unlock+0x6c)
>  > <   +*freeze  0x00035ea1     0    2.500  xnshadow_sys_trace+0xa4 (hisyscall_event+0x150)
>
>  So, all in all, nothing fishy, at least here. Any IRQ (actually even
>  multiple!) can hit us between wakeup and latency calculation, and this
>  happened here due to sending some characters over the serial console.
>  Plus we see the cache flushing overhead when switching processes, also
>  normal.

The problem is that the nucleus latency is calibrated to 300 us (which
is very high), and that we get a 220 us latency, which means that it
took about 500 us to go from the timer interrupt to the xntrace syscall.
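
To spell that step out (a rough check only, built from the two figures
quoted above; as I understand the calibration, the timer shot is
anticipated by that amount, so the measured latency comes on top of it):

/* Rough check of the "about 500 us" figure, from the two values quoted
 * in this thread: the 300 us nucleus latency calibration (the timer
 * fires that much early) and the 220 us latency reported for the trace. */
#include <stdio.h>

int main(void)
{
	double calibration_us = 300.0;  /* nucleus latency calibration */
	double measured_us    = 220.0;  /* latency reported for this trace */

	printf("timer IRQ -> xntrace syscall: ~%.0f us\n",
	       calibration_us + measured_us);  /* ~520 us, i.e. roughly 500 us */
	return 0;
}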

>
>  I would just check those small peaks around ipipe_trace_pid, to see if
>  they are just tracing related (e.g. by disabling those instrumentations
>  in the nucleus).

Will try that.
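
Roughly along these lines, I suppose (a sketch only; my_trace_pid is a
made-up wrapper name for this illustration, the actual nucleus call
sites that end up in ipipe_trace_pid() would be guarded the same way):

/* Hypothetical check to rule out the pid tracepoints themselves: stub
 * the call out, rebuild, and re-run latency + the tracer. If the ~15 us
 * peaks disappear, they were tracer overhead. */
#if 1   /* set to 0 to restore the normal instrumentation */
#define my_trace_pid(pid, prio)  do { } while (0)
#else
#define my_trace_pid(pid, prio)  ipipe_trace_pid(pid, prio)
#endif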

-- 
                                               Gilles Chanteperdrix

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
