Re: [Xenomai-core] Latencies for the Freescale i.MX21/CSB535FS

2006-04-27 Thread Philippe Gerum

Jan Kiszka wrote:

ROSSIER Daniel wrote:


I'd say that the most efficient way to reduce those latencies would be
to first identify the source of the 40+ us spot observed with the -t2
form on an idle system. For that, I'm convinced that porting the I-pipe
tracer to ARM would be the best option, since this tool would be of
great help there.



Thanks for the hint; we will spend some time on the tracer in the coming days.
We will keep you informed.




Cool, tracing also for ARM!



This port basically requires 1) to code the mcount() routine supporting
gcc's -pg option, 2) to solve early boot issues so that mcount() does
not attempt to trace anything while the memory environment has not been
fully set up. The rest is pretty generic.




Regarding a mcount() implementation and other details, the original
tracer effort by Ingo Molnar may give useful hints (at least it did for me):

http://people.redhat.com/mingo/latency-tracing-patches/

I remember the ARM part not being as simple as the x86 one. I think this
was also due to the lack of stack unwinding support on that arch.



There's a replacement for __builtin_return_address provided by mingo's
patch which is said to be working.

http://people.redhat.com/mingo/latency-tracing-patches/patches/latency-tracing.patch
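
For reference, the generic mcount() in those patches boils down to
something like this (a simplified sketch of mine, not the actual patch
code; record_trace_point() is a made-up placeholder).
__builtin_return_address(1) is exactly the part which needs frame
pointer / unwinding support, hence the ARM replacement:

#define notrace __attribute__((no_instrument_function))

static notrace void record_trace_point(void *caller, void *callee)
{
        /* a real tracer would timestamp and log the pair here */
        (void)caller;
        (void)callee;
}

/* gcc -pg inserts a call to mcount() into every function prologue */
void notrace mcount(void)
{
        void *callee = __builtin_return_address(0); /* instrumented function */
        void *caller = __builtin_return_address(1); /* its call site */

        record_trace_point(caller, callee);
}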

--

Philippe.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Latencies for the Freescale i.MX21/CSB535FS

2006-04-27 Thread Jan Kiszka
ROSSIER Daniel wrote:
>>
>> I'd say that the most efficient way to reduce those latencies would be
>> to first identify the source of the 40+ us spot observed with the -t2
>> form on an idle system. For that, I'm convinced that porting the I-pipe
>> tracer to ARM would be the best option, since this tool would be of
>> great help there.
>>
> Thanks for the hint; we will spend some time on the tracer in the coming
> days. We will keep you informed.
> 

Cool, tracing also for ARM!

>> This port basically requires 1) to code the mcount() routine supporting
>> gcc's -pg option, 2) to solve early boot issues so that mcount() does
>> not attempt to trace anything while the memory environment has not been
>> fully set up. The rest is pretty generic.
>>

Regarding a mcount() implementation and other details, the original
tracer effort by Ingo Molnar may give useful hints (at least it did for me):

http://people.redhat.com/mingo/latency-tracing-patches/

I remember the ARM part not being as simple as the x86 one. I think this
was also due to the lack of stack unwinding support on that arch.

Jan



___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


RE: [Xenomai-core] Latencies for the Freescale i.MX21/CSB535FS

2006-04-27 Thread ROSSIER Daniel


> -Original Message-
> From: Philippe Gerum [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 26 April 2006, 10:41
> To: ROSSIER Daniel
> Cc: xenomai-core@gna.org
> Subject: Re: [Xenomai-core] Latencies for the Freescale i.MX21/CSB535FS
> 
> ROSSIER Daniel wrote:
> > Hi all,
> >
> >
> >
> > As promised, you can find the latency results (latency -t0/-t1/-t2) as
> > well as the stats from the switch utility for the performance of our
> > Xenomai port onto the i.MX21 board.
> >
> >
> >
> > These are fresh results :-) and we didn't have time to analyze them yet.
> >
> >
> >
> > Thanks for any feedback...
> 
> The tests have not been run long enough under load to get a reliable
> measure of the real worst-case figures, but still, the data sets seem
> consistent.

OK; we will then run further tests.

> 
> - the test run of latency -t2 (in-kernel timer handler) shows worst-case
> figures equivalent to those of the -t1 form (in-kernel thread), which means
> that most of the latency hit is taken at the Adeos level, i.e. in-kernel
> scheduling adds little to the picture. Room for improvement is primarily
> hiding somewhere in the Adeos layer, I think.
> 

OK; we still have to investigate all the call paths in the Adeos layer
leading up to the timer reprogramming.

> - comparing the min latency observed in the -t1 and -t2 forms, it looks
> like the inherent cost of traversing the rescheduling path would be
> close to ~10 us.
> 
> - comparing the min latency observed in the -t0 and -t1 forms, there is
> another 10+ us consumed in switching mm contexts, and paying the
> involved cache penalties. The way to measure the level of perturbation
> Linux adds by switching its own tasks is to write a simple kernel module
> embodying a Xenomai thread that keeps the CPU busy while the performance
> test is running at a higher priority.
> 
> I'd say that the most efficient way to reduce those latencies would be
> to first identify the source of the 40+ us spot observed with the -t2
> form on an idle system. For that, I'm convinced that porting the I-pipe
> tracer to ARM would be the best option, since this tool would be of
> great help there.
> 
Thanks for the hint; we will spend some time on the tracer in the coming days.
We will keep you informed.

> This port basically requires 1) to code the mcount() routine supporting
> gcc's -pg option, 2) to solve early boot issues so that mcount() does
> not attempt to trace anything while the memory environment has not been
> fully set up. The rest is pretty generic.
> 
> --
> 
> Philippe.

Daniel

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Latencies for the Freescale i.MX21/CSB535FS

2006-04-26 Thread Philippe Gerum

ROSSIER Daniel wrote:

Hi all,

 

As promised, you can find the latency results (latency -t0/-t1/-t2) as
well as the stats from the switch utility for the performance of our
Xenomai port onto the i.MX21 board.


 


These are fresh results :-) and we didn't have time to analyze them yet.

 


Thanks for any feedback…


The tests have not been run long enough under load to get a reliable 
measure of the real worst-case figures, but still, the data sets seem 
consistent.


- the test run of latency -t2 (in-kernel timer handler) shows worst-case
figures equivalent to those of the -t1 form (in-kernel thread), which means
that most of the latency hit is taken at the Adeos level, i.e. in-kernel
scheduling adds little to the picture. Room for improvement is primarily
hiding somewhere in the Adeos layer, I think.


- comparing the min latency observed in the -t1 and -t2 forms, it looks 
like the inherent cost of traversing the rescheduling path would be 
close to ~10 us.


- comparing the min latency observed in the -t0 and -t1 forms, there is
another 10+ us consumed in switching mm contexts, and paying the
involved cache penalties. The way to measure the level of perturbation
Linux adds by switching its own tasks is to write a simple kernel module
embodying a Xenomai thread that keeps the CPU busy while the performance
test is running at a higher priority.
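
Something along these lines should do it (an untested sketch, assuming
the native skin is built for kernel space; the task name, priority and
the 60s duration are arbitrary):

#include <linux/module.h>
#include <linux/init.h>
#include <native/task.h>
#include <native/timer.h>

static RT_TASK busy_task;

static void busy_loop(void *cookie)
{
        /* Spin for ~60s at low priority. The latency test, running at a
           higher priority, still preempts this thread, but the Linux root
           thread never gets the CPU back, so Linux's own task switching
           is taken out of the picture. rt_timer_read() returns
           nanoseconds in oneshot mode. */
        RTIME end = rt_timer_read() + 60000000000ULL;

        while (rt_timer_read() < end)
                ;
}

static int __init busy_init(void)
{
        int err;

        err = rt_task_create(&busy_task, "busy", 0, 1, 0);
        if (err)
                return err;

        return rt_task_start(&busy_task, &busy_loop, NULL);
}

static void __exit busy_exit(void)
{
        rt_task_delete(&busy_task);
}

module_init(busy_init);
module_exit(busy_exit);
MODULE_LICENSE("GPL");

The loop is bounded on purpose: with an unbounded spin, the root thread
(and therefore rmmod) would be starved for good.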


I'd say that the most efficient way to reduce those latencies would be
to first identify the source of the 40+ us spot observed with the -t2
form on an idle system. For that, I'm convinced that porting the I-pipe
tracer to ARM would be the best option, since this tool would be of
great help there.


This port basically requires 1) to code the mcount() routine supporting 
gcc's -pg option, 2) to solve early boot issues so that mcount() does 
not attempt to trace anything while the memory environment has not been 
fully set up. The rest is pretty generic.
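
For 2), the usual trick is a global flag which the tracing hook tests
before doing anything; roughly like this (made-up names, not the actual
I-pipe tracer code, and the real entry point still has to be a small ARM
assembly stub saving the scratch registers before branching to C):

/* flipped on once paging and the per-CPU data are fully initialized */
static volatile int mcount_enabled;

struct trace_point {
        unsigned long callee;   /* function being entered */
        unsigned long caller;   /* its call site */
};

static struct trace_point trace_log[4096];
static unsigned int trace_pos;

/* C-level handler the ARM mcount stub would branch to */
void __attribute__((no_instrument_function))
mcount_handler(unsigned long callee, unsigned long caller)
{
        unsigned int i;

        /* Early boot guard: do strictly nothing until the memory
           environment is ready, otherwise the very first instrumented
           calls would touch unmapped or uninitialized data. */
        if (!mcount_enabled)
                return;

        i = trace_pos++ & 4095;
        trace_log[i].callee = callee;
        trace_log[i].caller = caller;
}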


--

Philippe.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core