Philippe Gerum wrote:


This is a partial roadmap for the project, composed of the currently

Ah! I just _knew_ you would jump in as expected. The teasing worked :o)

Well done! It's the mark of a great leader to get folks to do what he wants,
while making them think it's their idea ;-)

(and I imagine that's why you cc'd Takis too :-)

[lots of snippage, throughout]


LiveCD has a few weaknesses though:

- can't test platforms w/o a CD-ROM drive


I also think that's a serious issue. Aside from the hw availability problem (e.g. non-x86 eval boards), having to burn the CD is one step too many when time is a scarce resource. It often prevents running the tests as a fast check procedure in the absence of any noticeable problem. IOW, you won't burn a CD to run the tests unless you are really stuck with some issue. So a significant part of the interest of having a generic testsuite is lost: you just don't discover potential problems before the serious breakage is already in the wild.

One thing that would help expand the LiveCD's usefulness is to be able to (rough sketch after the list):

- mount pirt.iso in loopback on a host (my laptop),
- export it via NFS to box-under-test,
- use pxelinux to feed the LiveCD's kernel(s?) to the box when it boots.
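Something along these lines (untested sketch: the IP addresses, the tftp/dhcp setup, and the kernel/initrd paths inside the ISO are all assumptions, and the target kernel needs NFS-root support built in):

# on the host: loop-mount the ISO and export it read-only over NFS
mkdir -p /media/cd
mount -o loop,ro pirt.iso /media/cd
echo "/media/cd 192.168.1.0/24(ro,no_root_squash)" >> /etc/exports
exportfs -ra

# make the LiveCD kernel/initrd visible to the tftp server,
# then add a pxelinux entry that mounts the export as root
cp /media/cd/boot/vmlinuz /media/cd/boot/initrd.img /tftpboot/
cat >> /tftpboot/pxelinux.cfg/default <<EOF
label livecd
  kernel vmlinuz
  append initrd=initrd.img root=/dev/nfs nfsroot=192.168.1.1:/media/cd ip=dhcp ro
EOF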

I tried to do this, and IIRC ran into trouble with absolute symlinks
from /etc.ro to /etc.  The absoluteness fouls things up when the ISO
is mounted on, for example, /media/cd.

I poked a bit at trying to convince NFS to resolve them as if they
were used within a chroot jail, but I don't know enough about that.
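A brute-force workaround might be to copy the tree somewhere writable and convert the absolute links to relative ones before exporting (sketch, assuming Mark Lord's "symlinks" utility is installed):

cp -a /media/cd /srv/livecd    # the mounted ISO itself is read-only
symlinks -cr /srv/livecd       # -c: convert absolute -> relative, -r: recurse
# then export /srv/livecd instead of the mounted ISO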



- manual re-entry of data is tedious,
- no collection of platform data (available for automation)
- spotty info about CPU, memory, mobo, etc.,
  which is largely user-supplied, so it can be wrong.

- no unattended test (still true?)


- unfiltered preposterous data. Sometimes the data sent are just rubbish, either because of well-known hw-related malfunctions or because of misuse of the LiveCD. This pollutes the results needlessly.

Any ideas on how to reject these outliers?
(defer until we have statistical analysis in place?)
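In the meantime, a crude first cut could just toss anything wildly far from the middle of the pack (sketch, assuming a hypothetical lat.dat with one worst-case latency in us per line; the 5x threshold is arbitrary):

# drop samples more than 5x the median -- placeholder until real stats exist
m=$(sort -n lat.dat | awk '{a[NR]=$1} END{print a[int(NR/2)+1]}')
awk -v m="$m" '$1 <= 5*m' lat.dat > lat.filtered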

- difficulties so far in really getting sensible, digested information out of the zillions of results, aside from very general figures (e.g. best performer). But this is more an issue of lacking data post-processors than of the LiveCD infrastructure itself.

yep.  And we *need* platform data to start categorizing results by platform,
important config choices, etc. We should see narrower ranges of results,
and be better able to reject the junk.

<snip>

Additionally, the LiveCD is a really great tool when it comes to helping people figure out whether their box or their brain has a problem with the tested software: by automatically providing a sane software (kernel+rtos) configuration and the proper way to run it, a number of people could determine quite easily whether their current lack of luck comes from their software configuration or from a more serious problem.

yeah.  pre-built world saves a lot of early thrashing.



- testsuite/cruncher ?


The cruncher measures the impact of using the interrupt shield, but this setting is now configured out by default since a majority of people don't currently need it. The shield's cost/performance figures are still useful to know, though.

OK. Adding one call to cruncher is simple. Over time we *may* collect enough data to make some A (shields up!) vs B (shields down!) comparisons. But I don't see the data to distinguish A from B - don't we need the xeno/ipipe equivalent
of /proc/config.gz to do this?
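One possible stopgap (sketch; I'm guessing at both the availability of /proc/config.gz on test boxes and at the symbol names):

# if the kernel was built with CONFIG_IKCONFIG_PROC, capture whatever
# xeno/ipipe options show up in the running config
zcat /proc/config.gz | grep -iE 'XENO|IPIPE|SHIELD' >> run-config.txt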

wrt testsuite/README cruncher notes, is this useful info?

(manual insmods here...)
soekris:/usr/realtime/2.6.14-ski9-v1/testsuite/cruncher# cruncher
Calibrating cruncher...11773, done -- ideal computation time = 10023 us.
1000 samples, 1000 hz freq (pid=4183, policy=SCHED_FIFO, prio=99)
--------
Nanosleep jitter: min = 60 us, max = 192 us, avg = 77 us
Execution jitter: min = 39 us (0%), max = 72 us (0%), avg = 51 us (0%)
--------
Segmentation fault

soekris:/usr/realtime/2.6.14-ski9-v1/testsuite/cruncher# run
*
*
* Type ^C to stop this application.
*
*
Calibrating cruncher...11769, done -- ideal computation time = 10018 us.
1000 samples, 1000 hz freq (pid=4260, policy=SCHED_FIFO, prio=99)
--------
Nanosleep jitter: min = 62 us, max = 195 us, avg = 79 us
Execution jitter: min = 46 us (0%), max = 77 us (0%), avg = 57 us (0%)
--------



2. send your results to xenomai.testout-at-gmail.com
Obviously, an official gna.org ML might be more appropriate.


Will appear soon.

should this wait until xeno-test is upgraded to produce good data?
i.e. prevent early bogus data from being submitted.


<snip>

As said before, the problem that currently exists with the LiveCD's data is that the results are crippled with irrelevant stuff, either because some people just tried it out over a simulator (ahem...), or because they had a serious hw-generated latency issue that basically made the whole run useless (mostly x86 issues: e.g. SMI stuff, legacy USB emulation, powermgmt, cpufreq artefacts, etc.).

I added a few /proc/config.gz-related checks for CPU-FREQ and X86-GENERIC;
can you suggest additional checks?
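The checks are roughly greps along these lines (sketch; the symbol list below is illustrative, not what's actually in the script, and SMI / legacy-USB-emulation problems live in the BIOS where config.gz can't see them):

# warn about config options known to hurt determinism on x86
for opt in CONFIG_CPU_FREQ CONFIG_APM CONFIG_ACPI_PROCESSOR CONFIG_USB; do
  zcat /proc/config.gz | grep -q "^$opt=y" && echo "warning: $opt=y"
done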





4. xeno-test output parser

- /proc/ipipe/Linux-stats is parsed into pairs of IRQ => CPU0 prop-times
- such data is only comparable across kernels with equal IRQ maps
- currently won't handle CPU1 / SMP data
- /proc/interrupts is slightly better parsed (see the sketch below)
- no detail-parse at all for top data; is that needed?
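The /proc/interrupts pass amounts to something like this (UP layout assumed; on SMP there is one count column per CPU, and multi-word device names get truncated to their last word):

# flatten /proc/interrupts into "IRQ count name" triples
awk 'NR>1 && $1 ~ /^[0-9]+:$/ { sub(":","",$1); print $1, $2, $NF }' /proc/interrupts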


I'm not sure that per-process data would help, just because those are way too volatile and fragmented to be interpreted rationally over a long test period; maybe using per-subsystem data (e.g. the /proc/sys crowd) at some point would help more.
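Fair enough. A per-subsystem sampler could be as dumb as this (hypothetical sketch; the file list and interval are placeholders):

# snapshot a few system-wide counters every 10s for the length of a run
while sleep 10; do
  echo "=== $(date +%s)"
  cat /proc/loadavg /proc/stat
done >> subsys.log &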




Prototype only, but it's hackable (perl), and I'm happy to graft all
sorts of horrible experiments onto it provisionally to see what's useful.
Hopefully a plugin refactoring will become obvious without too much work.


Warning, people: JimC is some kind of hybrid of a Perl Monger and a Real-timer; and the resulting entity is about to go wild... :o>

go off the deep end? into shark-infested waters?



Generally speaking, I guess your idea is to collect sensible raw data first, and devise how to process combinations of it later. Sounds OK to me, and I especially like the idea of providing a specialized ML for that, processed by a bot, since anyone would have unlimited access to the data, which might give anyone an incentive to craft other/better digested figures.

yup.  inspired by LiveCD, and your reaction to it.


We should make sure not to base all the reasoning on a low latency / high cpufreq correlation: this just happens to be wrong, especially x86-wise. Actually, a lot of recent x86 platforms with insanely high CPU freqs are really out of luck when it comes to performing decently in real-time mode, just because the trend of "optimization" is about killing any determinism one would expect from the hw, by various ugly tricks often aimed at making gamers happy.

the Pentium 4's 31-stage instruction pipeline? :-O

I'm not suggesting it's a good measure, but it would make an interesting graph:
latency vs MHz, with data points colored per CPU type (sketch below).
K6 - navy blue, K7 - royal blue, K8 - sky blue, P2 - lime green, P3 - mint green, P4 - forest green
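Assuming the results were digested down to a hypothetical results.csv with "mhz,latency_us,family" lines, a throwaway plot could look like this (needs a gnuplot recent enough to know "lc rgb"; color names approximate the list above):

# split per CPU family, then scatter-plot latency vs clock
for f in K6 K7 K8 P2 P3 P4; do grep ",$f\$" results.csv > "$f.dat"; done
gnuplot <<'EOF'
set datafile separator ","
set xlabel "CPU clock (MHz)"
set ylabel "worst-case latency (us)"
plot "K6.dat" t "K6" lc rgb "navy", \
     "K7.dat" t "K7" lc rgb "royalblue", \
     "K8.dat" t "K8" lc rgb "skyblue", \
     "P2.dat" t "P2" lc rgb "green", \
     "P3.dat" t "P3" lc rgb "sea-green", \
     "P4.dat" t "P4" lc rgb "forest-green"
EOF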



I understand "the plan behind the plan" to be to somehow predict that some particular sw/hw combo would work, and to help people figure out which platform they might want to build their RT solution on using Xeno; it would be quite an achievement to do that.

For the time being though, I'd suggest that we focus on gathering raw data and digesting it according to a few simple metrics first; I'm pretty sure that once a sane and simple infrastructure to do that is in place, we will be able to flesh out the available results. As usual, the key issue is to make producing and using this data a routine; once people get used to something, they tend to improve it quite naturally.


agreed. It's all blue-sky dreaming atm, and subject to ongoing reality checks,
and ongoing discussion (in little trickles).

jimc
