Re: [Xenomai-core] Testing the adeos-ipipe-2.6.13-ppc-1.0-00.patch
On 10/17/2005 05:42 PM Fillod Stephane wrote:
> Hi Philippe,
> [...]
> Load for klatency/latency was ping flooding on FCC (piece of cake),
> and cache calibrator. IMHO, we can do nastier.

You mean the cache calibrator from http://monetdb.cwi.nl/Calibrator/? I tried it on my Ocotea board and it increased the max latency by 25 to 30 us.

Thanks.

Wolfgang.
RE: [Xenomai-core] Testing the adeos-ipipe-2.6.13-ppc-1.0-00.patch
Wolfgang Grandegger wrote:
> [...]
> > Load for klatency/latency was ping flooding on FCC (piece of cake),
> > and cache calibrator. IMHO, we can do nastier.
>
> You mean the cache calibrator from http://monetdb.cwi.nl/Calibrator/?
> I tried it on my Ocotea board and it increased the max latency by
> 25 to 30 us.

Yes, that very one. In this case, it was used as a cache-thrashing load generator. But IMHO, this Calibrator would be better used in the Benchmarking Plan to get L1/L2/RAM access latency figures (without RT running), offering one more point of correlation against the RT latency results.

We can afford a better cache-thrashing load generator. Earlier this year, I proposed flushy(tm) [1], but as Philippe suggested, we can do better. Flushy should be rewritten as an ADEOS layer, inserted just in front of Xenomai in the pipeline. This way, we would be sure the caches are dead cold when Xenomai enters its domain. Using tools like OProfile, it should then be possible to track cache misses and fix them by prefetching, where available.

[1] http://rtai.dk/cgi-bin/gratiswiki.pl?Latency_Killer (bottom of page)

Here are the results of my 1.0-01 tests on e500:

$ cat /proc/ipipe/version
1.0-01

SWITCH without load:
RTH|  lat min|  lat avg|  lat max|lost
RTD|     3660|     3690|     8070|   0  1.0-00
RTD|     4620|     4740|     8730|   0  1.0-01

KLATENCY with load:
RTH|--lat min|--lat avg|--lat max|-overrun|
RTS|    -7350|    -5715|     6420|       0| 00:03:17  1.0-00
RTS|    -6150|    -4384|    12180|       0| 00:03:13  1.0-01

LATENCY with load:
== Sampling period: 100 us
RTH|--lat min|--lat avg|--lat max|-overrun|
RTS|    -6930|    -4260|     8700|       0| 00:08:06  1.0-00
RTS|    -5670|    -4620|    12930|       0| 00:12:39  1.0-01

That's weird. The figures are worse, but since the load (ping -f + calibrator) was applied manually, it may not have been the same in both runs.

--
Stephane
Re: [Xenomai-core] Testing the adeos-ipipe-2.6.13-ppc-1.0-00.patch
Fillod Stephane wrote:
> [...]
> That's weird. Figures are worse, but since the load (ping -f +
> calibrator) was executed manually, it may not be the same.

Ok, I now suspect that another change, regarding the size of the interrupt counters, made this worse. I'm going to revert it and upload -02, just to make sure.

--
Philippe.
Re: [Xenomai-core] Testing the adeos-ipipe-2.6.13-ppc-1.0-00.patch
Fillod Stephane wrote:
> Philippe Gerum wrote:
> > [..]
> > http://download.gna.org/adeos/patches/v2.6/adeos/ppc/adeos-ipipe-2.6.13-ppc-1.0-02.patch
>
> Here are the results of the tests with version 1.0-02 on e500:
> load: ~1 minute of ping -f, one run of calibrator chewing 64 MiB.
>
> $ cat /proc/ipipe/version
> 1.0-02
>
> SWITCH without load:
> RTH|  lat min|  lat avg|  lat max|lost
> RTD|     3660|     3690|     8070|   0  1.0-00
> RTD|     4620|     4740|     8730|   0  1.0-01
> RTD|     4620|     4740|     8190|   0  1.0-02
>
> KLATENCY with load:
> RTH|--lat min|--lat avg|--lat max|-overrun|
> RTS|    -7350|    -5715|     6420|       0| 00:03:17  1.0-00
> RTS|    -6150|    -4384|    12180|       0| 00:03:13  1.0-01
> RTS|    -6150|    -4183|    12480|       0| 00:03:38  1.0-02
>
> LATENCY with load:
> == Sampling period: 100 us
> RTH|--lat min|--lat avg|--lat max|-overrun|
> RTS|    -6930|    -4260|     8700|       0| 00:08:06  1.0-00
> RTS|    -5670|    -4620|    12930|       0| 00:12:39  1.0-01
> RTS|    -5700|    -3750|    11280|       0| 00:06:05  1.0-02
>
> It looks like the char vs. long change in the 1.0-0[12] patch was not
> the culprit,

The last significant change between -00 and -01 is actually the one related to the fork pressure (the others are cosmetic ones aimed at better sharing stuff with the Blackfin port). The patch below against -02 removes it.

--- 2.6.13/arch/ppc/kernel/entry.S~	2005-10-18 18:42:09.0 +0200
+++ 2.6.13/arch/ppc/kernel/entry.S	2005-10-19 15:07:54.0 +0200
@@ -316,10 +316,8 @@
 	.globl	ret_from_fork
 ret_from_fork:
-	STALL_ROOT_COND
 	REST_NVGPRS(r1)
 	bl	schedule_tail
-	UNSTALL_ROOT_COND
 	li	r3,0
 	b	ret_from_syscall

> at least not on e500. I'll do the bench again on 1.0-00.
> Man, if only we had that automated benchmark suite...

Indeed... The positive thing being that we now have the ultimate proof of its usefulness :o

--
Philippe.
Re: [Xenomai-core] Testing the adeos-ipipe-2.6.13-ppc-1.0-00.patch
On 10/15/2005 09:17 PM Heikki Lindholm wrote:
> Wolfgang Grandegger wrote:
> > Hello Philippe,
> > [...]
> > I'm going to perform latency tests on various 4xx and 8xx boards next
> > week. Here are some preliminary figures of the TQM855L module (CPU
> > 80 MHz, bus 40 MHz, 4 kB I-cache, 4 kB D-cache):
>
> If you happen to know some (semi-)comparable figures for the same
> boards using some commercial RTOS, it would be nice to know them too,
> for comparison.

Well, we only deal with free software. But I can compare the results of the klatency test with those from RTAI/RTHAL under Linux 2.4, of course.

Wolfgang.
RE: [Xenomai-core] Testing the adeos-ipipe-2.6.13-ppc-1.0-00.patch
Hi Philippe,

Sorry for the late report. Xenomai appears to work fine on a Freescale e500 board (MPC8541E) under Linux 2.6.13. The Xenomai version was v1.9.9, i.e. the daily snapshot as of today.

Here are some preliminary figures (CPU 800 MHz, bus 133 MHz, 32 KiB I-cache, 32 KiB D-cache, 256 KiB L2):

switch
$ ./run
== Sampling period: 100 us
RTH|  lat min|  lat avg|  lat max|lost
RTD|     3660|     3690|     8070|   0

klatency
$ ./run
RTH|klat min|klat avg|klat max| overrun|
RTS|   -7350|   -5715|    6420|       0| 00:03:17/00:03:17

latency
$ ./run
== Sampling period: 100 us
RTT| 00:08:04
RTH|--lat min|--lat avg|--lat max|-overrun|
RTS|    -6930|    -4260|     8700|       0| 00:08:06/00:08:06

Load for klatency/latency was ping flooding on FCC (piece of cake), and cache calibrator. IMHO, we can do nastier.

Thanks!

--
Stephane

PS: some rtai skin patches are to be expected
Re: [Xenomai-core] Testing the adeos-ipipe-2.6.13-ppc-1.0-00.patch
Fillod Stephane wrote:
> Hi Philippe,
>
> Sorry for the late report. Xenomai appears to work fine on a Freescale
> e500 board (MPC8541E) under Linux 2.6.13. The Xenomai version was
> v1.9.9, i.e. the daily snapshot as of today.
>
> Here are some preliminary figures (CPU 800 MHz, bus 133 MHz, 32 KiB
> I-cache, 32 KiB D-cache, 256 KiB L2):
>
> switch
> $ ./run
> == Sampling period: 100 us
> RTH|  lat min|  lat avg|  lat max|lost
> RTD|     3660|     3690|     8070|   0
>
> klatency
> $ ./run
> RTH|klat min|klat avg|klat max| overrun|
> RTS|   -7350|   -5715|    6420|       0| 00:03:17/00:03:17
>
> latency
> $ ./run
> == Sampling period: 100 us
> RTT| 00:08:04
> RTH|--lat min|--lat avg|--lat max|-overrun|
> RTS|    -6930|    -4260|     8700|       0| 00:08:06/00:08:06

Great that you tested this, thanks. The calibration looks a bit pessimistic, so I guess a tighter one would leave us with something in the 10-12 us range worst-case in user space, which would still be quite decent.

> Load for klatency/latency was ping flooding on FCC (piece of cake),
> and cache calibrator. IMHO, we can do nastier.

Mixed LTP stuff and dd loops are quite good punishers, AFAICS here.

--
Philippe.
Re: [Xenomai-core] Testing the adeos-ipipe-2.6.13-ppc-1.0-00.patch
Wolfgang Grandegger wrote:
> Hello Philippe,
>
> I got Xenomai working on an Ocotea board (AMCC 440GX) and a low-end
> TQM855L module (MPC 855) under Linux 2.6.14-rc3 :-). The patch applied
> with a few hunks and one easy-to-fix reject, and I had to correct two
> problems: one with FEW_CONTEXT (see attached patch), and the second
> with #include <asm/offsets.h> in xenomai/arch/ppc/hal/switch.S. That
> include file no longer exists in the kernel tree, so I commented out
> the line.
>
> I'm going to perform latency tests on various 4xx and 8xx boards next
> week. Here are some preliminary figures of the TQM855L module (CPU
> 80 MHz, bus 40 MHz, 4 kB I-cache, 4 kB D-cache):

If you happen to know some (semi-)comparable figures for the same boards using some commercial RTOS, it would be nice to know them too, for comparison.

--
Heikki Lindholm