Re: [Xenomai-core] co-kernel benchmarking on arm926 (was: Fwd: problem in pthread_mutex_lock/unlock)

2010-06-25 Thread Philippe Gerum
On Thu, 2010-06-24 at 17:05 +0530, Nero Fernandez wrote:
 Thanks for your response, Philippe.
 
 The concerns while carrying out my experiments were to:
 
  - compare xenomai co-kernel overheads (timer and context switch
 latencies)
in xenomai-space vs similar native-linux overheads. These are
 presented in 
the first two sheets.
 
  - find out how the addition of xenomai, xenomai+adeos affects the
 native kernel's performance. Here, lmbench was used on the native linux
 side to estimate the changes to standard linux services.

How can you reasonably estimate the overhead of co-kernel services
without running any co-kernel services? Interrupt pipelining is not a
co-kernel service. Interrupt pipelining does nothing by itself except
enable co-kernel services to be implemented with a real-time response
guarantee.

 
 Regarding the addition of latency measurements in the sys-timer handler,
 I performed a similar measurement from xnintr_clock_handler(), and the
 results were similar to the ones reported from the sys-timer handler in
 xenomai-enabled linux.

If your benchmark is about Xenomai, then at least make sure to provide
results for Xenomai services, used in a relevant application and
platform context. Claiming that you instrumented xnintr_clock_handler()
at some point and got some results, but then illustrating your benchmark
with similar figures obtained from a totally unrelated piece of
instrumentation code, does not help make those figures relevant.

Btw, hooking xnintr_clock_handler() is not correct. Again, benchmarking
interrupt latency with Xenomai has to measure the entire code path, from
the moment the interrupt is taken by the CPU, until it is delivered to
the Xenomai service user. By instrumenting directly in
xnintr_clock_handler(), your test bypasses the Xenomai timer handling
code which delivers the timer tick to the user code, and the
rescheduling procedure as well, so your figures are optimistically wrong
for any normal use case based on real-time tasks.
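
If you really want to hand-roll such a measurement instead of relying on
the stock tests, the place to timestamp is in the woken real-time task
itself, so that the pipeline, the nucleus timer handling and the
rescheduling all end up in the figure. A minimal sketch over the native
skin follows (Xenomai 2.x user-space assumed; task name, priority,
period and sample count are arbitrary, error checking omitted):

#include <sys/mman.h>
#include <stdio.h>
#include <unistd.h>
#include <native/task.h>
#include <native/timer.h>

#define PERIOD_NS  1000000ULL   /* 1 ms period, arbitrary */
#define NSAMPLES   256

static RT_TASK sampler;
static long long delta[NSAMPLES];

static void sampler_fn(void *arg)
{
        RTIME expected, now;
        int i;

        /* Make the current task periodic; with the default oneshot
           time base, the period is expressed in nanoseconds. */
        rt_task_set_periodic(NULL, TM_NOW, PERIOD_NS);
        expected = rt_timer_read() + PERIOD_NS;

        for (i = 0; i < NSAMPLES; i++) {
                rt_task_wait_period(NULL);
                /* Timestamp on the user side of the delivery path, so
                   the pipeline, nucleus and rescheduling costs are all
                   included, not only the tick handler entry. */
                now = rt_timer_read();
                delta[i] = (long long)(now - expected);
                expected += PERIOD_NS;
        }

        for (i = 0; i < NSAMPLES; i++)
                printf("%lld\n", delta[i]);
}

int main(void)
{
        /* Mandatory for Xenomai user-space: no page faults on the
           real-time path. */
        mlockall(MCL_CURRENT | MCL_FUTURE);

        rt_task_create(&sampler, "sampler", 0, 99, 0);
        rt_task_start(&sampler, &sampler_fn, NULL);

        pause();
        return 0;
}

This is essentially what latency -t0 already does for you, with proper
histogramming on top, so unless you need something very specific, just
use the stock test.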

 While trying to make both these measurements, I tried to take care that
 delay-value logging is done at the end of the handler routines, but the
 __ipipe_mach_tsc value is recorded at the beginning of the routine (a
 patch for this is included in the worksheet itself).

This patch is hopelessly useless and misleading. Unless your intent is
to have your application directly embodied into low-level interrupt
handlers, you are not measuring the actual overhead.

Latency is not solely a matter of interrupt masking, but also a matter
of I/D cache misses, particularly on ARM - you have to traverse the
actual code until delivery to exhibit the latter.

This is exactly what the latency tests shipped with Xenomai are for:
- /usr/xenomai/bin/latency -t0/1/2
- /usr/xenomai/bin/klatency
- /usr/xenomai/bin/irqbench

If your system involves user-space tasks, then you should benchmark
user-space response time using latency [-t0]. If you plan to use
kernel-based tasks such as RTDM tasks, then latency -t1 and klatency
tests will provide correct results for your benchmark.
If you are interested only in interrupt latency, then latency -t2 will
help.

If you do think that those tests do not measure what you seem to be
interested in, then you may want to explain why on this list, so that we
eventually understand what you are after.

 
 Regarding the system, changing the kernel version would invalidate my
 results, as the system is a released CE device and there are no plans to
 upgrade the kernel.

Ok. But that makes your benchmark 100% irrelevant with respect to
assessing the real performance of a decent co-kernel on your setup.

 AFAIK, enabling FCSE would limit the number of concurrent processes,
 which makes it unviable in my scenario.

Ditto. Besides, FCSE as implemented in recent I-pipe patches has a
best-effort mode which lifts those limitations, at the expense of
voiding the latency guarantee, but on the average, that would still be
much better than always suffering the VIVT cache insanity without FCSE.

Quoting a previous mail of yours, regarding your target:
 Processor   : ARM926EJ-S rev 5 (v5l)

The latency hit induced by VIVT caching on arm926 is typically in the
180-200 us range under load in user-space, and 100-120 us in kernel
space. So, without FCSE, this would bite at each Xenomai __and__ linux
process context switch. Since your application requires that more than
95 processes be available in the system, you will likely get quite a few
switches in any given period of time, unless most of them always sleep,
of course.

Ok, so let me do some wild guesses here: you told us this is a CE-based
application; maybe it exists already? maybe it has to be put on steroids
to gain decent real-time guarantees it doesn't have yet? and perhaps the
design of that application involves many processes undergoing periodic
activities, so lots of context switches with address space changes
during normal operation?


Re: [Xenomai-core] analogy - experimental branch

2010-06-25 Thread Stefan Schaal
Hi Alexis,

  thanks so much for the new analogy software. Here are some first observations:

1) cmd_bits.c works fine on our NI6250 board

2) however, a slightly modified version hangs -- I appended my cmd_bits.c to 
this email. All I added is a for loop around the a4l_async_write() and 
a4l_snd_insn() calls, i.e., I wanted to trigger a write repeatedly. Look for 
the sschaal comment in my modified cmd_bits.c. After 32 iterations, 
cmd_bits hangs, with no error messages in dmesg. Interestingly, when I change 
your trigger_threshold variable from 128 to 256, my loop runs for 16 iterations 
(other changes to the trigger threshold adjust the number of iterations I get 
in a similar way). Thus, it feels like there is a buffer which does not get 
reset after a4l_snd_insn() is called -- does this make sense?
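
In case the attachment gets stripped by the list, the change is roughly
the following (schematic only: the descriptor, command and instruction
setup are the ones from your stock cmd_bits.c and are not repeated here,
and the loop bound, buffer names and exact argument lists are
placeholders from memory of the Analogy API, to be checked against the
headers):

        int i, ret;

        /* sschaal: repeat the write/trigger sequence; the stock example
           performs it only once. */
        for (i = 0; i < 64; i++) {

                /* Feed the bit pattern into the asynchronous buffer... */
                ret = a4l_async_write(&dsc, buf, buf_size, A4L_INFINITE);
                if (ret < 0) {
                        fprintf(stderr,
                                "cmd_bits: a4l_async_write failed (%d)\n",
                                ret);
                        break;
                }

                /* ...then send the triggering instruction. */
                ret = a4l_snd_insn(&dsc, &insn);
                if (ret < 0) {
                        fprintf(stderr,
                                "cmd_bits: a4l_snd_insn failed (%d)\n",
                                ret);
                        break;
                }
        }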

Best wishes,

-Stefan


On Jun 24, 2010, at 15:43, Alexis Berlemont wrote:

 Hi,
 
 Alexis Berlemont wrote:
 Hi Stefan,
 
 Stefan Schaal wrote:
 Hi Alexis,
 
  I was just wondering whether the new experimental branch in your git 
 repository is something that can be tried already.
 
 
 No. Not yet. This branch is aimed at temporarily holding the
 corrections I am trying to do for the cmd_bits issue. It needs quite a
 lot of work and I have not finished yet. 
 
 If you have a look at the commits in this branch, you will see that many
 are marked (broken).
 
 
 I just rebased the experimental branch into the analogy branch. So,
 starting from now, we should be able to properly use cmd_bits with a
 clone of my git repository.
 
 After having reworked the asynchronous buffer subsystem (and having
 fixed some oops in the NI driver and in the new code), cmd_bits can
 correctly communicate with the DIO subdevice. 
 
 A command like ./cmd_bits 0x 0x works on my
 board. Unfortunately, I have not yet done what is necessary to check the
 digital output lines.
 
 
 Best wishes,
 
 -Stefan
 
 -- 
 Alexis.
 
 -- 
 Alexis.


[Attachment: cmd_bits.c (binary data)]