On Jan 17, 2008 3:22 PM, Jan Kiszka <[EMAIL PROTECTED]> wrote:
>
> Gilles Chanteperdrix wrote:
> > On Jan 17, 2008 3:16 PM, Jan Kiszka <[EMAIL PROTECTED]> wrote:
> >> Gilles Chanteperdrix wrote:
> >>> On Jan 17, 2008 12:55 PM, Jan Kiszka <[EMAIL PROTECTED]> wrote:
> >>>> Gilles Chanteperdrix wrote:
> >>>>> On Jan 17, 2008 11:42 AM, Jan Kiszka <[EMAIL PROTECTED]> wrote:
> >>>>>> Gilles Chanteperdrix wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> after some (unsuccessful) attempts at instrumenting the code in a
> >>>>>>> way that does not completely distort the latency results, I found
> >>>>>>> the reason for the high latencies with latency -t 1 and latency -t 2
> >>>>>>> on ARM. So, here comes an update on this issue. The culprit is the
> >>>>>>> user-space context switch, which flushes the processor cache with
> >>>>>>> the nklock held, irqs off.
> >>>>>>>
> >>>>>>> There are two things we could do:
> >>>>>>> - arrange for the ARM cache flush to happen with the nklock unlocked
> >>>>>>> and irqs enabled. This will improve interrupt latency (latency -t 2)
> >>>>>>> but obviously not scheduling latency (latency -t 1). If we go that
> >>>>>>> way, there are several problems we should solve:
> >>>>>>>
> >>>>>>> * we do not want interrupt handlers to reenter xnpod_schedule();
> >>>>>>> for this we can use the XNLOCK bit, set on whatever thread is
> >>>>>>> xnpod_current_thread() when the cache flush occurs;
> >>>>>>>
> >>>>>>> * since an interrupt handler may modify the rescheduling bits, we
> >>>>>>> need to test these bits in the xnpod_schedule() epilogue and restart
> >>>>>>> xnpod_schedule() if need be (see the sketch after this list);
> >>>>>>>
> >>>>>>> * we do not want xnpod_delete_thread() to delete either of the two
> >>>>>>> threads involved in the context switch; the only solution I found
> >>>>>>> is to add a bit to the thread mask meaning that the thread is
> >>>>>>> currently switching, and to (re)test the XNZOMBIE bit in the
> >>>>>>> xnpod_schedule() epilogue to delete whatever thread was marked for
> >>>>>>> deletion;
> >>>>>>>
> >>>>>>> * in case of migration with xnpod_migrate_thread(), we do not want
> >>>>>>> xnpod_schedule() on the target CPU to switch to the migrated thread
> >>>>>>> before the context switch on the source CPU is finished; for this
> >>>>>>> we can avoid setting the resched bit in xnpod_migrate_thread(),
> >>>>>>> detect the condition in the xnpod_schedule() epilogue, and set the
> >>>>>>> rescheduling bits so that xnpod_schedule() is restarted and the IPI
> >>>>>>> is sent to the target CPU.
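
To make the "retest and restart" epilogue concrete, here is a rough,
self-contained model of the intended control flow (plain C, compilable
in userland). The flag names and the simulated interrupt are purely
illustrative, not nucleus API:

/*
 * Model of the proposed xnpod_schedule() epilogue: the expensive ARM
 * cache flush runs with the nklock dropped and irqs enabled, then the
 * epilogue re-tests the bits an interrupt handler may have set in the
 * meantime.  All names are illustrative.
 */
#include <stdatomic.h>
#include <stdio.h>

#define RESCHED 0x1	/* rescheduling bits, set by interrupt handlers */
#define ZOMBIE	0x2	/* a thread was marked for deletion meanwhile   */

static atomic_int sched_flags;

static void slow_cache_flush(void)
{
	static int fired;

	/* irqs are enabled here: simulate one interrupt arriving
	   during the unlocked cache flush */
	if (!fired++)
		atomic_fetch_or(&sched_flags, RESCHED | ZOMBIE);
}

static void schedule(void)
{
restart:
	/* ... pick the next thread with the lock held ... */

	/* models the unlocked section: cache flush + context switch */
	slow_cache_flush();

	/* epilogue: re-acquire the lock, re-test and act on the bits */
	int flags = atomic_exchange(&sched_flags, 0);

	if (flags & ZOMBIE)
		printf("deleting thread that died during the switch\n");
	if (flags & RESCHED) {
		printf("resched bits set during the switch, restarting\n");
		goto restart;
	}
}

int main(void)
{
	schedule();
	return 0;
}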
> >>>>>>>
> >>>>>>> - avoid using user-space real-time tasks when running the
> >>>>>>> kernel-space latency benches, i.e. at least in the latency -t 1 and
> >>>>>>> latency -t 2 cases. This means that we should change the timerbench
> >>>>>>> driver. There are at least two ways of doing this:
> >>>>>>> * use an rt_pipe;
> >>>>>>> * modify the timerbench driver to implement only the nrt ioctl,
> >>>>>>> using vanilla Linux services such as wait_event and wake_up
> >>>>>>> (sketched below).
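
A minimal sketch of the wait_event/wake_up pairing such an nrt-only
timerbench could use: wait_event_interruptible() and wake_up() are stock
Linux services, everything else (names, the sample layout) is
illustrative. Note that wake_up() cannot be issued from the Xenomai
domain, so the RT timer handler would have to defer it to Linux context,
e.g. through a virtual irq:

#include <linux/wait.h>
#include <linux/sched.h>

static DECLARE_WAIT_QUEUE_HEAD(bench_waitq);
static int bench_sample_ready;
static long bench_sample_ns;

/* Linux-context half, called from the deferred wakeup handler. */
static void bench_post_sample(long ns)
{
	bench_sample_ns = ns;
	bench_sample_ready = 1;
	wake_up(&bench_waitq);
}

/* nrt ioctl handler: sleeps until bench_post_sample() runs. */
static int bench_wait_sample(long *ns)
{
	int err = wait_event_interruptible(bench_waitq, bench_sample_ready);

	if (err)
		return err;	/* interrupted by a signal */

	bench_sample_ready = 0;
	*ns = bench_sample_ns;
	return 0;
}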
> >>>>>> [As you reminded me of this unanswered question:]
> >>>>>> One may consider adding further modes _besides_ the current kernel
> >>>>>> tests, modes that do not rely on RTDM & native userland support
> >>>>>> (e.g. when CONFIG_XENO_OPT_PERVASIVE is disabled). But the current
> >>>>>> tests are valid scenarios as well and must not be killed by such a
> >>>>>> change.
> >>>>> I think the current test scenarios for latency -t 1 and latency -t 2
> >>>>> are a bit misleading: they measure kernel-space latencies in the
> >>>>> presence of user-space real-time tasks. When one runs latency -t 1 or
> >>>>> latency -t 2, one would expect that there are only kernel-space
> >>>>> real-time tasks.
> >>>> Whether they are misleading depends on your perspective. In fact, they
> >>>> are measuring in-kernel scenarios over the standard Xenomai setup,
> >>>> which includes userland RT task activity these days. Those scenarios
> >>>> are mainly targeting driver use cases, not pure kernel-space
> >>>> applications.
> >>>>
> >>>> But I agree that, for !CONFIG_XENO_OPT_PERVASIVE-like scenarios, we
> >>>> would benefit from an additional set of test cases.
> >>> Ok, I will not touch timerbench then, and implement another kernel module.
> >>>
> >> [Without considering all details]
> >> To achieve this independence from user-space RT threads, it should
> >> suffice to implement a kernel-based frontend for timerbench. This
> >> frontend would then either dump to syslog or open some pipe to report
> >> the benchmark results to userland. What do you think?
> >
> > My intent was to implement a protocol similar to that of timerbench,
> > but using an rt-pipe, and to continue using the latency test, adding
> > new options such as -t 3 and -t 4. But there may be a problem with this
> > approach: if we compile without CONFIG_XENO_OPT_PERVASIVE, latency will
> > not run at all. So, it is probably simpler to implement a klatency test
> > that just reads from the rt-pipe.
>
> But that klatency could, in theory, perfectly reuse what timerbench
> already provides, without code changes to the latter.

That would be a kernel module then, but I also need some user-space
piece of software to do the computations and print the results.
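
Something like the following could be the kernel half; rt_pipe_create()
and rt_pipe_write() are the actual native-skin services, while the
sample layout and all other names are only illustrative. The user-space
half would then be an ordinary Linux program reading the samples from
/dev/rtp0 and doing the computations:

#include <native/pipe.h>

static RT_PIPE klat_pipe;

struct klat_sample {
	long min_ns, avg_ns, max_ns;	/* per-period figures, for example */
};

static int klat_init(void)
{
	/* minor 0 => userland reads the samples from /dev/rtp0 */
	return rt_pipe_create(&klat_pipe, "klatency", 0, 4096);
}

/* called from the RT sampling loop, once per display period */
static void klat_push(struct klat_sample *s)
{
	rt_pipe_write(&klat_pipe, s, sizeof(*s), P_NORMAL);
}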

-- 
                                               Gilles Chanteperdrix
