On Jan 17, 2008 3:16 PM, Jan Kiszka <[EMAIL PROTECTED]> wrote:
>
> Gilles Chanteperdrix wrote:
> > On Jan 17, 2008 12:55 PM, Jan Kiszka <[EMAIL PROTECTED]> wrote:
> >> Gilles Chanteperdrix wrote:
> >>> On Jan 17, 2008 11:42 AM, Jan Kiszka <[EMAIL PROTECTED]> wrote:
> >>>> Gilles Chanteperdrix wrote:
> >>>>> Hi,
> >>>>>
> >>>>> after some (unsuccessful) time spent trying to instrument the code
> >>>>> in a way that does not change the latency results completely, I
> >>>>> found the reason for the high latency with latency -t 1 and
> >>>>> latency -t 2 on ARM. So, here comes an update on this issue. The
> >>>>> culprit is the user-space context switch, which flushes the
> >>>>> processor cache with the nklock locked, irqs off.
> >>>>>
> >>>>> There are two things we could do:
> >>>>>
> >>>>> - arrange for the ARM cache flush to happen with the nklock
> >>>>> unlocked and irqs enabled. This will improve interrupt latency
> >>>>> (latency -t 2) but obviously not scheduling latency (latency -t 1).
> >>>>> If we go that way, there are several problems we should solve:
> >>>>>
> >>>>> we do not want interrupt handlers to reenter xnpod_schedule(); for
> >>>>> this we can use the XNLOCK bit, set on whatever is
> >>>>> xnpod_current_thread() when the cache flush occurs;
> >>>>>
> >>>>> since the interrupt handler may modify the rescheduling bits, we
> >>>>> need to test these bits in the xnpod_schedule() epilogue and
> >>>>> restart xnpod_schedule() if need be;
> >>>>>
> >>>>> we do not want xnpod_delete_thread() to delete one of the two
> >>>>> threads involved in the context switch; for this, the only
> >>>>> solution I found is to add a bit to the thread mask meaning that
> >>>>> the thread is currently switching, and to (re)test the XNZOMBIE
> >>>>> bit in the xnpod_schedule() epilogue to delete whatever thread was
> >>>>> marked for deletion;
> >>>>>
> >>>>> in case of migration with xnpod_migrate_thread(), we do not want
> >>>>> xnpod_schedule() on the target CPU to switch to the migrated
> >>>>> thread before the context switch on the source CPU is finished;
> >>>>> for this we can avoid setting the resched bit in
> >>>>> xnpod_migrate_thread(), detect the condition in the
> >>>>> xnpod_schedule() epilogue, set the rescheduling bits so that
> >>>>> xnpod_schedule() is restarted, and send the IPI to the target CPU.
> >>>>>
> >>>>> - avoid using user-space real-time tasks when running the
> >>>>> kernel-space latency benches, i.e. at least in the latency -t 1
> >>>>> and latency -t 2 cases. This means that we should change the
> >>>>> timerbench driver. There are at least two ways of doing this:
> >>>>> use an rt_pipe;
> >>>>> modify the timerbench driver to implement only the nrt ioctl,
> >>>>> using vanilla Linux services such as wait_event and wake_up.
> >>>> [As you reminded me of this unanswered question:]
> >>>> One may consider adding further modes _besides_ the current kernel
> >>>> tests that do not rely on RTDM & native userland support (e.g. when
> >>>> CONFIG_XENO_OPT_PERVASIVE is disabled). But the current tests are
> >>>> valid scenarios as well that must not be killed by such a change.
> >>> I think the current test scenarios for latency -t 1 and latency -t 2
> >>> are a bit misleading: they measure kernel-space latencies in the
> >>> presence of user-space real-time tasks. When one runs latency -t 1
> >>> or latency -t 2, one would expect that there are only kernel-space
> >>> real-time tasks.
> >> Whether they are misleading depends on your perspective. In fact,
> >> they are measuring in-kernel scenarios over the standard Xenomai
> >> setup, which includes userland RT task activity these days. Those
> >> scenarios are mainly targeting driver use cases, not pure
> >> kernel-space applications.
> >>
> >> But I agree that, for !CONFIG_XENO_OPT_PERVASIVE-like scenarios, we
> >> would benefit from an additional set of test cases.
> >
> > Ok, I will not touch timerbench then, and implement another kernel
> > module.
>
> [Without considering all details]
> To achieve this independence from user-space RT threads, it should
> suffice to implement a kernel-based frontend for timerbench. This
> frontend would then either dump to syslog or open some pipe to tell
> userland about the benchmark results. What do you think?
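For reference, both the "open some pipe" idea above and the "nrt ioctl using
vanilla Linux services" option mentioned earlier boil down to handing samples
from the measurement context over to plain Linux context. A minimal sketch of
the wait_event/wake_up side could look like the code below; klat_wq,
klat_push_sample() and klat_wait_sample() are invented names, not existing
timerbench code, and locking/memory barriers are omitted for brevity:

#include <linux/wait.h>
#include <linux/sched.h>

static DECLARE_WAIT_QUEUE_HEAD(klat_wq);
static long klat_sample;        /* last measured latency, in nanoseconds */
static int klat_has_sample;

/*
 * Publish a new measurement. wake_up() is a plain Linux service, so this
 * must run in Linux (non-RT) context, e.g. deferred from the real-time
 * sampling path rather than called from it directly.
 */
static void klat_push_sample(long lat_ns)
{
        klat_sample = lat_ns;
        klat_has_sample = 1;
        wake_up(&klat_wq);
}

/* Called from the nrt ioctl handler: sleep until a sample is available. */
static int klat_wait_sample(long *lat_ns)
{
        int ret;

        ret = wait_event_interruptible(klat_wq, klat_has_sample);
        if (ret)
                return ret;     /* interrupted by a signal */

        klat_has_sample = 0;
        *lat_ns = klat_sample;
        return 0;
}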
My intent was to implement a protocol similar to that of timerbench, but
using an rt-pipe, and to keep using the latency test, adding new options
such as -t 3 and -t 4. But there may be a problem with this approach: if we
are compiling without CONFIG_XENO_OPT_PERVASIVE, latency will not run at
all. So, it is probably simpler to implement a klatency test that just
reads from the rt-pipe.

-- 
Gilles Chanteperdrix
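Very roughly, such a klatency module could be structured as in the sketch
below: a periodic kernel-space task measures its own wake-up jitter and
pushes each sample into an rt_pipe, which a plain non-RT reader then picks
up from /dev/rtp0. This is only a sketch assuming the native skin's kernel
API is available; the task name, priority, period and pipe minor are
arbitrary, and most error handling is omitted:

#include <linux/module.h>
#include <native/task.h>
#include <native/timer.h>
#include <native/pipe.h>

#define KLAT_PERIOD_NS 100000   /* 100 us period, assuming a ns time base */

static RT_TASK klat_task;
static RT_PIPE klat_pipe;

static void klat_loop(void *cookie)
{
        RTIME expected;
        long lat;

        rt_task_set_periodic(NULL, TM_NOW, KLAT_PERIOD_NS);
        expected = rt_timer_read() + KLAT_PERIOD_NS;

        for (;;) {
                if (rt_task_wait_period(NULL))
                        break;
                /* difference between actual and expected wake-up time */
                lat = (long)(rt_timer_read() - expected);
                expected += KLAT_PERIOD_NS;
                rt_pipe_write(&klat_pipe, &lat, sizeof(lat), P_NORMAL);
        }
}

static int __init klat_init(void)
{
        int err;

        err = rt_pipe_create(&klat_pipe, "klatency", 0, 0);
        if (err)
                return err;

        err = rt_task_create(&klat_task, "klatency", 0, 99, 0);
        if (!err)
                err = rt_task_start(&klat_task, &klat_loop, NULL);
        if (err)
                rt_pipe_delete(&klat_pipe);
        return err;
}

static void __exit klat_exit(void)
{
        rt_task_delete(&klat_task);
        rt_pipe_delete(&klat_pipe);
}

module_init(klat_init);
module_exit(klat_exit);
MODULE_LICENSE("GPL");

The userland side then only needs a plain read() loop on /dev/rtp0 to decode
and display the samples, so it keeps working when
CONFIG_XENO_OPT_PERVASIVE is disabled.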