Gilles Chanteperdrix wrote:
> Gilles Chanteperdrix wrote:
>> Hi,
>>
>> after some (unsuccessful) time trying to instrument the code in a way
>> that does not change the latency results completely, I found the
>> reason for the high latency with latency -t 1 and latency -t 2 on
>> ARM. So, here comes an update on this issue. The culprit is the
>> user-space context switch, which flushes the processor cache with the
>> nklock locked, irqs off.
>>
>> There are two things we could do:
>>
>> - arrange for the ARM cache flush to happen with the nklock unlocked
>>   and irqs enabled. This will improve interrupt latency (latency -t 2)
>>   but obviously not scheduling latency (latency -t 1). If we go that
>>   way, there are several problems we should solve:
>>
>>   - we do not want interrupt handlers to reenter xnpod_schedule();
>>     for this we can use the XNLOCK bit, set on whatever is
>>     xnpod_current_thread() when the cache flush occurs
>>
>>   - since the interrupt handler may modify the rescheduling bits, we
>>     need to test these bits in the xnpod_schedule() epilogue and
>>     restart xnpod_schedule() if need be
>>
>>   - we do not want xnpod_delete_thread() to delete one of the two
>>     threads involved in the context switch; for this the only
>>     solution I found is to add a bit to the thread mask meaning that
>>     the thread is currently switching, and to (re)test the XNZOMBIE
>>     bit in the xnpod_schedule() epilogue to delete whatever thread
>>     was marked for deletion
>>
>>   - in case of migration with xnpod_migrate_thread(), we do not want
>>     xnpod_schedule() on the target CPU to switch to the migrated
>>     thread before the context switch on the source CPU is finished;
>>     for this we can avoid setting the resched bit in
>>     xnpod_migrate_thread(), detect the condition in the
>>     xnpod_schedule() epilogue, set the rescheduling bits so that
>>     xnpod_schedule() is restarted, and send the IPI to the target
>>     CPU.
>
> Please find attached a patch implementing these ideas. This adds some
> clutter, which I would be happy to reduce. Better ideas are welcome.
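For the record, the xnpod_schedule() epilogue shape this describes could
be sketched as follows. This is only an illustration: XNSWITCH and
resched_pending() are placeholder names for the proposed "currently
switching" bit and the resched-bits test (they are not existing nucleus
symbols), and the actual patch certainly differs in detail:

reschedule:
        xnlock_get_irqsave(&nklock, s);

        /* ... usual prologue: pick the next runnable thread ... */

        /*
         * Keep interrupt handlers from reentering xnpod_schedule()
         * (XNLOCK) and flag the outgoing thread as mid-switch so that
         * xnpod_delete_thread() defers any deletion (XNSWITCH,
         * assumed new bit).
         */
        xnthread_set_state(prev, XNLOCK | XNSWITCH);

        xnlock_put_irqrestore(&nklock, s);

        /*
         * The context switch, and thus the costly ARM cache flush,
         * now runs with irqs enabled and the nklock free.
         */
        xnarch_switch_to(xnthread_archtcb(prev), xnthread_archtcb(next));

        /* Epilogue, back under the nklock. */
        xnlock_get_irqsave(&nklock, s);
        xnthread_clear_state(prev, XNLOCK | XNSWITCH);

        /*
         * An interrupt handler may have set the rescheduling bits
         * while the lock was dropped: restart the whole thing.
         */
        if (resched_pending(sched)) {
                xnlock_put_irqrestore(&nklock, s);
                goto reschedule;
        }

        /*
         * A deletion may have been deferred on a thread involved in
         * the switch: (re)test XNZOMBIE and finish the job now.
         */
        if (xnthread_test_state(prev, XNZOMBIE)) {
                xnlock_put_irqrestore(&nklock, s);
                xnpod_delete_thread(prev);
                return;
        }

        xnlock_put_irqrestore(&nklock, s);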
I tried to cross-read the patch (-p would have been nice) but failed; it needs to be applied on some tree to be readable. Does the patch already improve the ARM latencies?
>> - avoid using user-space real-time tasks when running the
>>   kernel-space latency benches, i.e. at least in the latency -t 1
>>   and latency -t 2 cases. This means that we should change the
>>   timerbench driver. There are at least two ways of doing this:
>>   - use an rt_pipe
>>   - modify the timerbench driver to implement only the nrt ioctl,
>>     using vanilla Linux services such as wait_event and wake_up.
>>
>> What do you think?
>
> So, what do you think is the best way to change the timerbench driver?
>
> * use an rt_pipe? Pros: allows running latency -t 1 and latency -t 2
>   even if Xenomai is compiled with CONFIG_XENO_OPT_PERVASIVE off.
>   Cons: makes the timerbench driver non-portable to other
>   implementations of RTDM, e.g. RTDM over RTAI or the version of RTDM
>   which runs over vanilla Linux.
>
> * modify the timerbench driver to implement only nrt ioctls? Pros:
>   better driver portability. Cons: latency would still need
>   CONFIG_XENO_OPT_PERVASIVE to run latency -t 1 and latency -t 2.
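To illustrate the second option, a minimal sketch of the nrt-only path
with the RTDM handler boilerplate stripped away. bench_context,
bench_period_done() and bench_ioctl_nrt() are made-up names, and the
question of how the RT timer handler safely signals a Linux wait queue
(it cannot call wake_up() from primary domain directly, so some nRT
deferral such as an APC is needed) is only noted here, not solved:

#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/errno.h>

/* Per-benchmark state shared between the RT and Linux sides
 * (all names here are illustrative, not the actual driver's). */
struct bench_context {
        wait_queue_head_t nrt_waitq;    /* vanilla Linux wait queue */
        int result_ready;               /* flag set by the RT side */
        /* ... interim min/max/avg results ... */
};

/*
 * Runs on the Linux side once the RT timer handler has completed a
 * measurement period. Must NOT be called from primary domain
 * directly; it has to be deferred to Linux context first.
 */
static void bench_period_done(struct bench_context *ctx)
{
        ctx->result_ready = 1;
        wake_up(&ctx->nrt_waitq);
}

/*
 * The nrt ioctl handler: executes in plain Linux context, so it may
 * sleep using vanilla services.
 */
static int bench_ioctl_nrt(struct bench_context *ctx, void __user *arg)
{
        if (wait_event_interruptible(ctx->nrt_waitq, ctx->result_ready))
                return -ERESTARTSYS;

        ctx->result_ready = 0;
        /* copy_to_user() the interim results into *arg here. */
        return 0;
}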
I'm still voting for my third approach:

-> Write latency as a kernel application (klatency) against the
   timerbench device
-> Call the NRT IOCTLs of timerbench during module init/cleanup
-> Use module parameters for customization
-> Set up a low-prio kernel-based RT task to issue the RT IOCTLs
-> Format the results nicely (similar to the userland latency) in that
   RT task and stuff them into some rtpipe
-> Use "cat /dev/rtpipeX" to display the results

Jan
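For comparison, a rough skeleton of such a klatency module; the device
name, the ioctl invocations and the result formatting are placeholders,
and the native-skin and RTDM signatures are sketched from the 2.x API,
so they would need checking against the real headers:

#include <linux/module.h>
#include <linux/kernel.h>
#include <native/task.h>
#include <native/pipe.h>
#include <rtdm/rtdm.h>

static int priority = 1;        /* module parameter, as suggested */
module_param(priority, int, 0444);

static RT_TASK sampler;
static RT_PIPE results_pipe;
static int bench_fd;

static void sampler_task(void *arg)
{
        char line[128];
        int len;

        for (;;) {
                /*
                 * Blocking RT ioctl against the timerbench device to
                 * fetch the next interim result; the request code and
                 * result structure live in rtdm/rttesting.h and are
                 * elided here.
                 */

                len = snprintf(line, sizeof(line),
                               "RTD| min | avg | max | overruns\n");
                rt_pipe_write(&results_pipe, line, len, P_NORMAL);
        }
}

static int __init klatency_init(void)
{
        int err;

        bench_fd = rtdm_open("rttest0", 0);     /* device name assumed */
        if (bench_fd < 0)
                return bench_fd;

        /* NRT configuration ioctls go here, from module init context. */

        err = rt_pipe_create(&results_pipe, "klatency", P_MINOR_AUTO, 0);
        if (err)
                goto fail;

        err = rt_task_create(&sampler, "klatency", 0, priority, 0);
        if (err)
                goto fail;

        return rt_task_start(&sampler, &sampler_task, NULL);
fail:
        rtdm_close(bench_fd);
        return err;
}

static void __exit klatency_exit(void)
{
        rt_task_delete(&sampler);
        rt_pipe_delete(&results_pipe);
        /* NRT cleanup ioctl here, then close the device. */
        rtdm_close(bench_fd);
}

module_init(klatency_init);
module_exit(klatency_exit);
MODULE_LICENSE("GPL");

Reading the results would then be a matter of "cat /dev/rtp0" (or
whichever minor the pipe ends up with), matching the last step above.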