Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>  > At least when SMP is enabled, __xnlock_get already becomes far too
>  > heavyweight to be inlined. xnlock_put is fine now, but looking
>  > closer at the disassembly still revealed a lot of redundancy related to
>  > acquiring and releasing xnlocks. In fact, we are mostly using
>  > xnlock_get_irqsave and xnlock_put_irqrestore. Both involve fiddling with
>  > rthal_local_irq_save/restore, which is also heavyweight on SMP.
>  > 
>  > So this patch turns the latter two into uninlined functions which
>  > reduces the text size of the nucleus and skins significantly on x86-64/SMP
>  > (XENO_OPT_DEBUG_NUCLEUS disabled):
> I think the human intuition about how long an inline function may be is
> far more restrictive than what a processor can actually handle. When
> looking at assembly code, you always find the code long, whereas in
> reality it is not that long for a processor.
> Besides, IMO, the proper way to uninline xnlock operations is to leave
> the non contended case inline, and to move the spinning out of line.

This patch is not just about uninlining xnlock, that's only one half of
the savings. The other half is IRQ disabling via the I-pipe. The problem
in our case is that we have no single, simple check to determine that we
are on the fast path. Rather, we have to do quite some
calculations/lookups before the first check, and we have to perform
multiple checks even in the best case.

> And this is something we should not do without measuring its impact.

For sure, that will be done. But I'm very optimistic about the results
given this massive code size reduction - which should translate into
fewer cache misses on the worst-case path. What increases latency most
for us (special hardware properties aside) is memory access, both data
and text.


Xenomai-core mailing list