Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>  > Gilles Chanteperdrix wrote:
>  > > Jan Kiszka wrote:
>  > >  > At least when SMP is enabled, __xnlock_get already becomes far too
>  > >  > heavyweight to be inlined. xnlock_put is fine now, but looking
>  > >  > closer at the disassembly still revealed a lot of redundancy related
>  > >  > to acquiring and releasing xnlocks. In fact, we are mostly using
>  > >  > xnlock_get_irqsave and xnlock_put_irqrestore. Both include fiddling
>  > >  > with rthal_local_irq_save/restore, also heavyweight on SMP.
>  > >  > 
>  > >  > So this patch turns the latter two into uninlined functions, which
>  > >  > reduces the text size of the nucleus and skins significantly on
>  > >  > x86-64/SMP (XENO_OPT_DEBUG_NUCLEUS disabled):
>  > > 
>  > > I think the human idea of how long an inline function can be is far more
>  > > restrictive than what a processor can take. When looking at assembly
>  > > code, you always find the code long, whereas in reality it is not that
>  > > long for a processor. 
>  > > 
>  > > Besides, IMO, the proper way to uninline xnlock operations is to leave
>  > > the non-contended case inline, and to move the spinning out of line.
>  > 
>  > This patch is not just about uninlining xnlock, that's only one half of
>  > the savings. The other one is irq-disabling via i-pipe. The problem with
>  > our case is that we have no simple single check to find out that we are
>  > on a fast ride. Rather, we have to do quite some calculations/lookups
>  > before the first check, and we have to perform multiple checks even in
>  > the best case.
> 
> This is my fault, a tradeoff I made: I thought that the atomic_cmpxchg
> could be heavy on SMP systems, so I added a first check to see whether we
> are recursing. But we can do the two operations in one move if we accept
> a failing atomic_cmpxchg when recursing.

I'm unsure about the cache pressure of cmpxchg vs. plain read. I guess
the existing variant is already better.
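
For illustration, the two variants under discussion could look roughly
like the sketch below. This is not the Xenomai code: the demo_* names,
the NO_OWNER marker and the use of C11 atomics in place of the kernel's
atomic_cmpxchg are all assumptions made to keep the example
self-contained. Variant A reads the owner first and only issues the
cmpxchg when not recursing; variant B issues the cmpxchg unconditionally
and detects recursion from the value a failed cmpxchg hands back.

```c
#include <assert.h>
#include <stdatomic.h>

#define NO_OWNER (-1) /* hypothetical "unlocked" marker */

struct demo_lock {
    atomic_int owner; /* CPU id of the holder, or NO_OWNER */
};

/* Variant A: plain read first, cmpxchg only when not recursing.
 * Returns 1 if the caller already held the lock, 0 if newly acquired. */
static int demo_get_read_first(struct demo_lock *l, int cpu)
{
    if (atomic_load_explicit(&l->owner, memory_order_relaxed) == cpu)
        return 1; /* recursion: no atomic read-modify-write issued */

    for (;;) {
        int expected = NO_OWNER;
        if (atomic_compare_exchange_weak_explicit(&l->owner, &expected, cpu,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return 0;
        /* held by another CPU (or spurious failure): keep spinning */
    }
}

/* Variant B: a single cmpxchg; on failure, the observed owner tells us
 * whether we are recursing. Same return convention as variant A. */
static int demo_get_cmpxchg_first(struct demo_lock *l, int cpu)
{
    for (;;) {
        int expected = NO_OWNER;
        if (atomic_compare_exchange_strong_explicit(&l->owner, &expected, cpu,
                                                    memory_order_acquire,
                                                    memory_order_relaxed))
            return 0;
        if (expected == cpu)
            return 1; /* the failing cmpxchg doubled as recursion check */
        /* held by another CPU: keep spinning */
    }
}

/* Release; only to be called for a non-recursive acquisition. */
static void demo_put(struct demo_lock *l, int cpu)
{
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == cpu);
    atomic_store_explicit(&l->owner, NO_OWNER, memory_order_release);
}
```

In variant B every recursive acquisition pays for an atomic
read-modify-write that is known to fail, which is the cost being
weighed against the extra plain read in variant A.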

Moreover, the spinning code is only a fraction of the fraction. We
cannot eliminate the recursion check, and we still have all the
local_irq_save code. And _all_ this code mostly comes together, thus we
save so much by uninlining those two functions.
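
As a rough sketch of what "comes together" at each call site, the
following user-space model bundles the interrupt save, the recursion
check and the spinning into one out-of-line function. Everything here
is hypothetical: the sketch_* names, the global variables standing in
for per-CPU interrupt state and the current CPU id, and the choice to
report recursion in bit 1 of the returned flags. The noinline attribute
is a GCC/Clang extension. It is a model of the idea, not the patch
itself.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_NO_OWNER (-1)
#define SKETCH_RECURSED 2UL /* assumed flag bit: lock was already held */

struct sketch_lock {
    atomic_int owner; /* CPU id of the holder, or SKETCH_NO_OWNER */
};

/* What a header would declare: callers only emit a function call. */
unsigned long sketch_lock_get_irqsave(struct sketch_lock *l);
void sketch_lock_put_irqrestore(struct sketch_lock *l, unsigned long flags);

/* Stand-ins for the interrupt state and the current CPU id. */
static int sketch_irqs_off;
static int sketch_cpu_id;

static unsigned long sketch_irq_save(void)
{
    unsigned long was_off = (unsigned long)sketch_irqs_off;
    sketch_irqs_off = 1;
    return was_off; /* bit 0: previous interrupt state */
}

static void sketch_irq_restore(unsigned long flags)
{
    sketch_irqs_off = (int)(flags & 1UL);
}

/* Out of line: irq save + recursion check + spinning live in one place. */
__attribute__((noinline))
unsigned long sketch_lock_get_irqsave(struct sketch_lock *l)
{
    unsigned long flags = sketch_irq_save();
    int cpu = sketch_cpu_id;

    if (atomic_load_explicit(&l->owner, memory_order_relaxed) == cpu)
        return flags | SKETCH_RECURSED;

    for (;;) {
        int expected = SKETCH_NO_OWNER;
        if (atomic_compare_exchange_weak_explicit(&l->owner, &expected, cpu,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return flags;
        /* contended or spurious failure: keep spinning */
    }
}

/* Out of line: release unless this was a recursive acquisition. */
__attribute__((noinline))
void sketch_lock_put_irqrestore(struct sketch_lock *l, unsigned long flags)
{
    if (!(flags & SKETCH_RECURSED)) {
        assert(atomic_load_explicit(&l->owner, memory_order_relaxed)
               == sketch_cpu_id);
        atomic_store_explicit(&l->owner, SKETCH_NO_OWNER,
                              memory_order_release);
    }
    sketch_irq_restore(flags);
}
```

With this shape, each of the many call sites shrinks to a call plus a
store of the returned flags, while the checks that cannot be eliminated
exist once in the text segment.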

Jan

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core