Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
> > Jan Kiszka wrote:
> > > At least when SMP is enabled, __xnlock_get has already become far too
> > > heavyweight to be inlined. xnlock_put is fine now, but a closer look
> > > at the disassembly still revealed a lot of redundancy related to
> > > acquiring and releasing xnlocks. In fact, we mostly use
> > > xnlock_get_irqsave and xnlock_put_irqrestore. Both involve fiddling
> > > with rthal_local_irq_save/restore, which is also heavyweight on SMP.
> > >
> > > So this patch turns the latter two into uninlined functions, which
> > > reduces the text size of the nucleus and the skins significantly on
> > > x86-64/SMP (XENO_OPT_DEBUG_NUCLEUS disabled):
> > I think the human notion of how long an inline function may be is far
> > more restrictive than what a processor can actually handle. When you
> > look at assembly code, it always seems long, whereas for the processor
> > it is not that long.
> > Besides, IMO, the proper way to uninline the xnlock operations is to
> > leave the non-contended case inline and to move the spinning out of line.
> This patch is not just about uninlining xnlock; that is only one half of
> the savings. The other is IRQ disabling via the I-pipe. The problem in
> our case is that we have no simple single check to find out that we are
> on the fast path. Rather, we have to do quite a few calculations and
> lookups before the first check, and we have to perform multiple checks
> even in the best case.
This is my fault, a tradeoff I made: I thought that the atomic_cmpxchg
could be heavy on SMP systems, so I added a first check to see whether
we are recursing. But we can do the two operations in one move if we
accept a failing atomic_cmpxchg when recursing.
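
Concretely, something like the sketch below. This is only an
illustration, not the actual nucleus code: the xnlock_t layout, the
XNLOCK_FREE value and the __xnlock_spin helper are made-up names for
the sake of the example.

#include <linux/atomic.h>	/* atomic_t, atomic_cmpxchg(); <asm/atomic.h> on older kernels */

#define XNLOCK_FREE  (-1)	/* made-up "unlocked" owner value */

typedef struct {
	atomic_t owner;		/* CPU id of the holder, XNLOCK_FREE if free */
} xnlock_t;

static void __xnlock_spin(xnlock_t *lock, int cpu);	/* out of line */

/* Returns non-zero when we were already holding the lock (recursion),
 * so the matching put can skip the release. */
static inline int __xnlock_get(xnlock_t *lock, int cpu)
{
	/* Single atomic op first: on success we own the lock; on
	 * failure the old value tells recursion and contention apart. */
	int owner = atomic_cmpxchg(&lock->owner, XNLOCK_FREE, cpu);

	if (owner == XNLOCK_FREE)
		return 0;		/* uncontended fast path */
	if (owner == cpu)
		return 1;		/* recursion: cmpxchg failed, but we hold it */

	__xnlock_spin(lock, cpu);	/* contended slow path */
	return 0;
}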
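
And, coming back to the suggestion of leaving the non-contended case
inline while moving the spinning out of line: the slow path would then
be the only uninlined part, so each call site only pays for the
cmpxchg and the two compares above. Again just a sketch:

/* Deliberately kept out of line to keep the inline footprint small. */
static void __attribute__((noinline)) __xnlock_spin(xnlock_t *lock, int cpu)
{
	do {
		while (atomic_read(&lock->owner) != XNLOCK_FREE)
			cpu_relax();	/* wait for the holder to release */
	} while (atomic_cmpxchg(&lock->owner, XNLOCK_FREE, cpu)
		 != XNLOCK_FREE);	/* retry if someone beat us to it */
}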