Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>  > Jan Kiszka wrote:
>  > ...
>  > > fast-tsc-to-ns-v2.patch
>  > > 
>  > >     [Rebased, improved rounding of least significant digit]
>  > 
>  > Rounding in the fast path for the sake of the last digit was silly.
>  > Instead, I'm now addressing the ugly interval printing via
>  > xnarch_precise_tsc_to_ns when converting the timer interval back into
>  > nanos. -v3 incorporating this has just been uploaded.
> 
> Hi,
> 
> I had a look at the fast-tsc-to-ns implementation, here is how I would
> rewrite it:
> 
> static inline void xnarch_init_llmulshft(const unsigned m_in,
>                                        const unsigned d_in,
>                                        unsigned *m_out,
>                                        unsigned *s_out)
> {
>       unsigned long long mult;
> 
>       *s_out = 31;
>       while (1) {
>               mult = ((unsigned long long)m_in) << *s_out;
>               do_div(mult, d_in);
>               if (mult <= INT_MAX)
>                       break;
>               (*s_out)--;
>       }
>       *m_out = (unsigned)mult;
> }
> 
> /* Non x86. */
> #define __rthal_u96shift(h, m, l, s) ({               \
>       unsigned _l = (l);                      \
>       unsigned _m = (m);                      \
>       unsigned _s = (s);                      \
>       _l >>= _s;                              \
>       _l |= (_m << (32 - _s));                \
>       _m >>= _s;                              \
>       _m |= ((h) << (32 - _s));               \
>         __rthal_u64fromu32(_m, _l);           \
> })
> 
> /* x86 */
> #define __rthal_u96shift(h, m, l, s) ({               \
>       unsigned _l = (l);                      \
>       unsigned _m = (m);                      \
>       unsigned _s = (s);                      \
>       asm ("shrdl\t%%cl,%1,%0"                \
>            : "+r,?m"(_l)                      \
>            : "r,r"(_m), "c,c"(_s));           \
>       asm ("shrdl\t%%cl,%1,%0"                \
>            : "+r,?m"(_m)                      \
>            : "r,r"(h), "c,c"(_s));            \
>       __rthal_u64fromu32(_m, _l);             \
> })
> 
> static inline long long rthal_llmi(int i, int j)
> {
>         /* Signed fast 32x32->64 multiplication */
>       return (long long) i * j;
> }
> 
> static inline long long gilles_llmulshft(const long long op,
>                                        const unsigned m,
>                                        const unsigned s)
> {
>       unsigned oph, opl, tlh, tll, thh, thl;
>       unsigned long long th, tl;
> 
>       __rthal_u64tou32(op, oph, opl);
>       tl = rthal_ullmul(opl, m);
>       __rthal_u64tou32(tl, tlh, tll);
>       th = rthal_llmi(oph, m);
>       th += tlh;
>       __rthal_u64tou32(th, thh, thl);
>       
>       return __rthal_u96shift(thh, thl, tll, s);
> }
> 
> 

Thanks for your suggestion.

While your generic version produces code of comparable size to the
existing generic one, your x86 variant is about twice as large as the
full-assembly version (rthal_llmulshft). Code size translates into
I-cache footprint, which can cost latency.

[gcc 4.1, i386; the third column is the function size in bytes]
-O2 -mregparm=3 -fomit-frame-pointer:
    63: 08048490   119 FUNC    GLOBAL DEFAULT   13 gilles_llmulshft
    68: 08048510   121 FUNC    GLOBAL DEFAULT   13 gilles_llmulshft_x86
    77: 08048450    57 FUNC    GLOBAL DEFAULT   13 rthal_llmulshft
    78: 080483c0   135 FUNC    GLOBAL DEFAULT   13 __rthal_generic_llmulshft

-Os -mregparm=3 -fomit-frame-pointer:
    63: 0804843b    93 FUNC    GLOBAL DEFAULT   13 gilles_llmulshft
    68: 08048498    97 FUNC    GLOBAL DEFAULT   13 gilles_llmulshft_x86
    77: 08048410    43 FUNC    GLOBAL DEFAULT   13 rthal_llmulshft
    78: 080483b4    92 FUNC    GLOBAL DEFAULT   13 __rthal_generic_llmulshft

-O2:
    63: 08048480   120 FUNC    GLOBAL DEFAULT   13 gilles_llmulshft
    68: 08048500   105 FUNC    GLOBAL DEFAULT   13 gilles_llmulshft_x86
    77: 08048440    60 FUNC    GLOBAL DEFAULT   13 rthal_llmulshft
    78: 080483c0   117 FUNC    GLOBAL DEFAULT   13 __rthal_generic_llmulshft

-Os:
    63: 08048438   104 FUNC    GLOBAL DEFAULT   13 gilles_llmulshft
    68: 080484a0    83 FUNC    GLOBAL DEFAULT   13 gilles_llmulshft_x86
    77: 0804840b    45 FUNC    GLOBAL DEFAULT   13 rthal_llmulshft
    78: 080483b4    87 FUNC    GLOBAL DEFAULT   13 __rthal_generic_llmulshft
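
For reference, here is a minimal portable sketch of the computation all
four functions above are meant to perform: (op * m) >> s through a
96-bit intermediate, plus the setup of m and s. The names
sketch_init_llmulshft and sketch_llmulshft are hypothetical and not part
of Xenomai, and the code assumes two's complement, an arithmetic right
shift of negative values (as gcc provides), 0 <= s <= 31, and a ratio
m_in / d_in that fits in a signed 32-bit value. It is only meant as
something to check the variants against, not as a replacement.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper (not Xenomai API): choose m and s so that
 * m / 2^s approximates m_in / d_in, with m small enough to be used
 * as a signed 32-bit multiplier.
 */
static void sketch_init_llmulshft(uint32_t m_in, uint32_t d_in,
                                  uint32_t *m_out, uint32_t *s_out)
{
    uint32_t s = 31;
    uint64_t mult;

    assert(d_in != 0);
    for (;;) {
        mult = ((uint64_t)m_in << s) / d_in;
        if (mult <= INT_MAX || s == 0)
            break;
        s--;
    }
    assert(mult <= INT_MAX); /* assumption: the ratio itself fits */
    *m_out = (uint32_t)mult;
    *s_out = s;
}

/*
 * Hypothetical portable reference: (op * m) >> s using a 96-bit
 * intermediate held as th (upper 64 bits, signed) and tll (low 32 bits).
 */
static int64_t sketch_llmulshft(int64_t op, uint32_t m, uint32_t s)
{
    uint32_t opl = (uint32_t)op;            /* low 32 bits, unsigned */
    int32_t oph = (int32_t)(op >> 32);      /* high 32 bits, signed */
    uint64_t tl = (uint64_t)opl * m;        /* low partial product */
    int64_t th = (int64_t)oph * (int32_t)m  /* high partial product */
                 + (int64_t)(tl >> 32);     /* plus the carry out of tl */
    uint32_t tll = (uint32_t)tl;

    /* The product is th * 2^32 + tll; drop its s low bits. */
    return (int64_t)(((uint64_t)th << (32 - s)) | (tll >> s));
}
```

With a hypothetical 2.4 GHz TSC, sketch_init_llmulshft(1000000000,
2400000000, &m, &s) yields m = 894784853 and s = 31, and converting
2400000000 ticks gives 999999999 ns. That is one below the exact value
because m is truncated, which is the last-digit effect discussed earlier
in this thread.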

I'm not arguing that we should turn every piece of Xenomai arch code
into pure assembly. But in this case it has already happened, the source
is less scattered, and the object code is more compact. So I would
prefer to keep it as is.

Jan
