Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Hi Jan,
>> I see that the implementation of rthal_llmulshft seems to account for
>> the first argument sign. Does it work ? Namely, in the generic
>> implementation will __rthal_u96shift propagate the sign bit ?
> Yes, this works (given there is no overflow, of course). If you consider
> a high word of 0xfffffff0 and a (right) shift of 8, we effectively cut
> off all the leading 1s: high << (32-8) = 0xf0000000. But this only works
> because we replace a right shift with a left shift (plus some OR'ing
> later on). If we had to do a real right shift, we would also have to
> take signed vs. unsigned into account (ie. shift in zeros or the sign
> bit from the left?).
>> If yes, do you see a way llimd could be made to work the same way ? This
>> way we would avoid inline ullimd twice in llimd code.
> As the basic building block here is a multiplication, we cannot get
> around telling apart signed from unsigned (or converting signed into
> unsigned): the underlying multiplication logic is different.
> But what about this approach:
> static inline __attribute__((__const__)) long long
> __rthal_generic_llimd (long long op, unsigned m, unsigned d)
> {
>       int sign = 0;   /* "signed" is a C keyword and cannot name a variable */
>       long long ret;
>       if (op < 0LL) {
>               op = -op;
>               sign = 1;
>       }
>       ret = __rthal_generic_ullimd(op, m, d);
>       return sign ? -ret : ret;
> }
> However, I guess writing this in assembly for archs that suffer should
> be more efficient.

Hi Jan,

You may have noticed that we played a bit with arithmetic operations
(namely, we use an llimd without division to do the reverse of
llmulshft), and it pays off on slow machines, such as ARM, where
division is done in software.

While at it, I looked at the code generated by this solution, and I am
not sure that it is better: on ARM, and I suspect this is true on other
architectures, the operations needed to negate a long long clobber the
condition codes, which means we cannot make these operations
conditional without a conditional jump. So hand-coded assembler would be
no better than what the compiler does: the code for this solution uses
two conditional jumps whereas the original solution uses only one. Of
course we could set sign to -1 or 1, and multiply by sign at the end,
but the multiplication is probably even heavier than a conditional jump.
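
For reference, here is a rough sketch of the multiply-by-sign variant, only
to make the comparison concrete. sketch_ullimd is a hypothetical stand-in
for __rthal_generic_ullimd (not the real implementation), and the sign
computation assumes that right-shifting a negative long long is an
arithmetic shift, as GCC does. Overflow (including op == LLONG_MIN) is not
handled.

```c
#include <stdint.h>

/* Hypothetical stand-in for __rthal_generic_ullimd: computes op * m / d
 * through a 96-bit intermediate product. Assumes d != 0 and that the
 * result fits in 64 bits. */
static inline uint64_t sketch_ullimd(uint64_t op, uint32_t m, uint32_t d)
{
	uint64_t lm = (op & 0xffffffffULL) * m;     /* low word * m */
	uint64_t hm = (op >> 32) * m + (lm >> 32);  /* high word * m + carry */
	uint64_t qh = hm / d;                       /* upper quotient word(s) */
	uint64_t rem = hm % d;                      /* rem < d, fits in 32 bits */
	uint64_t ql = ((rem << 32) | (lm & 0xffffffffULL)) / d;

	return (qh << 32) + ql;
}

/* Multiply-by-sign variant: no explicit branch in the source, but two
 * extra 64-bit multiplications. */
static inline int64_t sketch_llimd_mulsign(int64_t op, uint32_t m, uint32_t d)
{
	/* -1 if op is negative, 1 otherwise (assumes arithmetic shift). */
	int64_t sign = (op >> 63) | 1;
	uint64_t mag = (uint64_t)(op * sign);       /* absolute value of op */

	return (int64_t)sketch_ullimd(mag, m, d) * sign;
}
```

Whether the compiler really emits this without jumps, and whether the two
64-bit multiplications cost less than one jump, would have to be checked in
the generated assembly for each architecture.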

So, would you have any idea of a better solution ?


Xenomai-core mailing list
