Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Hi Jan,
>>>
>>> I see that the implementation of rthal_llmulshft seems to account for
>>> the first argument sign. Does it work ? Namely, in the generic
>>> implementation will __rthal_u96shift propagate the sign bit ?
>> Yes, this works (given there is no overflow, of course). If you consider
>> a high word of 0xfffffff0 and a (right) shift of 8, we effectively cut
>> off all the leading 1s: high << (32-8) = 0xf0000000. But this only works
>> because we replace a right shift with a left shift (plus some OR'ing
>> later on). If we had to do a real right shift, we would also have to
>> take signed vs. unsigned into account (ie. shift in zeros or the sign
>> bit from the left?).
>>
>>> If yes, do you see a way llimd could be made to work the same way? This
>>> way we would avoid inlining ullimd twice in the llimd code.
>> As the basic building block here is a multiplication, we cannot get
>> around telling apart signed from unsigned (or converting signed into
>> unsigned): the underlying multiplication logic is different.
>>
>> But what about this approach:
>>
>> static inline __attribute__((__const__)) long long
>> __rthal_generic_llimd (long long op, unsigned m, unsigned d)
>> {
>>      int sign = 0;
>>      long long ret;
>>
>>      /* Work on the magnitude and remember the sign. */
>>      if (op < 0LL) {
>>              op = -op;
>>              sign = 1;
>>      }
>>      ret = __rthal_generic_ullimd(op, m, d);
>>      return sign ? -ret : ret;
>> }
>>
>> However, I guess writing this in assembly for archs that suffer should
>> be more efficient.
> 
> Hi Jan,
> 
> You may have noticed that we played a bit with arithmetic operations
> (namely, we use an llimd without division to make the reverse of
> llmulshft), and it pays off on slow machines, such as ARM, where the
> division is done in software.
> 
> I took this opportunity to look at the code generated by this solution,
> and I am not sure that it is better: on ARM, and I suspect this is true
> on other architectures, the operations needed to negate a long long
> clobber the condition codes, which means we cannot make these operations
> conditional without a conditional jump, so the hand-coded assembler is
> not better than what the compiler does: it uses two conditional jumps
> whereas the original solution uses only one. Of course we could set sign
> to -1 or 1, and multiply by sign at the end, but the multiplication is
> probably even heavier than a conditional jump.

Yes, on the archs that matter here (32-bit).
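
If the multiplication is too heavy, the same effect can be had with the
usual XOR/subtract trick on a sign mask, which needs no branch at all.
This is only an untested-on-ARM sketch, not a claim that it beats the
single-jump version: sketch_ullimd() is a hypothetical portable stand-in
for __rthal_generic_ullimd (op * m / d, assuming the result fits in 64
bits), and sketch_llimd_branchfree() is a name made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __rthal_generic_ullimd: op * m / d using a
 * 96-bit intermediate product. Assumes the result fits in 64 bits. */
static unsigned long long
sketch_ullimd(unsigned long long op, unsigned m, unsigned d)
{
	uint64_t th, tl, qh, rh, ql;

	assert(d != 0);
	th = (op >> 32) * m;              /* high partial product */
	tl = (op & 0xffffffffULL) * m;    /* low partial product */
	th += tl >> 32;                   /* carry into the high part */
	tl &= 0xffffffffULL;
	qh = th / d;                      /* long division, high digit */
	rh = th % d;
	ql = ((rh << 32) | tl) / d;       /* long division, low digit */
	return (qh << 32) + ql;
}

/* Illustrative only: signed variant without any conditional jump. */
static long long
sketch_llimd_branchfree(long long op, unsigned m, unsigned d)
{
	/* mask is all ones if op is negative, zero otherwise */
	uint64_t mask = 0 - ((uint64_t)op >> 63);
	/* (x ^ mask) - mask negates x when mask is all ones */
	uint64_t uop = ((uint64_t)op ^ mask) - mask;
	uint64_t ret = sketch_ullimd(uop, m, d);

	return (long long)((ret ^ mask) - mask);
}
```

Both negations become a pair of XORs plus a subtract with borrow on a
32-bit machine, so whether this actually wins over one conditional jump
would have to be checked in the generated code.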

> 
> So, would you have any idea of a better solution ?

In an assembly version, one could save 'sign' in the form of a jump target
that should be taken after __rthal_generic_ullimd (i.e. jump to the
negation, or jump over it). Especially when that address is kept in a
register, I think smart branch prediction units will be able to do the
right forecast.
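
To illustrate the idea in C rather than assembly, here is a rough sketch
using the GCC "labels as values" extension (so it is GCC/Clang only).
sketch_ullimd() is a hypothetical stand-in for __rthal_generic_ullimd,
and sketch_llimd_jump() is an invented name; a real version would be
written per architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __rthal_generic_ullimd: op * m / d using a
 * 96-bit intermediate product. Assumes the result fits in 64 bits. */
static unsigned long long
sketch_ullimd(unsigned long long op, unsigned m, unsigned d)
{
	uint64_t th, tl, qh, rh, ql;

	assert(d != 0);
	th = (op >> 32) * m;              /* high partial product */
	tl = (op & 0xffffffffULL) * m;    /* low partial product */
	th += tl >> 32;                   /* carry into the high part */
	tl &= 0xffffffffULL;
	qh = th / d;
	rh = th % d;
	ql = ((rh << 32) | tl) / d;
	return (qh << 32) + ql;
}

/* Illustrative only: the sign is remembered as a jump target. */
static long long
sketch_llimd_jump(long long op, unsigned m, unsigned d)
{
	void *after = &&done;             /* default: jump over the negation */
	unsigned long long uop = (unsigned long long)op;
	long long ret;

	if (op < 0) {
		uop = 0 - uop;            /* magnitude of op */
		after = &&negate;         /* remember to negate afterwards */
	}
	ret = (long long)sketch_ullimd(uop, m, d);
	goto *after;                      /* indirect jump replaces the 2nd test */
negate:
	ret = -ret;
done:
	return ret;
}
```

This still has the one conditional jump on entry; the second decision is
turned into an indirect jump through 'after'.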

Jan

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
