Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Gilles Chanteperdrix wrote:
>>>> Hi Jan,
>>>>
>>>> I see that the implementation of rthal_llmulshft seems to account for
>>>> the sign of the first argument. Does it work? Namely, in the generic
>>>> implementation, will __rthal_u96shift propagate the sign bit?
>>> Yes, this works (given there is no overflow, of course). If you consider
>>> a high word of 0xfffffff0 and a (right) shift of 8, we effectively cut
>>> off all the leading 1s: high << (32-8) = 0xf0000000. But this only works
>>> because we replace a right shift with a left shift (plus some OR'ing
>>> later on). If we had to do a real right shift, we would also have to
>>> take signed vs. unsigned into account (i.e. shift in zeros or the sign
>>> bit from the left?).
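
For illustration, a minimal standalone sketch of that trick (names made
up, this is not the actual __rthal_u96shift, and it assumes gcc's
arithmetic right shift of signed values): it right-shifts the
sign-extended 96-bit value {t:h:l} by s (0 < s < 32) into 64 bits, using
only left shifts on the upper words:

#include <stdio.h>

/*
 * Assuming no overflow, t contains nothing but copies of the sign bit,
 * so t << (32 - s) cuts off the surplus leading 1s (or 0s) and leaves
 * exactly the sign bits the high result word needs -- no arithmetic
 * right shift required.
 */
static long long u96shift(unsigned t, unsigned h, unsigned l, unsigned s)
{
        unsigned rh = (h >> s) | (t << (32 - s));
        unsigned rl = (l >> s) | (h << (32 - s));

        return ((long long)rh << 32) | rl;
}

int main(void)
{
        long long v = -0x123456789abcdLL;
        unsigned t = (unsigned)(v >> 63);  /* sign word: 0 or 0xffffffff */
        unsigned h = (unsigned)(v >> 32);
        unsigned l = (unsigned)v;

        /* both lines print fffffedcba987654 */
        printf("%llx\n%llx\n", (unsigned long long)u96shift(t, h, l, 8),
               (unsigned long long)(v >> 8));
        return 0;
}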
>>>
>>>> If yes, do you see a way llimd could be made to work the same way? That
>>>> way, we would avoid inlining ullimd twice in the llimd code.
>>> As the basic building block here is a multiplication, we cannot get
>>> around telling apart signed from unsigned (or converting signed into
>>> unsigned): the underlying multiplication logic is different.
>>>
>>> But what about this approach:
>>>
>>> static inline __attribute__((__const__)) long long
>>> __rthal_generic_llimd (long long op, unsigned m, unsigned d)
>>> {
>>>     int negative = 0;
>>>     long long ret;
>>>
>>>     if (op < 0LL) {
>>>             op = -op;
>>>             negative = 1;
>>>     }
>>>     ret = __rthal_generic_ullimd(op, m, d);
>>>     return negative ? -ret : ret;
>>> }
>>>
>>> However, I guess writing this in assembly for the archs that suffer here
>>> would be more efficient.
>> Hi Jan,
>>
>> You may have noticed that we played a bit with the arithmetic operations
>> (namely, we use an llimd without division to compute the reverse of
>> llmulshft), and it pays off on slow machines, such as ARM, where the
>> division is done in software.
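
To illustrate the idea with made-up constants (this is not the actual
Xenomai code): the forward conversion is y = (x * m) >> s, and the
reverse scale 2^s / m is precomputed as a second multiply+shift pair, so
that no division is left at run time:

#include <stdio.h>

/* scale x by m / 2^s (assuming no overflow) */
static long long mulshft(long long x, unsigned m, unsigned s)
{
        return (x * (long long)m) >> s;
}

int main(void)
{
        unsigned m = 3579545, s = 16;   /* forward scale: m / 2^16 */
        /* reverse scale 2^16 / m as m2 / 2^s2, rounded to nearest */
        unsigned s2 = 26;
        unsigned m2 = (unsigned)(((1ULL << (s + s2)) + m / 2) / m);
        long long x = 1000000;
        long long y = mulshft(x, m, s);

        /* recovers x up to rounding in the last few bits */
        printf("y=%lld, back=%lld\n", y, mulshft(y, m2, s2));
        return 0;
}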
>>
>> While at it, I looked at the code generated by this solution, and I am
>> not sure that it is better: on ARM, and I suspect this is true on other
>> architectures as well, the operations needed to negate a long long
>> clobber the condition codes, which means we cannot make these operations
>> conditional without a conditional jump. So the hand-coded assembly is no
>> better than what the compiler emits: it uses two conditional jumps
>> whereas the original solution uses only one. Of course, we could set
>> sign to -1 or 1 and multiply by sign at the end, but the multiplication
>> is probably even heavier than a conditional jump.
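
For reference, the multiply-by-sign variant could look like this (a
sketch only; the name __rthal_llimd_mulsign is made up, and it assumes
the __rthal_generic_ullimd from the snippets above as well as gcc's
arithmetic right shift of negative values):

static inline __attribute__((__const__)) long long
__rthal_llimd_mulsign(long long op, unsigned m, unsigned d)
{
        /* -1 if op is negative, +1 otherwise -- no conditional jump */
        long long sign = (op >> 63) | 1;

        /* two extra 64-bit multiplications replace the two jumps */
        return sign * (long long)__rthal_generic_ullimd(sign * op, m, d);
}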
> 
> Yes, on the archs that matter here (32-bit).
> 
>> So, would you have any idea of a better solution?
> 
> In an assembly version, one could save 'sign' in the form of a jump
> target to be taken after __rthal_generic_ullimd (i.e. jump to the
> negation, or jump over it). Especially when that address is kept in a
> register, I think smart branch prediction units will be able to make the
> right forecast.

Good idea, there is even a gcc extension (labels as values) which allows
doing this in the generic implementation:

static inline __attribute__((__const__)) long long
__rthal_generic_llimd (long long op, unsigned m, unsigned d)
{
        void *epilogue;         /* gcc "labels as values" extension */
        long long ret;

        if (op < 0LL) {
                op = -op;
                epilogue = &&ret_neg;
        } else
                epilogue = &&ret_unchanged;
        ret = __rthal_generic_ullimd(op, m, d);
        /* computed goto: the sign does not have to be re-tested here */
        goto *epilogue;
ret_unchanged:
        return ret;
ret_neg:
        return -ret;
}
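
For context, a typical use of such a helper would be scaling between
clock units without a 64-bit division on the hot path (the function
below is made up for illustration):

/* e.g. convert a cycle count to nanoseconds: cycles * 1e9 / freq */
static long long cycles_to_ns(long long cycles, unsigned cpu_freq)
{
        return __rthal_generic_llimd(cycles, 1000000000, cpu_freq);
}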


-- 
                                                 Gilles.
