Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Hi Jan,
>>>
>>> I see that the implementation of rthal_llmulshft seems to account for
>>> the first argument sign. Does it work ? Namely, in the generic
>>> implementation will __rthal_u96shift propagate the sign bit ?
>> Yes, this works (given there is no overflow, of course). If you consider
>> a high word of 0xfffffff0 and a (right) shift of 8, we effectively cut
>> off all the leading 1s: high << (32-8) = 0xf0000000. But this only works
>> because we replace a right shift with a left shift (plus some OR'ing
>> later on). If we had to do a real right shift, we would also have to
>> take signed vs. unsigned into account (i.e. shift in zeros or the sign
>> bit from the left?).
>>
>>> If yes, do you see a way llimd could be made to work the same way ? This
>>> way we would avoid inlining ullimd twice in llimd code.
>> As the basic building block here is a multiplication, we cannot get
>> around telling apart signed from unsigned (or converting signed into
>> unsigned): the underlying multiplication logic is different.
>>
>> But what about this approach:
>>
>> static inline __attribute__((__const__)) long long
>> __rthal_generic_llimd (long long op, unsigned m, unsigned d)
>> {
>>     int sign = 0;    /* 'signed' is a C keyword, so use 'sign' */
>>     long long ret;
>>
>>     if (op < 0LL) {
>>         op = -op;
>>         sign = 1;
>>     }
>>     ret = __rthal_generic_ullimd(op, m, d);
>>     return sign ? -ret : ret;
>> }
>>
>> However, I guess writing this in assembly for archs that suffer should
>> be more efficient.
>
> Hi Jan,
>
> You may have noticed that we played a bit with arithmetic operations
> (namely, we use an llimd without division to make the reverse of
> llmulshft), and it pays off on slow machines, such as ARM, where the
> division is done in software.
>
> On this occasion, I looked at the code generated by this solution, and I
> am not sure that it is better: on ARM, and I suspect this is true on other
> architectures, the operations needed to negate a long long clobber the
> condition codes, which means we cannot make these operations
> conditional without a conditional jump, so the hand-coded assembler is
> not better than what the compiler does: it uses two conditional jumps
> whereas the original solution uses only one. Of course we could set sign
> to -1 or 1, and multiply by sign at the end, but the multiplication is
> probably even heavier than a conditional jump.

Yes, on the archs that matter here (32-bit).

>
> So, would you have any idea of a better solution ?

In an assembly version, one could save 'sign' in the form of a jump target
that should be taken after __rthal_generic_ullimd (i.e. jump to the
negation, or jump over it). Especially when that address is kept in a
register, I think smart branch prediction units will be able to make the
right forecast.

Jan
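
For comparison, here is a minimal C sketch of the sign handling discussed in this thread. It is only an illustration under stated assumptions: sketch_ullimd is a hypothetical stand-in for __rthal_generic_ullimd (not the real helper), and the branch-free variant is one possible alternative that nobody in the thread proposed. Whether it beats a conditional jump on ARM or elsewhere would have to be measured.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __rthal_generic_ullimd: returns op * m / d,
 * assuming d != 0 and that the quotient fits in 64 bits. */
static inline uint64_t sketch_ullimd(uint64_t op, uint32_t m, uint32_t d)
{
    assert(d != 0);
    /* 96-bit product op * m, kept as hi * 2^32 + lo with lo < 2^32. */
    uint64_t lo = (op & 0xffffffffu) * m;
    uint64_t hi = (op >> 32) * m + (lo >> 32);
    lo &= 0xffffffffu;
    /* Long division of hi:lo by d, one 32-bit digit at a time. */
    uint64_t qh = hi / d;
    uint64_t ql = (((hi % d) << 32) | lo) / d;
    return (qh << 32) + ql;
}

/* Variant from the thread: remember the sign, work on |op|, negate back.
 * The negations are done on unsigned values to avoid signed overflow. */
static inline int64_t sketch_llimd_branch(int64_t op, uint32_t m, uint32_t d)
{
    int negative = op < 0;
    uint64_t uop = negative ? 0 - (uint64_t)op : (uint64_t)op;
    uint64_t ret = sketch_ullimd(uop, m, d);
    return (int64_t)(negative ? 0 - ret : ret);
}

/* Branch-free alternative (an assumption of this sketch, not from the
 * thread): mask is all ones for negative op and zero otherwise, and
 * (x ^ mask) - mask negates x exactly when mask is all ones. */
static inline int64_t sketch_llimd_branchfree(int64_t op, uint32_t m, uint32_t d)
{
    uint64_t mask = (uint64_t)0 - (uint64_t)(op < 0);
    uint64_t uop = ((uint64_t)op ^ mask) - mask;   /* |op| */
    uint64_t ret = sketch_ullimd(uop, m, d);
    /* Converting back to int64_t relies on the usual two's complement
     * wrap-around, which is implementation-defined in ISO C. */
    return (int64_t)((ret ^ mask) - mask);
}
```

Both wrappers call the unsigned helper exactly once, so ullimd is still inlined a single time. The mask variant replaces the two conditional negations with XOR and subtract pairs.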

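The shift example quoted earlier in the thread (high word 0xfffffff0, right shift of 8) can be checked with a small sketch as well. The function below, sketch_u96shift, is a hypothetical illustration of the "left shift plus OR" idea and not the actual __rthal_u96shift macro. It assumes the 96-bit value is given as three 32-bit words and that 0 < s < 32.

```c
#include <stdint.h>

/* Hypothetical sketch: return bits [s, s+63] of the 96-bit value h:m:l.
 * Each output word takes its low bits from a right-shifted word and its
 * high bits from the next word shifted left by (32 - s), so the bits of h
 * above position s simply fall off. Requires 0 < s < 32. */
static inline uint64_t sketch_u96shift(uint32_t h, uint32_t m, uint32_t l,
                                       unsigned s)
{
    uint32_t lo = (l >> s) | (m << (32 - s));
    uint32_t hi = (m >> s) | (h << (32 - s));
    return ((uint64_t)hi << 32) | lo;
}
```

With h = 0xfffffff0, m = l = 0 and s = 8, the high output word is 0xfffffff0 << 24 = 0xf0000000, as in the quoted example. The 64-bit result 0xf000000000000000 read as a signed value is -2^60, which matches the 96-bit input -2^68 shifted right by 8. Since h is only ever shifted left, no decision between shifting in zeros or the sign bit is needed, provided the true result fits in 64 bits.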
