Hello...

On 2/25/14 18:32 , Martin McClure wrote:

Andres and I have an ongoing discussion on this topic. :-)

:)...

And #bitShift *should* be faster.

OK, but how much faster? A 64x64-bit multiplication on modern x86 takes just 8 cycles (!), and the CPU will likely extract some parallelism anyway...

But test this. In VW, multiplication is quite a lot faster than bitShift: for some reason. Unless Andres has fixed this recently. :-)

I haven't measured to see what's going on. In general terms, though, unlike with multiplication, on top of the result overflow check you also have the argument overflow check (is the shift count itself too large?), because on x86

shl eax, 65

is the same as

shl eax, 1

which is obviously not the behavior you want (for 32-bit operands the CPU only looks at the low 5 bits of the shift count). So that means an extra cmp, an extra jmp, and you also have to deal with positive / negative arguments to select shl or shr/sar, so more cmp and jmp, etc...
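To make the extra work concrete, here is a rough sketch in Python (not what any VM actually does, just an illustration). The names x86_shl32 and checked_bit_shift32 are made up for this example, and I am assuming unsigned 32-bit values to keep it short; a real implementation also has to deal with tagged, signed SmallIntegers.

```python
MASK32 = 0xFFFFFFFF

def x86_shl32(value, count):
    """Emulate `shl r32, count`: only the low 5 bits of count are used."""
    return (value << (count & 31)) & MASK32

def checked_bit_shift32(value, count):
    """Shift an unsigned 32-bit value, doing in software the checks the
    bare instruction does not do. Returns None to mean 'take the slow path'
    (hypothetical convention for this sketch)."""
    if count < 0:
        # Negative argument: select a right shift instead of a left shift.
        n = -count
        return 0 if n >= 32 else value >> n
    if count >= 32:
        # Argument overflow check: the hardware would silently wrap the count.
        return None if value else 0
    result = value << count
    if result > MASK32:
        # Result overflow check: bits were shifted out of the register.
        return None
    return result
```

With this, x86_shl32(1, 65) gives the same answer as x86_shl32(1, 1), while checked_bit_shift32(1, 65) refuses the fast path. Every branch in the second function stands for a cmp / jmp pair in the generated code.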

Note though that IIRC the manual says that for SSE registers, doing something like

shl xmm0, 129

is the same as

xor xmm0, xmm0

(assembler for the sake of illustration only) which is nice because then you can combine the overflow check with the argument overflow check.
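One way this combination could work, sketched in Python under my own assumptions: model a single 64-bit lane whose shifts produce zero once the count exceeds 63, then shift left, shift back, and compare. The helper names are invented for the example, and the count is assumed to be non-negative.

```python
MASK64 = (1 << 64) - 1

def lane_shl64(value, count):
    """Left shift of one 64-bit lane; counts above 63 give zero."""
    return 0 if count > 63 else (value << count) & MASK64

def lane_shr64(value, count):
    """Logical right shift of one 64-bit lane; counts above 63 give zero."""
    return 0 if count > 63 else value >> count

def shift_left_or_none(value, count):
    """Single round-trip check: if shifting back does not restore the
    original value, bits were lost, either because the result overflowed
    or because the count was too large. Returns None in that case."""
    shifted = lane_shl64(value, count)
    if lane_shr64(shifted, count) != value:
        return None
    return shifted
```

Here a too-large count and an overflowing result both show up as the same mismatch, so one compare covers both cases.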

The whole SSE thing though is much, much more than just bitShift.

Andres.
