I do find it surprising that the bit-twiddling is faster.
How much of the win is from avoiding a useless SMI->FP->int32 conversion vs.
avoiding the FISTTP instruction?

A branch mispredict is several times slower than an FP conversion, so be
careful that your benchmarks are realistic.
It might also be worth running the code under VTune to see if the old code
was stalling due to an unfortunate microarchitectural issue that could be
fixed with a little instruction scheduling.
I can imagine that fnstsw, having a value dependency on the exception state,
might be delayed for the latency of the previous instruction(s).


I see some benchmarks have code like (x & 0xFFFFFFFF) or the same but with
dynamic values.  It would be a huge advantage in this case if the FP->int
conversion took care of the ToInt32 / ToUint32 conversions specified for
the bitops.

I was considering investigating whether the low 32 bits of FISTTP mem64
would always be the right value.  I think they would for values with
magnitude < 2^51, but I was not sure outside that range.


http://codereview.chromium.org/506052

-- 
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
