From: Andy Polyakov <[email protected]>
Date: Fri, 31 May 2013 10:29:37 +0200

> Another question is about suitability of floating-point fcmps and fmovd
> instructions. These are used to pick a vector from powers table in
> cache-timing neutral manner. I have to admit I haven't done due research
> whether or not they are optimal choice in the context, and/or whether or
> not we are better off using fand and for instructions for this purpose.
> As instructions in question are floating-point they might be executed by
> *shared* FPU and not by individual core [which might be disruptive for
> pipeline?]...

fcmps has an 11-cycle latency and executes in the external FPU. The
same goes for floating-point conditional moves of floating-point
registers.

Floating-point conditional moves of integer registers are the worst:
they are split into two micro-ops and break the instruction decode
group.

You should never use plain fmovd. It goes into the external FPU because
it affects the condition codes in the %fsr. Use fsrc2 instead, which
has a 1-cycle latency and executes in the front end of the CPU.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
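
For reference, the mask-based alternative raised in the quoted text
(fand/for instead of fcmps plus a conditional move) can be sketched in
portable C. This is only an illustration of the general technique, not
the SPARC code under discussion: the names ct_eq_mask and ct_select and
the flat table layout are hypothetical, and it says nothing about
instruction latencies on any particular CPU. Every table entry is read
on every call, and the wanted one is kept by ANDing with an all-ones
mask and ORing into the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns all-ones if a == b, zero otherwise,
 * using only arithmetic and logic (no data-dependent branch). */
static uint64_t ct_eq_mask(uint64_t a, uint64_t b)
{
    uint64_t x = a ^ b;                 /* zero iff a == b */
    /* (x | -x) has its top bit set iff x != 0 */
    return ((x | (0 - x)) >> 63) - 1;   /* 0 - 1 = all-ones, 1 - 1 = 0 */
}

/* Hypothetical helper: copies entry `idx` of a table holding `entries`
 * vectors of `words` 64-bit words each into `out`. All entries are
 * touched regardless of idx, so the memory access pattern does not
 * depend on the index. */
static void ct_select(uint64_t *out, const uint64_t *table,
                      size_t entries, size_t words, size_t idx)
{
    for (size_t w = 0; w < words; w++)
        out[w] = 0;

    for (size_t i = 0; i < entries; i++) {
        uint64_t mask = ct_eq_mask((uint64_t)i, (uint64_t)idx);
        for (size_t w = 0; w < words; w++)
            out[w] |= table[i * words + w] & mask;   /* AND, then OR */
    }
}
```

Whether a compiler keeps this branch-free is not guaranteed by the C
standard, so the generated code would still need to be inspected on the
target.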
