>> Another question is about suitability of floating-point fcmps and fmovd
>> instructions. These are used to pick a vector from powers table in
>> cache-timing neutral manner. I have to admit I haven't done due research
>> whether or not they are optimal choice in the context, and/or whether or
>> not we are better off using fand and for instructions for this purpose.
>> As instructions in question are floating-point they might be executed by
>> *shared* FPU and not by individual core [which might be disruptive for
>> pipeline?]...
>
> fcmps is 11 cycle latency and executes in the external FPU.
>
> Likewise for floating point conditional moves of floating point registers.
>
> Floating point conditional moves of integer registers is the worst, it
> is split into two micro-ops and it breaks the instruction decode
> group.
>
> Plain fmovd you should never use, it goes into the external FPU
> because it affects the condition codes in the %fsr. Use fsrc2 instead
> which has 1 cycle latency and executes in the front end of cpu.

Thanks! Even though the question was inadequately formulated (it was not
about plain fmovd, but about *conditional* fmovd on a floating-point
condition, sorry), I get the picture. The conclusion seems to be that we
should bet on the logical operations, fand and for, which take 3 cycles
and [more importantly?] are handled by private core resources. Thanks
again.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
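
For illustration only, here is a minimal C sketch of the mask-based
selection that the fand/for approach amounts to. It is not the actual
OpenSSL code: the helper name ct_select and the flat table layout are
assumptions made for this example, and plain integer & and | stand in
for the VIS fand and for instructions. Every table entry is read
regardless of the index, so the memory access pattern does not depend
on which vector is picked.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from OpenSSL): copy entry `idx` of `table`
 * into `out`. The table holds `nentries` vectors of `words` 64-bit
 * words each, stored back to back. All entries are touched.
 */
static void ct_select(uint64_t *out, const uint64_t *table,
                      size_t nentries, size_t words, size_t idx)
{
    size_t i, j;

    for (j = 0; j < words; j++)
        out[j] = 0;

    for (i = 0; i < nentries; i++) {
        uint64_t diff = (uint64_t)(i ^ idx);
        /* mask is all ones when i == idx, zero otherwise, no branch */
        uint64_t mask = ((diff | (0 - diff)) >> 63) - 1;

        for (j = 0; j < words; j++)
            out[j] |= table[i * words + j] & mask;  /* "fand" then "for" */
    }
}
```

Whether a given compiler keeps this branch-free is a separate matter,
which is one reason such code tends to end up in hand-written assembly.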
