> For public reference. To some degree it is apparent from the context,
> but the report is about the difference in RSA sign performance between
> the OpenSSL SPARC T4 Montgomery multiplication module and the
> corresponding Solaris T4 module, with OpenSSL being significantly
> slower. The least one can say [at this point] is that the problem
> appears to be "multi-layer", in the sense that different factors are in
> play. The first question in line is how the same code can perform so
> differently on Solaris and Linux. OpenSSL on Linux delivers ~70% more
> RSA1024 signs than on Solaris (if we assume that both systems operate
> at the same frequency, which is supported by the fact that the verify
> results were virtually identical).
Another question is about the suitability of the floating-point fcmps and fmovd instructions. These are used to pick a vector from the powers table in a cache-timing-neutral manner. I have to admit I haven't done due research on whether they are the optimal choice in this context, or whether we would be better off using the fand and for instructions for this purpose. As the instructions in question are floating-point, they might be executed by a *shared* FPU and not by the individual core [which might be disruptive for the pipeline?]...
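
To make the "cache-timing neutral" selection concrete, here is a minimal sketch in portable C of the mask-based variant, i.e. the AND/OR approach that fand and for would carry out in floating-point registers. It is not the T4 assembly from the module under discussion: the names ct_eq_mask and ct_select, the flat table layout and the 64-bit word size are all assumptions made for illustration. The point is only that every table entry is read on every call, so the memory access pattern does not depend on the index.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: all-ones if a == b, zero otherwise, with no branch. */
uint64_t ct_eq_mask(uint64_t a, uint64_t b)
{
    uint64_t x = a ^ b;                     /* zero iff a == b */
    uint64_t differ = (x | (0 - x)) >> 63;  /* 1 if a != b, 0 if a == b */
    return differ - 1;                      /* 0 if a != b, all-ones if a == b */
}

/*
 * Hypothetical selection routine: copies entry idx of table into out.
 * The table is assumed to hold `entries` vectors of `words` 64-bit words
 * each, stored back to back. Every entry is touched regardless of idx.
 */
void ct_select(uint64_t *out, const uint64_t *table,
               size_t entries, size_t words, size_t idx)
{
    size_t i, w;

    for (w = 0; w < words; w++)
        out[w] = 0;

    for (i = 0; i < entries; i++) {
        uint64_t mask = ct_eq_mask((uint64_t)i, (uint64_t)idx);

        for (w = 0; w < words; w++)
            out[w] |= table[i * words + w] & mask;  /* AND, then OR */
    }
}
```

Whether a compiler preserves the branch-free form is not guaranteed, which is one reason such code tends to be written in assembly; and whether the AND/OR form would actually beat the fcmps/fmovd one on T4 is exactly the open question above.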
