For public reference. The final version has been committed to the repository with the following changes:
- naming switched to ecp_nistz256, for both file names and function names;
- assembly optimized for processors other than the Intel Core family;
- assembly modules adapted for non-ELF platforms as well;
- some of the higher-level subroutines moved to assembly to improve performance even further;
- code adapted for 32-bit platforms as well.

As for the latter, effort is ongoing to initially support ARM and x86, and preliminary results indicate a ~2x performance improvement.

A point worth mentioning in this context is that I'm considering switching back to the scatter-gather method. This is mainly for the sake of 32-bit platforms, because the current method is a bit too slow on non-SIMD platforms. But then it would be simpler to maintain the same method in all cases, including the x86_64 one.

The objection against the scatter-gather method was based on the assertion in http://cryptojedi.org/peter/data/chesrump-20130822.pdf that timing within a cache line varies. While the phenomenon is real, one has to recognize that its effect on the gather procedure is either zero or at most a few cycles. This is because if a conflict can occur, it occurs only once per gather procedure. And since it is only a few cycles, it is not a given that you can actually measure it, because tick-counter resolution is limited on contemporary processors. Not to mention that the conflict can be avoided by aligning the stack frame in a specific manner relative to the scatter table (the method used in the x86_64-mont5 module, by the way).

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
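The post contrasts two ways of reading an entry from a precomputed table without index-dependent cache-line accesses. The C sketch below is illustrative only and is not the ecp_nistz256 code. It assumes that the "current method" is a masked scan over the whole table, assumes 64-byte cache lines, and uses hypothetical names (`select_scan`, `scatter`, `gather`, `build_demo_tables`) and sizes (`N_ENTRIES`, `N_WORDS`). It shows why a gather from a column-wise ("scattered") table touches the same set of cache lines for every index, with only the offset within each line depending on the index. That offset dependence is the residual within-cache-line effect discussed above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes: 16 precomputed entries (a 4-bit window), each
 * 24 x 32-bit words (e.g. three 256-bit coordinates). */
#define N_ENTRIES 16
#define N_WORDS   24

/* Plain layout: entry i is stored contiguously. */
static uint32_t plain[N_ENTRIES][N_WORDS];

/* Scattered layout: word k of entry i lives at scattered[k * N_ENTRIES + i].
 * Each row of 16 words is 64 bytes; with 64-byte alignment (and, by
 * assumption, 64-byte cache lines) row k is exactly one cache line. */
static _Alignas(64) uint32_t scattered[N_WORDS * N_ENTRIES];

/* Returns 0xFFFFFFFF if a == b, else 0, without a data-dependent branch. */
static uint32_t ct_eq_mask(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;
    /* (x | -x) has its top bit set iff x != 0 */
    return ((x | (0u - x)) >> 31) - 1u;
}

/* Masked scan: reads every word of every entry and keeps only the one at
 * idx, so the access pattern is independent of idx, at the cost of
 * N_ENTRIES * N_WORDS loads. */
static void select_scan(uint32_t out[N_WORDS], uint32_t idx)
{
    memset(out, 0, N_WORDS * sizeof(uint32_t));
    for (uint32_t i = 0; i < N_ENTRIES; i++) {
        uint32_t mask = ct_eq_mask(i, idx);
        for (uint32_t k = 0; k < N_WORDS; k++)
            out[k] |= plain[i][k] & mask;
    }
}

/* Scatter: store entry idx column-wise. Done once at precomputation time,
 * when idx is not secret. */
static void scatter(const uint32_t entry[N_WORDS], uint32_t idx)
{
    assert(idx < N_ENTRIES);
    for (uint32_t k = 0; k < N_WORDS; k++)
        scattered[k * N_ENTRIES + idx] = entry[k];
}

/* Gather: one load per row, so every call touches the same N_WORDS cache
 * lines whatever idx is; only the offset within each line depends on idx. */
static void gather(uint32_t out[N_WORDS], uint32_t idx)
{
    for (uint32_t k = 0; k < N_WORDS; k++)
        out[k] = scattered[k * N_ENTRIES + idx];
}

/* Fills both layouts with an arbitrary test pattern:
 * word k of entry i is i * 1000 + k. */
static void build_demo_tables(void)
{
    uint32_t e[N_WORDS];
    for (uint32_t i = 0; i < N_ENTRIES; i++) {
        for (uint32_t k = 0; k < N_WORDS; k++)
            e[k] = i * 1000u + k;
        memcpy(plain[i], e, sizeof e);
        scatter(e, i);
    }
}
```

After `build_demo_tables()`, `select_scan(a, idx)` and `gather(b, idx)` should return identical entries for every `idx`. The difference is in the work done: with these sizes the scan performs 384 loads per lookup against 24 for the gather, which is why a scan without SIMD gets expensive. The sketch does not model the stack-frame alignment trick mentioned for x86_64-mont5, and plain C masking is not guaranteed to stay branch-free after compilation, so real implementations verify the generated code or use assembly.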
