For public reference: the final version has been committed to the repository
with the following changes:

- naming re-based on ecp_nistz256, for both filenames and functions;
- assembly optimized for processors other than the Intel Core family;
- assembly modules adapted even for non-ELF platforms;
- some of the higher-level subroutines moved to assembly to improve
performance even further;
- code adapted even for 32-bit platforms.

As for the latter: an effort is ongoing to support ARM and x86 initially,
and preliminary results indicate a ~2x performance improvement. A point
worth mentioning in this context is that I'm considering switching back to
the scatter-gather method, basically for the sake of 32-bit platforms,
because the current method is a bit too slow on non-SIMD platforms. But
then it would be simpler to maintain the same method in all cases,
including the x86_64 one. The objection
against the scatter-gather method was based on the assertion in
http://cryptojedi.org/peter/data/chesrump-20130822.pdf that timing
within a cache line varies. While the phenomenon is real, one has to
recognize that its effect on the gather procedure is either zero or at
most a few cycles. This is because if a conflict can occur, it occurs only
once per gather procedure. And since it's only a few cycles, it's not a
given that you can actually measure it, because tick counter resolution is
limited on contemporary processors. Not to mention that the conflict can be
avoided by aligning the stack frame in a specific manner relative to the
scatter table (the method used in the x86_64-mont5 module, by the way).
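
For readers unfamiliar with the term, here is a minimal C sketch of what a
scatter-gather table lookup generally looks like. It is not the
ecp_nistz256 code: the names (sg_scatter, sg_gather, sg_line_of), the table
dimensions and the 64-byte cache line are all assumptions picked for
illustration. Entries are interleaved byte by byte, so a gather reads from
the same sequence of cache lines whatever the index is.

```c
#include <stddef.h>
#include <stdint.h>

#define SG_ENTRIES     16  /* hypothetical number of table entries */
#define SG_ENTRY_BYTES 32  /* hypothetical entry size: one 256-bit value */
#define SG_CACHE_LINE  64  /* assumed cache-line size in bytes */

/* Scatter: byte j of entry idx is stored at table[j * SG_ENTRIES + idx]. */
static void sg_scatter(uint8_t *table, const uint8_t *entry, size_t idx)
{
    for (size_t j = 0; j < SG_ENTRY_BYTES; j++)
        table[j * SG_ENTRIES + idx] = entry[j];
}

/* Gather: read entry idx back, one load per interleaved row. */
static void sg_gather(uint8_t *out, const uint8_t *table, size_t idx)
{
    for (size_t j = 0; j < SG_ENTRY_BYTES; j++)
        out[j] = table[j * SG_ENTRIES + idx];
}

/* Cache line, relative to a line-aligned table, that is touched when
 * gathering byte j of entry idx. */
static size_t sg_line_of(size_t j, size_t idx)
{
    return (j * SG_ENTRIES + idx) / SG_CACHE_LINE;
}

/* Returns 1 if every index touches the same line sequence as index 0. */
static int sg_lines_independent_of_index(void)
{
    for (size_t idx = 1; idx < SG_ENTRIES; idx++)
        for (size_t j = 0; j < SG_ENTRY_BYTES; j++)
            if (sg_line_of(j, idx) != sg_line_of(j, 0))
                return 0;
    return 1;
}
```

With this layout the secret index selects only the offset within each cache
line, never the line itself. The offset within the line is exactly where
the timing variation discussed above would have to come from.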


______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
