> When compiling OpenSSL optimized on ARM using the Microsoft compiler,
Out of curiosity, which version? There is a record of it generating bad
code specifically in bn_nist.c. On one occasion it was reported that it
generates bad code when optimization is switched off.
> the wrong code is being emitted for BN_nist_mod_521 (in bn_nist.c).
> The compiler seems to think that val and temp represent the same item
> when they are clearly one index apart. I've coded a fix that simply
> avoids using temporary variables and uses the indices into the t_d
> array directly. The code is simply a refactor of the existing code
> and does generate very effective neon instructions for the loop.
>
> The ectest test, which was always failing before on ARM on Windows, is
> now succeeding (as well as all the other tests).
I recall looking at the code generated for x86 and not liking the result
with code similar to what you suggest, which is why those temporary
values were added. I wonder if you could test the following loop:
	for (val=t_d[0],i=0; i<BN_NIST_521_TOP-1; i++)
	{
		t_d[i] = (val>>BN_NIST_521_RSHIFT |
			  (val=t_d[i+1])<<BN_NIST_521_LSHIFT) & BN_MASK2;
	}
	t_d[i] = val>>BN_NIST_521_RSHIFT;
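
For checking candidate rewrites of this loop off-target, a minimal
standalone sketch may help. It is not the bn_nist.c code: it assumes
32-bit limbs, and the names limb_t, LIMB_BITS, TOP, RSHIFT, LSHIFT, MASK2,
shift_indexed, shift_with_temps and shift_variants_agree are hypothetical
stand-ins chosen for illustration. It compares an indices-only loop with a
temporaries-based one. The temporaries variant assigns the next limb in
its own statement, because reading val and assigning to it within a single
expression is not sequenced in C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed 32-bit limb configuration (hypothetical names, not OpenSSL's). */
typedef uint32_t limb_t;
#define LIMB_BITS 32
#define TOP       ((521 + LIMB_BITS - 1) / LIMB_BITS)  /* 17 limbs */
#define RSHIFT    (521 % LIMB_BITS)                    /* 9 */
#define LSHIFT    (LIMB_BITS - RSHIFT)                 /* 23 */
#define MASK2     0xffffffffu

/* Variant A: index the array directly, no temporaries.
 * t_d[i+1] is still unmodified when it is read, since only t_d[i]
 * has been written so far. */
static void shift_indexed(limb_t t_d[TOP])
{
    int i;

    for (i = 0; i < TOP - 1; i++)
        t_d[i] = (t_d[i] >> RSHIFT | t_d[i + 1] << LSHIFT) & MASK2;
    t_d[i] = t_d[i] >> RSHIFT;
}

/* Variant B: carry the current limb in val, loading the next limb
 * in a separate statement so every access to val is sequenced. */
static void shift_with_temps(limb_t t_d[TOP])
{
    limb_t val = t_d[0], next;
    int i;

    for (i = 0; i < TOP - 1; i++) {
        next = t_d[i + 1];
        t_d[i] = (val >> RSHIFT | next << LSHIFT) & MASK2;
        val = next;
    }
    t_d[i] = val >> RSHIFT;
}

/* Returns 1 when both variants produce the same limbs for a fixed,
 * arbitrary input pattern, 0 otherwise. */
static int shift_variants_agree(void)
{
    limb_t a[TOP], b[TOP];
    int i;

    assert(TOP * LIMB_BITS >= 521);  /* sanity check on the assumed sizes */

    for (i = 0; i < TOP; i++)
        a[i] = b[i] = 0x9e3779b9u * (limb_t)(i + 1);

    shift_indexed(a);
    shift_with_temps(b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling shift_variants_agree() from a small test program on both the
suspect compiler and a known-good one, at each optimization level, should
show whether a given variant is being miscompiled.
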
BTW, is there interest to adapt ARM/NEON assembly for Windows?
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [email protected]
Automated List Manager [email protected]