Hi Andy (and anyone else that's interested),
As part of the general hackathon/audit we're doing in crypto/bn/ I once again came across the curious zeroing code in bn_expand2, only this time I figured it was high time for me to actually ask you about it. :-)
I understand the desire to cater for CPU pipelining with the 8-wise loop unrolling, but is this a better solution than just using memset() and letting the compiler take care of the same sort of thing?
It's already #ifdef'ed, so you might as well try it out and check the performance with a profiler.
Generally, nothing can be said about such optimizations in the abstract. They depend so heavily on the platform (compiler + libc + hardware architecture) that no useful conclusion can be drawn just from looking at the code.
Even if profiling shows you that memset() is faster, it may be an order of magnitude slower on the machine right next to you (one with a different version of the C library or different compiler settings).
In particular, the considerations for that inner loop vs. memset() are:
1. memset() MAY involve a function call into code which (a) MAY not be readily available in the cache, and (b) costs so much CPU time for the call itself that a handful of BN_ULONGs could have been moved in the meantime.
2. If memset() is inlined, the particular implementation MAY perform some computations and byte-by-byte moves to align the data on proper boundaries before the word-by-word copying. This may well be less optimal than simply doing *A = *B, where A and B point to a word-sized type T.
3. Note that this loop moves 16 bytes per iteration, compared to 4 bytes per iteration in typical memset() implementations on x86. It has a slightly higher cost per iteration though (i--, A+=4, B+=4), and it uses pointer subscripting (using dereferencing instead would alone be about 30% faster).
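To make the three points concrete, here is a minimal sketch of the variants being compared. It is not the actual bn_expand2 code: word_t is a hypothetical stand-in for BN_ULONG, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for BN_ULONG; the real type depends on the platform. */
typedef unsigned long word_t;

/* Hand-unrolled loop: four words per iteration, using pointer subscripting. */
static void zero_unrolled(word_t *a, size_t n)
{
    size_t i;

    assert(a != NULL || n == 0);
    for (i = n >> 2; i > 0; i--, a += 4) {
        a[0] = 0;
        a[1] = 0;
        a[2] = 0;
        a[3] = 0;
    }
    /* Clear the 0..3 words left over after the unrolled part. */
    for (i = n & 3; i > 0; i--)
        *a++ = 0;
}

/* Plain loop: one word per iteration, using dereference and post-increment. */
static void zero_deref(word_t *a, size_t n)
{
    word_t *end;

    assert(a != NULL || n == 0);
    end = a + n;
    while (a != end)
        *a++ = 0;
}

/* Library call: the compiler and libc decide how the bytes get cleared. */
static void zero_memset(word_t *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n > 0)
        memset(a, 0, n * sizeof(*a));
}
```

All three leave the buffer in the same state; which one is fastest is exactly the platform-dependent question above. Note also that an optimizing compiler may turn either loop into a memset() call on its own, so the generated code is worth inspecting before drawing conclusions.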
Bottom line: try it out.
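For what it's worth, "trying it out" can be as simple as the rough timing harness sketched below. Everything in it is an assumption for illustration: word_t stands in for BN_ULONG, the function names are invented, and clock() only gives coarse CPU time, so a real profiler will tell you more.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef unsigned long word_t;                 /* hypothetical stand-in for BN_ULONG */
typedef void (*zero_fn)(word_t *, size_t);    /* any routine that zeroes n words */

/* Unrolled candidate: four words per iteration, then the remainder. */
static void zero_loop(word_t *a, size_t n)
{
    size_t i;

    for (i = n >> 2; i > 0; i--, a += 4) {
        a[0] = 0;
        a[1] = 0;
        a[2] = 0;
        a[3] = 0;
    }
    for (i = n & 3; i > 0; i--)
        *a++ = 0;
}

/* memset() candidate. */
static void zero_memset(word_t *a, size_t n)
{
    memset(a, 0, n * sizeof(*a));
}

/*
 * Returns the CPU seconds spent on `reps` calls of `fn` over `n` words.
 * Each round dirties one word first and reads it back through a volatile
 * sink afterwards, so the stores cannot simply be optimized away.
 */
static double time_zeroing(zero_fn fn, word_t *buf, size_t n, unsigned long reps)
{
    volatile word_t sink = 0;
    clock_t start, stop;
    unsigned long r;

    assert(fn != NULL && buf != NULL && n > 0);
    start = clock();
    for (r = 0; r < reps; r++) {
        buf[r % n] = 1;
        fn(buf, n);
        sink += buf[r % n];
    }
    stop = clock();
    (void)sink;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Call time_zeroing(zero_loop, buf, n, reps) and time_zeroing(zero_memset, buf, n, reps) with sizes typical for your bignums, and repeat the run with each compiler and optimization level you care about, since the ranking may flip between them.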
P.S. Anyway, any optimization in this part will be completely negligible compared to the other costs, such as encryption.
Note, I'm not going to mess with this code myself, partly because I don't know the full history behind it and partly because this sort of thing is not really the focus for me at the moment. I'm just querying out of interest. TIA.
Cheers, Geoff
-- Lev Walkin [EMAIL PROTECTED]
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]