This time, here is an article about optimizing MD5 for AMD64. Code is provided and has been designed to be easily integrated into OpenSSL:
http://epita.fr/~bevand_m/papers/md5-amd64.html
On an Opteron 244, the code is 65% faster than the current (C language)
implementation of OpenSSL.
As it's not really a key algorithm, adoption of this submission is going to the end of my TODO list. Another reason is that it requires an assembler patch. One of the unwritten OpenSSL design rules is to make things work while assuming as little as possible about the target environment. Can you make the 64-bit lea operations optional, to break the dependency on the assembler patch? Note that 64-bit lea instructions are more compact, as far as I understand, which improves instruction prefetch. It would also be interesting to see whether EM64T is affected by switching to 64-bit lea. If not, then there is hardly a reason to keep the 32-bit ones...
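To make the lea question concrete, here is a minimal C sketch of one MD5 round-1 step (per RFC 1321) plus a model of the three-term addition that a single lea can fold into one instruction. It assumes the submitted code uses lea in this way; the function names are hypothetical and none of this is taken from the submission itself. The point it illustrates is that MD5 arithmetic is modulo 2^32, so the low 32 bits of the sum are the same whether the address arithmetic is done in 32 or 64 bits, even when the displacement is sign-extended.

```c
#include <stdint.h>

/* Rotate left; n must be in 1..31 (true for every MD5 shift amount). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Model of a 32-bit lea: a + x + t, wrapping at 2^32. */
uint32_t sum_lea32(uint32_t a, uint32_t x, uint32_t t)
{
    return a + x + t;
}

/* Model of a 64-bit lea: the 32-bit displacement t is sign-extended,
 * the sum is formed in 64 bits, and only the low half is kept. */
uint32_t sum_lea64_low(uint32_t a, uint32_t x, uint32_t t)
{
    uint64_t disp = t;
    if (t & 0x80000000u)
        disp |= 0xFFFFFFFF00000000ull;   /* manual sign extension */
    return (uint32_t)((uint64_t)a + (uint64_t)x + disp);
}

/* One round-1 step: a = b + ((a + F(b,c,d) + x + t) <<< s). */
uint32_t md5_step_ff(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                     uint32_t x, unsigned s, uint32_t t)
{
    uint32_t f = (b & c) | (~b & d);     /* F selects c or d bitwise by b */
    return b + rotl32(sum_lea32(a, x, t) + f, s);
}
```

Since both models agree on the low 32 bits, choosing between the two lea forms should be a matter of encoding size and assembler support, not of correctness.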
It might also make sense to coordinate efforts: announce in advance what you're planning to do, ask for hints [like the one below]...
you might be interested in an approach i've used to mix both SSE and integer operations to implement algorithms from the SHA family. in my implementation i hide the long latency between the SSE and integer register files by doing software pipelining -- the SSE computation precedes the integer computation by several rounds (12 rounds in the sha1 case).
i've released my sha1 code and written up a description of how it works here <http://arctic.org/~dean/crypto/sha1.html>. i haven't yet released the sha256 code, but it's based on the same techniques (with similar results).
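The software-pipelining idea can be sketched in plain C, without any SSE, as follows. This is not Dean's code: it is a hypothetical scalar illustration in which the message-schedule stage runs a fixed number of rounds (LEAD, set to 12 to match the figure quoted above) ahead of the round function. In a real SSE/integer mix the schedule stage would run in the SSE unit, and the lead would hide the latency of moving words between register files.

```c
#include <stdint.h>

#define LEAD 12   /* rounds by which the schedule runs ahead (assumption) */

static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Produce schedule word t: a big-endian load for t < 16, otherwise the
 * standard SHA-1 recurrence over earlier words. */
static uint32_t schedule_word(const uint32_t *w, const uint8_t *block, int t)
{
    if (t < 16) {
        const uint8_t *p = block + 4 * t;
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    }
    return rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
}

/* Compress one 64-byte block into the state h. */
void sha1_block_pipelined(uint32_t h[5], const uint8_t block[64])
{
    uint32_t w[80];
    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    int t;

    /* Prologue: fill the pipeline with the first LEAD schedule words. */
    for (t = 0; t < LEAD; t++)
        w[t] = schedule_word(w, block, t);

    for (t = 0; t < 80; t++) {
        uint32_t f, k, tmp;

        /* Producer stage: word t+LEAD, not needed for LEAD more rounds. */
        if (t + LEAD < 80)
            w[t + LEAD] = schedule_word(w, block, t + LEAD);

        /* Consumer stage: round t uses w[t], produced LEAD rounds ago. */
        if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
        else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
        else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }

        tmp = rotl32(a, 5) + f + e + k + w[t];
        e = d; d = c; c = rotl32(b, 30); b = a; a = tmp;
    }

    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}
```

The lead works because word t+12 depends only on words up to t+9, all of which were produced at least three iterations earlier.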
Dean,
I recall you posed some questions about the OpenSSL SHA1 implementation earlier on this list, which remained unanswered. Perhaps we could discuss them now? On your page you mention API differences. Can you elaborate on this? At the end you mention that "OpenSSL's API expects to pass the input ... one block at a time." That is not the case: sha1_block[_asm]_data_order is called with multi-block input whenever possible. Also note that the SHA1 assembler has been overhauled since 0.9.7c, the version you used as a reference, which resulted in an 85% improvement on P4. Would you care to comment?
A.
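The multi-block calling convention described above can be sketched as follows. This is a simplified illustration, not OpenSSL's source: demo_ctx, demo_update and demo_block_data_order are hypothetical names, and the block function is a stub that only counts how it is called. It shows an update routine that first tops up any buffered partial block, then hands every remaining whole block to the block function in a single call, and finally buffers the tail.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK 64

typedef struct {
    unsigned char buf[BLOCK];  /* holds a partial block between calls */
    size_t buffered;           /* number of bytes currently in buf */
    size_t calls;              /* how many times the block function ran */
    size_t blocks;             /* total blocks passed to it */
} demo_ctx;

/* Stub standing in for the real compression routine: takes a pointer to
 * num consecutive 64-byte blocks and would process them all in one go. */
static void demo_block_data_order(demo_ctx *c, const void *p, size_t num)
{
    (void)p;
    c->calls++;
    c->blocks += num;
}

void demo_update(demo_ctx *c, const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t n;

    if (c->buffered) {
        size_t need = BLOCK - c->buffered;
        if (len < need) {                 /* still not a full block */
            memcpy(c->buf + c->buffered, p, len);
            c->buffered += len;
            return;
        }
        memcpy(c->buf + c->buffered, p, need);
        demo_block_data_order(c, c->buf, 1);
        p += need;
        len -= need;
        c->buffered = 0;
    }

    n = len / BLOCK;
    if (n) {                              /* all whole blocks, one call */
        demo_block_data_order(c, p, n);
        p += n * BLOCK;
        len -= n * BLOCK;
    }

    if (len) {                            /* keep the tail for next time */
        memcpy(c->buf, p, len);
        c->buffered = len;
    }
}
```

With this shape, a 200-byte update on an empty context results in one call covering three blocks, with 8 bytes left buffered, so any per-call setup cost in the assembler routine is paid once rather than three times.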
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [EMAIL PROTECTED]
Automated List Manager [EMAIL PROTECTED]
