Re: MD5 optimized for AMD64 (+65% speedup)

Andy Polyakov Tue, 28 Dec 2004 08:40:01 -0800

Keep in mind that [unlike Gladman's code] OpenSSL code has to be position independent! It's surely no problem on x86_64, but on x86 this puts you in a very tight spot. But I've sketched some 32-bit PIC code already [as previously mentioned "I might have an opportunity to play with AES some day *this* year"], so give me a few more days...
oh i guess you need to throw away another register on 32-bit x86 to load up the table base address. perhaps if you copied the key-schedule/context to the stack you could refer to it off %esp, and then use %ebp as a base register for the tables? it would pay if you can amortize the stack copy over multiple blocks...
That would require [major] surgery to the API and will most likely push a bunch of "front-end" functions such as AES_cbc_encrypt to assembler... As it's unlikely to result in further *significant* improvement, I'd rather not:-)

Just for the record: as Dean showed, one can expect a ~30% *asymptotic* gain from copying the key schedule into a controlled place on the stack. "Asymptotic" means for larger chunk sizes only, and it implies API surgery. It also means that small-packet oriented applications [such as ssh] are likely to suffer (though this can be avoided by maintaining two code paths and choosing one at run-time depending on input length:-) "Controlled place on stack" implies that the "front-end" functions are implemented in assembler... Is there interest in this?

A.
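
For illustration only, here is a rough C sketch of the "two code paths chosen at run-time by input length" idea. Everything in it is an assumption made for the example: the toy_* names, the 256-byte threshold, and the round function (a toy rotate-and-XOR, not AES) are hypothetical and are not OpenSSL code. The real gain discussed in this thread would come from assembler that addresses the stack copy off %esp, which plain C cannot show; the sketch only shows the dispatch and where the copy gets amortized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ROUNDS 10
#define TOY_BLOCK 16
/* Hypothetical cut-off; a real value would have to be measured. */
#define TOY_LONG_INPUT_THRESHOLD 256

/* Hypothetical key-schedule container, loosely shaped like an AES-128 one. */
typedef struct {
    uint32_t rk[4 * (TOY_ROUNDS + 1)];
} toy_key;

/* Toy stand-in for a block transform: rotate each word, XOR a round key.
 * This is NOT AES; it only gives the two paths something to compute. */
static void toy_encrypt_block(const uint32_t *rk, const uint8_t *in, uint8_t *out)
{
    uint32_t s[4];
    memcpy(s, in, TOY_BLOCK);
    for (int r = 0; r <= TOY_ROUNDS; r++)
        for (int i = 0; i < 4; i++)
            s[i] = ((s[i] << 1) | (s[i] >> 31)) ^ rk[4 * r + i];
    memcpy(out, s, TOY_BLOCK);
}

/* Short-input path: read the key schedule where the caller keeps it. */
static void toy_ecb_direct(const toy_key *key, const uint8_t *in,
                           uint8_t *out, size_t len)
{
    for (size_t off = 0; off + TOY_BLOCK <= len; off += TOY_BLOCK)
        toy_encrypt_block(key->rk, in + off, out + off);
}

/* Long-input path: copy the key schedule to the stack once, then process
 * every block against the local copy, so the copy cost is amortized. */
static void toy_ecb_stack_copy(const toy_key *key, const uint8_t *in,
                               uint8_t *out, size_t len)
{
    uint32_t local_rk[4 * (TOY_ROUNDS + 1)];
    memcpy(local_rk, key->rk, sizeof local_rk);
    for (size_t off = 0; off + TOY_BLOCK <= len; off += TOY_BLOCK)
        toy_encrypt_block(local_rk, in + off, out + off);
}

/* Front end: choose a path at run-time depending on input length. */
void toy_ecb_encrypt(const toy_key *key, const uint8_t *in,
                     uint8_t *out, size_t len)
{
    assert(len % TOY_BLOCK == 0);
    if (len >= TOY_LONG_INPUT_THRESHOLD)
        toy_ecb_stack_copy(key, in, out, len);
    else
        toy_ecb_direct(key, in, out, len);
}
```

Both paths must produce identical output for the same key and input; only the location of the key schedule differs, so a small-packet caller never pays for the copy.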
