From: Andy Polyakov <ap...@openssl.org>
Date: Sun, 23 Sep 2012 22:53:53 +0200

>> The techniques used in this plain v9 implementation are:
>>
>> 1) Use little-endian 32-bit loads when input data is aligned.
>> 2) Avoid having to accumulate into the context hash values every
>>    loop iteration.
>> 3) In the aligned case try to separate the loads from the first
>>    use by as many instructions as possible, without sacrificing
>>    the schedule too much.
>> 4) Attempt to dual-issue as much as possible on UltraSPARC-I/II/III/IV
>>    and SPARC-T4.
>
> I had an old module lying around, dusted it off in
> http://cvs.openssl.org/chngview?cn=22842. It's 20% faster than your
> version on US pre-Tx. Improvement coefficient is likely to be even
> higher on T1, because it keeps everything in register bank and there
> are no loads except for input. Not really relevant, but it's nominally
> faster even on T4.

Could you discuss something like this before checking in such changes
instead of just silently dismissing work I've posted?
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org