Re: SHA-256 implementation improvement

Andy Polyakov Tue, 29 May 2012 10:11:13 -0700

Interleaved are my results translated to your units (cycles per 64-byte block), basically just multiplied by 64 and rounded to three significant digits.

                   1.5    1.6    1.7    1.8    my
P III (Coppermine) 1821 / 1850 / 1742 / 1574 / 1614
P4 (Prescott)      1544 / 1546 / 1541 / 1375 / 1450
P4 (Northwood)     2200 / 1963 / 1931 / 2483 / 1957
AMD Sempron        1537 / 1450 / 1394 / 1205 / 1305
                   n/a
AMD K10            1270 / 1210 / 1215 /  988 / 1057
Core 2             1170 / 1131 / 1130 /  985 /  984
i5 Lynnfield       1250 / 1426 / 1271 / 1121 / 1033
Sandy Bridge       1265 / 1225 / 1228 / 1115 /  981 (with shrd)
                                               1010 (folded loop with shrd)
Atom               2300 / 2050 / 1984 / 1700 / 2455
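For reference, the unit conversion mentioned above (cycles per byte times 64, because SHA-256 compresses its input in 64-byte blocks, then rounded to three significant digits) can be sketched in C as follows. This is an illustrative helper, not code from either implementation; the function names and the assumption that results land at or above 100 cycles per block are mine.

```c
#include <assert.h>

/* SHA-256 compresses its input in 64-byte blocks, so a cycles-per-byte
 * figure times 64 gives cycles per block. */
#define SHA256_BLOCK_BYTES 64

/* Round a value >= 100 to three significant digits (halves round up).
 * A simple loop is used instead of log10/pow, so no libm is needed. */
static long round_3_sig(double x)
{
    long scale = 1;

    assert(x >= 100.0);        /* below 100 this would keep fewer digits */
    while (x >= 1000.0) {      /* shift down until three integer digits remain */
        x /= 10.0;
        scale *= 10;
    }
    return (long)(x + 0.5) * scale;
}

/* Hypothetical helper: cycles per byte -> cycles per 64-byte block. */
static long cpb_to_cycles_per_block(double cycles_per_byte)
{
    return round_3_sig(cycles_per_byte * SHA256_BLOCK_BYTES);
}
```

For example, 15.4 cycles per byte becomes 985.6 cycles per block, which rounds to 986.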


Results are consistent except for P4, Core 2 and Sandy Bridge.

As for P4, it's probably best to just shrug, accept whatever the result is and forget about it. It's a bit hard to accept, but it's hardly worth figuring out why our results vary that much.

As for Core 2, the difference is nominal, and if I execute my binary with a varying stack seed(*) I can also measure 990 cycles per block. In other words, the variation can be explained by environmental factors such as cache contention.

As for Sandy Bridge, I don't know... I could observe nominal variations, 2-3%, on my machine, but nothing close to 10%, so this is odd... If you have the energy, test with a varying stack seed(*)...

(*) Because environment variables reside below the stack, the simplest way to reseed the stack is to run 'env A=`perl -e 'print "A"x1024'` ...' and experiment with the number after x.
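The footnote's trick works because, at a typical Linux process start-up, the environment strings are copied near the top of the stack region before main() runs, so a longer environment moves the initial stack pointer and changes how stack buffers line up with cache lines and pages. A minimal sketch to observe this prints the page offset of a local variable; the function names and the padding variable A are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Offset of a local (stack) variable within its 4 KiB page. */
static unsigned stack_page_offset(void)
{
    volatile char probe = 0;   /* taking its address keeps it in a stack slot */
    return (unsigned)((uintptr_t)&probe & 4095u);
}

/* Report the length of the (hypothetical) padding variable A together
 * with the resulting stack offset; meant to be called from main(). */
static void report_stack_offset(void)
{
    const char *pad = getenv("A");
    size_t pad_len = pad ? strlen(pad) : 0;
    unsigned off = stack_page_offset();

    assert(off < 4096u);
    printf("len(A)=%zu  stack page offset=%u\n", pad_len, off);
}
```

Calling report_stack_offset() from main() and running the program as env A=`perl -e 'print "A"x1024'` ./a.out with different counts should show the offset move. Address-space layout randomization may also shift the stack between runs, so disabling it (for example with setarch $(uname -m) -R) makes the effect of the padding easier to isolate.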

> So, the 1.8 version is quite good. It's the best for almost all old/slow
> architectures, and my version is still the best for modern/powerful ones.

Come on, apart from your Sandy Bridge result for 1.8, it's virtually equivalent. The nominal difference can be explained by environmental factors, and if not, it's a really low price to pay for a >40% improvement on Atom. Besides, it's actually the "slow" architectures that need optimization more :-)
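The ">40%" figure follows from the Atom row of the table: the "my" column needs 2455 cycles per block against 1700 for 1.8, and 2455/1700 - 1 is about 44%. An illustrative sketch that computes the same relative difference for every row with both numbers (values copied from the table, helper names hypothetical):

```c
#include <assert.h>
#include <stdio.h>

/* How much longer `slower` takes than `faster`, in tenths of a percent,
 * rounded to nearest. Integer arithmetic avoids libm. */
static long excess_tenths_pct(long slower, long faster)
{
    assert(faster > 0 && slower >= faster);
    return ((slower - faster) * 1000 + faster / 2) / faster;
}

struct row { const char *cpu; long v18; long mine; };

/* "1.8" and "my" columns from the table, in cycles per 64-byte block.
 * Sandy Bridge uses the 981 (with shrd) figure. */
static const struct row rows[] = {
    { "P III (Coppermine)", 1574, 1614 },
    { "P4 (Prescott)",      1375, 1450 },
    { "P4 (Northwood)",     2483, 1957 },
    { "AMD Sempron",        1205, 1305 },
    { "AMD K10",             988, 1057 },
    { "Core 2",              985,  984 },
    { "i5 Lynnfield",       1121, 1033 },
    { "Sandy Bridge",       1115,  981 },
    { "Atom",               1700, 2455 },
};

/* Print which column is faster on each CPU and by how much. */
static void print_comparison(void)
{
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        const struct row *r = &rows[i];
        int v18_faster = r->v18 <= r->mine;
        long fast = v18_faster ? r->v18 : r->mine;
        long slow = v18_faster ? r->mine : r->v18;
        long t = excess_tenths_pct(slow, fast);

        printf("%-20s %-4s faster by %ld.%ld%%\n",
               r->cpu, v18_faster ? "1.8" : "my", t / 10, t % 10);
    }
}
```

For the Atom row this gives 44.4% in favour of 1.8, and for Sandy Bridge 13.7% in favour of the "my" column.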


Cheers.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List