The not-taken branch hint in the assembly code causes performance
degradation because the hardware always predicts that branch as not
taken.  Both hinted branches (bdnz-) close a loop, so they are taken on
every iteration except the last.  The hint is not necessary, as the
hardware branch prediction is very good and getting better.  The
attached patch removes the hint and lets the hardware do the
prediction.

To see the performance improvement, build with -mcpu=power7 (or
whichever processor it runs on), since the hints may be ignored if the
compiler defaults to targeting an older version of the hardware
(Power4).

Below are the performance results, built with -mcpu=power7.  Each value
is the percentage change after the branch hint is removed; a positive
number is an improvement.  The measurements were taken with "openssl
speed", and the percentage was calculated from the results with the
hint removed and the base results (with the hint):

Percentage=(withoutHint/withHint) * 100 - 100
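As an illustration, the formula can be written as a small Python
helper.  It assumes both inputs are throughput figures as reported by
"openssl speed" (higher is better), so a positive result means the
unhinted build is faster.  The function name and the numbers in the
usage lines are hypothetical, not taken from the measurements.

```python
def percent_improvement(without_hint: float, with_hint: float) -> float:
    """Percentage change in throughput after removing the branch hint.

    Both arguments are throughputs for the same algorithm and block
    size; a positive result means the build without the hint is faster.
    """
    return (without_hint / with_hint) * 100 - 100


# Hypothetical throughputs, not measured values.
print(round(percent_improvement(132.0, 100.0), 2))  # 32.0
print(percent_improvement(50.0, 100.0))             # -50.0
```

A result of 0 means no change; the sign alone tells whether removing
the hint helped or hurt.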

sha512 shows the largest gain, about 32% for 16- and 64-byte blocks.
sha256, sha1, md4, md5 and rmd160 also benefit from this change, with
the gains largest for small blocks.  There are some negative numbers,
but they are all very small (less than 1%).

type           16 bytes    64 bytes    256 bytes   1024 bytes  8192 bytes
mdc2           1.57        0.43        0.07        0.03        0.01
md4            6.6         6.6         4.47        2.5         0.35
md5            6.9         5.65        3.68        1.44        0.23
hmac(md5)      0.35        0.01        0.42        0.12        0
sha1           7.29        6.46        4.35        2.21        0.42
sha256         18.35       10.99       5.05        1.64        0.24
sha512         31.85       32.08       13.72       4.95        0.67
whirlpool      0.69        0.66        0.44        0.33        0.31
rmd160         6.61        4.96        3.08        1.32        0.36
rc4            -0.01       -0.02       -0.19       -0.14       -0.22
descbc         0.04        -0          0.02        0.07        0.04
desede3        -0.02       0.01        0           -0          0.01
aes-128        0.05        -0          -0          0           0.01
aes-192        0.04        -0.01       -0          0.01        0
aes-256        0.08        -0.01       0           0.01        0.01
aes-128        0.02        0.03        -0.01       -0.01       -0.1
aes-192        0           0.01        0.02        0.01        -0.08
aes-256        0           0.02        0.01        -0          -0.07
ghash          0.51        0.36        0.06        0.02        -0.01
camellia-128   -0.34       -0.03       -0.02       -0.18       -0.69
camellia-192   -0.26       0.22        0.11        0.07        -0.26
camellia-256   -0.23       0.03        -0.01       -0.04       -0.32
idea           0.18        0.07        0.02        0           0.02
seed           0.02        0.06        -0.04       0.04        0.06
rc2            0.04        0           -0          0           0.01
blowfish       0.26        0.09        -0.09       -0.04       -0
cast           0.14        0.04        0.02        0.01        0

Please let me know if you have any questions.

Thanks,
Ashley Lai


diff -ur openssl-1.0.1/crypto/ppccpuid.pl openssl-1.0.1-wp/crypto/ppccpuid.pl
--- openssl-1.0.1/crypto/ppccpuid.pl	2011-11-14 14:52:33.000000000 -0600
+++ openssl-1.0.1-wp/crypto/ppccpuid.pl	2012-04-18 16:47:53.098711478 -0500
@@ -105,7 +105,7 @@
 Little:	mtctr	r4
 	stb	r0,0(r3)
 	addi	r3,r3,1
-	bdnz-	\$-8
+	bdnz	\$-8
 	blr
 Lot:	andi.	r5,r3,3
 	beq	Laligned
@@ -118,7 +118,7 @@
 	mtctr	r5
 	stw	r0,0(r3)
 	addi	r3,r3,4
-	bdnz-	\$-8
+	bdnz	\$-8
 	andi.	r4,r4,3
 	bne	Little
 	blr
