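The not-taken branch hints in the assembly code degrade performance, because the hardware always predicts a hinted branch the way the hint says. The hints are not necessary: the hardware's own branch prediction is very good and keeps getting better. The bdnz- instructions in the ppccpuid.pl hunks below are a clear case. Each one closes a counted loop and is taken on every iteration except the last, so a not-taken hint is wrong almost every time. The attached patch removes the branch hints and lets the hardware do the prediction.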
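To make the loop argument concrete, here is a simplified Python model that counts mispredictions for a counted loop closed by bdnz. It compares a static not-taken hint against a textbook 2-bit saturating counter. This is an illustration only and does not describe the POWER7 branch predictor. The 2-bit counter, its starting state, and the workload (an 8-iteration loop run 3 times) are assumptions chosen for the example.

```python
def loop_outcomes(n):
    """Branch outcomes of a bdnz-closed loop entered with CTR = n (n >= 1).

    bdnz decrements CTR and branches back while CTR != 0, so the branch
    is taken n - 1 times and falls through once.
    """
    return [True] * (n - 1) + [False]


def static_not_taken_misses(outcomes):
    # A fixed "not taken" hint is wrong every time the branch is taken.
    return sum(1 for taken in outcomes if taken)


def two_bit_counter_misses(outcomes, state=1):
    # Simplified dynamic predictor (assumption, not POWER7's design):
    # states 0-1 predict not taken, 2-3 predict taken; starts weakly not taken.
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Saturating update toward the actual outcome.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


if __name__ == "__main__":
    # Hypothetical workload: the same 8-iteration loop executed 3 times.
    outcomes = loop_outcomes(8) * 3
    print("static not-taken misses:", static_not_taken_misses(outcomes))
    print("2-bit counter misses:   ", two_bit_counter_misses(outcomes))
```

With this workload the static hint mispredicts 21 times and the counter 4 times. The hint is wrong on every backward branch, while the counter misses only at each loop exit plus once while it warms up.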
To see the performance improvement, build with -mcpu=power7 (or whichever processor you are running on). If the compiler defaults to targeting an older version of the hardware (Power4), the hints may be ignored. Below are the performance results built with -mcpu=power7.

The measurements were taken with "openssl speed". Each figure is the percentage change between the build with the branch hints removed and the base build (with branch hints):

Percentage = (withoutHint / withHint) * 100 - 100

A positive number is the percentage improvement after the branch hints are removed. sha512 improves by about 32%, and sha256, sha1, md4, and md5 also benefit from this change. The few negative numbers are very small (all under 1% in magnitude).

type          16bytes  64bytes  256bytes  1024bytes  8192bytes
mdc2             1.57     0.43      0.07       0.03       0.01
md4               6.6      6.6      4.47        2.5       0.35
md5               6.9     5.65      3.68       1.44       0.23
hmac(md5)        0.35     0.01      0.42       0.12          0
sha1             7.29     6.46      4.35       2.21       0.42
sha256          18.35    10.99      5.05       1.64       0.24
sha512          31.85    32.08     13.72       4.95       0.67
whirlpool        0.69     0.66      0.44       0.33       0.31
rmd160           6.61     4.96      3.08       1.32       0.36
rc4             -0.01    -0.02     -0.19      -0.14      -0.22
descbc           0.04       -0      0.02       0.07       0.04
desede3         -0.02     0.01         0         -0       0.01
aes-128          0.05       -0        -0          0       0.01
aes-192          0.04    -0.01        -0       0.01          0
aes-256          0.08    -0.01         0       0.01       0.01
aes-128          0.02     0.03     -0.01      -0.01       -0.1
aes-192             0     0.01      0.02       0.01      -0.08
aes-256             0     0.02      0.01         -0      -0.07
ghash            0.51     0.36      0.06       0.02      -0.01
camellia-128    -0.34    -0.03     -0.02      -0.18      -0.69
camellia-192    -0.26     0.22      0.11       0.07      -0.26
camellia-256    -0.23     0.03     -0.01      -0.04      -0.32
idea             0.18     0.07      0.02          0       0.02
seed             0.02     0.06     -0.04       0.04       0.06
rc2              0.04        0        -0          0       0.01
blowfish         0.26     0.09     -0.09      -0.04         -0
cast             0.14     0.04      0.02       0.01          0

Please let me know if you have any questions.

Thanks,
Ashley Lai
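The percentage formula above can be sketched in Python as follows. The function name and the sample throughput values are hypothetical and are not taken from the measurements. "openssl speed" reports throughput (higher is better), so a positive result means the build without the hint is faster.

```python
def improvement_pct(without_hint: float, with_hint: float) -> float:
    """Percentage = (withoutHint / withHint) * 100 - 100.

    Both arguments are throughputs for the same algorithm and block size
    (higher is better), so a positive result is a speedup from removing
    the branch hint and a negative result is a slowdown.
    """
    if with_hint <= 0:
        raise ValueError("with_hint must be a positive throughput")
    return (without_hint / with_hint) * 100 - 100


if __name__ == "__main__":
    # Hypothetical throughputs (e.g. in 1000s of bytes/s), not measured data.
    print(f"{improvement_pct(131.85, 100.0):.2f}")  # a speedup
    print(f"{improvement_pct(99.31, 100.0):.2f}")   # a small slowdown
```

Because the result is relative to the hinted build, the same absolute throughput gain shows up as a larger percentage at small block sizes, where per-call overhead dominates.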
diff -ur openssl-1.0.1/crypto/ppccpuid.pl openssl-1.0.1-wp/crypto/ppccpuid.pl
--- openssl-1.0.1/crypto/ppccpuid.pl	2011-11-14 14:52:33.000000000 -0600
+++ openssl-1.0.1-wp/crypto/ppccpuid.pl	2012-04-18 16:47:53.098711478 -0500
@@ -105,7 +105,7 @@
 Little:	mtctr	r4
 	stb	r0,0(r3)
 	addi	r3,r3,1
-	bdnz-	\$-8
+	bdnz	\$-8
 	blr
 Lot:	andi.	r5,r3,3
 	beq	Laligned
@@ -118,7 +118,7 @@
 	mtctr	r5
 	stw	r0,0(r3)
 	addi	r3,r3,4
-	bdnz-	\$-8
+	bdnz	\$-8
 	andi.	r4,r4,3
 	bne	Little
 	blr
