Another test: "openssl speed"

[Without patch]
 $ openssl speed sha1
Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s
Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s
OpenSSL 1.0.2g 1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) 
blowfish(idx)
compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN 
-DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 
-DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM 
-DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM 
-DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
The 'numbers' are in 1000s of bytes per second processed.
type          16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
sha1         53168.81k    171075.50k    449859.55k    756758.87k    949840.55k

[With patch]
 $ openssl speed sha1
Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s
OpenSSL 1.0.2g 1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) 
blowfish(idx)
compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN 
-DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 
-DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM 
-DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM 
-DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
The 'numbers' are in 1000s of bytes per second processed.
type          16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
sha1         64436.75k    246697.60k    714675.29k   1361115.48k   1851490.30k
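
To put the two runs side by side, here is a minimal sketch that divides the patched throughput by the unpatched throughput for each block size. The array and function names are illustrative only, and the figures are copied from the two tables above (the last unpatched figure is assumed to be in the same "k" unit as the others).

```c
#include <stddef.h>

/* Throughput in 1000s of bytes per second, copied from the two
 * "openssl speed sha1" runs (16, 64, 256, 1024, 8192-byte blocks). */
static const double kbps_without_patch[5] = {
    53168.81, 171075.50, 449859.55, 756758.87, 949840.55
};
static const double kbps_with_patch[5] = {
    64436.75, 246697.60, 714675.29, 1361115.48, 1851490.30
};

/* Ratio of patched to unpatched throughput for block-size index i (0..4).
 * Returns 0.0 for an out-of-range index. */
double sha1_speedup(size_t i)
{
    if (i >= 5)
        return 0.0;
    return kbps_with_patch[i] / kbps_without_patch[i];
}
```

With these numbers the ratio grows with block size, from roughly 1.2x on 16-byte blocks to roughly 1.9x on 8192-byte blocks.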


** Description changed:

  [Impact]
  
  * Context:
  
  AMD added support in their processors for SHA Extensions[1] (CPU flag:
  sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit
  only (Confirmed with AMD representative). Current OpenSSL version in
  Ryzens still calls SHA for SSSE3 routine as result a number of
  extensions were effectively masked on Ryzen and shows no improvement.
  
  [1] /proc/cpuinfo
  processor : 0
  vendor_id : AuthenticAMD
  cpu family : 23
  model : 1
  model name : AMD Ryzen 5 1600 Six-Core Processor
  flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 
constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni 
pclmulqdq monitor ssse3 fma cx16 sse
  4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm 
extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce 
topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall 
fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho
  pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save 
tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
  
  [2] - sha_ni: SHA1/SHA256 Instruction Extensions
  
  [3] - https://en.wikipedia.org/wiki/Ryzen
  ...
  All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, 
CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5]
  ...
  
  * Program to performs the CPUID check:
  
  Reference :
  https://software.intel.com/en-us/articles/intel-sha-extensions
  
  ... Availability of the Intel® SHA Extensions on a particular processor
  can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H,
  ECX=0):EBX.SHA [bit 29]. The following C function, using inline
  assembly, performs the CPUID check:
  
  --
  int CheckForIntelShaExtensions() {
     int a, b, c, d;
  
     // Look for CPUID.7.0.EBX[29]
     // EAX = 7, ECX = 0
     a = 7;
     c = 0;
  
     asm volatile ("cpuid"
          :"=a"(a), "=b"(b), "=c"(c), "=d"(d)
          :"a"(a), "c"(c)
         );
  
     // Intel® SHA Extensions feature bit is EBX[29]
     return ((b >> 29) & 1);
  }
  --
  
  On CPU with sha_ni the program return "1". Otherwise it return "0".
  
  [Test Case]
  
   * Reproducible with Xenial/Zesty/Artful release.
  
   * Generated a checksum of a big file (e.g. 5GB file) with openssl
   $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile
  SHA256(/var/tmp/5Gfile)= 
8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8
  
  real  0m12.835s
  user  0m12.344s
  sys   0m0.484s
  
+ * Openssl speed 
+ $ openssl speed sha1
+ Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s
+ Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s
+ Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s
+ Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s
+ Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s
+ OpenSSL 1.0.2g 1 Mar 2016
+ built on: reproducible build, date unspecified
+ options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) 
blowfish(idx)
+ compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT 
-DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall 
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM 
-DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
+ The 'numbers' are in 1000s of bytes per second processed.
+ type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
+ sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 
+ 
  The performance are clearly better when using the patch which take
- benefit of the sha extension.
+ benefit of the sha extension. (See Regression Potential section for
+ result with patch)
  
  [Regression Potential]
  
   * None expected, it basically allow openssl to take benefit of sha
  extension potential (mostly performance-wise) if AMD cpu has the
  capability.
  
   * Generated a checksum of a big file (e.g. 5GB file) with openssl
   $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile
  SHA256(/var/tmp/5Gfile)= 
8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8
  
  real  0m3.471s
  user  0m2.956s
  sys   0m0.516s
  
+ * Openssl speed 
+ $ openssl speed sha1
+ Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s
+ Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s
+ Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s
+ Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s
+ Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s
+ OpenSSL 1.0.2g 1 Mar 2016
+ built on: reproducible build, date unspecified
+ options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) 
blowfish(idx)
+ compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT 
-DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall 
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM 
-DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
+ The 'numbers' are in 1000s of bytes per second processed.
+ type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
+ sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k 
+ 
  [Other Info]
  
  * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145
  
  * Upstream Repository : https://github.com/openssl/openssl.git
  
  * Upstream Commits :
  1aed5e1 crypto/x86*cpuid.pl: move extended feature detection.
  ## This fix moves extended feature detection past basic feature detection 
where it belongs.
  
  f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards.
  ## This commit for x86_64cpuid.pl addressed the problem, but messed up 
processor vendor detection.
- 
- [Original Description]
- 
- * Context
- 
- AMD added support in their processors for SHA Extensions[1] (CPU flag:
- sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit
- only (Confirmed with AMD representative). Current OpenSSL version in
- Ryzens still calls SHA for SSSE3 routine as result a number of
- extensions were effectively masked on Ryzen and shows no improvement.
- 
- [1] /proc/cpuinfo
- processor     : 0
- vendor_id     : AuthenticAMD
- cpu family    : 23
- model         : 1
- model name    : AMD Ryzen 5 1600 Six-Core Processor
- flags         : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf 
eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse
- 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm 
extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce 
topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall 
fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho
- pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save 
tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
- 
- [2] - sha_ni: SHA1/SHA256 Instruction Extensions
- 
- [3] - https://en.wikipedia.org/wiki/Ryzen
- ...
- All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, 
CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5]
- ...
- 
- * Program to performs the CPUID check
- 
- Reference :
- https://software.intel.com/en-us/articles/intel-sha-extensions
- 
- ... Availability of the Intel® SHA Extensions on a particular processor
- can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H,
- ECX=0):EBX.SHA [bit 29]. The following C function, using inline
- assembly, performs the CPUID check:
- 
- --
- int CheckForIntelShaExtensions() {
-    int a, b, c, d;
- 
-    // Look for CPUID.7.0.EBX[29]
-    // EAX = 7, ECX = 0
-    a = 7;
-    c = 0;
- 
-    asm volatile ("cpuid"
-         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)
-         :"a"(a), "c"(c)
-        );
- 
-    // Intel® SHA Extensions feature bit is EBX[29]
-    return ((b >> 29) & 1);
- }
- --
- 
- On CPU with sha_ni the program return "1". Otherwise it return "0".
- 
- * Upstream work:
- 
- - Repository : https://github.com/openssl/openssl.git
- 
- - Commits :
- 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection.
- ## This fix moves extended feature detection past basic feature detection 
where it belongs.
- 
- f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards.
- ## This commit for x86_64cpuid.pl addressed the problem, but messed up 
processor vendor detection.

** Description changed:

  [Impact]
  
  * Context:
  
  AMD added support in their processors for SHA Extensions[1] (CPU flag:
  sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit
  only (Confirmed with AMD representative). Current OpenSSL version in
  Ryzens still calls SHA for SSSE3 routine as result a number of
  extensions were effectively masked on Ryzen and shows no improvement.
  
  [1] /proc/cpuinfo
  processor : 0
  vendor_id : AuthenticAMD
  cpu family : 23
  model : 1
  model name : AMD Ryzen 5 1600 Six-Core Processor
  flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 
constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni 
pclmulqdq monitor ssse3 fma cx16 sse
  4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm 
extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce 
topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall 
fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho
  pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save 
tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
  
  [2] - sha_ni: SHA1/SHA256 Instruction Extensions
  
  [3] - https://en.wikipedia.org/wiki/Ryzen
  ...
  All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, 
CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5]
  ...
  
  * Program to performs the CPUID check:
  
  Reference :
  https://software.intel.com/en-us/articles/intel-sha-extensions
  
  ... Availability of the Intel® SHA Extensions on a particular processor
  can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H,
  ECX=0):EBX.SHA [bit 29]. The following C function, using inline
  assembly, performs the CPUID check:
  
  --
  int CheckForIntelShaExtensions() {
     int a, b, c, d;
  
     // Look for CPUID.7.0.EBX[29]
     // EAX = 7, ECX = 0
     a = 7;
     c = 0;
  
     asm volatile ("cpuid"
          :"=a"(a), "=b"(b), "=c"(c), "=d"(d)
          :"a"(a), "c"(c)
         );
  
     // Intel® SHA Extensions feature bit is EBX[29]
     return ((b >> 29) & 1);
  }
  --
  
  On CPU with sha_ni the program return "1". Otherwise it return "0".
  
  [Test Case]
  
   * Reproducible with Xenial/Zesty/Artful release.
  
   * Generated a checksum of a big file (e.g. 5GB file) with openssl
   $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile
  SHA256(/var/tmp/5Gfile)= 
8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8
  
  real  0m12.835s
  user  0m12.344s
  sys   0m0.484s
  
- * Openssl speed 
+ * Openssl speed
  $ openssl speed sha1
  Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s
  Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s
  Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s
  Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s
  Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s
  OpenSSL 1.0.2g 1 Mar 2016
  built on: reproducible build, date unspecified
  options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) 
blowfish(idx)
  compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT 
-DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall 
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM 
-DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
  The 'numbers' are in 1000s of bytes per second processed.
  type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
- sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 
+ sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55
  
  The performance are clearly better when using the patch which take
  benefit of the sha extension. (See Regression Potential section for
  result with patch)
  
  [Regression Potential]
+ 
+  * Note from irc discussion with apw and rbasak :
+ 
+ [10:03:20] <apw> slashd, for me some new functionality like that is ok
+ as long as it is very self-contained so easy to review and confirm is
+ only used on the new h/w
+ 
+ [10:03:52] <apw>one of our main goals is to avoid regressions
+ 
+ [10:12:24] <rbasak> The SRU policy does explicitly permit hardware
+ enablement in an LTS IIRC, though I'd expect ~ubuntu-sru to be involved
+ in mitigating risk and making the final risk decision, FWIW.
  
   * None expected, it basically allow openssl to take benefit of sha
  extension potential (mostly performance-wise) if AMD cpu has the
  capability.
  
   * Generated a checksum of a big file (e.g. 5GB file) with openssl
   $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile
  SHA256(/var/tmp/5Gfile)= 
8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8
  
  real  0m3.471s
  user  0m2.956s
  sys   0m0.516s
  
- * Openssl speed 
+ * Openssl speed
  $ openssl speed sha1
  Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s
  Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s
  Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s
  Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s
  Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s
  OpenSSL 1.0.2g 1 Mar 2016
  built on: reproducible build, date unspecified
  options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) 
blowfish(idx)
  compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT 
-DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall 
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM 
-DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
  The 'numbers' are in 1000s of bytes per second processed.
  type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
- sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k 
+ sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k
  
  [Other Info]
  
  * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145
  
  * Upstream Repository : https://github.com/openssl/openssl.git
  
  * Upstream Commits :
  1aed5e1 crypto/x86*cpuid.pl: move extended feature detection.
  ## This fix moves extended feature detection past basic feature detection 
where it belongs.
  
  f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards.
  ## This commit for x86_64cpuid.pl addressed the problem, but messed up 
processor vendor detection.

** Description changed:

  [Impact]
  
  * Context:
  
  AMD added support in their processors for SHA Extensions[1] (CPU flag:
  sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit
  only (Confirmed with AMD representative). Current OpenSSL version in
  Ryzens still calls SHA for SSSE3 routine as result a number of
  extensions were effectively masked on Ryzen and shows no improvement.
  
  [1] /proc/cpuinfo
  processor : 0
  vendor_id : AuthenticAMD
  cpu family : 23
  model : 1
  model name : AMD Ryzen 5 1600 Six-Core Processor
  flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 
constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni 
pclmulqdq monitor ssse3 fma cx16 sse
  4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm 
extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce 
topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall 
fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho
  pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save 
tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
  
  [2] - sha_ni: SHA1/SHA256 Instruction Extensions
  
  [3] - https://en.wikipedia.org/wiki/Ryzen
  ...
  All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, 
CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5]
  ...
  
  * Program to performs the CPUID check:
  
  Reference :
  https://software.intel.com/en-us/articles/intel-sha-extensions
  
  ... Availability of the Intel® SHA Extensions on a particular processor
  can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H,
  ECX=0):EBX.SHA [bit 29]. The following C function, using inline
  assembly, performs the CPUID check:
  
  --
  int CheckForIntelShaExtensions() {
     int a, b, c, d;
  
     // Look for CPUID.7.0.EBX[29]
     // EAX = 7, ECX = 0
     a = 7;
     c = 0;
  
     asm volatile ("cpuid"
          :"=a"(a), "=b"(b), "=c"(c), "=d"(d)
          :"a"(a), "c"(c)
         );
  
     // Intel® SHA Extensions feature bit is EBX[29]
     return ((b >> 29) & 1);
  }
  --
  
  On CPU with sha_ni the program return "1". Otherwise it return "0".
  
  [Test Case]
  
   * Reproducible with Xenial/Zesty/Artful release.
  
   * Generated a checksum of a big file (e.g. 5GB file) with openssl
   $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile
  SHA256(/var/tmp/5Gfile)= 
8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8
  
  real  0m12.835s
  user  0m12.344s
  sys   0m0.484s
  
  * Openssl speed
  $ openssl speed sha1
  Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s
  Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s
  Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s
  Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s
  Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s
  OpenSSL 1.0.2g 1 Mar 2016
  built on: reproducible build, date unspecified
  options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) 
blowfish(idx)
  compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT 
-DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall 
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM 
-DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
  The 'numbers' are in 1000s of bytes per second processed.
  type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
  sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55
  
  The performance are clearly better when using the patch which take
  benefit of the sha extension. (See Regression Potential section for
  result with patch)
  
  [Regression Potential]
  
-  * Note from irc discussion with apw and rbasak :
+  * Note from irc discussion with apw and rbasak :
  
  [10:03:20] <apw> slashd, for me some new functionality like that is ok
  as long as it is very self-contained so easy to review and confirm is
  only used on the new h/w
  
  [10:03:52] <apw>one of our main goals is to avoid regressions
  
  [10:12:24] <rbasak> The SRU policy does explicitly permit hardware
  enablement in an LTS IIRC, though I'd expect ~ubuntu-sru to be involved
  in mitigating risk and making the final risk decision, FWIW.
  
   * None expected, it basically allow openssl to take benefit of sha
- extension potential (mostly performance-wise) if AMD cpu has the
- capability.
+ extension potential (mostly performance-wise) now that new AMD cpu
+ starting  have the capability.
  
   * Generated a checksum of a big file (e.g. 5GB file) with openssl
   $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile
  SHA256(/var/tmp/5Gfile)= 
8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8
  
  real  0m3.471s
  user  0m2.956s
  sys   0m0.516s
  
  * Openssl speed
  $ openssl speed sha1
  Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s
  Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s
  Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s
  Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s
  Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s
  OpenSSL 1.0.2g 1 Mar 2016
  built on: reproducible build, date unspecified
  options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) 
blowfish(idx)
  compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT 
-DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall 
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM 
-DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
  The 'numbers' are in 1000s of bytes per second processed.
  type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
  sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k
  
  [Other Info]
  
  * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145
  
  * Upstream Repository : https://github.com/openssl/openssl.git
  
  * Upstream Commits :
  1aed5e1 crypto/x86*cpuid.pl: move extended feature detection.
  ## This fix moves extended feature detection past basic feature detection 
where it belongs.
  
  f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards.
  ## This commit for x86_64cpuid.pl addressed the problem, but messed up 
processor vendor detection.

** Changed in: openssl (Ubuntu Zesty)
       Status: Triaged => In Progress

** Changed in: openssl (Ubuntu Xenial)
       Status: Triaged => In Progress

** Changed in: openssl (Ubuntu Xenial)
   Importance: Undecided => Medium

** Changed in: openssl (Ubuntu Zesty)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to openssl in Ubuntu.
https://bugs.launchpad.net/bugs/1674399

Title:
  OpenSSL CPU detection for AMD Ryzen CPUs

Status in openssl package in Ubuntu:
  In Progress
Status in openssl source package in Xenial:
  In Progress
Status in openssl source package in Zesty:
  In Progress
Status in openssl source package in Artful:
  In Progress

Bug description:
  [Impact]

  * Context:

  AMD added support for the SHA Extensions[1] (CPU flag: sha_ni[2]) to
  its processors starting with Ryzen[3]. Note that Ryzen CPUs are 64-bit
  only (confirmed with an AMD representative). The current OpenSSL
  version still uses the SSSE3 SHA routine on Ryzen because a number of
  extensions are effectively masked there, so the SHA Extensions bring
  no improvement.

  [1] /proc/cpuinfo
  processor : 0
  vendor_id : AuthenticAMD
  cpu family : 23
  model : 1
  model name : AMD Ryzen 5 1600 Six-Core Processor
  flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
  pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
  rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf
  eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt
  aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm
  sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core
  perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2
  smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero
  arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
  decodeassists pausefilter pfthreshold

  [2] - sha_ni: SHA1/SHA256 Instruction Extensions
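
As a rough illustration of checking for the flag programmatically, the sketch below looks for a whole-word flag in the text of a "flags" line such as the one from /proc/cpuinfo above. The function name has_cpu_flag is hypothetical, and reading the file itself is left to the caller.

```c
#include <string.h>

/* Returns 1 if `flag` appears as a whole whitespace-separated token in
 * `flags` (the text after "flags :" in /proc/cpuinfo), 0 otherwise. */
int has_cpu_flag(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags;

    if (len == 0)
        return 0;

    while ((p = strstr(p, flag)) != NULL) {
        /* A match only counts if it is delimited on both sides. */
        int starts_token = (p == flags) || p[-1] == ' ' ||
                           p[-1] == '\t' || p[-1] == '\n';
        int ends_token = p[len] == '\0' || p[len] == ' ' ||
                         p[len] == '\t' || p[len] == '\n';
        if (starts_token && ends_token)
            return 1;
        p++;
    }
    return 0;
}
```

Matching whole tokens matters here: a plain substring search for "sse" would also hit "ssse3" and "sse4_1".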

  [3] - https://en.wikipedia.org/wiki/Ryzen
  ...
  All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, 
CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5]
  ...

  * Program to perform the CPUID check:

  Reference :
  https://software.intel.com/en-us/articles/intel-sha-extensions

  ... Availability of the Intel® SHA Extensions on a particular
  processor can be determined by checking the SHA CPUID bit in
  CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function,
  using inline assembly, performs the CPUID check:

  --
  int CheckForIntelShaExtensions() {
     int a, b, c, d;

     // Look for CPUID.7.0.EBX[29]
     // EAX = 7, ECX = 0
     a = 7;
     c = 0;

     asm volatile ("cpuid"
          :"=a"(a), "=b"(b), "=c"(c), "=d"(d)
          :"a"(a), "c"(c)
         );

     // Intel® SHA Extensions feature bit is EBX[29]
     return ((b >> 29) & 1);
  }
  --

  On a CPU with sha_ni the program returns "1". Otherwise it returns "0".
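
For comparison, the same check can be sketched without hand-written inline assembly by using the <cpuid.h> helpers shipped with GCC and Clang. This is not the code from the Intel article: the function name cpu_has_sha_extensions is made up for this example, and the guard on the maximum supported CPUID leaf is an extra precaution that the quoted function does not take.

```c
#include <stddef.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns 1 if CPUID.(EAX=07H, ECX=0):EBX bit 29 (SHA) is set, else 0.
 * On non-x86 builds it always returns 0. */
int cpu_has_sha_extensions(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 7 is only meaningful if the CPU reports it as supported. */
    if (__get_cpuid_max(0, NULL) < 7)
        return 0;

    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    (void)eax; (void)ecx; (void)edx;   /* only EBX is needed here */
    return (int)((ebx >> 29) & 1u);
#else
    return 0;
#endif
}
```

Called from a small main, this should return 1 on a CPU whose /proc/cpuinfo lists sha_ni, and 0 otherwise.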

  [Test Case]

   * Reproducible with Xenial/Zesty/Artful release.

   * Generated a checksum of a big file (e.g. 5GB file) with openssl
   $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile
  SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8

  real  0m12.835s
  user  0m12.344s
  sys   0m0.484s

  * Openssl speed
  $ openssl speed sha1
  Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s
  Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s
  Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s
  Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s
  Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s
  OpenSSL 1.0.2g 1 Mar 2016
  built on: reproducible build, date unspecified
  options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) 
blowfish(idx)
  compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT 
-DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall 
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM 
-DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
  The 'numbers' are in 1000s of bytes per second processed.
  type          16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
  sha1         53168.81k    171075.50k    449859.55k    756758.87k    949840.55k

  Performance is clearly better with the patch, which takes advantage
  of the SHA extension. (See the Regression Potential section for the
  results with the patch.)
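
The figures in the summary row can be rederived from the "Doing sha1 ..." lines: hashes completed times block size, divided by the elapsed time and by 1000. The sketch below shows that arithmetic. The function name is hypothetical, and it uses the rounded time that openssl prints, so it is only expected to agree with the table to about the printed precision.

```c
/* Throughput in the unit printed by "openssl speed": 1000s of bytes/second.
 * count      - number of hashes completed ("9969152 sha1's")
 * block_size - bytes hashed per operation ("16 size blocks")
 * seconds    - measured time ("in 3.00s") */
double speed_kbytes_per_sec(unsigned long count, unsigned long block_size,
                            double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return (double)count * (double)block_size / seconds / 1000.0;
}
```

For example, speed_kbytes_per_sec(9969152, 16, 3.00) gives about 53168.81, which matches the 16-byte column of the unpatched run.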

  [Regression Potential]

   * Note from irc discussion with apw and rbasak :

  [10:03:20] <apw> slashd, for me some new functionality like that is ok
  as long as it is very self-contained so easy to review and confirm is
  only used on the new h/w

  [10:03:52] <apw>one of our main goals is to avoid regressions

  [10:12:24] <rbasak> The SRU policy does explicitly permit hardware
  enablement in an LTS IIRC, though I'd expect ~ubuntu-sru to be
  involved in mitigating risk and making the final risk decision, FWIW.

   * None expected. The patch simply allows openssl to take advantage
  of the SHA extension (mostly performance-wise) now that new AMD CPUs
  are starting to have the capability.

   * Generated a checksum of a big file (e.g. 5GB file) with openssl
   $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile
  SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8

  real  0m3.471s
  user  0m2.956s
  sys   0m0.516s
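
To compare this timing with the one in the Test Case section, the fields printed by the shell's time keyword can be converted to seconds and divided. The helper below is a small sketch with a hypothetical name, and it assumes the "<minutes>m<seconds>s" shape shown in the output above.

```c
#include <stdio.h>

/* Converts a `time` field such as "0m12.835s" to seconds.
 * Returns a negative value if the text does not match that shape. */
double time_field_to_seconds(const char *field)
{
    int minutes = 0;
    double seconds = 0.0;

    if (sscanf(field, "%dm%lfs", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}
```

Dividing the unpatched "real" time (0m12.835s) by the patched one (0m3.471s) gives a ratio of roughly 3.7 for the 5GB SHA-256 checksum.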

  * Openssl speed
  $ openssl speed sha1
  Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s
  Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s
  Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s
  Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s
  Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s
  OpenSSL 1.0.2g 1 Mar 2016
  built on: reproducible build, date unspecified
  options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) 
blowfish(idx)
  compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT 
-DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall 
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM 
-DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
  The 'numbers' are in 1000s of bytes per second processed.
  type          16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
  sha1         64436.75k    246697.60k    714675.29k   1361115.48k   1851490.30k

  [Other Info]

  * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145

  * Upstream Repository : https://github.com/openssl/openssl.git

  * Upstream Commits :
  1aed5e1 crypto/x86*cpuid.pl: move extended feature detection.
  ## This fix moves extended feature detection past basic feature
  detection where it belongs.

  f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards.
  ## This commit for x86_64cpuid.pl addressed the problem, but messed up
  processor vendor detection.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
