This change allows use of the AVX512_VBMI/VBMI2 instruction sets to further 
optimize decompression/parsing of polynomial coefficients for ML-KEM. The 
speedup measured in the ML-KEM benchmarks is 0.2 to 0.5% for key generation, 
0.3 to 1.5% for encapsulation, and 0 to 0.9% for decapsulation.
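
For readers unfamiliar with the parsing step, here is a minimal scalar sketch of 
unpacking 12-bit coefficients from a byte stream, following the 3-bytes-to-2-values 
layout in FIPS 203. The class and method names are hypothetical and this is not the 
code in the PR. The mapping to the vector instructions is my assumption: a byte 
permute (vpermb) would place the two bytes holding each value into a 16-bit lane, 
and a per-lane variable shift (vpsrlvw) plus a 0xFFF mask would do what the 
shifts and masks below do.

```java
public class Parse12Sketch {
    // Hypothetical helper (not from the PR): unpack two 12-bit values
    // from every group of three input bytes, little-endian bit order.
    static int[] parse12(byte[] in) {
        int groups = in.length / 3;
        int[] out = new int[groups * 2];
        for (int i = 0; i < groups; i++) {
            int b0 = in[3 * i] & 0xFF;
            int b1 = in[3 * i + 1] & 0xFF;
            int b2 = in[3 * i + 2] & 0xFF;
            // First value: all of b0 plus the low nibble of b1.
            out[2 * i] = b0 | ((b1 & 0x0F) << 8);
            // Second value: high nibble of b1 plus all of b2.
            out[2 * i + 1] = (b1 >>> 4) | (b2 << 4);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] c = parse12(new byte[] {(byte) 0x01, (byte) 0x23, (byte) 0x45});
        System.out.println(Integer.toHexString(c[0]) + " " + Integer.toHexString(c[1]));
    }
}
```

The second value of each pair needs a right shift by 4 while the first needs none, 
which is why a per-lane variable shift is a natural fit for the vectorized form.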

Thank you to @sviswa7 and @ferakocz for their help in working through the early 
stages of this code with me.

-------------

Commit messages:
 - Merge with mainline
 - Swap parameter operation with source
 - Remove wrong mask from evpsrlvw
 - Reverse ordering for vpermb and vpsrlvw instructions
 - Switch from vpshldvw to vpsrlvw
 - Fix whitespaces
 - 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and 
AVX512_VBMI2

Changes: https://git.openjdk.org/jdk/pull/28815/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28815&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8360934
  Stats: 88 lines in 1 file changed: 87 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/28815.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28815/head:pull/28815

PR: https://git.openjdk.org/jdk/pull/28815
