This change uses the AVX512_VBMI/VBMI2 instruction sets to further optimize decompression/parsing of polynomial coefficients for ML-KEM. In the ML-KEM benchmarks, the speedup is 0.2% to 0.5% for key generation, 0.3% to 1.5% for encapsulation, and 0% to 0.9% for decapsulation.
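For readers unfamiliar with the operation being optimized, here is a minimal scalar sketch of 12-bit coefficient parsing, assuming the packed layout from FIPS 203 (every 3 bytes hold two 12-bit values, low bits first). The class and method names are hypothetical and are not taken from the patch. The intrinsic does the equivalent work on many coefficients at once, and the commit messages suggest it uses a byte permute (`vpermb`) followed by per-word variable shifts (`vpsrlvw`).

```java
import java.util.Arrays;

public class Unpack12Sketch {
    // Hypothetical helper, not the JDK's code: expands each group of
    // 3 packed bytes into two 12-bit values stored in 16-bit shorts.
    public static short[] unpack12(byte[] packed) {
        if (packed.length % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        short[] out = new short[packed.length / 3 * 2];
        for (int i = 0, j = 0; i < packed.length; i += 3, j += 2) {
            int b0 = packed[i] & 0xFF;
            int b1 = packed[i + 1] & 0xFF;
            int b2 = packed[i + 2] & 0xFF;
            // First value: all of b0 plus the low nibble of b1.
            out[j] = (short) (b0 | ((b1 & 0x0F) << 8));
            // Second value: the high nibble of b1 plus all of b2.
            out[j + 1] = (short) ((b1 >>> 4) | (b2 << 4));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] packed = { (byte) 0xBC, (byte) 0xFA, (byte) 0xDE };
        // Prints [2748, 3567], i.e. 0xABC and 0xDEF.
        System.out.println(Arrays.toString(unpack12(packed)));
    }
}
```

The scalar loop handles one 3-byte group per iteration. A vectorized version can instead rearrange the bytes so that each 16-bit lane holds the two bytes it needs, then shift and mask all lanes in parallel.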
Thank you to @sviswa7 and @ferakocz for their help in working through the early stages of this code with me.

-------------

Commit messages:
 - Merge with mainline
 - Swap parameter operation with source
 - Remove wrong mask from evpsrlvw
 - Reverse ordering for vpermb and vpsrlvw instructions
 - Switch from vpshldvw to vpsrlvw
 - Fix whitespaces
 - 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2

Changes: https://git.openjdk.org/jdk/pull/28815/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28815&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8360934
  Stats: 88 lines in 1 file changed: 87 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/28815.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28815/head:pull/28815

PR: https://git.openjdk.org/jdk/pull/28815
