Add an AVX2 Montgomery multiplication intrinsic (about 60-80% throughput gain in the benchmarks below).

Also add the reduction step to the existing AVX512 multiplication intrinsic. This was left over from https://github.com/openjdk/jdk/pull/19893, where a quick fix was required. It is mostly cleanup, but it yields a gain of about 1-2%.
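For readers unfamiliar with the operation, here is a minimal scalar sketch of Montgomery multiplication with its final reduction step, written with `BigInteger`. It is not the vectorized intrinsic from this change. The class and method names (`MontgomerySketch`, `montMul`, `mulMod`) and the small modulus in `main` are hypothetical and chosen only for illustration.

```java
import java.math.BigInteger;

// Illustrative only: a textbook Montgomery multiplication, not the code in this PR.
final class MontgomerySketch {

    // Returns a * b * R^-1 mod n, where R = 2^k, n is odd, n < R, and a, b < n.
    // nPrime must equal -n^-1 mod R.
    static BigInteger montMul(BigInteger a, BigInteger b,
                              BigInteger n, int k, BigInteger nPrime) {
        BigInteger mask = BigInteger.ONE.shiftLeft(k).subtract(BigInteger.ONE);
        BigInteger t = a.multiply(b);
        // m is chosen so that t + m * n is divisible by R.
        BigInteger m = t.and(mask).multiply(nPrime).and(mask);
        BigInteger u = t.add(m.multiply(n)).shiftRight(k);
        // Final reduction: u < 2n here, so one conditional subtraction suffices.
        return u.compareTo(n) >= 0 ? u.subtract(n) : u;
    }

    // Computes a * b mod n by converting into and out of Montgomery form.
    static BigInteger mulMod(BigInteger a, BigInteger b, BigInteger n, int k) {
        BigInteger r = BigInteger.ONE.shiftLeft(k);
        BigInteger nPrime = n.modInverse(r).negate().mod(r);
        BigInteger aMont = a.shiftLeft(k).mod(n);   // a * R mod n
        BigInteger bMont = b.shiftLeft(k).mod(n);   // b * R mod n
        BigInteger prodMont = montMul(aMont, bMont, n, k, nPrime);
        // Multiplying by 1 strips the remaining factor of R.
        return montMul(prodMont, BigInteger.ONE, n, k, nPrime);
    }

    public static void main(String[] args) {
        BigInteger n = BigInteger.valueOf(97);      // small odd modulus, R = 2^8
        System.out.println(mulMod(BigInteger.valueOf(5), BigInteger.valueOf(7), n, 8)); // prints 35
    }
}
```

The conditional subtraction at the end of `montMul` is the kind of reduction step the description refers to. A real implementation would typically work on fixed-width limbs and avoid data-dependent branches.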

Before (no AVX512):

Benchmark                              (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
SignatureBench.ECDSA.sign          SHA256withECDSA        1024          256              thrpt   40  3720.589 ± 17.879  ops/s
SignatureBench.ECDSA.sign          SHA256withECDSA       16384          256              thrpt   40  3605.940 ± 15.807  ops/s
SignatureBench.ECDSA.verify        SHA256withECDSA        1024          256              thrpt   40  1076.502 ±  4.190  ops/s
SignatureBench.ECDSA.verify        SHA256withECDSA       16384          256              thrpt   40  1069.624 ±  2.484  ops/s

Benchmark                              (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
KeyAgreementBench.EC.generateSecret           ECDH          256              EC              thrpt   40   830.448 ±  2.285  ops/s

After (with AVX2):

Benchmark                              (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
SignatureBench.ECDSA.sign          SHA256withECDSA        1024          256              thrpt   40  6000.496 ± 39.923  ops/s
SignatureBench.ECDSA.sign          SHA256withECDSA       16384          256              thrpt   40  5739.878 ± 34.838  ops/s
SignatureBench.ECDSA.verify        SHA256withECDSA        1024          256              thrpt   40  1942.437 ± 12.179  ops/s
SignatureBench.ECDSA.verify        SHA256withECDSA       16384          256              thrpt   40  1921.770 ±  8.992  ops/s

Benchmark                              (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
KeyAgreementBench.EC.generateSecret           ECDH          256              EC              thrpt   40  1399.761 ±  6.238  ops/s

Before (with AVX512):

Benchmark                              (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
SignatureBench.ECDSA.sign          SHA256withECDSA        1024          256              thrpt   40  9621.950 ± 27.260  ops/s
SignatureBench.ECDSA.sign          SHA256withECDSA       16384          256              thrpt   40  8975.654 ± 26.707  ops/s
SignatureBench.ECDSA.verify        SHA256withECDSA        1024          256              thrpt   40  3112.945 ± 12.930  ops/s
SignatureBench.ECDSA.verify        SHA256withECDSA       16384          256              thrpt   40  3039.183 ± 12.362  ops/s

Benchmark                              (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
KeyAgreementBench.EC.generateSecret           ECDH          256              EC              thrpt   40  2248.987 ±  7.427  ops/s

After (with AVX512):

Benchmark                              (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
SignatureBench.ECDSA.sign          SHA256withECDSA        1024          256              thrpt   40  9815.713 ± 23.455  ops/s
SignatureBench.ECDSA.sign          SHA256withECDSA       16384          256              thrpt   40  9136.786 ± 27.747  ops/s
SignatureBench.ECDSA.verify        SHA256withECDSA        1024          256              thrpt   40  3167.702 ± 13.331  ops/s
SignatureBench.ECDSA.verify        SHA256withECDSA       16384          256              thrpt   40  3090.053 ± 12.925  ops/s

Benchmark                              (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
KeyAgreementBench.EC.generateSecret           ECDH          256              EC              thrpt   40  2278.031 ±  6.971  ops/s

-------------

Commit messages:
 - whitespace
 - split up ASM and Math changes

Changes: https://git.openjdk.org/jdk/pull/23719/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23719&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8350459
  Stats: 625 lines in 9 files changed: 525 ins; 15 del; 85 mod
  Patch: https://git.openjdk.org/jdk/pull/23719.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23719/head:pull/23719

PR: https://git.openjdk.org/jdk/pull/23719