On Thu, 21 May 2026 00:06:29 GMT, Volodymyr Paprotski <[email protected]> wrote:
>> This PR: >> - changes existing AVX512 SHA3 intrinsic to be more parallel >> - adds an AVX2 SHA3 intrinsic >> - change `SHA3Parallel.java` to NR=4 (to be able to exploit the AVX512 >> parallelism while keeping doubleKeccak for platforms where double >> parallelism is preferable. I experimented with NR=8 as well, does also gain >> a few percent, but I think NR=4 is sufficient tradeoff) >> >> Performance gains: >> - `MessageDigestBench.digest`: >> - AVX2: **16%-39%** >> - AVX512: **24%-33%** >> - `SignatureBench.MLDSA.sign` >> - AVX2: **6-12%** >> - AVX512: **11%-18%** >> - `SignatureBench.MLDSA.verify` >> - AVX2: **2%-14%** >> - AVX512: **31%-40%** >> - `KEMBench.MLKEM` >> - AVX2: **~5%** >> - AVX512: **14%-23%** >> - `KEMBench.JSSE_*` >> - appears unaffected >> >> Note on intrinsics. (As noted in the code..) there are multiple entrypoints >> wrapping the same intrinsic.. >> - `SHA3.implCompress`: single blockSize of user data xored with keccak >> - `DigestBase.implCompressMultiBlock`: loop over user data and xor with >> keccak >> - `SHA3Parallel.doubleKeccak`: (still used for AVX2) no message data, just >> two state vectors >> - `SHA3Parallel.quadKeccak`: (AVX512 benefit) no message data, four state >> vectors >> >> Note 1: `make test >> TEST="micro:org.openjdk.bench.javax.crypto.full.MessageDigestBench >> micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA >> micro:org.openjdk.bench.javax.crypto.full.KEMBench"` >> Note 2: I have left more targeted fuzzing and benchmarks out of this PR, but >> they are preserved at [on my >> branch](https://github.com/vpaprotsk/jdk/compare/sha3-avx-quad...vpaprotsk:jdk:sha3-avx-quad-extras?expand=1). >> If there is something you rather see pulled in.. (otherwise, can include a >> diff in JBS for 'future reference') >> >> --------- >> - [X] I confirm that I make this contribution in accordance with the >> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai). > > Volodymyr Paprotski has updated the pull request incrementally with one > additional commit since the last revision: > > comments, next round done! src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 83: > 81: ATTRIBUTE_ALIGNED(64) static const uint64_t avx2_rotate_consts[] = { > 82: // X0 X0 X1 X3 X1 X3 X2 X4 X2 X4 > 83: 1, 28, 1, 28, 62, 27, 62, 27, // A0_, > A1A3, A2A4 A0_ should be removed from comment as there are no corresponding entries? src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 1055: > 1053: __ movq(T0, Address(buf, 5 * 8)); //b5 > 1054: __ movq(T1, Address(buf, 15 * 8)); //b15 > 1055: __ vshufpd(T0, T0/*b5*/, T1/*b15*/, 0b0000, vector_len); //b5A15 Comment should be //b5b15 ? src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 1061: > 1059: __ movq(T0, Address(buf, 10 * 8)); //b10 > 1060: __ vpxor(T0, T0, A10A20, vector_len); > 1061: __ vshufpd(A10A20, T0/*A10*/, A10A20, 0b1010, vector_len); > //b10A20 Looks to me that this vshufpd could be removed. Or if retained, comment should be //A10A20 ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/31125#discussion_r3284439491 PR Review Comment: https://git.openjdk.org/jdk/pull/31125#discussion_r3291081988 PR Review Comment: https://git.openjdk.org/jdk/pull/31125#discussion_r3291098777
