On Tue, 12 May 2026 08:22:36 GMT, Andrew Dinn <[email protected]> wrote:
>> This PR: >> - changes existing AVX512 SHA3 intrinsic to be more parallel >> - adds an AVX2 SHA3 intrinsic >> - change `SHA3Parallel.java` to NR=4 (to be able to exploit the AVX512 >> parallelism while keeping doubleKeccak for platforms where double >> parallelism is preferable. I experimented with NR=8 as well, does also gain >> a few percent, but I think NR=4 is sufficient tradeoff) >> >> Performance gains: >> - `MessageDigestBench.digest`: >> - AVX2: **16%-39%** >> - AVX512: **24%-33%** >> - `SignatureBench.MLDSA.sign` >> - AVX2: **6-12%** >> - AVX512: **11%-18%** >> - `SignatureBench.MLDSA.verify` >> - AVX2: **2%-14%** >> - AVX512: **31%-40%** >> - `KEMBench.MLKEM` >> - AVX2: **~5%** >> - AVX512: **14%-23%** >> - `KEMBench.JSSE_*` >> - appears unaffected >> >> Note on intrinsics. (As noted in the code..) there are multiple entrypoints >> wrapping the same intrinsic.. >> - `SHA3.implCompress`: single blockSize of user data xored with keccak >> - `DigestBase.implCompressMultiBlock`: loop over user data and xor with >> keccak >> - `SHA3Parallel.doubleKeccak`: (still used for AVX2) no message data, just >> two state vectors >> - `SHA3Parallel.quadKeccak`: (AVX512 benefit) no message data, four state >> vectors >> >> Note 1: `make test >> TEST="micro:org.openjdk.bench.javax.crypto.full.MessageDigestBench >> micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA >> micro:org.openjdk.bench.javax.crypto.full.KEMBench"` >> Note 2: I have left more targeted fuzzing and benchmarks out of this PR, but >> they are preserved at [on my >> branch](https://github.com/vpaprotsk/jdk/compare/sha3-avx-quad...vpaprotsk:jdk:sha3-avx-quad-extras?expand=1). >> If there is something you rather see pulled in.. (otherwise, can include a >> diff in JBS for 'future reference') >> >> --------- >> - [X] I confirm that I make this contribution in accordance with the >> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai). > > src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 96: > >> 94: StubGenerator *stubgen, >> 95: MacroAssembler *_masm) { >> 96: bool multiBlock; > > You have moved the switch statement which verifies the stub id is in the > expected range after the call to `load_archive_data`. Please return it to the > start of this method to ensure we perform validation before trying to > retrieve the stub from the AOT cache. > > Note that you need to follow the original and provide a case switch for all > legitimate cases and a default case which calls `ShouldNotReachHere()`. I remember removing this because "this function is a local (static) function, essentially a private helper in this file.. it is only ever called here.. no need to tripple-check every parameter.. and it will just add complexity.. " (changed my mind!) ```C++ void StubGenerator::generate_sha3_stubs() { bool avx512Available = VM_Version::supports_evex() && VM_Version::supports_avx512bw(); if (UseSHA3Intrinsics) { if (avx512Available) { StubRoutines::_sha3_implCompress = generate_sha3_implCompress_avx512(StubId::stubgen_sha3_implCompress_id, this, _masm); StubRoutines::_sha3_implCompressMB = generate_sha3_implCompress_avx512(StubId::stubgen_sha3_implCompressMB_id, this, _masm); StubRoutines::_double_keccak = generate_sha3_implCompress_avx512(StubId::stubgen_double_keccak_id, this, _masm); StubRoutines::_quad_keccak = generate_sha3_implCompress_avx512(StubId::stubgen_quad_keccak_id, this, _masm); } else { StubRoutines::_sha3_implCompress = generate_sha3_implCompress_avx2(StubId::stubgen_sha3_implCompress_id, this, _masm); StubRoutines::_double_keccak = generate_sha3_implCompress_avx2(StubId::stubgen_double_keccak_id, this, _masm); StubRoutines::_sha3_implCompressMB = generate_sha3_implCompress_avx2(StubId::stubgen_sha3_implCompressMB_id, this, _masm); } } } but.. I compacted your version to just: ```C++ switch(stub_id) { case StubId::stubgen_sha3_implCompress_id: case StubId::stubgen_sha3_implCompressMB_id: case StubId::stubgen_double_keccak_id: case StubId::stubgen_quad_keccak_id: break; default: ShouldNotReachHere(); } In fact.. its 'self-documentation' on ways to call this helper.. works for me, thanks! > src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 624: > >> 622: StubGenerator *stubgen, >> 623: MacroAssembler *_masm) { >> 624: bool multiBlock = stub_id == StubId::stubgen_sha3_implCompressMB_id; > > Please insert a switch statement here like the one employed in the previous > generator to validate the stub id before we call `load_archive_data`. Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/31125#discussion_r3229832536 PR Review Comment: https://git.openjdk.org/jdk/pull/31125#discussion_r3229837964
