On Fri, 6 Mar 2026 12:31:57 GMT, xinyangwu <[email protected]> wrote:

>> ### Summary
>> This PR introduces a parallel intrinsic for AES/ECB operations to replace
>> the current per-block processing approach, reducing native call overhead and
>> improving throughput for multi-block operations.
>>
>> ### Problem
>> Apart from the AVX512 path, the existing AES/ECB implementation suffers from
>> three major performance issues:
>> 1. Excessive stub call overhead: each 16-byte block requires a separate
>>    intrinsic call, resulting in a high invocation frequency.
>>
>> 2. Underused instruction-level parallelism: ECB blocks are independent, but
>>    serialized block processing encrypts them one after another and fails to
>>    fully exploit instruction-level parallelism.
>>
>> 3. Redundant setup/teardown: the encryption state is reinitialized for
>>    every block.
>>
>> ### Changes
>> Added a parallel AES/ECB intrinsic implementation.
>>
>> ### Testing
>> JMH benchmarks
>>
>> It brings about a **37.43%** performance improvement
>> (11518.846 / 8381.499 ≈ 1.374, i.e. about 27% less time per operation).
>>
>> On an Intel(R) Core(TM) i9-14900HX machine with the original implementation:
>>
>>   Benchmark     Mode  Cnt      Score    Error  Units
>>   AesTest.test  avgt    5  11518.846 ± 68.621  ns/op
>>
>> On the same machine with the optimized implementation:
>>
>>   Benchmark     Mode  Cnt      Score    Error  Units
>>   AesTest.test  avgt    5   8381.499 ± 57.751  ns/op
>>
>> All Tier-1 tests pass on linux-x64. This modification does not change
>> the encryption or decryption logic.
>
> xinyangwu has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   changes suggested by sviswa7
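The interleaving idea behind the Problem list can be sketched outside HotSpot with AES-NI compiler intrinsics. This is a minimal illustration, not the PR's stub code: it assumes x86-64 with AES-NI, GCC or Clang (for `__attribute__((target("aes")))`), and AES-128 only, and it expands the key itself to stay self-contained. The names `aes128_expand_key`, `aes128_ecb_encrypt` and `aes128_ecb_selftest` are hypothetical. Because ECB blocks are independent, four AESENC chains run side by side so their latencies can overlap, and one call covers the whole buffer instead of one call per 16-byte block.

```cpp
#include <wmmintrin.h>  // AES-NI intrinsics (also pulls in SSE2)
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption: GCC/Clang; the target attribute enables AES-NI per function,
// so no global -maes flag is needed. The CPU must support AES-NI at run time.
#define AES_NI_TARGET __attribute__((target("aes")))

// One AES-128 key-schedule step: prefix-XOR the four words of the previous
// round key, then XOR in RotWord(SubWord(w3)) ^ rcon broadcast from 'assist'.
static AES_NI_TARGET __m128i expand_step(__m128i key, __m128i assist) {
  assist = _mm_shuffle_epi32(assist, _MM_SHUFFLE(3, 3, 3, 3));
  key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
  key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
  key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
  return _mm_xor_si128(key, assist);
}

// Expand a 16-byte key into 11 round keys. The round constant must be an
// immediate, hence the unrolled calls.
AES_NI_TARGET void aes128_expand_key(const std::uint8_t key[16], __m128i rk[11]) {
  rk[0] = _mm_loadu_si128(reinterpret_cast<const __m128i*>(key));
  rk[1] = expand_step(rk[0], _mm_aeskeygenassist_si128(rk[0], 0x01));
  rk[2] = expand_step(rk[1], _mm_aeskeygenassist_si128(rk[1], 0x02));
  rk[3] = expand_step(rk[2], _mm_aeskeygenassist_si128(rk[2], 0x04));
  rk[4] = expand_step(rk[3], _mm_aeskeygenassist_si128(rk[3], 0x08));
  rk[5] = expand_step(rk[4], _mm_aeskeygenassist_si128(rk[4], 0x10));
  rk[6] = expand_step(rk[5], _mm_aeskeygenassist_si128(rk[5], 0x20));
  rk[7] = expand_step(rk[6], _mm_aeskeygenassist_si128(rk[6], 0x40));
  rk[8] = expand_step(rk[7], _mm_aeskeygenassist_si128(rk[7], 0x80));
  rk[9] = expand_step(rk[8], _mm_aeskeygenassist_si128(rk[8], 0x1b));
  rk[10] = expand_step(rk[9], _mm_aeskeygenassist_si128(rk[9], 0x36));
}

// Single-block encryption: whitening, 9 full rounds, final round.
static AES_NI_TARGET __m128i encrypt_block(__m128i b, const __m128i rk[11]) {
  b = _mm_xor_si128(b, rk[0]);
  for (int r = 1; r < 10; ++r) b = _mm_aesenc_si128(b, rk[r]);
  return _mm_aesenclast_si128(b, rk[10]);
}

// ECB over nblocks 16-byte blocks in one call, four blocks per iteration.
AES_NI_TARGET void aes128_ecb_encrypt(const __m128i rk[11], const std::uint8_t* in,
                                      std::uint8_t* out, std::size_t nblocks) {
  const __m128i* src = reinterpret_cast<const __m128i*>(in);
  __m128i* dst = reinterpret_cast<__m128i*>(out);
  std::size_t i = 0;
  for (; i + 4 <= nblocks; i += 4) {
    // Four independent dependency chains the CPU can pipeline together.
    __m128i b0 = _mm_xor_si128(_mm_loadu_si128(src + i), rk[0]);
    __m128i b1 = _mm_xor_si128(_mm_loadu_si128(src + i + 1), rk[0]);
    __m128i b2 = _mm_xor_si128(_mm_loadu_si128(src + i + 2), rk[0]);
    __m128i b3 = _mm_xor_si128(_mm_loadu_si128(src + i + 3), rk[0]);
    for (int r = 1; r < 10; ++r) {
      b0 = _mm_aesenc_si128(b0, rk[r]);
      b1 = _mm_aesenc_si128(b1, rk[r]);
      b2 = _mm_aesenc_si128(b2, rk[r]);
      b3 = _mm_aesenc_si128(b3, rk[r]);
    }
    _mm_storeu_si128(dst + i, _mm_aesenclast_si128(b0, rk[10]));
    _mm_storeu_si128(dst + i + 1, _mm_aesenclast_si128(b1, rk[10]));
    _mm_storeu_si128(dst + i + 2, _mm_aesenclast_si128(b2, rk[10]));
    _mm_storeu_si128(dst + i + 3, _mm_aesenclast_si128(b3, rk[10]));
  }
  // Tail: the remaining 0..3 blocks, one at a time.
  for (; i < nblocks; ++i)
    _mm_storeu_si128(dst + i, encrypt_block(_mm_loadu_si128(src + i), rk));
}

// Checks the FIPS-197 Appendix C.1 AES-128 vector on five blocks, which
// exercises both the 4-way loop and the single-block tail.
bool aes128_ecb_selftest() {
  const std::uint8_t key[16] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
                                0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
  const std::uint8_t pt[16] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
                               0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff};
  const std::uint8_t ct[16] = {0x69, 0xc4, 0xe0, 0xd8, 0x6a, 0x7b, 0x04, 0x30,
                               0xd8, 0xcd, 0xb7, 0x80, 0x70, 0xb4, 0xc5, 0x5a};
  __m128i rk[11];
  aes128_expand_key(key, rk);
  std::uint8_t in[5 * 16], out[5 * 16];
  for (int b = 0; b < 5; ++b) std::memcpy(in + 16 * b, pt, 16);
  aes128_ecb_encrypt(rk, in, out, 5);
  for (int b = 0; b < 5; ++b)
    if (std::memcmp(out + 16 * b, ct, 16) != 0) return false;
  return true;
}
```

The unroll factor of 4 is an arbitrary choice for illustration; the PR's stub may use a different width and also handles 192- and 256-bit keys.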
I have a couple more minor items.

src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 1436:

> 1434: const Register len_reg = c_rarg3; // src len (must be multiple of blocksize 16)
> 1435: const Register pos = rax;
> 1436: const Register keylen = rbx;

You could use r11 for keylen; then you don't have to push/pop rbx (r11 is caller-saved, whereas rbx is callee-saved).

src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 1575:

> 1573: } //for key_128/192/256
> 1574:
> 1575: __ BIND(L_exit);

It would be good to clear all the xmm registers holding sensitive key values with pxor before returning from the stub, so that no key material is left behind in registers.

-------------

PR Review: https://git.openjdk.org/jdk/pull/29385#pullrequestreview-3905078214
PR Review Comment: https://git.openjdk.org/jdk/pull/29385#discussion_r2897021863
PR Review Comment: https://git.openjdk.org/jdk/pull/29385#discussion_r2897177056
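Inside the generated stub, the suggested fix is a `pxor reg, reg` on each xmm register that held a round key, which portable C++ cannot express. As a loosely analogous sketch only, native code that keeps its own copy of a key schedule in memory can scrub it before returning; the helper name `secure_zero` is hypothetical. Writing through a `volatile` pointer keeps the compiler from treating the stores as dead, which it may do with a plain `memset` on a buffer that is never read again. Platform helpers such as `explicit_bzero` (glibc 2.25+, BSDs) serve the same purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Zero n bytes at p. Each volatile store is an observable side effect, so it
// is not removed as a dead store. This does not reach copies the compiler may
// already have placed in registers or spill slots elsewhere.
void secure_zero(void* p, std::size_t n) {
  volatile std::uint8_t* bytes = static_cast<volatile std::uint8_t*>(p);
  for (std::size_t i = 0; i < n; ++i) bytes[i] = 0;
}
```

For example, a routine holding an expanded AES-128 schedule (11 round keys of 16 bytes) would call `secure_zero(round_keys, sizeof round_keys)` on every exit path.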
