On Fri, 23 Jan 2026 12:10:09 GMT, xinyangwu <[email protected]> wrote:
> ### Summary
> This PR introduces a parallel intrinsic for AES/ECB operations to replace the
> current per-block processing approach, reducing native call overhead and
> improving throughput for multi-block operations.
>
> ### Problem
> Except on platforms that support AVX512, the existing AES/ECB implementation
> suffers from three major performance issues:
> 1. Excessive stub call overhead: each 16-byte block requires a separate
>    intrinsic call, resulting in high invocation frequency.
>
> 2. Poor instruction-level parallelism: serialized block processing does not
>    exploit the fact that ECB blocks are independent of one another.
>
> 3. Redundant setup/teardown: the encryption state is re-initialized for
>    each block.
>
> ### Changes
> Added a parallel AES intrinsic implementation.
>
> ### Testing
> JMH benchmarks
>
> It can bring about a **37.43%** performance improvement
> (11518.846 / 8381.499 - 1, i.e. about 27.2% less time per operation).
>
> On an Intel(R) Core(TM) i9-14900HX CPU machine with the original implementation:
>
>
> Benchmark     Mode  Cnt      Score    Error  Units
> AesTest.test  avgt    5  11518.846 ± 68.621  ns/op
>
>
> On the same machine with the optimized implementation:
>
>
> Benchmark     Mode  Cnt      Score    Error  Units
> AesTest.test  avgt    5   8381.499 ± 57.751  ns/op
>
>
> All Tier-1 tests pass on linux-x64. This modification does not involve
> changing the encryption or decryption logic.

This pull request has now been integrated.

Changeset: 3e9fc5d4
Author:    wuxinyang <[email protected]>
Committer: SendaoYan <[email protected]>
URL:       https://git.openjdk.org/jdk/commit/3e9fc5d49e52d79bcd2bb75068ff7efb31f768fd
Stats:     212 lines in 2 files changed: 209 ins; 0 del; 3 mod

8376164: Optimize AES/ECB implementation using full-message intrinsic stub and parallel RoundKey addition

Reviewed-by: sviswanathan, semery

-------------

PR: https://git.openjdk.org/jdk/pull/29385
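
For readers who want to see why a full-message stub is safe for ECB, here is a minimal Java-level sketch. It is not the HotSpot stub from the changeset and not the `AesTest` benchmark, whose source is not included in this message. The class name `AesEcbSketch` and its helper methods are hypothetical; only the standard `javax.crypto` API is used. It shows that encrypting a buffer block by block yields the same bytes as one multi-block call, which is the property that lets an intrinsic process several blocks per invocation, and it recomputes the improvement figure from the two JMH scores quoted above.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Hypothetical illustration class; not part of the JDK or of the PR.
public class AesEcbSketch {
    static final int BLOCK = 16; // AES block size in bytes

    // Create an AES/ECB encryptor for the given raw key (16, 24 or 32 bytes).
    static Cipher newEncryptor(byte[] key) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/ECB/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
        return c;
    }

    // One call over the whole buffer (length must be a multiple of 16).
    static byte[] encryptWhole(byte[] key, byte[] data) {
        try {
            return newEncryptor(key).doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // One call per 16-byte block, mimicking per-block processing at the API level.
    static byte[] encryptPerBlock(byte[] key, byte[] data) {
        try {
            Cipher c = newEncryptor(key);
            byte[] out = new byte[data.length];
            for (int off = 0; off < data.length; off += BLOCK) {
                // doFinal resets the cipher to its initialized state, so it can be reused.
                c.doFinal(data, off, BLOCK, out, off);
            }
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Throughput gain implied by two average-time scores (ns/op), in percent.
    static double speedupPercent(double oldNsPerOp, double newNsPerOp) {
        return (oldNsPerOp / newNsPerOp - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];       // all-zero demo key, for illustration only
        byte[] data = new byte[4 * BLOCK];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;

        boolean same = Arrays.equals(encryptWhole(key, data), encryptPerBlock(key, data));
        System.out.println("per-block and whole-buffer outputs match: " + same);
        System.out.printf("speedup: %.2f%%%n", speedupPercent(11518.846, 8381.499));
    }
}
```

The comparison only demonstrates block independence in ECB mode; any timing difference between the two methods at the Java level would not be a measurement of the intrinsic change itself.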
