On Fri, 6 Mar 2026 12:31:57 GMT, xinyangwu <[email protected]> wrote:

>> ### Summary
>> This PR introduces a parallel intrinsic for AES/ECB operations to replace 
>> the current per-block processing approach, reducing native call overhead and 
>> improving throughput for multi-block operations.
>> ### Problem
>> Except on CPUs that support AVX512, the existing AES/ECB implementation 
>> suffers from three major performance issues:
>> 1. Excessive stub call overhead: Each 16-byte block requires a separate 
>> intrinsic call, resulting in high invocation frequency
>> 
>> 2. Inefficient instruction-level parallelism: The serialized block 
>> processing fails to fully utilize instruction-level parallelism
>> 
>> 3. Redundant setup/teardown: Repeated initialization of encryption state for 
>> each block
>> ### Changes
>> Added parallel AES intrinsic implementation
>> ### Testing
>> JMH benchmarks
>> 
>> It brings about a **37.43%** throughput improvement in the benchmark 
>> (11518.846 / 8381.499 ≈ 1.3743), i.e. about 27.2% less time per operation.
>> 
>> On an Intel(R) Core(TM) i9-14900HX machine with the original implementation:
>> 
>> 
>> Benchmark     Mode  Cnt      Score    Error  Units
>> AesTest.test  avgt    5  11518.846 ± 68.621  ns/op
>> 
>> 
>> On the same machine with the optimized implementation:
>> 
>> 
>> Benchmark     Mode  Cnt     Score    Error  Units
>> AesTest.test  avgt    5  8381.499 ± 57.751  ns/op
>> 
>> 
>> All Tier-1 tests pass on linux-x64. This modification does not involve 
>> changing the encryption or decryption logic.
>
> xinyangwu has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   changes suggested by sviswa7

I have a couple more minor items.

src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 1436:

> 1434:   const Register len_reg = c_rarg3;  // src len (must be multiple of blocksize 16)
> 1435:   const Register pos     = rax;
> 1436:   const Register keylen  = rbx;

You could use r11 for keylen. It is a caller-saved scratch register, so you 
would not need the push/pop of rbx.
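
To illustrate the reasoning, here is a minimal standalone sketch (not HotSpot code) of when a temporary register forces a save/restore pair. The helper name `save_restore_for` and the string-based register names are hypothetical; the callee-saved set is the System V AMD64 one (Windows x64 additionally preserves rdi and rsi, and r11 is volatile under both).

```cpp
#include <set>
#include <string>
#include <vector>

// General-purpose registers a callee must preserve under the System V
// AMD64 ABI (rsp aside). r11 is not in this set: it is caller-saved.
static const std::set<std::string> kCalleeSaved = {
    "rbx", "rbp", "r12", "r13", "r14", "r15"};

// Hypothetical helper: the extra instructions a stub needs when it uses
// `reg` as a temporary. Caller-saved registers need none.
std::vector<std::string> save_restore_for(const std::string& reg) {
  if (kCalleeSaved.count(reg) == 0) {
    return {};
  }
  return {"push " + reg, "pop " + reg};
}
```

With this model, `save_restore_for("rbx")` yields a push/pop pair while `save_restore_for("r11")` yields nothing, which is the saving suggested above.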

src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 1575:

> 1573:   } //for key_128/192/256
> 1574: 
> 1575:   __ BIND(L_exit);

It would be good to clear all the xmm registers holding sensitive key values 
with pxor before returning from the stub.
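
A rough sketch of the epilogue shape I have in mind, written against a mock so it can be compiled on its own. `MockAssembler` and `emit_clear_keys_and_return` are hypothetical stand-ins, not HotSpot's MacroAssembler API, and which xmm registers actually hold round keys in this stub is an assumption left to the caller.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an assembler: it only records the text of each
// instruction it is asked to emit.
struct MockAssembler {
  std::vector<std::string> code;
  void pxor(int dst, int src) {
    code.push_back("pxor xmm" + std::to_string(dst) +
                   ", xmm" + std::to_string(src));
  }
  void ret() { code.push_back("ret"); }
};

// Zero every xmm register in [first, last] that may hold key material,
// then return. XOR-ing a register with itself leaves it all zeros.
void emit_clear_keys_and_return(MockAssembler& masm, int first, int last) {
  assert(0 <= first && first <= last && last < 16);
  for (int i = first; i <= last; i++) {
    masm.pxor(i, i);
  }
  masm.ret();
}
```

In the real stub the loop would go right after `__ BIND(L_exit);` and cover whichever registers were loaded with round keys.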

-------------

PR Review: https://git.openjdk.org/jdk/pull/29385#pullrequestreview-3905078214
PR Review Comment: https://git.openjdk.org/jdk/pull/29385#discussion_r2897021863
PR Review Comment: https://git.openjdk.org/jdk/pull/29385#discussion_r2897177056
