On Thu, 20 Nov 2025 22:55:07 GMT, Volodymyr Paprotski <[email protected]> 
wrote:

>> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline 
>>    - `SignatureBench.MLDSA` is 1.2x-2.2x faster
>>    - Note: there are no AVX2-SHA3 intrinsics yet (being reviewed at 
>> https://github.com/vpaprotsk/jdk/pull/7)
>> - AVX512 intrinsic improvements are 1.24x-1.5x faster than the current version 
>>   - `SignatureBench.MLDSA` is up to 5% faster, never slower
>> 
>> Note on intrinsic:
>> - The emitted (existing) AVX512 assembler was not "significantly" changed; 
>> mostly more efficient instruction selection and tighter register allocation, 
>> which allowed removal of NTT loop and stack spill.
>> - Code was refactored to allow reuse of same assembler (as possible) for 
>> AVX512 and AVX2
>> 
>> Tests and benchmarks:
>> - Added a fuzz test to ensure Java and intrinsic produce exactly the same result
>> - Added benchmark to measure the performance of intrinsic itself
>> 
>> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java 
>> test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java"
>> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java 
>> test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" 
>> JTREG="JAVA_OPTIONS=-XX:UseAVX=2"
>> make test 
>> TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" 
>> MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions 
>> -XX:+UseDilithiumIntrinsics;FORK=1"
>> make test 
>> TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" 
>> MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions 
>> -XX:-UseDilithiumIntrinsics;FORK=1"
>
> Volodymyr Paprotski has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   next set of comments

The 2.24% improvement is the difference between `+UseDilithiumIntrinsics` and 
`-UseDilithiumIntrinsics`. I just repeated the testing that you documented in 
the description section of this PR on a different machine.

My baseline is simply a build without your changes. I compared this with a 
build containing your changes and see a 2.24% improvement.

Verification showed the smallest improvement (the same as what you observed).

"never worse" is just my way of saying "always faster".

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3572037049
