On Mon, 24 Nov 2025 22:01:17 GMT, Volodymyr Paprotski <[email protected]> wrote:
>> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline >> - `SignatureBench.MLDSA` is 1.2x-2.2x faster >> - Note: there is no AVX2-SHA3 intrinsics yet (Being reviewed >> https://github.com/vpaprotsk/jdk/pull/7) >> - AVX512 intrinsic improvements are 1.24x-1.5x faster then current version >> - `SignatureBench.MLDSA` is upto 5% faster, never slower >> >> Note on intrinsic: >> - The emitted (existing) AVX512 assembler was not "significantly" changed; >> mostly more efficient instruction selection and tighter register allocation, >> which allowed removal of NTT loop and stack spill. >> - Code was refactored to allow reuse of same assembler (as possible) for >> AVX512 and AVX2 >> >> Tests and benchmarks: >> - Added a fuzz test to ensure Java and intrinsic produces exactly same result >> - Added benchmark to measure the performance of intrinsic itself >> >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java >> test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java >> test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" >> JTREG="JAVA_OPTIONS=-XX:UseAVX=2" >> make test >> TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" >> MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions >> -XX:+UseDilithiumIntrinsics;FORK=1" >> make test >> TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" >> MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions >> -XX:-UseDilithiumIntrinsics;FORK=1" > > Volodymyr Paprotski has updated the pull request incrementally with one > additional commit since the last revision: > > spelling Very nice work @vpaprotsk , Please also add in comments the links to original reference implimentation. src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 365: > 363: > 364: static void loadXmms(const XMMRegister destinationRegs[], Register > source, int offset, > 365: int vector_len, MacroAssembler *_masm, int regCnt > = -1, int memStep = -1) { Suggestion: int vector_len, MacroAssembler *_masm, int regCnt = -1, int memStep = -1) { src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 381: > 379: > 380: static void storeXmms(Register destination, int offset, const > XMMRegister xmmRegs[], > 381: int vector_len, MacroAssembler *_masm, int regCnt > = -1, int memStep = -1) { Suggestion: int vector_len, MacroAssembler *_masm, int regCnt = -1, int memStep = -1) { src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 659: > 657: // zetas (int[128*8]) = c_rarg1 > 658: static address generate_dilithiumAlmostInverseNtt_avx(StubGenerator > *stubgen, > 659: int vector_len,MacroAssembler > *_masm) { Fix indentation src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 718: > 716: > 717: // Constants for shuffle and montMul64 > 718: __ mov64(scratch, 0b1010101010101010); 64 bit constant suffix src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 901: > 899: // poly2 (int[256]) = c_rarg2 > 900: static address generate_dilithiumNttMult_avx(StubGenerator *stubgen, > 901: int vector_len, MacroAssembler > *_masm) { Fix indentation src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 939: > 937: vector_len, scratch); // 2^64 mod q > 938: if (vector_len == Assembler::AVX_512bit) { > 939: __ mov64(scratch, 0b0101010101010101); Add long constant suffix src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 985: > 983: // constant (int) = c_rarg1 > 984: static address generate_dilithiumMontMulByConstant_avx(StubGenerator > *stubgen, > 985: int vector_len, MacroAssembler > *_masm) { Fix indentation src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1026: > 1024: __ evpbroadcastd(constant, rConstant, Assembler::AVX_512bit); // > constant multiplier > 1025: > 1026: __ mov64(scratch, 0b0101010101010101); //dw-mask Constant suffix ------------- PR Review: https://git.openjdk.org/jdk/pull/28136#pullrequestreview-3503056034 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2558380867 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2558381318 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2558385868 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2558390552 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2558390067 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2558370904 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2558391135 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2558371478
