On Wed, 2 Apr 2025 07:38:34 GMT, Ferenc Rakoczi <d...@openjdk.org> wrote:
>> By using the AVX-512 vector registers the speed of the computation of the
>> ML-DSA algorithms (key generation, document signing, signature verification)
>> can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Reacting to comment by Sandhya.

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 339:

> 337:
> 338: // levels 2 to 7 are done in 2 batches, by first saving half of the coefficients
> 339: // from level 1 into memory, doing all the level 2 to level 7 computations

In lines 344-347, we seem to be storing all the coefficients from level 1 into memory, rather than half of them as this comment states.

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 345:

> 343:
> 344: store4Xmms(coeffs, 0, xmm0_3, _masm);
> 345: store4Xmms(coeffs, 4 * XMMBYTES, xmm4_7, _masm);

This seems to be an unnecessary store.

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 370:

> 368: loadPerm(xmm16_19, perms, nttL4PermsIdx, _masm);
> 369: loadPerm(xmm12_15, perms, nttL4PermsIdx + 64, _masm);
> 370: load4Xmms(xmm24_27, zetas, 4 * 512, _masm); // for level 3

The comment "// for level 3" is not relevant here and could be removed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2029437396
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2029578599
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2029583308
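For readers following the first comment: the two-batch scheme described in the quoted comment at lines 338-339 works because, in a 256-point radix-2 NTT, only the first level pairs coefficients across the two 128-coefficient halves. The C++ sketch below checks that index pattern. It assumes levels are numbered 0 to 7 with butterfly distance 128 >> level, following the loop order of the FIPS 204 NTT; kN, butterflyDistance and levelStaysWithinHalves are hypothetical helpers for illustration and are not part of the HotSpot stub generator.

```cpp
// Illustrative sketch only: these helpers are hypothetical and not taken
// from stubGenerator_x86_64_dilithium.cpp.

// Number of coefficients in an ML-DSA polynomial.
constexpr int kN = 256;

// Assumption: levels are numbered 0..7 and the butterfly distance at
// level L is 128 >> L (128, 64, 32, ..., 1), as in the FIPS 204 NTT loop.
constexpr int butterflyDistance(int level) { return 128 >> level; }

// Returns true if every butterfly pair (j, j + len) at the given level has
// both indices on the same side of `boundary`, i.e. the halves
// [0, boundary) and [boundary, kN) can be processed independently.
bool levelStaysWithinHalves(int level, int boundary = kN / 2) {
  const int len = butterflyDistance(level);
  // Butterflies are grouped in blocks of 2 * len starting at multiples of 2 * len.
  for (int start = 0; start < kN; start += 2 * len) {
    for (int j = start; j < start + len; ++j) {
      const bool lowInFirstHalf = j < boundary;
      const bool highInFirstHalf = (j + len) < boundary;
      if (lowInFirstHalf != highInFirstHalf) {
        return false;  // this pair straddles the boundary
      }
    }
  }
  return true;
}
```

Under these assumptions, levelStaysWithinHalves(0) is false and levels 1 through 7 all return true, so once level 1 is done, one half of the coefficients (128 × 4 bytes = 512 bytes, i.e. eight 64-byte ZMM registers) can be kept in memory while levels 2 to 7 run on the other half.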