On Thu, 20 Mar 2025 21:06:30 GMT, Ferenc Rakoczi <d...@openjdk.org> wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 58: >> >>> 56: >>> 57: ATTRIBUTE_ALIGNED(64) static const uint32_t dilithiumAvx512Perms[] = { >>> 58: // collect montmul results into the destination register >> >> same as `dilithiumAvx512Consts()`, 'magic offsets'; except here they are >> harder to count (eg. not clear visually what is the offset of `ntt inverse`). >> >> Could be split into three constant arrays to make the compiler count for us > > Well, it is 64 bytes per line (16 4-byte uint32_ts), not that hard :-) ... Ha! I didn't realize it was 16 per line.. ran out of fingers while counting!!! :) 'works for me, as long as its a "premeditated" decision' >> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 980: >> >>> 978: // Dilithium multiply polynomials in the NTT domain. >>> 979: // Implements >>> 980: // static int implDilithiumNttMult( >> >> I suppose no java changes in this PR, but I notice that the inputs are all >> assumed to have fixed size. >> >> Most/all intrinsics I worked with had some sort of guard (eg >> `Objects.checkFromIndexSize`) right before the intrinsic java call. (It >> usually looks like it can be optimized away). But I notice no such guard >> here on the java side. > > These functions will not be used anywhere else and in ML_DSA.java all of the > arrays passed to inrinsics are of the correct size. Works for me; just thought I would point it out, so its a 'premeditated' decision. >> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1012: >> >>> 1010: __ evmovdqul(xmm28, Address(perms, 0), Assembler::AVX_512bit); >>> 1011: >>> 1012: __ movl(len, 4); >> >> Compile-time constant, why not 'unroll at compile time'? i.e. wrap this loop >> with `for (int len=0; len<4; len++)` instead? > > I have found that unrolling these loops actually hurts performance (probably > an I-cache effect. Interesting; I keep on having to re-train my intuition, thanks for the data ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2008806159 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2008805574 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2008805113