On Thu, 10 Apr 2025 13:19:05 GMT, Ferenc Rakoczi <d...@openjdk.org> wrote:
>> By using the aarch64 vector registers the speed of the computation of the
>> ML-KEM algorithms (key generation, encapsulation, decapsulation) can be
>> approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally with two additional
> commits since the last revision:
>
>  - Code rearrange, some renaming, fixing comments
>  - Changes suggested by Andrew Dinn.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5278:

> 5276:     // level 4
> 5277:     vs_ldpq(vq, kyberConsts);
> 5278:     int offsets3[8] = { 0, 32, 64, 96, 128, 160, 192, 224 };

I'd like to add a comment here to explain the coefficient grouping, and
likewise at levels 5 and 6. So here we have:

    // Up to level 3 the coefficients multiplied by, added to or subtracted
    // from the zetas occur in discrete blocks whose size is some multiple
    // of 32. At level 4 coefficients occur in 8 discrete blocks of size 16
    // so they are loaded using an ldr at 8 distinct offsets.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23663#discussion_r2037560706