On Thu, 10 Apr 2025 13:19:05 GMT, Ferenc Rakoczi <d...@openjdk.org> wrote:
>> By using the aarch64 vector registers the speed of the computation of the
>> ML-KEM algorithms (key generation, encapsulation, decapsulation) can be
>> approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally with two additional
> commits since the last revision:
>
>  - Code rearrange, some renaming, fixing comments
>  - Changes suggested by Andrew Dinn.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5661:

> 5659:     // load 16 zetas
> 5660:     vs_ldpq_post(vz, zetas);
> 5661:     // load 2 sets of 32 coefficients from the two input arrays

Suggestion:

    // load 2 sets of 32 coefficients from the two input arrays
    // interleaved as shorts. i.e. pairs of shorts adjacent in memory
    // are striped across pairs of vector registers

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23663#discussion_r2042093533
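
For readers unfamiliar with the layout the suggested comment describes, here is a minimal scalar sketch of "pairs of shorts adjacent in memory are striped across pairs of vector registers". It assumes the load behaves like a de-interleaving structure load (for example AArch64 LD2 with 8 halfword lanes); `Vec8H` and `load_interleaved` are hypothetical names for illustration and are not part of the stub generator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical model of one 128-bit vector register viewed as 8 shorts.
using Vec8H = std::array<int16_t, 8>;

// Scalar model of a de-interleaving load of 16 consecutive shorts.
// Adjacent pairs (src[2i], src[2i+1]) are split so that the first
// element of each pair lands in lane i of one register and the second
// element in lane i of the other.
std::pair<Vec8H, Vec8H> load_interleaved(const int16_t* src) {
    assert(src != nullptr);
    Vec8H even{};
    Vec8H odd{};
    for (std::size_t i = 0; i < 8; ++i) {
        even[i] = src[2 * i];      // shorts at even indices
        odd[i]  = src[2 * i + 1];  // shorts at odd indices
    }
    return {even, odd};
}
```

With memory holding the shorts 0, 1, 2, ..., 15, the first register in this model receives the values at even indices and the second receives those at odd indices, which is the striping the comment is meant to document.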