On Tue, 18 Feb 2025 13:43:18 GMT, Andrew Dinn <ad...@openjdk.org> wrote:
>> Ferenc Rakoczi has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Adding comments + some code reorganization > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4066: > >> 4064: } >> 4065: >> 4066: // Execute on round of keccak of two computations in parallel. > > Suggestion: > > It would be helpful to add comments that relate the register and instruction > selection to the original Java source code. e.g. change the header as follows > > // Performs 2 keccak round transformations using vector parallelism > // > // Two sets of 25 * 64-bit input states a0[lo:hi]...a24[lo:hi] are passed > in > // the lower/upper halves of registers v0...v24 and the transformed states > // are returned in the same registers. Intermediate 64-bit pairs > // c0...c5 and d0...d5 are computed in registers v25...v30. v31 is > // loaded with the required pair of 64 bit rounding constants. > // During computation of the output states some intermediate results are > // shuffled around registers v0...v30. Comments on each line indicate > // how the values in registers correspond to variables ai, ci, di in > // the Java source code, likewise how the generated machine instructions > // correspond to Java source operations (n.b. rol means rotate left). > > The annotate the generation steps as follows: > > __ eor3(v29, __ T16B, v4, v9, v14); // c4 = a4 ^ a9 ^ a14 > __ eor3(v26, __ T16B, v1, v6, v11); // c1 = a1 ^ a16 ^ a11 > __ eor3(v28, __ T16B, v3, v8, v13); // c3 = a3 ^ a8 ^a13 > __ eor3(v25, __ T16B, v0, v5, v10); // c0 = a0 ^ a5 ^ a10 > __ eor3(v27, __ T16B, v2, v7, v12); // c2 = a2 ^ a7 ^ a12 > __ eor3(v29, __ T16B, v29, v19, v24); // c4 ^= a19 ^ a24 > __ eor3(v26, __ T16B, v26, v16, v21); // c1 ^= a16 ^ a21 > __ eor3(v28, __ T16B, v28, v18, v23); // c3 ^= a18 ^ a23 > __ eor3(v25, __ T16B, v25, v15, v20); // c0 ^= a15 ^ a20 > __ eor3(v27, __ T16B, v27, v17, v22); // c2 ^= a17 ^ a22 > > __ rax1(v30, __ T2D, v29, v26); // d0 = c4 ^ rol(c1, 1) > __ rax1(v26, __ T2D, v26, v28); // d2 = c1 ^ rol(c3, 1) > __ rax1(v28, __ T2D, v28, v25); // d4 = c3 ^ rol(c0, 1) > __ rax1(v25, __ T2D, v25, v27); // d1 = c0 ^ rol(c2, 1) > __ rax1(v27, __ T2D, v27, v29); // d3 = c2 ^ rol(c4, 1) > > __ eor(v0, __ T16B, v0, v30); // a0 = a0 ^ d0 > __ xar(v29, __ T2D, v1, v25, (64 - 1)); // a10' = rol((a1^d1), 1) > __ xar(v1, __ T2D, v6, v25, (64 - 44)); // a1 = rol(a6^d1), 44) > __ xar(v6, __ T2D, v9, v28, (64 - 20)); // a6 = rol((a9^d4), 20) > __ xar(v... Although this piece of code is not new, and I don't really think that this level of commenting is necessary, especially in code that is very unlikely to change, I added the comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1965220606