On Thu, 23 Oct 2025 01:39:02 GMT, Ben Perez <[email protected]> wrote:
> An aarch64 implementation of the `MontgomeryIntegerPolynomial256.mult()` > method On 29/10/2025 20:18, Ben Perez wrote: > Is there by any chance documentation for `RegSet` that I can reference while > making these changes? No, but it's all there in the source. -- Andrew Haley (he/him) Java Platform Lead Engineer https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7144: > 7142: > 7143: address generate_intpoly_montgomeryMult_P256() { > 7144: As a general point, it would help everyone if you provided pseudocode for the whole thing. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7160: > 7158: }; > 7159: > 7160: Register c_ptr = r9; `rscratch1` and `rscratch2` are used freely by macros, so aliasing them is always rather sketchy. As far as I can tell the arg registers aren't used here, so it makes sense to use `r3`... src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7169: > 7167: Register mul_tmp = r14; > 7168: Register n = r15; > 7169: Here, you could do something like RegSet scratch = RegSet::range(r3, r28) - rscratch1 - rscratch2; { auto r_it = scratch.begin(); Register c_ptr = *r_it++, a_i = *r_it++, c_idx = *r_it++, //c_idx is not used at the same time as a_i limb_mask_scalar = *r_it++, b_j = *r_it++, mod_j = *r_it++, mod_ptr = *r_it++, mul_tmp = *r_it++, n = *r_it++; ... } Note that a RegSet iterator doesn't affect the RegSet it was created from, so once this block has ended you can allocate again from the set of scratch registers. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7201: > 7199: __ mov(limb_mask_scalar, 1); > 7200: __ neg(limb_mask_scalar, limb_mask_scalar); > 7201: __ lsr(limb_mask_scalar, limb_mask_scalar, 12); Suggestion: __ mov(limb_mask_scalar, -UCONST64(1) >> (64 - BITS_PER_LIMB)); src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7229: > 7227: __ shl(high_01, __ T2D, high_01, shift1); > 7228: __ ushr(tmp, __ T2D, low_01, shift2); > 7229: __ orr(high_01, __ T2D, high_01, tmp); Suggestion: __ orr(high_01, __ T16B, high_01, tmp); everywhere. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7403: > 7401: // c3 &= LIMB_MASK; > 7402: > 7403: __ ldr(mod_j, __ post(mod_ptr, 8)); Best not to use post-increment if you can avoid it. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7440: > 7438: > 7439: Register res_0 = r9; > 7440: Register res_1 = r10; Aliasing the same register with different names is very dangerous, and has cause hard-to-find failures in production code in the past. You can confine the `Register` instances to block scope. You can also suffix or prefix the local names with canonical register names. Best of all is to get rid of the manual register allocation altogether, by creating a `RegSet`, then adding and removing registers that you need, as you go along. That way the need to manually check register usage goes away altogether. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27946#issuecomment-3466766156 PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2460515105 PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2460510697 PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2461123587 PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2460582673 PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2461125143 PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2460621762 PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2460612964
