Re: RFR: 8355216: Accelerate P-256 arithmetic on aarch64 [v4]

Ben Perez Wed, 04 Feb 2026 12:53:27 -0800

On Mon, 26 Jan 2026 13:39:10 GMT, Andrew Dinn wrote:


>> Ben Perez has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Added conditionalAssign() intrinsic, changed mult intrinsic to use hybrid 
>> neon/gpr approach
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7282:
> 
>> 7280:     __ mov(mul_ptr, sp);
>> 7281: 
>> 7282:     __ umullv(A[0], __ T2D, b_lows, __ T2S, a_vals, __ S, 0);
> 
> I notice that this four-line instruction pattern recurs several times. You 
> might consider defining a macro generator method to generate these sequences, 
> e.g.
> 
> void vs_umullv(VSeq<4> vs, FloatRegister bs, FloatRegister as, int lane_lo) {
>     __ umullv(vs[0],  __ T2D, bs, __ T2S, as, __ S, lane_lo);
>     __ umull2v(vs[1], __ T2D, bs, __ T4S, as, __ S, lane_lo);
>     __ umullv(vs[2],  __ T2D, bs, __ T2S, as, __ S, lane_lo + 2);
>     __ umull2v(vs[3], __ T2D, bs, __ T4S, as, __ S, lane_lo + 2);
> }
> 
> So, in this case you would call
> 
>   vs_umullv(A, b_lows, a_vals, 0);
> 
> and in other cases you can simply vary which arguments you provide for `vs`, 
> `as`, `bs` and `lane_lo`.
> 
> I'm suggesting that because the various cross multiplies generated by this 
> macro-method appear to implement a specific subsequence of the mont mul 
> computation (similar to what is done in e.g. the kyber code). So, this 
> generator method abstraction should also abstract a clear step in any 
> pseudo-code algorithm you provide and ought therefore to make clear /that/ 
> (also make clear /how/) the generated code implements the algorithm. If you 
> provide such a pseudo-code algorithm then we might even be able to come up with 
> a better name for the generator method (e.g. montmul_uproduct or something 
> else that reflects what is happening here).

Great idea. I've just split that sequence out into a separate method.
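
For anyone following the thread, the per-lane arithmetic of the four-instruction sequence quoted above can be sketched in portable C++. This is a scalar model only, not the stub generator code: `Vec4S`, `Vec2D`, `umull_lane`, `umull2_lane` and `model_vs_umullv` are hypothetical names introduced for illustration, and the sketch assumes the assembler methods follow the architectural UMULL/UMULL2 (by element) semantics, i.e. the low (or high) two 32-bit lanes of the first source are each multiplied by one selected 32-bit lane of the second source to give two 64-bit products.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for a 128-bit vector register viewed as
// four 32-bit lanes (4S) or two 64-bit lanes (2D).
using Vec4S = std::array<uint32_t, 4>;
using Vec2D = std::array<uint64_t, 2>;

// Model of UMULL Vd.2D, Vn.2S, Vm.S[lane]:
// widen-multiply the LOW two 32-bit lanes of n by one lane of m.
Vec2D umull_lane(const Vec4S& n, const Vec4S& m, int lane) {
    return { uint64_t(n[0]) * m[lane], uint64_t(n[1]) * m[lane] };
}

// Model of UMULL2 Vd.2D, Vn.4S, Vm.S[lane]:
// widen-multiply the HIGH two 32-bit lanes of n by one lane of m.
Vec2D umull2_lane(const Vec4S& n, const Vec4S& m, int lane) {
    return { uint64_t(n[2]) * m[lane], uint64_t(n[3]) * m[lane] };
}

// Model of the suggested generator: four result vectors holding the
// products of all four lanes of bs with lanes lane_lo and lane_lo + 2 of as.
// Assumes 0 <= lane_lo <= 1 so that lane_lo + 2 stays in range.
std::array<Vec2D, 4> model_vs_umullv(const Vec4S& bs, const Vec4S& as, int lane_lo) {
    return {{
        umull_lane(bs, as, lane_lo),
        umull2_lane(bs, as, lane_lo),
        umull_lane(bs, as, lane_lo + 2),
        umull2_lane(bs, as, lane_lo + 2),
    }};
}
```

Under those assumptions, `model_vs_umullv(bs, as, 0)` corresponds to the `vs_umullv(A, b_lows, a_vals, 0)` call suggested in the review, with element `i` of the result playing the role of `A[i]`.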

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2765923257
