Re: RFR: 8355216: Accelerate P-256 arithmetic on aarch64 [v6]

Ben Perez Tue, 17 Feb 2026 13:31:01 -0800

On Mon, 9 Feb 2026 11:31:00 GMT, Andrew Dinn wrote:


>> Ben Perez has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   fixed indexing bug in vs_ldpq, simplified vector loads in 
>> generate_intpoly_assign()
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7988:
> 
>> 7986:       __ ld1(a_vec[0], __ T2D, aLimbs);
>> 7987:       __ ldpq(a_vec[1], a_vec[2], Address(aLimbs, 16));
>> 7988:       __ ldpq(a_vec[3], a_vec[4], Address(aLimbs, 48));
> 
> I notice that here and elsewhere you have a 5-vector sequence and hence are 
> not using vs_ldpq/stpq operations (because they only operate on even-length 
> sequences). However, if you add a bit of extra 'apparatus' to 
> register_aarch64.hpp you can then use the vs_ldpq/stpq operations.
> 
> Your code processes the first register individually via ld1/st1 and then the 
> remaining registers using a pair of loads, i.e. it operates as if the latter 
> were a VSeq<4>. So, in register_aarch64.hpp you can add these functions:
> 
> 
> template<int N>
> FloatRegister vs_head(const VSeq<N>& v) {
>   static_assert(N > 1, "sequence length must be greater than 1");
>   return v.base();
> }
> 
> template<int N>
> VSeq<N - 1> vs_tail(const VSeq<N>& v) {
>   // N is taken from the argument type so that it can be deduced at the call site
>   static_assert(N > 2, "source sequence length must be greater than 2");
>   return VSeq<N - 1>(v.base() + v.delta(), v.delta());
> }
> 
> With those functions available you should be able to do all these VSeq<5> loads 
> and stores using an ld1/st1 followed by a vs_ldpq_indexed or vs_stpq_indexed 
> with a suitable start index and the same constant offset array e.g. here you 
> could use
> 
> Suggestion:
> 
>       int offsets[2] = { 0, 32 };
>       __ ld1(vs_head(a_vec), __ T2D, aLimbs);
>       vs_ldpq_indexed(vs_tail(a_vec), aLimbs, 16, offsets);

Made these changes except I opted to use `__ ld1(a_vec[0], __ T2D, aLimbs)` 
instead of a `vs_head` method because it seemed easier to read. Happy to add 
that though.
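
For reference, here is a standalone sketch of the head/tail split discussed 
above. It is not HotSpot code: `MockVSeq` is a hypothetical, simplified 
stand-in for `VSeq` that tracks register indices as plain ints, and 
`load_plan` is an illustrative helper that only records which byte offset 
each register would be loaded from. It assumes that the indexed paired load 
reads pair k from `start + offsets[k]`, which is consistent with the offsets 
16 and 48 used in the original `ldpq` calls.

```cpp
#include <array>
#include <cassert>
#include <utility>

// Hypothetical mock of a vector register sequence: N register indices
// starting at `base` and stepping by `delta`. Not the real HotSpot VSeq.
template<int N>
class MockVSeq {
  int _base;
  int _delta;
public:
  explicit MockVSeq(int base, int delta = 1) : _base(base), _delta(delta) {
    // Only v0..v31 exist on aarch64.
    assert(base >= 0 && delta >= 0 && base + (N - 1) * delta < 32);
  }
  int base() const { return _base; }
  int delta() const { return _delta; }
  int operator[](int i) const { return _base + i * _delta; }
};

// First register index of the sequence.
template<int N>
int vs_head(const MockVSeq<N>& v) {
  static_assert(N > 1, "sequence length must be greater than 1");
  return v.base();
}

// Remaining N-1 registers. The parameter is MockVSeq<N>, not MockVSeq<N+1>,
// so that N can be deduced from the argument.
template<int N>
MockVSeq<N - 1> vs_tail(const MockVSeq<N>& v) {
  static_assert(N > 2, "source sequence length must be greater than 2");
  return MockVSeq<N - 1>(v.base() + v.delta(), v.delta());
}

// Illustrative helper: (register index, byte offset) for each of the five
// 16-byte loads, i.e. one single load for the head followed by two paired
// loads for the tail.
inline std::array<std::pair<int, int>, 5> load_plan(const MockVSeq<5>& v) {
  const int start = 16;             // tail begins after the head's 16 bytes
  const int offsets[2] = { 0, 32 }; // one entry per register pair
  std::array<std::pair<int, int>, 5> plan{};
  plan[0] = { vs_head(v), 0 };
  MockVSeq<4> tail = vs_tail(v);
  for (int k = 0; k < 2; k++) {
    plan[1 + 2 * k] = { tail[2 * k],     start + offsets[k] };
    plan[2 + 2 * k] = { tail[2 * k + 1], start + offsets[k] + 16 };
  }
  return plan;
}
```

With a sequence starting at register 0, the second pair lands at offsets 48 
and 64, matching `Address(aLimbs, 48)` in the code under review.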

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2819156616
