On Wed, 17 Jun 2026 00:02:25 GMT, Shawn Emery <[email protected]> wrote:

>> Curve25519 polynomial arithmetic is performed with intrinsics implemented 
>> in GPR related instructions for multiplication operations (method mult()). 
>> Benchmark improvements include:
>> 
>> X25519 decapsulation: +9%
>> X25519 encapsulation: +9%
>> X25519 key agreement: +7%
>> X25519 key-pair generation: +10%
>> X25519-MLKEM decapsulation: +7%
>> X25519-MLKEM encapsulation: +8%
>> X25519-MLKEM key-pair generation: +8%
>> EdDSA sign: +12%
>> EdDSA verify: +12%
>> EdDSA key-pair generation: +15%
>> 
>> Note 1: The differences between the AArch64 and x86_64 intrinsics 
>> implementations include the lack of square() intrinsics; their use caused 
>> a 3.3% performance regression due to the efficiencies of the symmetric 
>> squaring shape in Java vs. the inefficiencies of the leaf calls and the 
>> additional cycles required for 64-bit multiplication on AArch64.
>> Note 2: The GPR related instructions outperformed a hybrid solution (GPR 
>> related instructions for the first two iterations and Neon instructions 
>> for the last two iterations).  The hybrid design produced a 4%/1% 
>> performance drop in KEM decapsulation/encapsulation compared to the GPR 
>> related instructions, because the overhead of performing the limb splits 
>> and reconstruction was not offset by the efficiencies of SIMD 
>> parallelism.
>> 
>> ---------
>> - [X] I confirm that I make this contribution in accordance with the 
>> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
>
> Shawn Emery has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Update based on adinn's comments

Please merge from master to get Windows AArch64 tested. I suspect it will show 
up the following issues:

Ah, Windows AArch64 already _does_ show the problem I noted above:


c:\a\jdk\jdk\src\hotspot\cpu\aarch64\stubGenerator_aarch64.cpp(7721): error C2220: the following warning is treated as an error
c:\a\jdk\jdk\src\hotspot\cpu\aarch64\stubGenerator_aarch64.cpp(7721): warning C4293: '<<': shift count negative or too big, undefined behavior

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7677:

> 7675: 
> 7676:   /**
> 7677:    * Arithmetic polynomial multiplicaiton in Curve25519.  The algorithm 
> mimics

Suggestion:

   * Arithmetic polynomial multiplication in Curve25519.  The algorithm mimics

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7721:

> 7719:     const int32_t columns    = limbs * 2;
> 7720:     const uint64_t mask      = -1UL >> rem;
> 7721:     const uint64_t CARRY_ADD = 1UL << (bpl - 1);

On MSVC, `UL` is `unsigned long`, which is apparently only 32 bits wide, so 
shifting `<< 51` breaks.
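
One possible way to avoid depending on the width of `unsigned long` is to 
build the constants from an explicitly 64-bit literal. The following is only 
a sketch: the helper names `carry_add` and `low_mask` are hypothetical, and 
the values 51 and 13 used for `bpl` and `rem` are assumptions, since neither 
definition appears in the quoted lines.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not taken from the patch. They only illustrate
// building 64-bit constants without relying on the width of `unsigned long`,
// which is 32 bits on Windows (LLP64) and 64 bits on Linux/macOS (LP64).

// Value with only bit (bits_per_limb - 1) set.
// Requires 1 <= bits_per_limb <= 64.
constexpr uint64_t carry_add(int bits_per_limb) {
  return UINT64_C(1) << (bits_per_limb - 1);
}

// All-ones 64-bit value shifted right by `rem`, i.e. the low (64 - rem)
// bits set. Requires 0 <= rem < 64.
constexpr uint64_t low_mask(int rem) {
  return ~UINT64_C(0) >> rem;
}

// Assumed example values: 51 bits per limb, rem = 64 - 51 = 13.
static_assert(carry_add(51) == 0x4000000000000ULL, "bit 50 set");
static_assert(low_mask(13) == 0x7FFFFFFFFFFFFULL, "low 51 bits set");
```

The same concern would apply to `-1UL >> rem` on line 7720: with a 32-bit 
`unsigned long`, the mask starts from only 32 one-bits before it is widened 
to `uint64_t`.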

-------------

PR Review: https://git.openjdk.org/jdk/pull/31409#pullrequestreview-4541898537
PR Comment: https://git.openjdk.org/jdk/pull/31409#issuecomment-4765874405
PR Review Comment: https://git.openjdk.org/jdk/pull/31409#discussion_r3450421727
PR Review Comment: https://git.openjdk.org/jdk/pull/31409#discussion_r3450419179
