Re: RFR: 8350126: Regression ~3% on Crypto-ChaCha20Poly1305.encrypt for MacOSX aarch64 [v2]

Jamil Nimeh Mon, 07 Apr 2025 10:38:52 -0700

> This fix addresses a performance regression found on some aarch64 processors,
> namely the Apple M1, when we moved to a quarter-round-parallel implementation
> in JDK-8349106.  After improving the ordering of the instructions in the
> 20-round loop, we found that going back to a block-parallel implementation
> was faster, but only with those ordering changes in place.  More importantly,
> the block-parallel implementation with interleaving turns out to be faster
> even on those processors that showed improvements when moving to the
> quarter-round-parallel implementation.
> 
> There is a spreadsheet attached to the JBS bug that shows 3 different 
> implementations relative to the current (QR-parallel with no interleaving) 
> implementation on 3 different ARM64 processors.  Comparative benchmarks can 
> also be found below.

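For readers unfamiliar with the terms in the description above, here is a
minimal Java sketch of the ChaCha20 double round from RFC 8439. It is not the
aarch64 stub code from this PR. The class name ChaChaRoundSketch and the
helpers rotateRow, columnRound and doubleRoundRowwise are hypothetical names
chosen for illustration. The sketch assumes that a quarter-round-parallel
layout keeps each row of the 4x4 state in one vector, so the diagonal round
needs a lane rotation to line the diagonals up as columns and another to undo
it afterwards.

```java
// Illustrative sketch only; this is NOT the HotSpot intrinsic from the PR.
final class ChaChaRoundSketch {

    // One ChaCha20 quarter round (RFC 8439, section 2.1) on state words a, b, c, d.
    static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 16);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 8);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 7);
    }

    // Reference double round: four column quarter rounds, then four diagonal ones.
    static void doubleRoundScalar(int[] s) {
        quarterRound(s, 0, 4, 8, 12);
        quarterRound(s, 1, 5, 9, 13);
        quarterRound(s, 2, 6, 10, 14);
        quarterRound(s, 3, 7, 11, 15);
        quarterRound(s, 0, 5, 10, 15);
        quarterRound(s, 1, 6, 11, 12);
        quarterRound(s, 2, 7, 8, 13);
        quarterRound(s, 3, 4, 9, 14);
    }

    // Rotate the four lanes of row r left by n positions (lane i takes lane i+n).
    // Hypothetical helper standing in for a vector lane-rotate instruction.
    static void rotateRow(int[] s, int r, int n) {
        int[] t = new int[4];
        for (int i = 0; i < 4; i++) {
            t[i] = s[4 * r + ((i + n) & 3)];
        }
        System.arraycopy(t, 0, s, 4 * r, 4);
    }

    // Apply a quarter round to each of the four columns of the 4x4 state.
    static void columnRound(int[] s) {
        for (int i = 0; i < 4; i++) {
            quarterRound(s, i, 4 + i, 8 + i, 12 + i);
        }
    }

    // Row-wise double round: the diagonal round is a column round performed
    // between two lane rotations (the assumed columnar/diagonal alignment).
    static void doubleRoundRowwise(int[] s) {
        columnRound(s);
        rotateRow(s, 1, 1); rotateRow(s, 2, 2); rotateRow(s, 3, 3); // diagonals become columns
        columnRound(s);
        rotateRow(s, 1, 3); rotateRow(s, 2, 2); rotateRow(s, 3, 1); // restore column layout
    }
}
```

Calling doubleRoundScalar and doubleRoundRowwise on copies of the same
16-word state should give identical results. In a block-parallel layout, as
generally understood, each vector instead holds the same state word from
several blocks, so no lane rotation is needed between the column and diagonal
rounds.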

Jamil Nimeh has updated the pull request incrementally with one additional 
commit since the last revision:

  Place columnar/diagonal alignment code into separate method

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24420/files
  - new: https://git.openjdk.org/jdk/pull/24420/files/b530e166..fe865308

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24420&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24420&range=00-01

  Stats: 39 lines in 3 files changed: 33 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/24420.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24420/head:pull/24420

PR: https://git.openjdk.org/jdk/pull/24420
