> This enhancement changes the ChaCha20 block function intrinsic on
> aarch64, moving away from the block-parallel implementation to the
> quarter-round-parallel implementation already used on x86_64.  Assembly
> language profiling yielded an 11% improvement in throughput.  When put
> together as an intrinsic and hooked into the JCE ChaCha20 cipher, the gains
> are more modest, somewhere in the 2-4% range depending on job size, but still
> an improvement.
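As a rough illustration of what "quarter-round parallel" means, the scalar Java model below keeps the 16-word ChaCha20 state as four 4-lane rows, the way it might sit in four 128-bit vector registers. It is a sketch, not the HotSpot intrinsic, and the class and method names are hypothetical. Lane i of the four rows forms one column quarter round, so a single lane-wise pass performs all four column quarter rounds at once. Rotating rows b, c and d by 1, 2 and 3 lanes lines the diagonals up as columns for the diagonal round, and the rotations are undone afterwards. `main` checks the RFC 8439 quarter-round test vector and compares the row layout against a flat-index double round.

```java
import java.util.Arrays;

public class QuarterRoundParallelSketch {

    // ChaCha20 quarter round on four 32-bit words (RFC 8439, section 2.1).
    static int[] quarterRound(int a, int b, int c, int d) {
        a += b; d ^= a; d = Integer.rotateLeft(d, 16);
        c += d; b ^= c; b = Integer.rotateLeft(b, 12);
        a += b; d ^= a; d = Integer.rotateLeft(d, 8);
        c += d; b ^= c; b = Integer.rotateLeft(b, 7);
        return new int[] {a, b, c, d};
    }

    // Reference: one double round on the flat 16-word state using the
    // column and diagonal index quadruples from RFC 8439.
    static void doubleRoundScalar(int[] s) {
        int[][] idx = {
            {0, 4, 8, 12}, {1, 5, 9, 13}, {2, 6, 10, 14}, {3, 7, 11, 15}, // columns
            {0, 5, 10, 15}, {1, 6, 11, 12}, {2, 7, 8, 13}, {3, 4, 9, 14}  // diagonals
        };
        for (int[] q : idx) {
            int[] r = quarterRound(s[q[0]], s[q[1]], s[q[2]], s[q[3]]);
            for (int k = 0; k < 4; k++) s[q[k]] = r[k];
        }
    }

    // Rotate the four lanes of a row left by n positions (lane i takes lane i+n).
    static int[] rotateLanes(int[] row, int n) {
        int[] out = new int[4];
        for (int i = 0; i < 4; i++) out[i] = row[(i + n) & 3];
        return out;
    }

    // Lane i of rows a..d is one independent quarter round; a SIMD unit
    // would perform all four lanes with the same instructions.
    static void laneWiseQuarterRound(int[] a, int[] b, int[] c, int[] d) {
        for (int i = 0; i < 4; i++) {
            int[] r = quarterRound(a[i], b[i], c[i], d[i]);
            a[i] = r[0]; b[i] = r[1]; c[i] = r[2]; d[i] = r[3];
        }
    }

    // Quarter-round-parallel double round: each row models one 4-lane register.
    static void doubleRoundRows(int[] s) {
        int[] a = Arrays.copyOfRange(s, 0, 4);
        int[] b = Arrays.copyOfRange(s, 4, 8);
        int[] c = Arrays.copyOfRange(s, 8, 12);
        int[] d = Arrays.copyOfRange(s, 12, 16);

        laneWiseQuarterRound(a, b, c, d);                                   // column round
        b = rotateLanes(b, 1); c = rotateLanes(c, 2); d = rotateLanes(d, 3); // diagonals -> columns
        laneWiseQuarterRound(a, b, c, d);                                   // diagonal round
        b = rotateLanes(b, 3); c = rotateLanes(c, 2); d = rotateLanes(d, 1); // restore row order

        System.arraycopy(a, 0, s, 0, 4);
        System.arraycopy(b, 0, s, 4, 4);
        System.arraycopy(c, 0, s, 8, 4);
        System.arraycopy(d, 0, s, 12, 4);
    }

    public static void main(String[] args) {
        int[] r = quarterRound(0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567);
        System.out.printf("%08x %08x %08x %08x%n", r[0], r[1], r[2], r[3]);

        int[] s1 = new int[16];
        for (int i = 0; i < 16; i++) s1[i] = 0x9e3779b9 * (i + 1); // arbitrary test state
        int[] s2 = s1.clone();
        for (int round = 0; round < 10; round++) { // 20 rounds = 10 double rounds
            doubleRoundScalar(s1);
            doubleRoundRows(s2);
        }
        System.out.println("row layout matches scalar: " + Arrays.equals(s1, s2));
    }
}
```

By contrast, a block-parallel layout typically places the same state word from several independent blocks in the lanes of one register, so it only pays off when there are several blocks to process together.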

Jamil Nimeh has updated the pull request incrementally with one additional 
commit since the last revision:

  Add explanatory comment and reference for quarter round intrinsic

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23397/files
  - new: https://git.openjdk.org/jdk/pull/23397/files/41817c77..6ba0770b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23397&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23397&range=00-01

  Stats: 25 lines in 1 file changed: 25 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/23397.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23397/head:pull/23397

PR: https://git.openjdk.org/jdk/pull/23397
