This enhancement changes the ChaCha20 block function intrinsic on aarch64, 
replacing the block-parallel implementation with the quarter-round parallel 
approach already used on x86_64.  Profiling the assembly code on its own 
showed an 11% improvement in throughput.  Once built as an intrinsic and 
hooked into the JCE ChaCha20 cipher, the gains are more modest, in the 2-4% 
range depending on job size, but still an improvement.
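
For readers less familiar with the terminology, here is a minimal scalar 
sketch of the ChaCha20 quarter round and double round as defined in RFC 8439.  
It is not the aarch64 intrinsic code from this PR, and the class and method 
names are made up for illustration.  The point is only that the four column 
quarter rounds touch disjoint state words, as do the four diagonal ones, 
which is what allows a vectorized implementation to run them side by side.

```java
public class ChaChaQuarterRound {
    // One ChaCha20 quarter round (RFC 8439, section 2.1) on state words
    // s[a], s[b], s[c], s[d]. Java int arithmetic wraps mod 2^32 as required.
    public static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 16);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 8);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 7);
    }

    // One double round on the 16-word state (RFC 8439, section 2.3).
    public static void doubleRound(int[] s) {
        // Column round: the four calls use disjoint words, so they are
        // independent of one another.
        quarterRound(s, 0, 4, 8, 12);
        quarterRound(s, 1, 5, 9, 13);
        quarterRound(s, 2, 6, 10, 14);
        quarterRound(s, 3, 7, 11, 15);
        // Diagonal round: again four independent quarter rounds.
        quarterRound(s, 0, 5, 10, 15);
        quarterRound(s, 1, 6, 11, 12);
        quarterRound(s, 2, 7, 8, 13);
        quarterRound(s, 3, 4, 9, 14);
    }

    public static void main(String[] args) {
        // Test vector from RFC 8439, section 2.1.1.
        int[] s = {0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567};
        quarterRound(s, 0, 1, 2, 3);
        // Expected per the RFC: ea2a92f4 cb1cf8ce 4581472e 5881c4bb
        System.out.printf("%08x %08x %08x %08x%n", s[0], s[1], s[2], s[3]);
    }
}
```

The full block function applies ten such double rounds and then adds the 
original state back in; that part is omitted here to keep the sketch short.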

-------------

Commit messages:
 - Clean up whitespace errors, remove unneeded undefs
 - Remove block parallel implementation and cleanup
 - 8349106: Change ChaCha20 intrinsic to use quarter-round parallel 
implementation on aarch64

Changes: https://git.openjdk.org/jdk/pull/23397/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23397&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8349106
  Stats: 141 lines in 1 file changed: 46 ins; 1 del; 94 mod
  Patch: https://git.openjdk.org/jdk/pull/23397.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23397/head:pull/23397

PR: https://git.openjdk.org/jdk/pull/23397
