On Thu, 3 Apr 2025 16:31:39 GMT, Jamil Nimeh <jni...@openjdk.org> wrote:
> This fix addresses a performance regression found on some aarch64 processors,
> namely the Apple M1, when we moved to a quarter round parallel implementation
> in JDK-8349106. After making some improvements in the ordering of the
> instructions in the 20-round loop we found that going back to a
> block-parallel implementation was faster, but it definitely needed the
> ordering changes for that to be the case. More importantly, the block
> parallel implementation with the interleaving turns out to be faster on even
> those processors that showed improvements when moving to the quarter round
> parallel implementation.
>
> There is a spreadsheet attached to the JBS bug that shows 3 different
> implementations relative to the current (QR-parallel with no interleaving)
> implementation on 3 different ARM64 processors. Comparative benchmarks can
> also be found below.

Benchmarks for Apple M1:

MacOS Sonoma 14.5, 8x Apple M1

Quarter Round Parallel, No Interleaving
---------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score        Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3837175.980 ±  14108.076  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1150065.857 ±   2238.499  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   299444.203 ±   1914.377  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    76149.432 ±     81.343  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3457825.749 ±  95284.525  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1100458.180 ±   9856.390  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   296393.225 ±   1176.583  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    75271.693 ±    848.788  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   995936.643 ±   8252.270  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   518474.192 ±   2541.371  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   178582.085 ±    337.094  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    50037.769 ±     60.497  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1189366.955 ±   3437.169  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   568044.693 ±   6057.314  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   181517.405 ±    248.283  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    49339.073 ±    298.549  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    50024.452 ±     53.838  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    49459.758 ±     63.090  ops/s

Quarter Round Parallel, With Interleaving
-----------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score        Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3880433.294 ±   9904.562  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1157285.625 ±   2415.082  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   301986.767 ±    339.147  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    75990.670 ±    194.671  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3486874.086 ±  93507.311  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1111966.942 ±   9602.005  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   297633.816 ±   1455.184  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    74817.230 ±   1737.888  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   998384.311 ±   7491.076  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   517031.021 ±   1756.181  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   179139.212 ±    401.008  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    49796.519 ±    609.335  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1207581.459 ±  13757.759  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   576596.806 ±   4205.682  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   184108.182 ±    229.014  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    50120.498 ±    300.391  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    50053.528 ±    181.415  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    50232.767 ±     62.234  ops/s

Block Parallel, No Interleaving
-------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score        Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  4107524.407 ±   9337.726  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1210532.736 ±   1111.846  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   315178.899 ±    375.858  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    78782.555 ±    856.939  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3601509.841 ± 103375.315  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1156918.875 ±   9666.447  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   312270.458 ±   1726.717  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    79394.369 ±    513.291  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1029546.842 ±   2317.072  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   532504.493 ±   2836.934  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   183874.028 ±    332.438  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51739.678 ±    122.138  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1263370.572 ±  15424.473  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   588853.049 ±   3419.509  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   188899.111 ±    160.103  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51516.978 ±    147.720  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51758.247 ±     39.852  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51441.519 ±    278.059  ops/s

Block Parallel, With Interleaving
---------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score        Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  4154482.236 ±   8208.082  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1221710.558 ±   5967.515  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   319918.165 ±    327.235  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    80602.283 ±    193.687  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3710733.896 ±  88631.462  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1168824.003 ±  10465.340  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   315040.718 ±   1389.500  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    80365.126 ±    586.286  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1007279.441 ±   8794.990  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   536758.995 ±   3346.320  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   184600.058 ±    362.456  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    52079.247 ±     38.558  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1233639.918 ±   7503.063  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   593298.939 ±   3886.323  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   190535.858 ±    215.443  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51953.765 ±    226.078  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    52073.085 ±     46.961  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51815.757 ±    331.563  ops/s

Benchmarks for Neoverse-N1:

System: 2x Neoverse-N1, 2 cores, 1 socket, 1 thread/core (var 0x3, part 0xD0C)

Quarter-Round Parallel Intrinsics Implementation
------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score        Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2219198.137 ±  13314.344  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   684200.661 ±   3601.031  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   181048.566 ±    942.201  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    46150.219 ±    118.031  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2049320.671 ±   9549.691  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   663456.090 ±   2722.964  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   179921.834 ±    573.613  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    45885.159 ±    102.974  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   476694.433 ±   4118.055  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   251749.129 ±   1535.415  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    87052.901 ±    436.111  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24099.749 ±    136.009  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   601333.942 ±   5414.186  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   280884.583 ±   2332.119  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    90250.320 ±    604.948  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24346.217 ±    101.557  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    23950.145 ±    119.081  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24405.675 ±     93.554  ops/s

Quarter-Round Parallel Intrinsics with Interleaving Implementation:
-------------------------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score        Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2344673.121 ±  14885.986  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   724626.059 ±   3078.617  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   192723.841 ±    744.860  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    49050.992 ±    118.087  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2136919.832 ±   7229.740  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   703672.009 ±   2520.798  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   191748.973 ±    421.704  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    48939.791 ±    194.749  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   497137.864 ±   2915.527  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   262127.552 ±   1302.946  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    90018.698 ±    425.574  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24987.421 ±    119.936  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   634980.497 ±   4191.567  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   293529.897 ±   1496.703  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    93230.690 ±    480.282  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24936.479 ±    112.139  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24897.542 ±     76.891  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    25128.075 ±    120.033  ops/s

Block-Parallel Intrinsics Implementation
----------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score        Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2164945.312 ±   8845.473  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   659831.098 ±   1968.217  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   175252.222 ±    512.910  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    44329.489 ±    126.564  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  1975016.045 ±  11695.931  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   640856.881 ±   1830.533  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   173305.072 ±    366.240  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    44208.373 ±    107.018  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   466351.469 ±   3278.807  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   247662.489 ±   1165.507  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    85367.721 ±    404.796  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    23492.360 ±     92.043  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   589645.973 ±   4262.663  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   278130.465 ±   1394.179  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    88081.739 ±    443.476  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    23853.430 ±    104.346  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    23620.475 ±     75.932  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    23750.134 ±    118.572  ops/s

Block-Parallel with Interleaving Intrinsics Implementation
----------------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score        Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2358246.820 ±  14256.312  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   734318.183 ±   2447.434  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   196243.937 ±    517.431  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    50008.245 ±     85.350  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2156054.908 ±   5432.249  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   713847.200 ±   1962.784  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   194383.466 ±    464.389  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    49652.092 ±    166.716  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   497410.798 ±   3632.927  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   261587.126 ±   1336.591  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    90453.673 ±    429.630  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24963.118 ±    103.795  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   623876.407 ±   4655.637  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   292279.929 ±   1345.033  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    93352.350 ±    429.286  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    25190.232 ±    121.961  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    25128.018 ±     84.863  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    25371.698 ±    129.837  ops/s

Benchmarks for Cortex-A72:

4 processor Cortex-A72, 1 cluster, 4 cores/cluster, 1 thread/core (var 0x0, part 0xD08)

Quarter Round Parallel Implementation, No Interleaving
------------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score        Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40   602983.483 ±   6556.879  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   186189.843 ±    628.835  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40    49499.230 ±    139.811  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    12487.617 ±     69.484  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40   592209.356 ±   3927.984  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   185091.856 ±    366.779  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40    49491.296 ±    117.179  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    12512.907 ±     71.587  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    96212.313 ±   2482.928  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    65131.604 ±   1504.555  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    27746.783 ±    229.856  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40     8381.946 ±     32.122  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   129453.321 ±   3224.106  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    77091.625 ±   1470.684  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    29334.590 ±    303.107  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40     8460.356 ±      8.524  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40     8386.624 ±     34.163  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40     8471.573 ±      8.635  ops/s

Quarter Round Parallel Implementation, With Interleaving
--------------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score        Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40   767143.826 ±   9195.715  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   254386.139 ±   1378.080  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40    69152.606 ±    176.940  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    17609.457 ±     71.086  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40   746643.194 ±   9077.375  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   251953.223 ±    959.588  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40    69064.757 ±    197.231  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    17563.052 ±     97.678  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   105520.550 ±   2805.637  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    72902.046 ±   1738.503  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    33446.843 ±    377.742  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    10437.913 ±     31.702  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   141153.205 ±   3693.280  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    89657.996 ±   1635.631  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    35926.981 ±    244.574  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    10555.879 ±     18.698  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    10440.037 ±     33.023  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    10542.745 ±     45.282  ops/s

Block Parallel Implementation, No Interleaving
----------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score        Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40   587100.753 ±   5754.708  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   178737.840 ±    730.445  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40    47340.182 ±    121.627  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    11947.269 ±     66.887  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40   574123.343 ±   3838.477  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   177870.311 ±    420.125  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40    47409.796 ±    109.224  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    11967.672 ±     65.803  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    95867.086 ±   2228.000  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    63376.433 ±   1301.826  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    26988.391 ±    231.289  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40     8139.090 ±     20.871  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   127770.261 ±   3262.540  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    76019.408 ±   1226.583  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    28652.283 ±    214.896  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40     8208.186 ±     11.455  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40     8131.508 ±     27.548  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40     8207.550 ±     13.086  ops/s

Block Parallel Implementation, With Interleaving
------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score        Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40   826086.130 ±   9933.137  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   276583.128 ±   1434.611  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40    75688.367 ±    228.277  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    19348.013 ±     77.810  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40   800978.386 ±  10445.822  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   274107.264 ±   1606.978  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40    75446.852 ±    209.379  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    19270.292 ±    105.573  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   105988.778 ±   3001.220  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    76162.169 ±   1692.042  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    34978.996 ±    468.786  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    11040.040 ±     31.844  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   146046.188 ±   3471.952  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    94041.417 ±   1834.558  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    37770.658 ±    311.519  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    11183.053 ±     11.204  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    11037.956 ±     39.522  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    11196.095 ±     33.796  ops/s

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24420#issuecomment-2776357177
PR Comment: https://git.openjdk.org/jdk/pull/24420#issuecomment-2776369079
PR Comment: https://git.openjdk.org/jdk/pull/24420#issuecomment-2776371619
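
For anyone comparing the runs above by hand, the relative change between two JMH throughput scores is simply (candidate / baseline - 1) * 100. Below is a minimal sketch of that arithmetic. The class and method names are hypothetical and are not part of the JDK, the PR, or the spreadsheet on the JBS bug. The inputs are the ChaCha20.decrypt scores at dataSize 16384 quoted above, taking the quarter-round parallel run without interleaving as the baseline and the block-parallel run with interleaving as the candidate.

```java
// Hypothetical helper for reading the tables above; not part of the JDK or the PR.
final class RelativeThroughput {

    /** Percent change of candidate relative to baseline (both in ops/s). */
    static double percentChange(double baseline, double candidate) {
        if (baseline <= 0.0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return (candidate / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // ChaCha20.decrypt, dataSize 16384, scores copied from the tables above.
        String[] cpus = {"Apple M1", "Neoverse-N1", "Cortex-A72"};
        // Baseline: quarter-round parallel, no interleaving.
        double[] baseline = {76149.432, 46150.219, 12487.617};
        // Candidate: block parallel, with interleaving.
        double[] candidate = {80602.283, 50008.245, 19348.013};

        for (int i = 0; i < cpus.length; i++) {
            System.out.printf("%-12s %+.1f%%%n",
                    cpus[i], percentChange(baseline[i], candidate[i]));
        }
    }
}
```

For these three rows the change works out to roughly +5.8% on the Apple M1, +8.4% on the Neoverse-N1 and +54.9% on the Cortex-A72. The error columns are not factored in here, so small differences should be read with the reported ± ranges in mind.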