On Thu, 3 Apr 2025 16:31:39 GMT, Jamil Nimeh <jni...@openjdk.org> wrote:

> This fix addresses a performance regression found on some aarch64 processors,
> namely the Apple M1, when we moved to a quarter-round-parallel implementation
> in JDK-8349106.  After making some improvements in the ordering of the
> instructions in the 20-round loop, we found that going back to a
> block-parallel implementation was faster, but only with those ordering
> changes in place.  More importantly, the block-parallel implementation with
> interleaving turns out to be faster even on those processors that showed
> improvements when moving to the quarter-round-parallel implementation.
> 
> There is a spreadsheet attached to the JBS bug that shows 3 different 
> implementations relative to the current (QR-parallel with no interleaving) 
> implementation on 3 different ARM64 processors.  Comparative benchmarks can 
> also be found below.
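
For readers comparing the variants below: a ChaCha20 double round is four independent "column" quarter rounds followed by four independent "diagonal" quarter rounds (RFC 8439, section 2.1). A quarter-round-parallel layout keeps one block's 4x4 state as four row vectors, runs the four quarter rounds lane-wise, and rotates the lanes of rows 1-3 so the diagonals line up for the second half. The Java sketch below models this with plain `int[4]` arrays standing in for vector registers. It is an illustration under that assumption, not the HotSpot aarch64 stub, and the class and method names are hypothetical. It checks the quarter round against the RFC 8439 section 2.1.1 test vector and checks that the row-wise double round matches the textbook one.

```java
import java.util.Arrays;

public class QuarterRoundParallel {
    // ChaCha20 quarter round on state words a, b, c, d (RFC 8439, section 2.1).
    static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 16);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 8);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 7);
    }

    // Reference double round on a flat 16-word state: columns, then diagonals.
    static void doubleRoundScalar(int[] s) {
        quarterRound(s, 0, 4, 8, 12);  quarterRound(s, 1, 5, 9, 13);
        quarterRound(s, 2, 6, 10, 14); quarterRound(s, 3, 7, 11, 15);
        quarterRound(s, 0, 5, 10, 15); quarterRound(s, 1, 6, 11, 12);
        quarterRound(s, 2, 7, 8, 13);  quarterRound(s, 3, 4, 9, 14);
    }

    // rows[r] stands in for one 4-lane vector holding state words 4r..4r+3.
    // The loop body is the same quarter round applied in every lane.
    static void laneWiseQuarterRound(int[][] rows) {
        int[] a = rows[0], b = rows[1], c = rows[2], d = rows[3];
        for (int i = 0; i < 4; i++) {
            a[i] += b[i]; d[i] = Integer.rotateLeft(d[i] ^ a[i], 16);
            c[i] += d[i]; b[i] = Integer.rotateLeft(b[i] ^ c[i], 12);
            a[i] += b[i]; d[i] = Integer.rotateLeft(d[i] ^ a[i], 8);
            c[i] += d[i]; b[i] = Integer.rotateLeft(b[i] ^ c[i], 7);
        }
    }

    // Lane i of the result is lane (i + n) mod 4 of v. On AArch64 this kind of
    // lane rotation can be done with an EXT of a register with itself.
    static int[] rotateLanes(int[] v, int n) {
        int[] r = new int[4];
        for (int i = 0; i < 4; i++) r[i] = v[(i + n) & 3];
        return r;
    }

    // Quarter-round-parallel double round: column round, diagonalize,
    // diagonal round, then restore the original lane order.
    static void doubleRoundRowWise(int[][] rows) {
        laneWiseQuarterRound(rows);            // QR(0,4,8,12) .. QR(3,7,11,15)
        rows[1] = rotateLanes(rows[1], 1);
        rows[2] = rotateLanes(rows[2], 2);
        rows[3] = rotateLanes(rows[3], 3);
        laneWiseQuarterRound(rows);            // QR(0,5,10,15) .. QR(3,4,9,14)
        rows[1] = rotateLanes(rows[1], 3);
        rows[2] = rotateLanes(rows[2], 2);
        rows[3] = rotateLanes(rows[3], 1);
    }

    public static void main(String[] args) {
        // RFC 8439, section 2.1.1 quarter-round test vector.
        int[] t = {0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567};
        quarterRound(t, 0, 1, 2, 3);
        System.out.printf("%08x %08x %08x %08x%n", t[0], t[1], t[2], t[3]);

        // Compare both double-round forms on an arbitrary state.
        int[] flat = new int[16];
        for (int i = 0; i < 16; i++) flat[i] = i * 0x9e3779b9;
        int[][] rows = new int[4][];
        for (int r = 0; r < 4; r++) rows[r] = Arrays.copyOfRange(flat, 4 * r, 4 * r + 4);
        doubleRoundScalar(flat);
        doubleRoundRowWise(rows);
        for (int r = 0; r < 4; r++) {
            if (!Arrays.equals(rows[r], Arrays.copyOfRange(flat, 4 * r, 4 * r + 4))) {
                throw new AssertionError("row " + r + " differs");
            }
        }
        System.out.println("row-wise double round matches the scalar one");
    }
}
```

The first printed line should be the RFC's expected output, `ea2a92f4 cb1cf8ce 4581472e 5881c4bb`. The six lane rotations per double round are the extra work that the block-parallel layout avoids.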

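The block-parallel alternative can be sketched the same way, again with hypothetical names and plain arrays in place of vector registers. Lane `l` of every "vector" holds the same state word from block `counter + l`, so four blocks are computed at once. Only the counter word differs between lanes, and the diagonal half-round needs no lane rotations. The half-round here issues each step of the four independent quarter rounds before starting the next step. This is one way to picture the interleaving discussed in the quoted description; whether it matches the exact instruction order in the patch is an assumption. The sketch checks itself against a scalar block function.

```java
public class BlockParallelSketch {
    static final int[] SIGMA = {0x61707865, 0x3320646e, 0x79622d32, 0x6b206574};
    static final int[][] COLUMNS = {{0, 4, 8, 12}, {1, 5, 9, 13}, {2, 6, 10, 14}, {3, 7, 11, 15}};
    static final int[][] DIAGONALS = {{0, 5, 10, 15}, {1, 6, 11, 12}, {2, 7, 8, 13}, {3, 4, 9, 14}};

    // ChaCha20 initial state (RFC 8439, section 2.3): constants, 8 key words, counter, 3 nonce words.
    static int[] initialState(int[] key, int counter, int[] nonce) {
        int[] s = new int[16];
        System.arraycopy(SIGMA, 0, s, 0, 4);
        System.arraycopy(key, 0, s, 4, 8);
        s[12] = counter;
        System.arraycopy(nonce, 0, s, 13, 3);
        return s;
    }

    static void qr(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 16);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 8);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 7);
    }

    // Scalar reference: the 16 output words of one block (before serialization).
    static int[] blockScalar(int[] key, int counter, int[] nonce) {
        int[] init = initialState(key, counter, nonce);
        int[] s = init.clone();
        for (int round = 0; round < 10; round++) {
            for (int[] q : COLUMNS) qr(s, q[0], q[1], q[2], q[3]);
            for (int[] q : DIAGONALS) qr(s, q[0], q[1], q[2], q[3]);
        }
        for (int i = 0; i < 16; i++) s[i] += init[i];
        return s;
    }

    // Lane-wise "vector" ops: each int[4] is one register, lane l = block l.
    static void add(int[] x, int[] y) { for (int l = 0; l < 4; l++) x[l] += y[l]; }
    static void xorRotate(int[] x, int[] y, int r) {
        for (int l = 0; l < 4; l++) x[l] = Integer.rotateLeft(x[l] ^ y[l], r);
    }

    // Four independent quarter rounds, interleaved: each step is issued for all
    // four before the next step starts, so adjacent operations do not depend on each other.
    static void halfRoundInterleaved(int[][] v, int[][] quads) {
        for (int[] q : quads) add(v[q[0]], v[q[1]]);           // a += b
        for (int[] q : quads) xorRotate(v[q[3]], v[q[0]], 16); // d = (d ^ a) <<< 16
        for (int[] q : quads) add(v[q[2]], v[q[3]]);           // c += d
        for (int[] q : quads) xorRotate(v[q[1]], v[q[2]], 12); // b = (b ^ c) <<< 12
        for (int[] q : quads) add(v[q[0]], v[q[1]]);           // a += b
        for (int[] q : quads) xorRotate(v[q[3]], v[q[0]], 8);  // d = (d ^ a) <<< 8
        for (int[] q : quads) add(v[q[2]], v[q[3]]);           // c += d
        for (int[] q : quads) xorRotate(v[q[1]], v[q[2]], 7);  // b = (b ^ c) <<< 7
    }

    // Block-parallel: v[w][l] is state word w of block (counter + l).
    static int[][] blocksParallel(int[] key, int counter, int[] nonce) {
        int[][] init = new int[16][4];
        for (int l = 0; l < 4; l++) {
            int[] s = initialState(key, counter + l, nonce);
            for (int w = 0; w < 16; w++) init[w][l] = s[w];   // only word 12 differs per lane
        }
        int[][] v = new int[16][];
        for (int w = 0; w < 16; w++) v[w] = init[w].clone();
        for (int round = 0; round < 10; round++) {
            halfRoundInterleaved(v, COLUMNS);
            halfRoundInterleaved(v, DIAGONALS);               // no lane rotation needed
        }
        for (int w = 0; w < 16; w++) add(v[w], init[w]);
        return v;
    }

    public static void main(String[] args) {
        int[] key = new int[8];
        for (int i = 0; i < 8; i++) key[i] = 0x03020100 + 0x04040404 * i; // arbitrary test key
        int[] nonce = {0x09000000, 0x4a000000, 0};                       // arbitrary test nonce
        int[][] v = blocksParallel(key, 1, nonce);
        for (int l = 0; l < 4; l++) {
            int[] ref = blockScalar(key, 1 + l, nonce);
            for (int w = 0; w < 16; w++) {
                if (v[w][l] != ref[w]) throw new AssertionError("block " + l + ", word " + w);
            }
        }
        System.out.println("4 blocks computed lane-wise match 4 scalar blocks");
    }
}
```

The trade-off is register pressure. Four blocks need 16 live state vectors plus the saved initial state, while the quarter-round-parallel form needs only four.
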
Benchmarks for Apple M1:

macOS Sonoma 14.5, 8x Apple M1


Quarter Round Parallel, No Interleaving
---------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score       Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3837175.980 ± 14108.076  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1150065.857 ±  2238.499  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   299444.203 ±  1914.377  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    76149.432 ±    81.343  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3457825.749 ± 95284.525  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1100458.180 ±  9856.390  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   296393.225 ±  1176.583  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    75271.693 ±   848.788  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   995936.643 ±  8252.270  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   518474.192 ±  2541.371  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   178582.085 ±   337.094  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    50037.769 ±    60.497  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1189366.955 ±  3437.169  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   568044.693 ±  6057.314  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   181517.405 ±   248.283  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    49339.073 ±   298.549  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    50024.452 ±    53.838  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    49459.758 ±    63.090  ops/s


Quarter Round Parallel, With Interleaving
-----------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score       Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3880433.294 ±  9904.562  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1157285.625 ±  2415.082  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   301986.767 ±   339.147  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    75990.670 ±   194.671  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3486874.086 ± 93507.311  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1111966.942 ±  9602.005  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   297633.816 ±  1455.184  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    74817.230 ±  1737.888  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   998384.311 ±  7491.076  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   517031.021 ±  1756.181  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   179139.212 ±   401.008  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    49796.519 ±   609.335  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1207581.459 ± 13757.759  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   576596.806 ±  4205.682  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   184108.182 ±   229.014  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    50120.498 ±   300.391  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    50053.528 ±   181.415  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    50232.767 ±    62.234  ops/s


Block Parallel, No Interleaving
-------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score        Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  4107524.407 ±   9337.726  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1210532.736 ±   1111.846  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   315178.899 ±    375.858  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    78782.555 ±    856.939  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3601509.841 ± 103375.315  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1156918.875 ±   9666.447  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   312270.458 ±   1726.717  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    79394.369 ±    513.291  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1029546.842 ±   2317.072  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   532504.493 ±   2836.934  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   183874.028 ±    332.438  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51739.678 ±    122.138  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1263370.572 ±  15424.473  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   588853.049 ±   3419.509  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   188899.111 ±    160.103  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51516.978 ±    147.720  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51758.247 ±     39.852  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51441.519 ±    278.059  ops/s


Block Parallel, With Interleaving
---------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score       Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  4154482.236 ±  8208.082  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1221710.558 ±  5967.515  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   319918.165 ±   327.235  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    80602.283 ±   193.687  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3710733.896 ± 88631.462  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1168824.003 ± 10465.340  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   315040.718 ±  1389.500  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    80365.126 ±   586.286  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1007279.441 ±  8794.990  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   536758.995 ±  3346.320  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   184600.058 ±   362.456  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    52079.247 ±    38.558  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1233639.918 ±  7503.063  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   593298.939 ±  3886.323  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   190535.858 ±   215.443  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51953.765 ±   226.078  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    52073.085 ±    46.961  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51815.757 ±   331.563  ops/s
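
To read these M1 numbers against the QR-parallel, no-interleaving baseline, a small helper can compute the percent change for one row. The helper below is hypothetical and not part of the benchmark suite. It uses the ChaCha20.decrypt, dataSize=16384 row and also flags whether each difference exceeds the two JMH Error margins added together. That check is only a rough heuristic, not a significance test.

```java
import java.util.Locale;

public class M1Compare {
    // One JMH result: throughput score and its Error column, both in ops/s.
    record Result(String variant, double score, double error) {}

    // Percent change of candidate relative to baseline.
    static double percentChange(Result baseline, Result candidate) {
        return 100.0 * (candidate.score() - baseline.score()) / baseline.score();
    }

    // Rough heuristic: is the gap larger than both error margins added together?
    static boolean clearlyDifferent(Result baseline, Result candidate) {
        return Math.abs(candidate.score() - baseline.score()) > baseline.error() + candidate.error();
    }

    public static void main(String[] args) {
        // Apple M1, ChaCha20.decrypt, dataSize = 16384 (copied from the tables above).
        Result baseline = new Result("QR-parallel, no interleaving", 76149.432, 81.343);
        Result[] candidates = {
            new Result("QR-parallel, interleaving", 75990.670, 194.671),
            new Result("Block-parallel, no interleaving", 78782.555, 856.939),
            new Result("Block-parallel, interleaving", 80602.283, 193.687),
        };
        for (Result c : candidates) {
            System.out.printf(Locale.ROOT, "%-32s %+6.2f%%  %s%n", c.variant(),
                    percentChange(baseline, c),
                    clearlyDifferent(baseline, c) ? "outside error" : "within error");
        }
    }
}
```

For this row it reports about -0.21% (within error) for the interleaved QR-parallel variant. It reports +3.46% and +5.85% (both outside error) for block-parallel without and with interleaving.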

Benchmarks for Neoverse-N1:

System: 2x Neoverse-N1, 2 cores, 1 socket, 1 thread/core (var 0x3, part 0xD0C)


Quarter-Round Parallel Intrinsics Implementation
------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score       Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2219198.137 ± 13314.344  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   684200.661 ±  3601.031  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   181048.566 ±   942.201  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    46150.219 ±   118.031  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2049320.671 ±  9549.691  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   663456.090 ±  2722.964  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   179921.834 ±   573.613  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    45885.159 ±   102.974  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   476694.433 ±  4118.055  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   251749.129 ±  1535.415  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    87052.901 ±   436.111  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24099.749 ±   136.009  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   601333.942 ±  5414.186  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   280884.583 ±  2332.119  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    90250.320 ±   604.948  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24346.217 ±   101.557  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    23950.145 ±   119.081  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24405.675 ±    93.554  ops/s


Quarter-Round Parallel Intrinsics with Interleaving Implementation
------------------------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score       Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2344673.121 ± 14885.986  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   724626.059 ±  3078.617  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   192723.841 ±   744.860  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    49050.992 ±   118.087  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2136919.832 ±  7229.740  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   703672.009 ±  2520.798  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   191748.973 ±   421.704  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    48939.791 ±   194.749  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   497137.864 ±  2915.527  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   262127.552 ±  1302.946  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    90018.698 ±   425.574  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24987.421 ±   119.936  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   634980.497 ±  4191.567  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   293529.897 ±  1496.703  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    93230.690 ±   480.282  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24936.479 ±   112.139  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24897.542 ±    76.891  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    25128.075 ±   120.033  ops/s


Block-Parallel Intrinsics Implementation
----------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score       Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2164945.312 ±  8845.473  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   659831.098 ±  1968.217  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   175252.222 ±   512.910  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    44329.489 ±   126.564  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  1975016.045 ± 11695.931  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   640856.881 ±  1830.533  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   173305.072 ±   366.240  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    44208.373 ±   107.018  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   466351.469 ±  3278.807  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   247662.489 ±  1165.507  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    85367.721 ±   404.796  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    23492.360 ±    92.043  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   589645.973 ±  4262.663  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   278130.465 ±  1394.179  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    88081.739 ±   443.476  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    23853.430 ±   104.346  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    23620.475 ±    75.932  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    23750.134 ±   118.572  ops/s


Block-Parallel with Interleaving Intrinsics Implementation
----------------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score       Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2358246.820 ± 14256.312  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   734318.183 ±  2447.434  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   196243.937 ±   517.431  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    50008.245 ±    85.350  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2156054.908 ±  5432.249  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   713847.200 ±  1962.784  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   194383.466 ±   464.389  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    49652.092 ±   166.716  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   497410.798 ±  3632.927  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   261587.126 ±  1336.591  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    90453.673 ±   429.630  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24963.118 ±   103.795  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   623876.407 ±  4655.637  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   292279.929 ±  1345.033  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    93352.350 ±   429.286  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    25190.232 ±   121.961  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    25128.018 ±    84.863  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    25371.698 ±   129.837  ops/s
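
A single summary number per variant can make tables this size easier to scan. The hypothetical helper below takes the eight ChaCha20 rows (decrypt and encrypt at each dataSize) from the Neoverse-N1 tables above. For each block-parallel variant, it computes the geometric mean of that variant's score relative to the quarter-round-parallel, no-interleaving baseline. On these numbers, block-parallel without interleaving comes out at roughly 0.97, below the baseline. Block-parallel with interleaving comes out at roughly 1.07, above it. This is consistent with the quoted point that the block-parallel layout needed the instruction reordering to win.

```java
import java.util.Locale;

public class N1GeoMean {
    // Neoverse-N1, ChaCha20 rows in table order: decrypt 256/1024/4096/16384,
    // then encrypt 256/1024/4096/16384 (ops/s, copied from the tables above).
    static final double[] QR_NO_INTERLEAVE = {2219198.137, 684200.661, 181048.566, 46150.219,
                                              2049320.671, 663456.090, 179921.834, 45885.159};
    static final double[] BLOCK_NO_INTERLEAVE = {2164945.312, 659831.098, 175252.222, 44329.489,
                                                 1975016.045, 640856.881, 173305.072, 44208.373};
    static final double[] BLOCK_INTERLEAVE = {2358246.820, 734318.183, 196243.937, 50008.245,
                                              2156054.908, 713847.200, 194383.466, 49652.092};

    // Geometric mean of candidate[i] / baseline[i]; a value above 1.0 means faster on balance.
    static double geoMeanRatio(double[] baseline, double[] candidate) {
        if (baseline.length == 0 || baseline.length != candidate.length) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        double logSum = 0.0;
        for (int i = 0; i < baseline.length; i++) {
            logSum += Math.log(candidate[i] / baseline[i]);
        }
        return Math.exp(logSum / baseline.length);
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "block-parallel, no interleaving: %.3f%n",
                geoMeanRatio(QR_NO_INTERLEAVE, BLOCK_NO_INTERLEAVE));
        System.out.printf(Locale.ROOT, "block-parallel, interleaving:    %.3f%n",
                geoMeanRatio(QR_NO_INTERLEAVE, BLOCK_INTERLEAVE));
    }
}
```

A geometric mean is used so that the large-score 256-byte rows do not dominate the small-score 16384-byte rows.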

Benchmarks for Cortex-A72:

4x Cortex-A72, 1 cluster, 4 cores/cluster, 1 thread/core (var 0x0, part 0xD08)

Quarter Round Parallel Implementation, No Interleaving
------------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt       Score      Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  602983.483 ± 6556.879  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  186189.843 ±  628.835  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   49499.230 ±  139.811  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40   12487.617 ±   69.484  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  592209.356 ± 3927.984  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  185091.856 ±  366.779  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   49491.296 ±  117.179  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40   12512.907 ±   71.587  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   96212.313 ± 2482.928  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   65131.604 ± 1504.555  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   27746.783 ±  229.856  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    8381.946 ±   32.122  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  129453.321 ± 3224.106  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   77091.625 ± 1470.684  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   29334.590 ±  303.107  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    8460.356 ±    8.524  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    8386.624 ±   34.163  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    8471.573 ±    8.635  ops/s


Quarter Round Parallel Implementation, With Interleaving
--------------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt       Score      Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  767143.826 ± 9195.715  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  254386.139 ± 1378.080  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   69152.606 ±  176.940  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40   17609.457 ±   71.086  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  746643.194 ± 9077.375  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  251953.223 ±  959.588  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   69064.757 ±  197.231  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40   17563.052 ±   97.678  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  105520.550 ± 2805.637  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   72902.046 ± 1738.503  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   33446.843 ±  377.742  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256   
 None  NoPadding  ChaCha20-Poly1305              thrpt   40   10437.913 ±   
31.702  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256   
 None  NoPadding  ChaCha20-Poly1305              thrpt   40  141153.205 ± 
3693.280  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256   
 None  NoPadding  ChaCha20-Poly1305              thrpt   40   89657.996 ± 
1635.631  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256   
 None  NoPadding  ChaCha20-Poly1305              thrpt   40   35926.981 ±  
244.574  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256   
 None  NoPadding  ChaCha20-Poly1305              thrpt   40   10555.879 ±   
18.698  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256   
 None  NoPadding  ChaCha20-Poly1305              thrpt   40   10440.037 ±   
33.023  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256   
 None  NoPadding  ChaCha20-Poly1305              thrpt   40   10542.745 ±   
45.282  ops/s


Block Parallel Implementation, No Interleaving
----------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt       Score      Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  587100.753 ± 5754.708  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  178737.840 ±  730.445  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   47340.182 ±  121.627  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40   11947.269 ±   66.887  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  574123.343 ± 3838.477  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  177870.311 ±  420.125  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   47409.796 ±  109.224  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40   11967.672 ±   65.803  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   95867.086 ± 2228.000  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   63376.433 ± 1301.826  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   26988.391 ±  231.289  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    8139.090 ±   20.871  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  127770.261 ± 3262.540  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   76019.408 ± 1226.583  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   28652.283 ±  214.896  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    8208.186 ±   11.455  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    8131.508 ±   27.548  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    8207.550 ±   13.086  ops/s


Block Parallel Implementation, With Interleaving
------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt       Score       Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  826086.130 ±  9933.137  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  276583.128 ±  1434.611  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   75688.367 ±   228.277  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40   19348.013 ±    77.810  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  800978.386 ± 10445.822  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  274107.264 ±  1606.978  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   75446.852 ±   209.379  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40   19270.292 ±   105.573  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  105988.778 ±  3001.220  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   76162.169 ±  1692.042  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   34978.996 ±   468.786  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   11040.040 ±    31.844  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  146046.188 ±  3471.952  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   94041.417 ±  1834.558  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   37770.658 ±   311.519  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   11183.053 ±    11.204  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   11037.956 ±    39.522  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   11196.095 ±    33.796  ops/s

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24420#issuecomment-2776357177
PR Comment: https://git.openjdk.org/jdk/pull/24420#issuecomment-2776369079
PR Comment: https://git.openjdk.org/jdk/pull/24420#issuecomment-2776371619
