On Sun, 13 Nov 2022 02:54:10 GMT, Anthony Scarpino <ascarp...@openjdk.org> 
wrote:

> I would like a review of an update to the GCM code.  A recent report showed 
> that GCM memory usage for TLS was very large.  This was a result of in-place 
> buffers, which TLS uses, and how the code handled the combined intrinsic 
> method during decryption.  A temporary buffer was used because the combined 
> intrinsic does gctr before ghash which results in a bad tag.  The fix is to 
> not use the combined intrinsic during in-place decryption and depend on the 
> individual GHASH and CounterMode intrinsics.  Direct ByteBuffers are not 
> affected as they are not used by the intrinsics directly.
> 
> The reduction in the memory usage boosted performance back to where it was 
> before despite using slower intrinsics (gctr & ghash individually).  The 
> extra memory allocation for the temporary buffer outweighed the faster 
> intrinsic.
> 
> 
>     JDK 17:   122913.554 ops/sec
>     JDK 19:    94885.008 ops/sec
>     Post fix: 122735.804 ops/sec 
> 
> There is no regression test because this is a memory change and test coverage 
> already exists.

Carter, when I looked at this a few months back (admittedly I'm a fairly 
careless profiler and didn't fully dig down to a root cause), I felt as though 
direct ByteBuffers were possibly getting compromised around here: 
https://github.com/openjdk/jdk/blob/3416bfa2560e240b5e602f10e98e8a06c96852df/src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java#L722
where, essentially, it's easier for intrinsics to operate on byte arrays, so 
any direct data passed in gets copied into a new byte array, which is then 
passed into the intrinsic.

It's possible that I'm misunderstanding, however. I think one could test this 
hypothesis by adjusting the size of PARALLEL_LEN. Halving it should lead to 
less efficient intrinsic usage but correspondingly halve the allocation rate.
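
To make the hypothesis concrete, here is a rough sketch of the pattern I have 
in mind. It is not the GaloisCounterMode code: the class, the method names and 
the chunk size of 8192 are all made up for illustration, and processArray is 
just a stand-in for an intrinsic that only accepts byte arrays. It assumes one 
temporary array per call, sized by the chunk length, which is the case where 
halving the chunk length would halve the allocation per call.

```java
import java.nio.ByteBuffer;

public class DirectCopySketch {
    // Hypothetical stand-in for an intrinsic that only works on byte arrays.
    static int processArray(byte[] in, int off, int len) {
        int sum = 0;
        for (int i = off; i < off + len; i++) {
            sum += in[i] & 0xff;
        }
        return sum;
    }

    // Copies direct data into a temporary heap array, one chunk at a time,
    // and hands each chunk to processArray.
    // Returns { checksum, bytes allocated for the temporary array }.
    static long[] processDirect(ByteBuffer src, int chunkLen) {
        // Assumption: one temporary array per call, sized by the chunk length.
        byte[] tmp = new byte[Math.min(chunkLen, src.remaining())];
        long sum = 0;
        while (src.hasRemaining()) {
            int n = Math.min(tmp.length, src.remaining());
            src.get(tmp, 0, n);          // the extra copy out of direct memory
            sum += processArray(tmp, 0, n);
        }
        return new long[] { sum, tmp.length };
    }

    public static void main(String[] args) {
        int chunk = 8192; // illustrative value only, not the JDK's PARALLEL_LEN
        for (int len : new int[] { chunk, chunk / 2 }) {
            ByteBuffer direct = ByteBuffer.allocateDirect(64 * 1024);
            long[] r = processDirect(direct, len);
            System.out.println("chunk=" + len + " tempBytes=" + r[1]);
        }
    }
}
```

If the real code instead allocates a fresh array for every chunk, the total 
bytes allocated would not change with the chunk size, so the experiment would 
also help tell those two cases apart.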

-------------

PR: https://git.openjdk.org/jdk/pull/11121
