On Tue, 29 Sep 2020 20:22:55 GMT, Anthony Scarpino <ascarp...@openjdk.org> 
wrote:

> 8253821: Improve ByteBuffer performance with GCM

I'd like a review of this change.  It contains two performance improvements to
AES-GCM, the larger being the usage with ByteBuffers.  The details of the
change are below and are also listed in the JBS bug description; any future
comments will be applied to the bug:

There were two areas of focus.  The primary one is that when direct
bytebuffers are used with some crypto algorithms, data is copied to byte
arrays numerous times, causing unnecessary memory allocation and bringing down
performance.  The other focus was the output arrays of non-direct bytebuffers.

This change comes in multiple parts:

1) Change CipherCore to reuse the existing output array when it is large
enough, allocating a new array only when it is not.  The only SunJCE
algorithm with special output needs is GCM, which can be dealt with
elsewhere.

2) AESCipher has a one-size-fits-all approach to bytebuffers: all encryption
and decryption is done in byte arrays.  When the input is a byte array, or a
bytebuffer backed by a byte array, this is fine.  However, when it is a
direct buffer, the data is copied into a new byte array.  Unfortunately,
this hurts SSLEngine, which uses direct buffers, causing multiple copies of
the data on the way down to the raw algorithm.  The GCM code and other
related classes had to be changed to pass ByteBuffers down to the algorithm,
where the data can be copied into a fixed-size byte array that can be
reused.  Without these modifications, running JFR with Flink, a performance
test, shows ~150GB of byte array allocation in one minute of operation; with
them, 7GB.

3) GCM needed some reworking of its logic.  Being an authenticated cipher,
if the GHASH check fails, the decryption fails and no data is returned.  The
existing code performed the decryption at the same time as the GHASH check,
which in the current design offers no parallel performance advantage.
Performing the GHASH fully before decryption avoids allocating output data
and performing unneeded operations when the GHASH fails.  If the GHASH
succeeds, in-place operations can be performed directly on the buffer
without allocating an intermediate buffer and then copying the data.

4) GCTR and GHASH now allocate a fixed-size buffer when the data size is
over 1k and going into an intrinsic.  At this time, copying the data from
the bytebuffer into a byte array is required for the intrinsic to work on
it.  We cannot eliminate the copy, but we can reduce the size of the
allocated buffer: there is little harm in setting a maximum size for this
buffer and copying data into it repeatedly until the input is consumed.  A
4k maximum does produce slightly faster top-end performance at times, but
the inconsistent results and an increase in memory usage from 7GB to 17GB
were too inconclusive to justify the larger buffer size.

5) Using bytebuffers allows the code to use duplicate(), which lets it chop
up the data more easily without unnecessary copying.
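The output-array reuse in part 1 amounts to a small capacity check.  A minimal
sketch of the idea (the class, method, and parameter names here are mine for
illustration, not the actual CipherCore code):

```java
public class OutputBufferDemo {
    // Reuse the caller's output array when it can hold the estimated
    // output; allocate a fresh array only when it cannot.
    static byte[] ensureCapacity(byte[] output, int outOfs, int estOutSize) {
        if (output != null && output.length - outOfs >= estOutSize) {
            return output;                      // large enough: no allocation
        }
        return new byte[outOfs + estOutSize];   // too small: allocate once
    }

    public static void main(String[] args) {
        byte[] big = new byte[64];
        // Large enough: the same array comes back, no allocation.
        System.out.println(ensureCapacity(big, 0, 32) == big);
        // Too small: a new, larger array is allocated instead.
        System.out.println(ensureCapacity(new byte[8], 0, 32).length);
    }
}
```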
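The array-backed vs. direct distinction driving part 2 can be seen with the
standard hasArray() check: a heap buffer exposes its backing byte[] directly,
while a direct buffer does not, forcing a copy before byte-array-based crypto
code can touch the data.  A small demo (the class is mine, not from the patch):

```java
import java.nio.ByteBuffer;

public class BackingArrayDemo {
    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(16);
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        // Heap buffer: the backing byte[] is reachable, no copy needed.
        System.out.println(heap.hasArray());
        // Direct buffer: no backing array, data must be copied out.
        System.out.println(direct.hasArray());
    }
}
```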
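The fixed-size copy buffer in part 4 works by draining the ByteBuffer through
a small, reusable byte array instead of one allocation sized to the whole
input.  A rough sketch under my own names (MAX_LEN and processChunk are
illustrative stand-ins, not the real GCTR/GHASH code):

```java
import java.nio.ByteBuffer;

public class ChunkedCopyDemo {
    static final int MAX_LEN = 1024;   // cap on the temporary array (1k here)

    static int consume(ByteBuffer src) {
        // One small array, reused for every chunk of the input.
        byte[] tmp = new byte[Math.min(src.remaining(), MAX_LEN)];
        int processed = 0;
        while (src.hasRemaining()) {
            int len = Math.min(src.remaining(), tmp.length);
            src.get(tmp, 0, len);      // copy one chunk out of the buffer
            processed += processChunk(tmp, len);
        }
        return processed;
    }

    // Stand-in for the intrinsic-backed operation on a byte array.
    static int processChunk(byte[] buf, int len) {
        return len;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(5000);
        // 5000 bytes are processed through five copies of at most 1k each.
        System.out.println(consume(direct));
    }
}
```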
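For part 5, duplicate() returns a view that shares the underlying data but
has its own position and limit, so sections of the input (for example,
ciphertext versus a trailing tag) can be carved out with no byte copying.  A
small demo of that behavior (the 16/16 split is only an example):

```java
import java.nio.ByteBuffer;

public class DuplicateDemo {
    public static void main(String[] args) {
        ByteBuffer whole = ByteBuffer.wrap(new byte[32]);
        // A duplicate shares content but has independent position/limit.
        ByteBuffer head = whole.duplicate();
        head.limit(16);                 // view of the first 16 bytes
        ByteBuffer tail = whole.duplicate();
        tail.position(16);              // view of the last 16 bytes
        System.out.println(head.remaining());   // 16
        System.out.println(tail.remaining());   // 16
        System.out.println(whole.remaining());  // 32: original untouched
    }
}
```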

The CipherCore change provided a 6% performance gain for GCM with byte
array based data, such as SSLSocket and direct API calls.  Similar
performance gains should be evident with other algorithms that use this
method.

The GCM bytebuffer and logic changes produced a 16% increase in performance
in the Flink test.  This is limited to GCM, as the other algorithms still
use the bytebuffer-to-byte-array copy method.  Doing similar work on the
other algorithms would provide a smaller performance gain, given the
complexities of GCM, and those algorithms have diminishing usage in TLS.

-------------

PR: https://git.openjdk.java.net/jdk/pull/411
