On Thu, 23 Nov 2023 10:30:52 GMT, Daniel Jeliński <djelin...@openjdk.org> wrote:

>> Hi,
>> 
>> I need a review for a new internal buffer class called AEADBufferStream.  
>> AEADBufferStream extends ByteArrayOutputStream, but eliminates some data 
>> checking and copying that are not necessary for what GaloisCounterMode.java 
>> and ChaCha20Cipher.java need.  
>> 
>> The changes greatest benefit is with decryption operations.  
>> ChaCha20-Poly1305 had larger performance gains by adopting similar 
>> techniques that AES/GCM already uses. 
>> 
>> The new buffer shows up to 21% bytes/sec performance increase for decryption 
>> for ChaCha20-Poly1305 and 12% for AES/GCM.  16K data sizes saw a memory 
>> usage reduction of 46% with and 83% with ChaCha20-Poly1305.  These results 
>> come from the JMH tests updated in this request and memory usage using the 
>> JMH gc profile gc.alloc.rate.norm entry
>> 
>> thanks
>> 
>> Tony
>
> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java 
> line 45:
> 
>> 43:      * Create an instance with no buffer
>> 44:      */
>> 45:     public AEADBufferedStream() {
> 
> Never used. And once you remove it, `buf` is never null

Sure

> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java 
> line 55:
> 
>> 53: 
>> 54:     public AEADBufferedStream(int len) {
>> 55:         buf = new byte[len];
> 
> Suggestion:
> 
>         super(len);
> 
> otherwise buf will be initialized twice, once here, once in the base class 
> constructor

Yeah.. better memory performance too.

> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java 
> line 56:
> 
>> 54:     public AEADBufferedStream(int len) {
>> 55:         buf = new byte[len];
>> 56:         count = 0;
> 
> no need to zero-initialize, it's zero by default

Yes

> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java 
> line 67:
> 
>> 65:      */
>> 66:     @Override
>> 67:     public synchronized byte[] toByteArray() {
> 
> remove this `synchronized`; the new `write` methods are not synchronized, so 
> this does not add value.

Interesting, I thought I was stuck with `synchronized` from the parent

> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java 
> line 68:
> 
>> 66:     @Override
>> 67:     public synchronized byte[] toByteArray() {
>> 68:         if (buf.length > count) {
> 
> Can this ever happen? And if it can, would it be better to return the whole 
> buffer and have the caller extract the necessary range?

Yes, because it can reuse the allocated buffer during encryption scenarios when 
multi-part ops send non-block size data.  For decryption, it should never 
happen.

> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java 
> line 94:
> 
>> 92:         } else {
>> 93:             if (buf.length < (count + len)) {
>> 94:                 buf = Arrays.copyOf(buf, count + len);
> 
> this will copy a lot if the user performs many small writes, for example when 
> decrypting with CipherInputStream; see `AESGCMCipherOutputStream` benchmark.

As I understand the `ByteArrayOutputStream` code, the 
`ArraysSupport.newLength()` will double the allocation each time.  So if the 
buffer is 1k size and it wants to add one more byte, the method will allocate 
2k.
I agree that in my change, if you send one byte at a time, it will be doing a 
lot of allocating.  But I don't believe that is the general case.  If you have 
examples of apps doing small allocations, let me know and I can come up with a 
different scheme.  But at this point I thought it was a bitter risk more 
allocations in the less-likely situation, than unused allocation in what I see 
as a more common case.

> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 
> 699:
> 
>> 697:      * @throws ShortBufferException if the buffer {@code out} does not 
>> have
>> 698:      *      enough space to hold the resulting data.
>> 699:      */    @Override
> 
> Suggestion:
> 
>      */
>     @Override

Yep

> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 
> 792:
> 
>> 790: 
>> 791:     /*
>> 792:      * Optimized version of bufferCrypt from CipherSpi.java.  Direct
> 
> Can you document the optimizations done in this function?

The second sentence says what the optimizations is.

> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 
> 844:
> 
>> 842:                         total = engine.doFinal(inArray, inOfs, inLen, 
>> outArray, outOfs);
>> 843:                     }
>> 844:                 } catch (BadPaddingException | KeyException e) {
> 
> Preexisting, and probably out of scope for this PR, but we should expose the 
> "Counter exhausted" exception in javax.crypto.Cipher; filed 
> [JDK-8320743](https://bugs.openjdk.org/browse/JDK-8320743) for that

Certainly out of scope here.

> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 
> 1427:
> 
>> 1425:             input.get(in);
>> 1426:             byte[] out = new byte[in.length];
>> 1427:             doUpdate(in, 0, in.length, out, out.length);
> 
> Suggestion:
> 
>             byte[] out = in;
>             doUpdate(in, 0, in.length, out, 0);
> 
> I guess we need more test coverage here

I don't see a testing issue there, but that's better memory usage.  I probably 
copied this code over from AES/GCM where it's blocksized data and `in` and 
`out` could have been different sizes.  But CC20 can use this optimization 
because it's a streaming cipher.

> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 
> 1505:
> 
>> 1503:             byte[] in = new byte[input.remaining()];
>> 1504:             input.get(in);
>> 1505:             byte[] out = new byte[in.length];
> 
> Suggestion:
> 
>             byte[] out = in;

yep

> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 
> 1516:
> 
>> 1514:             input.get(in);
>> 1515:             byte[] out = new byte[in.length + TAG_LENGTH];
>> 1516:             doFinal(in, 0, in.length, out, 0);
> 
> A bit more complex, but you can avoid one allocation here too; something like 
> (untested):
> Suggestion:
> 
>             int inLen = input.remaining();
>             byte[] out = new byte[inLen + TAG_LENGTH];
>             byte[] in = out;
>             input.get(in, 0, inLen);
>             doFinal(in, 0, inLen, out, 0);

I will have to check on this, I've run into `BufferUnderflowException` too many 
times with ByteBuffer to say for sure at this point.

> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 
> 1652:
> 
>> 1650:             int ctLen = getBufferedLength() + inLen;
>> 1651:             if (ctLen == 0) {
>> 1652:                 throw new AEADBadTagException("Tag mismatch");
> 
> Suggestion:
> 
>             if (ctLen < TAG_LENGTH) {
>                 throw new AEADBadTagException("Input too short - need tag");

Yep

> test/micro/org/openjdk/bench/javax/crypto/full/AESGCMBench.java line 43:
> 
>> 41:     AlgorithmParameterSpec getNewSpec() {
>> 42:         iv_index = (iv_index + 1) % IV_MODULO;
>> 43:         return new GCMParameterSpec(96, iv, iv_index, 12);
> 
> Can you also change tag length to 128 bits? TLS uses 128, and 128-bit tag 
> generates a slightly different flamegraph.

I'll see if I can do it cleanly. GCM spec defaults to 96bit and because 
CC20P1305 requires 96bit, it made the common code easier. I'm surprised you any 
differences which such a minor change.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946509
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946538
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406968884
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946567
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946612
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946637
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946687
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946735
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946776
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406947706
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406947296
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406953636
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406968978
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406968180

Reply via email to