On Thu, 23 Nov 2023 10:30:52 GMT, Daniel Jeliński <djelin...@openjdk.org> wrote:
>> Hi, >> >> I need a review for a new internal buffer class called AEADBufferStream. >> AEADBufferStream extends ByteArrayOutputStream, but eliminates some data >> checking and copying that are not necessary for what GaloisCounterMode.java >> and ChaCha20Cipher.java need. >> >> The changes greatest benefit is with decryption operations. >> ChaCha20-Poly1305 had larger performance gains by adopting similar >> techniques that AES/GCM already uses. >> >> The new buffer shows up to 21% bytes/sec performance increase for decryption >> for ChaCha20-Poly1305 and 12% for AES/GCM. 16K data sizes saw a memory >> usage reduction of 46% with and 83% with ChaCha20-Poly1305. These results >> come from the JMH tests updated in this request and memory usage using the >> JMH gc profile gc.alloc.rate.norm entry >> >> thanks >> >> Tony > > src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java > line 45: > >> 43: * Create an instance with no buffer >> 44: */ >> 45: public AEADBufferedStream() { > > Never used. And once you remove it, `buf` is never null Sure > src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java > line 55: > >> 53: >> 54: public AEADBufferedStream(int len) { >> 55: buf = new byte[len]; > > Suggestion: > > super(len); > > otherwise buf will be initialized twice, once here, once in the base class > constructor Yeah.. better memory performance too. > src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java > line 56: > >> 54: public AEADBufferedStream(int len) { >> 55: buf = new byte[len]; >> 56: count = 0; > > no need to zero-initialize, it's zero by default Yes > src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java > line 67: > >> 65: */ >> 66: @Override >> 67: public synchronized byte[] toByteArray() { > > remove this `synchronized`; the new `write` methods are not synchronized, so > this does not add value. Interesting, I thought I was stuck with `synchronized` from the parent > src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java > line 68: > >> 66: @Override >> 67: public synchronized byte[] toByteArray() { >> 68: if (buf.length > count) { > > Can this ever happen? And if it can, would it be better to return the whole > buffer and have the caller extract the necessary range? Yes, because it can reuse the allocated buffer during encryption scenarios when multi-part ops send non-block size data. For decryption, it should never happen. > src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java > line 94: > >> 92: } else { >> 93: if (buf.length < (count + len)) { >> 94: buf = Arrays.copyOf(buf, count + len); > > this will copy a lot if the user performs many small writes, for example when > decrypting with CipherInputStream; see `AESGCMCipherOutputStream` benchmark. As I understand the `ByteArrayOutputStream` code, the `ArraysSupport.newLength()` will double the allocation each time. So if the buffer is 1k size and it wants to add one more byte, the method will allocate 2k. I agree that in my change, if you send one byte at a time, it will be doing a lot of allocating. But I don't believe that is the general case. If you have examples of apps doing small allocations, let me know and I can come up with a different scheme. But at this point I thought it was a bitter risk more allocations in the less-likely situation, than unused allocation in what I see as a more common case. > src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line > 699: > >> 697: * @throws ShortBufferException if the buffer {@code out} does not >> have >> 698: * enough space to hold the resulting data. >> 699: */ @Override > > Suggestion: > > */ > @Override Yep > src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line > 792: > >> 790: >> 791: /* >> 792: * Optimized version of bufferCrypt from CipherSpi.java. Direct > > Can you document the optimizations done in this function? The second sentence says what the optimizations is. > src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line > 844: > >> 842: total = engine.doFinal(inArray, inOfs, inLen, >> outArray, outOfs); >> 843: } >> 844: } catch (BadPaddingException | KeyException e) { > > Preexisting, and probably out of scope for this PR, but we should expose the > "Counter exhausted" exception in javax.crypto.Cipher; filed > [JDK-8320743](https://bugs.openjdk.org/browse/JDK-8320743) for that Certainly out of scope here. > src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line > 1427: > >> 1425: input.get(in); >> 1426: byte[] out = new byte[in.length]; >> 1427: doUpdate(in, 0, in.length, out, out.length); > > Suggestion: > > byte[] out = in; > doUpdate(in, 0, in.length, out, 0); > > I guess we need more test coverage here I don't see a testing issue there, but that's better memory usage. I probably copied this code over from AES/GCM where it's blocksized data and `in` and `out` could have been different sizes. But CC20 can use this optimization because it's a streaming cipher. > src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line > 1505: > >> 1503: byte[] in = new byte[input.remaining()]; >> 1504: input.get(in); >> 1505: byte[] out = new byte[in.length]; > > Suggestion: > > byte[] out = in; yep > src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line > 1516: > >> 1514: input.get(in); >> 1515: byte[] out = new byte[in.length + TAG_LENGTH]; >> 1516: doFinal(in, 0, in.length, out, 0); > > A bit more complex, but you can avoid one allocation here too; something like > (untested): > Suggestion: > > int inLen = input.remaining(); > byte[] out = new byte[inLen + TAG_LENGTH]; > byte[] in = out; > input.get(in, 0, inLen); > doFinal(in, 0, inLen, out, 0); I will have to check on this, I've run into `BufferUnderflowException` too many times with ByteBuffer to say for sure at this point. > src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line > 1652: > >> 1650: int ctLen = getBufferedLength() + inLen; >> 1651: if (ctLen == 0) { >> 1652: throw new AEADBadTagException("Tag mismatch"); > > Suggestion: > > if (ctLen < TAG_LENGTH) { > throw new AEADBadTagException("Input too short - need tag"); Yep > test/micro/org/openjdk/bench/javax/crypto/full/AESGCMBench.java line 43: > >> 41: AlgorithmParameterSpec getNewSpec() { >> 42: iv_index = (iv_index + 1) % IV_MODULO; >> 43: return new GCMParameterSpec(96, iv, iv_index, 12); > > Can you also change tag length to 128 bits? TLS uses 128, and 128-bit tag > generates a slightly different flamegraph. I'll see if I can do it cleanly. GCM spec defaults to 96bit and because CC20P1305 requires 96bit, it made the common code easier. I'm surprised you any differences which such a minor change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946509 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946538 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406968884 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946567 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946612 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946637 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946687 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946735 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946776 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406947706 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406947296 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406953636 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406968978 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406968180