On Fri, 3 Nov 2023 04:08:27 GMT, Anthony Scarpino <ascarp...@openjdk.org> wrote:
> Hi, > > I need a review for a new internal buffer class called AEADBufferStream. > AEADBufferStream extends ByteArrayOutputStream, but eliminates some data > checking and copying that are not necessary for what GaloisCounterMode.java > and ChaCha20Cipher.java need. > > The changes greatest benefit is with decryption operations. > ChaCha20-Poly1305 had larger performance gains by adopting similar techniques > that AES/GCM already uses. > > The new buffer shows up to 21% bytes/sec performance increase for decryption > for ChaCha20-Poly1305 and 12% for AES/GCM. 16K data sizes saw a memory usage > reduction of 46% with and 83% with ChaCha20-Poly1305. These results come > from the JMH tests updated in this request and memory usage using the JMH gc > profile gc.alloc.rate.norm entry > > thanks > > Tony Thanks for doing this! A few comments in line. src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java line 45: > 43: * Create an instance with no buffer > 44: */ > 45: public AEADBufferedStream() { Never used. And once you remove it, `buf` is never null src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java line 55: > 53: > 54: public AEADBufferedStream(int len) { > 55: buf = new byte[len]; Suggestion: super(len); otherwise buf will be initialized twice, once here, once in the base class constructor src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java line 56: > 54: public AEADBufferedStream(int len) { > 55: buf = new byte[len]; > 56: count = 0; no need to zero-initialize, it's zero by default src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java line 67: > 65: */ > 66: @Override > 67: public synchronized byte[] toByteArray() { remove this `synchronized`; the new `write` methods are not synchronized, so this does not add value. src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java line 68: > 66: @Override > 67: public synchronized byte[] toByteArray() { > 68: if (buf.length > count) { Can this ever happen? And if it can, would it be better to return the whole buffer and have the caller extract the necessary range? src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java line 94: > 92: } else { > 93: if (buf.length < (count + len)) { > 94: buf = Arrays.copyOf(buf, count + len); this will copy a lot if the user performs many small writes, for example when decrypting with CipherInputStream; see `AESGCMCipherOutputStream` benchmark. src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 699: > 697: * @throws ShortBufferException if the buffer {@code out} does not > have > 698: * enough space to hold the resulting data. > 699: */ @Override Suggestion: */ @Override src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 792: > 790: > 791: /* > 792: * Optimized version of bufferCrypt from CipherSpi.java. Direct Can you document the optimizations done in this function? src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 844: > 842: total = engine.doFinal(inArray, inOfs, inLen, > outArray, outOfs); > 843: } > 844: } catch (BadPaddingException | KeyException e) { Preexisting, and probably out of scope for this PR, but we should expose the "Counter exhausted" exception in javax.crypto.Cipher src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 1427: > 1425: input.get(in); > 1426: byte[] out = new byte[in.length]; > 1427: doUpdate(in, 0, in.length, out, out.length); Suggestion: byte[] out = in; doUpdate(in, 0, in.length, out, 0); I guess we need more test coverage here src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 1505: > 1503: byte[] in = new byte[input.remaining()]; > 1504: input.get(in); > 1505: byte[] out = new byte[in.length]; Suggestion: byte[] out = in; src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 1516: > 1514: input.get(in); > 1515: byte[] out = new byte[in.length + TAG_LENGTH]; > 1516: doFinal(in, 0, in.length, out, 0); A bit more complex, but you can avoid one allocation here too; something like (untested): Suggestion: int inLen = input.remaining(); byte[] out = new byte[inLen + TAG_LENGTH]; byte[] in = out; input.get(in, 0, inLen); doFinal(in, 0, inLen, out, 0); src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 1652: > 1650: int ctLen = getBufferedLength() + inLen; > 1651: if (ctLen == 0) { > 1652: throw new AEADBadTagException("Tag mismatch"); Suggestion: if (ctLen < TAG_LENGTH) { throw new AEADBadTagException("Input too short - need tag"); test/micro/org/openjdk/bench/javax/crypto/full/AESGCMBench.java line 43: > 41: AlgorithmParameterSpec getNewSpec() { > 42: iv_index = (iv_index + 1) % IV_MODULO; > 43: return new GCMParameterSpec(96, iv, iv_index, 12); Can you also change tag length to 128 bits? TLS uses 128, and 128-bit tag generates a slightly different flamegraph. test/micro/org/openjdk/bench/javax/crypto/full/AESGCMByteBuffer.java line 44: > 42: AlgorithmParameterSpec getNewSpec() { > 43: iv_index = (iv_index + 1) % IV_MODULO; > 44: return new GCMParameterSpec(96, iv, iv_index, 12); Also use 128-bit tag here. test/micro/org/openjdk/bench/javax/crypto/full/BenchBase.java line 122: > 120: public void decrypt() throws Exception { > 121: decryptCipher.init(Cipher.DECRYPT_MODE, ks, > 122: encryptCipher.getParameters(). Consider using a different method to obtain parameter spec here; the flamegraph suggests that `getParameters` is responsible for up to 20% of the time spent in these benchmarks test/micro/org/openjdk/bench/javax/crypto/small/AESGCMBench.java line 48: > 46: AlgorithmParameterSpec getNewSpec() { > 47: iv_index = (iv_index + 1) % IV_MODULO; > 48: return new GCMParameterSpec(96, iv, iv_index, 12); Also use 128-bit tag here. test/micro/org/openjdk/bench/javax/crypto/small/AESGCMByteBuffer.java line 48: > 46: AlgorithmParameterSpec getNewSpec() { > 47: iv_index = (iv_index + 1) % IV_MODULO; > 48: return new GCMParameterSpec(96, iv, iv_index, 12); Also use 128-bit tag here. ------------- Changes requested by djelinski (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16487#pullrequestreview-1746152716 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403191252 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403308026 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403191602 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403193417 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403194450 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403196426 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403196854 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403224371 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403205814 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403238862 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403239235 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403244203 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403252213 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403271442 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403271853 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403303743 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403274634 PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1403274860