Hi Andrew,

Does your alignment changes affect x86 only or should this help all 
architectures?

In general I don't see a disadvantage and that it could be expanded to other 
places in crypto too. But I have think about the effects on sparc, so that 
would need to be tested.  Right now the sparc intrinsic does alignment checking 
and realigning, so it would be interesting to see if ByteArrays performed 
better than the intrinsic alignment.   I assume you don't have the hardware to 
test sparc, right?

Tony

> On Aug 27, 2015, at 10:51 AM, Andrew Haley <a...@redhat.com> wrote:
> 
> I've been looking at the performance of AES/GCM.  The profile is quite
> surprising:
> 
> samples  cum. samples  %        cum. %  symbol name
> 476009   476009        36.7033  36.7033 aescrypt_encryptBlock
> 297239   773248        22.9190  59.6224 ghash_processBlocks
> 195334   968582        15.0615  74.6839 int 
> com.sun.crypto.provider.GCTR.doFinal(byte[], int, int, byte[], int)
> 
> I would have expected aescrypt_encryptBlock and ghash_processBlocks to
> be very high, but that GCTR.doFinal is so high is rather surprising:
> all it has to do is increment a counter, call aescrypt_encryptBlock,
> and xor the result with the plaintext.
> 
> The problem seems to be due to byte accesses in GCTR.doFinal() and
> GaloisCounterMode.update().  Earlier this year I wrote a bunch of new
> Unsafe.get/put-XX-Unaligned methods, and these are ideal for bulk
> accesses to byte arrays.  So, as an experiment I wrote some methods to
> do array accesses and used them to speed up GCM, with this result:
> 
> 492274   492274        40.8856  40.8856    13256.jo                 
> aescrypt_encryptBlock
> 298185   790459        24.7656  65.6512    13256.jo                 
> ghash_processBlocks
> 86325    876784         7.1697  72.8209    13256.jo                 int 
> com.sun.crypto.provider.GCTR.update(byte[], int, int, byte[], int)
> 
> GCTR.update() is twice as fast as it was, and overall the performance
> of AES/GCM is 10% better.
> 
> The changes to the GCM code are quite minor:
> 
> diff -r 6940407d544a 
> src/java.base/share/classes/com/sun/crypto/provider/GCTR.java
> --- a/src/java.base/share/classes/com/sun/crypto/provider/GCTR.java     Thu 
> Aug 20 07:36:37 2015 -0700
> +++ b/src/java.base/share/classes/com/sun/crypto/provider/GCTR.java     Thu 
> Aug 27 16:17:25 2015 +0100
> @@ -94,11 +97,12 @@
>         int numOfCompleteBlocks = inLen / AES_BLOCK_SIZE;
>         for (int i = 0; i < numOfCompleteBlocks; i++) {
>             aes.encryptBlock(counter, 0, encryptedCntr, 0);
> -            for (int n = 0; n < AES_BLOCK_SIZE; n++) {
> -                int index = (i * AES_BLOCK_SIZE + n);
> -                out[outOfs + index] =
> -                    (byte) ((in[inOfs + index] ^ encryptedCntr[n]));
> -            }
> +            int index = i * AES_BLOCK_SIZE;
> +            ByteArrays.putLongs(out, outOfs + index,
> +                                (ByteArrays.getLong(in, inOfs + index + 0) ^
> +                                 ByteArrays.getLong(encryptedCntr, 0)),
> +                                (ByteArrays.getLong(in, inOfs + index + 8) ^
> +                                 ByteArrays.getLong(encryptedCntr, 8)));
>             GaloisCounterMode.increment32(counter);
>         }
>         return inLen;
> diff -r 6940407d544a 
> src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java
> --- 
> a/src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java  
>       Thu Aug 20 07:36:37 2015 -0700
> +++ 
> b/src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java  
>       Thu Aug 27 16:17:25 2015 +0100
> @@ -82,11 +82,8 @@
>             // should never happen
>             throw new ProviderException("Illegal counter block length");
>         }
> -        // start from last byte and only go over 4 bytes, i.e. total 32 bits
> -        int n = value.length - 1;
> -        while ((n >= value.length - 4) && (++value[n] == 0)) {
> -            n--;
> -        }
> +        int counter = ByteArrays.getInt(value, value.length - 4, true);
> +        ByteArrays.putInt(value, value.length - 4, counter + 1, true);
>     }
> 
>     // ivLen in bits
> 
> I've attached the full diff.
> 
> So, here's my question: there are many places over the crypto code
> base where we could take advantage of this.  Do you think it makes
> sense to make changes like this?  I can't see any major disadvantages,
> and it's a considerable performance improvement.
> 
> Andrew.
> 
> 
> <GCTR.diffs>

Reply via email to