On Tue 09 Oct 2018 02:55:39 PM CEST, Daniel P. Berrangé wrote:
> Using 64-bit arithmetic increases the performance for xts-aes-128
> when built with gcrypt:
>
>   Encrypt: 235 MB/s -> 320 MB/s
>   Decrypt: 245 MB/s -> 325 MB/s
>
> Signed-off-by: Daniel P. Berrangé <berra...@redhat.com>
> ---
>  crypto/xts.c | 52 +++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 33 insertions(+), 19 deletions(-)
>
> diff --git a/crypto/xts.c b/crypto/xts.c
> index ded4365191..f109c8a3ee 100644
> --- a/crypto/xts.c
> +++ b/crypto/xts.c
> @@ -31,6 +31,12 @@ typedef struct {
>      uint64_t b;
>  } xts_uint128;
>
> +#define xts_uint128_xor(D, S1, S2) \
> +    do {                                \
> +        (D)->a = (S1)->a ^ (S2)->a;     \
> +        (D)->b = (S1)->b ^ (S2)->b;     \
> +    } while (0)
> +
>  static void xts_mult_x(uint8_t *I)
>  {
>      int x;
> @@ -59,25 +65,19 @@ static void xts_mult_x(uint8_t *I)
>   */
>  static void xts_tweak_encdec(const void *ctx,
>                               xts_cipher_func *func,
> -                             const uint8_t *src,
> -                             uint8_t *dst,
> -                             uint8_t *iv)
> +                             const xts_uint128 *src,
> +                             xts_uint128 *dst,
> +                             xts_uint128 *iv)
>  {
> -    unsigned long x;
> -
>      /* tweak encrypt block i */
> -    for (x = 0; x < XTS_BLOCK_SIZE; x++) {
> -        dst[x] = src[x] ^ iv[x];
> -    }
> +    xts_uint128_xor(dst, src, iv);
>
> -    func(ctx, XTS_BLOCK_SIZE, dst, dst);
> +    func(ctx, XTS_BLOCK_SIZE, (uint8_t *)dst, (uint8_t *)dst);

In line with what I said earlier, perhaps it's clearer if you leave
everything as uint8_t * and simply make xts_uint128_xor() treat the
array as xts_uint128 internally.

>      for (i = 0; i < lim; i++) {
> -        xts_tweak_encdec(datactx, decfunc, src, dst, (uint8_t *)&T);
> +        xts_uint128 S, D;
> +
> +        memcpy(&S, src, XTS_BLOCK_SIZE);
> +        xts_tweak_encdec(datactx, decfunc, &S, &D, &T);
> +        memcpy(dst, &D, XTS_BLOCK_SIZE);

Why do you need S and D?

Berto
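
For illustration only, the suggestion above (keep uint8_t * in every signature and do the 128-bit XOR inside the helper) might look roughly like the sketch below. The name xts_xor_block is hypothetical and does not appear in the patch, and XTS_BLOCK_SIZE is assumed to be 16. The sketch uses memcpy rather than a pointer cast so that it makes no alignment assumptions about the byte buffers; whether that keeps the measured speedup would have to be benchmarked.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XTS_BLOCK_SIZE 16 /* assumed: one 128-bit XTS block */

typedef struct {
    uint64_t a;
    uint64_t b;
} xts_uint128;

static_assert(sizeof(xts_uint128) == XTS_BLOCK_SIZE,
              "xts_uint128 must cover exactly one XTS block");

/*
 * Hypothetical helper (not from the patch): XOR two 16-byte blocks
 * using two 64-bit operations while exposing only uint8_t * to callers.
 * The inputs are copied into locals first, so dst may alias s1 or s2.
 */
static inline void xts_xor_block(uint8_t *dst,
                                 const uint8_t *s1,
                                 const uint8_t *s2)
{
    xts_uint128 x, y;

    memcpy(&x, s1, XTS_BLOCK_SIZE);
    memcpy(&y, s2, XTS_BLOCK_SIZE);
    x.a ^= y.a;
    x.b ^= y.b;
    memcpy(dst, &x, XTS_BLOCK_SIZE);
}
```

With a helper along these lines, xts_tweak_encdec() could presumably keep its original uint8_t * parameters, and the callers would not need the S and D temporaries.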