Re: [PATCH v2 1/2] crypto: Fix out-of-bounds access of the data buffer in generic-gcm-aesni

2017-12-20, Eric Biggers
On Wed, Dec 20, 2017 at 11:28:27AM -0800, Junaid Shahid wrote:
> > > + # adjust the shuffle mask pointer to be able to shift either 0 or 8
> > > + # bytes depending on whether the last block is <8 bytes or not
> > > +mov \DLEN, \TMP1
> > > +and $8, \TMP1
> > > + lea SHIFT_MASK(%rip), %rax
> > > + sub \TMP1, %rax
> > > + movdqu (%rax), \XMM2# get the appropriate shuffle mask
> > > + PSHUFB_XMM \XMM2, \XMM1 # shift left either 0 or 8 bytes
> > 
> > Is there any way this can be simplified to use pslldq?  The shift is just 8
> > bytes or nothing, and we already had to branch earlier depending on whether
> > we needed to read the 8 bytes or not.
> 
> I am not sure if we can use a simple pslldq without either introducing another
> branch, or duplicating the _read_last_lt8 block for each case of the original
> branch. Do you think that it is better to just duplicate the _read_last_lt8
> block instead of using pshufb? Or do you have any other suggestion about how
> to do it?
> 

The best I can come up with now is just duplicating the "read one byte at a
time" instructions (see below).  One way to avoid the duplication would be to
read the 1-7 byte remainder (end of the block) first and the 8-byte word
(beginning of the block) second, but it would be a bit weird.

# read the last <16 byte block
# Clobbers %rax, TMP1 and XMM1
.macro READ_PARTIAL_BLOCK DPTR DLEN TMP1 XMM1 XMMDst
        mov     \DLEN, \TMP1
        cmp     $8, \DLEN
        jl      _read_partial_lt8_\@
        # DLEN >= 8: read the first 8 bytes with a single qword load
        mov     (\DPTR), %rax
        MOVQ_R64_XMM %rax, \XMMDst
        sub     $8, \TMP1
        jz      _done_read_partial_\@
        # accumulate the trailing 1-7 bytes, most significant byte first
        xor     %rax, %rax
1:
        shl     $8, %rax
        mov     7(\DPTR, \TMP1, 1), %al
        dec     \TMP1
        jnz     1b
        MOVQ_R64_XMM %rax, \XMM1
        pslldq  $8, \XMM1       # place the trailing bytes in the high qword
        por     \XMM1, \XMMDst
        jmp     _done_read_partial_\@

_read_partial_lt8_\@:
        # DLEN < 8: read the whole block one byte at a time
        xor     %rax, %rax
1:
        shl     $8, %rax
        mov     -1(\DPTR, \TMP1, 1), %al
        dec     \TMP1
        jnz     1b
        MOVQ_R64_XMM %rax, \XMMDst
_done_read_partial_\@:
.endm
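
For reference, here is a plain C model of the macro above (a sketch only: the
buffer/length names are illustrative, and __uint128_t, a GCC/Clang extension,
stands in for the XMM destination register):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * C model of READ_PARTIAL_BLOCK above, for len in [1, 15]: read the low
 * 8 bytes (if present) with one qword load, then accumulate the 1-7
 * trailing bytes most-significant-byte first, as the shl/mov loops do.
 * Bytes beyond len stay zero, which is the padding the caller expects.
 */
static __uint128_t read_partial_block(const uint8_t *buf, unsigned int len)
{
        __uint128_t dst = 0;
        uint64_t acc = 0;
        unsigned int rem = len;

        if (len >= 8) {
                uint64_t lo;

                memcpy(&lo, buf, 8);            /* mov (\DPTR), %rax */
                dst = lo;                       /* MOVQ_R64_XMM %rax, \XMMDst */
                rem -= 8;
                while (rem) {                   /* bytes 8 .. len-1, last one first */
                        acc = (acc << 8) | buf[7 + rem];
                        rem--;
                }
                dst |= (__uint128_t)acc << 64;  /* pslldq $8 + por */
        } else {
                while (rem) {                   /* bytes 0 .. len-1, last one first */
                        acc = (acc << 8) | buf[rem - 1];
                        rem--;
                }
                dst = acc;
        }
        return dst;
}

int main(void)
{
        uint8_t buf[15];

        for (unsigned int i = 0; i < sizeof(buf); i++)
                buf[i] = (uint8_t)(i + 1);

        for (unsigned int len = 1; len <= sizeof(buf); len++) {
                __uint128_t v = read_partial_block(buf, len);

                printf("len=%2u  high=%016llx  low=%016llx\n", len,
                       (unsigned long long)(v >> 64), (unsigned long long)v);
        }
        return 0;
}

On a little-endian machine this produces the same zero-padded 16-byte value
that the assembly leaves in XMMDst.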


Re: [PATCH v2 1/2] crypto: Fix out-of-bounds access of the data buffer in generic-gcm-aesni

2017-12-20, Junaid Shahid
On Wednesday, December 20, 2017 12:36:16 AM PST Eric Biggers wrote:
> 
> Did you run the self-tests (boot with CONFIG_CRYPTO_MANAGER_DISABLE_TESTS
> unset)?  The second patch causes them to start failing:
> 
> [1.169640] alg: aead: Test 7 failed on encryption for rfc4106-gcm-aesni
> [1.178308] alg: aead: Test 4 failed on encryption for generic-gcm-aesni

Thanks for the pointer. Let me run the self-tests and see.

> 
> Also, don't the AVX and AVX2 versions have the same bug?  These patches only
> fix the non-AVX version.

The AVX/AVX2 versions are used only when the data length is at least 640/4K 
bytes respectively. Therefore, the first bug doesn’t apply at all. The second 
bug does exist, but it won’t cause any ill effect at the present time because 
the overrun will be covered by the data bytes. However, I am planning to send 
out a separate patch series that allows for non-contiguous AAD, data and 
AuthTag buffers, which will cause the AAD overrun to manifest even in the AVX 
versions, so I am going to include the AVX version fixes in that series.

> 
> > +# read the last <16 byte block
> > +# Clobbers %rax, DPTR, TMP1 and XMM1-2
> > +.macro READ_PARTIAL_BLOCK DPTR DLEN TMP1 XMM1 XMM2 XMMDst
> > +pxor \XMMDst, \XMMDst
> > +mov \DLEN, \TMP1
> > +cmp $8, \DLEN
> > +jl _read_last_lt8_\@
> > +mov (\DPTR), %rax
> > +MOVQ_R64_XMM %rax, \XMMDst
> > +   add $8, \DPTR
> > +sub $8, \TMP1
> > +jz _done_read_last_partial_block_\@
> > +_read_last_lt8_\@:
> > +   shl $8, %rax
> > +mov -1(\DPTR, \TMP1, 1), %al
> > +dec \TMP1
> > +jnz _read_last_lt8_\@
> > +MOVQ_R64_XMM %rax, \XMM1
> 
> Doesn't %rax need to be zeroed at '_read_last_lt8' so that the padding is
> zeroed?
> 

Ah, yes. It actually isn’t needed for the first patch, as in that case the 
result returned by this macro is masked appropriately at the call-sites anyway. 
But that is not the case for the second patch. That is probably also the reason 
for the failures that you noticed.
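
To make that concrete, here is a minimal C sketch of the byte loop (a model
only, not code from either patch; the names are illustrative):

#include <stdint.h>
#include <stdio.h>

/*
 * Sketch of the point above: if the accumulator is not cleared before
 * the shl/mov loop, the top 8-n bytes of the result keep stale register
 * contents instead of zero padding.  Patch 1 masks the macro's result
 * at its call sites, so the stale bytes are harmless there; patch 2
 * relies on the padding actually being zero, hence the test failures.
 */
static uint64_t read_tail(const uint8_t *p, unsigned int n, uint64_t stale)
{
        uint64_t acc = stale;   /* correct version: acc = 0 ("xor %rax, %rax") */

        while (n) {
                acc = (acc << 8) | p[n - 1];    /* shl $8, %rax; mov ..., %al */
                n--;
        }
        return acc;
}

int main(void)
{
        const uint8_t p[3] = { 0x11, 0x22, 0x33 };

        printf("zeroed: %016llx\n", (unsigned long long)read_tail(p, 3, 0));
        printf("stale:  %016llx\n",
               (unsigned long long)read_tail(p, 3, 0xdeadbeefcafef00dULL));
        return 0;
}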

> > +   # adjust the shuffle mask pointer to be able to shift either 0 or 8
> > +   # bytes depending on whether the last block is <8 bytes or not
> > +mov \DLEN, \TMP1
> > +and $8, \TMP1
> > +   lea SHIFT_MASK(%rip), %rax
> > +   sub \TMP1, %rax
> > +   movdqu (%rax), \XMM2# get the appropriate shuffle mask
> > +   PSHUFB_XMM \XMM2, \XMM1 # shift left either 0 or 8 bytes
> 
> Is there any way this can be simplified to use pslldq?  The shift is just 8
> bytes or nothing, and we already had to branch earlier depending on whether we
> needed to read the 8 bytes or not.

I am not sure if we can use a simple pslldq without either introducing another 
branch, or duplicating the _read_last_lt8 block for each case of the original 
branch. Do you think that it is better to just duplicate the _read_last_lt8 
block instead of using pshufb? Or do you have any other suggestion about how to 
do it?

Thanks,
Junaid


Re: [PATCH v2 1/2] crypto: Fix out-of-bounds access of the data buffer in generic-gcm-aesni

2017-12-20, Eric Biggers
On Tue, Dec 19, 2017 at 08:42:58PM -0800, Junaid Shahid wrote:
> The aesni_gcm_enc/dec functions can access memory before the start of
> the data buffer if the length of the data buffer is less than 16 bytes.
> This is because they perform the read via a single 16-byte load. This
> can potentially result in accessing a page that is not mapped and thus
> causing the machine to crash. This patch fixes that by reading the
> partial block byte-by-byte and optionally via an 8-byte load if the block
> is at least 8 bytes.
> 
> Fixes: 0487ccac ("crypto: aesni - make non-AVX AES-GCM work with any aadlen")
> Signed-off-by: Junaid Shahid 

Did you run the self-tests (boot with CONFIG_CRYPTO_MANAGER_DISABLE_TESTS
unset)?  The second patch causes them to start failing:

[1.169640] alg: aead: Test 7 failed on encryption for rfc4106-gcm-aesni
[1.178308] alg: aead: Test 4 failed on encryption for generic-gcm-aesni

Also, don't the AVX and AVX2 versions have the same bug?  These patches only fix
the non-AVX version.

> +# read the last <16 byte block
> +# Clobbers %rax, DPTR, TMP1 and XMM1-2
> +.macro READ_PARTIAL_BLOCK DPTR DLEN TMP1 XMM1 XMM2 XMMDst
> +pxor \XMMDst, \XMMDst
> +mov \DLEN, \TMP1
> +cmp $8, \DLEN
> +jl _read_last_lt8_\@
> +mov (\DPTR), %rax
> +MOVQ_R64_XMM %rax, \XMMDst
> + add $8, \DPTR
> +sub $8, \TMP1
> +jz _done_read_last_partial_block_\@
> +_read_last_lt8_\@:
> + shl $8, %rax
> +mov -1(\DPTR, \TMP1, 1), %al
> +dec \TMP1
> +jnz _read_last_lt8_\@
> +MOVQ_R64_XMM %rax, \XMM1

Doesn't %rax need to be zeroed at '_read_last_lt8' so that the padding is
zeroed?

> + # adjust the shuffle mask pointer to be able to shift either 0 or 8
> + # bytes depending on whether the last block is <8 bytes or not
> +mov \DLEN, \TMP1
> +and $8, \TMP1
> + lea SHIFT_MASK(%rip), %rax
> + sub \TMP1, %rax
> + movdqu (%rax), \XMM2# get the appropriate shuffle mask
> + PSHUFB_XMM \XMM2, \XMM1 # shift left either 0 or 8 bytes

Is there any way this can be simplified to use pslldq?  The shift is just 8
bytes or nothing, and we already had to branch earlier depending on whether we
needed to read the 8 bytes or not.

Eric
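
For reference, a small C model of the 0-or-8-byte pshufb shift questioned
above (illustrative only: pshufb is modelled per its ISA definition, and the
all-0xff block assumed before SHIFT_MASK is a guess, since the value of the
added .octa is truncated in this archive):

#include <stdint.h>
#include <stdio.h>

static const uint8_t shift_table[32] = {
        /* assumed padding block: high bit set -> pshufb writes zero */
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        /* SHIFT_MASK: identity permutation 0x00..0x0f */
        0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
        0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
};

/* Plain C model of PSHUFB: a mask byte with bit 7 set selects zero. */
static void pshufb(uint8_t dst[16], const uint8_t src[16],
                   const uint8_t mask[16])
{
        for (int i = 0; i < 16; i++)
                dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0f];
}

int main(void)
{
        uint8_t src[16], dst[16];

        for (int i = 0; i < 16; i++)
                src[i] = (uint8_t)(0xa0 + i);

        for (unsigned int dlen = 4; dlen <= 12; dlen += 8) {
                /* mov \DLEN,\TMP1; and $8,\TMP1; lea SHIFT_MASK; sub \TMP1 */
                const uint8_t *mask = &shift_table[16] - (dlen & 8);

                pshufb(dst, src, mask);         /* shift left 0 or 8 bytes */
                printf("dlen=%2u:", dlen);
                for (int i = 0; i < 16; i++)
                        printf(" %02x", dst[i]);
                printf("\n");
        }
        return 0;
}

With the mask taken 8 bytes before SHIFT_MASK, pshufb zero-fills the low half
and moves bytes 0-7 into the high half; since the DLEN >= 8 case already sits
on its own branch, a fixed pslldq $8 would give the same result without the
table lookup, which is what the suggested macro earlier in the thread does.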


[PATCH v2 1/2] crypto: Fix out-of-bounds access of the data buffer in generic-gcm-aesni

2017-12-19, Junaid Shahid
The aesni_gcm_enc/dec functions can access memory before the start of
the data buffer if the length of the data buffer is less than 16 bytes.
This is because they perform the read via a single 16-byte load. This
can potentially result in accessing a page that is not mapped and thus
causing the machine to crash. This patch fixes that by reading the
partial block byte-by-byte and optionally via an 8-byte load if the block
is at least 8 bytes.

Fixes: 0487ccac ("crypto: aesni - make non-AVX AES-GCM work with any aadlen")
Signed-off-by: Junaid Shahid 
---
 arch/x86/crypto/aesni-intel_asm.S | 86 ---
 1 file changed, 44 insertions(+), 42 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 16627fec80b2..4b16f31cb359 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -85,6 +85,7 @@ enc:.octa 0x2
 # and zero should follow ALL_F
 .section   .rodata, "a", @progbits
 .align 16
+.octa 0x
 SHIFT_MASK: .octa 0x0f0e0d0c0b0a09080706050403020100
 ALL_F:  .octa 0x
 .octa 0x
@@ -256,6 +257,36 @@ aad_shift_arr:
pxor  \TMP1, \GH# result is in TMP1
 .endm
 
+# read the last <16 byte block
+# Clobbers %rax, DPTR, TMP1 and XMM1-2
+.macro READ_PARTIAL_BLOCK DPTR DLEN TMP1 XMM1 XMM2 XMMDst
+pxor \XMMDst, \XMMDst
+mov \DLEN, \TMP1
+cmp $8, \DLEN
+jl _read_last_lt8_\@
+mov (\DPTR), %rax
+MOVQ_R64_XMM %rax, \XMMDst
+   add $8, \DPTR
+sub $8, \TMP1
+jz _done_read_last_partial_block_\@
+_read_last_lt8_\@:
+   shl $8, %rax
+mov -1(\DPTR, \TMP1, 1), %al
+dec \TMP1
+jnz _read_last_lt8_\@
+MOVQ_R64_XMM %rax, \XMM1
+   # adjust the shuffle mask pointer to be able to shift either 0 or 8
+   # bytes depending on whether the last block is <8 bytes or not
+mov \DLEN, \TMP1
+and $8, \TMP1
+   lea SHIFT_MASK(%rip), %rax
+   sub \TMP1, %rax
+   movdqu (%rax), \XMM2# get the appropriate shuffle mask
+   PSHUFB_XMM \XMM2, \XMM1 # shift left either 0 or 8 bytes
+   por \XMM1, \XMMDst
+_done_read_last_partial_block_\@:
+.endm
+
 /*
 * if a = number of total plaintext bytes
 * b = floor(a/16)
@@ -1310,6 +1341,7 @@ _esb_loop_\@:
MOVADQ  (%r10),\TMP1
AESENCLAST  \TMP1,\XMM0
 .endm
+
 /*
 * void aesni_gcm_dec(void *aes_ctx,// AES Key schedule. Starts on a 16 
byte boundary.
 *   u8 *out,   // Plaintext output. Encrypt in-place 
is allowed.
@@ -1385,14 +1417,6 @@ _esb_loop_\@:
 *
 *AAD Format with 64-bit Extended Sequence Number
 *
-* aadLen:
-*   from the definition of the spec, aadLen can only be 8 or 12 bytes.
-*   The code supports 16 too but for other sizes, the code will fail.
-*
-* TLen:
-*   from the definition of the spec, TLen can only be 8, 12 or 16 bytes.
-*   For other sizes, the code will fail.
-*
 * poly = x^128 + x^127 + x^126 + x^121 + 1
 *
 */
@@ -1486,19 +1510,15 @@ _zero_cipher_left_decrypt:
PSHUFB_XMM %xmm10, %xmm0
 
ENCRYPT_SINGLE_BLOCK  %xmm0, %xmm1# E(K, Yn)
-   sub $16, %r11
-   add %r13, %r11
-   movdqu (%arg3,%r11,1), %xmm1   # receive the last <16 byte block
-   lea SHIFT_MASK+16(%rip), %r12
-   sub %r13, %r12
-# adjust the shuffle mask pointer to be able to shift 16-%r13 bytes
-# (%r13 is the number of bytes in plaintext mod 16)
-   movdqu (%r12), %xmm2   # get the appropriate shuffle mask
-   PSHUFB_XMM %xmm2, %xmm1# right shift 16-%r13 butes
 
+   lea (%arg3,%r11,1), %r10
+   READ_PARTIAL_BLOCK %r10 %r13 %r12 %xmm2 %xmm3 %xmm1
+
+   lea ALL_F+16(%rip), %r12
+   sub %r13, %r12
movdqa  %xmm1, %xmm2
pxor %xmm1, %xmm0# Ciphertext XOR E(K, Yn)
-   movdqu ALL_F-SHIFT_MASK(%r12), %xmm1
+   movdqu (%r12), %xmm1
# get the appropriate mask to mask out top 16-%r13 bytes of %xmm0
pand %xmm1, %xmm0# mask out top 16-%r13 bytes of %xmm0
pand%xmm1, %xmm2
@@ -1507,9 +1527,6 @@ _zero_cipher_left_decrypt:
 
pxor %xmm2, %xmm8
GHASH_MUL %xmm8, %xmm13, %xmm9, %xmm10, %xmm11, %xmm5, %xmm6
- # GHASH computation for the last <16 byte block
-   sub %r13, %r11
-   add $16, %r11
 
 # output %r13 bytes
MOVQ_R64_XMM%xmm0, %rax
@@ -1663,14 +1680,6 @@ ENDPROC(aesni_gcm_dec)
 *
 * AAD Format with 64-bit Extended Sequence Number
 *
-* aadLen:
-*   from the definition of the spec, aadLen can only be 8 or 12 bytes.
-*   The c