[PATCH 2/4] crypto: testmgr - add/enhance test cases for CRC-T10DIF

2016-11-24 Thread Ard Biesheuvel
The existing test cases only exercise a small slice of the various possible code paths through the x86 SSE/PCLMULQDQ implementation, and the upcoming ports of it for arm64. So add one that exceeds 256 bytes in size, and convert another to a chunked test. Signed-off-by: Ard Biesheuvel --- crypto

[PATCH 3/4] crypto: arm64/crct10dif - port x86 SSE implementation to arm64

2016-11-24 Thread Ard Biesheuvel
This is a straight transliteration of the Intel algorithm implemented using SSE and PCLMULQDQ instructions that resides in the file arch/x86/crypto/crct10dif-pcl-asm_64.S. Suggested-by: YueHaibing Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 5 + arch/arm64

[PATCH 4/4] crypto: arm/crct10dif - port x86 SSE implementation to ARM

2016-11-24 Thread Ard Biesheuvel
This is a straight transliteration of the Intel algorithm implemented using SSE and PCLMULQDQ instructions that resides in the file arch/x86/crypto/crct10dif-pcl-asm_64.S. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/Kconfig | 5 + arch/arm/crypto/Makefile

Re: [PATCH 4/4] crypto: arm/crct10dif - port x86 SSE implementation to ARM

2016-11-24 Thread Ard Biesheuvel
On 24 November 2016 at 15:43, Ard Biesheuvel wrote: > This is a straight transliteration of the Intel algorithm implemented > using SSE and PCLMULQDQ instructions that resides in the file > arch/x86/crypto/crct10dif-pcl-asm_64.S. > > Signed-off-by: Ard Biesheuvel > ---

[PATCH 0/2] CRC32 for ARM and arm64 using PMULL and CRC instructions

2016-11-26 Thread Ard Biesheuvel
last Thursday. https://git.kernel.org/cgit/linux/kernel/git/ardb/linux.git/log/?h=crc32 Ard Biesheuvel (2): crypto: arm64/crc32 - accelerated support based on x86 SSE implementation crypto: arm/crc32 - accelerated support based on x86 SSE implementation arch/arm/crypto/Kconfig

[PATCH 1/2] crypto: arm64/crc32 - accelerated support based on x86 SSE implementation

2016-11-26 Thread Ard Biesheuvel
blocks of at least 64 bytes, and on multiples of 16 bytes only. For the remaining input, or for all input on systems that lack the PMULL 64x64->128 instructions, the CRC32 instructions will be used. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 6 + arch/arm64/cry

[PATCH 2/2] crypto: arm/crc32 - accelerated support based on x86 SSE implementation

2016-11-26 Thread Ard Biesheuvel
blocks of at least 64 bytes, and on multiples of 16 bytes only. For the remaining input, or for all input on systems that lack the PMULL 64x64->128 instructions, the CRC32 instructions will be used. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/Kconfig | 5 + arch/arm/crypto/Makef

Re: [PATCH v4] crypto: arm64/sha2: integrate OpenSSL implementations of SHA256/SHA512

2016-11-28 Thread Ard Biesheuvel
On 20 November 2016 at 11:43, Ard Biesheuvel wrote: > On 20 November 2016 at 11:42, Ard Biesheuvel > wrote: >> This integrates both the accelerated scalar and the NEON implementations >> of SHA-224/256 as well as SHA-384/512 from the OpenSSL project. >> >> Relati

Re: [PATCH v4] crypto: arm64/sha2: integrate OpenSSL implementations of SHA256/SHA512

2016-11-28 Thread Ard Biesheuvel
On 28 November 2016 at 13:05, Will Deacon wrote: > On Sun, Nov 20, 2016 at 11:42:01AM +0000, Ard Biesheuvel wrote: >> This integrates both the accelerated scalar and the NEON implementations >> of SHA-224/256 as well as SHA-384/512 from the OpenSSL project. >> >> Relat

[PATCH] crypto: arm64/sha2: add generated .S files to .gitignore

2016-11-28 Thread Ard Biesheuvel
Add the files that are generated by the recently merged OpenSSL SHA-256/512 implementation to .gitignore so Git disregards them when showing untracked files. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/.gitignore | 2 ++ 1 file changed, 2 insertions(+) create mode 100644 arch/arm64

Re: [PATCH 4/4] crypto: arm/crct10dif - port x86 SSE implementation to ARM

2016-11-28 Thread Ard Biesheuvel
On 28 November 2016 at 14:17, Herbert Xu wrote: > On Thu, Nov 24, 2016 at 05:32:42PM +0000, Ard Biesheuvel wrote: >> On 24 November 2016 at 15:43, Ard Biesheuvel >> wrote: >> > This is a straight transliteration of the Intel algorithm implemented >> > using SS

[PATCH 4/4] crypto: arm64/aes-ce-ctr: fix skcipher conversion

2016-11-29 Thread Ard Biesheuvel
Fix a missing statement that got lost in the skcipher conversion of the CTR transform. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c index 5c43b92b3714..4e3f8adb1793

[PATCH 1/4] crypto: arm/aes-ce: fix broken monolithic build

2016-11-29 Thread Ard Biesheuvel
s+0x0): first defined here Fix this by making aes_simd_algs 'static'. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c index 24f6137c1a6e..5c43

[PATCH 3/4] crypto: arm64/aes-ce-ccm - fix decrypt path with new skcipher interface

2016-11-29 Thread Ard Biesheuvel
The new skcipher walk interface does not take into account whether we are encrypting or decrypting. In the latter case, the walk should disregard the MAC. Fix this in the arm64 CE driver. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-ce-ccm-glue.c | 7 +++ 1 file changed, 3

[PATCH 2/4] crypto: skcipher - fix crash in skcipher_walk_aead()

2016-11-29 Thread Ard Biesheuvel
. Signed-off-by: Ard Biesheuvel --- crypto/skcipher.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/crypto/skcipher.c b/crypto/skcipher.c index 0f3071991b13..5367f817b40e 100644 --- a/crypto/skcipher.c +++ b/crypto/skcipher.c @@ -506,6 +506,8 @@ int skcipher_walk_aead(struct skcipher_walk *walk

[PATCH] crypto: arm/aesbs - fix brokenness after skcipher conversion

2016-11-29 Thread Ard Biesheuvel
The CBC encryption routine should use the encryption round keys, not the decryption round keys. Signed-off-by: Ard Biesheuvel --- Another fix for the queued changes, this time for 32-bit ARM. I must say, I'm not impressed with the level of testing that has been carried out after applying

Re: [PATCH 3/4] crypto: arm64/aes-ce-ccm - fix decrypt path with new skcipher interface

2016-11-30 Thread Ard Biesheuvel
On 30 November 2016 at 13:14, Herbert Xu wrote: > On Tue, Nov 29, 2016 at 01:05:32PM +0000, Ard Biesheuvel wrote: >> The new skcipher walk interface does not take into account whether we >> are encrypting or decrypting. In the latter case, the walk should >> disregard the

Re: [PATCH] crypto: arm/aesbs - fix brokenness after skcipher conversion

2016-11-30 Thread Ard Biesheuvel
> On 30 Nov 2016, at 13:19, Herbert Xu wrote: > >> On Tue, Nov 29, 2016 at 05:23:36PM +, Ard Biesheuvel wrote: >> The CBC encryption routine should use the encryption round keys, not >> the decryption round keys. >> >> Signed-off-by: Ard Bie

[PATCH v2 1/6] crypto: testmgr - avoid overlap in chunked tests

2016-12-04 Thread Ard Biesheuvel
The IDXn offsets are chosen such that tap values (which may go up to 255) end up overlapping in the xbuf allocation. In particular, IDX1 and IDX3 are too close together, so update IDX3 to avoid this issue. Signed-off-by: Ard Biesheuvel --- crypto/testmgr.c | 2 +- 1 file changed, 1 insertion

[PATCH v2 0/6] crypto: ARM/arm64 CRC-T10DIF/CRC32/CRC32C roundup

2016-12-04 Thread Ard Biesheuvel
This v2 combines the CRC-T10DIF and CRC32 implementations for both ARM and arm64 that I sent out a couple of weeks ago, and adds support to the latter for CRC32C. Ard Biesheuvel (6): crypto: testmgr - avoid overlap in chunked tests crypto: testmgr - add/enhance test cases for CRC-T10DIF

[PATCH v2 2/6] crypto: testmgr - add/enhance test cases for CRC-T10DIF

2016-12-04 Thread Ard Biesheuvel
The existing test cases only exercise a small slice of the various possible code paths through the x86 SSE/PCLMULQDQ implementation, and the upcoming ports of it for arm64. So add one that exceeds 256 bytes in size, and convert another to a chunked test. Signed-off-by: Ard Biesheuvel --- crypto

[PATCH v2 3/6] crypto: arm64/crct10dif - port x86 SSE implementation to arm64

2016-12-04 Thread Ard Biesheuvel
: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 5 + arch/arm64/crypto/Makefile| 3 + arch/arm64/crypto/crct10dif-ce-core.S | 317 arch/arm64/crypto/crct10dif-ce-glue.c | 91 ++ 4 files changed, 416 insertions(+) diff --git a/arch/arm64/crypto

[PATCH v2 4/6] crypto: arm/crct10dif - port x86 SSE implementation to ARM

2016-12-04 Thread Ard Biesheuvel
: Ard Biesheuvel --- arch/arm/crypto/Kconfig | 5 + arch/arm/crypto/Makefile| 2 + arch/arm/crypto/crct10dif-ce-core.S | 349 arch/arm/crypto/crct10dif-ce-glue.c | 95 ++ 4 files changed, 451 insertions(+) diff --git a/arch/arm/crypto/Kconfig b

[PATCH v2 5/6] crypto: arm64/crc32 - accelerated support based on x86 SSE implementation

2016-12-04 Thread Ard Biesheuvel
) +{ + crypto_unregister_shashes(crc32_pmull_algs, + ARRAY_SIZE(crc32_pmull_algs)); +} + +module_cpu_feature_match(PMULL, crc32_pmull_mod_init); +module_exit(crc32_pmull_mod_exit); + +MODULE_AUTHOR("Ard Biesheuvel "); +MODULE_LICENSE("GPL v2"); -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html

[PATCH v2 6/6] crypto: arm/crc32 - accelerated support based on x86 SSE implementation

2016-12-04 Thread Ard Biesheuvel
ARRAY_SIZE(crc32_pmull_algs)); +} + +static void __exit crc32_pmull_mod_exit(void) +{ + crypto_unregister_shashes(crc32_pmull_algs, + ARRAY_SIZE(crc32_pmull_algs)); +} + +module_init(crc32_pmull_mod_init); +module_exit(crc32_pmull_mod_exit); +

Re: [PATCH v2 0/6] crypto: ARM/arm64 CRC-T10DIF/CRC32/CRC32C roundup

2016-12-05 Thread Ard Biesheuvel
On 4 December 2016 at 11:54, Ard Biesheuvel wrote: > This v2 combines the CRC-T10DIF and CRC32 implementations for both ARM and > arm64 that I sent out a couple of weeks ago, and adds support to the latter > for CRC32C. > Please don't apply yet. There is an issue in the 32-bi

[PATCH v3 1/6] crypto: testmgr - avoid overlap in chunked tests

2016-12-05 Thread Ard Biesheuvel
The IDXn offsets are chosen such that tap values (which may go up to 255) end up overlapping in the xbuf allocation. In particular, IDX1 and IDX3 are too close together, so update IDX3 to avoid this issue. Signed-off-by: Ard Biesheuvel --- crypto/testmgr.c | 2 +- 1 file changed, 1 insertion

[PATCH v3 2/6] crypto: testmgr - add/enhance test cases for CRC-T10DIF

2016-12-05 Thread Ard Biesheuvel
The existing test cases only exercise a small slice of the various possible code paths through the x86 SSE/PCLMULQDQ implementation, and the upcoming ports of it for arm64. So add one that exceeds 256 bytes in size, and convert another to a chunked test. Signed-off-by: Ard Biesheuvel --- crypto

[PATCH v3 0/6] crypto: ARM/arm64 CRC-T10DIF/CRC32/CRC32C roundup

2016-12-05 Thread Ard Biesheuvel
not a multiple of 16 bytes (but they still must be 16-byte aligned) Ard Biesheuvel (6): crypto: testmgr - avoid overlap in chunked tests crypto: testmgr - add/enhance test cases for CRC-T10DIF crypto: arm64/crct10dif - port x86 SSE implementation to arm64 crypto: arm/crct10dif - port x86

[PATCH v3 6/6] crypto: arm/crc32 - accelerated support based on x86 SSE implementation

2016-12-05 Thread Ard Biesheuvel
CRC32 and one for CRC32C. The PMULL/NEON algorithm is faster, but operates on blocks of at least 64 bytes, and on multiples of 16 bytes only. For the remaining input, or for all input on systems that lack the PMULL 64x64->128 instructions, the CRC32 instructions will be used. Signed-off-by:

[PATCH v3 5/6] crypto: arm64/crc32 - accelerated support based on x86 SSE implementation

2016-12-05 Thread Ard Biesheuvel
ull_mod_exit(void) +{ + crypto_unregister_shashes(crc32_pmull_algs, + ARRAY_SIZE(crc32_pmull_algs)); +} + +module_cpu_feature_match(PMULL, crc32_pmull_mod_init); +module_exit(crc32_pmull_mod_exit); + +MODULE_AUTHOR("Ard Biesheuvel "); +MODULE_LICENSE("GPL v2"); -- 2.7.4

[PATCH v3 4/6] crypto: arm/crct10dif - port x86 SSE implementation to ARM

2016-12-05 Thread Ard Biesheuvel
This is a transliteration of the Intel algorithm implemented using SSE and PCLMULQDQ instructions that resides in the file arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only operate on buffers that are 16-byte aligned (but of any size). Signed-off-by: Ard Biesheuvel --- arch/arm

[PATCH v3 3/6] crypto: arm64/crct10dif - port x86 SSE implementation to arm64

2016-12-05 Thread Ard Biesheuvel
This is a transliteration of the Intel algorithm implemented using SSE and PCLMULQDQ instructions that resides in the file arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only operate on buffers that are 16-byte aligned (but of any size). Signed-off-by: Ard Biesheuvel --- arch/arm64

Re: [PATCH v3 1/6] crypto: testmgr - avoid overlap in chunked tests

2016-12-07 Thread Ard Biesheuvel
On 7 December 2016 at 19:19, Eric Biggers wrote: > On Mon, Dec 05, 2016 at 06:42:23PM +0000, Ard Biesheuvel wrote: >> The IDXn offsets are chosen such that tap values (which may go up to >> 255) end up overlapping in the xbuf allocation. In particular, IDX1 >> and IDX3 are t

[PATCH] crypto: testmgr - fix overlap in chunked tests again

2016-12-08 Thread Ard Biesheuvel
by putting IDX3 within 492 bytes of IDX1, which causes overlap if the first chunk exceeds 492 bytes, which is the case for at least one of the xts(aes) test cases. So increase IDX3 by another 1000 bytes. Signed-off-by: Ard Biesheuvel --- crypto/testmgr.c | 2 +- 1 file changed, 1 insertion(+),

[PATCH 0/2] crypto: arm64/ARM: NEON accelerated ChaCha20

2016-12-08 Thread Ard Biesheuvel
generic C code (measured on Cortex-A57 using the arm64 version) I'm aware that blkciphers are deprecated in favor of skciphers, but this code (like the x86 version) uses the init and setkey routines of the generic version, so it is probably better to port all implementations at once. Ar

[PATCH 1/2] crypto: arm64/chacha20 - implement NEON version based on SSE3 code

2016-12-08 Thread Ard Biesheuvel
This is a straight port to arm64/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 6 + arch/arm64/crypto/Makefile | 3 + arch/arm64/crypto/chacha20-neon-core.S | 480

[PATCH 2/2] crypto: arm/chacha20 - implement NEON version based on SSE3 code

2016-12-08 Thread Ard Biesheuvel
This is a straight port to ARM/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/Kconfig | 6 + arch/arm/crypto/Makefile | 2 + arch/arm/crypto/chacha20-neon-core.S | 524 arch

[PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2016-12-09 Thread Ard Biesheuvel
not* guarantee that those steps produce an exact multiple of the chunk size. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/aesbs-glue.c | 68 +--- 1 file changed, 38 insertions(+), 30 deletions(-) diff --git a/arch/arm/crypto/aesbs-glue.c b/arch/arm/

[PATCH v2 0/3] crypto: arm64/ARM: NEON accelerated ChaCha20 *skcipher*

2016-12-09 Thread Ard Biesheuvel
generic C code (measured on Cortex-A57 using the arm64 version) Changes in v2: - add patch to convert the generic and x86 to skciphers first - tweaked the arm64 version for some additional performance - use chunksize == 4x blocksize for optimal speed Ard Biesheuvel (3): crypto: chacha20 - conve

[PATCH v2 2/3] crypto: arm64/chacha20 - implement NEON version based on SSE3 code

2016-12-09 Thread Ard Biesheuvel
This is a straight port to arm64/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 6 + arch/arm64/crypto/Makefile | 3 + arch/arm64/crypto/chacha20-neon-core.S | 450

[PATCH v2 3/3] crypto: arm/chacha20 - implement NEON version based on SSE3 code

2016-12-09 Thread Ard Biesheuvel
This is a straight port to ARM/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/Kconfig | 6 + arch/arm/crypto/Makefile | 2 + arch/arm/crypto/chacha20-neon-core.S | 524 arch

[PATCH v2 1/3] crypto: chacha20 - convert generic and x86 versions to skcipher

2016-12-09 Thread Ard Biesheuvel
that all presented blocks except the final one are a multiple of the chunk size, so we can simplify the encrypt() routine somewhat. Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/chacha20_glue.c | 69 +- crypto/chacha20_generic.c | 73 include/crypto

[PATCH] crypto: arm64/aes: reimplement bit-sliced ARM/NEON implementation for arm64

2016-12-12 Thread Ard Biesheuvel
tions introduced in ARMv8, but those are part of an optional extension, and so it is good to have a fallback. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 6 + arch/arm64/crypto/Makefile | 3 + arch/arm64/crypto/aes-neonbs-core.S

[PATCH] crypto: skcipher - fix crash in virtual walk

2016-12-13 Thread Ard Biesheuvel
to be the intention that walk->buffer point to walk->page after skcipher_next_slow(), so ensure that is the case. Signed-off-by: Ard Biesheuvel --- crypto/skcipher.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/crypto/skcipher.c b/crypto/skcipher.c ind

Re: [PATCH] crypto: testmgr: Use linear alias for test input

2016-12-20 Thread Ard Biesheuvel
we should not use sg_init_one() with the address of a kernel symbol). But I will leave it up to Herbert to decide whether he prefers that or not. In any case, Acked-by: Ard Biesheuvel > --- > crypto/testmgr.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > >

Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-24 Thread Ard Biesheuvel
entation anyway, and the base layer was already a huge improvement compared to the open coded implementations of the SHA boilerplate. > Cc: Ard Biesheuvel > Cc: Herbert Xu > Signed-off-by: Andy Lutomirski > --- > arch/arm/crypto/sha2-ce-glue.c | 10 --- > arch/ar

Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-26 Thread Ard Biesheuvel
On 26 December 2016 at 07:57, Herbert Xu wrote: > On Sat, Dec 24, 2016 at 09:57:53AM -0800, Andy Lutomirski wrote: >> >> I actually do use incremental hashing later on. BPF currently >> vmallocs() a big temporary buffer just so it can fill it and hash it. >> I change it to hash as it goes. > > H

Re: [PATCH 0/2] crypto: arm64/ARM: NEON accelerated ChaCha20

2016-12-27 Thread Ard Biesheuvel
> On 27 Dec 2016, at 10:04, Herbert Xu wrote: > >> On Thu, Dec 08, 2016 at 02:28:57PM +, Ard Biesheuvel wrote: >> Another port of existing x86 SSE code to NEON, again both for arm64 and ARM. >> >> ChaCha20 is a stream cipher described in RFC 7539, and is

Re: [PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2016-12-27 Thread Ard Biesheuvel
On 27 December 2016 at 08:57, Herbert Xu wrote: > On Fri, Dec 09, 2016 at 01:47:26PM +0000, Ard Biesheuvel wrote: >> The bit-sliced NEON implementation of AES only performs optimally if >> it can process 8 blocks of input in parallel. This is due to the nature >> of bit sl

Re: [PATCH 0/2] crypto: arm64/ARM: NEON accelerated ChaCha20

2016-12-27 Thread Ard Biesheuvel
On 27 December 2016 at 15:36, Jeffrey Walton wrote: >> ChaCha20 is a stream cipher described in RFC 7539, and is intended to be >> an efficient software implementable 'standby cipher', in case AES cannot >> be used. > > That's not quite correct. > > The IETF changed the algorithm a bit, and it's no

Re: [PATCH 0/2] crypto: arm64/ARM: NEON accelerated ChaCha20

2016-12-28 Thread Ard Biesheuvel
> On 28 Dec 2016, at 09:03, Herbert Xu wrote: > >> On Tue, Dec 27, 2016 at 02:26:35PM +, Ard Biesheuvel wrote: >> >> You just nacked the v2 of this series (due to the chunksize/walksize) and i >> rewrote them as skciphers as well > > Sorry. Would you

Re: [PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2016-12-28 Thread Ard Biesheuvel
> On 28 Dec 2016, at 09:10, Herbert Xu wrote: > >> On Tue, Dec 27, 2016 at 06:35:46PM +, Ard Biesheuvel wrote: >> >> OK, I will try to hack something up. >> >> One thing to keep in mind though is that stacked chaining modes should >> present the

Re: [PATCH v2 1/3] crypto: chacha20 - convert generic and x86 versions to skcipher

2016-12-28 Thread Ard Biesheuvel
> On 28 Dec 2016, at 09:18, Herbert Xu wrote: > >> On Tue, Dec 27, 2016 at 06:04:52PM +0800, Herbert Xu wrote: >>> On Fri, Dec 09, 2016 at 02:33:51PM +, Ard Biesheuvel wrote: >>> This converts the ChaCha20 code from a blkcipher to a skcipher, which >>>

Re: [PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2016-12-28 Thread Ard Biesheuvel
On 28 December 2016 at 09:23, Herbert Xu wrote: > On Wed, Dec 28, 2016 at 09:19:32AM +0000, Ard Biesheuvel wrote: >> >> Ok, so that implies a field in the skcipher algo struct then, rather than >> some definition internal to the driver? > > Oh yes it should definitely

Re: [PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2016-12-29 Thread Ard Biesheuvel
On 29 December 2016 at 02:23, Herbert Xu wrote: > On Wed, Dec 28, 2016 at 07:50:44PM +0000, Ard Biesheuvel wrote: >> >> So about this chunksize, is it ever expected to assume other values >> than 1 (for stream ciphers) or the block size (for block ciphers)? >> Havi

[RFC PATCH] crypto: skcipher - introduce walksize attribute for SIMD algos

2016-12-29 Thread Ard Biesheuvel
e walksize (in the skcipher case) or from the chunksize (in the AEAD case). Signed-off-by: Ard Biesheuvel --- crypto/skcipher.c | 20 +++- include/crypto/internal/skcipher.h | 2 +- include/crypto/skcipher.h | 34 3 files changed, 47 insert

[PATCH 1/6] crypto: generic/aes - export encrypt and decrypt entry points

2017-01-02 Thread Ard Biesheuvel
implementation of AES in XTS mode for arm64, where using the 8-way cipher (and its ~2 KB expanded key schedule) to generate the initial tweak is suboptimal. Signed-off-by: Ard Biesheuvel --- crypto/aes_generic.c | 10 ++ include/crypto/aes.h | 3 +++ 2 files changed, 9 insertions(+), 4

[PATCH 3/6] crypto: arm/chacha20 - implement NEON version based on SSE3 code

2017-01-02 Thread Ard Biesheuvel
This is a straight port to ARM/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. It uses the new skcipher walksize attribute to process the input in strides of 4x the block size. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/Kconfig | 6 + arch/arm/crypto

[PATCH 0/6] crypto: ARM/arm64 - AES and ChaCha20 updates for v4.11

2017-01-02 Thread Ard Biesheuvel
modes. Ard Biesheuvel (6): crypto: generic/aes - export encrypt and decrypt entry points crypto: arm/aes-neonbs - process 8 blocks in parallel if we can crypto: arm/chacha20 - implement NEON version based on SSE3 code crypto: arm64/chacha20 - implement NEON version based on SSE3 code

[PATCH 2/6] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2017-01-02 Thread Ard Biesheuvel
ll available. However, it does *not* guarantee that those steps produce an exact multiple of the walk size. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/aesbs-glue.c | 67 +++- 1 file changed, 38 insertions(+), 29 deletions(-) diff --git a/arch/arm/crypto/aesbs-glue.c b/arc

[PATCH 4/6] crypto: arm64/chacha20 - implement NEON version based on SSE3 code

2017-01-02 Thread Ard Biesheuvel
This is a straight port to arm64/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. It uses the new skcipher walksize attribute to process the input in strides of 4x the block size. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 6 + arch/arm64

[PATCH 5/6] crypto: arm64/aes-blk - expose AES-CTR as synchronous cipher as well

2017-01-02 Thread Ard Biesheuvel
in places where synchronous transforms are required, such as the MAC802.11 encryption code, which executes in softirq context, where SIMD processing is allowed on arm64. Users of the async transform will keep the existing behavior. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c

[PATCH 6/6] crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for arm64

2017-01-02 Thread Ard Biesheuvel
tions introduced in ARMv8, but those are part of an optional extension, and so it is good to have a fallback. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 7 + arch/arm64/crypto/Makefile | 3 + arch/arm64/crypto/aes-neonbs-core.S | 879

Re: [PATCH 1/5] ARM: wire up HWCAP2 feature bits to the CPU modalias

2017-01-02 Thread Ard Biesheuvel
On 31 October 2016 at 16:13, Russell King - ARM Linux wrote: > On Sat, Oct 29, 2016 at 11:08:36AM +0100, Ard Biesheuvel wrote: >> On 18 October 2016 at 11:52, Ard Biesheuvel >> wrote: >> > Wire up the generic support for exposing CPU feature bits via the >> > m

Re: [PATCH 0/6] crypto: ARM/arm64 - AES and ChaCha20 updates for v4.11

2017-01-03 Thread Ard Biesheuvel
On 2 January 2017 at 18:21, Ard Biesheuvel wrote: > This series adds SIMD implementations for arm64 and ARM of ChaCha20 (*), > and a port of the ARM bit-sliced AES algorithm to arm64, and > > Patch #1 is a prerequisite for the AES-XTS implementation in #6, which needs > a secondar

[PATCH] crypto: arm64/aes - add scalar implementation

2017-01-04 Thread Ard Biesheuvel
-A57, this code manages 13.0 cycles per byte, which is ~34% faster than the generic C code. (Note that this is still >13x slower than the code that uses the optional ARMv8 Crypto Extensions, which manages <1 cycle per byte.) Signed-off-by: Ard Biesheuvel --- Raw performance data after the

Re: [PATCH 1/5] ARM: wire up HWCAP2 feature bits to the CPU modalias

2017-01-04 Thread Ard Biesheuvel
On 2 January 2017 at 23:40, Russell King - ARM Linux wrote: > On Mon, Jan 02, 2017 at 09:06:04PM +0000, Ard Biesheuvel wrote: >> On 31 October 2016 at 16:13, Russell King - ARM Linux >> wrote: >> > On Sat, Oct 29, 2016 at 11:08:36AM +0100, Ard Biesheuvel wrote: >>

[RFT PATCH] crypto: arm/aes - replace scalar AES cipher

2017-01-06 Thread Ard Biesheuvel
: Ard Biesheuvel --- It makes sense to test this on a variety of cores before deciding whether to merge it or not. Test results welcome. (insmod tcrypt.ko mode=200 sec=1) arch/arm/crypto/Kconfig | 20 +-- arch/arm/crypto/Makefile | 4 +- arch/arm/crypto/aes-cipher-core.S | 169

Re: [PATCH 0/6] crypto: ARM/arm64 - AES and ChaCha20 updates for v4.11

2017-01-09 Thread Ard Biesheuvel
On 3 January 2017 at 20:01, Ard Biesheuvel wrote: > On 2 January 2017 at 18:21, Ard Biesheuvel wrote: >> This series adds SIMD implementations for arm64 and ARM of ChaCha20 (*), >> and a port of the ARM bit-sliced AES algorithm to arm64, and >> >> Patch #1 is a p

Re: x86-64: Maintain 16-byte stack alignment

2017-01-10 Thread Ard Biesheuvel
On 10 January 2017 at 14:33, Herbert Xu wrote: > I recently applied the patch > > https://patchwork.kernel.org/patch/9468391/ > > and ended up with a boot crash when it tried to run the x86 chacha20 > code. It turned out that the patch changed a manually aligned > stack buffer to one that

Re: x86-64: Maintain 16-byte stack alignment

2017-01-10 Thread Ard Biesheuvel
On 10 January 2017 at 19:00, Andy Lutomirski wrote: > On Tue, Jan 10, 2017 at 9:30 AM, Ard Biesheuvel > wrote: >> On 10 January 2017 at 14:33, Herbert Xu wrote: >>> I recently applied the patch >>> >>> https://patchwork.kernel.org/patch/9468391/

Re: x86-64: Maintain 16-byte stack alignment

2017-01-10 Thread Ard Biesheuvel
On 10 January 2017 at 19:22, Andy Lutomirski wrote: > On Tue, Jan 10, 2017 at 11:16 AM, Ard Biesheuvel > wrote: >> On 10 January 2017 at 19:00, Andy Lutomirski wrote: >>> On Tue, Jan 10, 2017 at 9:30 AM, Ard Biesheuvel >>> wrote: >>>> On 10 Janu

Re: x86-64: Maintain 16-byte stack alignment

2017-01-11 Thread Ard Biesheuvel
On 11 January 2017 at 06:53, Linus Torvalds wrote: > > > On Jan 10, 2017 8:36 PM, "Herbert Xu" wrote: > > > Sure we can ban the use of attribute aligned on stacks. But > what about indirect uses through structures? > > > It should be pretty trivial to add a sparse warning for that, though. > Co

Re: crypto: x86/chacha20 - Manually align stack buffer

2017-01-11 Thread Ard Biesheuvel
On 11 January 2017 at 12:08, Herbert Xu wrote: > The kernel on x86-64 cannot use gcc attribute align to align to > a 16-byte boundary. This patch reverts to the old way of aligning > it by hand. > > Incidentally the old way was actually broken in not allocating > enough space and would silently c

Re: [PATCH v2] crypto: x86/chacha20 - Manually align stack buffer

2017-01-11 Thread Ard Biesheuvel
On 11 January 2017 at 12:28, Herbert Xu wrote: > On Wed, Jan 11, 2017 at 12:14:24PM +0000, Ard Biesheuvel wrote: >> >> I think the old code was fine, actually: >> >> u32 *state, state_buf[16 + (CHACHA20_STATE_ALIGN / sizeof(u32)) - 1]; >> >> ends up all

[PATCH v2 3/7] crypto: arm64/aes-blk - expose AES-CTR as synchronous cipher as well

2017-01-11 Thread Ard Biesheuvel
in places where synchronous transforms are required, such as the MAC802.11 encryption code, which executes in softirq context, where SIMD processing is allowed on arm64. Users of the async transform will keep the existing behavior. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c

[PATCH v2 0/7] crypto: ARM/arm64 - AES and ChaCha20 updates for v4.11

2017-01-11 Thread Ard Biesheuvel
://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git crypto-arm-v4.11 https://git.kernel.org/cgit/linux/kernel/git/ardb/linux.git/log/?h=crypto-arm-v4.11 Ard Biesheuvel (7): crypto: arm64/chacha20 - implement NEON version based on SSE3 code crypto: arm/chacha20 - implement NEON version

[PATCH v2 5/7] crypto: arm/aes - replace scalar AES cipher

2017-01-11 Thread Ard Biesheuvel
: Ard Biesheuvel --- arch/arm/crypto/Kconfig | 20 +-- arch/arm/crypto/Makefile | 4 +- arch/arm/crypto/aes-cipher-core.S | 179 arch/arm/crypto/aes-cipher-glue.c | 74 arch/arm/crypto/aes_glue.c| 98 --- 5 files changed, 256

[PATCH v2 2/7] crypto: arm/chacha20 - implement NEON version based on SSE3 code

2017-01-11 Thread Ard Biesheuvel
This is a straight port to ARM/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. It uses the new skcipher walksize attribute to process the input in strides of 4x the block size. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/Kconfig | 6 + arch/arm/crypto

[PATCH v2 1/7] crypto: arm64/chacha20 - implement NEON version based on SSE3 code

2017-01-11 Thread Ard Biesheuvel
This is a straight port to arm64/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. It uses the new skcipher walksize attribute to process the input in strides of 4x the block size. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 6 + arch/arm64

[PATCH v2 7/7] crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for arm64

2017-01-11 Thread Ard Biesheuvel
tions introduced in ARMv8, but those are part of an optional extension, and so it is good to have a fallback. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 7 + arch/arm64/crypto/Makefile | 3 + arch/arm64/crypto/aes-neonbs-core.S | 963

[PATCH v2 4/7] crypto: arm64/aes - add scalar implementation

2017-01-11 Thread Ard Biesheuvel
-A57, this code manages 13.0 cycles per byte, which is ~34% faster than the generic C code. (Note that this is still >13x slower than the code that uses the optional ARMv8 Crypto Extensions, which manages <1 cycle per byte.) Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/K

Re: x86-64: Maintain 16-byte stack alignment

2017-01-12 Thread Ard Biesheuvel
On 12 January 2017 at 06:12, Herbert Xu wrote: > On Tue, Jan 10, 2017 at 05:30:48PM +0000, Ard Biesheuvel wrote: >> >> Apologies for introducing this breakage. It seemed like an obvious and >> simple cleanup, so I didn't even bother to mention it in the commit >>

[PATCH] crypto: testmgr - use calculated count for number of test vectors

2017-01-12 Thread Ard Biesheuvel
ORS 2 -> 3 AES_CCM_ENC_TEST_VECTORS 8 -> 14 AES_CCM_DEC_TEST_VECTORS 7 -> 17 AES_CCM_4309_ENC_TEST_VECTORS 7 -> 23 AES_CCM_4309_DEC_TEST_VECTORS 10 -> 23 CAMELLIA_CTR_ENC_TEST_VECTORS 2 -> 3 CAMELLIA_CTR_DEC_TEST_VECTORS 2 -> 3 Sign

Re: [PATCH v2 0/7] crypto: ARM/arm64 - AES and ChaCha20 updates for v4.11

2017-01-12 Thread Ard Biesheuvel
On 12 January 2017 at 16:45, Herbert Xu wrote: > On Wed, Jan 11, 2017 at 04:41:48PM +0000, Ard Biesheuvel wrote: >> This adds ARM and arm64 implementations of ChaCha20, scalar AES and SIMD >> AES (using bit slicing). The SIMD algorithms in this series take advantage >>

Re: [cryptodev:master 43/44] arch/arm/crypto/aes-cipher-core.S:21: Error: selected processor does not support `tt .req ip' in ARM mode

2017-01-12 Thread Ard Biesheuvel
Hi Arnd, On 12 January 2017 at 19:04, kbuild test robot wrote: > tree: > https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git > master > head: 1abee99eafab67fb1c98f9ecfc43cd5735384a86 > commit: 81edb42629758bacdf813dd5e4542ae26e3ad73a [43/44] crypto: arm/aes - > replac

[PATCH] crypto: arm/aes - avoid reserved 'tt' mnemonic in asm code

2017-01-13 Thread Ard Biesheuvel
The ARMv8-M architecture introduces 'tt' and 'ttt' instructions, which means we can no longer use 'tt' as a register alias on recent versions of binutils for ARM. So replace the alias with 'ttab'. Fixes: 81edb4262975 ("crypto: arm/aes - replace scal

Re: [BISECT] ARM build errors on GCC v6.2 (crypto: arm/aes - replace scalar AES cipher)

2017-01-14 Thread Ard Biesheuvel
On 14 January 2017 at 14:24, Krzysztof Kozlowski wrote: > Hi, > > allyesconfig and multi_v7_defconfig fail to build on recent linux-next > on GCC 6.2.0. > > Errors: > ../arch/arm/crypto/aes-cipher-core.S: Assembler messages: > ../arch/arm/crypto/aes-cipher-core.S:21: Error: selected processor does

[PATCH] crypto: generic/cts - fix regression in iv handling

2017-01-16 Thread Ard Biesheuvel
While this is usually the case, it is not mandated by the API, and given that the CTS code already accesses the ciphertext scatterlist to retrieve those bytes, we can simply copy them into req->iv before proceeding. Fixes: 0605c41cc53c ("crypto: cts - Convert to skcipher") Sig

Re: [PATCH] crypto: generic/cts - fix regression in iv handling

2017-01-17 Thread Ard Biesheuvel
On 17 January 2017 at 09:11, Herbert Xu wrote: > On Mon, Jan 16, 2017 at 09:16:35AM +0000, Ard Biesheuvel wrote: >> Since the skcipher conversion in commit 0605c41cc53c ("crypto: >> cts - Convert to skcipher"), the cts code tacitly assumes that >> the underlying CBC

Re: [PATCH] crypto: generic/cts - fix regression in iv handling

2017-01-17 Thread Ard Biesheuvel
On 17 January 2017 at 09:25, Herbert Xu wrote: > On Tue, Jan 17, 2017 at 09:20:11AM +0000, Ard Biesheuvel wrote: >> >> So to be clear, it is part of the API that after calling >> crypto_skcipher_encrypt(req), and completing the request, req->iv >> should contain a v

[PATCH] crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes

2017-01-17 Thread Ard Biesheuvel
otherwise, chaining is impossible anyway. Cc: # v3.16+ Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-modes.S | 88 ++-- 1 file changed, 42 insertions(+), 46 deletions(-) diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index c53dbeae79f2

[PATCH 04/10] crypto: arm64/aes-ce-ccm - remove cra_alignmask

2017-01-17 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-ce-ccm-glue.c | 1 - 1 file changed, 1 deletion

[PATCH 10/10] crypto: arm64/aes - replace scalar fallback with plain NEON fallback

2017-01-17 Thread Ard Biesheuvel
sensitivity to cache timing attacks. So switch the fallback handling to the plain NEON driver. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 2 +- arch/arm64/crypto/aes-neonbs-glue.c | 38 ++-- 2 files changed, 29 insertions(+), 11 deletions(-) diff

[PATCH 05/10] crypto: arm64/aes-blk - remove cra_alignmask

2017-01-17 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel --- NOTE: this won't apply unless 'crypto: arm64/aes-blk - hon

[PATCH 08/10] crypto: arm64/aes - performance tweak

2017-01-17 Thread Ard Biesheuvel
Shuffle some instructions around in the __hround macro to shave off 0.1 cycles per byte on Cortex-A57. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-cipher-core.S | 52 +++- 1 file changed, 19 insertions(+), 33 deletions(-) diff --git a/arch/arm64/crypto/aes-cipher

[PATCH 09/10] crypto: arm64/aes-neon-blk - tweak performance for low end cores

2017-01-17 Thread Ard Biesheuvel
constants from memory in every round. To allow the ECB and CBC encrypt routines to be reused by the bitsliced NEON code in a subsequent patch, export them from the module. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c | 2 + arch/arm64/crypto/aes-neon.S | 199

[PATCH 06/10] crypto: arm64/chacha20 - remove cra_alignmask

2017-01-17 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/chacha20-neon-glue.c | 1 - 1 file changed, 1

[PATCH 07/10] crypto: arm64/aes - avoid literals for cross-module symbol references

2017-01-17 Thread Ard Biesheuvel
KASLR"), which is why the AES code used literals instead. So now we can get rid of the literals, and switch to the adr_l macro. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-cipher-core.S | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/arm64/crypto/aes-cip
