Re: [PATCH 0/6] crypto: ARM/arm64 - AES and ChaCha20 updates for v4.11

2017-01-09 Thread Ard Biesheuvel
On 3 January 2017 at 20:01, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > On 2 January 2017 at 18:21, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: >> This series adds SIMD implementations for arm64 and ARM of ChaCha20 (*), >> and a port of the ARM bit-sl

[RFT PATCH] crypto: arm/aes - replace scalar AES cipher

2017-01-06 Thread Ard Biesheuvel
-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- It makes sense to test this on a variety of cores before deciding whether to merge it or not. Test results welcome. (insmod tcrypt.ko mode=200 sec=1) arch/arm/crypto/Kconfig | 20 +-- arch/arm/crypto/Makefile | 4 +- ar

Re: [PATCH 1/5] ARM: wire up HWCAP2 feature bits to the CPU modalias

2017-01-04 Thread Ard Biesheuvel
On 2 January 2017 at 23:40, Russell King - ARM Linux <li...@armlinux.org.uk> wrote: > On Mon, Jan 02, 2017 at 09:06:04PM +0000, Ard Biesheuvel wrote: >> On 31 October 2016 at 16:13, Russell King - ARM Linux >> <li...@armlinux.org.uk> wrote: >> > On Sat, O

[PATCH] crypto: arm64/aes - add scalar implementation

2017-01-04 Thread Ard Biesheuvel
-A57, this code manages 13.0 cycles per byte, which is ~34% faster than the generic C code. (Note that this is still >13x slower than the code that uses the optional ARMv8 Crypto Extensions, which manages <1 cycles per byte.) Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>

Re: [PATCH 0/6] crypto: ARM/arm64 - AES and ChaCha20 updates for v4.11

2017-01-03 Thread Ard Biesheuvel
On 2 January 2017 at 18:21, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > This series adds SIMD implementations for arm64 and ARM of ChaCha20 (*), > and a port of the ARM bit-sliced AES algorithm to arm64, and > > Patch #1 is a prerequisite for the AES-XTS implementation

Re: [PATCH 1/5] ARM: wire up HWCAP2 feature bits to the CPU modalias

2017-01-02 Thread Ard Biesheuvel
On 31 October 2016 at 16:13, Russell King - ARM Linux <li...@armlinux.org.uk> wrote: > On Sat, Oct 29, 2016 at 11:08:36AM +0100, Ard Biesheuvel wrote: >> On 18 October 2016 at 11:52, Ard Biesheuvel <ard.biesheu...@linaro.org> >> wrote: >> > Wire up the gene

[PATCH 6/6] crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for arm64

2017-01-02 Thread Ard Biesheuvel
introduced in ARMv8, but those are part of an optional extension, and so it is good to have a fallback. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/Kconfig | 7 + arch/arm64/crypto/Makefile | 3 + arch/arm64/crypto/aes-neonbs-core.S

[PATCH 5/6] crypto: arm64/aes-blk - expose AES-CTR as synchronous cipher as well

2017-01-02 Thread Ard Biesheuvel
in places where synchronous transforms are required, such as the MAC802.11 encryption code, which executes in softirq context, where SIMD processing is allowed on arm64. Users of the async transform will keep the existing behavior. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> ---

[PATCH 4/6] crypto: arm64/chacha20 - implement NEON version based on SSE3 code

2017-01-02 Thread Ard Biesheuvel
This is a straight port to arm64/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. It uses the new skcipher walksize attribute to process the input in strides of 4x the block size. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/K
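
A minimal sketch of what processing input "in strides of 4x the block size" amounts to, assuming the 64-byte ChaCha20 block size. The names `stride_split` and `split_into_strides` are hypothetical and do not come from the patch, which relies on the skcipher walksize attribute rather than a helper like this.

```c
#include <assert.h>
#include <stddef.h>

#define CHACHA20_BLOCK_SIZE 64 /* ChaCha20 operates on 64-byte blocks */

/* Hypothetical helper type: how much input the wide path can take. */
struct stride_split {
	size_t wide_bytes; /* bytes consumed in whole strides */
	size_t tail_bytes; /* leftover bytes for the narrow path */
};

/*
 * Split len bytes into whole strides of blocks_per_stride blocks each,
 * plus a tail that is shorter than one stride.
 */
static struct stride_split split_into_strides(size_t len, size_t block_size,
					      size_t blocks_per_stride)
{
	struct stride_split s;
	size_t stride;

	assert(block_size > 0 && blocks_per_stride > 0);

	stride = block_size * blocks_per_stride;
	s.wide_bytes = (len / stride) * stride;
	s.tail_bytes = len - s.wide_bytes;
	return s;
}
```

With a 4-block stride, a 1000-byte request yields three full 256-byte strides for the wide path, and the remaining 232 bytes fall to the narrow path.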

[PATCH 1/6] crypto: generic/aes - export encrypt and decrypt entry points

2017-01-02 Thread Ard Biesheuvel
implementation of AES in XTS mode for arm64, where using the 8-way cipher (and its ~2 KB expanded key schedule) to generate the initial tweak is suboptimal. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- crypto/aes_generic.c | 10 ++ include/crypto/aes.h | 3 +++ 2 files c

[PATCH 3/6] crypto: arm/chacha20 - implement NEON version based on SSE3 code

2017-01-02 Thread Ard Biesheuvel
This is a straight port to ARM/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. It uses the new skcipher walksize attribute to process the input in strides of 4x the block size. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm/crypto/K

[PATCH 0/6] crypto: ARM/arm64 - AES and ChaCha20 updates for v4.11

2017-01-02 Thread Ard Biesheuvel
modes. Ard Biesheuvel (6): crypto: generic/aes - export encrypt and decrypt entry points crypto: arm/aes-neonbs - process 8 blocks in parallel if we can crypto: arm/chacha20 - implement NEON version based on SSE3 code crypto: arm64/chacha20 - implement NEON version based on SSE3 code

[PATCH 2/6] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2017-01-02 Thread Ard Biesheuvel
, it does *not* guarantee that those steps produce an exact multiple of the walk size. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm/crypto/aesbs-glue.c | 67 +++- 1 file changed, 38 insertions(+), 29 deletions(-) diff --git a/arch/arm/crypto/aesbs-glue.c

Re: [PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2016-12-29 Thread Ard Biesheuvel
On 29 December 2016 at 02:23, Herbert Xu <herb...@gondor.apana.org.au> wrote: > On Wed, Dec 28, 2016 at 07:50:44PM +0000, Ard Biesheuvel wrote: >> >> So about this chunksize, is it ever expected to assume other values >> than 1 (for stream ciphers) or the block size (fo

Re: [PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2016-12-28 Thread Ard Biesheuvel
> On 28 Dec 2016, at 09:10, Herbert Xu <herb...@gondor.apana.org.au> wrote: > >> On Tue, Dec 27, 2016 at 06:35:46PM +0000, Ard Biesheuvel wrote: >> >> OK, I will try to hack something up. >> >> One thing to keep in mind though is that stacked

Re: [PATCH 0/2] crypto: arm64/ARM: NEON accelerated ChaCha20

2016-12-28 Thread Ard Biesheuvel
> On 28 Dec 2016, at 09:03, Herbert Xu <herb...@gondor.apana.org.au> wrote: > >> On Tue, Dec 27, 2016 at 02:26:35PM +0000, Ard Biesheuvel wrote: >> >> You just nacked the v2 of this series (due to the chunksize/walksize) and i >> rewrote them as skciphers as

Re: [PATCH 0/2] crypto: arm64/ARM: NEON accelerated ChaCha20

2016-12-27 Thread Ard Biesheuvel
On 27 December 2016 at 15:36, Jeffrey Walton wrote: >> ChaCha20 is a stream cipher described in RFC 7539, and is intended to be >> an efficient software implementable 'standby cipher', in case AES cannot >> be used. > > That's not quite correct. > > The IETF changed the

Re: [PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2016-12-27 Thread Ard Biesheuvel
On 27 December 2016 at 08:57, Herbert Xu <herb...@gondor.apana.org.au> wrote: > On Fri, Dec 09, 2016 at 01:47:26PM +0000, Ard Biesheuvel wrote: >> The bit-sliced NEON implementation of AES only performs optimally if >> it can process 8 blocks of input in parallel. This

Re: [PATCH 0/2] crypto: arm64/ARM: NEON accelerated ChaCha20

2016-12-27 Thread Ard Biesheuvel
> On 27 Dec 2016, at 10:04, Herbert Xu <herb...@gondor.apana.org.au> wrote: > >> On Thu, Dec 08, 2016 at 02:28:57PM +0000, Ard Biesheuvel wrote: >> Another port of existing x86 SSE code to NEON, again both for arm64 and ARM. >> >> ChaCha20 is a

Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-26 Thread Ard Biesheuvel
On 26 December 2016 at 07:57, Herbert Xu wrote: > On Sat, Dec 24, 2016 at 09:57:53AM -0800, Andy Lutomirski wrote: >> >> I actually do use incremental hashing later on. BPF currently >> vmallocs() a big temporary buffer just so it can fill it and hash it. >> I

Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-24 Thread Ard Biesheuvel
, and the base layer was already a huge improvement compared to the open coded implementations of the SHA boilerplate. > Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> > Cc: Herbert Xu <herb...@gondor.apana.org.au> > Signed-off-by: Andy Lutomirski <l...@kernel.org> >

Re: [PATCH] crypto: testmgr: Use linear alias for test input

2016-12-20 Thread Ard Biesheuvel
y, and avoid the redundant virt_to_phys(__va()) translation (and add a comment *why* we should not use sg_init_one() with the address of a kernel symbol). But I will leave it up to Herbert to decide whether he prefers that or not. In any case, Acked-by: Ard Biesheuvel <ard.biesheu...@linaro.org>

[PATCH] crypto: skcipher - fix crash in virtual walk

2016-12-13 Thread Ard Biesheuvel
t appears to be the intention that walk->buffer point to walk->page after skcipher_next_slow(), so ensure that is the case. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- crypto/skcipher.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/crypto/skcipher.

[PATCH] crypto: arm64/aes: reimplement bit-sliced ARM/NEON implementation for arm64

2016-12-12 Thread Ard Biesheuvel
introduced in ARMv8, but those are part of an optional extension, and so it is good to have a fallback. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/Kconfig | 6 + arch/arm64/crypto/Makefile | 3 + arch/arm64/crypto/aes-neonbs-core.S

[PATCH v2 1/3] crypto: chacha20 - convert generic and x86 versions to skcipher

2016-12-09 Thread Ard Biesheuvel
that all presented blocks except the final one are a multiple of the chunk size, so we can simplify the encrypt() routine somewhat. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/x86/crypto/chacha20_glue.c | 69 +- crypto/chacha20_generic.c

[PATCH v2 3/3] crypto: arm/chacha20 - implement NEON version based on SSE3 code

2016-12-09 Thread Ard Biesheuvel
This is a straight port to ARM/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm/crypto/Kconfig | 6 + arch/arm/crypto/Makefile | 2 + arch/arm/crypto/chacha20-neon-core.S

[PATCH 0/2] crypto: arm64/ARM: NEON accelerated ChaCha20

2016-12-08 Thread Ard Biesheuvel
code (measured on Cortex-A57 using the arm64 version) I'm aware that blkciphers are deprecated in favor of skciphers, but this code (like the x86 version) uses the init and setkey routines of the generic version, so it is probably better to port all implementations at once. Ard Biesheuvel (2

[PATCH] crypto: testmgr - fix overlap in chunked tests again

2016-12-08 Thread Ard Biesheuvel
by putting IDX3 within 492 bytes of IDX1, which causes overlap if the first chunk exceeds 492 bytes, which is the case for at least one of the xts(aes) test cases. So increase IDX3 by another 1000 bytes. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- crypto/testmgr.c | 2 +- 1
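
The overlap condition described above can be sketched as a plain interval check. This is an illustration only: `chunks_overlap` is a hypothetical helper, and the offsets in the usage note are made-up values chosen to be 492 bytes apart, not the real IDXn constants from crypto/testmgr.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the byte ranges [off_a, off_a + len_a) and
 * [off_b, off_b + len_b) share at least one byte.
 */
static bool chunks_overlap(size_t off_a, size_t len_a,
			   size_t off_b, size_t len_b)
{
	/* Empty chunks are not meaningful for this check. */
	assert(len_a > 0 && len_b > 0);

	return off_a < off_b + len_b && off_b < off_a + len_a;
}
```

For two chunks whose start offsets are 492 bytes apart, a first chunk of exactly 492 bytes still fits, while 493 bytes spills into the second chunk, which is the situation the patch avoids by moving IDX3 further away.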

Re: [PATCH v3 1/6] crypto: testmgr - avoid overlap in chunked tests

2016-12-07 Thread Ard Biesheuvel
On 7 December 2016 at 19:19, Eric Biggers <ebigg...@google.com> wrote: > On Mon, Dec 05, 2016 at 06:42:23PM +0000, Ard Biesheuvel wrote: >> The IDXn offsets are chosen such that tap values (which may go up to >> 255) end up overlapping in the xbuf allocation. In particu

[PATCH v3 3/6] crypto: arm64/crct10dif - port x86 SSE implementation to arm64

2016-12-05 Thread Ard Biesheuvel
This is a transliteration of the Intel algorithm implemented using SSE and PCLMULQDQ instructions that resides in the file arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only operate on buffers that are 16 byte aligned (but of any size) Signed-off-by: Ard Biesheuvel <ard.bies

[PATCH v3 4/6] crypto: arm/crct10dif - port x86 SSE implementation to ARM

2016-12-05 Thread Ard Biesheuvel
This is a transliteration of the Intel algorithm implemented using SSE and PCLMULQDQ instructions that resides in the file arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only operate on buffers that are 16 byte aligned (but of any size) Signed-off-by: Ard Biesheuvel <ard.bies

[PATCH v3 6/6] crypto: arm/crc32 - accelerated support based on x86 SSE implementation

2016-12-05 Thread Ard Biesheuvel
shes(crc32_pmull_algs, + ARRAY_SIZE(crc32_pmull_algs)); +} + +static void __exit crc32_pmull_mod_exit(void) +{ + crypto_unregister_shashes(crc32_pmull_algs, + ARRAY_SIZE(crc32_pmull_algs)); +} + +module_init(crc32_pmull_mod_init); +

[PATCH v3 5/6] crypto: arm64/crc32 - accelerated support based on x86 SSE implementation

2016-12-05 Thread Ard Biesheuvel
o_register_shashes(crc32_pmull_algs, + ARRAY_SIZE(crc32_pmull_algs)); +} + +static void __exit crc32_pmull_mod_exit(void) +{ + crypto_unregister_shashes(crc32_pmull_algs, + ARRAY_SIZE(crc32_pmull_algs)); +} + +module_cpu_fe

[PATCH v3 0/6] crypto: ARM/arm64 CRC-T10DIF/CRC32/CRC32C roundup

2016-12-05 Thread Ard Biesheuvel
that are not a multiple of 16 bytes (but they still must be 16 byte aligned) Ard Biesheuvel (6): crypto: testmgr - avoid overlap in chunked tests crypto: testmgr - add/enhance test cases for CRC-T10DIF crypto: arm64/crct10dif - port x86 SSE implementation to arm64 crypto: arm/crct10dif - port x86

[PATCH v3 2/6] crypto: testmgr - add/enhance test cases for CRC-T10DIF

2016-12-05 Thread Ard Biesheuvel
The existing test cases only exercise a small slice of the various possible code paths through the x86 SSE/PCLMULQDQ implementation, and the upcoming ports of it for arm64. So add one that exceeds 256 bytes in size, and convert another to a chunked test. Signed-off-by: Ard Biesheuvel <ard.bies

[PATCH v3 1/6] crypto: testmgr - avoid overlap in chunked tests

2016-12-05 Thread Ard Biesheuvel
The IDXn offsets are chosen such that tap values (which may go up to 255) end up overlapping in the xbuf allocation. In particular, IDX1 and IDX3 are too close together, so update IDX3 to avoid this issue. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- crypto/testmgr.c | 2

Re: [PATCH v2 0/6] crypto: ARM/arm64 CRC-T10DIF/CRC32/CRC32C roundup

2016-12-05 Thread Ard Biesheuvel
On 4 December 2016 at 11:54, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > This v2 combines the CRC-T10DIF and CRC32 implementations for both ARM and > arm64 that I sent out a couple of weeks ago, and adds support to the latter > for CRC32C. > Please don't apply yet.

[PATCH v2 6/6] crypto: arm/crc32 - accelerated support based on x86 SSE implementation

2016-12-04 Thread Ard Biesheuvel
; (HWCAP2_PMULL|HWCAP2_CRC32))) + return -ENODEV; + + return crypto_register_shashes(crc32_pmull_algs, + ARRAY_SIZE(crc32_pmull_algs)); +} + +static void __exit crc32_pmull_mod_exit(void) +{ + crypto_unregister_shashes(crc32_pmu

[PATCH v2 5/6] crypto: arm64/crc32 - accelerated support based on x86 SSE implementation

2016-12-04 Thread Ard Biesheuvel
rypto_register_shashes(crc32_pmull_algs, + ARRAY_SIZE(crc32_pmull_algs)); +} + +static void __exit crc32_pmull_mod_exit(void) +{ + crypto_unregister_shashes(crc32_pmull_algs, + ARRAY_SIZE(crc32_pmull_algs)); +} + +module_cpu_feature_match

[PATCH v2 2/6] crypto: testmgr - add/enhance test cases for CRC-T10DIF

2016-12-04 Thread Ard Biesheuvel
The existing test cases only exercise a small slice of the various possible code paths through the x86 SSE/PCLMULQDQ implementation, and the upcoming ports of it for arm64. So add one that exceeds 256 bytes in size, and convert another to a chunked test. Signed-off-by: Ard Biesheuvel <ard.bies

[PATCH v2 3/6] crypto: arm64/crct10dif - port x86 SSE implementation to arm64

2016-12-04 Thread Ard Biesheuvel
-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/Kconfig | 5 + arch/arm64/crypto/Makefile| 3 + arch/arm64/crypto/crct10dif-ce-core.S | 317 arch/arm64/crypto/crct10dif-ce-glue.c | 91 ++ 4 files changed, 416 insertions(+)

[PATCH v2 4/6] crypto: arm/crct10dif - port x86 SSE implementation to ARM

2016-12-04 Thread Ard Biesheuvel
-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm/crypto/Kconfig | 5 + arch/arm/crypto/Makefile| 2 + arch/arm/crypto/crct10dif-ce-core.S | 349 arch/arm/crypto/crct10dif-ce-glue.c | 95 ++ 4 files changed, 451 insertions(+) diff

[PATCH v2 1/6] crypto: testmgr - avoid overlap in chunked tests

2016-12-04 Thread Ard Biesheuvel
The IDXn offsets are chosen such that tap values (which may go up to 255) end up overlapping in the xbuf allocation. In particular, IDX1 and IDX3 are too close together, so update IDX3 to avoid this issue. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- crypto/testmgr.c | 2

[PATCH v2 0/6] crypto: ARM/arm64 CRC-T10DIF/CRC32/CRC32C roundup

2016-12-04 Thread Ard Biesheuvel
This v2 combines the CRC-T10DIF and CRC32 implementations for both ARM and arm64 that I sent out a couple of weeks ago, and adds support to the latter for CRC32C. Ard Biesheuvel (6): crypto: testmgr - avoid overlap in chunked tests crypto: testmgr - add/enhance test cases for CRC-T10DIF

Re: [PATCH] crypto: arm/aesbs - fix brokenness after skcipher conversion

2016-11-30 Thread Ard Biesheuvel
> On 30 Nov 2016, at 13:19, Herbert Xu <herb...@gondor.apana.org.au> wrote: > >> On Tue, Nov 29, 2016 at 05:23:36PM +0000, Ard Biesheuvel wrote: >> The CBC encryption routine should use the encryption round keys, not >> the decryption round keys. >>

Re: [PATCH 3/4] crypto: arm64/aes-ce-ccm - fix decrypt path with new skcipher interface

2016-11-30 Thread Ard Biesheuvel
On 30 November 2016 at 13:14, Herbert Xu <herb...@gondor.apana.org.au> wrote: > On Tue, Nov 29, 2016 at 01:05:32PM +0000, Ard Biesheuvel wrote: >> The new skcipher walk interface does not take into account whether we >> are encrypting or decrypting. In the latter case, the wal

[PATCH] crypto: arm/aesbs - fix brokenness after skcipher conversion

2016-11-29 Thread Ard Biesheuvel
The CBC encryption routine should use the encryption round keys, not the decryption round keys. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- Another fix for the queued changes, this time for 32-bit ARM. I must say, I'm not impressed with the level of testing that ha

[PATCH 3/4] crypto: arm64/aes-ce-ccm - fix decrypt path with new skcipher interface

2016-11-29 Thread Ard Biesheuvel
The new skcipher walk interface does not take into account whether we are encrypting or decrypting. In the latter case, the walk should disregard the MAC. Fix this in the arm64 CE driver. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/aes-ce-ccm-glue

[PATCH 2/4] crypto: skcipher - fix crash in skcipher_walk_aead()

2016-11-29 Thread Ard Biesheuvel
. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- crypto/skcipher.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/crypto/skcipher.c b/crypto/skcipher.c index 0f3071991b13..5367f817b40e 100644 --- a/crypto/skcipher.c +++ b/crypto/skcipher.c @@ -506,6 +506,8 @@ int skcipher_wal

[PATCH 4/4] crypto: arm64/aes-ce-ctr: fix skcipher conversion

2016-11-29 Thread Ard Biesheuvel
Fix a missing statement that got lost in the skcipher conversion of the CTR transform. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/aes-glue.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c

[PATCH 1/4] crypto: arm/aes-ce: fix broken monolithic build

2016-11-29 Thread Ard Biesheuvel
): first defined here Fix this by making aes_simd_algs 'static'. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/aes-glue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c

Re: [PATCH 4/4] crypto: arm/crct10dif - port x86 SSE implementation to ARM

2016-11-28 Thread Ard Biesheuvel
On 28 November 2016 at 14:17, Herbert Xu <herb...@gondor.apana.org.au> wrote: > On Thu, Nov 24, 2016 at 05:32:42PM +0000, Ard Biesheuvel wrote: >> On 24 November 2016 at 15:43, Ard Biesheuvel <ard.biesheu...@linaro.org> >> wrote: >> > This is a straight tr

[PATCH] crypto: arm64/sha2: add generated .S files to .gitignore

2016-11-28 Thread Ard Biesheuvel
Add the files that are generated by the recently merged OpenSSL SHA-256/512 implementation to .gitignore so Git disregards them when showing untracked files. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/.gitignore | 2 ++ 1 file changed, 2 insertions(+)

Re: [PATCH v4] crypto: arm64/sha2: integrate OpenSSL implementations of SHA256/SHA512

2016-11-28 Thread Ard Biesheuvel
On 28 November 2016 at 13:05, Will Deacon <will.dea...@arm.com> wrote: > On Sun, Nov 20, 2016 at 11:42:01AM +0000, Ard Biesheuvel wrote: >> This integrates both the accelerated scalar and the NEON implementations >> of SHA-224/256 as well as SHA-384/512 from the OpenSSL pr

Re: [PATCH v4] crypto: arm64/sha2: integrate OpenSSL implementations of SHA256/SHA512

2016-11-28 Thread Ard Biesheuvel
On 20 November 2016 at 11:43, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > On 20 November 2016 at 11:42, Ard Biesheuvel <ard.biesheu...@linaro.org> > wrote: >> This integrates both the accelerated scalar and the NEON implementations >> of SHA-224/256 as well a

[PATCH 0/2] CRC32 for ARM and arm64 using PMULL and CRC instructions

2016-11-26 Thread Ard Biesheuvel
last Thursday. https://git.kernel.org/cgit/linux/kernel/git/ardb/linux.git/log/?h=crc32 Ard Biesheuvel (2): crypto: arm64/crc32 - accelerated support based on x86 SSE implementation crypto: arm/crc32 - accelerated support based on x86 SSE implementation arch/arm/crypto/Kconfig

[PATCH 1/2] crypto: arm64/crc32 - accelerated support based on x86 SSE implementation

2016-11-26 Thread Ard Biesheuvel
on blocks of at least 64 bytes, and on multiples of 16 bytes only. For the remaining input, or for all input on systems that lack the PMULL 64x64->128 instructions, the CRC32 instructions will be used. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypt
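
A rough sketch of the split described above, assuming only what the commit message states: the PMULL path takes at least 64 bytes and only multiples of 16, and everything else goes to the CRC32 instructions. `pmull_bytes`, `have_pmull` and the two constants are hypothetical names, and any alignment handling the real glue code performs is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

#define PMULL_MIN_LEN 64 /* smallest input the PMULL path accepts */
#define PMULL_GRANULE 16 /* PMULL path works on multiples of this */

/*
 * Return how many leading bytes of a len-byte buffer the PMULL path
 * would take; the caller handles the rest with the CRC32 instructions.
 */
static size_t pmull_bytes(size_t len, int have_pmull)
{
	/* The mask below only works for a power-of-two granule. */
	assert((PMULL_GRANULE & (PMULL_GRANULE - 1)) == 0);

	if (!have_pmull || len < PMULL_MIN_LEN)
		return 0;

	return len & ~(size_t)(PMULL_GRANULE - 1);
}
```

A 100-byte buffer would hand 96 bytes to the PMULL path and 4 bytes to the CRC32 instructions, while a 63-byte buffer, or any buffer on a core without PMULL, goes to the CRC32 instructions entirely.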

[PATCH 2/2] crypto: arm/crc32 - accelerated support based on x86 SSE implementation

2016-11-26 Thread Ard Biesheuvel
on blocks of at least 64 bytes, and on multiples of 16 bytes only. For the remaining input, or for all input on systems that lack the PMULL 64x64->128 instructions, the CRC32 instructions will be used. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm/crypto/Kconfig

Re: [PATCH 4/4] crypto: arm/crct10dif - port x86 SSE implementation to ARM

2016-11-24 Thread Ard Biesheuvel
On 24 November 2016 at 15:43, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > This is a straight transliteration of the Intel algorithm implemented > using SSE and PCLMULQDQ instructions that resides in the file > arch/x86/crypto/crct10dif-pcl-asm_64.S. > > Signed-o

[PATCH 4/4] crypto: arm/crct10dif - port x86 SSE implementation to ARM

2016-11-24 Thread Ard Biesheuvel
This is a straight transliteration of the Intel algorithm implemented using SSE and PCLMULQDQ instructions that resides in the file arch/x86/crypto/crct10dif-pcl-asm_64.S. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm/crypto/Kconfig

[PATCH 2/4] crypto: testmgr - add/enhance test cases for CRC-T10DIF

2016-11-24 Thread Ard Biesheuvel
The existing test cases only exercise a small slice of the various possible code paths through the x86 SSE/PCLMULQDQ implementation, and the upcoming ports of it for arm64. So add one that exceeds 256 bytes in size, and convert another to a chunked test. Signed-off-by: Ard Biesheuvel <ard.bies

[PATCH 3/4] crypto: arm64/crct10dif - port x86 SSE implementation to arm64

2016-11-24 Thread Ard Biesheuvel
This is a straight transliteration of the Intel algorithm implemented using SSE and PCLMULQDQ instructions that resides in the file arch/x86/crypto/crct10dif-pcl-asm_64.S. Suggested-by: YueHaibing <yuehaib...@huawei.com> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>

[PATCH 0/4] crypto: CRCT10DIF support for ARM and arm64

2016-11-24 Thread Ard Biesheuvel
how the ARM code deviates from the arm64 code. NOTE: this code uses the 64x64->128 bit polynomial multiply instruction, which is only available on cores that implement the v8 Crypto Extensions. Ard Biesheuvel (4): crypto: testmgr - avoid overlap in chunked tests crypto: testmgr - add/enha

[PATCH 1/4] crypto: testmgr - avoid overlap in chunked tests

2016-11-24 Thread Ard Biesheuvel
The IDXn offsets are chosen such that tap values (which may go up to 255) end up overlapping in the xbuf allocation. In particular, IDX1 and IDX3 are too close together, so update IDX3 to avoid this issue. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- crypto/testmgr.c | 2

Re: [PATCH v3] arm64/crypto: Accelerated CRC T10 DIF computation

2016-11-22 Thread Ard Biesheuvel
On 22 November 2016 at 12:53, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > On 22 November 2016 at 10:14, YueHaibing <yuehaib...@huawei.com> wrote: >> This is the ARM64 CRC T10 DIF transform accelerated with the ARMv8 >> NEON instruction. The config CRYPTO_CRCT

Re: [PATCH v3] arm64/crypto: Accelerated CRC T10 DIF computation

2016-11-22 Thread Ard Biesheuvel
On 22 November 2016 at 10:14, YueHaibing wrote: > This is the ARM64 CRC T10 DIF transform accelerated with the ARMv8 > NEON instruction. The config CRYPTO_CRCT10DIF_NEON should be turned > on to enable the feature. The crc_t10dif crypto library function will > use this faster

Re: [PATCH v4] crypto: arm64/sha2: integrate OpenSSL implementations of SHA256/SHA512

2016-11-20 Thread Ard Biesheuvel
On 20 November 2016 at 11:42, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > This integrates both the accelerated scalar and the NEON implementations > of SHA-224/256 as well as SHA-384/512 from the OpenSSL project. > > Relative performance compared to the respective

Re: [PATCH v3] crypto: arm64/sha2: integrate OpenSSL implementations of SHA256/SHA512

2016-11-13 Thread Ard Biesheuvel
On 13 November 2016 at 15:12, Andy Polyakov wrote: >> (+ Andy) >> >> ... >>> >>> Looking at the generated code, I see references to __ARMEB__ and >> __ILP32__. >>> The former is probably a bug, > > Oh! You mean that it should be __AARCH64EB__/__AARCH64EL__! Indeed: $

Re: [PATCH] crypto: arm64/sha2: integrate OpenSSL implementations of SHA256/SHA512

2016-11-12 Thread Ard Biesheuvel
On 11 November 2016 at 20:56, Will Deacon <will.dea...@arm.com> wrote: > On Fri, Nov 11, 2016 at 09:51:13PM +0800, Ard Biesheuvel wrote: >> This integrates both the accelerated scalar and the NEON implementations >> of SHA-224/256 as well as SHA-384/512 from the OpenSSL pr

[PATCH] crypto: arm64/sha2: integrate OpenSSL implementations of SHA256/SHA512

2016-11-11 Thread Ard Biesheuvel
instructions. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- This supersedes the SHA-256-NEON-only patch I sent out about 6 weeks ago. Will, Catalin: note that this pulls in a .pl script, and adds a build rule locally in arch/arm64/crypto to generate .S files on the fly from Perl scri

Re: [PATCH 1/5] ARM: wire up HWCAP2 feature bits to the CPU modalias

2016-10-29 Thread Ard Biesheuvel
On 18 October 2016 at 11:52, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > Wire up the generic support for exposing CPU feature bits via the > modalias in /sys/devices/system/cpu. This allows udev to automatically > load modules for things like crypto algorithms that are impl

Re: [PATCH v2 0/8] crypto: ARM/arm64 - big endian fixes

2016-10-19 Thread Ard Biesheuvel
On 19 October 2016 at 09:46, Will Deacon <will.dea...@arm.com> wrote: > On Wed, Oct 19, 2016 at 11:03:33AM +0800, Herbert Xu wrote: >> On Tue, Oct 18, 2016 at 01:14:38PM +0100, Ard Biesheuvel wrote: >> > On 18 October 2016 at 12:49, Catalin Marinas <catalin.m

Re: [PATCH v2 0/8] crypto: ARM/arm64 - big endian fixes

2016-10-18 Thread Ard Biesheuvel
On 18 October 2016 at 12:49, Catalin Marinas <catalin.mari...@arm.com> wrote: > On Tue, Oct 11, 2016 at 07:15:12PM +0100, Ard Biesheuvel wrote: >> As it turns out, none of the accelerated crypto routines under >> arch/arm64/crypto >> currently work, or have ever

Re: [PATCH v2 0/8] crypto: ARM/arm64 - big endian fixes

2016-10-18 Thread Ard Biesheuvel
On 11 October 2016 at 19:15, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > As it turns out, none of the accelerated crypto routines under > arch/arm64/crypto > currently work, or have ever worked correctly when built for big endian. So > this > series fixes all

[PATCH 3/5] crypto: arm/ghash-ce - enable module autoloading based on CPU feature bits

2016-10-18 Thread Ard Biesheuvel
Make the module autoloadable by tying it to the CPU feature bit that describes whether the optional instructions it relies on are implemented by the current CPU. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm/crypto/ghash-ce-glue.c | 6 ++ 1 file changed, 2 inse

[PATCH 0/5] ARM: add module autoloading support for crypto modules

2016-10-18 Thread Ard Biesheuvel
: patches #2 - #5 all depend on #1, which requires an ack from Russell, so please don't pull anything until #1 has been acked and/or merged. Ard Biesheuvel (5): ARM: wire up HWCAP2 feature bits to the CPU modalias crypto: arm/aes-ce - enable module autoloading based on CPU feature bits crypto

[PATCH 1/5] ARM: wire up HWCAP2 feature bits to the CPU modalias

2016-10-18 Thread Ard Biesheuvel
Wire up the generic support for exposing CPU feature bits via the modalias in /sys/devices/system/cpu. This allows udev to automatically load modules for things like crypto algorithms that are implemented using optional instructions. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.

[PATCH 2/5] crypto: arm/aes-ce - enable module autoloading based on CPU feature bits

2016-10-18 Thread Ard Biesheuvel
Make the module autoloadable by tying it to the CPU feature bit that describes whether the optional instructions it relies on are implemented by the current CPU. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm/crypto/aes-ce-glue.c | 5 ++--- 1 file changed, 2 inse

[PATCH 4/5] crypto: arm/sha1-ce - enable module autoloading based on CPU feature bits

2016-10-18 Thread Ard Biesheuvel
Make the module autoloadable by tying it to the CPU feature bit that describes whether the optional instructions it relies on are implemented by the current CPU. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm/crypto/sha1-ce-glue.c | 5 ++--- 1 file changed, 2 inse

[PATCH 5/5] crypto: arm/sha2-ce - enable module autoloading based on CPU feature bits

2016-10-18 Thread Ard Biesheuvel
Make the module autoloadable by tying it to the CPU feature bit that describes whether the optional instructions it relies on are implemented by the current CPU. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm/crypto/sha2-ce-glue.c | 5 ++--- 1 file changed, 2 inse

[PATCH v2 6/8] crypto: arm64/aes-neon - fix for big endian

2016-10-11 Thread Ard Biesheuvel
rypto Extensions") Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/aes-neon.S | 25 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/arch/arm64/crypto/aes-neon.S b/arch/arm64/crypto/aes-neon.S index b93170e1cc93..85f

[PATCH v2 7/8] crypto: arm64/aes-xts-ce: fix for big endian

2016-10-11 Thread Ard Biesheuvel
Emit the XTS tweak literal constants in the appropriate order for a single 128-bit scalar literal load. Fixes: 49788fe2a128 ("arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions") Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64

[PATCH v2 8/8] crypto: arm/aes-ce - fix for big endian

2016-10-11 Thread Ard Biesheuvel
("crypto: arm - AES in ECB/CBC/CTR/XTS modes using ARMv8 Crypto Extensions") Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm/crypto/aes-ce-glue.c | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/arm/crypto/aes-ce-glue.c b/arch/arm/crypto/aes

[PATCH v2 5/8] crypto: arm64/aes-ccm-ce: fix for big endian

2016-10-11 Thread Ard Biesheuvel
endian builds. So fix both issues. Fixes: 12ac3efe74f8 ("arm64/crypto: use crypto instructions to generate AES key schedule") Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/aes-ce-ccm-core.S | 53 ++-- 1 file changed, 27 insertions(

[PATCH v2 0/8] crypto: ARM/arm64 - big endian fixes

2016-10-11 Thread Ard Biesheuvel
with the generic AES key schedule generation code (which it currently no longer uses) In any case, please apply with cc to stable. Ard Biesheuvel (8): crypto: arm64/aes-ce - fix for big endian crypto: arm64/ghash-ce - fix for big endian crypto: arm64/sha1-ce - fix for big endian crypto

[PATCH v2 1/8] crypto: arm64/aes-ce - fix for big endian

2016-10-11 Thread Ard Biesheuvel
, when combining the input key with the round constants. So fix both issues. Fixes: 12ac3efe74f8 ("arm64/crypto: use crypto instructions to generate AES key schedule") Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/aes-

[PATCH v2 4/8] crypto: arm64/sha2-ce - fix for big endian

2016-10-11 Thread Ard Biesheuvel
A-256 using ARMv8 Crypto Extensions") Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/sha2-ce-core.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/crypto/sha2-ce-core.S b/arch/arm64/crypto/sha2-ce-core.S index 5df9d9d470

Re: [PATCH 0/6] crypto: arm64 - big endian fixes

2016-10-11 Thread Ard Biesheuvel
On 11 October 2016 at 03:12, Herbert Xu <herb...@gondor.apana.org.au> wrote: > On Mon, Oct 10, 2016 at 12:26:00PM +0100, Ard Biesheuvel wrote: >> >> /* This piece of crap needs to disappear into per-type test hooks. */ >> if (!((type

Re: [PATCH 0/6] crypto: arm64 - big endian fixes

2016-10-10 Thread Ard Biesheuvel
On 9 October 2016 at 18:42, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > As it turns out, none of the accelerated crypto routines under > arch/arm64/crypto > currently work, or have ever worked correctly when built for big endian. So > this > series fixes

[PATCH 5/6] crypto: arm64/aes-ccm-ce: fix for big endian

2016-10-09 Thread Ard Biesheuvel
endian builds. So fix both issues. Fixes: 12ac3efe74f8 ("arm64/crypto: use crypto instructions to generate AES key schedule") Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/aes-ce-ccm-core.S | 53 ++-- 1 file changed, 27 insertions(

[PATCH 3/6] crypto: arm64/sha1-ce - fix for big endian

2016-10-09 Thread Ard Biesheuvel
ARMv8 Crypto Extensions") Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/sha1-ce-core.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S index 033aae6d732a..c98

[PATCH 4/6] crypto: arm64/sha2-ce - fix for big endian

2016-10-09 Thread Ard Biesheuvel
A-256 using ARMv8 Crypto Extensions") Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/sha2-ce-core.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/crypto/sha2-ce-core.S b/arch/arm64/crypto/sha2-ce-core.S index 5df9d9d470

[PATCH 0/6] crypto: arm64 - big endian fixes

2016-10-09 Thread Ard Biesheuvel
to stable. Ard Biesheuvel (6): crypto: arm64/aes-ce - fix for big endian crypto: arm64/ghash-ce - fix for big endian crypto: arm64/sha1-ce - fix for big endian crypto: arm64/sha2-ce - fix for big endian crypto: arm64/aes-ccm-ce: fix for big endian crypto: arm64/aes-neon - fix for big endian

[PATCH 1/6] crypto: arm64/aes-ce - fix for big endian

2016-10-09 Thread Ard Biesheuvel
, when combining the input key with the round constants. So fix both issues. Fixes: 12ac3efe74f8 ("arm64/crypto: use crypto instructions to generate AES key schedule") Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/aes-

[PATCH 2/6] crypto: arm64/ghash-ce - fix for big endian

2016-10-09 Thread Ard Biesheuvel
rithm") Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/ghash-ce-core.S | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S index dc457015884e..f0bb9f0b524f 100644

Re: [PATCH] crypto: arm64/sha256 - add support for SHA256 using NEON instructions

2016-10-01 Thread Ard Biesheuvel
On 29 September 2016 at 16:37, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > On 29 September 2016 at 15:51, Ard Biesheuvel <ard.biesheu...@linaro.org> > wrote: >> This is a port to arm64 of the NEON implementation of SHA256 that lives >> under arch/arm

Re: [PATCH] crypto: arm64/sha256 - add support for SHA256 using NEON instructions

2016-09-29 Thread Ard Biesheuvel
On 29 September 2016 at 15:51, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > This is a port to arm64 of the NEON implementation of SHA256 that lives > under arch/arm/crypto. > > Due to the fact that the AArch64 assembler dialect deviates from the > 32-bit ARM one in wa

[PATCH] crypto: arm64/sha256 - add support for SHA256 using NEON instructions

2016-09-29 Thread Ard Biesheuvel
the original implementation supports plain ALU assembler, NEON and Crypto Extensions, this code is built from a version sha256-armv4.pl that has been transliterated to the AArch64 NEON dialect. Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- arch/arm64/crypto/Kconfig

[PATCH 2/2] crypto: arm64/aes-ctr: fix NULL dereference in tail processing

2016-09-13 Thread Ard Biesheuvel
ing for nbytes != 0, check for (walk.nbytes % AES_BLOCK_SIZE) != 0, which implies the former in non-error conditions. Fixes: 49788fe2a128 ("arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions") Reported-by: xiakaixu <xiaka...@huawei.com> Signed-off-by: Ard Bies
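
The condition described in the snippet above can be illustrated with a small userspace sketch. This is not the code from the patch: has_partial_tail() is a hypothetical helper, and AES_BLOCK_SIZE is defined locally as 16 rather than taken from the kernel headers. It only shows why testing the remainder is enough: a byte count with a non-zero remainder modulo the block size cannot itself be zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AES_BLOCK_SIZE 16 /* AES block size in bytes (local definition) */

/*
 * Hypothetical helper, not taken from the patch: returns true when the
 * byte count of the current walk step ends in a partial AES block.
 * A zero byte count always yields false, so a separate "!= 0" test
 * on the count is redundant.
 */
static bool has_partial_tail(size_t walk_nbytes)
{
	return (walk_nbytes % AES_BLOCK_SIZE) != 0;
}
```

For example, has_partial_tail(37) is true (two full blocks plus five trailing bytes), while has_partial_tail(0) and has_partial_tail(32) are both false, so no tail processing would be attempted on an empty or block-aligned step.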

Re: Kernel panic - encryption/decryption failed when open file on Arm64

2016-09-13 Thread Ard Biesheuvel
On 13 September 2016 at 07:43, Herbert Xu <herb...@gondor.apana.org.au> wrote: > On Mon, Sep 12, 2016 at 06:40:15PM +0100, Ard Biesheuvel wrote: >> >> So to me, it seems like we should be taking the blkcipher_next_slow() >> path, which does a kmalloc() and bails
