Re: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm
Hi David, > While not part of this change, the unrolled loops look as though > they just destroy the cpu cache. > I'd like be convinced that anything does CRC over long enough buffers > to make it a gain at all. btrfs data checksumming is one area. > With modern (not that modern now) superscalar cpus you can often > get the loop instructions 'for free'. A branch on POWER8 is a three cycle redirect. The vpmsum instructions are 6 cycles. > Sometimes pipelining the loop is needed to get full throughput. > Unlike the IP checksum, you don't even have to 'loop carry' the > cpu carry flag. It went through quite a lot of simulation to reach peak performance. The loop is quite delicate, we have to pace it just right to avoid some pipeline reject conditions. Note also that we already modulo schedule the loop across three iterations, required to hide the latency of the vpmsum instructions. Anton
Re: [PATCH] crypto: powerpc - Fix initialisation of crc32c context
Hi Daniel, > Turning on crypto self-tests on a POWER8 shows: > > alg: hash: Test 1 failed for crc32c-vpmsum > : ff ff ff ff > > Comparing the code with the Intel CRC32c implementation on which > ours is based shows that we are doing an init with 0, not ~0 > as CRC32c requires. > > This probably wasn't caught because btrfs does its own weird > open-coded initialisation. > > Initialise our internal context to ~0 on init. > > This makes the self-tests pass, and btrfs continues to work. Thanks! Not sure how I screwed that up. Acked-by: Anton Blanchard > Fixes: 6dd7a82cc54e ("crypto: powerpc - Add POWER8 optimised crc32c") > Cc: Anton Blanchard > Cc: sta...@vger.kernel.org > Signed-off-by: Daniel Axtens > --- > arch/powerpc/crypto/crc32c-vpmsum_glue.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/powerpc/crypto/crc32c-vpmsum_glue.c > b/arch/powerpc/crypto/crc32c-vpmsum_glue.c index > 9fa046d56eba..411994551afc 100644 --- > a/arch/powerpc/crypto/crc32c-vpmsum_glue.c +++ > b/arch/powerpc/crypto/crc32c-vpmsum_glue.c @@ -52,7 +52,7 @@ static > int crc32c_vpmsum_cra_init(struct crypto_tfm *tfm) { > u32 *key = crypto_tfm_ctx(tfm); > > - *key = 0; > + *key = ~0; > > return 0; > }
Re: [PATCH] crypto: powerpc - Rename CRYPT_CRC32C_VPMSUM option
Hi Jean, > For consistency with the other 246 kernel configuration options, > rename CRYPT_CRC32C_VPMSUM to CRYPTO_CRC32C_VPMSUM. Thanks! Not sure how I missed that. Acked-by: Anton Blanchard Anton -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] Fix crypto/vmx/p8_ghash memory corruption
Hi Marcelo > This series fixes the memory corruption found by Jan Stancek in > 4.8-rc7. The problem however also affects previous versions of the > driver. If it affects previous versions, please add the lines in the sign off to get it into the stable kernels. Anton -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] crypto: crc32c-vpmsum - Convert to CPU feature based module autoloading
Hi Michael, > Is VEC_CRYPTO the right feature? > > That's new power8 crypto stuff. The vpmsum* instructions are part of the same pipeline as the vcipher* instructions, introduced in POWER8. > I thought this only used VMX? (but I haven't looked closely) Yes, vcipher* and vpmsum* are VMX instructions. Anton -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] crypto: crc32c-vpmsum - Convert to CPU feature based module autoloading
From: Anton Blanchard This patch utilises the GENERIC_CPU_AUTOPROBE infrastructure to automatically load the crc32c-vpmsum module if the CPU supports it. Signed-off-by: Anton Blanchard --- arch/powerpc/crypto/crc32c-vpmsum_glue.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/crypto/crc32c-vpmsum_glue.c b/arch/powerpc/crypto/crc32c-vpmsum_glue.c index bfe3d37..9fa046d 100644 --- a/arch/powerpc/crypto/crc32c-vpmsum_glue.c +++ b/arch/powerpc/crypto/crc32c-vpmsum_glue.c @@ -4,6 +4,7 @@ #include #include #include +#include #include #define CHKSUM_BLOCK_SIZE 1 @@ -157,7 +158,7 @@ static void __exit crc32c_vpmsum_mod_fini(void) crypto_unregister_shash(&alg); } -module_init(crc32c_vpmsum_mod_init); +module_cpu_feature_match(PPC_MODULE_FEATURE_VEC_CRYPTO, crc32c_vpmsum_mod_init); module_exit(crc32c_vpmsum_mod_fini); MODULE_AUTHOR("Anton Blanchard "); -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] powerpc: define FUNC_START/FUNC_END
From: Anton Blanchard gcc provides FUNC_START/FUNC_END macros to help with creating assembly functions. Mirror these in the kernel so we can more easily share code between userspace and the kernel. FUNC_END is just a stub since we don't currently annotate the end of kernel functions. It might make sense to do a wholesale search and replace, but for now just create a couple of defines. Signed-off-by: Anton Blanchard --- arch/powerpc/include/asm/ppc_asm.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h index 7b591f9..7a924da 100644 --- a/arch/powerpc/include/asm/ppc_asm.h +++ b/arch/powerpc/include/asm/ppc_asm.h @@ -286,6 +286,9 @@ n: #endif +#define FUNC_START(name) _GLOBAL(name) +#define FUNC_END(name) + /* * LOAD_REG_IMMEDIATE(rn, expr) * Loads the value of the constant expression 'expr' into register 'rn' -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] crypto: powerpc: Add POWER8 optimised crc32c
From: Anton Blanchard Use the vector polynomial multiply-sum instructions in POWER8 to speed up crc32c. This is just over 41x faster than the slice-by-8 method that it replaces. Measurements on a 4.1 GHz POWER8 show it sustaining 52 GiB/sec. A simple btrfs write performance test: dd if=/dev/zero of=/mnt/tmpfile bs=1M count=4096 sync is over 3.7x faster. Signed-off-by: Anton Blanchard --- arch/powerpc/crypto/Makefile |2 + arch/powerpc/crypto/crc32c-vpmsum_asm.S | 1553 ++ arch/powerpc/crypto/crc32c-vpmsum_glue.c | 167 arch/powerpc/include/asm/ppc-opcode.h| 12 + crypto/Kconfig | 11 + 5 files changed, 1745 insertions(+) create mode 100644 arch/powerpc/crypto/crc32c-vpmsum_asm.S create mode 100644 arch/powerpc/crypto/crc32c-vpmsum_glue.c diff --git a/arch/powerpc/crypto/Makefile b/arch/powerpc/crypto/Makefile index 9c221b6..7998c17 100644 --- a/arch/powerpc/crypto/Makefile +++ b/arch/powerpc/crypto/Makefile @@ -9,9 +9,11 @@ obj-$(CONFIG_CRYPTO_MD5_PPC) += md5-ppc.o obj-$(CONFIG_CRYPTO_SHA1_PPC) += sha1-powerpc.o obj-$(CONFIG_CRYPTO_SHA1_PPC_SPE) += sha1-ppc-spe.o obj-$(CONFIG_CRYPTO_SHA256_PPC_SPE) += sha256-ppc-spe.o +obj-$(CONFIG_CRYPT_CRC32C_VPMSUM) += crc32c-vpmsum.o aes-ppc-spe-y := aes-spe-core.o aes-spe-keys.o aes-tab-4k.o aes-spe-modes.o aes-spe-glue.o md5-ppc-y := md5-asm.o md5-glue.o sha1-powerpc-y := sha1-powerpc-asm.o sha1.o sha1-ppc-spe-y := sha1-spe-asm.o sha1-spe-glue.o sha256-ppc-spe-y := sha256-spe-asm.o sha256-spe-glue.o +crc32c-vpmsum-y := crc32c-vpmsum_asm.o crc32c-vpmsum_glue.o diff --git a/arch/powerpc/crypto/crc32c-vpmsum_asm.S b/arch/powerpc/crypto/crc32c-vpmsum_asm.S new file mode 100644 index 000..dc640b2 --- /dev/null +++ b/arch/powerpc/crypto/crc32c-vpmsum_asm.S @@ -0,0 +1,1553 @@ +/* + * Calculate the checksum of data that is 16 byte aligned and a multiple of + * 16 bytes. + * + * The first step is to reduce it to 1024 bits. We do this in 8 parallel + * chunks in order to mask the latency of the vpmsum instructions. If we + * have more than 32 kB of data to checksum we repeat this step multiple + * times, passing in the previous 1024 bits. + * + * The next step is to reduce the 1024 bits to 64 bits. This step adds + * 32 bits of 0s to the end - this matches what a CRC does. We just + * calculate constants that land the data in this 32 bits. + * + * We then use fixed point Barrett reduction to compute a mod n over GF(2) + * for n = CRC using POWER8 instructions. We use x = 32. + * + * http://en.wikipedia.org/wiki/Barrett_reduction + * + * Copyright (C) 2015 Anton Blanchard , IBM + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include + + .section.rodata +.balign 16 + +.byteswap_constant: + /* byte reverse permute constant */ + .octa 0x0F0E0D0C0B0A09080706050403020100 + +#define MAX_SIZE 32768 +.constants: + + /* Reduce 262144 kbits to 1024 bits */ + /* x^261120 mod p(x)` << 1, x^261184 mod p(x)` << 1 */ + .octa 0xb6ca9e209c37c408 + + /* x^260096 mod p(x)` << 1, x^260160 mod p(x)` << 1 */ + .octa 0x350249a80001b51df26c + + /* x^259072 mod p(x)` << 1, x^259136 mod p(x)` << 1 */ + .octa 0x0001862dac540724b9d0 + + /* x^258048 mod p(x)` << 1, x^258112 mod p(x)` << 1 */ + .octa 0x0001d87fb48c0001c00532fe + + /* x^257024 mod p(x)` << 1, x^257088 mod p(x)` << 1 */ + .octa 0x0001f39b699ef05a9362 + + /* x^256000 mod p(x)` << 1, x^256064 mod p(x)` << 1 */ + .octa 0x000101da11b40001e1007970 + + /* x^254976 mod p(x)` << 1, x^255040 mod p(x)` << 1 */ + .octa 0x0001cab571e0a57366ee + + /* x^253952 mod p(x)` << 1, x^254016 mod p(x)` << 1 */ + .octa 0xc7020cfe000192011284 + + /* x^252928 mod p(x)` << 1, x^252992 mod p(x)` << 1 */ + .octa 0xcdaed1ae000162716d9a + + /* x^251904 mod p(x)` << 1, x^251968 mod p(x)` << 1 */ + .octa 0x0001e804effccd97ecde + + /* x^250880 mod p(x)` << 1, x^250944 mod p(x)` << 1 */ + .octa 0x77c3ea3a58812bc0 + + /* x^249856 mod p(x)` << 1, x^249920 mod p(x)` << 1 */ + .octa 0x68df31b488b8c12e + + /* x^248832 mod p(x)` << 1, x^248896 mod p(x)` << 1 */ + .octa 0xb059b6c20001230b234c + + /* x^247808 mod p(x)` << 1, x^247872 mod p(x)` << 1 */ + .octa 0x000145fb8ed80001120b416e + +
[PATCH 1/2] crypto: vmx: Fix ABI detection
From: Anton Blanchard When calling ppc-xlate.pl, we pass it either linux-ppc64 or linux-ppc64le. The script however was expecting linux64le, a result of its OpenSSL origins. This means we aren't obeying the ppc64le ABIv2 rules. Fix this by checking for linux-ppc64le. Fixes: 5ca55738201c ("crypto: vmx - comply with ABIs that specify vrsave as reserved.") Cc: sta...@vger.kernel.org Signed-off-by: Anton Blanchard --- drivers/crypto/vmx/ppc-xlate.pl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/crypto/vmx/ppc-xlate.pl b/drivers/crypto/vmx/ppc-xlate.pl index 9f4994c..b18e67d 100644 --- a/drivers/crypto/vmx/ppc-xlate.pl +++ b/drivers/crypto/vmx/ppc-xlate.pl @@ -141,7 +141,7 @@ my $vmr = sub { # Some ABIs specify vrsave, special-purpose register #256, as reserved # for system use. -my $no_vrsave = ($flavour =~ /aix|linux64le/); +my $no_vrsave = ($flavour =~ /linux-ppc64le/); my $mtspr = sub { my ($f,$idx,$ra) = @_; if ($idx == 256 && $no_vrsave) { -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] crypto: vmx: Increase priority of aes-cbc cipher
From: Anton Blanchard All of the VMX AES ciphers (AES, AES-CBC and AES-CTR) are set at priority 1000. Unfortunately this means we never use AES-CBC and AES-CTR, because the base AES-CBC cipher that is implemented on top of AES inherits its priority. To fix this, AES-CBC and AES-CTR have to be a higher priority. Set them to 2000. Testing on a POWER8 with: cryptsetup benchmark --cipher aes --key-size 256 Shows decryption speed increase from 402.4 MB/s to 3069.2 MB/s, over 7x faster. Thanks to Mike Strosaker for helping me debug this issue. Fixes: 8c755ace357c ("crypto: vmx - Adding CBC routines for VMX module") Cc: sta...@vger.kernel.org Signed-off-by: Anton Blanchard --- drivers/crypto/vmx/aes_cbc.c | 2 +- drivers/crypto/vmx/aes_ctr.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/crypto/vmx/aes_cbc.c b/drivers/crypto/vmx/aes_cbc.c index 495577b..94ad5c0 100644 --- a/drivers/crypto/vmx/aes_cbc.c +++ b/drivers/crypto/vmx/aes_cbc.c @@ -182,7 +182,7 @@ struct crypto_alg p8_aes_cbc_alg = { .cra_name = "cbc(aes)", .cra_driver_name = "p8_aes_cbc", .cra_module = THIS_MODULE, - .cra_priority = 1000, + .cra_priority = 2000, .cra_type = &crypto_blkcipher_type, .cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER | CRYPTO_ALG_NEED_FALLBACK, .cra_alignmask = 0, diff --git a/drivers/crypto/vmx/aes_ctr.c b/drivers/crypto/vmx/aes_ctr.c index 0a3c1b0..38ed10d 100644 --- a/drivers/crypto/vmx/aes_ctr.c +++ b/drivers/crypto/vmx/aes_ctr.c @@ -166,7 +166,7 @@ struct crypto_alg p8_aes_ctr_alg = { .cra_name = "ctr(aes)", .cra_driver_name = "p8_aes_ctr", .cra_module = THIS_MODULE, - .cra_priority = 1000, + .cra_priority = 2000, .cra_type = &crypto_blkcipher_type, .cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER | CRYPTO_ALG_NEED_FALLBACK, .cra_alignmask = 0, -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] crypto/nx: disable NX on little endian builds
The NX driver has endian issues so disable it for now. Signed-off-by: Anton Blanchard --- diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig index 03ccdb0..8280a7a3 100644 --- a/drivers/crypto/Kconfig +++ b/drivers/crypto/Kconfig @@ -313,7 +313,7 @@ config CRYPTO_DEV_S5P config CRYPTO_DEV_NX bool "Support for IBM Power7+ in-Nest cryptographic acceleration" - depends on PPC64 && IBMVIO + depends on PPC64 && IBMVIO && !CPU_LITTLE_ENDIAN default n help Support for Power7+ in-Nest cryptographic acceleration. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/17] pseries: Enabled the PFO-based RNG accelerator
Hi, +#if defined(CONFIG_HW_RANDOM_PSERIES) || \ + defined(CONFIG_HW_RANDOM_PSERIES_MODULE) +#define OV5_PFO_HW_RNG 0x80/* PFO Random Number Generator */ +#else +#define OV5_PFO_HW_RNG 0x00 +#endif Milton tipped me off about this. We really don't want to be doing ibm,client-architecture reboots every time a config option is changed. Let's just hardwire it on. Anton -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html