Re: [PATCH v3] arm64/crypto: Accelerated CRC T10 DIF computation
On 22 November 2016 at 12:53, Ard Biesheuvel wrote:
> On 22 November 2016 at 10:14, YueHaibing wrote:
>> This is the ARM64 CRC T10 DIF transform accelerated with the ARMv8
>> NEON instruction. The config CRYPTO_CRCT10DIF_NEON should be turned
>> on to enable the feature. The crc_t10dif crypto library function will
>> use this faster algorithm when crct10dif_neon module is loaded.
>>
>
> What is this algorithm commonly used for? In other words, why is it a
> good idea to add support for this algorithm to the kernel?
>
>> Tcrypt benchmark results:
>>
>> HIP06 (mode=320 sec=2)
>>
>> The ratio of bytes/sec crct10dif-neon Vs. crct10dif-generic:
>>
>> TEST                                                        neon    generic  ratio
>>   16 byte blocks,   16 bytes per update,   1 updates   214506112  171095400   1.25
>>   64 byte blocks,   16 bytes per update,   4 updates   139385312  119036352   1.17
>>   64 byte blocks,   64 bytes per update,   1 updates   671523712  198945344   3.38
>>  256 byte blocks,   16 bytes per update,  16 updates   157674880  125146752   1.26
>>  256 byte blocks,   64 bytes per update,   4 updates   491888128  175764096   2.80
>>  256 byte blocks,  256 bytes per update,   1 updates  2123298176  206995200  10.26
>> 1024 byte blocks,   16 bytes per update,  64 updates   161243136  126460416   1.28
>> 1024 byte blocks,  256 bytes per update,   4 updates  1643020800  200027136   8.21
>> 1024 byte blocks, 1024 bytes per update,   1 updates  4238239232  209106432  20.27
>> 2048 byte blocks,   16 bytes per update, 128 updates   162079744  126953472   1.28
>> 2048 byte blocks,  256 bytes per update,   8 updates  1693587456  200867840   8.43
>> 2048 byte blocks, 1024 bytes per update,   2 updates  3424323584  206330880  16.60
>> 2048 byte blocks, 2048 bytes per update,   1 updates  5228207104  208620544  25.06
>> 4096 byte blocks,   16 bytes per update, 256 updates   162304000  126894080   1.28
>> 4096 byte blocks,  256 bytes per update,  16 updates  1731862528  201197568   8.61
>> 4096 byte blocks, 1024 bytes per update,   4 updates  3668625408  207003648  17.72
>> 4096 byte blocks, 4096 bytes per update,   1 updates  5551239168  209127424  26.54
>> 8192 byte blocks,   16 bytes per update, 512 updates   162779136  126984192   1.28
>> 8192 byte blocks,  256 bytes per update,  32 updates  1753702400  201420800   8.71
>> 8192 byte blocks, 1024 bytes per update,   8 updates  3760918528  207351808  18.14
>> 8192 byte blocks, 4096 bytes per update,   2 updates  5483655168  208928768  26.25
>> 8192 byte blocks, 8192 bytes per update,   1 updates  5623377920  209108992  26.89
>>
>> Signed-off-by: YueHaibing
>> Signed-off-by: YangShengkai
>> Signed-off-by: Ding Tianhong
>> Signed-off-by: Hanjun Guo
>>
>> ---
>>  arch/arm64/crypto/Kconfig                 |   5 +
>>  arch/arm64/crypto/Makefile                |   4 +
>>  arch/arm64/crypto/crct10dif-neon-asm_64.S | 751 ++
>>  arch/arm64/crypto/crct10dif-neon_glue.c   | 115 +
>>  4 files changed, 875 insertions(+)
>>  create mode 100644 arch/arm64/crypto/crct10dif-neon-asm_64.S
>>  create mode 100644 arch/arm64/crypto/crct10dif-neon_glue.c
>>
>> diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
>> index 2cf32e9..2e450bf 100644
>> --- a/arch/arm64/crypto/Kconfig
>> +++ b/arch/arm64/crypto/Kconfig
>> @@ -23,6 +23,11 @@ config CRYPTO_GHASH_ARM64_CE
>>  	depends on ARM64 && KERNEL_MODE_NEON
>>  	select CRYPTO_HASH
>>
>> +config CRYPTO_CRCT10DIF_NEON
>> +	tristate "CRCT10DIF hardware acceleration using NEON instructions"
>> +	depends on ARM64 && KERNEL_MODE_NEON
>> +	select CRYPTO_HASH
>> +
>
> Could you please follow the existing pattern:
>
> config CRYPTO_CRCT10DIF_ARM64_NEON
>
>
>>  config CRYPTO_AES_ARM64_CE
>>  	tristate "AES core cipher using ARMv8 Crypto Extensions"
>>  	depends on ARM64 && KERNEL_MODE_NEON
>> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
>> index abb79b3..6c9ff2c 100644
>> --- a/arch/arm64/crypto/Makefile
>> +++ b/arch/arm64/crypto/Makefile
>> @@ -29,6 +29,10 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o
>>  obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
>>  aes-neon-blk-y := aes-glue-neon.o aes-neon.o
>>
>> +obj-$(CONFIG_CRYPTO_CRCT10DIF_NEON) += crct10dif-neon.o
>> +crct10dif-neon-y := crct10dif-neon-asm_64.o crct10dif-neon_glue.o
>> +AFLAGS_crct10dif-neon-asm_64.o := -march=armv8-a+crypto
>> +
>
> Please drop this line, and add
>
> .cpu generic+crypto
>
> to the .S file
>
>>  AFLAGS_aes-ce.o	:= -DINTERLEAVE=4
>>  AFLAGS_aes-neon.o := -DINTERLEAVE=4
>>
>> diff -
Re: [PATCH v3] arm64/crypto: Accelerated CRC T10 DIF computation
On 22 November 2016 at 10:14, YueHaibing wrote:
> This is the ARM64 CRC T10 DIF transform accelerated with the ARMv8
> NEON instruction. The config CRYPTO_CRCT10DIF_NEON should be turned
> on to enable the feature. The crc_t10dif crypto library function will
> use this faster algorithm when crct10dif_neon module is loaded.
>

What is this algorithm commonly used for? In other words, why is it a
good idea to add support for this algorithm to the kernel?

> Tcrypt benchmark results:
>
> HIP06 (mode=320 sec=2)
>
> The ratio of bytes/sec crct10dif-neon Vs. crct10dif-generic:
>
> TEST                                                        neon    generic  ratio
>   16 byte blocks,   16 bytes per update,   1 updates   214506112  171095400   1.25
>   64 byte blocks,   16 bytes per update,   4 updates   139385312  119036352   1.17
>   64 byte blocks,   64 bytes per update,   1 updates   671523712  198945344   3.38
>  256 byte blocks,   16 bytes per update,  16 updates   157674880  125146752   1.26
>  256 byte blocks,   64 bytes per update,   4 updates   491888128  175764096   2.80
>  256 byte blocks,  256 bytes per update,   1 updates  2123298176  206995200  10.26
> 1024 byte blocks,   16 bytes per update,  64 updates   161243136  126460416   1.28
> 1024 byte blocks,  256 bytes per update,   4 updates  1643020800  200027136   8.21
> 1024 byte blocks, 1024 bytes per update,   1 updates  4238239232  209106432  20.27
> 2048 byte blocks,   16 bytes per update, 128 updates   162079744  126953472   1.28
> 2048 byte blocks,  256 bytes per update,   8 updates  1693587456  200867840   8.43
> 2048 byte blocks, 1024 bytes per update,   2 updates  3424323584  206330880  16.60
> 2048 byte blocks, 2048 bytes per update,   1 updates  5228207104  208620544  25.06
> 4096 byte blocks,   16 bytes per update, 256 updates   162304000  126894080   1.28
> 4096 byte blocks,  256 bytes per update,  16 updates  1731862528  201197568   8.61
> 4096 byte blocks, 1024 bytes per update,   4 updates  3668625408  207003648  17.72
> 4096 byte blocks, 4096 bytes per update,   1 updates  5551239168  209127424  26.54
> 8192 byte blocks,   16 bytes per update, 512 updates   162779136  126984192   1.28
> 8192 byte blocks,  256 bytes per update,  32 updates  1753702400  201420800   8.71
> 8192 byte blocks, 1024 bytes per update,   8 updates  3760918528  207351808  18.14
> 8192 byte blocks, 4096 bytes per update,   2 updates  5483655168  208928768  26.25
> 8192 byte blocks, 8192 bytes per update,   1 updates  5623377920  209108992  26.89
>
> Signed-off-by: YueHaibing
> Signed-off-by: YangShengkai
> Signed-off-by: Ding Tianhong
> Signed-off-by: Hanjun Guo
>
> ---
>  arch/arm64/crypto/Kconfig                 |   5 +
>  arch/arm64/crypto/Makefile                |   4 +
>  arch/arm64/crypto/crct10dif-neon-asm_64.S | 751 ++
>  arch/arm64/crypto/crct10dif-neon_glue.c   | 115 +
>  4 files changed, 875 insertions(+)
>  create mode 100644 arch/arm64/crypto/crct10dif-neon-asm_64.S
>  create mode 100644 arch/arm64/crypto/crct10dif-neon_glue.c
>
> diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
> index 2cf32e9..2e450bf 100644
> --- a/arch/arm64/crypto/Kconfig
> +++ b/arch/arm64/crypto/Kconfig
> @@ -23,6 +23,11 @@ config CRYPTO_GHASH_ARM64_CE
>  	depends on ARM64 && KERNEL_MODE_NEON
>  	select CRYPTO_HASH
>
> +config CRYPTO_CRCT10DIF_NEON
> +	tristate "CRCT10DIF hardware acceleration using NEON instructions"
> +	depends on ARM64 && KERNEL_MODE_NEON
> +	select CRYPTO_HASH
> +

Could you please follow the existing pattern:

config CRYPTO_CRCT10DIF_ARM64_NEON

>  config CRYPTO_AES_ARM64_CE
>  	tristate "AES core cipher using ARMv8 Crypto Extensions"
>  	depends on ARM64 && KERNEL_MODE_NEON
> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
> index abb79b3..6c9ff2c 100644
> --- a/arch/arm64/crypto/Makefile
> +++ b/arch/arm64/crypto/Makefile
> @@ -29,6 +29,10 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o
>  obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
>  aes-neon-blk-y := aes-glue-neon.o aes-neon.o
>
> +obj-$(CONFIG_CRYPTO_CRCT10DIF_NEON) += crct10dif-neon.o
> +crct10dif-neon-y := crct10dif-neon-asm_64.o crct10dif-neon_glue.o
> +AFLAGS_crct10dif-neon-asm_64.o := -march=armv8-a+crypto
> +

Please drop this line, and add

.cpu generic+crypto

to the .S file

>  AFLAGS_aes-ce.o	:= -DINTERLEAVE=4
>  AFLAGS_aes-neon.o := -DINTERLEAVE=4
>
> diff --git a/arch/arm64/crypto/crct10dif-neon-asm_64.S b/arch/arm64/crypto/crct10dif-neon-asm_64.S
> new file mode 100644
> index 000..2ae3033
> --- /dev/null
> +++ b/arch/arm64/crypt
[PATCH v3] arm64/crypto: Accelerated CRC T10 DIF computation
This is the ARM64 CRC T10 DIF transform accelerated with the ARMv8
NEON instruction. The config CRYPTO_CRCT10DIF_NEON should be turned
on to enable the feature. The crc_t10dif crypto library function will
use this faster algorithm when crct10dif_neon module is loaded.

Tcrypt benchmark results:

HIP06 (mode=320 sec=2)

The ratio of bytes/sec crct10dif-neon Vs. crct10dif-generic:

TEST                                                        neon    generic  ratio
  16 byte blocks,   16 bytes per update,   1 updates   214506112  171095400   1.25
  64 byte blocks,   16 bytes per update,   4 updates   139385312  119036352   1.17
  64 byte blocks,   64 bytes per update,   1 updates   671523712  198945344   3.38
 256 byte blocks,   16 bytes per update,  16 updates   157674880  125146752   1.26
 256 byte blocks,   64 bytes per update,   4 updates   491888128  175764096   2.80
 256 byte blocks,  256 bytes per update,   1 updates  2123298176  206995200  10.26
1024 byte blocks,   16 bytes per update,  64 updates   161243136  126460416   1.28
1024 byte blocks,  256 bytes per update,   4 updates  1643020800  200027136   8.21
1024 byte blocks, 1024 bytes per update,   1 updates  4238239232  209106432  20.27
2048 byte blocks,   16 bytes per update, 128 updates   162079744  126953472   1.28
2048 byte blocks,  256 bytes per update,   8 updates  1693587456  200867840   8.43
2048 byte blocks, 1024 bytes per update,   2 updates  3424323584  206330880  16.60
2048 byte blocks, 2048 bytes per update,   1 updates  5228207104  208620544  25.06
4096 byte blocks,   16 bytes per update, 256 updates   162304000  126894080   1.28
4096 byte blocks,  256 bytes per update,  16 updates  1731862528  201197568   8.61
4096 byte blocks, 1024 bytes per update,   4 updates  3668625408  207003648  17.72
4096 byte blocks, 4096 bytes per update,   1 updates  5551239168  209127424  26.54
8192 byte blocks,   16 bytes per update, 512 updates   162779136  126984192   1.28
8192 byte blocks,  256 bytes per update,  32 updates  1753702400  201420800   8.71
8192 byte blocks, 1024 bytes per update,   8 updates  3760918528  207351808  18.14
8192 byte blocks, 4096 bytes per update,   2 updates  5483655168  208928768  26.25
8192 byte blocks, 8192 bytes per update,   1 updates  5623377920  209108992  26.89

Signed-off-by: YueHaibing
Signed-off-by: YangShengkai
Signed-off-by: Ding Tianhong
Signed-off-by: Hanjun Guo

---
 arch/arm64/crypto/Kconfig                 |   5 +
 arch/arm64/crypto/Makefile                |   4 +
 arch/arm64/crypto/crct10dif-neon-asm_64.S | 751 ++
 arch/arm64/crypto/crct10dif-neon_glue.c   | 115 +
 4 files changed, 875 insertions(+)
 create mode 100644 arch/arm64/crypto/crct10dif-neon-asm_64.S
 create mode 100644 arch/arm64/crypto/crct10dif-neon_glue.c

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 2cf32e9..2e450bf 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -23,6 +23,11 @@ config CRYPTO_GHASH_ARM64_CE
 	depends on ARM64 && KERNEL_MODE_NEON
 	select CRYPTO_HASH
 
+config CRYPTO_CRCT10DIF_NEON
+	tristate "CRCT10DIF hardware acceleration using NEON instructions"
+	depends on ARM64 && KERNEL_MODE_NEON
+	select CRYPTO_HASH
+
 config CRYPTO_AES_ARM64_CE
 	tristate "AES core cipher using ARMv8 Crypto Extensions"
 	depends on ARM64 && KERNEL_MODE_NEON
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index abb79b3..6c9ff2c 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -29,6 +29,10 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o
 obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
 aes-neon-blk-y := aes-glue-neon.o aes-neon.o
 
+obj-$(CONFIG_CRYPTO_CRCT10DIF_NEON) += crct10dif-neon.o
+crct10dif-neon-y := crct10dif-neon-asm_64.o crct10dif-neon_glue.o
+AFLAGS_crct10dif-neon-asm_64.o := -march=armv8-a+crypto
+
 AFLAGS_aes-ce.o	:= -DINTERLEAVE=4
 AFLAGS_aes-neon.o := -DINTERLEAVE=4
 
diff --git a/arch/arm64/crypto/crct10dif-neon-asm_64.S b/arch/arm64/crypto/crct10dif-neon-asm_64.S
new file mode 100644
index 000..2ae3033
--- /dev/null
+++ b/arch/arm64/crypto/crct10dif-neon-asm_64.S
@@ -0,0 +1,751 @@
+/*
+ * Copyright (c) 2016-2017 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include
+#include
+
+.global crc_t10dif_neon
+.text
+
+/* X0 is initial CRC value
+ * X1 is data buffer
+ * X2 is the length of buffer
+ * X3 is the backup buffer(for extend)
+
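
For readers who want a baseline to compare against, here is a minimal
bitwise sketch of the checksum that the accelerated routine is meant to
reproduce. It is not taken from the patch. It assumes the usual CRC T10 DIF
parameters (16-bit CRC, polynomial 0x8BB7, processed MSB first, no final
XOR), and the name crc_t10dif_ref is hypothetical. Its first three
arguments mirror the register comment above: initial CRC value, data
buffer, and buffer length.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed T10 DIF generator polynomial (x^16 term implicit). */
#define CRC_T10DIF_POLY 0x8BB7u

/*
 * Hypothetical reference routine, one bit per step, for checking results.
 * crc: running CRC value (0 for a fresh computation)
 * buf: data buffer
 * len: number of bytes in buf
 */
uint16_t crc_t10dif_ref(uint16_t crc, const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		/* Fold the next byte into the top half of the CRC. */
		crc ^= (uint16_t)((uint16_t)buf[i] << 8);

		for (int bit = 0; bit < 8; bit++) {
			if (crc & 0x8000u)
				crc = (uint16_t)((crc << 1) ^ CRC_T10DIF_POLY);
			else
				crc = (uint16_t)(crc << 1);
		}
	}

	return crc;
}
```

Because there is no final XOR, the result of one call can be passed as the
initial value of the next, which is how the "bytes per update" splits in
the tcrypt table would be expressed with this sketch.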