Re: [PATCH v3] arm64/crypto: Accelerated CRC T10 DIF computation

2016-11-22 Thread Ard Biesheuvel
On 22 November 2016 at 12:53, Ard Biesheuvel  wrote:
> On 22 November 2016 at 10:14, YueHaibing  wrote:
>> This is the ARM64 CRC T10 DIF transform accelerated with ARMv8
>> NEON instructions. The config option CRYPTO_CRCT10DIF_NEON should be
>> turned on to enable the feature. The crc_t10dif crypto library function
>> will use this faster algorithm when the crct10dif_neon module is loaded.
>>
>
> What is this algorithm commonly used for? In other words, why is it a
> good idea to add support for this algorithm to the kernel?
>
>> Tcrypt benchmark results:
>>
>> HIP06  (mode=320 sec=2)
>>
>> The ratio of bytes/sec crct10dif-neon Vs. crct10dif-generic:
>>
>> TEST                                                        neon    generic  ratio
>>   16 byte blocks,   16 bytes per update,   1 updates   214506112  171095400   1.25
>>   64 byte blocks,   16 bytes per update,   4 updates   139385312  119036352   1.17
>>   64 byte blocks,   64 bytes per update,   1 updates   671523712  198945344   3.38
>>  256 byte blocks,   16 bytes per update,  16 updates   157674880  125146752   1.26
>>  256 byte blocks,   64 bytes per update,   4 updates   491888128  175764096   2.80
>>  256 byte blocks,  256 bytes per update,   1 updates  2123298176  206995200  10.26
>> 1024 byte blocks,   16 bytes per update,  64 updates   161243136  126460416   1.28
>> 1024 byte blocks,  256 bytes per update,   4 updates  1643020800  200027136   8.21
>> 1024 byte blocks, 1024 bytes per update,   1 updates  4238239232  209106432  20.27
>> 2048 byte blocks,   16 bytes per update, 128 updates   162079744  126953472   1.28
>> 2048 byte blocks,  256 bytes per update,   8 updates  1693587456  200867840   8.43
>> 2048 byte blocks, 1024 bytes per update,   2 updates  3424323584  206330880  16.60
>> 2048 byte blocks, 2048 bytes per update,   1 updates  5228207104  208620544  25.06
>> 4096 byte blocks,   16 bytes per update, 256 updates   162304000  126894080   1.28
>> 4096 byte blocks,  256 bytes per update,  16 updates  1731862528  201197568   8.61
>> 4096 byte blocks, 1024 bytes per update,   4 updates  3668625408  207003648  17.72
>> 4096 byte blocks, 4096 bytes per update,   1 updates  5551239168  209127424  26.54
>> 8192 byte blocks,   16 bytes per update, 512 updates   162779136  126984192   1.28
>> 8192 byte blocks,  256 bytes per update,  32 updates  1753702400  201420800   8.71
>> 8192 byte blocks, 1024 bytes per update,   8 updates  3760918528  207351808  18.14
>> 8192 byte blocks, 4096 bytes per update,   2 updates  5483655168  208928768  26.25
>> 8192 byte blocks, 8192 bytes per update,   1 updates  5623377920  209108992  26.89
>>
>> Signed-off-by: YueHaibing 
>> Signed-off-by: YangShengkai 
>> Signed-off-by: Ding Tianhong 
>> Signed-off-by: Hanjun Guo 
>>
>> ---
>>  arch/arm64/crypto/Kconfig                 |   5 +
>>  arch/arm64/crypto/Makefile                |   4 +
>>  arch/arm64/crypto/crct10dif-neon-asm_64.S | 751 ++
>>  arch/arm64/crypto/crct10dif-neon_glue.c   | 115 +
>>  4 files changed, 875 insertions(+)
>>  create mode 100644 arch/arm64/crypto/crct10dif-neon-asm_64.S
>>  create mode 100644 arch/arm64/crypto/crct10dif-neon_glue.c
>>
>> diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
>> index 2cf32e9..2e450bf 100644
>> --- a/arch/arm64/crypto/Kconfig
>> +++ b/arch/arm64/crypto/Kconfig
>> @@ -23,6 +23,11 @@ config CRYPTO_GHASH_ARM64_CE
>> depends on ARM64 && KERNEL_MODE_NEON
>> select CRYPTO_HASH
>>
>> +config CRYPTO_CRCT10DIF_NEON
>> +   tristate "CRCT10DIF hardware acceleration using NEON instructions"
>> +   depends on ARM64 && KERNEL_MODE_NEON
>> +   select CRYPTO_HASH
>> +
>
> Could you please follow the existing pattern:
>
> config CRYPTO_CRCT10DIF_ARM64_NEON
>
>
>>  config CRYPTO_AES_ARM64_CE
>> tristate "AES core cipher using ARMv8 Crypto Extensions"
>> depends on ARM64 && KERNEL_MODE_NEON
>> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
>> index abb79b3..6c9ff2c 100644
>> --- a/arch/arm64/crypto/Makefile
>> +++ b/arch/arm64/crypto/Makefile
>> @@ -29,6 +29,10 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o
>>  obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
>>  aes-neon-blk-y := aes-glue-neon.o aes-neon.o
>>
>> +obj-$(CONFIG_CRYPTO_CRCT10DIF_NEON) += crct10dif-neon.o
>> +crct10dif-neon-y := crct10dif-neon-asm_64.o crct10dif-neon_glue.o
>> +AFLAGS_crct10dif-neon-asm_64.o := -march=armv8-a+crypto
>> +
>
> Please drop this line, and add
>
> .cpu generic+crypto
>
> to the .S file
>
>>  AFLAGS_aes-ce.o:= -DINTERLEAVE=4
>>  AFLAGS_aes-neon.o  := -DINTERLEAVE=4
>>
>> diff 

Re: [PATCH v3] arm64/crypto: Accelerated CRC T10 DIF computation

2016-11-22 Thread Ard Biesheuvel
On 22 November 2016 at 10:14, YueHaibing  wrote:
> This is the ARM64 CRC T10 DIF transform accelerated with ARMv8
> NEON instructions. The config option CRYPTO_CRCT10DIF_NEON should be
> turned on to enable the feature. The crc_t10dif crypto library function
> will use this faster algorithm when the crct10dif_neon module is loaded.
>

What is this algorithm commonly used for? In other words, why is it a
good idea to add support for this algorithm to the kernel?

> Tcrypt benchmark results:
>
> HIP06  (mode=320 sec=2)
>
> The ratio of bytes/sec crct10dif-neon Vs. crct10dif-generic:
>
> TEST                                                        neon    generic  ratio
>   16 byte blocks,   16 bytes per update,   1 updates   214506112  171095400   1.25
>   64 byte blocks,   16 bytes per update,   4 updates   139385312  119036352   1.17
>   64 byte blocks,   64 bytes per update,   1 updates   671523712  198945344   3.38
>  256 byte blocks,   16 bytes per update,  16 updates   157674880  125146752   1.26
>  256 byte blocks,   64 bytes per update,   4 updates   491888128  175764096   2.80
>  256 byte blocks,  256 bytes per update,   1 updates  2123298176  206995200  10.26
> 1024 byte blocks,   16 bytes per update,  64 updates   161243136  126460416   1.28
> 1024 byte blocks,  256 bytes per update,   4 updates  1643020800  200027136   8.21
> 1024 byte blocks, 1024 bytes per update,   1 updates  4238239232  209106432  20.27
> 2048 byte blocks,   16 bytes per update, 128 updates   162079744  126953472   1.28
> 2048 byte blocks,  256 bytes per update,   8 updates  1693587456  200867840   8.43
> 2048 byte blocks, 1024 bytes per update,   2 updates  3424323584  206330880  16.60
> 2048 byte blocks, 2048 bytes per update,   1 updates  5228207104  208620544  25.06
> 4096 byte blocks,   16 bytes per update, 256 updates   162304000  126894080   1.28
> 4096 byte blocks,  256 bytes per update,  16 updates  1731862528  201197568   8.61
> 4096 byte blocks, 1024 bytes per update,   4 updates  3668625408  207003648  17.72
> 4096 byte blocks, 4096 bytes per update,   1 updates  5551239168  209127424  26.54
> 8192 byte blocks,   16 bytes per update, 512 updates   162779136  126984192   1.28
> 8192 byte blocks,  256 bytes per update,  32 updates  1753702400  201420800   8.71
> 8192 byte blocks, 1024 bytes per update,   8 updates  3760918528  207351808  18.14
> 8192 byte blocks, 4096 bytes per update,   2 updates  5483655168  208928768  26.25
> 8192 byte blocks, 8192 bytes per update,   1 updates  5623377920  209108992  26.89
>
> Signed-off-by: YueHaibing 
> Signed-off-by: YangShengkai 
> Signed-off-by: Ding Tianhong 
> Signed-off-by: Hanjun Guo 
>
> ---
>  arch/arm64/crypto/Kconfig                 |   5 +
>  arch/arm64/crypto/Makefile                |   4 +
>  arch/arm64/crypto/crct10dif-neon-asm_64.S | 751 ++
>  arch/arm64/crypto/crct10dif-neon_glue.c   | 115 +
>  4 files changed, 875 insertions(+)
>  create mode 100644 arch/arm64/crypto/crct10dif-neon-asm_64.S
>  create mode 100644 arch/arm64/crypto/crct10dif-neon_glue.c
>
> diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
> index 2cf32e9..2e450bf 100644
> --- a/arch/arm64/crypto/Kconfig
> +++ b/arch/arm64/crypto/Kconfig
> @@ -23,6 +23,11 @@ config CRYPTO_GHASH_ARM64_CE
> depends on ARM64 && KERNEL_MODE_NEON
> select CRYPTO_HASH
>
> +config CRYPTO_CRCT10DIF_NEON
> +   tristate "CRCT10DIF hardware acceleration using NEON instructions"
> +   depends on ARM64 && KERNEL_MODE_NEON
> +   select CRYPTO_HASH
> +

Could you please follow the existing pattern:

config CRYPTO_CRCT10DIF_ARM64_NEON


>  config CRYPTO_AES_ARM64_CE
> tristate "AES core cipher using ARMv8 Crypto Extensions"
> depends on ARM64 && KERNEL_MODE_NEON
> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
> index abb79b3..6c9ff2c 100644
> --- a/arch/arm64/crypto/Makefile
> +++ b/arch/arm64/crypto/Makefile
> @@ -29,6 +29,10 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o
>  obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
>  aes-neon-blk-y := aes-glue-neon.o aes-neon.o
>
> +obj-$(CONFIG_CRYPTO_CRCT10DIF_NEON) += crct10dif-neon.o
> +crct10dif-neon-y := crct10dif-neon-asm_64.o crct10dif-neon_glue.o
> +AFLAGS_crct10dif-neon-asm_64.o := -march=armv8-a+crypto
> +

Please drop this line, and add

.cpu generic+crypto

to the .S file

>  AFLAGS_aes-ce.o:= -DINTERLEAVE=4
>  AFLAGS_aes-neon.o  := -DINTERLEAVE=4
>
> diff --git a/arch/arm64/crypto/crct10dif-neon-asm_64.S b/arch/arm64/crypto/crct10dif-neon-asm_64.S
> new file mode 100644
> index 000..2ae3033
> --- /dev/null
> +++ 

[PATCH v3] arm64/crypto: Accelerated CRC T10 DIF computation

2016-11-22 Thread YueHaibing
This is the ARM64 CRC T10 DIF transform accelerated with ARMv8
NEON instructions. The config option CRYPTO_CRCT10DIF_NEON should be
turned on to enable the feature. The crc_t10dif crypto library function
will use this faster algorithm when the crct10dif_neon module is loaded.
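
For readers unfamiliar with the checksum being accelerated, here is a
minimal bitwise sketch. It assumes the commonly published CRC-16/T10-DIF
parameters (polynomial 0x8BB7, no bit reflection, no final XOR). The
name crc_t10dif_ref is hypothetical and exists only for illustration; it
is not the code in this patch and not the kernel's table-driven generic
implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Generator polynomial assumed for CRC-16/T10-DIF (MSB-first form). */
#define T10DIF_POLY 0x8BB7u

/*
 * Hypothetical reference routine: update a running CRC with len bytes
 * from buf, one bit at a time, most significant bit first.
 * Pass crc = 0 for the first call and feed the result back in for
 * subsequent chunks.
 */
uint16_t crc_t10dif_ref(uint16_t crc, const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		/* Bring the next byte into the top of the 16-bit register. */
		crc ^= (uint16_t)((uint16_t)buf[i] << 8);
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 0x8000u)
				crc = (uint16_t)((crc << 1) ^ T10DIF_POLY);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}
```

With a zero initial value, the standard check string "123456789"
yields 0xD0DB. Because the state is just the 16-bit remainder, the
buffer can be fed in several calls, which is what the "bytes per
update" column in the benchmark below exercises.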

Tcrypt benchmark results:

HIP06  (mode=320 sec=2)

The ratio of bytes/sec crct10dif-neon Vs. crct10dif-generic:

TEST                                                        neon    generic  ratio
  16 byte blocks,   16 bytes per update,   1 updates   214506112  171095400   1.25
  64 byte blocks,   16 bytes per update,   4 updates   139385312  119036352   1.17
  64 byte blocks,   64 bytes per update,   1 updates   671523712  198945344   3.38
 256 byte blocks,   16 bytes per update,  16 updates   157674880  125146752   1.26
 256 byte blocks,   64 bytes per update,   4 updates   491888128  175764096   2.80
 256 byte blocks,  256 bytes per update,   1 updates  2123298176  206995200  10.26
1024 byte blocks,   16 bytes per update,  64 updates   161243136  126460416   1.28
1024 byte blocks,  256 bytes per update,   4 updates  1643020800  200027136   8.21
1024 byte blocks, 1024 bytes per update,   1 updates  4238239232  209106432  20.27
2048 byte blocks,   16 bytes per update, 128 updates   162079744  126953472   1.28
2048 byte blocks,  256 bytes per update,   8 updates  1693587456  200867840   8.43
2048 byte blocks, 1024 bytes per update,   2 updates  3424323584  206330880  16.60
2048 byte blocks, 2048 bytes per update,   1 updates  5228207104  208620544  25.06
4096 byte blocks,   16 bytes per update, 256 updates   162304000  126894080   1.28
4096 byte blocks,  256 bytes per update,  16 updates  1731862528  201197568   8.61
4096 byte blocks, 1024 bytes per update,   4 updates  3668625408  207003648  17.72
4096 byte blocks, 4096 bytes per update,   1 updates  5551239168  209127424  26.54
8192 byte blocks,   16 bytes per update, 512 updates   162779136  126984192   1.28
8192 byte blocks,  256 bytes per update,  32 updates  1753702400  201420800   8.71
8192 byte blocks, 1024 bytes per update,   8 updates  3760918528  207351808  18.14
8192 byte blocks, 4096 bytes per update,   2 updates  5483655168  208928768  26.25
8192 byte blocks, 8192 bytes per update,   1 updates  5623377920  209108992  26.89
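
As a reading aid for the table, the ratio column appears to be the
neon bytes/sec figure divided by the generic one. A small sketch of
that arithmetic follows; the helper name speedup is hypothetical and
the caller is assumed to pass a non-zero generic figure.

```c
/*
 * Hypothetical helper: throughput ratio of the NEON run over the
 * generic run, both given in bytes per second.
 * Assumes generic_bps is non-zero.
 */
double speedup(unsigned long long neon_bps, unsigned long long generic_bps)
{
	return (double)neon_bps / (double)generic_bps;
}
```

For the first row, 214506112 / 171095400 is roughly 1.25, and for the
last row, 5623377920 / 209108992 is roughly 26.89, which agrees with
the reported values.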

Signed-off-by: YueHaibing 
Signed-off-by: YangShengkai 
Signed-off-by: Ding Tianhong 
Signed-off-by: Hanjun Guo 

---
 arch/arm64/crypto/Kconfig                 |   5 +
 arch/arm64/crypto/Makefile                |   4 +
 arch/arm64/crypto/crct10dif-neon-asm_64.S | 751 ++
 arch/arm64/crypto/crct10dif-neon_glue.c   | 115 +
 4 files changed, 875 insertions(+)
 create mode 100644 arch/arm64/crypto/crct10dif-neon-asm_64.S
 create mode 100644 arch/arm64/crypto/crct10dif-neon_glue.c

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 2cf32e9..2e450bf 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -23,6 +23,11 @@ config CRYPTO_GHASH_ARM64_CE
depends on ARM64 && KERNEL_MODE_NEON
select CRYPTO_HASH
 
+config CRYPTO_CRCT10DIF_NEON
+   tristate "CRCT10DIF hardware acceleration using NEON instructions"
+   depends on ARM64 && KERNEL_MODE_NEON
+   select CRYPTO_HASH
+
 config CRYPTO_AES_ARM64_CE
tristate "AES core cipher using ARMv8 Crypto Extensions"
depends on ARM64 && KERNEL_MODE_NEON
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index abb79b3..6c9ff2c 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -29,6 +29,10 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o
 obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
 aes-neon-blk-y := aes-glue-neon.o aes-neon.o
 
+obj-$(CONFIG_CRYPTO_CRCT10DIF_NEON) += crct10dif-neon.o
+crct10dif-neon-y := crct10dif-neon-asm_64.o crct10dif-neon_glue.o
+AFLAGS_crct10dif-neon-asm_64.o := -march=armv8-a+crypto
+
 AFLAGS_aes-ce.o:= -DINTERLEAVE=4
 AFLAGS_aes-neon.o  := -DINTERLEAVE=4
 
diff --git a/arch/arm64/crypto/crct10dif-neon-asm_64.S b/arch/arm64/crypto/crct10dif-neon-asm_64.S
new file mode 100644
index 000..2ae3033
--- /dev/null
+++ b/arch/arm64/crypto/crct10dif-neon-asm_64.S
@@ -0,0 +1,751 @@
+/*
+ * Copyright (c) 2016-2017 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+
+.global crc_t10dif_neon
+.text
+
+/* X0 is initial CRC value
+ * X1 is data buffer
+ * X2 is the length of buffer
+ * X3 is the backup buffer(for extend)
+