Re: [PATCH 3/4] crypto: [sha] glue code for Intel SHA extensions optimized SHA1 & SHA256
On Fri, Sep 11, 2015 at 01:10:27PM -0700, Tim Chen wrote: > > Mmmm..., this is a restructuring of the algorithms within > the glue code into multiple drivers instead of one and exposing > them all. It is a bit orthogonal to the intention of this > patch set. I think it is better that I create a > separate patch on the glue code on top of this patch set > to implement this. > > Herbert, do you agree with this approach? Yes I think we can work on the individual crypto registration separately from this patch-set. Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] crypto: [sha] glue code for Intel SHA extensions optimized SHA1 & SHA256
On Fri, 2015-09-11 at 12:15 -0700, David Miller wrote: > From: Tim Chen > Date: Fri, 11 Sep 2015 11:49:32 -0700 > > > Still, why would some kernel module specifically not want to > > use the fastest HW implementation, and explicitly ask for > > a slower driver? > > Temporary workaround if a bug is found. > > There is really no reason to prevent the user from having this > flexibility, and in return anyone can test any implementation > their cpu can support. Mmmm..., this is a restructuring of the algorithms within the glue code into multiple drivers instead of one and exposing them all. It is a bit orthogonal to the intention of this patch set. I think it is better that I create a separate patch on the glue code on top of this patch set to implement this. Herbert, do you agree with this approach? Tim -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] crypto: [sha] glue code for Intel SHA extensions optimized SHA1 & SHA256
Am Freitag, 11. September 2015, 11:49:32 schrieb Tim Chen: Hi Tim, >On Fri, 2015-09-11 at 19:02 +0200, Stephan Mueller wrote: >> Am Donnerstag, 10. September 2015, 17:04:31 schrieb Tim Chen: >> >> Hi Tim, >> >> >Is there a scenario you can think of >> >when a lower performing sha1 transform needs to >> >be exposed as a separate driver? >> >> My immediate concern is testing: it is hard to test the individual >> implementations. > >Not hard, just one line in the glue code to set the transform >to the one you need it you really want to test individual >implementation. Usually user of sha don't care which sha driver >they got, but just the highest priority one. >So you will anyway need to patch and change the priority of the sha >driver to expose a specific one for testing. Sure, it is not hard when you recompile. But when you have to test one given kernel binary, it is a challenge. > >> >Otherwise the glue code logic will only expose the >> >best performing one for a cpu and hide the others, which was intentional >> >on our part to prevent a lower performing sha from getting used. >> >> Agreed, but the kernel crypto API does that already using the priorities -- >> IMHO a very clean and easy to interpret solution. >> >> Furthermore, if somebody really has a need to not use the fastest HW >> implementation, the kernel crypto API allows him to do that. With the hard- >> wired approach in the glue file, you are stuck. > >Still, why would some kernel module specifically not want to >use the fastest HW implementation, and explicitly ask for >a slower driver? I have seen one instance where a hardware driver was broken on one particular hardware. So, the only way was to disable it. In our case here, disabling means to go back to the software implementation of SHA. Ciao Stephan -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] crypto: [sha] glue code for Intel SHA extensions optimized SHA1 & SHA256
From: Tim Chen Date: Fri, 11 Sep 2015 11:49:32 -0700 > Still, why would some kernel module specifically not want to > use the fastest HW implementation, and explicitly ask for > a slower driver? Temporary workaround if a bug is found. There is really no reason to prevent the user from having this flexibility, and in return anyone can test any implementation their cpu can support. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] crypto: [sha] glue code for Intel SHA extensions optimized SHA1 & SHA256
On Fri, 2015-09-11 at 19:02 +0200, Stephan Mueller wrote: > Am Donnerstag, 10. September 2015, 17:04:31 schrieb Tim Chen: > > Hi Tim, > > > >Is there a scenario you can think of > >when a lower performing sha1 transform needs to > >be exposed as a separate driver? > > My immediate concern is testing: it is hard to test the individual > implementations. > > Not hard, just one line in the glue code to set the transform to the one you need it you really want to test individual implementation. Usually user of sha don't care which sha driver they got, but just the highest priority one. So you will anyway need to patch and change the priority of the sha driver to expose a specific one for testing. > >Otherwise the glue code logic will only expose the > >best performing one for a cpu and hide the others, which was intentional > >on our part to prevent a lower performing sha from getting used. > > Agreed, but the kernel crypto API does that already using the priorities -- > IMHO a very clean and easy to interpret solution. > > Furthermore, if somebody really has a need to not use the fastest HW > implementation, the kernel crypto API allows him to do that. With the hard- > wired approach in the glue file, you are stuck. Still, why would some kernel module specifically not want to use the fastest HW implementation, and explicitly ask for a slower driver? Tim -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] crypto: [sha] glue code for Intel SHA extensions optimized SHA1 & SHA256
Am Donnerstag, 10. September 2015, 17:04:31 schrieb Tim Chen: Hi Tim, > >Is there a scenario you can think of >when a lower performing sha1 transform needs to >be exposed as a separate driver? My immediate concern is testing: it is hard to test the individual implementations. > >Otherwise the glue code logic will only expose the >best performing one for a cpu and hide the others, which was intentional >on our part to prevent a lower performing sha from getting used. Agreed, but the kernel crypto API does that already using the priorities -- IMHO a very clean and easy to interpret solution. Furthermore, if somebody really has a need to not use the fastest HW implementation, the kernel crypto API allows him to do that. With the hard- wired approach in the glue file, you are stuck. Ciao Stephan -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] crypto: [sha] glue code for Intel SHA extensions optimized SHA1 & SHA256
On Fri, 2015-09-11 at 00:52 +0200, Stephan Mueller wrote: > Am Donnerstag, 10. September 2015, 15:27:20 schrieb Tim Chen: > > Hi Tim, > > >This patch adds the glue code to detect and utilize the Intel SHA > >extensions optimized SHA1 and SHA256 update transforms when available. > > > >This code has been tested on Broxton for functionality. > > A general comment on this file: shouldn't this file be cleaned and use the > standard mechanisms of the kernel crypto API? > > This glue implements its own selection of which SHA implementation to use. > But > the kernel crypto API implements that logic already. The issue with the > current implementation in this file is that you have no clue which particular > implementation of SHA is in use in one particular case. > > So, may I suggest a restructuring to define independent instances of SHA, > such > as > > - cra_name == "sha1", cra_driver_name="sha1_ssse3", cra_priority=300 > - cra_name == "sha1", cra_driver_name="sha1_avx", cra_priority=400 > - cra_name == "sha1", cra_driver_name="sha1_avx2", cra_priority=500 > - cra_name == "sha1", cra_driver_name="sha1_shavx", cra_priority=600 > > Similarly for the other SHAs? > > In all the register functions for the ciphers, you can bail out if the > hardware does not support an implementation. Stephen, Is there a scenario you can think of when a lower performing sha1 transform needs to be exposed as a separate driver? Otherwise the glue code logic will only expose the best performing one for a cpu and hide the others, which was intentional on our part to prevent a lower performing sha from getting used. Tim -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] crypto: [sha] glue code for Intel SHA extensions optimized SHA1 & SHA256
Am Donnerstag, 10. September 2015, 15:27:20 schrieb Tim Chen: Hi Tim, >This patch adds the glue code to detect and utilize the Intel SHA >extensions optimized SHA1 and SHA256 update transforms when available. > >This code has been tested on Broxton for functionality. A general comment on this file: shouldn't this file be cleaned and use the standard mechanisms of the kernel crypto API? This glue implements its own selection of which SHA implementation to use. But the kernel crypto API implements that logic already. The issue with the current implementation in this file is that you have no clue which particular implementation of SHA is in use in one particular case. So, may I suggest a restructuring to define independent instances of SHA, such as - cra_name == "sha1", cra_driver_name="sha1_ssse3", cra_priority=300 - cra_name == "sha1", cra_driver_name="sha1_avx", cra_priority=400 - cra_name == "sha1", cra_driver_name="sha1_avx2", cra_priority=500 - cra_name == "sha1", cra_driver_name="sha1_shavx", cra_priority=600 Similarly for the other SHAs? In all the register functions for the ciphers, you can bail out if the hardware does not support an implementation. Ciao Stephan -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] crypto: [sha] glue code for Intel SHA extensions optimized SHA1 & SHA256
This patch adds the glue code to detect and utilize the Intel SHA extensions optimized SHA1 and SHA256 update transforms when available. This code has been tested on Broxton for functionality. Originally-by: Chandramouli Narayanan Signed-off-by: Tim Chen --- arch/x86/crypto/sha1_ssse3_glue.c | 12 +++- arch/x86/crypto/sha256_ssse3_glue.c | 38 ++--- 2 files changed, 34 insertions(+), 16 deletions(-) diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c index 7c48e8b..98be8cc 100644 --- a/arch/x86/crypto/sha1_ssse3_glue.c +++ b/arch/x86/crypto/sha1_ssse3_glue.c @@ -44,6 +44,10 @@ asmlinkage void sha1_transform_avx(u32 *digest, const char *data, asmlinkage void sha1_transform_avx2(u32 *digest, const char *data, unsigned int rounds); #endif +#ifdef CONFIG_AS_SHA1_NI +asmlinkage void sha1_ni_transform(u32 *digest, const char *data, + unsigned int rounds); +#endif static void (*sha1_transform_asm)(u32 *, const char *, unsigned int); @@ -166,12 +170,18 @@ static int __init sha1_ssse3_mod_init(void) #endif } #endif +#ifdef CONFIG_AS_SHA1_NI + if (boot_cpu_has(X86_FEATURE_SHA_NI)) { + sha1_transform_asm = sha1_ni_transform; + algo_name = "SHA-NI"; + } +#endif if (sha1_transform_asm) { pr_info("Using %s optimized SHA-1 implementation\n", algo_name); return crypto_register_shash(&alg); } - pr_info("Neither AVX nor AVX2 nor SSSE3 is available/usable.\n"); + pr_info("Neither AVX nor AVX2 nor SSSE3/SHA-NI is available/usable.\n"); return -ENODEV; } diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c index f8097fc..9c7b22c 100644 --- a/arch/x86/crypto/sha256_ssse3_glue.c +++ b/arch/x86/crypto/sha256_ssse3_glue.c @@ -50,6 +50,10 @@ asmlinkage void sha256_transform_avx(u32 *digest, const char *data, asmlinkage void sha256_transform_rorx(u32 *digest, const char *data, u64 rounds); #endif +#ifdef CONFIG_AS_SHA256_NI +asmlinkage void sha256_ni_transform(u32 *digest, const char *data, + u64 rounds); /*unsigned int rounds);*/ +#endif static void (*sha256_transform_asm)(u32 *, const char *, u64); @@ -142,36 +146,40 @@ static bool __init avx_usable(void) static int __init sha256_ssse3_mod_init(void) { + char *algo; + /* test for SSSE3 first */ - if (cpu_has_ssse3) + if (cpu_has_ssse3) { sha256_transform_asm = sha256_transform_ssse3; + algo = "SSSE3"; + } #ifdef CONFIG_AS_AVX /* allow AVX to override SSSE3, it's a little faster */ if (avx_usable()) { + sha256_transform_asm = sha256_transform_avx; + algo = "AVX"; #ifdef CONFIG_AS_AVX2 - if (boot_cpu_has(X86_FEATURE_AVX2) && boot_cpu_has(X86_FEATURE_BMI2)) + if (boot_cpu_has(X86_FEATURE_AVX2) && + boot_cpu_has(X86_FEATURE_BMI2)) { sha256_transform_asm = sha256_transform_rorx; - else + algo = "AVX2"; + } +#endif + } #endif - sha256_transform_asm = sha256_transform_avx; +#ifdef CONFIG_AS_SHA256_NI + if (boot_cpu_has(X86_FEATURE_SHA_NI)) { + sha256_transform_asm = sha256_ni_transform; + algo = "SHA-256-NI"; } #endif if (sha256_transform_asm) { -#ifdef CONFIG_AS_AVX - if (sha256_transform_asm == sha256_transform_avx) - pr_info("Using AVX optimized SHA-256 implementation\n"); -#ifdef CONFIG_AS_AVX2 - else if (sha256_transform_asm == sha256_transform_rorx) - pr_info("Using AVX2 optimized SHA-256 implementation\n"); -#endif - else -#endif - pr_info("Using SSSE3 optimized SHA-256 implementation\n"); + pr_info("Using %s optimized SHA-256 implementation\n", algo); return crypto_register_shashes(algs, ARRAY_SIZE(algs)); } - pr_info("Neither AVX nor SSSE3 is available/usable.\n"); + pr_info("Neither AVX nor SSSE3/SHA-NI is available/usable.\n"); return -ENODEV; } -- 2.4.2 -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html