Re: [PATCH 1/1] SHA1 transform: x86_64 AVX2 optimization -v3
Sorry, there seems to be a problem with the patch. Let me retest from the list
again and repost.

thanks
- mouli

On Tue, 2014-03-18 at 17:12 -0700, chandramouli narayanan wrote:
> This git patch adds x86_64 AVX2 optimization of SHA1 transform
> to crypto support. The patch has been tested with 3.14.0-rc1
> kernel.
>
> On a Haswell desktop, with turbo disabled and all cpus running
> at maximum frequency, tcrypt shows AVX2 performance improvement
> from 3% for 256 bytes update to 16% for 1024 bytes update over
> AVX implementation.
>
> This patch adds sha1_transform_avx2(), the glue, build and
> configuration changes needed for AVX2 optimization of SHA1 transform to
> crypto support.
>
> Changes noted from the initial version of this patch are based on the
> feedback from the community:
> a) check for BMI2 in addition to AVX2 support since
>    __sha1_transform_avx2() uses rorx
> b) Since the module build has dependency on 64bit, it is
>    redundant to check it in the code here.
> c) coding style cleanup
> d) simplification of the assembly code where macros are repetitively used.
>
> With regard to cleaning up the sha1-ssse3 module configuration along lines
> similar to Camellia:
>
> On a cursory look at the Camellia implementation, there are separate modules
> for AVX/AVX2. However, sha1-ssse3 is one module which adds the necessary
> optimization support (SSSE3/AVX/AVX2) for the low-level SHA1 transform
> function. With better optimization support, transform function is overridden
> as the case may be.
> In the case of AVX2, due to performance reasons across datablock sizes,
> the AVX or AVX2 transform function is used at run-time as it suits best.
> The Makefile change therefore appends the necessary objects to the linkage.
> Due to this, the patch appends AVX2 transform to the build mix and leaves
> the configuration build support as is.
>
> Signed-off-by: Chandramouli Narayanan
>
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index 6ba54d6..61d6e28 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -79,6 +79,9 @@ aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
>  aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o
>  ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
>  sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
> +ifeq ($(avx2_supported),yes)
> +sha1-ssse3-y += sha1_avx2_x86_64_asm.o
> +endif
>  crc32c-intel-y := crc32c-intel_glue.o
>  crc32c-intel-$(CONFIG_64BIT) += crc32c-pcl-intel-asm_64.o
>  crc32-pclmul-y := crc32-pclmul_asm.o crc32-pclmul_glue.o
> diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
> index 4a11a9d..bdd6295 100644
> --- a/arch/x86/crypto/sha1_ssse3_glue.c
> +++ b/arch/x86/crypto/sha1_ssse3_glue.c
> @@ -10,6 +10,7 @@
>   * Copyright (c) Andrew McDonald
>   * Copyright (c) Jean-Francois Dive
>   * Copyright (c) Mathias Krause
> + * Copyright (c) Chandramouli Narayanan
>   *
>   * This program is free software; you can redistribute it and/or modify it
>   * under the terms of the GNU General Public License as published by the Free
> @@ -39,6 +40,12 @@ asmlinkage void sha1_transform_ssse3(u32 *digest, const char *data,
>  asmlinkage void sha1_transform_avx(u32 *digest, const char *data,
>  				   unsigned int rounds);
>  #endif
> +#ifdef CONFIG_AS_AVX2
> +#define SHA1_AVX2_BLOCK_OPTSIZE	4	/* optimal 4*64 bytes of SHA1 blocks */
> +
> +asmlinkage void sha1_transform_avx2(u32 *digest, const char *data,
> +				    unsigned int rounds);
> +#endif
>
>  static asmlinkage void (*sha1_transform_asm)(u32 *, const char *, unsigned int);
>
> @@ -165,6 +172,19 @@ static int sha1_ssse3_import(struct shash_desc *desc, const void *in)
>  	return 0;
>  }
>
> +#ifdef CONFIG_AS_AVX2
> +static void __sha1_transform_avx2(u32 *digest, const char *data,
> +				unsigned int rounds)
> +{
> +
> +	/* Select the optimal transform based on data block size */
> +	if (rounds >= SHA1_AVX2_BLOCK_OPTSIZE)
> +		sha1_transform_avx2(digest, data, rounds);
> +	else
> +		sha1_transform_avx(digest, data, rounds);
> +}
> +#endif
> +
>  static struct shash_alg alg = {
>  	.digestsize = SHA1_DIGEST_SIZE,
>  	.init = sha1_ssse3_init,
> @@ -189,7 +209,11 @@ static bool __init avx_usable(void)
>  {
>  	u64 xcr0;
>
> +#if defined(CONFIG_AS_AVX2)
> +	if (!cpu_has_avx || !cpu_has_avx2 || !cpu_has_osxsave)
> +#else
>  	if (!cpu_has_avx || !cpu_has_osxsave)
> +#endif
>  		return false;
>
>  	xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
> @@ -205,23 +229,35 @@ static bool __init avx_usable(void)
>
>  static int __init sha1_ssse3_mod_init(void)
>  {
> +	char *algo_name;
>  	/* test for SSSE3 first */
> -	if (cpu_has_ssse3)
> +	if (cpu_has_ssse3) {
>  		sha1_transform_as
[PATCH 1/1] SHA1 transform: x86_64 AVX2 optimization -v3
This git patch adds x86_64 AVX2 optimization of SHA1 transform to crypto
support. The patch has been tested with 3.14.0-rc1 kernel.

On a Haswell desktop, with turbo disabled and all cpus running at maximum
frequency, tcrypt shows AVX2 performance improvement from 3% for 256 bytes
update to 16% for 1024 bytes update over AVX implementation.

This patch adds sha1_transform_avx2(), the glue, build and configuration
changes needed for AVX2 optimization of SHA1 transform to crypto support.

Changes noted from the initial version of this patch are based on the
feedback from the community:
a) check for BMI2 in addition to AVX2 support since
   __sha1_transform_avx2() uses rorx
b) Since the module build has dependency on 64bit, it is
   redundant to check it in the code here.
c) coding style cleanup
d) simplification of the assembly code where macros are repetitively used.

With regard to cleaning up the sha1-ssse3 module configuration along lines
similar to Camellia:

On a cursory look at the Camellia implementation, there are separate modules
for AVX/AVX2. However, sha1-ssse3 is one module which adds the necessary
optimization support (SSSE3/AVX/AVX2) for the low-level SHA1 transform
function. With better optimization support, transform function is overridden
as the case may be.
In the case of AVX2, due to performance reasons across datablock sizes,
the AVX or AVX2 transform function is used at run-time as it suits best.
The Makefile change therefore appends the necessary objects to the linkage.
Due to this, the patch appends AVX2 transform to the build mix and leaves
the configuration build support as is.
Signed-off-by: Chandramouli Narayanan

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 6ba54d6..61d6e28 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -79,6 +79,9 @@ aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
 sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
+ifeq ($(avx2_supported),yes)
+sha1-ssse3-y += sha1_avx2_x86_64_asm.o
+endif
 crc32c-intel-y := crc32c-intel_glue.o
 crc32c-intel-$(CONFIG_64BIT) += crc32c-pcl-intel-asm_64.o
 crc32-pclmul-y := crc32-pclmul_asm.o crc32-pclmul_glue.o
diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index 4a11a9d..bdd6295 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -10,6 +10,7 @@
  * Copyright (c) Andrew McDonald
  * Copyright (c) Jean-Francois Dive
  * Copyright (c) Mathias Krause
+ * Copyright (c) Chandramouli Narayanan
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the Free
@@ -39,6 +40,12 @@ asmlinkage void sha1_transform_ssse3(u32 *digest, const char *data,
 asmlinkage void sha1_transform_avx(u32 *digest, const char *data,
 				   unsigned int rounds);
 #endif
+#ifdef CONFIG_AS_AVX2
+#define SHA1_AVX2_BLOCK_OPTSIZE	4	/* optimal 4*64 bytes of SHA1 blocks */
+
+asmlinkage void sha1_transform_avx2(u32 *digest, const char *data,
+				    unsigned int rounds);
+#endif
 
 static asmlinkage void (*sha1_transform_asm)(u32 *, const char *, unsigned int);
 
@@ -165,6 +172,19 @@ static int sha1_ssse3_import(struct shash_desc *desc, const void *in)
 	return 0;
 }
 
+#ifdef CONFIG_AS_AVX2
+static void __sha1_transform_avx2(u32 *digest, const char *data,
+				unsigned int rounds)
+{
+
+	/* Select the optimal transform based on data block size */
+	if (rounds >= SHA1_AVX2_BLOCK_OPTSIZE)
+		sha1_transform_avx2(digest, data, rounds);
+	else
+		sha1_transform_avx(digest, data, rounds);
+}
+#endif
+
 static struct shash_alg alg = {
 	.digestsize = SHA1_DIGEST_SIZE,
 	.init = sha1_ssse3_init,
@@ -189,7 +209,11 @@ static bool __init avx_usable(void)
 {
 	u64 xcr0;
 
+#if defined(CONFIG_AS_AVX2)
+	if (!cpu_has_avx || !cpu_has_avx2 || !cpu_has_osxsave)
+#else
 	if (!cpu_has_avx || !cpu_has_osxsave)
+#endif
 		return false;
 
 	xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
@@ -205,23 +229,35 @@ static bool __init avx_usable(void)
 
 static int __init sha1_ssse3_mod_init(void)
 {
+	char *algo_name;
 	/* test for SSSE3 first */
-	if (cpu_has_ssse3)
+	if (cpu_has_ssse3) {
 		sha1_transform_asm = sha1_transform_ssse3;
+		algo_name = "SSSE3";
+	}
 
 #ifdef CONFIG_AS_AVX
 	/* allow AVX to override SSSE3, it's a little faster */
-	if (avx_usable())
-		sha1_transform_asm = sha1_transform_avx;
+	if (avx_usable()) {
+		if (cpu_has_avx) {
+			sha1_transform_asm = sha1_transform_avx;
+
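The selection scheme described in the patch (SSSE3 first, AVX overriding it, and an AVX2 wrapper that falls back to AVX for short inputs) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: `struct cpu_features`, `select_transform()`, the `stub_*` functions and `last_impl` are hypothetical names, and the stubs merely record which implementation would have run. The threshold of 4 blocks matches SHA1_AVX2_BLOCK_OPTSIZE, and the BMI2 requirement follows item (a) of the changelog. Unlike the visible avx_usable() hunk, the sketch keeps plain AVX available on CPUs without AVX2; that is a choice made here, not something the patch confirms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker for which stub ran last; stands in for real hashing. */
enum impl { IMPL_NONE, IMPL_SSSE3, IMPL_AVX, IMPL_AVX2 };
static enum impl last_impl = IMPL_NONE;

typedef void (*sha1_transform_fn)(unsigned int *digest, const char *data,
				  unsigned int rounds);

/* Stubs replacing the assembly routines; they only record their identity. */
static void stub_ssse3(unsigned int *d, const char *p, unsigned int r)
{
	(void)d; (void)p; (void)r;
	last_impl = IMPL_SSSE3;
}

static void stub_avx(unsigned int *d, const char *p, unsigned int r)
{
	(void)d; (void)p; (void)r;
	last_impl = IMPL_AVX;
}

static void stub_avx2(unsigned int *d, const char *p, unsigned int r)
{
	(void)d; (void)p; (void)r;
	last_impl = IMPL_AVX2;
}

#define AVX2_BLOCK_OPTSIZE 4	/* 4 * 64 bytes, same threshold as the patch */

/* Same shape as __sha1_transform_avx2(): choose per call by block count. */
static void stub_avx2_or_avx(unsigned int *d, const char *p, unsigned int r)
{
	if (r >= AVX2_BLOCK_OPTSIZE)
		stub_avx2(d, p, r);
	else
		stub_avx(d, p, r);
}

/* Hypothetical feature flags; the kernel uses cpu_has_* and XCR0 instead. */
struct cpu_features {
	bool ssse3, avx, avx2, bmi2;
};

/* Later (faster) candidates override earlier ones; NULL means unsupported. */
static sha1_transform_fn select_transform(const struct cpu_features *f)
{
	sha1_transform_fn fn = NULL;

	assert(f != NULL);
	if (f->ssse3)
		fn = stub_ssse3;
	if (f->avx)
		fn = stub_avx;
	if (f->avx && f->avx2 && f->bmi2)	/* rorx needs BMI2 */
		fn = stub_avx2_or_avx;
	return fn;
}
```

Selecting once at init time and deciding AVX versus AVX2 per call keeps the hot path to a single compare on the block count, which is the design the commit message argues for.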
Re: [RFC PATCH 03/22] staging: crypto: skein: allow building statically
On Tue, Mar 18, 2014 at 08:58:49AM -0400, Jason Cooper wrote:
> On Mon, Mar 17, 2014 at 02:52:52PM -0700, Greg KH wrote:
> > On Tue, Mar 11, 2014 at 09:32:35PM +, Jason Cooper wrote:
> > > These are the minimum changes required to get the code to build
> > > statically in the kernel. It's necessary to do this first so that we
> > > can empirically determine that future cleanup patches aren't changing
> > > the generated object code.
> > >
> > > Signed-off-by: Jason Cooper
> >
> > This doesn't apply to my latest tree :(
>
> Ah, ok. I'll rebase this series on the staging tree.
>
> > > --- a/drivers/staging/Makefile
> > > +++ b/drivers/staging/Makefile
> > > @@ -65,3 +65,4 @@ obj-$(CONFIG_XILLYBUS) += xillybus/
> > >  obj-$(CONFIG_DGNC) += dgnc/
> > >  obj-$(CONFIG_DGAP) += dgap/
> > >  obj-$(CONFIG_MTD_SPINAND_MT29F) += mt29f_spinand/
> > > +obj-$(CONFIG_CRYPTO_SKEIN) += skein/
> >
> > Care to align these up with the way this file is formatted?
>
> Of course, not sure what happened there (well, other than the obvious
> :-P)
>
> > And I have no objection to taking the drivers/staging/ patches, the
> > script looks useful, but I can't take it through the staging tree,
> > sorry.
>
> Ok, I'll pull that out as a separate branch. Do you mind taking a
> series that depends on a topic branch from another tree? We do it a lot
> in arm-soc, but I'm not sure how popular that is elsewhere.

It's not a dependency at all, and I don't take git pull requests for the
staging tree, just email patches, sorry.

So just resend these patches, thanks.

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 03/22] staging: crypto: skein: allow building statically
On Mon, Mar 17, 2014 at 02:52:52PM -0700, Greg KH wrote:
> On Tue, Mar 11, 2014 at 09:32:35PM +, Jason Cooper wrote:
> > These are the minimum changes required to get the code to build
> > statically in the kernel. It's necessary to do this first so that we
> > can empirically determine that future cleanup patches aren't changing
> > the generated object code.
> >
> > Signed-off-by: Jason Cooper
>
> This doesn't apply to my latest tree :(

Ah, ok. I'll rebase this series on the staging tree.

> > --- a/drivers/staging/Makefile
> > +++ b/drivers/staging/Makefile
> > @@ -65,3 +65,4 @@ obj-$(CONFIG_XILLYBUS)+= xillybus/
> >  obj-$(CONFIG_DGNC) += dgnc/
> >  obj-$(CONFIG_DGAP) += dgap/
> >  obj-$(CONFIG_MTD_SPINAND_MT29F)+= mt29f_spinand/
> > +obj-$(CONFIG_CRYPTO_SKEIN) += skein/
>
> Care to align these up with the way this file is formatted?

Of course, not sure what happened there (well, other than the obvious
:-P)

> And I have no objection to taking the drivers/staging/ patches, the
> script looks useful, but I can't take it through the staging tree,
> sorry.

Ok, I'll pull that out as a separate branch. Do you mind taking a
series that depends on a topic branch from another tree? We do it a lot
in arm-soc, but I'm not sure how popular that is elsewhere. It's purely
an audit/testing dependency, but it would be nice to have it available
in the history if someone wants to audit the changes.

I have one change I'd like to do to the objdiff script. I'd like it to
assume 'HEAD^ HEAD' when the user executes './scripts/objdiff diff'.
I'll respin both and submit a v1.

Thanks for the review.

thx,

Jason.
Re: [PATCH] arm64/lib: add optimized implementation of sha_transform
On Tuesday, March 18, 2014 at 08:26:00 AM, Ard Biesheuvel wrote:
> On 17 March 2014 22:18, Marek Vasut wrote:
> > On Friday, March 14, 2014 at 04:02:33 PM, Ard Biesheuvel wrote:
> >> This implementation keeps the 64 bytes of workspace in registers rather
> >> than on the stack, eliminating most of the loads and stores, and
> >> reducing the instruction count by about 25%.
> >>
> >> Signed-off-by: Ard Biesheuvel
> >> ---
> >> Hello all,
> >>
> >> No performance numbers I am allowed to share, unfortunately, so if
> >> anyone else (with access to actual, representative hardware) would care
> >> to have a go, I would be very grateful.
> >>
> >> This can be done by building the tcrypt.ko module
> >> (CONFIG_CRYPTO_TEST=m), and inserting the module using 'mode=303' as a
> >> parameter (note that the insmod always fails, but produces its test
> >> output to the kernel log). Also note that the sha_transform() function
> >> will be part of the kernel proper, so just rebuilding the sha1_generic
> >> module is not sufficient.
> >>
> >> Cheers,
> >
> > Won't the function sha_transform() collide with the one in lib/sha1.c?
> > Or will the one in lib/sha1.c be overridden somehow?
>
> No, this works pretty well, in fact: arch/*/lib has precedence over
> lib/, and objects (declared with lib-y +=) are only included to
> satisfy unresolved dependencies. So the second (generic) sha1.o will
> not get linked.

Thanks for clearing this up!

> > Otherwise:
> >
> > Reviewed-by: Marek Vasut
>
> Thanks. I did send a v2 which is actually a lot different from the
> version you reviewed, so I won't carry over your reviewed-by without
> your acknowledgement.

Thanks!

Best regards,
Marek Vasut
Re: [PATCH] arm64/lib: add optimized implementation of sha_transform
On 17 March 2014 22:18, Marek Vasut wrote:
> On Friday, March 14, 2014 at 04:02:33 PM, Ard Biesheuvel wrote:
>> This implementation keeps the 64 bytes of workspace in registers rather
>> than on the stack, eliminating most of the loads and stores, and reducing
>> the instruction count by about 25%.
>>
>> Signed-off-by: Ard Biesheuvel
>> ---
>> Hello all,
>>
>> No performance numbers I am allowed to share, unfortunately, so if anyone
>> else (with access to actual, representative hardware) would care to have a
>> go, I would be very grateful.
>>
>> This can be done by building the tcrypt.ko module (CONFIG_CRYPTO_TEST=m),
>> and inserting the module using 'mode=303' as a parameter (note that the
>> insmod always fails, but produces its test output to the kernel log). Also
>> note that the sha_transform() function will be part of the kernel proper,
>> so just rebuilding the sha1_generic module is not sufficient.
>>
>> Cheers,
>
> Won't the function sha_transform() collide with the one in lib/sha1.c? Or
> will the one in lib/sha1.c be overridden somehow?
>

No, this works pretty well, in fact: arch/*/lib has precedence over
lib/, and objects (declared with lib-y +=) are only included to
satisfy unresolved dependencies. So the second (generic) sha1.o will
not get linked.

> Otherwise:
>
> Reviewed-by: Marek Vasut
>

Thanks. I did send a v2 which is actually a lot different from the
version you reviewed, so I won't carry over your reviewed-by without
your acknowledgement.

Cheers,
Ard.
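The "64 bytes of workspace" mentioned in the quoted patch description corresponds to sixteen 32-bit message-schedule words. SHA-1 only ever looks back 16 words, so the full 80-word schedule can be replaced by a 16-entry ring, which is what makes holding it in registers feasible. The following is a portable C sketch of that idea, not Ard's arm64 assembly; `sha1_block()` and `sha1_short()` are names made up for this example, and whether a compiler actually keeps `w[]` in registers is outside the sketch's control.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rol32(uint32_t x, unsigned int n)
{
	return (x << n) | (x >> (32 - n));
}

/* Process one 64-byte block using a 16-word (64-byte) circular workspace. */
static void sha1_block(uint32_t state[5], const uint8_t block[64])
{
	uint32_t w[16];
	uint32_t a = state[0], b = state[1], c = state[2];
	uint32_t d = state[3], e = state[4];

	for (int t = 0; t < 80; t++) {
		uint32_t f, k, wt, tmp;

		if (t < 16) {
			/* Load the message words big-endian. */
			wt = (uint32_t)block[4 * t] << 24 |
			     (uint32_t)block[4 * t + 1] << 16 |
			     (uint32_t)block[4 * t + 2] << 8 |
			     (uint32_t)block[4 * t + 3];
		} else {
			/* W[t-16] lives in the slot about to be overwritten. */
			wt = rol32(w[(t - 3) & 15] ^ w[(t - 8) & 15] ^
				   w[(t - 14) & 15] ^ w[t & 15], 1);
		}
		w[t & 15] = wt;

		if (t < 20) {
			f = (b & c) | (~b & d);
			k = 0x5A827999u;
		} else if (t < 40) {
			f = b ^ c ^ d;
			k = 0x6ED9EBA1u;
		} else if (t < 60) {
			f = (b & c) | (b & d) | (c & d);
			k = 0x8F1BBCDCu;
		} else {
			f = b ^ c ^ d;
			k = 0xCA62C1D6u;
		}

		tmp = rol32(a, 5) + f + e + k + wt;
		e = d;
		d = c;
		c = rol32(b, 30);
		b = a;
		a = tmp;
	}

	state[0] += a;
	state[1] += b;
	state[2] += c;
	state[3] += d;
	state[4] += e;
}

/* Demo helper: hash a message short enough to fit one padded block. */
static void sha1_short(const uint8_t *msg, size_t len, uint32_t out[5])
{
	uint8_t block[64] = {0};
	uint64_t bits = (uint64_t)len * 8;

	assert(len <= 55);	/* room for the 0x80 byte and 8 length bytes */
	if (len)
		memcpy(block, msg, len);
	block[len] = 0x80;
	for (int i = 0; i < 8; i++)
		block[63 - i] = (uint8_t)(bits >> (8 * i));

	out[0] = 0x67452301u;
	out[1] = 0xEFCDAB89u;
	out[2] = 0x98BADCFEu;
	out[3] = 0x10325476u;
	out[4] = 0xC3D2E1F0u;
	sha1_block(out, block);
}
```

Hashing the three bytes "abc" with `sha1_short()` yields the standard test vector a9993e36 4706816a ba3e2571 7850c26c 9cd0d89d, which is a quick way to check that the ring indexing is right.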