Re: [PATCH 0/7] Phase one of sparc crypto opcode support.
You mentioned Montgomery BN. Here is how the instructions work. The basic model is that there is a range of sizes supported by the instruction, and all of the data is loaded into a combination of the floating point registers and all of the register windows of the cpu.

Ouch!

... save ... restore ...

Of course, you might quickly ask what happens in 32-bit mode?

No, before thinking about 32-bit mode, I quickly ask: what's with the save-s without arguments? And what happens if a context switch strikes in the middle? A save without arguments means that %sp is effectively uninitialized, and any attempt to refer to the stack [during a context switch or asynchronous signal delivery] is either doomed or corrupts the stack. So the save-s ought to allocate frames.

But even then, [and in 64-bit mode], do the instructions in question ensure that the register windows are loaded prior to execution? I mean, consider a context switch between a save and, say, montmul. The kernel dumps all windows to the stack, and when execution resumes it normally brings in only the top window and lets window traps bring in the remaining ones on demand. So before the instructions in question can start actual processing, all windows have to be loaded. Presumably the instructions can trigger a window trap; the kernel would then have to see that it was one of these instructions that triggered it and act accordingly, i.e. bring in all the windows. Does it work that way? Or do I get it backwards? I assume that the instructions in question are uninterruptible, so that a trap can be generated only prior to the calculation...

__ OpenSSL Project http://www.openssl.org Development Mailing List openssl-dev@openssl.org Automated List Manager majord...@openssl.org
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
Provide these so that the assembler users can be oblivious about whether this is PIC or non-PIC, 64-bit or 32-bit, etc. It is important to use a real call and return to implement the obtaining of the %pc as part of the PIC sequence. Sequences such as:

        call    .+8
         mov    %o7, %PIC_REG

are to be avoided at all costs on UltraSPARC cpus. This is because such a sequence flushes the Return Address Stack (RAS) because the call is not paired with a return. Every time a call or jmpl with RD=%o7 is performed, the chip pushes the PC+8 onto the top of the RAS. The next jmpl %o7 + 8 or return %i7 + 8 the chip sees will cause it to pop the top entry off the RAS and begin fetching down that path. If there is a mis-match the entire pipeline is flushed and the chip restarts fetching down the correct path. Therefore, the above discouraged sequence will cause all of the RAS entries to mismatch and there will therefore be a full pipeline flush on every subsequent function return.

Well, last time I looked into this I could establish the following. call .+8 was actually used by the vendor compiler [maybe not anymore, I don't know, but in the Sun days it was used extensively]. The SPARC V manual is explicit about call .+8 *not* affecting the RAS. Purify was also discussed in this context, and it actually recognizes the construct and treats it specially. In other words, it was considered a widely adopted practice and it was found to be backed by at least one hardware design. Penalties were measured to be minimal on UltraSPARC: two additional cycles (in comparison to 20 cycles for save and restore alone). But of course the situation might be different today, and T-SPARCs can suffer from it more...

I'll handle this, but differently. Specifically, I won't go through the GOT, but directly to the variable, something like this:

.Lretl: retl
        nop
        ...
        sethi   %hi(var-.Lpic),%reg
.Lpic:  call    .Lretl
         add    %o7,%lo(var-.Lpic),%reg

This works with both Solaris and Linux toolchains and in both 32- and 64-bit mode (it was hell to get des_enc to work everywhere). In 64-bit mode it implies that the shared library itself is limited to 2GB, but that's considered a reasonable limitation. Avoiding the GOT allows hiding OPENSSL_sparcv9cap_P with __attribute__((visibility("hidden"))). Now it's static. Once again, don't think about it any more, it will be taken care of.

As for OPENSSL_sparcv9cap_P itself, I'd rather extend it to int OPENSSL_sparcv9cap_P[2] and save %cfr as is to OPENSSL_sparcv9cap_P[1]. Any objections?
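The %hi/%lo arithmetic behind the GOT-free sequence above can be illustrated in C. This is only a minimal sketch under stated assumptions, not code from any patch in this thread: the names hi22, lo10, pic_address and offset_fits are hypothetical, the label address stands in for what call leaves in %o7, and the sign extension is done with a plain C cast (how the real instruction sequence achieves that in 64-bit mode is not shown in the email).

```c
#include <stdint.h>

/* Model of %hi(): bits 31..10 of a 32-bit value, as sethi places them. */
static uint32_t hi22(uint32_t x)
{
    return x & ~UINT32_C(0x3ff);
}

/* Model of %lo(): the low 10 bits of a 32-bit value. */
static uint32_t lo10(uint32_t x)
{
    return x & UINT32_C(0x3ff);
}

/*
 * Hypothetical helper: rebuild the address of var from the address of
 * the .Lpic label and the link-time constant var-.Lpic.
 * The offset is split into %hi and %lo parts and then recombined.
 */
static uint64_t pic_address(uint64_t pic_label, int32_t offset)
{
    uint32_t bits = (uint32_t)offset;
    uint32_t rebuilt = hi22(bits) | lo10(bits);

    /* Sign-extend the 32-bit offset before adding (assumption, see lead-in). */
    return pic_label + (uint64_t)(int64_t)(int32_t)rebuilt;
}

/* The offset must fit in a signed 32-bit value, hence the 2GB reach. */
static int offset_fits(int64_t distance)
{
    return distance >= INT32_MIN && distance <= INT32_MAX;
}
```

Because var-.Lpic is a link-time constant within one object, no GOT slot is involved, which is what makes it possible to keep the variable hidden.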
Re: [openssl.org #2836] [PATCH] Staple the correct OCSP Response when multiple certs are configured
Attached are patches for 1.0.0 and 0.9.8.

--
Rob Stradling
Senior Research & Development Scientist
COMODO - Creating Trust Online
Office Tel: +44.(0)1274.730505
Office Fax: +44.(0)1274.730909
www.comodo.com

COMODO CA Limited, Registered in England No. 04058690
Registered Office: 3rd Floor, 26 Office Village, Exchange Quay, Trafford Road, Salford, Manchester M5 3EQ

Index: ssl/s3_srvr.c
===
RCS file: /v/openssl/cvs/openssl/ssl/s3_srvr.c,v
retrieving revision 1.126.2.42
diff -u -r1.126.2.42 s3_srvr.c
--- ssl/s3_srvr.c 16 Feb 2012 15:21:17 - 1.126.2.42
+++ ssl/s3_srvr.c 20 Sep 2012 14:40:41 -
@@ -1005,7 +1005,7 @@
         goto f_err;
         }
     }
-    if (ssl_check_clienthello_tlsext(s) <= 0) {
+    if (ssl_check_clienthello_tlsext_early(s) <= 0) {
         SSLerr(SSL_F_SSL3_GET_CLIENT_HELLO,SSL_R_CLIENTHELLO_TLSEXT);
         goto err;
     }
@@ -1131,6 +1131,16 @@
      * s->tmp.new_cipher - the new cipher to use.
      */
 
+    /* Handles TLS extensions that we couldn't check earlier */
+    if (s->version >= SSL3_VERSION)
+        {
+        if (ssl_check_clienthello_tlsext_late(s) <= 0)
+            {
+            SSLerr(SSL_F_SSL3_GET_CLIENT_HELLO,SSL_R_CLIENTHELLO_TLSEXT);
+            goto err;
+            }
+        }
+
     if (ret < 0) ret=1;
     if (0)
         {
Index: ssl/ssl_lib.c
===
RCS file: /v/openssl/cvs/openssl/ssl/ssl_lib.c,v
retrieving revision 1.133.2.31
diff -u -r1.133.2.31 ssl_lib.c
--- ssl/ssl_lib.c 5 Jan 2012 10:21:49 - 1.133.2.31
+++ ssl/ssl_lib.c 20 Sep 2012 14:40:41 -
@@ -1943,7 +1943,7 @@
     }
 
 /* THIS NEEDS CLEANING UP */
-X509 *ssl_get_server_send_cert(SSL *s)
+X509 *ssl_get_server_send_cert(const SSL *s)
     {
     unsigned long alg,kalg;
     CERT *c;
@@ -2420,7 +2420,9 @@
 /* Fix this function so that it takes an optional type parameter */
 X509 *SSL_get_certificate(const SSL *s)
     {
-    if (s->cert != NULL)
+    if (s->server)
+        return(ssl_get_server_send_cert(s));
+    else if (s->cert != NULL)
         return(s->cert->key->x509);
     else
         return(NULL);
Index: ssl/ssl_locl.h
===
RCS file: /v/openssl/cvs/openssl/ssl/ssl_locl.h,v
retrieving revision 1.63.2.22
diff -u -r1.63.2.22 ssl_locl.h
--- ssl/ssl_locl.h 9 Mar 2012 15:51:56 - 1.63.2.22
+++ ssl/ssl_locl.h 20 Sep 2012 14:40:41 -
@@ -740,7 +740,7 @@
 int ssl_undefined_function(SSL *s);
 int ssl_undefined_void_function(void);
 int ssl_undefined_const_function(const SSL *s);
-X509 *ssl_get_server_send_cert(SSL *);
+X509 *ssl_get_server_send_cert(const SSL *);
 EVP_PKEY *ssl_get_sign_pkey(SSL *,SSL_CIPHER *);
 int ssl_cert_type(X509 *x,EVP_PKEY *pkey);
 void ssl_set_cert_masks(CERT *c, SSL_CIPHER *cipher);
@@ -979,7 +979,8 @@
 int ssl_parse_serverhello_tlsext(SSL *s, unsigned char **data, unsigned char *d, int n, int *al);
 int ssl_prepare_clienthello_tlsext(SSL *s);
 int ssl_prepare_serverhello_tlsext(SSL *s);
-int ssl_check_clienthello_tlsext(SSL *s);
+int ssl_check_clienthello_tlsext_early(SSL *s);
+int ssl_check_clienthello_tlsext_late(SSL *s);
 int ssl_check_serverhello_tlsext(SSL *s);
 
 #ifdef OPENSSL_NO_SHA256
Index: ssl/t1_lib.c
===
RCS file: /v/openssl/cvs/openssl/ssl/t1_lib.c,v
retrieving revision 1.13.2.30
diff -u -r1.13.2.30 t1_lib.c
--- ssl/t1_lib.c 4 Jan 2012 14:25:10 - 1.13.2.30
+++ ssl/t1_lib.c 20 Sep 2012 14:40:41 -
@@ -745,7 +745,7 @@
     return 1;
 }
 
-int ssl_check_clienthello_tlsext(SSL *s)
+int ssl_check_clienthello_tlsext_early(SSL *s)
     {
     int ret=SSL_TLSEXT_ERR_NOACK;
     int al = SSL_AD_UNRECOGNIZED_NAME;
@@ -755,11 +755,34 @@
     else if (s->initial_ctx != NULL && s->initial_ctx->tlsext_servername_callback != 0)
         ret = s->initial_ctx->tlsext_servername_callback(s, &al, s->initial_ctx->tlsext_servername_arg);
 
+    switch (ret)
+        {
+        case
[openssl.org #2836] [PATCH] Staple the correct OCSP Response when multiple certs are configured
[rob.stradl...@comodo.com - Fri Sep 21 15:02:54 2012]:

Attached are patches for 1.0.0 and 0.9.8.

Note, I updated the original change to retain compatibility with existing behaviour as far as possible. See: http://cvs.openssl.org/chngview?cn=22808

Steve.
--
Dr Stephen N. Henson. OpenSSL project core developer. Commercial tech support now available see: http://www.openssl.org
Re: [openssl.org #2836] [PATCH] Staple the correct OCSP Response when multiple certs are configured
Hi Steve. I saw your update (to 1.0.2 and HEAD), and I did start looking at backporting it into my 1.0.1/1.0.0/0.9.8 patches. ssl_get_server_send_pkey() is not available in 1.0.1 and earlier, so the t1_lib.c patch would have to be something like...

+    X509 *x;
+    x = ssl_get_server_send_cert(s);
+    /* If no certificate can't return certificate status */
+    if (x == NULL)
+        {
+        s->tlsext_status_expected = 0;
+        return 1;
+        }
+    /* Set current certificate to one we will use so
+     * SSL_get_certificate et al can pick it up.
+     */
+    s->cert->key->x509 = x;

Is it OK to update s->cert->key->x509 like this?

On 21/09/12 14:34, Stephen Henson via RT wrote:

[rob.stradl...@comodo.com - Fri Sep 21 15:02:54 2012]: Attached are patches for 1.0.0 and 0.9.8.

Note, I updated the original change to retain compatibility with existing behaviour as far as possible. See: http://cvs.openssl.org/chngview?cn=22808

Steve.

--
Rob Stradling
Senior Research & Development Scientist
COMODO - Creating Trust Online
[openssl.org #2836] [PATCH] Staple the correct OCSP Response when multiple certs are configured
[rob.stradl...@comodo.com - Fri Sep 21 15:55:39 2012]:

Hi Steve. I saw your update (to 1.0.2 and HEAD), and I did start looking at backporting it into my 1.0.1/1.0.0/0.9.8 patches. ssl_get_server_send_pkey() is not available in 1.0.1 and earlier, so the t1_lib.c patch would have to be something like...

+    X509 *x;
+    x = ssl_get_server_send_cert(s);
+    /* If no certificate can't return certificate status */
+    if (x == NULL)
+        {
+        s->tlsext_status_expected = 0;
+        return 1;
+        }
+    /* Set current certificate to one we will use so
+     * SSL_get_certificate et al can pick it up.
+     */
+    s->cert->key->x509 = x;

Is it OK to update s->cert->key->x509 like this?

No, because you could end up with all sorts of bad things happening (keys and certificates not matching, certificate types not matching, and memory leaks).

The easiest solution is to also backport ssl_get_server_send_pkey, see: http://cvs.openssl.org/chngview?cn=22840

Steve.
--
Dr Stephen N. Henson. OpenSSL project core developer. Commercial tech support now available see: http://www.openssl.org
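The objection above (keys and certificates no longer matching) can be illustrated with a toy model in C. This is a hedged sketch, not OpenSSL code: every toy_* name is hypothetical, and the structures are simplified stand-ins for a certificate store that keeps one certificate/private-key slot per key type plus a pointer to the current slot. The point is that selection repoints the current slot as a whole instead of overwriting the x509 member inside it.

```c
#include <stddef.h>

/* Toy stand-ins; the real OpenSSL structures are much richer. */
typedef struct { const char *subject; } toy_x509;
typedef struct { const char *alg; } toy_pkey;

/* A certificate and its private key always travel together in one slot. */
typedef struct {
    toy_x509 *x509;
    toy_pkey *privatekey;
} toy_cert_pkey;

enum { TOY_PKEY_RSA, TOY_PKEY_DSA, TOY_PKEY_NUM };

typedef struct {
    toy_cert_pkey *key;                /* slot currently in use */
    toy_cert_pkey pkeys[TOY_PKEY_NUM]; /* one slot per key type */
} toy_cert;

/* Return the whole slot for the requested type, or NULL if it is empty. */
static toy_cert_pkey *toy_get_server_send_pkey(toy_cert *c, int type)
{
    if (c == NULL || type < 0 || type >= TOY_PKEY_NUM)
        return NULL;
    if (c->pkeys[type].x509 == NULL)
        return NULL;
    return &c->pkeys[type];
}

/*
 * Make the slot for `type` current. Only the pointer changes, so the
 * certificate and key in the current slot always belong together and
 * nothing previously stored is overwritten or leaked.
 */
static int toy_select_current(toy_cert *c, int type)
{
    toy_cert_pkey *cpk = toy_get_server_send_pkey(c, type);

    if (cpk == NULL)
        return 0;
    c->key = cpk;
    return 1;
}
```

Assigning a certificate into the current slot by hand, as in the snippet under discussion, would leave the slot's private key untouched, which is the mismatch described above.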
Re: [openssl.org #2836] [PATCH] Staple the correct OCSP Response when multiple certs are configured
On 21/09/12 15:04, Stephen Henson via RT wrote:

[rob.stradl...@comodo.com - Fri Sep 21 15:55:39 2012]:

Hi Steve. I saw your update (to 1.0.2 and HEAD), and I did start looking at backporting it into my 1.0.1/1.0.0/0.9.8 patches. ssl_get_server_send_pkey() is not available in 1.0.1 and earlier, so the t1_lib.c patch would have to be something like...

+    X509 *x;
+    x = ssl_get_server_send_cert(s);
+    /* If no certificate can't return certificate status */
+    if (x == NULL)
+        {
+        s->tlsext_status_expected = 0;
+        return 1;
+        }
+    /* Set current certificate to one we will use so
+     * SSL_get_certificate et al can pick it up.
+     */
+    s->cert->key->x509 = x;

Is it OK to update s->cert->key->x509 like this?

No, because you could end up with all sorts of bad things happening (keys and certificates not matching, certificate types not matching, and memory leaks).

That's what I thought.

The easiest solution is to also backport ssl_get_server_send_pkey, see: http://cvs.openssl.org/chngview?cn=22840

I didn't think of that. Thanks! I'll prepare patches to backport 22840 to 1.0.0 and 0.9.8 (unless you or Ben get there first).

--
Rob Stradling
Senior Research & Development Scientist
COMODO - Creating Trust Online
Re: [openssl.org #2836] [PATCH] Staple the correct OCSP Response when multiple certs are configured
On 21/09/12 15:12, Rob Stradling via RT wrote:

On 21/09/12 15:04, Stephen Henson via RT wrote:

<snip>

The easiest solution is to also backport ssl_get_server_send_pkey, see: http://cvs.openssl.org/chngview?cn=22840

I didn't think of that. Thanks! I'll prepare patches to backport 22840 to 1.0.0 and 0.9.8 (unless you or Ben get there first).

http://cvs.openssl.org/patchset?cn=22840 applies cleanly (i.e. no failed hunks) on top of my patches for 1.0.0 and 0.9.8.

--
Rob Stradling
Senior Research & Development Scientist
COMODO - Creating Trust Online
Re: [PATCH 0/7] Phase one of sparc crypto opcode support.
From: Andy Polyakov ap...@openssl.org
Date: Fri, 21 Sep 2012 11:36:16 +0200

No, before thinking about 32-bit mode, I quickly ask what's with save-s without arguments?

Sorry, I just wrote that code as pseudo-code off the top of my head without attending to all of the necessary details. We would indeed need to allocate a minimal stack frame in each save instruction. It's just an oversight in my example code, that's all.
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
From: Andy Polyakov ap...@openssl.org
Date: Fri, 21 Sep 2012 12:21:25 +0200

I'll handle this, but differently. Specifically I won't go through GOT, but directly to variable, something like this:

I would like to politely request that you don't go down this road.

.Lretl: retl
        nop
        ...
        sethi   %hi(var-.Lpic),%reg
.Lpic:  call    .Lretl
         add    %o7,%lo(var-.Lpic),%reg

I honestly think it's easiest to simply generate correct PIC sequences, as my macros are trying to do. We can add whatever ifdefs and code generation cases we need to sparc_arch.h.

The code that I'm emitting is identical to what GCC generates on Linux and Solaris under Sparc regardless of which assembler and linker are in use. I should know, I wrote much of the sparc GCC backend.

If you describe to me what problems your scheme ran into, I can fix them up. Did you test if my code sequences work for you? It is also important to note that they are specifically designed to be usable in leaf functions.

BTW, the real long term answer is to mark openssl internal symbols as hidden and then use GOT_DATA optimization sequences, which will get rid of the GOT reference altogether. But that requires some configure checks to see if the assembler and linker support these constructs.

As for OPENSSL_sparcv9cap_P itself. I'd rather extend it to int OPENSSL_sparcv9cap_P[2] and save %cfr as is to OPENSSL_sparcv9cap_P[1]. Any objections?

I think this is code masturbation at this early stage of the sparc crypto opcode support implementation and is something we can clean up later.
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
Here is a more detailed reply specifically about generating correct and optimal Sparc PIC sequences.

Let's get the non-PIC static case out of the way. There we should always use:

        set     symbol, %reg            ! 32-bit
        setx    symbol, %tmp_reg, %reg  ! 64-bit

Using calls to PIC stubs is completely pointless overhead when we are doing a static build.

If we are generating PIC we need a stub function, and there are a lot of ways to do this. One scheme is to simply emit a stub in each source file where the stub is needed.

If the assembler and linker support got-data optimizations, we can emit the following sequence:

        sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %PIC_REG
        call    __sparc_pic_stub
         or     %PIC_REG, %lo(_GLOBAL_OFFSET_TABLE_+4), %PIC_REG
        sethi   %gdop_hix22(symbol), %TMP
        xor     %TMP, %gdop_lox10(symbol), %TMP
        LDPTR   [%PIC_REG + %TMP], %REG, %gdop(symbol)

If the linker finds that the resolution of symbol (f.e. the symbol is static to the compilation unit, or marked as 'hidden') can be done at final link time, that LDPTR above will be optimized into:

        add     %PIC_REG, %TMP, %REG

The symbol offset will also be adjusted, as needed, in the %gdop_*() sethi and xor instructions. And finally, the reference to the global offset table slot that would have been generated for 'symbol' will be removed.

Otherwise, if the linker and assembler lack gotdata optimization support, we use just a plain PIC sequence:

        sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %PIC_REG
        call    __sparc_pic_stub
         or     %PIC_REG, %lo(_GLOBAL_OFFSET_TABLE_+4), %PIC_REG
        sethi   %hi(symbol), %TMP
        or      %TMP, %lo(symbol), %TMP
        LDPTR   [%PIC_REG + %TMP], %REG

If this doesn't work in some cases, we need to discover exactly why instead of dismissing my approach completely.

Now, of course, all of the above is for -fPIC, but I see no sparc target (nor any target except one strange hpux case) that specifies -fpic instead of -fPIC in Configure. However, that case is simple to accommodate as well, and I'd be happy to do so in my macros.

About the cost of missing the RAS: every Sun produced UltraSPARC chip pushes unconditionally onto the RAS and does not special case the call .+8 pattern.

Thinking about this logically, a RAS miss can (at best) perform like a full branch misprediction, which on UltraSPARC results in a full pipeline flush, as the mis-predicted fetched instructions need to be cancelled and cleared out of the pipeline so we can begin executing down the correct path. This can be huge, depending upon the contents of the improperly fetched path of instructions. In the worst possible case, up to 18 instructions can need to be cancelled (UltraSPARC-I programmers manual, section 16.2.9, page 270).

Worse than the immediate cost of the RAS corruption is that every subsequent function return out of openssl is going to miss the RAS and incur the penalty as well.

I consider it absolutely critical that the PIC sequences support being used in leaf functions, without save and restore instructions. And my macros have been designed with this in mind. When used, one need not allocate a register window merely for the sake of performing a PIC sequence.

When we get past these initial patches and I post my DES work, you will see that I adjusted des_enc.m4 to use the new PIC interfaces I created. In fact I had to, because the 13-bit relocations used there no longer fit with the crypto opcode code added.

There are other problems in des_enc.m4, which I have fixed in my patches. As just one other example, it doesn't include opensslconf.h and therefore OPENSSL_SYSNAME_ULTRASPARC is never defined and the V9 sequences are never used for 32-bit, which hurts performance.

Only one valid set of CPP tests exists for the various cases we care about on sparc. __PIC__ means PIC code generation is in use. __arch64__ means 64-bit code generation, and __sparc_v9__ means V9 code can be used. These are fully standardized and both SunPRO and GCC set them consistently.
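The return address stack argument above can be made concrete with a small C model. This is a sketch under stated assumptions, not a description of any real chip: it models only the behaviour claimed in this message (every call pushes PC+8, every return pops and compares), the depth RAS_DEPTH and all names are hypothetical, and whether real hardware special-cases call .+8 is exactly the point in dispute in this thread.

```c
#include <stddef.h>

#define RAS_DEPTH 8 /* assumed depth; real implementations differ */

typedef struct {
    unsigned long slot[RAS_DEPTH];
    size_t top;           /* number of valid entries */
    unsigned mispredicts; /* returns whose prediction was wrong */
} ras_model;

/* A call (or jmpl with rd=%o7) pushes the predicted return target. */
static void ras_call(ras_model *r, unsigned long call_pc)
{
    if (r->top == RAS_DEPTH) {
        /* Full: drop the oldest entry to make room. */
        for (size_t i = 1; i < RAS_DEPTH; i++)
            r->slot[i - 1] = r->slot[i];
        r->top--;
    }
    r->slot[r->top++] = call_pc + 8;
}

/* A return pops the prediction and compares it with the real target. */
static void ras_return(ras_model *r, unsigned long actual_target)
{
    unsigned long predicted = r->top ? r->slot[--r->top] : 0;

    if (predicted != actual_target)
        r->mispredicts++;
}

/*
 * Three nested calls followed by three returns. If unpaired_call is
 * nonzero, the innermost function also executes one call that is never
 * matched by a return, as a "call .+8" would be in this model.
 */
static unsigned ras_scenario(int unpaired_call)
{
    ras_model r = { {0}, 0, 0 };

    ras_call(&r, 0x1000);
    ras_call(&r, 0x2000);
    ras_call(&r, 0x3000);
    if (unpaired_call)
        ras_call(&r, 0x4000);
    ras_return(&r, 0x3008);
    ras_return(&r, 0x2008);
    ras_return(&r, 0x1008);
    return r.mispredicts;
}
```

In this model one unpaired call shifts every entry below it by one position, so each of the enclosing returns mispredicts, which is the "every subsequent function return" effect described above.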
[PATCH 0/2] Sparc AES crypto opcode support.
This builds on top of the 7 patch series I sent the other day, which laid the foundation for sparc crypto opcode support.

The first patch plugs in optimized versions of key expansion and AES_{decrypt,encrypt}().

The second patch is modelled on the AESNI support and explicitly optimizes ECB, CBC, CTR, OFB, and CFB modes. I'll do the remaining modes soon.

I've put this through a battery of tests, and in particular I hacked up a local copy of test/test_aesni (which doesn't seem to get run even on x86?) that uses the appropriate sparc environment variable to turn off crypto opcode usage. That script helped a lot during validation.

The 35GB/sec benchmark result in the second patch is not a typo :-)

Signed-off-by: David S. Miller da...@davemloft.net
[PATCH 1/2] sparc: Add initial support for AES opcodes.
Currently AES_encrypt, AES_decrypt, and the key expansion are optimized. Direct support for CBC, ECB, CTR, etc. will come in subsequent changes.

The following measurements were taken on a SPARC-T4.

Baseline (OPENSSL_sparcv9cap=0):

type            16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc    85241.72k    90930.60k    94282.67k    95158.95k    95087.08k
aes-192 cbc    73300.41k    77576.49k    80022.95k    80657.75k    80838.66k
aes-256 cbc    64390.17k    67656.43k    69442.30k    69893.80k    70022.49k

With AES opcodes enabled:

type            16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc   298612.77k   353669.87k   389577.22k   400843.61k   406031.02k
aes-192 cbc   282841.19k   323486.85k   364641.37k   375664.98k   378989.23k
aes-256 cbc   269449.24k   310281.81k   343170.05k   352550.23k   355317.08k

There were several interesting implementation issues dealt with here.

The AES opcodes need the decryption key in a different format than the generic sparc v9 code wants (basically, no pre-application of the MixColumn). To address this and also to facilitate using the AES opcodes for key expansion, a new aes_sparccore.c file is used in place of aes_core.c when building for sparcv9.

The non-AES-opcode sparc code was changed to use a proper PIC sequence with sparc_arch.h macros. The code which was there flushes the UltraSPARC return address stack, negatively impacting performance. Any call, or jmpl with destination register %o7, that lacks a paired ret/retl will effectively corrupt the return address stack, making every subsequent ret/retl miss the cache and take a full pipeline flush.

The sparc_arch.h PIC loading sequences lack this problem, and they also know how to do non-PIC loading of symbol addresses even more efficiently.

Next, usage of the AES instructions is unnecessarily difficult if the key is not 8-byte aligned. So we use a trick so that we always have an aligned key to work with. We determine whether the AES_KEY is 8 or 4 byte aligned; these are the only two possibilities on sparc.
If it is 8 byte aligned, we use the existing interpretation of the AES_KEY contents. However, if it is 4 byte aligned, we put the ->rounds value first and then the key, so that the key becomes 8-byte aligned. All of the aes_sparccore.c and aes-sparcv9.pl code is aware of this convention.

Since we don't have any control over the alignment of the input buffers, output buffers, and input key, we make use of alignaddr, faligndata, and masked partial stores to deal with the unaligned cases.

Signed-off-by: David S. Miller da...@davemloft.net
---
 Configure                     |   2 +-
 crypto/aes/Makefile           |   4 +-
 crypto/aes/aes_sparccore.c    | 272 ++
 crypto/aes/asm/aes-sparcv9.pl | 502 -
 crypto/sparc_arch.h           |  34 +++
 5 files changed, 802 insertions(+), 12 deletions(-)
 create mode 100644 crypto/aes/aes_sparccore.c

diff --git a/Configure b/Configure
index 2333a63..66b4ff8 100755
--- a/Configure
+++ b/Configure
@@ -130,7 +130,7 @@ my $x86_elf_asm="$x86_asm:elf";
 my $x86_64_asm="x86_64cpuid.o:x86_64-gcc.o x86_64-mont.o x86_64-mont5.o x86_64-gf2m.o modexp512-x86_64.o::aes-x86_64.o vpaes-x86_64.o bsaes-x86_64.o aesni-x86_64.o aesni-sha1-x86_64.o::md5-x86_64.o:sha1-x86_64.o sha256-x86_64.o sha512-x86_64.o::rc4-x86_64.o rc4-md5-x86_64.o:::wp-x86_64.o:cmll-x86_64.o cmll_misc.o:ghash-x86_64.o:e_padlock-x86_64.o";
 my $ia64_asm="ia64cpuid.o:bn-ia64.o ia64-mont.o::aes_core.o aes_cbc.o aes-ia64.o::md5-ia64.o:sha1-ia64.o sha256-ia64.o sha512-ia64.o::rc4-ia64.o rc4_skey.o:ghash-ia64.o::void";
-my $sparcv9_asm="sparcv9cap.o sparccpuid.o:bn-sparcv9.o sparcv9-mont.o sparcv9a-mont.o:des_enc-sparc.o fcrypt_b.o:aes_core.o aes_cbc.o aes-sparcv9.o::md5-sparcv9.o:sha1-sparcv9.o sha256-sparcv9.o sha512-sparcv9.o:::ghash-sparcv9.o::void";
+my $sparcv9_asm="sparcv9cap.o sparccpuid.o:bn-sparcv9.o sparcv9-mont.o sparcv9a-mont.o:des_enc-sparc.o fcrypt_b.o:aes_sparccore.o aes_cbc.o aes-sparcv9.o::md5-sparcv9.o:sha1-sparcv9.o sha256-sparcv9.o sha512-sparcv9.o:::ghash-sparcv9.o::void";
 my $sparcv8_asm=":sparcv8.o:des_enc-sparc.o fcrypt_b.o:void";
 my $alpha_asm="alphacpuid.o:bn_asm.o alpha-mont.o:sha1-alpha.o:::ghash-alpha.o::void";
 my $mips64_asm=":bn-mips.o mips-mont.o::aes_cbc.o aes-mips.o:::sha1-mips.o sha256-mips.o sha512-mips.o";
diff --git a/crypto/aes/Makefile b/crypto/aes/Makefile
index 8edd358..2f32983 100644
--- a/crypto/aes/Makefile
+++ b/crypto/aes/Makefile
@@ -66,8 +66,10 @@ aesni-x86_64.s: asm/aesni-x86_64.pl
 aesni-sha1-x86_64.s: asm/aesni-sha1-x86_64.pl
 	$(PERL) asm/aesni-sha1-x86_64.pl $(PERLASM_SCHEME) > $@
-aes-sparcv9.s: asm/aes-sparcv9.pl
+aes-sparcv9.S: asm/aes-sparcv9.pl
 	$(PERL) asm/aes-sparcv9.pl $(CFLAGS) > $@
+aes-sparcv9.s: aes-sparcv9.S
+	$(CC) $(CFLAGS) -E aes-sparcv9.S > $@
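The key alignment trick described in the commit message above can be sketched in C. This is a hedged illustration only: the toy_* names are hypothetical, the storage is modelled as 60 round-key words plus one rounds word (mirroring the shape of the public AES_KEY structure), and the actual layout code in aes_sparccore.c is not shown in this excerpt.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_RD_KEY_WORDS 60 /* 4 * (14 + 1) round-key words */

/* Same storage in both layouts: 60 key words plus one rounds word. */
typedef struct {
    uint32_t words[TOY_RD_KEY_WORDS + 1];
} toy_aes_key;

/* Nonzero if the structure itself sits on an 8-byte boundary. */
static int toy_is_8byte_aligned(const toy_aes_key *k)
{
    return ((uintptr_t)k & 0x7) == 0;
}

/*
 * Word index of the rounds value: last word in the usual layout,
 * first word when the structure is only 4-byte aligned.
 */
static size_t toy_rounds_index(const toy_aes_key *k)
{
    return toy_is_8byte_aligned(k) ? TOY_RD_KEY_WORDS : 0;
}

/*
 * Start of the round keys. In the 4-byte-aligned case the key begins
 * one word later, which lands it on an 8-byte boundary.
 */
static const uint32_t *toy_round_keys(const toy_aes_key *k)
{
    return &k->words[toy_is_8byte_aligned(k) ? 0 : 1];
}
```

Either way the round keys start on an 8-byte boundary, so the assembler routines can always use 64-bit loads on the key schedule.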
[PATCH 2/2] sparc: Expand AES crypto opcodes support to various modes.
On a SPARC-T4, with AES opcodes disabled (OPENSSL_sparcv9cap=0):

type            16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc    75200.21k    83425.11k    86767.67k    87853.06k    88279.72k
aes-192 cbc    64906.68k    71059.56k    73902.42k    74532.52k    74855.77k
aes-256 cbc    56814.90k    61781.72k    63903.74k    64367.27k    64607.57k

And with them enabled:

type            16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc   501882.74k   836726.87k   993102.76k  1020379.48k  1054083.75k
aes-192 cbc   435068.22k   707080.77k   837915.90k   864243.03k   889279.83k
aes-256 cbc   393746.28k   620463.13k   727483.31k   749580.97k   769029.46k

This system is a T4-2 so it's fun to show off some parallel benchmarks, for example openssl speed -multi 16 -evp aes-128-ecb gives:

type            16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
evp          7429568.93k  17815630.93k  28436597.93k  32033047.55k  35120630.44k

35GB/sec AES encryption, not too bad.

Currently CBC, ECB, CTR, OFB, and CFB modes are explicitly optimized. Other modes will be optimized in the future.

Signed-off-by: David S. Miller da...@davemloft.net
---
 Configure                     |   2 +-
 crypto/aes/aes_sparccore.c    |  55
 crypto/aes/asm/aes-sparcv9.pl | 666 +
 crypto/evp/e_aes.c            | 400 +
 crypto/sparc_arch.h           |  19 ++
 5 files changed, 1141 insertions(+), 1 deletion(-)

diff --git a/Configure b/Configure
index 66b4ff8..217a552 100755
--- a/Configure
+++ b/Configure
@@ -130,7 +130,7 @@ my $x86_elf_asm="$x86_asm:elf";
 my $x86_64_asm="x86_64cpuid.o:x86_64-gcc.o x86_64-mont.o x86_64-mont5.o x86_64-gf2m.o modexp512-x86_64.o::aes-x86_64.o vpaes-x86_64.o bsaes-x86_64.o aesni-x86_64.o aesni-sha1-x86_64.o::md5-x86_64.o:sha1-x86_64.o sha256-x86_64.o sha512-x86_64.o::rc4-x86_64.o rc4-md5-x86_64.o:::wp-x86_64.o:cmll-x86_64.o cmll_misc.o:ghash-x86_64.o:e_padlock-x86_64.o";
 my $ia64_asm="ia64cpuid.o:bn-ia64.o ia64-mont.o::aes_core.o aes_cbc.o aes-ia64.o::md5-ia64.o:sha1-ia64.o sha256-ia64.o sha512-ia64.o::rc4-ia64.o rc4_skey.o:ghash-ia64.o::void";
-my $sparcv9_asm="sparcv9cap.o sparccpuid.o:bn-sparcv9.o sparcv9-mont.o sparcv9a-mont.o:des_enc-sparc.o fcrypt_b.o:aes_sparccore.o aes_cbc.o aes-sparcv9.o::md5-sparcv9.o:sha1-sparcv9.o sha256-sparcv9.o sha512-sparcv9.o:::ghash-sparcv9.o::void";
+my $sparcv9_asm="sparcv9cap.o sparccpuid.o:bn-sparcv9.o sparcv9-mont.o sparcv9a-mont.o:des_enc-sparc.o fcrypt_b.o:aes_sparccore.o aes-sparcv9.o::md5-sparcv9.o:sha1-sparcv9.o sha256-sparcv9.o sha512-sparcv9.o:::ghash-sparcv9.o::void";
 my $sparcv8_asm=":sparcv8.o:des_enc-sparc.o fcrypt_b.o:void";
 my $alpha_asm="alphacpuid.o:bn_asm.o alpha-mont.o:sha1-alpha.o:::ghash-alpha.o::void";
 my $mips64_asm=":bn-mips.o mips-mont.o::aes_cbc.o aes-mips.o:::sha1-mips.o sha256-mips.o sha512-mips.o";
diff --git a/crypto/aes/aes_sparccore.c b/crypto/aes/aes_sparccore.c
index 2842cbc..658cc66 100644
--- a/crypto/aes/aes_sparccore.c
+++ b/crypto/aes/aes_sparccore.c
@@ -36,6 +36,7 @@
 #include <stdlib.h>
 #include <openssl/crypto.h>
 #include <openssl/aes.h>
+#include <openssl/modes.h>
 #include "aes_locl.h"
 #include "sparc_arch.h"
@@ -270,3 +271,57 @@ int AES_set_decrypt_key(const unsigned char *userKey, const int bits,
 	}
 	return 0;
 }
+
+void aes_sparc_hw_cbc_encrypt(const unsigned char *in, unsigned char *out,
+			      size_t length, const AES_KEY *key,
+			      unsigned char *ivec, int enc);
+
+void AES_cbc_encrypt(const unsigned char *in, unsigned char *out,
+		     size_t len, const AES_KEY *key,
+		     unsigned char *ivec, const int enc)
+{
+	const void *aligned_in;
+	void *aligned_out;
+	int aligned_len;
+	size_t bl = 16;
+
+	if (!(OPENSSL_sparcv9cap_P & SPARCV9_AES))
+		goto slow;
+
+	aligned_len = len & ~(bl - 1);
+	if (!aligned_len)
+		goto trailing;
+
+	aligned_out = out;
+	if ((unsigned long) out & 0x7) {
+		aligned_out = OPENSSL_malloc(aligned_len);
+		if (!aligned_out)
+			goto slow;
+	}
+	aligned_in = in;
+	if ((unsigned long)in & 0x7) {
+		memcpy(aligned_out, in, aligned_len);
+		aligned_in = (const void *) aligned_out;
+	}
+
+	aes_sparc_hw_cbc_encrypt(aligned_in, aligned_out, aligned_len,
+				 key, ivec, enc);
+
+	if ((unsigned long)out & 0x7) {
+		memcpy(out, aligned_out, aligned_len);
+		OPENSSL_free(aligned_out);
+	}
+trailing:
+	len -= aligned_len;
+	if (len) {
+		out += aligned_len;
+		in += aligned_len;
+slow:
+		if (enc)
+			CRYPTO_cbc128_encrypt(in, out,
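The length splitting and alignment tests used by the AES_cbc_encrypt wrapper in the hunk above can be shown in isolation. This is a minimal sketch with hypothetical toy_* names, assuming the 16-byte block size and the 8-byte pointer alignment requirement that appear in the patch; it does not call any OpenSSL function.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BLOCK 16 /* AES block size in bytes */

typedef struct {
    size_t aligned_len;  /* whole blocks, handled by the opcode path */
    size_t trailing_len; /* leftover bytes, handled by the generic path */
} toy_split;

/* Round len down to a multiple of the block size and keep the remainder. */
static toy_split toy_split_length(size_t len)
{
    toy_split s;

    s.aligned_len = len & ~(size_t)(TOY_BLOCK - 1);
    s.trailing_len = len - s.aligned_len;
    return s;
}

/* Nonzero if a bounce buffer would be needed for this pointer. */
static int toy_needs_bounce(const void *p)
{
    return ((uintptr_t)p & 0x7) != 0;
}
```

The mask form works because the block size is a power of two; for a 100-byte input it yields 96 bytes for the fast path and 4 trailing bytes.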
[PATCH] sparc: Add support for CAMELLIA opcodes.
On a SPARC T4-2, with CAMELLIA opcodes disabled:

type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
camellia-128 cbc    63737.35k    66054.61k    66780.50k    66775.35k    67062.44k
camellia-192 cbc    51126.33k    53836.78k    54761.73k    54964.91k    55017.47k
camellia-256 cbc    51126.24k    53774.55k    54760.02k    54963.54k    55017.47k

with CAMELLIA opcodes enabled:

type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
camellia-128 cbc   483488.94k   608627.31k   646251.78k   645825.54k   657945.94k
camellia-192 cbc   396779.71k   474317.61k   497627.22k   497634.65k   504881.15k
camellia-256 cbc   396796.10k   474297.19k   497624.06k   497644.20k   504872.96k

Signed-off-by: David S. Miller da...@davemloft.net
---

If this is applied before the sparc AES opcode patches, there is a minor and easy to resolve conflict in the top-level Configure file.

Tested on the full matrix of {static,shared}/linux{,64}-sparcv9

 Configure                          |   2 +-
 crypto/camellia/Makefile           |   2 +
 crypto/camellia/asm/cmll-sparcv9.S | 604
 crypto/camellia/cmll_sparccore.c   | 219 +
 crypto/sparc_arch.h                |  11 +
 5 files changed, 837 insertions(+), 1 deletion(-)
 create mode 100644 crypto/camellia/asm/cmll-sparcv9.S
 create mode 100644 crypto/camellia/cmll_sparccore.c

diff --git a/Configure b/Configure
index 217a552..b4cbb56 100755
--- a/Configure
+++ b/Configure
@@ -130,7 +130,7 @@ my $x86_elf_asm="$x86_asm:elf";
 my $x86_64_asm="x86_64cpuid.o:x86_64-gcc.o x86_64-mont.o x86_64-mont5.o x86_64-gf2m.o modexp512-x86_64.o::aes-x86_64.o vpaes-x86_64.o bsaes-x86_64.o aesni-x86_64.o aesni-sha1-x86_64.o::md5-x86_64.o:sha1-x86_64.o sha256-x86_64.o sha512-x86_64.o::rc4-x86_64.o rc4-md5-x86_64.o:::wp-x86_64.o:cmll-x86_64.o cmll_misc.o:ghash-x86_64.o:e_padlock-x86_64.o";
 my $ia64_asm="ia64cpuid.o:bn-ia64.o ia64-mont.o::aes_core.o aes_cbc.o aes-ia64.o::md5-ia64.o:sha1-ia64.o sha256-ia64.o sha512-ia64.o::rc4-ia64.o rc4_skey.o:ghash-ia64.o::void";
-my $sparcv9_asm="sparcv9cap.o sparccpuid.o:bn-sparcv9.o sparcv9-mont.o sparcv9a-mont.o:des_enc-sparc.o fcrypt_b.o:aes_sparccore.o aes-sparcv9.o::md5-sparcv9.o:sha1-sparcv9.o sha256-sparcv9.o sha512-sparcv9.o:::ghash-sparcv9.o::void";
+my $sparcv9_asm="sparcv9cap.o sparccpuid.o:bn-sparcv9.o sparcv9-mont.o sparcv9a-mont.o:des_enc-sparc.o fcrypt_b.o:aes_sparccore.o aes-sparcv9.o::md5-sparcv9.o:sha1-sparcv9.o sha256-sparcv9.o sha512-sparcv9.o::cmll-sparcv9.o cmll_sparccore.o:ghash-sparcv9.o::void";
 my $sparcv8_asm=":sparcv8.o:des_enc-sparc.o fcrypt_b.o:void";
 my $alpha_asm="alphacpuid.o:bn_asm.o alpha-mont.o:sha1-alpha.o:::ghash-alpha.o::void";
 my $mips64_asm=":bn-mips.o mips-mont.o::aes_cbc.o aes-mips.o:::sha1-mips.o sha256-mips.o sha512-mips.o";
diff --git a/crypto/camellia/Makefile b/crypto/camellia/Makefile
index 8858dd0..6802393 100644
--- a/crypto/camellia/Makefile
+++ b/crypto/camellia/Makefile
@@ -48,6 +48,8 @@ cmll-x86.s: asm/cmll-x86.pl ../perlasm/x86asm.pl
 	$(PERL) asm/cmll-x86.pl $(PERLASM_SCHEME) $(CFLAGS) $(PROCESSOR) > $@
 cmll-x86_64.s: asm/cmll-x86_64.pl
 	$(PERL) asm/cmll-x86_64.pl $(PERLASM_SCHEME) > $@
+cmll-sparcv9.s: asm/cmll-sparcv9.S
+	$(CC) $(CFLAGS) -E asm/cmll-sparcv9.S > $@
 files:
 	$(PERL) $(TOP)/util/files.pl Makefile >> $(TOP)/MINFO
diff --git a/crypto/camellia/asm/cmll-sparcv9.S b/crypto/camellia/asm/cmll-sparcv9.S
new file mode 100644
index 000..015d5ee
--- /dev/null
+++ b/crypto/camellia/asm/cmll-sparcv9.S
@@ -0,0 +1,604 @@
+/* Written by David S. Miller da...@davemloft.net for the OpenSSL
+ * project. The module is, however, dual licensed under OpenSSL and
+ * CRYPTOGAMS licenses depending on where you obtain it. For further
+ * details see http://www.openssl.org/~appro/cryptogams/.
+ */
+
+#include "sparc_arch.h"
+
+#ifdef __arch64__
+	.register	%g2,#scratch
+	.register	%g3,#scratch
+#endif
+
+#define CAMELLIA_6ROUNDS(KEY_BASE, I0, I1) \
+	CAMELLIA_F(KEY_BASE + 0, I1, I0, I1) \
+	CAMELLIA_F(KEY_BASE + 2, I0, I1, I0) \
+	CAMELLIA_F(KEY_BASE + 4, I1, I0, I1) \
+	CAMELLIA_F(KEY_BASE + 6, I0, I1, I0) \
+	CAMELLIA_F(KEY_BASE + 8, I1, I0, I1) \
+	CAMELLIA_F(KEY_BASE + 10, I0, I1, I0)
+
+#define CAMELLIA_6ROUNDS_FL_FLI(KEY_BASE, I0, I1) \
+	CAMELLIA_6ROUNDS(KEY_BASE, I0, I1) \
+	CAMELLIA_FL(KEY_BASE + 12, I0, I0) \
+	CAMELLIA_FLI(KEY_BASE + 14, I1, I1)
+
+	.data
+
+	.align	8
+SIGMA:	.xword	0xA09E667F3BCC908B
+	.xword	0xB67AE8584CAA73B2
+	.xword	0xC6EF372FE94F82BE
+	.xword	0x54FF53A5F1D36F1C
+	.xword	0x10E527FADE682D1D
+	.xword	0xB05688C2B3E6C1FD
+
+	.text
+
+SPARC_PIC_THUNK(g3)
+
+	.align	32
+	.globl	sparc_hw_camellia_ekeygen
+	.type	sparc_hw_camellia_ekeygen,#function