Re: [PATCH 0/7] Phase one of sparc crypto opcode support.

2012-09-21 Thread Andy Polyakov
 You mentioned Montgomery BN.
 
 Here are how the instructions work.
 
 The basic model is that there is a range of sizes supported by the
 instruction, and all of the data is loaded into a combination of
 the floating point registers and all of the register windows of
 the cpu.

Ouch!

   ...
 
   save
 
   ...
 
   restore
   ...
 
 Of course, you might quickly ask what happens in 32-bit mode?

No, before thinking about 32-bit mode, I quickly ask what's with save-s
without arguments? I quickly ask what happens if a context switch strikes
in the middle? save without arguments means that %sp will be effectively
uninitialized, and attempts to refer to the stack [during a context switch
or asynchronous signal delivery] are either doomed or corrupt the stack. So
save-s ought to allocate frames. But even then, [and in 64-bit mode], do
the instructions in question ensure that register windows are loaded prior
to execution? I mean, consider a context switch between a save and, say,
montmul. The kernel dumps all windows on the stack, and when execution
resumes it normally brings in only the one top window and lets window traps
bring in the remaining ones on demand. So before the instructions in
question can start actual processing, all windows have to be loaded.
Presumably the instructions can trigger a window trap; then the kernel would
have to see that it's one of the instructions that triggered it and act
accordingly, i.e. bring in all the windows. Does it work that way? Or do I
get it backwards? I assume that the instructions in question are
uninterruptible, so that a trap can be generated only prior to calculation...
__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-21 Thread Andy Polyakov
 Provide these so that the assembler users can be oblivious about
 whether this is PIC or non-PIC, 64-bit or 32-bit, etc.
 
 It is important to use a real call and return to implement the
 obtaining of the %pc as part of the PIC sequence.  Sequences
 such as:
 
   call    . + 8
   mov %o7, %PIC_REG
 
 are to be avoided at all costs on UltraSPARC cpus.  This is because
 such a sequence flushes the Return Address Stack (RAS) because the
 call is not paired with a return.
 
 Every time a call or jmpl with RD=%o7 is performed, the chip pushes
 the PC+8 onto the top of the RAS.  The next jmpl %o7 + 8 or return
 %i7 + 8 the chip sees will cause it to pop the top entry off the RAS
 and begin fetching down that path.  If there is a mis-match the entire
 pipeline is flushed and the chip restarts fetching down the correct
 path.
 
 Therefore, the above discouraged sequence will cause all of the RAS
 entries to mismatch and there will therefore be a full pipeline flush
 on every subsequent function return.

Well, last time I looked into this I could establish the following. call .+8
was actually used by the vendor compiler [maybe not anymore, I don't know,
but in the Sun days it was used extensively]. SPARC V manual is explicit
about call .+8 *not* affecting RAS. Purify was also discussed in this
context, and it actually recognizes the construct and treats it
specially. In other words, it was considered widely adopted practice and
it was found to be backed up by at least one hardware design. Penalties
were measured to be minimal on UltraSPARC, two additional cycles (in
comparison to 20 cycles for save and restore alone). But of course today
the situation might be different and T-SPARCs can suffer from it more...

I'll handle this, but differently. Specifically I won't go through GOT,
but directly to variable, something like this:

.Lretl:
        retl
        nop
        ...
        sethi   %hi(var-.Lpic),%reg
.Lpic:  call    .Lretl
        add     %o7,%lo(var-.Lpic),%reg

This works with both Solaris and Linux toolchains and in both 32- and
64-bit mode (it was hell to get des_enc to work everywhere). In 64-bit mode
it implies that the shared library itself is limited to 2GB, but that's
considered a reasonable limitation. Avoiding the GOT allows hiding
OPENSSL_sparcv9cap_P with __attribute__((visibility("hidden"))). Now
it's static.

Once again, don't think about it no more, it will be taken care of.
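
[Editor's sketch] The sethi/%hi and add/%lo pair above splits a 32-bit
link-time offset into a 22-bit upper part and a 10-bit lower part. The C
below only models that arithmetic; the function names are hypothetical and
it does not reproduce the exact instruction sequence.

```c
#include <stdint.h>

/* Upper 22 bits of a 32-bit value: the field that %hi(x) feeds to sethi,
 * which places it in bits 31..10 of the register and clears bits 9..0. */
uint32_t sparc_hi22(uint32_t x)
{
    return x >> 10;
}

/* Lower 10 bits: the immediate that %lo(x) supplies to add/or. */
uint32_t sparc_lo10(uint32_t x)
{
    return x & 0x3ffu;
}

/* Rebuild the offset from its two halves and apply it to a base address
 * (here `pc` stands in for the value a call leaves in %o7). */
uint32_t sparc_pc_relative(uint32_t pc, uint32_t offset)
{
    uint32_t rebuilt = (sparc_hi22(offset) << 10) + sparc_lo10(offset);
    return pc + rebuilt;
}
```

Because the two halves together cover only 32 bits, the symbol has to sit
within a 32-bit distance of the reference point, which is one way to read
the size limit mentioned above.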

As for OPENSSL_sparcv9cap_P itself. I'd rather extend it to int
OPENSSL_sparcv9cap_P[2] and save %cfr as is to  OPENSSL_sparcv9cap_P[1].
Any objections?
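
[Editor's sketch] A minimal illustration of the two-word layout proposed
above, assuming word 0 keeps the existing software-defined flags and word 1
holds the raw %cfr value. The array name, helper names and bit masks are
hypothetical placeholders; real bit positions must come from the CPU
documentation.

```c
/* Hypothetical stand-in for OPENSSL_sparcv9cap_P[2]. */
static unsigned int demo_sparcv9cap[2];

/* Illustrative masks only, not taken from this thread. */
#define DEMO_CFR_AES      (1u << 0)
#define DEMO_CFR_CAMELLIA (1u << 3)

/* Store the value read from %cfr unmodified in the second word. */
void demo_store_cfr(unsigned int cfr)
{
    demo_sparcv9cap[1] = cfr;
}

/* Test one capability bit in the stored %cfr copy. */
int demo_has_cfr(unsigned int mask)
{
    return (demo_sparcv9cap[1] & mask) != 0;
}
```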



Re: [openssl.org #2836] [PATCH] Staple the correct OCSP Response when multiple certs are configured

2012-09-21 Thread Rob Stradling via RT
Attached are patches for 1.0.0 and 0.9.8.

-- 
Rob Stradling
Senior Research & Development Scientist
COMODO - Creating Trust Online
Office Tel: +44.(0)1274.730505
Office Fax: +44.(0)1274.730909
www.comodo.com

COMODO CA Limited, Registered in England No. 04058690
Registered Office:
   3rd Floor, 26 Office Village, Exchange Quay,
   Trafford Road, Salford, Manchester M5 3EQ


Index: ssl/s3_srvr.c
===
RCS file: /v/openssl/cvs/openssl/ssl/s3_srvr.c,v
retrieving revision 1.126.2.42
diff -u -r1.126.2.42 s3_srvr.c
--- ssl/s3_srvr.c   16 Feb 2012 15:21:17 -  1.126.2.42
+++ ssl/s3_srvr.c   20 Sep 2012 14:40:41 -
@@ -1005,7 +1005,7 @@
    goto f_err;
    }
    }
-   if (ssl_check_clienthello_tlsext(s) <= 0) {
+   if (ssl_check_clienthello_tlsext_early(s) <= 0) {
        SSLerr(SSL_F_SSL3_GET_CLIENT_HELLO,SSL_R_CLIENTHELLO_TLSEXT);
    goto err;
    }
@@ -1131,6 +1131,16 @@
  * s->tmp.new_cipher - the new cipher to use.
  */
 
+   /* Handles TLS extensions that we couldn't check earlier */
+   if (s->version >= SSL3_VERSION)
+       {
+       if (ssl_check_clienthello_tlsext_late(s) <= 0)
+           {
+           SSLerr(SSL_F_SSL3_GET_CLIENT_HELLO,SSL_R_CLIENTHELLO_TLSEXT);
+           goto err;
+           }
+       }
+
    if (ret < 0) ret=1;
    if (0)
    {
Index: ssl/ssl_lib.c
===
RCS file: /v/openssl/cvs/openssl/ssl/ssl_lib.c,v
retrieving revision 1.133.2.31
diff -u -r1.133.2.31 ssl_lib.c
--- ssl/ssl_lib.c   5 Jan 2012 10:21:49 -   1.133.2.31
+++ ssl/ssl_lib.c   20 Sep 2012 14:40:41 -
@@ -1943,7 +1943,7 @@
}
 
 /* THIS NEEDS CLEANING UP */
-X509 *ssl_get_server_send_cert(SSL *s)
+X509 *ssl_get_server_send_cert(const SSL *s)
{
unsigned long alg,kalg;
CERT *c;
@@ -2420,7 +2420,9 @@
 /* Fix this function so that it takes an optional type parameter */
 X509 *SSL_get_certificate(const SSL *s)
{
-   if (s->cert != NULL)
+   if (s->server)
+       return(ssl_get_server_send_cert(s));
+   else if (s->cert != NULL)
        return(s->cert->key->x509);
else
return(NULL);
Index: ssl/ssl_locl.h
===
RCS file: /v/openssl/cvs/openssl/ssl/ssl_locl.h,v
retrieving revision 1.63.2.22
diff -u -r1.63.2.22 ssl_locl.h
--- ssl/ssl_locl.h  9 Mar 2012 15:51:56 -   1.63.2.22
+++ ssl/ssl_locl.h  20 Sep 2012 14:40:41 -
@@ -740,7 +740,7 @@
 int ssl_undefined_function(SSL *s);
 int ssl_undefined_void_function(void);
 int ssl_undefined_const_function(const SSL *s);
-X509 *ssl_get_server_send_cert(SSL *);
+X509 *ssl_get_server_send_cert(const SSL *);
 EVP_PKEY *ssl_get_sign_pkey(SSL *,SSL_CIPHER *);
 int ssl_cert_type(X509 *x,EVP_PKEY *pkey);
 void ssl_set_cert_masks(CERT *c, SSL_CIPHER *cipher);
@@ -979,7 +979,8 @@
 int ssl_parse_serverhello_tlsext(SSL *s, unsigned char **data, unsigned char *d, int n, int *al);
 int ssl_prepare_clienthello_tlsext(SSL *s);
 int ssl_prepare_serverhello_tlsext(SSL *s);
-int ssl_check_clienthello_tlsext(SSL *s);
+int ssl_check_clienthello_tlsext_early(SSL *s);
+int ssl_check_clienthello_tlsext_late(SSL *s);
 int ssl_check_serverhello_tlsext(SSL *s);
 
 #ifdef OPENSSL_NO_SHA256
Index: ssl/t1_lib.c
===
RCS file: /v/openssl/cvs/openssl/ssl/t1_lib.c,v
retrieving revision 1.13.2.30
diff -u -r1.13.2.30 t1_lib.c
--- ssl/t1_lib.c    4 Jan 2012 14:25:10 -   1.13.2.30
+++ ssl/t1_lib.c    20 Sep 2012 14:40:41 -
@@ -745,7 +745,7 @@
return 1;
}
 
-int ssl_check_clienthello_tlsext(SSL *s)
+int ssl_check_clienthello_tlsext_early(SSL *s)
{
int ret=SSL_TLSEXT_ERR_NOACK;
int al = SSL_AD_UNRECOGNIZED_NAME;
@@ -755,11 +755,34 @@
    else if (s->initial_ctx != NULL && s->initial_ctx->tlsext_servername_callback != 0)
        ret = s->initial_ctx->tlsext_servername_callback(s, &al, s->initial_ctx->tlsext_servername_arg);
 
+   switch (ret)
+   {
+   case 

[openssl.org #2836] [PATCH] Staple the correct OCSP Response when multiple certs are configured

2012-09-21 Thread Stephen Henson via RT
 [rob.stradl...@comodo.com - Fri Sep 21 15:02:54 2012]:
 
 Attached are patches for 1.0.0 and 0.9.8.
 
 

Note, I updated the original change to retain compatibility with
existing behaviour as far as possible. See:

http://cvs.openssl.org/chngview?cn=22808

Steve.
-- 
Dr Stephen N. Henson. OpenSSL project core developer.
Commercial tech support now available see: http://www.openssl.org



Re: [openssl.org #2836] [PATCH] Staple the correct OCSP Response when multiple certs are configured

2012-09-21 Thread Rob Stradling via RT
Hi Steve.

I saw your update (to 1.0.2 and HEAD), and I did start looking at 
backporting it into my 1.0.1/1.0.0/0.9.8 patches.

ssl_get_server_send_pkey() is not available in 1.0.1 and earlier, so the 
t1_lib.c patch would have to be something like...

+   X509 *x;
+   x = ssl_get_server_send_cert(s);
+   /* If no certificate can't return certificate status */
+   if (x == NULL)
+   {
+       s->tlsext_status_expected = 0;
+       return 1;
+   }
+   /* Set current certificate to one we will use so
+    * SSL_get_certificate et al can pick it up.
+    */
+   s->cert->key->x509 = x;

Is it OK to update s->cert->key->x509 like this?


On 21/09/12 14:34, Stephen Henson via RT wrote:
 [rob.stradl...@comodo.com - Fri Sep 21 15:02:54 2012]:

 Attached are patches for 1.0.0 and 0.9.8.



 Note, I updated the original change to retain compatibility with
 existing behaviour as far as possible. See:

 http://cvs.openssl.org/chngview?cn=22808

 Steve.


-- 
Rob Stradling
Senior Research & Development Scientist
COMODO - Creating Trust Online




[openssl.org #2836] [PATCH] Staple the correct OCSP Response when multiple certs are configured

2012-09-21 Thread Stephen Henson via RT
 [rob.stradl...@comodo.com - Fri Sep 21 15:55:39 2012]:
 
 Hi Steve.
 
 I saw your update (to 1.0.2 and HEAD), and I did start looking at 
 backporting it into my 1.0.1/1.0.0/0.9.8 patches.
 
 ssl_get_server_send_pkey() is not available in 1.0.1 and earlier, so the 
 t1_lib.c patch would have to be something like...
 
 + X509 *x;
 + x = ssl_get_server_send_cert(s);
 + /* If no certificate can't return certificate status */
 + if (x == NULL)
 + {
 +     s->tlsext_status_expected = 0;
 +     return 1;
 + }
 + /* Set current certificate to one we will use so
 +  * SSL_get_certificate et al can pick it up.
 +  */
 + s->cert->key->x509 = x;
 
 Is it OK to update s->cert->key->x509 like this?
 

No because you could end up with all sorts of bad things happening (keys
and certificates not matching, certificate types not matching and memory
leaks). Easiest solution is to also backport ssl_get_server_send_pkey see:
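
[Editor's sketch] The mismatch risk described above can be pictured with a
toy model in which a certificate and its private key travel together as a
pair. All types and function names here are hypothetical stand-ins, not
OpenSSL's real structures; integer ids replace the actual X509 and EVP_PKEY
objects.

```c
#include <stddef.h>

typedef struct {
    int cert_id;   /* stand-in for the certificate */
    int pkey_id;   /* stand-in for the private key that belongs to it */
} demo_cert_pkey;

typedef struct {
    demo_cert_pkey *key;       /* currently selected pair */
    demo_cert_pkey pkeys[2];   /* e.g. one pair per key type */
} demo_cert;

/* Risky: replaces only the certificate inside the selected pair, so the
 * pair's private key no longer belongs to its certificate. */
void demo_overwrite_cert(demo_cert *c, int cert_id)
{
    c->key->cert_id = cert_id;
}

/* Safer: repoint the selection at the pair that owns the certificate. */
void demo_select_pair(demo_cert *c, size_t idx)
{
    c->key = &c->pkeys[idx];
}

/* A pair is consistent when certificate and key carry the same id. */
int demo_pair_matches(const demo_cert *c)
{
    return c->key->cert_id == c->key->pkey_id;
}
```

Selecting a whole pair leaves every stored pair intact, whereas overwriting
one field silently breaks the pair that was selected at the time.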

http://cvs.openssl.org/chngview?cn=22840

Steve.
-- 
Dr Stephen N. Henson. OpenSSL project core developer.
Commercial tech support now available see: http://www.openssl.org



Re: [openssl.org #2836] [PATCH] Staple the correct OCSP Response when multiple certs are configured

2012-09-21 Thread Rob Stradling via RT
On 21/09/12 15:04, Stephen Henson via RT wrote:
 [rob.stradl...@comodo.com - Fri Sep 21 15:55:39 2012]:

 Hi Steve.

 I saw your update (to 1.0.2 and HEAD), and I did start looking at
 backporting it into my 1.0.1/1.0.0/0.9.8 patches.

 ssl_get_server_send_pkey() is not available in 1.0.1 and earlier, so the
 t1_lib.c patch would have to be something like...

 +X509 *x;
 +x = ssl_get_server_send_cert(s);
 +/* If no certificate can't return certificate status */
 +if (x == NULL)
 +{
 +    s->tlsext_status_expected = 0;
 +    return 1;
 +}
 +/* Set current certificate to one we will use so
 + * SSL_get_certificate et al can pick it up.
 + */
 +s->cert->key->x509 = x;

 Is it OK to update s->cert->key->x509 like this?


 No because you could end up with all sorts of bad things happening (keys
 and certificates not matching, certificate types not matching and memory
 leaks).

That's what I thought.

 Easiest solution is to also backport ssl_get_server_send_pkey see:

 http://cvs.openssl.org/chngview?cn=22840

I didn't think of that.  Thanks!

I'll prepare patches to backport 22840 to 1.0.0 and 0.9.8 (unless you or 
Ben get there first).

-- 
Rob Stradling
Senior Research & Development Scientist
COMODO - Creating Trust Online




Re: [openssl.org #2836] [PATCH] Staple the correct OCSP Response when multiple certs are configured

2012-09-21 Thread Rob Stradling via RT
On 21/09/12 15:12, Rob Stradling via RT wrote:
 On 21/09/12 15:04, Stephen Henson via RT wrote:
<snip>
 Easiest solution is to also backport ssl_get_server_send_pkey see:

 http://cvs.openssl.org/chngview?cn=22840

 I didn't think of that.  Thanks!

 I'll prepare patches to backport 22840 to 1.0.0 and 0.9.8 (unless you or
 Ben get there first).

http://cvs.openssl.org/patchset?cn=22840 applies cleanly (i.e. no failed 
hunks) on top of my patches for 1.0.0 and 0.9.8.

-- 
Rob Stradling
Senior Research & Development Scientist
COMODO - Creating Trust Online




Re: [PATCH 0/7] Phase one of sparc crypto opcode support.

2012-09-21 Thread David Miller
From: Andy Polyakov ap...@openssl.org
Date: Fri, 21 Sep 2012 11:36:16 +0200

 No, before thinking about 32-bit mode, I quickly ask what's with save-s
 without arguments?

Sorry, I just wrote that code as pseudo-code off the top of my
head without attending to all of the necessary details.

We would indeed need to allocate a minimal stack frame in each
save instruction.

It's just an oversight in my example code, that's all.


Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-21 Thread David Miller
From: Andy Polyakov ap...@openssl.org
Date: Fri, 21 Sep 2012 12:21:25 +0200

 I'll handle this, but differently. Specifically I won't go through GOT,
 but directly to variable, something like this:

I would like to politely request that you don't go down this road.

 .Lretl:
         retl
         nop
         ...
         sethi   %hi(var-.Lpic),%reg
 .Lpic:  call    .Lretl
         add     %o7,%lo(var-.Lpic),%reg

I honestly think it's easiest to simply generate correct PIC
sequences, as my macros are trying to do.

We can add whatever ifdefs and code generation cases we need to
sparc_arch.h.  The code that I'm emitting is identical to what GCC
generates on Linux and Solaris under Sparc regardless of which
assembler and linker are in use.

I should know, I wrote much of the sparc GCC backend.

If you describe to me what problems your scheme ran into, I can fix
them up.

Did you test if my code sequences work for you?  It is also important
to note that they are also specifically designed to be usable in leaf
functions.

BTW, the real long term answer is mark openssl internal symbols as
hidden and then use GOT_DATA optimization sequences which will get
rid of the GOT reference altogether.  But that requires some configure
checks to see if the assembler and linker support these constructs.

 As for OPENSSL_sparcv9cap_P itself. I'd rather extend it to int
 OPENSSL_sparcv9cap_P[2] and save %cfr as is to  OPENSSL_sparcv9cap_P[1].
 Any objections?

I think this is code masturbation at this early stage of the sparc
crypto opcode support implementation and is something we can clean up
later.


Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-21 Thread David Miller

Here is a more detailed reply specifically about generating
correct and optimal Sparc PIC sequences.

Let's get the non-PIC static case out of the way, we should
always use:

        set     symbol, %reg                ! 32-bit
        setx    symbol, %tmp_reg, %reg      ! 64-bit

Using calls to PIC stubs is completely pointless overhead when we are
doing a static build.

If we are generating PIC we need a stub function, there are a lot of
ways to do this.  One scheme is to simply emit a stub in each source
file where the stub is needed.

If the assembler and linker support got-data optimizations, we can
emit the following sequence:

        sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %PIC_REG
        call    __sparc_pic_stub
         or     %PIC_REG, %lo(_GLOBAL_OFFSET_TABLE_+4), %PIC_REG
        sethi   %gdop_hix22(symbol), %TMP
        xor     %TMP, %gdop_lox10(symbol), %TMP
        LDPTR   [%PIC_REG + %TMP], %REG, %gdop(symbol)

If the linker finds that the resolution of symbol (f.e. the symbol
is static to the compilation unit, or marked as 'hidden') can be done
at final link time, that LDPTR above will be optimized into:

add %PIC_REG, %TMP, %REG

The symbol offset will also be adjusted, as needed, in the %gdop_*()
sethi and xor instructions.  And finally, the reference to the global
offset table slot that would have been generated for 'symbol', will be
removed.

Otherwise, if the linker and assembler lack gotdata optimization
support, we use just a plain PIC sequence:

        sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %PIC_REG
        call    __sparc_pic_stub
         or     %PIC_REG, %lo(_GLOBAL_OFFSET_TABLE_+4), %PIC_REG
        sethi   %hi(symbol), %TMP
        or      %TMP, %lo(symbol), %TMP
        LDPTR   [%PIC_REG + %TMP], %REG

If this doesn't work in some cases, we need to discover exactly
why instead of dismissing my approach completely.

Now, of course, all of the above is for -fPIC, but I see no sparc
target (nor any target except one strange hpux case) that specifies
-fpic instead of -fPIC in Configure.

However that case is simple to accommodate as well, and I'd be happy to
do so in my macros.

About the RAS miss cost, every Sun-produced UltraSPARC chip
pushes unconditionally onto the RAS and does not special-case the

        call    .+8

pattern.

Thinking about this logically, a RAS miss can (at best) perform like a
full branch misprediction, which on UltraSPARC results in a full
pipeline flush, as the mispredicted fetched instructions need to be
cancelled and cleared out of the pipeline so we can begin executing
down the correct path.

This can be huge, depending upon the contents of the improperly
fetched path of instructions.  In the worst possible case, up to 18
instructions can need to be cancelled (UltraSPARC-I programmers
manual, section 16.2.9, page 270)

Worse than the immediate cost of the RAS corruption, is that every
subsequent function return out of openssl is going to miss the RAS
and incur the penalty as well.
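
[Editor's sketch] The knock-on effect described above can be reproduced
with a toy model of a return address stack: every call pushes a predicted
return address, every return pops one and compares it with the real
target. The depth, the names and the overflow behaviour (new pushes are
dropped when full) are simplifying assumptions, not a description of any
particular UltraSPARC implementation.

```c
#include <stddef.h>

#define RAS_DEPTH 8

typedef struct {
    unsigned long entry[RAS_DEPTH];
    size_t top;              /* number of valid entries */
    unsigned mispredicts;    /* returns whose prediction was wrong */
} ras_model;

void ras_init(ras_model *r)
{
    r->top = 0;
    r->mispredicts = 0;
}

/* call, or jmpl with rd=%o7: push the predicted return address. */
void ras_call(ras_model *r, unsigned long return_addr)
{
    if (r->top < RAS_DEPTH)
        r->entry[r->top++] = return_addr;
}

/* ret/retl: pop the prediction and compare it with the actual target. */
void ras_return(ras_model *r, unsigned long actual_target)
{
    unsigned long predicted = r->top ? r->entry[--r->top] : 0;
    if (predicted != actual_target)
        r->mispredicts++;
}
```

In this model, two nested calls followed by one extra unpaired push (the
call .+8 case) make both of the following returns mispredict, while the
same nesting without the unpaired push predicts both returns correctly.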

I consider it absolutely critical that the PIC sequences support being
used in leaf functions, without save and restore instructions.  And my
macros have been designed with this in mind.

When used, one need not allocate a register window merely for the sake
of performing a PIC sequence.

When we get past these initial patches and I post my DES work, you
will see that I adjusted des_enc.m4 to use the new PIC interfaces I
created.  In fact I had to, because the 13-bit relocations used there
no longer fit with the crypto opcode code added.

There are other problems in des_enc.m4, which I have fixed in my
patches.  As just one other example, it doesn't include opensslconf.h
and therefore OPENSSL_SYSNAME_ULTRASPARC is never defined and the V9
sequences are never used for 32-bit, which hurts performance.

Only one valid set of CPP tests exists for the various cases we care
about on sparc.  __PIC__ means PIC code generation is in use.
__arch64__ means 64-bit code generation, and __sparc_v9__ means V9
code can be used.  These are fully standardized and both SunPRO and
GCC set them consistently.
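
[Editor's sketch] A minimal example of keying code on the three
preprocessor tests named above. The function names are hypothetical, and
the results are only meaningful when compiling for a sparc target; on other
hosts __arch64__ and __sparc_v9__ will simply be undefined.

```c
/* 1 when the compiler is generating position-independent code. */
int demo_is_pic(void)
{
#if defined(__PIC__)
    return 1;
#else
    return 0;
#endif
}

/* 1 when 64-bit code generation is in use. */
int demo_is_arch64(void)
{
#if defined(__arch64__)
    return 1;
#else
    return 0;
#endif
}

/* 1 when V9 instructions may be emitted. */
int demo_can_use_v9(void)
{
#if defined(__sparc_v9__)
    return 1;
#else
    return 0;
#endif
}
```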


[PATCH 0/2] Sparc AES crypto opcode support.

2012-09-21 Thread David Miller

This builds on top of the 7 patch series I sent the other day which
laid the foundation for sparc crypto opcode support.

The first patch plugs in optimized versions of key expansion and
AES_{decrypt,encrypt}()

The second patch is modelled on the AESNI support and explicitly
optimizes ECB, CBC, CTR, OFB, and CFB modes.  I'll do the remaining
modes soon.

I've put this through a battery of tests, and in particular I hacked
up a local copy of test/test_aesni (which doesn't seem to get run even
on x86?) that uses the appropriate sparc environment variable to turn
off crypto opcode usage.  That script helped a lot during validation.

The 35GB/sec benchmark result in the second patch is not a typo :-)

Signed-off-by: David S. Miller da...@davemloft.net


[PATCH 1/2] sparc: Add initial support for AES opcodes.

2012-09-21 Thread David Miller

Currently AES_encrypt, AES_decrypt, and the key expansion are
optimized.  Direct support for CBC, ECB, CTR, etc. will come
in subsequent changes.

The following measurements were taken on a SPARC-T4.

Baseline (OPENSSL_sparcv9cap=0):

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc     85241.72k    90930.60k    94282.67k    95158.95k    95087.08k
aes-192 cbc     73300.41k    77576.49k    80022.95k    80657.75k    80838.66k
aes-256 cbc     64390.17k    67656.43k    69442.30k    69893.80k    70022.49k

With AES opcodes enabled:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc    298612.77k   353669.87k   389577.22k   400843.61k   406031.02k
aes-192 cbc    282841.19k   323486.85k   364641.37k   375664.98k   378989.23k
aes-256 cbc    269449.24k   310281.81k   343170.05k   352550.23k   355317.08k

There were several interesting implementation issues dealt with here.

The AES opcodes need the decryption key in a different format than the
generic sparc v9 code wants (basically, no pre-application of the
MixColumn).  To address this and also to facilitate using the AES
opcodes for key expansion, a new aes_sparccore.c file is used in place
of aes_core.c when building for sparcv9.

The non-AES-opcode sparc code was changed to use a real proper PIC sequence
with sparc_arch.h macros.  The code which was there flushes the UltraSPARC
return address stack, negatively impacting performance.

Any call, or jmpl with destination register %o7, that lacks a paired
ret/retl will effectively corrupt the return address stack, making
every subsequent ret/retl miss the cache and take a full pipeline
flush.

The sparc_arch.h PIC loading sequences lack this problem, and also
they know how to do non-PIC loading of symbol addresses even more
efficiently.

Next, usage of the AES instructions is unnecessarily difficult if
the key is not 8-byte aligned.  So we use a trick so that we always
have an aligned key to work with.

We determine if the AES_KEY is 8 or 4 byte aligned; these are the only
two possibilities on sparc.  If it is 8 byte aligned, we use the
existing interpretation of the AES_KEY contents.  However, if it is
4 byte aligned, we put the ->rounds value first and then the key
so that the key becomes 8-byte aligned.  All of the aes_sparccore.c
and aes-sparcv9.pl code is aware of this convention.

Since we don't have any control over the alignment of the input buffers,
output buffers, and input key, we make use of alignaddr, faligndata,
and masked partial stores to deal with the unaligned cases.
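
[Editor's sketch] The alignment convention described above can be modelled
with a 61-word buffer: 60 words of key schedule plus one word for the
rounds count. The type and helper names are hypothetical, and the layout is
only an illustration of the idea, not the code from aes_sparccore.c.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_KEY_WORDS 60   /* 4 * (AES_MAXNR + 1) with AES_MAXNR == 14 */

/* Stand-in for AES_KEY: key schedule words plus one rounds word. */
typedef struct {
    uint32_t w[DEMO_KEY_WORDS + 1];
} demo_aes_key;

/* Index of the first key-schedule word: 0 when the object itself is
 * 8-byte aligned, 1 when it is only 4-byte aligned, so that the key
 * schedule always starts on an 8-byte boundary. */
size_t demo_key_index(const demo_aes_key *k)
{
    return ((uintptr_t)k & 7u) == 0 ? 0 : 1;
}

/* Index of the rounds word: after the key in the aligned case, in front
 * of it in the 4-byte aligned case. */
size_t demo_rounds_index(const demo_aes_key *k)
{
    return ((uintptr_t)k & 7u) == 0 ? DEMO_KEY_WORDS : 0;
}

/* Pointer to the 8-byte aligned key schedule. */
const uint32_t *demo_key_ptr(const demo_aes_key *k)
{
    return &k->w[demo_key_index(k)];
}
```

Either way the object still occupies the same 61 words; only the position
of the rounds word moves.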

Signed-off-by: David S. Miller da...@davemloft.net
---
 Configure |2 +-
 crypto/aes/Makefile   |4 +-
 crypto/aes/aes_sparccore.c|  272 ++
 crypto/aes/asm/aes-sparcv9.pl |  502 -
 crypto/sparc_arch.h   |   34 +++
 5 files changed, 802 insertions(+), 12 deletions(-)
 create mode 100644 crypto/aes/aes_sparccore.c

diff --git a/Configure b/Configure
index 2333a63..66b4ff8 100755
--- a/Configure
+++ b/Configure
@@ -130,7 +130,7 @@ my $x86_elf_asm=$x86_asm:elf;
 
 my $x86_64_asm=x86_64cpuid.o:x86_64-gcc.o x86_64-mont.o x86_64-mont5.o 
x86_64-gf2m.o modexp512-x86_64.o::aes-x86_64.o vpaes-x86_64.o bsaes-x86_64.o 
aesni-x86_64.o aesni-sha1-x86_64.o::md5-x86_64.o:sha1-x86_64.o sha256-x86_64.o 
sha512-x86_64.o::rc4-x86_64.o rc4-md5-x86_64.o:::wp-x86_64.o:cmll-x86_64.o 
cmll_misc.o:ghash-x86_64.o:e_padlock-x86_64.o;
 my $ia64_asm=ia64cpuid.o:bn-ia64.o ia64-mont.o::aes_core.o aes_cbc.o 
aes-ia64.o::md5-ia64.o:sha1-ia64.o sha256-ia64.o sha512-ia64.o::rc4-ia64.o 
rc4_skey.o:ghash-ia64.o::void;
-my $sparcv9_asm=sparcv9cap.o sparccpuid.o:bn-sparcv9.o sparcv9-mont.o 
sparcv9a-mont.o:des_enc-sparc.o fcrypt_b.o:aes_core.o aes_cbc.o 
aes-sparcv9.o::md5-sparcv9.o:sha1-sparcv9.o sha256-sparcv9.o 
sha512-sparcv9.o:::ghash-sparcv9.o::void;
+my $sparcv9_asm=sparcv9cap.o sparccpuid.o:bn-sparcv9.o sparcv9-mont.o 
sparcv9a-mont.o:des_enc-sparc.o fcrypt_b.o:aes_sparccore.o aes_cbc.o 
aes-sparcv9.o::md5-sparcv9.o:sha1-sparcv9.o sha256-sparcv9.o 
sha512-sparcv9.o:::ghash-sparcv9.o::void;
 my $sparcv8_asm=:sparcv8.o:des_enc-sparc.o fcrypt_b.o:void;
 my $alpha_asm=alphacpuid.o:bn_asm.o 
alpha-mont.o:sha1-alpha.o:::ghash-alpha.o::void;
 my $mips64_asm=:bn-mips.o mips-mont.o::aes_cbc.o aes-mips.o:::sha1-mips.o 
sha256-mips.o sha512-mips.o;
diff --git a/crypto/aes/Makefile b/crypto/aes/Makefile
index 8edd358..2f32983 100644
--- a/crypto/aes/Makefile
+++ b/crypto/aes/Makefile
@@ -66,8 +66,10 @@ aesni-x86_64.s: asm/aesni-x86_64.pl
 aesni-sha1-x86_64.s:   asm/aesni-sha1-x86_64.pl
        $(PERL) asm/aesni-sha1-x86_64.pl $(PERLASM_SCHEME) > $@
 
-aes-sparcv9.s: asm/aes-sparcv9.pl
+aes-sparcv9.S: asm/aes-sparcv9.pl
        $(PERL) asm/aes-sparcv9.pl $(CFLAGS) > $@
+aes-sparcv9.s: aes-sparcv9.S
+       $(CC) $(CFLAGS) -E aes-sparcv9.S > $@
 
 

[PATCH 2/2] sparc: Expand AES crypto opcodes support to various modes.

2012-09-21 Thread David Miller

On a SPARC-T4, with AES opcodes disabled (OPENSSL_sparcv9cap=0):

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc     75200.21k    83425.11k    86767.67k    87853.06k    88279.72k
aes-192 cbc     64906.68k    71059.56k    73902.42k    74532.52k    74855.77k
aes-256 cbc     56814.90k    61781.72k    63903.74k    64367.27k    64607.57k

And with them enabled:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc    501882.74k   836726.87k   993102.76k  1020379.48k  1054083.75k
aes-192 cbc    435068.22k   707080.77k   837915.90k   864243.03k   889279.83k
aes-256 cbc    393746.28k   620463.13k   727483.31k   749580.97k   769029.46k

This system is a T4-2 so it's fun to show off some parallel benchmarks,
for example openssl speed -multi 16 -evp aes-128-ecb gives:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
evp           7429568.93k 17815630.93k 28436597.93k 32033047.55k 35120630.44k

35GB/sec AES encryption, not too bad.
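
[Editor's sketch] The figures above can be checked with two small helpers.
openssl speed reports thousands of bytes per second, so 35120630.44k works
out to roughly 35.1 GB/s, and the aes-128 cbc rows at 8192 bytes give a
speedup of about 11.9x. The function names are hypothetical.

```c
/* Convert an `openssl speed` figure (thousands of bytes per second)
 * to decimal gigabytes per second. */
double demo_kbytes_to_gb_per_sec(double kbytes_per_sec)
{
    return kbytes_per_sec * 1000.0 / 1e9;
}

/* Ratio between the throughput with the opcodes enabled and the baseline. */
double demo_speedup(double with_opcodes, double baseline)
{
    return with_opcodes / baseline;
}
```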

Currently CBC, ECB, CTR, OFB, and CFB modes are explicitly optimized.
Other modes will be optimized in the future.

Signed-off-by: David S. Miller da...@davemloft.net
---
 Configure |2 +-
 crypto/aes/aes_sparccore.c|   55 
 crypto/aes/asm/aes-sparcv9.pl |  666 +
 crypto/evp/e_aes.c|  400 +
 crypto/sparc_arch.h   |   19 ++
 5 files changed, 1141 insertions(+), 1 deletion(-)

diff --git a/Configure b/Configure
index 66b4ff8..217a552 100755
--- a/Configure
+++ b/Configure
@@ -130,7 +130,7 @@ my $x86_elf_asm=$x86_asm:elf;
 
 my $x86_64_asm=x86_64cpuid.o:x86_64-gcc.o x86_64-mont.o x86_64-mont5.o 
x86_64-gf2m.o modexp512-x86_64.o::aes-x86_64.o vpaes-x86_64.o bsaes-x86_64.o 
aesni-x86_64.o aesni-sha1-x86_64.o::md5-x86_64.o:sha1-x86_64.o sha256-x86_64.o 
sha512-x86_64.o::rc4-x86_64.o rc4-md5-x86_64.o:::wp-x86_64.o:cmll-x86_64.o 
cmll_misc.o:ghash-x86_64.o:e_padlock-x86_64.o;
 my $ia64_asm=ia64cpuid.o:bn-ia64.o ia64-mont.o::aes_core.o aes_cbc.o 
aes-ia64.o::md5-ia64.o:sha1-ia64.o sha256-ia64.o sha512-ia64.o::rc4-ia64.o 
rc4_skey.o:ghash-ia64.o::void;
-my $sparcv9_asm=sparcv9cap.o sparccpuid.o:bn-sparcv9.o sparcv9-mont.o 
sparcv9a-mont.o:des_enc-sparc.o fcrypt_b.o:aes_sparccore.o aes_cbc.o 
aes-sparcv9.o::md5-sparcv9.o:sha1-sparcv9.o sha256-sparcv9.o 
sha512-sparcv9.o:::ghash-sparcv9.o::void;
+my $sparcv9_asm=sparcv9cap.o sparccpuid.o:bn-sparcv9.o sparcv9-mont.o 
sparcv9a-mont.o:des_enc-sparc.o fcrypt_b.o:aes_sparccore.o 
aes-sparcv9.o::md5-sparcv9.o:sha1-sparcv9.o sha256-sparcv9.o 
sha512-sparcv9.o:::ghash-sparcv9.o::void;
 my $sparcv8_asm=:sparcv8.o:des_enc-sparc.o fcrypt_b.o:void;
 my $alpha_asm=alphacpuid.o:bn_asm.o 
alpha-mont.o:sha1-alpha.o:::ghash-alpha.o::void;
 my $mips64_asm=:bn-mips.o mips-mont.o::aes_cbc.o aes-mips.o:::sha1-mips.o 
sha256-mips.o sha512-mips.o;
diff --git a/crypto/aes/aes_sparccore.c b/crypto/aes/aes_sparccore.c
index 2842cbc..658cc66 100644
--- a/crypto/aes/aes_sparccore.c
+++ b/crypto/aes/aes_sparccore.c
@@ -36,6 +36,7 @@
 #include <stdlib.h>
 #include <openssl/crypto.h>
 #include <openssl/aes.h>
+#include <openssl/modes.h>
 #include "aes_locl.h"
 
 #include "sparc_arch.h"
@@ -270,3 +271,57 @@ int AES_set_decrypt_key(const unsigned char *userKey, const int bits,
}
return 0;
 }
+
+void aes_sparc_hw_cbc_encrypt(const unsigned char *in, unsigned char *out,
+ size_t length, const AES_KEY *key,
+ unsigned char *ivec, int enc);
+
+void AES_cbc_encrypt(const unsigned char *in, unsigned char *out,
+size_t len, const AES_KEY *key,
+unsigned char *ivec, const int enc)
+{
+   const void *aligned_in;
+   void *aligned_out;
+   int aligned_len;
+   size_t bl = 16;
+
+   if (!(OPENSSL_sparcv9cap_P & SPARCV9_AES))
+       goto slow;
+
+   aligned_len = len & ~(bl - 1);
+   if (!aligned_len)
+       goto trailing;
+
+   aligned_out = out;
+   if ((unsigned long) out & 0x7) {
+       aligned_out = OPENSSL_malloc(aligned_len);
+       if (!aligned_out)
+           goto slow;
+   }
+   aligned_in = in;
+   if ((unsigned long)in & 0x7) {
+       memcpy(aligned_out, in, aligned_len);
+       aligned_in = (const void *) aligned_out;
+   }
+
+   aes_sparc_hw_cbc_encrypt(aligned_in, aligned_out, aligned_len,
+                            key, ivec, enc);
+
+   if ((unsigned long)out & 0x7) {
+   memcpy(out, aligned_out, aligned_len);
+   OPENSSL_free(aligned_out);
+   }
+trailing:
+   len -= aligned_len;
+   if (len) {
+   out += aligned_len;
+   in += aligned_len;
+slow:
+   if (enc)
+   CRYPTO_cbc128_encrypt(in, out, 

[PATCH] sparc: Add support for CAMELLIA opcodes.

2012-09-21 Thread David Miller

On a SPARC T4-2, with CAMELLIA opcodes disabled:

type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
camellia-128 cbc    63737.35k    66054.61k    66780.50k    66775.35k    67062.44k
camellia-192 cbc    51126.33k    53836.78k    54761.73k    54964.91k    55017.47k
camellia-256 cbc    51126.24k    53774.55k    54760.02k    54963.54k    55017.47k

with CAMELLIA opcodes enabled:

type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
camellia-128 cbc   483488.94k   608627.31k   646251.78k   645825.54k   657945.94k
camellia-192 cbc   396779.71k   474317.61k   497627.22k   497634.65k   504881.15k
camellia-256 cbc   396796.10k   474297.19k   497624.06k   497644.20k   504872.96k

Signed-off-by: David S. Miller da...@davemloft.net
---

If this is applied before the sparc AES opcode patches, there is a minor
and easy to resolve conflict in the top-level Configure file.

Tested on the full matrix of {static,shared}/linux{,64}-sparcv9
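
For readers comparing the two speed tables in the commit message, the relative gain is just the ratio of the figures in matching cells. A minimal sketch of that arithmetic, with `speedup` as a hypothetical helper name that is not part of the patch:

```c
#include <assert.h>

/* Ratio of two throughput figures expressed in the same unit
 * (here: thousands of bytes per second, as in the tables). */
static double speedup(double baseline, double accelerated)
{
    assert(baseline > 0.0);
    return accelerated / baseline;
}
```

With the camellia-128 cbc figures, `speedup(63737.35, 483488.94)` comes to roughly 7.6 for 16-byte blocks and `speedup(67062.44, 657945.94)` to roughly 9.8 for 8192-byte blocks.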

 Configure  |2 +-
 crypto/camellia/Makefile   |2 +
 crypto/camellia/asm/cmll-sparcv9.S |  604 
 crypto/camellia/cmll_sparccore.c   |  219 +
 crypto/sparc_arch.h|   11 +
 5 files changed, 837 insertions(+), 1 deletion(-)
 create mode 100644 crypto/camellia/asm/cmll-sparcv9.S
 create mode 100644 crypto/camellia/cmll_sparccore.c

diff --git a/Configure b/Configure
index 217a552..b4cbb56 100755
--- a/Configure
+++ b/Configure
@@ -130,7 +130,7 @@ my $x86_elf_asm="$x86_asm:elf";
 
 my $x86_64_asm="x86_64cpuid.o:x86_64-gcc.o x86_64-mont.o x86_64-mont5.o x86_64-gf2m.o modexp512-x86_64.o::aes-x86_64.o vpaes-x86_64.o bsaes-x86_64.o aesni-x86_64.o aesni-sha1-x86_64.o::md5-x86_64.o:sha1-x86_64.o sha256-x86_64.o sha512-x86_64.o::rc4-x86_64.o rc4-md5-x86_64.o:::wp-x86_64.o:cmll-x86_64.o cmll_misc.o:ghash-x86_64.o:e_padlock-x86_64.o";
 my $ia64_asm="ia64cpuid.o:bn-ia64.o ia64-mont.o::aes_core.o aes_cbc.o aes-ia64.o::md5-ia64.o:sha1-ia64.o sha256-ia64.o sha512-ia64.o::rc4-ia64.o rc4_skey.o:ghash-ia64.o::void";
-my $sparcv9_asm="sparcv9cap.o sparccpuid.o:bn-sparcv9.o sparcv9-mont.o sparcv9a-mont.o:des_enc-sparc.o fcrypt_b.o:aes_sparccore.o aes-sparcv9.o::md5-sparcv9.o:sha1-sparcv9.o sha256-sparcv9.o sha512-sparcv9.o:::ghash-sparcv9.o::void";
+my $sparcv9_asm="sparcv9cap.o sparccpuid.o:bn-sparcv9.o sparcv9-mont.o sparcv9a-mont.o:des_enc-sparc.o fcrypt_b.o:aes_sparccore.o aes-sparcv9.o::md5-sparcv9.o:sha1-sparcv9.o sha256-sparcv9.o sha512-sparcv9.o::cmll-sparcv9.o cmll_sparccore.o:ghash-sparcv9.o::void";
 my $sparcv8_asm=":sparcv8.o:des_enc-sparc.o fcrypt_b.o:void";
 my $alpha_asm="alphacpuid.o:bn_asm.o alpha-mont.o:sha1-alpha.o:::ghash-alpha.o::void";
 my $mips64_asm=":bn-mips.o mips-mont.o::aes_cbc.o aes-mips.o:::sha1-mips.o sha256-mips.o sha512-mips.o";
diff --git a/crypto/camellia/Makefile b/crypto/camellia/Makefile
index 8858dd0..6802393 100644
--- a/crypto/camellia/Makefile
+++ b/crypto/camellia/Makefile
@@ -48,6 +48,8 @@ cmll-x86.s:   asm/cmll-x86.pl ../perlasm/x86asm.pl
$(PERL) asm/cmll-x86.pl $(PERLASM_SCHEME) $(CFLAGS) $(PROCESSOR) > $@
 cmll-x86_64.s:  asm/cmll-x86_64.pl
$(PERL) asm/cmll-x86_64.pl $(PERLASM_SCHEME) > $@
+cmll-sparcv9.s: asm/cmll-sparcv9.S
+   $(CC) $(CFLAGS) -E asm/cmll-sparcv9.S > $@
 
 files:
$(PERL) $(TOP)/util/files.pl Makefile >> $(TOP)/MINFO
diff --git a/crypto/camellia/asm/cmll-sparcv9.S b/crypto/camellia/asm/cmll-sparcv9.S
new file mode 100644
index 0000000..015d5ee
--- /dev/null
+++ b/crypto/camellia/asm/cmll-sparcv9.S
@@ -0,0 +1,604 @@
+/* Written by David S. Miller da...@davemloft.net for the OpenSSL
+ * project. The module is, however, dual licensed under OpenSSL and
+ * CRYPTOGAMS licenses depending on where you obtain it. For further
+ * details see http://www.openssl.org/~appro/cryptogams/.
+ */
+
+#include "sparc_arch.h"
+
+#ifdef __arch64__
+   .register   %g2,#scratch
+   .register   %g3,#scratch
+#endif
+
+#define CAMELLIA_6ROUNDS(KEY_BASE, I0, I1) \
+   CAMELLIA_F(KEY_BASE +  0, I1, I0, I1) \
+   CAMELLIA_F(KEY_BASE +  2, I0, I1, I0) \
+   CAMELLIA_F(KEY_BASE +  4, I1, I0, I1) \
+   CAMELLIA_F(KEY_BASE +  6, I0, I1, I0) \
+   CAMELLIA_F(KEY_BASE +  8, I1, I0, I1) \
+   CAMELLIA_F(KEY_BASE + 10, I0, I1, I0)
+
+#define CAMELLIA_6ROUNDS_FL_FLI(KEY_BASE, I0, I1) \
+   CAMELLIA_6ROUNDS(KEY_BASE, I0, I1) \
+   CAMELLIA_FL(KEY_BASE + 12, I0, I0) \
+   CAMELLIA_FLI(KEY_BASE + 14, I1, I1)
+
+   .data
+
+   .align  8
+SIGMA: .xword  0xA09E667F3BCC908B
+   .xword  0xB67AE8584CAA73B2
+   .xword  0xC6EF372FE94F82BE
+   .xword  0x54FF53A5F1D36F1C
+   .xword  0x10E527FADE682D1D
+   .xword  0xB05688C2B3E6C1FD
+
+   .text
+
+SPARC_PIC_THUNK(g3)
+
+   .align  32
+   .globl  sparc_hw_camellia_ekeygen
+   .type   sparc_hw_camellia_ekeygen,#function