[openssl-dev] [openssl.org #2650] major ssl read/ write performance improvement - updated

2016-06-12 Thread Rich Salz via RT
Sorry it took so long to look at this.

The code has changed significantly since then, including making the structures
opaque.

Please open a new ticket (or GitHub pull request) against current sources if
this is still an issue.

-- 
Ticket here: http://rt.openssl.org/Ticket/Display.html?id=2650
Please log in as guest with password guest if prompted

-- 
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl.org #2650] major ssl read/ write performance improvement - updated

2011-12-11 Thread Ryan Brown
I'm getting more SSL timeouts when running apachebench with this patch enabled:

http://www.pastie.org/3002992
__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #2650] major ssl read/ write performance improvement - updated

2011-12-09 Thread Deng Michael
Hi Andrey,

I measured on a chip with no OS that supports crypto acceleration (Cavium
Octeon). My setup does not involve TCP I/O, since the TCP data has already been
received and is passed to SSL through a custom BIO (or mem BIO). I measure
SSL_read or SSL_write (about 1K in size) in ms (aes256_cbc/sha1). The
measurement is done through CPU ticks, and the numbers are roughly:

without the patch and without crypto accel: 170ms (almost linear in the size
                                                   of the record)
with crypto accel only:                      54ms (or something like that;
                                                   the acceleration is done on
                                                   the same Cavium CPU through
                                                   the engine interface)
with the patch:                              25ms

Since there is no OS, the code runs to completion and I/O is done separately.
The memory allocation is based on Cavium-provided code. For me the saving is
fixed, so the percentage depends on the other parts. I don't have a way of
measuring it when I/O is involved.

Regards,
Michael




Re: [openssl.org #2650] major ssl read/ write performance improvement - updated

2011-12-09 Thread Deng Michael
I forgot to mention that when I tested, it was a slightly different
implementation that contains a couple of other small optimizations: in the
tls1_mac() function I combined the first two update calls into one call, which
saved a couple of ms as well. The numbers were TLS numbers.

As for the question of record size: the smaller the record, the larger the
percentage saving, since the saving is fixed.
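
A small sketch of the arithmetic behind the "fixed saving" remark, using the 54 ms and 25 ms figures reported earlier in this thread (a saving of 29 ms per call). The function names are made up for illustration, and any baseline other than 54 ms is a hypothetical input rather than a measurement.

```c
/* Percentage of the baseline time removed by a fixed saving.
 * Both arguments use the same unit as the measurement (ms here). */
double saving_percent(double baseline_ms, double saving_ms)
{
    return 100.0 * saving_ms / baseline_ms;
}

/* Speed-up factor: time before the change divided by time after it. */
double speedup(double baseline_ms, double saving_ms)
{
    return baseline_ms / (baseline_ms - saving_ms);
}
```

With the reported numbers, saving_percent(54.0, 29.0) is about 53.7% and speedup(54.0, 29.0) is 2.16. If a larger record took a hypothetical 108 ms before the change, the same 29 ms saving would be only about 26.9% of the total, which is the sense in which smaller records benefit more.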




Re: [openssl.org #2650] major ssl read/ write performance improvement - updated

2011-12-09 Thread Deng Michael
I just tried a simple TLS echo server running on the chip with only one core
enabled (to rule out malloc contention between CPU cores; this is a 16-core
CPU). I did two runs, and the ONLY difference in code is with/without
checkpointing the ctx. Both have crypto accel. The speed is measured on the
data part only, and each SSL write is of size 1000 on the client side. The
OpenSSL code is running in thread-protected mode (registered lock callbacks).

The server has no OS (with the TCP stack in software). The overall performance
difference is close to double (54 vs 96). This may also indicate that the
memory allocation and deallocation routines in my setup are not very good.

In a system with no OS I think the timing is more indicative of software
efficiency. For my setup the unknown is the memory arena malloc/free calls.



Re: [openssl.org #2650] major ssl read/ write performance improvement - updated

2011-12-08 Thread Andrey Kulikov
Hello Michael,

I have tested your patch.
It works stably, at least with the ccgost engine (and without any
engine too, of course).
Thanks for the contribution!

Could you please describe your test environment and test methodology?
How did you measure that doubling of read/write speed? What tool/profiler
did you use?
How does it depend on the SSL record size?
What is the overall speed improvement if we count OS I/O?

I'm asking because I'm trying to measure the performance improvement your
changes can give with my crypto accelerator, and my results are not even
close to double the read/write speed.
But my test resources are limited at the moment, and it is possible
that this is due to those limitations.

In any case, I guess the community will be grateful if you share your experience.

WBR,
Andrey



[openssl.org #2650] major ssl read/ write performance improvement - updated

2011-12-05 Thread Deng Michael via RT
Hi,
I have changed the MAC code, which gives a substantial improvement for both
read and write (not handshake).

The saving is fairly major: on a CPU with crypto acceleration, the change
can more than double the overall SSL read/write speed for a 1K record,
excluding OS I/O time. This implies the change removed the majority of the
code overhead for read and write.

The basic idea is to remove all the EVP_MD_CTX duplications (which are very
CPU intensive) during read and write. The original code involves numerous
memory allocations and frees for each read or write, all due to the ctx's
deep copy.

The new way of keeping the ctx is to make it do a state checkpoint and
restore instead of a deep copy. After this change there is NO memory
operation for read and write. The changes are not too big either.

One catch (it should not really be a catch) is that at application level NO
MORE than one thread can work on the SAME SSL/TLS connection for read or for
write (a read and a write can be done at the same time). But I would think
most apps would NEVER allow more than one thread to read or write on the
same connection (I don't think it would work if you did that anyway, even
without my change).

The patch file I attached is based on version 1.0.0e.


Andrey found a problem in the original version of the patch when a PKEY_METHS
engine is used, so this is an updated patch (complete, not incremental)
to fix that.

This checkpoint/restore is not enabled if a PKEY_METHS engine is used, UNLESS
the engine code implements the control interface to do the checkpoint/restore.

As pointed out by others, there can be other ways to achieve a similar thing,
and the saving also depends on your system's memory allocation routines. Also,
parts of the patch look a bit like a hack.


Thanks to Andrey!

Regards,
Michael



checkpoint.patch
Description: Binary data
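
The difference between the two strategies described in this message can be sketched with a toy running hash. This is not the attached patch and it does not use the OpenSSL API: every name below (toy_md and its functions) is hypothetical, and a single 32-bit FNV-1a word stands in for the real digest state. The copy path allocates and frees per record; the checkpoint path saves the state in place and rolls it back afterwards.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy keyed-hash context; the heap-allocated state mimics a digest's md_data. */
struct toy_md {
    uint32_t *state;          /* working state, allocated once at init */
    uint32_t  checkpoint;     /* saved copy of *state */
    int       has_checkpoint; /* non-zero once checkpoint holds valid data */
};

int toy_md_init(struct toy_md *c, uint32_t key)
{
    c->state = malloc(sizeof *c->state);
    if (c->state == NULL)
        return 0;
    *c->state = 2166136261u ^ key;   /* FNV-1a offset basis mixed with a key */
    c->checkpoint = 0;
    c->has_checkpoint = 0;
    return 1;
}

void toy_md_update(struct toy_md *c, const unsigned char *p, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        *c->state ^= p[i];
        *c->state *= 16777619u;      /* FNV-1a prime */
    }
}

void toy_md_free(struct toy_md *c)
{
    free(c->state);
    c->state = NULL;
}

/* Deep-copy style: one malloc and one free for every record. */
int toy_mac_with_copy(const struct toy_md *base, const unsigned char *rec,
                      size_t n, uint32_t *out)
{
    struct toy_md tmp;
    tmp.state = malloc(sizeof *tmp.state);
    if (tmp.state == NULL)
        return 0;
    *tmp.state = *base->state;
    tmp.checkpoint = 0;
    tmp.has_checkpoint = 0;
    toy_md_update(&tmp, rec, n);
    *out = *tmp.state;
    toy_md_free(&tmp);
    return 1;
}

/* Checkpoint style: save the state in place, hash, roll back. No allocation. */
uint32_t toy_mac_with_checkpoint(struct toy_md *base, const unsigned char *rec,
                                 size_t n)
{
    uint32_t out;
    base->checkpoint = *base->state;
    base->has_checkpoint = 1;
    toy_md_update(base, rec, n);
    out = *base->state;
    *base->state = base->checkpoint;  /* restore for the next record */
    return out;
}
```

Both functions produce the same value for the same record. The checkpoint variant temporarily modifies the shared context, which is why, in a design like this, only one thread at a time may use a given context.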


Re: [openssl.org #2650] major ssl read/ write performance improvement - updated

2011-12-05 Thread Fanboy
Got a patch for trunk also?



major ssl read/ write performance improvement - updated

2011-12-04 Thread Deng Michael
Hi,
I have changed the MAC code, which gives a substantial improvement for both
read and write (not handshake).

The saving is fairly major: on a CPU with crypto acceleration, the change
can more than double the overall SSL read/write speed for a 1K record,
excluding OS I/O time. This implies the change removed the majority of the
code overhead for read and write.

The basic idea is to remove all the EVP_MD_CTX duplications (which are very
CPU intensive) during read and write. The original code involves numerous
memory allocations and frees for each read or write, all due to the ctx's
deep copy.

The new way of keeping the ctx is to make it do a state checkpoint and
restore instead of a deep copy. After this change there is NO memory
operation for read and write. The changes are not too big either.

One catch (it should not really be a catch) is that at application level NO
MORE than one thread can work on the SAME SSL/TLS connection for read or for
write (a read and a write can be done at the same time). But I would think
most apps would NEVER allow more than one thread to read or write on the
same connection (I don't think it would work if you did that anyway, even
without my change).

The patch file I attached is based on version 1.0.0e.


Andrey found a problem in the original version of the patch when a PKEY_METHS
engine is used, so this is an updated patch (complete, not incremental)
to fix that.

This checkpoint/restore is not enabled if a PKEY_METHS engine is used, UNLESS
the engine code implements the control interface to do the checkpoint/restore.

As pointed out by others, there can be other ways to achieve a similar thing,
and the saving also depends on your system's memory allocation routines. Also,
parts of the patch look a bit like a hack.


Thanks to Andrey!

Regards,
Michael


checkpoint.patch
Description: Binary data


Re: major ssl read/ write performance improvement

2011-12-03 Thread Andrey Kulikov
Hello,

Thanks for the interesting contribution!

Unfortunately, when I apply the patch, s_server fails with a SEGFAULT
when using the ccgost engine (and possibly others) here:

EVP_DigestSignFinal
        if (sctx)
                r = md_ctx_ptr->pctx->pmeth->signctx(md_ctx_ptr->pctx,
                                        sigret, siglen, md_ctx_ptr);
        else

because
pmeth->signctx == 0x08
(or something like this)

When I use an RSA certificate the segfault doesn't occur, as pmeth->signctx
points to some valid place.

Stacktrace is:

EVP_DigestSignFinal (ctx=0x87802b0, sigret=0xbfd5f6dc
\\\b\\002x\\b\\001\, siglen=0xbfd5f698)
tls1_mac (ssl=0x877a088, md=0xbfd5f6dc \\\b\\002x\\b\\001\, send=0)
ssl3_get_record (s=0x877a088)
ssl3_read_bytes (s=0x877a088, type=22, buf=0x8788d50 \\\020\, len=4, peek=0)
ssl3_get_message (s=0x877a088, st1=8608, stn=8609, mt=-1, max=514,
ok=0xbfd5f8b8)
ssl3_get_cert_verify (s=0x877a088)
ssl3_accept (s=0x877a088)
ssl3_read_bytes (s=0x877a088, type=23, buf=0x877e7e8 \\, len=4096, peek=0)
ssl3_read_internal (s=0x877a088, buf=0x877e7e8, len=4096, peek=0)
ssl3_read (s=0x877a088, buf=0x877e7e8, len=4096)
SSL_read (s=0x877a088, buf=0x877e7e8, num=4096)
ssl_read (b=0x8779370, out=0x877e7e8 \\, outl=4096)
BIO_read (b=0x8779370, out=0x877e7e8, outl=4096)
buffer_gets (b=0x8777e00, buf=0x877a7e0 \\, size=16382)
BIO_gets (b=0x8777e00, in=0x877a7e0 \\, inl=16383)
www_body (hostname=0x0, s=6, context=0x0)
do_server (port=443, type=1, ret=0x8248ac8, cb=0x8072d24 www_body,
context=0x0)
s_server_main (argc=0, argv=0xbfd602b8)
do_cmd (prog=0x8770868, argc=16, argv=0xbfd60278)
main (Argc=16, Argv=0xbfd60278)


Could you please advise what is going wrong with your code?

To check it you need to:
1. Adjust your openssl.cnf file by adding:

openssl_conf = openssl_def

[openssl_def]
engines = engine_section

[engine_section]
gost = gost_section

[gost_section]
engine_id = gost
default_algorithms = ALL

somewhere before [ new_oids ] (if we are talking about the sample config
file from the OpenSSL distribution).

2. Generate a private key:
./apps/openssl genpkey -engine gost -algorithm gost2001 -pkeyopt paramset:A -out botkey.p8

3. Create a self-signed certificate:
./apps/openssl req -x509 -days 1095 -subj '/C=US/CN=ccgost_srv/O=sam...@mail.com' -engine gost -new -key botkey.p8 -out botcert.cer

4. Run s_server:
./apps/openssl s_server -engine gost -tls1 -www -accept 443 -state -cert botcert.cer -key botkey.p8 -cipher aGOST01

5. Run s_client:
./apps/openssl s_client -tls1 -connect 192.168.10.103:443 -msg

Well... here s_client will crash with a segfault. But if you
connect via a browser, s_server will crash.


Please let me know if you have any questions.

Andrey.


On 30 November 2011 05:56, Deng Michael mdeng...@yahoo.com wrote:
 Thanks Steve for the comment.


 I guess there are other ways to do similar things. Since I was not sure about
 the intentions of the original code, I was trying to make the change in a way
 such that when checkpoint is not called it behaves like before. Adding a
 new field is, for me, less likely to interfere with other code. It seems to me
 that the three evp_md_ctxs contained within the hmac_md_ctx have the data for
 restoring the state, but I was not sure. Also, the new field serves as a flag
 to tell whether it has checkpoint data (I could have used an existing flag).
 My patch also contains some hacking, I would think.


 Anyway, the real saving comes from redoing the state preservation of the
 evp_md_ctx, which contains an evp_pkey_ctx, which in turn contains an
 hmac_ctx, which again contains three evp_md_ctx's. The dups of these are
 called in

 tls1_mac() (and the similar place for ssl3)
 and
 EVP_DigestSignFinal()

 These two are the super expensive ones (really super).

 The copy of ctx in
 HMAC_Final() --- this one is not too bad


 can be simplified.


 I would think the saving is so large that it is worth changing, maybe in
 future releases.

 regards,
 Michael



 - Original Message -
 From: Dr. Stephen Henson st...@openssl.org
 To: openssl-dev@openssl.org
 Cc:
 Sent: Tuesday, November 29, 2011 1:21 PM
 Subject: Re: major ssl read/ write performance improvement

 On Mon, Nov 28, 2011, Deng Michael wrote:

 Hi,

 I have changed the mac code which gives substantial improvement for both 
 read and write (not handshake)

 The saving is fairly major, on cpu with crypto acceleration, the change can 
 more than double the overall ssl read /write speed for 1K record excluding 
 OS IO time. this implies the change removed majority of the code overhead 
 for read and write.

 The basic idea is to remove all the EVP_MD_CTX duplications (which is very 
 cpu intensive) during read and write. the original code involves numerous 
 memory allocations and frees for each read or write all due to the ctx's 
 deep copy.

 the new way of keeping the ctx is to make it do state checkpoint and restore 
 instead of deep copy, after this change there is NO memory operation for 
 read and write. The changes

Re: major ssl read/ write performance improvement

2011-12-03 Thread Deng Michael
Hi Andrey,

Thanks for trying it out. I did not try this version with many engines. I am
very interested in your setup. Could you try how it works without the patch
(under gdb)?

What is the value of ctx->pctx->pmeth->signctx when the function is entered,
and what is tmp_ctx.pctx->pmeth->signctx after the ctx copy?


Also, I am not sure how you use engines. The patch should work if a digest
engine is used (a digest engine such as sha1 or md5). I am not sure if there is
a signing engine.

It would be great if you could send me the engine code and show how your code
uses the engine; then we could figure out how to escape that. I am not sure how
the pointer is set up by OpenSSL (I'll do some digging there), but the value
0x08 is likely coming from a member of a structure reached through a NULL
pointer (the member happens to be at offset 8). This is a guess.

Regards,

Michael Deng

mdeng...@yahoo.com
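
The guess above, that a value like 0x08 is what you get when a member is reached through a NULL structure pointer, can be illustrated with offsetof. The layout below is a made-up stand-in and not the real OpenSSL method structure: it simply places a function pointer after two 32-bit fields so that the member sits at offset 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: two 32-bit fields followed by a function pointer. */
struct toy_pmeth {
    int32_t pkey_id;
    int32_t flags;
    int (*signctx)(void *ctx);
};

/* Address of the signctx member for a structure located at `base`.
 * With base == 0 (a NULL structure pointer) the result is just the
 * member's offset, which is the kind of small value seen in a debugger. */
uintptr_t toy_member_address(uintptr_t base)
{
    return base + offsetof(struct toy_pmeth, signctx);
}
```

Here toy_member_address(0) is 8 on common 32-bit and 64-bit ABIs, since the two int32_t fields occupy the first eight bytes. A tiny "pointer" value like this usually means the enclosing structure pointer was NULL or never set, not that the member itself holds a valid address.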



Re: major ssl read/ write performance improvement

2011-12-03 Thread Deng Michael
Hi Andrey again,

Maybe there is a bug in the patch:


        if(EVP_MD_CTX_has_checkpoint(ctx)){
          md_ctx_ptr = ctx;
        }

should be changed to


        if(EVP_MD_CTX_has_checkpoint(ctx)){
          md_ctx_ptr = ctx;
        }
        else {
          EVP_MD_CTX_init(&tmp_ctx);
        }
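
Why a missing init call on the fallback path can crash is easier to see with a toy context whose copy routine first releases whatever the destination already holds. This is only a sketch under that assumption: the toy_buf names are hypothetical and the code does not reproduce the OpenSSL internals.

```c
#include <stdlib.h>
#include <string.h>

/* Toy context that owns a heap buffer. */
struct toy_buf {
    unsigned char *data;
    size_t len;
};

/* Put the context into a known empty state. */
void toy_buf_init(struct toy_buf *c)
{
    c->data = NULL;
    c->len = 0;
}

/* Release the buffer and return to the empty state. */
void toy_buf_cleanup(struct toy_buf *c)
{
    free(c->data);
    toy_buf_init(c);
}

/* Replace the contents of c with a private copy of p[0..n). */
int toy_buf_set(struct toy_buf *c, const void *p, size_t n)
{
    unsigned char *copy = malloc(n ? n : 1);
    if (copy == NULL)
        return 0;
    memcpy(copy, p, n);
    free(c->data);
    c->data = copy;
    c->len = n;
    return 1;
}

/* Copy src into dst. The cleanup call reads dst->data, so a dst that was
 * never initialised would hand a garbage pointer to free(). */
int toy_buf_copy(struct toy_buf *dst, const struct toy_buf *src)
{
    toy_buf_cleanup(dst);
    if (src->len == 0)
        return 1;
    return toy_buf_set(dst, src->data, src->len);
}
```

A temporary declared on the stack therefore needs toy_buf_init() before it is passed to toy_buf_copy(), which mirrors the else branch added in the fix above.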



- Original Message -
From: Andrey Kulikov amde...@gmail.com
To: openssl-dev@openssl.org
Cc: 
Sent: Saturday, December 3, 2011 5:15 PM
Subject: Re: major ssl read/ write performance improvement

Hello,

Thanks for interesting contribution!

Unfortunately when I apply the patch s_server failed with SEGFAULT,
when using ccgost engine (and possibly others) here:

EVP_DigestSignFinal
        if (sctx)
      r = md_ctx_ptr-pctx-pmeth-signctx(md_ctx_ptr-pctx,
                               sigret, siglen, md_ctx_ptr);
        else

because of
pmeth-signctx == 0x08
(or something like this)

When I use RSA certificate segfault didn't occur, as pmeth-signctx
points to some valid place.

Stacktrace is:

EVP_DigestSignFinal (ctx=0x87802b0, sigret=0xbfd5f6dc
\\\b\\002x\\b\\001\, siglen=0xbfd5f698)
tls1_mac (ssl=0x877a088, md=0xbfd5f6dc \\\b\\002x\\b\\001\, send=0)
ssl3_get_record (s=0x877a088)
ssl3_read_bytes (s=0x877a088, type=22, buf=0x8788d50 \\\020\, len=4, peek=0)
ssl3_get_message (s=0x877a088, st1=8608, stn=8609, mt=-1, max=514,
ok=0xbfd5f8b8)
ssl3_get_cert_verify (s=0x877a088)
ssl3_accept (s=0x877a088)
ssl3_read_bytes (s=0x877a088, type=23, buf=0x877e7e8 \\, len=4096, peek=0)
ssl3_read_internal (s=0x877a088, buf=0x877e7e8, len=4096, peek=0)
ssl3_read (s=0x877a088, buf=0x877e7e8, len=4096)
SSL_read (s=0x877a088, buf=0x877e7e8, num=4096)
ssl_read (b=0x8779370, out=0x877e7e8 \\, outl=4096)
BIO_read (b=0x8779370, out=0x877e7e8, outl=4096)
buffer_gets (b=0x8777e00, buf=0x877a7e0 \\, size=16382)
BIO_gets (b=0x8777e00, in=0x877a7e0 \\, inl=16383)
www_body (hostname=0x0, s=6, context=0x0)
do_server (port=443, type=1, ret=0x8248ac8, cb=0x8072d24 www_body,
context=0x0)
s_server_main (argc=0, argv=0xbfd602b8)
do_cmd (prog=0x8770868, argc=16, argv=0xbfd60278)
main (Argc=16, Argv=0xbfd60278)


Could you please advice, what going wrong with your code???

Go check it you need:
1. Adjust your openssl.cnf file, bu adding there:

openssl_conf = openssl_def

[openssl_def]
engines = engine_section

[engine_section]
gost = gost_section

[gost_section]
engine_id = gost
default_algorithms = ALL

somewhere before [ new_oids ] (if we are talking about the sample config
file from the OpenSSL distribution).

2. Generate private key:
./apps/openssl genpkey -engine gost -algorithm gost2001 -pkeyopt
paramset:A -out botkey.p8

3. Create a self-signed certificate:
./apps/openssl req -x509 -days 1095 -subj
'/C=US/CN=ccgost_srv/O=sam...@mail.com' -engine gost -new -key
botkey.p8 -out botcert.cer

4. Run s_server
./apps/openssl s_server  -engine gost  -tls1 -www -accept 443  -state
-cert botcert.cer  -key botkey.p8 -cipher aGOST01

5. Run s_client
./apps/openssl s_client -tls1 -connect 192.168.10.103:443 -msg

Here s_client will crash with a segfault... But if you connect
via a browser, s_server will crash.


Please let me know if you have any questions.

Andrey.


Re: major ssl read/ write performance improvement

2011-11-29 Thread Dr. Stephen Henson
Thanks for the patch. It should really go to the request tracker (RT), though.

There are a few problems with the patch as it stands.

Firstly, new features will never be added to 1.0.0x, only security and bug
fixes.

Your patch adds a field in the middle of an EVP_MD_CTX, which will result in
binary compatibility issues with existing applications, so including it in
1.0.1 would be problematic too. Adding the field at the end would result in
fewer problems, but it would still increase the size of EVP_MD_CTX.
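
A toy illustration of the binary-compatibility point above. The three struct
layouts and the function names are hypothetical and are not the real
EVP_MD_CTX definition. The sketch only shows that a field inserted in the
middle moves the members after it, so code compiled against the old layout
would look for them at the wrong offset, while a field appended at the end
leaves existing offsets alone and only grows the structure.

```c
#include <stddef.h>

/* Hypothetical "old" layout that already-compiled code was built against. */
struct ctx_old {
    const void *digest;
    unsigned long flags;
    void *md_data;
    void *pctx;
};

/* Hypothetical layout with a new field inserted in the middle. */
struct ctx_mid {
    const void *digest;
    unsigned long flags;
    void *checkpoint;   /* new field: shifts md_data and pctx */
    void *md_data;
    void *pctx;
};

/* Hypothetical layout with the new field appended at the end. */
struct ctx_end {
    const void *digest;
    unsigned long flags;
    void *md_data;
    void *pctx;
    void *checkpoint;   /* new field: existing offsets unchanged */
};

/* Offset of pctx as seen by code built against each layout. */
size_t pctx_offset_old(void) { return offsetof(struct ctx_old, pctx); }
size_t pctx_offset_mid(void) { return offsetof(struct ctx_mid, pctx); }
size_t pctx_offset_end(void) { return offsetof(struct ctx_end, pctx); }
```

In this model pctx_offset_old() and pctx_offset_end() agree, while
pctx_offset_mid() is larger; sizeof grows in both modified layouts.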

However, I wonder if the same savings could be achieved in a different way. If
the destination EVP_MD_CTX uses the same digest as the existing one, no new
memory is allocated and it should simply memcpy the result across, which should
be a far less expensive operation.

So perhaps, instead of having a temporary EVP_MD_CTX which is created and
destroyed regularly, we could have a more persistent one tied to the SSL
structure, so that the initial copy would allocate memory but subsequent ones
would only be a memcpy? Adding fields at the end of an SSL structure is likely
to cause far fewer problems because SSL structures are allocated using
SSL_new().
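
The persistent-context idea can be sketched with a toy model. Everything here
(struct toy_ctx, toy_copy, the allocation counter) is a hypothetical stand-in
and not the OpenSSL API. It assumes only the behaviour described above: a copy
into a destination that already holds state for the same digest reuses the
existing buffer, so only the first copy allocates.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a digest context; not the real EVP_MD_CTX. */
struct toy_ctx {
    int digest_id;      /* which digest this context is for */
    size_t state_len;   /* size of the digest state in bytes */
    void *state;        /* heap-allocated digest state */
};

static unsigned long toy_allocs;    /* malloc calls made by toy_copy */

/* Copy src into dst, reusing dst's buffer when the digest matches. */
int toy_copy(struct toy_ctx *dst, const struct toy_ctx *src)
{
    if (dst->state == NULL || dst->digest_id != src->digest_id
        || dst->state_len != src->state_len) {
        void *p = malloc(src->state_len);
        if (p == NULL)
            return 0;
        toy_allocs++;
        free(dst->state);
        dst->state = p;
        dst->digest_id = src->digest_id;
        dst->state_len = src->state_len;
    }
    memcpy(dst->state, src->state, src->state_len);
    return 1;
}

/* n records with one long-lived destination; returns the allocation count. */
unsigned long toy_run_persistent(int n)
{
    unsigned char keyed[64] = {0};
    struct toy_ctx base = { 1, sizeof(keyed), keyed };
    struct toy_ctx scratch = { 0, 0, NULL };
    int i;

    toy_allocs = 0;
    for (i = 0; i < n; i++)
        if (!toy_copy(&scratch, &base))
            break;
    free(scratch.state);
    return toy_allocs;
}

/* n records with a fresh temporary destination created for each one. */
unsigned long toy_run_temporary(int n)
{
    unsigned char keyed[64] = {0};
    struct toy_ctx base = { 1, sizeof(keyed), keyed };
    int i;

    toy_allocs = 0;
    for (i = 0; i < n; i++) {
        struct toy_ctx tmp = { 0, 0, NULL };
        if (!toy_copy(&tmp, &base))
            break;
        free(tmp.state);
    }
    return toy_allocs;
}
```

With the long-lived destination the model allocates once regardless of the
number of records; with a temporary destination it allocates once per record.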

Steve.
--
Dr Stephen N. Henson. OpenSSL project core developer.
Commercial tech support now available see: http://www.openssl.org


Re: major ssl read/ write performance improvement

2011-11-29 Thread Deng Michael
Thanks, Steve, for the comments.


I guess there are other ways to achieve the same thing. Since I was not sure
about the intentions of the original code, I tried to make the change so that
the code behaves as before when checkpoint is not called. For me, adding a new
field was less likely to interfere with other code. It seems to me that the
three evp_md_ctxs contained within the hmac_md_ctx hold the data needed to
restore the state, but I was not sure. The new field also serves as a flag
indicating whether checkpoint data is present (I could have used an existing
flag instead). I would also admit that my patch contains some hacks.


Anyway, the real saving comes from reworking how the state is preserved for the
evp_md_ctx, which contains an evp_pkey_ctx, which in turn contains an hmac_ctx,
which again contains three evp_md_ctxs. These are duplicated in

tls1_mac() (and the similar place for ssl3)
and
EVP_DigestSignFinal()

These two are the really expensive ones.

The copy of the ctx in
HMAC_Final() (this one is not too bad)


can also be simplified.


I think the saving is large enough to be worth making the change, perhaps in a
future release.

regards,
Michael





major ssl read/ write performance improvement

2011-11-28 Thread Deng Michael
Hi,

I have changed the MAC code, which gives a substantial improvement for both
read and write (not the handshake).

The saving is fairly major: on a CPU with crypto acceleration, the change can
more than double the overall SSL read/write speed for 1K records, excluding OS
I/O time. This implies that the change removes the majority of the code
overhead for read and write.

The basic idea is to remove all the EVP_MD_CTX duplications (which are very CPU
intensive) during read and write. The original code performs numerous memory
allocations and frees for each read or write, all due to the deep copy of the
ctx.

The new way of keeping the ctx is to checkpoint and restore its state instead
of making a deep copy. After this change there is NO memory allocation or free
during read and write. The changes are also not too big.
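
A minimal sketch of the checkpoint-and-restore idea, using a toy running hash
(FNV-1a style) in place of a real digest. The struct and all function names are
hypothetical; this is not the code from the patch and not a real MAC. The keyed
state is absorbed once and checkpointed, and each record then rewinds to that
state instead of copying a whole context.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy running hash state; a stand-in for a real digest context. */
struct toy_md {
    uint64_t state;         /* live state, changed by every update */
    uint64_t saved;         /* checkpointed state */
    int has_checkpoint;     /* flag: is 'saved' valid? */
};

void toy_init(struct toy_md *c)
{
    c->state = UINT64_C(0xcbf29ce484222325);
    c->saved = 0;
    c->has_checkpoint = 0;
}

void toy_update(struct toy_md *c, const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t i;

    for (i = 0; i < len; i++) {
        c->state ^= p[i];
        c->state *= UINT64_C(0x100000001b3);
    }
}

/* Remember the current state so later records can rewind to it. */
void toy_checkpoint(struct toy_md *c)
{
    c->saved = c->state;
    c->has_checkpoint = 1;
}

/* Rewind to the checkpoint; returns 0 if none was taken. */
int toy_restore(struct toy_md *c)
{
    if (!c->has_checkpoint)
        return 0;
    c->state = c->saved;
    return 1;
}

/* Tag for one record: rewind to the keyed state, then absorb the record. */
uint64_t toy_record_tag(struct toy_md *c, const void *rec, size_t len)
{
    if (!toy_restore(c))
        return 0;
    toy_update(c, rec, len);
    return c->state;
}

/* Reference path: rebuild the keyed state from scratch for every record. */
uint64_t toy_record_tag_slow(const void *key, size_t klen,
                             const void *rec, size_t rlen)
{
    struct toy_md c;

    toy_init(&c);
    toy_update(&c, key, klen);
    toy_update(&c, rec, rlen);
    return c.state;
}
```

A caller would run toy_init, toy_update with the key and toy_checkpoint once
per connection, then toy_record_tag for each record. Because the context is
mutated in place, two threads must not compute tags on the same context at the
same time.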

One catch (which should not really be a catch) is that, at the application
level, NO MORE than one thread can work on the SAME SSL/TLS connection for read
or for write (a read and a write can still be done at the same time). But I
would think most apps would NEVER allow more than one thread to read or write
on the same connection (I don't think it would work if you did that anyway,
even without my change).

The patch file I attached is made against version 1.0.0e.

Happy coding!

Michael


checkpoint.patch
Description: Binary data