[openssl-dev] [openssl.org #2650] major ssl read/ write performance improvement - updated

2016-06-12 Thread Rich Salz via RT
Sorry it took so long to look at this.

The code has changed significantly since then, including making the structures
opaque.

Please open a new ticket (or GitHub pull request) against current sources if
this is still an issue.

-- 
Ticket here: http://rt.openssl.org/Ticket/Display.html?id=2650
Please log in as guest with password guest if prompted

-- 
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl.org #2650] major ssl read/ write performance improvement - updated

2011-12-11 Thread Ryan Brown
I'm getting more SSL timeouts when running apachebench with this patch enabled:

http://www.pastie.org/3002992
__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #2650] major ssl read/ write performance improvement - updated

2011-12-09 Thread Deng Michael
Hi Andrey,

I measured on a chip (Cavium Octeon) that supports crypto acceleration and
runs no OS. My setup does not involve TCP I/O, since the TCP data has already
been received and is passed to SSL through a custom BIO (or mem BIO). I
measured SSL_read and SSL_write (about 1K size) in ms (aes256_cbc/sha1). The
measurement is done with CPU ticks, and the numbers are roughly:

without any change or crypto accel:  170ms (almost linear in the record size)
with crypto accel only:               54ms (or something like that; the
                                      acceleration is done on the same Cavium
                                      CPU through the engine interface)
with the patch:                       25ms

Since there is no OS, the code runs to completion and I/O is done separately.
The memory allocation is based on Cavium-provided code. For me the saving is
fixed, so the percentage depends on the other parts. I don't have a way of
measuring it when I/O is involved.
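
For reference, the ratios implied by the three figures above can be worked
out directly. This is only arithmetic on the numbers as reported (unit as
given in the message), written as a short Python sketch; the variable and
function names are illustrative.

```python
# Timings as reported in the message above (unit as reported there).
baseline = 170.0       # no patch, no crypto acceleration
accel_only = 54.0      # crypto acceleration, no patch
accel_patched = 25.0   # crypto acceleration plus the patch


def speedup(before, after):
    """Ratio of the old time to the new time."""
    return before / after


# The difference between the last two figures is the fixed saving per record.
fixed_saving = accel_only - accel_patched

print(f"accel vs. baseline:    {speedup(baseline, accel_only):.2f}x")
print(f"patch vs. accel only:  {speedup(accel_only, accel_patched):.2f}x")
print(f"saving per ~1K record: {fixed_saving:.0f}")
```

The second ratio is the one behind the "more than double" claim for 1K
records with acceleration enabled.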

Regards,
Michael




Re: [openssl.org #2650] major ssl read/ write performance improvement - updated

2011-12-09 Thread Deng Michael
I forgot to mention that what I tested was a slightly different implementation
that contains a couple of other small optimizations. In the tls1_mac() function
I combined the first two update calls into one call, which saved a couple of ms
as well. The numbers were TLS numbers.
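
Merging two update calls is safe because a hash over consecutive updates
equals a hash over the concatenated input. The Python sketch below shows
that property with hashlib; the buffers (an 8-byte sequence number followed
by a 5-byte record header) are an assumption made for illustration, since
the thread does not show which two calls in tls1_mac() were merged.

```python
import hashlib

# Hypothetical inputs, chosen only to illustrate the equivalence.
seq_num = (7).to_bytes(8, "big")                         # 8-byte sequence number
header = bytes([23, 3, 1]) + (1000).to_bytes(2, "big")   # type, version, length
payload = b"x" * 1000

# Three update calls: sequence number, header, payload.
three_calls = hashlib.sha1()
three_calls.update(seq_num)
three_calls.update(header)
three_calls.update(payload)

# Two update calls: the first two buffers are merged into one.
two_calls = hashlib.sha1()
two_calls.update(seq_num + header)
two_calls.update(payload)

same_digest = three_calls.digest() == two_calls.digest()
print(same_digest)
```

The digest is unchanged; any gain comes from one fewer call through the
digest layer per record.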

As for the question of record size: the smaller the record, the larger the
percentage saving, since the saving is fixed.
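
A rough model of that statement, as a Python sketch. It uses the 54 and 25
figures reported for a 1K record earlier in this thread and ASSUMES that the
remaining (patched) cost scales linearly with record size; that linear model
is an illustration, not something measured here.

```python
FIXED_SAVING = 54.0 - 25.0   # per-record saving, from the reported 1K timings
PATCHED_1K = 25.0            # reported time for a ~1K record with the patch


def patched_cost(size):
    """Assumed cost with the patch: linear in the record size."""
    return PATCHED_1K * size / 1000.0


def percent_saved(size):
    """Share of the unpatched time that the fixed saving represents."""
    unpatched = patched_cost(size) + FIXED_SAVING
    return 100.0 * FIXED_SAVING / unpatched


for size in (100, 1000, 16000):
    print(size, round(percent_saved(size), 1))
```

Under this model the percentage shrinks as records grow, because the same
fixed saving is divided by a larger total.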




Re: [openssl.org #2650] major ssl read/ write performance improvement - updated

2011-12-09 Thread Deng Michael
I just tried a simple TLS echo server running on the chip with only one core
enabled (to rule out malloc contention between CPU cores; this is a 16-core
CPU). I did two runs, and the ONLY difference in the code is with/without
checkpointing the ctx. Both have crypto accel. The speed is measured on the
data part only, and each SSL write is of size 1000 on the client side. The
OpenSSL code is running in thread-protected mode (lock callbacks registered).

The server has no OS (with the TCP stack in software). The overall performance
difference is close to double (54 vs 96). This may also indicate that the
memory allocation and deallocation routines in my setup are not very good.

On a system with no OS, I think the timing is more indicative of software
efficiency. For my setup the unknown is the memory arena malloc/free calls.



Re: [openssl.org #2650] major ssl read/ write performance improvement - updated

2011-12-08 Thread Andrey Kulikov
Hello Michael,

I have tested your patch.
It works stably, at least with the ccgost engine (and without any
engine too, of course).
Thanks for the contribution!

Could you please describe your test environment and test methodology?
How did you measure the doubling of read/write speed? What tool/profiler
did you use?
How does it depend on SSL record size?
What is the overall speed improvement if we count OS I/O?

I'm asking because I'm trying to measure the performance improvement your
changes can give with my crypto accelerator, and my results are not even
close to double the read/write speed.
But my test resources are limited at the moment, and it is possible
that this is due to those limitations.

In any case, I guess the community will be grateful if you share your experience.

WBR,
Andrey



[openssl.org #2650] major ssl read/ write performance improvement - updated

2011-12-05 Thread Deng Michael via RT
Hi,
I have changed the MAC code, which gives a substantial improvement for both
read and write (not handshake).

The saving is fairly major: on a CPU with crypto acceleration, the change
can more than double the overall SSL read/write speed for 1K records,
excluding OS I/O time. This implies the change removed the majority of the
code overhead for read and write.

The basic idea is to remove all the EVP_MD_CTX duplications (which are very
CPU-intensive) during read and write. The original code involves numerous
memory allocations and frees for each read or write, all due to the ctx's
deep copy.
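
The copy-per-record pattern described above can be sketched with Python's
hmac module: a context is keyed once, then duplicated for every record so
that the keyed state survives finalization. This is an analogy, not the
OpenSSL code; the key, the function name and the simplified MAC input
(sequence number plus payload only) are assumptions for illustration.

```python
import hmac

mac_key = b"k" * 20   # hypothetical MAC secret

# Keyed once per connection; never finalized directly.
base_ctx = hmac.new(mac_key, digestmod="sha1")


def record_mac(seq_num, record):
    """MAC one record by duplicating the keyed context first."""
    ctx = base_ctx.copy()   # a new context object is created per record
    ctx.update(seq_num.to_bytes(8, "big"))
    ctx.update(record)
    return ctx.digest()


first = record_mac(0, b"hello")
again = record_mac(0, b"hello")
other = record_mac(1, b"hello")
print(first == again, first != other)
```

Each call to copy() creates a fresh object, which is the per-record cost
the patch is aiming at.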

The new way of keeping the ctx is to make it do a state checkpoint and
restore instead of a deep copy. After this change there is NO memory
allocation or free for read and write. The changes are not too big either.
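
A minimal sketch of the checkpoint/restore idea, in Python. The class below
is a toy 32-bit FNV-1a running hash standing in for a digest context; it is
NOT the patch and NOT an OpenSSL API, and every name in it is hypothetical.
It only shows the shape of the technique: save the keyed state once into a
preallocated slot, then rewind to it after each record instead of copying
the context.

```python
class RunningHash:
    """Toy 32-bit FNV-1a running hash; stands in for a digest context."""

    OFFSET = 0x811C9DC5
    PRIME = 0x01000193

    def __init__(self):
        self.state = self.OFFSET
        self.saved = self.OFFSET   # preallocated checkpoint slot

    def update(self, data):
        for byte in data:
            self.state = ((self.state ^ byte) * self.PRIME) & 0xFFFFFFFF

    def checkpoint(self):
        """Remember the current state in the existing slot."""
        self.saved = self.state

    def restore(self):
        """Rewind to the remembered state."""
        self.state = self.saved

    def digest(self):
        return self.state


ctx = RunningHash()
ctx.update(b"per-connection key material")   # hypothetical keyed prefix
ctx.checkpoint()


def record_mac(record):
    """Tag one record, then rewind the single shared context."""
    ctx.update(record)
    tag = ctx.digest()
    ctx.restore()   # back to the keyed state; no new context object
    return tag


print(record_mac(b"a") == record_mac(b"a"))
```

Because the single context is modified in place between update and restore,
two callers using it at the same time would interfere with each other.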

One catch (it should not really be a catch) is that at the application level
NO MORE than one thread can work on the SAME SSL/TLS connection for read or
for write (a read and a write can be done at the same time). But I would think
most apps would NEVER allow more than one thread to read or write on the same
connection (I don't think it would work if you did that anyway, even without
my change).
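
One way an application could enforce that rule is one lock per direction, so
that at most one reader and one writer touch a connection at a time while a
read and a write may still overlap. The Python sketch below is hypothetical:
the Connection wrapper and the read_fn/write_fn callables (standing in for
SSL_read and SSL_write) are assumptions, not part of the patch.

```python
import threading


class Connection:
    """Hypothetical wrapper that serializes access per direction."""

    def __init__(self, read_fn, write_fn):
        self._read_fn = read_fn      # stands in for SSL_read
        self._write_fn = write_fn    # stands in for SSL_write
        self._read_lock = threading.Lock()
        self._write_lock = threading.Lock()

    def read(self, n):
        # Only one thread may be reading this connection at a time.
        with self._read_lock:
            return self._read_fn(n)

    def write(self, data):
        # Only one thread may be writing this connection at a time;
        # this lock is independent of the read lock.
        with self._write_lock:
            return self._write_fn(data)
```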

The patch file I attached is based on version 1.0.0e.


Andrey found a problem in the original version of the patch when a PKEY_METHS
engine is used, so this is an updated patch (complete, not incremental) to fix
that.

This checkpoint/restore is enabled if a PKEY_METHS engine is used UNLESS the
engine code implements the control interface to do the checkpointing/restore.

As pointed out by others, there can be other ways to achieve a similar thing,
and the saving also depends on your system's memory allocation routines. Also,
part of the patch looks a bit like a hack.


Thanks to Andrey!

Regards,
Michael



checkpoint.patch
Description: Binary data


Re: [openssl.org #2650] major ssl read/ write performance improvement - updated

2011-12-05 Thread Fanboy
Got a patch for trunk also?
