[openssl-dev] [openssl.org #2650] major ssl read/ write performance improvement - updated
Sorry it took so long to look at this. The code has changed significantly since then, including making the structures opaque. Please open a new ticket (or GitHub pull request) against current sources if this is still an issue. -- Ticket here: http://rt.openssl.org/Ticket/Display.html?id=2650 Please log in as guest with password guest if prompted -- openssl-dev mailing list To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Re: [openssl.org #2650] major ssl read/ write performance improvement - updated
I'm getting more SSL timeouts when running apachebench with this patch enabled: http://www.pastie.org/3002992 __ OpenSSL Project http://www.openssl.org Development Mailing List openssl-dev@openssl.org Automated List Manager majord...@openssl.org
Re: [openssl.org #2650] major ssl read/ write performance improvement - updated
Hi Andrey, I measured on a chip that supports crypto acceleration (Cavium Octeon) and runs with no OS. My setup does not involve TCP I/O, since the TCP data has already been received and is passed to SSL through a custom BIO (or memory BIO). I measure SSL_read or SSL_write (about 1K in size) in ms (aes256_cbc/sha1). The measurement is done through CPU ticks, and the numbers are roughly: without any change and without crypto acceleration, 170ms (almost linear in the record size); with crypto acceleration only, 54ms (or something like that; the acceleration is done on the same Cavium CPU through the engine interface); with the patch, 25ms. Since there is no OS, the code runs to completion and I/O is done separately. Memory allocation is based on Cavium-provided code. For me the saving is fixed, so the percentage depends on the other parts. I don't have a way of measuring it when I/O is involved. Regards, Michael - Original Message - From: Andrey Kulikov amde...@gmail.com To: openssl-dev@openssl.org Cc: Sent: Thursday, December 8, 2011 4:11 PM Subject: Re: [openssl.org #2650] major ssl read/ write performance improvement - updated
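Michael's figures are CPU-tick counts taken on a bare-metal Octeon, so they cannot be reproduced directly on a hosted system. A minimal sketch of the same methodology in standard C, assuming `clock()` processor time as a stand-in for the cycle counter: time many back-to-back calls and divide by the count, which also smooths over the coarse clock resolution. `mean_cpu_us_per_call` and `count_calls` are hypothetical helpers; in a real test the operation would wrap a single `SSL_read()` or `SSL_write()` of about 1K through a memory BIO, as described in the message above.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical operation under test, called once per iteration. */
typedef void (*op_fn)(void *arg);

/* Mean processor time per call in microseconds, or -1.0 if the
 * iteration count is zero or clock() is unavailable. clock() stands
 * in for a raw CPU cycle counter here. */
static double mean_cpu_us_per_call(op_fn op, void *arg, size_t iterations)
{
    if (iterations == 0)
        return -1.0;
    clock_t start = clock();
    if (start == (clock_t)-1)
        return -1.0;
    for (size_t i = 0; i < iterations; i++)
        op(arg);
    clock_t end = clock();
    if (end == (clock_t)-1)
        return -1.0;
    return 1e6 * (double)(end - start) / CLOCKS_PER_SEC / (double)iterations;
}

/* Stand-in operation that only counts its calls; a real harness would
 * perform one ~1K SSL_read()/SSL_write() here instead. */
static void count_calls(void *arg)
{
    (*(unsigned long *)arg)++;
}
```

Running the same harness against builds with and without the patch gives the per-record difference; since that saving is a fixed cost, its share of the total shrinks as records grow.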
Re: [openssl.org #2650] major ssl read/ write performance improvement - updated
I forgot to mention that when I tested, it was a slightly different implementation containing a couple of other small optimizations: in the tls1_mac() function I combined the first two update calls into one call, which also saved a couple of ms. The numbers were TLS numbers. As for the question of record size, the smaller the record, the larger the percentage saving, since the saving is fixed. - Original Message - From: Deng Michael mdeng...@yahoo.com To: openssl-dev@openssl.org openssl-dev@openssl.org Cc: Sent: Friday, December 9, 2011 5:15 PM Subject: Re: [openssl.org #2650] major ssl read/ write performance improvement - updated
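The message does not show the combined call. A sketch, assuming the first two updates in tls1_mac() feed the 8-byte sequence number and the 5-byte record header (content type, protocol version, fragment length), which is the prefix the TLS 1.x MAC covers before the fragment (RFC 2246 / RFC 5246): build those 13 bytes in one buffer so a single update call can consume them. `tls_mac_prefix` is a hypothetical helper, not an OpenSSL function.

```c
#include <stdint.h>

/* Hypothetical helper: write the TLS 1.x MAC prefix into one 13-byte
 * buffer: 8-byte big-endian sequence number, then the record header
 * (content type, 2-byte protocol version, 2-byte fragment length). */
static void tls_mac_prefix(unsigned char out[13], uint64_t seq,
                           unsigned char type, uint16_t version,
                           uint16_t length)
{
    for (int i = 7; i >= 0; i--) {   /* big-endian: low byte goes last */
        out[i] = (unsigned char)(seq & 0xff);
        seq >>= 8;
    }
    out[8]  = type;
    out[9]  = (unsigned char)(version >> 8);   /* major, e.g. 3 */
    out[10] = (unsigned char)(version & 0xff); /* minor, e.g. 1 for TLS 1.0 */
    out[11] = (unsigned char)(length >> 8);
    out[12] = (unsigned char)(length & 0xff);
}

/* In the MAC routine, one update over the prefix would then replace the
 * two separate updates (sketch only, 'update' is a placeholder):
 *     unsigned char prefix[13];
 *     tls_mac_prefix(prefix, seq, type, version, length);
 *     update(ctx, prefix, sizeof prefix);
 *     update(ctx, fragment, length);
 */
```

A hash consumes its input as one byte stream, so one 13-byte update yields the same MAC as an 8-byte update followed by a 5-byte one; the saving is only the per-call overhead.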
Re: [openssl.org #2650] major ssl read/ write performance improvement - updated
I just tried a simple TLS echo server running on the chip with only one core enabled (to rule out malloc contention between CPU cores; this is a 16-core CPU). I did two runs, and the ONLY difference in the code is with/without checkpointing the ctx. Both have crypto acceleration. The speed is measured on the data part only, and each SSL write on the client side is 1000 bytes. The OpenSSL code runs in thread-protected mode (lock callbacks registered). The server has no OS (the TCP stack is in software). The overall performance difference is close to double (54 vs 96). This may also indicate that the memory allocation and deallocation routines in my setup are not very good. In a system with no OS, I think the timing is more indicative of software efficiency; for my setup, the unknown is the memory arena malloc/free calls. - Original Message - From: Deng Michael mdeng...@yahoo.com To: openssl-dev@openssl.org openssl-dev@openssl.org Cc: Sent: Friday, December 9, 2011 5:34 PM Subject: Re: [openssl.org #2650] major ssl read/ write performance improvement - updated
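Because the two runs differ only in whether the ctx is checkpointed, Michael suspects much of the gap is allocator cost. A minimal sketch of the two per-record patterns, using a hypothetical context whose running state lives in a separate heap block (loosely like the digest state behind an EVP_MD_CTX in 1.0.x, not the actual OpenSSL code) and a hypothetical counting allocator: the deep-copy path makes one malloc/free pair per record, while the restore path copies into a buffer allocated once and makes none.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical context: the running digest state lives in its own heap block. */
typedef struct {
    unsigned char *state;
    size_t len;
} heap_md_ctx;

/* Allocator call counters, to make the per-record cost visible. */
static unsigned long n_malloc = 0, n_free = 0;

static void *counted_malloc(size_t n) { n_malloc++; return malloc(n); }
static void counted_free(void *p) { if (p != NULL) { n_free++; free(p); } }

/* Deep-copy pattern: duplicate the keyed context for every record. */
static int mac_record_deep_copy(const heap_md_ctx *keyed)
{
    heap_md_ctx tmp;
    tmp.len = keyed->len;
    tmp.state = counted_malloc(tmp.len);
    if (tmp.state == NULL)
        return 0;
    memcpy(tmp.state, keyed->state, tmp.len);
    /* ... hash the record into tmp and finalize ... */
    counted_free(tmp.state);
    return 1;
}

/* Checkpoint pattern: restore into a work buffer allocated once per connection. */
static int mac_record_restore(const heap_md_ctx *keyed, heap_md_ctx *work)
{
    if (work->len != keyed->len)
        return 0;
    memcpy(work->state, keyed->state, keyed->len);
    /* ... hash the record into work and finalize ... */
    return 1;
}
```

With a slow or lock-protected allocator, the first pattern pays that cost on every record; the second pays it once per connection.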
Re: [openssl.org #2650] major ssl read/ write performance improvement - updated
Hello Michael, I have tested your patch. It works stably, at least with the ccgost engine (and without any engine too, of course). Thanks for the contribution! Could you please describe your test environment and test methodology? How did you measure that doubling of read/write speed? What tool/profiler did you use? How does it depend on SSL record size? What is the overall speed improvement if we count OS I/O? I'm asking because I'm trying to measure the performance improvement your changes can give with my crypto accelerator, and my results are not even close to double read/write speed. But my test resources are limited at the moment, and it is possible that this is due to these limitations. In any case, I guess the community will be grateful if you share your experience. WBR, Andrey On 5 December 2011 14:33, Deng Michael via RT r...@openssl.org wrote:
[openssl.org #2650] major ssl read/ write performance improvement - updated
Hi, I have changed the MAC code, which gives a substantial improvement for both read and write (not handshake). The saving is fairly major: on a CPU with crypto acceleration, the change can more than double the overall SSL read/write speed for a 1K record, excluding OS I/O time. This implies the change removed the majority of the code overhead for read and write. The basic idea is to remove all the EVP_MD_CTX duplications (which are very CPU-intensive) during read and write. The original code performs numerous memory allocations and frees for each read or write, all due to the deep copy of the ctx. The new approach makes the ctx do a state checkpoint and restore instead of a deep copy; after this change there are NO memory operations for read and write. The changes are also not too big. One catch (it should not really be a catch) is that at the application level NO MORE than one thread can read, and no more than one thread can write, on the SAME SSL/TLS connection (a read and a write can still be done at the same time). But I would think most apps would NEVER allow more than one thread to read or write on the same connection (I don't think it would work if you did that anyway, even without my change). The attached patch file is based on version 1.0.0e. Andrey found a problem in the original version of the patch when a PKEY_METHS engine is used, so this is an updated patch (complete, not incremental) that fixes that. When a PKEY_METHS engine is used, checkpoint/restore is enabled only if the engine code implements the control interface for checkpointing/restore. As others have pointed out, there may be other ways to achieve a similar thing, and the saving also depends on your system's memory allocation routines. Also, part of the patch looks a bit like a hack. Thanks to Andrey! Regards, Michael Attachment: checkpoint.patch (binary data)
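The checkpoint/restore idea can be sketched in isolation. A minimal sketch, assuming a hash whose entire running state fits in a fixed-size struct (here FNV-1a, a NON-cryptographic stand-in chosen only so the example is self-contained; a real HMAC would have to checkpoint both the inner and outer hash states after absorbing the padded key): the keyed state is saved once, and each record restores it with a plain struct copy instead of allocating a duplicate. `toy_md`, `mac_dir` and the functions below are hypothetical names, not part of the patch or of OpenSSL.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy, NON-cryptographic digest (FNV-1a 64-bit) standing in for the MAC's
 * hash; its whole state fits in the struct, so copying it needs no malloc. */
typedef struct { uint64_t h; } toy_md;

static void toy_init(toy_md *c) { c->h = 14695981039346656037ULL; }

static void toy_update(toy_md *c, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        c->h ^= p[i];
        c->h *= 1099511628211ULL; /* FNV-1a 64-bit prime */
    }
}

static uint64_t toy_final(const toy_md *c) { return c->h; }

/* Hypothetical per-direction MAC state: 'checkpoint' holds the state after
 * the key has been absorbed; 'work' is scratch reused for every record. */
typedef struct {
    toy_md checkpoint;
    toy_md work;
} mac_dir;

static void mac_dir_init(mac_dir *m, const void *key, size_t key_len)
{
    toy_init(&m->checkpoint);
    toy_update(&m->checkpoint, key, key_len);
}

/* Per record: restore from the checkpoint (plain struct copy), then hash. */
static uint64_t mac_dir_record(mac_dir *m, const void *rec, size_t len)
{
    m->work = m->checkpoint;
    toy_update(&m->work, rec, len);
    return toy_final(&m->work);
}

/* Reference: recompute key + record from scratch, for checking. */
static uint64_t mac_from_scratch(const void *key, size_t key_len,
                                 const void *rec, size_t len)
{
    toy_md c;
    toy_init(&c);
    toy_update(&c, key, key_len);
    toy_update(&c, rec, len);
    return toy_final(&c);
}
```

The `work` scratch state is what makes the single-thread rule necessary: two threads reading (or two writing) on one connection would overwrite each other's `work`, so the sketch assumes one `mac_dir` per connection direction.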
Re: [openssl.org #2650] major ssl read/ write performance improvement - updated
Got a patch for trunk also? On Mon, Dec 5, 2011 at 11:33 PM, Deng Michael via RT r...@openssl.org wrote: