[openssl-dev] [openssl.org #2937] Handshake performance degradation in 1.0.1 and up.
The patches were large and added new features and APIs, which isn't appropriate for bugfix releases.

In the master branch, the PRF functionality has been redirected to libcrypto, so it's possible it can be optimised by using a more efficient implementation in crypto/kdf or in an engine. There is already one optimisation: the number of updates should be reduced because all the seed values are now concatenated.

--
Rich Salz, OpenSSL dev team; rs...@openssl.org
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Re: [openssl.org #2937] Handshake performance degradation in 1.0.1 and up.
On 11 December 2012 04:00, Stephen Henson via RT r...@openssl.org wrote:

> I also notice that even the original HMAC version initialises two HMAC contexts with the same key. That could be improved by initialising one and copying the context across.

This kind of optimization can also be applied to the P_hash implementation via EVP_DigestSign*. The HMAC contexts are re-initialized in the inner loop every time it executes. This can be improved by initializing them only once at the beginning of the function, saving them to temporary variables, and restoring from those when needed. That would solve the original issue of excessive re-initialization overhead.

BUT:

1. It gives only a marginal performance benefit in the normal (software-only) case (1%). Hashing a few more bytes is almost nothing compared to the BN manipulations.
2. With external hardware it may lead to the same result: in my case the cost of a MAC copy call is the same as the cost of a MAC calculation call, and the only way to improve performance is to reduce the total number of remote calls.

__
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-dev@openssl.org
Automated List Manager majord...@openssl.org
Re: [openssl.org #2937] Handshake performance degradation in 1.0.1 and up.
Please find attached two patches which together implement proper HMAC context re-initialization instead of full re-creation. Compared to openssl-1.0.1c, they give a ~10% handshake performance improvement when an engine-specific MAC is used.

To apply the patches, use the command

patch -p1 -i filename

The patches were checked to apply to:

openssl-1.0.1c
openssl-1.0.2-stable-SNAP-20130108

make test reports OK for both versions (Linux, x86[_64]).

Please let me know if you have any questions.

Attachments:
TLS_P_hash-HMAC_reinit.patch (binary data)
TLS_P_hash-HMAC_reinit2.patch (binary data)
Re: [openssl.org #2937] Handshake performance degradation in 1.0.1 and up.
> In my case, the handshake rate drops by 5-6% on the same hardware in 1.0.1c in comparison to 1.0.0i.

I was wrong. The handshake performance degradation is about 10%.

The first culprit is EVP_DigestSignFinal, which copies the supplied context. When I replaced EVP_DigestSignFinal() in tls1_P_hash() with a plain equivalent that does not copy the context, I got these numbers:

OpenSSL 1.0.1c - EVP_DigestSignFinalNoCopy
Digest init called 105 times.
Digest copy called 70 times.
Digest cleanup called 174 times.

That gives a 5% performance improvement over the original OpenSSL 1.0.1c (still 5% worse than 1.0.0).

The second suspect is the re-initialization of the MAC context in the tls1_P_hash() loop. With HMAC_Init_ex() it was a real re-initialization, which initialized the internal i_ctx and o_ctx only when necessary. With EVP_DigestSignInit it is a full re-creation, and the internal i_ctx and o_ctx are always initialized. This is why we see about 3 times more 'digest init' calls.

There could also be other causes that are still hidden from me.

Eliminating the EVP_DigestSignFinal overhead in tls1_P_hash() by replacing it with calls that do not copy the context is trivial. But how can we perform a true MAC re-initialization instead of creating the context from scratch every time?

> As a drawback, keyblock setup for ciphersuites with 256-bit encryption and MAC keys requires about 3 times more intensive use of hash objects. For example, in order to perform one handshake:
>
> OpenSSL 1.0.0i
> Digest init called 30 times.
> Digest copy called 69 times.
> Digest cleanup called 98 times.
>
> OpenSSL 1.0.1c
> Digest init called 105 times.
> Digest copy called 160 times.
> Digest cleanup called 264 times.
>
> ~3 times more intensive use of hash objects is definitely not good for performance. In my case, the handshake rate drops by 5-6% on the same hardware in 1.0.1c in comparison to 1.0.0i.
>
> Is there any way to reduce hash object usage while keeping the TLS 1.1/1.2 features?
[openssl.org #2937] Handshake performance degradation in 1.0.1 and up.
[openssl-dev@openssl.org - Tue Dec 11 00:48:42 2012]:

> > In my case, the handshake rate drops by 5-6% on the same hardware in 1.0.1c in comparison to 1.0.0i.
>
> I was wrong. The handshake performance degradation is about 10%.
>
> The first culprit is EVP_DigestSignFinal, which copies the supplied context.

That's documented behaviour, but there's no reason why a flag couldn't be added which stops the copying if it isn't needed.

> Eliminating the EVP_DigestSignFinal overhead in tls1_P_hash() by replacing it with calls that do not copy the context is trivial. But how can we perform a true MAC re-initialization instead of creating the context from scratch every time?

Looking at it, the way HMAC is translated into EVP_DigestSign* could be made more efficient so that it supports a proper HMAC context reset instead of reinitialising with the same key all the time.

I also notice that even the original HMAC version initialises two HMAC contexts with the same key. That could be improved by initialising one and copying the context across.

Steve.
--
Dr Stephen N. Henson. OpenSSL project core developer.
Commercial tech support now available see: http://www.openssl.org
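The "initialise one context and copy it across" idea discussed above can be illustrated with a small model. The sketch below is an assumption-laden toy, not OpenSSL code: `toy_mac_ctx`, `toy_init`, `toy_start` and `toy_p_hash` are hypothetical names, and the "MAC" is a non-cryptographic FNV-1a mix standing in for HMAC. It only reproduces the call structure of a P_hash-style loop, so that the number of keyed initialisations and context copies can be counted for the re-keying and the copy-based variants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAC_LEN 4

/* Toy keyed-MAC context: NOT cryptographic, NOT OpenSSL's HMAC_CTX. */
typedef struct { uint32_t state; } toy_mac_ctx;

/* Counters in the spirit of the "Digest init/copy called N times" figures. */
static unsigned toy_inits, toy_copies;

static void toy_update(toy_mac_ctx *c, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c->state = (c->state ^ p[i]) * 16777619u;   /* FNV-1a mixing step */
}

/* Keyed initialisation: stands in for the expensive step. */
static void toy_init(toy_mac_ctx *c, const unsigned char *key, size_t klen)
{
    toy_inits++;
    c->state = 2166136261u;
    toy_update(c, key, klen);
}

/* Start one MAC: copy a pre-keyed context if given, otherwise re-key. */
static void toy_start(toy_mac_ctx *c, const toy_mac_ctx *base,
                      const unsigned char *key, size_t klen)
{
    if (base != NULL) {
        toy_copies++;
        *c = *base;
    } else {
        toy_init(c, key, klen);
    }
}

static void toy_final(const toy_mac_ctx *c, unsigned char out[TOY_MAC_LEN])
{
    for (int i = 0; i < TOY_MAC_LEN; i++)
        out[i] = (unsigned char)(c->state >> (8 * i));
}

/* P_hash-shaped expansion: A(1) = MAC(seed), block i = MAC(A(i) + seed),
 * A(i+1) = MAC(A(i)). With reuse != 0 the key is processed only once. */
static void toy_p_hash(int reuse, const unsigned char *key, size_t klen,
                       const unsigned char *seed, size_t slen,
                       unsigned char *out, size_t olen)
{
    toy_mac_ctx keyed, c;
    const toy_mac_ctx *base = NULL;
    unsigned char a[TOY_MAC_LEN], block[TOY_MAC_LEN];

    assert(key != NULL && seed != NULL && out != NULL);
    if (reuse) {
        toy_init(&keyed, key, klen);    /* the single keyed initialisation */
        base = &keyed;
    }
    toy_start(&c, base, key, klen);     /* A(1) */
    toy_update(&c, seed, slen);
    toy_final(&c, a);
    while (olen > 0) {
        size_t n = olen < TOY_MAC_LEN ? olen : TOY_MAC_LEN;

        toy_start(&c, base, key, klen); /* output block i */
        toy_update(&c, a, sizeof a);
        toy_update(&c, seed, slen);
        toy_final(&c, block);
        memcpy(out, block, n);
        out += n;
        olen -= n;
        if (olen > 0) {
            toy_start(&c, base, key, klen); /* A(i+1) */
            toy_update(&c, a, sizeof a);
            toy_final(&c, a);
        }
    }
}
```

Calling `toy_p_hash(0, ...)` and `toy_p_hash(1, ...)` with the same key and seed yields the same output bytes; the second form replaces every keyed initialisation after the first with a context copy. As noted elsewhere in the thread, that only pays off when a copy is cheaper than an initialisation, which may not hold for an engine backed by external hardware.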
[openssl.org #2937] Handshake performance degradation in 1.0.1 and up.
In comparison to 1.0.0, the implementation of the PRF has been changed in 1.0.1. In order to support TLS 1.1/1.2 features, in file ssl/t1_enc.c, in function tls1_P_hash(), the calls to HMAC_Init/HMAC_Update/HMAC_Final were replaced by EVP_DigestSignInit/EVP_DigestSignUpdate/EVP_DigestSignFinal.

As a drawback, keyblock setup for ciphersuites with 256-bit encryption and MAC keys requires about 3 times more intensive use of hash objects. For example, in order to perform one handshake:

OpenSSL 1.0.0i
Digest init called 30 times.
Digest copy called 69 times.
Digest cleanup called 98 times.

OpenSSL 1.0.1c
Digest init called 105 times.
Digest copy called 160 times.
Digest cleanup called 264 times.

~3 times more intensive use of hash objects is definitely not good for performance. In my case, the handshake rate drops by 5-6% on the same hardware in 1.0.1c in comparison to 1.0.0i.

Also, more intense malloc usage leads to faster heap fragmentation, but I haven't been able to measure the impact of that factor yet.

Is there any way to reduce hash object usage while keeping the TLS 1.1/1.2 features?
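To put the "about 3 times" figure in perspective, the per-operation ratios can be worked out from the counters quoted in this message. The snippet below is only a small arithmetic sketch; the struct and function names (`digest_counts`, `ratio`, `total`, `print_ratios`) are made up for illustration, and the numbers are the ones reported above.

```c
#include <assert.h>
#include <stdio.h>

/* Per-handshake digest operation counts as reported in the message. */
struct digest_counts { int init, copy, cleanup; };

static const struct digest_counts ossl_100i = { 30, 69, 98 };
static const struct digest_counts ossl_101c = { 105, 160, 264 };

/* Growth factor from the 1.0.0i count to the 1.0.1c count. */
static double ratio(int before, int after)
{
    assert(before > 0);
    return (double)after / (double)before;
}

static int total(struct digest_counts c)
{
    return c.init + c.copy + c.cleanup;
}

static void print_ratios(void)
{
    printf("init:    %.2fx\n", ratio(ossl_100i.init, ossl_101c.init));
    printf("copy:    %.2fx\n", ratio(ossl_100i.copy, ossl_101c.copy));
    printf("cleanup: %.2fx\n", ratio(ossl_100i.cleanup, ossl_101c.cleanup));
    printf("total:   %.2fx\n", ratio(total(ossl_100i), total(ossl_101c)));
}
```

Calling `print_ratios()` gives 3.50x for init, 2.32x for copy, 2.69x for cleanup and 2.69x overall (529 operations against 197), so the initialisations grow the most while the combined figure is somewhat below 3x.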