[openssl-dev] [openssl.org #2937] Handshake performance degradation in 1.0.1 and up.

2016-02-02 Thread Rich Salz via RT
The patches were large and added new features and APIs, which isn't appropriate
for bugfix releases.

In the master branch, the PRF functionality has been redirected to
libcrypto, so it may be possible to optimise it by using a more efficient
implementation in crypto/kdf or in an engine. There is already one
optimisation: the number of updates should be reduced because all the seed
values are now concatenated.
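
To illustrate why concatenating the seed values reduces the number of updates without changing the result, here is a minimal sketch. It uses Python's standard hmac module rather than OpenSSL's C API, and the function names and sample seed values are hypothetical, chosen only for illustration.

```python
import hmac


def mac_separate_updates(key, seeds):
    """Feed each seed value to the MAC with its own update call."""
    ctx = hmac.new(key, digestmod="sha256")
    updates = 0
    for seed in seeds:
        ctx.update(seed)
        updates += 1
    return ctx.digest(), updates


def mac_concatenated(key, seeds):
    """Concatenate the seed values first, then issue a single update."""
    ctx = hmac.new(key, digestmod="sha256")
    ctx.update(b"".join(seeds))
    return ctx.digest(), 1


# Hypothetical seed parts, standing in for a label and two random values.
SEEDS = [b"key expansion", b"server-random", b"client-random"]
```

Both functions return the same digest; only the number of update calls differs, which matters most when each call is expensive (for example, when it is forwarded to an engine).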
--
Rich Salz, OpenSSL dev team; rs...@openssl.org

___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl.org #2937] Handshake performance degradation in 1.0.1 and up.

2013-01-10 Thread Andrey Kulikov via RT
On 11 December 2012 04:00, Stephen Henson via RT r...@openssl.org wrote:



 I also notice that even the original HMAC version initialises two HMAC
 contexts with the same key. That could be improved by initialising one
 and copying the context across.


This kind of optimization can also be applied to the P_hash implementation
based on EVP_DigestSign*.
The HMAC contexts are re-initialized in the inner loop every time it executes.
This can be improved by initializing them only once at the beginning of the
function, saving them to temporary variables, and restoring from those
copies when needed.
That would solve the original issue of excessive re-initialization overhead.
BUT:
1. It gives only a negligible performance benefit in the normal case (i.e.
software-only) (roughly 1%). Hashing a few more bytes is almost nothing in
comparison to BN manipulations.
2. With external hardware it may lead to the same result: in my
case the cost of a MAC copy call is the same as the cost of a MAC calculation
call. The only way to improve performance is to reduce the total number of
remote calls.
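
The "initialize once, restore from a saved copy" idea described above can be sketched as follows. This is only an illustration in Python's standard hmac module, not the attached patches or OpenSSL's C code; the function names are hypothetical, and the P_hash construction follows the definition in RFC 5246, section 5.

```python
import hmac


def p_hash_reinit(secret, seed, length, digest="sha256"):
    """Baseline: create a freshly keyed HMAC for every single MAC."""
    out = b""
    a = seed  # A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, digest).digest()            # A(i)
        out += hmac.new(secret, a + seed, digest).digest()  # HMAC(A(i) + seed)
    return out[:length]


def p_hash_copy(secret, seed, length, digest="sha256"):
    """Key the HMAC once, then restore from that saved context by copying."""
    base = hmac.new(secret, digestmod=digest)  # keyed exactly once
    out = b""
    a = seed  # A(0) = seed
    while len(out) < length:
        ctx = base.copy()
        ctx.update(a)
        a = ctx.digest()            # A(i)
        ctx = base.copy()
        ctx.update(a + seed)
        out += ctx.digest()         # HMAC(A(i) + seed)
    return out[:length]
```

Both functions produce identical output. Note that the second one still makes two copy calls per iteration, which is exactly the caveat in point 2: if a copy costs as much as a MAC calculation, the saving is small.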

__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #2937] Handshake performance degradation in 1.0.1 and up.

2013-01-10 Thread Andrey Kulikov via RT
Please find attached two patches, together implementing proper HMAC context
re-initialization instead of full re-creation.
Compared with openssl-1.0.1c, they give a ~10% handshake performance
improvement when an engine-specific MAC is used.

To apply the patches, use the command
patch -p1 -i filename

The patches were checked to apply to:
openssl-1.0.1c
openssl-1.0.2-stable-SNAP-20130108

make test reports OK for both versions (Linux, x86[_64])

Please let me know if you have any questions.



TLS_P_hash-HMAC_reinit.patch
Description: Binary data


TLS_P_hash-HMAC_reinit2.patch
Description: Binary data


Re: [openssl.org #2937] Handshake performance degradation in 1.0.1 and up.

2012-12-10 Thread Andrey Kulikov
 In my case, handshake rate drops by 5-6% on the same hardware in 1.0.1c
in comparison to 1.0.0i.
I was wrong. Handshake performance degradation is about 10%.

The first culprit is EVP_DigestSignFinal, which copies the
supplied context.

When I replaced EVP_DigestSignFinal() in tls1_P_hash() with a simple
equivalent that does not copy the context, I got these numbers:

OpenSSL 1.0.1c - EVP_DigestSignFinalNoCopy
Digest init called 105 times.
Digest copy called 70 times.
Digest cleanup called 174 times.

And I get a 5% performance improvement compared with the original OpenSSL 1.0.1c
(still 5% worse than 1.0.0).

The second suspect is the re-initialization of the MAC context in the
tls1_P_hash() loop.
With HMAC_Init_ex() it was a real re-initialization, which initialized the
internal i_ctx and o_ctx only when necessary; with EVP_DigestSignInit it is
a full re-creation, and the internal i_ctx and o_ctx are always initialized.

This is why we see 3 times more 'digest init' calls.
There could also be other reasons that are still hidden from me.
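
The difference between a true re-initialization and a full re-creation can be made visible with a small counting model. This is a simplified sketch in Python, not OpenSSL's implementation: the KeyedHMAC class, the counters, and count_inits() are hypothetical names, and the counts it produces are not meant to reproduce the figures quoted in this thread.

```python
import hashlib
import hmac  # used only to cross-check the sketch against the standard library

COUNTS = {"init": 0, "copy": 0}


def _digest_new(data=b""):
    """Create a digest context and count it as a 'digest init'."""
    COUNTS["init"] += 1
    return hashlib.sha256(data)


def _digest_copy(ctx):
    """Copy a digest context and count it as a 'digest copy'."""
    COUNTS["copy"] += 1
    return ctx.copy()


class KeyedHMAC:
    BLOCK_SIZE = 64  # SHA-256 block size in bytes

    def __init__(self, key):
        # Keying is the expensive part: it initializes i_ctx and o_ctx.
        if len(key) > self.BLOCK_SIZE:
            key = _digest_new(key).digest()
        key = key.ljust(self.BLOCK_SIZE, b"\0")
        self.i_ctx = _digest_new(bytes(b ^ 0x36 for b in key))
        self.o_ctx = _digest_new(bytes(b ^ 0x5C for b in key))

    def mac(self, msg):
        # Each MAC restarts from copies of the already keyed contexts.
        inner = _digest_copy(self.i_ctx)
        inner.update(msg)
        outer = _digest_copy(self.o_ctx)
        outer.update(inner.digest())
        return outer.digest()


def count_inits(key, messages, recreate):
    """MAC every message, either re-keying each time or keying only once."""
    COUNTS["init"] = COUNTS["copy"] = 0
    keyed = None
    for msg in messages:
        if recreate or keyed is None:
            keyed = KeyedHMAC(key)
        assert keyed.mac(msg) == hmac.new(key, msg, "sha256").digest()
    return dict(COUNTS)
```

In this model, calling count_inits() with recreate=True performs two digest inits per message, while recreate=False performs two in total, regardless of how many messages are processed.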


Eliminating the EVP_DigestSignFinal overhead in tls1_P_hash() by replacing it
with calls that do not copy the context is trivial.
But how can we properly perform a true MAC re-initialization instead of
creating it from scratch each time?




 As a drawback, keyblock setup for ciphersuites with 256-bit encryption
 and MAC key requires about 3 times more intensive use of hash objects.
 For example, in order to perform one handshake,
 in OpenSSL 1.0.0i
 Digest init called 30 times.
 Digest copy called 69 times.
 Digest cleanup called 98 times.

 OpenSSL 1.0.1c
 Digest init called 105 times.
 Digest copy called 160 times.
 Digest cleanup called 264 times.

 ~3 times more intensive use of hash objects is definitely not good for
 performance.
 In my case, handshake rate drops by 5-6% on the same hardware in 1.0.1c
 in comparison to 1.0.0i.



 Is there any way to reduce hash objects usage, while keeping TLS 1.1/1.2
 features?




[openssl.org #2937] Handshake performance degradation in 1.0.1 and up.

2012-12-10 Thread Stephen Henson via RT
 [openssl-dev@openssl.org - Tue Dec 11 00:48:42 2012]:
 
  In my case, handshake rate drops by 5-6% on the same hardware in 1.0.1c
 in comparison to 1.0.0i.
 I was wrong. Handshake performance degradation is about 10%.
 
 The first culprit is EVP_DigestSignFinal, which copies the
 supplied context.
 

That's documented behaviour but there's no reason why a flag couldn't be
added which stops the copying if it isn't needed.

 
 Eliminating the EVP_DigestSignFinal overhead in tls1_P_hash() by replacing it
 with calls that do not copy the context is trivial.
 But how can we properly perform a true MAC re-initialization instead of
 creating it from scratch each time?
 

It looks like the way HMAC is translated into EVP_DigestSign* could be made
more efficient, so that it supports a proper HMAC context reset instead of
reinitialising with the same key all the time.

I also notice that even the original HMAC version initialises two HMAC
contexts with the same key. That could be improved by initialising one
and copying the context across.

Steve.
-- 
Dr Stephen N. Henson. OpenSSL project core developer.
Commercial tech support now available see: http://www.openssl.org



[openssl.org #2937] Handshake performance degradation in 1.0.1 and up.

2012-12-09 Thread Andrey Kulikov via RT
Compared with 1.0.0, the implementation of the PRF has changed in 1.0.1.
In order to support TLS 1.1/1.2 features, in file ssl/t1_enc.c, in function
tls1_P_hash(), calls to HMAC_Init/HMAC_Update/HMAC_Final were replaced by
EVP_DigestSignInit/EVP_DigestSignUpdate/EVP_DigestSignFinal.

As a drawback, keyblock setup for ciphersuites with 256-bit encryption and
MAC key requires about 3 times more intensive use of hash objects.
For example, in order to perform one handshake,
in OpenSSL 1.0.0i
Digest init called 30 times.
Digest copy called 69 times.
Digest cleanup called 98 times.

OpenSSL 1.0.1c
Digest init called 105 times.
Digest copy called 160 times.
Digest cleanup called 264 times.

~3 times more intensive use of hash objects is definitely not good for
performance.
In my case, handshake rate drops by 5-6% on the same hardware in 1.0.1c
in comparison to 1.0.0i.
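
For reference, the "~3 times" figure can be checked directly from the call counts reported above. A minimal sketch in Python; the dictionary and function names are hypothetical, while the numbers are the ones quoted in this message.

```python
# Digest call counts per handshake, as reported in this message.
CALLS_1_0_0I = {"init": 30, "copy": 69, "cleanup": 98}
CALLS_1_0_1C = {"init": 105, "copy": 160, "cleanup": 264}


def call_ratios(old, new):
    """Return new/old ratios per operation, plus the ratio of the totals."""
    ratios = {name: new[name] / old[name] for name in old}
    ratios["total"] = sum(new.values()) / sum(old.values())
    return ratios


for name, ratio in call_ratios(CALLS_1_0_0I, CALLS_1_0_1C).items():
    print(f"{name:8s} {ratio:.2f}x")
```

Digest init calls grow by a factor of 3.5, while the total number of digest calls grows by a factor of about 2.7.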

Also, more intense malloc usage leads to faster heap fragmentation, but I
have not been able to measure the impact of that factor yet.

Is there any way to reduce hash objects usage, while keeping TLS 1.1/1.2
features?
