Hello all - 

This patch is a contribution to OpenSSL. It offers an efficient implementation 
of AES-CTR, using Intel's AES-NI and AVX architecture.

This contribution also improves the performance of AES-GCM. While faster 
AES-GCM can be achieved by interleaving the CTR and GHASH, we understand from 
[1] and [2] that the OpenSSL team prefers to implement the encryption and the 
authentication serially (and separately). With this as the preferred direction, 
a faster CTR mode implementation would also improve AES-GCM. 

The performance improvement in this patch comes from observing that, for a 
given IV, 96 bits of consecutive counter blocks are constant. Counter blocks 
are incremented only on their remaining 32 bits, and this can be carried out 
with ALU instructions. In addition, we note that the 96 bits are also constant 
after the initial xor, and can therefore be precalculated. This way, each 
counter block requires only a 32-bit xor, which is also done with ALU 
instructions.
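
The observation above can be sketched in C as follows. This is only an
illustration, not the code in the patch: it assumes the usual counter block
layout (96-bit IV followed by a 32-bit big-endian counter) and takes "the
initial xor" to mean the xor with AES round key 0. The names ctr_ctx,
ctr_precompute, ctr_state0 and ctr_state0_naive are hypothetical, and only the
first whitening step is shown; the remaining AES rounds are unchanged.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical context: (IV || 0^32) XOR round key 0, computed once per IV. */
typedef struct {
    uint8_t base[16];
} ctr_ctx;

/* Precompute the part of the whitened block that is the same for every counter. */
void ctr_precompute(ctr_ctx *ctx, const uint8_t iv[12], const uint8_t rk0[16])
{
    for (int i = 0; i < 12; i++)
        ctx->base[i] = iv[i] ^ rk0[i];   /* the constant 96 bits, already xored */
    memcpy(ctx->base + 12, rk0 + 12, 4); /* counter bits are zero here, so rk0 remains */
}

/* State after the initial xor for counter value ctr, using the precomputed base. */
void ctr_state0(const ctr_ctx *ctx, uint32_t ctr, uint8_t out[16])
{
    memcpy(out, ctx->base, 16);
    /* Logically a single 32-bit xor; written bytewise to stay endian-neutral. */
    out[12] ^= (uint8_t)(ctr >> 24);
    out[13] ^= (uint8_t)(ctr >> 16);
    out[14] ^= (uint8_t)(ctr >> 8);
    out[15] ^= (uint8_t)ctr;
}

/* Reference: build the full counter block, then xor all 128 bits with rk0. */
void ctr_state0_naive(const uint8_t iv[12], const uint8_t rk0[16],
                      uint32_t ctr, uint8_t out[16])
{
    uint8_t block[16];
    memcpy(block, iv, 12);
    block[12] = (uint8_t)(ctr >> 24);
    block[13] = (uint8_t)(ctr >> 16);
    block[14] = (uint8_t)(ctr >> 8);
    block[15] = (uint8_t)ctr;
    for (int i = 0; i < 16; i++)
        out[i] = block[i] ^ rk0[i];
}
```

Both functions produce the same 16 bytes for any counter value; the first one
touches only 32 bits per block once ctr_precompute has run.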

The performance:
===============

AES-CTR performance:
===================
The performance was measured using the openssl speed utility as follows:
openssl speed -evp aes-128-ctr

Single-thread performance in 1000s of bytes/s, for an 8KB buffer:

Core i7-2600K @3.4GHz *:

OpenSSL Git[1]: 3683194.43 (0.92 Cycles/Byte)
This patch:     4664828.92 (0.73 Cycles/Byte)
Speedup: 1.27X

Core i7-3770  @3.4GHz **:

OpenSSL Git[1]: 4016931.84 (0.85 Cycles/Byte)
This patch:     5021340.50 (0.68 Cycles/Byte)
Speedup: 1.25X
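
The Cycles/Byte and Speedup figures above follow from the reported throughput
and the clock frequency. A minimal sketch of that arithmetic, assuming the
cores run at the nominal 3.4GHz (turbo frequencies are not accounted for) and
that the speed output is in units of 1000 bytes per second; the function names
are illustrative only:

```c
/* Cycles per byte = clock frequency (Hz) / throughput (bytes per second).
 * kbytes_per_sec is the figure as printed by the speed utility (1000s of B/s). */
double cycles_per_byte(double freq_hz, double kbytes_per_sec)
{
    return freq_hz / (kbytes_per_sec * 1000.0);
}

/* Speedup of the patched build relative to the baseline throughput. */
double speedup(double baseline_kbytes_per_sec, double patched_kbytes_per_sec)
{
    return patched_kbytes_per_sec / baseline_kbytes_per_sec;
}
```

For the Core i7-2600K rows, cycles_per_byte(3.4e9, 3683194.43) is about 0.92,
cycles_per_byte(3.4e9, 4664828.92) is about 0.73, and
speedup(3683194.43, 4664828.92) is about 1.27.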

AES-GCM performance:
===================
The performance was measured using the openssl speed utility as follows:
openssl speed -evp aes-128-gcm

Single-thread performance in 1000s of bytes/s, for an 8KB buffer:

Core i7-2600K @3.4GHz *:

OpenSSL Git[1]: 1240734.57 (2.74 Cycles/Byte)
This patch:     1346268.90 (2.53 Cycles/Byte)
Speedup: 1.09X

Core i7-3770  @3.4GHz **:

OpenSSL Git[1]: 1354109.51 (2.51 Cycles/Byte)
This patch:     1456667.28 (2.33 Cycles/Byte)
Speedup: 1.08X


*Codename "Sandy Bridge"
**Codename "Ivy Bridge"

As a comparison baseline, we also post OpenSSL’s current AES-ECB performance. 
The CTR mode implementation in the proposed patch is faster than the current 
OpenSSL ECB implementation (which is therefore obviously less than optimal).

AES-ECB performance:
==================

The performance was measured using the openssl speed utility as follows:
openssl speed -evp aes-128-ecb

Core i7-2600K @3.4GHz *:

OpenSSL Git[1]: 4005271.55 (0.85 Cycles/Byte)

Core i7-3770  @3.4GHz **:

OpenSSL Git[1]: 4364206.94 (0.78 Cycles/Byte)


[1] OpenSSL Gitweb: http://git.openssl.org/gitweb/ (references are accurate 
    as of 3/20/2013).
[2] S. Gueron, V. Krasnov, “[PATCH] Efficient implementation of AES-GCM, using 
    Intel's AES-NI, PCLMULQDQ”, 
    http://rt.openssl.org/Ticket/Display.html?id=2900&user=guest&pass=guest


Developers and authors:
***************************************************************************
Shay Gueron (1, 2), and Vlad Krasnov (1)

(1) Intel Corporation, Israel Development Center, Haifa, Israel
(2) University of Haifa, Israel
***************************************************************************

Attachment: intel_CTR_patch.patch
Description: Binary data
