Hello all,

This patch is a contribution to OpenSSL. It offers an efficient implementation of AES-CTR, using Intel's AES-NI and AVX architecture.
This contribution also improves the performance of AES-GCM. While faster AES-GCM can be achieved by interleaving the CTR and GHASH computations, we understand from [1] and [2] that the OpenSSL team prefers to implement the encryption and the authentication serially (and separately). With this as the preferred direction, a faster CTR mode implementation also improves AES-GCM.

The performance improvement provided in this patch is achieved by observing that, for a given IV, 96 bits of consecutive counter blocks are constant. Counter blocks are incremented only in their remaining 32 bits, and this can be carried out with ALU instructions. In addition, we note that the 96 bits are also constant after the initial XOR (with the first round key), and can therefore be precalculated. This way, each counter block requires only a 32-bit XOR, which is also done with ALU instructions.

The performance:
===============

AES-CTR performance:
===================

The performance was measured by using the openssl speed utility as follows:

    openssl speed -evp aes-128-ctr

Single thread performance in 1000s of B/S, for 8KB buffer:

Core i7-2600K @3.4GHz *:
    OpenSSL Git[1]:  3683194.43 (0.92 Cycles/Byte)
    This patch:      4664828.92 (0.73 Cycles/Byte)
    Speedup:         1.27X

Core i7-3770 @3.4GHz **:
    OpenSSL Git[1]:  4016931.84 (0.85 Cycles/Byte)
    This patch:      5021340.50 (0.68 Cycles/Byte)
    Speedup:         1.25X

AES-GCM performance:
===================

The performance was measured by using the openssl speed utility as follows:

    openssl speed -evp aes-128-gcm

Single thread performance in 1000s of B/S, for 8KB buffer:

Core i7-2600K @3.4GHz *:
    OpenSSL Git[1]:  1240734.57 (2.74 Cycles/Byte)
    This patch:      1346268.90 (2.53 Cycles/Byte)
    Speedup:         1.09X

Core i7-3770 @3.4GHz **:
    OpenSSL Git[1]:  1354109.51 (2.51 Cycles/Byte)
    This patch:      1456667.28 (2.33 Cycles/Byte)
    Speedup:         1.08X

*  Codename "Sandy Bridge"
** Codename "Ivy Bridge"

As a comparison baseline, we post OpenSSL's AES-ECB performance. The CTR mode implementation of the proposed patch is faster than the current OpenSSL ECB.
(This is obviously less than optimal.)

AES-ECB performance:
==================

The performance was measured by using the openssl speed utility as follows:

    openssl speed -evp aes-128-ecb

Core i7-2600K @3.4GHz *:
    OpenSSL Git[1]:  4005271.55 (0.85 Cycles/Byte)

Core i7-3770 @3.4GHz **:
    OpenSSL Git[1]:  4364206.94 (0.78 Cycles/Byte)

[1] OpenSSL Gitweb: http://git.openssl.org/gitweb/, references are accurate as of 3/20/2013.
[2] S. Gueron, V. Krasnov, “[PATCH] Efficient implementation of AES-GCM, using Intel's AES-NI, PCLMULQDQ”, http://rt.openssl.org/Ticket/Display.html?id=2900&user=guest&pass=guest

Developers and authors:
***************************************************************************
Shay Gueron (1, 2), and Vlad Krasnov (1)
(1) Intel Corporation, Israel Development Center, Haifa, Israel
(2) University of Haifa, Israel
***************************************************************************
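
For readers who want to see the counter-block observation from this message in concrete form, here is a minimal portable C sketch. It is not taken from the attached patch, which uses AES-NI and AVX; all names (ctr_prefix, ctr_prefix_init, ctr_whitened_fast, ctr_whitened_ref, ctr_paths_agree) are hypothetical. It assumes the usual GCM-style layout of a 96-bit IV followed by a 32-bit big-endian counter, and it only covers the initial XOR with round key 0, not the AES rounds that follow.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustration only: hypothetical names, not code from the patch. */
typedef struct {
    uint8_t  fixed[12]; /* IV XOR rk0[0..11]; constant for a given IV and key */
    uint32_t rk0_tail;  /* rk0[12..15], read as a big-endian 32-bit word */
} ctr_prefix;

static inline uint32_t load_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static inline void store_be32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Done once per (IV, key): precompute the 96 constant bits after the XOR. */
void ctr_prefix_init(ctr_prefix *pre, const uint8_t iv[12], const uint8_t rk0[16]) {
    assert(pre && iv && rk0);
    for (int i = 0; i < 12; i++)
        pre->fixed[i] = iv[i] ^ rk0[i];
    pre->rk0_tail = load_be32(rk0 + 12);
}

/* Per block: a single 32-bit XOR yields the state after the initial XOR. */
void ctr_whitened_fast(const ctr_prefix *pre, uint32_t ctr, uint8_t out[16]) {
    memcpy(out, pre->fixed, 12);
    store_be32(out + 12, ctr ^ pre->rk0_tail);
}

/* Reference: build the full counter block, then XOR all 128 bits with rk0. */
void ctr_whitened_ref(const uint8_t iv[12], const uint8_t rk0[16],
                      uint32_t ctr, uint8_t out[16]) {
    uint8_t block[16];
    memcpy(block, iv, 12);
    store_be32(block + 12, ctr);
    for (int i = 0; i < 16; i++)
        out[i] = block[i] ^ rk0[i];
}

/* Returns 1 if both paths agree for n consecutive counters, else 0. */
int ctr_paths_agree(const uint8_t iv[12], const uint8_t rk0[16],
                    uint32_t start, uint32_t n) {
    ctr_prefix pre;
    uint8_t fast[16], ref[16];
    ctr_prefix_init(&pre, iv, rk0);
    for (uint32_t k = 0; k < n; k++) {
        uint32_t ctr = start + k; /* unsigned add wraps modulo 2^32 */
        ctr_whitened_fast(&pre, ctr, fast);
        ctr_whitened_ref(iv, rk0, ctr, ref);
        if (memcmp(fast, ref, 16) != 0)
            return 0;
    }
    return 1;
}
```

In this sketch the counter increment and the XOR are ordinary integer operations, which is the sense in which the work can be moved to ALU instructions; a real implementation would feed the resulting block into the remaining AES rounds.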
intel_CTR_patch.patch
Description: Binary data
