>>  [1] S. Gueron, V. Krasnov: "Parallelizing message schedules to accelerate
>>  the computations of hash functions", http://eprint.iacr.org/2012/067.pdf
>>
> ...
> 
> As for Haswell: as discussed, it is capable of executing 8xSMS SHA256 and
> 4xSMS SHA512, i.e. loading 8 or 4 input blocks respectively and
> pre-processing them simultaneously. Improvement estimates are much higher,
> 14% for SHA256 and 20% for SHA512. On the other hand, the processor is also
> capable of loading 2x data and pre-processing already parallelized schedules
> simultaneously... More careful consideration will be given at a later point.

http://git.openssl.org/gitweb/?p=openssl.git;a=commitdiff;h=c4558efbf3a44a1b5e68dce46347dd3888db4760
adds AVX2 code. The code does not use SMS either, not in the form discussed
in the paper. This is because the estimated performance improvement is even
smaller than for the AVX1 code: at most 3% for SHA256 and 1.5% for SHA512.
The estimate is based on the number of computational instructions, i.e.
discounting zero-latency moves. For reference, the 14/20% estimate above
refers to improvement over code that doesn't pre-process multiple input
blocks at all, while the 3/1.5% figures are relative to the committed code,
which loads a pair of input blocks and processes them in parallel. In other
words, simultaneous message schedule processing does take place, simply not
in the way suggested in the paper.
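
To make "loads a pair of input blocks and processes them in parallel" a bit
more concrete, here is a minimal Python sketch of a SHA256 message schedule
computed for two blocks in lockstep. It is an illustration only, not the
committed assembly: the function names schedule_single and schedule_pair are
hypothetical, and the two "lanes" are simply modeled as a tuple per schedule
word. Only the sigma functions and the recurrence come from the SHA256
specification.

```python
import struct

MASK = 0xFFFFFFFF  # SHA256 works on 32-bit words


def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def sigma0(x):
    # Small sigma0 from the SHA256 message schedule
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def sigma1(x):
    # Small sigma1 from the SHA256 message schedule
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def schedule_single(block):
    """Reference: expand one 64-byte block into 64 schedule words."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        w.append((sigma1(w[t - 2]) + w[t - 7]
                  + sigma0(w[t - 15]) + w[t - 16]) & MASK)
    return w


def schedule_pair(block_a, block_b):
    """Expand two blocks in lockstep (hypothetical helper).

    Each entry is a (lane0, lane1) tuple, so every step of the
    recurrence is applied to both blocks at once, the way a single
    vector instruction would act on two lanes.
    """
    wa = struct.unpack(">16I", block_a)
    wb = struct.unpack(">16I", block_b)
    w = list(zip(wa, wb))
    for t in range(16, 64):
        w.append(tuple(
            (sigma1(w[t - 2][lane]) + w[t - 7][lane]
             + sigma0(w[t - 15][lane]) + w[t - 16][lane]) & MASK
            for lane in (0, 1)
        ))
    return w


if __name__ == "__main__":
    a = bytes(range(64))
    b = bytes(range(64, 128))
    pair = schedule_pair(a, b)
    # Each lane must match the schedule computed for that block alone.
    print([p[0] for p in pair] == schedule_single(a))
    print([p[1] for p in pair] == schedule_single(b))
```

Each lane of the paired schedule matches what schedule_single yields for the
corresponding block; the difference is only that both expansions advance
together rather than one after the other.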


______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
