For those who are interested, the timings below compare an 'openssl.exe speed' 
run against a normal build and a PGO build of libeay32.dll.  The machine is 
Windows Server 2008 x64 (dual quad-core 1.6GHz L5310 Xeons), and the build was 
done with Visual Studio 2008 x64.  (Mind you, it wasn't straightforward; the 
`perl Configure VC-WIN64A && ms\do_win64a && nmake -f ms\ntdll.mk` approach 
does not lend itself well to being augmented with an interim PGO/PGU cycle.)
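
For anyone who has not run a PGO cycle with MSVC before, here is a rough sketch 
of the three phases, written as a small Python helper that only builds the 
command lines and does not run them.  It is not the procedure used for the 
numbers in this mail.  The object file names, the .pgd name and the helper 
itself are made up for illustration; the only parts taken from the toolchain 
are the linker options /LTCG:PGINSTRUMENT, /LTCG:PGOPTIMIZE and /PGD, and the 
requirement that the objects be compiled with /GL.

```python
def pgo_commands(objects, dll, pgd, training_cmd):
    """Build (but do not run) the commands for one instrument/train/optimize cycle.

    Assumes every object in `objects` was already compiled with cl /GL.
    All file names passed in are hypothetical placeholders.
    """
    objs = " ".join(objects)
    return [
        # Phase 1: link an instrumented DLL that records profile data.
        f"link /DLL /LTCG:PGINSTRUMENT /PGD:{pgd} /OUT:{dll} {objs}",
        # Phase 2: exercise the instrumented DLL with a representative workload.
        training_cmd,
        # Phase 3: relink, letting the linker optimize using the recorded profile.
        f"link /DLL /LTCG:PGOPTIMIZE /PGD:{pgd} /OUT:{dll} {objs}",
    ]


# Hypothetical inputs; the real OpenSSL makefile has many more objects.
cycle = pgo_commands(
    objects=["example_a.obj", "example_b.obj"],
    dll="libeay32.dll",
    pgd="libeay32.pgd",
    training_cmd="openssl.exe speed",
)

for step in cycle:
    print(step)
```

The awkward part with the stock makefile is that phase 1 and phase 3 need 
different link lines for the same set of objects, which is why the cycle has 
to be wedged in between the normal build steps.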

Normal build:

OpenSSL 0.9.8g 19 Oct 2007
built on: Thu Feb 28 07:36:23 2008
options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,4,long) aes(partial) 
idea(int) blowfish(idx)
compiler: cl  /MD /Ox /W3 /Gs0 /GF /Gy /nologo -DWIN32_LEAN_AND_MEAN -DL_ENDIAN 
-DDSO_WIN32 -DOPENSSL_SYSNAME_WIN32 -DOPENSSL_SYSNAME_WINNT -DUNICODE 
-D_UNICODE -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE 
-DOPENSSL_USE_APPLINK -I. /Fdout32dll -DOPENSSL_NO_CAMELLIA -DOPENSSL_NO_SEED 
-DOPENSSL_NO_RC5 -DOPENSSL_NO_MDC2 -DOPENSSL_NO_TLSEXT -DOPENSSL_NO_KRB5 
-DOPENSSL_NO_DYNAMIC_ENGINE
available timing options: TIMEB HZ=1000
timing function used: ftime
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2                875.77k     1796.56k     2436.60k     2675.31k     2755.84k
mdc2                 0.00         0.00         0.00         0.00         0.00
md4              18029.35k    59417.30k   156723.18k   263896.44k   332222.10k
md5              14864.85k    47665.93k   119987.24k   191630.11k   233178.82k
hmac(md5)        15873.99k    48995.30k   121695.28k   192041.39k   233138.32k
sha1             15757.69k    48281.50k   113801.70k   172427.71k   203391.01k
rmd160           11715.11k    31724.71k    64739.40k    87609.48k    97883.41k
rc4             215534.64k   234810.58k   237402.24k   246108.49k   250668.10k
des cbc          28718.27k    30039.78k    30554.03k    30640.52k    30595.82k
des ede3         11309.21k    11658.13k    11733.94k    11753.62k    11733.67k
idea cbc         31037.31k    32688.19k    33143.45k    32886.83k    33245.25k
seed cbc             0.00         0.00         0.00         0.00         0.00
rc2 cbc          14291.86k    14642.36k    14711.69k    14732.36k    14722.67k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00
blowfish cbc     49446.55k    53773.13k    55151.93k    55434.38k    55581.30k
cast cbc         47798.34k    51087.75k    51829.52k    52983.47k    53371.13k
aes-128 cbc      75743.64k    81166.99k    82100.40k    82423.07k    82402.83k
aes-192 cbc      65382.76k    70522.14k    71453.22k    71697.50k    71943.46k
aes-256 cbc      59747.92k    63072.24k    63646.49k    63815.96k    63828.10k
camellia-128 cbc        0.00         0.00         0.00         0.00         0.00
camellia-192 cbc        0.00         0.00         0.00         0.00         0.00
camellia-256 cbc        0.00         0.00         0.00         0.00         0.00
sha256           12607.81k    30391.44k    54975.72k    69835.96k    75735.09k
sha512            9071.95k    36271.14k    64255.90k    95176.38k   111448.75k
aes-128 ige      79081.86k    87086.51k    88510.77k    87792.86k    88885.91k
aes-192 ige      70984.62k    75217.29k    77083.46k    76555.86k    77083.46k
aes-256 ige      62345.66k    66391.83k    67636.43k    67432.54k    67855.27k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.000478s 0.000032s   2093.7  31697.7
rsa 1024 bits 0.001691s 0.000077s    591.4  12926.0
rsa 2048 bits 0.008902s 0.000229s    112.3   4373.8
rsa 4096 bits 0.054600s 0.000762s     18.3   1312.5
                  sign    verify    sign/s verify/s
dsa  512 bits 0.000330s 0.000339s   3034.4   2947.7
dsa 1024 bits 0.000774s 0.000869s   1292.4   1151.1
dsa 2048 bits 0.002274s 0.002739s    439.7    365.1

PGO-enabled build (the profile data was captured from an instrumented 
openssl.exe speed run, then everything was rebuilt with that profiling 
information):

OpenSSL 0.9.8g 19 Oct 2007
built on: Thu Feb 28 19:23:56 2008
options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,4,long) aes(partial) 
idea(int) blowfish(idx)
compiler: cl  /MD /Ox /favor:INTEL64 /GA /GL /W3 /Gs0 /GF /Gy /nologo 
-DWIN32_LEAN_AND_MEAN -DL_ENDIAN -DDSO_WIN32 -DOPENSSL_SYSNAME_WIN32 
-DOPENSSL_SYSNAME_WINNT -DUNICODE -D_UNICODE -D_CRT_SECURE_NO_DEPRECATE 
-D_CRT_NONSTDC_NO_DEPRECATE -DOPENSSL_USE_APPLINK -I. /Fdout32dll 
-DOPENSSL_NO_CAMELLIA -DOPENSSL_NO_SEED -DOPENSSL_NO_RC5 -DOPENSSL_NO_MDC2 
-DOPENSSL_NO_TLSEXT -DOPENSSL_NO_KRB5 -DOPENSSL_NO_DYNAMIC_ENGINE
available timing options: TIMEB HZ=1000
timing function used: ftime
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2                884.43k     1812.43k     2454.02k     2694.54k     2773.61k
mdc2                 0.00         0.00         0.00         0.00         0.00
md4              19307.46k    62482.07k   162648.73k   268865.64k   333460.19k
md5              16051.68k    50344.23k   123079.07k   193341.58k   233178.82k
hmac(md5)        16751.25k    51000.39k   123794.25k   194631.28k   233788.06k
sha1             16778.05k    50490.06k   117057.15k   175585.72k   204350.99k
rmd160           12193.19k    32864.28k    65725.35k    88336.01k    97990.60k
rc4             210451.78k   234318.66k   236899.41k   246651.22k   250705.56k
des cbc          28948.69k    29831.47k    30082.87k    30210.17k    30294.72k
des ede3         11638.72k    11850.82k    11910.12k    11910.07k    11928.90k
idea cbc         32198.86k    33039.02k    33504.18k    33662.15k    33662.15k
seed cbc             0.00         0.00         0.00         0.00         0.00
rc2 cbc          14197.53k    14484.34k    14542.73k    14552.82k    14572.41k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00
blowfish cbc     50021.51k    54041.60k    55016.28k    55288.24k    55434.38k
cast cbc         48447.06k    52078.89k    53371.13k    53644.18k    53773.13k
aes-128 cbc      77385.68k    81462.57k    82423.07k    82402.83k    83055.52k
aes-192 cbc      67216.41k    70745.17k    71943.46k    71697.50k    71928.04k
aes-256 cbc      60090.32k    63072.24k    63634.42k    63453.92k    63634.42k
camellia-128 cbc        0.00         0.00         0.00         0.00         0.00
camellia-192 cbc        0.00         0.00         0.00         0.00         0.00
camellia-256 cbc        0.00         0.00         0.00         0.00         0.00
sha256           12712.42k    30552.64k    55650.44k    70179.20k    76000.98k
sha512            9450.36k    37685.73k    65577.63k    96887.12k   113063.54k
aes-128 ige      81166.99k    87086.51k    88138.78k    88534.12k    89240.51k
aes-192 ige      70984.62k    75217.29k    76538.39k    76818.75k    77101.18k
aes-256 ige      63072.24k    66182.31k    66801.58k    67636.43k    67650.06k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.000408s 0.000032s   2452.6  31412.8
rsa 1024 bits 0.001786s 0.000086s    559.8  11671.4
rsa 2048 bits 0.010049s 0.000267s     99.5   3748.3
rsa 4096 bits 0.064000s 0.000895s     15.6   1117.3
                  sign    verify    sign/s verify/s
dsa  512 bits 0.000330s 0.000350s   3027.9   2860.1
dsa 1024 bits 0.000869s 0.000976s   1150.8   1024.2
dsa 2048 bits 0.002667s 0.003168s    374.9    315.7


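I find the results a bit unexciting (compared to, say, Python, where PGO 
provides a good 35% speedup or so).  Some runs are faster: RSA 512-bit signing 
gains about 17%, and the digests gain up to about 8% on 16-byte blocks.  But a 
few are actually slower: RSA and DSA at 1024 bits and above lose roughly 5 to 
15% on both sign and verify.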
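
If anyone wants to diff two such runs without eyeballing the columns, something 
along these lines should do.  It is only a sketch: the function names are my 
own, it handles just the bytes-per-second table (not the RSA/DSA rows), and it 
assumes each data row ends in exactly five numeric columns as in the output 
above.  The sample rows are copied from the two runs in this mail.

```python
import re

# A throughput cell looks like "14864.85k" or "0.00".
NUM = re.compile(r"\d+\.\d+k?")


def parse_throughput(text):
    """Return {algorithm: [five throughput values]} from 'openssl speed' output."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only rows whose last five tokens are all numeric cells.
        if len(parts) < 6 or not all(NUM.fullmatch(p) for p in parts[-5:]):
            continue
        name = " ".join(parts[:-5])  # e.g. "aes-128 cbc"
        table[name] = [float(p.rstrip("k")) for p in parts[-5:]]
    return table


def percent_change(base, other):
    """Per-column change of `other` relative to `base`, in percent."""
    out = {}
    for name, b in base.items():
        o = other.get(name)
        if o is None:
            continue
        # Disabled algorithms report 0.00; avoid dividing by zero.
        out[name] = [round(100.0 * (y - x) / x, 1) if x else None
                     for x, y in zip(b, o)]
    return out


normal = """\
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5              14864.85k    47665.93k   119987.24k   191630.11k   233178.82k
rc4             215534.64k   234810.58k   237402.24k   246108.49k   250668.10k
aes-128 cbc      75743.64k    81166.99k    82100.40k    82423.07k    82402.83k
seed cbc             0.00         0.00         0.00         0.00         0.00
"""

pgo = """\
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5              16051.68k    50344.23k   123079.07k   193341.58k   233178.82k
rc4             210451.78k   234318.66k   236899.41k   246651.22k   250705.56k
aes-128 cbc      77385.68k    81462.57k    82423.07k    82402.83k    83055.52k
seed cbc             0.00         0.00         0.00         0.00         0.00
"""

delta = percent_change(parse_throughput(normal), parse_throughput(pgo))
for name, row in delta.items():
    print(f"{name:<14}", row)
```

Positive numbers mean the PGO build was faster for that block size.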

Regards,

    Trent.