Joerg Schilling wrote:
64 bit code on SPARC is typically 5-10% slower, AMD64 code is typically 30%
faster because there are twice as many registers.

30% is very optimistic. My test results vary between 30% slower and 200% faster depending on the application and compiler. On average I'd say AMD64 code will be ~10% faster.

My previously posted "openssl speed" results are invalid: the 32 bit code was compiled with -xO3 while the 64 bit code was compiled with -xO5. I reran the tests, which on average still favour 64 bit code, but to a lesser extent.

Test environment:

cc: Sun C 5.8 Patch 121016-03 2006/06/07
ube: Sun Compiler Common 11 Patch 120759-08 2006/08/08
../gcc-4.1.1/configure --with-system-zlib --with-gnu-as --with-as=/usr/sfw/bin/gas --without-included-gettext --without-libiconv-prefix --enable-languages=c,c++,ada,fortran,objc --with-x --enable-java-awt=xlib
Thread model: posix
gcc version 4.1.1
AMD Athlon(tm) 64 X2 Dual Core Processor 4400+          ( == Opteron 175)
2x1 GB RAM Dual Channel DDR400 CL3 ECC


The numbers below are the relative performance of AMD64 vs. IA32 (<0%: IA32 faster, >0%: AMD64 faster)
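
The percentages are presumably derived from the throughput figures that "openssl speed" reports (bytes per second for ciphers and digests, operations per second for RSA/DSA). A minimal Python sketch of that calculation, assuming the gain is the AMD64 rate divided by the IA32 rate, minus one. The function name and the sample rates are made up for illustration:

```python
def relative_gain(ia32_rate, amd64_rate):
    """Percentage by which the AMD64 build is faster (>0) or slower (<0).

    Both arguments are throughput values (higher is better) measured
    for the same benchmark and block size.
    """
    if ia32_rate <= 0:
        raise ValueError("ia32_rate must be positive")
    return (amd64_rate / ia32_rate - 1.0) * 100.0


# Hypothetical throughput values, not measurements from this post.
print(f"{relative_gain(100.0, 130.0):+.2f}%")  # AMD64 build faster
print(f"{relative_gain(100.0, 80.0):+.2f}%")   # IA32 build faster
```

With this definition a value of 100% means the 64 bit build has twice the throughput of the 32 bit build.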


(1) OpenSSL 0.9.8d

Studio 11       32 vs. 64 bits
./Configure no-asm solaris-x86-cc
./Configure no-asm solaris64-x86_64-cc
cc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -fast
        -xstrconst [ -xarch=amd64 -Xa -DL_ENDIAN ]

type                    16B     64B     256B    1024B   8192B
md2                     -10.05% -11.15% -11.74% -12.12% -12.34%
md4                       8.86%   6.05%   0.52%  -5.88% -10.08%
md5                      16.38%  10.94%   0.72%  -8.88% -14.06%
hmac(md5)                -2.63%  -5.14%  -8.79% -12.78% -14.68%
sha1                      4.24% -11.21% -21.91% -26.25% -28.58%
rmd160                   -1.22% -10.99% -20.55% -26.60% -29.35%
rc4                      78.52%  82.69%  80.98%  81.75%  81.79%
des cbc                  -8.77%  -9.57%  -9.63%  -9.69%  -9.63%
idea cbc                  6.43%   6.04%   6.02%   6.10%   5.85%
rc2 cbc                  -0.68%  -1.16%  -1.18%  -1.27%  -1.46%
blowfish cbc             -7.59%  -9.09%  -9.35%  -9.42%  -9.98%
cast cbc                -23.04% -24.26% -24.59% -25.31% -24.85%
aes-128 cbc              60.48%  61.71%  61.91%  62.32%  62.27%
aes-192 cbc              64.41%  63.91%  64.31%  65.11%  65.13%
aes-256 cbc              65.03%  66.60%  67.89%  67.40%  67.45%
sha256                  -16.11% -19.27% -23.42% -25.54% -26.56%
sha512                   82.83%  83.21% 112.24% 129.11% 137.42%

                        sign    verify
rsa 512 bits             40.73%  28.55%
rsa 1024 bits            28.89%  17.55%
rsa 2048 bits            15.93%   3.47%
rsa 4096 bits             7.69%  -3.87%
dsa 512 bits             29.38%  30.25%
dsa 1024 bits            20.51%  21.10%
dsa 2048 bits             7.06%   7.65%


gcc 4.1.1       32 vs. 64 bits
./Configure no-asm solaris-x86-gcc
./Configure no-asm solaris64-x86_64-gcc
gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -O3
        -fomit-frame-pointer -DL_ENDIAN
        { -march=pentium -DOPENSSL_NO_INLINE_ASM |
          -m64 -DL_ENDIAN -DMD32_REG_T=int }

type                    16B     64B     256B    1024B   8192B
md2                      -7.40%  -9.46% -10.21%  -9.36%  -8.95%
md4                      26.77%  24.25%  19.80%  14.47%  11.19%
md5                      18.20%  16.72%  11.50%   6.06%   2.69%
hmac(md5)                19.03%  16.02%  10.69%   5.95%   2.59%
sha1                     16.22%  13.13%  16.53%  20.12%  22.24%
rmd160                   24.41%  17.51%  12.67%   8.13%   6.07%
rc4                      22.65%  22.98%  23.10%  23.19%  23.17%
des cbc                  38.35%  37.66%  37.36%  37.29%  37.11%
idea cbc                 10.96%   6.71%   3.94%   3.69%   3.33%
rc2 cbc                   1.53%   0.27%  -0.23%  -0.22%  -0.33%
blowfish cbc              1.14%  -1.38%  -1.93%  -2.16%  -2.19%
cast cbc                 95.12%  97.09%  97.57%  97.94%  98.07%
aes-128 cbc              76.22%  82.13%  83.89%  84.50%  84.79%
aes-192 cbc              84.24%  86.69%  88.12%  88.91%  89.08%
aes-256 cbc              83.59%  90.55%  91.96%  92.34%  92.52%
sha256                   -3.48%  -2.75%  -1.07%  -0.29%   0.09%
sha512                  177.33% 177.60% 242.40% 279.34% 301.04%

                        sign    verify
rsa 512 bits             94.92% 109.87%
rsa 1024 bits           124.20% 123.21%
rsa 2048 bits           136.36% 130.01%
rsa 4096 bits           142.86% 129.65%
dsa 512 bits            117.52% 114.45%
dsa 1024 bits           137.08% 128.02%
dsa 2048 bits           134.24% 130.59%




(2) gzip/bzip2

I also measured compression/decompression speed with gzip and bzip2 (test file: gcc-4.1.1.tar):
                Studio 11  32 vs. 64            gcc 4.1.1  32 vs. 64
gzip                     -5.78 %                         23.69 %
gunzip                    2.46 %                          2.26 %
bzip2                     3.47 %                          4.71 %
bunzip2                  10.38 %                         12.12 %

gcc options:    -O3 [ -m64 ]
cc options:     -fast [ -xarch=amd64 ]

gzip-1.2.4a / bzip2-1.0.3
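
The post does not say how the gzip/bzip2 timings were taken. One way to collect comparable numbers is sketched below in Python, assuming a Unix system. The helper name time_command and the binary name in the usage comment are hypothetical:

```python
import resource
import subprocess
import time


def time_command(argv):
    """Run argv once and return (user_seconds, real_seconds).

    User time is read from the accumulated resource usage of waited-for
    child processes, so no other children should finish during the call.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return after.ru_utime - before.ru_utime, real


# Hypothetical usage, writing the compressed stream to stdout so the
# test file is left untouched between runs:
#   user, real = time_command(["./gzip-64.gcc", "-c", "gcc-4.1.1.tar"])
```

Repeating each run several times and taking the best or median value reduces the influence of file cache effects.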


(3) Oracle 10g Release 2

Oracle 10g Release 2 (10.2.0.2), 32 bit vs. 64 bit: time to run
        @?/rdbms/admin/catalog.sql
        @?/rdbms/admin/catproc.sql
on a newly created database. The init.ora parameters were the same for
32 bit and 64 bit.


Time for catalog/catproc

32 bit          381s (user)     841s (real)
64 bit          365s (user)     833s (real)
--------------------------------------------
Speedup:          4.38%         N/A
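
The 4.38% figure is reproduced if the speedup is taken as the 32 bit user time divided by the 64 bit user time, minus one. That formula is inferred from the numbers above, not stated in the post. A short Python check (the function name is made up):

```python
def speedup_from_times(t32_seconds, t64_seconds):
    """Percentage speedup of the 64 bit run, computed from elapsed times.

    Time is the inverse of throughput, so this is the same
    "AMD64 relative to IA32" measure used for the other benchmarks.
    """
    return (t32_seconds / t64_seconds - 1.0) * 100.0


# User CPU times from the table above.
print(f"{speedup_from_times(381, 365):.2f}%")  # prints 4.38%
```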



Conclusions:

(1) OpenSSL

40% slowdown up to 300% speedup. gcc64 results require further investigation.
An average is difficult to calculate. Some benchmarks are *much* faster in the 64 bit versions (RC4, AES, SHA512, RSA, DSA); others are slower or nearly equal in speed.

The 64 bit gcc results for OpenSSL are remarkable. Let's compare them to the 64 bit Studio 11 results:

gcc 4.1.1 -m64 vs Studio-11 -xarch=amd64

type                    16B     64B     256B    1024B   8192B
md2                       2.31%   2.38%   2.17%   1.90%   1.95%
md4                      26.94%  31.90%  42.53%  57.73%  69.77%
md5                      15.15%  18.31%  26.14%  34.08%  38.99%
hmac(md5)                29.79%  29.06%  33.26%  37.32%  39.36%
sha1                     14.51%  20.64%  31.24%  33.62%  35.94%
rmd160                   30.72%  39.88%  54.02%  63.29%  68.00%
rc4                     -14.06% -15.15% -15.19% -15.35% -15.40%
des cbc                   4.39%   5.26%   5.42%   5.53%   5.41%
idea cbc                 -5.23%  -9.13% -10.22% -10.49% -10.47%
rc2 cbc                  -5.78%  -5.70%  -5.95%  -6.00%  -5.91%
blowfish cbc             17.46%  18.34%  18.54%  18.63%  19.03%
cast cbc                 42.77%  43.79%  43.92%  45.47%  44.42%
aes-128 cbc              13.49%  16.56%  17.83%  17.73%  18.04%
aes-192 cbc              15.29%  18.08%  18.68%  18.78%  19.05%
aes-256 cbc              15.41%  18.71%  19.12%  19.48%  19.71%
sha256                   29.78%  34.61%  39.95%  43.14%  44.50%
sha512                   29.04%  29.04%  33.51%  35.32%  37.11%

                        sign    verify
rsa 512 bits             46.14%  53.01%
rsa 1024 bits            70.39%  67.51%
rsa 2048 bits            86.89%  89.64%
rsa 4096 bits            94.29%  95.69%
dsa 512 bits             66.00%  66.58%
dsa 1024 bits            81.30%  80.66%
dsa 2048 bits            91.30%  90.99%

Wow! I am shocked by the bad results of Studio 11 compared to gcc.


(2) gzip/bzip2

Speedup between -5% and 25%
Average speedup
        for Studio 11: 2%
        for gcc 4.1.1: 9%

(3) Oracle
        ~5% speedup in CPU time


Studio 11 vs. gcc 4.1.1: No clear winner. Perhaps gcc is generating better 64 bit code.

Code size: AMD64 code is ~20-30% larger than IA32 code with Studio 11 and ~10% larger with gcc. Studio 11 code is ~20% larger than gcc code:

$ size gzip-*
gzip-32.cc: 67210 + 5979 + 330505 = 403694
gzip-32.gcc: 58347 + 3036 + 330912 = 392295
gzip-64.cc: 80982 + 8275 + 332353 = 421610
gzip-64.gcc: 65463 + 5096 + 332668 = 403227
$ size bzip2-*
bzip2-32.cc: 105248 + 4366 + 5839 = 115453
bzip2-32.gcc: 79768 + 3588 + 5981 = 89337
bzip2-64.cc: 120572 + 4922 + 7455 = 132949
bzip2-64.gcc: 85096 + 3976 + 7625 = 96697
$ size openssl-*
openssl-32.cc: 1721518 + 75476 + 16332 = 1813326
openssl-32.gcc: 1523704 + 73864 + 17976 = 1615544
openssl-64.cc: 2257218 + 126616 + 18912 = 2402746
openssl-64.gcc: 1814240 + 124056 + 21192 = 1959488
$ size 10.2.0*/bin/oracle
10.2.0_64/bin/oracle: 94541351 + 2484717 + 34179 = 97060247
10.2.0_32/bin/oracle: 71377376 + 301249 + 27895 = 71706520
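
The growth figures can be recomputed from the size output. The Python sketch below assumes each line has the form "name: text + data + bss = total" and compares the first column (the text segment). The helper names are made up, and only the gzip lines are included as sample input:

```python
import re

# One line of size output: "name: text + data + bss = total"
SIZE_LINE = re.compile(r"^(\S+):\s*(\d+)\s*\+\s*(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)$")


def parse_size(output):
    """Map each file name to its text, data, bss and total sizes in bytes."""
    sizes = {}
    for line in output.splitlines():
        match = SIZE_LINE.match(line.strip())
        if match:
            text, data, bss, total = (int(g) for g in match.groups()[1:])
            sizes[match.group(1)] = {
                "text": text, "data": data, "bss": bss, "total": total,
            }
    return sizes


def growth_pct(small, large):
    """Percentage by which `large` exceeds `small`."""
    return (large / small - 1.0) * 100.0


sample = """\
gzip-32.cc: 67210 + 5979 + 330505 = 403694
gzip-32.gcc: 58347 + 3036 + 330912 = 392295
gzip-64.cc: 80982 + 8275 + 332353 = 421610
gzip-64.gcc: 65463 + 5096 + 332668 = 403227
"""

sizes = parse_size(sample)
for compiler in ("cc", "gcc"):
    t32 = sizes[f"gzip-32.{compiler}"]["text"]
    t64 = sizes[f"gzip-64.{compiler}"]["text"]
    print(f"gzip text growth 32 -> 64 bit ({compiler}): {growth_pct(t32, t64):.1f}%")
```

Comparing the totals instead of the text column would hide most of the difference for gzip, because its bss segment dominates the total.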


Daniel
_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org