(Sorry this is long. The short version: for i386, use -O2 where possible; for x86_64, -O1 beats -O2!! And x86_64 is a good 30% faster.)
For all my testing I'm using my desktop, an Intel Q9300 @ 2.5GHz (quad-core CPU, 3MB cache) with 8GB of RAM, running OS X 10.6.4. I have the machine set up to use the full 64-bit kernel:

$ uname -a
Darwin Jason-Stevenss-Mac-Pro.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:27:12 PDT 2010; root:xnu-1504.7.4~1/RELEASE_X86_64 x86_64

$ hostinfo
Mach kernel version:
    Darwin Kernel Version 10.4.0: Fri Apr 23 18:27:12 PDT 2010; root:xnu-1504.7.4~1/RELEASE_X86_64
Kernel configured for up to 4 processors.
4 processors are physically available.
4 processors are logically available.
Processor type: i486 (Intel 80486)
Processors active: 0 1 2 3
Primary memory available: 8.00 gigabytes
Default processor set: 87 tasks, 393 threads, 4 processors
Load average: 0.03, Mach factor: 3.96

$ gcc -v
Using built-in specs.
Target: i686-apple-darwin10
Configured with: /var/tmp/gcc/gcc-5664~38/src/configure --disable-checking --enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib --build=i686-apple-darwin10 --program-prefix=i686-apple-darwin10- --host=x86_64-apple-darwin10 --target=i686-apple-darwin10 --with-gxx-include-dir=/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Apple Inc. build 5664)

OK, very exciting, I know.

I'm testing with 4.3BSD UWisc [ http://sourceforge.net/projects/bsd42/files/4BSD%20under%20Windows/v0.4/4.3BSD-Uwisc-install-0.4.exe/download ] from the TUHS, along with gcc 2.7.2.2 in the VM, and the Dhrystone program from http://www.superglobalmegacorp.com/index.php/Dhrystone.c

Each result below is labelled with two sets of flags. Every time I build vax780, I use the first set of flags for the whole program and the second for the isolated op_ldpctx and op_mtpr procedures. I'm also listing the exe size for comparison.
When building for the i386 I get the following results:

-O2/-O1  533,924
Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second

-O1/-O1  513,448
Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second

-Os/-O1  513,396
Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second

As we can see, and as I'd have expected, the -O2/-O1 combination was the most consistently fast.

Now on to the 64-bit stuff...
-O2/-O1  576,112
Dhrystone(1.1) time for 500000 passes = 14
This machine benchmarks at 35714 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second

-O1/-O1  559,736
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second

-O0/-O0  675,832
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second

-O0/-O1  675,816
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second

-Os/-O1  555,576
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 14
This machine benchmarks at 35714 dhrystones/second

What is interesting to me is that -O2 wasn't as fast as -O1. I'll have to try this on some other x86_64 platforms to see if it's consistent, but I thought I'd pass along just how much faster SIMH is on 64-bit machines with a 64-bit compiler, and that -O2 isn't necessarily the best fit...
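
For anyone who wants to check the arithmetic, here is a rough sketch in Python (not part of the original test setup) that recomputes the printed scores and compares builds. The function names and the RUN_SECONDS table are my own illustration; the times are copied from the results above. It assumes each score is simply passes divided by whole seconds, which matches every printed figure, and it compares builds by total passes over total seconds across the three runs.

```python
PASSES = 500000  # passes per run, as reported in the Dhrystone output

# Seconds for the three runs of each build, copied from the results above.
# Keys are (architecture, "whole-program flags/isolated-procedure flags").
RUN_SECONDS = {
    ("i386", "-O2/-O1"): [17, 17, 17],
    ("i386", "-O1/-O1"): [18, 18, 18],
    ("i386", "-Os/-O1"): [17, 18, 17],
    ("x86_64", "-O2/-O1"): [14, 13, 12],
    ("x86_64", "-O1/-O1"): [12, 13, 13],
    ("x86_64", "-O0/-O0"): [19, 19, 19],
    ("x86_64", "-O0/-O1"): [19, 19, 17],
    ("x86_64", "-Os/-O1"): [13, 12, 14],
}


def dhrystones_per_second(seconds, passes=PASSES):
    """Score for a single run; integer division matches the printed figures."""
    return passes // seconds


def aggregate_rate(times, passes=PASSES):
    """Total passes divided by total seconds over all runs of one build."""
    return passes * len(times) / sum(times)


def speedup(baseline, candidate):
    """How many times faster the candidate build is than the baseline build."""
    return aggregate_rate(RUN_SECONDS[candidate]) / aggregate_rate(RUN_SECONDS[baseline])


if __name__ == "__main__":
    for (arch, flags), times in RUN_SECONDS.items():
        print(f"{arch:7} {flags:8} {aggregate_rate(times):8.0f} dhrystones/second")

    best_i386 = ("i386", "-O2/-O1")
    for flags in ("-O2/-O1", "-O1/-O1"):
        gain = (speedup(best_i386, ("x86_64", flags)) - 1) * 100
        print(f"x86_64 {flags} vs i386 -O2/-O1: {gain:.0f}% faster")
```

On these numbers, x86_64 -O1/-O1 comes out roughly 34% faster than the best i386 build, and x86_64 -O2/-O1 roughly 31% faster. The gap between -O1 and -O2 on x86_64 is only about 3%, which is one second over three runs with a timer that reports whole seconds, so more runs would help confirm it.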
_______________________________________________
Simh mailing list
http://mailman.trailing-edge.com/mailman/listinfo/simh
