Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 2012-09-16 07:19, Konstantin Belousov wrote:
> On Sun, Sep 16, 2012 at 12:34:45AM +0200, Dimitry Andric wrote:
> ...
> I tried to map the CPUID into more human-friendly family moniker, and it seems that these are Pentium-4 class CPUs. Am I right ?

Yes, it is apparently a Nocona model. This is part of the dmesg:

CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2793.24-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0xf41  Family = f  Model = 4  Stepping = 1
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x641d<SSE3,DTES64,MON,DS_CPL,CNXT-ID,CX16,xTPR>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  TSC: P-state invariant
real memory = 4294967296 (4096 MB)
avail memory = 4097470464 (3907 MB)
Event timer LAPIC quality 400
ACPI APIC Table: <DELL PE BKC>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 1 core(s) x 2 HTT threads
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP/HT): APIC ID: 1
 cpu2 (AP): APIC ID: 6
 cpu3 (AP/HT): APIC ID: 7

> If yes, could you, please, rerun the tests on anything more recent than Core2, i.e. any Core i7-whatever class of Xeons ?

I would love to, especially because the tests will complete faster, but I currently do not have access to physical machines of that class. Normally I do performance tests on the FreeBSD reference machines, but since these tests require booting with a custom kernel (and preferably root access + remote console), I cannot use them.

So if somebody can offer such a machine (for a limited time only, a few days most likely, 1 week maximum), it would be great.

-Dimitry
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
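Kostik's question comes down to reading the processor signature, which the dmesg above prints as Id = 0xf41. As a sketch, assuming the standard layout of the CPUID leaf 1 EAX value (stepping, model and family in the low 12 bits, extended model and extended family above them), the hypothetical decode_cpuid helper below splits it into fields. For Intel, family 0xf is the Pentium 4 (NetBurst) line, which fits Dimitry's identification of a Nocona-based Xeon; the same helper also decodes the Opteron signature (Id = 0xf5a) quoted later in the thread.

```shell
#!/bin/sh
# Hypothetical helper: decode a CPUID leaf 1 EAX signature, as printed by
# FreeBSD's dmesg ("Id = 0x..."). Field layout: stepping bits 3:0,
# model bits 7:4, family bits 11:8, extended model bits 19:16,
# extended family bits 27:20.
decode_cpuid()
{
	id=$(($1))
	stepping=$(( id & 0xf ))
	model=$(( (id >> 4) & 0xf ))
	family=$(( (id >> 8) & 0xf ))
	extmodel=$(( (id >> 16) & 0xf ))
	extfamily=$(( (id >> 20) & 0xff ))

	# Intel's convention: the extended model is used for base families
	# 0x6 and 0xf (both extended fields are zero for the two IDs below).
	if [ "$family" -eq 6 ] || [ "$family" -eq 15 ]; then
		model=$(( (extmodel << 4) + model ))
	fi
	# The extended family is only added for base family 0xf.
	if [ "$family" -eq 15 ]; then
		family=$(( family + extfamily ))
	fi
	printf 'family 0x%x model 0x%x stepping %d\n' "$family" "$model" "$stepping"
}

decode_cpuid 0xf41	# Gavin's Xeon: family 0xf model 0x4 stepping 1
decode_cpuid 0xf5a	# Steve's Opteron 248: family 0xf model 0x5 stepping 10
```

The decoded values agree with the Family/Model/Stepping fields that dmesg prints next to each Id.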
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 2012-09-16 07:25, Garrett Cooper wrote:
> ...
> If you can provide the tests, I can rerun it on some Nehalem class workstations I have access to. I unfortunately don't have access to SNB/Romley hardware yet.

I did these tests as follows:

- Install a recent -CURRENT snapshot on the box (or rebuild world and kernel by hand and install them).
- Install Subversion.
- Check out head sources into /usr/src, if not already there.
- Build GENERIC kernel with gcc, using default settings, and install it into /boot/kernel.gcc.
- Build GENERIC kernel with clang, using default settings, and install it into /boot/kernel.clang.
- Boot machine with either kernel, then run the attached runtest.sh script, with the buildworld_{single,multi}.sh scripts in the same directory. Save the resulting run-*.txt files in a directory that indicates whether the kernel in use was built by gcc or by clang.

You can tweak the 'num_runs' variable at the top of runtest.sh to do more runs, if the machine is fast. This should give more confidence in the final statistics. I did just 3 runs on Gavin's machine, since it took more than 7 hours for a single-threaded buildworld to complete. Doing 6 runs should be more than enough.

The run-*.txt files contain the time(1) output of each run, and should be processed through ministat to give average, stddev and so on. Just send them to me, I will process them and summarize the statistics. Alternatively, you can give me remote access, and I'll do it. :)

runtest.sh:

    #!/bin/sh
    # Run each buildworld variant num_runs times, timing every run.
    mypath=$(dirname "$0")    # directory holding the buildworld_*.sh scripts
    num_runs=3
    set -e

    do_runtest()
    {
        for i in $(jot ${num_runs}); do
            rm -rf /usr/obj/*    # start every run from an empty object tree
            sync
            echo "Doing build $1, run $i..."
            # -l adds rusage statistics; -o writes time(1) output to run-*.txt
            /usr/bin/time -l -o run-$1-$i.txt ${mypath}/build$1.sh > run-$1-$i.log
            head -n 1 run-$1-$i.txt
        done
    }

    do_runtest world_single
    do_runtest world_multi

buildworld_single.sh:

    #!/bin/sh
    set -e
    cd /usr/src
    make -s buildworld

buildworld_multi.sh:

    #!/bin/sh
    set -e
    cd /usr/src
    make -s -j8 buildworld
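The run-*.txt files still have to be turned into ministat input. As a sketch, assuming the first line of each file is the summary line FreeBSD's /usr/bin/time prints ("<real> real <user> user <sys> sys", the line runtest.sh echoes with head), the hypothetical collect_times helper below gathers those numbers into one-value-per-line files, which is the input format ministat(1) reads.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original scripts): collect the
# real/user/sys times from runtest.sh's run-<build>-<n>.txt files into
# <outdir>/<build>.real, .user and .sys, one value per line.
collect_times()
{
	indir=$1	# e.g. ./gcc or ./clang, one directory per kernel
	build=$2	# world_single or world_multi
	outdir=$3	# where the per-metric files are written

	mkdir -p "$outdir" || return 1
	for m in real user sys; do
		: > "$outdir/$build.$m"	# start with empty files
	done

	for f in "$indir"/run-"$build"-*.txt; do
		[ -f "$f" ] || continue	# the glob matched nothing
		# Assumed summary line layout: <real> real <user> user <sys> sys
		read -r real _r user _u sys _s < "$f"
		echo "$real" >> "$outdir/$build.real"
		echo "$user" >> "$outdir/$build.user"
		echo "$sys" >> "$outdir/$build.sys"
	done
}
```

For example, after "collect_times gcc world_single gcc-stats" and "collect_times clang world_single clang-stats", running "ministat gcc-stats/world_single.real clang-stats/world_single.real" compares the wall-clock times of the two kernels.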
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 16/09/2012 00:34, Dimitry Andric wrote:
> ...
> The executive summary: GENERIC kernels compiled with clang 3.2 are slightly faster than those compiled by gcc 4.2.1, though the difference will not be very noticeable in practice.

It has been my impression in the past that math-heavy applications benefit from GCC, whereas I/O-heavy applications yield better performance when compiled with clang. I'd say a kernel has a lot more I/O than math to deal with.

-- 
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Compiler performance tests on FreeBSD 10.0-CURRENT
Hi all,

By request, I performed a series of kernel performance tests on FreeBSD 10.0-CURRENT, particularly comparing the runtime performance of GENERIC kernels compiled by gcc 4.2.1 and by clang 3.2.

The attached text file[1] contains more information about the tests, some semi-cooked performance data, and my conclusions. Any errors and omissions are also my fault, so if you notice them, please let me know.

The executive summary: GENERIC kernels compiled with clang 3.2 are slightly faster than those compiled by gcc 4.2.1, though the difference will not be very noticeable in practice.

Last but not least, thanks to Gavin Atkinson for providing the required hardware.

-Dimitry

[1]: Also available at: http://www.andric.com/freebsd/perftest/perftest-kernel-2012-09-14a.txt

KERNEL PERFORMANCE TESTS ON FREEBSD 10.0-CURRENT, SEPTEMBER 2012

INTRODUCTION

These tests aim to give an indication of the runtime performance of FreeBSD kernels compiled with different compilers. The compilers tested were:

- gcc 4.2.1, the system compiler in FreeBSD.
- clang 3.2 (trunk 162107), which is the default version of clang in FreeBSD 10.0-CURRENT, after r239462.

All tests were run on a machine graciously provided by Gavin Atkinson, which is a Dell PowerEdge 2850, with two 2.80 GHz Xeon-class CPUs (id=0xf41), and 4 GB RAM. It runs FreeBSD/amd64 10.0-CURRENT as of Tue Sep 11 19:11:00 UTC 2012.

With each compiler, a stock GENERIC kernel for amd64 was built from head as of r240384, with the default optimization flags for this architecture, which are for gcc:

    -O2 -frename-registers -pipe -fno-strict-aliasing

and for clang:

    -O2 -pipe -fno-strict-aliasing

Each kernel was installed into /boot/kernel.${compilername}. The system was then booted with each of these kernels, without modifying anything else, and multiple runs of make buildworld were done; first in single-threaded mode, next in multi-threaded mode, using the -j8 flag.
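The kernel setup described in the last paragraph could be scripted roughly as follows. This is a sketch under assumptions, not the commands actually used: sources in /usr/src, the compiler selected by passing CC= to make, and INSTKERNNAME used to install into /boot/kernel.<compiler> (check build(7) and make.conf(5) on your own tree). The helper names are hypothetical, and with DRYRUN=1, the default here, the commands are only printed.

```shell
#!/bin/sh
# Sketch only: print (DRYRUN=1) or run (DRYRUN=0) the kernel build steps.
: "${DRYRUN:=1}"

run()
{
	if [ "$DRYRUN" = 1 ]; then
		echo "$*"
	else
		"$@"
	fi
}

# Hypothetical helper: build and install GENERIC with one compiler,
# into /boot/kernel.<compiler> instead of /boot/kernel.
build_kernel_with()
{
	compiler=$1	# gcc or clang
	run make -C /usr/src buildkernel KERNCONF=GENERIC CC="$compiler"
	run make -C /usr/src installkernel KERNCONF=GENERIC CC="$compiler" \
	    INSTKERNNAME="kernel.$compiler"
}

# Hypothetical helper: boot /boot/kernel.<compiler> on the next reboot
# only (see nextboot(8)).
boot_once_with()
{
	run nextboot -k "kernel.$1"
}
```

A full cycle would then be "build_kernel_with gcc", "build_kernel_with clang", "boot_once_with clang", a reboot, and the runtest.sh runs.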
Between runs, the /usr/obj directory was fully cleaned out, and filesystems were synced. The timing results are below.

Building world, single-threaded, on a GENERIC kernel compiled by clang 3.2
--------------------------------------------------------------------------
          N            Min            Max         Median            Avg         Stddev
real      3       26589.27       26680.48       26653.58       26641.11      46.866211
user      3       20449.52       20472.88        20463.4      20461.933      11.748861
sys       3        7809.87        7837.94        7830.35      7826.0533      14.519891
maxrss    3         759420         759420         759420         759420              0
ixrss     3           4923           4926           4924      4924.3333      1.5275252
idrss     3            584            584            584            584              0
isrss     3            131            131            131            131              0
minflt    3  6.5828088e+08  6.5855089e+08  6.5828258e+08  6.5837145e+08       155402.8
majflt    3              0           2573           2568      1713.6667       1484.081
nswap     3              0              0              0              0              0
inblock   3           2176          30252          30170          20866      16186.067
oublock   3          28370          28377          28375          28374      3.6055513
msgsnd    3              0              5              2      2.3333333      2.5166115
msgrcv    3              0              3              2      1.6666667      1.5275252
nsignals  3          74107          74107          74107          74107              0
nvcsw     3        1086164        1107104        1106650      1099972.7       11960.81
nivcsw    3         604641         658906         616307         626618       28564.14

Building world, single-threaded, on a GENERIC kernel compiled by gcc 4.2.1
--------------------------------------------------------------------------
          N            Min            Max         Median            Avg         Stddev
real      3       26986.71       27080.38       26992.54      27019.877      52.478445
user      3       20506.89        20516.1       20511.66       20511.55      4.6059852
sys       3        8245.69        8285.79        8253.04      8261.5067      21.348673
maxrss    3         759420         759420         759420         759420              0
ixrss     3           4894           4900           4898      4897.3333      3.0550505
idrss     3            581            581            581            581              0
isrss     3            131            131            131            131              0
minflt    3  6.5855245e+08  6.5855409e+08  6.5855253e+08  6.5855302e+08       922.2581
majflt    3              0           2566              0      855.33333      1481.4808
nswap     3              0              0              0              0              0
inblock   3           1619          29805           2008          11144       16162.07
oublock   3          28652          28747          28662
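To put "slightly faster" into numbers, the relative difference of the averages can be computed from the real, user and sys rows of the two tables above, as (gcc - clang) / gcc * 100. A small sketch; pct_faster is a hypothetical helper, and the averages are copied from the tables.

```shell
#!/bin/sh
# Hypothetical helper: percentage by which the clang-kernel average (2nd
# argument) is lower than the gcc-kernel average (1st argument).
pct_faster()
{
	awk -v g="$1" -v c="$2" 'BEGIN { printf "%.2f\n", (g - c) / g * 100 }'
}

pct_faster 27019.877 26641.11	# real: 1.40
pct_faster 20511.55 20461.933	# user: 0.24
pct_faster 8261.5067 7826.0533	# sys:  5.27
```

Most of the roughly 1.4% wall-clock gain comes from system time, which is where the kernel's own code runs; user time barely moves.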
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Sun, Sep 16, 2012 at 12:34:45AM +0200, Dimitry Andric wrote:
> Hi all,
>
> By request, I performed a series of kernel performance tests on FreeBSD 10.0-CURRENT, particularly comparing the runtime performance of GENERIC kernels compiled by gcc 4.2.1 and by clang 3.2.

the fact that the difference is so small is interesting, and it might almost suggest that the test is dominated by other factors than the compiler.

By chance do you have a way to produce other data points with different optimization levels in the compiler ?

cheers
luigi
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 2012-09-16 01:22, Luigi Rizzo wrote:
> ...
> the fact that the difference is so small is interesting, and it might almost suggest that the test is dominated by other factors than the compiler.

Yes, this result was more or less what I expected: runtime performance is probably related more to hardware speed, and the efficiency of the chosen algorithms in the kernel, than to the optimizations any current compiler can produce. Apparently our kernel hackers already produce quite efficient code. :)

> By chance do you have a way to produce other data points with different optimization levels in the compiler ?

I could re-run the tests with e.g. -O1 instead of -O2, or maybe even -O0, though I am not sure if the kernel will compile correctly without any optimization. This will take a while though, and I am not sure if I can borrow Gavin's machine long enough. :)

-Dimitry
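Luigi's suggestion could be explored by building one extra kernel per optimization level. A hedged sketch, assuming the kernel's optimization flags can be overridden through COPTFLAGS (see make.conf(5)) and that each variant is installed under its own name via INSTKERNNAME; as Dimitry notes, an -O0 kernel may not build or work at all. build_opt_kernels is a hypothetical helper, and with DRYRUN=1, the default here, the commands are only printed.

```shell
#!/bin/sh
# Sketch only: print (DRYRUN=1) or run (DRYRUN=0) the build steps.
: "${DRYRUN:=1}"

run()
{
	if [ "$DRYRUN" = 1 ]; then
		echo "$*"
	else
		"$@"
	fi
}

# Hypothetical helper: build and install GENERIC once per optimization
# level given, e.g. /boot/kernel.clang-O1 and /boot/kernel.clang-O2.
build_opt_kernels()
{
	compiler=$1
	shift
	for opt in "$@"; do
		# Assumption: COPTFLAGS sets the kernel optimization flags.
		run make -C /usr/src buildkernel KERNCONF=GENERIC CC="$compiler" \
		    COPTFLAGS="$opt -pipe"
		run make -C /usr/src installkernel KERNCONF=GENERIC CC="$compiler" \
		    INSTKERNNAME="kernel.$compiler$opt"
	done
}
```

For example, "build_opt_kernels clang -O1 -O2" prints the four commands for two clang kernels, which could then be booted in turn for the same buildworld runs.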
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Sun, Sep 16, 2012 at 12:34:45AM +0200, Dimitry Andric wrote:
> Hi all,
>
> By request, I performed a series of kernel performance tests on FreeBSD 10.0-CURRENT, particularly comparing the runtime performance of GENERIC kernels compiled by gcc 4.2.1 and by clang 3.2.
> ...

Thank you very much for doing this.

I tried to map the CPUID into more human-friendly family moniker, and it seems that these are Pentium-4 class CPUs. Am I right ?

If yes, could you, please, rerun the tests on anything more recent than Core2, i.e. any Core i7-whatever class of Xeons ?

Thanks again.
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Sat, Sep 15, 2012 at 10:19 PM, Konstantin Belousov <kostik...@gmail.com> wrote:
> On Sun, Sep 16, 2012 at 12:34:45AM +0200, Dimitry Andric wrote:
>> ...
>
> Thank you very much for doing this.
>
> I tried to map the CPUID into more human-friendly family moniker, and it seems that these are Pentium-4 class CPUs. Am I right ?
>
> If yes, could you, please, rerun the tests on anything more recent than Core2, i.e. any Core i7-whatever class of Xeons ?

If you can provide the tests, I can rerun it on some Nehalem class workstations I have access to. I unfortunately don't have access to SNB/Romley hardware yet.

Thanks,
-Garrett
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Wed, Sep 05, 2012 at 03:13:11PM -0700, Steve Kargl wrote:
> On Wed, Sep 05, 2012 at 11:31:26AM +0200, Dimitry Andric wrote:
>> ...
>
> Dimitry thanks for the follow-up.
>
> I performed an unscientific (micro)benchmark of /usr/bin/cc vs /usr/bin/clang where cc is the base system's gcc 4.2.1. Here's what I found/feared.
>
> Compiling libm on
>
> CPU: AMD Opteron(tm) Processor 248 (2192.01-MHz K8-class CPU)
>   Origin = AuthenticAMD  Id = 0xf5a  Family = f  Model = 5  Stepping = 10
>   Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,\
>   MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
>   AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow!+,3DNow!>
>
> with default CFLAGS (ie., -O2 -pipe) and -march=opteron.

Was this compiled as amd64 or i386? Also, can you send me the test case? So that we can explore the difference.

The working theory now is SSE vs FPU mathematics, but it would be nice to see the testcase.

Thank you, roman

> Using 'setenv CC /usr/bin/cc' with 3 runs of
>
>   make clean
>   time make -DNO_MAN
>
> yields
>
>   69.39 real  52.00 user  38.55 sys
>   69.57 real  52.35 user  38.37 sys
>   69.48 real  52.25 user  38.38 sys
>
> Now, repeating with 'setenv CC /usr/bin/clang' yields
>
>   39.65 real  21.86 user  17.37 sys
>   40.91 real  21.48 user  17.91 sys
>   39.77 real  21.65 user  17.64 sys
>
> So, clang does appear to be faster in this particular compiling speed benchmark.
>
> However, if I now build my test program for libm's j0f() function where the only difference is whether libm was built with /usr/bin/cc or /usr/bin/clang, I observe the following results.
>
> 1234567 x values in the interval [0:25]
>
>                       gcc libm          |  clang libm
>                    ---------------------+------------------
>        ULP <= 0.6 -- 565515 (45.81%)    |  513763 (41.61%)
> 0.6 <  ULP <= 0.7 --  74148 ( 6.01%)    |   67221 ( 5.44%)
> 0.7 <  ULP <= 0.8 --  69112 ( 5.60%)    |   62846 ( 5.09%)
> 0.8 <  ULP <= 0.9 --  63798 ( 5.17%)    |   58217 ( 4.72%)
> 0.9 <  ULP <= 1.0 --  58679 ( 4.75%)    |   53834 ( 4.36%)
> 1.0 <  ULP <= 2.0 -- 328221 (26.59%)    |  306728 (24.84%)
> 2.0 <  ULP <= 3.0 --  65323 ( 5.29%)    |   63452 ( 5.14%)
> 3.0 <  ULP        --   9771 ( 0.79%)    |  108506 ( 8.79%)
>
>                    gcc libm                |  clang libm
>                    ------------------------+------------------------
> MAX ULP:           12152.27637             |  1129606938624.0
> x at MAX ULP:      5.520077 0x1.6148f2p+2  |  2.404833 0x1.33d19p+1
>
> Speed test with gcc libm.
> 1234567 j0f calls in 0.193427 seconds.
> 1234567 j0f calls in 0.193410 seconds.
> 1234567 j0f calls in 0.194158 seconds.
>
> Speed test with clang libm.
> 1234567 j0f calls in 0.180260 seconds.
> 1234567 j0f calls in 0.180130 seconds.
> 1234567 j0f calls in 0.179739 seconds.
>
> So, although the clang built j0f() appears to be faster than the gcc built j0f(), the clang built j0f() has much worse accuracy issues.
>
> -- 
> Steve
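Steve's numbers make the trade-off concrete. Averaging the three runs of each case and taking gcc/clang ratios shows the clang build of libm finishing about 1.7 times faster and the gcc-built j0f() taking about 1.08 times as long as the clang-built one, set against the accuracy table above (8.79% vs 0.79% of arguments with errors above 3 ULP). A sketch with hypothetical awk-based helpers; the inputs are copied from the message.

```shell
#!/bin/sh
# Hypothetical helper: arithmetic mean of the arguments, six decimals.
mean()
{
	printf '%s\n' "$@" | awk '{ sum += $1; n++ } END { printf "%.6f\n", sum / n }'
}

# Hypothetical helper: how many times larger A is than B, two decimals.
ratio()
{
	awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Wall-clock seconds for "make" of libm, 3 runs each.
mean 69.39 69.57 69.48			# gcc:   69.480000
mean 39.65 40.91 39.77			# clang: 40.110000
ratio 69.48 40.11			# 1.73

# Seconds for 1234567 j0f() calls, 3 runs each.
mean 0.193427 0.193410 0.194158		# gcc libm:   0.193665
mean 0.180260 0.180130 0.179739		# clang libm: 0.180043
ratio 0.193665 0.180043			# 1.08
```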
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 6 Sep 2012, at 09:43, Roman Divacky wrote:
> Was this compiled as amd64 or i386? Also, can you send me the test case? So that we can explore the difference.
>
> The working theory now is SSE vs FPU mathematics, but it would be nice to see the testcase.

There may also be a difference in whether -ffast-math is the default on each compiler. On x86, this will replace a number of libm calls with (much faster, but less accurate) SSE or x87 instructions. If this is enabled by default with clang and not with gcc, it would account for the difference.

David
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 2012-09-06 12:20, David Chisnall wrote:
> ...
> There may also be a difference in whether -ffast-math is the default on each compiler. On x86, this will replace a number of libm calls with (much faster, but less accurate) SSE or x87 instructions. If this is enabled by default with clang and not with gcc, it would account for the difference.

No, -ffast-math is not enabled by default in clang, as far as I can tell. Also, the help text for the option says:

  Enable the *frontend*'s 'fast-math' mode. This has no effect on optimizations, but provides a preprocessor macro __FAST_MATH__ the same as GCC's -ffast-math flag.
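Whether a compiler predefines __FAST_MATH__ by default can be checked by dumping its predefined macros with -dM -E, which both gcc and clang accept. A minimal sketch; fast_math_status is a hypothetical helper. Per the help text quoted above, the macro says nothing about which optimizations clang actually performs, so this only checks the front-end side of David's question.

```shell
#!/bin/sh
# Hypothetical helper: read a compiler's predefined macros (the output of
# "cc -dM -E") on stdin and report whether __FAST_MATH__ is among them.
fast_math_status()
{
	if grep -q '^#define __FAST_MATH__'; then
		echo "__FAST_MATH__ defined"
	else
		echo "__FAST_MATH__ not defined"
	fi
}
```

For example, "cc -dM -E -x c /dev/null | fast_math_status" checks the base gcc's default, and repeating it with clang, or with -ffast-math added to either command, shows whether the macro appears.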
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Thu, Sep 06, 2012 at 10:43:12AM +0200, Roman Divacky wrote:
> On Wed, Sep 05, 2012 at 03:13:11PM -0700, Steve Kargl wrote:
>> Compiling libm on
>> ...
>> with default CFLAGS (ie., -O2 -pipe) and -march=opteron.
>
> Was this compiled as amd64 or i386? Also, can you send me the test case? So that we can explore the difference.
>
> The working theory now is SSE vs FPU mathematics, but it would be nice to see the testcase.
>
> Thank you, roman

It was compiled on amd64. I can do the same testing on i386 this weekend.

The testcase is not a self-contained piece of code. It uses parts of my floating point test frame. Putting together the testcase may take a few hours.

-- 
Steve
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 2012-09-05 01:40, Garrett Cooper wrote:
> ...
> Steve does have a point. Posting the results of CFLAGS/CPPFLAGS/LDFLAGS/etc for config.log (and maybe poking through the code to figure out what *FLAGS were used elsewhere) is more valuable than the data is in its current state (unfortunately.. autoconf makes things more complicated).

Just to note, autoconf is not used in the FreeBSD source tree, so it does not apply to the first two builds in the performance test (i.e. building in-tree clang and gcc). The other build is Boost, which has yet another totally different build system, based on Perforce's Jam. Again, no autoconf.

In any case, for all three builds, the default optimization options were used. Basically:

1) For building the FreeBSD in-tree version of clang 3.2: -O2 -pipe -fno-strict-aliasing

   These are just the default FreeBSD optimization flags for building clang, which are probably used by the majority of users out there. This is the case that I was interested in particularly. The -fno-strict-aliasing is not really my choice, but it was introduced in the past by Nathan Whitehorn, who apparently saw problems without it. It will hopefully disappear in the future.

2) For building the FreeBSD in-tree version of gcc 4.2.1: -O2 -pipe

   These are the default FreeBSD optimization flags.

3) For building Boost 1.50.0: -ftemplate-depth-128 -O3 -finline-functions

   These are the Boost defaults for gcc-compatible compilers, from tools/build/v2/tools/gcc.jam.
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 5 Sep 2012, at 10:31, Dimitry Andric wrote:
> These are just the default FreeBSD optimization flags for building clang, which are probably used by the majority of users out there. This is the case that I was interested in particularly. The -fno-strict-aliasing is not really my choice, but it was introduced in the past by Nathan Whitehorn, who apparently saw problems without it. It will hopefully disappear in the future.

Clang currently defaults to no strict aliasing on FreeBSD. In my experience, most C programmers misunderstand the aliasing rules of C and even people on the C++ standards committee often get them wrong for C++, so trading a 1-10% performance increase for a significant chance of generating non-working code seems like a poor gain. If people are certain that they do understand the rules, then they can add -fstrict-aliasing to their own CFLAGS.

David
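Adding -fstrict-aliasing to one's own CFLAGS only helps if it survives the -fno-strict-aliasing the build already passes: with gcc and clang, when both flags are given, the last one on the command line wins. The sketch below (aliasing_mode is a hypothetical helper) reports which explicit setting a flag list ends up with; "default" means neither flag is present, so the compiler's own default for that -O level and target applies, which, as this thread shows, is not the same everywhere.

```shell
#!/bin/sh
# Hypothetical helper: given compiler flags, report the explicit
# strict-aliasing setting in effect. The last of -fstrict-aliasing /
# -fno-strict-aliasing wins; "default" means neither was given.
aliasing_mode()
{
	mode=default
	for flag in "$@"; do
		case $flag in
		-fstrict-aliasing)	mode=strict ;;
		-fno-strict-aliasing)	mode=no-strict ;;
		esac
	done
	echo "$mode"
}

aliasing_mode -O2 -pipe -fno-strict-aliasing			# no-strict
aliasing_mode -O2 -pipe -fno-strict-aliasing -fstrict-aliasing	# strict
```

Whether a user-supplied -fstrict-aliasing actually lands after the build's -fno-strict-aliasing depends on how the makefiles assemble CFLAGS, which this sketch does not check.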
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 2012-09-05 11:36, David Chisnall wrote:
> On 5 Sep 2012, at 10:31, Dimitry Andric wrote:
>> The -fno-strict-aliasing is not really my choice, but it was introduced in the past by Nathan Whitehorn, who apparently saw problems without it. It will hopefully disappear in the future.
>
> Clang currently defaults to no strict aliasing on FreeBSD.

Yes, but upstream has never used -fno-strict-aliasing, just plain -O2. I run regular separate builds of pristine upstream clang on FreeBSD, and I haven't seen any failures due to aliasing problems in all the regression tests. That doesn't guarantee there are no problems, of course...

> In my experience, most C programmers misunderstand the aliasing rules of C and even people on the C++ standards committee often get them wrong for C++, so trading a 1-10% performance increase for a significant chance of generating non-working code seems like a poor gain. If people are certain that they do understand the rules, then they can add -fstrict-aliasing to their own CFLAGS.

I'm actually quite interested in the performance difference; I think I will run a few tests. :)
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Wed, Sep 5, 2012 at 6:56 AM, Dimitry Andric <dimi...@andric.com> wrote:
> On 2012-09-05 11:36, David Chisnall wrote:
>> On 5 Sep 2012, at 10:31, Dimitry Andric wrote:
>>> The -fno-strict-aliasing is not really my choice, but it was introduced in the past by Nathan Whitehorn, who apparently saw problems without it. It will hopefully disappear in the future.
>>
>> Clang currently defaults to no strict aliasing on FreeBSD.
>
> Yes, but upstream has never used -fno-strict-aliasing, just plain -O2. I run regular separate builds of pristine upstream clang on FreeBSD, and I haven't seen any failures due to aliasing problems in all the regression tests. That doesn't guarantee there are no problems, of course...

Aliasing problems are seen much more frequently on PowerPC than on any other platform for Clang. I found this a while back when doing some Clang testing, and I still see problems with upstream unless I explicitly set -fno-strict-aliasing. Nathan had mentioned wanting to get upstream to use -fno-strict-aliasing by default on all platforms, but I don't think that ever made it beyond his suggestion. I filed this bug to track it: http://llvm.org/bugs/show_bug.cgi?id=11955
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
What makes you think it's a bug in llvm code and not a plain gcc miscompile? Other people seem to compile llvm on PPC64 with gcc and -fstrict-aliasing just fine. They just don't happen to use gcc4.2.1; i.e., gcc47 is reported to not have this problem. I personally can confirm that fbsd+gcc48 is ok too.

On Wed, Sep 05, 2012 at 09:11:22AM -0400, Justin Hibbits wrote:
> ...
> Aliasing problems are seen much more frequently on PowerPC than on any other platform for Clang. I found this a while back when doing some Clang testing, and I still see problems with upstream unless I explicitly set -fno-strict-aliasing. Nathan had mentioned wanting to get upstream to use -fno-strict-aliasing by default on all platforms, but I don't think that ever made it beyond his suggestion. I filed this bug to track it: http://llvm.org/bugs/show_bug.cgi?id=11955
> ...
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
Actually, Nathan does say it's gcc's fault in a comment on that bug. However, I do all my clang work compiling it with gcc4.2.1, so I run into this constantly when I forget to add the flag.

- Justin

On Wed, Sep 5, 2012 at 1:37 PM, Roman Divacky <rdiva...@freebsd.org> wrote:
> What makes you think it's a bug in llvm code and not a plain gcc miscompile? Other people seem to compile llvm on PPC64 with gcc and -fstrict-aliasing just fine. They just don't happen to use gcc4.2.1; i.e., gcc47 is reported to not have this problem. I personally can confirm that fbsd+gcc48 is ok too.
>
> On Wed, Sep 05, 2012 at 09:11:22AM -0400, Justin Hibbits wrote:
>> ...
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
I've been compiling clang with itself on PPC64 for a while now. Works quite good :) On Wed, Sep 05, 2012 at 01:44:00PM -0400, Justin Hibbits wrote: Actually, Nathan does say it's gcc's fault in a comment on that bug. However, I do all my clang work compiling it with gcc4.2.1, so run into this constantly when I forget to add the flag. - Justin On Wed, Sep 5, 2012 at 1:37 PM, Roman Divacky rdiva...@freebsd.org wrote: What makes you think it's a bug in llvm code and not a plain gcc miscompile? Other people seem to compile llvm on PPC64 with gcc and -fstrict-aliasing just fine. They just dont happen to use gcc4.2.1. Ie. gcc47 is reported to not have this problem. I personally can confirm that fbsd+gcc48 is ok to On Wed, Sep 05, 2012 at 09:11:22AM -0400, Justin Hibbits wrote: On Wed, Sep 5, 2012 at 6:56 AM, Dimitry Andric dimi...@andric.com wrote: On 2012-09-05 11:36, David Chisnall wrote: On 5 Sep 2012, at 10:31, Dimitry Andric wrote: TThe -fno-strict-aliasing is not really my choice, but it was introduced in the past by Nathan Whitehorn, who apparently saw problems without it. It will hopefully disappear in the future. Clang currently defaults to no strict aliasing on FreeBSD. Yes, but upstream has never used -fno-strict-aliasing, just plain -O2. I run regular separate builds of pristine upstream clang on FreeBSD, and I haven't seen any failures due aliasing problems in all the regression tests. That doesn't guarantee there are no problems, of course... Aliasing problems are seen much more frequently on PowerPC than any other platform for Clang. I found this a while back when doing some Clang testing, and I still see problems with upstream unless I explicitly set -fno-strict-aliasing. Nathan had mentioned wanting to get upstream to use -fno-strict-aliasing by default on all platforms, but I don't think that ever made it beyond his suggesting. 
I filed this bug to track it: http://llvm.org/bugs/show_bug.cgi?id=11955 In my experience, most C programmers misunderstand the aliasing rules of C and even people on the C++ standards committee often get them wrong for C++, so trading a 1-10% performance increase for a significant chance of generating non-working code seems like a poor gain. If people are certain that they do understand the rules, then they can add -fstrict-aliasing to their own CFLAGS. I'm actually quite interested in the performance difference; I think I will run a few tests. :)
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Wed, Sep 05, 2012 at 11:31:26AM +0200, Dimitry Andric wrote: On 2012-09-05 01:40, Garrett Cooper wrote: ... Steve does have a point. Posting the results of CFLAGS/CPPFLAGS/LDFLAGS/etc for config.log (and maybe poking through the code to figure out what *FLAGS were used elsewhere) is more valuable than the data is in its current state (unfortunately.. autoconf makes things more complicated).
1) For building the FreeBSD in-tree version of clang 3.2: -O2 -pipe -fno-strict-aliasing
2) For building the FreeBSD in-tree version of gcc 4.2.1: -O2 -pipe
3) For building Boost 1.50.0: -ftemplate-depth-128 -O3 -finline-functions
Dimitry, thanks for the follow-up. I performed an unscientific (micro)benchmark of /usr/bin/cc vs /usr/bin/clang, where cc is the base system's gcc 4.2.1. Here's what I found/feared. Compiling libm on
CPU: AMD Opteron(tm) Processor 248 (2192.01-MHz K8-class CPU)
Origin = AuthenticAMD Id = 0xf5a Family = f Model = 5 Stepping = 10
Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow!+,3DNow!>
with default CFLAGS (i.e., -O2 -pipe) and -march=opteron. Using 'setenv CC /usr/bin/cc' with 3 runs of 'make clean' followed by 'time make -DNO_MAN' yields
69.39 real   52.00 user   38.55 sys
69.57 real   52.35 user   38.37 sys
69.48 real   52.25 user   38.38 sys
Now, repeating with 'setenv CC /usr/bin/clang' yields
39.65 real   21.86 user   17.37 sys
40.91 real   21.48 user   17.91 sys
39.77 real   21.65 user   17.64 sys
So, clang does appear to be faster in this particular compile-speed benchmark. However, if I now build my test program for libm's j0f() function, where the only difference is whether libm was built with /usr/bin/cc or /usr/bin/clang, I observe the following results.
1234567 x values in the interval [0:25]
                   | gcc libm          | clang libm
-------------------+-------------------+-------------------
        ULP <= 0.6 | 565515 (45.81%)   | 513763 (41.61%)
  0.6 < ULP <= 0.7 |  74148 ( 6.01%)   |  67221 ( 5.44%)
  0.7 < ULP <= 0.8 |  69112 ( 5.60%)   |  62846 ( 5.09%)
  0.8 < ULP <= 0.9 |  63798 ( 5.17%)   |  58217 ( 4.72%)
  0.9 < ULP <= 1.0 |  58679 ( 4.75%)   |  53834 ( 4.36%)
  1.0 < ULP <= 2.0 | 328221 (26.59%)   | 306728 (24.84%)
  2.0 < ULP <= 3.0 |  65323 ( 5.29%)   |  63452 ( 5.14%)
  3.0 < ULP        |   9771 ( 0.79%)   | 108506 ( 8.79%)
-------------------+-------------------+-------------------
           MAX ULP | 12152.27637       | 1129606938624.0
      x at MAX ULP | 5.520077          | 2.404833
                   | (0x1.6148f2p+2)   | (0x1.33d19p+1)
Speed test with gcc libm.
1234567 j0f calls in 0.193427 seconds.
1234567 j0f calls in 0.193410 seconds.
1234567 j0f calls in 0.194158 seconds.
Speed test with clang libm.
1234567 j0f calls in 0.180260 seconds.
1234567 j0f calls in 0.180130 seconds.
1234567 j0f calls in 0.179739 seconds.
So, although the clang-built j0f() appears to be faster than the gcc-built j0f(), the clang-built j0f() has much worse accuracy issues. -- Steve
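Steve's post does not say how the per-point ULP errors were computed or binned, but each column of counts in the table adds up to 1234567, so the buckets partition the errors. Assuming a harness that prints one ULP error per x value, one per line (a hypothetical format), the binning step could be sketched in sh and awk roughly as follows. The name ulp_histogram is made up, and computing the errors themselves (presumably by comparing j0f() against a higher-precision reference) is not shown.

```sh
#!/bin/sh
# Hypothetical helper (not Steve's actual test program): bin per-point
# ULP errors, read one per line on stdin, into the same buckets as the
# j0f table, printing count and percentage per bucket and the maximum.
ulp_histogram() {
    awk '
    BEGIN {
        # Upper edges of the closed buckets; anything above the last
        # edge falls into the open-ended "3.0 < ULP" bucket.
        ne = split("0.6 0.7 0.8 0.9 1.0 2.0 3.0", edge, " ")
        for (b = 1; b <= ne; b++) lim[b] = edge[b] + 0
        max = 0
    }
    NF {
        e = $1 + 0
        for (b = 1; b <= ne; b++)
            if (e <= lim[b]) break
        cnt[b]++                  # b == ne + 1 when e > lim[ne]
        total++
        if (e > max) max = e
    }
    END {
        if (total == 0) exit 1
        for (b = 1; b <= ne + 1; b++) {
            if (b == 1)       label = "ULP <= " edge[1]
            else if (b <= ne) label = edge[b - 1] " < ULP <= " edge[b]
            else              label = edge[ne] " < ULP"
            printf "%-18s -- %d (%5.2f%%)\n", label, cnt[b], 100 * cnt[b] / total
        }
        printf "MAX ULP: %g\n", max
    }'
}

# Example with five made-up error values.
printf '%s\n' 0.5 0.65 1.5 2.5 4.0 | ulp_histogram
```

Using closed upper edges ("<=") keeps a value such as exactly 0.7 ULP in the 0.6 to 0.7 bucket, matching the reconstructed row labels.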
Compiler performance tests on FreeBSD 10.0-CURRENT
Hi all, I recently performed a series of compiler performance tests on FreeBSD 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against clang 3.1 and clang 3.2. The attached text file[1] contains more information about the tests, some semi-cooked performance data, and my conclusions. Any errors and omissions are also my fault, so if you notice them, please let me know. The executive summary: clang compiles mostly faster than gcc (sometimes much faster), and uses significantly less memory. Finally, please note these tests were purely about compilation speed, not about the performance of the resulting executables. This still needs to be tested. -Dimitry [1]: Also available at: http://www.andric.com/freebsd/perftest/perftest-2012-09-01a.txt
COMPILER PERFORMANCE TESTS ON FREEBSD 10.0-CURRENT, SEPTEMBER 2012
==================================================================
INTRODUCTION
The compilers tested were:
- gcc 4.2.1, the system compiler in FreeBSD, which is compiled by gcc 4.2.1.
- gcc 4.7.1, from the official gcc.gnu.org release, compiled via a three-stage bootstrap, so the final compiler has been compiled by gcc 4.7.1.
- clang 3.1 (branches/release_31 156863), which was the default version of clang in FreeBSD 10.0-CURRENT before r239462. The executable used was compiled by a previous copy of itself.
- clang 3.2 (trunk 162107), which is the default version of clang in FreeBSD 10.0-CURRENT since r239462. The executable used was compiled by a previous copy of itself.
All tests were run on ref10-amd64.freebsd.org, which is a Dell 2950 (1.86GHz Core2 Xeon, 2x4 cores, 16G RAM). It runs FreeBSD/amd64 10.0-CURRENT #0 r231914: Sun Feb 19 17:24:37 UTC 2012. Each build was repeated 6 times; before each run, the object directories were cleaned out and the disks were synced. Each build was timed using the system time(1) command, with the -l option to obtain rusage information. The programs tested by compilation were:
- A large C++ program: clang 3.2, as it occurs in the FreeBSD 10.0-CURRENT source tree as of r239532.
- A medium-large C program: gcc 4.2.1, as it occurs in the FreeBSD 10.0-CURRENT source tree as of r239532.
- A large C++ library: boost 1.50.0, the officially released version from http://www.boost.org/.
Building a large C++ program (clang 3.2) single-threaded
Using clang 3.1:
           N        Min        Max     Median        Avg      Stddev
real       6    2283.69    2288.46    2285.74   2285.505   1.6470064
user       6     2145.2     2147.2    2146.18  2146.0567  0.68266146
sys        6      128.3     132.08     130.65  130.54833    1.256653
maxrss     6     179264     179264     179264     179264           0
ixrss      6      21407      21436      21420  21419.833   9.6211572
idrss      6       3628       3632       3630  3629.8333   1.3291601
isrss      6        252        252        252        252           0
minflt     6   12485556   12485556   12485556   12485556           0
majflt     6          0          0          0          0           0
nswap      6          0          0          0          0           0
inblock    6          0          0          0          0           0
oublock    6       2058       2106       2103      2081.   25.216397
msgsnd     6         18         18         18         18           0
msgrcv     6          0          0          0          0           0
nsignals   6       1878       1878       1878       1878           0
nvcsw      6      16288      16357      16333  16320.667   29.615311
nivcsw     6    2071535    3998751    3057756    2966314   635381.66
Using clang 3.2:
           N        Min        Max     Median        Avg      Stddev
real       6    2358.61    2362.84    2362.67    2361.22   1.7831321
user       6    2215.33    2221.13    2218.72    2218.57   2.0094278
sys        6     130.78     134.63     133.41  132.99833   1.4702301
maxrss     6     177796     177796     177796     177796           0
ixrss      6      21388      21413      21408  21400.833   11.052903
idrss      6       3702       3707       3706  3704.6667   2.2509257
isrss      6        253        253        253        253           0
minflt     6   12583827   12583827   12583827   12583827           0
majflt     6          0          0          0          0           0
nswap      6          0          0          0          0           0
inblock    6          0          0          0          0           0
oublock    6       2036       2074       2071
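The N, Min, Max, Median, Avg and Stddev columns above are ministat(1) output. For a quick check on a machine without ministat, a rough stand-in can be sketched with sort and awk. The function name stats and the one-number-per-line input are assumptions; the standard deviation is the sample (N-1) form, and the median convention for even N may not match ministat's.

```sh
#!/bin/sh
# Hypothetical helper (not part of ministat or of the original tests):
# read one number per line on stdin and print N, Min, Max, Median, Avg
# and the sample standard deviation, similar to ministat(1)'s columns.
stats() {
    LC_ALL=C sort -n | awk '
    NF {                          # skip blank lines
        v[++n] = $1 + 0
        sum += $1
    }
    END {
        if (n == 0) exit 1
        avg = sum / n
        # Middle value for odd N, mean of the two middle values for even
        # N (ministat may use a different convention here).
        if (n % 2) med = v[(n + 1) / 2]
        else       med = (v[n / 2] + v[n / 2 + 1]) / 2
        ss = 0
        for (i = 1; i <= n; i++) ss += (v[i] - avg) ^ 2
        sd = (n > 1) ? sqrt(ss / (n - 1)) : 0   # sample (N-1) stddev
        printf "N=%d Min=%g Max=%g Median=%g Avg=%g Stddev=%g\n",
            n, v[1], v[n], med, avg, sd
    }'
}

# Example: the three gcc "real" times from Steve's libm builds.
printf '%s\n' 69.39 69.57 69.48 | stats
# N=3 Min=69.39 Max=69.57 Median=69.48 Avg=69.48 Stddev=0.09
```

This only summarizes a single column; ministat additionally compares several data sets and reports whether their difference is statistically significant, which the sketch does not attempt.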
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 09/04/12 22:39, Dimitry Andric wrote: Hi all, I recently performed a series of compiler performance tests on FreeBSD 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against clang 3.1 and clang 3.2. The attached text file[1] contains more information about the tests, some semi-cooked performance data, and my conclusions. Any errors and omissions are also my fault, so if you notice them, please let me know. The executive summary: clang compiles mostly faster than gcc (sometimes much faster), and uses significantly less memory. Finally, please note these tests were purely about compilation speed, not about the performance of the resulting executables. This still needs to be tested. -Dimitry [1]: Also available at: http://www.andric.com/freebsd/perftest/perftest-2012-09-01a.txt Very interesting. It would also be of great interest to have some benchmarks on FBSD 10 at hand that compare the performance of the binaries produced by those compilers. Regards, Oliver
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Tue, Sep 04, 2012 at 10:39:40PM +0200, Dimitry Andric wrote: I recently performed a series of compiler performance tests on FreeBSD 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against clang 3.1 and clang 3.2. The attached text file[1] contains more information about the tests, some semi-cooked performance data, and my conclusions. Any errors and omissions are also my fault, so if you notice them, please let me know. The executive summary: clang compiles mostly faster than gcc (sometimes much faster), and uses significantly less memory. The benchmark is somewhat meaningless if one does not know the options that were used during the testing. -- Steve
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Tue, Sep 4, 2012 at 1:39 PM, Dimitry Andric dimi...@andric.com wrote: Hi all, I recently performed a series of compiler performance tests on FreeBSD 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against clang 3.1 and clang 3.2. The attached text file[1] contains more information about the tests, some semi-cooked performance data, and my conclusions. Any errors and omissions are also my fault, so if you notice them, please let me know. The executive summary: clang compiles mostly faster than gcc (sometimes much faster), and uses significantly less memory. Finally, please note these tests were purely about compilation speed, not about the performance of the resulting executables. This still needs to be tested. It would be interesting to see how clang++ performs vs. g++ on code with nested classes, and on complicated code when optimizing, because the optimizer in g++ apparently has some scaling issues. Thanks! -Garrett
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 2012-09-04 23:43, Steve Kargl wrote: On Tue, Sep 04, 2012 at 10:39:40PM +0200, Dimitry Andric wrote: I recently performed a series of compiler performance tests on FreeBSD 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against clang 3.1 and clang 3.2. ... The benchmark is somewhat meaningless if one does not know the options that were used during the testing. If you meant the compilation options, those were simply the FreeBSD defaults for all tested programs, e.g. -O2 -pipe, except for boost, which uses -ftemplate-depth-128 -O3 -finline-functions. I will add some explicit notes about them.
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Tue, Sep 04, 2012 at 11:59:39PM +0200, Dimitry Andric wrote: On 2012-09-04 23:43, Steve Kargl wrote: On Tue, Sep 04, 2012 at 10:39:40PM +0200, Dimitry Andric wrote: I recently performed a series of compiler performance tests on FreeBSD 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against clang 3.1 and clang 3.2. ... The benchmark is somewhat meaningless if one does not know the options that were used during the testing. If you meant the compilation options, those were simply the FreeBSD defaults for all tested programs, e.g. -O2 -pipe, except for boost, which uses -ftemplate-depth-128 -O3 -finline-functions. I will add some explicit notes about them. Yes, I meant the options specified on the compiler command line. 'gcc -O0 -pipe' compiles code faster than 'gcc -O3 -save-temps', and the former uses much less memory. -- Steve
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Tue, Sep 4, 2012 at 3:14 PM, Steve Kargl s...@troutmask.apl.washington.edu wrote: On Tue, Sep 04, 2012 at 11:59:39PM +0200, Dimitry Andric wrote: On 2012-09-04 23:43, Steve Kargl wrote: On Tue, Sep 04, 2012 at 10:39:40PM +0200, Dimitry Andric wrote: I recently performed a series of compiler performance tests on FreeBSD 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against clang 3.1 and clang 3.2. ... The benchmark is somewhat meaningless if one does not know the options that were used during the testing. If you meant the compilation options, those were simply the FreeBSD defaults for all tested programs, e.g. -O2 -pipe, except for boost, which uses -ftemplate-depth-128 -O3 -finline-functions. I will add some explicit notes about them. Yes, I meant the options specified on the compiler command line. 'gcc -O0 -pipe' compiles code faster than 'gcc -O3 -save-temps', and the former uses much less memory. Steve does have a point. Posting the CFLAGS/CPPFLAGS/LDFLAGS/etc. values from config.log (and maybe poking through the code to figure out what *FLAGS were used elsewhere) would be more valuable than the data in its current state (unfortunately, autoconf makes things more complicated). Maybe we need some micro benchmarks for this (no, I'm not volunteering :P). Thanks! -Garrett
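For autoconf-based builds such as the boost and ports cases, config.log records the values configure settled on, with each output variable written as a NAME='value' line. A minimal sketch for pulling out the compiler and flag variables mentioned above follows; the function name show_flags and the exact list of variable names are assumptions. For the in-tree clang and gcc builds there is no config.log; running make -V CFLAGS in the relevant source directory is one way to ask FreeBSD's make(1) what it thinks the flags are.

```sh
#!/bin/sh
# Hypothetical helper: print the compiler and flag variables that
# autoconf recorded in config.log as NAME='value' lines.
show_flags() {
    log=${1:-config.log}
    if [ ! -r "$log" ]; then
        echo "show_flags: cannot read $log" >&2
        return 1
    fi
    # Anchored at line start, so CFLAGS= does not also match CXXFLAGS=
    # or cache entries such as ac_cv_env_CFLAGS_value=.
    grep -E "^(CC|CXX|CPP|CFLAGS|CXXFLAGS|CPPFLAGS|LDFLAGS|LIBS)=" "$log"
}
```

Typical use is running show_flags in a configured build directory, or passing the path, e.g. show_flags path/to/config.log, and posting its output alongside the timings.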