Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 2012-09-16 07:19, Konstantin Belousov wrote:
> On Sun, Sep 16, 2012 at 12:34:45AM +0200, Dimitry Andric wrote:
> ...
> I tried to map the CPUID into a more human-friendly family moniker, and it seems that these are Pentium-4 class CPUs. Am I right ?

Yes, it is apparently a Nocona model; this is part of the dmesg:

CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2793.24-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0xf41  Family = f  Model = 4  Stepping = 1
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x641d<SSE3,DTES64,MON,DS_CPL,CNXT-ID,CX16,xTPR>
  AMD Features=0x20100800<SYSCALL,NX,LM>
TSC: P-state invariant
real memory  = 4294967296 (4096 MB)
avail memory = 4097470464 (3907 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <DELL PE BKC>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 1 core(s) x 2 HTT threads
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP/HT): APIC ID:  1
 cpu2 (AP): APIC ID:  6
 cpu3 (AP/HT): APIC ID:  7

> If yes, could you, please, rerun the tests on anything more recent than Core2, i.e. any Core i7-whatever class of Xeons ?

I would love to, especially because the tests would complete faster, but I currently do not have access to physical machines of that class. Normally I do performance tests on the FreeBSD reference machines, but since these tests require booting with a custom kernel (and preferably root access + remote console), I cannot use them.

So if somebody can offer such a machine (for a limited time only, a few days most likely, 1 week maximum), it would be great.

-Dimitry

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 2012-09-16 07:25, Garrett Cooper wrote:
> ...
> If you can provide the tests, I can rerun it on some Nehalem class workstations I have access to. I unfortunately don't have access to SNB/Romley hardware yet.

I did these tests as follows:

- Install a recent -CURRENT snapshot on the box (or rebuild world and kernel by hand and install them).
- Install Subversion.
- Check out head sources into /usr/src, if not already there.
- Build a GENERIC kernel with gcc, using default settings, and install it into /boot/kernel.gcc.
- Build a GENERIC kernel with clang, using default settings, and install it into /boot/kernel.clang.
- Boot the machine with either kernel, then run the attached runtest.sh script, with the buildworld_{single,multi}.sh scripts in the same directory. Save the resulting run-*.txt files in a directory that indicates whether the kernel in use was built by gcc or by clang.

You can tweak the 'num_runs' variable at the top of runtest.sh to do more runs, if the machine is fast. This should give more confidence in the final statistics. I did just 3 runs on Gavin's machine, since it took more than 7 hours for a single-threaded buildworld to complete. Doing 6 runs should be more than enough.

The run-*.txt files contain the time(1) output of each run, and should be processed through ministat to give average, stddev and so on. Just send them to me, and I will process them and summarize the statistics. Alternatively, you can give me remote access, and I'll do it. :)

runtest.sh:

#!/bin/sh
mypath=${0%/*}
num_runs=3
set -e

do_runtest() {
	for i in $(jot ${num_runs}); do
		rm -rf /usr/obj/*
		sync
		echo "Doing build $1, run $i..."
		/usr/bin/time -l -o run-$1-$i.txt ${mypath}/build$1.sh > run-$1-$i.log
		head -1 run-$1-$i.txt
	done
}

do_runtest world_single
do_runtest world_multi

buildworld_single.sh:

#!/bin/sh
set -e
cd /usr/src
make -s buildworld

buildworld_multi.sh:

#!/bin/sh
set -e
cd /usr/src
make -s -j8 buildworld
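To make the ministat step concrete, here is a small sketch (file names and the sample timings below are hypothetical, not from the actual runs) of extracting the wall-clock column from time(1) output files so that two sets of runs can be compared:

```shell
#!/bin/sh
# Sketch: pull the wall-clock ("real") figure out of each time(1)
# output file and collect the values into one file per kernel.
# File names and timing values here are made up for illustration.
printf '69.39 real 52.00 user 38.55 sys\n' > run-world_multi-1.txt
printf '69.57 real 52.35 user 38.37 sys\n' > run-world_multi-2.txt

# time(1)'s summary line puts the elapsed seconds first; keep only that.
awk '/real/ { print $1 }' run-world_multi-*.txt > real-times.txt
cat real-times.txt

# With one such file per kernel, ministat(1) prints N, min, max, mean,
# stddev, and whether the difference is statistically significant:
#   ministat -s gcc-real.txt clang-real.txt
```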
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 16/09/2012 00:34, Dimitry Andric wrote:
> ...
> The executive summary: GENERIC kernels compiled with clang 3.2 are slightly faster than those compiled by gcc 4.2.1, though the difference will not be very noticeable in practice.

It has been my impression in the past that math-heavy applications benefit from GCC, whereas I/O-heavy applications yield better performance when compiled with clang. I'd say a kernel has a lot more I/O than math to deal with.

--
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Sun, Sep 16, 2012 at 12:34:45AM +0200, Dimitry Andric wrote:
> Hi all,
>
> By request, I performed a series of kernel performance tests on FreeBSD 10.0-CURRENT, particularly comparing the runtime performance of GENERIC kernels compiled by gcc 4.2.1 and by clang 3.2.

The fact that the difference is so small is interesting, and it might almost suggest that the test is dominated by factors other than the compiler. By any chance, do you have a way to produce other data points with different optimization levels in the compiler ?

cheers
luigi
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 2012-09-16 01:22, Luigi Rizzo wrote:
> ...
> the fact that the difference is so small is interesting, and it might almost suggest that the test is dominated by factors other than the compiler.

Yes, this result was more or less what I expected: runtime performance is probably related more to hardware speed, and the efficiency of the chosen algorithms in the kernel, than to the optimizations any current compiler can produce. Apparently our kernel hackers already produce quite efficient code. :)

> By chance do you have a way to produce other data points with different optimization levels in the compiler ?

I could re-run the tests with e.g. -O1 instead of -O2, or maybe even -O0, though I am not sure if the kernel will compile correctly without any optimization. This will take a while though, and I am not sure if I can borrow Gavin's machine long enough. :)

-Dimitry
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Sun, Sep 16, 2012 at 12:34:45AM +0200, Dimitry Andric wrote:
> Hi all,
>
> By request, I performed a series of kernel performance tests on FreeBSD 10.0-CURRENT, particularly comparing the runtime performance of GENERIC kernels compiled by gcc 4.2.1 and by clang 3.2. The attached text file[1] contains more information about the tests, some semi-cooked performance data, and my conclusions. Any errors and omissions are also my fault, so if you notice them, please let me know.
>
> The executive summary: GENERIC kernels compiled with clang 3.2 are slightly faster than those compiled by gcc 4.2.1, though the difference will not be very noticeable in practice.
>
> Last but not least, thanks to Gavin Atkinson for providing the required hardware.

Thank you very much for doing this.

I tried to map the CPUID into a more human-friendly family moniker, and it seems that these are Pentium-4 class CPUs. Am I right ?

If yes, could you, please, rerun the tests on anything more recent than Core2, i.e. any Core i7-whatever class of Xeons ?

Thanks again.
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Sat, Sep 15, 2012 at 10:19 PM, Konstantin Belousov <kostik...@gmail.com> wrote:
> On Sun, Sep 16, 2012 at 12:34:45AM +0200, Dimitry Andric wrote:
>> Hi all,
>>
>> By request, I performed a series of kernel performance tests on FreeBSD 10.0-CURRENT, particularly comparing the runtime performance of GENERIC kernels compiled by gcc 4.2.1 and by clang 3.2. The attached text file[1] contains more information about the tests, some semi-cooked performance data, and my conclusions. Any errors and omissions are also my fault, so if you notice them, please let me know.
>>
>> The executive summary: GENERIC kernels compiled with clang 3.2 are slightly faster than those compiled by gcc 4.2.1, though the difference will not be very noticeable in practice.
>>
>> Last but not least, thanks to Gavin Atkinson for providing the required hardware.
>
> Thank you very much for doing this.
>
> I tried to map the CPUID into a more human-friendly family moniker, and it seems that these are Pentium-4 class CPUs. Am I right ?
>
> If yes, could you, please, rerun the tests on anything more recent than Core2, i.e. any Core i7-whatever class of Xeons ?

If you can provide the tests, I can rerun them on some Nehalem-class workstations I have access to. I unfortunately don't have access to SNB/Romley hardware yet.

Thanks,
-Garrett
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Wed, Sep 05, 2012 at 03:13:11PM -0700, Steve Kargl wrote:
> On Wed, Sep 05, 2012 at 11:31:26AM +0200, Dimitry Andric wrote:
>> On 2012-09-05 01:40, Garrett Cooper wrote:
>> ...
>>> Steve does have a point. Posting the results of CFLAGS/CPPFLAGS/LDFLAGS/etc for config.log (and maybe poking through the code to figure out what *FLAGS were used elsewhere) is more valuable than the data is in its current state (unfortunately.. autoconf makes things more complicated).
>>
>> 1) For building the FreeBSD in-tree version of clang 3.2: -O2 -pipe -fno-strict-aliasing
>> 2) For building the FreeBSD in-tree version of gcc 4.2.1: -O2 -pipe
>> 3) For building Boost 1.50.0: -ftemplate-depth-128 -O3 -finline-functions
>
> Dimitry, thanks for the follow-up. I performed an unscientific (micro)benchmark of /usr/bin/cc vs /usr/bin/clang, where cc is the base system's gcc 4.2.1. Here's what I found/feared.
>
> Compiling libm on
>
> CPU: AMD Opteron(tm) Processor 248 (2192.01-MHz K8-class CPU)
>   Origin = "AuthenticAMD"  Id = 0xf5a  Family = f  Model = 5  Stepping = 10
>   Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
>   AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow!+,3DNow!>
>
> with default CFLAGS (i.e., -O2 -pipe) and -march=opteron.

Was this compiled as amd64 or i386? Also, can you send me the test case, so that we can explore the difference? The working theory now is SSE vs FPU mathematics, but it would be nice to see the testcase.

Thank you,
roman

> Using 'setenv CC /usr/bin/cc' with 3 runs of
>
>   make clean
>   time make -DNO_MAN
>
> yields
>
>   69.39 real  52.00 user  38.55 sys
>   69.57 real  52.35 user  38.37 sys
>   69.48 real  52.25 user  38.38 sys
>
> Now, repeating with 'setenv CC /usr/bin/clang' yields
>
>   39.65 real  21.86 user  17.37 sys
>   40.91 real  21.48 user  17.91 sys
>   39.77 real  21.65 user  17.64 sys
>
> So, clang does appear to be faster in this particular compiling speed benchmark.
>
> However, if I now build my test program for libm's j0f() function, where the only difference is whether libm was built with /usr/bin/cc or /usr/bin/clang, I observe the following results.
>
> 1234567 x values in the interval [0:25]
>                       gcc libm          |  clang libm
> ----------------------------------------+------------------
>       ULP <= 0.6 --  565515 (45.81%)    |  513763 (41.61%)
> 0.6 < ULP <= 0.7 --   74148 ( 6.01%)    |   67221 ( 5.44%)
> 0.7 < ULP <= 0.8 --   69112 ( 5.60%)    |   62846 ( 5.09%)
> 0.8 < ULP <= 0.9 --   63798 ( 5.17%)    |   58217 ( 4.72%)
> 0.9 < ULP <= 1.0 --   58679 ( 4.75%)    |   53834 ( 4.36%)
> 1.0 < ULP <= 2.0 --  328221 (26.59%)    |  306728 (24.84%)
> 2.0 < ULP <= 3.0 --   65323 ( 5.29%)    |   63452 ( 5.14%)
> 3.0 < ULP        --    9771 ( 0.79%)    |  108506 ( 8.79%)
>
>                gcc libm                 |  clang libm
> ----------------------------------------+------------------
> MAX ULP:       12152.27637              |  1129606938624.0
> x at MAX ULP:  5.520077 0x1.6148f2p+2   |  2.404833 0x1.33d19p+1
>
> Speed test with gcc libm.
> 1234567 j0f calls in 0.193427 seconds.
> 1234567 j0f calls in 0.193410 seconds.
> 1234567 j0f calls in 0.194158 seconds.
>
> Speed test with clang libm.
> 1234567 j0f calls in 0.180260 seconds.
> 1234567 j0f calls in 0.180130 seconds.
> 1234567 j0f calls in 0.179739 seconds.
>
> So, although the clang-built j0f() appears to be faster than the gcc-built j0f(), the clang-built j0f() has much worse accuracy issues.
>
> --
> Steve
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 6 Sep 2012, at 09:43, Roman Divacky wrote:
> Was this compiled as amd64 or i386? Also, can you send me the test case, so that we can explore the difference? The working theory now is SSE vs FPU mathematics, but it would be nice to see the testcase.

There may also be a difference in whether -ffast-math is the default on each compiler. On x86, this will replace a number of libm calls with (much faster, but less accurate) SSE or x87 instructions. If this is enabled by default with clang and not with gcc, it would account for the difference.

David
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 2012-09-06 12:20, David Chisnall wrote:
> ...
> There may also be a difference in whether -ffast-math is the default on each compiler. On x86, this will replace a number of libm calls with (much faster, but less accurate) SSE or x87 instructions. If this is enabled by default with clang and not with gcc, it would account for the difference.

No, -ffast-math is not enabled by default in clang, as far as I can tell. Also, the help text for the option says:

  Enable the *frontend*'s 'fast-math' mode. This has no effect on optimizations, but provides a preprocessor macro __FAST_MATH__, the same as GCC's -ffast-math flag.
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Thu, Sep 06, 2012 at 10:43:12AM +0200, Roman Divacky wrote:
> On Wed, Sep 05, 2012 at 03:13:11PM -0700, Steve Kargl wrote:
>> Compiling libm on
>>
>> CPU: AMD Opteron(tm) Processor 248 (2192.01-MHz K8-class CPU)
>>   Origin = "AuthenticAMD"  Id = 0xf5a  Family = f  Model = 5  Stepping = 10
>>   Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
>>   AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow!+,3DNow!>
>>
>> with default CFLAGS (i.e., -O2 -pipe) and -march=opteron.
>
> Was this compiled as amd64 or i386? Also, can you send me the test case, so that we can explore the difference? The working theory now is SSE vs FPU mathematics, but it would be nice to see the testcase.
>
> Thank you,
> roman

It was compiled on amd64. I can do the same testing on i386 this weekend. The testcase is not a self-contained piece of code. It uses parts of my floating-point test frame. Putting together the testcase may take a few hours.

--
Steve
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 2012-09-05 01:40, Garrett Cooper wrote:
> ...
> Steve does have a point. Posting the results of CFLAGS/CPPFLAGS/LDFLAGS/etc for config.log (and maybe poking through the code to figure out what *FLAGS were used elsewhere) is more valuable than the data is in its current state (unfortunately.. autoconf makes things more complicated).

Just to note, autoconf is not used in the FreeBSD source tree, so it does not apply to the first two builds in the performance test (e.g. building in-tree clang and gcc). The other build is Boost, which has yet another totally different build system, based on Perforce's Jam. Again, no autoconf.

In any case, for all three builds, the default optimization options were used. Basically:

1) For building the FreeBSD in-tree version of clang 3.2:

   -O2 -pipe -fno-strict-aliasing

   These are just the default FreeBSD optimization flags for building clang, which are probably used by the majority of users out there. This is the case that I was interested in particularly. The -fno-strict-aliasing is not really my choice, but it was introduced in the past by Nathan Whitehorn, who apparently saw problems without it. It will hopefully disappear in the future.

2) For building the FreeBSD in-tree version of gcc 4.2.1:

   -O2 -pipe

   These are the default FreeBSD optimization flags.

3) For building Boost 1.50.0:

   -ftemplate-depth-128 -O3 -finline-functions

   These are the Boost defaults for gcc-compatible compilers, from tools/build/v2/tools/gcc.jam.
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 5 Sep 2012, at 10:31, Dimitry Andric wrote:
> These are just the default FreeBSD optimization flags for building clang, which are probably used by the majority of users out there. This is the case that I was interested in particularly. The -fno-strict-aliasing is not really my choice, but it was introduced in the past by Nathan Whitehorn, who apparently saw problems without it. It will hopefully disappear in the future.

Clang currently defaults to no strict aliasing on FreeBSD. In my experience, most C programmers misunderstand the aliasing rules of C, and even people on the C++ standards committee often get them wrong for C++, so trading a 1-10% performance increase for a significant chance of generating non-working code seems like a poor gain. If people are certain that they do understand the rules, then they can add -fstrict-aliasing to their own CFLAGS.

David
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 2012-09-05 11:36, David Chisnall wrote:
> On 5 Sep 2012, at 10:31, Dimitry Andric wrote:
>> The -fno-strict-aliasing is not really my choice, but it was introduced in the past by Nathan Whitehorn, who apparently saw problems without it. It will hopefully disappear in the future.
>
> Clang currently defaults to no strict aliasing on FreeBSD.

Yes, but upstream has never used -fno-strict-aliasing, just plain -O2. I run regular separate builds of pristine upstream clang on FreeBSD, and I haven't seen any failures due to aliasing problems in all the regression tests. That doesn't guarantee there are no problems, of course...

> In my experience, most C programmers misunderstand the aliasing rules of C, and even people on the C++ standards committee often get them wrong for C++, so trading a 1-10% performance increase for a significant chance of generating non-working code seems like a poor gain. If people are certain that they do understand the rules, then they can add -fstrict-aliasing to their own CFLAGS.

I'm actually quite interested in the performance difference; I think I will run a few tests. :)
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Wed, Sep 5, 2012 at 6:56 AM, Dimitry Andric <dimi...@andric.com> wrote:
> On 2012-09-05 11:36, David Chisnall wrote:
>> On 5 Sep 2012, at 10:31, Dimitry Andric wrote:
>>> The -fno-strict-aliasing is not really my choice, but it was introduced in the past by Nathan Whitehorn, who apparently saw problems without it. It will hopefully disappear in the future.
>>
>> Clang currently defaults to no strict aliasing on FreeBSD.
>
> Yes, but upstream has never used -fno-strict-aliasing, just plain -O2. I run regular separate builds of pristine upstream clang on FreeBSD, and I haven't seen any failures due to aliasing problems in all the regression tests. That doesn't guarantee there are no problems, of course...

Aliasing problems are seen much more frequently on PowerPC than on any other platform for Clang. I found this a while back when doing some Clang testing, and I still see problems with upstream unless I explicitly set -fno-strict-aliasing. Nathan had mentioned wanting to get upstream to use -fno-strict-aliasing by default on all platforms, but I don't think that ever made it beyond his suggestion. I filed this bug to track it: http://llvm.org/bugs/show_bug.cgi?id=11955

>> In my experience, most C programmers misunderstand the aliasing rules of C, and even people on the C++ standards committee often get them wrong for C++, so trading a 1-10% performance increase for a significant chance of generating non-working code seems like a poor gain. If people are certain that they do understand the rules, then they can add -fstrict-aliasing to their own CFLAGS.
>
> I'm actually quite interested in the performance difference; I think I will run a few tests. :)
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
What makes you think it's a bug in LLVM code and not a plain gcc miscompile? Other people seem to compile LLVM on PPC64 with gcc and -fstrict-aliasing just fine; they just don't happen to use gcc 4.2.1. I.e. gcc 4.7 is reported not to have this problem, and I personally can confirm that fbsd + gcc 4.8 is ok too.

On Wed, Sep 05, 2012 at 09:11:22AM -0400, Justin Hibbits wrote:
> On Wed, Sep 5, 2012 at 6:56 AM, Dimitry Andric <dimi...@andric.com> wrote:
>> Yes, but upstream has never used -fno-strict-aliasing, just plain -O2. I run regular separate builds of pristine upstream clang on FreeBSD, and I haven't seen any failures due to aliasing problems in all the regression tests. That doesn't guarantee there are no problems, of course...
>
> Aliasing problems are seen much more frequently on PowerPC than on any other platform for Clang. I found this a while back when doing some Clang testing, and I still see problems with upstream unless I explicitly set -fno-strict-aliasing. Nathan had mentioned wanting to get upstream to use -fno-strict-aliasing by default on all platforms, but I don't think that ever made it beyond his suggestion. I filed this bug to track it: http://llvm.org/bugs/show_bug.cgi?id=11955
>
> [...]
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
Actually, Nathan does say it's gcc's fault in a comment on that bug. However, I do all my clang work compiling it with gcc 4.2.1, so I run into this constantly when I forget to add the flag.

- Justin

On Wed, Sep 5, 2012 at 1:37 PM, Roman Divacky <rdiva...@freebsd.org> wrote:
> What makes you think it's a bug in LLVM code and not a plain gcc miscompile? Other people seem to compile LLVM on PPC64 with gcc and -fstrict-aliasing just fine; they just don't happen to use gcc 4.2.1. I.e. gcc 4.7 is reported not to have this problem, and I personally can confirm that fbsd + gcc 4.8 is ok too.
>
> On Wed, Sep 05, 2012 at 09:11:22AM -0400, Justin Hibbits wrote:
>> Aliasing problems are seen much more frequently on PowerPC than on any other platform for Clang. I found this a while back when doing some Clang testing, and I still see problems with upstream unless I explicitly set -fno-strict-aliasing. Nathan had mentioned wanting to get upstream to use -fno-strict-aliasing by default on all platforms, but I don't think that ever made it beyond his suggestion. I filed this bug to track it: http://llvm.org/bugs/show_bug.cgi?id=11955
>>
>> [...]
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
I've been compiling clang with itself on PPC64 for a while now. Works quite well. :)

On Wed, Sep 05, 2012 at 01:44:00PM -0400, Justin Hibbits wrote:
> Actually, Nathan does say it's gcc's fault in a comment on that bug. However, I do all my clang work compiling it with gcc 4.2.1, so I run into this constantly when I forget to add the flag.
>
> - Justin
>
> [...]
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Wed, Sep 05, 2012 at 11:31:26AM +0200, Dimitry Andric wrote:
> On 2012-09-05 01:40, Garrett Cooper wrote:
> ...
>> Steve does have a point. Posting the results of CFLAGS/CPPFLAGS/LDFLAGS/etc for config.log (and maybe poking through the code to figure out what *FLAGS were used elsewhere) is more valuable than the data is in its current state (unfortunately.. autoconf makes things more complicated).
>
> 1) For building the FreeBSD in-tree version of clang 3.2: -O2 -pipe -fno-strict-aliasing
> 2) For building the FreeBSD in-tree version of gcc 4.2.1: -O2 -pipe
> 3) For building Boost 1.50.0: -ftemplate-depth-128 -O3 -finline-functions

Dimitry, thanks for the follow-up. I performed an unscientific (micro)benchmark of /usr/bin/cc vs /usr/bin/clang, where cc is the base system's gcc 4.2.1. Here's what I found/feared.

Compiling libm on

CPU: AMD Opteron(tm) Processor 248 (2192.01-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0xf5a  Family = f  Model = 5  Stepping = 10
  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow!+,3DNow!>

with default CFLAGS (i.e., -O2 -pipe) and -march=opteron.

Using 'setenv CC /usr/bin/cc' with 3 runs of

  make clean
  time make -DNO_MAN

yields

  69.39 real  52.00 user  38.55 sys
  69.57 real  52.35 user  38.37 sys
  69.48 real  52.25 user  38.38 sys

Now, repeating with 'setenv CC /usr/bin/clang' yields

  39.65 real  21.86 user  17.37 sys
  40.91 real  21.48 user  17.91 sys
  39.77 real  21.65 user  17.64 sys

So, clang does appear to be faster in this particular compiling speed benchmark.

However, if I now build my test program for libm's j0f() function, where the only difference is whether libm was built with /usr/bin/cc or /usr/bin/clang, I observe the following results.

1234567 x values in the interval [0:25]
                      gcc libm          |  clang libm
----------------------------------------+------------------
      ULP <= 0.6 --  565515 (45.81%)    |  513763 (41.61%)
0.6 < ULP <= 0.7 --   74148 ( 6.01%)    |   67221 ( 5.44%)
0.7 < ULP <= 0.8 --   69112 ( 5.60%)    |   62846 ( 5.09%)
0.8 < ULP <= 0.9 --   63798 ( 5.17%)    |   58217 ( 4.72%)
0.9 < ULP <= 1.0 --   58679 ( 4.75%)    |   53834 ( 4.36%)
1.0 < ULP <= 2.0 --  328221 (26.59%)    |  306728 (24.84%)
2.0 < ULP <= 3.0 --   65323 ( 5.29%)    |   63452 ( 5.14%)
3.0 < ULP        --    9771 ( 0.79%)    |  108506 ( 8.79%)

               gcc libm                 |  clang libm
----------------------------------------+------------------
MAX ULP:       12152.27637              |  1129606938624.0
x at MAX ULP:  5.520077 0x1.6148f2p+2   |  2.404833 0x1.33d19p+1

Speed test with gcc libm.
1234567 j0f calls in 0.193427 seconds.
1234567 j0f calls in 0.193410 seconds.
1234567 j0f calls in 0.194158 seconds.

Speed test with clang libm.
1234567 j0f calls in 0.180260 seconds.
1234567 j0f calls in 0.180130 seconds.
1234567 j0f calls in 0.179739 seconds.

So, although the clang-built j0f() appears to be faster than the gcc-built j0f(), the clang-built j0f() has much worse accuracy issues.

--
Steve
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 09/04/12 22:39, Dimitry Andric wrote: Hi all, I recently performed a series of compiler performance tests on FreeBSD 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against clang 3.1 and clang 3.2. The attached text file[1] contains more information about the tests, some semi-cooked performance data, and my conclusions. Any errors and omissions are also my fault, so if you notice them, please let me know. The executive summary: clang compiles mostly faster than gcc (sometimes much faster), and uses significantly less memory. Finally, please note these tests were purely about compilation speed, not about the performance of the resulting executables. This still needs to be tested. -Dimitry [1]: Also available at: http://www.andric.com/freebsd/perftest/perftest-2012-09-01a.txt Very interesting. It would also be of great interest to have some benchmarks on FBSD 10 at hand which compare the performance of the resulting binaries of those compilers. Regards, Oliver
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Tue, Sep 04, 2012 at 10:39:40PM +0200, Dimitry Andric wrote: I recently performed a series of compiler performance tests on FreeBSD 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against clang 3.1 and clang 3.2. The attached text file[1] contains more information about the tests, some semi-cooked performance data, and my conclusions. Any errors and omissions are also my fault, so if you notice them, please let me know. The executive summary: clang compiles mostly faster than gcc (sometimes much faster), and uses significantly less memory. The benchmark is somewhat meaningless if one does not know the options that were used during the testing. -- Steve
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Tue, Sep 4, 2012 at 1:39 PM, Dimitry Andric dimi...@andric.com wrote: Hi all, I recently performed a series of compiler performance tests on FreeBSD 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against clang 3.1 and clang 3.2. The attached text file[1] contains more information about the tests, some semi-cooked performance data, and my conclusions. Any errors and omissions are also my fault, so if you notice them, please let me know. The executive summary: clang compiles mostly faster than gcc (sometimes much faster), and uses significantly less memory. Finally, please note these tests were purely about compilation speed, not about the performance of the resulting executables. This still needs to be tested. It would be interesting to see how clang++ performs vs g++ when dealing with nested classes and with complicated code when trying to optimize things, because the optimizer in g++ apparently has some scaling issues. Thanks! -Garrett
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On 2012-09-04 23:43, Steve Kargl wrote: On Tue, Sep 04, 2012 at 10:39:40PM +0200, Dimitry Andric wrote: I recently performed a series of compiler performance tests on FreeBSD 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against clang 3.1 and clang 3.2. ... The benchmark is somewhat meaningless if one does not know the options that were used during the testing. If you meant the compilation options, those were simply the FreeBSD defaults for all tested programs, e.g. -O2 -pipe, except for boost, which uses -ftemplate-depth-128 -O3 -finline-functions. I will add some explicit notes about them.
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Tue, Sep 04, 2012 at 11:59:39PM +0200, Dimitry Andric wrote: On 2012-09-04 23:43, Steve Kargl wrote: On Tue, Sep 04, 2012 at 10:39:40PM +0200, Dimitry Andric wrote: I recently performed a series of compiler performance tests on FreeBSD 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against clang 3.1 and clang 3.2. ... The benchmark is somewhat meaningless if one does not know the options that were used during the testing. If you meant the compilation options, those were simply the FreeBSD defaults for all tested programs, e.g. -O2 -pipe, except for boost, which uses -ftemplate-depth-128 -O3 -finline-functions. I will add some explicit notes about them. Yes, I meant the options specified on the compiler command line. 'gcc -O0 -pipe' compiles code faster than 'gcc -O3 -save-temps', and the former uses much less memory. -- Steve
Re: Compiler performance tests on FreeBSD 10.0-CURRENT
On Tue, Sep 4, 2012 at 3:14 PM, Steve Kargl s...@troutmask.apl.washington.edu wrote: On Tue, Sep 04, 2012 at 11:59:39PM +0200, Dimitry Andric wrote: On 2012-09-04 23:43, Steve Kargl wrote: On Tue, Sep 04, 2012 at 10:39:40PM +0200, Dimitry Andric wrote: I recently performed a series of compiler performance tests on FreeBSD 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against clang 3.1 and clang 3.2. ... The benchmark is somewhat meaningless if one does not know the options that were used during the testing. If you meant the compilation options, those were simply the FreeBSD defaults for all tested programs, e.g. -O2 -pipe, except for boost, which uses -ftemplate-depth-128 -O3 -finline-functions. I will add some explicit notes about them. Yes, I meant the options specified on the compiler command line. 'gcc -O0 -pipe' compiles code faster than 'gcc -O3 -save-temps', and the former uses much less memory. Steve does have a point. Posting the results of CFLAGS/CPPFLAGS/LDFLAGS/etc from config.log (and maybe poking through the code to figure out what *FLAGS were used elsewhere) is more valuable than the data in its current state (unfortunately, autoconf makes things more complicated). Maybe we need some micro benchmarks for this (no, I'm not volunteering :P). Thanks! -Garrett