Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-16 Thread Dimitry Andric

On 2012-09-16 07:19, Konstantin Belousov wrote:

On Sun, Sep 16, 2012 at 12:34:45AM +0200, Dimitry Andric wrote:

...

I tried to map the CPUID into a more human-friendly family moniker, and it
seems that these are Pentium 4-class CPUs. Am I right?


Yes, it is apparently a Nocona model, this is part of the dmesg:

CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2793.24-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0xf41  Family = f  Model = 4  Stepping = 1
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x641d<SSE3,DTES64,MON,DS_CPL,CNXT-ID,CX16,xTPR>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  TSC: P-state invariant
real memory  = 4294967296 (4096 MB)
avail memory = 4097470464 (3907 MB)
Event timer LAPIC quality 400
ACPI APIC Table: <DELL   PE BKC  >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 1 core(s) x 2 HTT threads
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP/HT): APIC ID:  1
 cpu2 (AP): APIC ID:  6
 cpu3 (AP/HT): APIC ID:  7



If yes, could you, please, rerun the tests on anything more recent than
Core2, i.e. any Core i7-whatever class of Xeons ?


I would love to, especially because the tests will complete faster, but
I currently do not have access to physical machines of that class.

Normally I do performance tests on the FreeBSD reference machines, but
since these tests require booting with a custom kernel (and preferably
root access + remote console), I cannot use them.

So if somebody can offer such a machine (for a limited time only, a few
days most likely, 1 week maximum), it would be great.

-Dimitry


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-16 Thread Dimitry Andric

On 2012-09-16 07:25, Garrett Cooper wrote:
...

 If you can provide the tests, I can rerun them on some Nehalem-class
workstations I have access to. Unfortunately, I don't have access to
SNB/Romley hardware yet.


I did these tests as follows:
- Install a recent -CURRENT snapshot on the box (or rebuild world and
  kernel by hand and install them).
- Install Subversion.
- Check out the head sources into /usr/src, if not already there.
- Build a GENERIC kernel with gcc, using default settings, and install it
  into /boot/kernel.gcc.
- Build a GENERIC kernel with clang, using default settings, and install
  it into /boot/kernel.clang (a rough build sketch follows below this list).
- Boot machine with either kernel, then run the attached runtest.sh
  script, with the buildworld_{single,multi}.sh scripts in the same
  directory.  Save the resulting run-*.txt files in a directory that
  indicates whether the kernel in use was built by gcc or by clang.
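
For reference, the kernel build/install steps boil down to something like
the sketch below.  This is only an outline, not the exact commands I used:
the CC/CXX/CPP overrides, the KODIR install directory and the nextboot -k
usage are assumptions here, so adjust them to your setup.

# Sketch only: build GENERIC with each compiler and install side by side.
cd /usr/src

# gcc 4.2.1 (the default cc in the base system at the time)
make buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC KODIR=/boot/kernel.gcc

# in-tree clang 3.2
make buildkernel KERNCONF=GENERIC CC=clang CXX=clang++ CPP=clang-cpp
make installkernel KERNCONF=GENERIC KODIR=/boot/kernel.clang

# boot the clang-built kernel for the next reboot only
nextboot -k kernel.clang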

You can tweak the 'num_runs' variable at the top of runtest.sh to do
more runs, if the machine is fast.  This should give more confidence in
the final statistics.  I did just 3 runs on Gavin's machine, since it
took more than 7 hours for a single-threaded buildworld to complete.
Doing 6 runs should be more than enough.

The run-*.txt files contain the time(1) output of each run, and should
be processed through ministat to give average, stddev and so on.  Just
send them to me, I will process them and summarize the statistics.
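
If you want to eyeball the numbers yourself first, something along these
lines should work (a sketch: the awk pattern assumes the usual time(1)
output format, and the per-kernel directory names are just an example):

# Pull the elapsed ("real") seconds out of each run and compare the kernels.
for d in kernel.gcc kernel.clang; do
    awk '/ real /{ print $1 }' $d/run-world_multi-*.txt > $d.real
done
ministat kernel.gcc.real kernel.clang.real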

Alternatively, you can give me remote access, and I'll do it. :)
#!/bin/sh
# runtest.sh
mypath=${0%/*}
num_runs=3

set -e

do_runtest() {
  for i in $(jot ${num_runs}); do
    rm -rf /usr/obj/*
    sync
    echo "Doing build $1, run $i..."
    /usr/bin/time -l -o run-$1-$i.txt ${mypath}/build$1.sh > run-$1-$i.log
    head -1 run-$1-$i.txt
  done
}

do_runtest world_single
do_runtest world_multi

#!/bin/sh
# buildworld_single.sh
set -e
cd /usr/src
make -s buildworld

#!/bin/sh
# buildworld_multi.sh
set -e
cd /usr/src
make -s -j8 buildworld
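
A possible way to drive the scripts above (paths and directory names are
only illustrative); the run-*.txt and run-*.log files are written to the
current directory:

# Run once per kernel, after booting into it.
mkdir -p /root/perftest/kernel.clang
cd /root/perftest/kernel.clang
sh /root/perftest/scripts/runtest.sh   # buildworld_{single,multi}.sh must sit next to runtest.sh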

Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-16 Thread Dominic Fandrey

On 16/09/2012 00:34, Dimitry Andric wrote:

...

The executive summary: GENERIC kernels compiled with clang 3.2 are
slightly faster than those compiled by gcc 4.2.1, though the difference
will not be very noticeable in practice.


It has been my impression in the past that math-heavy applications
benefit from GCC, whereas I/O-heavy applications yield better performance
when compiled with clang.

I'd say a kernel has a lot more I/O than math to deal with.


--
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-15 Thread Luigi Rizzo
On Sun, Sep 16, 2012 at 12:34:45AM +0200, Dimitry Andric wrote:
 Hi all,
 
 By request, I performed a series of kernel performance tests on FreeBSD
 10.0-CURRENT, particularly comparing the runtime performance of GENERIC
 kernels compiled by gcc 4.2.1 and by clang 3.2.

The fact that the difference is so small is interesting,
and it might almost suggest that the test is dominated by
factors other than the compiler. By chance do you have a
way to produce other data points with different optimization
levels in the compiler ?

cheers
luigi


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-15 Thread Dimitry Andric

On 2012-09-16 01:22, Luigi Rizzo wrote:
...

The fact that the difference is so small is interesting,
and it might almost suggest that the test is dominated by
factors other than the compiler.


Yes, this result was more or less what I expected: runtime performance
is probably related more to hardware speed, and the efficiency of the
chosen algorithms in the kernel, than to the optimizations any current
compiler can produce.

Apparently our kernel hackers already produce quite efficient code. :)



By chance do you have a
way to produce other data points with different optimization
levels in the compiler ?


I could re-run the tests with e.g. -O1 instead of -O2, or maybe even
-O0, though I am not sure if the kernel will compile correctly without
any optimization.  This will take a while though, and I am not sure if I
can borrow Gavin's machine long enough. :)
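
A minimal sketch of what such a re-run could look like, assuming the usual
COPTFLAGS override for kernel builds (the flags and the KODIR name shown
here are only an example):

# Rebuild GENERIC at a lower optimization level and install it separately.
cd /usr/src
make buildkernel KERNCONF=GENERIC COPTFLAGS="-O1 -pipe -fno-strict-aliasing"
make installkernel KERNCONF=GENERIC KODIR=/boot/kernel.O1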

-Dimitry


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-15 Thread Konstantin Belousov
On Sun, Sep 16, 2012 at 12:34:45AM +0200, Dimitry Andric wrote:
 Hi all,
 
 By request, I performed a series of kernel performance tests on FreeBSD
 10.0-CURRENT, particularly comparing the runtime performance of GENERIC
 kernels compiled by gcc 4.2.1 and by clang 3.2.
 
 The attached text file[1] contains more information about the tests,
 some semi-cooked performance data, and my conclusions.  Any errors and
 omissions are also my fault, so if you notice them, please let me know.
 
 The executive summary: GENERIC kernels compiled with clang 3.2 are
 slightly faster than those compiled by gcc 4.2.1, though the difference
 will not be very noticeable in practice.
 
 Last but not least, thanks to Gavin Atkinson for providing the required
 hardware.

Thank you very much for doing this.

I tried to map the CPUID into a more human-friendly family moniker, and it
seems that these are Pentium 4-class CPUs. Am I right?

If yes, could you, please, rerun the tests on anything more recent than
Core2, i.e. any Core i7-whatever class of Xeons ?

Thanks again.




Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-15 Thread Garrett Cooper
On Sat, Sep 15, 2012 at 10:19 PM, Konstantin Belousov
kostik...@gmail.com wrote:
 On Sun, Sep 16, 2012 at 12:34:45AM +0200, Dimitry Andric wrote:
 Hi all,

 By request, I performed a series of kernel performance tests on FreeBSD
 10.0-CURRENT, particularly comparing the runtime performance of GENERIC
 kernels compiled by gcc 4.2.1 and by clang 3.2.

 The attached text file[1] contains more information about the tests,
 some semi-cooked performance data, and my conclusions.  Any errors and
 omissions are also my fault, so if you notice them, please let me know.

 The executive summary: GENERIC kernels compiled with clang 3.2 are
 slightly faster than those compiled by gcc 4.2.1, though the difference
 will not be very noticeable in practice.

 Last but not least, thanks to Gavin Atkinson for providing the required
 hardware.

 Thank you very much for doing this.

 I tried to map the CPUID into a more human-friendly family moniker, and it
 seems that these are Pentium 4-class CPUs. Am I right?

 If yes, could you, please, rerun the tests on anything more recent than
 Core2, i.e. any Core i7-whatever class of Xeons ?

If you can provide the tests, I can rerun them on some Nehalem-class
workstations I have access to. Unfortunately, I don't have access to
SNB/Romley hardware yet.
Thanks,
-Garrett


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-06 Thread Roman Divacky
On Wed, Sep 05, 2012 at 03:13:11PM -0700, Steve Kargl wrote:
 On Wed, Sep 05, 2012 at 11:31:26AM +0200, Dimitry Andric wrote:
  On 2012-09-05 01:40, Garrett Cooper wrote:
  ...
   Steve does have a point. Posting the results of
  CFLAGS/CPPFLAGS/LDFLAGS/etc for config.log (and maybe poking through
  the code to figure out what *FLAGS were used elsewhere) is more
  valuable than the data is in its current state (unfortunately..
  autoconf makes things more complicated).
  
  1) For building the FreeBSD in-tree version of clang 3.2:
  
   -O2 -pipe -fno-strict-aliasing
  
  2) For building the FreeBSD in-tree version of gcc 4.2.1:
  
   -O2 -pipe
  
  3) For building Boost 1.50.0:
  
   -ftemplate-depth-128 -O3 -finline-functions
  
 
 Dimitry thanks for the follow-up.  I performed an unscientific
 (micro)benchmark of /usr/bin/cc vs /usr/bin/clang where cc is
 the base system's gcc 4.2.1.  Here's what I found/feared.
 
 Compiling libm on 
 
 CPU: AMD Opteron(tm) Processor 248 (2192.01-MHz K8-class CPU)
   Origin = AuthenticAMD  Id = 0xf5a  Family = f  Model = 5  Stepping = 10
   Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,\
  MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
   AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow!+,3DNow!>
 
 with default CFLAGS (ie., -O2 -pipe) and -march=opteron.
 
Was this compiled as amd64 or i386? Also, can you send me the test case?
So that we can explore the difference. The working theory now is SSE vs FPU
mathematics, but it would be nice to see the testcase.

Thank you, roman

 Using 'setenv CC /usr/bin/cc' with 3 runs of
 
 make clean
 time make -DNO_MAN
 
 yields
 
    69.39 real        52.00 user        38.55 sys
    69.57 real        52.35 user        38.37 sys
    69.48 real        52.25 user        38.38 sys
 
 Now, repeating with 'setenv CC /usr/bin/clang' yields
 
    39.65 real        21.86 user        17.37 sys
    40.91 real        21.48 user        17.91 sys
    39.77 real        21.65 user        17.64 sys
 
 So, clang does appear to be faster in this particular 
 compiling speed benchmark.
 
 However, if I now build my test program for libm's j0f()
 function, where the only difference is whether libm was
 built with /usr/bin/cc or /usr/bin/clang, I observe the
 following results.
  
 1234567 x values in the interval [0:25]
 
                       gcc libm         |   clang libm
 ------------------------------------------------------------
        ULP <= 0.6 -- 565515 (45.81%)   | 513763 (41.61%)
  0.6 < ULP <= 0.7 -- 74148  ( 6.01%)   | 67221  ( 5.44%)
  0.7 < ULP <= 0.8 -- 69112  ( 5.60%)   | 62846  ( 5.09%)
  0.8 < ULP <= 0.9 -- 63798  ( 5.17%)   | 58217  ( 4.72%)
  0.9 < ULP <= 1.0 -- 58679  ( 4.75%)   | 53834  ( 4.36%)
  1.0 < ULP <= 2.0 -- 328221 (26.59%)   | 306728 (24.84%)
  2.0 < ULP <= 3.0 -- 65323  ( 5.29%)   | 63452  ( 5.14%)
  3.0 < ULP        -- 9771   ( 0.79%)   | 108506 ( 8.79%)
 
                 gcc libm               | clang libm
 ------------------------------------------------------------
       MAX ULP: 12152.27637             | 1129606938624.0
  x at MAX ULP: 5.520077 0x1.6148f2p+2  | 2.404833 0x1.33d19p+1
 
 Speed test with gcc libm.
 1234567 j0f calls in 0.193427 seconds.
 1234567 j0f calls in 0.193410 seconds.
 1234567 j0f calls in 0.194158 seconds.
 
 Speed test with clang libm.
 1234567 j0f calls in 0.180260 seconds.
 1234567 j0f calls in 0.180130 seconds.
 1234567 j0f calls in 0.179739 seconds.
 
 So, although the clang built j0f() appears to be faster than
 the gcc built j0f(), the clang built j0f() has much worse
 accuracy issues.
 
 -- 
 Steve


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-06 Thread David Chisnall
On 6 Sep 2012, at 09:43, Roman Divacky wrote:

 Was this compiled as amd64 or i386? Also, can you send me the test case?
 So that we can explore the difference. The working theory now is SSE vs FPU
 mathematics, but it would be nice to see the testcase.

There may also be a difference in whether -ffast-math is the default on each 
compiler.  On x86, this will replace a number of libm calls with (much faster, 
but less accurate) SSE or x87 instructions.  If this is enabled by default with 
clang and not with gcc, it would account for the difference.  

David


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-06 Thread Dimitry Andric

On 2012-09-06 12:20, David Chisnall wrote:
...

There may also be a difference in whether -ffast-math is the default on each 
compiler.  On x86, this will replace a number of libm calls with (much faster, 
but less accurate) SSE or x87 instructions.  If this is enabled by default with 
clang and not with gcc, it would account for the difference.


No, -ffast-math is not enabled by default in clang, as far as I can
tell.  Also, the help text for the option says:

Enable the *frontend*'s 'fast-math' mode. This has no effect on
optimizations, but provides a preprocessor macro __FAST_MATH__ the same
as GCC's -ffast-math flag.
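
A quick way to check that from the shell (a sketch; it relies only on the
-dM -E preprocessor macro dump, which both compilers support):

echo | clang -dM -E - | grep -c __FAST_MATH__              # expect 0
echo | clang -ffast-math -dM -E - | grep -c __FAST_MATH__  # expect 1
echo | gcc -dM -E - | grep -c __FAST_MATH__                # expect 0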


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-06 Thread Steve Kargl
On Thu, Sep 06, 2012 at 10:43:12AM +0200, Roman Divacky wrote:
 On Wed, Sep 05, 2012 at 03:13:11PM -0700, Steve Kargl wrote:
  
  Compiling libm on 
  
  CPU: AMD Opteron(tm) Processor 248 (2192.01-MHz K8-class CPU)
Origin = AuthenticAMD  Id = 0xf5a  Family = f  Model = 5  Stepping = 10
  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,\
   MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow!+,3DNow!>
  
  with default CFLAGS (ie., -O2 -pipe) and -march=opteron.
  
 Was this compiled as amd64 or i386? Also, can you send me the test case?
 So that we can explore the difference. The working theory now is SSE vs FPU
 mathematics, but it would be nice to see the testcase.
 
 Thank you, roman
 

It was compiled on amd64.  I can do the same testing on i386
this weekend.

The testcase is not a self-contained piece of code.  It
uses parts of my floating point test frame.  Putting
together the testcase may take a few hours.

-- 
Steve


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-05 Thread Dimitry Andric

On 2012-09-05 01:40, Garrett Cooper wrote:
...

 Steve does have a point. Posting the results of
CFLAGS/CPPFLAGS/LDFLAGS/etc for config.log (and maybe poking through
the code to figure out what *FLAGS were used elsewhere) is more
valuable than the data is in its current state (unfortunately..
autoconf makes things more complicated).


Just to note, autoconf is not used in the FreeBSD source tree, so it
does not apply to the first two builds in the performance test (e.g.
building in-tree clang and gcc).

The other build is Boost, which has yet another totally different build
system, based on Perforce's Jam.  Again, no autoconf.

In any case, for all three builds, the default optimization options were
used.  Basically:

1) For building the FreeBSD in-tree version of clang 3.2:

 -O2 -pipe -fno-strict-aliasing

   These are just the default FreeBSD optimization flags for building
   clang, which are probably used by the majority of users out there.
   This is the case that I was interested in particularly.  The
   -fno-strict-aliasing is not really my choice, but it was introduced
   in the past by Nathan Whitehorn, who apparently saw problems without
   it.  It will hopefully disappear in the future.

2) For building the FreeBSD in-tree version of gcc 4.2.1:

 -O2 -pipe

   These are the default FreeBSD optimization flags.

3) For building Boost 1.50.0:

 -ftemplate-depth-128 -O3 -finline-functions

   These are the Boost defaults for gcc-compatible compilers, from
   tools/build/v2/tools/gcc.jam.


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-05 Thread David Chisnall
On 5 Sep 2012, at 10:31, Dimitry Andric wrote:

   These are just the default FreeBSD optimization flags for building
   clang, which are probably used by the majority of users out there.
   This is the case that I was interested in particularly.  The
   -fno-strict-aliasing is not really my choice, but it was introduced
   in the past by Nathan Whitehorn, who apparently saw problems without
   it.  It will hopefully disappear in the future.

Clang currently defaults to no strict aliasing on FreeBSD.  In my experience, 
most C programmers misunderstand the aliasing rules of C and even people on the 
C++ standards committee often get them wrong for C++, so trading a 1-10% 
performance increase  for a significant chance of generating non-working code 
seems like a poor gain.  If people are certain that they do understand the 
rules, then they can add -fstrict-aliasing to their own CFLAGS.
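
For example, opting in system-wide is a one-line change (a sketch;
/etc/make.conf is assumed as the usual place for such flags):

# Append to /etc/make.conf -- only if you know your code obeys the aliasing rules.
printf 'CFLAGS+=\t-fstrict-aliasing\n' >> /etc/make.conf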

David


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-05 Thread Dimitry Andric

On 2012-09-05 11:36, David Chisnall wrote:

On 5 Sep 2012, at 10:31, Dimitry Andric wrote:

   The
   -fno-strict-aliasing is not really my choice, but it was introduced
   in the past by Nathan Whitehorn, who apparently saw problems without
   it.  It will hopefully disappear in the future.

Clang currently defaults to no strict aliasing on FreeBSD.


Yes, but upstream has never used -fno-strict-aliasing, just plain -O2.
I run regular separate builds of pristine upstream clang on FreeBSD, and
I haven't seen any failures due to aliasing problems in all the regression
tests.  That doesn't guarantee there are no problems, of course...



In my experience, most C programmers misunderstand the aliasing rules of C and 
even people on the C++ standards committee often get them wrong for C++, so 
trading a 1-10% performance increase  for a significant chance of generating 
non-working code seems like a poor gain.  If people are certain that they do 
understand the rules, then they can add -fstrict-aliasing to their own CFLAGS.


I'm actually quite interested in the performance difference; I think I
will run a few tests. :)


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-05 Thread Justin Hibbits
On Wed, Sep 5, 2012 at 6:56 AM, Dimitry Andric dimi...@andric.com wrote:

 On 2012-09-05 11:36, David Chisnall wrote:

 On 5 Sep 2012, at 10:31, Dimitry Andric wrote:

The

-fno-strict-aliasing is not really my choice, but it was introduced
in the past by Nathan Whitehorn, who apparently saw problems without
it.  It will hopefully disappear in the future.

 Clang currently defaults to no strict aliasing on FreeBSD.


 Yes, but upstream has never used -fno-strict-aliasing, just plain -O2.
 I run regular separate builds of pristine upstream clang on FreeBSD, and
 I haven't seen any failures due aliasing problems in all the regression
 tests.  That doesn't guarantee there are no problems, of course...


Aliasing problems are seen much more frequently on PowerPC than any other
platform for Clang.  I found this a while back when doing some Clang
testing, and I still see problems with upstream unless I explicitly set
-fno-strict-aliasing.  Nathan had mentioned wanting to get upstream to use
-fno-strict-aliasing by default on all platforms, but I don't think that
ever made it beyond his suggestion.

I filed this bug to track it: http://llvm.org/bugs/show_bug.cgi?id=11955


In my experience, most C programmers misunderstand the aliasing rules of C
 and even people on the C++ standards committee often get them wrong for
 C++, so trading a 1-10% performance increase  for a significant chance of
 generating non-working code seems like a poor gain.  If people are certain
 that they do understand the rules, then they can add -fstrict-aliasing to
 their own CFLAGS.


 I'm actually quite interested in the performance difference; I think I
 will run a few tests. :)


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-05 Thread Roman Divacky
What makes you think it's a bug in the llvm code and not a plain gcc miscompile?
Other people seem to compile llvm on PPC64 with gcc and -fstrict-aliasing
just fine. They just don't happen to use gcc 4.2.1. I.e. gcc47 is reported
not to have this problem. I personally can confirm that fbsd+gcc48 is ok too.

On Wed, Sep 05, 2012 at 09:11:22AM -0400, Justin Hibbits wrote:
 On Wed, Sep 5, 2012 at 6:56 AM, Dimitry Andric dimi...@andric.com wrote:
 
  On 2012-09-05 11:36, David Chisnall wrote:
 
  On 5 Sep 2012, at 10:31, Dimitry Andric wrote:
 
 The
 
 -fno-strict-aliasing is not really my choice, but it was introduced
 in the past by Nathan Whitehorn, who apparently saw problems without
 it.  It will hopefully disappear in the future.
 
  Clang currently defaults to no strict aliasing on FreeBSD.
 
 
  Yes, but upstream has never used -fno-strict-aliasing, just plain -O2.
  I run regular separate builds of pristine upstream clang on FreeBSD, and
  I haven't seen any failures due aliasing problems in all the regression
  tests.  That doesn't guarantee there are no problems, of course...
 
 
 Aliasing problems are seen much more frequently on PowerPC than any other
 platform for Clang.  I found this a while back when doing some Clang
 testing, and I still see problems with upstream unless I explicitly set
 -fno-strict-aliasing.  Nathan had mentioned wanting to get upstream to use
 -fno-strict-aliasing by default on all platforms, but I don't think that
 ever made it beyond his suggesting.
 
 I filed this bug to track it: http://llvm.org/bugs/show_bug.cgi?id=11955
 
 
 In my experience, most C programmers misunderstand the aliasing rules of C
  and even people on the C++ standards committee often get them wrong for
  C++, so trading a 1-10% performance increase  for a significant chance of
  generating non-working code seems like a poor gain.  If people are certain
  that they do understand the rules, then they can add -fstrict-aliasing to
  their own CFLAGS.
 
 
  I'm actually quite interested in the performance difference; I think I
  will run a few tests. :)


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-05 Thread Justin Hibbits
Actually, Nathan does say it's gcc's fault in a comment on that bug.
However, I do all my clang work compiling it with gcc 4.2.1, so I run into
this constantly when I forget to add the flag.

- Justin

On Wed, Sep 5, 2012 at 1:37 PM, Roman Divacky rdiva...@freebsd.org wrote:

 What makes you think it's a bug in llvm code and not a plain gcc
 miscompile?
 Other people seem to compile llvm on PPC64 with gcc and -fstrict-aliasing
 just fine. They just dont happen to use gcc4.2.1. Ie. gcc47 is reported
 to not have this problem. I personally can confirm that fbsd+gcc48 is ok to

 On Wed, Sep 05, 2012 at 09:11:22AM -0400, Justin Hibbits wrote:
  On Wed, Sep 5, 2012 at 6:56 AM, Dimitry Andric dimi...@andric.com
 wrote:
 
   On 2012-09-05 11:36, David Chisnall wrote:
  
   On 5 Sep 2012, at 10:31, Dimitry Andric wrote:
  
   The
  
  -fno-strict-aliasing is not really my choice, but it was
 introduced
  in the past by Nathan Whitehorn, who apparently saw problems
 without
  it.  It will hopefully disappear in the future.
  
   Clang currently defaults to no strict aliasing on FreeBSD.
  
  
   Yes, but upstream has never used -fno-strict-aliasing, just plain -O2.
   I run regular separate builds of pristine upstream clang on FreeBSD,
 and
   I haven't seen any failures due aliasing problems in all the regression
   tests.  That doesn't guarantee there are no problems, of course...
 
 
  Aliasing problems are seen much more frequently on PowerPC than any other
  platform for Clang.  I found this a while back when doing some Clang
  testing, and I still see problems with upstream unless I explicitly set
  -fno-strict-aliasing.  Nathan had mentioned wanting to get upstream to
 use
  -fno-strict-aliasing by default on all platforms, but I don't think that
  ever made it beyond his suggesting.
 
  I filed this bug to track it: http://llvm.org/bugs/show_bug.cgi?id=11955
 
 
  In my experience, most C programmers misunderstand the aliasing rules of
 C
   and even people on the C++ standards committee often get them wrong
 for
   C++, so trading a 1-10% performance increase  for a significant
 chance of
   generating non-working code seems like a poor gain.  If people are
 certain
   that they do understand the rules, then they can add
 -fstrict-aliasing to
   their own CFLAGS.
  
  
   I'm actually quite interested in the performance difference; I think I
   will run a few tests. :)


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-05 Thread Roman Divacky
I've been compiling clang with itself on PPC64 for a while now. Works quite
well :)

On Wed, Sep 05, 2012 at 01:44:00PM -0400, Justin Hibbits wrote:
 Actually, Nathan does say it's gcc's fault in a comment on that bug.
  However, I do all my clang work compiling it with gcc4.2.1, so run into
 this constantly when I forget to add the flag.
 
 - Justin
 
 On Wed, Sep 5, 2012 at 1:37 PM, Roman Divacky rdiva...@freebsd.org wrote:
 
  What makes you think it's a bug in llvm code and not a plain gcc
  miscompile?
  Other people seem to compile llvm on PPC64 with gcc and -fstrict-aliasing
  just fine. They just dont happen to use gcc4.2.1. Ie. gcc47 is reported
  to not have this problem. I personally can confirm that fbsd+gcc48 is ok to
 
  On Wed, Sep 05, 2012 at 09:11:22AM -0400, Justin Hibbits wrote:
   On Wed, Sep 5, 2012 at 6:56 AM, Dimitry Andric dimi...@andric.com
  wrote:
  
On 2012-09-05 11:36, David Chisnall wrote:
   
On 5 Sep 2012, at 10:31, Dimitry Andric wrote:
   
    The
   
   -fno-strict-aliasing is not really my choice, but it was
  introduced
   in the past by Nathan Whitehorn, who apparently saw problems
  without
   it.  It will hopefully disappear in the future.
   
Clang currently defaults to no strict aliasing on FreeBSD.
   
   
Yes, but upstream has never used -fno-strict-aliasing, just plain -O2.
I run regular separate builds of pristine upstream clang on FreeBSD,
  and
I haven't seen any failures due aliasing problems in all the regression
tests.  That doesn't guarantee there are no problems, of course...
  
  
   Aliasing problems are seen much more frequently on PowerPC than any other
   platform for Clang.  I found this a while back when doing some Clang
   testing, and I still see problems with upstream unless I explicitly set
   -fno-strict-aliasing.  Nathan had mentioned wanting to get upstream to
  use
   -fno-strict-aliasing by default on all platforms, but I don't think that
   ever made it beyond his suggesting.
  
   I filed this bug to track it: http://llvm.org/bugs/show_bug.cgi?id=11955
  
  
   In my experience, most C programmers misunderstand the aliasing rules of
  C
and even people on the C++ standards committee often get them wrong
  for
C++, so trading a 1-10% performance increase  for a significant
  chance of
generating non-working code seems like a poor gain.  If people are
  certain
that they do understand the rules, then they can add
  -fstrict-aliasing to
their own CFLAGS.
   
   
I'm actually quite interested in the performance difference; I think I
will run a few tests. :)


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-05 Thread Steve Kargl
On Wed, Sep 05, 2012 at 11:31:26AM +0200, Dimitry Andric wrote:
 On 2012-09-05 01:40, Garrett Cooper wrote:
 ...
  Steve does have a point. Posting the results of
 CFLAGS/CPPFLAGS/LDFLAGS/etc for config.log (and maybe poking through
 the code to figure out what *FLAGS were used elsewhere) is more
 valuable than the data is in its current state (unfortunately..
 autoconf makes things more complicated).
 
 1) For building the FreeBSD in-tree version of clang 3.2:
 
  -O2 -pipe -fno-strict-aliasing
 
 2) For building the FreeBSD in-tree version of gcc 4.2.1:
 
  -O2 -pipe
 
 3) For building Boost 1.50.0:
 
  -ftemplate-depth-128 -O3 -finline-functions
 

Dimitry, thanks for the follow-up.  I performed an unscientific
(micro)benchmark of /usr/bin/cc vs /usr/bin/clang where cc is
the base system's gcc 4.2.1.  Here's what I found/feared.

Compiling libm on 

CPU: AMD Opteron(tm) Processor 248 (2192.01-MHz K8-class CPU)
  Origin = AuthenticAMD  Id = 0xf5a  Family = f  Model = 5  Stepping = 10
  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,\
 MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow!+,3DNow!>

with default CFLAGS (ie., -O2 -pipe) and -march=opteron.

Using 'setenv CC /usr/bin/cc' with 3 runs of

make clean
time make -DNO_MAN

yields

   69.39 real        52.00 user        38.55 sys
   69.57 real        52.35 user        38.37 sys
   69.48 real        52.25 user        38.38 sys

Now, repeating with 'setenv CC /usr/bin/clang' yields

   39.65 real        21.86 user        17.37 sys
   40.91 real        21.48 user        17.91 sys
   39.77 real        21.65 user        17.64 sys

So, clang does appear to be faster in this particular 
compiling speed benchmark.

However, if I now build my test program for libm's j0f()
function, where the only difference is whether libm was
built with /usr/bin/cc or /usr/bin/clang, I observe the
following results.
 
1234567 x values in the interval [0:25]

                      gcc libm         |   clang libm
------------------------------------------------------------
       ULP <= 0.6 -- 565515 (45.81%)   | 513763 (41.61%)
 0.6 < ULP <= 0.7 -- 74148  ( 6.01%)   | 67221  ( 5.44%)
 0.7 < ULP <= 0.8 -- 69112  ( 5.60%)   | 62846  ( 5.09%)
 0.8 < ULP <= 0.9 -- 63798  ( 5.17%)   | 58217  ( 4.72%)
 0.9 < ULP <= 1.0 -- 58679  ( 4.75%)   | 53834  ( 4.36%)
 1.0 < ULP <= 2.0 -- 328221 (26.59%)   | 306728 (24.84%)
 2.0 < ULP <= 3.0 -- 65323  ( 5.29%)   | 63452  ( 5.14%)
 3.0 < ULP        -- 9771   ( 0.79%)   | 108506 ( 8.79%)

                gcc libm               | clang libm
------------------------------------------------------------
      MAX ULP: 12152.27637             | 1129606938624.0
 x at MAX ULP: 5.520077 0x1.6148f2p+2  | 2.404833 0x1.33d19p+1

Speed test with gcc libm.
1234567 j0f calls in 0.193427 seconds.
1234567 j0f calls in 0.193410 seconds.
1234567 j0f calls in 0.194158 seconds.

Speed test with clang libm.
1234567 j0f calls in 0.180260 seconds.
1234567 j0f calls in 0.180130 seconds.
1234567 j0f calls in 0.179739 seconds.

So, although the clang-built j0f() appears to be faster than
the gcc-built j0f(), the clang-built j0f() has much worse
accuracy.

-- 
Steve


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-04 Thread O. Hartmann
On 09/04/12 22:39, Dimitry Andric wrote:
 Hi all,
 
 I recently performed a series of compiler performance tests on FreeBSD
 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against
 clang 3.1 and clang 3.2.
 
 The attached text file[1] contains more information about the tests,
 some semi-cooked performance data, and my conclusions.  Any errors and
 omissions are also my fault, so if you notice them, please let me know.
 
 The executive summary: clang compiles mostly faster than gcc (sometimes
 much faster), and uses significantly less memory.
 
 Finally, please note these tests were purely about compilation speed,
 not about the performance of the resulting executables.  This still
 needs to be tested.
 
 -Dimitry
 
 [1]: Also available at:
 http://www.andric.com/freebsd/perftest/perftest-2012-09-01a.txt




Very interesting.
It would also be of great interest to have some benchmarks on FBSD 10 at
hand which compare the performance of the resulting binaries from those
compilers.

Regards,
Oliver


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-04 Thread Steve Kargl
On Tue, Sep 04, 2012 at 10:39:40PM +0200, Dimitry Andric wrote:
 
 I recently performed a series of compiler performance tests on FreeBSD
 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against
 clang 3.1 and clang 3.2.
 
 The attached text file[1] contains more information about the tests,
 some semi-cooked performance data, and my conclusions.  Any errors and
 omissions are also my fault, so if you notice them, please let me know.
 
 The executive summary: clang compiles mostly faster than gcc (sometimes
 much faster), and uses significantly less memory.

The benchmark is somewhat meaningless if one does not 
know the options that were used during the testing.

-- 
Steve


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-04 Thread Garrett Cooper
On Tue, Sep 4, 2012 at 1:39 PM, Dimitry Andric dimi...@andric.com wrote:
 Hi all,

 I recently performed a series of compiler performance tests on FreeBSD
 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against
 clang 3.1 and clang 3.2.

 The attached text file[1] contains more information about the tests,
 some semi-cooked performance data, and my conclusions.  Any errors and
 omissions are also my fault, so if you notice them, please let me know.

 The executive summary: clang compiles mostly faster than gcc (sometimes
 much faster), and uses significantly less memory.

 Finally, please note these tests were purely about compilation speed,
 not about the performance of the resulting executables.  This still
 needs to be tested.

It would be interesting to see how clang++ performs vs g++ when
dealing with nested classes and complicated code, because the
optimizer in g++ apparently has some scaling issues.
Thanks!
-Garrett


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-04 Thread Dimitry Andric

On 2012-09-04 23:43, Steve Kargl wrote:

On Tue, Sep 04, 2012 at 10:39:40PM +0200, Dimitry Andric wrote:

I recently performed a series of compiler performance tests on FreeBSD
10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against
clang 3.1 and clang 3.2.

...

The benchmark is somewhat meaningless if one does not
know the options that were used during the testing.


If you meant the compilation options, those were simply the FreeBSD
defaults for all tested programs, e.g. -O2 -pipe, except for boost,
which uses -ftemplate-depth-128 -O3 -finline-functions.  I will add
some explicit notes about them.


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-04 Thread Steve Kargl
On Tue, Sep 04, 2012 at 11:59:39PM +0200, Dimitry Andric wrote:
 On 2012-09-04 23:43, Steve Kargl wrote:
 On Tue, Sep 04, 2012 at 10:39:40PM +0200, Dimitry Andric wrote:
 I recently performed a series of compiler performance tests on FreeBSD
 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against
 clang 3.1 and clang 3.2.
 ...
 The benchmark is somewhat meaningless if one does not
 know the options that were used during the testing.
 
 If you meant the compilation options, those were simply the FreeBSD
 defaults for all tested programs, e.g. -O2 -pipe, except for boost,
 which uses -ftemplate-depth-128 -O3 -finline-functions.  I will add
 some explicit notes about them.

Yes, I meant the options specified on the compiler command line.
'gcc -O0 -pipe' compiles code faster than 'gcc -O3 -save-temps',
and the former uses much less memory.

-- 
Steve


Re: Compiler performance tests on FreeBSD 10.0-CURRENT

2012-09-04 Thread Garrett Cooper
On Tue, Sep 4, 2012 at 3:14 PM, Steve Kargl
s...@troutmask.apl.washington.edu wrote:
 On Tue, Sep 04, 2012 at 11:59:39PM +0200, Dimitry Andric wrote:
 On 2012-09-04 23:43, Steve Kargl wrote:
 On Tue, Sep 04, 2012 at 10:39:40PM +0200, Dimitry Andric wrote:
 I recently performed a series of compiler performance tests on FreeBSD
 10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against
 clang 3.1 and clang 3.2.
 ...
 The benchmark is somewhat meaningless if one does not
 know the options that were used during the testing.

 If you meant the compilation options, those were simply the FreeBSD
 defaults for all tested programs, e.g. -O2 -pipe, except for boost,
 which uses -ftemplate-depth-128 -O3 -finline-functions.  I will add
 some explicit notes about them.

 Yes, I meant the options specified on the compiler command line.
 'gcc -O0 -pipe' compiles code faster than 'gcc -O3 -save-temps',
 and the former uses much less memory.

Steve does have a point. Posting the results of
CFLAGS/CPPFLAGS/LDFLAGS/etc from config.log (and maybe poking through
the code to figure out what *FLAGS were used elsewhere) is more
valuable than the data in its current state (unfortunately,
autoconf makes things more complicated).
Maybe we need some micro benchmarks for this (no, I'm not volunteering :P).
Thanks!
-Garrett