Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

2011-03-14 Thread Martin Matuska
After my recent patches to HEAD, not anymore.
I also have an SSSE3 patch and a general gcc 4.2 update patch pending.

On 12.03.2011 09:42, Jakub Lach wrote:
 
 Core i7 based processors run slower with -march=core2 (the new option) on
 the system compiler than with -march=nocona.
 
 Sorry for the double mail, but isn't CPUTYPE=core2 just an alias for
 nocona with the base compiler?
 


Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

2011-03-13 Thread Jakub Lach


Vinícius Zavam wrote:
 
 
 I'm still curious about things like CPUTYPE= and -march= configured as
 native, gentlemen.
 Is it the golden egg to use with our system or not? Why aren't the
 native settings in the benchmarks?
 
 /me feels confused.
 
 
 -- 
 Vinícius Zavam
 profiles.google.com/egypcio
 

Apparently -march=native would equal -march=core2
with 65nm-generation Core 2s; this is not the case with
Penryns. But there are none in the test?

However, I agree that testing with -march=native
would be simpler and more straightforward.

regards, 
- Jakub Lach


Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

2011-03-13 Thread Mehmet Erol Sanliturk
On Sun, Mar 13, 2011 at 3:19 PM, Jakub Lach jakub_l...@mailplus.pl wrote:



 Vinícius Zavam wrote:
 
 
  I'm still curious about things like CPUTYPE= and -march= configured as
  native, gentlemen.
  Is it the golden egg to use with our system or not? Why aren't the
  native settings in the benchmarks?
 
  /me feels confused.
 
 
  --
  Vinícius Zavam
  profiles.google.com/egypcio
 

 Apparently -march=native would equal -march=core2
 with 65nm-generation Core 2s; this is not the case with
 Penryns. But there are none in the test?

 However, I agree that testing with -march=native
 would be simpler and more straightforward.

 regards,
 - Jakub Lach
 --





The compilers Clang and GCC may also be compared with the following design,
because on the same computer multiple parameters are measured:

                  Clang Version x
               --- Repeated Measures ---
               p(1)   p(2)   p(3)   p(4)  ...  p(m)
               -----  -----  -----  -----      -----
 Computer 1    value  value  value  value  ...  value
 Computer 2    value  value  value  value  ...  value
   .
   .
   .
 Computer n    value  value  value  value  ...  value



                  GCC Version x
               --- Repeated Measures ---
               p(1)   p(2)   p(3)   p(4)  ...  p(m)
               -----  -----  -----  -----      -----
 Computer n+1  value  value  value  value  ...  value
 Computer n+2  value  value  value  value  ...  value
   .
   .
   .
 Computer n+n  value  value  value  value  ...  value




For each compiler the same number of computers is used (this is called a
balanced design). Evaluation of unbalanced designs may not be available in
the statistical packages used, and theoretically they are NOT very good.


Here the factors are:
   (1) Compilers (Clang, GCC)
   (2) Measured parameters, i.e. compilation parameters, such as
       (no optimization)
       (optimization level 1)
       (optimization level 2)
       (processor kind 1)
       (processor kind 2)
       (code generation kind 1)
       (code generation kind 2)
       and/or any number of other parameters

The number of computers n should be greater than the number of parameters m.

The subjects are the computers, no one of which is equal to any other.
The measured parameters are also called treatments.

In statistical analysis packages and in books on experimental design,
this design is called a

two-factor experiment with repeated measures on one factor.

The following other names may also be used:

repeated measures design, or
within-subjects factorial design, or
multifactor experiments having repeated measures on the same elements
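
As a sketch only, such an analysis can be run in R (which is in the
Ports); the scores below are invented, with the compiler as the grouping
(between-subjects) factor and the parameter as the repeated factor:

    ## Hypothetical scores: 6 computers (subjects), 4 parameters
    ## (repeated factor), 3 computers per compiler (grouping factor).
    d <- expand.grid(computer = factor(1:6),
                     param    = factor(paste0("p", 1:4)))
    d$compiler <- factor(ifelse(as.integer(d$computer) <= 3, "clang", "gcc"))
    set.seed(1)
    d$score <- rnorm(nrow(d), mean = 100, sd = 5)
    ## Two-factor ANOVA with repeated measures on one factor (param):
    summary(aov(score ~ compiler * param + Error(computer/param), data = d))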



Including two GCC versions in the above table as separate compilers may
NOT work very well, because GCC compilers are likely VERY CORRELATED with
each other (they use the same code generator, perhaps with some patches
applied to distinguish the versions).


To obtain an idea about the correlation strength of the GCC compilers,
CANONICAL correlation analysis may be used when there are multiple
parameters (do NOT use two-by-two correlation coefficients when there are
more than two parameters), or simple correlation when there are only two
parameters (one for each compiler).


The design is as follows:

               GCC Version x            GCC Version y
               ---------------------    ---------------------
               p(1)  p(2)  ...  p(k)    p(1)  p(2)  ...  p(k)
 Computer 1     v     v          v       v     v          v
 Computer 2     v     v          v       v     v          v
   .
   .
   .
 Computer n     v     v          v       v     v          v


where
  p(1), p(2), ..., p(k) are the measured parameters,
  k: the number of parameters for each block individually
     (there may be different parameter sets, but for our purposes
     equivalent parameter sets are required),
  n: the number of observations, where each computer should be different
     from the others,
  v: the value measured for a parameter.

When there is a significant CANONICAL correlation between the two
compiler-related value blocks:
 (1) it is NOT possible to include the two compilers in the above
     repeated measures design, because of high collinearity;
 (2) selection of the BEST compiler is possible, because the two compilers
     are very similar (there is no difference between them other than the
     performance level).

When the CANONICAL correlation is NOT significant, the other GCC compiler
may be included as a third compiler in the repeated measures design.
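
A minimal sketch of such a check in R, using the base cancor() function on
invented score matrices (a formal significance test for the canonical
correlations, e.g. Bartlett's test, is not shown here):

    ## Hypothetical scores: 10 computers x 3 parameters per GCC version.
    set.seed(1)
    gcc_x <- matrix(rnorm(10 * 3, mean = 100, sd = 5), nrow = 10)
    gcc_y <- gcc_x + matrix(rnorm(10 * 3, mean = 0, sd = 1), nrow = 10)
    cc <- cancor(gcc_x, gcc_y)
    cc$cor  # canonical correlations; values near 1 suggest collinearity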




http://en.wikipedia.org/wiki/Repeated_measures_design
http://en.wikipedia.org/wiki/Category:Experimental_design


Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

2011-03-12 Thread Jakub Lach

Thanks for starting this interesting
comparison.

Maybe using -march=native would be
simpler and more meaningful? I'm thinking
about Penryns especially.

regards, 
- Jakub Lach





Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

2011-03-12 Thread Jakub Lach

Core i7 based processors run slower with -march=core2 (the new option) on
the system compiler than with -march=nocona.

Sorry for the double mail, but isn't CPUTYPE=core2 just an alias for
nocona with the base compiler?



Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

2011-03-12 Thread Martin Matuska
Hi Poul-Henning,

I have redone the test for the majority of the processors, this time
taking 5 samples of each whole test run and calculating the average,
standard deviation, relative standard deviation, standard error and
relative standard error.

The relative standard error is below 0.25% for ~91% of the tests, between
0.25% and 0.5% for ~7%, between 0.5% and 1.0% for ~1%, and between 1.0%
and 2.0% for 1%. By a test I mean 5 runs of the same setting of the same
compiler on the same processor.

So let's say I now have the string/base64 test for a Core i7 showing the
following (score +/- standard deviation):
gcc421: 82.7892 points +/- 0.8314 (1%)
gcc45-nocona: 96.0882 points +/- 1.1652 (1.21%)

For a relative comparison of two settings of the same test I could
calculate the difference of averages = 13.299 points (16.06%) and the sum
of standard deviations = 2.4834 points (3.00%).

Therefore, assuming normal distribution intervals, I could say that with
95% probability gcc45-nocona is faster than gcc421 by at least 10.18%
(16.06 - 1.96x3.00), or with 99.9% probability by at least 6.12%
(16.06 - 3.2906x3.00).
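
For reference, the multipliers above are two-sided normal quantiles; a
two-line sketch in R (which is in the Ports) reproduces them:

    qnorm(1 - 0.05 / 2)    # 1.9600, for the 95% level
    qnorm(1 - 0.001 / 2)   # 3.2905, for the 99.9% level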

So I should probably pick a significance level (e.g. 95%, 99% or 99.9%)
and normalize all the test scores for this level. Results outside the
interval (where the difference is below zero) are then not significant.

What significance level should I take?

I hope this approach is better :)

On 11.03.2011 17:46, Poul-Henning Kamp wrote:
 In message 4d7a42cc.8020...@freebsd.org, Martin Matuska writes:
 
 But what I can say, e.g. for the Intel Atom processor, if there are
 performance gains in all but one test (that falls 2% behind), generic
 perl code (the routines benchmarked) on this processor is very likely to
 run faster with that setup.
 
 No, actually you cannot say that, unless you run all the tests at
 least three times for each compiler(+flag), calculate the average
 and standard deviation of all the tests, and see which, if any of
 the results are statistically significant.
 
 Until you do that, your numbers are meaningless, because we have no
 idea what the signal/noise ratio is.
 
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

2011-03-12 Thread Mehmet Erol Sanliturk
2011/3/12 Martin Matuska m...@freebsd.org

 Hi Poul-Henning,

 I have redone the test for the majority of the processors, this time
 taking 5 samples of each whole test run and calculating the average,
 standard deviation, relative standard deviation, standard error and
 relative standard error.

 The relative standard error is below 0.25% for ~91% of the tests, between
 0.25% and 0.5% for ~7%, between 0.5% and 1.0% for ~1%, and between 1.0%
 and 2.0% for 1%. By a test I mean 5 runs of the same setting of the same
 compiler on the same processor.

 So let's say I now have the string/base64 test for a Core i7 showing the
 following (score +/- standard deviation):
 gcc421: 82.7892 points +/- 0.8314 (1%)
 gcc45-nocona: 96.0882 points +/- 1.1652 (1.21%)

 For a relative comparison of two settings of the same test I could
 calculate the difference of averages = 13.299 points (16.06%) and the sum
 of standard deviations = 2.4834 points (3.00%).

 Therefore, assuming normal distribution intervals, I could say that with
 95% probability gcc45-nocona is faster than gcc421 by at least 10.18%
 (16.06 - 1.96x3.00), or with 99.9% probability by at least 6.12%
 (16.06 - 3.2906x3.00).

 So I should probably pick a significance level (e.g. 95%, 99% or 99.9%)
 and normalize all the test scores for this level. Results outside the
 interval (where the difference is below zero) are then not significant.

 What significance level should I take?

 I hope this approach is better :)

 On 11.03.2011 17:46, Poul-Henning Kamp wrote:
  In message 4d7a42cc.8020...@freebsd.org, Martin Matuska writes:
 
  But what I can say, e.g. for the Intel Atom processor, if there are
  performance gains in all but one test (that falls 2% behind), generic
  perl code (the routines benchmarked) on this processor is very likely to
  run faster with that setup.
 
  No, actually you cannot say that, unless you run all the tests at
  least three times for each compiler(+flag), calculate the average
  and standard deviation of all the tests, and see which, if any of
  the results are statistically significant.
 
  Until you do that, your numbers are meaningless, because we have no
  idea what the signal/noise ratio is.
 



In addition to a possible answer by Poul-Henning Kamp, you may consider
the following pages, because the strength (sensitivity) of hypothesis
tests is determined by statistical power computations:

http://en.wikipedia.org/wiki/Statistical_power


http://en.wikipedia.org/wiki/Statistical_hypothesis_testing
http://en.wikipedia.org/wiki/Category:Hypothesis_testing

http://en.wikipedia.org/wiki/Category:Statistical_terminology
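
As a hypothetical illustration, solving for the number of runs needed to
detect a 2-point difference with standard deviation 1.2, at the 0.05
significance level and 90% power, using R's built-in power.t.test:

    power.t.test(delta = 2, sd = 1.2, sig.level = 0.05, power = 0.90)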


Thank you very much.

Mehmet Erol Sanliturk


Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

2011-03-12 Thread Mehmet Erol Sanliturk
2011/3/12 Martin Matuska m...@freebsd.org

 Hi Poul-Henning,

 I have redone the test for the majority of the processors, this time
 taking 5 samples of each whole test run and calculating the average,
 standard deviation, relative standard deviation, standard error and
 relative standard error.

 The relative standard error is below 0.25% for ~91% of the tests, between
 0.25% and 0.5% for ~7%, between 0.5% and 1.0% for ~1%, and between 1.0%
 and 2.0% for 1%.



...


 By a test I mean 5 runs of the same setting of the same
 compiler on the same processor.


...


To have VALID test results, it is NECESSARY to obtain the results by
using DIFFERENT computers. (This point is NOT mentioned in your message;
I am assuming that the SAME computer was used to get the results.)

If you repeat the same computations on the SAME computer, the values are
CORRELATED, and the t-test is NOT valid, because you are computing the
mean and standard deviation of CORRELATED values, where the correlation
is introduced by the SAME processor.

To obtain a proper set of test values, you may use the following setup
(the Clang and GCC versions and the compilation parameters will be the
same on all of the computers):

             Clang     GCC
             ------    ------
Computer 1   v(1,1)    v(1,2)
Computer 2   v(2,1)    v(2,2)
  .
  .
  .
Computer n   v(n,1)    v(n,2)

If you do NOT have that many computers, you may obtain test results from
other reliable sources that used the same compilation parameters.

Now it is possible to use the t-test on PAIRED values.
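
A minimal sketch of that paired test in R, with invented per-computer
scores v(i,1) and v(i,2):

    ## One score per computer for each compiler (hypothetical values):
    clang <- c(82.1, 79.4, 88.0, 84.2, 80.9)   # v(1,1) ... v(n,1)
    gcc   <- c(90.3, 85.7, 95.1, 92.8, 88.4)   # v(1,2) ... v(n,2)
    t.test(clang, gcc, paired = TRUE)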

To determine the sample size, it is necessary to make power computations
BEFORE execution of the experiment, by specifying the required values a
priori.


If you want to compare (Clang version x) ... (Clang version y), (GCC
version x) ... (GCC version y), etc., as MORE than TWO compilers at the
same time, it is necessary to use MULTIPLE COMPARISONS. Using two-by-two
t-tests in isolation from the rest of the results (the variables, i.e.
the compilers) will give distorted results unless the differences are
significant at the 0.001 level (where the actual significance level will
be greater than 0.001, but very likely less than 0.05).
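
A sketch of such a corrected comparison in R, with invented per-computer
scores for three compilers:

    ## Each computer contributes one score per compiler (hypothetical):
    d <- data.frame(
        computer = rep(1:3, each = 3),
        compiler = rep(c("gcc421", "gcc45", "clang"), times = 3),
        score    = c(82, 96, 75,  80, 93, 73,  85, 99, 78))
    pairwise.t.test(d$score, d$compiler, paired = TRUE,
                    p.adjust.method = "bonferroni")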

Such computations (paired t-test, power, multiple comparisons and others)
are available in the R statistical package, which is in the Ports.

It is my opinion that using different processor models with approximately
equal speeds will not distort the results very much. Personally I prefer
such a setup with different processors. With this setup it is possible to
test the performance of the compilers on a mixture of processors (likely
to be independent of the processor model).


Thank you very much.


Mehmet Erol Sanliturk


Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

2011-03-12 Thread Poul-Henning Kamp
In message 4d7b44af.7040...@freebsd.org, Martin Matuska writes:


Thanks a lot for doing this properly.

What significance level should I take?

I think I set ministat(1) to use a 95% confidence level by default,
and that is in general a pretty safe bet (a 1 in 20 chance).

I hope this approach is better :)

Much, much better.

As I said, this was not to go after you personally, but to point
out that we need to be more rigorous with benchmarks in general.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

2011-03-12 Thread Vinícius Zavam
2011/3/12 Poul-Henning Kamp p...@phk.freebsd.dk:
 In message 4d7b44af.7040...@freebsd.org, Martin Matuska writes:


 Thanks a lot for doing this properly.

What significance level should I take?

 I think I set ministat(1) to use a 95% confidence level by default,
 and that is in general a pretty safe bet (a 1 in 20 chance).

I hope this approach is better :)

 Much, much better.

 As I said, this was not to go after you personally, but to point
 out that we need to be more rigorous with benchmarks in general.

 --
 Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
 p...@freebsd.org         | TCP/IP since RFC 956
 FreeBSD committer       | BSD since 4.3-tahoe
 Never attribute to malice what can adequately be explained by incompetence.

I'm still curious about things like CPUTYPE= and -march= configured as
native, gentlemen.
Is it the golden egg to use with our system or not? Why aren't the
native settings in the benchmarks?

/me feels confused.


-- 
Vinícius Zavam
profiles.google.com/egypcio


Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

2011-03-11 Thread Kostik Belousov
On Thu, Mar 10, 2011 at 10:33:37PM +0100, Martin Matuska wrote:
 Hi everyone,
 
 we have performed a benchmark of the perl binary compiled with base gcc,
 ports gcc and ports clang using the perlbench benchmark suite.
 Our benchmark was performed solely on amd64 with 10 different processors
 and we have tried different -march= flags to compare binary performance
 of the same compiler with different flags.
 
 Here are some statistics from the results:
 - clang falls 10% behind the base gcc 4.2.1 (test average)
 - gcc 4.5 from ports gives 5-10% better average performance than the
 base gcc 4.2.1
 - 4% average penalty for Intel Atom and -march=nocona (using gcc from base)
 - core i7 class processors run best with -march=nocona (using gcc from base)
 
 This benchmark speaks only for perl, but it tests quite a lot of
 generic features, so we are seriously considering using ports gcc for
 heavily used ports (e.g. PHP, MySQL, PostgreSQL) and suggesting that a
 user should be provided with an easily settable choice of using gcc 4.5
 for ports.
 
 A first step in this direction is in this PR (allowing build-only
 dependency on GCC):
 http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/155408
 
 More information, detailed test results and test configuration are at
 our blog:
 http://blog.vx.sk/archives/25-FreeBSD-Compiler-Benchmark-gcc-base-vs-gcc-ports-vs-clang.html

Putting the 'speed' question completely aside, I would like to comment
on other issue(s) there. Switching the ports to use the port-provided
compiler (and binutils) would be a very useful and often talked-about
feature.

Your approach of USE_GCC_BUILD as implemented is probably not going
to work. The problem is that gcc provides two libraries, libgcc and
libstdc++, that are not forward-compatible with the same libraries from
older compilers and our base.

libstdc++ definitely did grown new symbols and new versions of old
symbols, and I suspect that libgcc did the same. Also, we are trusting
the ABI stability premise.

For this scheme to work, we at least need a gcc-runtime port with DSOs
provided by the full port, and some mechanism to force the binaries
compiled with the port gcc to use the gcc-runtime libs instead of the
base ones. It might be the -rpath linker kludge.




Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

2011-03-11 Thread Poul-Henning Kamp
In message 4d7943b1.1030...@freebsd.org, Martin Matuska writes:

More information, detailed test results and test configuration are at
our blog:
http://blog.vx.sk/archives/25-FreeBSD-Compiler-Benchmark-gcc-base-vs-gcc-ports-vs-clang.html

Please don't take this personally Martin, but you have triggered
my periodic rant about proper running, evaluation and reporting of
benchmarks.

These results are not published at a level of detail that allows
anybody to draw any kind of conclusions from them.

In particular, your use of overall best result selection is totally
bogus from a statistical point of view.

At the very least, we need to see standard deviations on your numbers,
and preferably, when you claim that X is N% better than Y, you should
also provide the confidence interval on that judgment, Student's t
being the canonical test.

The ministat(1) program does both of these things, and is now in
FreeBSD/src, so there is absolutely no excuse for not using it.

In practice this means that you have to run each test at least three
times, to get a standard deviation, and you have to make sure that
your test conditions are as identical as possible.

Therefore, proper benchmarking procedure is something like:

(boot machine single-user   // Improves reproducibility)
(mount md(4)/malloc filesystem  // ditto)
(newfs test-partition   // ditto)
for at least 4 iterations:
run test A
run test B
run test C
...
Throw first result away for all tests
Run remaining results through ministat(1)
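
A rough driver for such a loop, sketched in R (which a later message in
this thread recommends for the statistics); the test commands are
hypothetical and each is assumed to print a single numeric score:

    tests <- c(A = "./test_a.sh", B = "./test_b.sh", C = "./test_c.sh")
    runs <- sapply(tests, function(cmd)
        sapply(1:5, function(i) as.numeric(system(cmd, intern = TRUE))))
    runs <- runs[-1, , drop = FALSE]     # throw the first result away
    apply(runs, 2, function(x) c(mean = mean(x), sd = sd(x)))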

This was a public service announcement.

Poul-Henning

PS: Recommended reading: http://www.larrygonick.com/html/pub/books/sci7.html

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

2011-03-11 Thread Alexander Leidinger
Quoting Martin Matuska m...@freebsd.org (from Thu, 10 Mar 2011  
22:33:37 +0100):



Hi everyone,

we have performed a benchmark of the perl binary compiled with base gcc,
ports gcc and ports clang using the perlbench benchmark suite.
Our benchmark was performed solely on amd64 with 10 different processors
and we have tried different -march= flags to compare binary performance
of the same compiler with different flags.

Here are some statistics from the results:
- clang falls 10% behind the base gcc 4.2.1 (test average)
- gcc 4.5 from ports gives 5-10% better average performance than the
base gcc 4.2.1


Can you rule out gcc-specific optimizations as a cause of this
difference for clang? As an example of what I mean: the configure
script of LAME will use additional optimization flags if it detects
gcc (even depending on the version of gcc). For clang (or other
compilers which have flags similar to gcc's but are not identified as
gcc) it will not add those flags. Another possibility is preprocessor
checks for gcc-specific defines (in case clang does not provide the
same predefined defines; I do not know).


Bye,
Alexander.

--
This MUST be a good party -- My RIB CAGE is being painfully pressed up
against someone's MARTINI!!

http://www.Leidinger.netAlexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org   netchild @ FreeBSD.org  : PGP ID = 72077137


Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

2011-03-11 Thread Martin Matuska
I don't take this personally and fully understand your point.

But even if all the conditions you described are met, I am still not able
to say this is better, as I am not doing a microbenchmark. The +x% score
is just an average of all test scores weighted by a factor of 1 - this
does not reflect any real application out there, as these applications
don't use the tested functions in that exact weighting ratio. If one
function had a score of 0%, the program would actually stall forever when
executing this function, but the score of this average would still look
promising :-)

But what I can say is, e.g. for the Intel Atom processor, if there are
performance gains in all but one test (which falls 2% behind), generic
perl code (the routines benchmarked) on this processor is very likely to
run faster with that setup.
On the other hand, if clang-generated code falls short in all tests, I
can say it is very likely that it will run slower. But again, I am
benchmarking just a subset of generic perl functions.

Cheers,
mm


On 11.03.2011 15:01, Poul-Henning Kamp wrote:
 In message 4d7943b1.1030...@freebsd.org, Martin Matuska writes:

 More information, detailed test results and test configuration are at
 our blog:
 http://blog.vx.sk/archives/25-FreeBSD-Compiler-Benchmark-gcc-base-vs-gcc-ports-vs-clang.html
 Please don't take this personally Martin, but you have triggered
 my periodic rant about proper running, evaluation and reporting of
 benchmarks.

 These results are not published at a level of detail that allows
 anybody to draw any kind of conclusions from them.

 In particular, your use of overall best result selection is totally
 bogus from a statistical point of view.

 At the very least, we need to see standard deviations on your numbers,
 and preferably, when you claim that X is N% better than Y, you should
 also provide the confidence interval on that judgment, Student's t
 being the canonical test.

 The ministat(1) program does both of these things, and is now in
 FreeBSD/src, so there is absolutely no excuse for not using it.

 In practice this means that you have to run each test at least three
 times, to get a standard deviation, and you have to make sure that
 your test conditions are as identical as possible.

 Therefore, proper benchmarking procedure is something like:

   (boot machine single-user   // Improves reproducibility)
   (mount md(4)/malloc filesystem  // ditto)
   (newfs test-partition   // ditto)
   for at least 4 iterations:
   run test A
   run test B
   run test C
   ...
   Throw first result away for all tests
   Run remaining results through ministat(1)

 This was a public service announcement.

 Poul-Henning

 PS: Recommended reading: http://www.larrygonick.com/html/pub/books/sci7.html



Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

2011-03-11 Thread Poul-Henning Kamp
In message 4d7a42cc.8020...@freebsd.org, Martin Matuska writes:

But what I can say is, e.g. for the Intel Atom processor, if there are
performance gains in all but one test (which falls 2% behind), generic
perl code (the routines benchmarked) on this processor is very likely to
run faster with that setup.

No, actually you cannot say that, unless you run all the tests at
least three times for each compiler(+flag), calculate the average
and standard deviation of all the tests, and see which, if any of
the results are statistically significant.

Until you do that, your numbers are meaningless, because we have no
idea what the signal/noise ratio is.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

2011-03-11 Thread b. f.
 Putting the 'speed' question completely aside, I would like to comment
 on other issue(s) there. The switching of the ports to use the port-provided
 compiler (and binutils) would be very useful and often talked about feature.

 Your approach of USE_GCC_BUILD as implemented is probably not going
 to work. The problem is that gcc provides two libraries, libgcc and
 libstdc++, that are not forward-compatible with the same libraries from
 older compilers and our base.

 libstdc++ definitely did grown new symbols and new versions of old
 symbols, and I suspect that libgcc did the same. Also, we are trusting
 the ABI stability premise.

 For this scheme to work, we at least need a gcc-runtime port with dsos
 provided by full port, and some mechnanism to force the binaries
 compiled with port gcc to use gcc-runtime libs instead of base.
 Might be, -Rpath linker cludge.

There are a number of incompatible libraries.  The existing USE_GCC
scheme adds -Wl,-rpath=... flags to CFLAGS and LDFLAGS in bsd.gcc.mk,
in an attempt to point the binaries to the newer libraries.  Matuska is
not suggesting changing this -- his proposed new variable,
USE_GCC_BUILD, uses the existing USE_GCC framework, and differs from
the existing usage only in that it does not register any runtime
dependencies on lang/gcc* in the packages that are produced.  His new
variable is intended, as he said, only for ports that don't need any
of the compiler libraries at runtime.

There are only two reasons for doing this: (1) reducing the number of
dependencies that must be installed when installing a package, or (2)
attempting to use lang/gcc4* to build a port that is currently needed
to build lang/gcc4* itself, without causing a problem with circular
dependencies.

For (2), I think that there are only:

devel/gmake
devel/binutils
devel/bison
lang/perl5.1[02]
devel/libelf
converters/libiconv

(the others have runtime dependencies on libgcc_s), and the new
variable could not be added to the port Makefiles, because it would
still cause problems with circular dependencies when building these
ports if lang/gcc4* were not already installed.  It would have to be
added by users in local Makefiles, once they had arranged their builds
so that it could be used.  But since the same effect could be obtained
by editing packages or the package database after the build, or by
using the methods Matuska advocated in:

http://www.freebsd.org/doc/en_US.ISO8859-1/articles/custom-gcc/index.html

the new variable does not seem to be worth including for purpose (2).

For (1), I'm not sure how many ports could use it.  We are already
working on reducing the number of dependencies for those ports that
USE_FORTRAN or USE_GCC, by trying to add runtime-only lang/gcc4*
ports; but, owing to some awkward details involving the Ports
infrastructure and the way tinderboxes operate, the existing
lang/gcc4* ports have to be split into non-intersecting slave ports,
so there has been a delay while we sort out the details.

b.


Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

2011-03-11 Thread b. f.
Martin Matuska wrote:
 we have performed a benchmark of the perl binary compiled with base gcc,
 ports gcc and ports clang using the perlbench benchmark suite.
 Our benchmark was performed solely on amd64 with 10 different processors
 and we have tried different -march= flags to compare binary performance
 of the same compiler with different flags.

 Here are some statistics from the results:
 - clang falls 10% behind the base gcc 4.2.1 (test average)
 - gcc 4.5 from ports gives 5-10% better average performance than the
 base gcc 4.2.1
 - 4% average penalty for Intel Atom and -march=nocona (using gcc from base)
 - core i7 class processors run best with -march=nocona (using gcc from base)

...

 More information, detailed test results and test configuration are at
 our blog:
 http://blog.vx.sk/archives/25-FreeBSD-Compiler-Benchmark-gcc-base-vs-gcc-ports-vs-clang.html

Methodological objections aside, thank you for conducting these tests
and publishing the results. Are you going to continue to run them as
lang/gcc4* (the default for USE_GCC/USE_FORTRAN may be switched from
4.5 to 4.6 after the upcoming release of 4.6) and clang (there seem to
be improvements in the more recent versions -- e.g.,
http://llvm.org/viewvc/llvm-project?view=rev&revision=127208 ) are
updated?

b.