Re: [boinc_dev] Astonishing effect of Link Time Optimisation on BOINC client

2016-05-09 Thread Richard Haselgrove
Doesn't the formal definition of the benchmarks (both Dhrystone and Whetstone) 
attempt to exclude compiler optimisations?
OK, so we've already broken that principle by allowing optimisation in the 
Android benchmark code, but I think that it would be wise to evaluate the 
effect of LTO on a sample of Linux project science applications before 
distorting the x86 operating system balance in this way.
You raise the dread word "credit". It is possible to distort the supposed 
definition of the cobblestone in various ways, by optimising or dis-optimising 
either the benchmark code, the scientific application code, or both. Rather 
than making unilateral changes, wouldn't it be better to have a proper review 
of the now rather elderly CreditNew framework? 

On Monday, 9 May 2016, 11:07, Steffen Möller  wrote:
 

 Dear all,

some of you may be aware of the BOINC client shipping with Debian,
Ubuntu, Mint and the other .deb-based Linux distributions [1]. While
updating to 7.6.32 we have revised our invocation of the Link Time
Optimisation (LTO) [2]. This had the effect of a notable reduction
of the disk-space occupied by the (stripped) binaries:

previous  revised  change  file (stripped, sizes in bytes)
  7.6.31   7.6.32
 1292528   932032    -28%  /usr/bin/boinc
   34904    30864    -12%  /usr/bin/boinccmd
 3657424  2292936    -37%  /usr/bin/boincmgr
   26464    26464      0%  /usr/lib/x86_64-linux-gnu/libboinc_crypt.so.7.6.32
  665960   665960      0%  /usr/lib/x86_64-linux-gnu/libboinc.so.7.6.32
  467392   467392      0%  /usr/lib/x86_64-linux-gnu/libboinc_zip.so.7.6.32

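The change column can be recomputed from the two size columns as
(revised - previous) / previous. A minimal Python sketch that uses only the
numbers in the table above; the helper names percent_change and report are
hypothetical and not part of BOINC or Debian tooling:

```python
# Sizes in bytes of the stripped binaries, copied from the table above
# (first value: 7.6.31, previous build; second value: 7.6.32, revised LTO invocation).
SIZES = {
    "/usr/bin/boinc": (1292528, 932032),
    "/usr/bin/boinccmd": (34904, 30864),
    "/usr/bin/boincmgr": (3657424, 2292936),
    "/usr/lib/x86_64-linux-gnu/libboinc_crypt.so.7.6.32": (26464, 26464),
    "/usr/lib/x86_64-linux-gnu/libboinc.so.7.6.32": (665960, 665960),
    "/usr/lib/x86_64-linux-gnu/libboinc_zip.so.7.6.32": (467392, 467392),
}


def percent_change(previous, revised):
    """Relative size change in percent; negative means the binary shrank."""
    return 100.0 * (revised - previous) / previous


def report(sizes):
    """Return (path, previous, revised, change rounded to whole percent) rows."""
    return [(path, prev, rev, round(percent_change(prev, rev)))
            for path, (prev, rev) in sizes.items()]


if __name__ == "__main__":
    for path, prev, rev, pct in report(SIZES):
        print(f"{prev:>8} {rev:>8} {pct:>4}%  {path}")
```

For /usr/bin/boinccmd this gives about -12% (4040 bytes out of 34904). On an
installed system the hard-coded pairs could be replaced by os.path.getsize()
on each path, once per build.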
We cannot assess the effect on BOINC as a whole. But we reckon that the
effort to quickly bring BOINC functionality into action and then hide it
again is likely to shrink in proportion to the code base. LTO is also known
to help the computation, which is not surprising since less code needs to be
processed for the same functionality. Completely ignorant of the complexity
of the BOINC benchmark, we simply ran *boinccmd --run_benchmarks* and got
the following results:

v7.6.31 (prior to revision)

09-May-2016 03:39:53 [---]    3973 floating point MIPS (Whetstone) per CPU
09-May-2016 03:39:53 [---]    17322 integer MIPS (Dhrystone) per CPU
09-May-2016 03:42:22 [---]    4087 floating point MIPS (Whetstone) per CPU
09-May-2016 03:42:22 [---]    17774 integer MIPS (Dhrystone) per CPU
09-May-2016 03:43:17 [---]    4069 floating point MIPS (Whetstone) per CPU
09-May-2016 03:43:17 [---]    17988 integer MIPS (Dhrystone) per CPU

v7.6.32 (after revision)

09-May-2016 03:45:18 [---]    3961 floating point MIPS (Whetstone) per CPU
09-May-2016 03:45:18 [---]    78155 integer MIPS (Dhrystone) per CPU
09-May-2016 03:46:35 [---]    4057 floating point MIPS (Whetstone) per CPU
09-May-2016 03:46:35 [---]    76289 integer MIPS (Dhrystone) per CPU
09-May-2016 03:48:42 [---]    3962 floating point MIPS (Whetstone) per CPU
09-May-2016 03:48:42 [---]    76374 integer MIPS (Dhrystone) per CPU

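To put a number on the "quadrupling", the three runs of each build can be
averaged and compared. A minimal Python sketch, assuming the log lines are
pasted in exactly as printed above; the regular expression is tuned to this
output format only, and the helpers parse and speedup are hypothetical:

```python
import re
from statistics import mean

# Output of `boinccmd --run_benchmarks`, copied from the three runs per build above.
BEFORE = """\
09-May-2016 03:39:53 [---]    3973 floating point MIPS (Whetstone) per CPU
09-May-2016 03:39:53 [---]    17322 integer MIPS (Dhrystone) per CPU
09-May-2016 03:42:22 [---]    4087 floating point MIPS (Whetstone) per CPU
09-May-2016 03:42:22 [---]    17774 integer MIPS (Dhrystone) per CPU
09-May-2016 03:43:17 [---]    4069 floating point MIPS (Whetstone) per CPU
09-May-2016 03:43:17 [---]    17988 integer MIPS (Dhrystone) per CPU
"""

AFTER = """\
09-May-2016 03:45:18 [---]    3961 floating point MIPS (Whetstone) per CPU
09-May-2016 03:45:18 [---]    78155 integer MIPS (Dhrystone) per CPU
09-May-2016 03:46:35 [---]    4057 floating point MIPS (Whetstone) per CPU
09-May-2016 03:46:35 [---]    76289 integer MIPS (Dhrystone) per CPU
09-May-2016 03:48:42 [---]    3962 floating point MIPS (Whetstone) per CPU
09-May-2016 03:48:42 [---]    76374 integer MIPS (Dhrystone) per CPU
"""

# Matches e.g. "17322 integer MIPS (Dhrystone)"; captures the value and the benchmark name.
PATTERN = re.compile(r"(\d+) (?:floating point|integer) MIPS \((Whetstone|Dhrystone)\)")


def parse(log):
    """Group the per-CPU MIPS values of a log excerpt by benchmark name."""
    results = {"Whetstone": [], "Dhrystone": []}
    for value, name in PATTERN.findall(log):
        results[name].append(int(value))
    return results


def speedup(before, after, name):
    """Ratio of the mean score after the rebuild to the mean score before it."""
    return mean(after[name]) / mean(before[name])


if __name__ == "__main__":
    before, after = parse(BEFORE), parse(AFTER)
    for name in ("Whetstone", "Dhrystone"):
        print(f"{name}: {mean(before[name]):.0f} -> {mean(after[name]):.0f} MIPS, "
              f"x{speedup(before, after, name):.2f}")
```

On these numbers the Dhrystone mean rises from about 17695 to 76939 MIPS
(roughly 4.35-fold, i.e. about +335%), while the Whetstone mean stays
essentially flat (about -1%).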
We tried two different computers and saw the same quadrupling of the
integer performance only, with floating-point performance unchanged. Apart
from the revised LTO invocation, the build is the same as v7.6.32 without
the LTO corrections, which vividly demonstrates how sensitive the results
are to the LTO compiler flags. We had first added the -flto flags one or
two years ago.

We expected to observe the 20% speedup that is frequently reported for
LTO. Without further investigation, we consider the 300% to be somewhat
explainable in hindsight, in the sense that the application could be
optimised so that it is also faster with regular compilation. But why
bother, now that LTO has become so usable. And helpful. And controllable,
merely by looking at what it does to the disk footprint.

This email has two messages:
 * If you already use the Debian client, please update to 7.6.32, which is
  now in Debian unstable and will reach you soon. It reduces the memory
  footprint, which helps the scientific app a bit, and because of its
  astonishing effect on the benchmark it also helps your RAC (at least
  until the BOINC developers adopt LTO for their publicly offered builds
  as well).
 * We hope it is an eye opener. It is the scientific applications that
  should all adopt LTO. It works, also for static binaries. But expect a
  20% speedup, not as much as with the client's benchmarks.

Keep crunching

Gianfranco and Steffen


[1] https://wiki.debian.org/BOINC
[2] https://gcc.gnu.org/onlinedocs/gccint/LTO.html#LTO

___
boinc_dev mailing list
boinc_dev@ssl.berkeley.edu
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
