Link-time optimization (LTO) can be turned on by adding the
-flto flag to the proton library build, in both the compilation
and linking steps. It offers the possibility of optimizations
using deeper knowledge of the whole program than is available
to the compiler when it works on one source file at a time.
I have also been trying to get some extra performance by
hand-inlining functions that I select based on valgrind/callgrind
profiles.
My test procedure has been to run 50 trials, where each trial
is a run of two programs: my psend and precv proton-C clients
written at the Engine level. Each trial involves sending and
receiving 5 million small messages.
The result from each trial is a single high-resolution timing
number. (From just before the sender sends the first message,
to just after the receiver receives the last message.)
The result of each test is a list of 50 of those numbers.
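
A minimal Python sketch of how one test's list of timings might be
reduced to the "mean" and "sigma" figures reported in the results.
This is not the script used for these tests. The function name and
the sample values are placeholders, and it assumes sigma means the
sample standard deviation (n - 1 denominator), which the post does
not state.

```python
import statistics


def summarize(trial_seconds):
    """Return (mean, sigma) for one test's list of trial timings."""
    mean = statistics.mean(trial_seconds)
    # Assumption: sigma is the sample standard deviation (n - 1 denominator).
    sigma = statistics.stdev(trial_seconds)
    return mean, sigma


# Placeholder timings in seconds, NOT measured data.
# A real test here would have 50 entries, one per trial.
placeholder_trials = [41.0, 42.0, 40.0, 41.0]

mean, sigma = summarize(placeholder_trials)
print(f"mean {mean:.6f} sigma {sigma:.6f}")
```

If sigma was actually computed as a population standard deviation,
statistics.pstdev would be the call to use instead.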
I compare tests using an online Student's T-test calculator.
("Student" was the pen-name of the guy who invented it.
His real name was Gosset, and he was working at the Guinness
Brewery in Dublin when he invented it. I am not making this up.)
The t-test gives a number (the p-value) that indicates the likelihood
that the difference between two tests could have happened randomly.
A small t-test result indicates that the difference between
two tests is unlikely to have happened randomly. For example,
a t-test result of 0.01 means that a difference as large as the
one between your two tests should only happen 1 time out of 100
due to random chance. Smaller results are better.
With 50 sample-points in each test, you can get nice high
certainty as to whether you are seeing real or random results.
All of the results below are hyper-significant. The *worst*
t-test result was 2.9e-8, i.e. 3 chances out of 100 million
that the difference between the two tests could happen randomly.
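
For readers who would rather not use an online calculator, here is
a hedged Python sketch of the first half of the calculation: the
pooled-variance (Student's) t statistic computed from the summary
numbers alone. The function name is made up for illustration, and
the inputs are the mean and sigma figures reported in the results
for the unchanged build and the LTO build, with 50 trials each. It
also assumes sigma is a sample standard deviation.

```python
import math


def student_t(mean_a, sigma_a, n_a, mean_b, sigma_b, n_b):
    """Pooled-variance two-sample t statistic and degrees of freedom."""
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its own n - 1.
    pooled_var = ((n_a - 1) * sigma_a ** 2 + (n_b - 1) * sigma_b ** 2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / std_err, df


# Unchanged build vs. LTO build, 50 trials each.
t_stat, df = student_t(41.267825, 0.834826, 50, 40.073661, 1.108513, 50)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

This stops at the t statistic. Turning it into the probability
described above needs the t-distribution with df degrees of
freedom, which is the part the calculator (or a statistics
library) supplies. A t of about 2 or less on samples this size
would be unremarkable; larger magnitudes mean smaller probabilities.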
So ... here are the results, in seconds.
(Builds used throughout are normal release-with-debug-info,
with -O2 optimization.)
1. Proton code as of 0800 EDT yesterday, with no changes.
   mean 41.267825   sigma 0.834826
2. LTO build
   mean 40.073661   sigma 1.108513   improvement: 2.9%
3. manual inlining changes
   mean 39.011794   sigma 1.056831   improvement: 5.5%
4. LTO build plus my changes
   mean 39.211283   sigma 1.041303   improvement: 5.0%
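
The improvement percentages can be re-derived from the means with
a short Python sketch. It assumes each improvement is the reduction
in mean run time relative to the unchanged build (test 1), which
matches the figures listed. The function name is only for
illustration.

```python
def improvement_percent(baseline_mean, test_mean):
    """Reduction in mean run time, as a percentage of the baseline."""
    return 100.0 * (baseline_mean - test_mean) / baseline_mean


baseline = 41.267825  # test 1: unchanged Proton code
means = {
    "LTO build": 40.073661,
    "manual inlining changes": 39.011794,
    "LTO build plus my changes": 39.211283,
}

for name, mean in means.items():
    print(f"{name}: {improvement_percent(baseline, mean):.1f}%")
```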
So! The LTO technology really works, but it's not as
good as manual inlining based on profiling. In fact,
adding LTO on top of the manual inlining slows it down a
little (39.21 vs. 39.01 seconds), probably because LTO is
choosing some inlining candidates that don't help enough to
offset the cache thrash caused by the increase in code size.
So there you go.