Re: Performance impact of disabling non-clone IPA optimizations for the Linux kernel (was: "GCC options for kernel live-patching")

2018-10-24 Thread Miroslav Benes
On Wed, 24 Oct 2018, Jiri Kosina wrote:

> On Tue, 23 Oct 2018, Nicolai Stange wrote:
> 
> > let me summarize some results from performance comparisons of Linux
> > kernels compiled with and without certain IPA optimizations.
> 
> Thanks a lot for the summary.
> 
> So, would it make sense to submit a patch upstream (with exactly this 
> justification / explanation) that'd basically disable those optimizations 
> for CONFIG_LIVEPATCH=y configs?

It is premature in my opinion. I'd solve it on the GCC side first, so that we 
have a new option which would cover everything (see the other emails in 
the thread). Then we can use it in the kernel for CONFIG_LIVEPATCH=y 
configs.

We could disable what is possible to disable even now but it would not 
solve everything and it is questionable if it is worth it then.

Miroslav


Re: Performance impact of disabling non-clone IPA optimizations for the Linux kernel (was: "GCC options for kernel live-patching")

2018-10-24 Thread Jiri Kosina
On Tue, 23 Oct 2018, Nicolai Stange wrote:

> let me summarize some results from performance comparisons of Linux
> kernels compiled with and without certain IPA optimizations.

Thanks a lot for the summary.

So, would it make sense to submit a patch upstream (with exactly this 
justification / explanation) that'd basically disable those optimizations 
for CONFIG_LIVEPATCH=y configs?

Thanks,

-- 
Jiri Kosina
SUSE Labs



Performance impact of disabling non-clone IPA optimizations for the Linux kernel (was: "GCC options for kernel live-patching")

2018-10-23 Thread Nicolai Stange
Hi,

let me summarize some results from performance comparisons of Linux
kernels compiled with and without certain IPA optimizations.

It's a slight abuse of this thread, but I think having the numbers might
give some useful insight into the potential costs associated with the
-flive-patching option discussed here.

All kudos go to Giovanni Gherdovich from the SUSE Performance Team who
did all of the work presented below.

For a TL;DR, see the conclusion at the end of this email.

Martin Jambor writes:

> (this message is a part of the thread originating with
> https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01018.html)
>
> We have just had a quick discussion with two upstream maintainers of
> Linux kernel live-patching about this and the key points were:
>
> 1. SUSE live-patch creators (and I assume all that use the upstream
>    live-patching method) use Martin Liska's (somewhat under-documented)
>    -fdump-ipa-clones option and a utility he wrote
>    (https://github.com/marxin/kgraft-analysis-tool) to deal with all
>    kinds of inlining, IPA-CP and generally all IPA optimizations that
>    internally create a clone.  The tool tells them what happened and
>    also lists all callers that need to be live-patched.
>
> 2. However, there is growing concern about other IPA analyses that do
>    not create a clone but still affect code generation in other
>    functions.  Kernel developers have identified and disabled IPA-RA but
>    there is more of them such as IPA-modref analysis, stack alignment
>    propagation and possibly quite a few others which extract information
>    from one function and use it a caller or perhaps even some
>    almost-unrelated functions (such as detection of read-only and
>    write-only static global variables).
>
>    The kernel live-patching community would welcome if GCC had an option
>    that could disable all such optimizations/analyses for which it
>    cannot provide a list of all affected functions (i.e. which ones need
>    to be live-patched if a particular function is).

AFAIU, the currently known IPA optimizations of this category are
(cf. [1] and [2] from this thread):

 - -fipa-pure-const
 - -fipa-pta
 - -fipa-reference
 - -fipa-ra
 - -fipa-icf
 - -fipa-bit-cp
 - -fipa-vrp
 - and some others which might be problematic but currently can't be
   disabled on the command line:
   - stack alignment requirements
   - duplication of or skipping of alias analysis for
     functions/variables whose address is not taken (I don't know what
     that means, TBH).

Some time ago, Giovanni compared the performance of a kernel compiled with

 -fno-ipa-pure-const
 -fno-ipa-pta
 -fno-ipa-reference
 -fno-ipa-ra
 -fno-ipa-icf
 -fno-ipa-bit-cp
 -fno-ipa-vrp

plus (because I wasn't able to tell whether these are problematic in the
context of live patching)

 -fno-ipa-cp
 -fno-ipa-cp-clone
 -fno-ipa-profile
 -fno-ipa-sra

against a kernel compiled without any of these.
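
For illustration only, here is a minimal Python sketch of how the two
flag sets above could be assembled into a single compiler-flag string.
It assumes the flags are handed to the kernel build through kbuild's
KCFLAGS variable; the mail does not say how the test kernels were
actually configured, and the helper names are made up for this example.

```python
import shlex

# IPA optimizations named in the mail as known non-clone candidates.
NON_CLONE_IPA = ["pure-const", "pta", "reference", "ra", "icf", "bit-cp", "vrp"]

# Additionally disabled because their impact on live patching was unclear.
UNSURE_IPA = ["cp", "cp-clone", "profile", "sra"]


def no_ipa_flags(names):
    """Turn IPA pass names into the corresponding -fno-ipa-* options."""
    return [f"-fno-ipa-{name}" for name in names]


def kcflags(names):
    """Join the options into one space-separated string."""
    return " ".join(no_ipa_flags(names))


if __name__ == "__main__":
    flags = kcflags(NON_CLONE_IPA + UNSURE_IPA)
    # Assumption: extra compiler options are passed via KCFLAGS.
    print("make KCFLAGS=" + shlex.quote(flags))
```

Keeping the two lists separate makes it easy to rebuild with only the
first set once it is clear whether the second set matters for live
patching.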

The kernel was a 4.12.14 one with additional patches on top.

The benchmarks were run on both a smaller and a bigger machine. Specs:
- single socket with a Xeon E3-1240 v5 (Skylake), 4 cores / 8 threads,
  32G of memory (UMA)
- 2 sockets, each with a Xeon E5-2698 v4 (Broadwell), for a total of
  40 cores / 80 threads and 528G of memory (NUMA)

You can find the results here:

  https://beta.suse.com/private/nstange/ggherdovich-no-ipa-results/dashboard.html

"laurel2" is the smaller machine, "hardy4" the bigger one.

The numbers presented in the dashboard are a relative measure of how the
no-ipa kernel was performing in comparison to the stock one. "1" means
no change, and, roughly speaking, each deviation by 0.01 from that value
corresponds to an overall performance change of 1%. Depending on the
benchmark, higher means better (e.g. for throughput) or vice versa
(e.g. for latencies). Some of the numbers are highlighted in green or
red. Green means that the no-ipa kernel performs better, red the
contrary.
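
As a rough reading aid, the rule of thumb above can be written down as
a small Python sketch. The function name and the higher_is_better
parameter are made up for this example; the dashboard itself may
compute its numbers differently.

```python
def percent_change(ratio, higher_is_better=True):
    """Approximate performance change of the no-ipa kernel in percent.

    ratio is the dashboard number (no-ipa relative to stock, 1 = no change).
    A positive result means the no-ipa kernel performs better, a negative
    one means it performs worse.
    """
    delta = (ratio - 1.0) * 100.0
    # For metrics like latency, a larger ratio is a regression.
    return delta if higher_is_better else -delta


if __name__ == "__main__":
    # Throughput-style metric at 0.98: roughly 2% worse.
    print(round(percent_change(0.98), 2))
    # Latency-style metric at 1.04: roughly 4% worse.
    print(round(percent_change(1.04, higher_is_better=False), 2))
```

The sign convention mirrors the colouring: positive values correspond
to green, negative ones to red.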

The sockperf-{tcp,udp}-under-load results are spoiled due to outliers,
probably because of slow vs. fast paths. Please ignore.

(If you're interested in the detailed results, you can click on any of
 those accumulated numbers in the dashboard. Scroll down and you'll find
 some nice plots.)

For the overall outcome, let me quote Giovanni who summarized it nicely:

  What's left in red:

  * fsmark-threaded on laurel2 (skylake 8 cores), down 2%: if you look at the
    histograms of files created per second, there is never a clear winner
    between with and without IPA (except for the single-threaded case). Clean
    on hardy4.

  * sockperf-udp-throughput, hardy4: yep this one is statistically
    significant (in the plot you clearly see that the green dots are all
    below the yellow dots). 4% worse on average. Clean on the other machine.

  * tbench: this one is significant too (look at the histogram, no
    overlapping between the two distributions) but it's a curious one,
    because on the