Re: Compile with -fno-omit-frame-pointer on x86_64?

2010-11-04 Thread Dave Jones
On Wed, Nov 03, 2010 at 04:51:01PM -0400, Owen Taylor wrote:

 [ But yes, 4% is a big hit. 1% I would accept without hesitation.
   4% does make me hesitate a little bit. During devel cycles, we
   accept much more slowdown than that for the debug kernel,
   of course. If we can figure out profiling without frame
   pointers, that would be even better ]

I've had a bunch of people talking to me about kernel debugging
causing grief for people wanting to do performance work.
Our options as I see them are:
- Don't do debug by default, and ship kernel-debug in rawhide like we
  do in releases.  Downside: We lose coverage testing because not everyone
  will run it.
- We do the inverse of what we do in releases, and add a kernel-nodebug package.
  I looked into this, and it really uglied up the spec file.
- We do debug-off builds on Mondays, and the rest of the week's builds are
  debug-on like they are now.  This way those doing perf work can just stay
  on the kernel from the beginning of the week.

If we go ahead and do something about that problem, what about just using
-fno-omit-frame-pointer during rawhide builds, and then switching it off
at branch time?

As for the DWARF unwinder in the kernel.. I wouldn't rule out it ever
making a reappearance, but it really needs a lot more testing before it
gets merged. The reason it got ripped out was that it made backtraces
unreliable, which was the whole reason for even having it, so..
Rather than improve it, and then re-merge it later, the authors seem
to have got discouraged to the point where it just got dropped on the floor.
(that said, it may still be alive in SLES for all I know).

Additionally, back then, x86 maintenance in the kernel was a bit.. random.
It's a lot more focussed these days, so I'm pretty sure Ingo & co could
be persuaded to get something merged as long as it was actually stable
enough to be a viable replacement for the existing kernel backtrace
infrastructure.

Dave

-- 
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel


Re: Compile with -fno-omit-frame-pointer on x86_64?

2010-11-03 Thread Jakub Jelinek
On Wed, Nov 03, 2010 at 02:48:12PM -0400, Owen Taylor wrote:
 Lack of decent profiling is a major problem for making our operating
 system fast. By far the most effective kind of profiling is a sampling profile
 with callgraph information.
 
 Soeren's comment from March:
 
  http://lwn.net/Articles/380582/
 
 Basically summarizes the situation, and as far as I know nothing has
 changed ... with default compilation options, getting callgraph
 profiling on x86_64 really requires a DWARF unwinder in the kernel.
 Which seems unlikely to happen.

But that's the right thing to do.

 As a developer, your options for profiling are:
 
  - Recompile everything you care about profiling
    with -fno-omit-frame-pointer instead of using system packages.

Instead of this, which really is a big performance penalty.  Even i?86 is
changing in GCC 4.6 to not do -fno-omit-frame-pointer by default.
The unwind info recent GCCs provide is correct even in epilogues and can be
relied upon.  There are several lightweight unwinders that can be easily
adapted for kernel purposes.  Just talk to the systemtap folks.
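
For contrast with unwinding from unwind info, here is a minimal sketch of
the walk that -fno-omit-frame-pointer makes possible. It is a simulation
only, not kernel code: the memory dict, the addresses and the function name
walk_frame_pointers are all made up for illustration. It assumes the usual
x86_64 frame layout, where [rbp] holds the caller's saved rbp and [rbp + 8]
holds the return address.

```python
# Simulated stack memory: address -> 8-byte value. Every address is invented.
# Assumed layout per frame: [rbp] = caller's rbp, [rbp + 8] = return address.

def walk_frame_pointers(memory, rbp, max_depth=64):
    """Collect return addresses by following the chain of saved rbp values."""
    return_addresses = []
    for _ in range(max_depth):
        # Stop on a null or unreadable frame pointer.
        if rbp == 0 or rbp not in memory or rbp + 8 not in memory:
            break
        return_addresses.append(memory[rbp + 8])
        next_rbp = memory[rbp]
        # The stack grows down, so a caller's frame must sit at a higher
        # address; anything else means a corrupt or looping chain.
        if next_rbp != 0 and next_rbp <= rbp:
            break
        rbp = next_rbp
    return return_addresses


memory = {
    0x7000: 0x7020, 0x7008: 0x401234,  # innermost frame
    0x7020: 0x7040, 0x7028: 0x401100,  # its caller
    0x7040: 0x0,    0x7048: 0x400F00,  # outermost frame, chain ends at 0
}
trace = walk_frame_pointers(memory, 0x7000)
print([hex(addr) for addr in trace])
```

The walk needs only two memory reads per frame and no debuginfo, which is
why it is cheap enough to run from a sampling interrupt. An unwinder based
on unwind info has to look up and interpret per-address rules instead.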

There is always callgrind if you don't want to recompile anything and
need to profile something even when kernel doesn't support it.

Jakub


Re: Compile with -fno-omit-frame-pointer on x86_64?

2010-11-03 Thread Owen Taylor
On Wed, 2010-11-03 at 19:58 +0100, Jakub Jelinek wrote:
 On Wed, Nov 03, 2010 at 02:48:12PM -0400, Owen Taylor wrote:
  Lack of decent profiling is a major problem for making our operating
  system fast. By far the most effective kind of profiling is a sampling profile
  with callgraph information.
  
  Soeren's comment from March:
  
   http://lwn.net/Articles/380582/
  
  Basically summarizes the situation, and as far as I know nothing has
  changed ... with default compilation options, getting callgraph
  profiling on x86_64 really requires a DWARF unwinder in the kernel.
  Which seems unlikely to happen.
 
 But that's the right thing to do.
 
  As a developer, your options for profiling are:
  
   - Recompile everything you care about profiling
     with -fno-omit-frame-pointer instead of using system packages.
 
 Instead of this, which really is a big performance penalty. 

Do you have a sense of the quantification of big here? I know in
compiler terms, 1% is big, but we're nowhere close to wringing the last
1% out of overall Fedora performance. If you create a sufficiently
complex system, there's lots of stupid stuff going on. And you can't
find the stupid stuff without appropriate tools.

 Even i?86 is
 changing in GCC 4.6 to not do -fno-omit-frame-pointer by default.
 The unwind info recent GCCs provide is correct even in epilogues and can be
 relied upon.  There are several lightweight unwinders that can be easily
 adapted for kernel purposes.  Just talk to the systemtap folks.

It seems like if it was that easy, it would have happened and we'd have
a solution in the upstream kernel...

(One thing that definitely makes things tricky is paging in debuginfo. I
think I saw a discussion somewhere that systemtap preemptively was
paging in all debuginfo for traced modules. That's tricky in systemwide
profiling situations, but maybe you could have something where you do
one run, load the debuginfo for everything that was hit in the first
run, then do a second run.)

 There is always callgrind if you don't want to recompile anything and
 need to profile something even when kernel doesn't support it.

callgrind is reasonable if you have a single program that is slow and where
the slowness is pretty much straight-up CPU.

But we're seldom trying to profile a program - we are trying to
profile system situations that involve several programs and the kernel.

And programs are frequently not straight-up bound on things that
valgrind can easily model. For example, if our program is reading from
uncached graphics memory somewhere, that won't show up at all in
callgrind - to callgrind, it's just memory reads. But it may dominate a
more accurate sampled profile.

Plus the performance hit of callgrind makes it not very useful for
real-time interactive user interfaces.

- Owen




Re: Compile with -fno-omit-frame-pointer on x86_64?

2010-11-03 Thread Jakub Jelinek
On Wed, Nov 03, 2010 at 03:20:59PM -0400, Owen Taylor wrote:
 On Wed, 2010-11-03 at 19:58 +0100, Jakub Jelinek wrote:
  On Wed, Nov 03, 2010 at 02:48:12PM -0400, Owen Taylor wrote:

  Instead of this, which really is a big performance penalty. 
 
 Do you have a sense of the quantification of big here? I know in
 compiler terms, 1% is big, but we're nowhere close to wringing the last
 1% out of overall Fedora performance. If you create a sufficiently
 complex system, there's lots of stupid stuff going on. And you can't
 find the stupid stuff without appropriate tools.

The last numbers I was pointed at for x86_64 showed a 4% slowdown, which
really is a lot; it takes several years to achieve that much improvement on
the compiler side.

 It seems like if it was that easy, it would have happened and we'd have
 a solution in the upstream kernel...

I think we had one in the upstream kernel for some time, then Linus just
didn't like seeing it need so many bugfixes and nuked it.

 (One thing that definitely makes things tricky is paging in debuginfo. I
 think I saw a discussion somewhere that systemtap preemptively was
 paging in all debuginfo for traced modules. That's tricky in systemwide

Yeah, systemtap does that (and has that in-kernel unwinder for userspace).

Jakub


Re: Compile with -fno-omit-frame-pointer on x86_64?

2010-11-03 Thread John Reiser
On 11/03/2010 11:48 AM, Owen Taylor wrote:
 Lack of decent profiling is a major problem for making our operating
 system fast. By far the most effective kind of profiling is a sampling profile
 with callgraph information.

I am the author of tsprof,  http://bitwagon.com/tsprof/tsprof.html .
Eight years ago that app provided everything you desire, and with
no compilation flags necessary: not -pg, not -p.  [The implementation
is equivalent to infecting the memory image of the application with
a profiling virus, done at process entry in just a couple of
seconds.]  But nobody would pay for it on i686, so the product
was abandoned despite a working prototype for x86_64.

A few years before that, there was TracePoint Technology, a startup
funded by venture capital that offered nifty profiling tools:
http://venturebeatprofiles.com/company/profile/tracepoint-technology
Soon they were acquired by Digital Equipment Corp and died with DEC.

Over several years, dueling proposals (perfctr, perfmon, perfmon2)
failed to get into the Linux kernel.  Then the CPU and motherboard
designers made the underlying hardware counter (RDTSC) unreliable
in too many cases (non-constant frequency, not synchronized for SMP,
arbitrarily scribbled by SystemManagementMode, ...).

Today the infrastructure work for kernel ftrace comes close to what
is required for use by apps, but gcc still won't do exactly the
right thing.

In short, those who want profiling have failed repeatedly to present
an _effective_ case.

What are you going to do differently this time?
[The workaround is to spend a week learning how to run oprofile
and interpret its output.]



Re: Compile with -fno-omit-frame-pointer on x86_64?

2010-11-03 Thread Adam Jackson
On Wed, 2010-11-03 at 19:58 +0100, Jakub Jelinek wrote:
 On Wed, Nov 03, 2010 at 02:48:12PM -0400, Owen Taylor wrote:
  Basically summarizes the situation, and as far as I know nothing has
  changed ... with default compilation options, getting callgraph
  profiling on x86_64 really requires a DWARF unwinder in the kernel.
  Which seems unlikely to happen.
 
 But that's the right thing to do.

Sure, but so is a kernel debugger, and it's taken us over ten years to
get one.  I'm pretty okay with doing something wrong now if it gets me
something usable for long enough to get something right later.  I'll
take 4% across the board if it helps me find the 20% that matters.

 There is always callgrind if you don't want to recompile anything and
 need to profile something even when kernel doesn't support it.

I don't want to know how callgrinded X performs, I want to know how X
performs.  callgrind means operations that would be one millisecond
become half a second, and that's thirty frames instead of a sixteenth of
a frame.  That means I end up optimizing for function call cycle counts
instead of fixing my algorithms to not starve the hardware.

If wall time matters, callgrind is the wrong tool, and you need a live
profiler.
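
As a rough check of the numbers above, here is a minimal sketch of the frame
arithmetic. The 60 Hz refresh rate is an assumption (it is not stated
above), and the helper name frames is made up for illustration.

```python
REFRESH_HZ = 60  # assumed display refresh rate


def frames(duration_ms):
    """Express a duration in milliseconds as a number of display frames."""
    return duration_ms * REFRESH_HZ / 1000


native_ms = 1.0       # the operation run natively
callgrind_ms = 500.0  # the same operation under callgrind ("half a second")

print(frames(native_ms))         # a small fraction of a frame, roughly 1/16
print(frames(callgrind_ms))      # about thirty frames
print(callgrind_ms / native_ms)  # the implied slowdown factor
```

At that rate an operation that fits comfortably inside one frame natively
spans about half a second of dropped frames under callgrind, so the
interactive behaviour being measured is no longer the behaviour users see.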

- ajax



Re: Compile with -fno-omit-frame-pointer on x86_64?

2010-11-03 Thread Jakub Jelinek
On Wed, Nov 03, 2010 at 04:10:30PM -0400, Adam Jackson wrote:
 On Wed, 2010-11-03 at 19:58 +0100, Jakub Jelinek wrote:
  On Wed, Nov 03, 2010 at 02:48:12PM -0400, Owen Taylor wrote:
   Basically summarizes the situation, and as far as I know nothing has
   changed ... with default compilation options, getting callgraph
   profiling on x86_64 really requires a DWARF unwinder in the kernel.
   Which seems unlikely to happen.
  
  But that's the right thing to do.
 
 Sure, but so is a kernel debugger, and it's taken us over ten years to
 get one.  I'm pretty okay with doing something wrong now if it gets me
 something usable for long enough to get something right later.  I'll
 take 4% across the board if it helps me find the 20% that matters.

Most of the time you don't find the 20% improvements with profilers though,
so all we end up with is just slowing everything by 4%.  Definitely a bad
idea, now that per core performance doesn't increase very much and most
programs aren't parallelized at all or just very badly.

Jakub


Re: Compile with -fno-omit-frame-pointer on x86_64?

2010-11-03 Thread Owen Taylor
On Wed, 2010-11-03 at 21:11 +0100, Jakub Jelinek wrote:
 On Wed, Nov 03, 2010 at 04:10:30PM -0400, Adam Jackson wrote:
  On Wed, 2010-11-03 at 19:58 +0100, Jakub Jelinek wrote:
   On Wed, Nov 03, 2010 at 02:48:12PM -0400, Owen Taylor wrote:
    Basically summarizes the situation, and as far as I know nothing has
    changed ... with default compilation options, getting callgraph
    profiling on x86_64 really requires a DWARF unwinder in the kernel.
    Which seems unlikely to happen.
   
   But that's the right thing to do.
  
  Sure, but so is a kernel debugger, and it's taken us over ten years to
  get one.  I'm pretty okay with doing something wrong now if it gets me
  something usable for long enough to get something right later.  I'll
  take 4% across the board if it helps me find the 20% that matters.
 
 Most of the time you don't find the 20% improvements with profilers though,
 so all we end up with is just slowing everything by 4%.  Definitely a bad
 idea, now that per core performance doesn't increase very much and most
 programs aren't parallelized at all or just very badly.

I would agree that it would be extraordinarily hard to use a profiler to
identify a code change you could make in glibc to make a non-trivial
program 4% faster.

But usually what you want a profiler for is to be able to efficiently
identify the hot spots and do 10 or so 1% changes in a row. And we also
work on a lot of code bases that are a lot less mature and tuned than
glibc. Usually, what we are trying to do is not to figure out the
function we could rewrite with a clever algorithm to do the same thing
faster; we are trying to find out the stuff we are doing that we
shouldn't be doing at all.
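
To put rough numbers on that trade-off, here is a minimal sketch that
compounds a fixed frame-pointer overhead against a series of small wins.
It assumes, optimistically, that every 1% change applies to the same
workload that pays the 4%; the function names are made up for illustration.

```python
def relative_runtime(overhead, gain_per_change, num_changes):
    """Runtime relative to today after paying overhead and landing the changes."""
    return (1 + overhead) * (1 - gain_per_change) ** num_changes


def changes_to_break_even(overhead, gain_per_change):
    """Smallest number of changes that brings runtime back below today's."""
    n = 0
    while relative_runtime(overhead, gain_per_change, n) >= 1.0:
        n += 1
    return n


net = relative_runtime(0.04, 0.01, 10)
print(f"runtime after 10 changes: {net:.3f} of baseline")
print("changes needed to break even:", changes_to_break_even(0.04, 0.01))
```

Under those assumptions four 1% fixes pay back the overhead, and ten leave
the workload about 6% faster than it is now. Code that never gets profiled
still pays the full 4%, which is the other side of the argument.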

The other argument for profiling is that in many cases you want to ask
someone else to get a profile of a situation that is slow for them, that
maybe isn't slow for you. When things are *massively* slow, then it's
pretty easy for me to track that down using top to identify the
massively slow process, and attaching to it with gdb. But it's not
something that's easy to guide someone through over IRC.

I'm sure you wouldn't claim that Fedora as an operating system is within
4% of how fast it could be, or that our most efficient way of making
Fedora faster is compiler optimization :-)

- Owen

[ But yes, 4% is a big hit. 1% I would accept without hesitation.
  4% does make me hesitate a little bit. During devel cycles, we
  accept much more slowdown than that for the debug kernel, 
  of course. If we can figure out profiling without frame
  pointers, that would be even better ]





Re: Compile with -fno-omit-frame-pointer on x86_64?

2010-11-03 Thread Owen Taylor
On Wed, 2010-11-03 at 20:29 +0100, Jakub Jelinek wrote:

  It seems like if it was that easy, it would have happened and we'd have
  a solution in the upstream kernel...
 
 I think we had one in the upstream kernel for some time, then Linus just
 didn't like seeing it need so many bugfixes and nuked it.

[..]

  (One thing that definitely makes things tricky is paging in debuginfo. I
  think I saw a discussion somewhere that systemtap preemptively was
  paging in all debuginfo for traced modules. That's tricky in systemwide
 
 Yeah, systemtap does that (and has that in-kernel unwinder for userspace).

Looking at systemtap, they are exploiting the fact that they already
have an infrastructure for compiling arbitrary code into modules and
loading it into the kernel. So they've entirely bypassed the question of
how to get a DWARF unwinder upstream into the kernel.

Of course, any profiling framework *could* work with a custom kernel
module, but sysprof was actually kicked out of Fedora for a while
because it had a much simpler not-in-the-kernel module. It now uses
upstream hooks.

So systemtap at best answers the technical questions, and not the
practical question of actually making something happen. If the earlier
experience was that a DWARF one got in, then got removed, it's a little hard
for me to see how a DWARF unwinder in the kernel is a practical
direction. Maybe I should have headed down earlier this week and
picketed outside the Kernel summit...

- Owen




Re: Compile with -fno-omit-frame-pointer on x86_64?

2010-11-03 Thread John Reiser
On 11/03/2010 01:51 PM, Owen Taylor wrote:
 [ But yes, 4% is a big hit. 1% I would accept without hesitation.
   4% does make me hesitate a little bit. During devel cycles, we
   accept much more slowdown than that for the debug kernel, 
   of course. If we can figure out profiling without frame
   pointers, that would be even better ]

Would you accept an overhead of 1 CPU cycle per subroutine call
(all the time in *ALL* code) plus a few dozen cycles per
subroutine call (perhaps restricted to some subset of routines)
in the pieces that were being profiled at the moment?
