Re: Compile with -fno-omit-frame-pointer on x86_64?
On Wed, Nov 03, 2010 at 04:51:01PM -0400, Owen Taylor wrote:

> [ But yes, 4% is a big hit. 1% I would accept without hesitation. 4% does make me hesitate a little bit. During devel cycles, we accept much more slowdown than that for the debug kernel, of course. If we can figure out profiling without frame pointers, that would be even better ]

I've had a bunch of people talking to me about the impact of the kernel debugging causing grief for people wanting to do performance work. Our options as I see it are:

- Don't do debug by default, and ship kernel-debug in rawhide like we do in releases. Downside: we lose coverage testing because not everyone will run it.
- We do the inverse of what we do in releases, and add a kernel-nodebug package. I looked into this, and it really uglied up the spec file.
- We do debug-off builds on Mondays, and the rest of the week's builds are debug-on like they are now. This way those doing perf work can just stay on the kernel from the beginning of the week.

If we go ahead and do something about that problem, what about just using -fno-omit-frame-pointer during rawhide builds, and then switching it off at branch time?

As for the DWARF unwinder in the kernel: I wouldn't rule out it ever making a reappearance, but it really needs a lot more testing before it gets merged. The reason it got ripped out was that it made backtraces unreliable, and reliable backtraces were the whole reason for even having it. Rather than improve it and then re-merge it later, the authors seem to have got discouraged to the point where it just got dropped on the floor (that said, it may still be alive in SLES for all I know).

Additionally, back then, x86 maintenance in the kernel was a bit... random. It's a lot more focused these days, so I'm pretty sure Ingo & co could be persuaded to get something merged as long as it was actually stable enough to be a viable replacement for the existing kernel backtrace infrastructure.
Dave
--
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel
Re: Compile with -fno-omit-frame-pointer on x86_64?
On Wed, Nov 03, 2010 at 02:48:12PM -0400, Owen Taylor wrote:

> Lack of decent profiling is a major problem for making our operating system fast. By far the most effective form of profiling is a sampling profile with callgraph information. Soeren's comment from March:
>
> http://lwn.net/Articles/380582/
>
> Basically summarizes the situation, and as far as I know nothing has changed ... with default compilation options, getting callgraph profiling on x86_64 really requires a DWARF unwinder in the kernel. Which seems unlikely to happen.

But that's the right thing to do.

> As a developer, your options for profiling are:
>
> - Recompile everything you care about profiling with -fno-omit-frame-pointer instead of using system packages.

Instead of this, which really is a big performance penalty. Even i?86 is changing in GCC 4.6 to not do -fno-omit-frame-pointer by default. The unwind info recent GCCs provide is correct even in epilogues and can be relied upon. There are several lightweight unwinders that can be easily adapted for kernel purposes. Just talk to the systemtap folks.

There is always callgrind if you don't want to recompile anything and need to profile something even when the kernel doesn't support it.

Jakub
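For readers wondering why -fno-omit-frame-pointer matters for callgraph sampling: when frame pointers are kept, each stack frame stores the caller's frame pointer next to the return address, so a sampler can follow that chain without consulting any debug info. The sketch below simulates the walk over a toy stack. The memory layout, the addresses, and the name `walk_frame_pointers` are illustrative assumptions, not code from the kernel or from any profiler.

```python
# Toy model of frame-pointer unwinding (illustrative only; not kernel code).
# Assumed layout, mirroring the usual x86_64 convention when frame pointers
# are kept: memory[fp] holds the caller's saved frame pointer and
# memory[fp + 8] holds the return address into the caller.

WORD = 8


def walk_frame_pointers(memory, fp, max_depth=64):
    """Return the return addresses found by following the fp chain."""
    return_addresses = []
    while fp != 0 and len(return_addresses) < max_depth:
        saved_fp = memory.get(fp)
        return_addr = memory.get(fp + WORD)
        if saved_fp is None or return_addr is None:
            break  # chain points outside known memory: stop rather than guess
        return_addresses.append(return_addr)
        if saved_fp != 0 and saved_fp <= fp:
            break  # stack grows down, so callers must sit at higher addresses
        fp = saved_fp
    return return_addresses


# Hypothetical stack for main -> render -> blit, sampled inside blit.
memory = {
    0x7000: 0x7020, 0x7008: 0x401200,  # blit's frame: returns into render
    0x7020: 0x7040, 0x7028: 0x401100,  # render's frame: returns into main
    0x7040: 0x0,    0x7048: 0x401000,  # main's frame: returns into startup code
}

callchain = walk_frame_pointers(memory, fp=0x7000)
print([hex(addr) for addr in callchain])
```

When the compiler omits frame pointers, the register is used as a general-purpose one and this chain does not exist, which is why the alternative discussed in the thread is an unwinder driven by DWARF unwind info.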
Re: Compile with -fno-omit-frame-pointer on x86_64?
On Wed, 2010-11-03 at 19:58 +0100, Jakub Jelinek wrote:
> On Wed, Nov 03, 2010 at 02:48:12PM -0400, Owen Taylor wrote:
>> Lack of decent profiling is a major problem for making our operating system fast. By far the most effective form of profiling is a sampling profile with callgraph information. Soeren's comment from March:
>>
>> http://lwn.net/Articles/380582/
>>
>> Basically summarizes the situation, and as far as I know nothing has changed ... with default compilation options, getting callgraph profiling on x86_64 really requires a DWARF unwinder in the kernel. Which seems unlikely to happen.
>
> But that's the right thing to do.
>
>> As a developer, your options for profiling are:
>>
>> - Recompile everything you care about profiling with -fno-omit-frame-pointer instead of using system packages.
>
> Instead of this, which really is a big performance penalty.

Do you have a sense of the quantification of big here? I know in compiler terms, 1% is big, but we're nowhere close to wringing the last 1% out of overall Fedora performance. If you create a sufficiently complex system, there's lots of stupid stuff going on. And you can't find the stupid stuff without appropriate tools.

> Even i?86 is changing in GCC 4.6 to not do -fno-omit-frame-pointer by default. The unwind info recent GCCs provide is correct even in epilogues and can be relied upon. There are several lightweight unwinders that can be easily adapted for kernel purposes. Just talk to the systemtap folks.

It seems like if it was that easy, it would have happened and we'd have a solution in the upstream kernel...

(One thing that definitely makes things tricky is paging in debuginfo. I think I saw a discussion somewhere that systemtap preemptively was paging in all debuginfo for traced modules. That's tricky in systemwide profiling situations, but maybe you could have something where you do one run, load the debuginfo for everything that was hit in the first run, then do a second run.)

> There is always callgrind if you don't want to recompile anything and need to profile something even when kernel doesn't support it.

callgrind is reasonable if you have a single program that is slow and where the slowness is pretty much straight-up CPU. But we're seldom trying to profile a program; we are trying to profile system situations that involve several programs and the kernel. And programs are frequently not straight-up bound on things that valgrind can easily model. For example, if our program is reading from uncached graphics memory somewhere, that won't show up at all in callgrind: to callgrind, it's just memory reads. But it may dominate a more accurate sampled profile. Plus the performance hit of callgrind makes it not very useful for real-time interactive user interfaces.

- Owen
Re: Compile with -fno-omit-frame-pointer on x86_64?
On Wed, Nov 03, 2010 at 03:20:59PM -0400, Owen Taylor wrote:
> On Wed, 2010-11-03 at 19:58 +0100, Jakub Jelinek wrote:
>> On Wed, Nov 03, 2010 at 02:48:12PM -0400, Owen Taylor wrote:
>> Instead of this, which really is a big performance penalty.
>
> Do you have a sense of the quantification of big here? I know in compiler terms, 1% is big, but we're nowhere close to wringing the last 1% out of overall Fedora performance. If you create a sufficiently complex system, there's lots of stupid stuff going on. And you can't find the stupid stuff without appropriate tools.

The last numbers I was pointed at for x86_64 were a 4% slowdown, which really is a lot; it takes several years to achieve that much improvement on the compiler side.

> It seems like if it was that easy, it would have happened and we'd have a solution in the upstream kernel...

I think we had one in the upstream kernel for some time, then Linus just didn't like to see it needing so many bugfixes and nuked it.

> (One thing that definitely makes things tricky is paging in debuginfo. I think I saw a discussion somewhere that systemtap preemptively was paging in all debuginfo for traced modules. That's tricky in systemwide

Yeah, systemtap does that (and has that in-kernel unwinder for userspace).

Jakub
Re: Compile with -fno-omit-frame-pointer on x86_64?
On 11/03/2010 11:48 AM, Owen Taylor wrote:

> Lack of decent profiling is a major problem for making our operating system fast. By far the most effective form of profiling is a sampling profile with callgraph information.

I am the author of tsprof, http://bitwagon.com/tsprof/tsprof.html . Eight years ago that app provided everything you desire, and with no compilation flags necessary: not -pg, not -p. [The implementation is equivalent to infecting the memory image of the application with a profiling virus and it was at process entry in just a couple seconds.] But nobody would pay for it on i686, so the product was abandoned despite a working prototype for x86_64.

A few years before that, there was TracePoint Technology, a startup funded by venture capital that offered nifty profiling tools: http://venturebeatprofiles.com/company/profile/tracepoint-technology Soon they were acquired by Digital Equipment Corp and died with DEC.

Over several years, dueling proposals (perfctr, perfmon, perfmon2) failed to get into the Linux kernel. Then the CPU and motherboard designers made the underlying hardware counter (RDTSC) unreliable in too many cases (non-constant frequency, not synchronized for SMP, arbitrarily scribbled by SystemManagementMode, ...). Today the infrastructure work for kernel ftrace comes close to what is required for use by apps, but gcc still won't do exactly the right thing.

In short, those who want profiling have failed repeatedly to present an _effective_ case. What are you going to do differently this time?

[The workaround is to spend a week learning how to run oprofile and interpret its output.]

--
Re: Compile with -fno-omit-frame-pointer on x86_64?
On Wed, 2010-11-03 at 19:58 +0100, Jakub Jelinek wrote:
> On Wed, Nov 03, 2010 at 02:48:12PM -0400, Owen Taylor wrote:
>> Basically summarizes the situation, and as far as I know nothing has changed ... with default compilation options, getting callgraph profiling on x86_64 really requires a DWARF unwinder in the kernel. Which seems unlikely to happen.
>
> But that's the right thing to do.

Sure, but so is a kernel debugger, and it's taken us over ten years to get one. I'm pretty okay with doing something wrong now if it gets me something usable for long enough to get something right later. I'll take 4% across the board if it helps me find the 20% that matters.

> There is always callgrind if you don't want to recompile anything and need to profile something even when kernel doesn't support it.

I don't want to know how callgrinded X performs, I want to know how X performs. callgrind means operations that would be one millisecond become half a second, and that's thirty frames instead of a sixteenth of a frame. That means I end up optimizing for function call cycle counts instead of fixing my algorithms to not starve the hardware.

If wall time matters, callgrind is the wrong tool, and you need a live profiler.

- ajax
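The frame figures in the message above can be checked with a few lines of arithmetic. This is only a sketch: the 60 Hz refresh rate is an assumption (the thread does not state one), chosen because it is the rate at which "a sixteenth of a frame" and "thirty frames" both work out.

```python
# Checks the frame arithmetic quoted in the message above.
# Assumption: a 60 Hz display, which is not stated in the thread but is
# the refresh rate at which the quoted figures are consistent.

REFRESH_HZ = 60
FRAME_MS = 1000 / REFRESH_HZ   # roughly 16.67 ms per frame

native_ms = 1.0                # "one millisecond"
callgrind_ms = 500.0           # "half a second"


def frames(duration_ms):
    """Express a duration in display frames."""
    return duration_ms / FRAME_MS


slowdown = callgrind_ms / native_ms
print(f"slowdown factor: {slowdown:.0f}x")
print(f"native:    {frames(native_ms):.3f} frames")
print(f"callgrind: {frames(callgrind_ms):.1f} frames")
```

Under that assumption the native operation fits well inside a single frame while the instrumented one spans many, which is the point being made about interactive workloads.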
Re: Compile with -fno-omit-frame-pointer on x86_64?
On Wed, Nov 03, 2010 at 04:10:30PM -0400, Adam Jackson wrote:
> On Wed, 2010-11-03 at 19:58 +0100, Jakub Jelinek wrote:
>> On Wed, Nov 03, 2010 at 02:48:12PM -0400, Owen Taylor wrote:
>>> Basically summarizes the situation, and as far as I know nothing has changed ... with default compilation options, getting callgraph profiling on x86_64 really requires a DWARF unwinder in the kernel. Which seems unlikely to happen.
>>
>> But that's the right thing to do.
>
> Sure, but so is a kernel debugger, and it's taken us over ten years to get one. I'm pretty okay with doing something wrong now if it gets me something usable for long enough to get something right later. I'll take 4% across the board if it helps me find the 20% that matters.

Most of the time you don't find the 20% improvements with profilers though, so all we end up with is just slowing everything by 4%. Definitely a bad idea, now that per core performance doesn't increase very much and most programs aren't parallelized at all or just very badly.

Jakub
Re: Compile with -fno-omit-frame-pointer on x86_64?
On Wed, 2010-11-03 at 21:11 +0100, Jakub Jelinek wrote:
> On Wed, Nov 03, 2010 at 04:10:30PM -0400, Adam Jackson wrote:
>> On Wed, 2010-11-03 at 19:58 +0100, Jakub Jelinek wrote:
>>> On Wed, Nov 03, 2010 at 02:48:12PM -0400, Owen Taylor wrote:
>>>> Basically summarizes the situation, and as far as I know nothing has changed ... with default compilation options, getting callgraph profiling on x86_64 really requires a DWARF unwinder in the kernel. Which seems unlikely to happen.
>>>
>>> But that's the right thing to do.
>>
>> Sure, but so is a kernel debugger, and it's taken us over ten years to get one. I'm pretty okay with doing something wrong now if it gets me something usable for long enough to get something right later. I'll take 4% across the board if it helps me find the 20% that matters.
>
> Most of the time you don't find the 20% improvements with profilers though, so all we end up with is just slowing everything by 4%. Definitely a bad idea, now that per core performance doesn't increase very much and most programs aren't parallelized at all or just very badly.

I would agree that it would be extraordinarily hard to use a profiler to identify a code change you could make in glibc to make a non-trivial program 4% faster. But usually what you want a profiler for is to be able to efficiently identify the hot spots and do 10 or so 1% changes in a row. And we also work on a lot of code bases that are a lot less mature and tuned than glibc.

Usually, what we are trying to do is not to figure out the function we could rewrite with a clever algorithm to do the same thing faster; we are trying to find out the stuff we are doing that we shouldn't be doing at all.

The other argument for profiling is that in many cases you want to ask someone else to get a profile of a situation that is slow for them, that maybe isn't slow for you. When things are *massively* slow, then it's pretty easy for me to track that down using top to identify the massively slow process, and attaching to it with gdb. But it's not something that's easy to guide someone through over IRC.

I'm sure you wouldn't claim that Fedora as an operating system is within 4% of how fast it could be, or that our most efficient way of making Fedora faster is compiler optimization :-)

- Owen

[ But yes, 4% is a big hit. 1% I would accept without hesitation. 4% does make me hesitate a little bit. During devel cycles, we accept much more slowdown than that for the debug kernel, of course. If we can figure out profiling without frame pointers, that would be even better ]
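To put the "10 or so 1% changes in a row" argument next to the 4% frame-pointer cost, here is a small sketch of the arithmetic. Both percentages come from the thread; treating them as independent multiplicative factors on run time is a simplifying assumption, and the function name is made up for illustration.

```python
# Compares a fixed 4% frame-pointer cost against ten successive 1% wins.
# Both percentages come from the thread; modelling them as independent
# multiplicative factors on run time is a simplifying assumption.

FRAME_POINTER_OVERHEAD = 0.04   # slowdown quoted for x86_64
WIN_PER_CHANGE = 0.01           # each profiling-guided change saves 1%
NUM_CHANGES = 10


def relative_runtime(overhead, win, changes):
    """Run time relative to a baseline of 1.0 after overhead and wins."""
    return (1 + overhead) * (1 - win) ** changes


wins_only = relative_runtime(0.0, WIN_PER_CHANGE, NUM_CHANGES)
with_overhead = relative_runtime(FRAME_POINTER_OVERHEAD, WIN_PER_CHANGE, NUM_CHANGES)

print(f"ten 1% wins alone:         {wins_only:.4f} of baseline")
print(f"ten 1% wins + 4% overhead: {with_overhead:.4f} of baseline")
```

In this model the ten small wins more than pay for the overhead, and they would remain if frame pointers were later switched off again. Whether real workloads offer ten such wins is exactly what the thread disagrees about.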
Re: Compile with -fno-omit-frame-pointer on x86_64?
On Wed, 2010-11-03 at 20:29 +0100, Jakub Jelinek wrote:
>> It seems like if it was that easy, it would have happened and we'd have a solution in the upstream kernel...
>
> I think we had one in the upstream kernel for some time, then Linus just didn't like to see it needing too many bugfixes needed for it and nuked it.

[..]

>> (One thing that definitely makes things tricky is paging in debuginfo. I think I saw a discussion somewhere that systemtap preemptively was paging in all debuginfo for traced modules. That's tricky in systemwide
>
> Yeah, systemtap does that (and has that in kernel unwinder for userspace).

Looking at systemtap, they are exploiting the fact that they already have an infrastructure for compiling arbitrary code into modules and loading it into the kernel. So they've entirely bypassed the question of how to get a DWARF unwinder upstream into the kernel.

Of course, any profiling framework *could* work with a custom kernel module, but sysprof was actually kicked out of Fedora for a while because it had a much simpler not-in-the-kernel module. It now uses upstream hooks.

So systemtap at best answers the technical questions, and not the practical question of actually making something happen. If the earlier experience was that a DWARF unwinder got in and then got removed, it's a little hard for me to see how a DWARF unwinder in the kernel is a practical direction.

Maybe I should have headed down earlier this week and picketed outside the Kernel summit...

- Owen
Re: Compile with -fno-omit-frame-pointer on x86_64?
On 11/03/2010 01:51 PM, Owen Taylor wrote:

> [ But yes, 4% is a big hit. 1% I would accept without hesitation. 4% does make me hesitate a little bit. During devel cycles, we accept much more slowdown than that for the debug kernel, of course. If we can figure out profiling without frame pointers, that would be even better ]

Would you accept an overhead of 1 CPU cycle per subroutine call (all the time in *ALL* code) plus a few dozen cycles per subroutine call (perhaps restricted to some subset of routines) in the pieces that were being profiled at the moment?

--
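One way to weigh the "1 CPU cycle per subroutine call" proposal against the 4% figure is to express the always-on cost as a fraction of a core's cycles. The sketch below does that. The clock rate and call rates are hypothetical inputs picked only to show the shape of the calculation; the thread gives no such measurements, and the function name is made up.

```python
# Rough model of the always-on cost of "1 CPU cycle per subroutine call".
# The clock rate and call rates are hypothetical inputs chosen only to
# show the shape of the calculation; the thread gives no such figures.

def call_overhead_fraction(calls_per_second, cycles_per_call, clock_hz):
    """Fraction of one core's cycles spent on the per-call cost."""
    return calls_per_second * cycles_per_call / clock_hz


CLOCK_HZ = 3_000_000_000   # hypothetical 3 GHz core

for calls_per_second in (1_000_000, 30_000_000, 120_000_000):
    frac = call_overhead_fraction(calls_per_second, 1, CLOCK_HZ)
    print(f"{calls_per_second:>11,} calls/s -> {frac:.2%} of cycles")
```

In this model a one-cycle cost only reaches 4% when a core makes a call about once every 25 cycles, so for code with a lower call rate the always-on part would be cheaper than keeping frame pointers. The "few dozen cycles" part applies only while profiling is active and is not modelled here.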