Re: [patch 10/10] Scheduler profiling - Use immediate values
On Sat, Jul 07, 2007 at 07:20:12PM +0200, Willy Tarreau wrote:
> On Sat, Jul 07, 2007 at 07:01:57PM +0200, Adrian Bunk wrote:
> > On Sat, Jul 07, 2007 at 11:45:20AM -0400, Frank Ch. Eigler wrote:
> >...
> > You always have to decide between some debug code and some small bit of
> > performance. There's a reason why options to disable things like BUG()
> > or printk() are in the kernel config menus hidden behind CONFIG_EMBEDDED
> > although they obviously have some performance impact.
>
> It is not for the CPU performance that they can be disabled, but for the
> code size, which is a real problem on embedded systems. While you often
> have mem/cpu_mhz ratios around 1GB/1GHz on servers and desktops, you more
> often have ratios like 16MB/500MHz, which is 1:32 of the former. That's
> why you optimize for size at the expense of speed on such systems.

The latter is not true for my two examples. CONFIG_PRINTK=n, CONFIG_BUG=n
will obviously make the kernel both smaller and faster. [1]

> Regards,
> Willy

cu
Adrian

[1] faster due to less code to execute and positive cache effects due to
    the smaller code [2]
[2] whether the "faster" is big enough that it is in any way measurable
    is a different question

--
       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 10/10] Scheduler profiling - Use immediate values
Hi, Adrian -

On Sat, Jul 07, 2007 at 07:01:57PM +0200, Adrian Bunk wrote:
> [...]
> > Things are not so simple. One might not know that one has a
> > performance problem until one tries some analysis tools. Rebooting
> > into different kernels just to investigate does not work generally [...]
>
> I'm not getting this:
>
> You'll only start looking into an analysis tool if you have a
> performance problem, IOW if you are not satisfied with the
> performance.

There may be people whose jobs entail continually suspecting
performance problems. Or one may run instrumentation code on a
long-term basis specifically to locate performance spikes.

> And the debug code will not have been tested on this machine no matter
> whether it's enabled through a compile option or at runtime.

There is a big difference in favour of the former. The additional
instrumentation code may be small enough to inspect carefully. The rest
of the kernel would be unaffected.

> [...] If you might be able to get a big part of tracing and other
> debug code enabled with a performance penalty of a few percent of
> _kernel_ performance, then you might get much debugging aid without
> any effective impact on application performance.

Agreed.

- FChE
Re: [patch 10/10] Scheduler profiling - Use immediate values
On Sat, Jul 07, 2007 at 07:01:57PM +0200, Adrian Bunk wrote:
> On Sat, Jul 07, 2007 at 11:45:20AM -0400, Frank Ch. Eigler wrote:
> > Adrian Bunk <[EMAIL PROTECTED]> writes:
> > > [...]
> > > profiling = debugging of performance problems
> >
> > Indeed.
> >
> > > My words were perhaps a bit sloppy, but profiling isn't part of
> > > normal operation and if people use a separate kernel for such
> > > purposes we don't need infrastructure for reducing performance
> > > penalties of enabled debug options.
> >
> > Things are not so simple. One might not know that one has a
> > performance problem until one tries some analysis tools. Rebooting
> > into different kernels just to investigate does not work generally:
> > the erroneous phenomenon may have been short lived; the debug kernel,
> > being "only" for debugging, may not be well tested => sufficiently
> > trustworthy.
>
> I'm not getting this:
>
> You'll only start looking into an analysis tool if you have a
> performance problem, IOW if you are not satisfied with the performance.
>
> And the debug code will not have been tested on this machine no matter
> whether it's enabled through a compile option or at runtime.

At least all the rest of the code will be untouched and you will not have
to reboot the machine. If you reboot to another kernel, nothing ensures
that you will have the same code sequences (in fact, gcc will reorder some
parts of code such as loops just because of an additional 'if'). So you
know that the non-debug code you run will remain untouched. *This* is
important, because the debug code is not there to debug itself, but to
debug the rest.

> > Your question asking for an actual performance impact of dormant hooks
> > is OTOH entirely legitimate. It clearly depends on the placement of
> > those hooks and thus their encounter rate, more so than their
> > underlying technology (markers with whatever optimizations). If the
> > cost is small enough, you will likely find that people will be willing
> > to pay a small fraction of average performance, in order to eke out
> > large gains when finding occasional e.g. algorithmic bugs.
>
> If you might be able to get a big part of tracing and other debug code
> enabled with a performance penalty of a few percent of _kernel_
> performance, then you might get much debugging aid without any effective
> impact on application performance.

It largely depends on the application. Applications which make a lot of
system calls will be more sensitive to kernel debugging. Common sense
also implies that such applications will be the ones for which kernel
debugging will be relevant.

> You always have to decide between some debug code and some small bit of
> performance. There's a reason why options to disable things like BUG()
> or printk() are in the kernel config menus hidden behind CONFIG_EMBEDDED
> although they obviously have some performance impact.

It is not for the CPU performance that they can be disabled, but for the
code size, which is a real problem on embedded systems. While you often
have mem/cpu_mhz ratios around 1GB/1GHz on servers and desktops, you more
often have ratios like 16MB/500MHz, which is 1:32 of the former. That's
why you optimize for size at the expense of speed on such systems.

> > - FChE
>
> cu
> Adrian

Regards,
Willy
Re: [patch 10/10] Scheduler profiling - Use immediate values
On Sat, Jul 07, 2007 at 11:45:20AM -0400, Frank Ch. Eigler wrote:
> Adrian Bunk <[EMAIL PROTECTED]> writes:
> > [...]
> > profiling = debugging of performance problems
>
> Indeed.
>
> > My words were perhaps a bit sloppy, but profiling isn't part of
> > normal operation and if people use a separate kernel for such
> > purposes we don't need infrastructure for reducing performance
> > penalties of enabled debug options.
>
> Things are not so simple. One might not know that one has a
> performance problem until one tries some analysis tools. Rebooting
> into different kernels just to investigate does not work generally:
> the erroneous phenomenon may have been short lived; the debug kernel,
> being "only" for debugging, may not be well tested => sufficiently
> trustworthy.

I'm not getting this:

You'll only start looking into an analysis tool if you have a
performance problem, IOW if you are not satisfied with the performance.

And the debug code will not have been tested on this machine no matter
whether it's enabled through a compile option or at runtime.

> Your question asking for an actual performance impact of dormant hooks
> is OTOH entirely legitimate. It clearly depends on the placement of
> those hooks and thus their encounter rate, more so than their
> underlying technology (markers with whatever optimizations). If the
> cost is small enough, you will likely find that people will be willing
> to pay a small fraction of average performance, in order to eke out
> large gains when finding occasional e.g. algorithmic bugs.

If you might be able to get a big part of tracing and other debug code
enabled with a performance penalty of a few percent of _kernel_
performance, then you might get much debugging aid without any effective
impact on application performance.

You always have to decide between some debug code and some small bit of
performance. There's a reason why options to disable things like BUG()
or printk() are in the kernel config menus hidden behind CONFIG_EMBEDDED
although they obviously have some performance impact.

> - FChE

cu
Adrian
Re: [patch 10/10] Scheduler profiling - Use immediate values
Adrian Bunk <[EMAIL PROTECTED]> writes:
> [...]
> profiling = debugging of performance problems

Indeed.

> My words were perhaps a bit sloppy, but profiling isn't part of
> normal operation and if people use a separate kernel for such
> purposes we don't need infrastructure for reducing performance
> penalties of enabled debug options.

Things are not so simple. One might not know that one has a
performance problem until one tries some analysis tools. Rebooting
into different kernels just to investigate does not work generally:
the erroneous phenomenon may have been short lived; the debug kernel,
being "only" for debugging, may not be well tested => sufficiently
trustworthy.

Your question asking for an actual performance impact of dormant hooks
is OTOH entirely legitimate. It clearly depends on the placement of
those hooks and thus their encounter rate, more so than their
underlying technology (markers with whatever optimizations). If the
cost is small enough, you will likely find that people will be willing
to pay a small fraction of average performance, in order to eke out
large gains when finding occasional e.g. algorithmic bugs.

- FChE
Re: [patch 10/10] Scheduler profiling - Use immediate values
On Sat, Jul 07, 2007 at 06:03:07AM +0200, Adrian Bunk wrote:
> On Fri, Jul 06, 2007 at 10:35:11PM -0400, Mathieu Desnoyers wrote:
> > * Adrian Bunk ([EMAIL PROTECTED]) wrote:
> > >...
> > > Using a different kernel for tracing still fulfills all the
> > > requirements listed in section 5 of your paper...
> >
> > Not exactly. I assume you understand that rebooting 1000 live
> > production servers to find the source of a rare bug or the cause of a
> > performance issue is out of the question.
> >
> > Moreover, strategies like enabling flight recorder traces on a few
> > nodes on demand to detect performance problems can only be deployed in
> > a production environment if they are part of the standard production
> > kernel.
> >
> > Also, managing two different kernels is often out of the question. Not
> > only is it a maintenance burden, but just switching to the "debug"
> > kernel can impact the system's behavior so badly that it could make
> > the problem disappear.
>
> As can turning tracing on at runtime.
>
> And you can always define requirements in a way that your solution is
> the only one...

On large production environments, you always lose a certain percentage
of machines at each reboot. Most often, it's the CR2032 lithium battery
which is dead, which causes all or part of the CMOS settings to vanish,
hanging the system at boot. Then you play with the ON/OFF switch, and a
small percentage of the power supplies refuse to restart and some disks
refuse to spin up. Fortunately this does not happen with all machines,
but if you have such problems with 1% of your machines, you lose 10
machines when you reboot 1000 of them.

Those problems require a lot of manpower, which explains why such
systems are rarely updated. Causing that much trouble just to enable
debugging is clearly unacceptable, and your debug kernel will simply
never be used. Not to mention the fact that people will never trust it
because it's almost never used!

Regards,
Willy
Re: [patch 10/10] Scheduler profiling - Use immediate values
On Fri, Jul 06, 2007 at 10:35:11PM -0400, Mathieu Desnoyers wrote:
> * Adrian Bunk ([EMAIL PROTECTED]) wrote:
> >...
> > Using a different kernel for tracing still fulfills all the
> > requirements listed in section 5 of your paper...
>
> Not exactly. I assume you understand that rebooting 1000 live production
> servers to find the source of a rare bug or the cause of a performance
> issue is out of the question.
>
> Moreover, strategies like enabling flight recorder traces on a few nodes
> on demand to detect performance problems can only be deployed in a
> production environment if they are part of the standard production
> kernel.
>
> Also, managing two different kernels is often out of the question. Not
> only is it a maintenance burden, but just switching to the "debug"
> kernel can impact the system's behavior so badly that it could make the
> problem disappear.

As can turning tracing on at runtime.

And you can always define requirements in a way that your solution is
the only one...

Let's go to a different point:

Your paper says "When not running, must have zero effective impact."

How big is the measured impact of your markers when not used, without
any immediate voodoo?

You have sent many numbers about micro-benchmarks and theoretical
numbers, but if you have sent the interesting numbers comparing

1. MARKERS=n
2. MARKERS=y, IMMEDIATE=n
3. MARKERS=y, IMMEDIATE=y

in actual benchmark testing I must have missed it.

Does 3. have a measurable and effective advantage over 2., or are you
optimizing for some 0.01% or 1% performance difference without any
effective impact and therefore not required for the goals outlined in
your paper?

> Mathieu

cu
Adrian
Re: [patch 10/10] Scheduler profiling - Use immediate values
* Adrian Bunk ([EMAIL PROTECTED]) wrote:
> On Fri, Jul 06, 2007 at 07:43:15PM -0400, Mathieu Desnoyers wrote:
> >...
> > Please have a look at my markers posts, especially:
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0707.0/0669.html
> >
> > And also look into OLS 2007 proceedings for Martin Bligh's paper on
> > Debugging Google-sized clusters. It basically makes the case for
> > adding functionality to debug _user space_ problems on production
> > systems that can be turned on dynamically.
>
> Using a different kernel for tracing still fulfills all the requirements
> listed in section 5 of your paper...

Not exactly. I assume you understand that rebooting 1000 live production
servers to find the source of a rare bug or the cause of a performance
issue is out of the question.

Moreover, strategies like enabling flight recorder traces on a few nodes
on demand to detect performance problems can only be deployed in a
production environment if they are part of the standard production
kernel.

Also, managing two different kernels is often out of the question. Not
only is it a maintenance burden, but just switching to the "debug"
kernel can impact the system's behavior so badly that it could make the
problem disappear.

Mathieu

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
Re: [patch 10/10] Scheduler profiling - Use immediate values
On Fri, Jul 06, 2007 at 07:43:15PM -0400, Mathieu Desnoyers wrote:
> * Adrian Bunk ([EMAIL PROTECTED]) wrote:
> > On Fri, Jul 06, 2007 at 06:14:10PM -0400, Chuck Ebbert wrote:
> > > Another thing to consider is that there might be hundreds of these
> > > probes/tracepoints active in an instrumented kernel. The overhead
> > > adds up fast, so the gain may be worth all the pain.
> >
> > Only if you want to squeeze the last bit of performance out of
> > _debugging_ functionality.
> >
> > You avoid all the pain if you simply don't use debugging functionality
> > on production systems.
>
> Adrian,
>
> Please have a look at my markers posts, especially:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0707.0/0669.html
>
> And also look into OLS 2007 proceedings for Martin Bligh's paper on
> Debugging Google-sized clusters. It basically makes the case for adding
> functionality to debug _user space_ problems on production systems that
> can be turned on dynamically.

Using a different kernel for tracing still fulfills all the requirements
listed in section 5 of your paper...

> Mathieu

cu
Adrian
Re: [patch 10/10] Scheduler profiling - Use immediate values
On Fri, Jul 06, 2007 at 07:38:27PM -0400, Dave Jones wrote:
> On Sat, Jul 07, 2007 at 01:28:43AM +0200, Adrian Bunk wrote:
> > Only if you want to squeeze the last bit of performance out of
> > _debugging_ functionality.
> >
> > You avoid all the pain if you simply don't use debugging functionality
> > on production systems.
>
> I think you're mixing up profiling and debugging. The former is
> extremely valuable under production systems.

profiling = debugging of performance problems

My words were perhaps a bit sloppy, but profiling isn't part of normal
operation and if people use a separate kernel for such purposes we don't
need infrastructure for reducing performance penalties of enabled debug
options.

> Dave

cu
Adrian
Re: [patch 10/10] Scheduler profiling - Use immediate values
* Adrian Bunk ([EMAIL PROTECTED]) wrote:
> On Fri, Jul 06, 2007 at 06:14:10PM -0400, Chuck Ebbert wrote:
> > On 07/06/2007 07:44 AM, Andi Kleen wrote:
> > > I think the optimization is a good idea, although i dislike it
> > > that it is complicated for the dynamic markers. If it was just
> > > static it would be much simpler.
> >
> > Another thing to consider is that there might be hundreds of these
> > probes/tracepoints active in an instrumented kernel. The overhead
> > adds up fast, so the gain may be worth all the pain.
>
> Only if you want to squeeze the last bit of performance out of
> _debugging_ functionality.
>
> You avoid all the pain if you simply don't use debugging functionality
> on production systems.

Adrian,

Please have a look at my markers posts, especially:

http://www.ussg.iu.edu/hypermail/linux/kernel/0707.0/0669.html

And also look into OLS 2007 proceedings for Martin Bligh's paper on
Debugging Google-sized clusters. It basically makes the case for adding
functionality to debug _user space_ problems on production systems that
can be turned on dynamically.

Mathieu

> cu
> Adrian
Re: [patch 10/10] Scheduler profiling - Use immediate values
On Sat, Jul 07, 2007 at 01:28:43AM +0200, Adrian Bunk wrote:
> Only if you want to squeeze the last bit of performance out of
> _debugging_ functionality.
>
> You avoid all the pain if you simply don't use debugging functionality
> on production systems.

I think you're mixing up profiling and debugging. The former is
extremely valuable under production systems.

Dave

--
http://www.codemonkey.org.uk
Re: [patch 10/10] Scheduler profiling - Use immediate values
On Fri, Jul 06, 2007 at 06:14:10PM -0400, Chuck Ebbert wrote:
> On 07/06/2007 07:44 AM, Andi Kleen wrote:
> > I think the optimization is a good idea, although i dislike it
> > that it is complicated for the dynamic markers. If it was just
> > static it would be much simpler.
>
> Another thing to consider is that there might be hundreds of these
> probes/tracepoints active in an instrumented kernel. The overhead
> adds up fast, so the gain may be worth all the pain.

Only if you want to squeeze the last bit of performance out of
_debugging_ functionality.

You avoid all the pain if you simply don't use debugging functionality
on production systems.

cu
Adrian
Re: [patch 10/10] Scheduler profiling - Use immediate values
On 07/06/2007 07:44 AM, Andi Kleen wrote:
> I think the optimization is a good idea, although i dislike it
> that it is complicated for the dynamic markers. If it was just
> static it would be much simpler.

Another thing to consider is that there might be hundreds of these
probes/tracepoints active in an instrumented kernel. The overhead
adds up fast, so the gain may be worth all the pain.
Re: [patch 10/10] Scheduler profiling - Use immediate values
On Thu, Jul 05, 2007 at 11:46:44AM -0400, Mathieu Desnoyers wrote:
> * Bodo Eggert ([EMAIL PROTECTED]) wrote:
> > Andi Kleen <[EMAIL PROTECTED]> wrote:
> > > Alexey Dobriyan <[EMAIL PROTECTED]> writes:
> > > > On Tue, Jul 03, 2007 at 12:40:56PM -0400, Mathieu Desnoyers wrote:
> > > > > Use immediate values with lower d-cache hit in optimized version
> > > > > as a condition for scheduler profiling call.
> > > >
> > > > I think it's better to put profile.c under CONFIG_PROFILING as
> > > > _expected_, so CONFIG_PROFILING=n users won't get any overhead,
> > > > immediate or not. That's what I'm going to do after test-booting a
> > > > bunch of kernels.
> > >
> > > No, it's better to handle this efficiently at runtime e.g. for
> > > distribution kernels. Mathieu's patch is good.
> >
> > IMO you should combine them. For distributions, it may be good to
> > include profiling support unconditionally, but how many of the vanilla
> > kernel users are going to use profiling at all?
>
> For CONFIG_PROFILING, I think of it more like a chicken-and-egg problem:
> as long as it won't be easy to enable when needed in distribution
> kernels, few profiling applications will use it. So if you ban it from
> distro kernels with a CONFIG option under the pretext that no profiling
> application uses it, you run straight into a conceptual deadlock. :)
>
> Another similar example would be CONFIG_TIMER_STATS, used by powertop,
> which users will use to tune their laptops (I got 45 minutes more
> battery time on mine thanks to this wonderful tool). Compiling-in, but
> dynamically turning it on/off, makes sense for a lot of kernel
> "profiling/stats extraction" mechanisms like those.
>
> But I suspect they will be used by distros only when their presence when
> disabled will be unnoticeable, or when a major portion of their users
> yell loudly enough that they want this or that feature, leaving the more
> specialized minority of distro users without the features that they need
> to fine-tune their applications or their kernel.

There's a surprisingly simple solution solving the problems you describe:

For userspace libraries, the common approach is to get them stripped and
have versions with all debugging symbols in some -dbg package. So if you
want to debug an application using such a library, you simply install
this -dbg package.

Just let distributions do the same for the kernel - add a -dbg flavour
with many debugging options enabled. It might perhaps run 5% or 20%
slower than the regular kernel, but you don't need profiling or powertop
during normal operation - these are _debug_ tools.

This way, there's no runtime penalty and therefore no trickery required
for getting the overhead of _debug code_ lower.

> Mathieu

cu
Adrian
Re: [patch 10/10] Scheduler profiling - Use immediate values
* Li, Tong N ([EMAIL PROTECTED]) wrote:
> > I found that memory latency is difficult to measure in modern x86
> > CPUs because they have very clever prefetchers that can often
> > outwit benchmarks.
>
> A pointer-chasing program that accesses a random sequence of addresses
> usually can produce a good estimate of memory latency. Also, prefetching
> can be turned off in the BIOS or by modifying the MSRs.
>
> > Another trap on P4 is that RDTSC is actually quite slow and
> > synchronizes the CPU; that can add large measurement errors.
> >
> > -Andi
>
> The cost can be amortized if the sequence of memory accesses is long
> enough.
>
> tong

That's what I am currently doing... the results are coming in a few
moments... :)
RE: [patch 10/10] Scheduler profiling - Use immediate values
> I found that memory latency is difficult to measure in modern x86
> CPUs because they have very clever prefetchers that can often
> outwit benchmarks.

A pointer-chasing program that accesses a random sequence of addresses
usually can produce a good estimate of memory latency. Also, prefetching
can be turned off in the BIOS or by modifying the MSRs.

> Another trap on P4 is that RDTSC is actually quite slow and synchronizes
> the CPU; that can add large measurement errors.
>
> -Andi

The cost can be amortized if the run of memory accesses is long
enough.

tong
Re: [patch 10/10] Scheduler profiling - Use immediate values
On Fri, Jul 06, 2007 at 10:50:30AM -0700, Li, Tong N wrote:
> > Also cache misses in this situation tend to be much more than 48 cycles
> > (even a K8 with integrated memory controller with the fastest DIMMs is
> > slower than that). Mathieu probably measured an L2 miss, not a load
                                                   ^^^ I meant an L2 cache
                                                       hit, of course
> > from RAM.
> > Load from RAM can be hundreds of ns in the worst case.
>
> The 48 cycles sounds to me like a memory load in an unloaded system, but
> it is quite low. I wonder how it was measured...

I found that memory latency is difficult to measure in modern x86
CPUs because they have very clever prefetchers that can often
outwit benchmarks.

Another trap on P4 is that RDTSC is actually quite slow and synchronizes
the CPU; that can add large measurement errors.

-Andi
RE: [patch 10/10] Scheduler profiling - Use immediate values
> Also cache misses in this situation tend to be much more than 48 cycles
> (even a K8 with integrated memory controller with the fastest DIMMs is
> slower than that). Mathieu probably measured an L2 miss, not a load from
> RAM.
> Load from RAM can be hundreds of ns in the worst case.

The 48 cycles sounds to me like a memory load in an unloaded system, but
it is quite low. I wonder how it was measured...

tong
Re: [patch 10/10] Scheduler profiling - Use immediate values
Andrew Morton <[EMAIL PROTECTED]> writes:
> Is that 48 cycles measured when the target of the read is in L1 cache, as
> it would be in any situation which we actually care about?

I guess so...

The normal situation is that a big database or other bloated software
runs, clears all the dcaches, then enters the kernel. The kernel then
has a cache miss on all its data. But icache access is faster because
the CPU prefetches.

We've had cases like this - e.g. the additional dcache line accesses
that were added by the new time code in vgettimeofday() were visible in
macro benchmarks.

Also cache misses in this situation tend to be much more than 48 cycles
(even a K8 with integrated memory controller with the fastest DIMMs is
slower than that). Mathieu probably measured an L2 miss, not a load from
RAM. Load from RAM can be hundreds of ns in the worst case.

I think the optimization is a good idea, although I dislike that it is
complicated by the dynamic markers. If it was just static it would be
much simpler.

-Andi
Re: [patch 10/10] Scheduler profiling - Use immediate values
* Andrew Morton ([EMAIL PROTECTED]) wrote:
> On Thu, 5 Jul 2007 13:21:20 -0700
> Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> > Please prepare and maintain a short document which describes the
> > justification for making all these changes to the kernel.
>
> oh, you did. It's there in the add-kconfig-stuff patch.

Yes, if you feel it should be put in a different patch header, I'll
move it.

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
Re: [patch 10/10] Scheduler profiling - Use immediate values
On Thu, 5 Jul 2007 13:21:20 -0700 Andrew Morton <[EMAIL PROTECTED]> wrote:

> Please prepare and maintain a short document which describes the
> justification for making all these changes to the kernel.

oh, you did. It's there in the add-kconfig-stuff patch.
Re: [patch 10/10] Scheduler profiling - Use immediate values
On Tue, 3 Jul 2007 14:57:48 -0400
Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:

> Measuring the overall impact on the system of this single modification
> results in the difference brought by one site being within the standard
> deviation of the normal samples. It will become significant when the
> number of immediate values used instead of global variables at hot
> kernel paths (weighted by the frequency at which the data is accessed)
> starts to be significant compared to the L1 data cache size. We could
> characterize this in memory-to-L1-cache transfers per second.
>
> On a 3GHz P4:
>
> memory read: ~48 cycles
>
> So we can definitely say that 48*HZ (approximation of the frequency at
> which the scheduler is called) won't make much difference, but as it
> grows, it will.
>
> On a 1000HZ system, it results in:
>
> 48000 cycles/second, or 16µs/second - a 0.0016% speedup.
>
> However, if we place this in code called much more often, such as
> do_page_fault, we get, with a hypothetical scenario of approximately
> 10 page faults per second:
>
> 480 cycles/second, or 160ns/second - a 0.000016% speedup.
>
> So as the number of immediate values used increases, the overall memory
> bandwidth required by the kernel will go down.

Is that 48 cycles measured when the target of the read is in L1 cache, as
it would be in any situation which we actually care about?  I guess so...

Boy, this is a tiny optimisation and boy, you added a pile of tricky new
code to obtain it. Frankly, I'm thinking that life would be simpler if
we just added static markers and stopped trying to add lots of tricksy
maintenance-load-increasing things like this. Ho hum. Need more
convincing, please.

Also: a while back (maybe as much as a year) we had an extensive
discussion regarding whether we want static markers at all in the
kernel. The eventual outcome was, I believe, "yes". But our reasons for
making that decision appear to have been lost.

So if I were to send the markers patches to Linus and he were to ask me
"why are you sending these", I'd be forced to answer "I don't know".
This is not a good situation.

Please prepare and maintain a short document which describes the
justification for making all these changes to the kernel. The changelog
for the main markers patch would be an appropriate place for this. The
target audience would be kernel developers and it should capture the
pro- and con- arguments which were raised during that discussion.

Basically: tell us why we should merge _any_ of this stuff, because I
for one have forgotten.

Thanks.
Re: [patch 10/10] Scheduler profiling - Use immediate values
* Bodo Eggert ([EMAIL PROTECTED]) wrote:
> Andi Kleen <[EMAIL PROTECTED]> wrote:
> > Alexey Dobriyan <[EMAIL PROTECTED]> writes:
> >> On Tue, Jul 03, 2007 at 12:40:56PM -0400, Mathieu Desnoyers wrote:
> >> > Use immediate values with lower d-cache hit in optimized version as a
> >> > condition for scheduler profiling call.
> >>
> >> I think it's better to put profile.c under CONFIG_PROFILING as
> >> _expected_, so CONFIG_PROFILING=n users won't get any overhead,
> >> immediate or not. That's what I'm going to do after test-booting a
> >> bunch of kernels.
> >
> > No, it's better to handle this efficiently at runtime, e.g. for
> > distribution kernels. Mathieu's patch is good
>
> IMO you should combine them. For distributions, it may be good to
> include profiling support unconditionally, but how many of the vanilla
> kernel users are going to use profiling at all?

For CONFIG_PROFILING, I think of it more as a chicken-and-egg problem:
as long as it isn't easy to enable when needed in distribution kernels,
few profiling applications will use it. So if you ban it from distro
kernels with a CONFIG option under the pretext that no profiling
application uses it, you run straight into a conceptual deadlock. :)

Another similar example would be CONFIG_TIMER_STATS, used by powertop,
which users will use to tune their laptops (I got 45 minutes more
battery time on mine thanks to this wonderful tool).

Compiling in, but dynamically turning on/off, makes sense for a lot of
kernel "profiling/stats extraction" mechanisms like these. But I suspect
they will be used by distros only when their presence when disabled is
unnoticeable, or when a major portion of their users yell loudly enough
that they want these features, leaving the more specialized minority of
distro users without the features they need to fine-tune their
applications or their kernel.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
Re: [patch 10/10] Scheduler profiling - Use immediate values
Andi Kleen <[EMAIL PROTECTED]> wrote:
> Alexey Dobriyan <[EMAIL PROTECTED]> writes:
>> On Tue, Jul 03, 2007 at 12:40:56PM -0400, Mathieu Desnoyers wrote:
>> > Use immediate values with lower d-cache hit in optimized version as a
>> > condition for scheduler profiling call.
>>
>> I think it's better to put profile.c under CONFIG_PROFILING as
>> _expected_, so CONFIG_PROFILING=n users won't get any overhead,
>> immediate or not. That's what I'm going to do after test-booting a
>> bunch of kernels.
>
> No, it's better to handle this efficiently at runtime, e.g. for
> distribution kernels. Mathieu's patch is good

IMO you should combine them. For distributions, it may be good to
include profiling support unconditionally, but how many of the vanilla
kernel users are going to use profiling at all?

-- 
A man inserted an advertisement in the classifieds: "Wife Wanted."
The next day he received a hundred letters.
They all said the same thing: "You can have mine."
Friß, Spammer: [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: [patch 10/10] Scheduler profiling - Use immediate values
Alexey Dobriyan <[EMAIL PROTECTED]> writes:
> On Tue, Jul 03, 2007 at 12:40:56PM -0400, Mathieu Desnoyers wrote:
> > Use immediate values with lower d-cache hit in optimized version as a
> > condition for scheduler profiling call.
>
> I think it's better to put profile.c under CONFIG_PROFILING as
> _expected_, so CONFIG_PROFILING=n users won't get any overhead,
> immediate or not. That's what I'm going to do after test-booting a
> bunch of kernels.

No, it's better to handle this efficiently at runtime, e.g. for
distribution kernels. Mathieu's patch is good

-Andi
Re: [patch 10/10] Scheduler profiling - Use immediate values
On Tue, Jul 03, 2007 at 12:40:56PM -0400, Mathieu Desnoyers wrote:
> Use immediate values with lower d-cache hit in optimized version as a
> condition for scheduler profiling call.

I think it's better to put profile.c under CONFIG_PROFILING as
_expected_, so CONFIG_PROFILING=n users won't get any overhead,
immediate or not. That's what I'm going to do after test-booting a
bunch of kernels.

Thus, enabling the CONFIG_PROFILING option will buy you some overhead,
again, as _expected_.
Re: [patch 10/10] Scheduler profiling - Use immediate values
On Tue, Jul 03, 2007 at 02:57:48PM -0400, Mathieu Desnoyers wrote:
> * Alexey Dobriyan ([EMAIL PROTECTED]) wrote:
> > On Tue, Jul 03, 2007 at 12:40:56PM -0400, Mathieu Desnoyers wrote:
> > > Use immediate values with lower d-cache hit in optimized version as a
> > > condition for scheduler profiling call.
> >
> > How much difference in performance do you see?
>
> Hi Alexey,
>
> Please have a look at Documentation/immediate.txt for that information.
> Also note that the main advantage of the load immediate is to free a
> cache line. Therefore, I guess the best way to quantify the improvement
> it brings at one single site is not in terms of cycles, but in terms of
> the number of cache lines used by the scheduler code. Since memory
> bandwidth seems to be an increasing bottleneck (CPU frequency increases
> faster than the available memory bandwidth), it makes sense to free as
> many cache lines as we can.
>
> Measuring the overall impact on the system of this single modification
> results in the difference brought by one site being within the standard
> deviation of the normal samples. It will become significant when the
> number of immediate values used instead of global variables at hot
> kernel paths (weighted by the frequency at which the data is accessed)
> starts to be significant compared to the L1 data cache size.

L1 cache is 8K here. Just how many such variables should exist? On hot
paths!

> We could characterize this in memory-to-L1-cache transfers per second.
>
> On a 3GHz P4:
>
> memory read: ~48 cycles
>
> So we can definitely say that 48*HZ (approximation of the frequency at
> which the scheduler is called) won't make much difference, but as it
> grows, it will.
>
> On a 1000HZ system, it results in:
>
> 48000 cycles/second, or 16µs/second - a 0.0016% speedup.
>
> However, if we place this in code called much more often, such as
> do_page_fault, we get, with a hypothetical scenario of approximately
> 10 page faults per second:
>
> 480 cycles/second, or 160ns/second - a 0.000016% speedup.
>
> So as the number of immediate values used increases, the overall memory
> bandwidth required by the kernel will go down.

Adding so much infrastructure for something that you can't even measure
is totally unjustified. There are already too many places where
unlikely() and __read_mostly are used just because they can be used, so
adding yet another such very specific, let's call it annotation, seems
wrong to me.
Re: [patch 10/10] Scheduler profiling - Use immediate values
On Tue, Jul 03, 2007 at 02:57:48PM -0400, Mathieu Desnoyers wrote:
> * Alexey Dobriyan ([EMAIL PROTECTED]) wrote:
> > On Tue, Jul 03, 2007 at 12:40:56PM -0400, Mathieu Desnoyers wrote:
> > > Use immediate values with lower d-cache hit in optimized version as a
> > > condition for scheduler profiling call.
> >
> > How much difference in performance do you see?
>
> Hi Alexey,
>
> Please have a look at Documentation/immediate.txt for that information.
> Also note that the main advantage of the load immediate is to free a
> cache line. Therefore, I guess the best way to quantify the improvement
> it brings at one single site is not in terms of cycles, but in terms of
> the number of cache lines used by the scheduler code. Since memory
> bandwidth seems to be an increasing bottleneck (CPU frequency increases
> faster than the available memory bandwidth), it makes sense to free as
> many cache lines as we can.
>
> Measuring the overall impact on the system of this single modification
> results in the difference brought by one site being within the standard
> deviation of the normal samples. It will become significant when the
> number of immediate values used instead of global variables at hot
> kernel paths (weighted by the frequency at which the data is accessed)
> starts to be significant compared to the L1 data cache size. We could
> characterize this in memory-to-L1-cache transfers per second.
>
> On a 3GHz P4:
>
> memory read: ~48 cycles
>
> So we can definitely say that 48*HZ (approximation of the frequency at
> which the scheduler is called) won't make much difference, but as it
> grows, it will.
>
> On a 1000HZ system, it results in:
>
> 48000 cycles/second, or 16µs/second - a 0.0016% speedup.
>
> However, if we place this in code called much more often, such as
> do_page_fault, we get, with a hypothetical scenario of approximately
> 10 page faults per second:
>
> 480 cycles/second, or 160ns/second - a 0.000016% speedup.
>
> So as the number of immediate values used increases, the overall memory
> bandwidth required by the kernel will go down.

Might make a nice scientific paper, but even according to your own
optimistic numbers it's not realistic that you will ever achieve any
visible improvement, even if you found 100 places in hotpaths you could
mark this way.

And a better direction for hotpaths seems to be Andi's __cold/COLD in
-mm, without adding a separate framework for doing such things.

> Mathieu

cu
Adrian

-- 
"Is there not promise of rain?" Ling Tan asked suddenly out of the
darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
Re: [patch 10/10] Scheduler profiling - Use immediate values
* Alexey Dobriyan ([EMAIL PROTECTED]) wrote:
> On Tue, Jul 03, 2007 at 12:40:56PM -0400, Mathieu Desnoyers wrote:
> > Use immediate values with lower d-cache hit in optimized version as a
> > condition for scheduler profiling call.
>
> How much difference in performance do you see?

Hi Alexey,

Please have a look at Documentation/immediate.txt for that information.
Also note that the main advantage of the load immediate is to free a
cache line. Therefore, I guess the best way to quantify the improvement
it brings at one single site is not in terms of cycles, but in terms of
the number of cache lines used by the scheduler code. Since memory
bandwidth seems to be an increasing bottleneck (CPU frequency increases
faster than the available memory bandwidth), it makes sense to free as
many cache lines as we can.

Measuring the overall impact on the system of this single modification
results in the difference brought by one site being within the standard
deviation of the normal samples. It will become significant when the
number of immediate values used instead of global variables at hot
kernel paths (weighted by the frequency at which the data is accessed)
starts to be significant compared to the L1 data cache size. We could
characterize this in memory-to-L1-cache transfers per second.

On a 3GHz P4:

memory read: ~48 cycles

So we can definitely say that 48*HZ (approximation of the frequency at
which the scheduler is called) won't make much difference, but as it
grows, it will.

On a 1000HZ system, it results in:

48000 cycles/second, or 16µs/second - a 0.0016% speedup.

However, if we place this in code called much more often, such as
do_page_fault, we get, with a hypothetical scenario of approximately
10 page faults per second:

480 cycles/second, or 160ns/second - a 0.000016% speedup.

So as the number of immediate values used increases, the overall memory
bandwidth required by the kernel will go down.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
Re: [patch 10/10] Scheduler profiling - Use immediate values
On Tue, Jul 03, 2007 at 12:40:56PM -0400, Mathieu Desnoyers wrote:
> Use immediate values with lower d-cache hit in optimized version as a
> condition for scheduler profiling call.

How much difference in performance do you see?