Re: [RFC PATCH 00/11] mcount tracing utility
Steven Rostedt wrote: > The following patch series brings to vanilla Linux a bit of the RT kernel > trace facility. This incorporates the "-pg" profiling option of gcc > that will call the "mcount" function for all functions called in > the kernel. > > This patch series implements the code for x86 (32 and 64 bit), but > other archs can easily be implemented as well. Steven, This is really exciting! As a former maintainer of the (out-of-tree) Kernel Function Trace system, I really welcome this. I'm just getting out from under a backlog of work due to the holiday break, but I'm very interested. I will take a detailed look at this this week. I have been working with -finstrument-functions for a few years now, so I know of a few gotchas with that (e.g. It's currently broken on ARM EABI with GCC 4.x) This bug is one of the issues that has prevented me from attempting to mainline the KFT work this last year. Please keep me CC'ed on developments in this area, and let me know if there are any specific things I can do to help. I'd be very interested in helping out with non-x86 arch support. Regards, -- Tim = Tim Bird Architecture Group Chair, CE Linux Forum Senior Staff Engineer, Sony Corporation of America = -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 00/11] mcount tracing utility
Steven Rostedt wrote: The following patch series brings to vanilla Linux a bit of the RT kernel trace facility. This incorporates the -pg profiling option of gcc that will call the mcount function for all functions called in the kernel. This patch series implements the code for x86 (32 and 64 bit), but other archs can easily be implemented as well. Steven, This is really exciting! As a former maintainer of the (out-of-tree) Kernel Function Trace system, I really welcome this. I'm just getting out from under a backlog of work due to the holiday break, but I'm very interested. I will take a detailed look at this this week. I have been working with -finstrument-functions for a few years now, so I know of a few gotchas with that (e.g. It's currently broken on ARM EABI with GCC 4.x) This bug is one of the issues that has prevented me from attempting to mainline the KFT work this last year. Please keep me CC'ed on developments in this area, and let me know if there are any specific things I can do to help. I'd be very interested in helping out with non-x86 arch support. Regards, -- Tim = Tim Bird Architecture Group Chair, CE Linux Forum Senior Staff Engineer, Sony Corporation of America = -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 00/11] mcount tracing utility
Steven Rostedt <[EMAIL PROTECTED]> writes: > The following patch series brings to vanilla Linux a bit of the RT kernel > trace facility. This incorporates the "-pg" profiling option of gcc > that will call the "mcount" function for all functions called in > the kernel. > [...] > [Future:] SystemTap: > -- > One thing that Arnaldo and I discussed last year was using systemtap to > add hooks into the kernel to start and stop tracing. Sure. The dual of this makes sense too: letting systemtap scripts hook up to the mcount callback itself, for purposes beyond just tracing the function calls. > kprobes is too heavy to do on all funtion calls, but it would be > perfect to add to non hot paths to start the tracer and stop the > tracer. (Note that kprobes are not the only event sources systemtap can use: markers, timers, procfs control files, and some others. Any combination of these can be used in a script to express start/stop decisions.) - FChE -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 00/11] mcount tracing utility
Steven Rostedt <[EMAIL PROTECTED]> writes: > The following patch series brings to vanilla Linux a bit of the RT kernel > trace facility. This incorporates the "-pg" profiling option of gcc > that will call the "mcount" function for all functions called in > the kernel. My personal feeling regarding this code was that it would be much simpler/cleaner to implement a driver for the "jump tracer"s implemented in various CPUs. Basically the CPU will write all jumps into a buffer by itself. That allows you to do many traces (although not latency traces) too. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 00/11] mcount tracing utility
On Thu, 3 Jan 2008, Mathieu Desnoyers wrote: > Hi Steven, > > Great work! Thanks! > > (added Tim Bird, author of KFT/KFI to the CC list) I'm currently investigating using -finstrument-functions instead of -pg, but if the overhead is too much, I may try to incorporate both. > > One interesting aspect of LTTng is that is would be very lightweight. > You seem to use interrupt disabling with your simple tracer and do a > _lot_ of cacheline bouncing (trace_idx[NR_CPUS] is a very good exemple). Please note that this tracer is more of a "simple example". There's lots of improvements that can be made. It was meant more of to show what mcount can bring than to push the tracer itself. I want to stress that the tracer in this patch set is a *much* simplified version of the latency_tracer in the RT patch. I want to start out simple, complexity can come later ;-) > > LTTng would write the information to a per-cpu memory buffer in binary > format. I see that it would be especially useful in flight recorder > mode, where we overwrite the buffers without writing them to disk : when > a problematic condition is reached, (a kernel oops would be a good one), > then we just stop tracing and dump the last buffers to disk. In this > case, we would have the last function calls that led to an OOPS. This sounds great. My hope is that we can get the mcount (or cyg_profile) functionality in the kernel that many different users can deploy. -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 00/11] mcount tracing utility
Hi Steven, Great work! (added Tim Bird, author of KFT/KFI to the CC list) * Steven Rostedt ([EMAIL PROTECTED]) wrote: > ... > > Future: > --- > The way the mcount hook is done here, other utilities can easily add their > own functions. Just care needs to be made not to call anything that is not > marked with notrace, or you will crash the box with recursion. But > even the simple tracer adds a "disabled" feature so in case it happens > to call something that is not marked with notrace, it is a safety net > not to kill the box. > > I was originally going to use the relay system to record the data, but > that had a chance of calling functions not marked with notrace. But, if > for example LTTng wanted to use this, it could disable tracing on a CPU > when doing the calls, and this will protect from recusion. > Yes, I'd love to add this information source to LTTng. It simply boils down to adding a "notrace" flag to LTTng tracing functions. Since I don't use relay code _at all_ in the tracing path, there is no problem with this (I only disable preemption and do "local" atomic operations on per-cpu variables). Then I would have to do the glue code that registers the LTTng handler to your mcount mechanism. One interesting aspect of LTTng is that is would be very lightweight. You seem to use interrupt disabling with your simple tracer and do a _lot_ of cacheline bouncing (trace_idx[NR_CPUS] is a very good exemple). LTTng would write the information to a per-cpu memory buffer in binary format. I see that it would be especially useful in flight recorder mode, where we overwrite the buffers without writing them to disk : when a problematic condition is reached, (a kernel oops would be a good one), then we just stop tracing and dump the last buffers to disk. In this case, we would have the last function calls that led to an OOPS. Mathieu -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 00/11] mcount tracing utility
Hi Steven, Great work! (added Tim Bird, author of KFT/KFI to the CC list) * Steven Rostedt ([EMAIL PROTECTED]) wrote: ... Future: --- The way the mcount hook is done here, other utilities can easily add their own functions. Just care needs to be made not to call anything that is not marked with notrace, or you will crash the box with recursion. But even the simple tracer adds a disabled feature so in case it happens to call something that is not marked with notrace, it is a safety net not to kill the box. I was originally going to use the relay system to record the data, but that had a chance of calling functions not marked with notrace. But, if for example LTTng wanted to use this, it could disable tracing on a CPU when doing the calls, and this will protect from recusion. Yes, I'd love to add this information source to LTTng. It simply boils down to adding a notrace flag to LTTng tracing functions. Since I don't use relay code _at all_ in the tracing path, there is no problem with this (I only disable preemption and do local atomic operations on per-cpu variables). Then I would have to do the glue code that registers the LTTng handler to your mcount mechanism. One interesting aspect of LTTng is that is would be very lightweight. You seem to use interrupt disabling with your simple tracer and do a _lot_ of cacheline bouncing (trace_idx[NR_CPUS] is a very good exemple). LTTng would write the information to a per-cpu memory buffer in binary format. I see that it would be especially useful in flight recorder mode, where we overwrite the buffers without writing them to disk : when a problematic condition is reached, (a kernel oops would be a good one), then we just stop tracing and dump the last buffers to disk. In this case, we would have the last function calls that led to an OOPS. Mathieu -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 00/11] mcount tracing utility
On Thu, 3 Jan 2008, Mathieu Desnoyers wrote: Hi Steven, Great work! Thanks! (added Tim Bird, author of KFT/KFI to the CC list) I'm currently investigating using -finstrument-functions instead of -pg, but if the overhead is too much, I may try to incorporate both. One interesting aspect of LTTng is that is would be very lightweight. You seem to use interrupt disabling with your simple tracer and do a _lot_ of cacheline bouncing (trace_idx[NR_CPUS] is a very good exemple). Please note that this tracer is more of a simple example. There's lots of improvements that can be made. It was meant more of to show what mcount can bring than to push the tracer itself. I want to stress that the tracer in this patch set is a *much* simplified version of the latency_tracer in the RT patch. I want to start out simple, complexity can come later ;-) LTTng would write the information to a per-cpu memory buffer in binary format. I see that it would be especially useful in flight recorder mode, where we overwrite the buffers without writing them to disk : when a problematic condition is reached, (a kernel oops would be a good one), then we just stop tracing and dump the last buffers to disk. In this case, we would have the last function calls that led to an OOPS. This sounds great. My hope is that we can get the mcount (or cyg_profile) functionality in the kernel that many different users can deploy. -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 00/11] mcount tracing utility
Steven Rostedt [EMAIL PROTECTED] writes: The following patch series brings to vanilla Linux a bit of the RT kernel trace facility. This incorporates the -pg profiling option of gcc that will call the mcount function for all functions called in the kernel. My personal feeling regarding this code was that it would be much simpler/cleaner to implement a driver for the jump tracers implemented in various CPUs. Basically the CPU will write all jumps into a buffer by itself. That allows you to do many traces (although not latency traces) too. -Andi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH 00/11] mcount tracing utility
The following patch series brings to vanilla Linux a bit of the RT kernel trace facility. This incorporates the "-pg" profiling option of gcc that will call the "mcount" function for all functions called in the kernel. This patch series implements the code for x86 (32 and 64 bit), but other archs can easily be implemented as well. Some Background: A while back, Ingo Molnar and William Lee Irwin III created a latency tracer to find problem latency areas in the kernel for the RT patch. This tracer became a very integral part of the RT kernel in solving where latency hot spots were. One of the features that the latency tracer added was a function trace. This function tracer would record all functions that were called (implemented by the gcc "-pg" option) and would show what was called when interrupts or preemption was turned off. This feature is also very helpful in normal debugging. So it's been talked about taking bits and pieces from the RT latency tracer and bring them to LKML. But no one had the time to do it. Arnaldo Carvalho de Melo took a crack at it. He pulled out the mcount as well as part of the tracing code and made it generic from the point of the tracing code. I'm not sure why this stopped. Probably because Arnaldo is a very busy man, and his efforts had to be utilized elsewhere. While I still maintain my own Logdev utility: http://rostedt.homelinux.com/logdev I came across a need to do the mcount with logdev too. I was successful but found that it became very dependent on a lot of code. One thing that I liked about my logdev utility was that it was very non-intrusive, and has been easy to port from the Linux 2.0 days. I did not want to burden the logdev patch with the intrusiveness of mcount (not really that intrusive, it just needs to add a "notrace" annotation to functions in the kernel that will cause more conflicts in applying patches for me). Being close to the holidays, I grabbed Arnaldos old patches and started massaging them into something that could be useful for logdev, and what I found out (and talking this over with Arnaldo too) that this can be much more useful for others as well. The main thing I changed, was that I made the mcount function itself generic, and not the dependency on the tracing code. That is I added register_mcount_function() and clear_mcount_function() So when ever mcount is enabled and a function is registered that function is called for all functions in the kernel that is not labeled with the "notrace" annotation. The key thing here is that *any* utility can now hook its own function into mcount! The Simple Tracer: -- To show the power of this I also massaged the tracer code that Arnaldo pulled from the RT patch and made it be a nice example of what can be done with this. The function that is registered to mcount has the prototype: void func(unsigned long ip, unsigned long parent_ip); The ip is the address of the function and parent_ip is the address of the parent function that called it. The x86_64 version has the assembly call the registered function directly to save having to do a double function call. To enable mcount, a sysctl is added: /proc/sys/kernel/mcount_enabled Once mcount is enabled, when a function is registed, it will be called by all functions. The tracer in this patch series shows how this is done. It adds a directory in the debugfs, called mctracer. With a ctrl file that will allow the user have the tracer register its function. Note, the order of enabling mcount and registering a function is not important, but both must be done to initiate the tracing. That is, you can disable tracing by either disabling mcount or by clearing the registered function. Only one function may be registered at a time. If another function is registered, it will simply override what ever was there previously. Here's a simple example of the tracer output: CPU 2: hackbench:11867 preempt_schedule+0xc/0x84 <-- avc_has_perm_noaudit+0x45d/0x52c CPU 1: hackbench:12052 selinux_file_permission+0x10/0x11c <-- security_file_permission+0x16/0x18 CPU 3: hackbench:12017 update_curr+0xe/0x8b <-- put_prev_task_fair+0x24/0x4c CPU 2: hackbench:11867 avc_audit+0x16/0x9e3 <-- avc_has_perm+0x51/0x63 CPU 0: hackbench:12019 socket_has_perm+0x16/0x7c <-- selinux_socket_sendmsg+0x27/0x3e CPU 1: hackbench:12052 file_has_perm+0x16/0xbb <-- selinux_file_permission+0x104/0x11c This is formated like: CPU : : <-- Overhead: - Note that having mcount compiled in seems to show a little overhead. Here's 3 runs of hackbench 50 without the patches: Time: 2.137 Time: 2.283 Time: 2.245 Avg: 2.221 and here's 3 runs with the patches (without tracing on): Time: 2.738 Time: 2.469 Time: 2.388 Avg: 2.531 So it is a 13% overhead when enabled (according to hackbench). But full tracing can cause a bit more problems: # hackbench 50 Time: 113.350 113.350! But this is tracing *every* function call! Future:
[RFC PATCH 00/11] mcount tracing utility
The following patch series brings to vanilla Linux a bit of the RT kernel trace facility. This incorporates the -pg profiling option of gcc that will call the mcount function for all functions called in the kernel. This patch series implements the code for x86 (32 and 64 bit), but other archs can easily be implemented as well. Some Background: A while back, Ingo Molnar and William Lee Irwin III created a latency tracer to find problem latency areas in the kernel for the RT patch. This tracer became a very integral part of the RT kernel in solving where latency hot spots were. One of the features that the latency tracer added was a function trace. This function tracer would record all functions that were called (implemented by the gcc -pg option) and would show what was called when interrupts or preemption was turned off. This feature is also very helpful in normal debugging. So it's been talked about taking bits and pieces from the RT latency tracer and bring them to LKML. But no one had the time to do it. Arnaldo Carvalho de Melo took a crack at it. He pulled out the mcount as well as part of the tracing code and made it generic from the point of the tracing code. I'm not sure why this stopped. Probably because Arnaldo is a very busy man, and his efforts had to be utilized elsewhere. While I still maintain my own Logdev utility: http://rostedt.homelinux.com/logdev I came across a need to do the mcount with logdev too. I was successful but found that it became very dependent on a lot of code. One thing that I liked about my logdev utility was that it was very non-intrusive, and has been easy to port from the Linux 2.0 days. I did not want to burden the logdev patch with the intrusiveness of mcount (not really that intrusive, it just needs to add a notrace annotation to functions in the kernel that will cause more conflicts in applying patches for me). Being close to the holidays, I grabbed Arnaldos old patches and started massaging them into something that could be useful for logdev, and what I found out (and talking this over with Arnaldo too) that this can be much more useful for others as well. The main thing I changed, was that I made the mcount function itself generic, and not the dependency on the tracing code. That is I added register_mcount_function() and clear_mcount_function() So when ever mcount is enabled and a function is registered that function is called for all functions in the kernel that is not labeled with the notrace annotation. The key thing here is that *any* utility can now hook its own function into mcount! The Simple Tracer: -- To show the power of this I also massaged the tracer code that Arnaldo pulled from the RT patch and made it be a nice example of what can be done with this. The function that is registered to mcount has the prototype: void func(unsigned long ip, unsigned long parent_ip); The ip is the address of the function and parent_ip is the address of the parent function that called it. The x86_64 version has the assembly call the registered function directly to save having to do a double function call. To enable mcount, a sysctl is added: /proc/sys/kernel/mcount_enabled Once mcount is enabled, when a function is registed, it will be called by all functions. The tracer in this patch series shows how this is done. It adds a directory in the debugfs, called mctracer. With a ctrl file that will allow the user have the tracer register its function. Note, the order of enabling mcount and registering a function is not important, but both must be done to initiate the tracing. That is, you can disable tracing by either disabling mcount or by clearing the registered function. Only one function may be registered at a time. If another function is registered, it will simply override what ever was there previously. Here's a simple example of the tracer output: CPU 2: hackbench:11867 preempt_schedule+0xc/0x84 -- avc_has_perm_noaudit+0x45d/0x52c CPU 1: hackbench:12052 selinux_file_permission+0x10/0x11c -- security_file_permission+0x16/0x18 CPU 3: hackbench:12017 update_curr+0xe/0x8b -- put_prev_task_fair+0x24/0x4c CPU 2: hackbench:11867 avc_audit+0x16/0x9e3 -- avc_has_perm+0x51/0x63 CPU 0: hackbench:12019 socket_has_perm+0x16/0x7c -- selinux_socket_sendmsg+0x27/0x3e CPU 1: hackbench:12052 file_has_perm+0x16/0xbb -- selinux_file_permission+0x104/0x11c This is formated like: CPU CPU#: task-comm:task-pid function -- parent-function Overhead: - Note that having mcount compiled in seems to show a little overhead. Here's 3 runs of hackbench 50 without the patches: Time: 2.137 Time: 2.283 Time: 2.245 Avg: 2.221 and here's 3 runs with the patches (without tracing on): Time: 2.738 Time: 2.469 Time: 2.388 Avg: 2.531 So it is a 13% overhead when enabled (according to hackbench). But full tracing can cause a bit more problems: # hackbench 50 Time: 113.350 113.350! But this is tracing