Re: [RFC PATCH 00/11] mcount tracing utility

2008-01-08 Thread Tim Bird
Steven Rostedt wrote:
> The following patch series brings to vanilla Linux a bit of the RT kernel
> trace facility. This incorporates the "-pg" profiling option of gcc
> that will call the "mcount" function for all functions called in
> the kernel.
> 
> This patch series implements the code for x86 (32 and 64 bit), but
> other archs can easily be implemented as well.

Steven,

This is really exciting!

As a former maintainer of the (out-of-tree) Kernel Function Trace
system, I really welcome this.  I'm just getting out from under
a backlog of work due to the holiday break, but I'm very interested.
I will take a detailed look at this this week.

I have been working with -finstrument-functions for a few years
now, so I know of a few gotchas with that (e.g. It's currently broken
on ARM EABI with GCC 4.x)  This bug is one of the issues that has
prevented me from attempting to mainline the KFT work this last
year.

Please keep me CC'ed on developments in this area, and let me
know if there are any specific things I can do to help.  I'd be
very interested in helping out with non-x86 arch support.

Regards,
 -- Tim

=
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 00/11] mcount tracing utility

2008-01-08 Thread Tim Bird
Steven Rostedt wrote:
 The following patch series brings to vanilla Linux a bit of the RT kernel
 trace facility. This incorporates the -pg profiling option of gcc
 that will call the mcount function for all functions called in
 the kernel.
 
 This patch series implements the code for x86 (32 and 64 bit), but
 other archs can easily be implemented as well.

Steven,

This is really exciting!

As a former maintainer of the (out-of-tree) Kernel Function Trace
system, I really welcome this.  I'm just getting out from under
a backlog of work due to the holiday break, but I'm very interested.
I will take a detailed look at this this week.

I have been working with -finstrument-functions for a few years
now, so I know of a few gotchas with that (e.g. It's currently broken
on ARM EABI with GCC 4.x)  This bug is one of the issues that has
prevented me from attempting to mainline the KFT work this last
year.

Please keep me CC'ed on developments in this area, and let me
know if there are any specific things I can do to help.  I'd be
very interested in helping out with non-x86 arch support.

Regards,
 -- Tim

=
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 00/11] mcount tracing utility

2008-01-03 Thread Frank Ch. Eigler
Steven Rostedt <[EMAIL PROTECTED]> writes:

> The following patch series brings to vanilla Linux a bit of the RT kernel
> trace facility. This incorporates the "-pg" profiling option of gcc
> that will call the "mcount" function for all functions called in
> the kernel.
> [...]
> [Future:] SystemTap:
> --
> One thing that Arnaldo and I discussed last year was using systemtap to
> add hooks into the kernel to start and stop tracing.  

Sure.  The dual of this makes sense too: letting systemtap scripts
hook up to the mcount callback itself, for purposes beyond just
tracing the function calls.

> kprobes is too heavy to do on all funtion calls, but it would be
> perfect to add to non hot paths to start the tracer and stop the
> tracer.

(Note that kprobes are not the only event sources systemtap can use:
markers, timers, procfs control files, and some others.  Any
combination of these can be used in a script to express start/stop
decisions.)


- FChE
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 00/11] mcount tracing utility

2008-01-03 Thread Andi Kleen
Steven Rostedt <[EMAIL PROTECTED]> writes:

> The following patch series brings to vanilla Linux a bit of the RT kernel
> trace facility. This incorporates the "-pg" profiling option of gcc
> that will call the "mcount" function for all functions called in
> the kernel.

My personal feeling regarding this code was that it would be much 
simpler/cleaner
to implement a driver for the "jump tracer"s implemented in various CPUs.
Basically the CPU will write all jumps into a buffer by itself. That
allows you to do many traces (although not latency traces) too.

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 00/11] mcount tracing utility

2008-01-03 Thread Steven Rostedt

On Thu, 3 Jan 2008, Mathieu Desnoyers wrote:

> Hi Steven,
>
> Great work!

Thanks!

>
> (added Tim Bird, author of KFT/KFI to the CC list)

I'm currently investigating using -finstrument-functions instead of -pg,
but if the overhead is too much, I may try to incorporate both.

>
> One interesting aspect of LTTng is that is would be very lightweight.
> You seem to use interrupt disabling with your simple tracer and do a
> _lot_ of cacheline bouncing (trace_idx[NR_CPUS] is a very good exemple).

Please note that this tracer is more of a "simple example". There's lots
of improvements that can be made. It was meant more of to show what mcount
can bring than to push the tracer itself.

I want to stress that the tracer in this patch set is a *much* simplified
version of the latency_tracer in the RT patch. I want to start out simple,
complexity can come later ;-)

>
> LTTng would write the information to a per-cpu memory buffer in binary
> format. I see that it would be especially useful in flight recorder
> mode, where we overwrite the buffers without writing them to disk : when
> a problematic condition is reached, (a kernel oops would be a good one),
> then we just stop tracing and dump the last buffers to disk. In this
> case, we would have the last function calls that led to an OOPS.

This sounds great. My hope is that we can get the mcount (or cyg_profile)
functionality in the kernel that many different users can deploy.

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 00/11] mcount tracing utility

2008-01-03 Thread Mathieu Desnoyers
Hi Steven,

Great work!

(added Tim Bird, author of KFT/KFI to the CC list)

* Steven Rostedt ([EMAIL PROTECTED]) wrote:
>
...
> 
> Future:
> ---
> The way the mcount hook is done here, other utilities can easily add their
> own functions. Just care needs to be made not to call anything that is not
> marked with notrace, or you will crash the box with recursion. But
> even the simple tracer adds a "disabled" feature so in case it happens
> to call something that is not marked with notrace, it is a safety net
> not to kill the box.
> 
> I was originally going to use the relay system to record the data, but
> that had a chance of calling functions not marked with notrace. But, if
> for example LTTng wanted to use this, it could disable tracing on a CPU
> when doing the calls, and this will protect from recusion.
> 

Yes, I'd love to add this information source to LTTng. It simply boils
down to adding a "notrace" flag to LTTng tracing functions. Since I
don't use relay code _at all_ in the tracing path, there is no problem
with this (I only disable preemption and do "local" atomic operations on
per-cpu variables). Then I would have to do the glue code that registers
the LTTng handler to your mcount mechanism.

One interesting aspect of LTTng is that is would be very lightweight.
You seem to use interrupt disabling with your simple tracer and do a
_lot_ of cacheline bouncing (trace_idx[NR_CPUS] is a very good exemple).

LTTng would write the information to a per-cpu memory buffer in binary
format. I see that it would be especially useful in flight recorder
mode, where we overwrite the buffers without writing them to disk : when
a problematic condition is reached, (a kernel oops would be a good one),
then we just stop tracing and dump the last buffers to disk. In this
case, we would have the last function calls that led to an OOPS.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 00/11] mcount tracing utility

2008-01-03 Thread Mathieu Desnoyers
Hi Steven,

Great work!

(added Tim Bird, author of KFT/KFI to the CC list)

* Steven Rostedt ([EMAIL PROTECTED]) wrote:

...
 
 Future:
 ---
 The way the mcount hook is done here, other utilities can easily add their
 own functions. Just care needs to be made not to call anything that is not
 marked with notrace, or you will crash the box with recursion. But
 even the simple tracer adds a disabled feature so in case it happens
 to call something that is not marked with notrace, it is a safety net
 not to kill the box.
 
 I was originally going to use the relay system to record the data, but
 that had a chance of calling functions not marked with notrace. But, if
 for example LTTng wanted to use this, it could disable tracing on a CPU
 when doing the calls, and this will protect from recusion.
 

Yes, I'd love to add this information source to LTTng. It simply boils
down to adding a notrace flag to LTTng tracing functions. Since I
don't use relay code _at all_ in the tracing path, there is no problem
with this (I only disable preemption and do local atomic operations on
per-cpu variables). Then I would have to do the glue code that registers
the LTTng handler to your mcount mechanism.

One interesting aspect of LTTng is that is would be very lightweight.
You seem to use interrupt disabling with your simple tracer and do a
_lot_ of cacheline bouncing (trace_idx[NR_CPUS] is a very good exemple).

LTTng would write the information to a per-cpu memory buffer in binary
format. I see that it would be especially useful in flight recorder
mode, where we overwrite the buffers without writing them to disk : when
a problematic condition is reached, (a kernel oops would be a good one),
then we just stop tracing and dump the last buffers to disk. In this
case, we would have the last function calls that led to an OOPS.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 00/11] mcount tracing utility

2008-01-03 Thread Steven Rostedt

On Thu, 3 Jan 2008, Mathieu Desnoyers wrote:

 Hi Steven,

 Great work!

Thanks!


 (added Tim Bird, author of KFT/KFI to the CC list)

I'm currently investigating using -finstrument-functions instead of -pg,
but if the overhead is too much, I may try to incorporate both.


 One interesting aspect of LTTng is that is would be very lightweight.
 You seem to use interrupt disabling with your simple tracer and do a
 _lot_ of cacheline bouncing (trace_idx[NR_CPUS] is a very good exemple).

Please note that this tracer is more of a simple example. There's lots
of improvements that can be made. It was meant more of to show what mcount
can bring than to push the tracer itself.

I want to stress that the tracer in this patch set is a *much* simplified
version of the latency_tracer in the RT patch. I want to start out simple,
complexity can come later ;-)


 LTTng would write the information to a per-cpu memory buffer in binary
 format. I see that it would be especially useful in flight recorder
 mode, where we overwrite the buffers without writing them to disk : when
 a problematic condition is reached, (a kernel oops would be a good one),
 then we just stop tracing and dump the last buffers to disk. In this
 case, we would have the last function calls that led to an OOPS.

This sounds great. My hope is that we can get the mcount (or cyg_profile)
functionality in the kernel that many different users can deploy.

-- Steve

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 00/11] mcount tracing utility

2008-01-03 Thread Andi Kleen
Steven Rostedt [EMAIL PROTECTED] writes:

 The following patch series brings to vanilla Linux a bit of the RT kernel
 trace facility. This incorporates the -pg profiling option of gcc
 that will call the mcount function for all functions called in
 the kernel.

My personal feeling regarding this code was that it would be much 
simpler/cleaner
to implement a driver for the jump tracers implemented in various CPUs.
Basically the CPU will write all jumps into a buffer by itself. That
allows you to do many traces (although not latency traces) too.

-Andi
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH 00/11] mcount tracing utility

2008-01-02 Thread Steven Rostedt

The following patch series brings to vanilla Linux a bit of the RT kernel
trace facility. This incorporates the "-pg" profiling option of gcc
that will call the "mcount" function for all functions called in
the kernel.

This patch series implements the code for x86 (32 and 64 bit), but
other archs can easily be implemented as well.

Some Background:


A while back, Ingo Molnar and William Lee Irwin III created a latency tracer
to find problem latency areas in the kernel for the RT patch.  This tracer
became a very integral part of the RT kernel in solving where latency hot
spots were.  One of the features that the latency tracer added was a
function trace.  This function tracer would record all functions that
were called (implemented by the gcc "-pg" option) and would show what was
called when interrupts or preemption was turned off.

This feature is also very helpful in normal debugging. So it's been talked
about taking bits and pieces from the RT latency tracer and bring them
to LKML. But no one had the time to do it.

Arnaldo Carvalho de Melo took a crack at it. He pulled out the mcount
as well as part of the tracing code and made it generic from the point
of the tracing code.  I'm not sure why this stopped. Probably because
Arnaldo is a very busy man, and his efforts had to be utilized elsewhere.

While I still maintain my own Logdev utility:

  http://rostedt.homelinux.com/logdev

I came across a need to do the mcount with logdev too. I was successful
but found that it became very dependent on a lot of code. One thing that
I liked about my logdev utility was that it was very non-intrusive, and has
been easy to port from the Linux 2.0 days. I did not want to burden the
logdev patch with the intrusiveness of mcount (not really that intrusive,
it just needs to add a "notrace" annotation to functions in the kernel
that will cause more conflicts in applying patches for me).

Being close to the holidays, I grabbed Arnaldos old patches and started
massaging them into something that could be useful for logdev, and what
I found out (and talking this over with Arnaldo too) that this can
be much more useful for others as well.

The main thing I changed, was that I made the mcount function itself
generic, and not the dependency on the tracing code.  That is I added

register_mcount_function()
 and
clear_mcount_function()

So when ever mcount is enabled and a function is registered that function
is called for all functions in the kernel that is not labeled with the
"notrace" annotation.

The key thing here is that *any* utility can now hook its own function into
mcount!

The Simple Tracer:
--

To show the power of this I also massaged the tracer code that Arnaldo pulled
from the RT patch and made it be a nice example of what can be done
with this.

The function that is registered to mcount has the prototype:

 void func(unsigned long ip, unsigned long parent_ip);

The ip is the address of the function and parent_ip is the address of
the parent function that called it.

The x86_64 version has the assembly call the registered function directly
to save having to do a double function call.

To enable mcount, a sysctl is added:

   /proc/sys/kernel/mcount_enabled

Once mcount is enabled, when a function is registed, it will be called by
all functions. The tracer in this patch series shows how this is done.
It adds a directory in the debugfs, called mctracer. With a ctrl file that
will allow the user have the tracer register its function.  Note, the order
of enabling mcount and registering a function is not important, but both
must be done to initiate the tracing. That is, you can disable tracing
by either disabling mcount or by clearing the registered function.

Only one function may be registered at a time. If another function is
registered, it will simply override what ever was there previously.

Here's a simple example of the tracer output:

CPU 2: hackbench:11867 preempt_schedule+0xc/0x84 <-- 
avc_has_perm_noaudit+0x45d/0x52c
CPU 1: hackbench:12052 selinux_file_permission+0x10/0x11c <-- 
security_file_permission+0x16/0x18
CPU 3: hackbench:12017 update_curr+0xe/0x8b <-- put_prev_task_fair+0x24/0x4c
CPU 2: hackbench:11867 avc_audit+0x16/0x9e3 <-- avc_has_perm+0x51/0x63
CPU 0: hackbench:12019 socket_has_perm+0x16/0x7c <-- 
selinux_socket_sendmsg+0x27/0x3e
CPU 1: hackbench:12052 file_has_perm+0x16/0xbb <-- 
selinux_file_permission+0x104/0x11c

This is formated like:

 CPU : :  <-- 


Overhead:
-

Note that having mcount compiled in seems to show a little overhead.

Here's 3 runs of hackbench 50 without the patches:
Time: 2.137
Time: 2.283
Time: 2.245

 Avg: 2.221

and here's 3 runs with the patches (without tracing on):
Time: 2.738
Time: 2.469
Time: 2.388

  Avg: 2.531

So it is a 13% overhead when enabled (according to hackbench).

But full tracing can cause a bit more problems:

# hackbench 50
Time: 113.350

  113.350!

But this is tracing *every* function call!


Future:

[RFC PATCH 00/11] mcount tracing utility

2008-01-02 Thread Steven Rostedt

The following patch series brings to vanilla Linux a bit of the RT kernel
trace facility. This incorporates the -pg profiling option of gcc
that will call the mcount function for all functions called in
the kernel.

This patch series implements the code for x86 (32 and 64 bit), but
other archs can easily be implemented as well.

Some Background:


A while back, Ingo Molnar and William Lee Irwin III created a latency tracer
to find problem latency areas in the kernel for the RT patch.  This tracer
became a very integral part of the RT kernel in solving where latency hot
spots were.  One of the features that the latency tracer added was a
function trace.  This function tracer would record all functions that
were called (implemented by the gcc -pg option) and would show what was
called when interrupts or preemption was turned off.

This feature is also very helpful in normal debugging. So it's been talked
about taking bits and pieces from the RT latency tracer and bring them
to LKML. But no one had the time to do it.

Arnaldo Carvalho de Melo took a crack at it. He pulled out the mcount
as well as part of the tracing code and made it generic from the point
of the tracing code.  I'm not sure why this stopped. Probably because
Arnaldo is a very busy man, and his efforts had to be utilized elsewhere.

While I still maintain my own Logdev utility:

  http://rostedt.homelinux.com/logdev

I came across a need to do the mcount with logdev too. I was successful
but found that it became very dependent on a lot of code. One thing that
I liked about my logdev utility was that it was very non-intrusive, and has
been easy to port from the Linux 2.0 days. I did not want to burden the
logdev patch with the intrusiveness of mcount (not really that intrusive,
it just needs to add a notrace annotation to functions in the kernel
that will cause more conflicts in applying patches for me).

Being close to the holidays, I grabbed Arnaldos old patches and started
massaging them into something that could be useful for logdev, and what
I found out (and talking this over with Arnaldo too) that this can
be much more useful for others as well.

The main thing I changed, was that I made the mcount function itself
generic, and not the dependency on the tracing code.  That is I added

register_mcount_function()
 and
clear_mcount_function()

So when ever mcount is enabled and a function is registered that function
is called for all functions in the kernel that is not labeled with the
notrace annotation.

The key thing here is that *any* utility can now hook its own function into
mcount!

The Simple Tracer:
--

To show the power of this I also massaged the tracer code that Arnaldo pulled
from the RT patch and made it be a nice example of what can be done
with this.

The function that is registered to mcount has the prototype:

 void func(unsigned long ip, unsigned long parent_ip);

The ip is the address of the function and parent_ip is the address of
the parent function that called it.

The x86_64 version has the assembly call the registered function directly
to save having to do a double function call.

To enable mcount, a sysctl is added:

   /proc/sys/kernel/mcount_enabled

Once mcount is enabled, when a function is registed, it will be called by
all functions. The tracer in this patch series shows how this is done.
It adds a directory in the debugfs, called mctracer. With a ctrl file that
will allow the user have the tracer register its function.  Note, the order
of enabling mcount and registering a function is not important, but both
must be done to initiate the tracing. That is, you can disable tracing
by either disabling mcount or by clearing the registered function.

Only one function may be registered at a time. If another function is
registered, it will simply override what ever was there previously.

Here's a simple example of the tracer output:

CPU 2: hackbench:11867 preempt_schedule+0xc/0x84 -- 
avc_has_perm_noaudit+0x45d/0x52c
CPU 1: hackbench:12052 selinux_file_permission+0x10/0x11c -- 
security_file_permission+0x16/0x18
CPU 3: hackbench:12017 update_curr+0xe/0x8b -- put_prev_task_fair+0x24/0x4c
CPU 2: hackbench:11867 avc_audit+0x16/0x9e3 -- avc_has_perm+0x51/0x63
CPU 0: hackbench:12019 socket_has_perm+0x16/0x7c -- 
selinux_socket_sendmsg+0x27/0x3e
CPU 1: hackbench:12052 file_has_perm+0x16/0xbb -- 
selinux_file_permission+0x104/0x11c

This is formated like:

 CPU CPU#: task-comm:task-pid function -- parent-function


Overhead:
-

Note that having mcount compiled in seems to show a little overhead.

Here's 3 runs of hackbench 50 without the patches:
Time: 2.137
Time: 2.283
Time: 2.245

 Avg: 2.221

and here's 3 runs with the patches (without tracing on):
Time: 2.738
Time: 2.469
Time: 2.388

  Avg: 2.531

So it is a 13% overhead when enabled (according to hackbench).

But full tracing can cause a bit more problems:

# hackbench 50
Time: 113.350

  113.350!

But this is tracing