Re: [RFC PATCH -tip ] tracing: make a snapshot feature available from userspace.

2012-07-19 Thread Hiraku Toyooka

Hello,
Sorry for the late reply.

(2012/07/12 3:00), Rob Landley wrote:

How about:

CONFIG_TRACER_MAX_TRACE makes a generic snapshot feature available to
all tracers. (Some tracers, such as "irqsoff" or "wakeup", use their own
internal snapshot implementation.)


Thanks, but I think the following one is more suitable.

(Some tracers, such as "irqsoff" or "wakeup", already use the snapshot
implementation internally)


This implies that setting flag is a NOP for them, rather than "if you
take a snapshot, they'll stomp all over the buffer".



How about:
(Some tracers which record max latency, such as "irqsoff" or "wakeup",
can't use this feature, since those are already using same buffer
internally.)


(Query: do you have to free the buffer after taking a snapshot below?)


No, we don't always need to free the buffer, although we can free it
when the snapshot becomes unnecessary. We can also reuse the buffer if
we'd like to take the next snapshot.
(I'll add this description.)


Actually I was worried about the lifetime rules for the buffer (when
does it need to be disposed of, and who is responsible for doing so?)
but it looks like ftrace only allows one trace to be going on in the
entire system at any given time, so all this context is kernel global
anyway...



Since current ftrace supposes single user, I think it's enough now.



Buffer allocation is a separate action because taking a snapshot is a
low-latency operation that should never fail. Got it.

But again, why couldn't open() do one and read() (from position 0) do
the other? And then if you wanted to take another snapshot you could
lseek() back to 0...

*shrug* I suppose the way you have it works. Just curious.



Taking a snapshot by using read() will cause a problem. Users don't
always want to read a snapshot just after taking the snapshot. Users
may want to transfer the snapshot data to persistent storage after
usage of system resources(such as CPU or I/O) becomes low. So we should
separate taking a snapshot from reading data.



Why is there only _one_ snapshot buffer?


Because of easy reviewing, I've decided to implement step by step. So
this patch just provide "only one" spare buffer. However, I got some
feedback for multiple snapshot at LinuxCon Japan 2012. So I may extend
this feature.


The implementation can change all you like after the code goes in, but
you really have to get the API right because you're going to be stuck
supporting it FOREVER.



I see. I'm now thinking extensible API.



+Here is an example of using the snapshot feature.
+
+ # echo nop > current_tracer
+ # echo 1 > snapshot_enabled
+ # echo 1 > events/sched/enable
+ [...]
+ # echo 1 > snapshot_pipe
Is the order significant here? 

If the current_tracer is a special tracer such as irqsoff,
snapshot_enabled can't be set to 1. So current_tracer has to be
configured to appropriate tracer in the first place.


But they aren't compatible anyway? (So presumably this call failing is
part of the enforcement mechanism by which the incompatible combination
is disallowed?)



Yes, we should have the special tracers fail to allocate a spare buffer.


What happens if you do it the other way around? (echo 1 >
snapshot_enabled and then echo irqsoff > current_tracer)?



Then, "echo irqsoff > current_tracer" succeed, and snapshot_enabled turn
into 0. (But in this patch, it remains 1. This is a bug.)


I'm guessing the reason buffer switching was described above is to
explain that taking a snapshot blanks the current trace history since
you switch to an empty buffer?


Yes. Actually, this snapshot feature only swaps the current buffer and
the spare buffer. Now I'm considering to call it "swap" instead of
"snapshot"...


I'm all for clarifying what it does, but we already have a subsystem
called "swap" (swap partitions, swapon/swapoff) so that name isn't
necessarily less confusing.



Hmm, a good name doesn't occur to me. I'll continue to call this feature
snapshot.


The size of two buffers is same because this is swap.


So the new one is also buffer_size_kb?



Yes.


Out of curiosity, what happens if you go:

  echo nop > current_tracer
  echo 1 > snapshot_enabled
  echo 64 > buffer_size_kb

(Techncially current_tracer is still nop...)



Write to buffer_size_kb affects both buffers when the snapshot is
enabled. In that case, the spare buffer is allocated and becomes same
size as the current buffer after second line. The size of both buffers
becomes 64KB after third line.



The reason of the separated operations is to swap two buffers
immediately without fail. if opening snapshot_pipe allocates a buffer,
we need a special userspace command to realize that.


Um, cat can open a file? I don't understand.

If you need a delay between open and read, you can do that from the
command line with something like:

  (read -n 1 < /dev/tty; cat > ~/walrus.txt) < /path/to/thingy

I dunno if that's a good idea, I'm just trying to understand the design
here...


Your method may be possible, bu

Re: [RFC PATCH -tip ] tracing: make a snapshot feature available from userspace.

2012-07-19 Thread Hiraku Toyooka

Hello, Steven,
(Sorry for the late reply.)

Tnank you for your comments.

(2012/07/12 8:26), Steven Rostedt wrote:

+Snapshot
+
+If CONFIG_TRACER_MAX_TRACE is set, the (generic) snapshot
+feature is available in all tracers except for the special
+tracers which use a snapshot inside themselves(such as "irqsoff"
+or "wakeup").


I find this kind of ironic, that it is only defined when one of the
tracers that can't use it defines it.



Ah, I missed that.


Maybe we should make it a prompt config for this feature.



Yes, I'll make the new config like "CONFIG_TRACER_SNAPSHOT".



+  snapshot_enabled:
+
+   This is used to set or display whether the snapshot is
+   enabled. Echo 1 into this file to prepare a spare buffer
+   or 0 to shrink it. So, the memory for the spare buffer
+   will be consumed only when this knob is set.
+
+  snapshot_pipe:
+
+   This is used to take a snapshot and to read the output
+   of the snapshot. Echo 1 into this file to take a
+   snapshot. Reads from this file is the same as the
+   "trace_pipe" file (described above "The File System"
+   section), so that both reads from the snapshot and
+   tracing are executable in parallel.


I don't really like the name snapshot_pipe. What about just calling it
snapshot, and just document that it works like trace_pipe?



Agreed. I'll change the name to snapshot and modify document.



Also, rename snapshot_enabled, to snapshot_allocate. If someone echos 1
into snapshot, it would automatically allocate the buffer (and set
snapshot_allocate to 1). If you don't want the delay (for allocation),
then you can do the echo 1 into snapshot_allocate first, and it would
behave as it does here.



I'll change them to that way.




+
+Here is an example of using the snapshot feature.
+
+ # echo nop > current_tracer
+ # echo 1 > snapshot_enabled
+ # echo 1 > events/sched/enable
+ [...]
+ # echo 1 > snapshot_pipe
+ # cat snapshot_pipe
+bash-3352  [001] dN.. 18440.883932: sched_wakeup: comm=migration/6 
pid=28 prio=0 success=1 target_cpu=006
+bash-3352  [001] dN.. 18440.883933: sched_wakeup: comm=migration/7 
pid=32 prio=0 success=1 target_cpu=007
+bash-3352  [001] d... 18440.883935: sched_switch: prev_comm=bash 
prev_pid=3352 prev_prio=120 prev_state=R ==> next_comm=migration/1 next_pid=8 
next_prio=0
+[...]


BTW, why make it a pipe action anyway? As a snapshot doesn't have a
writer to it, doing just an iterate over the snapshot would make sense,
wouldn't it?



I thought I should reuse existing code as much as possible. So I'd like
to reuse the "trace" code at first. But opening "trace" stops tracing
until it is closed. Therefore, I reused "trace_pipe" code instead of
"trace".

However, it seems that I should have made new iteration code as you
pointed out. I will make it the "trace"-like action.


If you reply with a good rational for keeping the snapshot_pipe, then we
should have both snapshot and snapshot_pipe, where snapshot works like
trace and snapshot_pipe works like trace_pipe.



I think only "snapshot" file is enough for the present.



+static ssize_t
+tracing_write_snapshot_pipe(struct file *filp, const char __user *ubuf,
+  size_t cnt, loff_t *ppos)
+{
+   unsigned long val = 0;
+   int ret;
+
+   ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
+   if (ret)
+   return ret;
+
+   mutex_lock(&trace_types_lock);
+
+   /* If current tracer's use_max_tr == 0, we prevent taking a snapshot */


Here we should just allocate it first.



OK. I'll add that.


+   if (!current_trace->use_max_tr) {


I also have issues with the use of 'use_max_tr' here, but I'll explain
that below.


+   ret = -EBUSY;
+   goto out;
+   }
+   if (val) {
+   unsigned long flags;
+   local_irq_save(flags);


Interrupts will never be disabled here. Just use
'local_irq_disable/enable()', and remove flags.



Yes. I'll fix it.


+   update_max_tr(&global_trace, current, raw_smp_processor_id());


Also, get rid of the 'raw_' that's for critical paths that can be broken
by the debug version of the normal user (like in function tracing and
callbacks from disabling interrupts).



I'll fix it.



+static ssize_t
+tracing_snapshot_ctrl_write(struct file *filp, const char __user *ubuf,
+   size_t cnt, loff_t *ppos)
+{
+   unsigned long val;
+   int ret;
+
+   ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
+   if (ret)
+   return ret;
+
+   val = !!val;
+
+   mutex_lock(&trace_types_lock);
+   tracing_stop();
+   arch_spin_lock(&ftrace_max_lock);
+
+   /* When always_use_max_tr == 1, we can't toggle use_max_tr. */
+   if (current_trace->always_use_max_tr) {


I'll state my issue here. Don't rename use_max_tr to always_use_max_tr,
keep it as is and its use as is. Your other value should be
"alloca

Re: [RFC PATCH -tip ] tracing: make a snapshot feature available from userspace.

2012-07-11 Thread Rob Landley
On 07/11/2012 12:39 AM, Hiraku Toyooka wrote:
> Hello, Rob,
> 
> Thank you very much for your advice.
> 
> (2012/07/05 10:01), Rob Landley wrote:
>> On 07/04/2012 05:47 AM, Hiraku Toyooka wrote:
>>> Hello, Steven,
>>>
>>> I've sent below RFC patch, but still have no responses. This patch can
>>> be applied to current tip tree.
>>>
>>> If you have time, could you give any comment about this patch?
>>
>> My familiarity with ftrace is "I saw a presentation on it at a
>> conference a couple years ago", so I'm not the guy to comment on the
>> advisability of the patch itself.
>>
>> As for the documentation: seems reasonable, could use some english
>> polishing.
>>
 +Snapshot
 +
 +If CONFIG_TRACER_MAX_TRACE is set, the (generic) snapshot
 +feature is available in all tracers except for the special
 +tracers which use a snapshot inside themselves(such as "irqsoff"
 +or "wakeup").
>>
>> This is confusing, I'm guessing irqsoff and wakeup already have
>> persistent buffers without CONFIG_TRACER_MAX_TRACE? (So there is a
>> generic snapshot feature, but some tracers have their own internal
>> buffer and don't use the generic one?)
>>
> 
> No, CONFIG_TRACER_MAX_TRACE is originally for making the spare buffer
> available. Both some special tracers (such as irqsoff) and generic
> snapshot uses the common spare buffer. But purpose of each snapshot is
> different. In the special tracers, snapshot is used for recording
> information related to max latency of something (such as longest
> interrupt-disabled area). This is automatically updated only when the
> max latency is detected. In other words, those special tracers use the
> same spare buffer in the different way. Thus, we can not enable generic
> snapshot for those tracers.

Ok, those tracers are not _compatible_ with snapshot.

Good to know.

>> Is the fact that some tracers don't use this feature an important part
>> of the description of the feature? Is making them use common code a todo
>> item, or just a comment?
>>
> 
> Anyway, the fact is not important here.
> 
> 
>> How about:
>>
>> CONFIG_TRACER_MAX_TRACE makes a generic snapshot feature available to
>> all tracers. (Some tracers, such as "irqsoff" or "wakeup", use their own
>> internal snapshot implementation.)
>>
> 
> Thanks, but I think the following one is more suitable.
> 
> (Some tracers, such as "irqsoff" or "wakeup", already use the snapshot
> implementation internally)

This implies that setting flag is a NOP for them, rather than "if you
take a snapshot, they'll stomp all over the buffer".

 +This enables to preserve trace buffer at a particular point in
 +time without stopping tracing. When a snapshot is taken, ftrace
 +swaps the current buffer with a spare buffer which is prepared
 +in advance. This means that the tracing itself continues on the
 +spare buffer.
>>
>> Snapshots preserve a trace buffer at a particular point in time without
>> stopping tracing; ftrace swaps the current buffer with a spare buffer,
>> and tracking continues in the spare buffer.
>>
 +Following debugfs files in "tracing" directory are related with
 +this feature.
>>
>> The following debugfs files in "tracing" are related to this feature:
>>
 +  snapshot_enabled:
 +
 +This is used to set or display whether the snapshot is
 +enabled. Echo 1 into this file to prepare a spare buffer
 +or 0 to shrink it. So, the memory for the spare buffer
 +will be consumed only when this knob is set.
>>
>> Write 1 to this file to allocate a snapshot buffer, 0 to free it.
>>
> 
> I'll fix them.
> 
> 
>> (Query: do you have to free the buffer after taking a snapshot below?)
>>
> 
> No, we don't always need to free the buffer, although we can free it
> when the snapshot becomes unnecessary. We can also reuse the buffer if
> we'd like to take the next snapshot.
> (I'll add this description.)

Actually I was worried about the lifetime rules for the buffer (when
does it need to be disposed of, and who is responsible for doing so?)
but it looks like ftrace only allows one trace to be going on in the
entire system at any given time, so all this context is kernel global
anyway...

 +  snapshot_pipe:
 +
 +This is used to take a snapshot and to read the output
 +of the snapshot. Echo 1 into this file to take a
 +snapshot. Reads from this file is the same as the
 +"trace_pipe" file (described above "The File System"
 +section), so that both reads from the snapshot and
 +tracing are executable in parallel.
>>
>> Echo 1 into this file to take a snapshot, then read the snapshot from
>> the file in the same format as "trace_pipe" (described above in the
>> section "The File System").
>>
> 
> I'll fix that.
> 
>> Design questions left for the reader: why are allocating a snapshot
>> buffer and taking a snapshot separate actions?
> 
> I'll add following description:
> Allocating a spare buffer and taking 

Re: [RFC PATCH -tip ] tracing: make a snapshot feature available from userspace.

2012-07-11 Thread Steven Rostedt
Sorry for taking so long, I've been hammering away at other things. :-/


On Tue, 2012-06-05 at 21:06 +0900, Hiraku Toyooka wrote:
> Hello,
> 
> This patch provides new debugfs interfaces for taking a snapshot
> in ftrace.
> 
> Ftrace is available as a flight recorder. When a kernel panic
> occurs, we can get trace data from the dumped kernel image.
> 
> But in case of some systems, non-critical errors (such as
> application's errors) are difficult to handle because of following
> reasons.
> 
>  - When the system has to continue to run, it can't stop tracing.
>  - Even then, failure analysis is necessary to prevent the same
>errors.
> 
> So, I propose making the ftrace's snapshot feature available from
> userspace to solve above dilemma.
> (See also
> https://events.linuxfoundation.org/images/stories/pdf/lcjp2012_toyooka.pdf)
> 
> This enables to preserve trace buffer at a particular point in time
> without stopping tracing.  When a snapshot is taken, ftrace swaps the
> current buffer with a spare buffer which is prepared in advance. This
> means that the tracing itself continues on the spare buffer.
> 
> Currently, ftrace has a snapshot feature available from kernel space
> and some tracers (e.g. irqsoff) are using it. This patch allows users
> to use the same snapshot feature via debugfs.
> 
> I added two debugfs files in "tracing" directory.
> 
>   snapshot_enabled:
> 
> This is used to set or display whether the snapshot is
> enabled. Echo 1 into this file to prepare a spare buffer
> or 0 to shrink it. So, the memory for the spare buffer
> will be consumed only when this knob is set.
> 
>   snapshot_pipe:
> 
> This is used to take a snapshot and to read the output
> of the snapshot (max_tr). Echo 1 into this file to take a
> snapshot. Reads from this file is the same as the
> "trace_pipe" file.
> 
> Here is an example of using the snapshot feature.
> 
>  # echo nop > current_tracer
>  # echo 1 > snapshot_enabled
>  # echo 1 > events/sched/enable
>  [...]
>  # echo 1 > snapshot_pipe
>  # cat snapshot_pipe
> bash-3352  [001] dN.. 18440.883932: sched_wakeup: 
> comm=migration/6 p
> id=28 prio=0 success=1 target_cpu=006
> bash-3352  [001] dN.. 18440.883933: sched_wakeup: 
> comm=migration/7 p
> id=32 prio=0 success=1 target_cpu=007
> bash-3352  [001] d... 18440.883935: sched_switch: prev_comm=bash 
> pre
> v_pid=3352 prev_prio=120 prev_state=R ==> next_comm=migration/1 next_pid=8 
> next_
> prio=0
> [...]
> 
> 
> Signed-off-by: Hiraku Toyooka 
> Cc: Steven Rostedt 
> Cc: Frederic Weisbecker 
> Cc: Ingo Molnar 
> Cc: Rob Landley 
> Cc: Masami Hiramatsu 
> Cc: linux-...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> 
>  Documentation/trace/ftrace.txt|   47 +++
>  kernel/trace/trace.c  |  152 
> +
>  kernel/trace/trace.h  |8 ++
>  kernel/trace/trace_irqsoff.c  |3 +
>  kernel/trace/trace_sched_wakeup.c |2 
>  5 files changed, 210 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
> index 6f51fed..df9fa13 100644
> --- a/Documentation/trace/ftrace.txt
> +++ b/Documentation/trace/ftrace.txt
> @@ -1842,6 +1842,53 @@ an error.
>   # cat buffer_size_kb
>  85
>  
> +
> +Snapshot
> +
> +If CONFIG_TRACER_MAX_TRACE is set, the (generic) snapshot
> +feature is available in all tracers except for the special
> +tracers which use a snapshot inside themselves(such as "irqsoff"
> +or "wakeup").

I find this kind of ironic, that it is only defined when one of the
tracers that can't use it defines it.

Maybe we should make it a prompt config for this feature.

> +
> +This enables to preserve trace buffer at a particular point in
> +time without stopping tracing.  When a snapshot is taken, ftrace
> +swaps the current buffer with a spare buffer which is prepared
> +in advance. This means that the tracing itself continues on the
> +spare buffer.
> +
> +Following debugfs files in "tracing" directory are related with
> +this feature.
> +
> +  snapshot_enabled:
> +
> + This is used to set or display whether the snapshot is
> + enabled. Echo 1 into this file to prepare a spare buffer
> + or 0 to shrink it. So, the memory for the spare buffer
> + will be consumed only when this knob is set.
> +
> +  snapshot_pipe:
> +
> + This is used to take a snapshot and to read the output
> + of the snapshot. Echo 1 into this file to take a
> + snapshot. Reads from this file is the same as the
> + "trace_pipe" file (described above "The File System"
> + section), so that both reads from the snapshot and
> + tracing are executable in parallel.

I don't really like the name snapshot_pipe. What about just calling it
snapshot, and just document that it works like trace_pipe?

Also, rename snapshot_enabled, to snapshot_allocate. If someone echos

Re: [RFC PATCH -tip ] tracing: make a snapshot feature available from userspace.

2012-07-10 Thread Hiraku Toyooka

Hello, Rob,

Thank you very much for your advice.

(2012/07/05 10:01), Rob Landley wrote:

On 07/04/2012 05:47 AM, Hiraku Toyooka wrote:

Hello, Steven,

I've sent below RFC patch, but still have no responses. This patch can
be applied to current tip tree.

If you have time, could you give any comment about this patch?


My familiarity with ftrace is "I saw a presentation on it at a
conference a couple years ago", so I'm not the guy to comment on the
advisability of the patch itself.

As for the documentation: seems reasonable, could use some english
polishing.


+Snapshot
+
+If CONFIG_TRACER_MAX_TRACE is set, the (generic) snapshot
+feature is available in all tracers except for the special
+tracers which use a snapshot inside themselves(such as "irqsoff"
+or "wakeup").


This is confusing, I'm guessing irqsoff and wakeup already have
persistent buffers without CONFIG_TRACER_MAX_TRACE? (So there is a
generic snapshot feature, but some tracers have their own internal
buffer and don't use the generic one?)



No, CONFIG_TRACER_MAX_TRACE is originally for making the spare buffer
available. Both some special tracers (such as irqsoff) and generic
snapshot uses the common spare buffer. But purpose of each snapshot is
different. In the special tracers, snapshot is used for recording
information related to max latency of something (such as longest
interrupt-disabled area). This is automatically updated only when the
max latency is detected. In other words, those special tracers use the
same spare buffer in the different way. Thus, we can not enable generic
snapshot for those tracers.



Is the fact that some tracers don't use this feature an important part
of the description of the feature? Is making them use common code a todo
item, or just a comment?



Anyway, the fact is not important here.



How about:

CONFIG_TRACER_MAX_TRACE makes a generic snapshot feature available to
all tracers. (Some tracers, such as "irqsoff" or "wakeup", use their own
internal snapshot implementation.)



Thanks, but I think the following one is more suitable.

(Some tracers, such as "irqsoff" or "wakeup", already use the snapshot
implementation internally)



+This enables to preserve trace buffer at a particular point in
+time without stopping tracing. When a snapshot is taken, ftrace
+swaps the current buffer with a spare buffer which is prepared
+in advance. This means that the tracing itself continues on the
+spare buffer.


Snapshots preserve a trace buffer at a particular point in time without
stopping tracing; ftrace swaps the current buffer with a spare buffer,
and tracking continues in the spare buffer.


+Following debugfs files in "tracing" directory are related with
+this feature.


The following debugfs files in "tracing" are related to this feature:


+  snapshot_enabled:
+
+This is used to set or display whether the snapshot is
+enabled. Echo 1 into this file to prepare a spare buffer
+or 0 to shrink it. So, the memory for the spare buffer
+will be consumed only when this knob is set.


Write 1 to this file to allocate a snapshot buffer, 0 to free it.



I'll fix them.



(Query: do you have to free the buffer after taking a snapshot below?)



No, we don't always need to free the buffer, although we can free it
when the snapshot becomes unnecessary. We can also reuse the buffer if
we'd like to take the next snapshot.
(I'll add this description.)



+  snapshot_pipe:
+
+This is used to take a snapshot and to read the output
+of the snapshot. Echo 1 into this file to take a
+snapshot. Reads from this file is the same as the
+"trace_pipe" file (described above "The File System"
+section), so that both reads from the snapshot and
+tracing are executable in parallel.


Echo 1 into this file to take a snapshot, then read the snapshot from
the file in the same format as "trace_pipe" (described above in the
section "The File System").



I'll fix that.


Design questions left for the reader: why are allocating a snapshot
buffer and taking a snapshot separate actions?


I'll add following description:
Allocating a spare buffer and taking a snapshot are separated because
latter one should be done successfully and immediately. If they are not
separated, we may fail to take a snapshot because of memory allocation
failure or we may lose older trace data while allocating memory.

> Why is there only _one_ snapshot buffer?
>

Because of easy reviewing, I've decided to implement step by step. So
this patch just provide "only one" spare buffer. However, I got some
feedback for multiple snapshot at LinuxCon Japan 2012. So I may extend
this feature.



+Here is an example of using the snapshot feature.
+
+ # echo nop > current_tracer
+ # echo 1 > snapshot_enabled
+ # echo 1 > events/sched/enable
+ [...]
+ # echo 1 > snapshot_pipe


Is the order significant here? 


If the current_tracer is a special tracer such as irqsoff,
snapshot_enabled can't be set to 1. So current_tracer has to be