Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints

2018-12-08 Thread Steven Rostedt
On Sat, 8 Dec 2018 11:44:23 +0100
Peter Zijlstra  wrote:

> It's a tool I haven't used in years, given we have so many better tools
> around these days.

So because you don't use it, it's useless? Since you don't care about lost
events, I can see why you may think there are better tools out there.
But since those tools don't guarantee no lost events, they are
obviously not better for those who do care about lost events.

> 
> > Why do we care about lost events? Because strace records *all* events,
> > as that's what it does and that's what it always has done. It would be
> > a break in functionality (a regression) if it were to start losing
> > events. I use strace to see everything that an application is doing.  
> 
> So make a new tool; break the expectation of all events. See if there's
> anybody that really cares.

Basically you are saying, break strace and see if anyone notices?

> 
> > When we discussed this at plumbers, Oracle people came to me and said
> > how awesome it would be to run strace against their database accesses.
> > The problem today is that strace causes such a large overhead that it
> > isn't feasible to trace any high speed applications, especially if
> > there are time restraints involved.  
> 
> So have them run that perf thing acme pointed to.
> 
> So far nobody's made a good argument for why we cannot have LOST events.

If you don't see the use case, I'm not sure anyone can convince you.
Again, I like the fact that when I strace an application, I know that
every system call it makes is recorded. I don't need to worry about what
happened in the "lost events" space.

-- Steve



Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints

2018-12-08 Thread Steven Rostedt
On Sat, 8 Dec 2018 11:41:21 +0100
Peter Zijlstra  wrote:

> > [root@seventh bpf]# trace -e augmented_raw_syscalls.c  --filter-pids 
> > 2279,1643
> > 
> >  19766.027 ( 0.003 ms): gcc/27524 openat(dfd: CWD, filename: 
> > /lib64/libz.so.1, flags: CLOEXEC   ) = 5
> >  19766.035 ( 0.001 ms): gcc/27524 fstat(fd: 5, statbuf: 0x7ffe9323e2a0  
> > ) = 0
> >  19766.037 ( 0.003 ms): gcc/27524 mmap(len: 2187272, prot: EXEC|READ, 
> > flags: PRIVATE|DENYWRITE, fd: 5   ) = 0x7fa2df435000
> >  19766.042 ( 0.003 ms): gcc/27524 mprotect(start: 0x7fa2df44b000, len: 
> > 2093056  ) = 0
> >  19766.046 ( 0.004 ms): gcc/27524 mmap(addr: 0x7fa2df64a000, len: 4096, 
> > prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 5, off: 86016) = 
> > 0x7fa2df64a000
> >  19766.051 ( 0.002 ms): gcc/27524 mmap(addr: 0x7fa2df64b000, len: 8, prot: 
> > READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) = 0x7fa2df64b000
> >  19766.057 ( 0.001 ms): gcc/27524 close(fd: 5   
> > ) = 0
> >  19766.062 ( 0.003 ms): gcc/27524 openat(dfd: CWD, filename: 
> > /lib64/libc.so.6, flags: CLOEXEC   ) = 5
> >   
> 
> Right; and that is all nice. And exactly doesn't answer my question. Why
> do we care about those LOST entries so much that we have to do such
> horribly ugly things?
> 
> Esp. as you point out, they're clearly marked in the output and easily
> avoided by using a slightly larger buffer.

For small cases like this a slightly larger buffer won't help. And it
would suck if you are tracing something for hours to find out why there's
some kind of anomaly, only to discover that the anomaly happened in the
lost events.

Yes, there is a use case for a guarantee of no lost events!
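
For context on how lost events surface to a perf consumer today: when the
ring buffer overflows, the kernel emits a PERF_RECORD_LOST record carrying
the count of dropped events. The sketch below is an illustration only (not
part of this patch set); it walks a linear snapshot of perf records and
tallies the lost counts, using the record layout documented in
include/uapi/linux/perf_event.h.

  /* scan_lost.c - tally PERF_RECORD_LOST counts in a snapshot of perf records */
  #include <linux/perf_event.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  static uint64_t count_lost(const unsigned char *buf, size_t len)
  {
      uint64_t total = 0;
      size_t off = 0;

      while (off + sizeof(struct perf_event_header) <= len) {
          struct perf_event_header hdr;

          memcpy(&hdr, buf + off, sizeof(hdr));
          if (hdr.size < sizeof(hdr) || off + hdr.size > len)
              break;                  /* malformed or truncated record */

          if (hdr.type == PERF_RECORD_LOST) {
              /* PERF_RECORD_LOST body: u64 id; u64 lost; */
              uint64_t lost;

              memcpy(&lost, buf + off + sizeof(hdr) + sizeof(uint64_t),
                     sizeof(lost));
              total += lost;
          }
          off += hdr.size;
      }
      return total;
  }

  int main(void)
  {
      /* Build one fake PERF_RECORD_LOST record just to exercise the scan. */
      unsigned char buf[sizeof(struct perf_event_header) + 2 * sizeof(uint64_t)];
      struct perf_event_header hdr = {
          .type = PERF_RECORD_LOST,
          .misc = 0,
          .size = sizeof(buf),
      };
      uint64_t id = 1, lost = 42;

      memcpy(buf, &hdr, sizeof(hdr));
      memcpy(buf + sizeof(hdr), &id, sizeof(id));
      memcpy(buf + sizeof(hdr) + sizeof(id), &lost, sizeof(lost));
      printf("lost events: %llu\n",
             (unsigned long long)count_lost(buf, sizeof(buf)));
      return 0;
  }

The point of contention above is precisely that a tool like strace would
rather block the tracee than ever see a nonzero total here.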


-- Steve


Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints

2018-12-07 Thread Steven Rostedt
On Fri, 7 Dec 2018 16:11:05 +0100
Peter Zijlstra  wrote:

> On Fri, Dec 07, 2018 at 08:41:18AM -0500, Steven Rostedt wrote:
> > On Fri, 7 Dec 2018 09:58:39 +0100
> > Peter Zijlstra  wrote:
> >   
> > > These patches give no justification *what*so*ever* for why we're doing
> > > ugly arse things like this. And why does this, whatever this is, need to
> > > be done in perf?
> > > 
> > > IOW, what problem are we solving ?  
> > 
> > I guess the cover letter should have had a link (or copy) of this:
> > 
> >  http://lkml.kernel.org/r/20181128134700.212ed...@gandalf.local.home  
> 
> That doesn't even begin to explain. Who cares about strace and why? And
> why is it such a bad thing to lose the occasional record etc..

Who cares about strace? Do I really need to answer that? It's one of
the most used tools for seeing what a program is doing.

Why do we care about lost events? Because strace records *all* events,
as that's what it does and that's what it always has done. It would be
a break in functionality (a regression) if it were to start losing
events. I use strace to see everything that an application is doing.

Peter, I think you've spent too much time in the kernel. There's a
whole world out there that lives in userspace ;-)

When we discussed this at plumbers, Oracle people came to me and said
how awesome it would be to run strace against their database accesses.
The problem today is that strace causes such a large overhead that it
isn't feasible to trace any high speed applications, especially if
there are time restraints involved.

If you don't like this for perf, I'll be happy to implement something in
ftrace. I just figured that the perf interface was more suitable for
something like this.

-- Steve


Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints

2018-12-07 Thread Steven Rostedt
On Fri, 7 Dec 2018 09:58:39 +0100
Peter Zijlstra  wrote:

> These patches give no justification *what*so*ever* for why we're doing
> ugly arse things like this. And why does this, whatever this is, need to
> be done in perf?
> 
> IOW, what problem are we solving ?

I guess the cover letter should have had a link (or copy) of this:

 http://lkml.kernel.org/r/20181128134700.212ed...@gandalf.local.home

-- Steve


Re: [PATCH] riscv: remove unused variable in ftrace

2018-12-06 Thread Steven Rostedt
On Thu, 6 Dec 2018 11:20:31 -0800
Olof Johansson  wrote:

> On Thu, Dec 6, 2018 at 2:26 AM David Abdurachmanov
>  wrote:
> >
> > Noticed while building kernel-4.20.0-0.rc5.git2.1.fc30 for
> > Fedora 30/RISCV.
> >
> > [..]
> > BUILDSTDERR: arch/riscv/kernel/ftrace.c: In function 
> > 'prepare_ftrace_return':
> > BUILDSTDERR: arch/riscv/kernel/ftrace.c:135:6: warning: unused variable 
> > 'err' [-Wunused-variable]
> > BUILDSTDERR:   int err;
> > BUILDSTDERR:   ^~~

Bah. I could have sworn I checked for all the error messages when I did
my cross-compiling of the architectures. I fixed this issue in other
places, not sure how I missed riscv.

Thanks for fixing it.

Acked-by: Steven Rostedt (VMware) 

-- Steve

 
> > [..]
> >
> > Signed-off-by: David Abdurachmanov   
> 
> Please add a:
> Fixes: e949b6db51dc1 ("riscv/function_graph: Simplify with
> function_graph_enter()")
> Reviewed-by: Olof Johansson 



Re: [GIT PULL] Uprobes: Fix kernel oops with delayed_uprobe_remove()

2018-12-06 Thread Steven Rostedt
On Thu, 6 Dec 2018 13:54:57 -0800
Andrew Morton  wrote:

  
> > Acked-by: Oleg Nesterov 
> > Reviewed-by: Srikar Dronamraju 
> > Reported-by: syzbot+cb1fb754b771caca0...@syzkaller.appspotmail.com
> > Fixes: 1cc33161a83d ("uprobes: Support SDT markers having reference 
> > count (semaphore)")
> > Signed-off-by: Ravi Bangoria 
> > Signed-off-by: Steven Rostedt (VMware) 
> >   
> 
> No cc:stable?

The commit it Fixes is still in this -rc release.

-- Steve


Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints

2018-12-06 Thread Steven Rostedt
On Thu, 6 Dec 2018 09:34:00 +0100
Peter Zijlstra  wrote:

> > 
> > I don't understand this.. why are we using schedule_timeout() and all
> > that?  
> 
> Urgh.. in fact, the more I look at this the more I hate it.
> 
> We want to block in __perf_output_begin(), but we cannot because both
> tracepoints and perf will have preemptability disabled down there.
> 
> So what we do is fail the event, fake the lost count and go all the way
> up that callstack, detect the failure and then poll-wait and retry.
> 
> And only do this for a few special events...  *yuck*

Since this is a special case, we should add a new option to the perf
system call that: 1) states that it wants the traced process to block
(and must have PTRACE permission to do so), and 2) after the tracer reads
from the buffer, it needs to check a bit that says "this process is
blocked, please wake it up" and then make another perf call to kick the
process to continue.

I really dislike the polling too. But because this is not a default
case, and is a new feature, we can add more infrastructure to make it
work properly, instead of trying to hack the current method into
something that works poorly.
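
To make the proposed handshake concrete, here is a minimal userspace
simulation (pthreads only, no perf involvement): the writer blocks when its
buffer is full instead of dropping events, and the reader, after draining,
checks the "blocked" bit and kicks the writer. The buffer size, event count,
and all names are illustrative assumptions, not the proposed kernel ABI.
Build with: cc -pthread block_wake_sim.c

  /* block_wake_sim.c - userspace sketch of the block/drain/wake handshake */
  #include <pthread.h>
  #include <stdbool.h>
  #include <stdio.h>

  #define BUF_SLOTS 4
  #define EVENTS    16

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  kick = PTHREAD_COND_INITIALIZER;
  static int  used;            /* filled slots in the toy "ring buffer" */
  static bool writer_blocked;  /* the bit the reader must check */

  static void *tracee(void *arg)
  {
      for (int i = 0; i < EVENTS; i++) {
          pthread_mutex_lock(&lock);
          while (used == BUF_SLOTS) {      /* buffer full: block, don't drop */
              writer_blocked = true;
              pthread_cond_wait(&kick, &lock);
          }
          used++;                          /* "emit" one event */
          pthread_mutex_unlock(&lock);
      }
      return NULL;
  }

  static void *tracer(void *arg)
  {
      int drained = 0;

      /* A real tracer would sleep until data arrives; this demo just polls. */
      while (drained < EVENTS) {
          pthread_mutex_lock(&lock);
          drained += used;                 /* read everything that is there */
          used = 0;
          if (writer_blocked) {            /* "this process is blocked, wake it up" */
              writer_blocked = false;
              pthread_cond_signal(&kick);
          }
          pthread_mutex_unlock(&lock);
      }
      printf("drained %d events, none lost\n", drained);
      return NULL;
  }

  int main(void)
  {
      pthread_t a, b;

      pthread_create(&a, NULL, tracee, NULL);
      pthread_create(&b, NULL, tracer, NULL);
      pthread_join(a, NULL);
      pthread_join(b, NULL);
      return 0;
  }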

-- Steve


Re: [PATCH] kernel/kcov.c: mark func write_comp_data() as notrace

2018-12-06 Thread Steven Rostedt
On Thu,  6 Dec 2018 15:30:11 +0100
Anders Roxell  wrote:

> Since __sanitizer_cov_trace_const_cmp4 is marked as notrace, the
> function called from __sanitizer_cov_trace_const_cmp4 shouldn't be
> traceable either.  ftrace_graph_caller() gets called every time
> write_comp_data() gets called if it isn't marked 'notrace'. This is the
> backtrace from gdb:
> 
>  #0  ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179
>  #1  0xff8010201920 in ftrace_caller () at 
> ../arch/arm64/kernel/entry-ftrace.S:151
>  #2  0xff8010439714 in write_comp_data (type=5, arg1=0, arg2=0, 
> ip=18446743524224276596) at ../kernel/kcov.c:116
>  #3  0xff8010439894 in __sanitizer_cov_trace_const_cmp4 (arg1=<optimized out>, arg2=<optimized out>) at ../kernel/kcov.c:188
>  #4  0xff8010201874 in prepare_ftrace_return 
> (self_addr=18446743524226602768, parent=0xff801014b918, 
> frame_pointer=18446743524223531344) at 
> ./include/generated/atomic-instrumented.h:27
>  #5  0xff801020194c in ftrace_graph_caller () at 
> ../arch/arm64/kernel/entry-ftrace.S:182
> 
> Rework so that write_comp_data(), which is called from
> __sanitizer_cov_trace_*_cmp*(), is marked as 'notrace'.
> 
> Commit 903e8ff86753 ("kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() 
> as notrace")
> missed to mark write_comp_data() as 'notrace'. When that patch was
> created gcc-7 was used. In lib/Kconfig.debug
> config KCOV_ENABLE_COMPARISONS
>   depends on $(cc-option,-fsanitize-coverage=trace-cmp)
> 
> That code path isn't hit with gcc-7. However, it is hit with gcc-8.
> 
> Co-developed-by: Arnd Bergmann 
> Signed-off-by: Arnd Bergmann 
> Signed-off-by: Anders Roxell 

Acked-by: Steven Rostedt (VMware) 

-- Steve

> ---
>  kernel/kcov.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/kcov.c b/kernel/kcov.c
> index 97959d7b77e2..c2277dbdbfb1 100644
> --- a/kernel/kcov.c
> +++ b/kernel/kcov.c
> @@ -112,7 +112,7 @@ void notrace __sanitizer_cov_trace_pc(void)
>  EXPORT_SYMBOL(__sanitizer_cov_trace_pc);
>  
>  #ifdef CONFIG_KCOV_ENABLE_COMPARISONS
> -static void write_comp_data(u64 type, u64 arg1, u64 arg2, u64 ip)
> +static void notrace write_comp_data(u64 type, u64 arg1, u64 arg2, u64 ip)
>  {
>   struct task_struct *t;
>   u64 *area;



[GIT PULL] Uprobes: Fix kernel oops with delayed_uprobe_remove()

2018-12-06 Thread Steven Rostedt


Linus,

This is a single commit that fixes a bug in uprobes SDT code
due to a missing mutex protection.


Please pull the latest trace-v4.20-rc5 tree, which can be found at:


  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
trace-v4.20-rc5

Tag SHA1: a5405a88026387f237d97b607234db71d91c13e2
Head SHA1: 1aed58e67a6ec1e7a18bfabe8ba6ec2d27c15636


Ravi Bangoria (1):
  Uprobes: Fix kernel oops with delayed_uprobe_remove()


 kernel/events/uprobes.c | 2 ++
 1 file changed, 2 insertions(+)
---
commit 1aed58e67a6ec1e7a18bfabe8ba6ec2d27c15636
Author: Ravi Bangoria 
Date:   Wed Dec 5 09:04:23 2018 +0530

Uprobes: Fix kernel oops with delayed_uprobe_remove()

There could be a race between task exit and probe unregister:

  exit_mm()
    mmput()
      __mmput()                  uprobe_unregister()
        uprobe_clear_state()       put_uprobe()
          delayed_uprobe_remove()    delayed_uprobe_remove()

put_uprobe() is calling delayed_uprobe_remove() without taking
delayed_uprobe_lock and thus the race sometimes results in a
kernel crash. Fix this by taking delayed_uprobe_lock before
calling delayed_uprobe_remove() from put_uprobe().

Detailed crash log can be found at:
  Link: http://lkml.kernel.org/r/140c370577db5...@google.com

Link: 
http://lkml.kernel.org/r/20181205033423.26242-1-ravi.bango...@linux.ibm.com

Acked-by: Oleg Nesterov 
Reviewed-by: Srikar Dronamraju 
Reported-by: syzbot+cb1fb754b771caca0...@syzkaller.appspotmail.com
Fixes: 1cc33161a83d ("uprobes: Support SDT markers having reference count 
(semaphore)")
Signed-off-by: Ravi Bangoria 
Signed-off-by: Steven Rostedt (VMware) 

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 96d4bee83489..98b9312ce6b2 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -572,7 +572,9 @@ static void put_uprobe(struct uprobe *uprobe)
 * gets called, we don't get a chance to remove uprobe from
 * delayed_uprobe_list from remove_breakpoint(). Do it here.
 */
+   mutex_lock(&delayed_uprobe_lock);
delayed_uprobe_remove(uprobe, NULL);
+   mutex_unlock(&delayed_uprobe_lock);
kfree(uprobe);
}
 }


Re: [PATCH 2/2] arm64: ftrace: Set FTRACE_SCHEDULABLE before ftrace_modify_all_code()

2018-12-06 Thread Steven Rostedt
On Thu, 6 Dec 2018 13:20:07 +
Will Deacon  wrote:

> On Wed, Dec 05, 2018 at 12:48:54PM -0500, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" 
> > 
> > It has been reported that ftrace_replace_code(), which is called by
> > ftrace_modify_all_code(), can cause a soft lockup warning for an
> > allmodconfig kernel. This is because, with all the debug options enabled,
> > the loop in ftrace_replace_code() (which iterates over all the functions
> > being enabled, of which there can be tens of thousands) is too slow and
> > never schedules out.
> > 
> > To solve this, setting FTRACE_SCHEDULABLE to the command passed into
> > ftrace_replace_code() will make it call cond_resched() in the loop,
> > which prevents the soft lockup warning from triggering.
> > 
> > Link: 
> > http://lkml.kernel.org/r/20181204192903.8193-1-anders.rox...@linaro.org
> > 
> > Reported-by: Anders Roxell 
> > Signed-off-by: Steven Rostedt (VMware) 
> > ---
> >  arch/arm64/kernel/ftrace.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
> > index 57e962290df3..9a8de0a79f97 100644
> > --- a/arch/arm64/kernel/ftrace.c
> > +++ b/arch/arm64/kernel/ftrace.c
> > @@ -193,6 +193,7 @@ int ftrace_make_nop(struct module *mod, struct 
> > dyn_ftrace *rec,
> >  
> >  void arch_ftrace_update_code(int command)
> >  {
> > +   command |= FTRACE_SCHEDULABLE;
> > ftrace_modify_all_code(command);
> >  }  
> 
> Bikeshed: I'd probably go for FTRACE_MAY_SLEEP, but I'm not going to die
> on that hill so...

I like bike sheds. Hmm, it's not too late to change this. Perhaps I
will.

> 
> Acked-by: Will Deacon 

Thanks!

If I decide to change the name to MAY_SLEEP, I assume I can still keep
your Acked-by.

-- Steve
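
The pattern being enabled above, stripped to its core, is just "call
cond_resched() periodically inside a very long modification loop". A
minimal out-of-tree module sketch of that pattern follows; it is purely
illustrative (the loop body, counts, and module name are made up), while
the real code does this inside ftrace_replace_code() when the
FTRACE_SCHEDULABLE / FTRACE_MAY_SLEEP command flag is set.

  /* long_loop.c - illustrative module: long loop that yields periodically */
  #include <linux/module.h>
  #include <linux/kernel.h>
  #include <linux/sched.h>

  static int __init long_loop_init(void)
  {
      unsigned long i;

      /* Stand-in for "walk every patched function" (tens of thousands). */
      for (i = 0; i < 100000; i++) {
          /* ... per-record work would go here ... */
          if ((i % 1000) == 0)
              cond_resched();   /* let the scheduler run, avoid soft lockups */
      }
      pr_info("long_loop: done\n");
      return 0;
  }

  static void __exit long_loop_exit(void)
  {
  }

  module_init(long_loop_init);
  module_exit(long_loop_exit);
  MODULE_LICENSE("GPL");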



Re: [for-next][PATCH 05/30] arm64: function_graph: Remove use of FTRACE_NOTRACE_DEPTH

2018-12-06 Thread Steven Rostedt
On Thu, 6 Dec 2018 15:49:32 +
Will Deacon  wrote:

> On Wed, Dec 05, 2018 at 06:47:54PM -0500, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" 
> > 
> > Functions in the set_graph_notrace no longer subtract FTRACE_NOTRACE_DEPTH
> > from curr_ret_stack, as that is now implemented via the trace_recursion
> > flags. Access to curr_ret_stack no longer needs to worry about checking for
> > this. curr_ret_stack is still initialized to -1, when there's not a shadow
> > stack allocated.
> > 
> > Cc: Catalin Marinas 
> > Cc: Will Deacon 
> > Cc: linux-arm-ker...@lists.infradead.org
> > Reviewed-by: Joel Fernandes (Google) 
> > Signed-off-by: Steven Rostedt (VMware) 
> > ---
> >  arch/arm64/kernel/stacktrace.c | 3 ---
> >  1 file changed, 3 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> > index 4989f7ea1e59..7723dadf25be 100644
> > --- a/arch/arm64/kernel/stacktrace.c
> > +++ b/arch/arm64/kernel/stacktrace.c
> > @@ -61,9 +61,6 @@ int notrace unwind_frame(struct task_struct *tsk, struct 
> > stackframe *frame)
> > (frame->pc == (unsigned long)return_to_handler)) {
> > if (WARN_ON_ONCE(frame->graph == -1))
> > return -EINVAL;
> > -   if (frame->graph < -1)
> > -   frame->graph += FTRACE_NOTRACE_DEPTH;
> > -  
> 
> Acked-by: Will Deacon 
> 

Thanks Will!

-- Steve


Re: [for-next][PATCH 05/30] arm64: function_graph: Remove use of FTRACE_NOTRACE_DEPTH

2018-12-05 Thread Steven Rostedt
On Wed, 05 Dec 2018 18:47:54 -0500
Steven Rostedt  wrote:

> From: "Steven Rostedt (VMware)" 
> 
> Functions in the set_graph_notrace no longer subtract FTRACE_NOTRACE_DEPTH
> from curr_ret_stack, as that is now implemented via the trace_recursion
> flags. Access to curr_ret_stack no longer needs to worry about checking for
> this. curr_ret_stack is still initialized to -1, when there's not a shadow
> stack allocated.
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 

I haven't pushed this to linux-next yet. I tested the entire tree as
well as cross-compiled it for arm64.

Can you give me an ack for this patch?

Thanks!

-- Steve

> Cc: linux-arm-ker...@lists.infradead.org
> Reviewed-by: Joel Fernandes (Google) 
> Signed-off-by: Steven Rostedt (VMware) 
> ---
>  arch/arm64/kernel/stacktrace.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 4989f7ea1e59..7723dadf25be 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -61,9 +61,6 @@ int notrace unwind_frame(struct task_struct *tsk, struct 
> stackframe *frame)
>   (frame->pc == (unsigned long)return_to_handler)) {
>   if (WARN_ON_ONCE(frame->graph == -1))
>   return -EINVAL;
> - if (frame->graph < -1)
> - frame->graph += FTRACE_NOTRACE_DEPTH;
> -
>   /*
>* This is a case where function graph tracer has
>* modified a return address (LR) in a stack frame



Re: [PATCH 1/3] stackleak: mark stackleak_track_stack() as notrace

2018-12-05 Thread Steven Rostedt
On Wed, 5 Dec 2018 19:29:11 -0800
Kees Cook  wrote:

> On Wed, Dec 5, 2018 at 6:29 PM Steven Rostedt  wrote:
> >
> > On Wed, 5 Dec 2018 21:26:51 -0500
> > Steven Rostedt  wrote:
> >  
> > > On Wed, 5 Dec 2018 17:08:34 -0800
> > > Kees Cook  wrote:
> > >  
> >  
> > > I'll Ack the Makefile
> > > change in the tracing directory, but the rest belongs to others.  
> 
> Okay, I wasn't sure. Anders's patch was marked "1/3" so I thought it
> was directed at you. :)
> 
> I'll grab this one in the gcc-plugins tree.

Should I just take patch 2 then? I'm thinking it's independent too.

I'm collecting patches for the next merge window right now so it won't
really be an issue if I do.

-- Steve


Re: [PATCH 1/3] stackleak: mark stackleak_track_stack() as notrace

2018-12-05 Thread Steven Rostedt
On Wed, 5 Dec 2018 21:26:51 -0500
Steven Rostedt  wrote:

> On Wed, 5 Dec 2018 17:08:34 -0800
> Kees Cook  wrote:
> 

> I'll Ack the Makefile
> change in the tracing directory, but the rest belongs to others.
> 

I see I already acked that patch. BTW, when sending a patch series, you
really need a 0/3 patch as a header, and the rest should be threaded to
it. I had a hard time finding that patch in the sea of my INBOX.

If I were the one to pull it in, I wouldn't do it if the series was
unthreaded like this.

-- Steve


Re: [PATCH 1/3] stackleak: mark stackleak_track_stack() as notrace

2018-12-05 Thread Steven Rostedt
On Wed, 5 Dec 2018 17:08:34 -0800
Kees Cook  wrote:

> > diff --git a/kernel/stackleak.c b/kernel/stackleak.c
> > index e42892926244..5de3bf596dd7 100644
> > --- a/kernel/stackleak.c
> > +++ b/kernel/stackleak.c
> > @@ -102,7 +102,7 @@ asmlinkage void stackleak_erase(void)
> > current->lowest_stack = current_top_of_stack() - THREAD_SIZE/64;
> >  }
> >
> > -void __used stackleak_track_stack(void)
> > +void __used notrace stackleak_track_stack(void)
> >  {
> > /*
> >  * N.B. stackleak_erase() fills the kernel stack with the poison 
> > value,
> > --
> > 2.19.2
> >  
> 
> Acked-by: Kees Cook 
> 
> Steven, I assume this series going via your tree?

??

A notrace addition doesn't make it mine.

I added changes for the cond_resched in a different patch series that
I'll pull in (they are independent from this). I'll Ack the Makefile
change in the tracing directory, but the rest belongs to others.

-- Steve


[for-next][PATCH 02/30] tracing: Do not line wrap short line in function_graph_enter()

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Commit 588ca1786f2dd ("function_graph: Use new curr_ret_depth to manage
depth instead of curr_ret_stack") removed a parameter from the call to
ftrace_push_return_trace(), which made the entire call fit under 80
characters, but it did not remove the line break. There's no reason to
break that line up, so make it a single line.

Link: 
http://lkml.kernel.org/r/20181122100322.gn2...@hirez.programming.kicks-ass.net

Reported-by: Peter Zijlstra 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_functions_graph.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/trace/trace_functions_graph.c 
b/kernel/trace/trace_functions_graph.c
index 086af4f5c3e8..0d235e44d08e 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -188,8 +188,7 @@ int function_graph_enter(unsigned long ret, unsigned long 
func,
trace.func = func;
trace.depth = ++current->curr_ret_depth;
 
-   if (ftrace_push_return_trace(ret, func,
-frame_pointer, retp))
+   if (ftrace_push_return_trace(ret, func, frame_pointer, retp))
goto out;
 
/* Only trace if the calling function expects to */
-- 
2.19.1




[for-next][PATCH 01/30] function_graph: Remove unused task_curr_ret_stack()

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

The static inline function task_curr_ret_stack() is unused, remove it.

Reviewed-by: Joel Fernandes (Google) 
Reviewed-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ftrace.h | 10 --
 1 file changed, 10 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index dd16e8218db3..10bd46434908 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -809,11 +809,6 @@ extern void ftrace_graph_init_task(struct task_struct *t);
 extern void ftrace_graph_exit_task(struct task_struct *t);
 extern void ftrace_graph_init_idle_task(struct task_struct *t, int cpu);
 
-static inline int task_curr_ret_stack(struct task_struct *t)
-{
-   return t->curr_ret_stack;
-}
-
 static inline void pause_graph_tracing(void)
 {
atomic_inc(&current->tracing_graph_pause);
@@ -838,11 +833,6 @@ static inline int 
register_ftrace_graph(trace_func_graph_ret_t retfunc,
 }
 static inline void unregister_ftrace_graph(void) { }
 
-static inline int task_curr_ret_stack(struct task_struct *tsk)
-{
-   return -1;
-}
-
 static inline unsigned long
 ftrace_graph_ret_addr(struct task_struct *task, int *idx, unsigned long ret,
  unsigned long *retp)
-- 
2.19.1




[for-next][PATCH 00/30] tracing: Updates for the next merge window

2018-12-05 Thread Steven Rostedt
Note, I still have more in my queue that need to go through testing.

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
for-next

Head SHA1: e007f5165a2e366579324062a69e56236a97fad3


Dan Carpenter (1):
  tracing: Have trace_stack nr_entries compare not be so subtle

Joe Lawrence (1):
  scripts/recordmcount.{c,pl}: support -ffunction-sections .text.* section 
names

Masami Hiramatsu (11):
  tracing/uprobes: Add busy check when cleanup all uprobes
  tracing: Lock event_mutex before synth_event_mutex
  tracing: Simplify creation and deletion of synthetic events
  tracing: Integrate similar probe argument parsers
  tracing: Add unified dynamic event framework
  tracing/kprobes: Use dyn_event framework for kprobe events
  tracing/uprobes: Use dyn_event framework for uprobe events
  tracing: Use dyn_event framework for synthetic events
  tracing: Remove unneeded synth_event_mutex
  tracing: Add generic event-name based remove event method
  selftests/ftrace: Add testcases for dynamic event

Steven Rostedt (VMware) (17):
  function_graph: Remove unused task_curr_ret_stack()
  tracing: Do not line wrap short line in function_graph_enter()
  fgraph: Create a fgraph.c file to store function graph infrastructure
  fgraph: Have set_graph_notrace only affect function_graph tracer
  arm64: function_graph: Remove use of FTRACE_NOTRACE_DEPTH
  function_graph: Remove the use of FTRACE_NOTRACE_DEPTH
  ftrace: Create new ftrace_internal.h header
  function_graph: Do not expose the graph_time option when profiler is not 
configured
  fgraph: Move function graph specific code into fgraph.c
  tracing: Rearrange functions in trace_sched_wakeup.c
  fgraph: Add new fgraph_ops structure to enable function graph hooks
  function_graph: Move ftrace_graph_ret_addr() to fgraph.c
  function_graph: Have profiler use new helper ftrace_graph_get_ret_stack()
  ring-buffer: Add percentage of ring buffer full to wake up reader
  tracing: Add tracefs file buffer_percentage
  tracing: Change default buffer_percent to 50
  tracing: Consolidate trace_add/remove_event_call back to the nolock 
functions


 Documentation/trace/kprobetrace.rst|   3 +
 Documentation/trace/uprobetracer.rst   |   4 +
 arch/arm64/kernel/stacktrace.c |   3 -
 include/linux/ftrace.h |  35 +-
 include/linux/ring_buffer.h|   4 +-
 kernel/trace/Kconfig   |   6 +
 kernel/trace/Makefile  |   2 +
 kernel/trace/fgraph.c  | 615 +
 kernel/trace/ftrace.c  | 471 ++--
 kernel/trace/ftrace_internal.h |  75 +++
 kernel/trace/ring_buffer.c |  94 +++-
 kernel/trace/trace.c   |  72 ++-
 kernel/trace/trace.h   |  13 +
 kernel/trace/trace_dynevent.c  | 217 
 kernel/trace/trace_dynevent.h  | 119 
 kernel/trace/trace_events.c|   8 +-
 kernel/trace/trace_events_hist.c   | 316 ++-
 kernel/trace/trace_functions_graph.c   | 334 ++-
 kernel/trace/trace_irqsoff.c   |  18 +-
 kernel/trace/trace_kprobe.c| 353 ++--
 kernel/trace/trace_probe.c |  74 ++-
 kernel/trace/trace_probe.h |   9 +-
 kernel/trace/trace_sched_wakeup.c  | 270 +
 kernel/trace/trace_selftest.c  |   8 +-
 kernel/trace/trace_stack.c |   2 +-
 kernel/trace/trace_uprobe.c| 301 +-
 scripts/recordmcount.c |   2 +-
 scripts/recordmcount.pl|  13 +
 .../ftrace/test.d/dynevent/add_remove_kprobe.tc|  30 +
 .../ftrace/test.d/dynevent/add_remove_synth.tc |  27 +
 .../ftrace/test.d/dynevent/clear_select_events.tc  |  50 ++
 .../ftrace/test.d/dynevent/generic_clear_event.tc  |  49 ++
 32 files changed, 2176 insertions(+), 1421 deletions(-)
 create mode 100644 kernel/trace/fgraph.c
 create mode 100644 kernel/trace/ftrace_internal.h
 create mode 100644 kernel/trace/trace_dynevent.c
 create mode 100644 kernel/trace/trace_dynevent.h
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/add_remove_kprobe.tc
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/add_remove_synth.tc
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/clear_select_events.tc
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/generic_clear_event.tc


[for-next][PATCH 03/30] fgraph: Create a fgraph.c file to store function graph infrastructure

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

As the function graph infrastructure can be used by things other than
tracing, moving the code to its own file out of the trace_functions_graph.c
code makes more sense.

The fgraph.c file will only contain the infrastructure required to hook into
functions and their return code.

Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/Makefile|   1 +
 kernel/trace/fgraph.c| 232 +++
 kernel/trace/trace_functions_graph.c | 220 -
 3 files changed, 233 insertions(+), 220 deletions(-)
 create mode 100644 kernel/trace/fgraph.c

diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index f81dadbc7c4a..c7ade7965464 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_MMIOTRACE) += trace_mmiotrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
 obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
 obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
+obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += fgraph.o
 ifeq ($(CONFIG_BLOCK),y)
 obj-$(CONFIG_EVENT_TRACING) += blktrace.o
 endif
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
new file mode 100644
index ..5ad9c0e88b80
--- /dev/null
+++ b/kernel/trace/fgraph.c
@@ -0,0 +1,232 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Infrastructure to hook into function calls and returns.
+ * Copyright (c) 2008-2009 Frederic Weisbecker 
+ * Mostly borrowed from function tracer which
+ * is Copyright (c) Steven Rostedt 
+ *
+ * Highly modified by Steven Rostedt (VMware).
+ */
+#include 
+
+#include "trace.h"
+
+static bool kill_ftrace_graph;
+
+/**
+ * ftrace_graph_is_dead - returns true if ftrace_graph_stop() was called
+ *
+ * ftrace_graph_stop() is called when a severe error is detected in
+ * the function graph tracing. This function is called by the critical
+ * paths of function graph to keep those paths from doing any more harm.
+ */
+bool ftrace_graph_is_dead(void)
+{
+   return kill_ftrace_graph;
+}
+
+/**
+ * ftrace_graph_stop - set to permanently disable function graph tracing
+ *
+ * In case of an error in function graph tracing, this is called
+ * to try to keep function graph tracing from causing any more harm.
+ * Usually this is pretty severe and this is called to try to at least
+ * get a warning out to the user.
+ */
+void ftrace_graph_stop(void)
+{
+   kill_ftrace_graph = true;
+}
+
+/* Add a function return address to the trace stack on thread info.*/
+static int
+ftrace_push_return_trace(unsigned long ret, unsigned long func,
+unsigned long frame_pointer, unsigned long *retp)
+{
+   unsigned long long calltime;
+   int index;
+
+   if (unlikely(ftrace_graph_is_dead()))
+   return -EBUSY;
+
+   if (!current->ret_stack)
+   return -EBUSY;
+
+   /*
+* We must make sure the ret_stack is tested before we read
+* anything else.
+*/
+   smp_rmb();
+
+   /* The return trace stack is full */
+   if (current->curr_ret_stack == FTRACE_RETFUNC_DEPTH - 1) {
+   atomic_inc(&current->trace_overrun);
+   return -EBUSY;
+   }
+
+   /*
+* The curr_ret_stack is an index to ftrace return stack of
+* current task.  Its value should be in [0, FTRACE_RETFUNC_
+* DEPTH) when the function graph tracer is used.  To support
+* filtering out specific functions, it makes the index
+* negative by subtracting huge value (FTRACE_NOTRACE_DEPTH)
+* so when it sees a negative index the ftrace will ignore
+* the record.  And the index gets recovered when returning
+* from the filtered function by adding the FTRACE_NOTRACE_
+* DEPTH and then it'll continue to record functions normally.
+*
+* The curr_ret_stack is initialized to -1 and get increased
+* in this function.  So it can be less than -1 only if it was
+* filtered out via ftrace_graph_notrace_addr() which can be
+* set from set_graph_notrace file in tracefs by user.
+*/
+   if (current->curr_ret_stack < -1)
+   return -EBUSY;
+
+   calltime = trace_clock_local();
+
+   index = ++current->curr_ret_stack;
+   if (ftrace_graph_notrace_addr(func))
+   current->curr_ret_stack -= FTRACE_NOTRACE_DEPTH;
+   barrier();
+   current->ret_stack[index].ret = ret;
+   current->ret_stack[index].func = func;
+   current->ret_stack[index].calltime = calltime;
+#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
+   current->ret_stack[index].fp = frame_pointer;
+#endif
+#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+   current->ret_stack[index].retp = retp;
+#endif
+   return 0;
+}
+
+int function_graph_enter(unsigned long ret, unsigned long func,
+

[for-next][PATCH 04/30] fgraph: Have set_graph_notrace only affect function_graph tracer

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

In order to make the function graph infrastructure more generic, there can
not be code specific for the function_graph tracer in the generic code. This
includes the set_graph_notrace logic, that stops all graph calls when a
function in the set_graph_notrace is hit.

By using the trace_recursion mask, we can use a bit in the current
task_struct to implement the notrace code, and move the logic out of
fgraph.c and into trace_functions_graph.c, keeping it affecting only the
tracer and not all call graph callbacks.

Acked-by: Namhyung Kim 
Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/fgraph.c| 21 -
 kernel/trace/trace.h |  7 +++
 kernel/trace/trace_functions_graph.c | 22 ++
 3 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 5ad9c0e88b80..e852b69c0e64 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -64,30 +64,9 @@ ftrace_push_return_trace(unsigned long ret, unsigned long 
func,
return -EBUSY;
}
 
-   /*
-* The curr_ret_stack is an index to ftrace return stack of
-* current task.  Its value should be in [0, FTRACE_RETFUNC_
-* DEPTH) when the function graph tracer is used.  To support
-* filtering out specific functions, it makes the index
-* negative by subtracting huge value (FTRACE_NOTRACE_DEPTH)
-* so when it sees a negative index the ftrace will ignore
-* the record.  And the index gets recovered when returning
-* from the filtered function by adding the FTRACE_NOTRACE_
-* DEPTH and then it'll continue to record functions normally.
-*
-* The curr_ret_stack is initialized to -1 and get increased
-* in this function.  So it can be less than -1 only if it was
-* filtered out via ftrace_graph_notrace_addr() which can be
-* set from set_graph_notrace file in tracefs by user.
-*/
-   if (current->curr_ret_stack < -1)
-   return -EBUSY;
-
calltime = trace_clock_local();
 
index = ++current->curr_ret_stack;
-   if (ftrace_graph_notrace_addr(func))
-   current->curr_ret_stack -= FTRACE_NOTRACE_DEPTH;
barrier();
current->ret_stack[index].ret = ret;
current->ret_stack[index].func = func;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 447bd96ee658..f67060a75f38 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -534,6 +534,13 @@ enum {
 
TRACE_GRAPH_DEPTH_START_BIT,
TRACE_GRAPH_DEPTH_END_BIT,
+
+   /*
+* To implement set_graph_notrace, if this bit is set, we ignore
+* function graph tracing of called functions, until the return
+* function is called to clear it.
+*/
+   TRACE_GRAPH_NOTRACE_BIT,
 };
 
 #define trace_recursion_set(bit)   do { (current)->trace_recursion |= 
(1<<(bit)); } while (0)
diff --git a/kernel/trace/trace_functions_graph.c 
b/kernel/trace/trace_functions_graph.c
index b846d82c2f95..ecf543df943b 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -188,6 +188,18 @@ int trace_graph_entry(struct ftrace_graph_ent *trace)
int cpu;
int pc;
 
+   if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT))
+   return 0;
+
+   if (ftrace_graph_notrace_addr(trace->func)) {
+   trace_recursion_set(TRACE_GRAPH_NOTRACE_BIT);
+   /*
+* Need to return 1 to have the return called
+* that will clear the NOTRACE bit.
+*/
+   return 1;
+   }
+
if (!ftrace_trace_task(tr))
return 0;
 
@@ -290,6 +302,11 @@ void trace_graph_return(struct ftrace_graph_ret *trace)
 
ftrace_graph_addr_finish(trace);
 
+   if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
+   trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
+   return;
+   }
+
local_irq_save(flags);
cpu = raw_smp_processor_id();
data = per_cpu_ptr(tr->trace_buffer.data, cpu);
@@ -315,6 +332,11 @@ static void trace_graph_thresh_return(struct 
ftrace_graph_ret *trace)
 {
ftrace_graph_addr_finish(trace);
 
+   if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
+   trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
+   return;
+   }
+
if (tracing_thresh &&
(trace->rettime - trace->calltime < tracing_thresh))
return;
-- 
2.19.1




[for-next][PATCH 11/30] fgraph: Add new fgraph_ops structure to enable function graph hooks

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Currently, registering the function graph tracer means passing in an entry
and a return function. We need a way to associate those functions together,
so the entry can determine whether to run the return hook. Having a
structure that contains both functions will facilitate converting the code
to be able to do that.

This is similar to the way function hooks are enabled (it passes in
ftrace_ops). Instead of passing in the functions to use, a single structure
is passed in to the registering function.

The unregister function is now passed in the fgraph_ops handle. When we
allow more than one callback to the function graph hooks, this will let the
system know which one to remove.
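
To illustrate the new calling convention, a hypothetical in-kernel user
(a sketch only: it assumes a tree with this series applied, that the
register/unregister symbols are reachable from the caller's context, and
all names other than the fgraph_ops API shown in the diff below are made
up) would now do something like:

  #include <linux/module.h>
  #include <linux/ftrace.h>

  static int my_entry(struct ftrace_graph_ent *trace)
  {
      return 1;   /* nonzero: ask for the matching return callback */
  }

  static void my_return(struct ftrace_graph_ret *trace)
  {
      /* entry/return pair completed; trace->func identifies the function */
  }

  static struct fgraph_ops my_fgraph_ops = {
      .entryfunc = my_entry,
      .retfunc   = my_return,
  };

  static int __init my_init(void)
  {
      return register_ftrace_graph(&my_fgraph_ops);
  }

  static void __exit my_exit(void)
  {
      unregister_ftrace_graph(&my_fgraph_ops);
  }

  module_init(my_init);
  module_exit(my_exit);
  MODULE_LICENSE("GPL");

Passing the same handle to both calls is what later lets the core support
more than one function graph user at a time.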

Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ftrace.h   | 21 +++--
 kernel/trace/fgraph.c|  9 -
 kernel/trace/ftrace.c| 10 +++---
 kernel/trace/trace_functions_graph.c | 21 -
 kernel/trace/trace_irqsoff.c | 18 +++---
 kernel/trace/trace_sched_wakeup.c| 16 +++-
 kernel/trace/trace_selftest.c|  8 ++--
 7 files changed, 58 insertions(+), 45 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 98625f10d982..21c80491ccde 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -749,6 +749,11 @@ typedef int (*trace_func_graph_ent_t)(struct 
ftrace_graph_ent *); /* entry */
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 
+struct fgraph_ops {
+   trace_func_graph_ent_t  entryfunc;
+   trace_func_graph_ret_t  retfunc;
+};
+
 /*
  * Stack of return addresses for functions
  * of a thread.
@@ -792,8 +797,9 @@ unsigned long ftrace_graph_ret_addr(struct task_struct 
*task, int *idx,
 
 #define FTRACE_RETFUNC_DEPTH 50
 #define FTRACE_RETSTACK_ALLOC_SIZE 32
-extern int register_ftrace_graph(trace_func_graph_ret_t retfunc,
-   trace_func_graph_ent_t entryfunc);
+
+extern int register_ftrace_graph(struct fgraph_ops *ops);
+extern void unregister_ftrace_graph(struct fgraph_ops *ops);
 
 extern bool ftrace_graph_is_dead(void);
 extern void ftrace_graph_stop(void);
@@ -802,8 +808,6 @@ extern void ftrace_graph_stop(void);
 extern trace_func_graph_ret_t ftrace_graph_return;
 extern trace_func_graph_ent_t ftrace_graph_entry;
 
-extern void unregister_ftrace_graph(void);
-
 extern void ftrace_graph_init_task(struct task_struct *t);
 extern void ftrace_graph_exit_task(struct task_struct *t);
 extern void ftrace_graph_init_idle_task(struct task_struct *t, int cpu);
@@ -825,12 +829,9 @@ static inline void ftrace_graph_init_task(struct 
task_struct *t) { }
 static inline void ftrace_graph_exit_task(struct task_struct *t) { }
 static inline void ftrace_graph_init_idle_task(struct task_struct *t, int cpu) 
{ }
 
-static inline int register_ftrace_graph(trace_func_graph_ret_t retfunc,
- trace_func_graph_ent_t entryfunc)
-{
-   return -1;
-}
-static inline void unregister_ftrace_graph(void) { }
+/* Define as macros as fgraph_ops may not be defined */
+#define register_ftrace_graph(ops) ({ -1; })
+#define unregister_ftrace_graph(ops) do { } while (0)
 
 static inline unsigned long
 ftrace_graph_ret_addr(struct task_struct *task, int *idx, unsigned long ret,
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 374f3e42e29e..cc35606e9a3e 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -490,8 +490,7 @@ static int start_graph_tracing(void)
return ret;
 }
 
-int register_ftrace_graph(trace_func_graph_ret_t retfunc,
-   trace_func_graph_ent_t entryfunc)
+int register_ftrace_graph(struct fgraph_ops *gops)
 {
int ret = 0;
 
@@ -512,7 +511,7 @@ int register_ftrace_graph(trace_func_graph_ret_t retfunc,
goto out;
}
 
-   ftrace_graph_return = retfunc;
+   ftrace_graph_return = gops->retfunc;
 
/*
 * Update the indirect function to the entryfunc, and the
@@ -520,7 +519,7 @@ int register_ftrace_graph(trace_func_graph_ret_t retfunc,
 * call the update fgraph entry function to determine if
 * the entryfunc should be called directly or not.
 */
-   __ftrace_graph_entry = entryfunc;
+   __ftrace_graph_entry = gops->entryfunc;
ftrace_graph_entry = ftrace_graph_entry_test;
update_function_graph_func();
 
@@ -530,7 +529,7 @@ int register_ftrace_graph(trace_func_graph_ret_t retfunc,
return ret;
 }
 
-void unregister_ftrace_graph(void)
+void unregister_ftrace_graph(struct fgraph_ops *gops)
 {
mutex_lock(&ftrace_lock);
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index c53533b833cf..d06fe588e650 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -849,15 +849,19 @@ static void profile_graph_return(struct ftrace_g

[for-next][PATCH 12/30] function_graph: Move ftrace_graph_ret_addr() to fgraph.c

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Move the function ftrace_graph_ret_addr() to fgraph.c, as the management
of curr_ret_stack is going to change, and all accesses to ret_stack need
to be done in fgraph.c.

Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/fgraph.c| 55 
 kernel/trace/trace_functions_graph.c | 55 
 2 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index cc35606e9a3e..90fcefcaff2a 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -232,6 +232,61 @@ unsigned long ftrace_return_to_handler(unsigned long 
frame_pointer)
return ret;
 }
 
+/**
+ * ftrace_graph_ret_addr - convert a potentially modified stack return address
+ *to its original value
+ *
+ * This function can be called by stack unwinding code to convert a found stack
+ * return address ('ret') to its original value, in case the function graph
+ * tracer has modified it to be 'return_to_handler'.  If the address hasn't
+ * been modified, the unchanged value of 'ret' is returned.
+ *
+ * 'idx' is a state variable which should be initialized by the caller to zero
+ * before the first call.
+ *
+ * 'retp' is a pointer to the return address on the stack.  It's ignored if
+ * the arch doesn't have HAVE_FUNCTION_GRAPH_RET_ADDR_PTR defined.
+ */
+#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
+   unsigned long ret, unsigned long *retp)
+{
+   int index = task->curr_ret_stack;
+   int i;
+
+   if (ret != (unsigned long)return_to_handler)
+   return ret;
+
+   if (index < 0)
+   return ret;
+
+   for (i = 0; i <= index; i++)
+   if (task->ret_stack[i].retp == retp)
+   return task->ret_stack[i].ret;
+
+   return ret;
+}
+#else /* !HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
+unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
+   unsigned long ret, unsigned long *retp)
+{
+   int task_idx;
+
+   if (ret != (unsigned long)return_to_handler)
+   return ret;
+
+   task_idx = task->curr_ret_stack;
+
+   if (!task->ret_stack || task_idx < *idx)
+   return ret;
+
+   task_idx -= *idx;
+   (*idx)++;
+
+   return task->ret_stack[task_idx].ret;
+}
+#endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
+
 static struct ftrace_ops graph_ops = {
.func   = ftrace_stub,
.flags  = FTRACE_OPS_FL_RECURSION_SAFE |
diff --git a/kernel/trace/trace_functions_graph.c 
b/kernel/trace/trace_functions_graph.c
index 140b4b51ab34..c2af1560e856 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -94,61 +94,6 @@ static void
 print_graph_duration(struct trace_array *tr, unsigned long long duration,
 struct trace_seq *s, u32 flags);
 
-/**
- * ftrace_graph_ret_addr - convert a potentially modified stack return address
- *to its original value
- *
- * This function can be called by stack unwinding code to convert a found stack
- * return address ('ret') to its original value, in case the function graph
- * tracer has modified it to be 'return_to_handler'.  If the address hasn't
- * been modified, the unchanged value of 'ret' is returned.
- *
- * 'idx' is a state variable which should be initialized by the caller to zero
- * before the first call.
- *
- * 'retp' is a pointer to the return address on the stack.  It's ignored if
- * the arch doesn't have HAVE_FUNCTION_GRAPH_RET_ADDR_PTR defined.
- */
-#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
-unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
-   unsigned long ret, unsigned long *retp)
-{
-   int index = task->curr_ret_stack;
-   int i;
-
-   if (ret != (unsigned long)return_to_handler)
-   return ret;
-
-   if (index < 0)
-   return ret;
-
-   for (i = 0; i <= index; i++)
-   if (task->ret_stack[i].retp == retp)
-   return task->ret_stack[i].ret;
-
-   return ret;
-}
-#else /* !HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
-unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
-   unsigned long ret, unsigned long *retp)
-{
-   int task_idx;
-
-   if (ret != (unsigned long)return_to_handler)
-   return ret;
-
-   task_idx = task->curr_ret_stack;
-
-   if (!task->ret_stack || task_idx < *idx)
-   return ret;
-
-   task_idx -= *idx;
-   (*idx)++;
-
-   return task->ret_stack[task_idx].ret;
-}
-#endif /* HAVE_FUNCTION_GRA

[for-next][PATCH 14/30] tracing: Have trace_stack nr_entries compare not be so subtle

2018-12-05 Thread Steven Rostedt
From: Dan Carpenter 

Dan Carpenter reviewed the trace_stack.c code and figured he found an off by
one bug.

 "From reviewing the code, it seems possible for
  stack_trace_max.nr_entries to be set to .max_entries and in that case we
  would be reading one element beyond the end of the stack_dump_trace[]
  array.  If it's not set to .max_entries then the bug doesn't affect
  runtime."

Although it looks to be the case, it is not. Because we have:

 static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] =
 { [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX };

 struct stack_trace stack_trace_max = {
.max_entries= STACK_TRACE_ENTRIES - 1,
.entries= &stack_dump_trace[0],
 };

And:

stack_trace_max.nr_entries = x;
for (; x < i; x++)
stack_dump_trace[x] = ULONG_MAX;

Even if nr_entries equals max_entries, indexing with it into the
stack_dump_trace[] array will not overflow the array. And when that is the
case, the second part of the conditional, which tests
stack_dump_trace[nr_entries] against ULONG_MAX, will always be true.

Applying Dan's patch removes the subtle aspect and makes the if
conditional slightly more efficient.
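
A small userspace stand-in makes the bounds argument easy to check: the
array really does have one slot more than STACK_TRACE_ENTRIES, max_entries
is one less, and every unused slot holds the ULONG_MAX sentinel, so
indexing with nr_entries == max_entries stays in bounds either way. The
sizes below are shrunk for the demo; everything else mirrors the kernel
declarations quoted above.

  #include <limits.h>
  #include <stdio.h>

  #define STACK_TRACE_ENTRIES 8          /* tiny stand-in for the real value */

  static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES + 1] =
          { [0 ... STACK_TRACE_ENTRIES] = ULONG_MAX };
  static unsigned long max_entries = STACK_TRACE_ENTRIES - 1;

  int main(void)
  {
      unsigned long nr_entries = max_entries; /* the worst case Dan flagged */
      unsigned long n = nr_entries;

      /* In bounds: the array has STACK_TRACE_ENTRIES + 1 slots. */
      printf("slot[%lu] = %lx\n", nr_entries, stack_dump_trace[nr_entries]);

      /* With the sentinel in place both checks agree; ">=" just avoids
       * touching the array at all when n == nr_entries. */
      printf("old check: %d\n",
             n > nr_entries || stack_dump_trace[n] == ULONG_MAX);
      printf("new check: %d\n", n >= nr_entries);
      return 0;
  }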

Link: http://lkml.kernel.org/r/20180620110758.crunhd5bfep7zuiz@kili.mountain

Signed-off-by: Dan Carpenter 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_stack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 2b0d1ee3241c..e2a153fc1afc 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -286,7 +286,7 @@ __next(struct seq_file *m, loff_t *pos)
 {
long n = *pos - 1;
 
-   if (n > stack_trace_max.nr_entries || stack_dump_trace[n] == ULONG_MAX)
+   if (n >= stack_trace_max.nr_entries || stack_dump_trace[n] == ULONG_MAX)
return NULL;
 
m->private = (void *)n;
-- 
2.19.1




[for-next][PATCH 17/30] tracing: Add tracefs file buffer_percentage

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Add a "buffer_percentage" file, that allows users to specify how much of the
buffer (percentage of pages) need to be filled before waking up a task
blocked on a per cpu trace_pipe_raw file.

Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/ring_buffer.c | 39 ---
 kernel/trace/trace.c   | 54 +-
 kernel/trace/trace.h   |  1 +
 3 files changed, 77 insertions(+), 17 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 9edb628603ab..5434c16f2192 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -489,6 +489,7 @@ struct ring_buffer_per_cpu {
local_t commits;
local_t pages_touched;
local_t pages_read;
+   longlast_pages_touch;
size_t  shortest_full;
unsigned long   read;
unsigned long   read_bytes;
@@ -2632,7 +2633,9 @@ static void rb_commit(struct ring_buffer_per_cpu 
*cpu_buffer,
 static __always_inline void
 rb_wakeups(struct ring_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
 {
-   bool pagebusy;
+   size_t nr_pages;
+   size_t dirty;
+   size_t full;
 
if (buffer->irq_work.waiters_pending) {
buffer->irq_work.waiters_pending = false;
@@ -2646,24 +2649,27 @@ rb_wakeups(struct ring_buffer *buffer, struct 
ring_buffer_per_cpu *cpu_buffer)
irq_work_queue(&cpu_buffer->irq_work.work);
}
 
-   pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
+   if (cpu_buffer->last_pages_touch == 
local_read(&cpu_buffer->pages_touched))
+   return;
 
-   if (!pagebusy && cpu_buffer->irq_work.full_waiters_pending) {
-   size_t nr_pages;
-   size_t dirty;
-   size_t full;
+   if (cpu_buffer->reader_page == cpu_buffer->commit_page)
+   return;
 
-   full = cpu_buffer->shortest_full;
-   nr_pages = cpu_buffer->nr_pages;
-   dirty = ring_buffer_nr_dirty_pages(buffer, cpu_buffer->cpu);
-   if (full && nr_pages && (dirty * 100) <= full * nr_pages)
-   return;
+   if (!cpu_buffer->irq_work.full_waiters_pending)
+   return;
 
-   cpu_buffer->irq_work.wakeup_full = true;
-   cpu_buffer->irq_work.full_waiters_pending = false;
-   /* irq_work_queue() supplies it's own memory barriers */
-   irq_work_queue(_buffer->irq_work.work);
-   }
+   cpu_buffer->last_pages_touch = local_read(&cpu_buffer->pages_touched);
+
+   full = cpu_buffer->shortest_full;
+   nr_pages = cpu_buffer->nr_pages;
+   dirty = ring_buffer_nr_dirty_pages(buffer, cpu_buffer->cpu);
+   if (full && nr_pages && (dirty * 100) <= full * nr_pages)
+   return;
+
+   cpu_buffer->irq_work.wakeup_full = true;
+   cpu_buffer->irq_work.full_waiters_pending = false;
+   /* irq_work_queue() supplies it's own memory barriers */
+   irq_work_queue(&cpu_buffer->irq_work.work);
 }
 
 /*
@@ -4394,6 +4400,7 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
local_set(&cpu_buffer->commits, 0);
local_set(&cpu_buffer->pages_touched, 0);
local_set(&cpu_buffer->pages_read, 0);
+   cpu_buffer->last_pages_touch = 0;
cpu_buffer->shortest_full = 0;
cpu_buffer->read = 0;
cpu_buffer->read_bytes = 0;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 48d5eb22ff33..d382fd1aa4a6 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6948,7 +6948,7 @@ tracing_buffers_splice_read(struct file *file, loff_t 
*ppos,
if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
goto out;
 
-   ret = wait_on_pipe(iter, 1);
+   ret = wait_on_pipe(iter, iter->tr->buffer_percent);
if (ret)
goto out;
 
@@ -7662,6 +7662,53 @@ static const struct file_operations rb_simple_fops = {
.llseek = default_llseek,
 };
 
+static ssize_t
+buffer_percent_read(struct file *filp, char __user *ubuf,
+   size_t cnt, loff_t *ppos)
+{
+   struct trace_array *tr = filp->private_data;
+   char buf[64];
+   int r;
+
+   r = tr->buffer_percent;
+   r = sprintf(buf, "%d\n", r);
+
+   return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+}
+
+static ssize_t
+buffer_percent_write(struct file *filp, const char __user *ubuf,
+size_t cnt, loff_t *ppos)
+{
+   struct trace_array *tr = filp->priva

[for-next][PATCH 07/30] ftrace: Create new ftrace_internal.h header

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

In order to move the function graph infrastructure into its own file
(fgraph.c) it needs to access various functions and variables in ftrace.c
that are currently static. Create a new file called ftrace_internal.h that
holds the function prototypes and the extern declarations of the variables
needed by fgraph.c, and make them global in ftrace.c such that they can be
used outside that file.

Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/ftrace.c  | 76 +++---
 kernel/trace/ftrace_internal.h | 75 +
 2 files changed, 89 insertions(+), 62 deletions(-)
 create mode 100644 kernel/trace/ftrace_internal.h

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 77734451cb05..52c89428b0db 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 
+#include "ftrace_internal.h"
 #include "trace_output.h"
 #include "trace_stat.h"
 
@@ -77,7 +78,7 @@
 #define ASSIGN_OPS_HASH(opsname, val)
 #endif
 
-static struct ftrace_ops ftrace_list_end __read_mostly = {
+struct ftrace_ops ftrace_list_end __read_mostly = {
.func   = ftrace_stub,
.flags  = FTRACE_OPS_FL_RECURSION_SAFE | FTRACE_OPS_FL_STUB,
INIT_OPS_HASH(ftrace_list_end)
@@ -112,11 +113,11 @@ static void ftrace_update_trampoline(struct ftrace_ops 
*ops);
  */
 static int ftrace_disabled __read_mostly;
 
-static DEFINE_MUTEX(ftrace_lock);
+DEFINE_MUTEX(ftrace_lock);
 
-static struct ftrace_ops __rcu *ftrace_ops_list __read_mostly = 
&ftrace_list_end;
+struct ftrace_ops __rcu *ftrace_ops_list __read_mostly = _list_end;
 ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
-static struct ftrace_ops global_ops;
+struct ftrace_ops global_ops;
 
 #if ARCH_SUPPORTS_FTRACE_OPS
 static void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
@@ -127,26 +128,6 @@ static void ftrace_ops_no_ops(unsigned long ip, unsigned 
long parent_ip);
 #define ftrace_ops_list_func ((ftrace_func_t)ftrace_ops_no_ops)
 #endif
 
-/*
- * Traverse the ftrace_global_list, invoking all entries.  The reason that we
- * can use rcu_dereference_raw_notrace() is that elements removed from this 
list
- * are simply leaked, so there is no need to interact with a grace-period
- * mechanism.  The rcu_dereference_raw_notrace() calls are needed to handle
- * concurrent insertions into the ftrace_global_list.
- *
- * Silly Alpha and silly pointer-speculation compiler optimizations!
- */
-#define do_for_each_ftrace_op(op, list)\
-   op = rcu_dereference_raw_notrace(list); \
-   do
-
-/*
- * Optimized for just a single item in the list (as that is the normal case).
- */
-#define while_for_each_ftrace_op(op)   \
-   while (likely(op = rcu_dereference_raw_notrace((op)->next)) &&  \
-  unlikely((op) != &ftrace_list_end))
-
 static inline void ftrace_ops_init(struct ftrace_ops *ops)
 {
 #ifdef CONFIG_DYNAMIC_FTRACE
@@ -187,17 +168,11 @@ static void ftrace_sync_ipi(void *data)
 }
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-static void update_function_graph_func(void);
-
 /* Both enabled by default (can be cleared by function_graph tracer flags */
 static bool fgraph_sleep_time = true;
 static bool fgraph_graph_time = true;
-
-#else
-static inline void update_function_graph_func(void) { }
 #endif
 
-
 static ftrace_func_t ftrace_ops_get_list_func(struct ftrace_ops *ops)
 {
/*
@@ -334,7 +309,7 @@ static int remove_ftrace_ops(struct ftrace_ops __rcu **list,
 
 static void ftrace_update_trampoline(struct ftrace_ops *ops);
 
-static int __register_ftrace_function(struct ftrace_ops *ops)
+int __register_ftrace_function(struct ftrace_ops *ops)
 {
if (ops->flags & FTRACE_OPS_FL_DELETED)
return -EINVAL;
@@ -375,7 +350,7 @@ static int __register_ftrace_function(struct ftrace_ops 
*ops)
return 0;
 }
 
-static int __unregister_ftrace_function(struct ftrace_ops *ops)
+int __unregister_ftrace_function(struct ftrace_ops *ops)
 {
int ret;
 
@@ -1022,9 +997,7 @@ static __init void ftrace_profile_tracefs(struct dentry 
*d_tracer)
 #endif /* CONFIG_FUNCTION_PROFILER */
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-static int ftrace_graph_active;
-#else
-# define ftrace_graph_active 0
+int ftrace_graph_active;
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE
@@ -1067,7 +1040,7 @@ static const struct ftrace_hash empty_hash = {
 };
#define EMPTY_HASH ((struct ftrace_hash *)&empty_hash)
 
-static struct ftrace_ops global_ops = {
+struct ftrace_ops global_ops = {
.func   = ftrace_stub,
.local_hash.notrace_hash= EMPTY_HASH,
.local_hash.filter_hash = EMPTY_HASH,
@@ -1503,7 +1476,7 @@ static bool hash_contains_ip(unsigned long ip,
  * This needs to be c

[for-next][PATCH 09/30] fgraph: Move function graph specific code into fgraph.c

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

To make the function graph infrastructure more manageable, the code needs
to be in its own file (fgraph.c). Move the code that is specific to managing
the function graph infrastructure out of ftrace.c and into fgraph.c.

Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/fgraph.c | 360 -
 kernel/trace/ftrace.c | 368 +-
 2 files changed, 366 insertions(+), 362 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index de887a983ac7..374f3e42e29e 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -7,11 +7,27 @@
  *
  * Highly modified by Steven Rostedt (VMware).
  */
+#include <linux/suspend.h>
 #include <linux/ftrace.h>
+#include <linux/slab.h>
 
-#include "trace.h"
+#include <trace/events/sched.h>
+
+#include "ftrace_internal.h"
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+#define ASSIGN_OPS_HASH(opsname, val) \
+   .func_hash  = val, \
+   .local_hash.regex_lock  = 
__MUTEX_INITIALIZER(opsname.local_hash.regex_lock),
+#else
+#define ASSIGN_OPS_HASH(opsname, val)
+#endif
 
 static bool kill_ftrace_graph;
+int ftrace_graph_active;
+
+/* Both enabled by default (can be cleared by function_graph tracer flags */
+static bool fgraph_sleep_time = true;
 
 /**
  * ftrace_graph_is_dead - returns true if ftrace_graph_stop() was called
@@ -161,6 +177,31 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, 
unsigned long *ret,
barrier();
 }
 
+/*
+ * Hibernation protection.
+ * The state of the current task is too much unstable during
+ * suspend/restore to disk. We want to protect against that.
+ */
+static int
+ftrace_suspend_notifier_call(struct notifier_block *bl, unsigned long state,
+   void *unused)
+{
+   switch (state) {
+   case PM_HIBERNATION_PREPARE:
+   pause_graph_tracing();
+   break;
+
+   case PM_POST_HIBERNATION:
+   unpause_graph_tracing();
+   break;
+   }
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block ftrace_suspend_notifier = {
+   .notifier_call = ftrace_suspend_notifier_call,
+};
+
 /*
  * Send the trace to the ring-buffer.
  * @return the original return address.
@@ -190,3 +231,320 @@ unsigned long ftrace_return_to_handler(unsigned long 
frame_pointer)
 
return ret;
 }
+
+static struct ftrace_ops graph_ops = {
+   .func   = ftrace_stub,
+   .flags  = FTRACE_OPS_FL_RECURSION_SAFE |
+  FTRACE_OPS_FL_INITIALIZED |
+  FTRACE_OPS_FL_PID |
+  FTRACE_OPS_FL_STUB,
+#ifdef FTRACE_GRAPH_TRAMP_ADDR
+   .trampoline = FTRACE_GRAPH_TRAMP_ADDR,
+   /* trampoline_size is only needed for dynamically allocated tramps */
+#endif
+   ASSIGN_OPS_HASH(graph_ops, &global_ops.local_hash)
+};
+
+void ftrace_graph_sleep_time_control(bool enable)
+{
+   fgraph_sleep_time = enable;
+}
+
+int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
+{
+   return 0;
+}
+
+/* The callbacks that hook a function */
+trace_func_graph_ret_t ftrace_graph_return =
+   (trace_func_graph_ret_t)ftrace_stub;
+trace_func_graph_ent_t ftrace_graph_entry = ftrace_graph_entry_stub;
+static trace_func_graph_ent_t __ftrace_graph_entry = ftrace_graph_entry_stub;
+
+/* Try to assign a return stack array on FTRACE_RETSTACK_ALLOC_SIZE tasks. */
+static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
+{
+   int i;
+   int ret = 0;
+   int start = 0, end = FTRACE_RETSTACK_ALLOC_SIZE;
+   struct task_struct *g, *t;
+
+   for (i = 0; i < FTRACE_RETSTACK_ALLOC_SIZE; i++) {
+   ret_stack_list[i] =
+   kmalloc_array(FTRACE_RETFUNC_DEPTH,
+ sizeof(struct ftrace_ret_stack),
+ GFP_KERNEL);
+   if (!ret_stack_list[i]) {
+   start = 0;
+   end = i;
+   ret = -ENOMEM;
+   goto free;
+   }
+   }
+
+   read_lock(&tasklist_lock);
+   do_each_thread(g, t) {
+   if (start == end) {
+   ret = -EAGAIN;
+   goto unlock;
+   }
+
+   if (t->ret_stack == NULL) {
+   atomic_set(&t->tracing_graph_pause, 0);
+   atomic_set(&t->trace_overrun, 0);
+   t->curr_ret_stack = -1;
+   t->curr_ret_depth = -1;
+   /* Make sure the tasks see the -1 first: */
+   smp_wmb();
+   t->ret_stack = ret_stack_list[start++];
+   }
+   } while_each_thread(g, t);
+
+unlock:
+   read_unloc

[for-next][PATCH 06/30] function_graph: Remove the use of FTRACE_NOTRACE_DEPTH

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

The curr_ret_stack is no longer set to a negative value when a function is
not to be traced by the function graph tracer. Remove the usage of
FTRACE_NOTRACE_DEPTH, as it is no longer needed.

Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ftrace.h   |  1 -
 kernel/trace/fgraph.c| 19 ---
 kernel/trace/trace_functions_graph.c | 11 ---
 3 files changed, 31 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 10bd46434908..98625f10d982 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -790,7 +790,6 @@ unsigned long ftrace_graph_ret_addr(struct task_struct 
*task, int *idx,
  */
#define __notrace_funcgraph notrace
 
-#define FTRACE_NOTRACE_DEPTH 65536
 #define FTRACE_RETFUNC_DEPTH 50
 #define FTRACE_RETSTACK_ALLOC_SIZE 32
 extern int register_ftrace_graph(trace_func_graph_ret_t retfunc,
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index e852b69c0e64..de887a983ac7 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -112,16 +112,6 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, 
unsigned long *ret,
 
index = current->curr_ret_stack;
 
-   /*
-* A negative index here means that it's just returned from a
-* notrace'd function.  Recover index to get an original
-* return address.  See ftrace_push_return_trace().
-*
-* TODO: Need to check whether the stack gets corrupted.
-*/
-   if (index < 0)
-   index += FTRACE_NOTRACE_DEPTH;
-
if (unlikely(index < 0 || index >= FTRACE_RETFUNC_DEPTH)) {
ftrace_graph_stop();
WARN_ON(1);
@@ -190,15 +180,6 @@ unsigned long ftrace_return_to_handler(unsigned long 
frame_pointer)
 */
barrier();
current->curr_ret_stack--;
-   /*
-* The curr_ret_stack can be less than -1 only if it was
-* filtered out and it's about to return from the function.
-* Recover the index and continue to trace normal functions.
-*/
-   if (current->curr_ret_stack < -1) {
-   current->curr_ret_stack += FTRACE_NOTRACE_DEPTH;
-   return ret;
-   }
 
if (unlikely(!ret)) {
ftrace_graph_stop();
diff --git a/kernel/trace/trace_functions_graph.c 
b/kernel/trace/trace_functions_graph.c
index ecf543df943b..eaf9b1629956 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -115,9 +115,6 @@ unsigned long ftrace_graph_ret_addr(struct task_struct 
*task, int *idx,
if (ret != (unsigned long)return_to_handler)
return ret;
 
-   if (index < -1)
-   index += FTRACE_NOTRACE_DEPTH;
-
if (index < 0)
return ret;
 
@@ -675,10 +672,6 @@ print_graph_entry_leaf(struct trace_iterator *iter,
 
cpu_data = per_cpu_ptr(data->cpu_data, cpu);
 
-   /* If a graph tracer ignored set_graph_notrace */
-   if (call->depth < -1)
-   call->depth += FTRACE_NOTRACE_DEPTH;
-
/*
 * Comments display at + 1 to depth. Since
 * this is a leaf function, keep the comments
@@ -721,10 +714,6 @@ print_graph_entry_nested(struct trace_iterator *iter,
struct fgraph_cpu_data *cpu_data;
int cpu = iter->cpu;
 
-   /* If a graph tracer ignored set_graph_notrace */
-   if (call->depth < -1)
-   call->depth += FTRACE_NOTRACE_DEPTH;
-
cpu_data = per_cpu_ptr(data->cpu_data, cpu);
cpu_data->depth = call->depth;
 
-- 
2.19.1




[for-next][PATCH 24/30] tracing/kprobes: Use dyn_event framework for kprobe events

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Use the dyn_event framework for kprobe events. This shows
kprobe events in the "tracing/dynamic_events" file.

Users can also define new events via tracing/dynamic_events.

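For orientation, tying kprobe events into the framework boils down to the
trace_kprobe_ops table in the hunks below plus one registration call at
init time. The init function shown here is only a sketch, since that part
of the patch is not quoted in this mail:

  /* sketch: hand the kprobe callbacks to the dynamic event core */
  static __init int init_kprobe_dynevent(void)
  {
      return dyn_event_register(&trace_kprobe_ops);
  }
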
Link: 
http://lkml.kernel.org/r/154140855646.17322.6619219995865980392.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 Documentation/trace/kprobetrace.rst |   3 +
 kernel/trace/Kconfig|   1 +
 kernel/trace/trace_kprobe.c | 319 +++-
 kernel/trace/trace_probe.c  |  27 +++
 kernel/trace/trace_probe.h  |   2 +
 5 files changed, 207 insertions(+), 145 deletions(-)

diff --git a/Documentation/trace/kprobetrace.rst 
b/Documentation/trace/kprobetrace.rst
index 47e765c2f2c3..235ce2ab131a 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -20,6 +20,9 @@ current_tracer. Instead of that, add probe points via
 /sys/kernel/debug/tracing/kprobe_events, and enable it via
 /sys/kernel/debug/tracing/events/kprobes/<EVENT>/enable.
 
+You can also use /sys/kernel/debug/tracing/dynamic_events instead of
+kprobe_events. That interface will provide unified access to other
+dynamic events too.
 
 Synopsis of kprobe_events
 -
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index bf2e8a5a91f1..c0f6b0105609 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -461,6 +461,7 @@ config KPROBE_EVENTS
bool "Enable kprobes-based dynamic events"
select TRACING
select PROBE_EVENTS
+   select DYNAMIC_EVENTS
default y
help
  This allows the user to add tracing events (similar to tracepoints)
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index d313bcc259dc..bdf8c2ad5152 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -12,6 +12,7 @@
 #include <linux/rculist.h>
 #include <linux/error-injection.h>
 
+#include "trace_dynevent.h"
 #include "trace_kprobe_selftest.h"
 #include "trace_probe.h"
 #include "trace_probe_tmpl.h"
@@ -19,17 +20,51 @@
 #define KPROBE_EVENT_SYSTEM "kprobes"
 #define KRETPROBE_MAXACTIVE_MAX 4096
 
+static int trace_kprobe_create(int argc, const char **argv);
+static int trace_kprobe_show(struct seq_file *m, struct dyn_event *ev);
+static int trace_kprobe_release(struct dyn_event *ev);
+static bool trace_kprobe_is_busy(struct dyn_event *ev);
+static bool trace_kprobe_match(const char *system, const char *event,
+  struct dyn_event *ev);
+
+static struct dyn_event_operations trace_kprobe_ops = {
+   .create = trace_kprobe_create,
+   .show = trace_kprobe_show,
+   .is_busy = trace_kprobe_is_busy,
+   .free = trace_kprobe_release,
+   .match = trace_kprobe_match,
+};
+
 /**
  * Kprobe event core functions
  */
 struct trace_kprobe {
-   struct list_head        list;
+   struct dyn_event        devent;
struct kretprobe        rp; /* Use rp.kp for kprobe use */
unsigned long __percpu *nhit;
const char  *symbol;/* symbol name */
struct trace_probe  tp;
 };
 
+static bool is_trace_kprobe(struct dyn_event *ev)
+{
+   return ev->ops == &trace_kprobe_ops;
+}
+
+static struct trace_kprobe *to_trace_kprobe(struct dyn_event *ev)
+{
+   return container_of(ev, struct trace_kprobe, devent);
+}
+
+/**
+ * for_each_trace_kprobe - iterate over the trace_kprobe list
+ * @pos:   the struct trace_kprobe * for each entry
+ * @dpos:  the struct dyn_event * to use as a loop cursor
+ */
+#define for_each_trace_kprobe(pos, dpos)   \
+   for_each_dyn_event(dpos)\
+   if (is_trace_kprobe(dpos) && (pos = to_trace_kprobe(dpos)))
+
 #define SIZEOF_TRACE_KPROBE(n) \
(offsetof(struct trace_kprobe, tp.args) +   \
(sizeof(struct probe_arg) * (n)))
@@ -81,6 +116,22 @@ static nokprobe_inline bool 
trace_kprobe_module_exist(struct trace_kprobe *tk)
return ret;
 }
 
+static bool trace_kprobe_is_busy(struct dyn_event *ev)
+{
+   struct trace_kprobe *tk = to_trace_kprobe(ev);
+
+   return trace_probe_is_enabled(&tk->tp);
+}
+
+static bool trace_kprobe_match(const char *system, const char *event,
+  struct dyn_event *ev)
+{
+   struct trace_kprobe *tk = to_trace_kprobe(ev);
+
+   return strcmp(trace_event_name(&tk->tp.call), event) == 0 &&
+   (!system || strcmp(tk->tp.call.class->system, system) == 0);
+}
+
 static nokprobe_inline unsigned long trace_kprobe_nhit(struct trace_kprobe *tk)
 {
unsigned long nhit = 0;
@@ -128,9 +179,6 @@ bool trace_kprobe_error_injectable(struct trace_event_call 
*call)
 static int register_kprobe_event(struct trace_kprobe *tk);
 static int unregister_kprobe_event(struct trace_kprobe *tk);
 
-s

[for-next][PATCH 10/30] tracing: Rearrange functions in trace_sched_wakeup.c

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Rearrange the functions in trace_sched_wakeup.c so that there are fewer
 #ifdef CONFIG_FUNCTION_TRACER and #ifdef CONFIG_FUNCTION_GRAPH_TRACER,
instead of having the #ifdefs spread all over.

No functional change is made.

Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_sched_wakeup.c | 272 ++
 1 file changed, 130 insertions(+), 142 deletions(-)

diff --git a/kernel/trace/trace_sched_wakeup.c 
b/kernel/trace/trace_sched_wakeup.c
index 7d04b9890755..2ce78100b4d3 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -35,26 +35,19 @@ static arch_spinlock_t wakeup_lock =
 
 static void wakeup_reset(struct trace_array *tr);
 static void __wakeup_reset(struct trace_array *tr);
+static int start_func_tracer(struct trace_array *tr, int graph);
+static void stop_func_tracer(struct trace_array *tr, int graph);
 
 static int save_flags;
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-static int wakeup_display_graph(struct trace_array *tr, int set);
 # define is_graph(tr) ((tr)->trace_flags & TRACE_ITER_DISPLAY_GRAPH)
 #else
-static inline int wakeup_display_graph(struct trace_array *tr, int set)
-{
-   return 0;
-}
 # define is_graph(tr) false
 #endif
 
-
 #ifdef CONFIG_FUNCTION_TRACER
 
-static int wakeup_graph_entry(struct ftrace_graph_ent *trace);
-static void wakeup_graph_return(struct ftrace_graph_ret *trace);
-
 static bool function_enabled;
 
 /*
@@ -104,122 +97,8 @@ func_prolog_preempt_disable(struct trace_array *tr,
return 0;
 }
 
-/*
- * wakeup uses its own tracer function to keep the overhead down:
- */
-static void
-wakeup_tracer_call(unsigned long ip, unsigned long parent_ip,
-  struct ftrace_ops *op, struct pt_regs *pt_regs)
-{
-   struct trace_array *tr = wakeup_trace;
-   struct trace_array_cpu *data;
-   unsigned long flags;
-   int pc;
-
-   if (!func_prolog_preempt_disable(tr, &data, &pc))
-   return;
-
-   local_irq_save(flags);
-   trace_function(tr, ip, parent_ip, flags, pc);
-   local_irq_restore(flags);
-
-   atomic_dec(&data->disabled);
-   preempt_enable_notrace();
-}
-
-static int register_wakeup_function(struct trace_array *tr, int graph, int set)
-{
-   int ret;
-
-   /* 'set' is set if TRACE_ITER_FUNCTION is about to be set */
-   if (function_enabled || (!set && !(tr->trace_flags & 
TRACE_ITER_FUNCTION)))
-   return 0;
-
-   if (graph)
-   ret = register_ftrace_graph(&wakeup_graph_return,
-   &wakeup_graph_entry);
-   else
-   ret = register_ftrace_function(tr->ops);
-
-   if (!ret)
-   function_enabled = true;
-
-   return ret;
-}
-
-static void unregister_wakeup_function(struct trace_array *tr, int graph)
-{
-   if (!function_enabled)
-   return;
-
-   if (graph)
-   unregister_ftrace_graph();
-   else
-   unregister_ftrace_function(tr->ops);
-
-   function_enabled = false;
-}
-
-static int wakeup_function_set(struct trace_array *tr, u32 mask, int set)
-{
-   if (!(mask & TRACE_ITER_FUNCTION))
-   return 0;
-
-   if (set)
-   register_wakeup_function(tr, is_graph(tr), 1);
-   else
-   unregister_wakeup_function(tr, is_graph(tr));
-   return 1;
-}
-#else
-static int register_wakeup_function(struct trace_array *tr, int graph, int set)
-{
-   return 0;
-}
-static void unregister_wakeup_function(struct trace_array *tr, int graph) { }
-static int wakeup_function_set(struct trace_array *tr, u32 mask, int set)
-{
-   return 0;
-}
-#endif /* CONFIG_FUNCTION_TRACER */
-
-static int wakeup_flag_changed(struct trace_array *tr, u32 mask, int set)
-{
-   struct tracer *tracer = tr->current_trace;
-
-   if (wakeup_function_set(tr, mask, set))
-   return 0;
-
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-   if (mask & TRACE_ITER_DISPLAY_GRAPH)
-   return wakeup_display_graph(tr, set);
-#endif
-
-   return trace_keep_overwrite(tracer, mask, set);
-}
 
-static int start_func_tracer(struct trace_array *tr, int graph)
-{
-   int ret;
-
-   ret = register_wakeup_function(tr, graph, 0);
-
-   if (!ret && tracing_is_enabled())
-   tracer_enabled = 1;
-   else
-   tracer_enabled = 0;
-
-   return ret;
-}
-
-static void stop_func_tracer(struct trace_array *tr, int graph)
-{
-   tracer_enabled = 0;
-
-   unregister_wakeup_function(tr, graph);
-}
-
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
 static int wakeup_display_graph(struct trace_array *tr, int set)
 {
if (!(is_graph(tr) ^ set))
@@ -318,20 +197,94 @@ static void wakeup_print_header(struct seq_file *s)
else
trace_default_header(s);
 }
+#else /* CONFIG_FUNCTION_GRAPH_TRACER */
+static int wakeup_grap

[for-next][PATCH 22/30] tracing: Integrate similar probe argument parsers

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Integrate similar argument parsers for kprobes and uprobes events
into traceprobe_parse_probe_arg().

Link: 
http://lkml.kernel.org/r/154140850016.17322.9836787731210512176.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_kprobe.c | 48 ++---
 kernel/trace/trace_probe.c  | 47 +---
 kernel/trace/trace_probe.h  |  7 ++
 kernel/trace/trace_uprobe.c | 44 ++
 4 files changed, 50 insertions(+), 96 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index fec67188c4d2..d313bcc259dc 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -548,7 +548,6 @@ static int create_trace_kprobe(int argc, char **argv)
bool is_return = false, is_delete = false;
char *symbol = NULL, *event = NULL, *group = NULL;
int maxactive = 0;
-   char *arg;
long offset = 0;
void *addr = NULL;
char buf[MAX_EVENT_NAME_LEN];
@@ -676,53 +675,10 @@ static int create_trace_kprobe(int argc, char **argv)
}
 
/* parse arguments */
-   ret = 0;
for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
-   struct probe_arg *parg = &tk->tp.args[i];
-
-   /* Increment count for freeing args in error case */
-   tk->tp.nr_args++;
-
-   /* Parse argument name */
-   arg = strchr(argv[i], '=');
-   if (arg) {
-   *arg++ = '\0';
-   parg->name = kstrdup(argv[i], GFP_KERNEL);
-   } else {
-   arg = argv[i];
-   /* If argument name is omitted, set "argN" */
-   snprintf(buf, MAX_EVENT_NAME_LEN, "arg%d", i + 1);
-   parg->name = kstrdup(buf, GFP_KERNEL);
-   }
-
-   if (!parg->name) {
-   pr_info("Failed to allocate argument[%d] name.\n", i);
-   ret = -ENOMEM;
-   goto error;
-   }
-
-   if (!is_good_name(parg->name)) {
-   pr_info("Invalid argument[%d] name: %s\n",
-   i, parg->name);
-   ret = -EINVAL;
-   goto error;
-   }
-
-   if (traceprobe_conflict_field_name(parg->name,
-   tk->tp.args, i)) {
-   pr_info("Argument[%d] name '%s' conflicts with "
-   "another field.\n", i, argv[i]);
-   ret = -EINVAL;
-   goto error;
-   }
-
-   /* Parse fetch argument */
-   ret = traceprobe_parse_probe_arg(arg, &tk->tp.size, parg,
-flags);
-   if (ret) {
-   pr_info("Parse error at argument[%d]. (%d)\n", i, ret);
+   ret = traceprobe_parse_probe_arg(&tk->tp, i, argv[i], flags);
+   if (ret)
goto error;
-   }
}
 
ret = register_trace_kprobe(tk);
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index bd30e9398d2a..449150c6a87f 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -348,7 +348,7 @@ static int __parse_bitfield_probe_arg(const char *bf,
 }
 
 /* String length checking wrapper */
-int traceprobe_parse_probe_arg(char *arg, ssize_t *size,
+static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
struct probe_arg *parg, unsigned int flags)
 {
struct fetch_insn *code, *scode, *tmp = NULL;
@@ -491,8 +491,8 @@ int traceprobe_parse_probe_arg(char *arg, ssize_t *size,
 }
 
 /* Return 1 if name is reserved or already used by another argument */
-int traceprobe_conflict_field_name(const char *name,
-  struct probe_arg *args, int narg)
+static int traceprobe_conflict_field_name(const char *name,
+ struct probe_arg *args, int narg)
 {
int i;
 
@@ -507,6 +507,47 @@ int traceprobe_conflict_field_name(const char *name,
return 0;
 }
 
+int traceprobe_parse_probe_arg(struct trace_probe *tp, int i, char *arg,
+   unsigned int flags)
+{
+   struct probe_arg *parg = &tp->args[i];
+   char *body;
+   int ret;
+
+   /* Increment count for freeing args in error case */
+   tp->nr_args++;
+
+   body = strchr(arg, '=');
+   if (body) {
+   parg->name = kmemdup_nul(arg, body - arg, GFP_KERNEL);
+   body++;
+   } else {
+

[for-next][PATCH 19/30] tracing/uprobes: Add busy check when cleanup all uprobes

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Add a busy check loop in cleanup_all_probes() before
trying to remove all events in uprobe_events, the same way
that kprobe_events does.

Without this change, writing null to uprobe_events will
try to remove events, but if one of them is enabled, it will
stop there, leaving some events cleared and others not cleared.

With this change, writing null to uprobe_events makes
sure no event is enabled before removing any of them.
So it either clears all events, or returns an error (-EBUSY)
and keeps all events.

Link: 
http://lkml.kernel.org/r/154140841557.17322.12653952888762532401.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_uprobe.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 31ea48eceda1..b708e4ff7ea7 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -587,12 +587,19 @@ static int cleanup_all_probes(void)
int ret = 0;
 
mutex_lock(&uprobe_lock);
+   /* Ensure no probe is in use. */
+   list_for_each_entry(tu, &uprobe_list, list)
+   if (trace_probe_is_enabled(&tu->tp)) {
+   ret = -EBUSY;
+   goto end;
+   }
while (!list_empty(_list)) {
tu = list_entry(uprobe_list.next, struct trace_uprobe, list);
ret = unregister_trace_uprobe(tu);
if (ret)
break;
}
+end:
mutex_unlock(&uprobe_lock);
return ret;
 }
-- 
2.19.1




[for-next][PATCH 15/30] scripts/recordmcount.{c,pl}: support -ffunction-sections .text.* section names

2018-12-05 Thread Steven Rostedt
From: Joe Lawrence 

When building with -ffunction-sections, the compiler will place each
function into its own ELF section, prefixed with ".text".  For example,
a simple test module with functions test_module_do_work() and
test_module_wq_func():

  % objdump --section-headers test_module.o | awk '/\.text/{print $2}'
  .text
  .text.test_module_do_work
  .text.test_module_wq_func
  .init.text
  .exit.text

Adjust the recordmcount scripts to look for ".text" as a section name
prefix.  This will ensure that those functions will be included in the
__mcount_loc relocations:

  % objdump --reloc --section __mcount_loc test_module.o
  OFFSET   TYPE  VALUE
   R_X86_64_64   .text.test_module_do_work
  0008 R_X86_64_64   .text.test_module_wq_func
  0010 R_X86_64_64   .init.text

Link: 
http://lkml.kernel.org/r/1542745158-25392-2-git-send-email-joe.lawre...@redhat.com

Signed-off-by: Joe Lawrence 
Signed-off-by: Steven Rostedt (VMware) 
---
 scripts/recordmcount.c  |  2 +-
 scripts/recordmcount.pl | 13 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c
index 895c40e8679f..a50a2aa963ad 100644
--- a/scripts/recordmcount.c
+++ b/scripts/recordmcount.c
@@ -397,7 +397,7 @@ static uint32_t (*w2)(uint16_t);
 static int
 is_mcounted_section_name(char const *const txtname)
 {
-   return strcmp(".text",   txtname) == 0 ||
+   return strncmp(".text",  txtname, 5) == 0 ||
strcmp(".init.text", txtname) == 0 ||
strcmp(".ref.text",  txtname) == 0 ||
strcmp(".sched.text",txtname) == 0 ||
diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
index f599031260d5..68841d01162c 100755
--- a/scripts/recordmcount.pl
+++ b/scripts/recordmcount.pl
@@ -142,6 +142,11 @@ my %text_sections = (
  ".text.unlikely" => 1,
 );
 
+# Acceptable section-prefixes to record.
+my %text_section_prefixes = (
+ ".text." => 1,
+);
+
 # Note: we are nice to C-programmers here, thus we skip the '||='-idiom.
 $objdump = 'objdump' if (!$objdump);
 $objcopy = 'objcopy' if (!$objcopy);
@@ -519,6 +524,14 @@ while (<IN>) {
 
# Only record text sections that we know are safe
$read_function = defined($text_sections{$1});
+   if (!$read_function) {
+   foreach my $prefix (keys %text_section_prefixes) {
+   if (substr($1, 0, length $prefix) eq $prefix) {
+   $read_function = 1;
+   last;
+   }
+   }
+   }
# print out any recorded offsets
update_funcs();
 
-- 
2.19.1




[for-next][PATCH 18/30] tracing: Change default buffer_percent to 50

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

After running several tests, it appears that having the reader wait until
half the buffer is full before starting to read (and causing its own events
to constantly fill up the ring buffer) works well. It keeps trace-cmd (the
main user of this interface) from dominating the traces it records.

Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d382fd1aa4a6..194c01838e3f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8017,7 +8017,7 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry 
*d_tracer)
trace_create_file("timestamp_mode", 0444, d_tracer, tr,
  &trace_time_stamp_mode_fops);
 
-   tr->buffer_percent = 1;
+   tr->buffer_percent = 50;
 
trace_create_file("buffer_percent", 0444, d_tracer,
tr, &buffer_percent_fops);
-- 
2.19.1




[for-next][PATCH 29/30] tracing: Add generic event-name based remove event method

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Add a generic method to remove an event from the dynamic event
list. This is the same as for other systems under ftrace. You
just need to pass the event name with '!', e.g.

  # echo p:new_grp/new_event _do_fork > dynamic_events

This creates an event, and

  # echo '!p:new_grp/new_event _do_fork' > dynamic_events

Or,

  # echo '!p:new_grp/new_event' > dynamic_events

will remove new_grp/new_event event.

Note that this doesn't check the event prefix (e.g. "p:")
strictly, because the "group/event" name must be unique.

Link: 
http://lkml.kernel.org/r/154140869774.17322.8887303560398645347.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_dynevent.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_dynevent.c b/kernel/trace/trace_dynevent.c
index f17a887abb66..dd1f43588d70 100644
--- a/kernel/trace/trace_dynevent.c
+++ b/kernel/trace/trace_dynevent.c
@@ -37,10 +37,17 @@ int dyn_event_release(int argc, char **argv, struct 
dyn_event_operations *type)
char *system = NULL, *event, *p;
int ret = -ENOENT;
 
-   if (argv[0][1] != ':')
-   return -EINVAL;
+   if (argv[0][0] == '-') {
+   if (argv[0][1] != ':')
+   return -EINVAL;
+   event = &argv[0][2];
+   } else {
+   event = strchr(argv[0], ':');
+   if (!event)
+   return -EINVAL;
+   event++;
+   }
 
-   event = &argv[0][2];
p = strchr(event, '/');
if (p) {
system = event;
@@ -69,7 +76,7 @@ static int create_dyn_event(int argc, char **argv)
struct dyn_event_operations *ops;
int ret;
 
-   if (argv[0][0] == '-')
+   if (argv[0][0] == '-' || argv[0][0] == '!')
return dyn_event_release(argc, argv, NULL);
 
mutex_lock(&dyn_event_ops_mutex);
-- 
2.19.1




[for-next][PATCH 16/30] ring-buffer: Add percentage of ring buffer full to wake up reader

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Instead of just waiting for a page to be full before waking up a pending
reader, allow the reader to pass in a "percentage" of pages that must have
content before the reader is woken up. This should keep the process that
reads the events from causing wake ups that constantly trigger reads of the
buffer.

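In other words, the reader is only woken once the share of touched-but-unread
pages crosses the requested percentage. A simplified sketch of the threshold
test (names taken from this patch; the locking and the shortest_full
bookkeeping are omitted here):

  /* "full" is the percentage of per-cpu pages that must have content */
  size_t nr_pages = cpu_buffer->nr_pages;
  size_t dirty = ring_buffer_nr_dirty_pages(buffer, cpu);

  if (!full || (dirty * 100) > full * nr_pages)
      break;    /* enough data is queued, wake up and let the reader run */
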
Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ring_buffer.h |  4 ++-
 kernel/trace/ring_buffer.c  | 71 ++---
 kernel/trace/trace.c|  8 ++---
 3 files changed, 73 insertions(+), 10 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 0940fda59872..5b9ae62272bb 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -97,7 +97,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, 
struct lock_class_key *k
__ring_buffer_alloc((size), (flags), &__key);   \
 })
 
-int ring_buffer_wait(struct ring_buffer *buffer, int cpu, bool full);
+int ring_buffer_wait(struct ring_buffer *buffer, int cpu, int full);
 __poll_t ring_buffer_poll_wait(struct ring_buffer *buffer, int cpu,
  struct file *filp, poll_table *poll_table);
 
@@ -189,6 +189,8 @@ bool ring_buffer_time_stamp_abs(struct ring_buffer *buffer);
 
 size_t ring_buffer_page_len(void *page);
 
+size_t ring_buffer_nr_pages(struct ring_buffer *buffer, int cpu);
+size_t ring_buffer_nr_dirty_pages(struct ring_buffer *buffer, int cpu);
 
 void *ring_buffer_alloc_read_page(struct ring_buffer *buffer, int cpu);
 void ring_buffer_free_read_page(struct ring_buffer *buffer, int cpu, void 
*data);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 65bd4616220d..9edb628603ab 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -487,6 +487,9 @@ struct ring_buffer_per_cpu {
local_t dropped_events;
local_t committing;
local_t commits;
+   local_t pages_touched;
+   local_t pages_read;
+   size_t  shortest_full;
unsigned long   read;
unsigned long   read_bytes;
u64 write_stamp;
@@ -529,6 +532,41 @@ struct ring_buffer_iter {
u64 read_stamp;
 };
 
+/**
+ * ring_buffer_nr_pages - get the number of buffer pages in the ring buffer
+ * @buffer: The ring_buffer to get the number of pages from
+ * @cpu: The cpu of the ring_buffer to get the number of pages from
+ *
+ * Returns the number of pages used by a per_cpu buffer of the ring buffer.
+ */
+size_t ring_buffer_nr_pages(struct ring_buffer *buffer, int cpu)
+{
+   return buffer->buffers[cpu]->nr_pages;
+}
+
+/**
+ * ring_buffer_nr_pages_dirty - get the number of used pages in the ring buffer
+ * @buffer: The ring_buffer to get the number of pages from
+ * @cpu: The cpu of the ring_buffer to get the number of pages from
+ *
+ * Returns the number of pages that have content in the ring buffer.
+ */
+size_t ring_buffer_nr_dirty_pages(struct ring_buffer *buffer, int cpu)
+{
+   size_t read;
+   size_t cnt;
+
+   read = local_read(&buffer->buffers[cpu]->pages_read);
+   cnt = local_read(&buffer->buffers[cpu]->pages_touched);
+   /* The reader can read an empty page, but not more than that */
+   if (cnt < read) {
+   WARN_ON_ONCE(read > cnt + 1);
+   return 0;
+   }
+
+   return cnt - read;
+}
+
 /*
  * rb_wake_up_waiters - wake up tasks waiting for ring buffer input
  *
@@ -556,7 +594,7 @@ static void rb_wake_up_waiters(struct irq_work *work)
  * as data is added to any of the @buffer's cpu buffers. Otherwise
  * it will wait for data to be added to a specific cpu buffer.
  */
-int ring_buffer_wait(struct ring_buffer *buffer, int cpu, bool full)
+int ring_buffer_wait(struct ring_buffer *buffer, int cpu, int full)
 {
struct ring_buffer_per_cpu *uninitialized_var(cpu_buffer);
DEFINE_WAIT(wait);
@@ -571,7 +609,7 @@ int ring_buffer_wait(struct ring_buffer *buffer, int cpu, 
bool full)
if (cpu == RING_BUFFER_ALL_CPUS) {
work = &buffer->irq_work;
/* Full only makes sense on per cpu reads */
-   full = false;
+   full = 0;
} else {
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return -ENODEV;
@@ -623,15 +661,22 @@ int ring_buffer_wait(struct ring_buffer *buffer, int cpu, 
bool full)
!ring_buffer_empty_cpu(buffer, cpu)) {
unsigned long flags;
bool pagebusy;
+   size_t nr_pages;
+   size_t dirty;
 
if (!full)
break;
 
   

[for-next][PATCH 08/30] function_graph: Do not expose the graph_time option when profiler is not configured

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

When the function profiler is not configured, the "graph_time" option is
meaningless, as the function profiler is the only thing that makes use of
it. Do not expose it if the profiler is not configured.

Link: http://lkml.kernel.org/r/20181123061133.ga195...@google.com

Reported-by: Joel Fernandes 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace.h | 5 +
 kernel/trace/trace_functions_graph.c | 4 
 2 files changed, 9 insertions(+)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index f67060a75f38..ab16eca76e59 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -862,7 +862,12 @@ static __always_inline bool ftrace_hash_empty(struct 
ftrace_hash *hash)
 #define TRACE_GRAPH_PRINT_FILL_MASK(0x3 << TRACE_GRAPH_PRINT_FILL_SHIFT)
 
 extern void ftrace_graph_sleep_time_control(bool enable);
+
+#ifdef CONFIG_FUNCTION_PROFILER
 extern void ftrace_graph_graph_time_control(bool enable);
+#else
+static inline void ftrace_graph_graph_time_control(bool enable) { }
+#endif
 
 extern enum print_line_t
 print_graph_function_flags(struct trace_iterator *iter, u32 flags);
diff --git a/kernel/trace/trace_functions_graph.c 
b/kernel/trace/trace_functions_graph.c
index eaf9b1629956..855c13c61e77 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -60,8 +60,12 @@ static struct tracer_opt trace_opts[] = {
{ TRACER_OPT(funcgraph-tail, TRACE_GRAPH_PRINT_TAIL) },
/* Include sleep time (scheduled out) between entry and return */
{ TRACER_OPT(sleep-time, TRACE_GRAPH_SLEEP_TIME) },
+
+#ifdef CONFIG_FUNCTION_PROFILER
/* Include time within nested functions */
{ TRACER_OPT(graph-time, TRACE_GRAPH_GRAPH_TIME) },
+#endif
+
{ } /* Empty entry */
 };
 
-- 
2.19.1




[for-next][PATCH 27/30] tracing: Remove unneeded synth_event_mutex

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Remove the unneeded synth_event_mutex. This mutex protects the reference
count in synth_event; however, all of those operational points are already
protected by event_mutex.

1. In __create_synth_event() and create_or_delete_synth_event(),
 synth_event_mutex is clearly obtained right after event_mutex.

2. event_hist_trigger_func() is trigger_hist_cmd.func(), which is
 called by trigger_process_regex(), which is a part of
 event_trigger_regex_write(), and this function takes event_mutex.

3. hist_unreg_all() is trigger_hist_cmd.unreg_all(), which is called
 by event_trigger_regex_open(), and it takes event_mutex.

4. onmatch_destroy() and onmatch_create() have a long call tree,
 but both are finally invoked from event_trigger_regex_write()
 and event_trace_del_tracer(); the former takes event_mutex, and the
 latter ensures it is called with event_mutex locked.

Finally, I ensured there is no resource conflict. For safety,
I added lockdep_assert_held(&event_mutex) to each function.

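The conversion follows one pattern throughout (a sketch, not a complete
hunk): the per-function locking is replaced by an assertion that the caller
already holds event_mutex:

  /* before: the function serialized itself */
  mutex_lock(&synth_event_mutex);
  event->ref--;
  mutex_unlock(&synth_event_mutex);

  /* after: rely on the caller's event_mutex, checked under lockdep */
  lockdep_assert_held(&event_mutex);
  event->ref--;
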
Link: 
http://lkml.kernel.org/r/154140864134.17322.4796059721306031894.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_events_hist.c | 30 +++---
 1 file changed, 7 insertions(+), 23 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 414aabd67d1f..21e4954375a1 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -444,8 +444,6 @@ static bool have_hist_err(void)
return false;
 }
 
-static DEFINE_MUTEX(synth_event_mutex);
-
 struct synth_trace_event {
struct trace_entry  ent;
u64 fields[];
@@ -1077,7 +1075,6 @@ static int __create_synth_event(int argc, const char 
*name, const char **argv)
return -EINVAL;
 
mutex_lock(&event_mutex);
-   mutex_lock(&synth_event_mutex);
 
event = find_synth_event(name);
if (event) {
@@ -1119,7 +1116,6 @@ static int __create_synth_event(int argc, const char 
*name, const char **argv)
else
free_synth_event(event);
  out:
-   mutex_unlock(&synth_event_mutex);
mutex_unlock(&event_mutex);
 
return ret;
@@ -1139,7 +1135,6 @@ static int create_or_delete_synth_event(int argc, char 
**argv)
/* trace_run_command() ensures argc != 0 */
if (name[0] == '!') {
mutex_lock(&event_mutex);
-   mutex_lock(&synth_event_mutex);
event = find_synth_event(name + 1);
if (event) {
if (event->ref)
@@ -1153,7 +1148,6 @@ static int create_or_delete_synth_event(int argc, char 
**argv)
}
} else
ret = -ENOENT;
-   mutex_unlock(&synth_event_mutex);
mutex_unlock(&event_mutex);
return ret;
}
@@ -3535,7 +3529,7 @@ static void onmatch_destroy(struct action_data *data)
 {
unsigned int i;
 
-   mutex_lock(&synth_event_mutex);
+   lockdep_assert_held(&event_mutex);
 
kfree(data->onmatch.match_event);
kfree(data->onmatch.match_event_system);
@@ -3548,8 +3542,6 @@ static void onmatch_destroy(struct action_data *data)
data->onmatch.synth_event->ref--;
 
kfree(data);
-
-   mutex_unlock(&synth_event_mutex);
 }
 
 static void destroy_field_var(struct field_var *field_var)
@@ -3700,15 +3692,14 @@ static int onmatch_create(struct hist_trigger_data 
*hist_data,
struct synth_event *event;
int ret = 0;
 
-   mutex_lock(&synth_event_mutex);
+   lockdep_assert_held(&event_mutex);
+
event = find_synth_event(data->onmatch.synth_event_name);
if (!event) {
hist_err("onmatch: Couldn't find synthetic event: ", 
data->onmatch.synth_event_name);
-   mutex_unlock(&synth_event_mutex);
return -EINVAL;
}
event->ref++;
-   mutex_unlock(&synth_event_mutex);
 
var_ref_idx = hist_data->n_var_refs;
 
@@ -3782,9 +3773,7 @@ static int onmatch_create(struct hist_trigger_data 
*hist_data,
  out:
return ret;
  err:
-   mutex_lock(&synth_event_mutex);
event->ref--;
-   mutex_unlock(&synth_event_mutex);
 
goto out;
 }
@@ -5492,6 +5481,8 @@ static void hist_unreg_all(struct trace_event_file *file)
struct synth_event *se;
const char *se_name;
 
+   lockdep_assert_held(&event_mutex);
+
if (hist_file_check_refs(file))
return;
 
@@ -5501,12 +5492,10 @@ static void hist_unreg_all(struct trace_event_file 
*file)
list_del_rcu(&test->list);
trace_event_trigger_enable_disable(file, 0);
 
-   mutex_lock(&synth_event_mutex);
se_name = trace_event_name(file->event_call);
se = find_synth_event(se_name);
i

[for-next][PATCH 13/30] function_graph: Have profiler use new helper ftrace_graph_get_ret_stack()

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

The ret_stack processing is going to change, and that is going
to break anything that is accessing the ret_stack directly. One user is the
function graph profiler. By using the ftrace_graph_get_ret_stack() helper
function, the profiler can access the ret_stack entry without relying on the
implementation details of the stack itself.

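For example, instead of indexing current->ret_stack by hand, a caller now
asks for an entry relative to the top of the return stack (a sketch
mirroring the profiler hunks below):

  struct ftrace_ret_stack *ret_stack;

  /* idx 0 is the newest entry, idx 1 its parent, and so on */
  ret_stack = ftrace_graph_get_ret_stack(current, 0);
  if (ret_stack)
      ret_stack->subtime = 0;
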
Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ftrace.h |  3 +++
 kernel/trace/fgraph.c  | 11 +++
 kernel/trace/ftrace.c  | 21 +++--
 3 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 21c80491ccde..98e141c71ad0 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -785,6 +785,9 @@ extern int
 function_graph_enter(unsigned long ret, unsigned long func,
 unsigned long frame_pointer, unsigned long *retp);
 
+struct ftrace_ret_stack *
+ftrace_graph_get_ret_stack(struct task_struct *task, int idx);
+
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
unsigned long ret, unsigned long *retp);
 
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 90fcefcaff2a..a3704ec8b599 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -232,6 +232,17 @@ unsigned long ftrace_return_to_handler(unsigned long 
frame_pointer)
return ret;
 }
 
+struct ftrace_ret_stack *
+ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
+{
+   idx = current->curr_ret_stack - idx;
+
+   if (idx >= 0 && idx <= task->curr_ret_stack)
+   return &task->ret_stack[idx];
+
+   return NULL;
+}
+
 /**
  * ftrace_graph_ret_addr - convert a potentially modified stack return address
  *to its original value
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index d06fe588e650..8ef9fc226037 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -792,7 +792,7 @@ void ftrace_graph_graph_time_control(bool enable)
 
 static int profile_graph_entry(struct ftrace_graph_ent *trace)
 {
-   int index = current->curr_ret_stack;
+   struct ftrace_ret_stack *ret_stack;
 
function_profile_call(trace->func, 0, NULL, NULL);
 
@@ -800,14 +800,16 @@ static int profile_graph_entry(struct ftrace_graph_ent 
*trace)
if (!current->ret_stack)
return 0;
 
-   if (index >= 0 && index < FTRACE_RETFUNC_DEPTH)
-   current->ret_stack[index].subtime = 0;
+   ret_stack = ftrace_graph_get_ret_stack(current, 0);
+   if (ret_stack)
+   ret_stack->subtime = 0;
 
return 1;
 }
 
 static void profile_graph_return(struct ftrace_graph_ret *trace)
 {
+   struct ftrace_ret_stack *ret_stack;
struct ftrace_profile_stat *stat;
unsigned long long calltime;
struct ftrace_profile *rec;
@@ -825,16 +827,15 @@ static void profile_graph_return(struct ftrace_graph_ret 
*trace)
calltime = trace->rettime - trace->calltime;
 
if (!fgraph_graph_time) {
-   int index;
-
-   index = current->curr_ret_stack;
 
/* Append this call time to the parent time to subtract */
-   if (index)
-   current->ret_stack[index - 1].subtime += calltime;
+   ret_stack = ftrace_graph_get_ret_stack(current, 1);
+   if (ret_stack)
+   ret_stack->subtime += calltime;
 
-   if (current->ret_stack[index].subtime < calltime)
-   calltime -= current->ret_stack[index].subtime;
+   ret_stack = ftrace_graph_get_ret_stack(current, 0);
+   if (ret_stack && ret_stack->subtime < calltime)
+   calltime -= ret_stack->subtime;
else
calltime = 0;
}
-- 
2.19.1




[for-next][PATCH 26/30] tracing: Use dyn_event framework for synthetic events

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Use the dyn_event framework for synthetic events. This shows
synthetic events in the "tracing/dynamic_events" file in addition
to the tracing/synthetic_events interface.

Users can also define new events via tracing/dynamic_events
with the "s:" prefix. So, the new syntax is below:

  s:[synthetic/]EVENT_NAME TYPE ARG; [TYPE ARG;]...

To remove events via tracing/dynamic_events, you can use
"-:" prefix as same as other events.

Link: 
http://lkml.kernel.org/r/154140861301.17322.15454611233735614508.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/Kconfig |   1 +
 kernel/trace/trace.c |   8 +
 kernel/trace/trace_events_hist.c | 265 +++
 3 files changed, 176 insertions(+), 98 deletions(-)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 2cab3c5dfe2c..fa8b1fe824f3 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -635,6 +635,7 @@ config HIST_TRIGGERS
depends on ARCH_HAVE_NMI_SAFE_CMPXCHG
select TRACING_MAP
select TRACING
+   select DYNAMIC_EVENTS
default n
help
  Hist triggers allow one or more arbitrary trace event fields
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7e0332f90ed4..911470ad9e94 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4620,6 +4620,9 @@ static const char readme_msg[] =
"\t  accepts: event-definitions (one definition per line)\n"
"\t   Format: p[:[/]]  []\n"
"\t   r[maxactive][:[/]]  []\n"
+#ifdef CONFIG_HIST_TRIGGERS
+   "\t   s:[synthetic/]  []\n"
+#endif
"\t   -:[/]\n"
 #ifdef CONFIG_KPROBE_EVENTS
"\tplace: [:][+]|\n"
@@ -4638,6 +4641,11 @@ static const char readme_msg[] =
"\t type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string, symbol,\n"
"\t   b@/,\n"
"\t   \\[\\]\n"
+#ifdef CONFIG_HIST_TRIGGERS
+   "\tfield:  ;\n"
+   "\tstype: u8/u16/u32/u64, s8/s16/s32/s64, pid_t,\n"
+   "\t   [unsigned] char/int/long\n"
+#endif
 #endif
"  events/\t\t- Directory containing all trace event subsystems:\n"
"  enable\t\t- Write 0/1 to enable/disable tracing of all events\n"
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 0feb7f460123..414aabd67d1f 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -15,6 +15,7 @@
 
 #include "tracing_map.h"
 #include "trace.h"
+#include "trace_dynevent.h"
 
 #define SYNTH_SYSTEM   "synthetic"
 #define SYNTH_FIELDS_MAX   16
@@ -292,6 +293,21 @@ struct hist_trigger_data {
unsigned int    n_max_var_str;
 };
 
+static int synth_event_create(int argc, const char **argv);
+static int synth_event_show(struct seq_file *m, struct dyn_event *ev);
+static int synth_event_release(struct dyn_event *ev);
+static bool synth_event_is_busy(struct dyn_event *ev);
+static bool synth_event_match(const char *system, const char *event,
+ struct dyn_event *ev);
+
+static struct dyn_event_operations synth_event_ops = {
+   .create = synth_event_create,
+   .show = synth_event_show,
+   .is_busy = synth_event_is_busy,
+   .free = synth_event_release,
+   .match = synth_event_match,
+};
+
 struct synth_field {
char *type;
char *name;
@@ -301,7 +317,7 @@ struct synth_field {
 };
 
 struct synth_event {
-   struct list_head        list;
+   struct dyn_event        devent;
int ref;
char    *name;
struct synth_field  **fields;
@@ -312,6 +328,32 @@ struct synth_event {
struct tracepoint   *tp;
 };
 
+static bool is_synth_event(struct dyn_event *ev)
+{
+   return ev->ops == &synth_event_ops;
+}
+
+static struct synth_event *to_synth_event(struct dyn_event *ev)
+{
+   return container_of(ev, struct synth_event, devent);
+}
+
+static bool synth_event_is_busy(struct dyn_event *ev)
+{
+   struct synth_event *event = to_synth_event(ev);
+
+   return event->ref != 0;
+}
+
+static bool synth_event_match(const char *system, const char *event,
+ struct dyn_event *ev)
+{
+   struct synth_event *sev = to_synth_event(ev);
+
+   return strcmp(sev->name, event) == 0 &&
+   (!system || strcmp(system, SYNTH_SYSTEM) == 0);
+}
+
 struct action_data;
 
 typedef void (*action_fn_t) (struct hist_trigger_data *hist_data,
@@ -402,7 +444,6 @@ static bo

[for-next][PATCH 21/30] tracing: Simplify creation and deletion of synthetic events

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Since the event_mutex and synth_event_mutex ordering issue
is gone, we can skip the existing-event check when adding or
deleting events, and drop some redundant code in the error path.

This changes release_all_synth_events() to abort the process
when it hits any error and return the error code. It succeeds
only if it had no error.

Link: 
http://lkml.kernel.org/r/154140847194.17322.17960275728005067803.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_events_hist.c | 53 +++-
 1 file changed, 18 insertions(+), 35 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 1670c65389fe..0feb7f460123 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1008,18 +1008,6 @@ struct hist_var_data {
struct hist_trigger_data *hist_data;
 };
 
-static void add_or_delete_synth_event(struct synth_event *event, int delete)
-{
-   if (delete)
-   free_synth_event(event);
-   else {
-   if (!find_synth_event(event->name))
-   list_add(&event->list, &synth_event_list);
-   else
-   free_synth_event(event);
-   }
-}
-
 static int create_synth_event(int argc, char **argv)
 {
struct synth_field *field, *fields[SYNTH_FIELDS_MAX];
@@ -1052,15 +1040,16 @@ static int create_synth_event(int argc, char **argv)
if (event) {
if (delete_event) {
if (event->ref) {
-   event = NULL;
ret = -EBUSY;
goto out;
}
-   list_del(&event->list);
-   goto out;
-   }
-   event = NULL;
-   ret = -EEXIST;
+   ret = unregister_synth_event(event);
+   if (!ret) {
+   list_del(&event->list);
+   free_synth_event(event);
+   }
+   } else
+   ret = -EEXIST;
goto out;
} else if (delete_event) {
ret = -ENOENT;
@@ -1100,29 +1089,21 @@ static int create_synth_event(int argc, char **argv)
event = NULL;
goto err;
}
+   ret = register_synth_event(event);
+   if (!ret)
+   list_add(&event->list, &synth_event_list);
+   else
+   free_synth_event(event);
  out:
-   if (event) {
-   if (delete_event) {
-   ret = unregister_synth_event(event);
-   add_or_delete_synth_event(event, !ret);
-   } else {
-   ret = register_synth_event(event);
-   add_or_delete_synth_event(event, ret);
-   }
-   }
mutex_unlock(&synth_event_mutex);
mutex_unlock(&event_mutex);
 
return ret;
  err:
-   mutex_unlock(&synth_event_mutex);
-   mutex_unlock(&event_mutex);
-
for (i = 0; i < n_fields; i++)
free_synth_field(fields[i]);
-   free_synth_event(event);
 
-   return ret;
+   goto out;
 }
 
 static int release_all_synth_events(void)
@@ -1141,10 +1122,12 @@ static int release_all_synth_events(void)
}
 
list_for_each_entry_safe(event, e, &synth_event_list, list) {
-   list_del(&event->list);
-
ret = unregister_synth_event(event);
-   add_or_delete_synth_event(event, !ret);
+   if (!ret) {
+   list_del(&event->list);
+   free_synth_event(event);
+   } else
+   break;
}
mutex_unlock(&synth_event_mutex);
mutex_unlock(&event_mutex);
-- 
2.19.1




[for-next][PATCH 20/30] tracing: Lock event_mutex before synth_event_mutex

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

The synthetic event code uses synth_event_mutex to protect
synth_event_list, and the event_trigger_write() path acquires
locks in the following order:

event_trigger_write(event_mutex)
  ->trigger_process_regex(trigger_cmd_mutex)
    ->event_hist_trigger_func(synth_event_mutex)

On the other hand, the synthetic event creation and deletion paths
call trace_add_event_call() and trace_remove_event_call(),
which acquire event_mutex. In that case, if we keep
synth_event_mutex locked while registering/unregistering synthetic
events, the lock dependency will be inverted.

To avoid this issue, the current synthetic event code uses a two-phase
process to create/delete events. For example, it searches existing
events under synth_event_mutex to check for event-name conflicts,
unlocks synth_event_mutex, then registers a new event with event_mutex
held. Finally, it locks synth_event_mutex and tries to add the
new event to the list. But this introduces complexity and a window
for name conflicts.

To solve this more simply, introduce trace_add_event_call_nolock()
and trace_remove_event_call_nolock(), which don't acquire
event_mutex internally. The synthetic event code can then lock
event_mutex before synth_event_mutex to resolve the lock dependency issue.

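With the _nolock() variants, the synthetic event code can take both locks
in one fixed order; roughly (a sketch of the calling pattern, with error
handling trimmed):

  mutex_lock(&event_mutex);
  mutex_lock(&synth_event_mutex);

  /* ... build and validate the synth_event ... */
  ret = trace_add_event_call_nolock(call);  /* event_mutex already held */

  mutex_unlock(&synth_event_mutex);
  mutex_unlock(&event_mutex);
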
Link: 
http://lkml.kernel.org/r/154140844377.17322.13781091165954002713.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/trace_events.h |  2 ++
 kernel/trace/trace_events.c  | 34 ++--
 kernel/trace/trace_events_hist.c | 24 ++
 3 files changed, 40 insertions(+), 20 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 4130a5497d40..3aa05593a53f 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -529,6 +529,8 @@ extern int trace_event_raw_init(struct trace_event_call 
*call);
 extern int trace_define_field(struct trace_event_call *call, const char *type,
  const char *name, int offset, int size,
  int is_signed, int filter_type);
+extern int trace_add_event_call_nolock(struct trace_event_call *call);
+extern int trace_remove_event_call_nolock(struct trace_event_call *call);
 extern int trace_add_event_call(struct trace_event_call *call);
 extern int trace_remove_event_call(struct trace_event_call *call);
 extern int trace_event_get_offsets(struct trace_event_call *call);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index f94be0c2827b..a3b157f689ee 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2305,11 +2305,11 @@ __trace_early_add_new_event(struct trace_event_call 
*call,
 struct ftrace_module_file_ops;
 static void __add_event_to_tracers(struct trace_event_call *call);
 
-/* Add an additional event_call dynamically */
-int trace_add_event_call(struct trace_event_call *call)
+int trace_add_event_call_nolock(struct trace_event_call *call)
 {
int ret;
-   mutex_lock(_mutex);
+   lockdep_assert_held(_mutex);
+
mutex_lock(_types_lock);
 
ret = __register_event(call, NULL);
@@ -2317,6 +2317,16 @@ int trace_add_event_call(struct trace_event_call *call)
__add_event_to_tracers(call);
 
mutex_unlock(&trace_types_lock);
+   return ret;
+}
+
+/* Add an additional event_call dynamically */
+int trace_add_event_call(struct trace_event_call *call)
+{
+   int ret;
+
+   mutex_lock(&event_mutex);
+   ret = trace_add_event_call_nolock(call);
mutex_unlock(&event_mutex);
return ret;
 }
@@ -2366,17 +2376,29 @@ static int probe_remove_event_call(struct 
trace_event_call *call)
return 0;
 }
 
-/* Remove an event_call */
-int trace_remove_event_call(struct trace_event_call *call)
+/* no event_mutex version */
+int trace_remove_event_call_nolock(struct trace_event_call *call)
 {
int ret;
 
-   mutex_lock(&event_mutex);
+   lockdep_assert_held(&event_mutex);
+
mutex_lock(&trace_types_lock);
down_write(&trace_event_sem);
ret = probe_remove_event_call(call);
up_write(&trace_event_sem);
mutex_unlock(&trace_types_lock);
+
+   return ret;
+}
+
+/* Remove an event_call */
+int trace_remove_event_call(struct trace_event_call *call)
+{
+   int ret;
+
+   mutex_lock(&event_mutex);
+   ret = trace_remove_event_call_nolock(call);
mutex_unlock(&event_mutex);
 
return ret;
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index eb908ef2ecec..1670c65389fe 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -912,7 +912,7 @@ static int register_synth_event(struct synth_event *event)
call->data = event;
call->tp = event->tp;
 
-   ret = trace_add_event_call(call);
+   ret = trace_add_event_call_nolock(call);
if (ret) {
pr_warn(&quo

[for-next][PATCH 30/30] selftests/ftrace: Add testcases for dynamic event

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Add common testcases for the dynamic_events interface.
 - Add/remove kprobe events via dynamic_events
 - Add/remove synthetic events via dynamic_events
 - Selective clear events (clear events via other interfaces)
 - Generic clear events ("!LINE" syntax)

Link: 
http://lkml.kernel.org/r/154140872590.17322.10394440849261743052.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 .../test.d/dynevent/add_remove_kprobe.tc  | 30 +++
 .../test.d/dynevent/add_remove_synth.tc   | 27 ++
 .../test.d/dynevent/clear_select_events.tc| 50 +++
 .../test.d/dynevent/generic_clear_event.tc| 49 ++
 4 files changed, 156 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/add_remove_kprobe.tc
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/add_remove_synth.tc
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/clear_select_events.tc
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/generic_clear_event.tc

diff --git 
a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_kprobe.tc 
b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_kprobe.tc
new file mode 100644
index ..c6d8387dbbb8
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_kprobe.tc
@@ -0,0 +1,30 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Generic dynamic event - add/remove kprobe events
+
+[ -f dynamic_events ] || exit_unsupported
+
+grep -q "place: \[:\]" README || exit_unsupported
+grep -q "place (kretprobe): \[:\]" README || exit_unsupported
+
+echo 0 > events/enable
+echo > dynamic_events
+
+PLACE=_do_fork
+
+echo "p:myevent1 $PLACE" >> dynamic_events
+echo "r:myevent2 $PLACE" >> dynamic_events
+
+grep -q myevent1 dynamic_events
+grep -q myevent2 dynamic_events
+test -d events/kprobes/myevent1
+test -d events/kprobes/myevent2
+
+echo "-:myevent2" >> dynamic_events
+
+grep -q myevent1 dynamic_events
+! grep -q myevent2 dynamic_events
+
+echo > dynamic_events
+
+clear_trace
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_synth.tc 
b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_synth.tc
new file mode 100644
index ..62b77b5941d0
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_synth.tc
@@ -0,0 +1,27 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Generic dynamic event - add/remove synthetic events
+
+[ -f dynamic_events ] || exit_unsupported
+
+grep -q "s:\[synthetic/\]" README || exit_unsupported
+
+echo 0 > events/enable
+echo > dynamic_events
+
+echo "s:latency1 u64 lat; pid_t pid;" >> dynamic_events
+echo "s:latency2 u64 lat; pid_t pid;" >> dynamic_events
+
+grep -q latency1 dynamic_events
+grep -q latency2 dynamic_events
+test -d events/synthetic/latency1
+test -d events/synthetic/latency2
+
+echo "-:synthetic/latency2" >> dynamic_events
+
+grep -q latency1 dynamic_events
+! grep -q latency2 dynamic_events
+
+echo > dynamic_events
+
+clear_trace
diff --git 
a/tools/testing/selftests/ftrace/test.d/dynevent/clear_select_events.tc 
b/tools/testing/selftests/ftrace/test.d/dynevent/clear_select_events.tc
new file mode 100644
index ..e0842109cb57
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/clear_select_events.tc
@@ -0,0 +1,50 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Generic dynamic event - selective clear (compatibility)
+
+[ -f dynamic_events ] || exit_unsupported
+
+grep -q "place: \[:\]" README || exit_unsupported
+grep -q "place (kretprobe): \[:\]" README || exit_unsupported
+
+grep -q "s:\[synthetic/\]" README || exit_unsupported
+
+[ -f synthetic_events ] || exit_unsupported
+[ -f kprobe_events ] || exit_unsupported
+
+echo 0 > events/enable
+echo > dynamic_events
+
+PLACE=_do_fork
+
+setup_events() {
+echo "p:myevent1 $PLACE" >> dynamic_events
+echo "s:latency1 u64 lat; pid_t pid;" >> dynamic_events
+echo "r:myevent2 $PLACE" >> dynamic_events
+echo "s:latency2 u64 lat; pid_t pid;" >> dynamic_events
+
+grep -q myevent1 dynamic_events
+grep -q myevent2 dynamic_events
+grep -q latency1 dynamic_events
+grep -q latency2 dynamic_events
+}
+
+setup_events
+echo > synthetic_events
+
+grep -q myevent1 dynamic_events
+grep -q myevent2 dynamic_events
+! grep -q latency1 dynamic_events
+! grep -q latency2 dynamic_events
+
+echo > dynamic_events
+
+setup_events
+echo > kprobe_events
+
+! grep -q myevent1 dynamic_events
+! grep -q myevent2 dynamic_events
+grep -q latency1 dynamic_events
+grep -q latency2 dynamic_events
+
+echo > dynamic_events

[for-next][PATCH 23/30] tracing: Add unified dynamic event framework

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Add a unified dynamic event framework for ftrace kprobes, uprobes
and synthetic events. Those dynamic events can co-exist in the
same file because their syntaxes do not overlap.

This introduces the framework part, which provides a unified
tracefs interface and the common operations.
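
As a rough illustration of the intended usage from a shell (the event
names below are only examples, taken from the selftests in this
series, not something this patch defines):

  # kprobe, kretprobe and synthetic events now live in one file
  echo 'p:myevent1 _do_fork' >> /sys/kernel/debug/tracing/dynamic_events
  echo 'r:myevent2 _do_fork' >> /sys/kernel/debug/tracing/dynamic_events
  echo 's:latency1 u64 lat; pid_t pid;' >> /sys/kernel/debug/tracing/dynamic_events
  # remove a single event by name, or clear everything
  echo '-:myevent2' >> /sys/kernel/debug/tracing/dynamic_events
  echo > /sys/kernel/debug/tracing/dynamic_events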

Link: 
http://lkml.kernel.org/r/154140852824.17322.12250362185969352095.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/Kconfig  |   3 +
 kernel/trace/Makefile |   1 +
 kernel/trace/trace.c  |   4 +
 kernel/trace/trace_dynevent.c | 210 ++
 kernel/trace/trace_dynevent.h | 119 +++
 5 files changed, 337 insertions(+)
 create mode 100644 kernel/trace/trace_dynevent.c
 create mode 100644 kernel/trace/trace_dynevent.h

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 5e3de28c7677..bf2e8a5a91f1 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -518,6 +518,9 @@ config BPF_EVENTS
help
  This allows the user to attach BPF programs to kprobe events.
 
+config DYNAMIC_EVENTS
+   def_bool n
+
 config PROBE_EVENTS
def_bool n
 
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index c7ade7965464..c2b2148bb1d2 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -79,6 +79,7 @@ endif
 ifeq ($(CONFIG_TRACING),y)
 obj-$(CONFIG_KGDB_KDB) += trace_kdb.o
 endif
+obj-$(CONFIG_DYNAMIC_EVENTS) += trace_dynevent.o
 obj-$(CONFIG_PROBE_EVENTS) += trace_probe.o
 obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 194c01838e3f..7e0332f90ed4 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4604,6 +4604,10 @@ static const char readme_msg[] =
"\t\t\t  traces\n"
 #endif
 #endif /* CONFIG_STACK_TRACER */
+#ifdef CONFIG_DYNAMIC_EVENTS
+   "  dynamic_events\t\t- Add/remove/show the generic dynamic events\n"
+   "\t\t\t  Write into this file to define/undefine new trace events.\n"
+#endif
 #ifdef CONFIG_KPROBE_EVENTS
"  kprobe_events\t\t- Add/remove/show the kernel dynamic events\n"
"\t\t\t  Write into this file to define/undefine new trace events.\n"
diff --git a/kernel/trace/trace_dynevent.c b/kernel/trace/trace_dynevent.c
new file mode 100644
index ..f17a887abb66
--- /dev/null
+++ b/kernel/trace/trace_dynevent.c
@@ -0,0 +1,210 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Generic dynamic event control interface
+ *
+ * Copyright (C) 2018 Masami Hiramatsu 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "trace.h"
+#include "trace_dynevent.h"
+
+static DEFINE_MUTEX(dyn_event_ops_mutex);
+static LIST_HEAD(dyn_event_ops_list);
+
+int dyn_event_register(struct dyn_event_operations *ops)
+{
+   if (!ops || !ops->create || !ops->show || !ops->is_busy ||
+   !ops->free || !ops->match)
+   return -EINVAL;
+
+   INIT_LIST_HEAD(&ops->list);
+   mutex_lock(&dyn_event_ops_mutex);
+   list_add_tail(&ops->list, &dyn_event_ops_list);
+   mutex_unlock(&dyn_event_ops_mutex);
+   return 0;
+}
+
+int dyn_event_release(int argc, char **argv, struct dyn_event_operations *type)
+{
+   struct dyn_event *pos, *n;
+   char *system = NULL, *event, *p;
+   int ret = -ENOENT;
+
+   if (argv[0][1] != ':')
+   return -EINVAL;
+
+   event = &argv[0][2];
+   p = strchr(event, '/');
+   if (p) {
+   system = event;
+   event = p + 1;
+   *p = '\0';
+   }
+   if (event[0] == '\0')
+   return -EINVAL;
+
+   mutex_lock(&event_mutex);
+   for_each_dyn_event_safe(pos, n) {
+   if (type && type != pos->ops)
+   continue;
+   if (pos->ops->match(system, event, pos)) {
+   ret = pos->ops->free(pos);
+   break;
+   }
+   }
+   mutex_unlock(&event_mutex);
+
+   return ret;
+}
+
+static int create_dyn_event(int argc, char **argv)
+{
+   struct dyn_event_operations *ops;
+   int ret;
+
+   if (argv[0][0] == '-')
+   return dyn_event_release(argc, argv, NULL);
+
+   mutex_lock(&dyn_event_ops_mutex);
+   list_for_each_entry(ops, &dyn_event_ops_list, list) {
+   ret = ops->create(argc, (const char **)argv);
+   if (!ret || ret != -ECANCELED)
+   break;
+   }
+   mutex_unlock(&dyn_event_ops_mutex);
+   if (ret == -ECANCELED)
+   ret = -EINVAL;
+
+   return ret;
+}
+
+/* Protected by event_mutex */
+LIST_HEAD(dyn_event_list);
+
+void *dyn_event_seq_start(struct seq_file *m, loff_t *pos)
+{
+   mutex_lock(&event_mutex);
+   return seq_list_start(&dyn_event_list, *pos);
+}
+
+v

[for-next][PATCH 25/30] tracing/uprobes: Use dyn_event framework for uprobe events

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Use the dyn_event framework for uprobe events. This shows
uprobe events in the "dynamic_events" file, and users can
also define new uprobe events via dynamic_events.
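
A minimal sketch of defining a uprobe event through the unified file
(the binary path and offset here are hypothetical; the syntax follows
uprobetracer.rst):

  # p[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS]
  echo 'p:myuprobe /bin/bash:0x4245c0' >> /sys/kernel/debug/tracing/dynamic_events
  echo '-:myuprobe' >> /sys/kernel/debug/tracing/dynamic_events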

Link: 
http://lkml.kernel.org/r/154140858481.17322.9091293846515154065.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 Documentation/trace/uprobetracer.rst |   4 +
 kernel/trace/Kconfig |   1 +
 kernel/trace/trace_uprobe.c  | 278 ++-
 3 files changed, 153 insertions(+), 130 deletions(-)

diff --git a/Documentation/trace/uprobetracer.rst 
b/Documentation/trace/uprobetracer.rst
index d0822811527a..4c3bfde2ba47 100644
--- a/Documentation/trace/uprobetracer.rst
+++ b/Documentation/trace/uprobetracer.rst
@@ -18,6 +18,10 @@ current_tracer. Instead of that, add probe points via
 However unlike kprobe-event tracer, the uprobe event interface expects the
 user to calculate the offset of the probepoint in the object.
 
+You can also use /sys/kernel/debug/tracing/dynamic_events instead of
+uprobe_events. That interface will provide unified access to other
+dynamic events too.
+
 Synopsis of uprobe_tracer
 -
 ::
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index c0f6b0105609..2cab3c5dfe2c 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -501,6 +501,7 @@ config UPROBE_EVENTS
depends on PERF_EVENTS
select UPROBES
select PROBE_EVENTS
+   select DYNAMIC_EVENTS
select TRACING
default y
help
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 6eaaa2150685..4a7b21c891f3 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -7,6 +7,7 @@
  */
 #define pr_fmt(fmt)"trace_kprobe: " fmt
 
+#include 
 #include 
 #include 
 #include 
@@ -14,6 +15,7 @@
 #include 
 #include 
 
+#include "trace_dynevent.h"
 #include "trace_probe.h"
 #include "trace_probe_tmpl.h"
 
@@ -37,11 +39,26 @@ struct trace_uprobe_filter {
struct list_head	perf_events;
 };
 
+static int trace_uprobe_create(int argc, const char **argv);
+static int trace_uprobe_show(struct seq_file *m, struct dyn_event *ev);
+static int trace_uprobe_release(struct dyn_event *ev);
+static bool trace_uprobe_is_busy(struct dyn_event *ev);
+static bool trace_uprobe_match(const char *system, const char *event,
+  struct dyn_event *ev);
+
+static struct dyn_event_operations trace_uprobe_ops = {
+   .create = trace_uprobe_create,
+   .show = trace_uprobe_show,
+   .is_busy = trace_uprobe_is_busy,
+   .free = trace_uprobe_release,
+   .match = trace_uprobe_match,
+};
+
 /*
  * uprobe event core functions
  */
 struct trace_uprobe {
-   struct list_head	list;
+   struct dyn_event	devent;
struct trace_uprobe_filter  filter;
struct uprobe_consumer  consumer;
struct path path;
@@ -53,6 +70,25 @@ struct trace_uprobe {
struct trace_probe  tp;
 };
 
+static bool is_trace_uprobe(struct dyn_event *ev)
+{
+   return ev->ops == &trace_uprobe_ops;
+}
+
+static struct trace_uprobe *to_trace_uprobe(struct dyn_event *ev)
+{
+   return container_of(ev, struct trace_uprobe, devent);
+}
+
+/**
+ * for_each_trace_uprobe - iterate over the trace_uprobe list
+ * @pos:   the struct trace_uprobe * for each entry
+ * @dpos:  the struct dyn_event * to use as a loop cursor
+ */
+#define for_each_trace_uprobe(pos, dpos)   \
+   for_each_dyn_event(dpos)\
+   if (is_trace_uprobe(dpos) && (pos = to_trace_uprobe(dpos)))
+
 #define SIZEOF_TRACE_UPROBE(n) \
(offsetof(struct trace_uprobe, tp.args) +   \
(sizeof(struct probe_arg) * (n)))
@@ -60,9 +96,6 @@ struct trace_uprobe {
 static int register_uprobe_event(struct trace_uprobe *tu);
 static int unregister_uprobe_event(struct trace_uprobe *tu);
 
-static DEFINE_MUTEX(uprobe_lock);
-static LIST_HEAD(uprobe_list);
-
 struct uprobe_dispatch_data {
struct trace_uprobe *tu;
unsigned long   bp_addr;
@@ -209,6 +242,22 @@ static inline bool is_ret_probe(struct trace_uprobe *tu)
return tu->consumer.ret_handler != NULL;
 }
 
+static bool trace_uprobe_is_busy(struct dyn_event *ev)
+{
+   struct trace_uprobe *tu = to_trace_uprobe(ev);
+
+   return trace_probe_is_enabled(&tu->tp);
+}
+
+static bool trace_uprobe_match(const char *system, const char *event,
+  struct dyn_event *ev)
+{
+   struct trace_uprobe *tu = to_trace_uprobe(ev);
+
+   return strcmp(trace_event_name(&tu->tp.call), event) == 0 &&
+   (!system || strcmp(tu->tp.call.class->system, system) == 0);
+}
+
 /*
  * Al

[for-next][PATCH 28/30] tracing: Consolidate trace_add/remove_event_call back to the nolock functions

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

The trace_add/remove_event_call_nolock() functions were added to allow
the trace_add/remove_event_call() code to be called when the event_mutex
lock was already taken. Now that all callers are done within the
event_mutex, there's no reason to have two different interfaces.

Remove the current wrapper trace_add/remove_event_call()s and rename the
_nolock versions back to the original names.
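
Callers are now expected to already hold event_mutex themselves. A
minimal sketch of the resulting calling convention (illustration only,
not code from this patch):

	mutex_lock(&event_mutex);
	ret = trace_add_event_call(call);	/* lockdep_assert_held(&event_mutex) inside */
	...
	mutex_unlock(&event_mutex);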

Link: 
http://lkml.kernel.org/r/154140866955.17322.2081425494660638846.stgit@devbox

Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/trace_events.h |  2 --
 kernel/trace/trace_events.c  | 30 --
 kernel/trace/trace_events_hist.c |  6 +++---
 kernel/trace/trace_kprobe.c  |  4 ++--
 kernel/trace/trace_uprobe.c  |  4 ++--
 5 files changed, 11 insertions(+), 35 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 3aa05593a53f..4130a5497d40 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -529,8 +529,6 @@ extern int trace_event_raw_init(struct trace_event_call 
*call);
 extern int trace_define_field(struct trace_event_call *call, const char *type,
  const char *name, int offset, int size,
  int is_signed, int filter_type);
-extern int trace_add_event_call_nolock(struct trace_event_call *call);
-extern int trace_remove_event_call_nolock(struct trace_event_call *call);
 extern int trace_add_event_call(struct trace_event_call *call);
 extern int trace_remove_event_call(struct trace_event_call *call);
 extern int trace_event_get_offsets(struct trace_event_call *call);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index a3b157f689ee..bd0162c0467c 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2305,7 +2305,8 @@ __trace_early_add_new_event(struct trace_event_call *call,
 struct ftrace_module_file_ops;
 static void __add_event_to_tracers(struct trace_event_call *call);
 
-int trace_add_event_call_nolock(struct trace_event_call *call)
+/* Add an additional event_call dynamically */
+int trace_add_event_call(struct trace_event_call *call)
 {
int ret;
lockdep_assert_held(&event_mutex);
@@ -2320,17 +2321,6 @@ int trace_add_event_call_nolock(struct trace_event_call 
*call)
return ret;
 }
 
-/* Add an additional event_call dynamically */
-int trace_add_event_call(struct trace_event_call *call)
-{
-   int ret;
-
-   mutex_lock(&event_mutex);
-   ret = trace_add_event_call_nolock(call);
-   mutex_unlock(&event_mutex);
-   return ret;
-}
-
 /*
  * Must be called under locking of trace_types_lock, event_mutex and
  * trace_event_sem.
@@ -2376,8 +2366,8 @@ static int probe_remove_event_call(struct 
trace_event_call *call)
return 0;
 }
 
-/* no event_mutex version */
-int trace_remove_event_call_nolock(struct trace_event_call *call)
+/* Remove an event_call */
+int trace_remove_event_call(struct trace_event_call *call)
 {
int ret;
 
@@ -2392,18 +2382,6 @@ int trace_remove_event_call_nolock(struct 
trace_event_call *call)
return ret;
 }
 
-/* Remove an event_call */
-int trace_remove_event_call(struct trace_event_call *call)
-{
-   int ret;
-
-   mutex_lock(&event_mutex);
-   ret = trace_remove_event_call_nolock(call);
-   mutex_unlock(&event_mutex);
-
-   return ret;
-}
-
 #define for_each_event(event, start, end)  \
for (event = start; \
 (unsigned long)event < (unsigned long)end; \
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 21e4954375a1..82e72c48a5a9 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -960,7 +960,7 @@ static int register_synth_event(struct synth_event *event)
call->data = event;
call->tp = event->tp;
 
-   ret = trace_add_event_call_nolock(call);
+   ret = trace_add_event_call(call);
if (ret) {
pr_warn("Failed to register synthetic event: %s\n",
trace_event_name(call));
@@ -969,7 +969,7 @@ static int register_synth_event(struct synth_event *event)
 
ret = set_synth_event_print_fmt(call);
if (ret < 0) {
-   trace_remove_event_call_nolock(call);
+   trace_remove_event_call(call);
goto err;
}
  out:
@@ -984,7 +984,7 @@ static int unregister_synth_event(struct synth_event *event)
struct trace_event_call *call = &event->call;
int ret;
 
-   ret = trace_remove_event_call_nolock(call);
+   ret = trace_remove_event_call(call);
 
return ret;
 }
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index bdf8c2ad5152..0e0f7b8024fb 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1353,7 +1353,7 @@ static int reg

[for-next][PATCH 05/30] arm64: function_graph: Remove use of FTRACE_NOTRACE_DEPTH

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Functions in the set_graph_notrace no longer subtract FTRACE_NOTRACE_DEPTH
from curr_ret_stack, as that is now implemented via the trace_recursion
flags. Access to curr_ret_stack no longer needs to worry about checking for
this. curr_ret_stack is still initialized to -1, when there's not a shadow
stack allocated.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 arch/arm64/kernel/stacktrace.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 4989f7ea1e59..7723dadf25be 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -61,9 +61,6 @@ int notrace unwind_frame(struct task_struct *tsk, struct 
stackframe *frame)
(frame->pc == (unsigned long)return_to_handler)) {
if (WARN_ON_ONCE(frame->graph == -1))
return -EINVAL;
-   if (frame->graph < -1)
-   frame->graph += FTRACE_NOTRACE_DEPTH;
-
/*
 * This is a case where function graph tracer has
 * modified a return address (LR) in a stack frame
-- 
2.19.1




[PATCH 03/14 v2] arm64: function_graph: Remove use of FTRACE_NOTRACE_DEPTH

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Functions in the set_graph_notrace no longer subtract FTRACE_NOTRACE_DEPTH
from curr_ret_stack, as that is now implemented via the trace_recursion
flags. Access to curr_ret_stack no longer needs to worry about checking for
this. curr_ret_stack is still initialized to -1, when there's not a shadow
stack allocated.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---

Changes since v1: A better change log. The patch itself is the same.

 arch/arm64/kernel/stacktrace.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 4989f7ea1e59..7723dadf25be 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -61,9 +61,6 @@ int notrace unwind_frame(struct task_struct *tsk, struct 
stackframe *frame)
(frame->pc == (unsigned long)return_to_handler)) {
if (WARN_ON_ONCE(frame->graph == -1))
return -EINVAL;
-   if (frame->graph < -1)
-   frame->graph += FTRACE_NOTRACE_DEPTH;
-
/*
 * This is a case where function graph tracer has
 * modified a return address (LR) in a stack frame
-- 
2.19.1



[PATCH 0/2] tracing: arm64: Make ftrace_replace_code() schedulable for arm64

2018-12-05 Thread Steven Rostedt


This is a little more involved, and I would like to push this through my
tree. Can I get a reviewed-by/ack for the second (arm64) patch?

Anders, can you also test this to make sure that it fixes the issue you
see?

Thanks!

-- Steve


Steven Rostedt (VMware) (2):
  ftrace: Allow ftrace_replace_code() to be schedulable
  arm64: ftrace: Set FTRACE_SCHEDULABLE before ftrace_modify_all_code()


 arch/arm64/kernel/ftrace.c |  1 +
 include/linux/ftrace.h |  1 +
 kernel/trace/ftrace.c  | 19 ---
 3 files changed, 18 insertions(+), 3 deletions(-)


[PATCH 2/2] arm64: ftrace: Set FTRACE_SCHEDULABLE before ftrace_modify_all_code()

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

It has been reported that ftrace_replace_code(), which is called by
ftrace_modify_all_code(), can cause a soft lockup warning for an
allmodconfig kernel. This is because, with all the debug options
enabled, the loop in ftrace_replace_code() (which iterates over all
the functions being enabled, and there can be tens of thousands of
them) is too slow and never schedules out.

To solve this, setting FTRACE_SCHEDULABLE to the command passed into
ftrace_replace_code() will make it call cond_resched() in the loop,
which prevents the soft lockup warning from triggering.

Link: http://lkml.kernel.org/r/20181204192903.8193-1-anders.rox...@linaro.org

Reported-by: Anders Roxell 
Signed-off-by: Steven Rostedt (VMware) 
---
 arch/arm64/kernel/ftrace.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index 57e962290df3..9a8de0a79f97 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -193,6 +193,7 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace 
*rec,
 
 void arch_ftrace_update_code(int command)
 {
+   command |= FTRACE_SCHEDULABLE;
ftrace_modify_all_code(command);
 }
 
-- 
2.19.1




[PATCH 1/2] ftrace: Allow ftrace_replace_code() to be schedulable

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

The function ftrace_replace_code() is the ftrace engine that does the
work to modify all the nops into the calls to the function callback in
all the functions being traced.

The generic version is normally called from stop machine, but an
architecture can implement a non-stop-machine version and still use
the generic ftrace_replace_code(). When an architecture does this,
ftrace_replace_code() may be called from a schedulable context, where
it is allowed to be preemptible and schedule out.

In order to allow an architecture to make ftrace_replace_code()
schedulable, a new command flag is added called:

 FTRACE_SCHEDULABLE

This flag can be OR'd into the command that is passed to
ftrace_modify_all_code(), which calls ftrace_replace_code(), and will
have it call cond_resched() in the loop that modifies the nops into
calls to the ftrace trampolines.
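
For reference, an architecture that may call this from a schedulable
context opts in by OR-ing the flag in before calling
ftrace_modify_all_code(), as the arm64 patch in this series does:

	void arch_ftrace_update_code(int command)
	{
		command |= FTRACE_SCHEDULABLE;
		ftrace_modify_all_code(command);
	}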

Link: http://lkml.kernel.org/r/20181204192903.8193-1-anders.rox...@linaro.org

Reported-by: Anders Roxell 
Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ftrace.h |  1 +
 kernel/trace/ftrace.c  | 19 ---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index dd16e8218db3..c281b16baef9 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -389,6 +389,7 @@ enum {
FTRACE_UPDATE_TRACE_FUNC= (1 << 2),
FTRACE_START_FUNC_RET   = (1 << 3),
FTRACE_STOP_FUNC_RET= (1 << 4),
+   FTRACE_SCHEDULABLE  = (1 << 5),
 };
 
 /*
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 77734451cb05..74fdcacba514 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -77,6 +77,11 @@
 #define ASSIGN_OPS_HASH(opsname, val)
 #endif
 
+enum {
+   FTRACE_MODIFY_ENABLE_FL = (1 << 0),
+   FTRACE_MODIFY_SCHEDULABLE_FL= (1 << 1),
+};
+
 static struct ftrace_ops ftrace_list_end __read_mostly = {
.func   = ftrace_stub,
.flags  = FTRACE_OPS_FL_RECURSION_SAFE | FTRACE_OPS_FL_STUB,
@@ -2415,10 +2420,12 @@ __ftrace_replace_code(struct dyn_ftrace *rec, int 
enable)
return -1; /* unknow ftrace bug */
 }
 
-void __weak ftrace_replace_code(int enable)
+void __weak ftrace_replace_code(int mod_flags)
 {
struct dyn_ftrace *rec;
struct ftrace_page *pg;
+   int enable = mod_flags & FTRACE_MODIFY_ENABLE_FL;
+   int schedulable = mod_flags & FTRACE_MODIFY_SCHEDULABLE_FL;
int failed;
 
if (unlikely(ftrace_disabled))
@@ -2435,6 +2442,8 @@ void __weak ftrace_replace_code(int enable)
/* Stop processing */
return;
}
+   if (schedulable)
+   cond_resched();
} while_for_each_ftrace_rec();
 }
 
@@ -2548,8 +2557,12 @@ int __weak ftrace_arch_code_modify_post_process(void)
 void ftrace_modify_all_code(int command)
 {
int update = command & FTRACE_UPDATE_TRACE_FUNC;
+   int mod_flags = 0;
int err = 0;
 
+   if (command & FTRACE_SCHEDULABLE)
+   mod_flags = FTRACE_MODIFY_SCHEDULABLE_FL;
+
/*
 * If the ftrace_caller calls a ftrace_ops func directly,
 * we need to make sure that it only traces functions it
@@ -2567,9 +2580,9 @@ void ftrace_modify_all_code(int command)
}
 
if (command & FTRACE_UPDATE_CALLS)
-   ftrace_replace_code(1);
+   ftrace_replace_code(mod_flags | FTRACE_MODIFY_ENABLE_FL);
else if (command & FTRACE_DISABLE_CALLS)
-   ftrace_replace_code(0);
+   ftrace_replace_code(mod_flags);
 
if (update && ftrace_trace_function != ftrace_ops_list_func) {
function_trace_op = set_function_trace_op;
-- 
2.19.1




Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints

2018-12-05 Thread Steven Rostedt
On Wed,  5 Dec 2018 17:05:02 +0100
Jiri Olsa  wrote:

> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 3b2490b81918..e55cf9169a03 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -60,6 +60,32 @@ static void do_audit_syscall_entry(struct pt_regs *regs, 
> u32 arch)
>   }
>  }
>  
> +static void trace_block_syscall(struct pt_regs *regs, bool enter)
> +{
> + current->perf_blocked = true;
> +
> + do {
> + schedule_timeout(100 * HZ);
> + current->perf_blocked_cnt = 0;
> +
> + if (enter) {
> + /* perf syscalls:* enter */
> + perf_trace_syscall_enter(regs);
> +
> + /* perf raw_syscalls:* enter */
> + perf_trace_sys_enter(&event_sys_enter, regs, regs->orig_ax);
> + } else {
> + /* perf syscalls:* enter */
> + perf_trace_syscall_exit(regs);
> +
> + /* perf raw_syscalls:* enter */
> + perf_trace_sys_exit(&event_sys_exit, regs, regs->ax);
> + }
> + } while (current->perf_blocked_cnt);

I was thinking, if the process reading the perf buffer dies, how do we
tell this task to continue on?

-- Steve

> +
> + current->perf_blocked = false;
> +}
> +


Re: [PATCH 3/9] tools/lib/traceevent: Install trace-seq.h API header file

2018-12-05 Thread Steven Rostedt
On Wed, 5 Dec 2018 13:25:17 +0100
Jiri Olsa  wrote:

> On Tue, Dec 04, 2018 at 02:41:45PM -0500, Steven Rostedt wrote:
> > On Tue, 4 Dec 2018 16:47:39 +0900
> > Namhyung Kim  wrote:
> > 
> >   
> > > > @@ -302,6 +302,7 @@ install_headers:
> > > > $(call QUIET_INSTALL, headers) \
> > > > $(call 
> > > > do_install,event-parse.h,$(prefix)/include/traceevent,644); \
> > > > $(call 
> > > > do_install,event-utils.h,$(prefix)/include/traceevent,644); \
> > > > +   $(call 
> > > > do_install,trace-seq.h,$(prefix)/include/traceevent,644); \
> > > > $(call 
> > > > do_install,kbuffer.h,$(prefix)/include/traceevent,644)
> > > 
> > > Do you still wanna have 'traceevent' directory prefix?  I just
> > > sometimes feel a bit annoying to type it. ;-)  
> > 
> > I'd still want the separate directory for it. I'll probably have a
> > ftrace.h file added to this for ftrace specific code in the future.
> >   
> > > 
> > > Or you can rename it something like 'tep' or 'libtep' - and hopefully
> > > having only single header file to include..
> > >  
> > 
> > Hmm, I wonder if we should just call the directory "trace"?  
> 
> hum, I think it should match the library name, like 'include/tep/'

I was hoping to add other headers in this directly, like ftrace.h and
perf.h ;-)

> 
> also we should change the plugin installation directory
> 
> [jolsa@krava traceevent]$ rpm -ql perf | grep traceevent
> /usr/lib64/traceevent
> /usr/lib64/traceevent/plugins
> /usr/lib64/traceevent/plugins/plugin_cfg80211.so
> /usr/lib64/traceevent/plugins/plugin_function.so
> /usr/lib64/traceevent/plugins/plugin_hrtimer.so
> /usr/lib64/traceevent/plugins/plugin_jbd2.so
> /usr/lib64/traceevent/plugins/plugin_kmem.so
> /usr/lib64/traceevent/plugins/plugin_kvm.so
> /usr/lib64/traceevent/plugins/plugin_mac80211.so
> /usr/lib64/traceevent/plugins/plugin_sched_switch.so
> /usr/lib64/traceevent/plugins/plugin_scsi.so
> /usr/lib64/traceevent/plugins/plugin_xen.so

Change it to tep?

-- Steve



Re: [PATCH v2] tracing: add cond_resched to ftrace_replace_code()

2018-12-05 Thread Steven Rostedt
On Wed, 5 Dec 2018 11:43:12 +0100
Anders Roxell  wrote:

> > > + schedulable = !irqs_disabled() && !preempt_count();  
> >
> > Is there a reason not to use preemptible() here?  
> 
> As I understand it preemptible() is defined to 0 if
> CONFIG_PREEMPT_COUNT is disabled.
> Thats no good right ?

No it's not, which means this isn't a good approach. I have a much
better idea on how to solve this. I'll post a small patch set in a bit.

-- Steve


Re: [PATCH v2 0/4] Static calls

2018-12-04 Thread Steven Rostedt


Where did this end up BTW?

I know that there's controversy about the
CONFIG_HAVE_STATIC_CALL_OPTIMIZED option, but I don't think the 
CONFIG_HAVE_STATIC_CALL_UNOPTIMIZED version was controversial. From the
v1 patch 0 description:

There are three separate implementations, depending on what the arch
supports:

  1) CONFIG_HAVE_STATIC_CALL_OPTIMIZED: patched call sites - requires
 objtool and a small amount of arch code
  
  2) CONFIG_HAVE_STATIC_CALL_UNOPTIMIZED: patched trampolines - requires
 a small amount of arch code
  
  3) If no arch support, fall back to regular function pointers

My benchmarks showed the best improvements with the
STATIC_CALL_OPTIMIZED, but it still showed improvement with the
UNOPTIMIZED version as well. Can we at least apply 2 and 3 from the
above (which happen to be the first part of the patch set. 1 comes in
at the end).

I would also just call it CONFIG_STATIC_CALL. If we every agree on the
optimized version, then we can call it CONFIG_STATIC_CALL_OPTIMIZED.
Have an option called UNOPTIMIZED just seems wrong.

-- Steve



On Mon, 26 Nov 2018 07:54:56 -0600
Josh Poimboeuf  wrote:

> v2:
> - fix STATIC_CALL_TRAMP() macro by using __PASTE() [Ard]
> - rename optimized/unoptimized -> inline/out-of-line [Ard]
> - tweak arch interfaces for PLT and add key->tramp field [Ard]
> - rename 'poison' to 'defuse' and do it after all sites have been patched 
> [Ard]
> - fix .init handling [Ard, Steven]
> - add CONFIG_HAVE_STATIC_CALL [Steven]
> - make interfaces more consistent across configs to allow tracepoints to
>   use them [Steven]
> - move __ADDRESSABLE() to static_call() macro [Steven]
> - prevent 2-byte jumps [Steven]
> - add offset to asm-offsets.c instead of hard coding key->func offset
> - add kernel_text_address() sanity check
> - make __ADDRESSABLE() symbols truly unique
> 
> TODO:
> - port Ard's arm64 patches to the new arch interfaces
> - tracepoint performance testing
> 
> 
> 
> These patches are related to two similar patch sets from Ard and Steve:
> 
> - https://lkml.kernel.org/r/20181005081333.15018-1-ard.biesheu...@linaro.org
> - https://lkml.kernel.org/r/20181006015110.653946...@goodmis.org
> 
> The code is also heavily inspired by the jump label code, as some of the
> concepts are very similar.
> 
> There are three separate implementations, depending on what the arch
> supports:
> 
>   1) CONFIG_HAVE_STATIC_CALL_INLINE: patched call sites - requires
>  objtool and a small amount of arch code
>   
>   2) CONFIG_HAVE_STATIC_CALL_OUTLINE: patched trampolines - requires
>  a small amount of arch code
>   
>   3) If no arch support, fall back to regular function pointers
> 
> 
> Josh Poimboeuf (4):
>   compiler.h: Make __ADDRESSABLE() symbol truly unique
>   static_call: Add static call infrastructure
>   x86/static_call: Add out-of-line static call implementation
>   x86/static_call: Add inline static call implementation for x86-64
> 
>  arch/Kconfig  |  10 +
>  arch/x86/Kconfig  |   4 +-
>  arch/x86/include/asm/static_call.h|  52 +++
>  arch/x86/kernel/Makefile  |   1 +
>  arch/x86/kernel/asm-offsets.c |   6 +
>  arch/x86/kernel/static_call.c |  78 
>  include/asm-generic/vmlinux.lds.h |  11 +
>  include/linux/compiler.h  |   2 +-
>  include/linux/module.h|  10 +
>  include/linux/static_call.h   | 202 ++
>  include/linux/static_call_types.h |  19 +
>  kernel/Makefile   |   1 +
>  kernel/module.c   |   5 +
>  kernel/static_call.c  | 350 ++
>  tools/objtool/Makefile|   3 +-
>  tools/objtool/check.c | 126 ++-
>  tools/objtool/check.h |   2 +
>  tools/objtool/elf.h   |   1 +
>  .../objtool/include/linux/static_call_types.h |  19 +
>  tools/objtool/sync-check.sh   |   1 +
>  20 files changed, 899 insertions(+), 4 deletions(-)
>  create mode 100644 arch/x86/include/asm/static_call.h
>  create mode 100644 arch/x86/kernel/static_call.c
>  create mode 100644 include/linux/static_call.h
>  create mode 100644 include/linux/static_call_types.h
>  create mode 100644 kernel/static_call.c
>  create mode 100644 tools/objtool/include/linux/static_call_types.h
> 



Re: [PATCH v3] tracing: add cond_resched to ftrace_replace_code()

2018-12-04 Thread Steven Rostedt
On Tue,  4 Dec 2018 20:40:44 +0100
Anders Roxell  wrote:

> When running in qemu on an kernel built with allmodconfig and debug
> options (in particular kcov and ubsan) enabled, ftrace_replace_code
> function call take minutes. The ftrace selftest calls
> ftrace_replace_code to look >4 through
> ftrace_make_call/ftrace_make_nop, and these end up calling
> __aarch64_insn_write/aarch64_insn_patch_text_nosync.
> 
> Microseconds add up because this is called in a loop for each dyn_ftrace
> record, and this triggers the softlockup watchdog unless we let it sleep
> occasionally.
> 
> Rework so that we call cond_resched() if !irqs_disabled() && !preempt_count().

This isn't urgent is it? That is, it doesn't need a stable tag?

-- Steve




Re: [PATCH 3/9] tools/lib/traceevent: Install trace-seq.h API header file

2018-12-04 Thread Steven Rostedt
On Tue, 4 Dec 2018 16:47:39 +0900
Namhyung Kim  wrote:


> > @@ -302,6 +302,7 @@ install_headers:
> > $(call QUIET_INSTALL, headers) \
> > $(call 
> > do_install,event-parse.h,$(prefix)/include/traceevent,644); \
> > $(call 
> > do_install,event-utils.h,$(prefix)/include/traceevent,644); \
> > +   $(call 
> > do_install,trace-seq.h,$(prefix)/include/traceevent,644); \
> > $(call do_install,kbuffer.h,$(prefix)/include/traceevent,644)  
> 
> Do you still wanna have 'traceevent' directory prefix?  I just
> sometimes feel a bit annoying to type it. ;-)

I'd still want the separate directory for it. I'll probably have a
ftrace.h file added to this for ftrace specific code in the future.

> 
> Or you can rename it something like 'tep' or 'libtep' - and hopefully
> having only single header file to include..
>

Hmm, I wonder if we should just call the directory "trace"?

-- Steve


Re: Strange hang with gcc 8 of kprobe multiple_kprobes test

2018-12-04 Thread Steven Rostedt
On Tue, 4 Dec 2018 09:15:06 +0100
Ingo Molnar  wrote:

> * Masami Hiramatsu  wrote:
> 
> > I remember I have fixed this, and actually WE did it :-D
> > 
> > https://lkml.org/lkml/2018/8/23/1203
> > 
> > Ah, we hit a same bug...
> > 
> > Ingo, could you pick the patch? Should I resend it?  
> 
> Indeed: I just picked it up into tip:perf/urgent.
> 
> It's my bad: I missed the original submission due to Steve's feedback 
> which I mistook as a request for another iteration, while he only 
> commented on the reason for the original breakage and gave his 
> Reviewed-by ...
> 

Sorry for the confusion. The patch and code were quite complex, and I
was documenting what I thought of the patch (and the bug), so that my
Reviewed-by had a bit more meaning.

-- Steve


Re: [PATCH] tracing: add cond_resched to ftrace_replace_code()

2018-12-04 Thread Steven Rostedt
On Tue, 4 Dec 2018 20:25:31 +0100
Anders Roxell  wrote:

> On Tue, 4 Dec 2018 at 20:21, Steven Rostedt  wrote:
> >
> > On Tue, 4 Dec 2018 14:19:08 -0500
> > Steven Rostedt  wrote:
> >  
> > > > @@ -2435,6 +2438,13 @@ void __weak ftrace_replace_code(int enable)
> > > > /* Stop processing */
> > > > return;
> > > > }
> > > > +   /*
> > > > +* Some archs calls this function with interrupts or 
> > > > preemption
> > > > +* disabled. Howeve, other archs don't and this can cause a 
> > > >  
> >
> > typo "However". But could you write it this way:
> >
> > "However, for other archs that can preempt, this can cause an
> > tremendous unneeded latency."
> >  
> 
> I'll fix this and move it up (where I added it in the first place but
> moved it) =)
>

I also noticed a grammar issue:

"Some archs call this function .."

-- Steve


Re: [PATCH] tracing: add cond_resched to ftrace_replace_code()

2018-12-04 Thread Steven Rostedt
On Tue, 4 Dec 2018 14:19:08 -0500
Steven Rostedt  wrote:

> > @@ -2435,6 +2438,13 @@ void __weak ftrace_replace_code(int enable)
> > /* Stop processing */
> > return;
> > }
> > +   /*
> > +* Some archs calls this function with interrupts or preemption
> > +* disabled. Howeve, other archs don't and this can cause a

typo "However". But could you write it this way:

"However, for other archs that can preempt, this can cause an
tremendous unneeded latency."

Thanks!

-- Steve

> > +* tremendous unneeded latency.
> > +*/  
> 
> Actually, could you move the comment up where schedulable gets set?
> 
> Thanks!
> 
> -- Steve
> 
> > +   if (schedulable)
> > +   cond_resched();
> > } while_for_each_ftrace_rec();
> >  }
> >
> 



Re: [PATCH] tracing: add cond_resched to ftrace_replace_code()

2018-12-04 Thread Steven Rostedt
On Tue,  4 Dec 2018 20:12:28 +0100
Anders Roxell  wrote:

> When running in qemu on an kernel built with allmodconfig and debug
> options (in particular kcov and ubsan) enabled, ftrace_replace_code
> function call take minutes. The ftrace selftest calls
> ftrace_replace_code to look >4 through
> ftrace_make_call/ftrace_make_nop, and these end up calling
> __aarch64_insn_write/aarch64_insn_patch_text_nosync.
> 
> Microseconds add up because this is called in a loop for each dyn_ftrace
> record, and this triggers the softlockup watchdog unless we let it sleep
> occasionally.
> 
> Rework so that we call cond_resched() if !irqs_disabled() && !preempt_count().
> 
> Suggested-by: Steven Rostedt (VMware) 
> Signed-off-by: Anders Roxell 
> ---
>  kernel/trace/ftrace.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index c375e33239f7..582e3441e318 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -2419,11 +2419,14 @@ void __weak ftrace_replace_code(int enable)
>  {
>   struct dyn_ftrace *rec;
>   struct ftrace_page *pg;
> + bool schedulable;
>   int failed;
>  
>   if (unlikely(ftrace_disabled))
>   return;
>  
> + schedulable = !irqs_disabled() && !preempt_count();
> +
>   do_for_each_ftrace_rec(pg, rec) {
>  
>   if (rec->flags & FTRACE_FL_DISABLED)
> @@ -2435,6 +2438,13 @@ void __weak ftrace_replace_code(int enable)
>   /* Stop processing */
>   return;
>   }
> + /*
> +  * Some archs calls this function with interrupts or preemption
> +  * disabled. Howeve, other archs don't and this can cause a
> +  * tremendous unneeded latency.
> +  */

Actually, could you move the comment up where schedulable gets set?

Thanks!

-- Steve

> + if (schedulable)
> + cond_resched();
>   } while_for_each_ftrace_rec();
>  }
>  



Re: BUG: ftrace/perf dropping events at the begin of interrupt handlers

2018-12-04 Thread Steven Rostedt
On Thu, 22 Nov 2018 10:45:05 +0100
Daniel Bristot de Oliveira  wrote:

> While developing the automata [1], I've hit cases in which need resched
> and/or sched wakeup events were being fired with preemption and/or
> interrupts enabled. However, this is not possible because interrupts must
> be disabled to avoid concurrence with an interrupt handler, and the
> preemption must be disabled to avoid concurrence with the scheduler.
> The tool I use to validate the model is based on perf, and it was
> complaining about this situation. I’ve talked to Arnaldo about it
> two months.
> 
> Further debug on perf has shown that those cases always took place
> associated with the occurrence of interrupts. At ELC Europe
> Marko Pusch (Siemens) also mentioned hitting cases in which he saw
> missing events related to IRQ handling, but using ftrace. Steven and
> I also discussed this during the last Plumbers (Vancouver - CA) and we
> agreed that there is a problem on ftrace too.
> 
> To reproduce this problem with ftrace, one needs to enable function
> tracer and do kernel operations in a CPU in which IRQs are taking place.
> 
> For instance, in a single CPU VM, run:
> 
> # while [ 1 ]; do echo > /dev/null; done
> 
> In a shell, and 
> 
> # trace-cmd record -b [enough buff to avoid missing trace because of buffer 
> overun] -p function sleep 5
> 
> In another shell.
> 
> Then, using trace-cmd report --debug, we can see the problem. Here is one
> example of output, first the expected one:
> 
> 


>  Faulty execution =
> 
> Thus, ftrace and perf sometimes drop events at the beginning of
> interrupt handlers. The reason why interrupt disable and
> preempt disable were not being recorded (and causing problems in the
> automata execution) is that these events take place in the very early
> execution of the interrupt handler, in the section in which
> perf/ftrace are dropping events [ without notifying ].
> 
> [1] This is a good demonstration of problems that can be found using the
> automata model presented in this workshop paper:
> 
> Modeling the Behavior of Threads in the PREEMPT_RT Linux Kernel Using
> Automata. Daniel Bristot de Oliveira, Tommaso Cucinotta, Rômulo Silva de
> Oliveira - EWiLi'2018 – Embedded operating system workshop Torino, Italy,
> 4 October 2018.
> 
> And in the presentations:
> "Mind the gap between real-time Linux and real-time theory"
> "How can we catch problems that can break the PREEMPT_RT preemption model?"
> At the Linux Plumbers (Vancouver - CA)
> 
> Steven is already aware of this problem, and he is working on it.

Yes, it's a simple fix. The problem is that the recursion detection of
the function tracer requires that, when it's called from an interrupt,
"in_interrupt()" be true; otherwise it thinks that the function
tracer is recursing on itself (which is common).

Looking an the dropped events, and the code in __irq_enter() we have
this:

#define __irq_enter()   \
do {\
account_irq_enter_time(current);\
preempt_count_add(HARDIRQ_OFFSET);  \ <<-- in_interrupt() 
returns true here
trace_hardirq_enter();  \
} while (0)

Interestingly enough, the dropped events happen to be in
account_irq_enter_time()!

Thus what I believe is happening is that an interrupt came in while one
event was being recorded. When account_irq_enter_time() was called, the
function tracer noticed that its recursion bit for the current context
was already set, and just dropped the event because it thought it was
tracing itself. After we add HARDIRQ_OFFSET to preempt_count,
"in_interrupt()" will be set and the function tracer will know it is in
a new context where it is safe to continue tracing.

Can you try this patch to see if it fixes it for you?

-- Steve

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 0fbbcdf0c178..0290531ebe3c 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -35,8 +35,8 @@ extern void rcu_nmi_exit(void);
  */
 #define __irq_enter()  \
do {\
-   account_irq_enter_time(current);\
preempt_count_add(HARDIRQ_OFFSET);  \
+   account_irq_enter_time(current);\
trace_hardirq_enter();  \
} while (0)
 


Re: [PATCH v2 10/12] tracing: Remove orphaned trace_add/remove_event_call functions

2018-12-04 Thread Steven Rostedt
On Mon,  5 Nov 2018 18:04:29 +0900
Masami Hiramatsu  wrote:

> Remove trace_add_event_call() and trace_remove_event_call()
> functions since those are not used anymore.
> 
> Signed-off-by: Masami Hiramatsu 

Hi Masami,

I've applied the series locally (need to test it) except for this
patch. Honestly, I hate the "_nolock" name, and it makes no sense when

 1) they still grab locks
 2) there's no version without "_nolock"

I added this patch in its place:

-- Steve

From: "Steven Rostedt (VMware)" 
Date: Tue, 4 Dec 2018 13:35:45 -0500
Subject: [PATCH] tracing: Consolidate trace_add/remove_event_call back to the
 nolock functions

The trace_add/remove_event_call_nolock() functions were added to allow
the trace_add/remove_event_call() code to be called when the event_mutex
lock was already taken. Now that all callers are done within the
event_mutex, there's no reason to have two different interfaces.

Remove the current wrapper trace_add/remove_event_call()s and rename the
_nolock versions back to the original names.

Link: 
http://lkml.kernel.org/r/154140866955.17322.2081425494660638846.stgit@devbox

Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/trace_events.h |  2 --
 kernel/trace/trace_events.c  | 30 --
 kernel/trace/trace_events_hist.c |  6 +++---
 kernel/trace/trace_kprobe.c  |  4 ++--
 kernel/trace/trace_uprobe.c  |  4 ++--
 5 files changed, 11 insertions(+), 35 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 3aa05593a53f..4130a5497d40 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -529,8 +529,6 @@ extern int trace_event_raw_init(struct trace_event_call 
*call);
 extern int trace_define_field(struct trace_event_call *call, const char *type,
  const char *name, int offset, int size,
  int is_signed, int filter_type);
-extern int trace_add_event_call_nolock(struct trace_event_call *call);
-extern int trace_remove_event_call_nolock(struct trace_event_call *call);
 extern int trace_add_event_call(struct trace_event_call *call);
 extern int trace_remove_event_call(struct trace_event_call *call);
 extern int trace_event_get_offsets(struct trace_event_call *call);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index a3b157f689ee..bd0162c0467c 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2305,7 +2305,8 @@ __trace_early_add_new_event(struct trace_event_call *call,
 struct ftrace_module_file_ops;
 static void __add_event_to_tracers(struct trace_event_call *call);
 
-int trace_add_event_call_nolock(struct trace_event_call *call)
+/* Add an additional event_call dynamically */
+int trace_add_event_call(struct trace_event_call *call)
 {
int ret;
lockdep_assert_held(&event_mutex);
@@ -2320,17 +2321,6 @@ int trace_add_event_call_nolock(struct trace_event_call 
*call)
return ret;
 }
 
-/* Add an additional event_call dynamically */
-int trace_add_event_call(struct trace_event_call *call)
-{
-   int ret;
-
-   mutex_lock(&event_mutex);
-   ret = trace_add_event_call_nolock(call);
-   mutex_unlock(&event_mutex);
-   return ret;
-}
-
 /*
  * Must be called under locking of trace_types_lock, event_mutex and
  * trace_event_sem.
@@ -2376,8 +2366,8 @@ static int probe_remove_event_call(struct 
trace_event_call *call)
return 0;
 }
 
-/* no event_mutex version */
-int trace_remove_event_call_nolock(struct trace_event_call *call)
+/* Remove an event_call */
+int trace_remove_event_call(struct trace_event_call *call)
 {
int ret;
 
@@ -2392,18 +2382,6 @@ int trace_remove_event_call_nolock(struct 
trace_event_call *call)
return ret;
 }
 
-/* Remove an event_call */
-int trace_remove_event_call(struct trace_event_call *call)
-{
-   int ret;
-
-   mutex_lock(&event_mutex);
-   ret = trace_remove_event_call_nolock(call);
-   mutex_unlock(&event_mutex);
-
-   return ret;
-}
-
 #define for_each_event(event, start, end)  \
for (event = start; \
 (unsigned long)event < (unsigned long)end; \
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 21e4954375a1..82e72c48a5a9 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -960,7 +960,7 @@ static int register_synth_event(struct synth_event *event)
call->data = event;
call->tp = event->tp;
 
-   ret = trace_add_event_call_nolock(call);
+   ret = trace_add_event_call(call);
if (ret) {
pr_warn("Failed to register synthetic event: %s\n",
trace_event_name(call));
@@ -969,7 +969,7 @@ static int register_synth_event(struct synth_event *event)
 
ret = set_synth_event_print_fmt(call);
if

Re: [PATCH 3/3] arm64: ftrace: add cond_resched() to func ftrace_make_(call|nop)

2018-12-04 Thread Steven Rostedt
On Tue, 4 Dec 2018 19:07:16 +0100
Anders Roxell  wrote:


> > > > +   schedulable = !irqs_disabled() & !preempt_count();  
> > >
> > > Looks suspiciously like a bitwise preemptible() to me!  
> >
> > Ah, thanks. Yeah, that should have been &&. But what did you expect.
> > I didn't even compile this ;-)
> >  


> > If it does, then I'll add it. Or take a patch for it ;-)  
> 
> I tested your patch. it worked.
> 
> I'll send a patch shortly.
>

Thanks. Please add a comment above the schedulable test stating that
some archs call this with interrupts or preemption disabled, but
other archs don't and this can cause a tremendous unneeded latency.

-- Steve
 


Re: [PATCH v2 01/12] tracing/uprobes: Add busy check when cleanup all uprobes

2018-12-04 Thread Steven Rostedt
On Mon,  5 Nov 2018 18:00:15 +0900
Masami Hiramatsu  wrote:

> Add a busy check loop in cleanup_all_probes() before
> trying to remove all events in uprobe_events as same as
> kprobe_events does.
> 
> Without this change, writing null to uprobe_events will
> try to remove events but if one of them is enabled, it
> stopped there but some of events are already cleared.
> 
> With this change, writing null to uprobe_events make
> sure all events are not enabled before removing events.
> So, it clears all events, or return an error (-EBUSY)
> with keeping all events.
> 

Hmm, should this patch be marked as stable?

-- Steve

> Signed-off-by: Masami Hiramatsu 
> ---
>  kernel/trace/trace_uprobe.c |7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 31ea48eceda1..b708e4ff7ea7 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -587,12 +587,19 @@ static int cleanup_all_probes(void)
>   int ret = 0;
>  
>   mutex_lock(&uprobe_lock);
> + /* Ensure no probe is in use. */
> + list_for_each_entry(tu, &uprobe_list, list)
> + if (trace_probe_is_enabled(&tu->tp)) {
> + ret = -EBUSY;
> + goto end;
> + }
>   while (!list_empty(&uprobe_list)) {
>   tu = list_entry(uprobe_list.next, struct trace_uprobe, list);
>   ret = unregister_trace_uprobe(tu);
>   if (ret)
>   break;
>   }
> +end:
>   mutex_unlock(&uprobe_lock);
>   return ret;
>  }



Re: [PATCH 2/9] tools/lib/traceevent: Added support for pkg-config

2018-12-04 Thread Steven Rostedt
On Tue, 4 Dec 2018 16:32:35 +0900
Namhyung Kim  wrote:

> > +++ b/tools/lib/traceevent/libtraceevent.pc.template
> > @@ -0,0 +1,10 @@
> > +prefix=INSTALL_PREFIX
> > +libdir=${prefix}/lib64  
> 
> Don't we care 32-bit systems anymore? :)

No we don't ;-)

But, I guess because some people still do, we need to fix it.

Thanks for reviewing!

-- Steve


Re: [PATCH 3/3] arm64: ftrace: add cond_resched() to func ftrace_make_(call|nop)

2018-12-04 Thread Steven Rostedt
On Tue, 4 Dec 2018 11:12:43 +
Will Deacon  wrote:

> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index 8ef9fc226037..42e89397778b 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -2393,11 +2393,14 @@ void __weak ftrace_replace_code(int enable)
> >  {
> > struct dyn_ftrace *rec;
> > struct ftrace_page *pg;
> > +   bool schedulable;
> > int failed;
> >  
> > if (unlikely(ftrace_disabled))
> > return;
> >  
> > +   schedulable = !irqs_disabled() & !preempt_count();  
> 
> Looks suspiciously like a bitwise preemptible() to me!

Ah, thanks. Yeah, that should have been &&. But what did you expect.
I didn't even compile this ;-)

> 
> > +
> > do_for_each_ftrace_rec(pg, rec) {
> >  
> > if (rec->flags & FTRACE_FL_DISABLED)
> > @@ -2409,6 +2412,8 @@ void __weak ftrace_replace_code(int enable)
> > /* Stop processing */
> > return;
> > }
> > +   if (schedulable)
> > +   cond_resched();
> > } while_for_each_ftrace_rec();
> >  }  
> 
> If this solves the problem in core code, them I'm all for it. Otherwise, I
> was thinking of rolling our own ftrace_replace_code() for arm64, but that's
> going to involve a fair amount of duplication.
> 

If it does, then I'll add it. Or take a patch for it ;-) 

My main concern is that this can be called from non schedulable context.

-- Steve


Re: [PATCH] tools: Fix diverse typos

2018-12-04 Thread Steven Rostedt
On Tue, 4 Dec 2018 10:41:22 -0300
Arnaldo Carvalho de Melo  wrote:

> Em Mon, Dec 03, 2018 at 11:22:00AM +0100, Ingo Molnar escreveu:
> > Go over the tools/ files that are maintained in Arnaldo's tree and
> > fix common typos: half of them were in comments, the other half
> > in JSON files.  
> 
> Steven, Tzvetomir,
> 
> I'm going to split this patch into different subsystems, will have you
> in the CC list for the libtracecmd ones, so that it becomes easier for
> you guys to pick these fixes,

Thanks Arnaldo, much appreciated.

-- Steve


Re: [PATCH] Uprobes: Fix kernel oops with delayed_uprobe_remove()

2018-12-03 Thread Steven Rostedt
On Mon, 3 Dec 2018 11:52:41 +0530
Ravi Bangoria  wrote:

> Hi Steve,
> 
> Please pull this patch.
> 

Please send a v2 version of the patch with the updated change log. And
should it have a Fixes and be tagged for stable?

-- Steve

> Thanks.
> 
> On 11/15/18 6:13 PM, Oleg Nesterov wrote:
> > On 11/15, Ravi Bangoria wrote:  
> >>
> >> There could be a race between task exit and probe unregister:
> >>
> >>   exit_mm()
> >>   mmput()
> >>   __mmput() uprobe_unregister()
> >>   uprobe_clear_state()  put_uprobe()
> >>   delayed_uprobe_remove()   delayed_uprobe_remove()
> >>
> >> put_uprobe() is calling delayed_uprobe_remove() without taking
> >> delayed_uprobe_lock and thus the race sometimes results in a
> >> kernel crash. Fix this by taking delayed_uprobe_lock before
> >> calling delayed_uprobe_remove() from put_uprobe().
> >>
> >> Detailed crash log can be found at:
> >>   https://lkml.org/lkml/2018/11/1/1244  
> > 
> > Thanks, looks good,
> > 
> > Oleg.
> >   



Re: [PATCH 3/3] arm64: ftrace: add cond_resched() to func ftrace_make_(call|nop)

2018-12-03 Thread Steven Rostedt
On Mon, 3 Dec 2018 22:51:52 +0100
Arnd Bergmann  wrote:

> On Mon, Dec 3, 2018 at 8:22 PM Will Deacon  wrote:
> >
> > Hi Anders,
> >
> > On Fri, Nov 30, 2018 at 04:09:56PM +0100, Anders Roxell wrote:  
> > > Both of those functions end up calling ftrace_modify_code(), which is
> > > expensive because it changes the page tables and flush caches.
> > > Microseconds add up because this is called in a loop for each dyn_ftrace
> > > record, and this triggers the softlockup watchdog unless we let it sleep
> > > occasionally.
> > > Rework so that we call cond_resched() before going into the
> > > ftrace_modify_code() function.
> > >
> > > Co-developed-by: Arnd Bergmann 
> > > Signed-off-by: Arnd Bergmann 
> > > Signed-off-by: Anders Roxell 
> > > ---
> > >  arch/arm64/kernel/ftrace.c | 10 ++
> > >  1 file changed, 10 insertions(+)  
> >
> > It sounds like you're running into issues with the existing code, but I'd
> > like to understand a bit more about exactly what you're seeing. Which part
> > of the ftrace patching is proving to be expensive?
> >
> > The page table manipulation only happens once per module when using PLTs,
> > and the cache maintenance is just a single line per patch site without an
> > IPI.
> >
> > Is it the loop in ftrace_replace_code() that is causing the hassle?  
> 
> Yes: with an allmodconfig kernel, the ftrace selftest calls 
> ftrace_replace_code
> to look >4 through ftrace_make_call/ftrace_make_nop, and these
> end up calling
> 
> static int __kprobes __aarch64_insn_write(void *addr, __le32 insn)
> {
> void *waddr = addr;
> unsigned long flags = 0;
> int ret;
> 
> raw_spin_lock_irqsave(&patch_lock, flags);
> waddr = patch_map(addr, FIX_TEXT_POKE0);
> 
> ret = probe_kernel_write(waddr, &insn, AARCH64_INSN_SIZE);
> 
> patch_unmap(FIX_TEXT_POKE0);
> raw_spin_unlock_irqrestore(&patch_lock, flags);
> 
> return ret;
> }
> int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn)
> {
> u32 *tp = addr;
> int ret;
> 
> /* A64 instructions must be word aligned */
> if ((uintptr_t)tp & 0x3)
> return -EINVAL;
> 
> ret = aarch64_insn_write(tp, insn);
> if (ret == 0)
> __flush_icache_range((uintptr_t)tp,
>  (uintptr_t)tp + AARCH64_INSN_SIZE);
> 
> return ret;
> }
> 
> which seems to be where the main cost is. This is with inside of
> qemu, and with lots of debugging options (in particular
> kcov and ubsan) enabled, that make each function call
> more expensive.

I was thinking more about this. Would something like this work?

-- Steve

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8ef9fc226037..42e89397778b 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2393,11 +2393,14 @@ void __weak ftrace_replace_code(int enable)
 {
struct dyn_ftrace *rec;
struct ftrace_page *pg;
+   bool schedulable;
int failed;
 
if (unlikely(ftrace_disabled))
return;
 
+   schedulable = !irqs_disabled() & !preempt_count();
+
do_for_each_ftrace_rec(pg, rec) {
 
if (rec->flags & FTRACE_FL_DISABLED)
@@ -2409,6 +2412,8 @@ void __weak ftrace_replace_code(int enable)
/* Stop processing */
return;
}
+   if (schedulable)
+   cond_resched();
} while_for_each_ftrace_rec();
 }
 



[PATCH 5/6] tools/lib/traceevent: Rename tep_is_file_bigendian() to tep_file_bigendian()

2018-11-30 Thread Steven Rostedt
From: Tzvetomir Stoyanov 

In order to make libtraceevent into a proper library, its API
should be straightforward. After discussion with Steven Rostedt,
we decided to rename a few APIs, to have more intuitive names.
This patch renames tep_is_file_bigendian() to tep_file_bigendian().

Signed-off-by: Tzvetomir Stoyanov 
Signed-off-by: Steven Rostedt (VMware) 
---
 tools/lib/traceevent/event-parse-api.c | 4 ++--
 tools/lib/traceevent/event-parse.h | 2 +-
 tools/lib/traceevent/plugin_kvm.c  | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/lib/traceevent/event-parse-api.c 
b/tools/lib/traceevent/event-parse-api.c
index 8b31c0e00ba3..d463761a58f4 100644
--- a/tools/lib/traceevent/event-parse-api.c
+++ b/tools/lib/traceevent/event-parse-api.c
@@ -194,13 +194,13 @@ void tep_set_page_size(struct tep_handle *pevent, int 
_page_size)
 }
 
 /**
- * tep_is_file_bigendian - get if the file is in big endian order
+ * tep_file_bigendian - get if the file is in big endian order
  * @pevent: a handle to the tep_handle
  *
  * This returns if the file is in big endian order
  * If @pevent is NULL, 0 is returned.
  */
-int tep_is_file_bigendian(struct tep_handle *pevent)
+int tep_file_bigendian(struct tep_handle *pevent)
 {
if(pevent)
return pevent->file_bigendian;
diff --git a/tools/lib/traceevent/event-parse.h 
b/tools/lib/traceevent/event-parse.h
index ac377ae99008..bd1bd9a27839 100644
--- a/tools/lib/traceevent/event-parse.h
+++ b/tools/lib/traceevent/event-parse.h
@@ -559,7 +559,7 @@ int tep_get_long_size(struct tep_handle *pevent);
 void tep_set_long_size(struct tep_handle *pevent, int long_size);
 int tep_get_page_size(struct tep_handle *pevent);
 void tep_set_page_size(struct tep_handle *pevent, int _page_size);
-int tep_is_file_bigendian(struct tep_handle *pevent);
+int tep_file_bigendian(struct tep_handle *pevent);
 void tep_set_file_bigendian(struct tep_handle *pevent, enum tep_endian endian);
 int tep_is_host_bigendian(struct tep_handle *pevent);
 void tep_set_host_bigendian(struct tep_handle *pevent, enum tep_endian endian);
diff --git a/tools/lib/traceevent/plugin_kvm.c 
b/tools/lib/traceevent/plugin_kvm.c
index 637be7c18476..388a78a6035f 100644
--- a/tools/lib/traceevent/plugin_kvm.c
+++ b/tools/lib/traceevent/plugin_kvm.c
@@ -389,7 +389,7 @@ static int kvm_mmu_print_role(struct trace_seq *s, struct 
tep_record *record,
 * We can only use the structure if file is of the same
 * endianess.
 */
-   if (tep_is_file_bigendian(event->pevent) ==
+   if (tep_file_bigendian(event->pevent) ==
tep_is_host_bigendian(event->pevent)) {
 
trace_seq_printf(s, "%u q%u%s %s%s %spae %snxe %swp%s%s%s",
-- 
2.19.1




[PATCH 1/6] tools/lib/traceevent: Initialize host_bigendian at tep_handle allocation

2018-11-30 Thread Steven Rostedt
From: Tzvetomir Stoyanov 

This patch initializes the host_bigendian member of the tep_handle structure
with the byte order of the current host when the handle is created, in the
tep_alloc() API. We need this in order to remove the tep_set_host_bigendian() API.
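
A minimal sketch of what this buys a library user (illustrative only,
not part of this patch):

	struct tep_handle *tep = tep_alloc();

	/* host_bigendian is now set by tep_alloc() itself, so an explicit
	 * tep_set_host_bigendian(tep, tep_host_bigendian()) call after
	 * allocation becomes redundant and can eventually go away. */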

Signed-off-by: Tzvetomir Stoyanov 
Signed-off-by: Steven Rostedt (VMware) 
---
 tools/lib/traceevent/event-parse.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/lib/traceevent/event-parse.c 
b/tools/lib/traceevent/event-parse.c
index 0923e331441e..5cd99bdb0517 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -6761,8 +6761,10 @@ struct tep_handle *tep_alloc(void)
 {
struct tep_handle *pevent = calloc(1, sizeof(*pevent));
 
-   if (pevent)
+   if (pevent) {
pevent->ref_count = 1;
+   pevent->host_bigendian = tep_host_bigendian();
+   }
 
return pevent;
 }
-- 
2.19.1




[PATCH 0/6] tools/lib/traceevent: Some more library updates

2018-11-30 Thread Steven Rostedt
Arnaldo and Jiri,

Here's another set of patches to get us closer to having a legitimate
standalone library for libtraceevent. There's still a lot of man pages
to come, but I need to continue reviewing them.

Please pull this tree (based on current tip/perf/core) or apply
the patches.

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
tip/perf/core

Head SHA1: 5b2e18a71601544c4ae95e9ed1f953f3883714f5


Tzvetomir Stoyanov (6):
  tools/lib/traceevent: Initialize host_bigendian at tep_handle allocation
  tools/lib/traceevent: Rename struct cmdline to struct tep_cmdline
  tools/lib/traceevent: Changed return logic of trace_seq_printf() and 
trace_seq_vprintf() APIs
  tools/lib/traceevent: Changed return logic of 
tep_register_event_handler() API
  tools/lib/traceevent: Rename tep_is_file_bigendian() to 
tep_file_bigendian()
  tools/lib/traceevent: Remove tep_data_event_from_type() API


 tools/lib/traceevent/event-parse-api.c   |  4 +--
 tools/lib/traceevent/event-parse-local.h |  4 +--
 tools/lib/traceevent/event-parse.c   | 62 +++-
 tools/lib/traceevent/event-parse.h   | 16 +
 tools/lib/traceevent/plugin_kvm.c|  2 +-
 tools/lib/traceevent/trace-seq.c | 17 ++---
 6 files changed, 56 insertions(+), 49 deletions(-)


[PATCH 4/6] tools/lib/traceevent: Changed return logic of tep_register_event_handler() API

2018-11-30 Thread Steven Rostedt
From: Tzvetomir Stoyanov 

In order to make libtraceevent into a proper library, its API
should be straightforward. The tep_register_event_handler()
function returns -1 in case it successfully registers the
new event handler. Such a return code is used by the other library
APIs in case of an error. To unify the return logic of
tep_register_event_handler() with the other APIs, this patch
introduces enum tep_reg_handler, which is used by this function
as return value, to handle all possible successful return cases.
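
An illustrative sketch of a caller checking the new return values (not
part of this patch; "tep" and "my_handler" are made-up names):

	int ret;

	ret = tep_register_event_handler(tep, -1, "sched", "sched_switch",
					 my_handler, NULL);
	if (ret < 0)
		return ret;	/* negative TEP_ERRNO_... code */
	if (ret == TEP_REGISTER_SUCCESS_OVERWRITE)
		printf("replaced an existing sched_switch handler\n");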

Signed-off-by: Tzvetomir Stoyanov 
Signed-off-by: Steven Rostedt (VMware) 
---
 tools/lib/traceevent/event-parse.c | 10 --
 tools/lib/traceevent/event-parse.h |  5 +
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/tools/lib/traceevent/event-parse.c 
b/tools/lib/traceevent/event-parse.c
index c3d22d0a2935..b3c00d6b524e 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -6631,6 +6631,12 @@ static struct tep_event *search_event(struct tep_handle 
*pevent, int id,
  *
  * If @id is >= 0, then it is used to find the event.
  * else @sys_name and @event_name are used.
+ *
+ * Returns:
+ *  TEP_REGISTER_SUCCESS_OVERWRITE if an existing handler is overwritten
+ *  TEP_REGISTER_SUCCESS if a new handler is registered successfully
+ *  negative TEP_ERRNO_... in case of an error
+ *
  */
 int tep_register_event_handler(struct tep_handle *pevent, int id,
   const char *sys_name, const char *event_name,
@@ -6648,7 +6654,7 @@ int tep_register_event_handler(struct tep_handle *pevent, 
int id,
 
event->handler = func;
event->context = context;
-   return 0;
+   return TEP_REGISTER_SUCCESS_OVERWRITE;
 
  not_found:
/* Save for later use. */
@@ -6678,7 +6684,7 @@ int tep_register_event_handler(struct tep_handle *pevent, 
int id,
pevent->handlers = handle;
handle->context = context;
 
-   return -1;
+   return TEP_REGISTER_SUCCESS;
 }
 
 static int handle_matches(struct event_handler *handler, int id,
diff --git a/tools/lib/traceevent/event-parse.h 
b/tools/lib/traceevent/event-parse.h
index 77a4a1dd4b4d..ac377ae99008 100644
--- a/tools/lib/traceevent/event-parse.h
+++ b/tools/lib/traceevent/event-parse.h
@@ -485,6 +485,11 @@ int tep_print_func_field(struct trace_seq *s, const char 
*fmt,
 struct tep_event *event, const char *name,
 struct tep_record *record, int err);
 
+enum tep_reg_handler {
+   TEP_REGISTER_SUCCESS = 0,
+   TEP_REGISTER_SUCCESS_OVERWRITE,
+};
+
 int tep_register_event_handler(struct tep_handle *pevent, int id,
   const char *sys_name, const char *event_name,
   tep_event_handler_func func, void *context);
-- 
2.19.1




[PATCH 6/6] tools/lib/traceevent: Remove tep_data_event_from_type() API

2018-11-30 Thread Steven Rostedt
From: Tzvetomir Stoyanov 

In order to make libtraceevent into a proper library, its API
should be straightforward. After discussion with Steven Rostedt,
we decided to remove the tep_data_event_from_type() API and to
replace it with tep_find_event(), as it does the same.
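
Existing callers can be converted with a one line change; an
illustrative sketch (not part of this patch):

	/* before */
	event = tep_data_event_from_type(pevent, type);
	/* after */
	event = tep_find_event(pevent, type);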

Signed-off-by: Tzvetomir Stoyanov 
Signed-off-by: Steven Rostedt (VMware) 
---
 tools/lib/traceevent/event-parse.c | 12 
 tools/lib/traceevent/event-parse.h |  1 -
 2 files changed, 13 deletions(-)

diff --git a/tools/lib/traceevent/event-parse.c 
b/tools/lib/traceevent/event-parse.c
index b3c00d6b524e..f84ce3897ce6 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -5264,18 +5264,6 @@ int tep_data_type(struct tep_handle *pevent, struct 
tep_record *rec)
return trace_parse_common_type(pevent, rec->data);
 }
 
-/**
- * tep_data_event_from_type - find the event by a given type
- * @pevent: a handle to the pevent
- * @type: the type of the event.
- *
- * This returns the event form a given @type;
- */
-struct tep_event *tep_data_event_from_type(struct tep_handle *pevent, int type)
-{
-   return tep_find_event(pevent, type);
-}
-
 /**
  * tep_data_pid - parse the PID from record
  * @pevent: a handle to the pevent
diff --git a/tools/lib/traceevent/event-parse.h 
b/tools/lib/traceevent/event-parse.h
index bd1bd9a27839..aec48f2aea8a 100644
--- a/tools/lib/traceevent/event-parse.h
+++ b/tools/lib/traceevent/event-parse.h
@@ -526,7 +526,6 @@ tep_find_event_by_record(struct tep_handle *pevent, struct 
tep_record *record);
 void tep_data_lat_fmt(struct tep_handle *pevent,
  struct trace_seq *s, struct tep_record *record);
 int tep_data_type(struct tep_handle *pevent, struct tep_record *rec);
-struct tep_event *tep_data_event_from_type(struct tep_handle *pevent, int 
type);
 int tep_data_pid(struct tep_handle *pevent, struct tep_record *rec);
 int tep_data_preempt_count(struct tep_handle *pevent, struct tep_record *rec);
 int tep_data_flags(struct tep_handle *pevent, struct tep_record *rec);
-- 
2.19.1




[PATCH 2/6] tools/lib/traceevent: Rename struct cmdline to struct tep_cmdline

2018-11-30 Thread Steven Rostedt
From: Tzvetomir Stoyanov 

In order to make libtraceevent into a proper library, variables, data
structures and functions require a unique prefix to prevent name space
conflicts. That prefix will be "tep_".
This patch renames struct cmdline to struct tep_cmdline.

Signed-off-by: Tzvetomir Stoyanov 
Signed-off-by: Steven Rostedt (VMware) 
---
 tools/lib/traceevent/event-parse-local.h |  4 +--
 tools/lib/traceevent/event-parse.c   | 36 
 tools/lib/traceevent/event-parse.h   |  8 +++---
 3 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/tools/lib/traceevent/event-parse-local.h 
b/tools/lib/traceevent/event-parse-local.h
index 9a092dd4a86d..35833ee32d6c 100644
--- a/tools/lib/traceevent/event-parse-local.h
+++ b/tools/lib/traceevent/event-parse-local.h
@@ -7,7 +7,7 @@
 #ifndef _PARSE_EVENTS_INT_H
 #define _PARSE_EVENTS_INT_H
 
-struct cmdline;
+struct tep_cmdline;
 struct cmdline_list;
 struct func_map;
 struct func_list;
@@ -36,7 +36,7 @@ struct tep_handle {
int long_size;
int page_size;
 
-   struct cmdline *cmdlines;
+   struct tep_cmdline *cmdlines;
struct cmdline_list *cmdlist;
int cmdline_count;
 
diff --git a/tools/lib/traceevent/event-parse.c 
b/tools/lib/traceevent/event-parse.c
index 5cd99bdb0517..c3d22d0a2935 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -124,15 +124,15 @@ struct tep_print_arg *alloc_arg(void)
return calloc(1, sizeof(struct tep_print_arg));
 }
 
-struct cmdline {
+struct tep_cmdline {
char *comm;
int pid;
 };
 
 static int cmdline_cmp(const void *a, const void *b)
 {
-   const struct cmdline *ca = a;
-   const struct cmdline *cb = b;
+   const struct tep_cmdline *ca = a;
+   const struct tep_cmdline *cb = b;
 
if (ca->pid < cb->pid)
return -1;
@@ -152,7 +152,7 @@ static int cmdline_init(struct tep_handle *pevent)
 {
struct cmdline_list *cmdlist = pevent->cmdlist;
struct cmdline_list *item;
-   struct cmdline *cmdlines;
+   struct tep_cmdline *cmdlines;
int i;
 
cmdlines = malloc(sizeof(*cmdlines) * pevent->cmdline_count);
@@ -179,8 +179,8 @@ static int cmdline_init(struct tep_handle *pevent)
 
 static const char *find_cmdline(struct tep_handle *pevent, int pid)
 {
-   const struct cmdline *comm;
-   struct cmdline key;
+   const struct tep_cmdline *comm;
+   struct tep_cmdline key;
 
if (!pid)
return "";
@@ -208,8 +208,8 @@ static const char *find_cmdline(struct tep_handle *pevent, 
int pid)
  */
 int tep_pid_is_registered(struct tep_handle *pevent, int pid)
 {
-   const struct cmdline *comm;
-   struct cmdline key;
+   const struct tep_cmdline *comm;
+   struct tep_cmdline key;
 
if (!pid)
return 1;
@@ -235,9 +235,9 @@ int tep_pid_is_registered(struct tep_handle *pevent, int 
pid)
 static int add_new_comm(struct tep_handle *pevent,
const char *comm, int pid, bool override)
 {
-   struct cmdline *cmdlines = pevent->cmdlines;
-   struct cmdline *cmdline;
-   struct cmdline key;
+   struct tep_cmdline *cmdlines = pevent->cmdlines;
+   struct tep_cmdline *cmdline;
+   struct tep_cmdline key;
char *new_comm;
 
if (!pid)
@@ -5330,8 +5330,8 @@ const char *tep_data_comm_from_pid(struct tep_handle 
*pevent, int pid)
return comm;
 }
 
-static struct cmdline *
-pid_from_cmdlist(struct tep_handle *pevent, const char *comm, struct cmdline 
*next)
+static struct tep_cmdline *
+pid_from_cmdlist(struct tep_handle *pevent, const char *comm, struct 
tep_cmdline *next)
 {
struct cmdline_list *cmdlist = (struct cmdline_list *)next;
 
@@ -5343,7 +5343,7 @@ pid_from_cmdlist(struct tep_handle *pevent, const char 
*comm, struct cmdline *ne
while (cmdlist && strcmp(cmdlist->comm, comm) != 0)
cmdlist = cmdlist->next;
 
-   return (struct cmdline *)cmdlist;
+   return (struct tep_cmdline *)cmdlist;
 }
 
 /**
@@ -5359,10 +5359,10 @@ pid_from_cmdlist(struct tep_handle *pevent, const char 
*comm, struct cmdline *ne
  * next pid.
  * Also, it does a linear seach, so it may be slow.
  */
-struct cmdline *tep_data_pid_from_comm(struct tep_handle *pevent, const char 
*comm,
-  struct cmdline *next)
+struct tep_cmdline *tep_data_pid_from_comm(struct tep_handle *pevent, const 
char *comm,
+  struct tep_cmdline *next)
 {
-   struct cmdline *cmdline;
+   struct tep_cmdline *cmdline;
 
/*
 * If the cmdlines have not been converted yet, then use
@@ -5401,7 +5401,7 @@ struct cmdline *tep_data_pid_from_comm(struct tep_handle 
*pevent, const char *co
  * Returns the pid for a give cmdline. If @cmdline is NULL, then
  * -1 is returned.
  */
-int te

[PATCH 3/6] tools/lib/traceevent: Changed return logic of trace_seq_printf() and trace_seq_vprintf() APIs

2018-11-30 Thread Steven Rostedt
From: Tzvetomir Stoyanov 

In order to make libtraceevent into a proper library, its API
should be straightforward. The trace_seq_printf() and
trace_seq_vprintf() APIs have return values inconsistent with
the other trace_seq_* APIs. This patch changes the return logic of
trace_seq_printf() and trace_seq_vprintf() to return the number
of printed characters, as the other trace_seq_* related APIs do.
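
An illustrative sketch of a caller under the new return logic (not part
of this patch; "s" is an initialized trace_seq and "pid" a plain int):

	int r = trace_seq_printf(s, "pid=%d\n", pid);

	if (r <= 0)
		return r;	/* 0: did not fit, negative: error */
	/* otherwise r characters were added to the sequence */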

Signed-off-by: Tzvetomir Stoyanov 
Signed-off-by: Steven Rostedt (VMware) 
---
 tools/lib/traceevent/trace-seq.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/tools/lib/traceevent/trace-seq.c b/tools/lib/traceevent/trace-seq.c
index 8ff1d55954d1..8d5ecd2bf877 100644
--- a/tools/lib/traceevent/trace-seq.c
+++ b/tools/lib/traceevent/trace-seq.c
@@ -100,7 +100,8 @@ static void expand_buffer(struct trace_seq *s)
  * @fmt: printf format string
  *
  * It returns 0 if the trace oversizes the buffer's free
- * space, 1 otherwise.
+ * space, the number of characters printed, or a negative
+ * value in case of an error.
  *
  * The tracer may use either sequence operations or its own
  * copy to user routines. To simplify formating of a trace
@@ -129,9 +130,10 @@ trace_seq_printf(struct trace_seq *s, const char *fmt, ...)
goto try_again;
}
 
-   s->len += ret;
+   if (ret > 0)
+   s->len += ret;
 
-   return 1;
+   return ret;
 }
 
 /**
@@ -139,6 +141,10 @@ trace_seq_printf(struct trace_seq *s, const char *fmt, ...)
  * @s: trace sequence descriptor
  * @fmt: printf format string
  *
+ * It returns 0 if the trace oversizes the buffer's free
+ * space, the number of characters printed, or a negative
+ * value in case of an error.
+ * *
  * The tracer may use either sequence operations or its own
  * copy to user routines. To simplify formating of a trace
  * trace_seq_printf is used to store strings into a special
@@ -163,9 +169,10 @@ trace_seq_vprintf(struct trace_seq *s, const char *fmt, 
va_list args)
goto try_again;
}
 
-   s->len += ret;
+   if (ret > 0)
+   s->len += ret;
 
-   return len;
+   return ret;
 }
 
 /**
-- 
2.19.1




Re: [PATCH v2 4/4] x86/static_call: Add inline static call implementation for x86-64

2018-11-30 Thread Steven Rostedt
On Fri, 30 Nov 2018 12:59:36 -0800
Andy Lutomirski  wrote:

> For all I know, the SMI handler will explode and the computer will catch fire.

That sounds like an AWESOME feature!!!

-- Steve



Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread Steven Rostedt
[ Cleared out the Cc list to something more reasonable ]

On Fri, 30 Nov 2018 20:45:57 +
Abuse  wrote:

> On Friday, 30 November 2018 20:42:28 GMT David Miller wrote:
> > From: Abuse 
> > Date: Fri, 30 Nov 2018 20:39:01 +
> >   
> > > I assume I will now be barred.  
> > 
> > Perhaps, but not because you said fuck.  It would be because you're
> > intentionally creating a disturbance on the list and making it more
> > difficult for developers to get their work done and intentionally
> > creating a distraction and a hostile environment for the discussion at
> > hand.
> > 
> > That would not be censorship.
> > 
> > There is a big difference.
> >   
> 
> I would beg to differ, as would calling the removal of the word 'Fuck' 
> censorship.

Technically that is censorship. The only reason to remove the word is
because some people find it unnecessary, whereas other people find it
appropriate.

Removing language people find unnecessary or offensive is censorship.

That said. I don't always find censorship a bad thing. Removing
language that was an attack to someone's race, religion, sexuality, is
also censorship. But I'm fine with that kind of censorship. Censoring
words that someone simply finds distasteful, I honestly don't really
care, because some people find "heck" distasteful too.

I would also agree with you that David blocking you for creating a
disturbance is also censorship. But that's also a kind of censorship I
would prefer to have. (Blocking spam is censorship too).

-- Steve


> 
> It's a word I find is totally unnecessary in normal public usage.
> 





Re: [PATCH v2 4/4] x86/static_call: Add inline static call implementation for x86-64

2018-11-30 Thread Steven Rostedt
On Fri, 30 Nov 2018 12:18:33 -0800
Andy Lutomirski  wrote:

> Or we could replace that IPI with x86's bona fide serialize-all-cpus
> primitive and then we can just retry instead of emulating.  It's a
> piece of cake -- we just trigger an SMI :)  /me runs away.

I must have fallen on my head one too many times, because I really like
the idea of synchronizing all the CPUs with an SMI! (If that's even
possible). The IPI's that are sent are only to force smp_mb() on all
CPUs. Which should be something an SMI could do.

/me runs after Andy

-- Steve


Re: [PATCH 7/9] tools/lib/traceevent: traceevent API cleanup

2018-11-30 Thread Steven Rostedt
On Fri, 30 Nov 2018 16:18:56 -0300
Arnaldo Carvalho de Melo  wrote:

> Em Fri, Nov 30, 2018 at 10:44:10AM -0500, Steven Rostedt escreveu:
> > From: Tzvetomir Stoyanov 
> > 
> > In order to make libtraceevent into a proper library, its API
> > should be straightforward. This patch hides few API functions,
> > intended for internal usage only:
> > tep_free_event(), tep_free_format_field(), __tep_data2host2(),
> > __tep_data2host4() and __tep_data2host8().
> > The patch also alignes the libtraceevent summary man page with
> > these API changes.  
> 
> I applied the previous patches, stopped here as it this one fails with
> the error bellow.
> 
> Please resend from this patch onwards.
> 
> BTW, this is what I have right now:
> 
> [acme@quaco perf]$ git log --oneline - 6
> 9db9efe53c72 (HEAD -> perf/core) tools lib traceevent: Rename 
> tep_free_format() to tep_free_event()
> e58c351e8383 tools lib traceevent, perf tools: Rename 'struct 
> tep_event_format' to 'struct tep_event'
> 4237fd0b60d8 tools lib traceevent: Install trace-seq.h API header file
> bb837f2581dc tools lib traceevent: Added support for pkg-config
> a2c167ad70b6 tools lib traceevent: Implement new API tep_get_ref()
> 51d0337d0198 (acme.korg/perf/core) tools lib traceevent: Add sanity check to 
> is_timestamp_in_us()
> 
> - Arnaldo
> 
> [acme@quaco perf]$ m
> make: Entering directory '/home/acme/git/perf/tools/perf'
>   BUILD:   Doing 'make -j8' parallel build
>   CC   /tmp/build/perf/util/trace-event-read.o
>   CC   /tmp/build/perf/util/trace-event-scripting.o
>   CC   /tmp/build/perf/util/trace-event.o
>   CC   /tmp/build/perf/util/sort.o
>   CC   /tmp/build/perf/util/hist.o
>   INSTALL  trace_plugins
>   CC   /tmp/build/perf/util/cgroup.o
>   CC   /tmp/build/perf/util/stat.o
>   CC   /tmp/build/perf/util/stat-shadow.o
>   CC   /tmp/build/perf/util/stat-display.o
>   CC   /tmp/build/perf/util/record.o
> util/trace-event-read.c: In function ‘read4’:
> util/trace-event-read.c:105:9: error: implicit declaration of function 
> ‘__tep_data2host4’; did you mean ‘tep_data_flags’? 
> [-Werror=implicit-function-declaration]
>   return __tep_data2host4(pevent, data);
>  ^~~~
>  tep_data_flags

This should have been changed with patch 6 in the series.

-- Steve

> util/trace-event-read.c:105:9: error: nested extern declaration of 
> ‘__tep_data2host4’ [-Werror=nested-externs]
> util/trace-event-read.c: In function ‘read8’:
> util/trace-event-read.c:114:9: error: implicit declaration of function 
> ‘__tep_data2host8’; did you mean ‘tep_data_flags’? 
> [-Werror=implicit-function-declaration]
>   return __tep_data2host8(pevent, data);
>  ^~~~
>  tep_data_flags
> util/trace-event-read.c:114:9: error: nested extern declaration of 
> ‘__tep_data2host8’ [-Werror=nested-externs]
> cc1: all warnings being treated as errors
> mv: cannot stat '/tmp/build/perf/util/.trace-event-read.o.tmp': No such file 
> or directory
> make[4]: *** [/home/acme/git/perf/tools/build/Makefile.build:96: 
> /tmp/build/perf/util/trace-event-read.o] Error 1
> make[4]: *** Waiting for unfinished jobs
> make[3]: *** [/home/acme/git/perf/tools/build/Makefile.build:139: util] Error 
> 2
> make[2]: *** [Makefile.perf:658: /tmp/build/perf/libperf-in.o] Error 2
> make[1]: *** [Makefile.perf:215: sub-make] Error 2
> make: *** [Makefile:110: install-bin] Error 2
> make: Leaving directory '/home/acme/git/perf/tools/perf'
> 
>  Performance counter stats for 'make -k O=/tmp/build/perf -C tools/perf 
> install-bin':
> 
> 20,606,463,070  cycles:u
> 24,937,056,161  instructions:u#1.21  insn per cycle
> 
>2.192507189 seconds time elapsed
> 
>5.571323000 seconds user
>0.994057000 seconds sys
> 
> 
> [acme@quaco perf]$



Re: [PATCH 7/9] tools/lib/traceevent: traceevent API cleanup

2018-11-30 Thread Steven Rostedt
On Fri, 30 Nov 2018 14:37:36 -0500
Steven Rostedt  wrote:
 
> What branch are you applying it against? Just to make sure I'm testing
> the same thing you are.

Nevermind, I just downloaded your repo.

BTW, should I be basing these patches off of your repo or tip/perf/core?

-- Steve


Re: [PATCH RFC 14/15] lib: replace **** with a hug

2018-11-30 Thread Steven Rostedt
On Fri, 30 Nov 2018 11:27:23 -0800
Jarkko Sakkinen  wrote:

> In order to comply with the CoC, replace  with a hug.
> 
> Signed-off-by: Jarkko Sakkinen 
> ---
>  lib/vsprintf.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 37a54a6dd594..b7a92f5d47bb 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -6,7 +6,7 @@
>  
>  /* vsprintf.c -- Lars Wirzenius & Linus Torvalds. */
>  /*
> - * Wirzenius wrote this portably, Torvalds fucked it up :-)
> + * Wirzenius wrote this portably, Torvalds hugged it up :-)

Since the code has been greatly modified since that comment was added,
I would say the comment is simply out of date.

Just nuke the comment, and that will be an accurate change with or
without CoC.

-- Steve

>   */
>  
>  /*



Re: [PATCH 7/9] tools/lib/traceevent: traceevent API cleanup

2018-11-30 Thread Steven Rostedt
On Fri, 30 Nov 2018 16:18:56 -0300
Arnaldo Carvalho de Melo  wrote:

> Em Fri, Nov 30, 2018 at 10:44:10AM -0500, Steven Rostedt escreveu:
> > From: Tzvetomir Stoyanov 
> > 
> > In order to make libtraceevent into a proper library, its API
> > should be straightforward. This patch hides few API functions,
> > intended for internal usage only:
> > tep_free_event(), tep_free_format_field(), __tep_data2host2(),
> > __tep_data2host4() and __tep_data2host8().
> > The patch also alignes the libtraceevent summary man page with
> > these API changes.  
> 
> I applied the previous patches, stopped here as it this one fails with
> the error bellow.

Thanks, I'll take a look at it.

What branch are you applying it against? Just to make sure I'm testing
the same thing you are.

-- Steve

> 
> Please resend from this patch onwards.
> 
> BTW, this is what I have right now:
> 
> [acme@quaco perf]$ git log --oneline - 6
> 9db9efe53c72 (HEAD -> perf/core) tools lib traceevent: Rename 
> tep_free_format() to tep_free_event()
> e58c351e8383 tools lib traceevent, perf tools: Rename 'struct 
> tep_event_format' to 'struct tep_event'
> 4237fd0b60d8 tools lib traceevent: Install trace-seq.h API header file
> bb837f2581dc tools lib traceevent: Added support for pkg-config
> a2c167ad70b6 tools lib traceevent: Implement new API tep_get_ref()
> 51d0337d0198 (acme.korg/perf/core) tools lib traceevent: Add sanity check to 
> is_timestamp_in_us()
> 


Re: [PATCH 0/2] [GIT PULL] tracing: More fixes for 4.20

2018-11-30 Thread Steven Rostedt
On Fri, 30 Nov 2018 10:56:06 -0800
Linus Torvalds  wrote:

> This way I got the matching diffstat from your pull requests, but more
> importantly also the independent merge messages.
> 
> The history looks slightly odd this way (with two adjacent merges of
> continuous history), but I thought I'd explain the reason.

It makes perfect sense.

-- Steve


Re: [PATCH 0/2] [GIT PULL] tracing: More fixes for 4.20

2018-11-30 Thread Steven Rostedt
On Fri, 30 Nov 2018 09:41:00 -0800
Linus Torvalds  wrote:

> On Thu, Nov 29, 2018 at 7:19 PM Steven Rostedt  wrote:
> >
> > Note, this is on top of a previous git pull that I have submitted:
> >
> >   http://lkml.kernel.org/r/20181127224031.76681...@vmware.local.home  
> 
> Hmm.
> 
> I had dismissed that, because the patch descriptors for that series
> had had "for-next" in them.
> 
> https://lore.kernel.org/lkml/20181122002801.501220...@goodmis.org/
> 
> so I dismissed that pull request entirely as being not for this
> release entirely.
> 
> I went back and merged things, but in general, please try to avoid
> confusing me. I'm easily confused when I get mixed messages about the
> patches and the pull requests, and will then generally default to
> "ignore, this is informational".
>

My apologies. I used my scripts to push them into my linux-next repo,
and they added the [for-next] tag to the series when doing so. I wanted it to
sit in next for a week (because I modified a bunch of architecture code
that I could only compile test, but not run).

I'll be more careful next time.

Thanks!

-- Steve


Re: [RFC PATCH v3] ftrace: support very early function tracing

2018-11-30 Thread Steven Rostedt
On Wed, 24 Oct 2018 19:22:30 +
Abderrahmane Benbachir  wrote:


> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -239,6 +239,16 @@ static inline void ftrace_free_init_mem(void) { }
>   static inline void ftrace_free_mem(struct module *mod, void *start,  
> void *end) { }
>   #endif /* CONFIG_FUNCTION_TRACER */
> 
> +#ifdef CONFIG_VERY_EARLY_FUNCTION_TRACER
> +extern void ftrace_early_init(char *command_line);
> +extern void ftrace_early_shutdown(void);
> +extern void ftrace_early_fill_ringbuffer(void *data);
> +#else
> +static inline void ftrace_early_init(char *command_line) { }
> +static inline void ftrace_early_shutdown(void) { }
> +static inline void ftrace_early_fill_ringbuffer(void *data) { }
> +#endif
> +
>   #ifdef CONFIG_STACK_TRACER
> 
>   #define STACK_TRACE_ENTRIES 500
> @@ -443,6 +453,10 @@ unsigned long ftrace_get_addr_curr(struct  
> dyn_ftrace *rec);
> 
>   extern ftrace_func_t ftrace_trace_function;
> 
> +#if defined(CONFIG_VERY_EARLY_FUNCTION_TRACER) &&  
> defined(CONFIG_DYNAMIC_FTRACE)

Seems the patch has some formatting issue. Can you resend with better
email client options.

> +extern ftrace_func_t ftrace_vearly_trace_function;
> +#endif
> +
>   int ftrace_regex_open(struct ftrace_ops *ops, int flag,
> struct inode *inode, struct file *file);
>   ssize_t ftrace_filter_write(struct file *file, const char __user *ubuf,
> @@ -716,7 +730,7 @@ static inline unsigned long get_lock_parent_ip(void)
>   #ifdef CONFIG_FTRACE_MCOUNT_RECORD
>   extern void ftrace_init(void);
>   #else
> -static inline void ftrace_init(void) { }
> +static inline void ftrace_init(void) { ftrace_early_shutdown(); }
>   #endif
> 
>   /*
> diff --git a/init/main.c b/init/main.c
> index 18f8f0140fa0..1b289325223f 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -533,6 +533,7 @@ asmlinkage __visible void __init start_kernel(void)
>   char *command_line;
>   char *after_dashes;
> 
> + ftrace_early_init(boot_command_line);
>   set_task_stack_end_magic(_task);
>   smp_setup_processor_id();
>   debug_objects_early_init();
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index 5e3de28c7677..4b358bf6abb0 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -19,6 +19,11 @@ config HAVE_FUNCTION_TRACER
>   help
> See Documentation/trace/ftrace-design.rst
> 
> +config HAVE_VERY_EARLY_FTRACE
> + bool
> + help
> +   See Documentation/trace/ftrace-design.txt
> +
>   config HAVE_FUNCTION_GRAPH_TRACER
>   bool
>   help
> @@ -155,6 +160,52 @@ config FUNCTION_TRACER
> (the bootup default), then the overhead of the instructions is very
> small and not measurable even in micro-benchmarks.
> 
> +config VERY_EARLY_FUNCTION_TRACER
> + bool "Very Early Kernel Function Tracer"
> + depends on FUNCTION_TRACER
> + depends on HAVE_VERY_EARLY_FTRACE
> + help
> +   Normally, function tracing can only start after memory has been
> +   initialized early in boot. If "ftrace=function" is added to the
> +   command line, then function tracing will start after memory setup.
> +   In order to trace functions before that, this option will
> +   have function tracing starts before memory setup is complete, by

 s/starts/start/

> +   placing the trace in a temporary buffer, which will be copied to
> +   the trace buffer after memory setup. The size of this temporary
> +   buffer is defined by VERY_EARLY_FTRACE_BUF_SHIFT.

I'm also thinking that we should allocate a sub buffer for this. That
is, hold off on writing to the ring buffer until after file systems are
initialized. Create a sub buffer "early_boot" (it will be in
tracefs/instances/early_boot) and copy the data there.

The reason I say this is that having ftrace=function will continue to
fill the buffer and you will most likely lose your data from
overwriting.

> +
> +config VERY_EARLY_FTRACE_BUF_SHIFT
> + int "Temporary buffer size (17 => 128 KB, 24 => 16 MB)"
> + depends on VERY_EARLY_FUNCTION_TRACER
> + range 8 24
> + default 19
> + help
> +   Select the size of the buffer to be used for storing function calls at
> +   very early stage.
> +   The value defines the size as a power of 2.
> +   Examples:
> + 20 =>   1 MB
> + 19 => 512 KB
> + 17 => 128 KB

Should state the allowable range in the help text as well.

> +
> +config VERY_EARLY_FTRACE_FILTER_SHIFT
> + int "Temporary filter size (filter/notrace) (17 => 128 KB, 19 => 512 
> KB)"
> + depends on VERY_EARLY_FUNCTION_TRACER
> + depends on FTRACE_MCOUNT_RECORD
> + range 0 19
> + default 17
> + help
> +   Select the size of the filter buffer to be used for filtering (trace/
> +   no trace) functions at very early stage.
> +   Two buffers (trace/no_trace) will be created using by this option.
> +   These following kernel parameters control filtering during bootup :
> + 

[PATCH 1/9] tools/lib/traceevent: Implemented new API tep_get_ref()

2018-11-30 Thread Steven Rostedt
From: Tzvetomir Stoyanov 

This patch implements a new API of the traceevent library:

  int tep_get_ref(struct tep_handle *tep);

The API returns the reference counter "ref_count" of the tep handle.
As "struct tep_handle" is internal only and its members cannot be accessed
by library users, this API is used to get the reference counter.
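
An illustrative sketch of how a library user might consult the counter
(not part of this patch; error handling omitted):

	struct tep_handle *tep = tep_alloc();	/* ref_count starts at 1 */

	tep_ref(tep);				/* ref_count is now 2 */
	printf("references: %d\n", tep_get_ref(tep));
	tep_unref(tep);
	tep_free(tep);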

Signed-off-by: Tzvetomir Stoyanov 
Signed-off-by: Steven Rostedt (VMware) 
---
 tools/lib/traceevent/event-parse.c | 7 +++
 tools/lib/traceevent/event-parse.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/tools/lib/traceevent/event-parse.c 
b/tools/lib/traceevent/event-parse.c
index 3692f29fee46..a5f3e37f81b5 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -6730,6 +6730,13 @@ void tep_ref(struct tep_handle *pevent)
pevent->ref_count++;
 }
 
+int tep_get_ref(struct tep_handle *tep)
+{
+   if (tep)
+   return tep->ref_count;
+   return 0;
+}
+
 void tep_free_format_field(struct tep_format_field *field)
 {
free(field->type);
diff --git a/tools/lib/traceevent/event-parse.h 
b/tools/lib/traceevent/event-parse.h
index 16bf4c890b6f..44ec26c72c2e 100644
--- a/tools/lib/traceevent/event-parse.h
+++ b/tools/lib/traceevent/event-parse.h
@@ -581,6 +581,7 @@ struct tep_handle *tep_alloc(void);
 void tep_free(struct tep_handle *pevent);
 void tep_ref(struct tep_handle *pevent);
 void tep_unref(struct tep_handle *pevent);
+int tep_get_ref(struct tep_handle *tep);
 
 /* access to the internal parser */
 void tep_buffer_init(const char *buf, unsigned long long size);
-- 
2.19.1




[PATCH 5/9] tools/lib/traceevent: Rename tep_free_format() to tep_free_event()

2018-11-30 Thread Steven Rostedt
From: Tzvetomir Stoyanov 

In order to make libtraceevent into a proper library, variables, data
structures and functions require a unique prefix to prevent name space
conflicts. This renames tep_free_format() to tep_free_event(),
which describes more closely the purpose of the function.

Signed-off-by: Tzvetomir Stoyanov 
Signed-off-by: Steven Rostedt (VMware) 
---
 tools/lib/traceevent/event-parse.c | 6 +++---
 tools/lib/traceevent/event-parse.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/lib/traceevent/event-parse.c 
b/tools/lib/traceevent/event-parse.c
index bacd86c41563..848cd76b91a7 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -6154,7 +6154,7 @@ __parse_event(struct tep_handle *pevent,
return 0;
 
 event_add_failed:
-   tep_free_format(event);
+   tep_free_event(event);
return ret;
 }
 
@@ -6763,7 +6763,7 @@ static void free_formats(struct tep_format *format)
free_format_fields(format->fields);
 }
 
-void tep_free_format(struct tep_event *event)
+void tep_free_event(struct tep_event *event)
 {
free(event->name);
free(event->system);
@@ -6849,7 +6849,7 @@ void tep_free(struct tep_handle *pevent)
}
 
for (i = 0; i < pevent->nr_events; i++)
-   tep_free_format(pevent->events[i]);
+   tep_free_event(pevent->events[i]);
 
while (pevent->handlers) {
handle = pevent->handlers;
diff --git a/tools/lib/traceevent/event-parse.h 
b/tools/lib/traceevent/event-parse.h
index 2a1a644c5ec8..950ad185a5c4 100644
--- a/tools/lib/traceevent/event-parse.h
+++ b/tools/lib/traceevent/event-parse.h
@@ -475,7 +475,7 @@ enum tep_errno tep_parse_format(struct tep_handle *pevent,
struct tep_event **eventp,
const char *buf,
unsigned long size, const char *sys);
-void tep_free_format(struct tep_event *event);
+void tep_free_event(struct tep_event *event);
 void tep_free_format_field(struct tep_format_field *field);
 
 void *tep_get_field_raw(struct trace_seq *s, struct tep_event *event,
-- 
2.19.1




[PATCH 2/9] tools/lib/traceevent: Added support for pkg-config

2018-11-30 Thread Steven Rostedt
From: Tzvetomir Stoyanov 

This patch implements integration with the pkg-config framework.
pkg-config can be used by library users to determine the
required CFLAGS and LDFLAGS needed to use the library.

Signed-off-by: Tzvetomir Stoyanov 
Signed-off-by: Steven Rostedt (VMware) 
---
 tools/lib/traceevent/Makefile | 26 ---
 .../lib/traceevent/libtraceevent.pc.template  | 10 +++
 2 files changed, 33 insertions(+), 3 deletions(-)
 create mode 100644 tools/lib/traceevent/libtraceevent.pc.template

diff --git a/tools/lib/traceevent/Makefile b/tools/lib/traceevent/Makefile
index 0b4e833088a4..adb16f845ab3 100644
--- a/tools/lib/traceevent/Makefile
+++ b/tools/lib/traceevent/Makefile
@@ -25,6 +25,7 @@ endef
 $(call allow-override,CC,$(CROSS_COMPILE)gcc)
 $(call allow-override,AR,$(CROSS_COMPILE)ar)
 $(call allow-override,NM,$(CROSS_COMPILE)nm)
+$(call allow-override,PKG_CONFIG,pkg-config)
 
 EXT = -std=gnu99
 INSTALL = install
@@ -47,6 +48,8 @@ prefix ?= /usr/local
 libdir = $(prefix)/$(libdir_relative)
 man_dir = $(prefix)/share/man
 man_dir_SQ = '$(subst ','\'',$(man_dir))'
+pkgconfig_dir ?= $(word 1,$(shell $(PKG_CONFIG)\
+   --variable pc_path pkg-config | tr ":" " "))
 
 export man_dir man_dir_SQ INSTALL
 export DESTDIR DESTDIR_SQ
@@ -270,7 +273,19 @@ define do_generate_dynamic_list_file
fi
 endef
 
-install_lib: all_cmd install_plugins
+PKG_CONFIG_FILE = libtraceevent.pc
+define do_install_pkgconfig_file
+   if [ -n "${pkgconfig_dir}" ]; then  
\
+   cp -f ${PKG_CONFIG_FILE}.template ${PKG_CONFIG_FILE};   
\
+   sed -i "s|INSTALL_PREFIX|${1}|g" ${PKG_CONFIG_FILE};
\
+   sed -i "s|LIB_VERSION|${EVENT_PARSE_VERSION}|g" 
${PKG_CONFIG_FILE}; \
+   $(call do_install,$(PKG_CONFIG_FILE),$(pkgconfig_dir),644); 
\
+   else
\
+   (echo Failed to locate pkg-config directory) 1>&2;  
\
+   fi
+endef
+
+install_lib: all_cmd install_plugins install_pkgconfig
$(call QUIET_INSTALL, $(LIB_TARGET)) \
$(call do_install_mkdir,$(libdir_SQ)); \
cp -fpR $(LIB_INSTALL) $(DESTDIR)$(libdir_SQ)
@@ -279,6 +294,10 @@ install_plugins: $(PLUGINS)
$(call QUIET_INSTALL, trace_plugins) \
$(call do_install_plugins, $(PLUGINS))
 
+install_pkgconfig:
+   $(call QUIET_INSTALL, $(PKG_CONFIG_FILE)) \
+   $(call do_install_pkgconfig_file,$(prefix))
+
 install_headers:
$(call QUIET_INSTALL, headers) \
$(call 
do_install,event-parse.h,$(prefix)/include/traceevent,644); \
@@ -289,8 +308,9 @@ install: install_lib
 
 clean:
$(call QUIET_CLEAN, libtraceevent) \
-   $(RM) *.o *~ $(TARGETS) *.a *.so $(VERSION_FILES) .*.d .*.cmd \
-   $(RM) TRACEEVENT-CFLAGS tags TAGS
+   $(RM) *.o *~ $(TARGETS) *.a *.so $(VERSION_FILES) .*.d .*.cmd; \
+   $(RM) TRACEEVENT-CFLAGS tags TAGS; \
+   $(RM) $(PKG_CONFIG_FILE)
 
 PHONY += force plugins
 force:
diff --git a/tools/lib/traceevent/libtraceevent.pc.template 
b/tools/lib/traceevent/libtraceevent.pc.template
new file mode 100644
index ..42e4d6cb6b9e
--- /dev/null
+++ b/tools/lib/traceevent/libtraceevent.pc.template
@@ -0,0 +1,10 @@
+prefix=INSTALL_PREFIX
+libdir=${prefix}/lib64
+includedir=${prefix}/include/traceevent
+
+Name: libtraceevent
+URL: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
+Description: Linux kernel trace event library
+Version: LIB_VERSION
+Cflags: -I${includedir}
+Libs: -L${libdir} -ltraceevent
-- 
2.19.1




[PATCH 0/9] tools/lib/traceevent: More updates to make libtraceevent into a library

2018-11-30 Thread Steven Rostedt


Arnaldo and Jiri,

Here's more patches to get us a step closer to having a legitimate
standalone library for libtraceevent. I'm currently reviewing man
pages, which I want finished before we call it done.

Please pull this tree (based on current tip/perf/core) or apply
the patches.

Thanks!

-- Steve


  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
tip/perf/core

Head SHA1: 1a1dbb61ee77226e9097bfe307219abf5df8e4cd


Tzvetomir Stoyanov (9):
  tools/lib/traceevent: Implemented new API tep_get_ref()
  tools/lib/traceevent: Added support for pkg-config
  tools/lib/traceevent: Install trace-seq.h API header file
  tools/lib/traceevent, tools/perf: Rename struct tep_event_format to 
struct tep_event
  tools/lib/traceevent: Rename tep_free_format() to tep_free_event()
  tools/perf: traceevent API cleanup, remove __tep_data2host*()
  tools/lib/traceevent: traceevent API cleanup
  tools/lib/traceevent: Introduce new libtraceevent API: tep_override_comm()
  tools/lib/traceevent: Add sanity check to is_timestamp_in_us()


 tools/lib/traceevent/Makefile  |  27 +-
 tools/lib/traceevent/event-parse-api.c |   8 +-
 tools/lib/traceevent/event-parse-local.h   |  13 +-
 tools/lib/traceevent/event-parse.c | 283 -
 tools/lib/traceevent/event-parse.h |  78 +++---
 tools/lib/traceevent/libtraceevent.pc.template |  10 +
 tools/lib/traceevent/parse-filter.c|  42 +--
 tools/lib/traceevent/plugin_function.c |   2 +-
 tools/lib/traceevent/plugin_hrtimer.c  |   4 +-
 tools/lib/traceevent/plugin_kmem.c |   2 +-
 tools/lib/traceevent/plugin_kvm.c  |  14 +-
 tools/lib/traceevent/plugin_mac80211.c |   4 +-
 tools/lib/traceevent/plugin_sched_switch.c |   4 +-
 tools/perf/builtin-trace.c |   2 +-
 tools/perf/util/evsel.h|   4 +-
 tools/perf/util/header.c   |   2 +-
 tools/perf/util/python.c   |   4 +-
 .../perf/util/scripting-engines/trace-event-perl.c |   6 +-
 .../util/scripting-engines/trace-event-python.c|   8 +-
 tools/perf/util/trace-event-parse.c|  16 +-
 tools/perf/util/trace-event-read.c |   4 +-
 tools/perf/util/trace-event.c  |   8 +-
 tools/perf/util/trace-event.h  |  16 +-
 23 files changed, 317 insertions(+), 244 deletions(-)
 create mode 100644 tools/lib/traceevent/libtraceevent.pc.template


[PATCH 8/9] tools/lib/traceevent: Introduce new libtraceevent API: tep_override_comm()

2018-11-30 Thread Steven Rostedt
From: Tzvetomir Stoyanov 

This patch adds a new API to the traceevent library: tep_override_comm().
It registers a pid / command mapping. If a mapping with the same
pid already exists, the entry is updated with the new command.
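
An illustrative sketch of the difference from tep_register_comm() (not
part of this patch; "tep" is an already initialized handle):

	tep_register_comm(tep, "old-comm", 1234);  /* first mapping for pid 1234 */
	tep_override_comm(tep, "new-comm", 1234);  /* pid 1234 now maps to "new-comm" */
	/*
	 * A second tep_register_comm() for pid 1234 would instead fail
	 * with errno set to EEXIST; tep_data_comm_from_pid(tep, 1234)
	 * now returns "new-comm".
	 */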

Signed-off-by: Tzvetomir Stoyanov 
Signed-off-by: Steven Rostedt (VMware) 
---
 tools/lib/traceevent/event-parse.c | 69 +++---
 tools/lib/traceevent/event-parse.h |  1 +
 2 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/tools/lib/traceevent/event-parse.c 
b/tools/lib/traceevent/event-parse.c
index 8863de9f8869..892cf032a096 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -232,11 +232,13 @@ int tep_pid_is_registered(struct tep_handle *pevent, int 
pid)
  * we must add this pid. This is much slower than when cmdlines
  * are added before the array is initialized.
  */
-static int add_new_comm(struct tep_handle *pevent, const char *comm, int pid)
+static int add_new_comm(struct tep_handle *pevent,
+   const char *comm, int pid, bool override)
 {
struct cmdline *cmdlines = pevent->cmdlines;
-   const struct cmdline *cmdline;
+   struct cmdline *cmdline;
struct cmdline key;
+   char *new_comm;
 
if (!pid)
return 0;
@@ -247,8 +249,19 @@ static int add_new_comm(struct tep_handle *pevent, const 
char *comm, int pid)
cmdline = bsearch(&key, pevent->cmdlines, pevent->cmdline_count,
   sizeof(*pevent->cmdlines), cmdline_cmp);
if (cmdline) {
-   errno = EEXIST;
-   return -1;
+   if (!override) {
+   errno = EEXIST;
+   return -1;
+   }
+   new_comm = strdup(comm);
+   if (!new_comm) {
+   errno = ENOMEM;
+   return -1;
+   }
+   free(cmdline->comm);
+   cmdline->comm = new_comm;
+
+   return 0;
}
 
cmdlines = realloc(cmdlines, sizeof(*cmdlines) * (pevent->cmdline_count 
+ 1));
@@ -275,21 +288,13 @@ static int add_new_comm(struct tep_handle *pevent, const 
char *comm, int pid)
return 0;
 }
 
-/**
- * tep_register_comm - register a pid / comm mapping
- * @pevent: handle for the pevent
- * @comm: the command line to register
- * @pid: the pid to map the command line to
- *
- * This adds a mapping to search for command line names with
- * a given pid. The comm is duplicated.
- */
-int tep_register_comm(struct tep_handle *pevent, const char *comm, int pid)
+static int _tep_register_comm(struct tep_handle *pevent,
+ const char *comm, int pid, bool override)
 {
struct cmdline_list *item;
 
if (pevent->cmdlines)
-   return add_new_comm(pevent, comm, pid);
+   return add_new_comm(pevent, comm, pid, override);
 
item = malloc(sizeof(*item));
if (!item)
@@ -312,6 +317,40 @@ int tep_register_comm(struct tep_handle *pevent, const 
char *comm, int pid)
return 0;
 }
 
+/**
+ * tep_register_comm - register a pid / comm mapping
+ * @pevent: handle for the pevent
+ * @comm: the command line to register
+ * @pid: the pid to map the command line to
+ *
+ * This adds a mapping to search for command line names with
+ * a given pid. The comm is duplicated. If a command with the same pid
+ * already exist, -1 is returned and errno is set to EEXIST
+ */
+int tep_register_comm(struct tep_handle *pevent, const char *comm, int pid)
+{
+   return _tep_register_comm(pevent, comm, pid, false);
+}
+
+/**
+ * tep_override_comm - register a pid / comm mapping
+ * @pevent: handle for the pevent
+ * @comm: the command line to register
+ * @pid: the pid to map the command line to
+ *
+ * This adds a mapping to search for command line names with
+ * a given pid. The comm is duplicated. If a command with the same pid
+ * already exists, the command string is updated with the new one
+ */
+int tep_override_comm(struct tep_handle *pevent, const char *comm, int pid)
+{
+   if (!pevent->cmdlines && cmdline_init(pevent)) {
+   errno = ENOMEM;
+   return -1;
+   }
+   return _tep_register_comm(pevent, comm, pid, true);
+}
+
 int tep_register_trace_clock(struct tep_handle *pevent, const char 
*trace_clock)
 {
pevent->trace_clock = strdup(trace_clock);
diff --git a/tools/lib/traceevent/event-parse.h 
b/tools/lib/traceevent/event-parse.h
index 35d37087d3c5..e6f4249910e6 100644
--- a/tools/lib/traceevent/event-parse.h
+++ b/tools/lib/traceevent/event-parse.h
@@ -432,6 +432,7 @@ int tep_set_function_resolver(struct tep_handle *pevent,
  tep_func_resolver_t *func, void *priv);
 void tep_reset_function_resolver(struct tep_handle *pevent);
 int tep_register_comm(struct tep_handle *pevent, const char *comm, int pid);
+int tep_override_comm

[PATCH 6/9] tools/perf: traceevent API cleanup, remove __tep_data2host*()

2018-11-30 Thread Steven Rostedt
From: Tzvetomir Stoyanov 

In order to make libtraceevent into a proper library, its API
should be straightforward. The __tep_data2host*() functions are
going to no longer be available as a libtraceevent API, tep_read_number()
should be used instead. This patch replaces __tep_data2host*() usage with
tep_read_number() in perf.

Signed-off-by: Tzvetomir Stoyanov 
Signed-off-by: Steven Rostedt (VMware) 
---
 tools/perf/util/trace-event-read.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/trace-event-read.c 
b/tools/perf/util/trace-event-read.c
index 76f12c705ef9..efe2f58cff4e 100644
--- a/tools/perf/util/trace-event-read.c
+++ b/tools/perf/util/trace-event-read.c
@@ -102,7 +102,7 @@ static unsigned int read4(struct tep_handle *pevent)
 
if (do_read(&data, 4) < 0)
return 0;
-   return __tep_data2host4(pevent, data);
+   return tep_read_number(pevent, &data, 4);
 }
 
 static unsigned long long read8(struct tep_handle *pevent)
@@ -111,7 +111,7 @@ static unsigned long long read8(struct tep_handle *pevent)
 
if (do_read(&data, 8) < 0)
return 0;
-   return __tep_data2host8(pevent, data);
+   return tep_read_number(pevent, &data, 8);
 }
 
 static char *read_string(void)
-- 
2.19.1




[PATCH 7/9] tools/lib/traceevent: traceevent API cleanup

2018-11-30 Thread Steven Rostedt
From: Tzvetomir Stoyanov 

In order to make libtraceevent into a proper library, its API
should be straightforward. This patch hides a few API functions,
intended for internal usage only:
tep_free_event(), tep_free_format_field(), __tep_data2host2(),
__tep_data2host4() and __tep_data2host8().
The patch also aligns the libtraceevent summary man page with
these API changes.

Signed-off-by: Tzvetomir Stoyanov 
Signed-off-by: Steven Rostedt (VMware) 
---
 tools/lib/traceevent/event-parse-api.c   |  6 +++---
 tools/lib/traceevent/event-parse-local.h |  7 +++
 tools/lib/traceevent/event-parse.c   | 13 -
 tools/lib/traceevent/event-parse.h   | 16 
 4 files changed, 18 insertions(+), 24 deletions(-)

diff --git a/tools/lib/traceevent/event-parse-api.c 
b/tools/lib/traceevent/event-parse-api.c
index 0dc011154ee9..8b31c0e00ba3 100644
--- a/tools/lib/traceevent/event-parse-api.c
+++ b/tools/lib/traceevent/event-parse-api.c
@@ -51,7 +51,7 @@ void tep_set_flag(struct tep_handle *tep, int flag)
tep->flags |= flag;
 }
 
-unsigned short __tep_data2host2(struct tep_handle *pevent, unsigned short data)
+unsigned short tep_data2host2(struct tep_handle *pevent, unsigned short data)
 {
unsigned short swap;
 
@@ -64,7 +64,7 @@ unsigned short __tep_data2host2(struct tep_handle *pevent, 
unsigned short data)
return swap;
 }
 
-unsigned int __tep_data2host4(struct tep_handle *pevent, unsigned int data)
+unsigned int tep_data2host4(struct tep_handle *pevent, unsigned int data)
 {
unsigned int swap;
 
@@ -80,7 +80,7 @@ unsigned int __tep_data2host4(struct tep_handle *pevent, 
unsigned int data)
 }
 
 unsigned long long
-__tep_data2host8(struct tep_handle *pevent, unsigned long long data)
+tep_data2host8(struct tep_handle *pevent, unsigned long long data)
 {
unsigned long long swap;
 
diff --git a/tools/lib/traceevent/event-parse-local.h 
b/tools/lib/traceevent/event-parse-local.h
index 94746efef433..9a092dd4a86d 100644
--- a/tools/lib/traceevent/event-parse-local.h
+++ b/tools/lib/traceevent/event-parse-local.h
@@ -89,4 +89,11 @@ struct tep_handle {
char *trace_clock;
 };
 
+void tep_free_event(struct tep_event *event);
+void tep_free_format_field(struct tep_format_field *field);
+
+unsigned short tep_data2host2(struct tep_handle *pevent, unsigned short data);
+unsigned int tep_data2host4(struct tep_handle *pevent, unsigned int data);
+unsigned long long tep_data2host8(struct tep_handle *pevent, unsigned long 
long data);
+
 #endif /* _PARSE_EVENTS_INT_H */
diff --git a/tools/lib/traceevent/event-parse.c 
b/tools/lib/traceevent/event-parse.c
index 848cd76b91a7..8863de9f8869 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -3328,15 +3328,18 @@ tep_find_any_field(struct tep_event *event, const char 
*name)
 unsigned long long tep_read_number(struct tep_handle *pevent,
   const void *ptr, int size)
 {
+   unsigned long long val;
+
switch (size) {
case 1:
return *(unsigned char *)ptr;
case 2:
-   return tep_data2host2(pevent, ptr);
+   return tep_data2host2(pevent, *(unsigned short *)ptr);
case 4:
-   return tep_data2host4(pevent, ptr);
+   return tep_data2host4(pevent, *(unsigned int *)ptr);
case 8:
-   return tep_data2host8(pevent, ptr);
+   memcpy(&val, (ptr), sizeof(unsigned long long));
+   return tep_data2host8(pevent, val);
default:
/* BUG! */
return 0;
@@ -4062,7 +4065,7 @@ static void print_str_arg(struct trace_seq *s, void 
*data, int size,
f = tep_find_any_field(event, arg->string.string);
arg->string.offset = f->offset;
}
-   str_offset = tep_data2host4(pevent, data + arg->string.offset);
+   str_offset = tep_data2host4(pevent, *(unsigned int *)(data + 
arg->string.offset));
str_offset &= 0x;
print_str_to_seq(s, format, len_arg, ((char *)data) + 
str_offset);
break;
@@ -4080,7 +4083,7 @@ static void print_str_arg(struct trace_seq *s, void 
*data, int size,
f = tep_find_any_field(event, arg->bitmask.bitmask);
arg->bitmask.offset = f->offset;
}
-   bitmask_offset = tep_data2host4(pevent, data + 
arg->bitmask.offset);
+   bitmask_offset = tep_data2host4(pevent, *(unsigned int *)(data 
+ arg->bitmask.offset));
bitmask_size = bitmask_offset >> 16;
bitmask_offset &= 0x;
print_bitmask_to_seq(pevent, s, format, len_arg,
diff --git a/tools/lib/traceevent/event-parse.h 
b/tools/lib/traceevent/event-parse.h
index 950ad185a5c4..35d37087d3c5 100644
--- a/tools/lib/trac

[PATCH 9/9] tools/lib/traceevent: Add sanity check to is_timestamp_in_us()

2018-11-30 Thread Steven Rostedt
From: Tzvetomir Stoyanov 

This patch adds a sanity check to is_timestamp_in_us() input parameter
trace_clock. It avoids a potential segfault in case trace_clock is NULL.

Reported-by: Slavomir Kaslev 
Signed-off-by: Tzvetomir Stoyanov 
Signed-off-by: Steven Rostedt (VMware) 
---
 tools/lib/traceevent/event-parse.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/traceevent/event-parse.c 
b/tools/lib/traceevent/event-parse.c
index 892cf032a096..0923e331441e 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -5451,7 +5451,7 @@ void tep_event_info(struct trace_seq *s, struct tep_event 
*event,
 
 static bool is_timestamp_in_us(char *trace_clock, bool use_trace_clock)
 {
-   if (!use_trace_clock)
+   if (!trace_clock || !use_trace_clock)
return true;
 
if (!strcmp(trace_clock, "local") || !strcmp(trace_clock, "global")
-- 
2.19.1




[PATCH 4/9] tools/lib/traceevent, tools/perf: Rename struct tep_event_format to struct tep_event

2018-11-30 Thread Steven Rostedt
From: Tzvetomir Stoyanov 

In order to make libtraceevent into a proper library, variables, data
structures and functions require a unique prefix to prevent name space
conflicts. This renames struct tep_event_format to struct tep_event,
which describes more closely the purpose of the struct.

Signed-off-by: Tzvetomir Stoyanov 
Signed-off-by: Steven Rostedt (VMware) 
---
 tools/lib/traceevent/event-parse-api.c|   2 +-
 tools/lib/traceevent/event-parse-local.h  |   6 +-
 tools/lib/traceevent/event-parse.c| 188 +-
 tools/lib/traceevent/event-parse.h|  62 +++---
 tools/lib/traceevent/parse-filter.c   |  42 ++--
 tools/lib/traceevent/plugin_function.c|   2 +-
 tools/lib/traceevent/plugin_hrtimer.c |   4 +-
 tools/lib/traceevent/plugin_kmem.c|   2 +-
 tools/lib/traceevent/plugin_kvm.c |  14 +-
 tools/lib/traceevent/plugin_mac80211.c|   4 +-
 tools/lib/traceevent/plugin_sched_switch.c|   4 +-
 tools/perf/builtin-trace.c|   2 +-
 tools/perf/util/evsel.h   |   4 +-
 tools/perf/util/header.c  |   2 +-
 tools/perf/util/python.c  |   4 +-
 .../util/scripting-engines/trace-event-perl.c |   6 +-
 .../scripting-engines/trace-event-python.c|   8 +-
 tools/perf/util/trace-event-parse.c   |  16 +-
 tools/perf/util/trace-event.c |   8 +-
 tools/perf/util/trace-event.h |  16 +-
 20 files changed, 198 insertions(+), 198 deletions(-)

diff --git a/tools/lib/traceevent/event-parse-api.c 
b/tools/lib/traceevent/event-parse-api.c
index 61f7149085ee..0dc011154ee9 100644
--- a/tools/lib/traceevent/event-parse-api.c
+++ b/tools/lib/traceevent/event-parse-api.c
@@ -15,7 +15,7 @@
  * This returns pointer to the first element of the events array
  * If @tep is NULL, NULL is returned.
  */
-struct tep_event_format *tep_get_first_event(struct tep_handle *tep)
+struct tep_event *tep_get_first_event(struct tep_handle *tep)
 {
if (tep && tep->events)
return tep->events[0];
diff --git a/tools/lib/traceevent/event-parse-local.h 
b/tools/lib/traceevent/event-parse-local.h
index b9bddde577f8..94746efef433 100644
--- a/tools/lib/traceevent/event-parse-local.h
+++ b/tools/lib/traceevent/event-parse-local.h
@@ -50,9 +50,9 @@ struct tep_handle {
unsigned int printk_count;
 
 
-   struct tep_event_format **events;
+   struct tep_event **events;
int nr_events;
-   struct tep_event_format **sort_events;
+   struct tep_event **sort_events;
enum tep_event_sort_type last_type;
 
int type_offset;
@@ -84,7 +84,7 @@ struct tep_handle {
struct tep_function_handler *func_handlers;
 
/* cache */
-   struct tep_event_format *last_event;
+   struct tep_event *last_event;
 
char *trace_clock;
 };
diff --git a/tools/lib/traceevent/event-parse.c 
b/tools/lib/traceevent/event-parse.c
index a5f3e37f81b5..bacd86c41563 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -96,7 +96,7 @@ struct tep_function_handler {
 
 static unsigned long long
 process_defined_func(struct trace_seq *s, void *data, int size,
-struct tep_event_format *event, struct tep_print_arg *arg);
+struct tep_event *event, struct tep_print_arg *arg);
 
 static void free_func_handle(struct tep_function_handler *func);
 
@@ -739,16 +739,16 @@ void tep_print_printk(struct tep_handle *pevent)
}
 }
 
-static struct tep_event_format *alloc_event(void)
+static struct tep_event *alloc_event(void)
 {
-   return calloc(1, sizeof(struct tep_event_format));
+   return calloc(1, sizeof(struct tep_event));
 }
 
-static int add_event(struct tep_handle *pevent, struct tep_event_format *event)
+static int add_event(struct tep_handle *pevent, struct tep_event *event)
 {
int i;
-   struct tep_event_format **events = realloc(pevent->events, 
sizeof(event) *
- (pevent->nr_events + 1));
+   struct tep_event **events = realloc(pevent->events, sizeof(event) *
+   (pevent->nr_events + 1));
if (!events)
return -1;
 
@@ -1355,7 +1355,7 @@ static unsigned int type_size(const char *name)
return 0;
 }
 
-static int event_read_fields(struct tep_event_format *event, struct 
tep_format_field **fields)
+static int event_read_fields(struct tep_event *event, struct tep_format_field 
**fields)
 {
struct tep_format_field *field = NULL;
enum tep_event_type type;
@@ -1642,7 +1642,7 @@ static int event_read_fields(struct tep_event_format 
*event, struct tep_format_f
return -1;
 }
 
-static int event_read_format(struct tep_event_format *event)
+static int event_read_format(struct tep_event *event)
 {
char *token;
  

[PATCH 3/9] tools/lib/traceevent: Install trace-seq.h API header file

2018-11-30 Thread Steven Rostedt
From: Tzvetomir Stoyanov 

This patch installs the trace-seq.h header file on "make install".
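
With the header installed next to the other library headers, an external
user can simply include it; illustrative only (assuming the default
$(prefix)/include is on the compiler's include path):

	#include <traceevent/event-parse.h>
	#include <traceevent/trace-seq.h>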

Signed-off-by: Tzvetomir Stoyanov 
Signed-off-by: Steven Rostedt (VMware) 
---
 tools/lib/traceevent/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/lib/traceevent/Makefile b/tools/lib/traceevent/Makefile
index adb16f845ab3..67fe5d7ef190 100644
--- a/tools/lib/traceevent/Makefile
+++ b/tools/lib/traceevent/Makefile
@@ -285,7 +285,7 @@ define do_install_pkgconfig_file
fi
 endef
 
-install_lib: all_cmd install_plugins install_pkgconfig
+install_lib: all_cmd install_plugins install_headers install_pkgconfig
$(call QUIET_INSTALL, $(LIB_TARGET)) \
$(call do_install_mkdir,$(libdir_SQ)); \
cp -fpR $(LIB_INSTALL) $(DESTDIR)$(libdir_SQ)
@@ -302,6 +302,7 @@ install_headers:
$(call QUIET_INSTALL, headers) \
$(call do_install,event-parse.h,$(prefix)/include/traceevent,644); \
$(call do_install,event-utils.h,$(prefix)/include/traceevent,644); \
+   $(call do_install,trace-seq.h,$(prefix)/include/traceevent,644); \
$(call do_install,kbuffer.h,$(prefix)/include/traceevent,644)
 
 install: install_lib
-- 
2.19.1
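
With trace-seq.h installed next to the other library headers, out-of-tree
users can include it directly. A minimal consumer sketch (the file name,
build command and printed fields below are illustrative assumptions, not
part of the patch; the trace_seq_* calls are the ones the header declares):

/* demo.c -- build (illustrative): gcc demo.c -ltraceevent -o demo */
#include <stdio.h>
#include <traceevent/trace-seq.h>

int main(void)
{
	struct trace_seq s;

	trace_seq_init(&s);		/* allocate the growable output buffer */
	trace_seq_printf(&s, "pid=%d comm=%s\n", 1234, "demo");
	trace_seq_terminate(&s);	/* make sure the buffer is NUL terminated */
	trace_seq_do_printf(&s);	/* write the accumulated text to stdout */
	trace_seq_destroy(&s);		/* free the buffer */

	return 0;
}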




Re: [PATCH 3/3] arm64: ftrace: add cond_resched() to func ftrace_make_(call|nop)

2018-11-30 Thread Steven Rostedt
On Fri, 30 Nov 2018 16:09:56 +0100
Anders Roxell  wrote:

> Both of those functions end up calling ftrace_modify_code(), which is
> expensive because it changes the page tables and flushes caches.
> Microseconds add up because this is called in a loop for each dyn_ftrace
> record, and this triggers the softlockup watchdog unless we let it sleep
> occasionally.
> Rework so that we call cond_resched() before going into the
> ftrace_modify_code() function.
> 
> Co-developed-by: Arnd Bergmann 
> Signed-off-by: Arnd Bergmann 
> Signed-off-by: Anders Roxell 

I'm fine with this patch, but I'm not placing an ack on it just
because I don't know the repercussions of such a change. I'll let
you folks take full responsibility ;-)

-- Steve
 

> ---
>  arch/arm64/kernel/ftrace.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
> index de1a397d2d3f..9da38da58df7 100644
> --- a/arch/arm64/kernel/ftrace.c
> +++ b/arch/arm64/kernel/ftrace.c
> @@ -130,6 +130,11 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
>   old = aarch64_insn_gen_nop();
>   new = aarch64_insn_gen_branch_imm(pc, addr, AARCH64_INSN_BRANCH_LINK);
>  
> + /* This function can take a long time when sanitizers are enabled, so
> +  * let's make sure we allow RCU processing.
> +  */
> + cond_resched();
> +
>   return ftrace_modify_code(pc, old, new, true);
>  }
>  
> @@ -188,6 +193,11 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
>  
>   new = aarch64_insn_gen_nop();
>  
> + /* This function can take a long time when sanitizers are enabled, so
> +  * let's make sure we allow RCU processing.
> +  */
> + cond_resched();
> +
>   return ftrace_modify_code(pc, old, new, validate);
>  }
>  



Re: [PATCH 2/3] tracing: instruct KCOV not to track tracing files

2018-11-30 Thread Steven Rostedt
On Fri, 30 Nov 2018 16:09:35 +0100
Anders Roxell  wrote:

> When we have KCOV enabled and running ftrace startup tests we end up in
> a softlockup. Kcov and ftrace tracing each other makes it really slow:
> 
> [  275.141388] Testing tracer wakeup_dl:  PASSED
> [  304.738345] Testing tracer function_graph:
> [  716.236822] watchdog: BUG: soft lockup - CPU#0 stuck for 21s! [ksoftirqd/0:9]
> 
> Rework so that we don't let KCOV look at tracing files. Could probably
> be more selective here, but in general letting KCOV and ftrace check
> each other isn't the best idea.
> 
> Co-developed-by: Arnd Bergmann 
> Signed-off-by: Arnd Bergmann 
> Signed-off-by: Anders Roxell 
> ---
>  kernel/trace/Makefile | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index f81dadbc7c4a..c7c73b976103 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -6,6 +6,11 @@ ifdef CONFIG_FUNCTION_TRACER
>  ORIG_CFLAGS := $(KBUILD_CFLAGS)
>  KBUILD_CFLAGS = $(subst $(CC_FLAGS_FTRACE),,$(ORIG_CFLAGS))
>  
> +# If instrumentation of this dir is enabled, the function tracer gets really
> +# slow. Probably could be more selective here, but note that files related
> +# to tracing shouldn't be traced anyway.
> +KCOV_INSTRUMENT  := n
> +

The entire directory is also set to not be traced by function tracing,
which is also a bit overkill, as there are functions in this directory
that can (and probably should) be traced.

Acked-by: Steven Rostedt (VMware) 

-- Steve

>  ifdef CONFIG_FTRACE_SELFTEST
>  # selftest needs instrumentation
>  CFLAGS_trace_selftest_dynamic.o = $(CC_FLAGS_FTRACE)



Re: [PATCH 1/3] stackleak: mark stackleak_track_stack() as notrace

2018-11-30 Thread Steven Rostedt
On Fri, 30 Nov 2018 16:08:59 +0100
Anders Roxell  wrote:

> Function graph tracing recurses into itself when stackleak is enabled,
> causing the ftrace graph selftest to run for up to 90 seconds and
> trigger the softlockup watchdog.
> 
> Breakpoint 2, ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200
> 200 mcount_get_lr_addr x0 // pointer to function's saved lr
> (gdb) bt
> #0  ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200
> #1  0xff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153
> #2  0xff8008555484 in stackleak_track_stack () at ../kernel/stackleak.c:106
> #3  0xff8008421ff8 in ftrace_ops_test (ops=0xff8009eaa840 , ip=18446743524091297036, regs=) at ../kernel/trace/ftrace.c:1507
> #4  0xff8008428770 in __ftrace_ops_list_func (regs=, ignored=, parent_ip=, ip=) at ../kernel/trace/ftrace.c:6286
> #5  ftrace_ops_no_ops (ip=18446743524091297036, parent_ip=18446743524091242824) at ../kernel/trace/ftrace.c:6321
> #6  0xff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153
> #7  0xff800832fd10 in irq_find_mapping (domain=0xffc03fc4bc80, hwirq=27) at ../kernel/irq/irqdomain.c:876
> #8  0xff800832294c in __handle_domain_irq (domain=0xffc03fc4bc80, hwirq=27, lookup=true, regs=0xff800814b840) at ../kernel/irq/irqdesc.c:650
> #9  0xff80081d52b4 in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:205
> 
> Rework so we mark stackleak_track_stack() as notrace.
> 
> Co-developed-by: Arnd Bergmann 
> Signed-off-by: Arnd Bergmann 
> Signed-off-by: Anders Roxell 

Acked-by: Steven Rostedt (VMware) 

-- Steve

> ---
>  kernel/stackleak.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
> index e42892926244..5de3bf596dd7 100644
> --- a/kernel/stackleak.c
> +++ b/kernel/stackleak.c
> @@ -102,7 +102,7 @@ asmlinkage void stackleak_erase(void)
>   current->lowest_stack = current_top_of_stack() - THREAD_SIZE/64;
>  }
>  
> -void __used stackleak_track_stack(void)
> +void __used notrace stackleak_track_stack(void)
>  {
>   /*
>* N.B. stackleak_erase() fills the kernel stack with the poison value,



Re: [PATCH v2] kernel/trace: fix watchdog soft lockup

2018-11-30 Thread Steven Rostedt
On Fri, 30 Nov 2018 15:56:22 +0100
Anders Roxell  wrote:

> When building an allmodconfig kernel for arm64 and booting it in qemu,
> CONFIG_FTRACE_STARTUP_TEST gets enabled and that takes time, so the
> watchdog expires and prints out a message like this:
> 'watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]'
> Depending on which test gets called from init_trace_selftests(), it can
> stay in the loop for minutes.
> Rework so that cond_resched() gets called in the init_trace_selftests()
> loop.
> 

This looks fine to me. Should it be marked for stable, and pushed into
this release cycle, or wait till the next merge window?

-- Steve

> Co-developed-by: Arnd Bergmann 
> Signed-off-by: Arnd Bergmann 
> Signed-off-by: Anders Roxell 
> ---
>  kernel/trace/trace.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 5706599ed534..109becbc81ca 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -1547,6 +1547,10 @@ static __init int init_trace_selftests(void)
>   pr_info("Running postponed tracer tests:\n");
>  
>   list_for_each_entry_safe(p, n, &postponed_selftests, list) {
> + /* This loop can take minutes when sanitizers are enabled, so
> +  * let's make sure we allow RCU processing.
> +  */
> + cond_resched();
>   ret = run_tracer_selftest(p->type);
>   /* If the test fails, then warn and remove from available_tracers */
>   if (ret < 0) {


