Re: [PATCH v3 1/2] kretprobe: produce sane stack traces

2018-11-09 Thread Aleksa Sarai
On 2018-11-09, Masami Hiramatsu wrote:
> > diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
> > index ee696efec99f..c4dfafd43e11 100644
> > --- a/arch/x86/include/asm/ptrace.h
> > +++ b/arch/x86/include/asm/ptrace.h
> > @@ -172,6 +172,7 @@ static inline unsigned long kernel_stack_pointer(struct pt_regs *regs)
> > return regs->sp;
> >  }
> >  #endif
> > +#define stack_addr(regs) ((unsigned long *) kernel_stack_pointer(regs))
> 
> No, you should use kernel_stack_pointer(regs) itself instead of stack_addr().
> 
> > 
> >  #define GET_IP(regs) ((regs)->ip)
> >  #define GET_FP(regs) ((regs)->bp)
> > diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> > index b0d1e81c96bb..eb4da885020c 100644
> > --- a/arch/x86/kernel/kprobes/core.c
> > +++ b/arch/x86/kernel/kprobes/core.c
> > @@ -69,8 +69,6 @@
> >  DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
> >  DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
> >  
> > -#define stack_addr(regs) ((unsigned long *)kernel_stack_pointer(regs))
> 
> I don't like keeping this meaningless macro... this should be replaced with
> the generic kernel_stack_pointer() macro.

Sure. This patch was just an example -- I can remove stack_addr() all
over.

> > -   if (regs)
> > -   save_stack_address(trace, regs->ip, nosched);
> > +   if (regs) {
> > +   /* XXX: Currently broken -- stack_addr(regs) doesn't match entry. */
> > +   addr = regs->ip;
> 
> Since this part stores regs->ip as the top of the call-stack, this
> seems like correct code. The stack unwind will be done in the next block.

This comment was referring to the usage of stack_addr(). stack_addr()
doesn't give you the right result (it isn't the address of the return
address -- it's slightly wrong). This is the main issue I was having --
am I doing something wrong here?
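
For context, the setup side (arch/x86/kernel/kprobes/core.c) is roughly the
following -- at function-entry time stack_addr(regs) really is the slot
holding the return address:

  void arch_prepare_kretprobe(struct kretprobe_instance *ri,
                              struct pt_regs *regs)
  {
          unsigned long *sara = stack_addr(regs);

          ri->ret_addr = (kprobe_opcode_t *) *sara;

          /* Replace the return addr with trampoline addr. */
          *sara = (unsigned long) &kretprobe_trampoline;
  }

But the regs passed to the stack tracer come from the probe/perf context,
where the stack pointer has already moved, so the two values don't line up.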

> > +   //addr = ftrace_graph_ret_addr(current, &state.graph_idx, addr, stack_addr(regs));
> 
> So the func-graph return trampoline address will be shown only when
> unwinding stack entries. I mean, the func-graph tracer is not used as an
> event, so it never kicks a stackdump.

Just to make sure I understand what you're saying -- func-graph tracing
will never actually call __ftrace_stack_trace? Because if it does, then
this code will be necessary. (And then I'm a bit confused about why the
unwinder has func-graph code at all -- if stack traces are never taken
under func-graph, then that code in the unwinder is unnecessary.)

My reason for commenting this out is that at this point "state" isn't
initialised, and thus state.graph_idx would not be handled correctly
during the unwind (which is the same reason I commented it out later).

> > +   addr = kretprobe_ret_addr(current, addr, stack_addr(regs));
> 
> But kretprobe will be an event, which can kick a stackdump.
> BTW, from a kretprobe, regs->ip should always be the trampoline handler --
> see arch/x86/kernel/kprobes/core.c:772 :-)
> So it must always be fixed up.

Right, but kretprobe_ret_addr() returns the *original* return address
(so we still need an (addr == kretprobe_trampoline) check first). The
real problem is that stack_addr(regs) isn't the same as it was during
kretprobe setup (kretprobe_ret_addr() works everywhere else).

> > @@ -1856,6 +1870,41 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
> >  }
> >  NOKPROBE_SYMBOL(pre_handler_kretprobe);
> >  
> > +unsigned long kretprobe_ret_addr(struct task_struct *tsk, unsigned long ret,
> > +				 unsigned long *retp)
> > +{
> > +   struct kretprobe_instance *ri;
> > +   unsigned long flags = 0;
> > +   struct hlist_head *head;
> > +   bool need_lock;
> > +
> > +   if (likely(ret != (unsigned long) &kretprobe_trampoline))
> > +   return ret;
> > +
> > +   need_lock = !kretprobe_hash_is_locked(tsk);
> > +   if (WARN_ON(need_lock))
> > +   kretprobe_hash_lock(tsk, &head, &flags);
> > +   else
> > +   head = kretprobe_inst_table_head(tsk);
> 
> This may not work unless this is called from kretprobe handler context,
> since if we are outside kretprobe handler context, another CPU can lock
> the hash table, and that will be detected by kretprobe_hash_is_locked().

Yeah, I noticed this as well when writing it (but needed a quick impl
that I could test). I will fix this, thanks!

By is_kretprobe_handler_context() I imagine you are referring to
checking is_kretprobe(current_kprobe())?

> So, we should check whether we are in kretprobe handler context if
> tsk == current; if not, we can definitely take the hash lock without any
> warning. This could be something like:
> 
> if (is_kretprobe_handler_context()) {
>     // kretprobe_hash_lock(current == tsk) has been locked by caller
>     if (tsk != current && kretprobe_hash(tsk) != kretprobe_hash(current))
>         // the hash of tsk and current can be same.
>         need_lock = true;
> } else
>     // we should take a lock for tsk.
>     need_lock = true;
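
Something like this is what I'll try for v4, then (rough sketch only --
is_kretprobe_handler_context() and kretprobe_hash() are the hypothetical
helpers from your pseudo-code, and I've elided the retp matching):

  unsigned long kretprobe_ret_addr(struct task_struct *tsk, unsigned long ret,
				   unsigned long *retp)
  {
	struct kretprobe_instance *ri;
	unsigned long flags = 0;
	struct hlist_head *head;
	bool need_lock = false;

	if (likely(ret != (unsigned long) &kretprobe_trampoline))
		return ret;

	if (is_kretprobe_handler_context()) {
		/* The caller already holds kretprobe_hash_lock(current). */
		if (tsk != current &&
		    kretprobe_hash(tsk) != kretprobe_hash(current))
			need_lock = true;
	} else {
		/* Outside handler context we must take the lock ourselves. */
		need_lock = true;
	}

	if (need_lock)
		kretprobe_hash_lock(tsk, &head, &flags);
	else
		head = kretprobe_inst_table_head(tsk);

	hlist_for_each_entry(ri, head, hlist) {
		if (ri->task != tsk)
			continue;
		/* TODO: match on retp to pick the right instance. */
		ret = (unsigned long) ri->ret_addr;
		break;
	}

	if (need_lock)
		kretprobe_hash_unlock(tsk, &flags);
	return ret;
  }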

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH

Re: [PATCH v3 1/2] kretprobe: produce sane stack traces

2018-11-07 Thread Aleksa Sarai
On 2018-11-06, Steven Rostedt wrote:
> On Sun, 4 Nov 2018 22:59:13 +1100
> Aleksa Sarai wrote:
> 
> > The same issue is present in __save_stack_trace
> > (arch/x86/kernel/stacktrace.c). This is likely the only reason that --
> > as Steven said -- stacktraces wouldn't work with ftrace-graph (and thus
> > with the refactor both of you are discussing).
> 
> By the way, I was playing with the ORC unwinder and stack traces
> from the function graph tracer return code, and got it working with the
> below patch. Caution, that patch also has a stack trace hardcoded in
> the return path of the function graph tracer, so you don't want to run
> function graph tracing without filtering.

Neat!

> diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
> index 169b3c44ee97..aaeca73218cc 100644
> --- a/kernel/trace/trace_functions_graph.c
> +++ b/kernel/trace/trace_functions_graph.c
> @@ -242,13 +242,16 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
>   trace->calltime = current->ret_stack[index].calltime;
>   trace->overrun = atomic_read(&current->trace_overrun);
>   trace->depth = index;
> +
> + trace_dump_stack(0);

Right, this works because save_stack is not being passed a pt_regs. But if
you pass a pt_regs (as happens with bpf_getstackid -- which is what
spawned this discussion) then the top-most entry of the stack will still
be a trampoline because there is no ftrace_graph_ret_addr call.

(I'm struggling with how to fix this -- I can't figure out what retp
should be when you have a pt_regs. regs->sp doesn't appear to work --
it's off by a few bytes.)
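
(For comparison, the frame-pointer unwinder does roughly this in
arch/x86/kernel/unwind_frame.c -- retp is the address of the stack word
that held the return address, which is exactly what I can't reconstruct
from a pt_regs:

  /* addr_p points at the stack slot containing the return address. */
  addr = READ_ONCE_NOCHECK(*addr_p);
  state->ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
				    addr, addr_p);
)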

I will attach what I have at the moment to hopefully explain what the
issue I've found is (re-using the kretprobe architecture but with the
shadow-stack idea).

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH



Re: [PATCH v3 1/2] kretprobe: produce sane stack traces

2018-11-03 Thread Masami Hiramatsu
On Sat, 3 Nov 2018 13:30:21 -0400
Steven Rostedt wrote:

> On Sun, 4 Nov 2018 01:34:30 +0900
> Masami Hiramatsu wrote:
> > 
> > > I was thinking of a bitmask that represents the handlers, and use that
> > > to map which handler gets called for which shadow entry for a
> > > particular task.  
> > 
> > Hmm, I'm afraid that is too complicated and not scalable. I'd rather see
> > the open shadow entry...
> 
> It can scale and it's not too complex (I already played with it a little).
> But that said, I'm not committed to it, and using the shadow stack is
> also an interesting idea.
> 
> > 
> > entry: [[original_retaddr][function][modified_retaddr]]
> > 
> > So if there are many users on the same function, the entries will be like this:
> > 
> > [[original_return_address][function][trampoline_A]]
> > [[trampoline_A][function][trampoline_B]]
> > [[trampoline_B][function][trampoline_C]]
> > 
> > And on the top of the stack, there is trampoline_C instead of
> > original_return_address. In this case, the function returns to
> > trampoline_C(), which jumps back to trampoline_B(), and then to
> > trampoline_A(). Eventually it jumps back to original_return_address.
> 
> Where are trampolines A, B, and C made? Do we also need to dynamically
> create them? If I register multiple function tracing ones, each one
> will need its own trampoline?
> 

No, I think the trampolines are very limited; currently we will only have
the ftrace and kretprobe trampolines.


> > This way, we don't need to allocate another bitmap/pages for the shadow
> > stack. We only need a shadow stack for each task.
> > Also, the unwinder can easily find trampoline_C from the shadow stack and
> > restore original_return_address. (Of course trampoline_A/B/C must be
> > registered so that the search function can skip them.)
> 
> What I was thinking was to store a count and the functions to be called:
> 
> 
>   [original_return_address]
>   [function_A]
>   [function_B]
>   [function_C]
>   [ 3 ]
> 
> Then the trampoline that processes the return codes for ftrace (and
> kretprobes and everyone else) can simply do:
> 
>   count = pop_shadow_stack();
>   for (i = 0; i < count; i++) {
>   func = pop_shadow_stack();
>   func(...);
>   }
>   return_address = pop_shadow_stack();

Ah, that's a good idea. I think we also have to store the called function's
entry address along with the count header, but basically I agree with you.

If we have space to store data along with the function address, that would
also be good for kretprobes. systemtap heavily uses "entry data" saved at
function entry for use in the exit handler.
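
So the single trampoline could pop something like this (just a sketch --
pop_shadow_stack() and the entry layout are hypothetical):

  typedef void (*ret_handler_t)(unsigned long func, void *data);

  count = pop_shadow_stack();
  func = pop_shadow_stack();      /* entry address of the returning function */
  for (i = 0; i < count; i++) {
          data = (void *)pop_shadow_stack();
          handler = (ret_handler_t)pop_shadow_stack();
          handler(func, data);    /* each handler gets its own entry data */
  }
  return_address = pop_shadow_stack();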

> That way we only need to register a function to the return handler and
> it will be called, without worrying about making trampolines. There
> will just be a single trampoline that handles all the work.

OK, and could you make it independent of the func-graph tracer, so that a
kernel with CONFIG_KPROBES=y but CONFIG_FUNCTION_GRAPH_TRACER=n can support
kretprobes too.

Thank you,


-- 
Masami Hiramatsu 

