Re: [PATCH v3 01/15] stacktrace/x86: add function for detecting reliable stack traces

2016-12-20 Thread Josh Poimboeuf
On Tue, Dec 20, 2016 at 10:39:16AM +0100, Petr Mladek wrote:
> On Mon 2016-12-19 11:25:49, Josh Poimboeuf wrote:
> > 3) probably some kind of runtime NMI stack checking feature to
> >complement objtool, along with a lot of burn time to ensure there are
> >no issues, particularly in entry code
> 
> Could you please provide more details about this NMI stack checking?
> What is it supposed to protect that objtool could not?
> Will it run regularly or will it be just a random check?

save_stack_trace_tsk_reliable(current) would be called periodically from
an NMI handler, and a warning would be printed if it ever doesn't reach
the "end" of the stack (i.e., user-mode pt_regs).  Due to the
performance impact it would probably only be a debug option.

It would verify the special hand-coded areas which objtool isn't smart
enough to understand, like entry code, ftrace, kprobes, bpf.  It would
also make sure that objtool itself didn't miss anything.
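
Roughly, the idea would look something like the sketch below.  This is
hypothetical -- the hook point and names are made up, and a real
implementation would also have to account for the entry frames of the
NMI itself:

	/*
	 * Hypothetical debug check, run periodically from NMI context:
	 * verify that the current task's stack unwinds all the way to
	 * the user-mode pt_regs at the end of the stack.
	 */
	static void nmi_check_stack_reliability(void)
	{
		unsigned long entries[64];	/* arbitrary depth */
		struct stack_trace trace = {
			.entries	= entries,
			.max_entries	= ARRAY_SIZE(entries),
		};

		/* nonzero means the unwind didn't reach user-mode pt_regs */
		if (save_stack_trace_tsk_reliable(current, &trace))
			WARN_ONCE(1, "unreliable stack trace for current task\n");
	}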

-- 
Josh


Re: [PATCH v3 01/15] stacktrace/x86: add function for detecting reliable stack traces

2016-12-20 Thread Petr Mladek
On Mon 2016-12-19 11:25:49, Josh Poimboeuf wrote:
> On Mon, Dec 19, 2016 at 05:25:19PM +0100, Miroslav Benes wrote:
> > On Thu, 8 Dec 2016, Josh Poimboeuf wrote:
> > 
> > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > > index 215612c..b4a6663 100644
> > > --- a/arch/x86/Kconfig
> > > +++ b/arch/x86/Kconfig
> > > @@ -155,6 +155,7 @@ config X86
> > >   select HAVE_PERF_REGS
> > >   select HAVE_PERF_USER_STACK_DUMP
> > >   select HAVE_REGS_AND_STACK_ACCESS_API
> > > + select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER && STACK_VALIDATION
> > 
> > Tests to measure the possible performance penalty of frame pointers were
> > done by Mel Gorman. The outcome was quite clear. There IS a measurable
> > impact. The percentage depends on the workload, but I think it is safe to
> > say that FP usually costs 5-10 percent.
> > 
> > If my understanding is correct, there is no single culprit. Register
> > pressure is definitely not a problem. We ran simple benchmarks while
> > taking a register away from GCC (RBP or a common one). The impact is a
> > combination of more cacheline pressure, more memory accesses, and the fact
> > that the kernel contains a lot of small functions.
> > 
> > Thus, I think that DWARF should be the way to go here.
> > 
> > Other than that the patch looks good to me.
> 
> I agree that DWARF is generally a good idea, and I'm working toward it.
> However there's still quite a bit of work to get there.
> 
> For this consistency model to work with DWARF on x86, we would need:
> 
> 1) a reliable x86 DWARF unwinder with Linus's blessing
> 2) objtool DWARF support (I'm working on this at the moment)
> 3) probably some kind of runtime NMI stack checking feature to
>complement objtool, along with a lot of burn time to ensure there are
>no issues, particularly in entry code

Could you please provide more details about this NMI stack checking?
What is it supposed to protect that objtool could not?
Will it run regularly or will it be just a random check?

The other points are obvious. But I do not know what to
think about this NMI thing. And so I am curious :-)


> 4) port save_stack_trace_tsk_reliable() to work with DWARF

Best Regards,
Petr


Re: [PATCH v3 01/15] stacktrace/x86: add function for detecting reliable stack traces

2016-12-19 Thread Miroslav Benes
On Mon, 19 Dec 2016, Josh Poimboeuf wrote:

> On Mon, Dec 19, 2016 at 05:25:19PM +0100, Miroslav Benes wrote:
> > On Thu, 8 Dec 2016, Josh Poimboeuf wrote:
> > 
> > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > > index 215612c..b4a6663 100644
> > > --- a/arch/x86/Kconfig
> > > +++ b/arch/x86/Kconfig
> > > @@ -155,6 +155,7 @@ config X86
> > >   select HAVE_PERF_REGS
> > >   select HAVE_PERF_USER_STACK_DUMP
> > >   select HAVE_REGS_AND_STACK_ACCESS_API
> > > + select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER && STACK_VALIDATION
> > 
> > Tests to measure the possible performance penalty of frame pointers were
> > done by Mel Gorman. The outcome was quite clear. There IS a measurable
> > impact. The percentage depends on the workload, but I think it is safe to
> > say that FP usually costs 5-10 percent.
> > 
> > If my understanding is correct, there is no single culprit. Register
> > pressure is definitely not a problem. We ran simple benchmarks while
> > taking a register away from GCC (RBP or a common one). The impact is a
> > combination of more cacheline pressure, more memory accesses, and the fact
> > that the kernel contains a lot of small functions.
> > 
> > Thus, I think that DWARF should be the way to go here.
> > 
> > Other than that the patch looks good to me.
> 
> I agree that DWARF is generally a good idea, and I'm working toward it.
> However there's still quite a bit of work to get there.
> 
> For this consistency model to work with DWARF on x86, we would need:
> 
> 1) a reliable x86 DWARF unwinder with Linus's blessing
> 2) objtool DWARF support (I'm working on this at the moment)
> 3) probably some kind of runtime NMI stack checking feature to
>complement objtool, along with a lot of burn time to ensure there are
>no issues, particularly in entry code
> 4) port save_stack_trace_tsk_reliable() to work with DWARF

Yes, this is a lot of work to do.
 
> DWARF will be nice to have, but it's definitely not required before
> merging this consistency model.

Oh, I didn't mean it had to be done before merging this patch set. Sorry
for the confusion. The point was that, as far as performance is concerned,
FP does not look that promising and DWARF could be better (but who knows,
right?).

> Also I doubt we'll ever be able to drop frame pointer support
> completely.  Some embedded systems may not want the overhead of the
> DWARF metadata.

True. There should be a choice in this respect.

Regards,
Miroslav


Re: [PATCH v3 01/15] stacktrace/x86: add function for detecting reliable stack traces

2016-12-19 Thread Josh Poimboeuf
On Mon, Dec 19, 2016 at 05:25:19PM +0100, Miroslav Benes wrote:
> On Thu, 8 Dec 2016, Josh Poimboeuf wrote:
> 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 215612c..b4a6663 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -155,6 +155,7 @@ config X86
> > select HAVE_PERF_REGS
> > select HAVE_PERF_USER_STACK_DUMP
> > select HAVE_REGS_AND_STACK_ACCESS_API
> > +   select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER && STACK_VALIDATION
> 
> Tests to measure the possible performance penalty of frame pointers were
> done by Mel Gorman. The outcome was quite clear. There IS a measurable
> impact. The percentage depends on the workload, but I think it is safe to
> say that FP usually costs 5-10 percent.
> 
> If my understanding is correct, there is no single culprit. Register
> pressure is definitely not a problem. We ran simple benchmarks while
> taking a register away from GCC (RBP or a common one). The impact is a
> combination of more cacheline pressure, more memory accesses, and the fact
> that the kernel contains a lot of small functions.
> 
> Thus, I think that DWARF should be the way to go here.
> 
> Other than that the patch looks good to me.

I agree that DWARF is generally a good idea, and I'm working toward it.
However there's still quite a bit of work to get there.

For this consistency model to work with DWARF on x86, we would need:

1) a reliable x86 DWARF unwinder with Linus's blessing
2) objtool DWARF support (I'm working on this at the moment)
3) probably some kind of runtime NMI stack checking feature to
   complement objtool, along with a lot of burn time to ensure there are
   no issues, particularly in entry code
4) port save_stack_trace_tsk_reliable() to work with DWARF

DWARF will be nice to have, but it's definitely not required before
merging this consistency model.

Also I doubt we'll ever be able to drop frame pointer support
completely.  Some embedded systems may not want the overhead of the
DWARF metadata.

-- 
Josh


Re: [PATCH v3 01/15] stacktrace/x86: add function for detecting reliable stack traces

2016-12-19 Thread Miroslav Benes
On Thu, 8 Dec 2016, Josh Poimboeuf wrote:

> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 215612c..b4a6663 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -155,6 +155,7 @@ config X86
>   select HAVE_PERF_REGS
>   select HAVE_PERF_USER_STACK_DUMP
>   select HAVE_REGS_AND_STACK_ACCESS_API
> + select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER && STACK_VALIDATION

Tests to measure the possible performance penalty of frame pointers were
done by Mel Gorman. The outcome was quite clear. There IS a measurable
impact. The percentage depends on the workload, but I think it is safe to
say that FP usually costs 5-10 percent.

If my understanding is correct, there is no single culprit. Register
pressure is definitely not a problem. We ran simple benchmarks while
taking a register away from GCC (RBP or a common one). The impact is a
combination of more cacheline pressure, more memory accesses, and the fact
that the kernel contains a lot of small functions.

Thus, I think that DWARF should be the way to go here.

Other than that the patch looks good to me.

Miroslav 


Re: [PATCH v3 01/15] stacktrace/x86: add function for detecting reliable stack traces

2016-12-16 Thread Josh Poimboeuf
On Fri, Dec 16, 2016 at 02:07:39PM +0100, Petr Mladek wrote:
> On Thu 2016-12-08 12:08:26, Josh Poimboeuf wrote:
> > For live patching and possibly other use cases, a stack trace is only
> > useful if it can be assured that it's completely reliable.  Add a new
> > save_stack_trace_tsk_reliable() function to achieve that.
> > 
> > Scenarios which indicate that a stack trace may be unreliable:
> > 
> > - running task
> 
> It seems that this has to be enforced by the save_stack_trace_tsk_reliable()
> caller. It should be mentioned in the function description.

Agreed.
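
For illustration, a caller-side check might look something like this
(hypothetical code, not from this series -- the livepatch
consistency-model patches later in the series do the real enforcement):

	/*
	 * The caller must ensure @task stays off the CPU while its
	 * stack is examined, e.g. by holding its runqueue lock.
	 */
	static int check_task_stack(struct task_struct *task)
	{
		unsigned long entries[64];	/* arbitrary depth */
		struct stack_trace trace = {
			.entries	= entries,
			.max_entries	= ARRAY_SIZE(entries),
		};

		/* a running task (other than ourselves) can't be trusted */
		if (task_curr(task) && task != current)
			return -EBUSY;

		return save_stack_trace_tsk_reliable(task, &trace);
	}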

> > - interrupt stack
> 
> I guess that it is detected by saved regs on the stack. And it also
> covers dynamic changes like kprobes. Do I get it correctly, please?

Right.

> What about ftrace? Is ftrace without regs safe and detected?

Yes, it's safe because the mcount code does the right thing with respect
to frame pointers.  See save_mcount_regs().

> > - preemption
> 
> I wonder if some very active kthreads might almost always be
> preempted by an irq in a preemptive kernel. Then they would block
> the conversion because of their non-reliable stacks. Have you noticed
> such problems, please?

I haven't seen such a case and I think it would be quite rare for a
kthread to be CPU-bound like that.

> > - corrupted stack data
> > - stack grows the wrong way
> 
> This is detected in unwind_next_frame() and passed via state->error.
> Am I right?

Right.  I'll add more details to the commit message for all of these.
> 
> 
> > - stack walk doesn't reach the bottom
> > - user didn't provide a large enough entries array
> > 
> > Also add CONFIG_HAVE_RELIABLE_STACKTRACE so arch-independent code can
> > determine at build time whether the function is implemented.
> > 
> > diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
> > index 0653788..3e0cf5e 100644
> > --- a/arch/x86/kernel/stacktrace.c
> > +++ b/arch/x86/kernel/stacktrace.c
> > @@ -74,6 +74,64 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
> >  }
> >  EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
> >  
> > +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> > +static int __save_stack_trace_reliable(struct stack_trace *trace,
> > +				       struct task_struct *task)
> > +{
> > +	struct unwind_state state;
> > +	struct pt_regs *regs;
> > +	unsigned long addr;
> > +
> > +	for (unwind_start(&state, task, NULL, NULL); !unwind_done(&state);
> > +	     unwind_next_frame(&state)) {
> > +
> > +		regs = unwind_get_entry_regs(&state);
> > +		if (regs) {
> > +			/*
> > +			 * Preemption and page faults on the stack can make
> > +			 * frame pointers unreliable.
> > +			 */
> > +			if (!user_mode(regs))
> > +				return -1;
> 
> In other words, if we find regs on the stack, it almost always means
> a non-reliable stack. The only exception is when we are in user
> mode. Do I get it correctly, please?

Right.

> > +
> > +			/*
> > +			 * This frame contains the (user mode) pt_regs at the
> > +			 * end of the stack.  Finish the unwind.
> > +			 */
> > +			unwind_next_frame(&state);
> > +			break;
> > +		}
> > +
> > +		addr = unwind_get_return_address(&state);
> > +		if (!addr || save_stack_address(trace, addr, false))
> > +			return -1;
> > +	}
> > +
> > +	if (!unwind_done(&state) || unwind_error(&state))
> > +		return -1;
> > +
> > +	if (trace->nr_entries < trace->max_entries)
> > +		trace->entries[trace->nr_entries++] = ULONG_MAX;
> > +
> > +	return 0;
> > +}
> 
> Great work! I am surprised that it looks so straightforward.
> 
> I still have to think and investigate it more. But it looks
> very promising.
> 
> Best Regards,
> Petr

-- 
Josh


Re: [PATCH v3 01/15] stacktrace/x86: add function for detecting reliable stack traces

2016-12-16 Thread Petr Mladek
On Thu 2016-12-08 12:08:26, Josh Poimboeuf wrote:
> For live patching and possibly other use cases, a stack trace is only
> useful if it can be assured that it's completely reliable.  Add a new
> save_stack_trace_tsk_reliable() function to achieve that.
> 
> Scenarios which indicate that a stack trace may be unreliable:
> 
> - running task

It seems that this has to be enforced by the save_stack_trace_tsk_reliable()
caller. It should be mentioned in the function description.


> - interrupt stack

I guess that it is detected by saved regs on the stack. And it also
covers dynamic changes like kprobes. Do I get it correctly, please?

What about ftrace? Is ftrace without regs safe and detected?


> - preemption

I wonder if some very active kthreads might almost always be
preempted by an irq in a preemptive kernel. Then they would block
the conversion because of their non-reliable stacks. Have you noticed
such problems, please?


> - corrupted stack data
> - stack grows the wrong way

This is detected in unwind_next_frame() and passed via state->error.
Am I right?


> - stack walk doesn't reach the bottom
> - user didn't provide a large enough entries array
> 
> Also add CONFIG_HAVE_RELIABLE_STACKTRACE so arch-independent code can
> determine at build time whether the function is implemented.
> 
> diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
> index 0653788..3e0cf5e 100644
> --- a/arch/x86/kernel/stacktrace.c
> +++ b/arch/x86/kernel/stacktrace.c
> @@ -74,6 +74,64 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
>  }
>  EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
>  
> +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> +static int __save_stack_trace_reliable(struct stack_trace *trace,
> +				       struct task_struct *task)
> +{
> +	struct unwind_state state;
> +	struct pt_regs *regs;
> +	unsigned long addr;
> +
> +	for (unwind_start(&state, task, NULL, NULL); !unwind_done(&state);
> +	     unwind_next_frame(&state)) {
> +
> +		regs = unwind_get_entry_regs(&state);
> +		if (regs) {
> +			/*
> +			 * Preemption and page faults on the stack can make
> +			 * frame pointers unreliable.
> +			 */
> +			if (!user_mode(regs))
> +				return -1;

In other words, if we find regs on the stack, it almost always means
a non-reliable stack. The only exception is when we are in user
mode. Do I get it correctly, please?

> +
> +			/*
> +			 * This frame contains the (user mode) pt_regs at the
> +			 * end of the stack.  Finish the unwind.
> +			 */
> +			unwind_next_frame(&state);
> +			break;
> +		}
> +
> +		addr = unwind_get_return_address(&state);
> +		if (!addr || save_stack_address(trace, addr, false))
> +			return -1;
> +	}
> +
> +	if (!unwind_done(&state) || unwind_error(&state))
> +		return -1;
> +
> +	if (trace->nr_entries < trace->max_entries)
> +		trace->entries[trace->nr_entries++] = ULONG_MAX;
> +
> +	return 0;
> +}

Great work! I am surprised that it looks so straightforward.

I still have to think and investigate it more. But it looks
very promising.

Best Regards,
Petr


[PATCH v3 01/15] stacktrace/x86: add function for detecting reliable stack traces

2016-12-08 Thread Josh Poimboeuf
For live patching and possibly other use cases, a stack trace is only
useful if it can be assured that it's completely reliable.  Add a new
save_stack_trace_tsk_reliable() function to achieve that.

Scenarios which indicate that a stack trace may be unreliable:

- running task
- interrupt stack
- preemption
- corrupted stack data
- stack grows the wrong way
- stack walk doesn't reach the bottom
- user didn't provide a large enough entries array

Also add CONFIG_HAVE_RELIABLE_STACKTRACE so arch-independent code can
determine at build time whether the function is implemented.

Signed-off-by: Josh Poimboeuf 
---
 arch/Kconfig                   |  6 ++++++
 arch/x86/Kconfig               |  1 +
 arch/x86/include/asm/unwind.h  |  6 ++++++
 arch/x86/kernel/stacktrace.c   | 59 ++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kernel/unwind_frame.c |  1 +
 include/linux/stacktrace.h     |  8 +++++---
 kernel/stacktrace.c            | 12 ++++++++--
 7 files changed, 87 insertions(+), 6 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 13f27c1..d61a133 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -678,6 +678,12 @@ config HAVE_STACK_VALIDATION
 	  Architecture supports the 'objtool check' host tool command, which
 	  performs compile-time stack metadata validation.
 
+config HAVE_RELIABLE_STACKTRACE
+	bool
+	help
+	  Architecture has a save_stack_trace_tsk_reliable() function which
+	  only returns a stack trace if it can guarantee the trace is reliable.
+
 config HAVE_ARCH_HASH
bool
default n
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 215612c..b4a6663 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -155,6 +155,7 @@ config X86
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_REGS_AND_STACK_ACCESS_API
+	select HAVE_RELIABLE_STACKTRACE		if X86_64 && FRAME_POINTER && STACK_VALIDATION
 	select HAVE_STACK_VALIDATION		if X86_64
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UNSTABLE_SCHED_CLOCK
diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index c5a7f3a..44f86dc 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -11,6 +11,7 @@ struct unwind_state {
 	unsigned long stack_mask;
 	struct task_struct *task;
 	int graph_idx;
+	bool error;
 #ifdef CONFIG_FRAME_POINTER
 	unsigned long *bp;
 	struct pt_regs *regs;
@@ -40,6 +41,11 @@ void unwind_start(struct unwind_state *state, struct task_struct *task,
 	__unwind_start(state, task, regs, first_frame);
 }
 
+static inline bool unwind_error(struct unwind_state *state)
+{
+	return state->error;
+}
+
 #ifdef CONFIG_FRAME_POINTER
 
 static inline
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 0653788..3e0cf5e 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -74,6 +74,64 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
+#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
+static int __save_stack_trace_reliable(struct stack_trace *trace,
+				       struct task_struct *task)
+{
+	struct unwind_state state;
+	struct pt_regs *regs;
+	unsigned long addr;
+
+	for (unwind_start(&state, task, NULL, NULL); !unwind_done(&state);
+	     unwind_next_frame(&state)) {
+
+		regs = unwind_get_entry_regs(&state);
+		if (regs) {
+			/*
+			 * Preemption and page faults on the stack can make
+			 * frame pointers unreliable.
+			 */
+			if (!user_mode(regs))
+				return -1;
+
+			/*
+			 * This frame contains the (user mode) pt_regs at the
+			 * end of the stack.  Finish the unwind.
+			 */
+			unwind_next_frame(&state);
+			break;
+		}
+
+		addr = unwind_get_return_address(&state);
+		if (!addr || save_stack_address(trace, addr, false))
+			return -1;
+	}
+
+	if (!unwind_done(&state) || unwind_error(&state))
+		return -1;
+
+	if (trace->nr_entries < trace->max_entries)
+		trace->entries[trace->nr_entries++] = ULONG_MAX;
+
+	return 0;
+}
+
+int save_stack_trace_tsk_reliable(struct task_struct *tsk,
+				  struct stack_trace *trace)
+{
+	int ret;
+
+	if (!try_get_task_stack(tsk))
+		return -EINVAL;
+
+	ret = __save_stack_trace_reliable(trace, tsk);
+
+	put_task_stack(tsk);
+
+	return ret;
+}
+#endif /* CONFIG_HAVE_RELIABLE_STACKTRACE */
+
 /* Userspace stacktrace - based on kernel/trace/trace_sysprof.c */
 
 struct