Re: [patch V6 04/37] x86: Make hardware latency tracing explicit
Andy Lutomirski writes: >> On May 20, 2020, at 1:10 PM, Thomas Gleixner wrote: >> Peter Zijlstra writes: >>> We probably have to anyway. But I can do that later I suppose. >> >> Second thoughts. For #DB and #INT3 we can just keep nmi_enter(), needs >> just annotation in nmi_enter() around that trace muck. >> >> For #NMI and #MCE I rather avoid the early trace call and do it once we >> have reached "stable" state, i.e. avoid it in the whole nested NMI mess. > > What’s the issue? The actual meat is mostly in the asm for NMI, and > for MCE it’s just the sync-all-the-cores thing. The actual > simultaneous NMI-and-MCE case is utterly busted regardless, and I’ve > been thinking about how to fix it. It won’t be pretty, but nmi_enter() > will have nothing to do with it. The issue is that I want to avoid anything which is not essential just for pure paranoia reasons. I can drop that and just move the trace muck after RCU is safe and annotate it properly.
Re: [patch V6 04/37] x86: Make hardware latency tracing explicit
> On May 20, 2020, at 1:10 PM, Thomas Gleixner wrote: > > Peter Zijlstra writes: >>> On Mon, May 18, 2020 at 10:05:56AM +0200, Thomas Gleixner wrote: >>> Peter Zijlstra writes: On Sat, May 16, 2020 at 01:45:51AM +0200, Thomas Gleixner wrote: > --- a/arch/x86/kernel/nmi.c > +++ b/arch/x86/kernel/nmi.c > @@ -334,6 +334,7 @@ static noinstr void default_do_nmi(struc >__this_cpu_write(last_nmi_rip, regs->ip); > >instrumentation_begin(); > +ftrace_nmi_handler_enter(); > >handled = nmi_handle(NMI_LOCAL, regs); >__this_cpu_add(nmi_stats.normal, handled); > @@ -420,6 +421,7 @@ static noinstr void default_do_nmi(struc >unknown_nmi_error(reason, regs); > > out: > +ftrace_nmi_handler_exit(); >instrumentation_end(); > } Yeah, so I'm confused about this and the previous patch too. Why not do just this? Remove that ftrace_nmi_handler.* crud from nmi_{enter,exit}() and stick it here? Why do we needs the nmi_{enter,exit}_notrace() thing? >>> >>> Because you then have to fixup _all_ architectures which use >>> nmi_enter/exit(). >> >> We probably have to anyway. But I can do that later I suppose. > > Second thoughts. For #DB and #INT3 we can just keep nmi_enter(), needs > just annotation in nmi_enter() around that trace muck. > > For #NMI and #MCE I rather avoid the early trace call and do it once we > have reached "stable" state, i.e. avoid it in the whole nested NMI mess. > > What’s the issue? The actual meat is mostly in the asm for NMI, and for MCE it’s just the sync-all-the-cores thing. The actual simultaneous NMI-and-MCE case is utterly busted regardless, and I’ve been thinking about how to fix it. It won’t be pretty, but nmi_enter() will have nothing to do with it.
Re: [patch V6 04/37] x86: Make hardware latency tracing explicit
Peter Zijlstra writes: > On Mon, May 18, 2020 at 10:05:56AM +0200, Thomas Gleixner wrote: >> Peter Zijlstra writes: >> > On Sat, May 16, 2020 at 01:45:51AM +0200, Thomas Gleixner wrote: >> >> --- a/arch/x86/kernel/nmi.c >> >> +++ b/arch/x86/kernel/nmi.c >> >> @@ -334,6 +334,7 @@ static noinstr void default_do_nmi(struc >> >> __this_cpu_write(last_nmi_rip, regs->ip); >> >> >> >> instrumentation_begin(); >> >> + ftrace_nmi_handler_enter(); >> >> >> >> handled = nmi_handle(NMI_LOCAL, regs); >> >> __this_cpu_add(nmi_stats.normal, handled); >> >> @@ -420,6 +421,7 @@ static noinstr void default_do_nmi(struc >> >> unknown_nmi_error(reason, regs); >> >> >> >> out: >> >> + ftrace_nmi_handler_exit(); >> >> instrumentation_end(); >> >> } >> > >> > Yeah, so I'm confused about this and the previous patch too. Why not >> > do just this? Remove that ftrace_nmi_handler.* crud from >> > nmi_{enter,exit}() and stick it here? Why do we needs the >> > nmi_{enter,exit}_notrace() thing? >> >> Because you then have to fixup _all_ architectures which use >> nmi_enter/exit(). > > We probably have to anyway. But I can do that later I suppose. Second thoughts. For #DB and #INT3 we can just keep nmi_enter(), needs just annotation in nmi_enter() around that trace muck. For #NMI and #MCE I rather avoid the early trace call and do it once we have reached "stable" state, i.e. avoid it in the whole nested NMI mess. Thanks, tglx
Re: [patch V6 04/37] x86: Make hardware latency tracing explicit
On Mon, May 18, 2020 at 1:03 AM Thomas Gleixner wrote: > > Andy Lutomirski writes: > > On Sun, May 17, 2020 at 1:48 AM Thomas Gleixner wrote: > >> Remember this is about ensuring that all the state is properly > >> established before any of this instrumentation muck can happen. > >> > >> DR7 handling is specific to #DB and done even before nmi_enter to > >> prevent recursion. > > > > So why is this change needed? > > We really want nmi_enter() to be the carefully crafted mechanism which > establishes correct state in whatever strange context the exception > hits. Not more, not less. > > Random instrumentation has absolutely no business there and I went a > long way to make sure that this is enforcible by objtool. > > Aside of that the tracing which is contained in nmi_enter() is about > taking timestamps for hardware latency detection. If someone runs > hardware latency detection with active break/watchpoints then I really > can't help it. > Okay. I'll stop looking for the bug you're fixing, then.
Re: [patch V6 04/37] x86: Make hardware latency tracing explicit
On Mon, May 18, 2020 at 10:05:56AM +0200, Thomas Gleixner wrote: > Peter Zijlstra writes: > > On Sat, May 16, 2020 at 01:45:51AM +0200, Thomas Gleixner wrote: > >> --- a/arch/x86/kernel/nmi.c > >> +++ b/arch/x86/kernel/nmi.c > >> @@ -334,6 +334,7 @@ static noinstr void default_do_nmi(struc > >>__this_cpu_write(last_nmi_rip, regs->ip); > >> > >>instrumentation_begin(); > >> + ftrace_nmi_handler_enter(); > >> > >>handled = nmi_handle(NMI_LOCAL, regs); > >>__this_cpu_add(nmi_stats.normal, handled); > >> @@ -420,6 +421,7 @@ static noinstr void default_do_nmi(struc > >>unknown_nmi_error(reason, regs); > >> > >> out: > >> + ftrace_nmi_handler_exit(); > >>instrumentation_end(); > >> } > > > > Yeah, so I'm confused about this and the previous patch too. Why not > > do just this? Remove that ftrace_nmi_handler.* crud from > > nmi_{enter,exit}() and stick it here? Why do we needs the > > nmi_{enter,exit}_notrace() thing? > > Because you then have to fixup _all_ architectures which use > nmi_enter/exit(). We probably have to anyway. But I can do that later I suppose.
Re: [patch V6 04/37] x86: Make hardware latency tracing explicit
Peter Zijlstra writes: > On Sat, May 16, 2020 at 01:45:51AM +0200, Thomas Gleixner wrote: >> --- a/arch/x86/kernel/nmi.c >> +++ b/arch/x86/kernel/nmi.c >> @@ -334,6 +334,7 @@ static noinstr void default_do_nmi(struc >> __this_cpu_write(last_nmi_rip, regs->ip); >> >> instrumentation_begin(); >> +ftrace_nmi_handler_enter(); >> >> handled = nmi_handle(NMI_LOCAL, regs); >> __this_cpu_add(nmi_stats.normal, handled); >> @@ -420,6 +421,7 @@ static noinstr void default_do_nmi(struc >> unknown_nmi_error(reason, regs); >> >> out: >> +ftrace_nmi_handler_exit(); >> instrumentation_end(); >> } > > Yeah, so I'm confused about this and the previous patch too. Why not > do just this? Remove that ftrace_nmi_handler.* crud from > nmi_{enter,exit}() and stick it here? Why do we needs the > nmi_{enter,exit}_notrace() thing? Because you then have to fixup _all_ architectures which use nmi_enter/exit(). Thanks, tglx
Re: [patch V6 04/37] x86: Make hardware latency tracing explicit
Andy Lutomirski writes: > On Sun, May 17, 2020 at 1:48 AM Thomas Gleixner wrote: >> Remember this is about ensuring that all the state is properly >> established before any of this instrumentation muck can happen. >> >> DR7 handling is specific to #DB and done even before nmi_enter to >> prevent recursion. > > So why is this change needed? We really want nmi_enter() to be the carefully crafted mechanism which establishes correct state in whatever strange context the exception hits. Not more, not less. Random instrumentation has absolutely no business there and I went a long way to make sure that this is enforcible by objtool. Aside of that the tracing which is contained in nmi_enter() is about taking timestamps for hardware latency detection. If someone runs hardware latency detection with active break/watchpoints then I really can't help it. Thanks, tglx
Re: [patch V6 04/37] x86: Make hardware latency tracing explicit
On Sat, May 16, 2020 at 01:45:51AM +0200, Thomas Gleixner wrote: > --- a/arch/x86/kernel/nmi.c > +++ b/arch/x86/kernel/nmi.c > @@ -334,6 +334,7 @@ static noinstr void default_do_nmi(struc > __this_cpu_write(last_nmi_rip, regs->ip); > > instrumentation_begin(); > + ftrace_nmi_handler_enter(); > > handled = nmi_handle(NMI_LOCAL, regs); > __this_cpu_add(nmi_stats.normal, handled); > @@ -420,6 +421,7 @@ static noinstr void default_do_nmi(struc > unknown_nmi_error(reason, regs); > > out: > + ftrace_nmi_handler_exit(); > instrumentation_end(); > } Yeah, so I'm confused about this and the previous patch too. Why not do just this? Remove that ftrace_nmi_handler.* crud from nmi_{enter,exit}() and stick it here? Why do we needs the nmi_{enter,exit}_notrace() thing?
Re: [patch V6 04/37] x86: Make hardware latency tracing explicit
On Sun, May 17, 2020 at 1:48 AM Thomas Gleixner wrote: > > Andy Lutomirski writes: > > On Fri, May 15, 2020 at 5:10 PM Thomas Gleixner wrote: > >> > >> > >> The hardware latency tracer calls into trace_sched_clock and ends up in > >> various instrumentable functions which is problemeatic vs. the kprobe > >> handling especially the text poke machinery. It's invoked from > >> nmi_enter/exit(), i.e. non-instrumentable code. > >> > >> Use nmi_enter/exit_notrace() instead. These variants do not invoke the > >> hardware latency tracer which avoids chasing down complex callchains to > >> make them non-instrumentable. > >> > >> The real interesting measurement is the actual NMI handler. Add an explicit > >> invocation for the hardware latency tracer to it. > >> > >> #DB and #BP are uninteresting as they really should not be in use when > >> analzying hardware induced latencies. > >> > > > >> @@ -849,7 +851,7 @@ static void noinstr handle_debug(struct > >> static __always_inline void exc_debug_kernel(struct pt_regs *regs, > >> unsigned long dr6) > >> { > >> - nmi_enter(); > >> + nmi_enter_notrace(); > > > > Why can't exc_debug_kernel() handle instrumentation? We shouldn't > > recurse into #DB since we've already cleared DR7, right? > > It can later on. The point is that the trace stuff calls into the world > and some more before the entry handling is complete. > > Remember this is about ensuring that all the state is properly > established before any of this instrumentation muck can happen. > > DR7 handling is specific to #DB and done even before nmi_enter to > prevent recursion. So why is this change needed?
Re: [patch V6 04/37] x86: Make hardware latency tracing explicit
Andy Lutomirski writes: > On Fri, May 15, 2020 at 5:10 PM Thomas Gleixner wrote: >> >> >> The hardware latency tracer calls into trace_sched_clock and ends up in >> various instrumentable functions which is problemeatic vs. the kprobe >> handling especially the text poke machinery. It's invoked from >> nmi_enter/exit(), i.e. non-instrumentable code. >> >> Use nmi_enter/exit_notrace() instead. These variants do not invoke the >> hardware latency tracer which avoids chasing down complex callchains to >> make them non-instrumentable. >> >> The real interesting measurement is the actual NMI handler. Add an explicit >> invocation for the hardware latency tracer to it. >> >> #DB and #BP are uninteresting as they really should not be in use when >> analzying hardware induced latencies. >> > >> @@ -849,7 +851,7 @@ static void noinstr handle_debug(struct >> static __always_inline void exc_debug_kernel(struct pt_regs *regs, >> unsigned long dr6) >> { >> - nmi_enter(); >> + nmi_enter_notrace(); > > Why can't exc_debug_kernel() handle instrumentation? We shouldn't > recurse into #DB since we've already cleared DR7, right? It can later on. The point is that the trace stuff calls into the world and some more before the entry handling is complete. Remember this is about ensuring that all the state is properly established before any of this instrumentation muck can happen. DR7 handling is specific to #DB and done even before nmi_enter to prevent recursion. Thanks, tglx
Re: [patch V6 04/37] x86: Make hardware latency tracing explicit
On Fri, May 15, 2020 at 5:10 PM Thomas Gleixner wrote: > > > The hardware latency tracer calls into trace_sched_clock and ends up in > various instrumentable functions which is problemeatic vs. the kprobe > handling especially the text poke machinery. It's invoked from > nmi_enter/exit(), i.e. non-instrumentable code. > > Use nmi_enter/exit_notrace() instead. These variants do not invoke the > hardware latency tracer which avoids chasing down complex callchains to > make them non-instrumentable. > > The real interesting measurement is the actual NMI handler. Add an explicit > invocation for the hardware latency tracer to it. > > #DB and #BP are uninteresting as they really should not be in use when > analzying hardware induced latencies. > > @@ -849,7 +851,7 @@ static void noinstr handle_debug(struct > static __always_inline void exc_debug_kernel(struct pt_regs *regs, > unsigned long dr6) > { > - nmi_enter(); > + nmi_enter_notrace(); Why can't exc_debug_kernel() handle instrumentation? We shouldn't recurse into #DB since we've already cleared DR7, right?
[patch V6 04/37] x86: Make hardware latency tracing explicit
The hardware latency tracer calls into trace_sched_clock and ends up in various instrumentable functions which is problemeatic vs. the kprobe handling especially the text poke machinery. It's invoked from nmi_enter/exit(), i.e. non-instrumentable code. Use nmi_enter/exit_notrace() instead. These variants do not invoke the hardware latency tracer which avoids chasing down complex callchains to make them non-instrumentable. The real interesting measurement is the actual NMI handler. Add an explicit invocation for the hardware latency tracer to it. #DB and #BP are uninteresting as they really should not be in use when analzying hardware induced latencies. If #DF hits, hardware latency is definitely not interesting anymore and in case of a machine check the hardware latency is not the most troublesome issue either. Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/mce/core.c |4 ++-- arch/x86/kernel/nmi.c |6 -- arch/x86/kernel/traps.c| 12 +++- 3 files changed, 13 insertions(+), 9 deletions(-) --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1916,7 +1916,7 @@ static __always_inline void exc_machine_ mce_check_crashing_cpu()) return; - nmi_enter(); + nmi_enter_notrace(); /* * The call targets are marked noinstr, but objtool can't figure * that out because it's an indirect call. Annotate it. @@ -1924,7 +1924,7 @@ static __always_inline void exc_machine_ instrumentation_begin(); machine_check_vector(regs); instrumentation_end(); - nmi_exit(); + nmi_exit_notrace(); } static __always_inline void exc_machine_check_user(struct pt_regs *regs) --- a/arch/x86/kernel/nmi.c +++ b/arch/x86/kernel/nmi.c @@ -334,6 +334,7 @@ static noinstr void default_do_nmi(struc __this_cpu_write(last_nmi_rip, regs->ip); instrumentation_begin(); + ftrace_nmi_handler_enter(); handled = nmi_handle(NMI_LOCAL, regs); __this_cpu_add(nmi_stats.normal, handled); @@ -420,6 +421,7 @@ static noinstr void default_do_nmi(struc unknown_nmi_error(reason, regs); out: + ftrace_nmi_handler_exit(); instrumentation_end(); } @@ -536,14 +538,14 @@ DEFINE_IDTENTRY_NMI(exc_nmi) } #endif - nmi_enter(); + nmi_enter_notrace(); inc_irq_stat(__nmi_count); if (!ignore_nmis) default_do_nmi(regs); - nmi_exit(); + nmi_exit_notrace(); #ifdef CONFIG_X86_64 if (unlikely(this_cpu_read(update_debug_stack))) { --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -387,7 +387,7 @@ DEFINE_IDTENTRY_DF(exc_double_fault) } #endif - nmi_enter(); + nmi_enter_notrace(); instrumentation_begin(); notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV); @@ -632,12 +632,14 @@ DEFINE_IDTENTRY_RAW(exc_int3) instrumentation_end(); idtentry_exit(regs); } else { - nmi_enter(); + nmi_enter_notrace(); instrumentation_begin(); + ftrace_nmi_handler_enter(); if (!do_int3(regs)) die("int3", regs, 0); + ftrace_nmi_handler_exit(); instrumentation_end(); - nmi_exit(); + nmi_exit_notrace(); } } @@ -849,7 +851,7 @@ static void noinstr handle_debug(struct static __always_inline void exc_debug_kernel(struct pt_regs *regs, unsigned long dr6) { - nmi_enter(); + nmi_enter_notrace(); /* * The SDM says "The processor clears the BTF flag when it * generates a debug exception." Clear TIF_BLOCKSTEP to keep @@ -871,7 +873,7 @@ static __always_inline void exc_debug_ke if (dr6) handle_debug(regs, dr6, false); - nmi_exit(); + nmi_exit_notrace(); } static __always_inline void exc_debug_user(struct pt_regs *regs,