Re: [RFC][PATCH 3/3] x86/entry, ORC: Teach objtool/unwind_orc about stack irq swizzles

2020-05-08 Thread Peter Zijlstra
On Fri, May 08, 2020 at 02:26:32PM +0200, Thomas Gleixner wrote:
> Peter Zijlstra  writes:
> 
> > On Thu, May 07, 2020 at 11:24:49PM +0200, Thomas Gleixner wrote:
> >> But over our IRC conversation I came up with a 3rd variant:
> >> 
> >>   For most of the vectors the indirect call overhead is just noise, so
> >>   we can run them through the ASM switcher, but for the resched IPI
> >>   we can just use a separate direct call stub in ASM.
> >
> > Are we sure the rat-poison crap is noise for all the other system
> > vectors? I suppose it is for most since they'll do indirect calls
> > themselves anyway, right?
> 
> We have different categories:
> 
>  1) Uninteresting
> 
> SPURIOUS_APIC_VECTOR, ERROR_APIC_VECTOR, THERMAL_APIC_VECTOR,
> THRESHOLD_APIC_VECTOR, REBOOT_VECTOR, DEFERRED_ERROR_VECTOR
> 
>  2) Indirect call poisoned
> 
> LOCAL_TIMER_VECTOR
> X86_PLATFORM_IPI_VECTOR
> IRQ_WORK_VECTOR
> HYPERV_STIMER0_VECTOR
> HYPERVISOR_CALLBACK_VECTOR
> POSTED_INTERRUPT_WAKEUP_VECTOR.
> CALL_FUNCTION_VECTOR
> CALL_FUNCTION_SINGLE_VECTOR
> 
>  3) Quick
> 
> RESCHEDULE_VECTOR
> 
> POSTED_INTR_VECTOR
> POSTED_INTR_NESTED_VECTOR
> 
> These two postit ones are weird because they are both empty and
> just increment different irq counts.
> 
> HYPERV_REENLIGHTENMENT_VECTOR
> 
> schedules delayed work, i,e. arms a timer which should be
> straight forward, but does it matter?
> 
>  4) Others
> 
> UV_BAU_MESSAGE - The TLB flushes are probably more expensive than
>  ratpoutine
> 
> Hmm?

As we just agreed on IRC, 3) can run without changing stack, and then
the rest can use the indirect thing.


Re: [RFC][PATCH 3/3] x86/entry, ORC: Teach objtool/unwind_orc about stack irq swizzles

2020-05-08 Thread Thomas Gleixner
Thomas Gleixner  writes:
> Peter Zijlstra  writes:
>> Are we sure the rat-poison crap is noise for all the other system
>> vectors? I suppose it is for most since they'll do indirect calls
>> themselves anyway, right?
>
>  3) Quick
>
> RESCHEDULE_VECTOR
>
> POSTED_INTR_VECTOR
> POSTED_INTR_NESTED_VECTOR
>
> These two postit ones are weird because they are both empty and
> just increment different irq counts.

For those 3 it's also pointless to run them on IST stack at all.

> HYPERV_REENLIGHTENMENT_VECTOR
>
> schedules delayed work, i,e. arms a timer which should be
> straight forward, but does it matter?

This one shouldn't have an issue when running on task stack either, but
we can run it through the regular indirect path for now and switch it
over when it matters performance wise.

Thanks,

tglx


Re: [RFC][PATCH 3/3] x86/entry, ORC: Teach objtool/unwind_orc about stack irq swizzles

2020-05-08 Thread Thomas Gleixner
Peter Zijlstra  writes:

> On Thu, May 07, 2020 at 11:24:49PM +0200, Thomas Gleixner wrote:
>> But over our IRC conversation I came up with a 3rd variant:
>> 
>>   For most of the vectors the indirect call overhead is just noise, so
>>   we can run them through the ASM switcher, but for the resched IPI
>>   we can just use a separate direct call stub in ASM.
>
> Are we sure the rat-poison crap is noise for all the other system
> vectors? I suppose it is for most since they'll do indirect calls
> themselves anyway, right?

We have different categories:

 1) Uninteresting

SPURIOUS_APIC_VECTOR, ERROR_APIC_VECTOR, THERMAL_APIC_VECTOR,
THRESHOLD_APIC_VECTOR, REBOOT_VECTOR, DEFERRED_ERROR_VECTOR

 2) Indirect call poisoned

LOCAL_TIMER_VECTOR
X86_PLATFORM_IPI_VECTOR
IRQ_WORK_VECTOR
HYPERV_STIMER0_VECTOR
HYPERVISOR_CALLBACK_VECTOR
POSTED_INTERRUPT_WAKEUP_VECTOR.
CALL_FUNCTION_VECTOR
CALL_FUNCTION_SINGLE_VECTOR

 3) Quick

RESCHEDULE_VECTOR

POSTED_INTR_VECTOR
POSTED_INTR_NESTED_VECTOR

These two postit ones are weird because they are both empty and
just increment different irq counts.

HYPERV_REENLIGHTENMENT_VECTOR

schedules delayed work, i,e. arms a timer which should be
straight forward, but does it matter?

 4) Others

UV_BAU_MESSAGE - The TLB flushes are probably more expensive than
 ratpoutine

Hmm?

Thanks,

tglx


Re: [RFC][PATCH 3/3] x86/entry, ORC: Teach objtool/unwind_orc about stack irq swizzles

2020-05-08 Thread Peter Zijlstra
On Thu, May 07, 2020 at 11:24:49PM +0200, Thomas Gleixner wrote:
> But over our IRC conversation I came up with a 3rd variant:
> 
>   For most of the vectors the indirect call overhead is just noise, so
>   we can run them through the ASM switcher, but for the resched IPI
>   we can just use a separate direct call stub in ASM.

Are we sure the rat-poison crap is noise for all the other system
vectors? I suppose it is for most since they'll do indirect calls
themselves anyway, right?

> I can live with that. I might have to pay up for Peter's headaches to
> teach objtool, but that's a different story. Let me check how many beers
> he owes me first ...

We're going to be so massively drunk if we ever settle that :-) For now
I'll just have to live with knowing more about the unwinders.


Re: [RFC][PATCH 3/3] x86/entry, ORC: Teach objtool/unwind_orc about stack irq swizzles

2020-05-07 Thread Thomas Gleixner
Josh Poimboeuf  writes:
> On Thu, May 07, 2020 at 07:38:09PM +0200, Peter Zijlstra wrote:
>> On Thu, May 07, 2020 at 06:10:23PM +0200, Peter Zijlstra wrote:
>> Much simpler, also works.
>
> Doing the stack switch in inline asm is just nasty.

Matter of personal preference :)

> Also, a frame pointer makes this SO much easier for ORC/objtool, no
> funky hints needed.  In fact maybe we can get rid of the indirect hint
> things altogether, which means even more deleted code.
>
> This is much cleaner, and it boots:

Well, yes. I thought about that and it clearly works fine. There is a
downside because it adds the indirect call to all system vectors.

The device interrupt part is fine, it has an indirect call already, but
for system vectors and in particular the rescheduling IPI its bad. Not
bad because of the indirect call per se, but bad because of the
ratpoutine mitigation mess.

My other alternative was to emit switcheroo into ASM for each system
vector. Not pretty either

But over our IRC conversation I came up with a 3rd variant:

  For most of the vectors the indirect call overhead is just noise, so
  we can run them through the ASM switcher, but for the resched IPI
  we can just use a separate direct call stub in ASM.

I can live with that. I might have to pay up for Peter's headaches to
teach objtool, but that's a different story. Let me check how many beers
he owes me first ...

Thanks,

tglx






Re: [RFC][PATCH 3/3] x86/entry, ORC: Teach objtool/unwind_orc about stack irq swizzles

2020-05-07 Thread Josh Poimboeuf
On Thu, May 07, 2020 at 07:38:09PM +0200, Peter Zijlstra wrote:
> On Thu, May 07, 2020 at 06:10:23PM +0200, Peter Zijlstra wrote:
> > Thomas would very much like objtool to understand and generate correct
> > ORC unwind information for the minimal stack swizzle sequence:
> > 
> > mov %rsp, (%[ts])
> > mov %[ts], %rsp
> > ...
> > pop %rsp
> > 
> > This sequence works for the fp and guess unwinders -- all they need is
> > that top-of-stack link set up by the first instruction.
> > 
> > The previous entry_64.S code worked with "UNWIND_HINT_REGS indirect=1"
> > hints to inform the unwinder about the stack-swizzle, but because
> > we've now already entered C, we can no longer point to a REGS. In
> > fact, due to being in C we don't even have a reliable sp_offset to
> > anything.
> > 
> > None of the existing UNWIND_HINT() functionality is quite sufficient
> > to generate the right thing, but SP_INDIRECT is still the closest, so
> > extend it.
> > 
> > When SP_INDIRECT is combined with .end=1 (which is otherwise unused,
> > except for sp_reg == UNDEFINED):
> > 
> >  - change it from (sp+sp_offset) to (sp)+sp_offset
> >  - have objtool preserve sp_offset from the previous state
> >  - change "pop %rsp" handling to restore the CFI state from before the
> >hint.
> > 
> > NOTES:
> > 
> >  - We now have an instruction with stackops and a hint; make hint take
> >precedence over stackops.
> > 
> >  - Due to the reverse search in "pop %rsp" we must
> >fill_alternative_cfi() before validate_branch().
> > 
> >  - This all isn't really pretty, but it works and gets Thomas the code
> >sequence he wants.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) 
> > ---
> 
> Much simpler, also works.

Doing the stack switch in inline asm is just nasty.

Also, a frame pointer makes this SO much easier for ORC/objtool, no
funky hints needed.  In fact maybe we can get rid of the indirect hint
things altogether, which means even more deleted code.

This is much cleaner, and it boots:

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 3f9b2555e6fb..4a25f72f969f 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -718,15 +718,6 @@ __visible void __xen_pv_evtchn_do_upcall(void)
irq_exit_rcu();
 }
 
-/*
- * Separate function as objtool is unhappy about having
- * the macro at the call site.
- */
-static noinstr void run_on_irqstack(void)
-{
-   RUN_ON_IRQSTACK(__xen_pv_evtchn_do_upcall);
-}
-
 __visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs)
 {
struct pt_regs *old_regs;
@@ -739,7 +730,7 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct 
pt_regs *regs)
__xen_pv_evtchn_do_upcall();
instr_end();
} else {
-   run_on_irqstack();
+   RUN_ON_IRQSTACK(__xen_pv_evtchn_do_upcall);
}
 
set_irq_regs(old_regs);
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3046dfc69b8c..d036dc831a23 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1295,3 +1295,31 @@ SYM_CODE_START(rewind_stack_do_exit)
calldo_exit
 SYM_CODE_END(rewind_stack_do_exit)
 .popsection
+
+/*
+ * rdi: new stack pointer
+ * rsi: function pointer
+ * rdx: arg1 (can be NULL if none)
+ */
+SYM_FUNC_START(call_on_stack)
+   /*
+* Save the frame pointer unconditionally.  This allows the ORC
+* unwinder to handle the stack switch.
+*/
+   pushq   %rbp
+   mov %rsp, %rbp
+
+/*
+ * The unwinder relies on the word at the end of the new stack page
+ * linking back to the previous RSP.
+*/
+   mov %rsp, -8(%rdi)
+   lea -8(%rdi), %rsp
+   mov %rdx, %rdi
+   CALL_NOSPEC rsi
+
+   /* Restore the previous stack pointer from RBP. */
+   leaveq
+
+   ret
+SYM_FUNC_END(call_on_stack)
diff --git a/arch/x86/include/asm/irq_stack.h b/arch/x86/include/asm/irq_stack.h
index f307d4c27f84..131a6c689b1c 100644
--- a/arch/x86/include/asm/irq_stack.h
+++ b/arch/x86/include/asm/irq_stack.h
@@ -7,42 +7,26 @@
 #include 
 
 #ifdef CONFIG_X86_64
+
+void call_on_stack(void *stack, void *func, void *arg);
+
 static __always_inline bool irqstack_active(void)
 {
return __this_cpu_read(irq_count) != -1;
 }
 
-#define __RUN_ON_IRQSTACK(_asm, ...)   \
+#define __RUN_ON_IRQSTACK(func, arg)   \
 do {   \
-   unsigned long tos;  \
-   \
lockdep_assert_irqs_disabled(); \
-   \
-   tos = ((unsigned long)__this_cpu_read(hardirq_stack_ptr)) - 8;  \
-   

Re: [RFC][PATCH 3/3] x86/entry, ORC: Teach objtool/unwind_orc about stack irq swizzles

2020-05-07 Thread Peter Zijlstra
On Thu, May 07, 2020 at 06:10:23PM +0200, Peter Zijlstra wrote:
> Thomas would very much like objtool to understand and generate correct
> ORC unwind information for the minimal stack swizzle sequence:
> 
>   mov %rsp, (%[ts])
>   mov %[ts], %rsp
>   ...
>   pop %rsp
> 
> This sequence works for the fp and guess unwinders -- all they need is
> that top-of-stack link set up by the first instruction.
> 
> The previous entry_64.S code worked with "UNWIND_HINT_REGS indirect=1"
> hints to inform the unwinder about the stack-swizzle, but because
> we've now already entered C, we can no longer point to a REGS. In
> fact, due to being in C we don't even have a reliable sp_offset to
> anything.
> 
> None of the existing UNWIND_HINT() functionality is quite sufficient
> to generate the right thing, but SP_INDIRECT is still the closest, so
> extend it.
> 
> When SP_INDIRECT is combined with .end=1 (which is otherwise unused,
> except for sp_reg == UNDEFINED):
> 
>  - change it from (sp+sp_offset) to (sp)+sp_offset
>  - have objtool preserve sp_offset from the previous state
>  - change "pop %rsp" handling to restore the CFI state from before the
>hint.
> 
> NOTES:
> 
>  - We now have an instruction with stackops and a hint; make hint take
>precedence over stackops.
> 
>  - Due to the reverse search in "pop %rsp" we must
>fill_alternative_cfi() before validate_branch().
> 
>  - This all isn't really pretty, but it works and gets Thomas the code
>sequence he wants.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---

Much simpler, also works.

---
--- a/arch/x86/include/asm/irq_stack.h
+++ b/arch/x86/include/asm/irq_stack.h
@@ -23,6 +23,7 @@ do {  
\
__this_cpu_add(irq_count, 1);   \
asm volatile(   \
"movq   %%rsp, (%[ts])  \n" \
+   UNWIND_HINT_INDIRECT\
"movq   %[ts], %%rsp\n" \
ASM_INSTR_BEGIN \
_asm "  \n" \
--- a/arch/x86/include/asm/unwind_hints.h
+++ b/arch/x86/include/asm/unwind_hints.h
@@ -95,6 +95,47 @@
UNWIND_HINT type=UNWIND_HINT_TYPE_RET_OFFSET sp_offset=\sp_offset
 .endm
 
+#else
+
+#define UNWIND_HINT(sp_reg, sp_offset, type, end)  \
+   "987: \n\t" \
+   ".pushsection .discard.unwind_hints\n\t"\
+   /* struct unwind_hint */\
+   ".long 987b - .\n\t"\
+   ".short " __stringify(sp_offset) "\n\t" \
+   ".byte " __stringify(sp_reg) "\n\t" \
+   ".byte " __stringify(type) "\n\t"   \
+   ".byte " __stringify(end) "\n\t"\
+   ".balign 4 \n\t"\
+   ".popsection\n\t"
+
+/*
+ * Stack swizzling vs objtool/ORC:
+ *
+ * The canonical way of swizzling stack is:
+ *
+ * 1:  mov %%rsp, (%[ts])
+ * 2:  mov %[ts], %%rsp
+ * ...
+ * 3:  pop %%rsp
+ *
+ * Where:
+ *
+ * 1 - places a pointer to the previous stack at the top of the new stack;
+ * also see the unwinders.
+ *
+ * 2 - switches to the new stack, but to avoid hitting the CFA_UNDEFINED case,
+ * we need to tell objtool the stack pointer can be found at (%%rsp),
+ * UNWIND_HINT_INDIRECT does so.
+ *
+ * 3 - restores the previous stack by popping the value stored by 1 into %%rsp,
+ * 
+ *
+ * See arch/x86/include/asm/irq_stack.h
+ */
+#define UNWIND_HINT_INDIRECT \
+   UNWIND_HINT(ORC_REG_SP_INDIRECT, 0, ORC_TYPE_CALL, 0)
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_UNWIND_HINTS_H */
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -435,12 +435,12 @@ bool unwind_next_frame(struct unwind_sta
break;
 
case ORC_REG_SP_INDIRECT:
-   sp = state->sp + orc->sp_offset;
+   sp = state->sp;
indirect = true;
break;
 
case ORC_REG_BP_INDIRECT:
-   sp = state->bp + orc->sp_offset;
+   sp = state->bp;
indirect = true;
break;
 
@@ -489,6 +489,8 @@ bool unwind_next_frame(struct unwind_sta
if (indirect) {
if (!deref_stack_reg(state, sp, ))
goto err;
+
+   sp += orc->sp_offset;
}
 
/* Find IP, SP and possibly regs: */
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1720,8 +1720,7 @@ static void restore_reg(struct cfi_state
  *   41 5d pop%r13
  *   c3retq
  */
-static int update_cfi_state(struct