Re: [RFC][PATCH 3/3] x86/entry, ORC: Teach objtool/unwind_orc about stack irq swizzles
On Fri, May 08, 2020 at 02:26:32PM +0200, Thomas Gleixner wrote: > Peter Zijlstra writes: > > > On Thu, May 07, 2020 at 11:24:49PM +0200, Thomas Gleixner wrote: > >> But over our IRC conversation I came up with a 3rd variant: > >> > >> For most of the vectors the indirect call overhead is just noise, so > >> we can run them through the ASM switcher, but for the resched IPI > >> we can just use a separate direct call stub in ASM. > > > > Are we sure the rat-poison crap is noise for all the other system > > vectors? I suppose it is for most since they'll do indirect calls > > themselves anyway, right? > > We have different categories: > > 1) Uninteresting > > SPURIOUS_APIC_VECTOR, ERROR_APIC_VECTOR, THERMAL_APIC_VECTOR, > THRESHOLD_APIC_VECTOR, REBOOT_VECTOR, DEFERRED_ERROR_VECTOR > > 2) Indirect call poisoned > > LOCAL_TIMER_VECTOR > X86_PLATFORM_IPI_VECTOR > IRQ_WORK_VECTOR > HYPERV_STIMER0_VECTOR > HYPERVISOR_CALLBACK_VECTOR > POSTED_INTERRUPT_WAKEUP_VECTOR. > CALL_FUNCTION_VECTOR > CALL_FUNCTION_SINGLE_VECTOR > > 3) Quick > > RESCHEDULE_VECTOR > > POSTED_INTR_VECTOR > POSTED_INTR_NESTED_VECTOR > > These two postit ones are weird because they are both empty and > just increment different irq counts. > > HYPERV_REENLIGHTENMENT_VECTOR > > schedules delayed work, i,e. arms a timer which should be > straight forward, but does it matter? > > 4) Others > > UV_BAU_MESSAGE - The TLB flushes are probably more expensive than > ratpoutine > > Hmm? As we just agreed on IRC, 3) can run without changing stack, and then the rest can use the indirect thing.
Re: [RFC][PATCH 3/3] x86/entry, ORC: Teach objtool/unwind_orc about stack irq swizzles
Thomas Gleixner writes: > Peter Zijlstra writes: >> Are we sure the rat-poison crap is noise for all the other system >> vectors? I suppose it is for most since they'll do indirect calls >> themselves anyway, right? > > 3) Quick > > RESCHEDULE_VECTOR > > POSTED_INTR_VECTOR > POSTED_INTR_NESTED_VECTOR > > These two postit ones are weird because they are both empty and > just increment different irq counts. For those 3 it's also pointless to run them on IST stack at all. > HYPERV_REENLIGHTENMENT_VECTOR > > schedules delayed work, i,e. arms a timer which should be > straight forward, but does it matter? This one shouldn't have an issue when running on task stack either, but we can run it through the regular indirect path for now and switch it over when it matters performance wise. Thanks, tglx
Re: [RFC][PATCH 3/3] x86/entry, ORC: Teach objtool/unwind_orc about stack irq swizzles
Peter Zijlstra writes: > On Thu, May 07, 2020 at 11:24:49PM +0200, Thomas Gleixner wrote: >> But over our IRC conversation I came up with a 3rd variant: >> >> For most of the vectors the indirect call overhead is just noise, so >> we can run them through the ASM switcher, but for the resched IPI >> we can just use a separate direct call stub in ASM. > > Are we sure the rat-poison crap is noise for all the other system > vectors? I suppose it is for most since they'll do indirect calls > themselves anyway, right? We have different categories: 1) Uninteresting SPURIOUS_APIC_VECTOR, ERROR_APIC_VECTOR, THERMAL_APIC_VECTOR, THRESHOLD_APIC_VECTOR, REBOOT_VECTOR, DEFERRED_ERROR_VECTOR 2) Indirect call poisoned LOCAL_TIMER_VECTOR X86_PLATFORM_IPI_VECTOR IRQ_WORK_VECTOR HYPERV_STIMER0_VECTOR HYPERVISOR_CALLBACK_VECTOR POSTED_INTERRUPT_WAKEUP_VECTOR. CALL_FUNCTION_VECTOR CALL_FUNCTION_SINGLE_VECTOR 3) Quick RESCHEDULE_VECTOR POSTED_INTR_VECTOR POSTED_INTR_NESTED_VECTOR These two postit ones are weird because they are both empty and just increment different irq counts. HYPERV_REENLIGHTENMENT_VECTOR schedules delayed work, i,e. arms a timer which should be straight forward, but does it matter? 4) Others UV_BAU_MESSAGE - The TLB flushes are probably more expensive than ratpoutine Hmm? Thanks, tglx
Re: [RFC][PATCH 3/3] x86/entry, ORC: Teach objtool/unwind_orc about stack irq swizzles
On Thu, May 07, 2020 at 11:24:49PM +0200, Thomas Gleixner wrote: > But over our IRC conversation I came up with a 3rd variant: > > For most of the vectors the indirect call overhead is just noise, so > we can run them through the ASM switcher, but for the resched IPI > we can just use a separate direct call stub in ASM. Are we sure the rat-poison crap is noise for all the other system vectors? I suppose it is for most since they'll do indirect calls themselves anyway, right? > I can live with that. I might have to pay up for Peter's headaches to > teach objtool, but that's a different story. Let me check how many beers > he owes me first ... We're going to be so massively drunk if we ever settle that :-) For now I'll just have to live with knowing more about the unwinders.
Re: [RFC][PATCH 3/3] x86/entry, ORC: Teach objtool/unwind_orc about stack irq swizzles
Josh Poimboeuf writes: > On Thu, May 07, 2020 at 07:38:09PM +0200, Peter Zijlstra wrote: >> On Thu, May 07, 2020 at 06:10:23PM +0200, Peter Zijlstra wrote: >> Much simpler, also works. > > Doing the stack switch in inline asm is just nasty. Matter of personal preference :) > Also, a frame pointer makes this SO much easier for ORC/objtool, no > funky hints needed. In fact maybe we can get rid of the indirect hint > things altogether, which means even more deleted code. > > This is much cleaner, and it boots: Well, yes. I thought about that and it clearly works fine. There is a downside because it adds the indirect call to all system vectors. The device interrupt part is fine, it has an indirect call already, but for system vectors and in particular the rescheduling IPI its bad. Not bad because of the indirect call per se, but bad because of the ratpoutine mitigation mess. My other alternative was to emit switcheroo into ASM for each system vector. Not pretty either But over our IRC conversation I came up with a 3rd variant: For most of the vectors the indirect call overhead is just noise, so we can run them through the ASM switcher, but for the resched IPI we can just use a separate direct call stub in ASM. I can live with that. I might have to pay up for Peter's headaches to teach objtool, but that's a different story. Let me check how many beers he owes me first ... Thanks, tglx
Re: [RFC][PATCH 3/3] x86/entry, ORC: Teach objtool/unwind_orc about stack irq swizzles
On Thu, May 07, 2020 at 07:38:09PM +0200, Peter Zijlstra wrote: > On Thu, May 07, 2020 at 06:10:23PM +0200, Peter Zijlstra wrote: > > Thomas would very much like objtool to understand and generate correct > > ORC unwind information for the minimal stack swizzle sequence: > > > > mov %rsp, (%[ts]) > > mov %[ts], %rsp > > ... > > pop %rsp > > > > This sequence works for the fp and guess unwinders -- all they need is > > that top-of-stack link set up by the first instruction. > > > > The previous entry_64.S code worked with "UNWIND_HINT_REGS indirect=1" > > hints to inform the unwinder about the stack-swizzle, but because > > we've now already entered C, we can no longer point to a REGS. In > > fact, due to being in C we don't even have a reliable sp_offset to > > anything. > > > > None of the existing UNWIND_HINT() functionality is quite sufficient > > to generate the right thing, but SP_INDIRECT is still the closest, so > > extend it. > > > > When SP_INDIRECT is combined with .end=1 (which is otherwise unused, > > except for sp_reg == UNDEFINED): > > > > - change it from (sp+sp_offset) to (sp)+sp_offset > > - have objtool preserve sp_offset from the previous state > > - change "pop %rsp" handling to restore the CFI state from before the > >hint. > > > > NOTES: > > > > - We now have an instruction with stackops and a hint; make hint take > >precedence over stackops. > > > > - Due to the reverse search in "pop %rsp" we must > >fill_alternative_cfi() before validate_branch(). > > > > - This all isn't really pretty, but it works and gets Thomas the code > >sequence he wants. > > > > Signed-off-by: Peter Zijlstra (Intel) > > --- > > Much simpler, also works. Doing the stack switch in inline asm is just nasty. Also, a frame pointer makes this SO much easier for ORC/objtool, no funky hints needed. In fact maybe we can get rid of the indirect hint things altogether, which means even more deleted code. This is much cleaner, and it boots: diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 3f9b2555e6fb..4a25f72f969f 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -718,15 +718,6 @@ __visible void __xen_pv_evtchn_do_upcall(void) irq_exit_rcu(); } -/* - * Separate function as objtool is unhappy about having - * the macro at the call site. - */ -static noinstr void run_on_irqstack(void) -{ - RUN_ON_IRQSTACK(__xen_pv_evtchn_do_upcall); -} - __visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs) { struct pt_regs *old_regs; @@ -739,7 +730,7 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs) __xen_pv_evtchn_do_upcall(); instr_end(); } else { - run_on_irqstack(); + RUN_ON_IRQSTACK(__xen_pv_evtchn_do_upcall); } set_irq_regs(old_regs); diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 3046dfc69b8c..d036dc831a23 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1295,3 +1295,31 @@ SYM_CODE_START(rewind_stack_do_exit) calldo_exit SYM_CODE_END(rewind_stack_do_exit) .popsection + +/* + * rdi: new stack pointer + * rsi: function pointer + * rdx: arg1 (can be NULL if none) + */ +SYM_FUNC_START(call_on_stack) + /* +* Save the frame pointer unconditionally. This allows the ORC +* unwinder to handle the stack switch. +*/ + pushq %rbp + mov %rsp, %rbp + +/* + * The unwinder relies on the word at the end of the new stack page + * linking back to the previous RSP. +*/ + mov %rsp, -8(%rdi) + lea -8(%rdi), %rsp + mov %rdx, %rdi + CALL_NOSPEC rsi + + /* Restore the previous stack pointer from RBP. */ + leaveq + + ret +SYM_FUNC_END(call_on_stack) diff --git a/arch/x86/include/asm/irq_stack.h b/arch/x86/include/asm/irq_stack.h index f307d4c27f84..131a6c689b1c 100644 --- a/arch/x86/include/asm/irq_stack.h +++ b/arch/x86/include/asm/irq_stack.h @@ -7,42 +7,26 @@ #include #ifdef CONFIG_X86_64 + +void call_on_stack(void *stack, void *func, void *arg); + static __always_inline bool irqstack_active(void) { return __this_cpu_read(irq_count) != -1; } -#define __RUN_ON_IRQSTACK(_asm, ...) \ +#define __RUN_ON_IRQSTACK(func, arg) \ do { \ - unsigned long tos; \ - \ lockdep_assert_irqs_disabled(); \ - \ - tos = ((unsigned long)__this_cpu_read(hardirq_stack_ptr)) - 8; \ -
Re: [RFC][PATCH 3/3] x86/entry, ORC: Teach objtool/unwind_orc about stack irq swizzles
On Thu, May 07, 2020 at 06:10:23PM +0200, Peter Zijlstra wrote: > Thomas would very much like objtool to understand and generate correct > ORC unwind information for the minimal stack swizzle sequence: > > mov %rsp, (%[ts]) > mov %[ts], %rsp > ... > pop %rsp > > This sequence works for the fp and guess unwinders -- all they need is > that top-of-stack link set up by the first instruction. > > The previous entry_64.S code worked with "UNWIND_HINT_REGS indirect=1" > hints to inform the unwinder about the stack-swizzle, but because > we've now already entered C, we can no longer point to a REGS. In > fact, due to being in C we don't even have a reliable sp_offset to > anything. > > None of the existing UNWIND_HINT() functionality is quite sufficient > to generate the right thing, but SP_INDIRECT is still the closest, so > extend it. > > When SP_INDIRECT is combined with .end=1 (which is otherwise unused, > except for sp_reg == UNDEFINED): > > - change it from (sp+sp_offset) to (sp)+sp_offset > - have objtool preserve sp_offset from the previous state > - change "pop %rsp" handling to restore the CFI state from before the >hint. > > NOTES: > > - We now have an instruction with stackops and a hint; make hint take >precedence over stackops. > > - Due to the reverse search in "pop %rsp" we must >fill_alternative_cfi() before validate_branch(). > > - This all isn't really pretty, but it works and gets Thomas the code >sequence he wants. > > Signed-off-by: Peter Zijlstra (Intel) > --- Much simpler, also works. --- --- a/arch/x86/include/asm/irq_stack.h +++ b/arch/x86/include/asm/irq_stack.h @@ -23,6 +23,7 @@ do { \ __this_cpu_add(irq_count, 1); \ asm volatile( \ "movq %%rsp, (%[ts]) \n" \ + UNWIND_HINT_INDIRECT\ "movq %[ts], %%rsp\n" \ ASM_INSTR_BEGIN \ _asm " \n" \ --- a/arch/x86/include/asm/unwind_hints.h +++ b/arch/x86/include/asm/unwind_hints.h @@ -95,6 +95,47 @@ UNWIND_HINT type=UNWIND_HINT_TYPE_RET_OFFSET sp_offset=\sp_offset .endm +#else + +#define UNWIND_HINT(sp_reg, sp_offset, type, end) \ + "987: \n\t" \ + ".pushsection .discard.unwind_hints\n\t"\ + /* struct unwind_hint */\ + ".long 987b - .\n\t"\ + ".short " __stringify(sp_offset) "\n\t" \ + ".byte " __stringify(sp_reg) "\n\t" \ + ".byte " __stringify(type) "\n\t" \ + ".byte " __stringify(end) "\n\t"\ + ".balign 4 \n\t"\ + ".popsection\n\t" + +/* + * Stack swizzling vs objtool/ORC: + * + * The canonical way of swizzling stack is: + * + * 1: mov %%rsp, (%[ts]) + * 2: mov %[ts], %%rsp + * ... + * 3: pop %%rsp + * + * Where: + * + * 1 - places a pointer to the previous stack at the top of the new stack; + * also see the unwinders. + * + * 2 - switches to the new stack, but to avoid hitting the CFA_UNDEFINED case, + * we need to tell objtool the stack pointer can be found at (%%rsp), + * UNWIND_HINT_INDIRECT does so. + * + * 3 - restores the previous stack by popping the value stored by 1 into %%rsp, + * + * + * See arch/x86/include/asm/irq_stack.h + */ +#define UNWIND_HINT_INDIRECT \ + UNWIND_HINT(ORC_REG_SP_INDIRECT, 0, ORC_TYPE_CALL, 0) + #endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_UNWIND_HINTS_H */ --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -435,12 +435,12 @@ bool unwind_next_frame(struct unwind_sta break; case ORC_REG_SP_INDIRECT: - sp = state->sp + orc->sp_offset; + sp = state->sp; indirect = true; break; case ORC_REG_BP_INDIRECT: - sp = state->bp + orc->sp_offset; + sp = state->bp; indirect = true; break; @@ -489,6 +489,8 @@ bool unwind_next_frame(struct unwind_sta if (indirect) { if (!deref_stack_reg(state, sp, )) goto err; + + sp += orc->sp_offset; } /* Find IP, SP and possibly regs: */ --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -1720,8 +1720,7 @@ static void restore_reg(struct cfi_state * 41 5d pop%r13 * c3retq */ -static int update_cfi_state(struct