Re: [PATCH] KVM: PPC: Book3S PR: Enable MSR_DR for switch_mmu_context()
Hi Alex,

> On 9 May 2022, at 21:23, Alexander Graf wrote:
>
> Commit 863771a28e27 ("powerpc/32s: Convert switch_mmu_context() to C")
> moved the switch_mmu_context() to C. While in principle a good idea, it
> meant that the function now uses the stack. The stack is not accessible
> from real mode though.
>
> So to keep calling the function, let's turn on MSR_DR while we call it.
> That way, all pointer references to the stack are handled virtually.
>
> Reported-by: Matt Evans
> Fixes: 863771a28e27 ("powerpc/32s: Convert switch_mmu_context() to C")
> Signed-off-by: Alexander Graf
> Cc: sta...@vger.kernel.org

Many thanks - this addresses the issue I saw, and has been...

Tested-by: Matt Evans

...on a G4 host.

One comment though:

> ---
>  arch/powerpc/kvm/book3s_32_sr.S | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_32_sr.S b/arch/powerpc/kvm/book3s_32_sr.S
> index e3ab9df6cf19..bd4f798f7a46 100644
> --- a/arch/powerpc/kvm/book3s_32_sr.S
> +++ b/arch/powerpc/kvm/book3s_32_sr.S
> @@ -122,11 +122,21 @@
>
>  	/* 0x0 - 0xb */
>
> -	/* 'current->mm' needs to be in r4 */
> -	tophys(r4, r2)
> -	lwz	r4, MM(r4)
> -	tophys(r4, r4)
> -	/* This only clobbers r0, r3, r4 and r5 */
> +	/* switch_mmu_context() needs paging, let's enable it */
> +	mfmsr	r9
> +	ori	r11, r9, MSR_DR
> +	mtmsr	r11
> +	sync
> +
> +	/* Calling switch_mmu_context(, current->mm, ); */
> +	lwz	r4, MM(r2)
>  	bl	switch_mmu_context

Of the volatile registers, I believe r12 is still valuable here and would
need to be preserved. (I can’t spot any others but would defer to your
judgement here.)
For example:

diff --git a/arch/powerpc/kvm/book3s_32_sr.S b/arch/powerpc/kvm/book3s_32_sr.S
index e3ab9df6cf19..41fc9ca12d38 100644
--- a/arch/powerpc/kvm/book3s_32_sr.S
+++ b/arch/powerpc/kvm/book3s_32_sr.S
@@ -122,11 +122,23 @@

 	/* 0x0 - 0xb */

-	/* 'current->mm' needs to be in r4 */
-	tophys(r4, r2)
-	lwz	r4, MM(r4)
-	tophys(r4, r4)
-	/* This only clobbers r0, r3, r4 and r5 */
+	/* switch_mmu_context() needs paging, let's enable it */
+	mfmsr	r9
+	ori	r11, r9, MSR_DR
+	mtmsr	r11
+	sync
+
+	SAVE_GPR(12, r1)
+	/* Calling switch_mmu_context(, current->mm, ); */
+	lwz	r4, MM(r2)
 	bl	switch_mmu_context
+	REST_GPR(12, r1)
+
+	/* Disable paging again */
+	mfmsr	r9
+	li	r6, MSR_DR
+	andc	r9, r9, r6
+	mtmsr	r9
+	sync

 .endm

Matt

> +	/* Disable paging again */
> +	mfmsr	r9
> +	li	r6, MSR_DR
> +	andc	r9, r9, r6
> +	mtmsr	r9
> +	sync
> +
>  .endm
> --
> 2.28.0.394.ge197136389
>
>
> Amazon Development Center Germany GmbH
> Krausenstr. 38
> 10117 Berlin
> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> Sitz: Berlin
> Ust-ID: DE 289 237 879
Re: [PATCH] powerpc: Clear branch trap (MSR.BE) before delivering SIGTRAP
Hi Michael, > On 28 Mar 2018, at 11:36, Matt Evans <m...@ozlabs.org> wrote: > > Howdy Michael, > >> On 28 Mar 2018, at 06:54, Michael Ellerman <m...@ellerman.id.au> wrote: >> >> Matt Evans <m...@ozlabs.org> writes: >> >>> When using SIG_DBG_BRANCH_TRACING, MSR.BE is left enabled in the >>> user context when single_step_exception() prepares the SIGTRAP >>> delivery. The resulting branch-trap-within-the-SIGTRAP-handler >>> isn't healthy. >>> >>> Commit 2538c2d08f46141550a1e68819efa8fe31c6e3dc broke this, by >>> replacing an MSR mask operation of ~(MSR_SE | MSR_BE) with a call >>> to clear_single_step() which only clears MSR_SE. >>> >>> This patch adds a new helper, clear_br_trace(), which clears the >>> debug trap before invoking the signal handler. This helper is a >>> NOP for BookE as SIG_DBG_BRANCH_TRACING isn't supported on BookE. >>> >>> Signed-off-by: Matt Evans <m...@ozlabs.org> >> >> Hi Matt! >> >> It seems we might not be regularly testing this code :} > > I know, rite? ;-) > >> How did you hit/find the bug? And do you have a test case by any chance? >> >> I found the test code at the bottom of: >> https://lwn.net/Articles/114587/ >> >> But it didn't immediately work. > > I'm using this feature as part of a debug harness I wrote to log a program’s > control flow (to create a “known good” pattern to compare a PPC interpreter > against). So at least the feature has /one/ user. ;-) > > The symptoms of the bug are that if you use single-stepping you get a > sequence of SIGTRAPs representing each instruction completion (good), but if > you use branch tracing the process just dies with SIGTRAP (looks like it’s > never caught by the signal handler). What’s really happening is that there > /is/ a signal delivered to the handler, but (because branch tracing is left > on) that then causes a second debug exception from the handler itself, i.e. > whilst SIGTRAP’s masked. 
> > OK, let me have a dig to reduce my program to something very basic and I’ll > post something — sorry, I should’ve got a PoC ready before. (I did start out > inspired by that post you linked to, but IIRC I don’t think it worked out of > the box for me either.) I’ve put a simple SIG_DBG_BRANCH_TRACING test program here: http://ozlabs.org/~matt/files/sig_dbg_brtrace_test.c It’s commented regarding expected output. I’ve only tested this on a G4 — it should work on PPC64 too but the ISA says support for branch tracing is optional for an implementation. I’d be interested in what POWERx does. :) Cheers, Matt
Re: [PATCH] powerpc: Clear branch trap (MSR.BE) before delivering SIGTRAP
Howdy Michael, > On 28 Mar 2018, at 06:54, Michael Ellerman <m...@ellerman.id.au> wrote: > > Matt Evans <m...@ozlabs.org> writes: > >> When using SIG_DBG_BRANCH_TRACING, MSR.BE is left enabled in the >> user context when single_step_exception() prepares the SIGTRAP >> delivery. The resulting branch-trap-within-the-SIGTRAP-handler >> isn't healthy. >> >> Commit 2538c2d08f46141550a1e68819efa8fe31c6e3dc broke this, by >> replacing an MSR mask operation of ~(MSR_SE | MSR_BE) with a call >> to clear_single_step() which only clears MSR_SE. >> >> This patch adds a new helper, clear_br_trace(), which clears the >> debug trap before invoking the signal handler. This helper is a >> NOP for BookE as SIG_DBG_BRANCH_TRACING isn't supported on BookE. >> >> Signed-off-by: Matt Evans <m...@ozlabs.org> > > Hi Matt! > > It seems we might not be regularly testing this code :} I know, rite? ;-) > How did you hit/find the bug? And do you have a test case by any chance? > > I found the test code at the bottom of: > https://lwn.net/Articles/114587/ > > But it didn't immediately work. I'm using this feature as part of a debug harness I wrote to log a program’s control flow (to create a “known good” pattern to compare a PPC interpreter against). So at least the feature has /one/ user. ;-) The symptoms of the bug are that if you use single-stepping you get a sequence of SIGTRAPs representing each instruction completion (good), but if you use branch tracing the process just dies with SIGTRAP (looks like it’s never caught by the signal handler). What’s really happening is that there /is/ a signal delivered to the handler, but (because branch tracing is left on) that then causes a second debug exception from the handler itself, i.e. whilst SIGTRAP’s masked. OK, let me have a dig to reduce my program to something very basic and I’ll post something — sorry, I should’ve got a PoC ready before. 
(I did start out inspired by that post you linked to, but IIRC I don’t think it worked out of the box for me either.) Cheers, Matt
[PATCH] powerpc: Clear branch trap (MSR.BE) before delivering SIGTRAP
When using SIG_DBG_BRANCH_TRACING, MSR.BE is left enabled in the
user context when single_step_exception() prepares the SIGTRAP
delivery.  The resulting branch-trap-within-the-SIGTRAP-handler
isn't healthy.

Commit 2538c2d08f46141550a1e68819efa8fe31c6e3dc broke this, by
replacing an MSR mask operation of ~(MSR_SE | MSR_BE) with a call
to clear_single_step() which only clears MSR_SE.

This patch adds a new helper, clear_br_trace(), which clears the
debug trap before invoking the signal handler.  This helper is a
NOP for BookE as SIG_DBG_BRANCH_TRACING isn't supported on BookE.

Signed-off-by: Matt Evans <m...@ozlabs.org>
---
 arch/powerpc/kernel/traps.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 1e48d157196a..5eaab234e747 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -460,7 +460,7 @@ static inline int check_io_access(struct pt_regs *regs)
 /* single-step stuff */
 #define single_stepping(regs)	(current->thread.debug.dbcr0 & DBCR0_IC)
 #define clear_single_step(regs)	(current->thread.debug.dbcr0 &= ~DBCR0_IC)
-
+#define clear_br_trace(regs)	do {} while(0)
 #else
 /* On non-4xx, the reason for the machine check or program
    exception is in the MSR. */
@@ -473,6 +473,7 @@ static inline int check_io_access(struct pt_regs *regs)

 #define single_stepping(regs)	((regs)->msr & MSR_SE)
 #define clear_single_step(regs)	((regs)->msr &= ~MSR_SE)
+#define clear_br_trace(regs)	((regs)->msr &= ~MSR_BE)
 #endif

 #if defined(CONFIG_E500)
@@ -988,6 +989,7 @@ void single_step_exception(struct pt_regs *regs)
 	enum ctx_state prev_state = exception_enter();

 	clear_single_step(regs);
+	clear_br_trace(regs);

 	if (kprobe_post_handler(regs))
 		return;
--
2.14.1
Re: [RESEND PATCH 2/2] ppc: bpf_jit: support MOD operation
Hi Vladimir,

On 21 Sep 2013, at 17:25, Vladimir Murzin murzi...@gmail.com wrote:

commit b6069a9570 (filter: add MOD operation) added generic support for
modulus operation in BPF.

This patch brings JIT support for PPC64

Signed-off-by: Vladimir Murzin murzi...@gmail.com

Acked-by: Matt Evans m...@ozlabs.org

Not this version, though; see below.

---
 arch/powerpc/net/bpf_jit_comp.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index bf56e33..96f24dc 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -193,6 +193,28 @@ static int bpf_jit_build_body(struct sk_filter *fp, u32 *image,
 			PPC_MUL(r_A, r_A, r_scratch1);
 		}
 		break;
+	case BPF_S_ALU_MOD_X: /* A %= X; */
+		ctx->seen |= SEEN_XREG;
+		PPC_CMPWI(r_X, 0);
+		if (ctx->pc_ret0 != -1) {
+			PPC_BCC(COND_EQ, addrs[ctx->pc_ret0]);
+		} else {
+			PPC_BCC_SHORT(COND_NE, (ctx->idx*4)+12);
+			PPC_LI(r_ret, 0);
+			PPC_JMP(exit_addr);
+		}
+		PPC_DIVWU(r_scratch1, r_A, r_X);
+		PPC_MUL(r_scratch1, r_X, r_scratch1);
+		PPC_SUB(r_A, r_A, r_scratch1);
+		break;
+	case BPF_S_ALU_MOD_K: /* A %= K; */
+#define r_scratch2 (r_scratch1 + 1)

Old version of this patch, still?  I had hoped that r_scratch2 would be
defined in the header.

+		PPC_LI32(r_scratch2, K);
+		PPC_DIVWU(r_scratch1, r_A, r_scratch2);
+		PPC_MUL(r_scratch1, r_scratch2, r_scratch1);
+		PPC_SUB(r_A, r_A, r_scratch1);
+#undef r_scratch2

And remember this guy too.. :)

Matt

+		break;
 	case BPF_S_ALU_DIV_X: /* A /= X; */
 		ctx->seen |= SEEN_XREG;
 		PPC_CMPWI(r_X, 0);
--
1.8.1.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: net: filter: fix DIVWU instruction opcode
On 12 Sep 2013, at 10:02, Michael Neuling mi...@neuling.org wrote:

Vladimir Murzin murzi...@gmail.com wrote:

Currently DIVWU stands for *signed* divw opcode:

7d 2a 4b 96	divwu	r9,r10,r9
7d 2a 4b d6	divw	r9,r10,r9

Use the *unsigned* divw opcode for DIVWU.

This looks like it's only used in the BPF JIT code. Matt, any chance
you can ACK/NACK this?

Sure, that looks sensible, thanks Vladimir.

Acked-by: Matt Evans m...@ozlabs.org

Mikey

Signed-off-by: Vladimir Murzin murzi...@gmail.com
---
 arch/powerpc/include/asm/ppc-opcode.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index d7fe9f5..c91842c 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -218,7 +218,7 @@
 #define PPC_INST_MULLW			0x7c0001d6
 #define PPC_INST_MULHWU			0x7c000016
 #define PPC_INST_MULLI			0x1c000000
-#define PPC_INST_DIVWU			0x7c0003d6
+#define PPC_INST_DIVWU			0x7c000396
 #define PPC_INST_RLWINM			0x54000000
 #define PPC_INST_RLDICR			0x78000004
 #define PPC_INST_SLW			0x7c000030
--
1.7.10.4
Re: [PATCH] ppc: bpf_jit: support MOD operation
Hi Ben, Vladimir,

*dusts off very thick PPC cobwebs*

Sorry for the delay as I'm travelling, didn't get to this until now.

On 02/09/2013, at 9:45 PM, Benjamin Herrenschmidt wrote:

On Mon, 2013-09-02 at 19:48 +0200, Vladimir Murzin wrote:

Ping

On Wed, Aug 28, 2013 at 02:49:52AM +0400, Vladimir Murzin wrote:

commit b6069a9570 (filter: add MOD operation) added generic support for
modulus operation in BPF.

Sorry, nobody got a chance to review that yet.  Unfortunately Matt
doesn't work for us anymore and none of us has experience with the BPF
code, so somebody (possibly me) will need to spend a bit of time
figuring it out before verifying that is correct.

Do you have a test case/suite by any chance?

Ben.

This patch brings JIT support for PPC64

Signed-off-by: Vladimir Murzin murzi...@gmail.com
---
 arch/powerpc/net/bpf_jit_comp.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index bf56e33..96f24dc 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -193,6 +193,28 @@ static int bpf_jit_build_body(struct sk_filter *fp, u32 *image,
 			PPC_MUL(r_A, r_A, r_scratch1);
 		}
 		break;
+	case BPF_S_ALU_MOD_X: /* A %= X; */
+		ctx->seen |= SEEN_XREG;
+		PPC_CMPWI(r_X, 0);
+		if (ctx->pc_ret0 != -1) {
+			PPC_BCC(COND_EQ, addrs[ctx->pc_ret0]);
+		} else {
+			PPC_BCC_SHORT(COND_NE, (ctx->idx*4)+12);
+			PPC_LI(r_ret, 0);
+			PPC_JMP(exit_addr);
+		}
+		PPC_DIVWU(r_scratch1, r_A, r_X);
+		PPC_MUL(r_scratch1, r_X, r_scratch1);
+		PPC_SUB(r_A, r_A, r_scratch1);
+		break;

Without having compile-tested this, it looks fine to me (especially with
the corrected DIVWU opcode in the other patch, oops...).
+	case BPF_S_ALU_MOD_K: /* A %= K; */
+#define r_scratch2 (r_scratch1 + 1)
+		PPC_LI32(r_scratch2, K);
+		PPC_DIVWU(r_scratch1, r_A, r_scratch2);
+		PPC_MUL(r_scratch1, r_scratch2, r_scratch1);
+		PPC_SUB(r_A, r_A, r_scratch1);
+#undef r_scratch2
+		break;

If you need another scratch register, it should really be defined in
bpf_jit.h instead.  Once you define r_scratch2 in there,

Acked-by: Matt Evans m...@ozlabs.org

Thanks!

Matt

 	case BPF_S_ALU_DIV_X: /* A /= X; */
 		ctx->seen |= SEEN_XREG;
 		PPC_CMPWI(r_X, 0);
--
1.8.1.5
[PATCH] PPC: Add __SANE_USERSPACE_TYPES__ to asm/types.h for LL64
PPC64 uses long long for u64 in the kernel, but powerpc's asm/types.h
prevents 64-bit userland from seeing this definition, instead defaulting
to u64 == long in userspace.  Some user programs (e.g. kvmtool) may
actually want LL64, so this patch adds a check for
__SANE_USERSPACE_TYPES__ so that, if defined, int-ll64.h is included
instead.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 arch/powerpc/include/asm/types.h | 5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/types.h b/arch/powerpc/include/asm/types.h
index 8947b98..d82e94e 100644
--- a/arch/powerpc/include/asm/types.h
+++ b/arch/powerpc/include/asm/types.h
@@ -5,8 +5,11 @@
  * This is here because we used to use l64 for 64bit powerpc
  * and we don't want to impact user mode with our change to ll64
  * in the kernel.
+ *
+ * However, some user programs are fine with this. They can
+ * flag __SANE_USERSPACE_TYPES__ to get int-ll64.h here.
  */
-#if defined(__powerpc64__) && !defined(__KERNEL__)
+#if !defined(__SANE_USERSPACE_TYPES__) && defined(__powerpc64__) && !defined(__KERNEL__)
 # include <asm-generic/int-l64.h>
 #else
 # include <asm-generic/int-ll64.h>
--
1.7.0.4
[PATCH v3] net: filter: BPF 'JIT' compiler for PPC64
An implementation of a code generator for BPF programs to speed up packet filtering on PPC64, inspired by Eric Dumazet's x86-64 version. Filter code is generated as an ABI-compliant function in module_alloc()'d mem with stackframe prologue/epilogue generated if required (simple filters don't need anything more than an li/blr). The filter's local variables, M[], live in registers. Supports all BPF opcodes, although complicated loads from negative packet offsets (e.g. SKF_LL_OFF) are not yet supported. There are a couple of further optimisations left for future work; many-pass assembly with branch-reach reduction and a register allocator to push M[] variables into volatile registers would improve the code quality further. This currently supports big-endian 64-bit PowerPC only (but is fairly simple to port to PPC32 or LE!). Enabled in the same way as x86-64: echo 1 /proc/sys/net/core/bpf_jit_enable Or, enabled with extra debug output: echo 2 /proc/sys/net/core/bpf_jit_enable Signed-off-by: Matt Evans m...@ozlabs.org --- V3: Added BUILD_BUG_ON to assert PACA CPU ID is 16bits, made a comment (in LD_MSH) a bit clearer, ratelimited Unknown opcode error and moved bpf_jit.S to bpf_jit_64.S (it doesn't make sense to rename bpf_jit_comp.c as small portions will eventually get split out into _32/_64.c files when we do 32bit support). 
 arch/powerpc/Kconfig                  |   1 +
 arch/powerpc/Makefile                 |   3 +-
 arch/powerpc/include/asm/ppc-opcode.h |  40 ++
 arch/powerpc/net/Makefile             |   4 +
 arch/powerpc/net/bpf_jit.h            | 227 +++
 arch/powerpc/net/bpf_jit_64.S         | 138 +++
 arch/powerpc/net/bpf_jit_comp.c       | 694 +
 7 files changed, 1106 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2729c66..39860fc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -134,6 +134,7 @@ config PPC
 	select GENERIC_IRQ_SHOW_LEVEL
 	select HAVE_RCU_TABLE_FREE if SMP
 	select HAVE_SYSCALL_TRACEPOINTS
+	select HAVE_BPF_JIT if PPC64

 config EARLY_PRINTK
 	bool

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index b7212b6..b94740f 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -154,7 +154,8 @@ core-y				+= arch/powerpc/kernel/ \
 				   arch/powerpc/lib/ \
 				   arch/powerpc/sysdev/ \
 				   arch/powerpc/platforms/ \
-				   arch/powerpc/math-emu/
+				   arch/powerpc/math-emu/ \
+				   arch/powerpc/net/
 core-$(CONFIG_XMON)		+= arch/powerpc/xmon/
 core-$(CONFIG_KVM)		+= arch/powerpc/kvm/

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index e472659..e980faa 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -71,6 +71,42 @@
 #define PPC_INST_ERATSX			0x7c000126
 #define PPC_INST_ERATSX_DOT		0x7c000127

+/* Misc instructions for BPF compiler */
+#define PPC_INST_LD			0xe8000000
+#define PPC_INST_LHZ			0xa0000000
+#define PPC_INST_LWZ			0x80000000
+#define PPC_INST_STD			0xf8000000
+#define PPC_INST_STDU			0xf8000001
+#define PPC_INST_MFLR			0x7c0802a6
+#define PPC_INST_MTLR			0x7c0803a6
+#define PPC_INST_CMPWI			0x2c000000
+#define PPC_INST_CMPDI			0x2c200000
+#define PPC_INST_CMPLW			0x7c000040
+#define PPC_INST_CMPLWI			0x28000000
+#define PPC_INST_ADDI			0x38000000
+#define PPC_INST_ADDIS			0x3c000000
+#define PPC_INST_ADD			0x7c000214
+#define PPC_INST_SUB			0x7c000050
+#define PPC_INST_BLR			0x4e800020
+#define PPC_INST_BLRL			0x4e800021
+#define PPC_INST_MULLW			0x7c0001d6
+#define PPC_INST_MULHWU			0x7c000016
+#define PPC_INST_MULLI			0x1c000000
+#define PPC_INST_DIVWU			0x7c0003d6
+#define PPC_INST_RLWINM			0x54000000
+#define PPC_INST_RLDICR			0x78000004
+#define PPC_INST_SLW			0x7c000030
+#define PPC_INST_SRW			0x7c000430
+#define PPC_INST_AND			0x7c000038
+#define PPC_INST_ANDDOT			0x7c000039
+#define PPC_INST_OR			0x7c000378
+#define PPC_INST_ANDI			0x70000000
+#define PPC_INST_ORI			0x60000000
+#define PPC_INST_ORIS			0x64000000
+#define PPC_INST_NEG			0x7c0000d0
+#define PPC_INST_BRANCH			0x48000000
Re: [PATCH v2] net: filter: BPF 'JIT' compiler for PPC64
On 19/07/11 16:59, Kumar Gala wrote:

On Jul 18, 2011, at 9:13 PM, Matt Evans wrote:

An implementation of a code generator for BPF programs to speed up packet
filtering on PPC64, inspired by Eric Dumazet's x86-64 version.  Filter code
is generated as an ABI-compliant function in module_alloc()'d mem with
stackframe prologue/epilogue generated if required (simple filters don't
need anything more than an li/blr).  The filter's local variables, M[],
live in registers.  Supports all BPF opcodes, although complicated loads
from negative packet offsets (e.g. SKF_LL_OFF) are not yet supported.

There are a couple of further optimisations left for future work; many-pass
assembly with branch-reach reduction and a register allocator to push M[]
variables into volatile registers would improve the code quality further.

This currently supports big-endian 64-bit PowerPC only (but is fairly
simple to port to PPC32 or LE!).  Enabled in the same way as x86-64:

echo 1 > /proc/sys/net/core/bpf_jit_enable

Or, enabled with extra debug output:

echo 2 > /proc/sys/net/core/bpf_jit_enable

Signed-off-by: Matt Evans m...@ozlabs.org
---
V2: Removed some cut/paste woe in setting SEEN_X even on writes.  Merci
for le review, Eric!

 arch/powerpc/Kconfig                  |   1 +
 arch/powerpc/Makefile                 |   3 +-
 arch/powerpc/include/asm/ppc-opcode.h |  40 ++
 arch/powerpc/net/Makefile             |   4 +
 arch/powerpc/net/bpf_jit.S            | 138 +++

can we rename to bpf_jit_64.S, since this doesn't work on PPC32.

 arch/powerpc/net/bpf_jit.h            | 227 +++
 arch/powerpc/net/bpf_jit_comp.c       | 690 +

same here, or split between bpf_jit_comp.c (shared between ppc32 & ppc64)
and bpf_jit_comp_64.c

A reasonable suggestion -- bpf_jit_64.S certainly.  I think it may not be
worth splitting bpf_jit_comp.c until we support both tho?  (I'm thinking
bpf_jit_comp_{32,64}.c would just house the stackframe generation code
which is the main difference, plus compile-time switched macros for the
odd LD vs LWZ.)
Sorry it's not 32bit-friendly just yet (I knew you'd ask, hehe), I've
postponed that for when I get a mo :-)

Cheers,
Matt
Re: [PATCH v2] net: filter: BPF 'JIT' compiler for PPC64
On 19/07/11 17:17, Kumar Gala wrote:

On Jul 19, 2011, at 2:06 AM, Matt Evans wrote:

On 19/07/11 16:59, Kumar Gala wrote:

On Jul 18, 2011, at 9:13 PM, Matt Evans wrote:

[snip]

V2: Removed some cut/paste woe in setting SEEN_X even on writes.  Merci
for le review, Eric!

 arch/powerpc/Kconfig                  |   1 +
 arch/powerpc/Makefile                 |   3 +-
 arch/powerpc/include/asm/ppc-opcode.h |  40 ++
 arch/powerpc/net/Makefile             |   4 +
 arch/powerpc/net/bpf_jit.S            | 138 +++

can we rename to bpf_jit_64.S, since this doesn't work on PPC32.

 arch/powerpc/net/bpf_jit.h            | 227 +++
 arch/powerpc/net/bpf_jit_comp.c       | 690 +

same here, or split between bpf_jit_comp.c (shared between ppc32 & ppc64)
and bpf_jit_comp_64.c

A reasonable suggestion -- bpf_jit_64.S certainly.  I think it may not be
worth splitting bpf_jit_comp.c until we support both tho?  (I'm thinking
bpf_jit_comp_{32,64}.c would just house the stackframe generation code
which is the main difference, plus compile-time switched macros for the
odd LD vs LWZ.)

If it's mostly 64-bit specific then just go with bpf_jit_comp_64.c for
now.  We can refactor later.

Nah, other way round -- it's almost all agnostic but with a couple of
functions that I was recommending moving out to a _64.c and _32.c later,
leaving the bulk still in bpf_jit_comp.c.

Matt
[PATCH] net: filter: BPF 'JIT' compiler for PPC64
An implementation of a code generator for BPF programs to speed up packet filtering on PPC64, inspired by Eric Dumazet's x86-64 version. Filter code is generated as an ABI-compliant function in module_alloc()'d mem with stackframe prologue/epilogue generated if required (simple filters don't need anything more than an li/blr). The filter's local variables, M[], live in registers. Supports all BPF opcodes, although complicated loads from negative packet offsets (e.g. SKF_LL_OFF) are not yet supported. There are a couple of further optimisations left for future work; many-pass assembly with branch-reach reduction and a register allocator to push M[] variables into volatile registers would improve the code quality further. This currently supports big-endian 64-bit PowerPC only (but is fairly simple to port to PPC32 or LE!). Enabled in the same way as x86-64: echo 1 /proc/sys/net/core/bpf_jit_enable Or, enabled with extra debug output: echo 2 /proc/sys/net/core/bpf_jit_enable Signed-off-by: Matt Evans m...@ozlabs.org --- Since the RFC post, this has incorporated the bugfixes/tidies from review plus a couple more found in further testing, plus some general/comment tidies. 
 arch/powerpc/Kconfig                  |   1 +
 arch/powerpc/Makefile                 |   3 +-
 arch/powerpc/include/asm/ppc-opcode.h |  40 ++
 arch/powerpc/net/Makefile             |   4 +
 arch/powerpc/net/bpf_jit.S            | 138 +++
 arch/powerpc/net/bpf_jit.h            | 227 +++
 arch/powerpc/net/bpf_jit_comp.c       | 694 +
 7 files changed, 1106 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2729c66..39860fc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -134,6 +134,7 @@ config PPC
 	select GENERIC_IRQ_SHOW_LEVEL
 	select HAVE_RCU_TABLE_FREE if SMP
 	select HAVE_SYSCALL_TRACEPOINTS
+	select HAVE_BPF_JIT if PPC64

 config EARLY_PRINTK
 	bool

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index b7212b6..b94740f 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -154,7 +154,8 @@ core-y				+= arch/powerpc/kernel/ \
 				   arch/powerpc/lib/ \
 				   arch/powerpc/sysdev/ \
 				   arch/powerpc/platforms/ \
-				   arch/powerpc/math-emu/
+				   arch/powerpc/math-emu/ \
+				   arch/powerpc/net/
 core-$(CONFIG_XMON)		+= arch/powerpc/xmon/
 core-$(CONFIG_KVM)		+= arch/powerpc/kvm/

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index e472659..e980faa 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -71,6 +71,42 @@
 #define PPC_INST_ERATSX			0x7c000126
 #define PPC_INST_ERATSX_DOT		0x7c000127

+/* Misc instructions for BPF compiler */
+#define PPC_INST_LD			0xe8000000
+#define PPC_INST_LHZ			0xa0000000
+#define PPC_INST_LWZ			0x80000000
+#define PPC_INST_STD			0xf8000000
+#define PPC_INST_STDU			0xf8000001
+#define PPC_INST_MFLR			0x7c0802a6
+#define PPC_INST_MTLR			0x7c0803a6
+#define PPC_INST_CMPWI			0x2c000000
+#define PPC_INST_CMPDI			0x2c200000
+#define PPC_INST_CMPLW			0x7c000040
+#define PPC_INST_CMPLWI			0x28000000
+#define PPC_INST_ADDI			0x38000000
+#define PPC_INST_ADDIS			0x3c000000
+#define PPC_INST_ADD			0x7c000214
+#define PPC_INST_SUB			0x7c000050
+#define PPC_INST_BLR			0x4e800020
+#define PPC_INST_BLRL			0x4e800021
+#define PPC_INST_MULLW			0x7c0001d6
+#define PPC_INST_MULHWU			0x7c000016
+#define PPC_INST_MULLI			0x1c000000
+#define PPC_INST_DIVWU			0x7c0003d6
+#define PPC_INST_RLWINM			0x54000000
+#define PPC_INST_RLDICR			0x78000004
+#define PPC_INST_SLW			0x7c000030
+#define PPC_INST_SRW			0x7c000430
+#define PPC_INST_AND			0x7c000038
+#define PPC_INST_ANDDOT			0x7c000039
+#define PPC_INST_OR			0x7c000378
+#define PPC_INST_ANDI			0x70000000
+#define PPC_INST_ORI			0x60000000
+#define PPC_INST_ORIS			0x64000000
+#define PPC_INST_NEG			0x7c0000d0
+#define PPC_INST_BRANCH			0x48000000
+#define PPC_INST_BRANCH_COND		0x40800000
+
 /* macros to insert fields into opcodes */
 #define __PPC_RA(a)	(((a) & 0x1f) << 16)
 #define __PPC_RB(b)	(((b) & 0x1f) << 11
Re: [PATCH] net: filter: BPF 'JIT' compiler for PPC64
On 19/07/11 05:42, David Miller wrote:

From: Eric Dumazet eric.duma...@gmail.com
Date: Mon, 18 Jul 2011 10:39:35 +0200

So in PPC SEEN_XREG only is to be set of X is read, not if written.  So
you dont have to set SEEN_XREG bit in this part :

Matt, do you want to integrate changes based upon Eric's feedback here or
do you want me to apply your patch as-is for now?

Thanks, but no worries; I will send a v2 in a sec.  Eric's comments are
spot-on, and there are a couple of other areas that the brainfart should
really be polished out of, too. :-)

Cheers,
Matt
[PATCH v2] net: filter: BPF 'JIT' compiler for PPC64
An implementation of a code generator for BPF programs to speed up packet filtering on PPC64, inspired by Eric Dumazet's x86-64 version. Filter code is generated as an ABI-compliant function in module_alloc()'d mem with stackframe prologue/epilogue generated if required (simple filters don't need anything more than an li/blr). The filter's local variables, M[], live in registers. Supports all BPF opcodes, although complicated loads from negative packet offsets (e.g. SKF_LL_OFF) are not yet supported. There are a couple of further optimisations left for future work; many-pass assembly with branch-reach reduction and a register allocator to push M[] variables into volatile registers would improve the code quality further. This currently supports big-endian 64-bit PowerPC only (but is fairly simple to port to PPC32 or LE!). Enabled in the same way as x86-64: echo 1 /proc/sys/net/core/bpf_jit_enable Or, enabled with extra debug output: echo 2 /proc/sys/net/core/bpf_jit_enable Signed-off-by: Matt Evans m...@ozlabs.org --- V2: Removed some cut/paste woe in setting SEEN_X even on writes. Merci for le review, Eric! 
 arch/powerpc/Kconfig                  |   1 +
 arch/powerpc/Makefile                 |   3 +-
 arch/powerpc/include/asm/ppc-opcode.h |  40 ++
 arch/powerpc/net/Makefile             |   4 +
 arch/powerpc/net/bpf_jit.S            | 138 +++
 arch/powerpc/net/bpf_jit.h            | 227 +++
 arch/powerpc/net/bpf_jit_comp.c       | 690 +
 7 files changed, 1102 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2729c66..39860fc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -134,6 +134,7 @@ config PPC
 	select GENERIC_IRQ_SHOW_LEVEL
 	select HAVE_RCU_TABLE_FREE if SMP
 	select HAVE_SYSCALL_TRACEPOINTS
+	select HAVE_BPF_JIT if PPC64

 config EARLY_PRINTK
 	bool

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index b7212b6..b94740f 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -154,7 +154,8 @@ core-y				+= arch/powerpc/kernel/ \
 				   arch/powerpc/lib/ \
 				   arch/powerpc/sysdev/ \
 				   arch/powerpc/platforms/ \
-				   arch/powerpc/math-emu/
+				   arch/powerpc/math-emu/ \
+				   arch/powerpc/net/
 core-$(CONFIG_XMON)		+= arch/powerpc/xmon/
 core-$(CONFIG_KVM)		+= arch/powerpc/kvm/

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index e472659..e980faa 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -71,6 +71,42 @@
 #define PPC_INST_ERATSX			0x7c000126
 #define PPC_INST_ERATSX_DOT		0x7c000127

+/* Misc instructions for BPF compiler */
+#define PPC_INST_LD			0xe8000000
+#define PPC_INST_LHZ			0xa0000000
+#define PPC_INST_LWZ			0x80000000
+#define PPC_INST_STD			0xf8000000
+#define PPC_INST_STDU			0xf8000001
+#define PPC_INST_MFLR			0x7c0802a6
+#define PPC_INST_MTLR			0x7c0803a6
+#define PPC_INST_CMPWI			0x2c000000
+#define PPC_INST_CMPDI			0x2c200000
+#define PPC_INST_CMPLW			0x7c000040
+#define PPC_INST_CMPLWI			0x28000000
+#define PPC_INST_ADDI			0x38000000
+#define PPC_INST_ADDIS			0x3c000000
+#define PPC_INST_ADD			0x7c000214
+#define PPC_INST_SUB			0x7c000050
+#define PPC_INST_BLR			0x4e800020
+#define PPC_INST_BLRL			0x4e800021
+#define PPC_INST_MULLW			0x7c0001d6
+#define PPC_INST_MULHWU			0x7c000016
+#define PPC_INST_MULLI			0x1c000000
+#define PPC_INST_DIVWU			0x7c0003d6
+#define PPC_INST_RLWINM			0x54000000
+#define PPC_INST_RLDICR			0x78000004
+#define PPC_INST_SLW			0x7c000030
+#define PPC_INST_SRW			0x7c000430
+#define PPC_INST_AND			0x7c000038
+#define PPC_INST_ANDDOT			0x7c000039
+#define PPC_INST_OR			0x7c000378
+#define PPC_INST_ANDI			0x70000000
+#define PPC_INST_ORI			0x60000000
+#define PPC_INST_ORIS			0x64000000
+#define PPC_INST_NEG			0x7c0000d0
+#define PPC_INST_BRANCH			0x48000000
+#define PPC_INST_BRANCH_COND		0x40800000
+
 /* macros to insert fields into opcodes */
 #define __PPC_RA(a)	(((a) & 0x1f) << 16)
 #define __PPC_RB(b)	(((b) & 0x1f) << 11)
@@ -83,6 +119,10 @@
 #define __PPC_T_TLB(t)	(((t) & 0x3
Re: [RFC PATCH 1/1] BPF JIT for PPC64
On 25/06/11 11:58, Ben Hutchings wrote:

On Fri, 2011-06-24 at 16:02 +1000, Matt Evans wrote:
[...]
+	case BPF_S_ALU_ADD_K: /* A += K; */
+		if (!K)
+			break;
+		if (K < 32768)
+			PPC_ADDI(r_A, r_A, K);
+		else
+			PPC_ADDI(r_A, r_A, IMM_L(K));
+			PPC_ADDIS(r_A, r_A, IMM_HA(K));
+		break;

Missing braces.

+	case BPF_S_ALU_SUB_X: /* A -= X; */
+		ctx->seen |= SEEN_XREG;
+		PPC_SUB(r_A, r_A, r_X);
+		break;
+	case BPF_S_ALU_SUB_K: /* A -= K */
+		if (!K)
+			break;
+		if (K < 32768)
+			PPC_ADDI(r_A, r_A, -K);
+		else
+			PPC_ADDI(r_A, r_A, IMM_L(-K));
+			PPC_ADDIS(r_A, r_A, IMM_HA(-K));
+		break;
[...]

Here as well.

Thanks,
Ben

-- oops! :) Really, just the ADDISes need to be conditional, too.

Cheers,
Matt
Re: [RFC PATCH 1/1] BPF JIT for PPC64
On 25/06/11 17:33, Andreas Schwab wrote: Matt Evans m...@ozlabs.org writes: +stdu r1, -128(r1); \ +addi r5, r1, 128+BPF_PPC_STACK_BASIC+(2*8); \ +addi r1, r1, 128; \ +PPC_STD(r_M + i, 1, -128 + (8*i)); +PPC_LD(r_M + i, 1, -128 + (8*i)); s/128/BPF_PPC_STACK_SAVE/? Actually, that's a different 128, but that nicely illustrates that I should've #defined something more recognisable :-) The second set, with -128, is actually in the save area for non-volatile regs, whereas the first is just a stackframe size. Cheers, Matt
Re: [RFC PATCH 1/1] BPF JIT for PPC64
Hi Eric, On 25/06/11 17:49, Eric Dumazet wrote: Le samedi 25 juin 2011 à 09:33 +0200, Andreas Schwab a écrit : Matt Evans m...@ozlabs.org writes: + stdu r1, -128(r1); \ + addi r5, r1, 128+BPF_PPC_STACK_BASIC+(2*8); \ + addi r1, r1, 128; \ + PPC_STD(r_M + i, 1, -128 + (8*i)); + PPC_LD(r_M + i, 1, -128 + (8*i)); s/128/BPF_PPC_STACK_SAVE/? I am not sure using registers to hold MEM[] is a win if MEM[idx] is used once in the filter # tcpdump "tcp[20]+tcp[21]=0" -d (000) ldh [12] (001) jeq #0x800 jt 2 jf 15 (002) ldb [23] (003) jeq #0x6 jt 4 jf 15 (004) ldh [20] (005) jset #0x1fff jt 15 jf 6 (006) ldxb 4*([14]&0xf) (007) ldb [x + 34] (008) st M[1] (009) ldb [x + 35] (010) tax (011) ld M[1] (012) add x (013) jeq #0x0 jt 14 jf 15 (014) ret #65535 (015) ret #0 In this sample, we use M[1] once (one store, one load) So saving previous register content on stack in prologue, and restoring it in epilogue actually slow down the code, and adds two instructions in filter asm code. The x86 version generates all accesses of M[x] as a load or store to the stackframe, so your example would result in one store + one load to/from the stack. More than one access would result in N stores/loads. By having M[] live in r16-31 on PPC, there is a one-off cost of saving/restoring the (non-volatile) register to the stack plus a per-use cost of a register-register move (MR, which is pretty cheap compared to loads/stores!). You are correct in that, for a single store/load of M[1] above, I'll generate a STD, MR, MR, LD, but this extra cost of the two MRs is pretty small. With the current implementation the gains seen when accessing M[x] /more/ than once will IMHO more than justify this. (For several M[x] accesses, x86 would have several mem ops, PPC would have several reg-reg moves and one load+store.) An obvious alternative would be to use some of the three free volatile registers first (in the hope that most filters won't use > 3 M[] locations), with a simple register allocator.
This would save the (non-volatile reg) spill/fill to stack, too. In the interest of simplicity I didn't want to do that on a first pass. This also makes epilogue code not easy (not possible as a matter of fact) to unwind in helper function. In x86_64 implementation, I chose bpf_error be able to force an exception, not returning to JIT code but directly to bpf_func() caller bpf_error: # force a return 0 from jit handler xor %eax,%eax mov -8(%rbp),%rbx leaveq ret Yep, if I use non-volatile regs a return isn't just a simple stack pop. Currently, I've an extra branch in the return path to hit the common epilogue. This could be optimised such that the out of line error path jumps directly to the common epilogue to return (rather than back to the callsite, checking a flag and /then/ to the epilogue) to speed up the non-error case. However, it's just a question of getting to the (existing) epilogue code to clean up; it doesn't need to be unwound in the helper function. I don't think this issue is a strong argument against having M[] exist in registers, though. From the current position, I think going in the direction of using volatile regs (without backup/restore cost) is better than going in the direction of making all M[] references stack accesses. Do you think it's bearable to continue as it is and then perform that optimisation later? Also, thanks for reading/commenting on the patch! Cheers, Matt
[RFC PATCH 0/1] BPF JIT for PPC64
Hi, Inspired by Eric Dumazet's x86-64 compiler for Berkeley Packet Filter programs, I've written a BPF compiler for 64-bit PowerPC. Although it hasn't finished its strenuous testing regime, I'll have intermittent net access for a couple of weeks so thought I'd post it for feedback now and submit a 'proper' version when I'm back. It's a fairly simple code generator, following a similar structure to the x86 version. The filter programs are an array of opcode/constant/branch destination structs, and can perform arithmetic/logical/comparison operations on two virtual registers A and X, loads from packet headers/data and accesses to local variables, M[]. Branching is also supported, but only forwards and only within the extent of the program. I would probably describe this as more of a static template binary translator than a JIT but have kept naming consistent :) Features include: - Filter code is generated as an ABI-compliant function, stackframe prologue/epilogue if necessary. - Simple filters (e.g. RET nn) need no stackframe or save/restore code so generate into only an li/blr. - Local variables, M[], live in registers - I believe this supports all BPF opcodes, although complicated loads from negative packet offsets (e.g. SKF_LL_OFF) are not yet supported. Caveats include: :) - Packet data loads call out to simple helper functions (bpf_jit.S) which themselves may fall back to a trampoline to skb_copy_bits. I haven't decided whether (as per comments there) it would be better to generate the simple loads inline and only call out in the slow case. - Branches currently generate to bcc 1f; b far dest; 1: or bcc near dest ; nop so either case is the same size. 
Multiple passes of assembly are used (the first gets an idea of how big everything is and what features are required, the next generates everything at accurate size, the third generates everything with accurate branch destination addresses); I intend not to nop-pad the short branch case but changing code size may result in more passes and a 'settling-down period'. Kept simple for now. - Anyone running PPC64 little-endian is doing something both interesting and unsupported for this work :-) (There are some trivial endian assumptions.) Tested in-situ (tcpdump with varying complexity filters) and with a random BPF generator; I haven't verified loads from the fall back skb_copy_bits path. Bug reports/testing would be very welcome. Cheers, Matt
[RFC PATCH 1/1] BPF JIT for PPC64
arch/powerpc/Kconfig |1 + arch/powerpc/Makefile |3 +- arch/powerpc/include/asm/ppc-opcode.h | 40 ++ arch/powerpc/net/Makefile |4 + arch/powerpc/net/bpf_jit.S| 138 +++ arch/powerpc/net/bpf_jit.h| 226 +++ arch/powerpc/net/bpf_jit_comp.c | 697 + 7 files changed, 1108 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 2729c66..39860fc 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -134,6 +134,7 @@ config PPC select GENERIC_IRQ_SHOW_LEVEL select HAVE_RCU_TABLE_FREE if SMP select HAVE_SYSCALL_TRACEPOINTS + select HAVE_BPF_JIT if PPC64 config EARLY_PRINTK bool diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index b7212b6..b94740f 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -154,7 +154,8 @@ core-y += arch/powerpc/kernel/ \ arch/powerpc/lib/ \ arch/powerpc/sysdev/ \ arch/powerpc/platforms/ \ - arch/powerpc/math-emu/ + arch/powerpc/math-emu/ \ + arch/powerpc/net/ core-$(CONFIG_XMON)+= arch/powerpc/xmon/ core-$(CONFIG_KVM) += arch/powerpc/kvm/ diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index e472659..e980faa 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -71,6 +71,42 @@ #define PPC_INST_ERATSX0x7c000126 #define PPC_INST_ERATSX_DOT0x7c000127 +/* Misc instructions for BPF compiler */ +#define PPC_INST_LD0xe800 +#define PPC_INST_LHZ 0xa000 +#define PPC_INST_LWZ 0x8000 +#define PPC_INST_STD 0xf800 +#define PPC_INST_STDU 0xf801 +#define PPC_INST_MFLR 0x7c0802a6 +#define PPC_INST_MTLR 0x7c0803a6 +#define PPC_INST_CMPWI 0x2c00 +#define PPC_INST_CMPDI 0x2c20 +#define PPC_INST_CMPLW 0x7c40 +#define PPC_INST_CMPLWI0x2800 +#define PPC_INST_ADDI 0x3800 +#define PPC_INST_ADDIS 0x3c00 +#define PPC_INST_ADD 0x7c000214 +#define PPC_INST_SUB 0x7c50 +#define PPC_INST_BLR 0x4e800020 +#define PPC_INST_BLRL 0x4e800021 +#define PPC_INST_MULLW 0x7c0001d6 +#define PPC_INST_MULHWU0x7c16 +#define PPC_INST_MULLI 
0x1c00 +#define PPC_INST_DIVWU 0x7c0003d6 +#define PPC_INST_RLWINM0x5400 +#define PPC_INST_RLDICR0x7804 +#define PPC_INST_SLW 0x7c30 +#define PPC_INST_SRW 0x7c000430 +#define PPC_INST_AND 0x7c38 +#define PPC_INST_ANDDOT0x7c39 +#define PPC_INST_OR0x7c000378 +#define PPC_INST_ANDI 0x7000 +#define PPC_INST_ORI 0x6000 +#define PPC_INST_ORIS 0x6400 +#define PPC_INST_NEG 0x7cd0 +#define PPC_INST_BRANCH0x4800 +#define PPC_INST_BRANCH_COND 0x4080 + /* macros to insert fields into opcodes */ #define __PPC_RA(a)(((a) 0x1f) 16) #define __PPC_RB(b)(((b) 0x1f) 11) @@ -83,6 +119,10 @@ #define __PPC_T_TLB(t) (((t) 0x3) 21) #define __PPC_WC(w)(((w) 0x3) 21) #define __PPC_WS(w)(((w) 0x1f) 11) +#define __PPC_SH(s)__PPC_WS(s) +#define __PPC_MB(s)(((s) 0x1f) 6) +#define __PPC_ME(s)(((s) 0x1f) 1) +#define __PPC_BI(s)(((s) 0x1f) 16) /* * Only use the larx hint bit on 64bit CPUs. e500v1/v2 based CPUs will treat a diff --git a/arch/powerpc/net/Makefile b/arch/powerpc/net/Makefile new file mode 100644 index 000..90568c3 --- /dev/null +++ b/arch/powerpc/net/Makefile @@ -0,0 +1,4 @@ +# +# Arch-specific network modules +# +obj-$(CONFIG_BPF_JIT) += bpf_jit.o bpf_jit_comp.o diff --git a/arch/powerpc/net/bpf_jit.S b/arch/powerpc/net/bpf_jit.S new file mode 100644 index 000..ce2225e --- /dev/null +++ b/arch/powerpc/net/bpf_jit.S @@ -0,0 +1,138 @@ +/* bpf_jit.S: Packet/header access helper functions + * for PPC64 BPF compiler. + * + * Copyright 2011 Matt Evans m...@ozlabs.org, IBM Corporation + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include asm
[PATCH] powerpc: Replace boot_cpu_count with spinning_secondaries, fix off-by-one cpu wait
smp_release_cpus() waits for all cpus (including the bootcpu) due to an off-by-one count on boot_cpu_count (which is all CPUs). This patch replaces that with spinning_secondaries (which is all secondary CPUs). Signed-off-by: Matt Evans m...@ozlabs.org --- arch/powerpc/include/asm/smp.h | 2 +- arch/powerpc/kernel/head_64.S | 2 +- arch/powerpc/kernel/prom.c | 8 arch/powerpc/kernel/setup_32.c | 1 - arch/powerpc/kernel/setup_64.c | 6 +++--- 5 files changed, 13 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h index 880b8c1..8f01c85 100644 --- a/arch/powerpc/include/asm/smp.h +++ b/arch/powerpc/include/asm/smp.h @@ -30,7 +30,7 @@ #include <asm/percpu.h> extern int boot_cpuid; -extern int boot_cpu_count; +extern int spinning_secondaries; extern void cpu_die(void); diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index ba50409..3564c49 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -255,7 +255,7 @@ generic_secondary_common_init: mtctr r23 bctrl -3: LOAD_REG_ADDR(r3, boot_cpu_count) /* Decrement boot_cpu_count */ +3: LOAD_REG_ADDR(r3, spinning_secondaries) /* Decrement spinning_secondaries */ lwarx r4,0,r3 subi r4,r4,1 stwcx. r4,0,r3 diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index 48aeb55..540e0dc 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -69,6 +69,7 @@ unsigned long tce_alloc_start, tce_alloc_end; u64 ppc64_rma_size; #endif static phys_addr_t first_memblock_size; +static int __initdata boot_cpu_count; static int __init early_parse_mem(char *p) { @@ -748,6 +749,13 @@ void __init early_init_devtree(void *params) */ of_scan_flat_dt(early_init_dt_scan_cpus, NULL); +#if defined(CONFIG_SMP) && defined(CONFIG_PPC64) + /* We'll later wait for secondaries to check in; there are +* NCPUS-1 non-boot CPUs :-) +*/ + spinning_secondaries = boot_cpu_count - 1; +#endif + DBG(" <- early_init_devtree()\n"); } diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c index 620d792..1d2fbc9 100644 --- a/arch/powerpc/kernel/setup_32.c +++ b/arch/powerpc/kernel/setup_32.c @@ -48,7 +48,6 @@ extern void bootx_init(unsigned long r4, unsigned long phys); int boot_cpuid = -1; EXPORT_SYMBOL_GPL(boot_cpuid); -int __initdata boot_cpu_count; int boot_cpuid_phys; int smp_hw_index[NR_CPUS]; diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index a88bf27..0576919 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -73,7 +73,7 @@ #endif int boot_cpuid = 0; -int __initdata boot_cpu_count; +int __initdata spinning_secondaries; u64 ppc64_pft_size; /* Pick defaults since we might want to patch instructions @@ -253,11 +253,11 @@ void smp_release_cpus(void) for (i = 0; i < 10; i++) { mb(); HMT_low(); - if (boot_cpu_count == 0) + if (spinning_secondaries == 0) break; udelay(1); } - DBG("boot_cpu_count = %d\n", boot_cpu_count); + DBG("spinning_secondaries = %d\n", spinning_secondaries); DBG(" <- smp_release_cpus()\n"); } -- 1.7.0.4
Re: [PATCH] powerpc: Free up some CPU feature bits by moving out MMU-related features
On 07/04/11 17:06, Kumar Gala wrote: On Apr 7, 2011, at 12:48 AM, Matt Evans wrote: diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index be3cdf9..7b0fe7c 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -178,22 +178,15 @@ extern const char *powerpc_base_platform; #define LONG_ASM_CONST(x)0 #endif -#define CPU_FTR_SLB LONG_ASM_CONST(0x0001) -#define CPU_FTR_16M_PAGELONG_ASM_CONST(0x0002) -#define CPU_FTR_TLBIEL LONG_ASM_CONST(0x0004) #define CPU_FTR_IABR LONG_ASM_CONST(0x0020) #define CPU_FTR_MMCRA LONG_ASM_CONST(0x0040) #define CPU_FTR_CTRL LONG_ASM_CONST(0x0080) #define CPU_FTR_SMT LONG_ASM_CONST(0x0100) -#define CPU_FTR_LOCKLESS_TLBIE LONG_ASM_CONST(0x0400) -#define CPU_FTR_CI_LARGE_PAGE LONG_ASM_CONST(0x1000) #define CPU_FTR_PAUSE_ZERO LONG_ASM_CONST(0x2000) #define CPU_FTR_PURR LONG_ASM_CONST(0x4000) #define CPU_FTR_CELL_TB_BUG LONG_ASM_CONST(0x8000) #define CPU_FTR_SPURR LONG_ASM_CONST(0x0001) #define CPU_FTR_DSCR LONG_ASM_CONST(0x0002) -#define CPU_FTR_1T_SEGMENT LONG_ASM_CONST(0x0004) -#define CPU_FTR_NO_SLBIE_B LONG_ASM_CONST(0x0008) #define CPU_FTR_VSX LONG_ASM_CONST(0x0010) #define CPU_FTR_SAO LONG_ASM_CONST(0x0020) #define CPU_FTR_CP_USE_DCBTZ LONG_ASM_CONST(0x0040) @@ -205,9 +198,10 @@ extern const char *powerpc_base_platform; Seems like SAO should move into MMU features I would argue it's the core/nest that orders/disorders things rather than the MMU. Cheers, Matt ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] powerpc: Free up some CPU feature bits by moving out MMU-related features
Some of the 64bit PPC CPU features are MMU-related, so this patch moves them to MMU_FTR_ bits. All cpu_has_feature()-style tests are moved to mmu_has_feature(), and seven feature bits are freed as a result. Signed-off-by: Matt Evans m...@ozlabs.org --- Boot-tested on pseries and G5. arch/powerpc/include/asm/cputable.h| 52 ++- arch/powerpc/include/asm/mmu.h | 28 +++ arch/powerpc/include/asm/mmu_context.h |2 +- arch/powerpc/kernel/cputable.c | 39 ++--- arch/powerpc/kernel/entry_64.S |8 ++-- arch/powerpc/kernel/exceptions-64s.S |4 +- arch/powerpc/kernel/process.c |4 +- arch/powerpc/kernel/prom.c | 17 + arch/powerpc/kernel/setup_64.c |2 +- arch/powerpc/mm/hash_low_64.S |8 ++-- arch/powerpc/mm/hash_native_64.c |8 ++-- arch/powerpc/mm/hash_utils_64.c| 18 +- arch/powerpc/mm/hugetlbpage.c |2 +- arch/powerpc/mm/slb.c |4 +- arch/powerpc/mm/slb_low.S |8 ++-- arch/powerpc/mm/stab.c |2 +- arch/powerpc/platforms/iseries/exception.S |2 +- arch/powerpc/platforms/iseries/setup.c |4 +- arch/powerpc/platforms/pseries/lpar.c |2 +- arch/powerpc/xmon/xmon.c |2 +- 20 files changed, 123 insertions(+), 93 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index be3cdf9..7b0fe7c 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -178,22 +178,15 @@ extern const char *powerpc_base_platform; #define LONG_ASM_CONST(x) 0 #endif -#define CPU_FTR_SLBLONG_ASM_CONST(0x0001) -#define CPU_FTR_16M_PAGE LONG_ASM_CONST(0x0002) -#define CPU_FTR_TLBIEL LONG_ASM_CONST(0x0004) #define CPU_FTR_IABR LONG_ASM_CONST(0x0020) #define CPU_FTR_MMCRA LONG_ASM_CONST(0x0040) #define CPU_FTR_CTRL LONG_ASM_CONST(0x0080) #define CPU_FTR_SMTLONG_ASM_CONST(0x0100) -#define CPU_FTR_LOCKLESS_TLBIE LONG_ASM_CONST(0x0400) -#define CPU_FTR_CI_LARGE_PAGE LONG_ASM_CONST(0x1000) #define CPU_FTR_PAUSE_ZERO LONG_ASM_CONST(0x2000) #define CPU_FTR_PURR LONG_ASM_CONST(0x4000) #define CPU_FTR_CELL_TB_BUGLONG_ASM_CONST(0x8000) #define CPU_FTR_SPURR 
LONG_ASM_CONST(0x0001) #define CPU_FTR_DSCR LONG_ASM_CONST(0x0002) -#define CPU_FTR_1T_SEGMENT LONG_ASM_CONST(0x0004) -#define CPU_FTR_NO_SLBIE_B LONG_ASM_CONST(0x0008) #define CPU_FTR_VSXLONG_ASM_CONST(0x0010) #define CPU_FTR_SAOLONG_ASM_CONST(0x0020) #define CPU_FTR_CP_USE_DCBTZ LONG_ASM_CONST(0x0040) @@ -205,9 +198,10 @@ extern const char *powerpc_base_platform; #ifndef __ASSEMBLY__ -#define CPU_FTR_PPCAS_ARCH_V2 (CPU_FTR_SLB | \ -CPU_FTR_TLBIEL | CPU_FTR_NOEXECUTE | \ -CPU_FTR_NODSISRALIGN | CPU_FTR_16M_PAGE) +#define CPU_FTR_PPCAS_ARCH_V2 (CPU_FTR_NOEXECUTE | CPU_FTR_NODSISRALIGN) + +#define MMU_FTR_PPCAS_ARCH_V2 (MMU_FTR_SLB | MMU_FTR_TLBIEL | \ +MMU_FTR_16M_PAGE) /* We only set the altivec features if the kernel was compiled with altivec * support @@ -405,41 +399,49 @@ extern const char *powerpc_base_platform; #define CPU_FTRS_POWER5(CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \ CPU_FTR_MMCRA | CPU_FTR_SMT | \ - CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \ - CPU_FTR_PURR | CPU_FTR_STCX_CHECKS_ADDRESS | \ - CPU_FTR_POPCNTB) + CPU_FTR_COHERENT_ICACHE | CPU_FTR_PURR | \ + CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB) #define CPU_FTRS_POWER6 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \ CPU_FTR_MMCRA | CPU_FTR_SMT | \ - CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \ - CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \ - CPU_FTR_DSCR | CPU_FTR_UNALIGNED_LD_STD | \ + CPU_FTR_COHERENT_ICACHE | CPU_FTR_PURR | CPU_FTR_SPURR | \ + CPU_FTR_REAL_LE | CPU_FTR_DSCR | CPU_FTR_UNALIGNED_LD_STD | \ CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB) #define CPU_FTRS_POWER7 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \ CPU_FTR_MMCRA
Re: [PATCH v2 5/5] xhci: Remove recursive call to xhci_handle_event
On 30/03/11 07:00, Dmitry Torokhov wrote: On Sunday, March 27, 2011 09:53:00 pm Matt Evans wrote: @@ -2282,7 +2284,7 @@ hw_died: /* FIXME this should be a delayed service routine * that clears the EHB. */ -xhci_handle_event(xhci); +while (xhci_handle_event(xhci)) {} I must admit I dislike the style with empty loop bodies, do you think we could have something like below instead? Well, although I don't mind empty while()s at all (they're clean and obvious IMHO) I would replace an empty while loop with something like this: do { ret = xhci_handle_event(xhci); } while (ret > 0); ;-) (Not sure that refactoring its contents into the IRQ handler is a good idea, if that area's going to be revisited soon to extend error handling/reporting.) Cheers, Matt
Re: [PATCH 0/5] Make xHCI driver endian-safe, add a barrier, some debug
On 29/03/11 09:16, Sarah Sharp wrote: On Fri, Mar 25, 2011 at 06:43:44PM +1100, Matt Evans wrote: Hi Sarah, This series addresses the endian issues with the xHCI driver, and has brought lovely USB 3 to PPC. :-) I've tested various types of traffic on ppc4xx and POWER7 and (some sound driver bugs notwithstanding) all seems fine. Also addresses an ordering problem we found and the recursive nature of the event handling, plus the addition of some debug. Thanks for doing this work, Matt! I appreciate it. Hey, no worries! I've got some $3 USB speakers I wanted to connect to the $15,000 POWER server, you know how it goes. This should apply to 2.6.38/Linus' tree. You say that these apply against 2.6.38, but recently a lot of xHCI changes went into Linus' tree to support USB 3.0 hubs in 2.6.39. Will these patches still apply against Linus' latest tree? If not, I suggest you base your patches against Greg KH's usb-linus branch, as that's my tree base: http://git.kernel.org/?p=linux/kernel/git/gregkh/usb-2.6.git;a=shortlog;h=refs/heads/usb-linus Sorry, I wasn't too explicit about that; I'd meant Linus' tree as of todayish and I believe I caught said USB 3.0 changes, but I'll rebase from usb-linus anyway to make sure we're fine. Also, I haven't read too far into the patches, but the first patch seems to have several one or two letter variable names, like f and di. Can you make those variable names more descriptive? Thanks. Sure; they were single-use throwaways to break long RMW lines but I've rejiggled & removed them. Thanks for looking, I'll repost v3 with tidyups. Cheers, Matt
Re: [PATCH 2/5] xhci: Extend debug messages
Hi, On 29/03/11 09:19, Sarah Sharp wrote: On Fri, Mar 25, 2011 at 06:43:59PM +1100, Matt Evans wrote: Add more debug to print queued transfers, show control intentions and modify an existing message to hexify address output. Are these new debug messages really necessary? I feel like the xHCI driver has way too many debugging messages already. I'd like to switch it over to the event tracer infrastructure eventually, and every new debug message makes that work harder to do... The queue_trb debug message is going to be especially noticeable, since it will be triggered on every URB submission (sometimes more than once per URB). My rationale was that when looking out for control transfers, the actual transfer was the one line of info *not* printed at that point (among about a trillion of ring/buffer state). :) I didn't know about the event tracer intentions, so if this'll complicate things I'll drop it from the series, no worries. Cheers, Matt Sarah Sharp Signed-off-by: Matt Evans m...@ozlabs.org --- drivers/usb/host/xhci-dbg.c |2 +- drivers/usb/host/xhci-hub.c |3 +++ drivers/usb/host/xhci-ring.c |5 + 3 files changed, 9 insertions(+), 1 deletions(-) diff --git a/drivers/usb/host/xhci-dbg.c b/drivers/usb/host/xhci-dbg.c index 2e04861..905f3bf 100644 --- a/drivers/usb/host/xhci-dbg.c +++ b/drivers/usb/host/xhci-dbg.c @@ -278,7 +278,7 @@ void xhci_debug_trb(struct xhci_hcd *xhci, union xhci_trb *trb) * FIXME: look at flags to figure out if it's an address or if * the data is directly in the buffer field. 
*/ -xhci_dbg(xhci, DMA address or buffer contents= %llu\n, address); +xhci_dbg(xhci, DMA address or buffer contents= 0x%llx\n, address); break; case TRB_TYPE(TRB_COMPLETION): address = le64_to_cpu(trb-event_cmd.cmd_trb); diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c index ae1d24c..768fd6e 100644 --- a/drivers/usb/host/xhci-hub.c +++ b/drivers/usb/host/xhci-hub.c @@ -389,6 +389,9 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue, } bus_state = xhci-bus_state[hcd_index(hcd)]; +xhci_dbg(xhci, %s(%04x, %04x, %04x, %p, %04x);\n, __func__, + typeReq, wValue, wIndex, buf, wLength); + spin_lock_irqsave(xhci-lock, flags); switch (typeReq) { case GetHubStatus: diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 3898f22..45f3b77 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -2336,6 +2336,11 @@ static void queue_trb(struct xhci_hcd *xhci, struct xhci_ring *ring, trb-field[1] = cpu_to_le32(field2); trb-field[2] = cpu_to_le32(field3); trb-field[3] = cpu_to_le32(field4); + +xhci_dbg(xhci, queue_trb @%llx %08x %08x %08x %08x\n, + xhci_trb_virt_to_dma(ring-enq_seg, ring-enqueue), + le32_to_cpu(trb-field[0]), le32_to_cpu(trb-field[1]), + le32_to_cpu(trb-field[2]), le32_to_cpu(trb-field[3])); inc_enq(xhci, ring, consumer, more_trbs_coming); } -- 1.7.0.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3 0/4] xhci: Make xHCI driver endian-safe
Hi Sarah, Reposting the whole set, since there're 4 instead of 5, having dropped the 'add debug' patch in the middle, and I thought differing 'vN a/b' subjects may get confusing otherwise. (Two of the patches haven't changed. :/ ) These remove the scratty temp variables and comment the return value of xhci_handle_event(). Cheers, Matt
[PATCH v3 2/4] xhci: Add rmb() between reading event validity & event data access.
On weakly-ordered systems, the reading of an event's content must occur after reading the event's validity. Signed-off-by: Matt Evans m...@ozlabs.org --- drivers/usb/host/xhci-ring.c | 5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 1361032..86c8198 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -2150,6 +2150,11 @@ static void xhci_handle_event(struct xhci_hcd *xhci) } xhci_dbg(xhci, "%s - OS owns TRB\n", __func__); + /* +* Barrier between reading the TRB_CYCLE (valid) flag above and any +* speculative reads of the event's flags/data below. +*/ + rmb(); /* FIXME: Handle more event types. */ switch ((le32_to_cpu(event->event_cmd.flags) & TRB_TYPE_BITMASK)) { case TRB_TYPE(TRB_COMPLETION): -- 1.7.0.4
[PATCH v3 3/4] xhci: Add an assertion to check for virt_dev=0 bug.
During a plug-unplug stress test on an NEC xHCI card, a null pointer dereference was observed. xhci_address_device() dereferenced a null virt_dev (possibly an erroneous udev->slot_id?); this patch adds a WARN_ON message to aid debug if it can be recreated. Signed-off-by: Matt Evans m...@ozlabs.org --- drivers/usb/host/xhci.c | 11 +++ 1 files changed, 11 insertions(+), 0 deletions(-) diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 3a9f931..d145fa3 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -2533,6 +2533,17 @@ int xhci_address_device(struct usb_hcd *hcd, struct usb_device *udev) virt_dev = xhci->devs[udev->slot_id]; + if (WARN_ON(!virt_dev)) { + /* +* In plug/unplug torture test with an NEC controller, +* a zero-dereference was observed once due to virt_dev = 0. +* Print useful debug rather than crash if it is observed again! +*/ + xhci_warn(xhci, "Virt dev invalid for slot_id 0x%x!\n", + udev->slot_id); + return -EINVAL; + } + slot_ctx = xhci_get_slot_ctx(xhci, virt_dev->in_ctx); /* * If this is the first Set Address since device plug-in or -- 1.7.0.4
[PATCH v3 4/4] xhci: Remove recursive call to xhci_handle_event
Make the caller loop while there are events to handle, instead. Signed-off-by: Matt Evans m...@ozlabs.org --- Added a comment on the return value, defining <0 to be 'bad'. drivers/usb/host/xhci-ring.c | 18 +++--- 1 files changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 86c8198..5fce46c 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -2128,8 +2128,10 @@ cleanup: /* * This function handles all OS-owned events on the event ring. It may drop * xhci->lock between event processing (e.g. to pass up port status changes). + * Returns >0 for possibly more events to process (caller should call again), + * otherwise 0 if done. In future, <0 returns should indicate error code. */ -static void xhci_handle_event(struct xhci_hcd *xhci) +static int xhci_handle_event(struct xhci_hcd *xhci) { union xhci_trb *event; int update_ptrs = 1; @@ -2138,7 +2140,7 @@ static void xhci_handle_event(struct xhci_hcd *xhci) xhci_dbg(xhci, "In %s\n", __func__); if (!xhci->event_ring || !xhci->event_ring->dequeue) { xhci->error_bitmask |= 1 << 1; - return; + return 0; } event = xhci->event_ring->dequeue; @@ -2146,7 +2148,7 @@ if ((le32_to_cpu(event->event_cmd.flags) & TRB_CYCLE) != xhci->event_ring->cycle_state) { xhci->error_bitmask |= 1 << 2; - return; + return 0; } xhci_dbg(xhci, "%s - OS owns TRB\n", __func__); @@ -2190,15 +2192,17 @@ if (xhci->xhc_state & XHCI_STATE_DYING) { xhci_dbg(xhci, "xHCI host dying, returning from event handler.\n"); - return; + return 0; } if (update_ptrs) /* Update SW event ring dequeue pointer */ inc_deq(xhci, xhci->event_ring, true); - /* Are there more items on the event ring? */ - xhci_handle_event(xhci); + /* Are there more items on the event ring? Caller will call us again to +* check. +*/ + return 1; } /* @@ -2280,7 +2284,7 @@ hw_died: /* FIXME this should be a delayed service routine * that clears the EHB. */ - xhci_handle_event(xhci); + while (xhci_handle_event(xhci) > 0) {} temp_64 = xhci_read_64(xhci, xhci->ir_set->erst_dequeue); /* If necessary, update the HW's version of the event ring deq ptr. */ -- 1.7.0.4
[PATCH v2 3/5] xhci: Add rmb() between reading event validity & event data access.
On weakly-ordered systems, the reading of an event's content must occur after reading the event's validity. Signed-off-by: Matt Evans m...@ozlabs.org --- Segher, thanks for the comment; explanation added. drivers/usb/host/xhci-ring.c | 5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 45f3b77..d6aa880 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -2152,6 +2152,11 @@ static void xhci_handle_event(struct xhci_hcd *xhci) } xhci_dbg(xhci, "%s - OS owns TRB\n", __func__); + /* +* Barrier between reading the TRB_CYCLE (valid) flag above and any +* speculative reads of the event's flags/data below. +*/ + rmb(); /* FIXME: Handle more event types. */ switch ((le32_to_cpu(event->event_cmd.flags) & TRB_TYPE_BITMASK)) { case TRB_TYPE(TRB_COMPLETION): -- 1.7.0.4
[PATCH v2 5/5] xhci: Remove recursive call to xhci_handle_event
Make the caller loop while there are events to handle, instead.

Signed-off-by: Matt Evans m...@ozlabs.org
---
1 byte smaller after Sergei's suggestion.

 drivers/usb/host/xhci-ring.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index d6aa880..9c51d69 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2131,7 +2131,7 @@ cleanup:
  * This function handles all OS-owned events on the event ring.  It may drop
  * xhci->lock between event processing (e.g. to pass up port status changes).
  */
-static void xhci_handle_event(struct xhci_hcd *xhci)
+static int xhci_handle_event(struct xhci_hcd *xhci)
 {
 	union xhci_trb *event;
 	int update_ptrs = 1;
@@ -2140,7 +2140,7 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
 	xhci_dbg(xhci, "In %s\n", __func__);
 	if (!xhci->event_ring || !xhci->event_ring->dequeue) {
 		xhci->error_bitmask |= 1 << 1;
-		return;
+		return 0;
 	}
 
 	event = xhci->event_ring->dequeue;
@@ -2148,7 +2148,7 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
 	if ((le32_to_cpu(event->event_cmd.flags) & TRB_CYCLE) !=
 	    xhci->event_ring->cycle_state) {
 		xhci->error_bitmask |= 1 << 2;
-		return;
+		return 0;
 	}
 	xhci_dbg(xhci, "%s - OS owns TRB\n", __func__);
 
@@ -2192,15 +2192,17 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
 	if (xhci->xhc_state & XHCI_STATE_DYING) {
 		xhci_dbg(xhci, "xHCI host dying, returning from "
				"event handler.\n");
-		return;
+		return 0;
 	}
 
 	if (update_ptrs)
 		/* Update SW event ring dequeue pointer */
 		inc_deq(xhci, xhci->event_ring, true);
 
-	/* Are there more items on the event ring? */
-	xhci_handle_event(xhci);
+	/* Are there more items on the event ring?  Caller will call us again to
+	 * check.
+	 */
+	return 1;
 }
 
 /*
@@ -2282,7 +2284,7 @@ hw_died:
 	/* FIXME this should be a delayed service routine
	 * that clears the EHB.
	 */
-	xhci_handle_event(xhci);
+	while (xhci_handle_event(xhci)) {}
 
 	temp_64 = xhci_read_64(xhci, &xhci->ir_set->erst_dequeue);
 	/* If necessary, update the HW's version of the event ring deq ptr. */
--
1.7.0.4
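As a rough standalone model of this refactor (names and ring representation invented for illustration, not the real xHCI structures): the handler consumes at most one event, returns nonzero when the caller should check again, and the interrupt path loops instead of recursing — mirroring `while (xhci_handle_event(xhci)) {}`.

```c
#include <stdint.h>

/* Illustrative stand-in: a nonzero ring entry models an OS-owned event,
 * zero models "not ours yet" (like a mismatched cycle bit). */
static int handle_one_event(const int *ring, int n, int *deq)
{
        if (*deq >= n || ring[*deq] == 0)
                return 0;       /* no OS-owned event at the dequeue pointer */
        /* ...process ring[*deq] here... */
        (*deq)++;               /* inc_deq() analogue */
        return 1;               /* more may remain; caller should re-call */
}

/* The caller loops, bounded by the ring size: no recursion depth to
 * worry about, unlike the old tail-recursive xhci_handle_event(). */
static int drain_ring(const int *ring, int n)
{
        int deq = 0;

        while (handle_one_event(ring, n, &deq))
                ;
        return deq;             /* number of events handled */
}
```

The same interface change (void → int return) is what lets the real driver bound the work done per interrupt without a recursive call frame per event.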
[PATCH 0/5] Make xHCI driver endian-safe, add a barrier, some debug
Hi Sarah,

This series addresses the endian issues with the xHCI driver, and has brought
lovely USB 3 to PPC. :-)  I've tested various types of traffic on ppc4xx and
POWER7 and (some sound driver bugs notwithstanding) all seems fine.

Also addresses an ordering problem we found and the recursive nature of the
event handling, plus the addition of some debug.

This should apply to 2.6.38/Linus' tree.

Thanks!

Matt
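The endian work in this series boils down to treating every TRB field as little-endian in memory, converting on each access. As a plain-C illustration (these byte-at-a-time helpers are userspace stand-ins, not the kernel's `le32_to_cpu()`/`cpu_to_le32()`), a load/store that is correct on both big- and little-endian hosts looks like:

```c
#include <stdint.h>

/* Read a 32-bit little-endian value from memory, regardless of host
 * endianness (le32_to_cpu analogue). */
static uint32_t trb_le32_load(const uint8_t *p)
{
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value as little-endian (cpu_to_le32 analogue). */
static void trb_le32_store(uint8_t *p, uint32_t v)
{
        p[0] = (uint8_t)v;
        p[1] = (uint8_t)(v >> 8);
        p[2] = (uint8_t)(v >> 16);
        p[3] = (uint8_t)(v >> 24);
}
```

A big-endian host that read TRB fields directly (without the conversion) would see byte-swapped type and cycle bits, which is exactly the class of bug the series fixes.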
[PATCH 2/5] xhci: Extend debug messages
Add more debug to print queued transfers, show control intentions and modify
an existing message to hexify address output.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 drivers/usb/host/xhci-dbg.c  |    2 +-
 drivers/usb/host/xhci-hub.c  |    3 +++
 drivers/usb/host/xhci-ring.c |    5 +++++
 3 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/host/xhci-dbg.c b/drivers/usb/host/xhci-dbg.c
index 2e04861..905f3bf 100644
--- a/drivers/usb/host/xhci-dbg.c
+++ b/drivers/usb/host/xhci-dbg.c
@@ -278,7 +278,7 @@ void xhci_debug_trb(struct xhci_hcd *xhci, union xhci_trb *trb)
 		 * FIXME: look at flags to figure out if it's an address or if
 		 * the data is directly in the buffer field.
 		 */
-		xhci_dbg(xhci, "DMA address or buffer contents= %llu\n", address);
+		xhci_dbg(xhci, "DMA address or buffer contents= 0x%llx\n", address);
 		break;
 	case TRB_TYPE(TRB_COMPLETION):
 		address = le64_to_cpu(trb->event_cmd.cmd_trb);
diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index ae1d24c..768fd6e 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -389,6 +389,9 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
 	}
 	bus_state = &xhci->bus_state[hcd_index(hcd)];
 
+	xhci_dbg(xhci, "%s(%04x, %04x, %04x, %p, %04x);\n", __func__,
+		 typeReq, wValue, wIndex, buf, wLength);
+
 	spin_lock_irqsave(&xhci->lock, flags);
 	switch (typeReq) {
 	case GetHubStatus:
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 3898f22..45f3b77 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2336,6 +2336,11 @@ static void queue_trb(struct xhci_hcd *xhci, struct xhci_ring *ring,
 	trb->field[1] = cpu_to_le32(field2);
 	trb->field[2] = cpu_to_le32(field3);
 	trb->field[3] = cpu_to_le32(field4);
+
+	xhci_dbg(xhci, "queue_trb @%llx %08x %08x %08x %08x\n",
+		 xhci_trb_virt_to_dma(ring->enq_seg, ring->enqueue),
+		 le32_to_cpu(trb->field[0]), le32_to_cpu(trb->field[1]),
+		 le32_to_cpu(trb->field[2]), le32_to_cpu(trb->field[3]));
 	inc_enq(xhci, ring, consumer, more_trbs_coming);
 }
--
1.7.0.4
[PATCH 3/5] xhci: Add rmb() between reading event validity & event data access.
On weakly-ordered systems, the reading of an event's content must occur
after reading the event's validity.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 drivers/usb/host/xhci-ring.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 45f3b77..b46efd9 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2151,7 +2151,7 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
 		return;
 	}
 	xhci_dbg(xhci, "%s - OS owns TRB\n", __func__);
-
+	rmb();
 	/* FIXME: Handle more event types. */
 	switch ((le32_to_cpu(event->event_cmd.flags) & TRB_TYPE_BITMASK)) {
 	case TRB_TYPE(TRB_COMPLETION):
--
1.7.0.4
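The ordering requirement can be sketched in userspace C. This is a hedged model only: `rmb()` is stood in for by an acquire fence, and `fake_trb` is an invented stand-in for the real event TRB. The point is that the cycle-bit (validity) check must be ordered before the payload read, or a weakly-ordered CPU may return stale event data for a TRB it correctly identified as valid.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's read barrier. */
#define rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)

struct fake_trb {
        uint32_t flags;         /* bit 0: cycle (validity) bit */
        uint32_t data;          /* event payload */
};

/* Returns -1 if the TRB isn't OS-owned yet, else the payload.  Without
 * the rmb(), the data load could be satisfied before the flags load on a
 * weakly-ordered system. */
static int consume(volatile struct fake_trb *t, uint32_t cycle_state)
{
        if ((t->flags & 1u) != cycle_state)
                return -1;      /* not ours: don't touch the payload */
        rmb();                  /* validity read completes before payload read */
        return (int)t->data;
}
```

On x86 the bug is invisible (loads are not reordered with loads), which is why it only showed up once the driver ran on PPC.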
[PATCH 4/5] xhci: Add an assertion to check for virt_dev=0 bug.
During a plug-unplug stress test on an NEC xHCI card, a null pointer
dereference was observed.  xhci_address_device() dereferenced a null virt_dev
(possibly an erroneous udev->slot_id?); this patch adds a WARN_ON & message
to aid debug if it can be recreated.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 drivers/usb/host/xhci.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 88e6298..7d43456 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -2542,6 +2542,17 @@ int xhci_address_device(struct usb_hcd *hcd, struct usb_device *udev)
 
 	virt_dev = xhci->devs[udev->slot_id];
 
+	if (WARN_ON(!virt_dev)) {
+		/*
+		 * In plug/unplug torture test with an NEC controller,
+		 * a zero-dereference was observed once due to virt_dev = 0.
+		 * Print useful debug rather than crash if it is observed again!
+		 */
+		xhci_warn(xhci, "Virt dev invalid for slot_id 0x%x!\n",
+			  udev->slot_id);
+		return -EINVAL;
+	}
+
 	slot_ctx = xhci_get_slot_ctx(xhci, virt_dev->in_ctx);
 	/*
 	 * If this is the first Set Address since device plug-in or
--
1.7.0.4
[PATCH 5/5] xhci: Remove recursive call to xhci_handle_event
Make the caller loop while there are events to handle, instead.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 drivers/usb/host/xhci-ring.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index b46efd9..97bedd6 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2131,7 +2131,7 @@ cleanup:
  * This function handles all OS-owned events on the event ring.  It may drop
  * xhci->lock between event processing (e.g. to pass up port status changes).
  */
-static void xhci_handle_event(struct xhci_hcd *xhci)
+static int xhci_handle_event(struct xhci_hcd *xhci)
 {
 	union xhci_trb *event;
 	int update_ptrs = 1;
@@ -2140,7 +2140,7 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
 	xhci_dbg(xhci, "In %s\n", __func__);
 	if (!xhci->event_ring || !xhci->event_ring->dequeue) {
 		xhci->error_bitmask |= 1 << 1;
-		return;
+		return 0;
 	}
 
 	event = xhci->event_ring->dequeue;
@@ -2148,7 +2148,7 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
 	if ((le32_to_cpu(event->event_cmd.flags) & TRB_CYCLE) !=
 	    xhci->event_ring->cycle_state) {
 		xhci->error_bitmask |= 1 << 2;
-		return;
+		return 0;
 	}
 	xhci_dbg(xhci, "%s - OS owns TRB\n", __func__);
 	rmb();
@@ -2187,15 +2187,17 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
 	if (xhci->xhc_state & XHCI_STATE_DYING) {
 		xhci_dbg(xhci, "xHCI host dying, returning from "
				"event handler.\n");
-		return;
+		return 0;
 	}
 
 	if (update_ptrs)
 		/* Update SW event ring dequeue pointer */
 		inc_deq(xhci, xhci->event_ring, true);
 
-	/* Are there more items on the event ring? */
-	xhci_handle_event(xhci);
+	/* Are there more items on the event ring?  Caller will call us again to
+	 * check.
+	 */
+	return 1;
 }
 
 /*
@@ -2277,7 +2279,7 @@ hw_died:
 	/* FIXME this should be a delayed service routine
	 * that clears the EHB.
	 */
-	xhci_handle_event(xhci);
+	while (xhci_handle_event(xhci)) {};
 
 	temp_64 = xhci_read_64(xhci, &xhci->ir_set->erst_dequeue);
 	/* If necessary, update the HW's version of the event ring deq ptr. */
--
1.7.0.4
[PATCH] powerpc: Initialise paca-kstack before early_setup_secondary
As early setup calls down to slb_initialize(), we must have kstack initialised
before checking "should we add a bolted SLB entry for our kstack?"

Failing to do so means stack access requires an SLB miss exception to refill
an entry dynamically, if the stack isn't accessible via SLB(0) (kernel text
& static data).  It's not always allowable to take such a miss, and
intermittent crashes will result.

Primary CPUs don't have this issue; an SLB entry is not bolted for their stack
anyway (as that lives within SLB(0)).  This patch therefore only affects the
init of secondaries.

Signed-off-by: Matt Evans m...@ozlabs.org
Cc: stable sta...@kernel.org
---
 arch/powerpc/kernel/head_64.S |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 844a44b..4d6681d 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -572,9 +572,6 @@ __secondary_start:
 	/* Set thread priority to MEDIUM */
 	HMT_MEDIUM
 
-	/* Do early setup for that CPU (stab, slb, hash table pointer) */
-	bl	.early_setup_secondary
-
 	/* Initialize the kernel stack.  Just a repeat for iSeries. */
 	LOAD_REG_ADDR(r3, current_set)
 	sldi	r28,r24,3		/* get current_set[cpu#] */
 	addi	r1,r1,THREAD_SIZE-STACK_FRAME_OVERHEAD
 	std	r1,PACAKSAVE(r13)
 
+	/* Do early setup for that CPU (stab, slb, hash table pointer) */
+	bl	.early_setup_secondary
+
 	/* Clear backchain so we get nice backtraces */
 	li	r7,0
 	mtlr	r7
--
1.7.0.4
Re: [PATCH v2] powerpc/kexec: Fix orphaned offline CPUs across kexec
Michael Neuling wrote:
> In message <4c511216.30...@ozlabs.org> you wrote:
>> When CPU hotplug is used, some CPUs may be offline at the time a kexec is
>> performed.  The subsequent kernel may expect these CPUs to be already
>> running, and will declare them stuck.  On pseries, there's also a
>> soft-offline (cede) state that CPUs may be in; this can also cause
>> problems as the kexeced kernel may ask RTAS if they're online -- and RTAS
>> would say they are.  Again, stuck.
>>
>> This patch kicks each present offline CPU awake before the kexec, so that
>> none are lost to these assumptions in the subsequent kernel.
>
> There are a lot of cleanups in this patch.  The change you are making would
> be a lot clearer without all the additional cleanups in there.  I think I'd
> like to see this as two patches.  One for cleanups and one for the addition
> of wake_offline_cpus().

Okay, I can split this.  Typofixy-add-debug in one, wake_offline_cpus in
another.

> Other than that, I'm not completely convinced this is the functionality we
> want.  Do we really want to online these cpus?  Why were they offlined in
> the first place?  I understand the stuck problem, but is the solution to
> online them, or to change the device tree so that the second kernel doesn't
> detect them as stuck?

Well... there are two cases.  If a CPU is soft-offlined on pseries, it must
be woken from that cede loop (in platforms/pseries/hotplug-cpu.c) as we're
replacing code under its feet.  We could either special-case the wakeup from
this cede loop to get that CPU to RTAS stop-self itself properly.  (Kind of
like a "wake to die".)

So that leaves hard-offline CPUs (perhaps including the above): I don't know
why they might have been offlined.  If it's something serious, like fire,
they'd be removed from the present set too (and thus not be considered in
this restarting case).
We could add a mask to the CPU node to show which of the threads (if any) are
running, and alter the startup code to start everything if this mask doesn't
exist (non-kexec) or only online currently-running threads if the mask is
present.  That feels a little weird.

My reasoning for restarting everything was: the first time you boot, all of
your present CPUs are started up.  When you reboot, any CPUs you offlined for
fun are restarted.  Kexec is (in this non-crash sense) a user-initiated
'quick reboot', so I reasoned that it should look the same as a 'hard reboot'
and your new invocation would have all available CPUs running, as is usual.

Cheers,

Matt
[PATCH 0/2] powerpc/kexec: Fix orphaned offline CPUs, add comments/debug
Separated tidyup & comments & debug away from the fix of restarting offline
available CPUs before waiting for them on kexec.

Matt Evans (2):
  powerpc/kexec: Add to and tidy debug/comments in machine_kexec64.c
  powerpc/kexec: Fix orphaned offline CPUs across kexec

 arch/powerpc/kernel/machine_kexec_64.c |   55 +++++++++++++++++++++++++++---
 1 files changed, 49 insertions(+), 6 deletions(-)
[PATCH 1/2] powerpc/kexec: Add to and tidy debug/comments in machine_kexec64.c
Tidies some typos, KERN_INFO-ises an info msg, and adds a debug msg showing
when the final sequence starts.  Also adds a comment to
kexec_prepare_cpus_wait() to make note of a possible problem; the need for
kexec to deal with CPUs that failed to originally start up.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 arch/powerpc/kernel/machine_kexec_64.c |   29 +++++++++++++++++++++++-----
 1 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 4fbb3be..aa3d5cd 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -15,6 +15,7 @@
 #include <linux/thread_info.h>
 #include <linux/init_task.h>
 #include <linux/errno.h>
+#include <linux/kernel.h>
 
 #include <asm/page.h>
 #include <asm/current.h>
@@ -181,7 +182,20 @@ static void kexec_prepare_cpus_wait(int wait_state)
 	int my_cpu, i, notified=-1;
 
 	my_cpu = get_cpu();
-	/* Make sure each CPU has atleast made it to the state we need */
+	/* Make sure each CPU has at least made it to the state we need.
+	 *
+	 * FIXME: There is a (slim) chance of a problem if not all of the CPUs
+	 * are correctly onlined.  If somehow we start a CPU on boot with RTAS
+	 * start-cpu, but somehow that CPU doesn't write callin_cpu_map[] in
+	 * time, the boot CPU will timeout.  If it does eventually execute
+	 * stuff, the secondary will start up (paca[].cpu_start was written) and
+	 * get into a peculiar state.  If the platform supports
+	 * smp_ops->take_timebase(), the secondary CPU will probably be spinning
+	 * in there.  If not (i.e. pseries), the secondary will continue on and
+	 * try to online itself/idle/etc.  If it survives that, we need to find
+	 * these possible-but-not-online-but-should-be CPUs and chaperone them
+	 * into kexec_smp_wait().
+	 */
 	for_each_online_cpu(i) {
 		if (i == my_cpu)
 			continue;
@@ -189,9 +203,9 @@ static void kexec_prepare_cpus_wait(int wait_state)
 		while (paca[i].kexec_state < wait_state) {
 			barrier();
 			if (i != notified) {
-				printk( "kexec: waiting for cpu %d (physical"
-						"%d) to enter %i state\n",
-					i, paca[i].hw_cpu_id, wait_state);
+				printk(KERN_INFO "kexec: waiting for cpu %d "
+						"(physical %d) to enter %i state\n",
+					i, paca[i].hw_cpu_id, wait_state);
 				notified = i;
 			}
 		}
@@ -215,7 +229,10 @@ static void kexec_prepare_cpus(void)
 	if (ppc_md.kexec_cpu_down)
 		ppc_md.kexec_cpu_down(0, 0);
 
-	/* Before removing MMU mapings make sure all CPUs have entered real mode */
+	/*
+	 * Before removing MMU mappings make sure all CPUs have entered real
+	 * mode:
+	 */
 	kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
 
 	put_cpu();
@@ -284,6 +301,8 @@ void default_machine_kexec(struct kimage *image)
 	if (crashing_cpu == -1)
 		kexec_prepare_cpus();
 
+	pr_debug("kexec: Starting switchover sequence.\n");
+
 	/* switch to a staticly allocated stack.  Based on irq stack code.
	 * XXX: the task struct will likely be invalid once we do the copy!
	 */
--
1.6.3.3
[PATCH 2/2] powerpc/kexec: Fix orphaned offline CPUs across kexec
When CPU hotplug is used, some CPUs may be offline at the time a kexec is
performed.  The subsequent kernel may expect these CPUs to be already running,
and will declare them stuck.  On pseries, there's also a soft-offline (cede)
state that CPUs may be in; this can also cause problems as the kexeced kernel
may ask RTAS if they're online -- and RTAS would say they are.  The CPU will
either appear stuck, or will cause a crash as we replace its cede loop beneath
it.

This patch kicks each present offline CPU awake before the kexec, so that
none are forever lost to these assumptions in the subsequent kernel.

Now, the behaviour is that all available CPUs that were offlined are now
online & usable after the kexec.  This mimics the behaviour of a full reboot
(on which all CPUs will be restarted).

Signed-off-by: Matt Evans m...@ozlabs.org
---
 arch/powerpc/kernel/machine_kexec_64.c |   26 +++++++++++++++++++++++++-
 1 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index aa3d5cd..37f805e 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -16,6 +16,7 @@
 #include <linux/init_task.h>
 #include <linux/errno.h>
 #include <linux/kernel.h>
+#include <linux/cpu.h>
 
 #include <asm/page.h>
 #include <asm/current.h>
@@ -213,9 +214,32 @@ static void kexec_prepare_cpus_wait(int wait_state)
 	mb();
 }
 
-static void kexec_prepare_cpus(void)
+/*
+ * We need to make sure each present CPU is online.  The next kernel will scan
+ * the device tree and assume primary threads are online and query secondary
+ * threads via RTAS to online them if required.  If we don't online primary
+ * threads, they will be stuck.  However, we also online secondary threads as
+ * we may be using 'cede offline'.  In this case RTAS doesn't see the secondary
+ * threads as offline -- and again, these CPUs will be stuck.
+ *
+ * So, we online all CPUs that should be running, including secondary threads.
+ */
+static void wake_offline_cpus(void)
 {
+	int cpu = 0;
+
+	for_each_present_cpu(cpu) {
+		if (!cpu_online(cpu)) {
+			printk(KERN_INFO "kexec: Waking offline cpu %d.\n",
+			       cpu);
+			cpu_up(cpu);
+		}
+	}
+}
+
+static void kexec_prepare_cpus(void)
+{
+	wake_offline_cpus();
 	smp_call_function(kexec_smp_down, NULL, /* wait */0);
 	local_irq_disable();
 	mb(); /* make sure IRQs are disabled before we say they are */
--
1.6.3.3
[PATCH v2] powerpc/kexec: Fix orphaned offline CPUs across kexec
When CPU hotplug is used, some CPUs may be offline at the time a kexec is
performed.  The subsequent kernel may expect these CPUs to be already running,
and will declare them stuck.  On pseries, there's also a soft-offline (cede)
state that CPUs may be in; this can also cause problems as the kexeced kernel
may ask RTAS if they're online -- and RTAS would say they are.  Again, stuck.

This patch kicks each present offline CPU awake before the kexec, so that
none are lost to these assumptions in the subsequent kernel.

Signed-off-by: Matt Evans m...@ozlabs.org
---
v2: Added FIXME comment noting a possible problem with incorrectly started
    secondary CPUs, following feedback from Milton.

 arch/powerpc/kernel/machine_kexec_64.c |   55 +++++++++++++++++++++++++++---
 1 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 4fbb3be..37f805e 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -15,6 +15,8 @@
 #include <linux/thread_info.h>
 #include <linux/init_task.h>
 #include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/cpu.h>
 
 #include <asm/page.h>
 #include <asm/current.h>
@@ -181,7 +183,20 @@ static void kexec_prepare_cpus_wait(int wait_state)
 	int my_cpu, i, notified=-1;
 
 	my_cpu = get_cpu();
-	/* Make sure each CPU has atleast made it to the state we need */
+	/* Make sure each CPU has at least made it to the state we need.
+	 *
+	 * FIXME: There is a (slim) chance of a problem if not all of the CPUs
+	 * are correctly onlined.  If somehow we start a CPU on boot with RTAS
+	 * start-cpu, but somehow that CPU doesn't write callin_cpu_map[] in
+	 * time, the boot CPU will timeout.  If it does eventually execute
+	 * stuff, the secondary will start up (paca[].cpu_start was written) and
+	 * get into a peculiar state.  If the platform supports
+	 * smp_ops->take_timebase(), the secondary CPU will probably be spinning
+	 * in there.  If not (i.e. pseries), the secondary will continue on and
+	 * try to online itself/idle/etc.  If it survives that, we need to find
+	 * these possible-but-not-online-but-should-be CPUs and chaperone them
+	 * into kexec_smp_wait().
+	 */
 	for_each_online_cpu(i) {
 		if (i == my_cpu)
 			continue;
@@ -189,9 +204,9 @@ static void kexec_prepare_cpus_wait(int wait_state)
 		while (paca[i].kexec_state < wait_state) {
 			barrier();
 			if (i != notified) {
-				printk( "kexec: waiting for cpu %d (physical"
-						"%d) to enter %i state\n",
-					i, paca[i].hw_cpu_id, wait_state);
+				printk(KERN_INFO "kexec: waiting for cpu %d "
+						"(physical %d) to enter %i state\n",
+					i, paca[i].hw_cpu_id, wait_state);
 				notified = i;
 			}
 		}
@@ -199,9 +214,32 @@ static void kexec_prepare_cpus_wait(int wait_state)
 	mb();
 }
 
-static void kexec_prepare_cpus(void)
+/*
+ * We need to make sure each present CPU is online.  The next kernel will scan
+ * the device tree and assume primary threads are online and query secondary
+ * threads via RTAS to online them if required.  If we don't online primary
+ * threads, they will be stuck.  However, we also online secondary threads as
+ * we may be using 'cede offline'.  In this case RTAS doesn't see the secondary
+ * threads as offline -- and again, these CPUs will be stuck.
+ *
+ * So, we online all CPUs that should be running, including secondary threads.
+ */
+static void wake_offline_cpus(void)
 {
+	int cpu = 0;
+
+	for_each_present_cpu(cpu) {
+		if (!cpu_online(cpu)) {
+			printk(KERN_INFO "kexec: Waking offline cpu %d.\n",
+			       cpu);
+			cpu_up(cpu);
+		}
+	}
+}
+
+static void kexec_prepare_cpus(void)
+{
+	wake_offline_cpus();
 	smp_call_function(kexec_smp_down, NULL, /* wait */0);
 	local_irq_disable();
 	mb(); /* make sure IRQs are disabled before we say they are */
@@ -215,7 +253,10 @@ static void kexec_prepare_cpus(void)
 	if (ppc_md.kexec_cpu_down)
 		ppc_md.kexec_cpu_down(0, 0);
 
-	/* Before removing MMU mapings make sure all CPUs have entered real mode */
+	/*
+	 * Before removing MMU mappings make sure all CPUs have entered real
+	 * mode:
+	 */
 	kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
 
 	put_cpu();
@@ -284,6 +325,8 @@ void default_machine_kexec(struct kimage *image)
 	if (crashing_cpu == -1)
 		kexec_prepare_cpus();
[PATCH] powerpc/kexec: Fix orphaned offline CPUs across kexec
When CPU hotplug is used, some CPUs may be offline at the time a kexec is
performed.  The subsequent kernel may expect these CPUs to be already running,
and will declare them stuck.  On pseries, there's also a soft-offline (cede)
state that CPUs may be in; this can also cause problems as the kexeced kernel
may ask RTAS if they're online -- and RTAS would say they are.  Again, stuck.

This patch kicks each present offline CPU awake before the kexec, so that
none are lost to these assumptions in the subsequent kernel.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 arch/powerpc/kernel/machine_kexec_64.c |   42 ++++++++++++++++++++++++-----
 1 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 4fbb3be..c2fd914 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -15,6 +15,8 @@
 #include <linux/thread_info.h>
 #include <linux/init_task.h>
 #include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/cpu.h>
 
 #include <asm/page.h>
 #include <asm/current.h>
@@ -181,7 +183,7 @@ static void kexec_prepare_cpus_wait(int wait_state)
 	int my_cpu, i, notified=-1;
 
 	my_cpu = get_cpu();
-	/* Make sure each CPU has atleast made it to the state we need */
+	/* Make sure each CPU has at least made it to the state we need. */
 	for_each_online_cpu(i) {
 		if (i == my_cpu)
 			continue;
@@ -189,9 +191,9 @@ static void kexec_prepare_cpus_wait(int wait_state)
 		while (paca[i].kexec_state < wait_state) {
 			barrier();
 			if (i != notified) {
-				printk( "kexec: waiting for cpu %d (physical"
-						"%d) to enter %i state\n",
-					i, paca[i].hw_cpu_id, wait_state);
+				printk(KERN_INFO "kexec: waiting for cpu %d "
+						"(physical %d) to enter %i state\n",
+					i, paca[i].hw_cpu_id, wait_state);
 				notified = i;
 			}
 		}
@@ -199,9 +201,32 @@ static void kexec_prepare_cpus_wait(int wait_state)
 	mb();
 }
 
-static void kexec_prepare_cpus(void)
+/*
+ * We need to make sure each present CPU is online.  The next kernel will scan
+ * the device tree and assume primary threads are online and query secondary
+ * threads via RTAS to online them if required.  If we don't online primary
+ * threads, they will be stuck.  However, we also online secondary threads as
+ * we may be using 'cede offline'.  In this case RTAS doesn't see the secondary
+ * threads as offline -- and again, these CPUs will be stuck.
+ *
+ * So, we online all CPUs that should be running, including secondary threads.
+ */
+static void wake_offline_cpus(void)
 {
+	int cpu = 0;
+
+	for_each_present_cpu(cpu) {
+		if (!cpu_online(cpu)) {
+			printk(KERN_INFO "kexec: Waking offline cpu %d.\n",
+			       cpu);
+			cpu_up(cpu);
+		}
+	}
+}
+
+static void kexec_prepare_cpus(void)
+{
+	wake_offline_cpus();
 	smp_call_function(kexec_smp_down, NULL, /* wait */0);
 	local_irq_disable();
 	mb(); /* make sure IRQs are disabled before we say they are */
@@ -215,7 +240,10 @@ static void kexec_prepare_cpus(void)
 	if (ppc_md.kexec_cpu_down)
 		ppc_md.kexec_cpu_down(0, 0);
 
-	/* Before removing MMU mapings make sure all CPUs have entered real mode */
+	/*
+	 * Before removing MMU mappings make sure all CPUs have entered real
+	 * mode:
+	 */
 	kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
 
 	put_cpu();
@@ -284,6 +312,8 @@ void default_machine_kexec(struct kimage *image)
 	if (crashing_cpu == -1)
 		kexec_prepare_cpus();
 
+	pr_debug("kexec: Starting switchover sequence.\n");
+
 	/* switch to a staticly allocated stack.  Based on irq stack code.
	 * XXX: the task struct will likely be invalid once we do the copy!
	 */
--
1.6.3.3
[PATCH v2] powerpc/kexec: Switch to a static PACA on the way out
With dynamic PACAs, the kexecing CPU's PACA won't lie within the kernel static
data and there is a chance that something may stomp it when preparing to
kexec.  This patch switches this final CPU to a static PACA just before we
pull the switch.

Signed-off-by: Matt Evans m...@ozlabs.org
---
v2: Changes from Milton's review:
 - Use setup_paca() and move from setup_64.c,
 - SLB cache inval. not required,
 - Adjust 'paca' (oops..), and
 - Poison data_offset/per_cpu_offset

 arch/powerpc/include/asm/paca.h        |    2 +-
 arch/powerpc/kernel/machine_kexec_64.c |   20 ++++++++++++++++++++
 arch/powerpc/kernel/paca.c             |   10 ++++++++++
 arch/powerpc/kernel/setup_64.c         |   10 ----------
 4 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 8ce7963..1ff6662 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -146,7 +146,7 @@ struct paca_struct {
 extern struct paca_struct *paca;
 extern __initdata struct paca_struct boot_paca;
 extern void initialise_paca(struct paca_struct *new_paca, int cpu);
-
+extern void setup_paca(struct paca_struct *new_paca);
 extern void allocate_pacas(void);
 extern void free_unused_pacas(void);
 
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 26f9900..c4d0123 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -273,6 +273,12 @@ static void kexec_prepare_cpus(void)
 
 static union thread_union kexec_stack __init_task_data = {  };
 
+/*
+ * For similar reasons to the stack above, the kexecing CPU needs to be on a
+ * static PACA; we switch to kexec_paca.
+ */
+struct paca_struct kexec_paca;
+
 /* Our assembly helper, in kexec_stub.S */
 extern NORET_TYPE void kexec_sequence(void *newstack, unsigned long start,
 					void *image, void *control,
@@ -300,6 +306,20 @@ void default_machine_kexec(struct kimage *image)
 	kexec_stack.thread_info.task = current_thread_info()->task;
 	kexec_stack.thread_info.flags = 0;
 
+	/* We need a static PACA, too; copy this CPU's PACA over and switch to
+	 * it.  Also poison per_cpu_offset to catch anyone using non-static
+	 * data.
+	 */
+	memcpy(&kexec_paca, get_paca(), sizeof(struct paca_struct));
+	kexec_paca.data_offset = 0xedeaddeadeeeUL;
+	paca = (struct paca_struct *)RELOC_HIDE(&kexec_paca, 0) -
+		kexec_paca.paca_index;
+	setup_paca(&kexec_paca);
+
+	/* XXX: If anyone does 'dynamic lppacas' this will also need to be
+	 * switched to a static version!
+	 */
+
 	/* Some things are best done in assembly.  Finding globals with
	 * a toc is easier in C, so pass in what we can.
	 */
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index f88acf0..3db8d64 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -105,6 +105,16 @@ void __init initialise_paca(struct paca_struct *new_paca, int cpu)
 #endif /* CONFIG_PPC_STD_MMU_64 */
 }
 
+/* Put the paca pointer into r13 and SPRG_PACA */
+void setup_paca(struct paca_struct *new_paca)
+{
+	local_paca = new_paca;
+	mtspr(SPRN_SPRG_PACA, local_paca);
+#ifdef CONFIG_PPC_BOOK3E
+	mtspr(SPRN_SPRG_TLB_EXFRAME, local_paca->extlb);
+#endif
+}
+
 static int __initdata paca_size;
 
 void __init allocate_pacas(void)
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index f3fb5a7..6efbed4 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -142,16 +142,6 @@ early_param("smt-enabled", early_smt_enabled);
 #define check_smt_enabled()
 #endif /* CONFIG_SMP */
 
-/* Put the paca pointer into r13 and SPRG_PACA */
-static void __init setup_paca(struct paca_struct *new_paca)
-{
-	local_paca = new_paca;
-	mtspr(SPRN_SPRG_PACA, local_paca);
-#ifdef CONFIG_PPC_BOOK3E
-	mtspr(SPRN_SPRG_TLB_EXFRAME, local_paca->extlb);
-#endif
-}
-
 /*
  * Early initialization entry point. This is called by head.S
  * with MMU translation disabled. We rely on the feature of
--
1.6.3.3
[PATCH] kexec/ppc64: Switch to a static PACA on the way out
With dynamic PACAs, the kexecing CPU's PACA won't lie within the kernel static
data and there is a chance that something may stomp it when preparing to
kexec.  This patch switches this final CPU to a static PACA just before we
pull the switch.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 arch/powerpc/kernel/machine_kexec_64.c |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 040bd1d..1348e84 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -253,6 +253,12 @@ static void kexec_prepare_cpus(void)
 
 static union thread_union kexec_stack __init_task_data = {  };
 
+/*
+ * For similar reasons to the stack above, the kexecing CPU needs to be on a
+ * static PACA; we switch to kexec_paca.
+ */
+struct paca_struct kexec_paca;
+
 /* Our assembly helper, in kexec_stub.S */
 extern NORET_TYPE void kexec_sequence(void *newstack, unsigned long start,
 					void *image, void *control,
@@ -280,6 +286,19 @@ void default_machine_kexec(struct kimage *image)
 	kexec_stack.thread_info.task = current_thread_info()->task;
 	kexec_stack.thread_info.flags = 0;
 
+	/* We need a static PACA, too; copy this CPU's PACA over and switch to
+	 * it.  Also make the SLB cache look invalid as it may have been touched
+	 * by an IRQ before we switch to it.
+	 */
+	memcpy(&kexec_paca, get_paca(), sizeof(struct paca_struct));
+	kexec_paca.slb_cache_ptr = SLB_CACHE_ENTRIES+1;
+	/* 'local_paca' boils down to GPR13 */
+	local_paca = &kexec_paca;
+
+	/* XXX: If anyone does 'dynamic lppacas' this will also need to be
+	 * switched to a static version!
+	 */
+
 	/* Some things are best done in assembly.  Finding globals with
	 * a toc is easier in C, so pass in what we can.
	 */
--
1.6.3.3
[PATCH] perf_event: Fix for power_pmu_disable()
When power_pmu_disable() removes the given event from a particular index into
cpuhw->event[], it shuffles down higher event[] entries.  But, this array is
paired with cpuhw->events[] and cpuhw->flags[] so should shuffle them
similarly.  If these arrays get out of sync, code such as
power_check_constraints() will fail.  This caused a bug where events were
temporarily disabled and then failed to be re-enabled; subsequent code tried
to write_pmc() with its (disabled) idx of 0, causing a message "oops trying
to write PMC0".

This triggers this bug on POWER7, running a miss-heavy test:

  perf record -e L1-dcache-load-misses -e L1-dcache-store-misses ./misstest

Signed-off-by: Matt Evans m...@ozlabs.org
---
 arch/powerpc/kernel/perf_event.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/perf_event.c b/arch/powerpc/kernel/perf_event.c
index 08460a2..3766398 100644
--- a/arch/powerpc/kernel/perf_event.c
+++ b/arch/powerpc/kernel/perf_event.c
@@ -838,8 +838,11 @@ static void power_pmu_disable(struct perf_event *event)
 	cpuhw = &__get_cpu_var(cpu_hw_events);
 	for (i = 0; i < cpuhw->n_events; ++i) {
 		if (event == cpuhw->event[i]) {
-			while (++i < cpuhw->n_events)
+			while (++i < cpuhw->n_events) {
 				cpuhw->event[i-1] = cpuhw->event[i];
+				cpuhw->events[i-1] = cpuhw->events[i];
+				cpuhw->flags[i-1] = cpuhw->flags[i];
+			}
 			--cpuhw->n_events;
 			ppmu->disable_pmc(event->hw.idx - 1, cpuhw->mmcr);
 			if (event->hw.idx) {
--
1.6.3.3
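The underlying invariant is easy to model outside the kernel: `event[]`, `events[]` and `flags[]` are parallel arrays indexed together, so deleting entry i must shift all three in lockstep. A standalone sketch (struct and field names are simplified stand-ins for `cpu_hw_events`, and `int` replaces the real pointer/value types):

```c
/* Simplified stand-in for struct cpu_hw_events: three parallel arrays
 * that must stay index-aligned. */
struct fake_cpuhw {
        int n_events;
        int event[8];           /* stand-in for struct perf_event *event[] */
        unsigned long events[8];
        unsigned int flags[8];
};

/* Delete 'victim' from event[], shifting the paired arrays identically --
 * the same shape as the fixed loop in power_pmu_disable(). */
static void remove_event(struct fake_cpuhw *cpuhw, int victim)
{
        int i;

        for (i = 0; i < cpuhw->n_events; ++i) {
                if (cpuhw->event[i] != victim)
                        continue;
                while (++i < cpuhw->n_events) {
                        cpuhw->event[i - 1] = cpuhw->event[i];
                        cpuhw->events[i - 1] = cpuhw->events[i];
                        cpuhw->flags[i - 1] = cpuhw->flags[i];
                }
                --cpuhw->n_events;
                return;
        }
}
```

The original bug was exactly the version of this loop that shifted only `event[]`, leaving `events[]`/`flags[]` describing the wrong entries.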
Re: [PATCH 18/26] KVM: PPC: KVM PV guest stubs
Howdy Alex!

Alexander Graf wrote:
> We will soon start and replace instructions from the text section with
> other, paravirtualized versions. To ease the readability of those
> patches I split out the generic looping and magic page mapping code out.
>
> This patch still only contains stubs. But at least it loops through the
> text section :).
>
> Signed-off-by: Alexander Graf ag...@suse.de
> ---
>  arch/powerpc/kernel/kvm.c | 59 +++++++++++++++++++++++++++++++++++++++
>  1 files changed, 59 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
> index 2d8dd73..d873bc6 100644
> --- a/arch/powerpc/kernel/kvm.c
> +++ b/arch/powerpc/kernel/kvm.c
> @@ -32,3 +32,62 @@
>  #define KVM_MAGIC_PAGE (-4096L)
>  #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x)
>
> +static bool kvm_patching_worked = true;
> +
> +static void kvm_map_magic_page(void *data)
> +{
> +	kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
> +		       KVM_MAGIC_PAGE,  /* Physical Address */
> +		       KVM_MAGIC_PAGE); /* Effective Address */
> +}
> +
> +static void kvm_check_ins(u32 *inst)
> +{
> +	u32 _inst = *inst;
> +	u32 inst_no_rt = _inst & ~KVM_MASK_RT;
> +	u32 inst_rt = _inst & KVM_MASK_RT;
> +
> +	switch (inst_no_rt) {
> +	}
> +
> +	switch (_inst) {
> +	}
> +
> +	flush_icache_range((ulong)inst, (ulong)inst + 4);
> +}
> +
> +static void kvm_use_magic_page(void)
> +{
> +	u32 *p;
> +	u32 *start, *end;
> +
> +	/* Tell the host to map the magic page to -4096 on all CPUs */
> +	on_each_cpu(kvm_map_magic_page, NULL, 1);
> +
> +	/* Now loop through all code and find instructions */
> +	start = (void *)_stext;
> +	end = (void *)_etext;
> +
> +	for (p = start; p < end; p++)
> +		kvm_check_ins(p);
> +}

Could you do something similar in module_finalize() to patch loaded
modules' .text sections?

> +
> +static int __init kvm_guest_init(void)
> +{
> +	char *p;
> +
> +	if (!kvm_para_available())
> +		return 0;
> +
> +	if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE))
> +		kvm_use_magic_page();
> +
> +	printk(KERN_INFO "KVM: Live patching for a fast VM %s\n",
> +	       kvm_patching_worked ? "worked" : "failed");
> +
> +	return 0;
> +}
> +
> +postcore_initcall(kvm_guest_init);

Cheers,

Matt
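The patch being reviewed splits each candidate instruction word into an
"opcode without RT" part and the RT register field, so one `switch` can match
an instruction regardless of which register it targets. A standalone sketch of
that masking (the `KVM_MASK_RT` value 0x03e00000 assumes RT in bits 21-25 of
the big-endian PPC encoding; the sample word is the encoding of `mfmsr r5`):

```c
#include <assert.h>
#include <stdint.h>

/* RT field of a PPC D/X-form instruction: bits 21-25 (from the LSB). */
#define KVM_MASK_RT 0x03e00000u

/* Split an instruction word the way kvm_check_ins() does: the part
 * matched against known opcodes, and the RT register it operates on. */
uint32_t inst_no_rt(uint32_t inst)
{
	return inst & ~KVM_MASK_RT;
}

uint32_t inst_rt(uint32_t inst)
{
	return inst & KVM_MASK_RT;
}
```

Matching on `inst_no_rt` lets a single case handle, say, `mfmsr rN` for every
N, with `inst_rt` telling the patcher which register to substitute into the
paravirtualized replacement.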
[PATCH] kexec, ppc64: Wait for online/possible CPUs only.
kexec_prepare_cpus_wait() iterates i through NR_CPUS to check
paca[i].kexec_state of each to make sure they have quiesced. However, now
that we have dynamic PACA allocation, paca[NR_CPUS] is not necessarily
valid and we overrun the array; spurious "cpu is not possible, ignoring"
errors result. This patch iterates with for_each_online_cpu(), which
stays within the bounds of paca[] -- and every online CPU is 'possible'.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 arch/powerpc/kernel/machine_kexec_64.c | 18 +-----------------
 1 files changed, 1 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 26f9900..ed31a29 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -182,28 +182,12 @@ static void kexec_prepare_cpus_wait(int wait_state)
 	my_cpu = get_cpu();
 
 	/* Make sure each CPU has atleast made it to the state we need */
-	for (i = 0; i < NR_CPUS; i++) {
+	for_each_online_cpu(i) {
 		if (i == my_cpu)
 			continue;
 
 		while (paca[i].kexec_state < wait_state) {
 			barrier();
-			if (!cpu_possible(i)) {
-				printk("kexec: cpu %d hw_cpu_id %d is not"
-				       " possible, ignoring\n",
-				       i, paca[i].hw_cpu_id);
-				break;
-			}
-			if (!cpu_online(i)) {
-				/* Fixme: this can be spinning in
-				 * pSeries_secondary_wait with a paca
-				 * waiting for it to go online.
-				 */
-				printk("kexec: cpu %d hw_cpu_id %d is not"
-				       " online, ignoring\n",
-				       i, paca[i].hw_cpu_id);
-				break;
-			}
 			if (i != notified) {
 				printk("kexec: waiting for cpu %d (physical"
 				       " %d) to enter %i state\n",
--
1.6.3.3
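The underlying rule — iterate only over slots known to be populated, never to
the array's declared capacity — is worth a tiny illustration. This sketch is
hypothetical (fixed array, `int` states), not the kernel's paca layout:

```c
#include <assert.h>

#define NR_SLOTS 8

/* Hypothetical per-CPU state array: only the first 'nr_online' slots
 * are backed by valid data, mirroring dynamically allocated pacas. */
static int kexec_state[NR_SLOTS];

/* Count CPUs that have reached 'wait_state'. Looping i < NR_SLOTS here
 * would read slots that were never initialised -- the overrun the patch
 * fixes by switching to for_each_online_cpu(). */
int cpus_ready(int nr_online, int wait_state)
{
	int i, ready = 0;

	for (i = 0; i < nr_online; i++)
		if (kexec_state[i] >= wait_state)
			ready++;
	return ready;
}
```

In the kernel the valid-slot set comes from the online-CPU mask rather than a
simple count, but the bounds discipline is the same.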
Re: [PATCH] powerpc: Emulate most Book I instructions in emulate_step()
Paul Mackerras wrote:
[snip]
> The second alternative -- emulating the lwarx/stwcx and all the
> instructions in between -- sounds complicated but turns out to be
> pretty straightforward in fact, since the code for each instruction is
> pretty small, easy to verify that it's correct, and has little
> interaction with other code.

Easy to verify -- visually or logically? Having had a little experience
with interpreters 'invisibly' operating behind the scenes, I am all for
very rigorous testing of these things. I have lost at least four of my
nine lives to incorrect flag values, odd data problems, hideous
heisenbugs etc. in such interpreters. Looked at another way, you'd be
surprised how much one can break in an interpreter and still
successfully run various programs.

Presumably your first pass is completely correct already, but I'm
thinking that if any future changes are made to it, it would be good to
include test code/modes alongside the interpreter so others can check
alterations. E.g. include the "run user program interpreted" test switch
patch, or even better, compare the interpreted state to real hardware
execution. There are other, more directed test strategies (e.g.
handwritten tests, random code) but these would be a good start.

Cheers,

Matt
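The "compare the interpreted state to real hardware execution" suggestion is
differential testing: feed the same operands to the emulated path and the
native path and assert the results (including flags) agree. A toy sketch —
the `emulated_add` here is a made-up one-instruction interpreter, not
emulate_step() itself:

```c
#include <assert.h>
#include <stdint.h>

/* Toy "emulated" 32-bit add with carry-out, standing in for one case
 * of an instruction interpreter. */
uint32_t emulated_add(uint32_t a, uint32_t b, int *carry)
{
	uint64_t wide = (uint64_t)a + b;

	*carry = wide > 0xffffffffu;
	return (uint32_t)wide;
}

/* Differential check: run the same operands through the emulated path
 * and the native path, then compare both the result and the flag. */
int agrees_with_native(uint32_t a, uint32_t b)
{
	int carry;
	uint32_t emu = emulated_add(a, b, &carry);
	uint32_t native = a + b;	/* the hardware does the add */
	int native_carry = native < a;	/* unsigned wrap detection */

	return emu == native && carry == native_carry;
}
```

Driven with random operands, a harness like this catches exactly the
incorrect-flag and odd-data bugs described above, because disagreement is
detected per-instruction rather than only when a whole program misbehaves.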
[PATCH v2] kexec-tools, ppc64: Fix segfault parsing DR memory property
add_dyn_reconf_usable_mem_property() iterates over memory spans in
/ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory and intersects
these with usablemem_rgns ranges. In doing so it used an unchecked
fixed-size array, which will overrun on machines with lots of LMBs.

This patch removes the fixed-size arrays from
add_dyn_reconf_usable_mem_property() and add_usable_mem_property(), in
lieu of malloc/realloc/free.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 kexec/arch/ppc64/fs2dt.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/kexec/arch/ppc64/fs2dt.c b/kexec/arch/ppc64/fs2dt.c
index 762bf04..4400f13 100644
--- a/kexec/arch/ppc64/fs2dt.c
+++ b/kexec/arch/ppc64/fs2dt.c
@@ -37,7 +37,7 @@
 #define NAMESPACE 16384		/* max bytes for property names */
 #define INIT_TREE_WORDS 65536	/* Initial num words for prop values */
 #define MEMRESERVE 256		/* max number of reserved memory blocks */
-#define MAX_MEMORY_RANGES 1024
+#define MEM_RANGE_CHUNK_SZ 2048	/* Initial num dwords for mem ranges */
 
 static char pathname[MAXPATH], *pathstart;
 static char propnames[NAMESPACE] = { 0 };
@@ -148,7 +148,8 @@ static void add_dyn_reconf_usable_mem_property(int fd)
 {
 	char fname[MAXPATH], *bname;
 	uint64_t buf[32];
-	uint64_t ranges[2*MAX_MEMORY_RANGES];
+	uint64_t *ranges;
+	int ranges_size = MEM_RANGE_CHUNK_SZ;
 	uint64_t base, end, loc_base, loc_end;
 	size_t i, rngs_cnt, range;
 	int rlen = 0;
@@ -165,6 +166,11 @@ static void add_dyn_reconf_usable_mem_property(int fd)
 		die("unrecoverable error: error seeking in \"%s\": %s\n",
 		    pathname, strerror(errno));
 
+	ranges = malloc(ranges_size*8);
+	if (!ranges)
+		die("unrecoverable error: can't alloc %d bytes for ranges.\n",
+		    ranges_size*8);
+
 	rlen = 0;
 	for (i = 0; i < num_of_lmbs; i++) {
 		if (read(fd, buf, 24) < 0)
@@ -180,24 +186,57 @@ static void add_dyn_reconf_usable_mem_property(int fd)
 		rngs_cnt = 0;
 		for (range = 0; range < usablemem_rgns.size; range++) {
+			int add = 0;
 			loc_base = usablemem_rgns.ranges[range].start;
 			loc_end = usablemem_rgns.ranges[range].end;
 			if (loc_base >= base && loc_end <= end) {
-				ranges[rlen++] = loc_base;
-				ranges[rlen++] = loc_end - loc_base;
-				rngs_cnt++;
+				add = 1;
 			} else if (base < loc_end && end > loc_base) {
 				if (loc_base < base)
 					loc_base = base;
 				if (loc_end > end)
 					loc_end = end;
+				add = 1;
+			}
+
+			if (add) {
+				if (rlen >= (ranges_size-2)) {
+					ranges_size += MEM_RANGE_CHUNK_SZ;
+					ranges = realloc(ranges, ranges_size*8);
+					if (!ranges)
+						die("unrecoverable error: can't"
+						    " realloc %d bytes for"
+						    " ranges.\n",
+						    ranges_size*8);
+				}
 				ranges[rlen++] = loc_base;
 				ranges[rlen++] = loc_end - loc_base;
 				rngs_cnt++;
 			}
 		}
-		/* Store the count of (base, size) duple */
-		ranges[tmp_indx] = rngs_cnt;
+		if (rngs_cnt == 0) {
+			/* We still need to add a counter for every LMB because
+			 * the kernel parsing code is dumb. We just have
+			 * a zero in this case, with no following base/len.
+			 */
+			ranges[tmp_indx] = 0;
+			/* rlen is already just tmp_indx+1 as we didn't write
+			 * anything. Check array size here, as we'll probably
+			 * go on for a while writing zeros now.
+			 */
+			if (rlen >= (ranges_size-1)) {
+				ranges_size += MEM_RANGE_CHUNK_SZ;
+				ranges = realloc(ranges, ranges_size*8);
+				if (!ranges)
+					die("unrecoverable error: can't"
+					    " realloc %d bytes for"
+					    " ranges.\n
[PATCH] kexec-tools, ppc64: Fix segfault parsing DR memory property
add_dyn_reconf_usable_mem_property() iterates over memory spans in
/ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory and intersects
these with usablemem_rgns ranges. Not only did it seem to write null
properties for every range that didn't match, but it used an unchecked
fixed-size array which will overrun on machines with lots of LMBs.

This patch stops add_dyn_reconf_usable_mem_property() from adding null
ranges to the linux,drconf-usable-memory property and removes its
fixed-size array, as well as the array in add_usable_mem_property(), in
lieu of malloc/realloc/free.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 kexec/arch/ppc64/fs2dt.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 54 insertions(+), 12 deletions(-)

diff --git a/kexec/arch/ppc64/fs2dt.c b/kexec/arch/ppc64/fs2dt.c
index 762bf04..7470132 100644
--- a/kexec/arch/ppc64/fs2dt.c
+++ b/kexec/arch/ppc64/fs2dt.c
@@ -37,7 +37,7 @@
 #define NAMESPACE 16384		/* max bytes for property names */
 #define INIT_TREE_WORDS 65536	/* Initial num words for prop values */
 #define MEMRESERVE 256		/* max number of reserved memory blocks */
-#define MAX_MEMORY_RANGES 1024
+#define MEM_RANGE_CHUNK_SZ 2048	/* Initial num dwords for mem ranges */
 
 static char pathname[MAXPATH], *pathstart;
 static char propnames[NAMESPACE] = { 0 };
@@ -148,7 +148,8 @@ static void add_dyn_reconf_usable_mem_property(int fd)
 {
 	char fname[MAXPATH], *bname;
 	uint64_t buf[32];
-	uint64_t ranges[2*MAX_MEMORY_RANGES];
+	uint64_t *ranges;
+	int ranges_size = MEM_RANGE_CHUNK_SZ;
 	uint64_t base, end, loc_base, loc_end;
 	size_t i, rngs_cnt, range;
 	int rlen = 0;
@@ -165,6 +166,11 @@ static void add_dyn_reconf_usable_mem_property(int fd)
 		die("unrecoverable error: error seeking in \"%s\": %s\n",
 		    pathname, strerror(errno));
 
+	ranges = malloc(ranges_size*8);
+	if (!ranges)
+		die("unrecoverable error: can't alloc %d bytes for ranges.\n",
+		    ranges_size*8);
+
 	rlen = 0;
 	for (i = 0; i < num_of_lmbs; i++) {
 		if (read(fd, buf, 24) < 0)
@@ -180,24 +186,41 @@ static void add_dyn_reconf_usable_mem_property(int fd)
 		rngs_cnt = 0;
 		for (range = 0; range < usablemem_rgns.size; range++) {
+			int add = 0;
 			loc_base = usablemem_rgns.ranges[range].start;
 			loc_end = usablemem_rgns.ranges[range].end;
 			if (loc_base >= base && loc_end <= end) {
-				ranges[rlen++] = loc_base;
-				ranges[rlen++] = loc_end - loc_base;
-				rngs_cnt++;
+				add = 1;
 			} else if (base < loc_end && end > loc_base) {
 				if (loc_base < base)
 					loc_base = base;
 				if (loc_end > end)
 					loc_end = end;
+				add = 1;
+			}
+
+			if (add) {
+				if (rlen >= (ranges_size-2)) {
+					ranges_size += MEM_RANGE_CHUNK_SZ;
+					ranges = realloc(ranges, ranges_size*8);
+					if (!ranges)
+						die("unrecoverable error: can't"
+						    " realloc %d bytes for"
+						    " ranges.\n",
+						    ranges_size*8);
+				}
 				ranges[rlen++] = loc_base;
 				ranges[rlen++] = loc_end - loc_base;
 				rngs_cnt++;
 			}
 		}
-		/* Store the count of (base, size) duple */
-		ranges[tmp_indx] = rngs_cnt;
+		if (rngs_cnt == 0) {
+			/* Don't store anything for unwritten iterations! */
+			rlen = tmp_indx;
+		} else {
+			/* Store the count of (base, size) duple */
+			ranges[tmp_indx] = rngs_cnt;
+		}
 	}
 
 	rlen = rlen * sizeof(uint64_t);
@@ -210,7 +233,8 @@ static void add_dyn_reconf_usable_mem_property(int fd)
 	*dt++ = propnum("linux,drconf-usable-memory");
 	if ((rlen >= 8) && ((unsigned long)dt & 0x4))
 		dt++;
-	memcpy(dt, ranges, rlen);
+	memcpy(dt, ranges, rlen);
+	free(ranges);
 	dt += (rlen + 3)/4;
 }
 
@@ -218,7 +242,8 @@ static void add_usable_mem_property(int fd, size_t len)
 {
 	char fname[MAXPATH], *bname;
 	uint64_t buf[2];
-	uint64_t ranges[2