Re: [PATCH] KVM: PPC: Book3S PR: Enable MSR_DR for switch_mmu_context()

2022-05-10 Thread Matt Evans
Hi Alex,

> On 9 May 2022, at 21:23, Alexander Graf  wrote:
> 
> Commit 863771a28e27 ("powerpc/32s: Convert switch_mmu_context() to C")
> moved the switch_mmu_context() to C. While in principle a good idea, it
> meant that the function now uses the stack. The stack is not accessible
> from real mode though.
> 
> So to keep calling the function, let's turn on MSR_DR while we call it.
> That way, all pointer references to the stack are handled virtually.
> 
> Reported-by: Matt Evans 
> Fixes: 863771a28e27 ("powerpc/32s: Convert switch_mmu_context() to C")
> Signed-off-by: Alexander Graf 
> Cc: sta...@vger.kernel.org

Many thanks - this addresses the issue I saw, and has been...

Tested-by: Matt Evans 

...on a G4 host.  One comment though:

> ---
> arch/powerpc/kvm/book3s_32_sr.S | 20 +++-
> 1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_32_sr.S b/arch/powerpc/kvm/book3s_32_sr.S
> index e3ab9df6cf19..bd4f798f7a46 100644
> --- a/arch/powerpc/kvm/book3s_32_sr.S
> +++ b/arch/powerpc/kvm/book3s_32_sr.S
> @@ -122,11 +122,21 @@
> 
>   /* 0x0 - 0xb */
> 
> - /* 'current->mm' needs to be in r4 */
> - tophys(r4, r2)
> - lwz r4, MM(r4)
> - tophys(r4, r4)
> - /* This only clobbers r0, r3, r4 and r5 */
> + /* switch_mmu_context() needs paging, let's enable it */
> + mfmsr   r9
> + ori r11, r9, MSR_DR
> + mtmsr   r11
> + sync
> +
> + /* Calling switch_mmu_context(, current->mm, ); */
> + lwz r4, MM(r2)
>   bl  switch_mmu_context

Of the volatile registers, I believe r12 is still live here and would need 
to be preserved.
(I can’t spot any others, but would defer to your judgement here.)

For example:

diff --git a/arch/powerpc/kvm/book3s_32_sr.S b/arch/powerpc/kvm/book3s_32_sr.S
index e3ab9df6cf19..41fc9ca12d38 100644
--- a/arch/powerpc/kvm/book3s_32_sr.S
+++ b/arch/powerpc/kvm/book3s_32_sr.S
@@ -122,11 +122,23 @@
 
/* 0x0 - 0xb */
 
-   /* 'current->mm' needs to be in r4 */
-   tophys(r4, r2)
-   lwz r4, MM(r4)
-   tophys(r4, r4)
-   /* This only clobbers r0, r3, r4 and r5 */
+   /* switch_mmu_context() needs paging, let's enable it */
+   mfmsr   r9
+   ori r11, r9, MSR_DR
+   mtmsr   r11
+   sync
+
+   SAVE_GPR(12, r1)
+   /* Calling switch_mmu_context(, current->mm, ); */
+   lwz r4, MM(r2)
bl  switch_mmu_context
+   REST_GPR(12, r1)
+
+   /* Disable paging again */
+   mfmsr   r9
+   li  r6, MSR_DR
+   andc    r9, r9, r6
+   mtmsr   r9
+   sync
 
 .endm


Matt

> 
> + /* Disable paging again */
> + mfmsr   r9
> + li  r6, MSR_DR
> + andc    r9, r9, r6
> + mtmsr   r9
> + sync
> +
> .endm
> -- 
> 2.28.0.394.ge197136389
> 
> 
> 
> 



Re: [PATCH] powerpc: Clear branch trap (MSR.BE) before delivering SIGTRAP

2018-03-29 Thread Matt Evans
Hi Michael,

> On 28 Mar 2018, at 11:36, Matt Evans <m...@ozlabs.org> wrote:
> 
> Howdy Michael,
> 
>> On 28 Mar 2018, at 06:54, Michael Ellerman <m...@ellerman.id.au> wrote:
>> 
>> Matt Evans <m...@ozlabs.org> writes:
>> 
>>> When using SIG_DBG_BRANCH_TRACING, MSR.BE is left enabled in the
>>> user context when single_step_exception() prepares the SIGTRAP
>>> delivery.  The resulting branch-trap-within-the-SIGTRAP-handler
>>> isn't healthy.
>>> 
>>> Commit 2538c2d08f46141550a1e68819efa8fe31c6e3dc broke this, by
>>> replacing an MSR mask operation of ~(MSR_SE | MSR_BE) with a call
>>> to clear_single_step() which only clears MSR_SE.
>>> 
>>> This patch adds a new helper, clear_br_trace(), which clears the
>>> debug trap before invoking the signal handler.  This helper is a
>>> NOP for BookE as SIG_DBG_BRANCH_TRACING isn't supported on BookE.
>>> 
>>> Signed-off-by: Matt Evans <m...@ozlabs.org>
>> 
>> Hi Matt!
>> 
>> It seems we might not be regularly testing this code :}
> 
> I know, rite? ;-)
> 
>> How did you hit/find the bug? And do you have a test case by any chance?
>> 
>> I found the test code at the bottom of:
>> https://lwn.net/Articles/114587/
>> 
>> But it didn't immediately work.
> 
> I'm using this feature as part of a debug harness I wrote to log a program’s 
> control flow (to create a “known good” pattern to compare a PPC interpreter 
> against).  So at least the feature has /one/ user.  ;-)
> 
> The symptoms of the bug are that if you use single-stepping you get a 
> sequence of SIGTRAPs representing each instruction completion (good), but if 
> you use branch tracing the process just dies with SIGTRAP (looks like it’s 
> never caught by the signal handler).  What’s really happening is that there 
> /is/ a signal delivered to the handler, but (because branch tracing is left 
> on) that then causes a second debug exception from the handler itself, i.e. 
> whilst SIGTRAP’s masked.
> 
> OK, let me have a dig to reduce my program to something very basic and I’ll 
> post something — sorry, I should’ve got a PoC ready before.  (I did start out 
> inspired by that post you linked to, but IIRC I don’t think it worked out of 
> the box for me either.)


I’ve put a simple SIG_DBG_BRANCH_TRACING test program here:

http://ozlabs.org/~matt/files/sig_dbg_brtrace_test.c

It’s commented regarding expected output.  I’ve only tested this on a G4 — it 
should work on PPC64 too but the ISA says support for branch tracing is 
optional for an implementation.  I’d be interested in what POWERx does.  :)
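For anyone poking at this without the full harness, a minimal sketch of the
handler-side check (not the linked test program; it assumes the 64-bit glibc
ucontext layout with uc_mcontext.gp_regs and the PT_NIP/PT_MSR register
indices, and elides the SIG_DBG_BRANCH_TRACING setup itself):

#include <signal.h>
#include <stdio.h>
#include <ucontext.h>

#define GP_NIP	32		/* gp_regs index of NIP, as per PT_NIP */
#define GP_MSR	33		/* gp_regs index of MSR, as per PT_MSR */
#define MSR_BE	0x200UL		/* branch trace enable bit */

static void trap_handler(int sig, siginfo_t *si, void *ucv)
{
	ucontext_t *uc = ucv;
	unsigned long msr = uc->uc_mcontext.gp_regs[GP_MSR];

	(void)sig; (void)si;
	/* printf isn't async-signal-safe, but it's fine for a throwaway test.
	 * Before the fix, MSR_BE is still set here, so the handler itself
	 * branch-traps while SIGTRAP is masked and the process dies.
	 */
	printf("SIGTRAP at 0x%lx, MSR_BE %s\n",
	       uc->uc_mcontext.gp_regs[GP_NIP],
	       (msr & MSR_BE) ? "still set" : "clear");
}

int main(void)
{
	struct sigaction sa = { .sa_sigaction = trap_handler,
				.sa_flags = SA_SIGINFO };

	sigaction(SIGTRAP, &sa, NULL);
	/* ... enable SIG_DBG_BRANCH_TRACING here, then run the code to trace ... */
	return 0;
}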


Cheers,


Matt






Re: [PATCH] powerpc: Clear branch trap (MSR.BE) before delivering SIGTRAP

2018-03-28 Thread Matt Evans
Howdy Michael,

> On 28 Mar 2018, at 06:54, Michael Ellerman <m...@ellerman.id.au> wrote:
> 
> Matt Evans <m...@ozlabs.org> writes:
> 
>> When using SIG_DBG_BRANCH_TRACING, MSR.BE is left enabled in the
>> user context when single_step_exception() prepares the SIGTRAP
>> delivery.  The resulting branch-trap-within-the-SIGTRAP-handler
>> isn't healthy.
>> 
>> Commit 2538c2d08f46141550a1e68819efa8fe31c6e3dc broke this, by
>> replacing an MSR mask operation of ~(MSR_SE | MSR_BE) with a call
>> to clear_single_step() which only clears MSR_SE.
>> 
>> This patch adds a new helper, clear_br_trace(), which clears the
>> debug trap before invoking the signal handler.  This helper is a
>> NOP for BookE as SIG_DBG_BRANCH_TRACING isn't supported on BookE.
>> 
>> Signed-off-by: Matt Evans <m...@ozlabs.org>
> 
> Hi Matt!
> 
> It seems we might not be regularly testing this code :}

I know, rite? ;-)

> How did you hit/find the bug? And do you have a test case by any chance?
> 
> I found the test code at the bottom of:
>  https://lwn.net/Articles/114587/
> 
> But it didn't immediately work.

I'm using this feature as part of a debug harness I wrote to log a program’s 
control flow (to create a “known good” pattern to compare a PPC interpreter 
against).  So at least the feature has /one/ user.  ;-)

The symptoms of the bug are that if you use single-stepping you get a sequence 
of SIGTRAPs representing each instruction completion (good), but if you use 
branch tracing the process just dies with SIGTRAP (looks like it’s never caught 
by the signal handler).  What’s really happening is that there /is/ a signal 
delivered to the handler, but (because branch tracing is left on) that then 
causes a second debug exception from the handler itself, i.e. whilst SIGTRAP’s 
masked.

OK, let me have a dig to reduce my program to something very basic and I’ll 
post something — sorry, I should’ve got a PoC ready before.  (I did start out 
inspired by that post you linked to, but IIRC I don’t think it worked out of 
the box for me either.)


Cheers,


Matt



[PATCH] powerpc: Clear branch trap (MSR.BE) before delivering SIGTRAP

2018-03-26 Thread Matt Evans
When using SIG_DBG_BRANCH_TRACING, MSR.BE is left enabled in the
user context when single_step_exception() prepares the SIGTRAP
delivery.  The resulting branch-trap-within-the-SIGTRAP-handler
isn't healthy.

Commit 2538c2d08f46141550a1e68819efa8fe31c6e3dc broke this, by
replacing an MSR mask operation of ~(MSR_SE | MSR_BE) with a call
to clear_single_step() which only clears MSR_SE.

This patch adds a new helper, clear_br_trace(), which clears the
debug trap before invoking the signal handler.  This helper is a
NOP for BookE as SIG_DBG_BRANCH_TRACING isn't supported on BookE.

Signed-off-by: Matt Evans <m...@ozlabs.org>
---
 arch/powerpc/kernel/traps.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 1e48d157196a..5eaab234e747 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -460,7 +460,7 @@ static inline int check_io_access(struct pt_regs *regs)
 /* single-step stuff */
 #define single_stepping(regs)  (current->thread.debug.dbcr0 & DBCR0_IC)
 #define clear_single_step(regs)(current->thread.debug.dbcr0 &= ~DBCR0_IC)
-
+#define clear_br_trace(regs)   do {} while(0)
 #else
 /* On non-4xx, the reason for the machine check or program
exception is in the MSR. */
@@ -473,6 +473,7 @@ static inline int check_io_access(struct pt_regs *regs)
 
 #define single_stepping(regs)  ((regs)->msr & MSR_SE)
 #define clear_single_step(regs)((regs)->msr &= ~MSR_SE)
+#define clear_br_trace(regs)   ((regs)->msr &= ~MSR_BE)
 #endif
 
 #if defined(CONFIG_E500)
@@ -988,6 +989,7 @@ void single_step_exception(struct pt_regs *regs)
enum ctx_state prev_state = exception_enter();
 
clear_single_step(regs);
+   clear_br_trace(regs);
 
if (kprobe_post_handler(regs))
return;
-- 
2.14.1




Re: [RESEND PATCH 2/2] ppc: bpf_jit: support MOD operation

2013-09-22 Thread Matt Evans
Hi Vladimir,

On 21 Sep 2013, at 17:25, Vladimir Murzin murzi...@gmail.com wrote:

 commit b6069a9570 (filter: add MOD operation) added generic
 support for modulus operation in BPF.
 
 This patch brings JIT support for PPC64
 
 Signed-off-by: Vladimir Murzin murzi...@gmail.com
 Acked-by: Matt Evans m...@ozlabs.org

Not this version, though; see below. 

 ---
 arch/powerpc/net/bpf_jit_comp.c | 22 ++
 1 file changed, 22 insertions(+)
 
 diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
 index bf56e33..96f24dc 100644
 --- a/arch/powerpc/net/bpf_jit_comp.c
 +++ b/arch/powerpc/net/bpf_jit_comp.c
 @@ -193,6 +193,28 @@ static int bpf_jit_build_body(struct sk_filter *fp, u32 
 *image,
PPC_MUL(r_A, r_A, r_scratch1);
}
break;
 +case BPF_S_ALU_MOD_X: /* A %= X; */
 +ctx->seen |= SEEN_XREG;
 +PPC_CMPWI(r_X, 0);
 +if (ctx->pc_ret0 != -1) {
 +PPC_BCC(COND_EQ, addrs[ctx->pc_ret0]);
 +} else {
 +PPC_BCC_SHORT(COND_NE, (ctx->idx*4)+12);
 +PPC_LI(r_ret, 0);
 +PPC_JMP(exit_addr);
 +}
 +PPC_DIVWU(r_scratch1, r_A, r_X);
 +PPC_MUL(r_scratch1, r_X, r_scratch1);
 +PPC_SUB(r_A, r_A, r_scratch1);
 +break;
 +case BPF_S_ALU_MOD_K: /* A %= K; */
 +#define r_scratch2 (r_scratch1 + 1)

Old version of this patch, still?  I had hoped that r_scratch2 would be defined 
in the header.

 +PPC_LI32(r_scratch2, K);
 +PPC_DIVWU(r_scratch1, r_A, r_scratch2);
 +PPC_MUL(r_scratch1, r_scratch2, r_scratch1);
 +PPC_SUB(r_A, r_A, r_scratch1);
 +#undef r_scratch2

And remember this guy too.. :)


Matt

 +break;
case BPF_S_ALU_DIV_X: /* A /= X; */
ctx->seen |= SEEN_XREG;
PPC_CMPWI(r_X, 0);
 -- 
 1.8.1.5
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] powerpc: net: filter: fix DIVWU instruction opcode

2013-09-11 Thread Matt Evans
On 12 Sep 2013, at 10:02, Michael Neuling mi...@neuling.org wrote:

 Vladimir Murzin murzi...@gmail.com wrote:
 
 Currently DIVWU stands for *signed* divw opcode:
 
 7d 2a 4b 96divwu   r9,r10,r9
 7d 2a 4b d6divwr9,r10,r9
 
 Use the *unsigned* divw opcode for DIVWU.
 
 This looks like it's only used in the BPF JIT code.  
 
 Matt, any chance you can ACK/NACK this?

Sure, that looks sensible, thanks Vladimir. 

Acked-by: Matt Evans m...@ozlabs.org
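As a quick illustration (not from the original report): BPF's A register is an
unsigned 32-bit value, so a dividend with the top bit set gives different
results under divw and divwu semantics, e.g.:

#include <stdio.h>

int main(void)
{
	unsigned int a = 0x80000004u, x = 2;

	printf("divwu-style: %#x\n", a / x);                        /* 0x40000002 */
	printf("divw-style:  %#x\n", (unsigned)((int)a / (int)x));  /* 0xc0000002 */
	return 0;
}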

 
 Mikey
 
 
 Signed-off-by: Vladimir Murzin murzi...@gmail.com
 ---
 arch/powerpc/include/asm/ppc-opcode.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
 b/arch/powerpc/include/asm/ppc-opcode.h
 index d7fe9f5..c91842c 100644
 --- a/arch/powerpc/include/asm/ppc-opcode.h
 +++ b/arch/powerpc/include/asm/ppc-opcode.h
 @@ -218,7 +218,7 @@
 #define PPC_INST_MULLW		0x7c0001d6
 #define PPC_INST_MULHWU		0x7c000016
 #define PPC_INST_MULLI		0x1c000000
 -#define PPC_INST_DIVWU		0x7c0003d6
 +#define PPC_INST_DIVWU		0x7c000396
 #define PPC_INST_RLWINM		0x54000000
 #define PPC_INST_RLDICR		0x78000004
 #define PPC_INST_SLW		0x7c000030
 -- 
 1.7.10.4
 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] ppc: bpf_jit: support MOD operation

2013-09-11 Thread Matt Evans
Hi Ben, Vladimir,


*dusts off very thick PPC cobwebs*  Sorry for the delay as I'm travelling, 
didn't get to this until now.

On 02/09/2013, at 9:45 PM, Benjamin Herrenschmidt wrote:

 On Mon, 2013-09-02 at 19:48 +0200, Vladimir Murzin wrote:
 Ping
 
 On Wed, Aug 28, 2013 at 02:49:52AM +0400, Vladimir Murzin wrote:
 commit b6069a9570 (filter: add MOD operation) added generic
 support for modulus operation in BPF.
 
 Sorry, nobody got a chance to review that yet. Unfortunately Matt
 doesn't work for us anymore and none of us has experience with the
 BPF code, so somebody (possibly me) will need to spend a bit of time
 figuring it out before verifying that is correct.
 
 Do you have a test case/suite by any chance ?
 
 Ben.
 
 This patch brings JIT support for PPC64
 
 Signed-off-by: Vladimir Murzin murzi...@gmail.com
 ---
 arch/powerpc/net/bpf_jit_comp.c | 22 ++
 1 file changed, 22 insertions(+)
 
 diff --git a/arch/powerpc/net/bpf_jit_comp.c 
 b/arch/powerpc/net/bpf_jit_comp.c
 index bf56e33..96f24dc 100644
 --- a/arch/powerpc/net/bpf_jit_comp.c
 +++ b/arch/powerpc/net/bpf_jit_comp.c
 @@ -193,6 +193,28 @@ static int bpf_jit_build_body(struct sk_filter *fp, 
 u32 *image,
 PPC_MUL(r_A, r_A, r_scratch1);
 }
 break;
 +   case BPF_S_ALU_MOD_X: /* A %= X; */
 +   ctx->seen |= SEEN_XREG;
 +   PPC_CMPWI(r_X, 0);
 +   if (ctx->pc_ret0 != -1) {
 +   PPC_BCC(COND_EQ, addrs[ctx->pc_ret0]);
 +   } else {
 +   PPC_BCC_SHORT(COND_NE, (ctx->idx*4)+12);
 +   PPC_LI(r_ret, 0);
 +   PPC_JMP(exit_addr);
 +   }
 +   PPC_DIVWU(r_scratch1, r_A, r_X);
 +   PPC_MUL(r_scratch1, r_X, r_scratch1);
 +   PPC_SUB(r_A, r_A, r_scratch1);
 +   break;

Without having compiled & tested this, it looks fine to me (especially with the 
corrected DIVWU opcode in the other patch, oops...).
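For reference, a C model (hypothetical helper, not part of the patch) of what
the emitted divide/multiply/subtract sequence computes for A %= X; the X == 0
case has already been diverted to the "return 0" exit by the PPC_CMPWI/PPC_BCC
above:

static inline unsigned int bpf_mod_model(unsigned int a, unsigned int x)
{
	unsigned int q = a / x;     /* PPC_DIVWU r_scratch1, r_A, r_X */
	return a - q * x;           /* PPC_MUL + PPC_SUB              */
}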

 +   case BPF_S_ALU_MOD_K: /* A %= K; */
 +#define r_scratch2 (r_scratch1 + 1)
 +   PPC_LI32(r_scratch2, K);
 +   PPC_DIVWU(r_scratch1, r_A, r_scratch2);
 +   PPC_MUL(r_scratch1, r_scratch2, r_scratch1);
 +   PPC_SUB(r_A, r_A, r_scratch1);
 +#undef r_scratch2
 +   break;

If you need another scratch register, it should really be defined in bpf_jit.h 
instead.

Once you define r_scratch2 in there,

Acked-by: Matt Evans m...@ozlabs.org


Thanks!


Matt




 case BPF_S_ALU_DIV_X: /* A /= X; */
 ctx->seen |= SEEN_XREG;
 PPC_CMPWI(r_X, 0);
 -- 
 1.8.1.5
 
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH] PPC: Add __SANE_USERSPACE_TYPES__ to asm/types.h for LL64

2011-12-07 Thread Matt Evans
PPC64 uses long long for u64 in the kernel, but powerpc's asm/types.h
prevents 64-bit userland from seeing this definition, instead defaulting
to u64 == long in userspace.  Some user programs (e.g. kvmtool) may actually
want LL64, so this patch adds a check for __SANE_USERSPACE_TYPES__ so that,
if defined, int-ll64.h is included instead.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 arch/powerpc/include/asm/types.h |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/types.h b/arch/powerpc/include/asm/types.h
index 8947b98..d82e94e 100644
--- a/arch/powerpc/include/asm/types.h
+++ b/arch/powerpc/include/asm/types.h
@@ -5,8 +5,11 @@
  * This is here because we used to use l64 for 64bit powerpc
  * and we don't want to impact user mode with our change to ll64
  * in the kernel.
+ *
+ * However, some user programs are fine with this.  They can
+ * flag __SANE_USERSPACE_TYPES__ to get int-ll64.h here.
  */
-#if defined(__powerpc64__) && !defined(__KERNEL__)
+#if !defined(__SANE_USERSPACE_TYPES__) && defined(__powerpc64__) && !defined(__KERNEL__)
 # include <asm-generic/int-l64.h>
 #else
 # include <asm-generic/int-ll64.h>
-- 
1.7.0.4
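
As an aside (not part of the patch): a minimal sketch of what a userspace
consumer such as kvmtool does to opt in -- define the macro before including
any kernel header:

#define __SANE_USERSPACE_TYPES__	/* ask for int-ll64.h definitions */
#include <linux/types.h>
#include <stdio.h>

int main(void)
{
	__u64 val = 1ULL << 40;

	/* With LL64, __u64 is unsigned long long, so %llu needs no cast even
	 * on 64-bit powerpc (where it would otherwise be unsigned long).
	 */
	printf("%llu\n", val);
	return 0;
}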

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH v3] net: filter: BPF 'JIT' compiler for PPC64

2011-07-20 Thread Matt Evans
An implementation of a code generator for BPF programs to speed up packet
filtering on PPC64, inspired by Eric Dumazet's x86-64 version.

Filter code is generated as an ABI-compliant function in module_alloc()'d mem
with stackframe & prologue/epilogue generated if required (simple filters don't
need anything more than an li/blr).  The filter's local variables, M[], live in
registers.  Supports all BPF opcodes, although complicated loads from negative
packet offsets (e.g. SKF_LL_OFF) are not yet supported.

There are a couple of further optimisations left for future work; many-pass
assembly with branch-reach reduction and a register allocator to push M[]
variables into volatile registers would improve the code quality further.

This currently supports big-endian 64-bit PowerPC only (but is fairly simple
to port to PPC32 or LE!).

Enabled in the same way as x86-64:

echo 1 > /proc/sys/net/core/bpf_jit_enable

Or, enabled with extra debug output:

echo 2 > /proc/sys/net/core/bpf_jit_enable

Signed-off-by: Matt Evans m...@ozlabs.org
---

V3: Added BUILD_BUG_ON to assert PACA CPU ID is 16bits, made a comment (in
LD_MSH) a bit clearer, ratelimited Unknown opcode error and moved
bpf_jit.S to bpf_jit_64.S (it doesn't make sense to rename bpf_jit_comp.c as
small portions will eventually get split out into _32/_64.c files when we do
32bit support).

 arch/powerpc/Kconfig  |1 +
 arch/powerpc/Makefile |3 +-
 arch/powerpc/include/asm/ppc-opcode.h |   40 ++
 arch/powerpc/net/Makefile |4 +
 arch/powerpc/net/bpf_jit.h|  227 +++
 arch/powerpc/net/bpf_jit_64.S |  138 +++
 arch/powerpc/net/bpf_jit_comp.c   |  694 +
 7 files changed, 1106 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2729c66..39860fc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -134,6 +134,7 @@ config PPC
select GENERIC_IRQ_SHOW_LEVEL
select HAVE_RCU_TABLE_FREE if SMP
select HAVE_SYSCALL_TRACEPOINTS
+   select HAVE_BPF_JIT if PPC64
 
 config EARLY_PRINTK
bool
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index b7212b6..b94740f 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -154,7 +154,8 @@ core-y  += arch/powerpc/kernel/ 
\
   arch/powerpc/lib/ \
   arch/powerpc/sysdev/ \
   arch/powerpc/platforms/ \
-  arch/powerpc/math-emu/
+  arch/powerpc/math-emu/ \
+  arch/powerpc/net/
 core-$(CONFIG_XMON)+= arch/powerpc/xmon/
 core-$(CONFIG_KVM) += arch/powerpc/kvm/
 
diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index e472659..e980faa 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -71,6 +71,42 @@
 #define PPC_INST_ERATSX0x7c000126
 #define PPC_INST_ERATSX_DOT0x7c000127
 
+/* Misc instructions for BPF compiler */
+#define PPC_INST_LD0xe800
+#define PPC_INST_LHZ   0xa000
+#define PPC_INST_LWZ   0x8000
+#define PPC_INST_STD   0xf800
+#define PPC_INST_STDU  0xf801
+#define PPC_INST_MFLR  0x7c0802a6
+#define PPC_INST_MTLR  0x7c0803a6
+#define PPC_INST_CMPWI 0x2c00
+#define PPC_INST_CMPDI 0x2c20
+#define PPC_INST_CMPLW 0x7c40
+#define PPC_INST_CMPLWI0x2800
+#define PPC_INST_ADDI  0x3800
+#define PPC_INST_ADDIS 0x3c00
+#define PPC_INST_ADD   0x7c000214
+#define PPC_INST_SUB   0x7c50
+#define PPC_INST_BLR   0x4e800020
+#define PPC_INST_BLRL  0x4e800021
+#define PPC_INST_MULLW 0x7c0001d6
+#define PPC_INST_MULHWU0x7c16
+#define PPC_INST_MULLI 0x1c00
+#define PPC_INST_DIVWU 0x7c0003d6
+#define PPC_INST_RLWINM0x5400
+#define PPC_INST_RLDICR0x7804
+#define PPC_INST_SLW   0x7c30
+#define PPC_INST_SRW   0x7c000430
+#define PPC_INST_AND   0x7c38
+#define PPC_INST_ANDDOT0x7c39
+#define PPC_INST_OR0x7c000378
+#define PPC_INST_ANDI  0x7000
+#define PPC_INST_ORI   0x6000
+#define PPC_INST_ORIS  0x6400
+#define PPC_INST_NEG   0x7cd0
+#define PPC_INST_BRANCH0x4800

Re: [PATCH v2] net: filter: BPF 'JIT' compiler for PPC64

2011-07-19 Thread Matt Evans
On 19/07/11 16:59, Kumar Gala wrote:
 
 On Jul 18, 2011, at 9:13 PM, Matt Evans wrote:
 
 An implementation of a code generator for BPF programs to speed up packet
 filtering on PPC64, inspired by Eric Dumazet's x86-64 version.

 Filter code is generated as an ABI-compliant function in module_alloc()'d mem
  with stackframe & prologue/epilogue generated if required (simple filters 
 don't
 need anything more than an li/blr).  The filter's local variables, M[], live 
 in
 registers.  Supports all BPF opcodes, although complicated loads from 
 negative
 packet offsets (e.g. SKF_LL_OFF) are not yet supported.

 There are a couple of further optimisations left for future work; many-pass
 assembly with branch-reach reduction and a register allocator to push M[]
 variables into volatile registers would improve the code quality further.

 This currently supports big-endian 64-bit PowerPC only (but is fairly simple
 to port to PPC32 or LE!).

 Enabled in the same way as x86-64:

   echo 1 > /proc/sys/net/core/bpf_jit_enable

 Or, enabled with extra debug output:

   echo 2 > /proc/sys/net/core/bpf_jit_enable

 Signed-off-by: Matt Evans m...@ozlabs.org
 ---

 V2: Removed some cut/paste woe in setting SEEN_X even on writes.
Merci for le review, Eric!

 arch/powerpc/Kconfig  |1 +
 arch/powerpc/Makefile |3 +-
 arch/powerpc/include/asm/ppc-opcode.h |   40 ++
 arch/powerpc/net/Makefile |4 +
 arch/powerpc/net/bpf_jit.S|  138 +++
 
 can we rename to bpf_jit_64.S, since this doesn't work on PPC32.
 
 arch/powerpc/net/bpf_jit.h|  227 +++
 arch/powerpc/net/bpf_jit_comp.c   |  690 
 +
 
  same here, or split between bpf_jit_comp.c (shared between ppc32 & ppc64) and
 bpf_jit_comp_64.c

A reasonable suggestion -- bpf_jit_64.S certainly.  I think it may not be worth
splitting bpf_jit_comp.c until we support both tho?  (I'm thinking
bpf_jit_comp_{32,64}.c would just house the stackframe generation code which is
the main difference, plus compile-time switched macros for the odd LD vs LWZ.)

Sorry it's not 32bit-friendly just yet (I knew you'd ask, hehe), I've postponed
that for when I get a mo :-)

Cheers,


Matt
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v2] net: filter: BPF 'JIT' compiler for PPC64

2011-07-19 Thread Matt Evans
On 19/07/11 17:17, Kumar Gala wrote:
 
 On Jul 19, 2011, at 2:06 AM, Matt Evans wrote:
 
 On 19/07/11 16:59, Kumar Gala wrote:

 On Jul 18, 2011, at 9:13 PM, Matt Evans wrote:

 [snip]

 V2: Removed some cut/paste woe in setting SEEN_X even on writes.
   Merci for le review, Eric!

 arch/powerpc/Kconfig  |1 +
 arch/powerpc/Makefile |3 +-
 arch/powerpc/include/asm/ppc-opcode.h |   40 ++
 arch/powerpc/net/Makefile |4 +
 arch/powerpc/net/bpf_jit.S|  138 +++

 can we rename to bpf_jit_64.S, since this doesn't work on PPC32.

 arch/powerpc/net/bpf_jit.h|  227 +++
 arch/powerpc/net/bpf_jit_comp.c   |  690 
 +

  same here, or split between bpf_jit_comp.c (shared between ppc32 & ppc64) 
 and
 bpf_jit_comp_64.c

 A reasonable suggestion -- bpf_jit_64.S certainly.  I think it may not be 
 worth
 splitting bpf_jit_comp.c until we support both tho?  (I'm thinking
 bpf_jit_comp_{32,64}.c would just house the stackframe generation code which 
 is
 the main difference, plus compile-time switched macros for the odd LD vs 
 LWZ.)
 
  If it's mostly 64-bit specific then just go with bpf_jit_comp_64.c for now.  We 
  can refactor later.

Nah, other way round -- it's almost all agnostic but with a couple of functions
that I was recommending moving out to a _64.c and _32.c later, leaving the bulk
still in bpf_jit_comp.c.


Matt
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH] net: filter: BPF 'JIT' compiler for PPC64

2011-07-18 Thread Matt Evans
An implementation of a code generator for BPF programs to speed up packet
filtering on PPC64, inspired by Eric Dumazet's x86-64 version.

Filter code is generated as an ABI-compliant function in module_alloc()'d mem
with stackframe & prologue/epilogue generated if required (simple filters don't
need anything more than an li/blr).  The filter's local variables, M[], live in
registers.  Supports all BPF opcodes, although complicated loads from negative
packet offsets (e.g. SKF_LL_OFF) are not yet supported.

There are a couple of further optimisations left for future work; many-pass
assembly with branch-reach reduction and a register allocator to push M[]
variables into volatile registers would improve the code quality further.

This currently supports big-endian 64-bit PowerPC only (but is fairly simple
to port to PPC32 or LE!).

Enabled in the same way as x86-64:

echo 1 > /proc/sys/net/core/bpf_jit_enable

Or, enabled with extra debug output:

echo 2 > /proc/sys/net/core/bpf_jit_enable

Signed-off-by: Matt Evans m...@ozlabs.org
---

Since the RFC post, this has incorporated the bugfixes/tidies from review plus a
couple more found in further testing, plus some general/comment tidies.

 arch/powerpc/Kconfig  |1 +
 arch/powerpc/Makefile |3 +-
 arch/powerpc/include/asm/ppc-opcode.h |   40 ++
 arch/powerpc/net/Makefile |4 +
 arch/powerpc/net/bpf_jit.S|  138 +++
 arch/powerpc/net/bpf_jit.h|  227 +++
 arch/powerpc/net/bpf_jit_comp.c   |  694 +
 7 files changed, 1106 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2729c66..39860fc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -134,6 +134,7 @@ config PPC
select GENERIC_IRQ_SHOW_LEVEL
select HAVE_RCU_TABLE_FREE if SMP
select HAVE_SYSCALL_TRACEPOINTS
+   select HAVE_BPF_JIT if PPC64
 
 config EARLY_PRINTK
bool
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index b7212b6..b94740f 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -154,7 +154,8 @@ core-y  += arch/powerpc/kernel/ 
\
   arch/powerpc/lib/ \
   arch/powerpc/sysdev/ \
   arch/powerpc/platforms/ \
-  arch/powerpc/math-emu/
+  arch/powerpc/math-emu/ \
+  arch/powerpc/net/
 core-$(CONFIG_XMON)+= arch/powerpc/xmon/
 core-$(CONFIG_KVM) += arch/powerpc/kvm/
 
diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index e472659..e980faa 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -71,6 +71,42 @@
 #define PPC_INST_ERATSX0x7c000126
 #define PPC_INST_ERATSX_DOT0x7c000127
 
+/* Misc instructions for BPF compiler */
+#define PPC_INST_LD0xe800
+#define PPC_INST_LHZ   0xa000
+#define PPC_INST_LWZ   0x8000
+#define PPC_INST_STD   0xf800
+#define PPC_INST_STDU  0xf801
+#define PPC_INST_MFLR  0x7c0802a6
+#define PPC_INST_MTLR  0x7c0803a6
+#define PPC_INST_CMPWI 0x2c00
+#define PPC_INST_CMPDI 0x2c20
+#define PPC_INST_CMPLW 0x7c40
+#define PPC_INST_CMPLWI0x2800
+#define PPC_INST_ADDI  0x3800
+#define PPC_INST_ADDIS 0x3c00
+#define PPC_INST_ADD   0x7c000214
+#define PPC_INST_SUB   0x7c50
+#define PPC_INST_BLR   0x4e800020
+#define PPC_INST_BLRL  0x4e800021
+#define PPC_INST_MULLW 0x7c0001d6
+#define PPC_INST_MULHWU0x7c16
+#define PPC_INST_MULLI 0x1c00
+#define PPC_INST_DIVWU 0x7c0003d6
+#define PPC_INST_RLWINM0x5400
+#define PPC_INST_RLDICR0x7804
+#define PPC_INST_SLW   0x7c30
+#define PPC_INST_SRW   0x7c000430
+#define PPC_INST_AND   0x7c38
+#define PPC_INST_ANDDOT0x7c39
+#define PPC_INST_OR0x7c000378
+#define PPC_INST_ANDI  0x7000
+#define PPC_INST_ORI   0x6000
+#define PPC_INST_ORIS  0x6400
+#define PPC_INST_NEG   0x7cd0
+#define PPC_INST_BRANCH0x4800
+#define PPC_INST_BRANCH_COND   0x4080
+
 /* macros to insert fields into opcodes */
 #define __PPC_RA(a)(((a)  0x1f)  16)
 #define __PPC_RB(b)(((b)  0x1f)  11

Re: [PATCH] net: filter: BPF 'JIT' compiler for PPC64

2011-07-18 Thread Matt Evans
On 19/07/11 05:42, David Miller wrote:
 From: Eric Dumazet eric.duma...@gmail.com
 Date: Mon, 18 Jul 2011 10:39:35 +0200
 
 So in PPC SEEN_XREG only is to be set if X is read, not if written.

 So you don't have to set SEEN_XREG bit in this part :
 
 Matt, do you want to integrate changes based upon Eric's feedback
 here or do you want me to apply your patch as-is for now?

Thanks, but no worries; I will send a v2 in a sec.  Eric's comments are spot-on,
and there are a couple of other areas that the brainfart should really be
polished out of, too.  :-)


Cheers,


Matt
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH v2] net: filter: BPF 'JIT' compiler for PPC64

2011-07-18 Thread Matt Evans
An implementation of a code generator for BPF programs to speed up packet
filtering on PPC64, inspired by Eric Dumazet's x86-64 version.

Filter code is generated as an ABI-compliant function in module_alloc()'d mem
with stackframe & prologue/epilogue generated if required (simple filters don't
need anything more than an li/blr).  The filter's local variables, M[], live in
registers.  Supports all BPF opcodes, although complicated loads from negative
packet offsets (e.g. SKF_LL_OFF) are not yet supported.

There are a couple of further optimisations left for future work; many-pass
assembly with branch-reach reduction and a register allocator to push M[]
variables into volatile registers would improve the code quality further.

This currently supports big-endian 64-bit PowerPC only (but is fairly simple
to port to PPC32 or LE!).

Enabled in the same way as x86-64:

echo 1 > /proc/sys/net/core/bpf_jit_enable

Or, enabled with extra debug output:

echo 2 > /proc/sys/net/core/bpf_jit_enable

Signed-off-by: Matt Evans m...@ozlabs.org
---

V2: Removed some cut/paste woe in setting SEEN_X even on writes.
Merci for le review, Eric!

 arch/powerpc/Kconfig  |1 +
 arch/powerpc/Makefile |3 +-
 arch/powerpc/include/asm/ppc-opcode.h |   40 ++
 arch/powerpc/net/Makefile |4 +
 arch/powerpc/net/bpf_jit.S|  138 +++
 arch/powerpc/net/bpf_jit.h|  227 +++
 arch/powerpc/net/bpf_jit_comp.c   |  690 +
 7 files changed, 1102 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2729c66..39860fc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -134,6 +134,7 @@ config PPC
select GENERIC_IRQ_SHOW_LEVEL
select HAVE_RCU_TABLE_FREE if SMP
select HAVE_SYSCALL_TRACEPOINTS
+   select HAVE_BPF_JIT if PPC64
 
 config EARLY_PRINTK
bool
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index b7212b6..b94740f 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -154,7 +154,8 @@ core-y  += arch/powerpc/kernel/ 
\
   arch/powerpc/lib/ \
   arch/powerpc/sysdev/ \
   arch/powerpc/platforms/ \
-  arch/powerpc/math-emu/
+  arch/powerpc/math-emu/ \
+  arch/powerpc/net/
 core-$(CONFIG_XMON)+= arch/powerpc/xmon/
 core-$(CONFIG_KVM) += arch/powerpc/kvm/
 
diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index e472659..e980faa 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -71,6 +71,42 @@
 #define PPC_INST_ERATSX0x7c000126
 #define PPC_INST_ERATSX_DOT0x7c000127
 
+/* Misc instructions for BPF compiler */
+#define PPC_INST_LD0xe800
+#define PPC_INST_LHZ   0xa000
+#define PPC_INST_LWZ   0x8000
+#define PPC_INST_STD   0xf800
+#define PPC_INST_STDU  0xf801
+#define PPC_INST_MFLR  0x7c0802a6
+#define PPC_INST_MTLR  0x7c0803a6
+#define PPC_INST_CMPWI 0x2c00
+#define PPC_INST_CMPDI 0x2c20
+#define PPC_INST_CMPLW 0x7c40
+#define PPC_INST_CMPLWI0x2800
+#define PPC_INST_ADDI  0x3800
+#define PPC_INST_ADDIS 0x3c00
+#define PPC_INST_ADD   0x7c000214
+#define PPC_INST_SUB   0x7c50
+#define PPC_INST_BLR   0x4e800020
+#define PPC_INST_BLRL  0x4e800021
+#define PPC_INST_MULLW 0x7c0001d6
+#define PPC_INST_MULHWU0x7c16
+#define PPC_INST_MULLI 0x1c00
+#define PPC_INST_DIVWU 0x7c0003d6
+#define PPC_INST_RLWINM0x5400
+#define PPC_INST_RLDICR0x7804
+#define PPC_INST_SLW   0x7c30
+#define PPC_INST_SRW   0x7c000430
+#define PPC_INST_AND   0x7c38
+#define PPC_INST_ANDDOT0x7c39
+#define PPC_INST_OR0x7c000378
+#define PPC_INST_ANDI  0x7000
+#define PPC_INST_ORI   0x6000
+#define PPC_INST_ORIS  0x6400
+#define PPC_INST_NEG   0x7cd0
+#define PPC_INST_BRANCH0x4800
+#define PPC_INST_BRANCH_COND   0x4080
+
 /* macros to insert fields into opcodes */
 #define __PPC_RA(a)(((a)  0x1f)  16)
 #define __PPC_RB(b)(((b)  0x1f)  11)
@@ -83,6 +119,10 @@
 #define __PPC_T_TLB(t) (((t)  0x3

Re: [RFC PATCH 1/1] BPF JIT for PPC64

2011-07-11 Thread Matt Evans
On 25/06/11 11:58, Ben Hutchings wrote:
 On Fri, 2011-06-24 at 16:02 +1000, Matt Evans wrote:
 [...]
 +case BPF_S_ALU_ADD_K: /* A += K; */
 +if (!K)
 +break;
  +if (K < 32768)
 +PPC_ADDI(r_A, r_A, K);
 +else
 +PPC_ADDI(r_A, r_A, IMM_L(K));
 +PPC_ADDIS(r_A, r_A, IMM_HA(K));
 +break;
 
 Missing braces.
 
 +case BPF_S_ALU_SUB_X: /* A -= X; */
 +ctx-seen |= SEEN_XREG;
 +PPC_SUB(r_A, r_A, r_X);
 +break;
 +case BPF_S_ALU_SUB_K: /* A -= K */
 +if (!K)
 +break;
  +if (K < 32768)
 +PPC_ADDI(r_A, r_A, -K);
 +else
 +PPC_ADDI(r_A, r_A, IMM_L(-K));
 +PPC_ADDIS(r_A, r_A, IMM_HA(-K));
 +break;
 [...]
 
 Here as well.

Thanks, Ben -- oops! :)  Really, just the ADDISes need to be conditional, too.
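
i.e. something along these lines -- a sketch of the reworked case reusing the
patch's helpers (PPC_ADDI/PPC_ADDIS/IMM_L/IMM_HA), not necessarily the final
form; when K < 32768 the sign-adjusted high half IMM_HA(K) is zero, so only
the low addi is needed:

		case BPF_S_ALU_ADD_K: /* A += K; */
			if (!K)
				break;
			/* addi sign-extends its 16-bit immediate, so always
			 * add the low half and emit addis only when the
			 * adjusted high half is non-zero (i.e. K >= 32768).
			 */
			PPC_ADDI(r_A, r_A, IMM_L(K));
			if (K >= 32768)
				PPC_ADDIS(r_A, r_A, IMM_HA(K));
			break;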


Cheers,


Matt
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH 1/1] BPF JIT for PPC64

2011-07-11 Thread Matt Evans
On 25/06/11 17:33, Andreas Schwab wrote:
 Matt Evans m...@ozlabs.org writes:
 
  +	stdu	r1, -128(r1);	\
  
  +	addi	r5, r1, 128+BPF_PPC_STACK_BASIC+(2*8);	\
  
  +	addi	r1, r1, 128;	\
 
 +PPC_STD(r_M + i, 1, -128 + (8*i));
 
 +PPC_LD(r_M + i, 1, -128 + (8*i));
 
 s/128/BPF_PPC_STACK_SAVE/?

Actually, that's a different 128, but that nicely illustrates that I should've
#defined something more recognisable :-) The second set, with -128, is actually
in the save area for non-volatile regs, whereas the first is just a stackframe
size.

Cheers,


Matt
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH 1/1] BPF JIT for PPC64

2011-07-11 Thread Matt Evans
Hi Eric,

On 25/06/11 17:49, Eric Dumazet wrote:
 On Saturday 25 June 2011 at 09:33 +0200, Andreas Schwab wrote:
 Matt Evans m...@ozlabs.org writes:

  +	stdu	r1, -128(r1);	\

  +	addi	r5, r1, 128+BPF_PPC_STACK_BASIC+(2*8);	\

  +	addi	r1, r1, 128;	\

 +   PPC_STD(r_M + i, 1, -128 + (8*i));

 +   PPC_LD(r_M + i, 1, -128 + (8*i));

 s/128/BPF_PPC_STACK_SAVE/?

 
 I am not sure using registers to hold MEM[] is a win if MEM[idx] is used
 once in the filter
 
 # tcpdump tcp[20]+tcp[21]=0 -d
 (000) ldh  [12]
 (001) jeq  #0x800   jt 2  jf 15
 (002) ldb  [23]
 (003) jeq  #0x6 jt 4  jf 15
 (004) ldh  [20]
 (005) jset #0x1fff  jt 15 jf 6
 (006) ldxb 4*([14]&0xf)
 (007) ldb  [x + 34]
 (008) st   M[1]
 (009) ldb  [x + 35]
 (010) tax  
 (011) ld   M[1]
 (012) add  x
 (013) jeq  #0x0 jt 14 jf 15
 (014) ret  #65535
 (015) ret  #0
 
 In this sample, we use M[1] once ( one store, one load)
 
 So saving previous register content on stack in prologue, and restoring
 it in epilogue actually slow down the code, and adds two instructions in
 filter asm code.

The x86 version generates all accesses of M[x] as a load or store to the
stackframe, so your example would result in one store + one load to/from the
stack.  More than one access would result in N stores/loads.  By having M[] live
in r16-31 on PPC, there is a one-off cost of saving/restoring the (non-volatile)
register to the stack plus a per-use cost of a register-register move (MR, which
is pretty cheap compared to loads/stores!).

You are correct in that, for a single store/load of M[1] above, I'll generate
a STD, MR, MR, LD, but this extra cost of the two MRs is pretty small.  With the
current implementation the gains seen when accessing M[x] /more/ than once will
IMHO more than justify this.  (For several M[x] accesses, x86 would have several
mem ops, PPC would have several reg-reg moves and one load+store.)

An obvious alternative would be to use some of the three free volatile registers
first (in the hope that most filters won't use 3 M[] locations), with a
simple register allocator.  This would save the (non-volatile reg) spill/fill to
stack, too.  In the interest of simplicity I didn't want to do that on a first
pass.

 This also makes epilogue code not easy (not possible as a matter of fact)
 to unwind in helper function
 
 In x86_64 implementation, I chose bpf_error be able to force
 an exception, not returning to JIT code but directly to bpf_func() caller
 
 bpf_error:
 # force a return 0 from jit handler
 xor %eax,%eax
 mov -8(%rbp),%rbx
 leaveq
 ret

Yep, if I use non-volatile regs a return isn't just a simple stack pop.
Currently, I've an extra branch in the return path to hit the common epilogue.
This could be optimised such that the out of line error path jumps directly to
the common epilogue to return (rather than back to the callsite, checking a flag
and /then/ to the epilogue) to speed up the non-error case.  However, it's just
a question of getting to the (existing) epilogue code to clean up; it doesn't
need to be unwound in the helper function.  I don't think this issue is a
strong argument against having M[] exist in registers, though.

From the current position, I think going in the direction of using volatile regs
(without backup/restore cost) is better than going in the direction of making
all M[] references stack accesses.  Do you think it's bearable to continue as it
is and then perform that optimisation later?

Also, thanks for reading/commenting on the patch!



Cheers,


Matt
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC PATCH 0/1] BPF JIT for PPC64

2011-06-24 Thread Matt Evans
Hi,


Inspired by Eric Dumazet's x86-64 compiler for Berkeley Packet Filter programs,
I've written a BPF compiler for 64-bit PowerPC.  Although it hasn't finished its
strenuous testing regime, I'll have intermittent net access for a couple of
weeks so thought I'd post it for feedback now and submit a 'proper' version when
I'm back.

It's a fairly simple code generator, following a similar structure to the x86
version.  The filter programs are an array of opcode/constant/branch destination
structs, and can perform arithmetic/logical/comparison operations on two virtual
registers A and X, loads from packet headers/data and accesses to local
variables, M[].  Branching is also supported, but only forwards and only within
the extent of the program.
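
For orientation (nothing here is changed by this work): a filter program is an
array of classic BPF instructions, struct sock_filter as defined by the kernel
ABI in include/linux/filter.h, executed against the A/X registers and M[]
words described above:

#include <linux/types.h>

struct sock_filter {	/* one classic BPF instruction */
	__u16	code;	/* opcode: class (LD/ALU/JMP/RET/...) plus addressing mode */
	__u8	jt;	/* forward jump offset taken when the test is true  */
	__u8	jf;	/* forward jump offset taken when the test is false */
	__u32	k;	/* multi-purpose constant: immediate, packet offset, ... */
};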

I would probably describe this as more of a static template binary translator
than a JIT but have kept naming consistent :)


Features include:

- Filter code is generated as an ABI-compliant function, stackframe &
  prologue/epilogue if necessary.

- Simple filters (e.g. RET nn) need no stackframe or save/restore code so
  generate into only an li/blr.

- Local variables, M[], live in registers

- I believe this supports all BPF opcodes, although complicated loads from
  negative packet offsets (e.g. SKF_LL_OFF) are not yet supported.

Caveats include:  :)

- Packet data loads call out to simple helper functions (bpf_jit.S) which
  themselves may fall back to a trampoline to skb_copy_bits.  I haven't decided
  whether (as per comments there) it would be better to generate the simple
  loads inline and only call out in the slow case.

- Branches currently generate to bcc 1f; b far dest; 1: or
  bcc near dest ; nop so either case is the same size.  Multiple passes of
  assembly are used (the first gets an idea of how big everything is and what
  features are required), the next generates everything at accurate size, the
  third generates everything with accurate branch destination addresses); I
  intend not to nop-pad the short branch case but changing code size may
  result in more passes and a 'settling-down period'.  Kept simple for now.

- Anyone running PPC64 little-endian is doing something both interesting and
  unsupported for this work :-)  (There are some trivial endian assumptions.)

Tested in-situ (tcpdump with varying complexity filters) and with a random BPF
generator; I haven't verified loads from the fall back skb_copy_bits path.  Bug
reports/testing would be very welcome.



Cheers,



Matt
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH 1/1] BPF JIT for PPC64

2011-06-24 Thread Matt Evans

 arch/powerpc/Kconfig  |1 +
 arch/powerpc/Makefile |3 +-
 arch/powerpc/include/asm/ppc-opcode.h |   40 ++
 arch/powerpc/net/Makefile |4 +
 arch/powerpc/net/bpf_jit.S|  138 +++
 arch/powerpc/net/bpf_jit.h|  226 +++
 arch/powerpc/net/bpf_jit_comp.c   |  697 +
 7 files changed, 1108 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2729c66..39860fc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -134,6 +134,7 @@ config PPC
select GENERIC_IRQ_SHOW_LEVEL
select HAVE_RCU_TABLE_FREE if SMP
select HAVE_SYSCALL_TRACEPOINTS
+   select HAVE_BPF_JIT if PPC64
 
 config EARLY_PRINTK
bool
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index b7212b6..b94740f 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -154,7 +154,8 @@ core-y  += arch/powerpc/kernel/ 
\
   arch/powerpc/lib/ \
   arch/powerpc/sysdev/ \
   arch/powerpc/platforms/ \
-  arch/powerpc/math-emu/
+  arch/powerpc/math-emu/ \
+  arch/powerpc/net/
 core-$(CONFIG_XMON)+= arch/powerpc/xmon/
 core-$(CONFIG_KVM) += arch/powerpc/kvm/
 
diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index e472659..e980faa 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -71,6 +71,42 @@
 #define PPC_INST_ERATSX0x7c000126
 #define PPC_INST_ERATSX_DOT0x7c000127
 
+/* Misc instructions for BPF compiler */
+#define PPC_INST_LD0xe800
+#define PPC_INST_LHZ   0xa000
+#define PPC_INST_LWZ   0x8000
+#define PPC_INST_STD   0xf800
+#define PPC_INST_STDU  0xf801
+#define PPC_INST_MFLR  0x7c0802a6
+#define PPC_INST_MTLR  0x7c0803a6
+#define PPC_INST_CMPWI 0x2c00
+#define PPC_INST_CMPDI 0x2c20
+#define PPC_INST_CMPLW 0x7c40
+#define PPC_INST_CMPLWI0x2800
+#define PPC_INST_ADDI  0x3800
+#define PPC_INST_ADDIS 0x3c00
+#define PPC_INST_ADD   0x7c000214
+#define PPC_INST_SUB   0x7c50
+#define PPC_INST_BLR   0x4e800020
+#define PPC_INST_BLRL  0x4e800021
+#define PPC_INST_MULLW 0x7c0001d6
+#define PPC_INST_MULHWU0x7c16
+#define PPC_INST_MULLI 0x1c00
+#define PPC_INST_DIVWU 0x7c0003d6
+#define PPC_INST_RLWINM0x5400
+#define PPC_INST_RLDICR0x7804
+#define PPC_INST_SLW   0x7c30
+#define PPC_INST_SRW   0x7c000430
+#define PPC_INST_AND   0x7c38
+#define PPC_INST_ANDDOT0x7c39
+#define PPC_INST_OR0x7c000378
+#define PPC_INST_ANDI  0x7000
+#define PPC_INST_ORI   0x6000
+#define PPC_INST_ORIS  0x6400
+#define PPC_INST_NEG   0x7cd0
+#define PPC_INST_BRANCH0x4800
+#define PPC_INST_BRANCH_COND   0x4080
+
 /* macros to insert fields into opcodes */
 #define __PPC_RA(a)(((a)  0x1f)  16)
 #define __PPC_RB(b)(((b)  0x1f)  11)
@@ -83,6 +119,10 @@
 #define __PPC_T_TLB(t) (((t)  0x3)  21)
 #define __PPC_WC(w)(((w)  0x3)  21)
 #define __PPC_WS(w)(((w)  0x1f)  11)
+#define __PPC_SH(s)__PPC_WS(s)
+#define __PPC_MB(s)(((s)  0x1f)  6)
+#define __PPC_ME(s)(((s)  0x1f)  1)
+#define __PPC_BI(s)(((s)  0x1f)  16)
 
 /*
  * Only use the larx hint bit on 64bit CPUs. e500v1/v2 based CPUs will treat a
diff --git a/arch/powerpc/net/Makefile b/arch/powerpc/net/Makefile
new file mode 100644
index 000..90568c3
--- /dev/null
+++ b/arch/powerpc/net/Makefile
@@ -0,0 +1,4 @@
+#
+# Arch-specific network modules
+#
+obj-$(CONFIG_BPF_JIT) += bpf_jit.o bpf_jit_comp.o
diff --git a/arch/powerpc/net/bpf_jit.S b/arch/powerpc/net/bpf_jit.S
new file mode 100644
index 000..ce2225e
--- /dev/null
+++ b/arch/powerpc/net/bpf_jit.S
@@ -0,0 +1,138 @@
+/* bpf_jit.S: Packet/header access helper functions
+ * for PPC64 BPF compiler.
+ *
+ * Copyright 2011 Matt Evans m...@ozlabs.org, IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include asm

[PATCH] powerpc: Replace boot_cpu_count with spinning_secondaries, fix off-by-one cpu wait

2011-05-25 Thread Matt Evans
smp_release_cpus() waits for all cpus (including the bootcpu) due to an
off-by-one count on boot_cpu_count (which is all CPUs).  This patch replaces
that with spinning_secondaries (which is all secondary CPUs).

Signed-off-by: Matt Evans m...@ozlabs.org
---
 arch/powerpc/include/asm/smp.h |2 +-
 arch/powerpc/kernel/head_64.S  |2 +-
 arch/powerpc/kernel/prom.c |8 
 arch/powerpc/kernel/setup_32.c |1 -
 arch/powerpc/kernel/setup_64.c |6 +++---
 5 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 880b8c1..8f01c85 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -30,7 +30,7 @@
 #include asm/percpu.h
 
 extern int boot_cpuid;
-extern int boot_cpu_count;
+extern int spinning_secondaries;
 
 extern void cpu_die(void);
 
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index ba50409..3564c49 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -255,7 +255,7 @@ generic_secondary_common_init:
mtctr   r23
bctrl
 
-3: LOAD_REG_ADDR(r3, boot_cpu_count) /* Decrement boot_cpu_count */
+3: LOAD_REG_ADDR(r3, spinning_secondaries) /* Decrement 
spinning_secondaries */
lwarx   r4,0,r3
subir4,r4,1
stwcx.  r4,0,r3
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 48aeb55..540e0dc 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -69,6 +69,7 @@ unsigned long tce_alloc_start, tce_alloc_end;
 u64 ppc64_rma_size;
 #endif
 static phys_addr_t first_memblock_size;
+static int __initdata boot_cpu_count;
 
 static int __init early_parse_mem(char *p)
 {
@@ -748,6 +749,13 @@ void __init early_init_devtree(void *params)
 */
of_scan_flat_dt(early_init_dt_scan_cpus, NULL);
 
+#if defined(CONFIG_SMP) && defined(CONFIG_PPC64)
+   /* We'll later wait for secondaries to check in; there are
+* NCPUS-1 non-boot CPUs  :-)
+*/
+   spinning_secondaries = boot_cpu_count - 1;
+#endif
+
	DBG(" <- early_init_devtree()\n");
 }
 
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 620d792..1d2fbc9 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -48,7 +48,6 @@ extern void bootx_init(unsigned long r4, unsigned long phys);
 
 int boot_cpuid = -1;
 EXPORT_SYMBOL_GPL(boot_cpuid);
-int __initdata boot_cpu_count;
 int boot_cpuid_phys;
 
 int smp_hw_index[NR_CPUS];
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index a88bf27..0576919 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -73,7 +73,7 @@
 #endif
 
 int boot_cpuid = 0;
-int __initdata boot_cpu_count;
+int __initdata spinning_secondaries;
 u64 ppc64_pft_size;
 
 /* Pick defaults since we might want to patch instructions
@@ -253,11 +253,11 @@ void smp_release_cpus(void)
	for (i = 0; i < 100000; i++) {
mb();
HMT_low();
-   if (boot_cpu_count == 0)
+   if (spinning_secondaries == 0)
break;
udelay(1);
}
-   DBG("boot_cpu_count = %d\n", boot_cpu_count);
+   DBG("spinning_secondaries = %d\n", spinning_secondaries);
 
	DBG(" <- smp_release_cpus()\n");
 }
-- 
1.7.0.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] powerpc: Free up some CPU feature bits by moving out MMU-related features

2011-04-07 Thread Matt Evans
On 07/04/11 17:06, Kumar Gala wrote:
 
 On Apr 7, 2011, at 12:48 AM, Matt Evans wrote:
 
 diff --git a/arch/powerpc/include/asm/cputable.h 
 b/arch/powerpc/include/asm/cputable.h
 index be3cdf9..7b0fe7c 100644
 --- a/arch/powerpc/include/asm/cputable.h
 +++ b/arch/powerpc/include/asm/cputable.h
 @@ -178,22 +178,15 @@ extern const char *powerpc_base_platform;
 #define LONG_ASM_CONST(x)0
 #endif

 -#define CPU_FTR_SLB LONG_ASM_CONST(0x0001)
 -#define CPU_FTR_16M_PAGELONG_ASM_CONST(0x0002)
 -#define CPU_FTR_TLBIEL  
 LONG_ASM_CONST(0x0004)
 #define CPU_FTR_IABR LONG_ASM_CONST(0x0020)
 #define CPU_FTR_MMCRA
 LONG_ASM_CONST(0x0040)
 #define CPU_FTR_CTRL LONG_ASM_CONST(0x0080)
 #define CPU_FTR_SMT  LONG_ASM_CONST(0x0100)
 -#define CPU_FTR_LOCKLESS_TLBIE  
 LONG_ASM_CONST(0x0400)
 -#define CPU_FTR_CI_LARGE_PAGE   
 LONG_ASM_CONST(0x1000)
 #define CPU_FTR_PAUSE_ZERO   LONG_ASM_CONST(0x2000)
 #define CPU_FTR_PURR LONG_ASM_CONST(0x4000)
 #define CPU_FTR_CELL_TB_BUG  LONG_ASM_CONST(0x8000)
 #define CPU_FTR_SPURR
 LONG_ASM_CONST(0x0001)
 #define CPU_FTR_DSCR LONG_ASM_CONST(0x0002)
 -#define CPU_FTR_1T_SEGMENT  LONG_ASM_CONST(0x0004)
 -#define CPU_FTR_NO_SLBIE_B  LONG_ASM_CONST(0x0008)
 #define CPU_FTR_VSX  LONG_ASM_CONST(0x0010)
 #define CPU_FTR_SAO  LONG_ASM_CONST(0x0020)
 #define CPU_FTR_CP_USE_DCBTZ LONG_ASM_CONST(0x0040)
 @@ -205,9 +198,10 @@ extern const char *powerpc_base_platform;
 
 Seems like SAO should move into MMU features

I would argue it's the core/nest that orders/disorders things rather than the 
MMU.


Cheers,


Matt
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH] powerpc: Free up some CPU feature bits by moving out MMU-related features

2011-04-06 Thread Matt Evans
Some of the 64bit PPC CPU features are MMU-related, so this patch moves
them to MMU_FTR_ bits.  All cpu_has_feature()-style tests are moved to
mmu_has_feature(), and seven feature bits are freed as a result.

Signed-off-by: Matt Evans m...@ozlabs.org
---
Boot-tested on pseries and G5.

 arch/powerpc/include/asm/cputable.h|   52 ++-
 arch/powerpc/include/asm/mmu.h |   28 +++
 arch/powerpc/include/asm/mmu_context.h |2 +-
 arch/powerpc/kernel/cputable.c |   39 ++---
 arch/powerpc/kernel/entry_64.S |8 ++--
 arch/powerpc/kernel/exceptions-64s.S   |4 +-
 arch/powerpc/kernel/process.c  |4 +-
 arch/powerpc/kernel/prom.c |   17 +
 arch/powerpc/kernel/setup_64.c |2 +-
 arch/powerpc/mm/hash_low_64.S  |8 ++--
 arch/powerpc/mm/hash_native_64.c   |8 ++--
 arch/powerpc/mm/hash_utils_64.c|   18 +-
 arch/powerpc/mm/hugetlbpage.c  |2 +-
 arch/powerpc/mm/slb.c  |4 +-
 arch/powerpc/mm/slb_low.S  |8 ++--
 arch/powerpc/mm/stab.c |2 +-
 arch/powerpc/platforms/iseries/exception.S |2 +-
 arch/powerpc/platforms/iseries/setup.c |4 +-
 arch/powerpc/platforms/pseries/lpar.c  |2 +-
 arch/powerpc/xmon/xmon.c   |2 +-
 20 files changed, 123 insertions(+), 93 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index be3cdf9..7b0fe7c 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -178,22 +178,15 @@ extern const char *powerpc_base_platform;
 #define LONG_ASM_CONST(x)  0
 #endif
 
-#define CPU_FTR_SLBLONG_ASM_CONST(0x0001)
-#define CPU_FTR_16M_PAGE   LONG_ASM_CONST(0x0002)
-#define CPU_FTR_TLBIEL LONG_ASM_CONST(0x0004)
 #define CPU_FTR_IABR   LONG_ASM_CONST(0x0020)
 #define CPU_FTR_MMCRA  LONG_ASM_CONST(0x0040)
 #define CPU_FTR_CTRL   LONG_ASM_CONST(0x0080)
 #define CPU_FTR_SMTLONG_ASM_CONST(0x0100)
-#define CPU_FTR_LOCKLESS_TLBIE LONG_ASM_CONST(0x0400)
-#define CPU_FTR_CI_LARGE_PAGE  LONG_ASM_CONST(0x1000)
 #define CPU_FTR_PAUSE_ZERO LONG_ASM_CONST(0x2000)
 #define CPU_FTR_PURR   LONG_ASM_CONST(0x4000)
 #define CPU_FTR_CELL_TB_BUGLONG_ASM_CONST(0x8000)
 #define CPU_FTR_SPURR  LONG_ASM_CONST(0x0001)
 #define CPU_FTR_DSCR   LONG_ASM_CONST(0x0002)
-#define CPU_FTR_1T_SEGMENT LONG_ASM_CONST(0x0004)
-#define CPU_FTR_NO_SLBIE_B LONG_ASM_CONST(0x0008)
 #define CPU_FTR_VSXLONG_ASM_CONST(0x0010)
 #define CPU_FTR_SAOLONG_ASM_CONST(0x0020)
 #define CPU_FTR_CP_USE_DCBTZ   LONG_ASM_CONST(0x0040)
@@ -205,9 +198,10 @@ extern const char *powerpc_base_platform;
 
 #ifndef __ASSEMBLY__
 
-#define CPU_FTR_PPCAS_ARCH_V2  (CPU_FTR_SLB | \
-CPU_FTR_TLBIEL | CPU_FTR_NOEXECUTE | \
-CPU_FTR_NODSISRALIGN | CPU_FTR_16M_PAGE)
+#define CPU_FTR_PPCAS_ARCH_V2  (CPU_FTR_NOEXECUTE | CPU_FTR_NODSISRALIGN)
+
+#define MMU_FTR_PPCAS_ARCH_V2  (MMU_FTR_SLB | MMU_FTR_TLBIEL | \
+MMU_FTR_16M_PAGE)
 
 /* We only set the altivec features if the kernel was compiled with altivec
  * support
@@ -405,41 +399,49 @@ extern const char *powerpc_base_platform;
 #define CPU_FTRS_POWER5(CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_MMCRA | CPU_FTR_SMT | \
-   CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \
-   CPU_FTR_PURR | CPU_FTR_STCX_CHECKS_ADDRESS | \
-   CPU_FTR_POPCNTB)
+   CPU_FTR_COHERENT_ICACHE | CPU_FTR_PURR | \
+   CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB)
 #define CPU_FTRS_POWER6 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_MMCRA | CPU_FTR_SMT | \
-   CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \
-   CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
-   CPU_FTR_DSCR | CPU_FTR_UNALIGNED_LD_STD | \
+   CPU_FTR_COHERENT_ICACHE | CPU_FTR_PURR | CPU_FTR_SPURR | \
+   CPU_FTR_REAL_LE | CPU_FTR_DSCR | CPU_FTR_UNALIGNED_LD_STD | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB)
 #define CPU_FTRS_POWER7 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_MMCRA

Re: [PATCH v2 5/5] xhci: Remove recursive call to xhci_handle_event

2011-03-30 Thread Matt Evans
On 30/03/11 07:00, Dmitry Torokhov wrote:
 On Sunday, March 27, 2011 09:53:00 pm Matt Evans wrote:
 @@ -2282,7 +2284,7 @@ hw_died:
  /* FIXME this should be a delayed service routine
   * that clears the EHB.
   */
 -xhci_handle_event(xhci);
 +while (xhci_handle_event(xhci)) {}

 
 I must admit I dislike the style with empty loop bodies, do you think
 we could have something like below instead?

Well, although I don't mind empty while()s at all (they're clean and obvious
IMHO) I would replace an empty blightful while loop with something like this:

do {
ret = xhci_handle_event(xhci);
} while (ret > 0);

;-) (Not sure that refactoring its contents into the IRQ handler is a good idea,
if that area's going to be revisited soon to extend error
handling/reporting.)

Cheers,


Matt
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 0/5] Make xHCI driver endian-safe, add a barrier, some debug

2011-03-28 Thread Matt Evans
On 29/03/11 09:16, Sarah Sharp wrote:
 On Fri, Mar 25, 2011 at 06:43:44PM +1100, Matt Evans wrote:
 Hi Sarah,


 This series addresses the endian issues with the xHCI driver, and has brought
 lovely USB 3 to PPC. :-)  I've tested various types of traffic on ppc4xx and
 POWER7 and (some sound driver bugs notwithstanding) all seems fine.  Also
 addresses an ordering problem we found and the recursive nature of the event
 handling, plus the addition of some debug.
 
 Thanks for doing this work, Matt!  I appreciate it.

Hey, no worries!  I've got some $3 USB speakers I wanted to connect to the
$15,000 POWER server, you know how it goes.

 This should apply to 2.6.38/Linus' tree.
 
 You say that these apply against 2.6.38, but recently a lot of xHCI
 changes went into Linus' tree to support USB 3.0 hubs in 2.6.39.  Will
 these patches still apply against Linus' latest tree?  If not, I suggest
 you base your patches against Greg KH's usb-linus branch, as that's my
 tree base:
 
 http://git.kernel.org/?p=linux/kernel/git/gregkh/usb-2.6.git;a=shortlog;h=refs/heads/usb-linus

Sorry, I wasn't too explicit about that; I'd meant Linus' tree as of todayish
and I believe I caught said USB 3.0 changes, but I'll rebase from usb-linus
anyway to make sure we're fine.

 Also, I haven't read too far into the patches, but the first patch seems
 to have several one or two letter variable names, like f and di.
 Can you make those variable names more descriptive?  Thanks.

Sure; they were single-use throwaways to break long RMW lines but I've rejiggled
& removed them.

Thanks for looking, I'll repost v3 with tidyups.


Cheers,


Matt
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2/5] xhci: Extend debug messages

2011-03-28 Thread Matt Evans
Hi,

On 29/03/11 09:19, Sarah Sharp wrote:
 On Fri, Mar 25, 2011 at 06:43:59PM +1100, Matt Evans wrote:
 Add more debug to print queued transfers, show control intentions and
 modify an existing message to hexify address output.
 
 Are these new debug messages really necessary?  I feel like the xHCI
 driver has way too many debugging messages already.  I'd like to switch
 it over to the event tracer infrastructure eventually, and every new
 debug message makes that work harder to do...
 
 The queue_trb debug message is going to be especially noticeable, since
 it will be triggered on every URB submission (sometimes more than once
 per URB).

My rationale was that when looking out for control transfers, the actual
transfer was the one line of info *not* printed at that point (among about a
trillion of ring/buffer state). :) I didn't know about the event tracer
intentions, so if this'll complicate things I'll drop it from the series, no
worries.

Cheers,


Matt

 
 Sarah Sharp
 
 Signed-off-by: Matt Evans m...@ozlabs.org
 ---
  drivers/usb/host/xhci-dbg.c  |2 +-
  drivers/usb/host/xhci-hub.c  |3 +++
  drivers/usb/host/xhci-ring.c |5 +
  3 files changed, 9 insertions(+), 1 deletions(-)

 diff --git a/drivers/usb/host/xhci-dbg.c b/drivers/usb/host/xhci-dbg.c
 index 2e04861..905f3bf 100644
 --- a/drivers/usb/host/xhci-dbg.c
 +++ b/drivers/usb/host/xhci-dbg.c
 @@ -278,7 +278,7 @@ void xhci_debug_trb(struct xhci_hcd *xhci, union 
 xhci_trb *trb)
   * FIXME: look at flags to figure out if it's an address or if
   * the data is directly in the buffer field.
   */
 -xhci_dbg(xhci, DMA address or buffer contents= %llu\n, 
 address);
 +xhci_dbg(xhci, DMA address or buffer contents= 0x%llx\n, 
 address);
  break;
  case TRB_TYPE(TRB_COMPLETION):
  address = le64_to_cpu(trb-event_cmd.cmd_trb);
 diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
 index ae1d24c..768fd6e 100644
 --- a/drivers/usb/host/xhci-hub.c
 +++ b/drivers/usb/host/xhci-hub.c
 @@ -389,6 +389,9 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, 
 u16 wValue,
  }
  bus_state = xhci-bus_state[hcd_index(hcd)];
  
 +xhci_dbg(xhci, %s(%04x, %04x, %04x, %p, %04x);\n, __func__,
 + typeReq, wValue, wIndex, buf, wLength);
 +
  spin_lock_irqsave(xhci-lock, flags);
  switch (typeReq) {
  case GetHubStatus:
 diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
 index 3898f22..45f3b77 100644
 --- a/drivers/usb/host/xhci-ring.c
 +++ b/drivers/usb/host/xhci-ring.c
 @@ -2336,6 +2336,11 @@ static void queue_trb(struct xhci_hcd *xhci, struct 
 xhci_ring *ring,
  trb-field[1] = cpu_to_le32(field2);
  trb-field[2] = cpu_to_le32(field3);
  trb-field[3] = cpu_to_le32(field4);
 +
 +xhci_dbg(xhci, queue_trb @%llx %08x %08x %08x %08x\n,
 + xhci_trb_virt_to_dma(ring-enq_seg, ring-enqueue),
 + le32_to_cpu(trb-field[0]), le32_to_cpu(trb-field[1]),
 + le32_to_cpu(trb-field[2]), le32_to_cpu(trb-field[3]));
  inc_enq(xhci, ring, consumer, more_trbs_coming);
  }
  
 -- 
 1.7.0.4




[PATCH v3 0/4] xhci: Make xHCI driver endian-safe

2011-03-28 Thread Matt Evans
Hi Sarah,


Reposting the whole set, since there're 4 instead of 5 having dropped the 'add
debug' patch in the middle and I thought differing 'vN a/b' subjects may get
confusing otherwise.  (Two of the patches haven't changed. :/ )

These remove the scratty temp variables and comment the return value
of xhci_handle_event().


Cheers,


Matt


[PATCH v3 2/4] xhci: Add rmb() between reading event validity & event data access.

2011-03-28 Thread Matt Evans
On weakly-ordered systems, the reading of an event's content must occur
after reading the event's validity.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 drivers/usb/host/xhci-ring.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 1361032..86c8198 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2150,6 +2150,11 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
}
	xhci_dbg(xhci, "%s - OS owns TRB\n", __func__);
 
+   /*
+* Barrier between reading the TRB_CYCLE (valid) flag above and any
+* speculative reads of the event's flags/data below.
+*/
+   rmb();
/* FIXME: Handle more event types. */
	switch ((le32_to_cpu(event->event_cmd.flags) & TRB_TYPE_BITMASK)) {
case TRB_TYPE(TRB_COMPLETION):
-- 
1.7.0.4



[PATCH v3 3/4] xhci: Add an assertion to check for virt_dev=0 bug.

2011-03-28 Thread Matt Evans
During a plug-unplug stress test on an NEC xHCI card, a null pointer
dereference was observed.  xhci_address_device() dereferenced a null
virt_dev (possibly an erroneous udev->slot_id?); this patch adds a WARN_ON 
message to aid debug if it can be recreated.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 drivers/usb/host/xhci.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 3a9f931..d145fa3 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -2533,6 +2533,17 @@ int xhci_address_device(struct usb_hcd *hcd, struct 
usb_device *udev)
 
	virt_dev = xhci->devs[udev->slot_id];
 
+   if (WARN_ON(!virt_dev)) {
+   /*
+* In plug/unplug torture test with an NEC controller,
+* a zero-dereference was observed once due to virt_dev = 0.
+* Print useful debug rather than crash if it is observed again!
+*/
+		xhci_warn(xhci, "Virt dev invalid for slot_id 0x%x!\n",
+				udev->slot_id);
+   return -EINVAL;
+   }
+
	slot_ctx = xhci_get_slot_ctx(xhci, virt_dev->in_ctx);
/*
 * If this is the first Set Address since device plug-in or
-- 
1.7.0.4



[PATCH v3 4/4] xhci: Remove recursive call to xhci_handle_event

2011-03-28 Thread Matt Evans
Make the caller loop while there are events to handle, instead.

Signed-off-by: Matt Evans m...@ozlabs.org
---

Added a comment on the return value, defining 0 to be 'bad'.

 drivers/usb/host/xhci-ring.c |   18 +++---
 1 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 86c8198..5fce46c 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2128,8 +2128,10 @@ cleanup:
 /*
  * This function handles all OS-owned events on the event ring.  It may drop
 * xhci->lock between event processing (e.g. to pass up port status changes).
+ * Returns >0 for "possibly more events to process" (caller should call again),
+ * otherwise 0 if done.  In future, <0 returns should indicate error code.
  */
-static void xhci_handle_event(struct xhci_hcd *xhci)
+static int xhci_handle_event(struct xhci_hcd *xhci)
 {
union xhci_trb *event;
int update_ptrs = 1;
@@ -2138,7 +2140,7 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
	xhci_dbg(xhci, "In %s\n", __func__);
	if (!xhci->event_ring || !xhci->event_ring->dequeue) {
		xhci->error_bitmask |= 1 << 1;
-   return;
+   return 0;
}
 
	event = xhci->event_ring->dequeue;
@@ -2146,7 +2148,7 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
	if ((le32_to_cpu(event->event_cmd.flags) & TRB_CYCLE) !=
			xhci->event_ring->cycle_state) {
		xhci->error_bitmask |= 1 << 2;
-		return;
+		return 0;
	}
	xhci_dbg(xhci, "%s - OS owns TRB\n", __func__);
 
@@ -2190,15 +2192,17 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
	if (xhci->xhc_state & XHCI_STATE_DYING) {
		xhci_dbg(xhci, "xHCI host dying, returning from "
				"event handler.\n");
-   return;
+   return 0;
}
 
if (update_ptrs)
/* Update SW event ring dequeue pointer */
		inc_deq(xhci, xhci->event_ring, true);
 
-   /* Are there more items on the event ring? */
-   xhci_handle_event(xhci);
+   /* Are there more items on the event ring?  Caller will call us again to
+* check.
+*/
+   return 1;
 }
 
 /*
@@ -2280,7 +2284,7 @@ hw_died:
/* FIXME this should be a delayed service routine
 * that clears the EHB.
 */
-   xhci_handle_event(xhci);
+	while (xhci_handle_event(xhci) > 0) {}
 
	temp_64 = xhci_read_64(xhci, &xhci->ir_set->erst_dequeue);
/* If necessary, update the HW's version of the event ring deq ptr. */
-- 
1.7.0.4



[PATCH v2 3/5] xhci: Add rmb() between reading event validity & event data access.

2011-03-27 Thread Matt Evans
On weakly-ordered systems, the reading of an event's content must occur
after reading the event's validity.

Signed-off-by: Matt Evans m...@ozlabs.org
---
Segher, thanks for the comment; explanation added.

 drivers/usb/host/xhci-ring.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 45f3b77..d6aa880 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2152,6 +2152,11 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
}
xhci_dbg(xhci, %s - OS owns TRB\n, __func__);
 
+   /*
+* Barrier between reading the TRB_CYCLE (valid) flag above and any
+* speculative reads of the event's flags/data below.
+*/
+   rmb();
/* FIXME: Handle more event types. */
switch ((le32_to_cpu(event-event_cmd.flags)  TRB_TYPE_BITMASK)) {
case TRB_TYPE(TRB_COMPLETION):
-- 
1.7.0.4



[PATCH v2 5/5] xhci: Remove recursive call to xhci_handle_event

2011-03-27 Thread Matt Evans
Make the caller loop while there are events to handle, instead.

Signed-off-by: Matt Evans m...@ozlabs.org
---
1 byte smaller after Sergei's suggestion.

 drivers/usb/host/xhci-ring.c |   16 +---
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index d6aa880..9c51d69 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2131,7 +2131,7 @@ cleanup:
  * This function handles all OS-owned events on the event ring.  It may drop
  * xhci-lock between event processing (e.g. to pass up port status changes).
  */
-static void xhci_handle_event(struct xhci_hcd *xhci)
+static int xhci_handle_event(struct xhci_hcd *xhci)
 {
union xhci_trb *event;
int update_ptrs = 1;
@@ -2140,7 +2140,7 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
xhci_dbg(xhci, In %s\n, __func__);
if (!xhci-event_ring || !xhci-event_ring-dequeue) {
xhci-error_bitmask |= 1  1;
-   return;
+   return 0;
}
 
event = xhci-event_ring-dequeue;
@@ -2148,7 +2148,7 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
if ((le32_to_cpu(event-event_cmd.flags)  TRB_CYCLE) !=
xhci-event_ring-cycle_state) {
xhci-error_bitmask |= 1  2;
-   return;
+   return 0;
}
xhci_dbg(xhci, %s - OS owns TRB\n, __func__);
 
@@ -2192,15 +2192,17 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
if (xhci-xhc_state  XHCI_STATE_DYING) {
xhci_dbg(xhci, xHCI host dying, returning from 
event handler.\n);
-   return;
+   return 0;
}
 
if (update_ptrs)
/* Update SW event ring dequeue pointer */
inc_deq(xhci, xhci-event_ring, true);
 
-   /* Are there more items on the event ring? */
-   xhci_handle_event(xhci);
+   /* Are there more items on the event ring?  Caller will call us again to
+* check.
+*/
+   return 1;
 }
 
 /*
@@ -2282,7 +2284,7 @@ hw_died:
/* FIXME this should be a delayed service routine
 * that clears the EHB.
 */
-   xhci_handle_event(xhci);
+   while (xhci_handle_event(xhci)) {}
 
temp_64 = xhci_read_64(xhci, xhci-ir_set-erst_dequeue);
/* If necessary, update the HW's version of the event ring deq ptr. */
-- 
1.7.0.4



[PATCH 0/5] Make xHCI driver endian-safe, add a barrier, some debug

2011-03-25 Thread Matt Evans
Hi Sarah,


This series addresses the endian issues with the xHCI driver, and has brought
lovely USB 3 to PPC. :-)  I've tested various types of traffic on ppc4xx and
POWER7 and (some sound driver bugs notwithstanding) all seems fine.  Also
addresses an ordering problem we found and the recursive nature of the event
handling, plus the addition of some debug.

This should apply to 2.6.38/Linus' tree.


Thanks!


Matt


[PATCH 2/5] xhci: Extend debug messages

2011-03-25 Thread Matt Evans
Add more debug to print queued transfers, show control intentions and
modify an existing message to hexify address output.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 drivers/usb/host/xhci-dbg.c  |2 +-
 drivers/usb/host/xhci-hub.c  |3 +++
 drivers/usb/host/xhci-ring.c |5 +
 3 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/host/xhci-dbg.c b/drivers/usb/host/xhci-dbg.c
index 2e04861..905f3bf 100644
--- a/drivers/usb/host/xhci-dbg.c
+++ b/drivers/usb/host/xhci-dbg.c
@@ -278,7 +278,7 @@ void xhci_debug_trb(struct xhci_hcd *xhci, union xhci_trb 
*trb)
 * FIXME: look at flags to figure out if it's an address or if
 * the data is directly in the buffer field.
 */
-   xhci_dbg(xhci, DMA address or buffer contents= %llu\n, 
address);
+   xhci_dbg(xhci, DMA address or buffer contents= 0x%llx\n, 
address);
break;
case TRB_TYPE(TRB_COMPLETION):
address = le64_to_cpu(trb-event_cmd.cmd_trb);
diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index ae1d24c..768fd6e 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -389,6 +389,9 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 
wValue,
}
bus_state = xhci-bus_state[hcd_index(hcd)];
 
+   xhci_dbg(xhci, %s(%04x, %04x, %04x, %p, %04x);\n, __func__,
+typeReq, wValue, wIndex, buf, wLength);
+
spin_lock_irqsave(xhci-lock, flags);
switch (typeReq) {
case GetHubStatus:
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 3898f22..45f3b77 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2336,6 +2336,11 @@ static void queue_trb(struct xhci_hcd *xhci, struct 
xhci_ring *ring,
trb-field[1] = cpu_to_le32(field2);
trb-field[2] = cpu_to_le32(field3);
trb-field[3] = cpu_to_le32(field4);
+
+   xhci_dbg(xhci, queue_trb @%llx %08x %08x %08x %08x\n,
+xhci_trb_virt_to_dma(ring-enq_seg, ring-enqueue),
+le32_to_cpu(trb-field[0]), le32_to_cpu(trb-field[1]),
+le32_to_cpu(trb-field[2]), le32_to_cpu(trb-field[3]));
inc_enq(xhci, ring, consumer, more_trbs_coming);
 }
 
-- 
1.7.0.4



[PATCH 3/5] xhci: Add rmb() between reading event validity & event data access.

2011-03-25 Thread Matt Evans
On weakly-ordered systems, the reading of an event's content must occur
after reading the event's validity.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 drivers/usb/host/xhci-ring.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 45f3b77..b46efd9 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2151,7 +2151,7 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
return;
}
xhci_dbg(xhci, %s - OS owns TRB\n, __func__);
-
+   rmb();
/* FIXME: Handle more event types. */
switch ((le32_to_cpu(event-event_cmd.flags)  TRB_TYPE_BITMASK)) {
case TRB_TYPE(TRB_COMPLETION):
-- 
1.7.0.4



[PATCH 4/5] xhci: Add an assertion to check for virt_dev=0 bug.

2011-03-25 Thread Matt Evans
During a plug-unplug stress test on an NEC xHCI card, a null pointer
dereference was observed.  xhci_address_device() dereferenced a null
virt_dev (possibly an erroneous udev-slot_id?); this patch adds a WARN_ON 
message to aid debug if it can be recreated.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 drivers/usb/host/xhci.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 88e6298..7d43456 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -2542,6 +2542,17 @@ int xhci_address_device(struct usb_hcd *hcd, struct 
usb_device *udev)
 
virt_dev = xhci-devs[udev-slot_id];
 
+   if (WARN_ON(!virt_dev)) {
+   /*
+* In plug/unplug torture test with an NEC controller,
+* a zero-dereference was observed once due to virt_dev = 0.
+* Print useful debug rather than crash if it is observed again!
+*/
+   xhci_warn(xhci, Virt dev invalid for slot_id 0x%x!\n,
+   udev-slot_id);
+   return -EINVAL;
+   }
+
slot_ctx = xhci_get_slot_ctx(xhci, virt_dev-in_ctx);
/*
 * If this is the first Set Address since device plug-in or
-- 
1.7.0.4



[PATCH 5/5] xhci: Remove recursive call to xhci_handle_event

2011-03-25 Thread Matt Evans
Make the caller loop while there are events to handle, instead.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 drivers/usb/host/xhci-ring.c |   16 +---
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index b46efd9..97bedd6 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2131,7 +2131,7 @@ cleanup:
  * This function handles all OS-owned events on the event ring.  It may drop
  * xhci-lock between event processing (e.g. to pass up port status changes).
  */
-static void xhci_handle_event(struct xhci_hcd *xhci)
+static int xhci_handle_event(struct xhci_hcd *xhci)
 {
union xhci_trb *event;
int update_ptrs = 1;
@@ -2140,7 +2140,7 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
xhci_dbg(xhci, In %s\n, __func__);
if (!xhci-event_ring || !xhci-event_ring-dequeue) {
xhci-error_bitmask |= 1  1;
-   return;
+   return 0;
}
 
event = xhci-event_ring-dequeue;
@@ -2148,7 +2148,7 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
if ((le32_to_cpu(event-event_cmd.flags)  TRB_CYCLE) !=
xhci-event_ring-cycle_state) {
xhci-error_bitmask |= 1  2;
-   return;
+   return 0;
}
xhci_dbg(xhci, %s - OS owns TRB\n, __func__);
rmb();
@@ -2187,15 +2187,17 @@ static void xhci_handle_event(struct xhci_hcd *xhci)
if (xhci-xhc_state  XHCI_STATE_DYING) {
xhci_dbg(xhci, xHCI host dying, returning from 
event handler.\n);
-   return;
+   return 0;
}
 
if (update_ptrs)
/* Update SW event ring dequeue pointer */
inc_deq(xhci, xhci-event_ring, true);
 
-   /* Are there more items on the event ring? */
-   xhci_handle_event(xhci);
+   /* Are there more items on the event ring?  Caller will call us again to
+* check.
+*/
+   return 1;
 }
 
 /*
@@ -2277,7 +2279,7 @@ hw_died:
/* FIXME this should be a delayed service routine
 * that clears the EHB.
 */
-   xhci_handle_event(xhci);
+   while (xhci_handle_event(xhci)) {};
 
temp_64 = xhci_read_64(xhci, xhci-ir_set-erst_dequeue);
/* If necessary, update the HW's version of the event ring deq ptr. */
-- 
1.7.0.4



[PATCH] powerpc: Initialise paca->kstack before early_setup_secondary

2010-08-13 Thread Matt Evans
As early setup calls down to slb_initialize(), we must have kstack
initialised before checking "should we add a bolted SLB entry for our kstack?"

Failing to do so means stack access requires an SLB miss exception to refill
an entry dynamically, if the stack isn't accessible via SLB(0) (kernel text &
static data).  It's not always allowable to take such a miss, and
intermittent crashes will result.

Primary CPUs don't have this issue; an SLB entry is not bolted for their
stack anyway (as that lives within SLB(0)).  This patch therefore only
affects the init of secondaries.

Signed-off-by: Matt Evans m...@ozlabs.org
Cc: stable sta...@kernel.org
---
 arch/powerpc/kernel/head_64.S |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 844a44b..4d6681d 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -572,9 +572,6 @@ __secondary_start:
/* Set thread priority to MEDIUM */
HMT_MEDIUM
 
-   /* Do early setup for that CPU (stab, slb, hash table pointer) */
-   bl  .early_setup_secondary
-
/* Initialize the kernel stack.  Just a repeat for iSeries.  */
LOAD_REG_ADDR(r3, current_set)
	sldi	r28,r24,3	/* get current_set[cpu#] */
@@ -582,6 +579,9 @@ __secondary_start:
	addi	r1,r1,THREAD_SIZE-STACK_FRAME_OVERHEAD
std r1,PACAKSAVE(r13)
 
+   /* Do early setup for that CPU (stab, slb, hash table pointer) */
+   bl  .early_setup_secondary
+
/* Clear backchain so we get nice backtraces */
li  r7,0
	mtlr	r7
-- 
1.7.0.4



Re: [PATCH v2] powerpc/kexec: Fix orphaned offline CPUs across kexec

2010-07-29 Thread Matt Evans
Michael Neuling wrote:
 In message 4c511216.30...@ozlabs.org you wrote:
 When CPU hotplug is used, some CPUs may be offline at the time a kexec is
 performed.  The subsequent kernel may expect these CPUs to be already running
 ,
 and will declare them stuck.  On pseries, there's also a soft-offline (cede)
 state that CPUs may be in; this can also cause problems as the kexeced kernel
 may ask RTAS if they're online -- and RTAS would say they are.  Again, stuck.

 This patch kicks each present offline CPU awake before the kexec, so that
 none are lost to these assumptions in the subsequent kernel.
 
 There are a lot of cleanups in this patch.  The change you are making
 would be a lot clearer without all the additional cleanups in there.  I
 think I'd like to see this as two patches.  One for cleanups and one for
 the addition of wake_offline_cpus().

Okay, I can split this.  Typofixy-add-debug in one, wake_offline_cpus in 
another.

 Other than that, I'm not completely convinced this is the functionality
 we want.  Do we really want to online these cpus?  Why where they
 offlined in the first place?  I understand the stuck problem, but is the
 solution to online them, or to change the device tree so that the second
 kernel doesn't detect them as stuck?

Well... There are two cases.  If a CPU is soft-offlined on pseries, it must be 
woken from that cede loop (in platforms/pseries/hotplug-cpu.c) as we're 
replacing code under its feet.  We could either special-case the wakeup from 
this cede loop to get that CPU to RTAS stop-self itself properly.  (Kind of 
like a wake to die.)

So that leaves hard-offline CPUs (perhaps including the above): I don't know 
why they might have been offlined.  If it's something serious, like fire, 
they'd be removed from the present set too (and thus not be considered in this 
restarting case).  We could add a mask to the CPU node to show which of the 
threads (if any) are running, and alter the startup code to start everything if 
this mask doesn't exist (non-kexec) or only online currently-running threads if 
the mask is present.  That feels a little weird.

My reasoning for restarting everything was:  The first time you boot, all of 
your present CPUs are started up.  When you reboot, any CPUs you offlined for 
fun are restarted.  Kexec is (in this non-crash sense) a user-initiated 'quick 
reboot', so I reasoned that it should look the same as a 'hard reboot' and your 
new invocation would have all available CPUs running as is usual.


Cheers,


Matt


[PATCH 0/2] powerpc/kexec: Fix orphaned offline CPUs, add comments/debug

2010-07-29 Thread Matt Evans
Separated tidyup comments & debug away from the fix of restarting offline
available CPUs before waiting for them on kexec.


Matt Evans (2):
  powerpc/kexec: Add to and tidy debug/comments in machine_kexec64.c
  powerpc/kexec: Fix orphaned offline CPUs across kexec

 arch/powerpc/kernel/machine_kexec_64.c |   55 ---
 1 files changed, 49 insertions(+), 6 deletions(-)



[PATCH 1/2] powerpc/kexec: Add to and tidy debug/comments in machine_kexec64.c

2010-07-29 Thread Matt Evans
Tidies some typos, KERN_INFO-ises an info msg, and adds a debug msg showing
when the final sequence starts.

Also adds a comment to kexec_prepare_cpus_wait() to make note of a possible
problem; the need for kexec to deal with CPUs that failed to originally start
up.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 arch/powerpc/kernel/machine_kexec_64.c |   29 -
 1 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
b/arch/powerpc/kernel/machine_kexec_64.c
index 4fbb3be..aa3d5cd 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -15,6 +15,7 @@
 #include linux/thread_info.h
 #include linux/init_task.h
 #include linux/errno.h
+#include linux/kernel.h
 
 #include asm/page.h
 #include asm/current.h
@@ -181,7 +182,20 @@ static void kexec_prepare_cpus_wait(int wait_state)
int my_cpu, i, notified=-1;
 
my_cpu = get_cpu();
-   /* Make sure each CPU has atleast made it to the state we need */
+   /* Make sure each CPU has at least made it to the state we need.
+*
+* FIXME: There is a (slim) chance of a problem if not all of the CPUs
+* are correctly onlined.  If somehow we start a CPU on boot with RTAS
+* start-cpu, but somehow that CPU doesn't write callin_cpu_map[] in
+* time, the boot CPU will timeout.  If it does eventually execute
+* stuff, the secondary will start up (paca[].cpu_start was written) and
+* get into a peculiar state.  If the platform supports
+* smp_ops->take_timebase(), the secondary CPU will probably be spinning
+* in there.  If not (i.e. pseries), the secondary will continue on and
+* try to online itself/idle/etc. If it survives that, we need to find
+* these possible-but-not-online-but-should-be CPUs and chaperone them
+* into kexec_smp_wait().
+*/
for_each_online_cpu(i) {
if (i == my_cpu)
continue;
@@ -189,9 +203,9 @@ static void kexec_prepare_cpus_wait(int wait_state)
		while (paca[i].kexec_state < wait_state) {
			barrier();
			if (i != notified) {
-				printk( "kexec: waiting for cpu %d (physical"
-						" %d) to enter %i state\n",
-					i, paca[i].hw_cpu_id, wait_state);
+				printk(KERN_INFO "kexec: waiting for cpu %d "
+				       "(physical %d) to enter %i state\n",
+				       i, paca[i].hw_cpu_id, wait_state);
notified = i;
}
}
@@ -215,7 +229,10 @@ static void kexec_prepare_cpus(void)
if (ppc_md.kexec_cpu_down)
ppc_md.kexec_cpu_down(0, 0);
 
-	/* Before removing MMU mapings make sure all CPUs have entered real mode */
+   /*
+* Before removing MMU mappings make sure all CPUs have entered real
+* mode:
+*/
kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
 
put_cpu();
@@ -284,6 +301,8 @@ void default_machine_kexec(struct kimage *image)
if (crashing_cpu == -1)
kexec_prepare_cpus();
 
+	pr_debug("kexec: Starting switchover sequence.\n");
+
/* switch to a staticly allocated stack.  Based on irq stack code.
 * XXX: the task struct will likely be invalid once we do the copy!
 */
-- 
1.6.3.3



[PATCH 2/2] powerpc/kexec: Fix orphaned offline CPUs across kexec

2010-07-29 Thread Matt Evans
When CPU hotplug is used, some CPUs may be offline at the time a kexec is
performed.  The subsequent kernel may expect these CPUs to be already running,
and will declare them stuck.  On pseries, there's also a soft-offline (cede)
state that CPUs may be in; this can also cause problems as the kexeced kernel
may ask RTAS if they're online -- and RTAS would say they are.  The CPU will
either appear stuck, or will cause a crash as we replace its cede loop beneath
it.

This patch kicks each present offline CPU awake before the kexec, so that
none are forever lost to these assumptions in the subsequent kernel.

Now, the behaviour is that all available CPUs that were offlined are now
online & usable after the kexec.  This mimics the behaviour of a full reboot
(on which all CPUs will be restarted).

Signed-off-by: Matt Evans m...@ozlabs.org
---
 arch/powerpc/kernel/machine_kexec_64.c |   26 +-
 1 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
b/arch/powerpc/kernel/machine_kexec_64.c
index aa3d5cd..37f805e 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -16,6 +16,7 @@
 #include linux/init_task.h
 #include linux/errno.h
 #include linux/kernel.h
+#include linux/cpu.h
 
 #include asm/page.h
 #include asm/current.h
@@ -213,9 +214,32 @@ static void kexec_prepare_cpus_wait(int wait_state)
mb();
 }
 
-static void kexec_prepare_cpus(void)
+/*
+ * We need to make sure each present CPU is online.  The next kernel will scan
+ * the device tree and assume primary threads are online and query secondary
+ * threads via RTAS to online them if required.  If we don't online primary
+ * threads, they will be stuck.  However, we also online secondary threads as 
we
+ * may be using 'cede offline'.  In this case RTAS doesn't see the secondary
+ * threads as offline -- and again, these CPUs will be stuck.
+ *
+ * So, we online all CPUs that should be running, including secondary threads.
+ */
+static void wake_offline_cpus(void)
 {
+   int cpu = 0;
+
+   for_each_present_cpu(cpu) {
+   if (!cpu_online(cpu)) {
+			printk(KERN_INFO "kexec: Waking offline cpu %d.\n",
+  cpu);
+   cpu_up(cpu);
+   }
+   }
+}
 
+static void kexec_prepare_cpus(void)
+{
+   wake_offline_cpus();
smp_call_function(kexec_smp_down, NULL, /* wait */0);
local_irq_disable();
mb(); /* make sure IRQs are disabled before we say they are */
-- 
1.6.3.3



[PATCH v2] powerpc/kexec: Fix orphaned offline CPUs across kexec

2010-07-28 Thread Matt Evans
When CPU hotplug is used, some CPUs may be offline at the time a kexec is
performed.  The subsequent kernel may expect these CPUs to be already running,
and will declare them stuck.  On pseries, there's also a soft-offline (cede)
state that CPUs may be in; this can also cause problems as the kexeced kernel
may ask RTAS if they're online -- and RTAS would say they are.  Again, stuck.

This patch kicks each present offline CPU awake before the kexec, so that
none are lost to these assumptions in the subsequent kernel.

Signed-off-by: Matt Evans m...@ozlabs.org
---
v2: Added FIXME comment noting a possible problem with incorrectly
started secondary CPUs, following feedback from Milton.

 arch/powerpc/kernel/machine_kexec_64.c |   55 ---
 1 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
b/arch/powerpc/kernel/machine_kexec_64.c
index 4fbb3be..37f805e 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -15,6 +15,8 @@
 #include linux/thread_info.h
 #include linux/init_task.h
 #include linux/errno.h
+#include linux/kernel.h
+#include linux/cpu.h
 
 #include asm/page.h
 #include asm/current.h
@@ -181,7 +183,20 @@ static void kexec_prepare_cpus_wait(int wait_state)
int my_cpu, i, notified=-1;
 
my_cpu = get_cpu();
-   /* Make sure each CPU has atleast made it to the state we need */
+   /* Make sure each CPU has at least made it to the state we need.
+*
+* FIXME: There is a (slim) chance of a problem if not all of the CPUs
+* are correctly onlined.  If somehow we start a CPU on boot with RTAS
+* start-cpu, but somehow that CPU doesn't write callin_cpu_map[] in
+* time, the boot CPU will timeout.  If it does eventually execute
+* stuff, the secondary will start up (paca[].cpu_start was written) and
+* get into a peculiar state.  If the platform supports
+* smp_ops-take_timebase(), the secondary CPU will probably be spinning
+* in there.  If not (i.e. pseries), the secondary will continue on and
+* try to online itself/idle/etc. If it survives that, we need to find
+* these possible-but-not-online-but-should-be CPUs and chaperone them
+* into kexec_smp_wait().
+*/
for_each_online_cpu(i) {
if (i == my_cpu)
continue;
@@ -189,9 +204,9 @@ static void kexec_prepare_cpus_wait(int wait_state)
while (paca[i].kexec_state  wait_state) {
barrier();
if (i != notified) {
-   printk( kexec: waiting for cpu %d (physical
-%d) to enter %i state\n,
-   i, paca[i].hw_cpu_id, wait_state);
+   printk(KERN_INFO kexec: waiting for cpu %d 
+  (physical %d) to enter %i state\n,
+  i, paca[i].hw_cpu_id, wait_state);
notified = i;
}
}
@@ -199,9 +214,32 @@ static void kexec_prepare_cpus_wait(int wait_state)
mb();
 }
 
-static void kexec_prepare_cpus(void)
+/*
+ * We need to make sure each present CPU is online.  The next kernel will scan
+ * the device tree and assume primary threads are online and query secondary
+ * threads via RTAS to online them if required.  If we don't online primary
+ * threads, they will be stuck.  However, we also online secondary threads as 
we
+ * may be using 'cede offline'.  In this case RTAS doesn't see the secondary
+ * threads as offline -- and again, these CPUs will be stuck.
+ *
+ * So, we online all CPUs that should be running, including secondary threads.
+ */
+static void wake_offline_cpus(void)
 {
+   int cpu = 0;
 
+   for_each_present_cpu(cpu) {
+   if (!cpu_online(cpu)) {
+   printk(KERN_INFO kexec: Waking offline cpu %d.\n,
+  cpu);
+   cpu_up(cpu);
+   }
+   }
+}
+
+static void kexec_prepare_cpus(void)
+{
+   wake_offline_cpus();
smp_call_function(kexec_smp_down, NULL, /* wait */0);
local_irq_disable();
mb(); /* make sure IRQs are disabled before we say they are */
@@ -215,7 +253,10 @@ static void kexec_prepare_cpus(void)
if (ppc_md.kexec_cpu_down)
ppc_md.kexec_cpu_down(0, 0);
 
-   /* Before removing MMU mapings make sure all CPUs have entered real 
mode */
+   /*
+* Before removing MMU mappings make sure all CPUs have entered real
+* mode:
+*/
kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
 
put_cpu();
@@ -284,6 +325,8 @@ void default_machine_kexec(struct kimage *image)
if (crashing_cpu == -1)
kexec_prepare_cpus

[PATCH] powerpc/kexec: Fix orphaned offline CPUs across kexec

2010-07-27 Thread Matt Evans
When CPU hotplug is used, some CPUs may be offline at the time a kexec is
performed.  The subsequent kernel may expect these CPUs to be already running,
and will declare them stuck.  On pseries, there's also a soft-offline (cede)
state that CPUs may be in; this can also cause problems as the kexeced kernel
may ask RTAS if they're online -- and RTAS would say they are.  Again, stuck.

This patch kicks each present offline CPU awake before the kexec, so that
none are lost to these assumptions in the subsequent kernel.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 arch/powerpc/kernel/machine_kexec_64.c |   42 +++
 1 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
b/arch/powerpc/kernel/machine_kexec_64.c
index 4fbb3be..c2fd914 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -15,6 +15,8 @@
 #include linux/thread_info.h
 #include linux/init_task.h
 #include linux/errno.h
+#include linux/kernel.h
+#include linux/cpu.h
 
 #include asm/page.h
 #include asm/current.h
@@ -181,7 +183,7 @@ static void kexec_prepare_cpus_wait(int wait_state)
int my_cpu, i, notified=-1;
 
my_cpu = get_cpu();
-   /* Make sure each CPU has atleast made it to the state we need */
+   /* Make sure each CPU has at least made it to the state we need. */
for_each_online_cpu(i) {
if (i == my_cpu)
continue;
@@ -189,9 +191,9 @@ static void kexec_prepare_cpus_wait(int wait_state)
while (paca[i].kexec_state  wait_state) {
barrier();
if (i != notified) {
-   printk( kexec: waiting for cpu %d (physical
-%d) to enter %i state\n,
-   i, paca[i].hw_cpu_id, wait_state);
+   printk(KERN_INFO kexec: waiting for cpu %d 
+  (physical %d) to enter %i state\n,
+  i, paca[i].hw_cpu_id, wait_state);
notified = i;
}
}
@@ -199,9 +201,32 @@ static void kexec_prepare_cpus_wait(int wait_state)
mb();
 }
 
-static void kexec_prepare_cpus(void)
+/*
+ * We need to make sure each present CPU is online.  The next kernel will scan
+ * the device tree and assume primary threads are online and query secondary
+ * threads via RTAS to online them if required.  If we don't online primary
+ * threads, they will be stuck.  However, we also online secondary threads as 
we
+ * may be using 'cede offline'.  In this case RTAS doesn't see the secondary
+ * threads as offline -- and again, these CPUs will be stuck.
+ *
+ * So, we online all CPUs that should be running, including secondary threads.
+ */
+static void wake_offline_cpus(void)
 {
+   int cpu = 0;
 
+   for_each_present_cpu(cpu) {
+   if (!cpu_online(cpu)) {
+   printk(KERN_INFO kexec: Waking offline cpu %d.\n,
+  cpu);
+   cpu_up(cpu);
+   }
+   }
+}
+
+static void kexec_prepare_cpus(void)
+{
+   wake_offline_cpus();
smp_call_function(kexec_smp_down, NULL, /* wait */0);
local_irq_disable();
mb(); /* make sure IRQs are disabled before we say they are */
@@ -215,7 +240,10 @@ static void kexec_prepare_cpus(void)
if (ppc_md.kexec_cpu_down)
ppc_md.kexec_cpu_down(0, 0);
 
-   /* Before removing MMU mapings make sure all CPUs have entered real 
mode */
+   /*
+* Before removing MMU mappings make sure all CPUs have entered real
+* mode:
+*/
kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
 
put_cpu();
@@ -284,6 +312,8 @@ void default_machine_kexec(struct kimage *image)
if (crashing_cpu == -1)
kexec_prepare_cpus();
 
+   pr_debug(kexec: Starting switchover sequence.\n);
+
/* switch to a staticly allocated stack.  Based on irq stack code.
 * XXX: the task struct will likely be invalid once we do the copy!
 */
-- 
1.6.3.3



[PATCH v2] powerpc/kexec: Switch to a static PACA on the way out

2010-07-08 Thread Matt Evans
With dynamic PACAs, the kexecing CPU's PACA won't lie within the kernel
static data and there is a chance that something may stomp it when preparing
to kexec.  This patch switches this final CPU to a static PACA just before
we pull the switch.

Signed-off-by: Matt Evans m...@ozlabs.org
---
v2: Changes from Milton's review:
- Use setup_paca() and move from setup_64.c,
- SLB cache inval. not required,
- Adjust 'paca' (oops..), and
- Poison data_offset/per_cpu_offset

 arch/powerpc/include/asm/paca.h|2 +-
 arch/powerpc/kernel/machine_kexec_64.c |   20 
 arch/powerpc/kernel/paca.c |   10 ++
 arch/powerpc/kernel/setup_64.c |   10 --
 4 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 8ce7963..1ff6662 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -146,7 +146,7 @@ struct paca_struct {
 extern struct paca_struct *paca;
 extern __initdata struct paca_struct boot_paca;
 extern void initialise_paca(struct paca_struct *new_paca, int cpu);
-
+extern void setup_paca(struct paca_struct *new_paca);
 extern void allocate_pacas(void);
 extern void free_unused_pacas(void);
 
diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
b/arch/powerpc/kernel/machine_kexec_64.c
index 26f9900..c4d0123 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -273,6 +273,12 @@ static void kexec_prepare_cpus(void)
 static union thread_union kexec_stack __init_task_data =
{ };
 
+/*
+ * For similar reasons to the stack above, the kexecing CPU needs to be on a
+ * static PACA; we switch to kexec_paca.
+ */
+struct paca_struct kexec_paca;
+
 /* Our assembly helper, in kexec_stub.S */
 extern NORET_TYPE void kexec_sequence(void *newstack, unsigned long start,
void *image, void *control,
@@ -300,6 +306,20 @@ void default_machine_kexec(struct kimage *image)
kexec_stack.thread_info.task = current_thread_info()-task;
kexec_stack.thread_info.flags = 0;
 
+   /* We need a static PACA, too; copy this CPU's PACA over and switch to
+* it.  Also poison per_cpu_offset to catch anyone using non-static
+* data.
+*/
+	memcpy(&kexec_paca, get_paca(), sizeof(struct paca_struct));
+   kexec_paca.data_offset = 0xedeaddeadeeeUL;
+	paca = (struct paca_struct *)RELOC_HIDE(&kexec_paca, 0) -
+		kexec_paca.paca_index;
+	setup_paca(&kexec_paca);
+
+   /* XXX: If anyone does 'dynamic lppacas' this will also need to be
+* switched to a static version!
+*/
+
/* Some things are best done in assembly.  Finding globals with
 * a toc is easier in C, so pass in what we can.
 */
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index f88acf0..3db8d64 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -105,6 +105,16 @@ void __init initialise_paca(struct paca_struct *new_paca, 
int cpu)
 #endif /* CONFIG_PPC_STD_MMU_64 */
 }
 
+/* Put the paca pointer into r13 and SPRG_PACA */
+void setup_paca(struct paca_struct *new_paca)
+{
+   local_paca = new_paca;
+   mtspr(SPRN_SPRG_PACA, local_paca);
+#ifdef CONFIG_PPC_BOOK3E
+	mtspr(SPRN_SPRG_TLB_EXFRAME, local_paca->extlb);
+#endif
+}
+
 static int __initdata paca_size;
 
 void __init allocate_pacas(void)
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index f3fb5a7..6efbed4 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -142,16 +142,6 @@ early_param(smt-enabled, early_smt_enabled);
 #define check_smt_enabled()
 #endif /* CONFIG_SMP */
 
-/* Put the paca pointer into r13 and SPRG_PACA */
-static void __init setup_paca(struct paca_struct *new_paca)
-{
-   local_paca = new_paca;
-   mtspr(SPRN_SPRG_PACA, local_paca);
-#ifdef CONFIG_PPC_BOOK3E
-	mtspr(SPRN_SPRG_TLB_EXFRAME, local_paca->extlb);
-#endif
-}
-
 /*
  * Early initialization entry point. This is called by head.S
  * with MMU translation disabled. We rely on the feature of
-- 
1.6.3.3



[PATCH] kexec/ppc64: Switch to a static PACA on the way out

2010-07-07 Thread Matt Evans
With dynamic PACAs, the kexecing CPU's PACA won't lie within the kernel
static data and there is a chance that something may stomp it when preparing
to kexec.  This patch switches this final CPU to a static PACA just before
we pull the switch.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 arch/powerpc/kernel/machine_kexec_64.c |   19 +++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
b/arch/powerpc/kernel/machine_kexec_64.c
index 040bd1d..1348e84 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -253,6 +253,12 @@ static void kexec_prepare_cpus(void)
 static union thread_union kexec_stack __init_task_data =
{ };
 
+/*
+ * For similar reasons to the stack above, the kexecing CPU needs to be on a
+ * static PACA; we switch to kexec_paca.
+ */
+struct paca_struct kexec_paca;
+
 /* Our assembly helper, in kexec_stub.S */
 extern NORET_TYPE void kexec_sequence(void *newstack, unsigned long start,
void *image, void *control,
@@ -280,6 +286,19 @@ void default_machine_kexec(struct kimage *image)
kexec_stack.thread_info.task = current_thread_info()-task;
kexec_stack.thread_info.flags = 0;
 
+   /* We need a static PACA, too; copy this CPU's PACA over and switch to
+* it.  Also make the SLB cache look invalid as it may have been touched
+* by an IRQ before we switch to it.
+*/
+   memcpy(kexec_paca, get_paca(), sizeof(struct paca_struct));
+   kexec_paca.slb_cache_ptr = SLB_CACHE_ENTRIES+1;
+   /* 'local_paca' boils down to GPR13 */
+   local_paca = kexec_paca;
+
+   /* XXX: If anyone does 'dynamic lppacas' this will also need to be
+* switched to a static version!
+*/
+
/* Some things are best done in assembly.  Finding globals with
 * a toc is easier in C, so pass in what we can.
 */
-- 
1.6.3.3



[PATCH] perf_event: Fix for power_pmu_disable()

2010-07-05 Thread Matt Evans
When power_pmu_disable() removes the given event from a particular index into
cpuhw->event[], it shuffles down higher event[] entries.  But, this array is
paired with cpuhw->events[] and cpuhw->flags[] so should shuffle them
similarly.

If these arrays get out of sync, code such as power_check_constraints() will
fail.  This caused a bug where events were temporarily disabled and then failed
to be re-enabled; subsequent code tried to write_pmc() with its (disabled) idx
of 0, causing a message "oops trying to write PMC0".  This triggers this bug on
POWER7, running a miss-heavy test:

  perf record -e L1-dcache-load-misses -e L1-dcache-store-misses ./misstest

Signed-off-by: Matt Evans m...@ozlabs.org
---
 arch/powerpc/kernel/perf_event.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/perf_event.c b/arch/powerpc/kernel/perf_event.c
index 08460a2..3766398 100644
--- a/arch/powerpc/kernel/perf_event.c
+++ b/arch/powerpc/kernel/perf_event.c
@@ -838,8 +838,11 @@ static void power_pmu_disable(struct perf_event *event)
cpuhw = __get_cpu_var(cpu_hw_events);
for (i = 0; i  cpuhw-n_events; ++i) {
if (event == cpuhw-event[i]) {
-   while (++i  cpuhw-n_events)
+   while (++i  cpuhw-n_events) {
cpuhw-event[i-1] = cpuhw-event[i];
+   cpuhw-events[i-1] = cpuhw-events[i];
+   cpuhw-flags[i-1] = cpuhw-flags[i];
+   }
--cpuhw-n_events;
ppmu-disable_pmc(event-hw.idx - 1, cpuhw-mmcr);
if (event-hw.idx) {
-- 
1.6.3.3



Re: [PATCH 18/26] KVM: PPC: KVM PV guest stubs

2010-06-27 Thread Matt Evans
Howdy Alex!

Alexander Graf wrote:
 We will soon start and replace instructions from the text section with
 other, paravirtualized versions. To ease the readability of those patches
 I split out the generic looping and magic page mapping code out.
 
 This patch still only contains stubs. But at least it loops through the
 text section :).
 
 Signed-off-by: Alexander Graf ag...@suse.de
 ---
  arch/powerpc/kernel/kvm.c |   59 
 +
  1 files changed, 59 insertions(+), 0 deletions(-)
 
 diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
 index 2d8dd73..d873bc6 100644
 --- a/arch/powerpc/kernel/kvm.c
 +++ b/arch/powerpc/kernel/kvm.c
 @@ -32,3 +32,62 @@
  #define KVM_MAGIC_PAGE   (-4096L)
  #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, 
 x)
  
 +static bool kvm_patching_worked = true;
 +
 +static void kvm_map_magic_page(void *data)
 +{
 + kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
 +KVM_MAGIC_PAGE,  /* Physical Address */
 +KVM_MAGIC_PAGE); /* Effective Address */
 +}
 +
 +static void kvm_check_ins(u32 *inst)
 +{
 + u32 _inst = *inst;
 + u32 inst_no_rt = _inst & ~KVM_MASK_RT;
 + u32 inst_rt = _inst & KVM_MASK_RT;
 +
 + switch (inst_no_rt) {
 + }
 +
 + switch (_inst) {
 + }
 +
 + flush_icache_range((ulong)inst, (ulong)inst + 4);
 +}
 +
 +static void kvm_use_magic_page(void)
 +{
 + u32 *p;
 + u32 *start, *end;
 +
 + /* Tell the host to map the magic page to -4096 on all CPUs */
 +
 + on_each_cpu(kvm_map_magic_page, NULL, 1);
 +
 + /* Now loop through all code and find instructions */
 +
 + start = (void*)_stext;
 + end = (void*)_etext;
 +
 + for (p = start; p  end; p++)
 + kvm_check_ins(p);
 +}

Could you do something similar in module_finalize() to patch loaded modules' 
.text sections?
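
Roughly the shape I had in mind -- a sketch only, reusing kvm_check_ins() from
above; the section walk, the flags test and where exactly to hook it are
illustrative and untested, not something your patch defines:

/* Hypothetical sketch: walk a freshly-loaded module's executable
 * sections and run kvm_check_ins() over each word, mirroring what
 * kvm_use_magic_page() does for the core kernel text. */
static void kvm_patch_module_text(const Elf_Ehdr *hdr,
				  const Elf_Shdr *sechdrs,
				  struct module *me)
{
	unsigned int i;

	for (i = 1; i < hdr->e_shnum; i++) {
		u32 *p, *end;

		/* Only allocated, instruction-bearing sections */
		if (!(sechdrs[i].sh_flags & SHF_ALLOC) ||
		    !(sechdrs[i].sh_flags & SHF_EXECINSTR))
			continue;

		p = (u32 *)sechdrs[i].sh_addr;
		end = (u32 *)(sechdrs[i].sh_addr + sechdrs[i].sh_size);
		while (p < end)
			kvm_check_ins(p++);
	}
}

Whether module_finalize() is the right place for that, or whether a separate
notifier is nicer, I'll leave to you.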

 +
 +static int __init kvm_guest_init(void)
 +{
 + char *p;
 +
 + if (!kvm_para_available())
 + return 0;
 +
 + if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE))
 + kvm_use_magic_page();
 +
 + printk(KERN_INFO "KVM: Live patching for a fast VM %s\n",
 +  kvm_patching_worked ? "worked" : "failed");
 +
 + return 0;
 +}
 +
 +postcore_initcall(kvm_guest_init);

Cheers,


Matt


[PATCH] kexec, ppc64: Wait for online/possible CPUs only.

2010-06-08 Thread Matt Evans
kexec_prepare_cpus_wait() iterates i through NR_CPUS to check
paca[i].kexec_state of each to make sure they have quiesced.
However now we have dynamic PACA allocation, paca[NR_CPUS] is not necessarily
valid and we overrun the array; spurious "cpu is not possible, ignoring"
errors result.  This patch iterates for_each_online_cpu() so it stays
within the bounds of paca[] -- and every CPU is now 'possible'.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 arch/powerpc/kernel/machine_kexec_64.c |   18 +-
 1 files changed, 1 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
b/arch/powerpc/kernel/machine_kexec_64.c
index 26f9900..ed31a29 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -182,28 +182,12 @@ static void kexec_prepare_cpus_wait(int wait_state)
 
my_cpu = get_cpu();
/* Make sure each CPU has atleast made it to the state we need */
-	for (i=0; i < NR_CPUS; i++) {
+   for_each_online_cpu(i) {
if (i == my_cpu)
continue;
 
		while (paca[i].kexec_state < wait_state) {
			barrier();
-			if (!cpu_possible(i)) {
-				printk("kexec: cpu %d hw_cpu_id %d is not"
-						" possible, ignoring\n",
-						i, paca[i].hw_cpu_id);
-				break;
-			}
-			if (!cpu_online(i)) {
-				/* Fixme: this can be spinning in
-				 * pSeries_secondary_wait with a paca
-				 * waiting for it to go online.
-				 */
-				printk("kexec: cpu %d hw_cpu_id %d is not"
-						" online, ignoring\n",
-						i, paca[i].hw_cpu_id);
-				break;
-			}
			if (i != notified) {
				printk( "kexec: waiting for cpu %d (physical"
						" %d) to enter %i state\n",
-- 
1.6.3.3



Re: [PATCH] powerpc: Emulate most Book I instructions in emulate_step()

2010-06-02 Thread Matt Evans
Paul Mackerras wrote:
 [snip]
 The second alternative -- emulating the lwarx/stwcx and all the
 instructions in between -- sounds complicated but turns out to be
 pretty straightforward in fact, since the code for each instruction is
 pretty small, easy to verify that it's correct, and has little
 interaction with other code.

Easy to verify -- visually or logically?

Having had a little experience with interpreters 'invisibly' operating behind 
the scenes I am all for very rigorous testing of these things.  I have lost at 
least four of my nine lives to incorrect flag values, odd data problems and 
hideous heisenbugs etc. of such interpreters.  Looked at another way, you'd be 
surprised how much one can break in an interpreter and still successfully run 
various programs.

Presumably your first pass is completely correct already, but I'm thinking that 
if any future changes are made to it 
it would be good to include test code/modes alongside the interpreter so others 
can check alterations.  E.g. include the run user program interpreted test 
switch patch, or even better compare the interpreted state to real hardware 
execution.  There are other more directed test strategies (e.g. handwritten 
tests, random code) but these would be a good start.


Cheers,


Matt


[PATCH v2] kexec-tools, ppc64: Fix segfault parsing DR memory property

2010-05-13 Thread Matt Evans
add_dyn_reconf_usable_mem_property() iterates over memory spans
in /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory and intersects
these with usablemem_rgns ranges.  In doing so it used an unchecked
fixed-size array which will overrun on machines with lots of LMBs.

This patch removes the fixed-sized arrays from
add_dyn_reconf_usable_mem_property() and add_usable_mem_property(), in lieu of
malloc/realloc/free.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 kexec/arch/ppc64/fs2dt.c |   82 +++---
 1 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/kexec/arch/ppc64/fs2dt.c b/kexec/arch/ppc64/fs2dt.c
index 762bf04..4400f13 100644
--- a/kexec/arch/ppc64/fs2dt.c
+++ b/kexec/arch/ppc64/fs2dt.c
@@ -37,7 +37,7 @@
 #define NAMESPACE 16384/* max bytes for property names */
 #define INIT_TREE_WORDS 65536  /* Initial num words for prop values */
 #define MEMRESERVE 256 /* max number of reserved memory blocks */
-#define MAX_MEMORY_RANGES 1024
+#define MEM_RANGE_CHUNK_SZ 2048 /* Initial num dwords for mem ranges */
 
 static char pathname[MAXPATH], *pathstart;
 static char propnames[NAMESPACE] = { 0 };
@@ -148,7 +148,8 @@ static void add_dyn_reconf_usable_mem_property(int fd)
 {
char fname[MAXPATH], *bname;
uint64_t buf[32];
-   uint64_t ranges[2*MAX_MEMORY_RANGES];
+   uint64_t *ranges;
+   int ranges_size = MEM_RANGE_CHUNK_SZ;
uint64_t base, end, loc_base, loc_end;
size_t i, rngs_cnt, range;
int rlen = 0;
@@ -165,6 +166,11 @@ static void add_dyn_reconf_usable_mem_property(int fd)
		die("unrecoverable error: error seeking in \"%s\": %s\n",
pathname, strerror(errno));
 
+   ranges = malloc(ranges_size*8);
+   if (!ranges)
+		die("unrecoverable error: can't alloc %d bytes for ranges.\n",
+   ranges_size*8);
+
rlen = 0;
	for (i = 0; i < num_of_lmbs; i++) {
		if (read(fd, buf, 24) < 0)
@@ -180,24 +186,57 @@ static void add_dyn_reconf_usable_mem_property(int fd)
 
rngs_cnt = 0;
		for (range = 0; range < usablemem_rgns.size; range++) {
+   int add = 0;
loc_base = usablemem_rgns.ranges[range].start;
loc_end = usablemem_rgns.ranges[range].end;
			if (loc_base >= base && loc_end <= end) {
-   ranges[rlen++] = loc_base;
-   ranges[rlen++] = loc_end - loc_base;
-   rngs_cnt++;
+   add = 1;
			} else if (base < loc_end && end > loc_base) {
				if (loc_base < base)
					loc_base = base;
				if (loc_end > end)
loc_end = end;
+   add = 1;
+   }
+
+   if (add) {
+			if (rlen >= (ranges_size-2)) {
+   ranges_size += MEM_RANGE_CHUNK_SZ;
+   ranges = realloc(ranges, ranges_size*8);
+   if (!ranges)
+					die("unrecoverable error: can't"
+					    " realloc %d bytes for"
+					    " ranges.\n",
+					    ranges_size*8);
+   }
ranges[rlen++] = loc_base;
ranges[rlen++] = loc_end - loc_base;
rngs_cnt++;
}
}
-   /* Store the count of (base, size) duple */
-   ranges[tmp_indx] = rngs_cnt;
+   if (rngs_cnt == 0) {
+   /* We still need to add a counter for every LMB because
+* the kernel parsing code is dumb.  We just have
+* a zero in this case, with no following base/len.
+*/
+   ranges[tmp_indx] = 0;
+   /* rlen is already just tmp_indx+1 as we didn't write
+* anything.  Check array size here, as we'll probably
+* go on for a while writing zeros now.
+*/
+			if (rlen >= (ranges_size-1)) {
+   ranges_size += MEM_RANGE_CHUNK_SZ;
+   ranges = realloc(ranges, ranges_size*8);
+   if (!ranges)
+   die(unrecoverable error: can't
+realloc %d bytes for
+ranges.\n

[PATCH] kexec-tools, ppc64: Fix segfault parsing DR memory property

2010-05-11 Thread Matt Evans

add_dyn_reconf_usable_mem_property() iterates over memory spans
in /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory and intersects
these with usablemem_rgns ranges.  Not only did it seem to write
null properties for every range that didn't match, but it used an unchecked
fixed-size array which will overrun on machines with lots of LMBs.

This patch stops add_dyn_reconf_usable_mem_property() from adding null ranges
to the linux,drconf-usable-memory property and removes its fixed-size array,
as well as the array in add_usable_mem_property, in lieu of malloc/realloc/free.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 kexec/arch/ppc64/fs2dt.c |   66 +
 1 files changed, 54 insertions(+), 12 deletions(-)

diff --git a/kexec/arch/ppc64/fs2dt.c b/kexec/arch/ppc64/fs2dt.c
index 762bf04..7470132 100644
--- a/kexec/arch/ppc64/fs2dt.c
+++ b/kexec/arch/ppc64/fs2dt.c
@@ -37,7 +37,7 @@
 #define NAMESPACE 16384/* max bytes for property names */
 #define INIT_TREE_WORDS 65536  /* Initial num words for prop values */
 #define MEMRESERVE 256 /* max number of reserved memory blocks */
-#define MAX_MEMORY_RANGES 1024
+#define MEM_RANGE_CHUNK_SZ 2048 /* Initial num dwords for mem ranges */
 
 static char pathname[MAXPATH], *pathstart;
 static char propnames[NAMESPACE] = { 0 };
@@ -148,7 +148,8 @@ static void add_dyn_reconf_usable_mem_property(int fd)
 {
char fname[MAXPATH], *bname;
uint64_t buf[32];
-   uint64_t ranges[2*MAX_MEMORY_RANGES];
+   uint64_t *ranges;
+   int ranges_size = MEM_RANGE_CHUNK_SZ;
uint64_t base, end, loc_base, loc_end;
size_t i, rngs_cnt, range;
int rlen = 0;
@@ -165,6 +166,11 @@ static void add_dyn_reconf_usable_mem_property(int fd)
die(unrecoverable error: error seeking in \%s\: %s\n,
pathname, strerror(errno));
 
+   ranges = malloc(ranges_size*8);
+   if (!ranges)
+   die(unrecoverable error: can't alloc %d bytes for ranges.\n,
+   ranges_size*8);
+
rlen = 0;
for (i = 0; i  num_of_lmbs; i++) {
if (read(fd, buf, 24)  0)
@@ -180,24 +186,41 @@ static void add_dyn_reconf_usable_mem_property(int fd)
 
rngs_cnt = 0;
for (range = 0; range  usablemem_rgns.size; range++) {
+   int add = 0;
loc_base = usablemem_rgns.ranges[range].start;
loc_end = usablemem_rgns.ranges[range].end;
if (loc_base = base  loc_end = end) {
-   ranges[rlen++] = loc_base;
-   ranges[rlen++] = loc_end - loc_base;
-   rngs_cnt++;
+   add = 1;
} else if (base  loc_end  end  loc_base) {
if (loc_base  base)
loc_base = base;
if (loc_end  end)
loc_end = end;
+   add = 1;
+   }
+
+   if (add) {
+   if (rlen = (ranges_size-2)) {
+   ranges_size += MEM_RANGE_CHUNK_SZ;
+   ranges = realloc(ranges, ranges_size*8);
+   if (!ranges)
+   die(unrecoverable error: can't
+realloc %d bytes for
+ranges.\n,
+   ranges_size*8);
+   }
ranges[rlen++] = loc_base;
ranges[rlen++] = loc_end - loc_base;
rngs_cnt++;
}
}
-   /* Store the count of (base, size) duple */
-   ranges[tmp_indx] = rngs_cnt;
+   if (rngs_cnt == 0) {
+   /* Don't store anything for unwritten iterations! */
+   rlen = tmp_indx;
+   } else {
+   /* Store the count of (base, size) duple */
+   ranges[tmp_indx] = rngs_cnt;
+   }
}

rlen = rlen * sizeof(uint64_t);
@@ -210,7 +233,8 @@ static void add_dyn_reconf_usable_mem_property(int fd)
*dt++ = propnum(linux,drconf-usable-memory);
if ((rlen = 8)  ((unsigned long)dt  0x4))
dt++;
-   memcpy(dt, ranges, rlen);
+   memcpy(dt, ranges, rlen);
+   free(ranges);
dt += (rlen + 3)/4;
 }
 
@@ -218,7 +242,8 @@ static void add_usable_mem_property(int fd, size_t len)
 {
char fname[MAXPATH], *bname;
uint64_t buf[2];
-   uint64_t ranges[2