Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
In message [EMAIL PROTECTED] you wrote:
> > > > > > +	} fpvsr __attribute__((aligned(16)));
> > > > >
> > > > > Do we really need a union here?  What would happen if you just
> > > > > changed the type of fpr[32] from double to vector if #CONFIG_VSX?
> > > > > I really don't like the union and think we can just make the
> > > > > storage look opaque, which is the key.  I doubt we ever really
> > > > > care about using fpr[] as a double in the kernel.
> > > >
> > > > I did something similar to this for the first cut of this patch,
> > > > but it made the code accessing this structure much less readable.
> > >
> > > Really, what code is that?
> >
> > Any code that has to read/write the top or bottom 64 bits _only_ of
> > the 128 bit vector.
> >
> > The signals code is a good example where, for backwards
> > compatibility, we need to read/write the old 64 bit FP regs from the
> > 128 bit value in the struct.  Similarly, the way we've extended the
> > signals interface for VSX, you need to read/write out the bottom 64
> > bits (vsrlow) of a 128 bit value.  eg. the simple:
> >     current->thread.fpvsr.fp[i].vsrlow = buf[i]
> > would turn into some abomination/macro.
>
> It would turn into something like:
>     current->thread.fpr[i][2] = buf[i];
>     current->thread.fpr[i][3] = buf[i+1];

Maybe abomination was going too far :-)

I still think using the union makes it easier to read than what you
have here.  Also, it better reflects the structure of what's being
stored there.

Mikey

If you look at your code you'll see there are only a few places you
access the union as fpvsr.vsr[], and those places could easily be
fpr[], since they are already #CONFIG_VSX protected.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
On Jun 19, 2008, at 1:01 AM, Michael Neuling wrote:
> In message B0E87874-BC65-4037-[EMAIL PROTECTED] you wrote:
> > > > > > > +	} fpvsr __attribute__((aligned(16)));
> > > > > >
> > > > > > Do we really need a union here?  What would happen if you just
> > > > > > changed the type of fpr[32] from double to vector if
> > > > > > #CONFIG_VSX?  I really don't like the union and think we can
> > > > > > just make the storage look opaque, which is the key.  I doubt
> > > > > > we ever really care about using fpr[] as a double in the
> > > > > > kernel.
> > > > >
> > > > > I did something similar to this for the first cut of this patch,
> > > > > but it made the code accessing this structure much less
> > > > > readable.
> > > >
> > > > Really, what code is that?
> > >
> > > Any code that has to read/write the top or bottom 64 bits _only_ of
> > > the 128 bit vector.
> > >
> > > The signals code is a good example where, for backwards
> > > compatibility, we need to read/write the old 64 bit FP regs from the
> > > 128 bit value in the struct.  Similarly, the way we've extended the
> > > signals interface for VSX, you need to read/write out the bottom 64
> > > bits (vsrlow) of a 128 bit value.  eg. the simple:
> > >     current->thread.fpvsr.fp[i].vsrlow = buf[i]
> > > would turn into some abomination/macro.
> >
> > It would turn into something like:
> >     current->thread.fpr[i][2] = buf[i];
> >     current->thread.fpr[i][3] = buf[i+1];
>
> Maybe abomination was going too far :-)
>
> I still think using the union makes it easier to read than what you
> have here.  Also, it better reflects the structure of what's being
> stored there.

I don't think that holds much weight with me.  We don't union the
vector128 type to show it also supports float, u16, and u8 types.

I stick by the fact that the ONLY place it looks like you access the
union via the .vsr member is for memset or memcpy, so you clearly know
if the size should be sizeof(double) or sizeof(vector).

Also, I can see the case in the future that fprs become 128 bits wide
and allow for native long double support.

- k
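For illustration, the two layouts being argued over can be sketched in plain C (hypothetical names and a simplified layout, not the kernel's actual thread_struct): the union lets signal code name the 64-bit halves of each 128-bit register, while the plain-array alternative turns the same store into an index computation over bare doublewords.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout sketch, NOT the kernel's actual thread_struct:
 * 32 registers, each 128 bits wide.  The union gives two views of the
 * same storage. */
struct vsr_reg {
	unsigned long long hi;	/* the classic 64-bit FPR half */
	unsigned long long lo;	/* the VSX "vsrlow" half */
};

union fpvsr_sketch {
	struct vsr_reg vsr[32];		/* structured view, as in the patch */
	unsigned long long raw[64];	/* opaque view: bare doublewords */
};

/* Returns 1 if writing through the readable union form lands in exactly
 * the doubleword the raw-indexing form (2*i + 1) would use. */
static int low_half_overlays(int i)
{
	union fpvsr_sketch t;

	memset(&t, 0, sizeof(t));
	t.vsr[i].lo = 0x1122334455667788ULL;	/* readable union form */
	return t.raw[2 * i + 1] == 0x1122334455667788ULL;
}
```

Both views cover the same 512 bytes, so the argument is purely about which spelling of the store is more readable, not about the generated code.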
Re: [PATCH 1/6] Move code patching code into arch/powerpc/lib/code-patching.c
On May 29, 2008, at 1:20 AM, Michael Ellerman wrote:
> We currently have a few routines for patching code in asm/system.h,
> because they didn't fit anywhere else.  I'd like to clean them up a
> little and add some more, so first move them into a dedicated C file -
> they don't need to be inlined.
>
> While we're moving the code, drop create_function_call(), its
> intended caller never got merged and will be replaced in future with
> something different.
>
> Signed-off-by: Michael Ellerman [EMAIL PROTECTED]
> ---
>  arch/powerpc/kernel/crash_dump.c          |    1 +
>  arch/powerpc/lib/Makefile                 |    2 +
>  arch/powerpc/lib/code-patching.c          |   33
>  arch/powerpc/platforms/86xx/mpc86xx_smp.c |    1 +
>  arch/powerpc/platforms/powermac/smp.c     |    1 +
>  include/asm-powerpc/code-patching.h       |   25 +++
>  include/asm-powerpc/system.h              |   48 -
>  7 files changed, 63 insertions(+), 48 deletions(-)
>
> diff --git a/arch/powerpc/kernel/crash_dum

What's the state of these patches and getting them into powerpc-next?
I'm looking at some runtime fix ups that I was thinking of basing on
this code.

- k
Re: [PATCH 6/9] powerpc: Add VSX CPU feature
On Wed, 2008-06-18 at 10:47 +1000, Michael Neuling wrote:
>  	{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
>  #endif /* CONFIG_ALTIVEC */
> +#ifdef CONFIG_VSX
> +	{"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
> +#endif /* CONFIG_VSX */

Should that be "ibm,vsx"?

-- dwmw2
Re: [PATCH 1/6] Move code patching code into arch/powerpc/lib/code-patching.c
On Thu, 2008-06-19 at 01:15 -0500, Kumar Gala wrote:
> On May 29, 2008, at 1:20 AM, Michael Ellerman wrote:
> > We currently have a few routines for patching code in asm/system.h,
> > because they didn't fit anywhere else.  I'd like to clean them up a
> > little and add some more, so first move them into a dedicated C
> > file - they don't need to be inlined.
> >
> > While we're moving the code, drop create_function_call(), its
> > intended caller never got merged and will be replaced in future with
> > something different.
> >
> > Signed-off-by: Michael Ellerman [EMAIL PROTECTED]
> > ---
> >  arch/powerpc/kernel/crash_dump.c          |    1 +
> >  arch/powerpc/lib/Makefile                 |    2 +
> >  arch/powerpc/lib/code-patching.c          |   33
> >  arch/powerpc/platforms/86xx/mpc86xx_smp.c |    1 +
> >  arch/powerpc/platforms/powermac/smp.c     |    1 +
> >  include/asm-powerpc/code-patching.h       |   25 +++
> >  include/asm-powerpc/system.h              |   48 -
> >  7 files changed, 63 insertions(+), 48 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/crash_dum
>
> what's the state of these patches and getting them into powerpc-next?

I think what I posted is reasonably solid, I've added some more
routines for the stuff I'm working on.  I'll repost today or tomorrow.

> I'm looking at some runtime fix ups that I was thinking of basing on
> this code.

What have you got in mind?  I'm working on some runtime fixups too :)

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
Re: [PATCH 6/9] powerpc: Add VSX CPU feature
On Wed, 2008-06-18 at 10:47 +1000, Michael Neuling wrote:
> >  	{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
> >  #endif /* CONFIG_ALTIVEC */
> > +#ifdef CONFIG_VSX
> > +	{"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
> > +#endif /* CONFIG_VSX */
>
> Should that be "ibm,vsx"?

Nope, ibm,vmx == 2 is correct for VSX.  You're not the first to think
it looks wrong, so I should add a comment.

Mikey
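To spell the convention out, a hedged sketch of how such a property would appear in the device tree source (illustrative node only; the unit name and any surrounding properties are made up):

```
cpu@0 {
	/* ibm,vmx = <1> means Altivec/VMX only;
	 * ibm,vmx = <2> means Altivec/VMX plus VSX */
	ibm,vmx = <2>;
};
```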
[RFC 3/3] powerpc: copy_4K_page tweaked for Cell
/*
 * Copyright (C) 2008 Gunnar von Boehn, IBM Corp.
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version
 * 2 of the License, or (at your option) any later version.
 *
 * copy_4K_page routine optimized for CELL-BE-PPC
 *
 * The CELL PPC core has 1 integer unit and 1 load/store unit
 * CELL: 1st level data cache = 32K
 *     - 2nd level data cache = 512K
 *     - 3rd level data cache = 0K
 * To improve copy performance we need to prefetch source data
 * far ahead to hide this latency
 * For best performance instruction forms ending in "." like "andi."
 * should be avoided as they are implemented in microcode on CELL.
 *
 * The below code is loop unrolled for the CELL cache line of 128 bytes.
 */

#include <asm/processor.h>
#include <asm/ppc_asm.h>

#define PREFETCH_AHEAD 6
#define ZERO_AHEAD 4

	.align	7
_GLOBAL(copy_4K_page)
	dcbt	0,r4		/* Prefetch ONE SRC cacheline */

	addi	r6,r3,-8	/* prepare for stdu */
	addi	r4,r4,-8	/* prepare for ldu */

	li	r10,32		/* copy 32 cache lines for a 4K page */
	li	r12,128+8	/* prefetch distance */

	subi	r11,r10,PREFETCH_AHEAD
	li	r10,PREFETCH_AHEAD

	mtctr	r10
.LprefetchSRC:
	dcbt	r12,r4
	addi	r12,r12,128
	bdnz	.LprefetchSRC

.Louterloop:			/* copy while cache lines */
	mtctr	r11
	li	r11,128*ZERO_AHEAD+8	/* DCBZ dist */

	.align	4
	/* Copy whole cachelines, optimized by prefetching SRC cacheline */
.Lloop:				/* Copy aligned body */
	dcbt	r12,r4		/* PREFETCH SOURCE some cache lines ahead */
	ld	r9,0x08(r4)
	dcbz	r11,r6
	ld	r7,0x10(r4)	/* 4 register stride copy */
	ld	r8,0x18(r4)	/* 4 are optimal to hide 1st level cache latency */
	ld	r0,0x20(r4)
	std	r9,0x08(r6)
	std	r7,0x10(r6)
	std	r8,0x18(r6)
	std	r0,0x20(r6)
	ld	r9,0x28(r4)
	ld	r7,0x30(r4)
	ld	r8,0x38(r4)
	ld	r0,0x40(r4)
	std	r9,0x28(r6)
	std	r7,0x30(r6)
	std	r8,0x38(r6)
	std	r0,0x40(r6)
	ld	r9,0x48(r4)
	ld	r7,0x50(r4)
	ld	r8,0x58(r4)
	ld	r0,0x60(r4)
	std	r9,0x48(r6)
	std	r7,0x50(r6)
	std	r8,0x58(r6)
	std	r0,0x60(r6)
	ld	r9,0x68(r4)
	ld	r7,0x70(r4)
	ld	r8,0x78(r4)
	ldu	r0,0x80(r4)
	std	r9,0x68(r6)
	std	r7,0x70(r6)
	std	r8,0x78(r6)
	stdu	r0,0x80(r6)
	bdnz	.Lloop

	sldi	r10,r10,2	/* adjust from 128 to 32 byte stride */
	mtctr	r10
.Lloop2:			/* Copy aligned body */
	ld	r9,0x08(r4)
	ld	r7,0x10(r4)
	ld	r8,0x18(r4)
	ldu	r0,0x20(r4)
	std	r9,0x08(r6)
	std	r7,0x10(r6)
	std	r8,0x18(r6)
	stdu	r0,0x20(r6)
	bdnz	.Lloop2
.Lendloop2:
	blr
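For readers following the asm, the software pipeline above boils down to something like this C sketch (prefetch() stands in for dcbt and is a no-op here; the memcpy stands in for the dcbz plus the unrolled 16x ld/std body): prime PREFETCH_AHEAD source lines, then issue one prefetch per copied line so the loads of iteration i hit a line requested PREFETCH_AHEAD iterations earlier.

```c
#include <assert.h>
#include <string.h>

#define LINE		128	/* CELL cache line size */
#define PREFETCH_AHEAD	6	/* matches the asm */

/* Stand-in for the dcbt instruction.  dcbt is only a hint, so a no-op
 * is a valid (if useless) implementation in portable C. */
static void prefetch(const void *p)
{
	(void)p;
}

/* Copy one 4K page, line by line, keeping the prefetch stream
 * PREFETCH_AHEAD lines in front of the copy. */
static void copy_4k_sketch(unsigned char *dst, const unsigned char *src)
{
	int i, lines = 4096 / LINE;

	for (i = 0; i < PREFETCH_AHEAD; i++)	/* prime the pipeline */
		prefetch(src + i * LINE);

	for (i = 0; i < lines; i++) {
		prefetch(src + (i + PREFETCH_AHEAD) * LINE); /* stay ahead */
		/* in the asm: dcbz the destination line, then an
		 * unrolled 4-register-stride ld/std sequence */
		memcpy(dst + i * LINE, src + i * LINE, LINE);
	}
}
```

The asm also stops prefetching for the final PREFETCH_AHEAD lines (the .Lloop2 tail) so it never touches memory past the source page; the sketch glosses over that detail since prefetch() is a no-op.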
[RFC 2/3] powerpc: memcpy tweaked for Cell
/*
 * Copyright (C) 2008 Gunnar von Boehn, IBM Corp.
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version
 * 2 of the License, or (at your option) any later version.
 *
 * memcpy (and copy_4K_page) routine optimized for CELL-BE-PPC
 *
 * The CELL PPC core has 1 integer unit and 1 load/store unit
 * CELL: 1st level data cache = 32K
 *     - 2nd level data cache = 512K
 *     - 3rd level data cache = 0K
 * To improve copy performance we need to prefetch source data
 * far ahead to hide this latency
 * For best performance instruction forms ending in "." like "andi."
 * should be avoided as they are implemented in microcode on CELL.
 *
 * The below code is loop unrolled for the CELL cache line of 128 bytes.
 */

#include <asm/processor.h>
#include <asm/ppc_asm.h>

#define PREFETCH_AHEAD 6
#define ZERO_AHEAD 4

	.align	7
_GLOBAL(memcpy)
	dcbt	0,r4		/* Prefetch ONE SRC cacheline */
	cmpldi	cr1,r5,16	/* is size < 16? */
	mr	r6,r3
	blt+	cr1,.Lshortcopy

.Lbigcopy:
	neg	r8,r3		/* LS 3 bits = # bytes to 8-byte dest bdry */
	clrldi	r8,r8,64-4	/* align to 16 byte boundary */
	sub	r7,r4,r3
	cmpldi	cr0,r8,0
	beq+	.Ldst_aligned

.Ldst_unaligned:
	mtcrf	0x01,r8		/* put #bytes to boundary into cr7 */
	subf	r5,r8,r5

	bf	cr7*4+3,1f
	lbzx	r0,r7,r6	/* copy 1 byte */
	stb	r0,0(r6)
	addi	r6,r6,1
1:	bf	cr7*4+2,2f
	lhzx	r0,r7,r6	/* copy 2 bytes */
	sth	r0,0(r6)
	addi	r6,r6,2
2:	bf	cr7*4+1,4f
	lwzx	r0,r7,r6	/* copy 4 bytes */
	stw	r0,0(r6)
	addi	r6,r6,4
4:	bf	cr7*4+0,8f
	ldx	r0,r7,r6	/* copy 8 bytes */
	std	r0,0(r6)
	addi	r6,r6,8
8:
	add	r4,r7,r6

.Ldst_aligned:
	cmpdi	cr5,r5,128-1
	neg	r7,r6
	addi	r6,r6,-8	/* prepare for stdu */
	addi	r4,r4,-8	/* prepare for ldu */
	clrldi	r7,r7,64-7	/* align to cacheline boundary */
	ble+	cr5,.Llessthancacheline
	cmpldi	cr6,r7,0
	subf	r5,r7,r5
	srdi	r7,r7,4		/* divide size by 16 */
	srdi	r10,r5,7	/* number of cache lines to copy */

	cmpldi	r10,0
	li	r11,0		/* number cachelines to copy with prefetch */
	beq	.Lnocacheprefetch

	cmpldi	r10,PREFETCH_AHEAD
	li	r12,128+8	/* prefetch distance */
	ble	.Llessthanmaxprefetch

	subi	r11,r10,PREFETCH_AHEAD
	li	r10,PREFETCH_AHEAD
.Llessthanmaxprefetch:
	mtctr	r10
.LprefetchSRC:
	dcbt	r12,r4
	addi	r12,r12,128
	bdnz	.LprefetchSRC

.Lnocacheprefetch:
	mtctr	r7
	cmpldi	cr1,r5,128
	clrldi	r5,r5,64-7
	beq	cr6,.Lcachelinealigned

.Laligntocacheline:
	ld	r9,0x08(r4)
	ldu	r7,0x10(r4)
	std	r9,0x08(r6)
	stdu	r7,0x10(r6)
	bdnz	.Laligntocacheline

.Lcachelinealigned:		/* copy while cache lines */
	blt-	cr1,.Llessthancacheline	/* size < 128 */

.Louterloop:
	cmpdi	r11,0
	mtctr	r11
	beq-	.Lendloop
	li	r11,128*ZERO_AHEAD+8	/* DCBZ dist */

	.align	4
	/* Copy whole cachelines, optimized by prefetching SRC cacheline */
.Lloop:				/* Copy aligned body */
	dcbt	r12,r4		/* PREFETCH SOURCE some cache lines ahead */
	ld	r9,0x08(r4)
	dcbz	r11,r6
	ld	r7,0x10(r4)	/* 4 register stride copy */
	ld	r8,0x18(r4)	/* 4 are optimal to hide 1st level cache latency */
	ld	r0,0x20(r4)
	std	r9,0x08(r6)
	std	r7,0x10(r6)
	std	r8,0x18(r6)
	std	r0,0x20(r6)
	ld	r9,0x28(r4)
	ld	r7,0x30(r4)
	ld	r8,0x38(r4)
	ld	r0,0x40(r4)
	std	r9,0x28(r6)
	std	r7,0x30(r6)
	std	r8,0x38(r6)
	std	r0,0x40(r6)
	ld	r9,0x48(r4)
	ld	r7,0x50(r4)
	ld	r8,0x58(r4)
	ld	r0,0x60(r4)
	std	r9,0x48(r6)
	std	r7,0x50(r6)
	std	r8,0x58(r6)
	std	r0,0x60(r6)
	ld	r9,0x68(r4)
	ld	r7,0x70(r4)
	ld	r8,0x78(r4)
	ldu	r0,0x80(r4)
	std	r9,0x68(r6)
	std	r7,0x70(r6)
	std	r8,0x78(r6)
	stdu	r0,0x80(r6)
	bdnz	.Lloop

.Lendloop:
	cmpdi	r10,0
	sldi	r10,r10,2	/* adjust from 128 to 32 byte stride */
Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
On Thu, 2008-06-19 at 01:10 -0500, Kumar Gala wrote:
> > I still think using the union makes it easier to read than what you
> > have here.  Also, it better reflects the structure of what's being
> > stored there.
>
> I don't think that holds much weight with me.  We don't union the
> vector128 type to show it also supports float, u16, and u8 types.

But this is different.  The same registers are either basic FP regs or
full VSX regs.  I don't see what's wrong with a union, it's a nice way
to express things.

> I stick by the fact that the ONLY place it looks like you access the
> union via the .vsr member is for memset or memcpy so you clearly know
> if the size should be sizeof(double) or sizeof(vector).  Also, I can
> see the case in the future that 'fpr's become

What's wrong with the union?  There's nothing ugly about them..

Cheers,
Ben.
Re: [RFC/PATCH 0/3] sched: allow arch override of cpu power
* Nathan Lynch [EMAIL PROTECTED] wrote:

> There is an interesting quality of POWER6 cores, which each have 2
> hardware threads: assuming one thread on the core is idle, the
> primary thread is a little faster than the secondary thread.  To
> illustrate:
>
>     for cpumask in 0x1 0x2 ; do
>         taskset $cpumask /usr/bin/time -f "%e elapsed, %U user, %S sys" \
>             /bin/sh -c "i=100 ; while (( i-- )) ; do : ; done"
>     done
>
>     17.05 elapsed, 16.83 user, 0.22 sys
>     17.54 elapsed, 17.32 user, 0.22 sys
>
> (The first result is for a primary thread; the second result for a
> secondary thread.)
>
> So it would be nice to have the scheduler slightly prefer primary
> threads on POWER6 machines.  These patches, which allow the
> architecture to override the scheduler's CPU power calculation, are
> one possible approach, but I'm open to others.
>
> Please note: these seemed to have the desired effect on 2.6.25-rc
> kernels (2-3% improvement in a kernbench-like make -j nr_cores), but
> I'm not seeing this improvement with 2.6.26-rc kernels for some
> reason I am still trying to track down.

ok, i guess that discrepancy has to be tracked down before we can
think about these patches - but the principle is OK.

One problem is that the whole cpu-power balancing code in sched.c is a
bit ... unclear and under-documented.  So any change to this area
should begin by documenting the basics: what the units mean exactly,
how they are used in balancing, and what the desired effect is.

I'd not be surprised if there were a few buglets in this area, SMT is
not at the forefront of testing at the moment.  There's nothing
spectacularly broken in it (i have a HT machine myself), but the
concepts have bitrotted a bit.  Patches - even if they just add
comments - are welcome :-)

	Ingo
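A purely hypothetical sketch of the shape such an override could take (the actual patches hook into the scheduler's group power setup, and this function name and the even/odd thread rule are made up for illustration): the arch scales the default power so a primary SMT thread looks slightly stronger than its secondary sibling, nudging the balancer toward placing lone tasks on primaries.

```c
#include <assert.h>

/* The scheduler's default per-CPU "power" unit. */
#define SCHED_LOAD_SCALE 1024UL

/* Hypothetical arch hook.  On this imagined machine, even-numbered
 * logical CPUs are primary SMT threads; reflect the ~3% measured
 * difference by handing out a small bonus/penalty around the default. */
static unsigned long arch_cpu_power(int cpu)
{
	return (cpu % 2 == 0) ? SCHED_LOAD_SCALE + 32
			      : SCHED_LOAD_SCALE - 32;
}
```

The interesting design question Ingo raises is exactly what these units feed into, so any real override needs the balancing semantics documented first.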
Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
On Thursday 19 June 2008, Mark Nelson wrote:
> The plan is to use Michael Ellerman's code patching work so that at
> runtime if we're running on a Cell machine the new routines are
> called but otherwise the existing memory copy routines are used.

Have you tried running this code on other platforms to see if it
actually performs worse on any of them?  I would guess that the older
code also doesn't work too well on Power 5 and Power 6, so the Cell
optimized version could give us a significant advantage there as well,
albeit less than another CPU specific version.

	Arnd
Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
Arnd Bergmann writes:

> Have you tried running this code on other platforms to see if it
> actually performs worse on any of them?  I would guess that the older
> code also doesn't work too well on Power 5 and Power 6,

Why would you guess that?

Paul.
Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
Hi Arnd,

I have no results for P5/P6, but I did some tests on JS21 aka PPC-970.
On PPC-970 the CELL memcpy is faster than the current Linux routine.
This becomes really visible when you really copy memory-to-memory and
are not only working in the 2nd level cache.

Kind regards
Gunnar von Boehn

Arnd Bergmann wrote on 19/06/2008 13:53:
> Have you tried running this code on other platforms to see if it
> actually performs worse on any of them?
Re: [PATCH 1/6] Move code patching code into arch/powerpc/lib/code-patching.c
On Jun 19, 2008, at 1:55 AM, Michael Ellerman wrote:
> On Thu, 2008-06-19 at 01:15 -0500, Kumar Gala wrote:
> > what's the state of these patches and getting them into
> > powerpc-next?
>
> I think what I posted is reasonably solid, I've added some more
> routines for the stuff I'm working on.  I'll repost today or
> tomorrow.
>
> > I'm looking at some runtime fix ups that I was thinking of basing
> > on this code.
>
> What have you got in mind?  I'm working on some runtime fixups too :)

I want to be able to runtime fix up lwsync and remove it as a compile
time thing.

- k
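As a rough illustration of the kind of fixup Kumar describes (a hypothetical table-driven sketch in userspace C, not the kernel helpers; real kernel code patches live text and must also flush the data cache and invalidate the icache after each store): walk a table of recorded lwsync sites and rewrite each to a nop on CPUs that don't want the lwsync form.

```c
#include <stddef.h>
#include <stdint.h>

#define PPC_INST_LWSYNC	0x7c2004acU	/* lwsync encoding */
#define PPC_INST_NOP	0x60000000U	/* nop (ori 0,0,0) */

/* Replace every recorded lwsync site in 'text' with a nop; returns the
 * number of sites patched.  'sites' holds word indices into 'text'.
 * In the kernel, each store would be followed by dcbst/icbi/isync. */
static int nop_out_lwsync(uint32_t *text, const size_t *sites, int n)
{
	int i, patched = 0;

	for (i = 0; i < n; i++) {
		if (text[sites[i]] == PPC_INST_LWSYNC) {
			text[sites[i]] = PPC_INST_NOP;
			patched++;
		}
	}
	return patched;
}
```

This is the same shape as the existing feature-fixup sections: the linker collects the site addresses, and boot code walks the table once the CPU features are known.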
Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
On Jun 19, 2008, at 4:33 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2008-06-19 at 01:10 -0500, Kumar Gala wrote:
> > > I still think using the union makes it easier to read than what
> > > you have here.  Also, it better reflects the structure of what's
> > > being stored there.
> >
> > I don't think that holds much weight with me.  We don't union the
> > vector128 type to show it also supports float, u16, and u8 types.
>
> But this is different.  The same registers are either basic FP regs
> or full VSX regs.  I don't see what's wrong with a union, it's a nice
> way to express things.

We also don't do this for SPE (the freescale version).

> > I stick by the fact that the ONLY place it looks like you access
> > the union via the .vsr member is for memset or memcpy so you
> > clearly know if the size should be sizeof(double) or
> > sizeof(vector).  Also, I can see the case in the future that 'fpr's
> > become
>
> What's wrong with the union ? there's nothing ugly about them..

I'll wait for the next version and see how many places .vsr is
actually accessed.

- k
Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
On Thursday 19 June 2008, Paul Mackerras wrote:
> Arnd Bergmann writes:
> > Have you tried running this code on other platforms to see if it
> > actually performs worse on any of them?  I would guess that the
> > older code also doesn't work too well on Power 5 and Power 6,
>
> Why would you guess that?

I remembered that Gunnar had done some tests on other CPUs showing
that an earlier version of the code was better than the kernel memcpy.

Also, I had tried to trace the history of the usercopy function and
found that it predates most of the CPUs in current use, so I assume it
has suffered from bitrot and nobody tried to do better since the
Power3 days.  AFAICT, it hasn't seen any update since your original
Power4 version from 2002.

	Arnd
Re: [RFC/PATCH] powerpc: rework 4xx PTE access and TLB miss
On Wed, 11 Jun 2008 10:50:31 +1000 Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
> This is some preliminary work to improve TLB management on SW loaded
> TLB powerpc platforms.  This introduces support for non-atomic PTE
> operations in pgtable-ppc32.h and removes write back to the PTE from
> the TLB miss handlers.  In addition, the DSI interrupt code no longer
> tries to fixup write permission, this is left to generic code, and
> _PAGE_HWWRITE is gone.
>
> Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED]
> ---
> This is a first step, plan is to do the same for FSL BookE, 405 and
> possibly 8xx too.  From there, I want to rework a bit the execute
> permission handling to avoid multiple faults, add support for
> _PAGE_EXEC (no executable mappings), for prefaulting (especially for
> kmap) and proper SMP support for future SMP capable BookE platforms.

I've looked this over quite a bit and can't find anything wrong with
it.  As soon as I get my boards set back up next week, I will try it
out on a few and see if I can find a good stress test as well.  If you
could add the comments that Kumar suggested and send out an updated
patch, I'm inclined to get this into 2.6.27, but we should do that
soon if that is our target.

josh
Re: [OOPS] RT kernel on PowerPC
Hi Chirag,

On Thu, 19 Jun 2008 18:16:34 +0530 Chirag Jog [EMAIL PROTECTED] wrote:
> Hi,
> I was trying out the realtime linux kernel 2.6.25.4-rt3 on a powerpc
> box.  The kernel booted fine.  On running the matrix_mult testcase
> from the real-time testsuite in ltp (ltp/testcases/realtime/func), I
> get the following Oops, after which the machine just freezes.

I do get the same thing on a JS22 blade, and you will find that some
other tests are hanging or oopsing.  For example, the sbrk_mutex
testcase suffers from missed hrtimer wakeups.

I also get loads of:

    BUG: using smp_processor_id() in preemptible [] code

all triggered from the sys_munmap -> ... -> free_pgtables code path.

Currently I'm trying to debug some networking problems where the whole
stack gets stuck under heavy receive load.

As you can see, -rt is far from stable on the Power architecture.
Sorry for not having an answer for you, but I just wanted to show some
of the obstacles lying ahead.

Sebastien.

> I tried setting panic_on_oops but strangely that didn't help.
> Also, attached is the config file.
>
> Oops: Kernel access of bad area, sig: 11 [#1]
> PREEMPT SMP NR_CPUS=64 NUMA pSeries
> Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ehea
>   inet_lro iptable_filter ip_tables xt_tcpudp ip6table_filter
>   ip6_tables x_tables ipv6 dm_mirror dm_multipath dm_mod parport_pc
>   lp parport sg sr_mod ibmvscsic scsi_transport_srp sd_mod scsi_mod
>   ext3 jbd uhci_hcd ohci_hcd ehci_hcd
> NIP: c0050fa0 LR: c0053db4 CTR: c005ab98
> REGS: c0009224fe50 TRAP: 0300  Not tainted (2.6.25.4-rt3)
> MSR: 80001032 <ME,IR,DR>  CR: 2888  XER:
> DAR: c180004fc4b0, DSISR: 4000
> TASK = c00092200ad0[59] 'events/3' THREAD: c0009225 CPU: 3
> GPR00: c04fc480 c000922500d0 c05bca00 c00092200ad0
> GPR04: 0002 0038 000f
> GPR08: 0001 c180004fc480 0f0f0f0f0f0f0f0f c04d9a00
> GPR12: 80009032 c04fca80
> GPR16: c0413910 40c0 c0412108 00284000
> GPR20: c04cb9b0 010cb9b0 001fdaa40b13
> GPR24: 001f c000922501d0
> GPR28: 0001 c00092200ad0 c056a680 c00090b18ad0
> NIP [c0050fa0] .__resched_task+0x38/0xfc
> LR [c0053db4] .try_to_wake_up+0x168/0x200
> Call Trace:
> Instruction dump:
> fbc1fff0 fbe1fff8 ebc2af70 7c7d1b78 f8010010 f821ff71 e81e8008 e97e8000
> e9230008 81290010 79294da4 7d290214 e8090030 7c0b002e 7c74 7800d182
>
> Disassembling __resched_task:
>
> Dump of assembler code for function __resched_task:
> 0xc0050f68 <__resched_task+0>:   mflr    r0
> 0xc0050f6c <__resched_task+4>:   std     r29,-24(r1)
> 0xc0050f70 <__resched_task+8>:   std     r30,-16(r1)
> 0xc0050f74 <__resched_task+12>:  std     r31,-8(r1)
> 0xc0050f78 <__resched_task+16>:  ld      r30,-20624(r2)
> 0xc0050f7c <__resched_task+20>:  mr      r29,r3
> 0xc0050f80 <__resched_task+24>:  std     r0,16(r1)
> 0xc0050f84 <__resched_task+28>:  stdu    r1,-144(r1)
> 0xc0050f88 <__resched_task+32>:  ld      r0,-32760(r30)
> 0xc0050f8c <__resched_task+36>:  ld      r11,-32768(r30)
> 0xc0050f90 <__resched_task+40>:  ld      r9,8(r3)
> 0xc0050f94 <__resched_task+44>:  lwz     r9,16(r9)
> 0xc0050f98 <__resched_task+48>:  rldicr  r9,r9,9,54
> 0xc0050f9c <__resched_task+52>:  add     r9,r9,r0
> 0xc0050fa0 <__resched_task+56>:  ld      r0,48(r9)   <== offending instruction
> 0xc0050fa4 <__resched_task+60>:  lwzx    r0,r11,r0
>
> --
> Cheers,
> Chirag Jog
Re: [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell
On Thursday 19 June 2008, Mark Nelson wrote:
>  * __copy_tofrom_user routine optimized for CELL-BE-PPC

A few things I noticed:

* You don't have a page wise user copy, which the regular code has.
  This is probably not so noticeable in iperf, but should have a
  significant impact on lmbench and on a number of file system tests
  that copy large amounts of data.  Have you checked that the loop
  around cache lines is just as fast?

* You don't align the source to word size, only the target.  Does this
  get handled correctly when the source is a noncacheable mapping,
  e.g. an unaligned copy_from_user where the source points to a
  physical local store mapping of an SPU?  I don't think we need to
  optimize this case for performance, but I'm not sure if it would
  crash.  AFAIR, unaligned loads from noncacheable storage give you an
  alignment exception that you need to handle, right?

* The naming of the labels (with just numbers) is rather confusing; it
  would be good to have something better, but I must admit that I
  don't have a good idea either.

* The trick of using the condition code in cr7 for the last bytes is
  really cute, but are the four branches actually better than a single
  computed branch into the middle of 15 byte wise copies?

	Arnd
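The cr7 trick Arnd mentions can be shown in C (a sketch of the idea, not the kernel routine): the low four bits of the remaining byte count select, one bit each, an 8-, 4-, 2- and 1-byte copy, so any tail of 0..15 bytes is finished in at most four moves with no loop.

```c
#include <stddef.h>
#include <string.h>

/* Copy the final n & 15 bytes, mirroring the asm's mtcrf/bf sequence:
 * each bit of the low nibble gates one fixed-size copy.  In the asm
 * the nibble lives in cr7 and 'bf' tests one bit per copy. */
static void copy_tail(unsigned char *dst, const unsigned char *src,
		      size_t n)
{
	size_t r = n & 15;	/* what the asm keeps in cr7 */

	if (r & 8) { memcpy(dst, src, 8); dst += 8; src += 8; }
	if (r & 4) { memcpy(dst, src, 4); dst += 4; src += 4; }
	if (r & 2) { memcpy(dst, src, 2); dst += 2; src += 2; }
	if (r & 1) { *dst = *src; }
}
```

Arnd's counter-proposal (a computed branch into a run of byte copies) trades these four conditional branches for one indirect branch plus up to 15 byte-sized moves; which wins depends on branch prediction and, on Cell, on avoiding microcoded forms.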
Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
On Jun 19, 2008, at 8:59 AM, Arnd Bergmann wrote:
> I assume it has suffered from bitrot and nobody tried to do better
> since the Power3 days.  AFAICT, it hasn't seen any update since your
> original Power4 version from 2002.

I've got an out-of-tree optimized version for pa6t as well that I
haven't bothered posting yet.

The real pain with the usercopy code is all the exception cases.  If
anyone has made a test harness to make sure they're all right, please
do post it for others to use as well...

-Olof
Re: [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell
Hi Arnd,

> You don't have a page wise user copy, which the regular code has.

The new code does not need two versions IMHO.
The regular code was much slower for the normal case and has a special
version for the 4K optimized case.  The new code is equally good in
both cases, so adding an extra 4K routine will increase the code size
for very minor gain.  I'm not sure if it's worth it.

Benchmark results on QS22 for well aligned copies:

    Old code:                  1300 MB/sec
    Old code, 4K special case: 2600 MB/sec
    New code:                  4000 MB/sec (always)

> You don't align the source to word size, only the target.  Does this
> get handled correctly when the source is a noncacheable mapping, e.g.

The problem is that on CELL the required shift instructions for SRC
alignment are microcoded, in other words really slow.

You are right, the main copy2user requires that the SRC is cacheable.
IMHO, because of the exception on load, the routine should fall back
to the byte copy loop.  Arnd, could you verify that it works on
localstore?

Cheers
Gunnar
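The destination-alignment idiom this code uses (the asm's neg + clrldi pair) has a compact C equivalent (a sketch; bytes_to_16 is a made-up name): negating the address modulo 2^64 and keeping the low four bits yields exactly the byte count to the next 16-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* How many bytes must be copied before 'addr' reaches a 16-byte
 * boundary.  Mirrors the asm:
 *     neg    r8,r3          (two's-complement negate)
 *     clrldi r8,r8,64-4     (keep the low 4 bits)
 * Unsigned negation wraps, so -addr mod 2^64 ends in the distance
 * to the boundary. */
static size_t bytes_to_16(uintptr_t addr)
{
	return (size_t)(-addr & 15);
}
```

The same result feeds the cr7 byte/halfword/word/doubleword prologue, after which the destination is aligned and the cacheline loop can use dcbz safely.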
Re: [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell
--- On Thu, 6/19/08, Gunnar von Boehn [EMAIL PROTECTED] wrote: You are right the main copy2user requires that the SRC is cacheable. IMHO because of the exception on load, the routine should fallback to the byte copy loop. Arnd, could you verify that it works on localstore? Since the main loops use 'dcbz', the destination must also be cacheable. IIRC, if the destination is write-through or cache-inhibited, the 'dcbz' will cause an alignment exception. I suppose it would still function correctly via the handler, but horribly slowly. --Sanjay ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
DTC 1.2.0-rc1 Tagged
Folks, I've pushed out a freshly tagged DTC 1.2.0-rc1 to jdl.com. Please feel free to test it! Thanks, jdl David Gibson (34): libfdt: Add and use a node iteration helper function. libfdt: Fix NOP handling bug in fdt_add_subnode_namelen() dtc: Fold comment handling test into testsuite libfdt: More tests of NOP handling behaviour libfdt: Trivial cleanup for CHECK_HEADER) libfdt: Remove no longer used code from fdt_node_offset_by_compatible() dtc: Fix error reporting in push_input_file() dtc: Implement checks for the format of node and property names dtc: Fix indentation of fixup_phandle_references dtc: Strip redundant name properties dtc: Test and fix conversion to/from old dtb versions dtc: Use for_each_marker_of_type in asm_emit_data() dtc: Make -I dtb mode use fill_fullpaths() dtc: Make eval_literal() static dtc: Assorted improvements to test harness dtc: Testcases for input handling dtc: Make dtc_open_file() die() if unable to open requested file dtc: Remove ugly include stack abuse dtc: Abolish asize field of struct data dtc: Add some documentation for the dts formta dtc: Cleanup \nnn and \xNN string escape handling dtc: Change exit code for usage message dtc: Simplify error handling for unparseable input dtc: Clean up included Makefile fragments dtc: Trivial formatting fixes dtc: Make dt_from_blob() open its own input file, like the other input formats dtc: Rework handling of boot_cpuid_phys dtc: Add program to convert dts files from v0 to v1 dtc: Remove reference to dead Makefile variables libfdt: Several cleanups to parameter checking dtc: Remove some small bashisms from test scripts dtc: Fix some printf() format warnings when compiling 64-bit dtc: Add a testcase for 'reg' or 'ranges' in / dtc: Add support for binary includes. Jon Loeliger (1): Tag Version 1.2.0-rc1 ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [RFC/PATCH 0/3] sched: allow arch override of cpu power
Ingo Molnar wrote: * Nathan Lynch [EMAIL PROTECTED] wrote: So it would be nice to have the scheduler slightly prefer primary threads on POWER6 machines. These patches, which allow the architecture to override the scheduler's CPU power calculation, are one possible approach, but I'm open to others. Please note: these seemed to have the desired effect on 2.6.25-rc kernels (2-3% improvement in a kernbench-like make -j nr_cores), but I'm not seeing this improvement with 2.6.26-rc kernels for some reason I am still trying to track down. ok, i guess that discrepancy has to be tracked down before we can think about these patches - but the principle is OK. Great. I'll keep trying to figure out what's going on there. One problem is that the whole cpu-power balancing code in sched.c is a bit ... unclear and under-documented. So any change to this area should begin at documenting the basics: what do the units mean exactly, how are they used in balancing and what is the desired effect. I'd not be surprised if there were a few buglets in this area, SMT is not at the forefront of testing at the moment. There's nothing spectacularly broken in it (i have a HT machine myself), but the concepts have bitrotten a bit. Patches - even if they just add comments - are welcome :-) Okay, I'll have a look. Thanks Ingo. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
cell: add support for power button of future IBM cell blades
This patch adds support for the power button on future IBM cell blades. It doesn't actually shut down the machine. Instead it exposes an input device /dev/input/event0 to userspace which sends KEY_POWER if the power button has been pressed. haldaemon actually recognizes the button, so a platform-independent acpid replacement should handle it correctly. Signed-off-by: Christian Krafft [EMAIL PROTECTED] Index: linux.git/arch/powerpc/platforms/cell/pervasive.c === --- linux.git.orig/arch/powerpc/platforms/cell/pervasive.c +++ linux.git/arch/powerpc/platforms/cell/pervasive.c @@ -24,8 +24,10 @@ #undef DEBUG #include <linux/interrupt.h> +#include <linux/input.h> #include <linux/irq.h> #include <linux/percpu.h> +#include <linux/platform_device.h> #include <linux/types.h> #include <linux/kallsyms.h> @@ -40,6 +42,9 @@ static int sysreset_hack; +static struct input_dev *button_dev; +static struct platform_device *button_pdev; + static void cbe_power_save(void) { unsigned long ctrl, thread_switch_control; @@ -105,10 +110,21 @@ static int cbe_system_reset_exception(st */ if (sysreset_hack && (cpu = smp_processor_id()) == 0) { pmd = cbe_get_cpu_pmd_regs(cpu); - if (in_be64(&pmd->ras_esc_0) & 0x) { + if (in_be64(&pmd->ras_esc_0) & 0x) { out_be64(&pmd->ras_esc_0, 0); return 0; } + if (in_be64(&pmd->ras_esc_0) & 0x0001) { + out_be64(&pmd->ras_esc_0, 0); + if (!button_dev) + return 0; + + input_report_key(button_dev, KEY_POWER, 1); + input_sync(button_dev); + input_report_key(button_dev, KEY_POWER, 0); + input_sync(button_dev); + return 1; + } } break; #ifdef CONFIG_CBE_RAS @@ -155,3 +171,55 @@ void __init cbe_pervasive_init(void) ppc_md.power_save = cbe_power_save; ppc_md.system_reset_exception = cbe_system_reset_exception; } + +static int __init cbe_power_button_init(void) +{ + int ret; + struct input_dev *dev; + + if (!sysreset_hack) + return 0; + + dev = input_allocate_device(); + if (!dev) { + ret = -ENOMEM; + printk(KERN_ERR "%s: Not enough memory\n", __func__); + goto out; + } + + set_bit(EV_KEY, dev->evbit); + 
set_bit(KEY_POWER, dev->keybit); + + dev->name = "Power Button"; + dev->id.bustype = BUS_HOST; + + /* this makes the button look like an acpi power button +* no clue whether anyone relies on that though */ + dev->id.product = 0x02; + dev->phys = "LNXPWRBN/button/input0"; + + button_pdev = platform_device_register_simple("power_button", 0, NULL, 0); + if (IS_ERR(button_pdev)) { + ret = PTR_ERR(button_pdev); + goto out_free_input; + } + + dev->dev.parent = &button_pdev->dev; + + ret = input_register_device(dev); + if (ret) { + printk(KERN_ERR "%s: Failed to register device\n", __func__); + goto out_free_pdev; + } + + button_dev = dev; + return ret; + +out_free_pdev: + platform_device_unregister(button_pdev); +out_free_input: + input_free_device(dev); +out: + return ret; +} +device_initcall(cbe_power_button_init); -- Mit freundlichen Gruessen, kind regards, Christian Krafft IBM Systems Technology Group, Linux Kernel Development IT Specialist Vorsitzender des Aufsichtsrats: Martin Jetter Geschaeftsfuehrung: Herbert Kircher Sitz der Gesellschaft: Boeblingen Registriergericht: Amtsgericht Stuttgart, HRB 243294 ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Please pull 'next' branch of 4xx tree
Hi Paul, Please pull from: master.kernel.org:/pub/scm/linux/kernel/git/jwboyer/powerpc-4xx.git next to get some more changes for 2.6.27. A new board port, a revert, and a few fixes. I'll have a few more after this as well, most notably Ben's rework patch. josh Giuseppe Coviello (2): powerpc/4xx: Sam440ep support powerpc/4xx: Convert Sam440ep DTS to dts-v1 Imre Kaloz (1): powerpc/4xx: MTD support for the AMCC Taishan Board Josh Boyer (2): Revert [POWERPC] 4xx: Fix 460GT support to not enable FPU powerpc/4xx: Workaround for PPC440EPx/GRx PCI_28 Errata Stefan Roese (1): powerpc/4xx: PCIe driver now detects if a port is disabled via the device tree Valentine Barshak (1): powerpc/4xx: Fix resource issue in warp-nand.c arch/powerpc/boot/Makefile |3 +- arch/powerpc/boot/cuboot-sam440ep.c | 49 ++ arch/powerpc/boot/dts/sam440ep.dts | 293 +++ arch/powerpc/boot/dts/taishan.dts | 29 +- arch/powerpc/configs/44x/sam440ep_defconfig | 1192 +++ arch/powerpc/configs/44x/taishan_defconfig | 79 ++- arch/powerpc/kernel/cpu_setup_44x.S |1 + arch/powerpc/kernel/cputable.c |4 +- arch/powerpc/platforms/44x/Kconfig |9 + arch/powerpc/platforms/44x/Makefile |1 + arch/powerpc/platforms/44x/sam440ep.c | 79 ++ arch/powerpc/platforms/44x/warp-nand.c |3 +- arch/powerpc/sysdev/indirect_pci.c |6 + arch/powerpc/sysdev/ppc4xx_pci.c| 14 + include/asm-powerpc/pci-bridge.h|3 + 15 files changed, 1759 insertions(+), 6 deletions(-) ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] fs_enet: restore promiscuous and multicast settings in restart()
On Wed, 18 Jun 2008 22:45:57 +0400 Matvejchikov Ilya [EMAIL PROTECTED] wrote: I'm glad that you have corrected it. Half a year ago I pointed out that there was such a mistake: http://patchwork.ozlabs.org/linuxppc/patch?id=10700 You've used the -embedded ML, and the patch wasn't noticed... I can add your S-O-B line if that will make you feel better :) -Vitaly Thanks. 2008/6/18 Laurent Pinchart [EMAIL PROTECTED]: The restart() function is called when the link state changes and resets multicast and promiscuous settings. This patch restores those settings at the end of restart(). Signed-off-by: Laurent Pinchart [EMAIL PROTECTED] --- drivers/net/fs_enet/mac-fcc.c |3 +++ 2 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/net/fs_enet/mac-fcc.c b/drivers/net/fs_enet/mac-fcc.c index ce40cf9..1a95cf1 100644 --- a/drivers/net/fs_enet/mac-fcc.c +++ b/drivers/net/fs_enet/mac-fcc.c @@ -464,6 +464,9 @@ static void restart(struct net_device *dev) C32(fccp, fcc_fpsmr, FCC_PSMR_FDE | FCC_PSMR_LPB); S32(fccp, fcc_gfmr, FCC_GFMR_ENR | FCC_GFMR_ENT); + + /* Restore multicast and promiscuous settings */ + set_multicast_list(dev); } static void stop(struct net_device *dev) -- 1.5.0 -- Laurent Pinchart CSE Semaphore Belgium Chaussee de Bruxelles, 732A B-1410 Waterloo Belgium T +32 (2) 387 42 59 F +32 (2) 387 42 75 -- Sincerely, Vitaly ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] fs_enet: restore promiscuous and multicast settings in restart()
Vitaly Bordug wrote: On Wed, 18 Jun 2008 22:45:57 +0400 Matvejchikov Ilya [EMAIL PROTECTED] wrote: I'm glad that you have corrected it. Half a year ago I pointed out that there was such a mistake: http://patchwork.ozlabs.org/linuxppc/patch?id=10700 You've used -embedded ML, and patch wasn't noticed... *sigh* We should merge the -embedded list into -dev and retire the -embedded list finally. jdl ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] fs_enet: restore promiscuous and multicast settings in restart()
On Jun 19, 2008, at 1:47 PM, Jon Loeliger wrote: We should merge the -embedded list into -dev and retire the -embedded list finally. I used to be an opponent to this given the amount of help my board doesn't work questions on -embedded, but the volume isn't that great, and much lower than the -dev list anyway. So yes, I agree. -Olof ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] fs_enet: restore promiscuous and multicast settings in restart()
Yes, please. =) 2008/6/19 Vitaly Bordug [EMAIL PROTECTED]: On Wed, 18 Jun 2008 22:45:57 +0400 Matvejchikov Ilya [EMAIL PROTECTED] wrote: I'm glad that you have corrected it. Half a year ago I pointed out that there was such a mistake: http://patchwork.ozlabs.org/linuxppc/patch?id=10700 You've used the -embedded ML, and the patch wasn't noticed... I can add your S-O-B line if that will make you feel better :) -Vitaly Thanks. 2008/6/18 Laurent Pinchart [EMAIL PROTECTED]: The restart() function is called when the link state changes and resets multicast and promiscuous settings. This patch restores those settings at the end of restart(). Signed-off-by: Laurent Pinchart [EMAIL PROTECTED] --- drivers/net/fs_enet/mac-fcc.c |3 +++ 2 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/net/fs_enet/mac-fcc.c b/drivers/net/fs_enet/mac-fcc.c index ce40cf9..1a95cf1 100644 --- a/drivers/net/fs_enet/mac-fcc.c +++ b/drivers/net/fs_enet/mac-fcc.c @@ -464,6 +464,9 @@ static void restart(struct net_device *dev) C32(fccp, fcc_fpsmr, FCC_PSMR_FDE | FCC_PSMR_LPB); S32(fccp, fcc_gfmr, FCC_GFMR_ENR | FCC_GFMR_ENT); + + /* Restore multicast and promiscuous settings */ + set_multicast_list(dev); } static void stop(struct net_device *dev) -- 1.5.0 -- Laurent Pinchart CSE Semaphore Belgium Chaussee de Bruxelles, 732A B-1410 Waterloo Belgium T +32 (2) 387 42 59 F +32 (2) 387 42 75 -- Sincerely, Vitaly ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [Cbe-oss-dev] [RFC 3/3] powerpc: copy_4K_page tweaked for Cell
On Thursday 19 June 2008, Mark Nelson wrote: .align 7 _GLOBAL(copy_4K_page) dcbt 0,r4 /* Prefetch ONE SRC cacheline */ addi r6,r3,-8 /* prepare for stdu */ addi r4,r4,-8 /* prepare for ldu */ li r10,32 /* copy 32 cache lines for a 4K page */ li r12,128+8 /* prefetch distance */ Since you have a loop here anyway instead of the fully unrolled code, why not provide a copy_64K_page function as well, jumping in here? The inline 64k copy_page function otherwise just adds code size, as well as being a tiny bit slower. It may even be good to have an out-of-line copy_64K_page for the regular code, just calling copy_4K_page repeatedly. Arnd ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH REPOST #2] IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts
During corner case testing, we noticed that some versions of ehca do not properly transition to interrupt done in special load situations. This can be resolved by periodically triggering EOI through H_EOI, if eqes are pending. Signed-off-by: Stefan Roscher [EMAIL PROTECTED] --- As the firmware team suggested, I moved the call of the EOI h_call into the handler function; this ensures that we will call EOI only when we find a valid eqe on the event queue. Additionally I changed the calculation of the xirr value as Roland suggested. paulus / benh -- does this version still get your ack? Seems that the fw team is OK with it according to Stefan... If so I will add this to my tree for 2.6.27. diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c index ce1ab05..0792d93 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.c +++ b/drivers/infiniband/hw/ehca/ehca_irq.c @@ -531,7 +531,7 @@ void ehca_process_eq(struct ehca_shca *shca, int is_irq) { struct ehca_eq *eq = shca->eq; struct ehca_eqe_cache_entry *eqe_cache = eq->eqe_cache; -u64 eqe_value; +u64 eqe_value, ret; unsigned long flags; int eqe_cnt, i; int eq_empty = 0; @@ -583,8 +583,13 @@ void ehca_process_eq(struct ehca_shca *shca, int is_irq) ehca_dbg(shca->ib_device, "No eqe found for irq event"); goto unlock_irq_spinlock; -} else if (!is_irq) +} else if (!is_irq) { +ret = hipz_h_eoi(eq->ist); +if (ret != H_SUCCESS) +ehca_err(shca->ib_device, + "bad return code EOI -rc = %ld\n", ret); ehca_dbg(shca->ib_device, "deadman found %x eqe", eqe_cnt); +} if (unlikely(eqe_cnt == EHCA_EQE_CACHE_SIZE)) ehca_dbg(shca->ib_device, "too many eqes for one irq event"); /* enable irq for new packets */ diff --git a/drivers/infiniband/hw/ehca/hcp_if.c b/drivers/infiniband/hw/ehca/hcp_if.c index 5245e13..415d3a4 100644 --- a/drivers/infiniband/hw/ehca/hcp_if.c +++ b/drivers/infiniband/hw/ehca/hcp_if.c @@ -933,3 +933,13 @@ u64 hipz_h_error_data(const struct ipz_adapter_handle adapter_handle, r_cb, 0, 0, 0, 0); } + +u64 
hipz_h_eoi(int irq) +{ +unsigned long xirr; + +iosync(); +xirr = (0xffULL << 24) | irq; + +return plpar_hcall_norets(H_EOI, xirr); +} diff --git a/drivers/infiniband/hw/ehca/hcp_if.h b/drivers/infiniband/hw/ehca/hcp_if.h index 60ce02b..2c3c6e0 100644 --- a/drivers/infiniband/hw/ehca/hcp_if.h +++ b/drivers/infiniband/hw/ehca/hcp_if.h @@ -260,5 +260,6 @@ u64 hipz_h_error_data(const struct ipz_adapter_handle adapter_handle, const u64 ressource_handle, void *rblock, unsigned long *byte_count); +u64 hipz_h_eoi(int irq); #endif /* __HCP_IF_H__ */ -- 1.5.5 ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
Gunnar von Boehn writes: I have no results for P5/P6, but I did some tests on JS21 aka PPC-970. On PPC-970 the CELL memcpy is faster than the current Linux routine. This becomes really visible when you really copy memory-to-memory and are not only working in the 2nd-level cache. Could you send some more details, like the actual copy speed you measured and how you did the tests? Thanks, Paul. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
On Thu, 19 Jun 2008 09:53:16 pm Arnd Bergmann wrote: On Thursday 19 June 2008, Mark Nelson wrote: The plan is to use Michael Ellerman's code patching work so that at runtime if we're running on a Cell machine the new routines are called but otherwise the existing memory copy routines are used. Have you tried running this code on other platforms to see if it actually performs worse on any of them? I would guess that the older code also doesn't work too well on Power 5 and Power 6, so the cell optimized version could give us a significant advantage as well, albeit less than another CPU specific version. Arnd I did run the tests on Power 5 and Power 6, and on Power 5 with the new routines, the iperf bandwidth increased to 7.9 GBits/sec up from 7.5 GBits/sec; but on Power 6 the bandwidth with the old routines was 13.6 GBits/sec compared to 12.8 GBits/sec... I also couldn't get the updated routines to boot on 970MP without removing the dcbz instructions. I'll investigate more and also rerun the tests again Thanks! Mark ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
On Fri, 20 Jun 2008 12:53:49 am Olof Johansson wrote: On Jun 19, 2008, at 8:59 AM, Arnd Bergmann wrote: I assume it has suffered from bitrot and nobody tried to do better since the Power3 days. AFAICT, it hasn't seen any update since your original Power4 version from 2002. I've got an out-of-tree optimized version for pa6t as well that I haven't bothered posting yet. The real pain with the usercopy code is all the exception cases. If anyone has made a test harness to make sure they're all right, please do post it for others to use as well... I second that request - I verified (to the best that I could) with pen and paper that the exception handling on this new version is correct but it would be great to have a better way to test it. Mark ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [Cbe-oss-dev] [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell
Gunnar von Boehn writes: The regular code was much slower for the normal case and has a special version for the 4K optimized case. That's a slightly inaccurate view... The reason for having the two cases is that when I profiled the distribution of sizes and alignments of memory copies in the kernel, the result was that almost all copies (something like 99%, IIRC) were either 128 bytes or less, or else a whole page at a page-aligned address. Thus we get the best performance by having a simple copy routine with minimal setup overhead for the small copy case, plus an aggressively optimized page copy routine. Spending time setting up for a multi-cacheline copy that's not a whole page is just going to hurt the small copy case without providing any real benefit. Transferring data over loopback is possibly an exception to that. However, it's very rare to transfer large amounts of data over loopback, unless you're running a benchmark like iperf or netperf. :-/ Paul. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell
* The naming of the labels (with just numbers) is rather confusing, it would be good to have something better, but I must admit that I don't have a good idea either. I will admit that at first glance the label naming with numbers does look confusing but when you notice that all the loads start at 20 and all the stores start at 60 and that to get the exception handler for those instructions you just add 100 I think it makes sense, but that could be because I've been looking at it way too long... (I thought I had a comment in there to that effect but it must have gotten lost along the way. I'll add a new comment explaining the above, that should help) * The trick of using the condition code in cr7 for the last bytes is really cute, but are the four branches actually better than a single computed branch into the middle of 15 byte wise copies? The original copy_tofrom_user does this also, which I guess is carried over to this new version... Gunnar did you have an old version that did something similar to this? Mark ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [Cbe-oss-dev] [RFC 3/3] powerpc: copy_4K_page tweaked for Cell
On Fri, 20 Jun 2008 07:28:50 am Arnd Bergmann wrote: On Thursday 19 June 2008, Mark Nelson wrote: .align 7 _GLOBAL(copy_4K_page) dcbt0,r4/* Prefetch ONE SRC cacheline */ addir6,r3,-8/* prepare for stdu */ addir4,r4,-8/* prepare for ldu */ li r10,32 /* copy 32 cache lines for a 4K page */ li r12,128+8 /* prefetch distance*/ Since you have a loop here anyway instead of the fully unrolled code, why not provide a copy_64K_page function as well, jumping in here? That is a good idea. What effect will that have on how the code patching will work? The inline 64k copy_page function otherwise just adds code size, as well as being a tiny bit slower. It may even be good to have an out-of-line copy_64K_page for the regular code, just calling copy_4K_page repeatedly. Doing that sounds like it'll make the code patching easier. Thanks! Mark ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] fs_enet: restore promiscuous and multicast settings in restart()
On Wed, 18 Jun 2008, Laurent Pinchart wrote: The restart() function is called when the link state changes and resets multicast and promiscuous settings. This patch restores those settings at the end of restart(). Signed-off-by: Laurent Pinchart [EMAIL PROTECTED] --- drivers/net/fs_enet/mac-fcc.c |3 +++ 2 files changed, 4 insertions(+), 1 deletions(-) Is the whole patch here? The above says 2 files changed and 5 lines changed, but what's included here is only 1 file and 3 line changes. diff --git a/drivers/net/fs_enet/mac-fcc.c b/drivers/net/fs_enet/mac-fcc.c index ce40cf9..1a95cf1 100644 --- a/drivers/net/fs_enet/mac-fcc.c +++ b/drivers/net/fs_enet/mac-fcc.c @@ -464,6 +464,9 @@ static void restart(struct net_device *dev) C32(fccp, fcc_fpsmr, FCC_PSMR_FDE | FCC_PSMR_LPB); S32(fccp, fcc_gfmr, FCC_GFMR_ENR | FCC_GFMR_ENT); + + /* Restore multicast and promiscuous settings */ + set_multicast_list(dev); } static void stop(struct net_device *dev) -Bill ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
The following set of patches adds Vector Scalar Extensions (VSX) support for POWER7. Includes context switch, ptrace and signals support. Signed-off-by: Michael Neuling [EMAIL PROTECTED] --- Paulus: please consider for your 2.6.27 tree. Updated with comments from Kumar, Milton, Dave Woodhouse and Mark 'NKOTB' Nelson. - Changed thread_struct array definition to be cleaner - Updated CPU_FTRS_POSSIBLE - Updated Kconfig typo and duplicate - Added comment to clarify ibm,vmx = 2 really means VSX. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH 3/9] powerpc: Move altivec_unavailable
Move the altivec_unavailable code, to make room at 0xf40 where the vsx_unavailable exception will be. Signed-off-by: Michael Neuling [EMAIL PROTECTED] --- arch/powerpc/kernel/head_64.S |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S === --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S +++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S @@ -275,7 +275,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE) . = 0xf00 b performance_monitor_pSeries - STD_EXCEPTION_PSERIES(0xf20, altivec_unavailable) + . = 0xf20 + b altivec_unavailable_pSeries #ifdef CONFIG_CBE_RAS HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error) @@ -295,6 +296,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE) /* moved from 0xf00 */ STD_EXCEPTION_PSERIES(., performance_monitor) + STD_EXCEPTION_PSERIES(., altivec_unavailable) /* * An interrupt came in while soft-disabled; clear EE in SRR1, ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
We are going to change where the floating point registers are stored in the thread_struct, so in preparation add some macros to access the floating point registers. Update all code to use these new macros. Signed-off-by: Michael Neuling [EMAIL PROTECTED] --- arch/powerpc/kernel/align.c |6 ++-- arch/powerpc/kernel/asm-offsets.c |2 - arch/powerpc/kernel/process.c |5 ++- arch/powerpc/kernel/ptrace.c | 14 + arch/powerpc/kernel/ptrace32.c|9 -- arch/powerpc/kernel/signal_32.c |6 ++-- arch/powerpc/kernel/signal_64.c | 13 +--- arch/powerpc/kernel/softemu8xx.c |4 +- arch/powerpc/math-emu/math.c | 56 +++--- include/asm-powerpc/ppc_asm.h |5 ++- include/asm-powerpc/processor.h |7 11 files changed, 71 insertions(+), 56 deletions(-) Index: linux-2.6-ozlabs/arch/powerpc/kernel/align.c === --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/align.c +++ linux-2.6-ozlabs/arch/powerpc/kernel/align.c @@ -366,7 +366,7 @@ static int emulate_multiple(struct pt_re static int emulate_fp_pair(struct pt_regs *regs, unsigned char __user *addr, unsigned int reg, unsigned int flags) { - char *ptr = (char *) &current->thread.fpr[reg]; + char *ptr = (char *) &current->thread.TS_FPR(reg); int i, ret; if (!(flags & F)) @@ -784,7 +784,7 @@ int fix_alignment(struct pt_regs *regs) return -EFAULT; } } else if (flags & F) { - data.dd = current->thread.fpr[reg]; + data.dd = current->thread.TS_FPR(reg); if (flags & S) { /* Single-precision FP store requires conversion... 
*/ #ifdef CONFIG_PPC_FPU @@ -862,7 +862,7 @@ int fix_alignment(struct pt_regs *regs) if (unlikely(ret)) return -EFAULT; } else if (flags & F) - current->thread.fpr[reg] = data.dd; + current->thread.TS_FPR(reg) = data.dd; else regs->gpr[reg] = data.ll; Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c === --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c +++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c @@ -66,7 +66,7 @@ int main(void) DEFINE(KSP_LIMIT, offsetof(struct thread_struct, ksp_limit)); DEFINE(PT_REGS, offsetof(struct thread_struct, regs)); DEFINE(THREAD_FPEXC_MODE, offsetof(struct thread_struct, fpexc_mode)); - DEFINE(THREAD_FPR0, offsetof(struct thread_struct, fpr[0])); + DEFINE(THREAD_FPR0, offsetof(struct thread_struct, TS_FPR(0))); DEFINE(THREAD_FPSCR, offsetof(struct thread_struct, fpscr)); #ifdef CONFIG_ALTIVEC DEFINE(THREAD_VR0, offsetof(struct thread_struct, vr[0])); Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c === --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c +++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c @@ -110,7 +110,7 @@ int dump_task_fpu(struct task_struct *ts return 0; flush_fp_to_thread(current); - memcpy(fpregs, &tsk->thread.fpr[0], sizeof(*fpregs)); + memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs)); return 1; } @@ -689,7 +689,8 @@ void start_thread(struct pt_regs *regs, #endif discard_lazy_cpu_state(); - memset(current->thread.fpr, 0, sizeof(current->thread.fpr)); + memset(current->thread.TS_FPRSTART, 0, + sizeof(current->thread.TS_FPRSTART)); current->thread.fpscr.val = 0; #ifdef CONFIG_ALTIVEC memset(current->thread.vr, 0, sizeof(current->thread.vr)); Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c === --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c +++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c @@ -218,10 +218,10 @@ static int fpr_get(struct task_struct *t flush_fp_to_thread(target); BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) != -offsetof(struct thread_struct, fpr[32])); 
+offsetof(struct thread_struct, TS_FPR(32))); return user_regset_copyout(&pos, &count, &kbuf, &ubuf, - &target->thread.fpr, 0, -1); + &target->thread.TS_FPRSTART, 0, -1); } static int fpr_set(struct task_struct *target, const struct user_regset *regset, @@ -231,10 +231,10 @@ static int fpr_set(struct task_struct *t flush_fp_to_thread(target); BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) != -offsetof(struct thread_struct, fpr[32])); +offsetof(struct thread_struct, TS_FPR(32))); return user_regset_copyin(&pos,
[PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
If we set the SPE MSR bit in save_user_regs we can blow away the VEC bit. This will never happen in reality (VMX and SPE will never be in the same processor as their opcodes overlap), but it looks bad. Also when we add VSX here in a later patch, we can hit two of these at the same time. Signed-off-by: Michael Neuling [EMAIL PROTECTED] --- arch/powerpc/kernel/signal_32.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c === --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c +++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c @@ -336,6 +336,8 @@ struct rt_sigframe { static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame, int sigret) { + unsigned long msr = regs->msr; + /* Make sure floating point registers are stored in regs */ flush_fp_to_thread(current); @@ -354,8 +356,7 @@ static int save_user_regs(struct pt_regs return 1; /* set MSR_VEC in the saved MSR value to indicate that frame->mc_vregs contains valid data */ - if (__put_user(regs->msr | MSR_VEC, &frame->mc_gregs[PT_MSR])) - return 1; + msr |= MSR_VEC; } /* else assert((regs->msr & MSR_VEC) == 0) */ @@ -377,8 +378,7 @@ static int save_user_regs(struct pt_regs return 1; /* set MSR_SPE in the saved MSR value to indicate that frame->mc_vregs contains valid data */ - if (__put_user(regs->msr | MSR_SPE, &frame->mc_gregs[PT_MSR])) - return 1; + msr |= MSR_SPE; } /* else assert((regs->msr & MSR_SPE) == 0) */ @@ -387,6 +387,8 @@ static int save_user_regs(struct pt_regs return 1; #endif /* CONFIG_SPE */ + if (__put_user(msr, &frame->mc_gregs[PT_MSR])) + return 1; if (sigret) { /* Set up the sigreturn trampoline: li r0,sigret; sc */ if (__put_user(0x38000000UL + sigret, &frame->tramp[0]) ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable
Make load_up_fpu and load_up_altivec callable so they can be reused by the VSX code. Signed-off-by: Michael Neuling [EMAIL PROTECTED] --- arch/powerpc/kernel/fpu.S|2 +- arch/powerpc/kernel/head_32.S|6 -- arch/powerpc/kernel/head_64.S|8 +--- arch/powerpc/kernel/head_booke.h |6 -- 4 files changed, 14 insertions(+), 8 deletions(-) Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S === --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S +++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S @@ -85,7 +85,7 @@ _GLOBAL(load_up_fpu) #endif /* CONFIG_SMP */ /* restore registers and return */ /* we haven't used ctr or xer or lr */ - b fast_exception_return + blr /* * giveup_fpu(tsk) Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S === --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_32.S +++ linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S @@ -421,8 +421,10 @@ BEGIN_FTR_SECTION b ProgramCheck END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE) EXCEPTION_PROLOG - bne load_up_fpu /* if from user, just load it up */ - addi r3,r1,STACK_FRAME_OVERHEAD + beq 1f + bl load_up_fpu /* if from user, just load it up */ + b fast_exception_return +1: addi r3,r1,STACK_FRAME_OVERHEAD EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception) /* Decrementer */ Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S === --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S +++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S @@ -741,7 +741,8 @@ fp_unavailable_common: ENABLE_INTS bl .kernel_fp_unavailable_exception BUG_OPCODE -1: b .load_up_fpu +1: bl .load_up_fpu + b fast_exception_return .align 7 .globl altivec_unavailable_common altivec_unavailable_common: EXCEPTION_PROLOG_COMMON(0xf20, PACA_EXGEN) #ifdef CONFIG_ALTIVEC BEGIN_FTR_SECTION - bne .load_up_altivec /* if from user, just load it up */ + bnel .load_up_altivec + b fast_exception_return END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) #endif bl .save_nvgprs @@ -829,7 +831,7 @@ _STATIC(load_up_altivec) std r4,0(r3) #endif /* CONFIG_SMP */ 
/* restore registers and return */ - b fast_exception_return + blr #endif /* CONFIG_ALTIVEC */ /* Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h === --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_booke.h +++ linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h @@ -363,8 +363,10 @@ label: #define FP_UNAVAILABLE_EXCEPTION \ START_EXCEPTION(FloatingPointUnavailable) \ NORMAL_EXCEPTION_PROLOG; \ - bne load_up_fpu;/* if from user, just load it up */ \ - addir3,r1,STACK_FRAME_OVERHEAD; \ + beq 1f; \ + bl load_up_fpu;/* if from user, just load it up */ \ + b fast_exception_return;\ +1: addir3,r1,STACK_FRAME_OVERHEAD; \ EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception) #endif /* __HEAD_BOOKE_H__ */ ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
The layout of the new VSR registers and how they overlap on top of the
legacy FPR and VR registers is:

                     VSR doubleword 0          VSR doubleword 1
           -----------------------------------------------------
   VSR[0]  |          FPR[0]            |                       |
   VSR[1]  |          FPR[1]            |                       |
   ...     |          ...               |                       |
   VSR[30] |          FPR[30]           |                       |
   VSR[31] |          FPR[31]           |                       |
           -----------------------------------------------------
   VSR[32] |                    VR[0]                           |
   VSR[33] |                    VR[1]                           |
   ...     |                    ...                             |
   VSR[62] |                    VR[30]                          |
   VSR[63] |                    VR[31]                          |
           -----------------------------------------------------

VSX has 64 128-bit registers.  The first 32 registers overlap with the
FP registers and hence extend them with an additional 64 bits.  The
second 32 registers overlap with the VMX registers.

This patch introduces the thread_struct changes required to reflect
this register layout.  Ptrace and signals code is updated so that the
floating point registers are correctly accessed from the thread_struct
when CONFIG_VSX is enabled.

Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---
 arch/powerpc/kernel/asm-offsets.c |    4 ++
 arch/powerpc/kernel/ptrace.c      |   28 ++
 arch/powerpc/kernel/signal_32.c   |   59 +-
 arch/powerpc/kernel/signal_64.c   |   36 +++
 include/asm-powerpc/processor.h   |   31 +++
 5 files changed, 139 insertions(+), 19 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -74,6 +74,10 @@ int main(void)
 	DEFINE(THREAD_VSCR, offsetof(struct thread_struct, vscr));
 	DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	DEFINE(THREAD_VSR0, offsetof(struct thread_struct, fpvsr[0].vsr));
+	DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_PPC64
 	DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid));
 #else /* CONFIG_PPC64 */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -215,26 +215,54 @@ static int fpr_get(struct task_struct *t
 		   unsigned int pos, unsigned int count,
 		   void *kbuf, void __user *ubuf)
 {
+#ifdef CONFIG_VSX
+	double buf[33];
+	int i;
+#endif
 	flush_fp_to_thread(target);
 
+#ifdef CONFIG_VSX
+	/* copy to local buffer then write that out */
+	for (i = 0; i < 32 ; i++)
+		buf[i] = target->thread.TS_FPR(i);
+	memcpy(&buf[32], &target->thread.fpscr, sizeof(double));
+	return user_regset_copyout(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+
+#else
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
 		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
				   &target->thread.TS_FPRSTART, 0, -1);
+#endif
 }
 
 static int fpr_set(struct task_struct *target, const struct user_regset *regset,
 		   unsigned int pos, unsigned int count,
 		   const void *kbuf, const void __user *ubuf)
 {
+#ifdef CONFIG_VSX
+	double buf[33];
+	int i;
+#endif
 	flush_fp_to_thread(target);
 
+#ifdef CONFIG_VSX
+	/* copy to local buffer then write that out */
+	i = user_regset_copyin(&pos, &count,
[PATCH 9/9] powerpc: Add CONFIG_VSX config option
Add the CONFIG_VSX build option.  Must be compiled with POWER4, FPU and
ALTIVEC.

Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---
 arch/powerpc/platforms/Kconfig.cputype |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Index: linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
@@ -155,6 +155,22 @@ config ALTIVEC
 
 	  If in doubt, say Y here.
 
+config VSX
+	bool "VSX Support"
+	depends on POWER4 && ALTIVEC && PPC_FPU
+	---help---
+
+	  This option enables kernel support for the Vector Scalar
+	  extensions to the PowerPC processor.  The kernel currently
+	  supports saving and restoring VSX registers, and turning on
+	  the 'VSX enable' bit so user processes can execute VSX
+	  instructions.
+
+	  This option is only useful if you have a processor that
+	  supports VSX (P7 and above), but does not have any effect on
+	  non-VSX CPUs (it does, however, add code to the kernel).
+
+	  If in doubt, say Y here.
+
 config SPE
 	bool "SPE Support"
 	depends on E200 || E500
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH 6/9] powerpc: Add VSX CPU feature
Add a VSX CPU feature.  Also add code to detect if VSX is available
from the device tree.

Signed-off-by: Michael Neuling [EMAIL PROTECTED]
Signed-off-by: Joel Schopp [EMAIL PROTECTED]
---
 arch/powerpc/kernel/prom.c     |    4 ++++
 include/asm-powerpc/cputable.h |   15 ++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/prom.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
@@ -609,6 +609,10 @@ static struct feature_property {
 	{"altivec", 0, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
 	{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* Yes, this _really_ is ibm,vmx == 2 to enable VSX */
+	{"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_PPC64
 	{"ibm,dfp", 1, 0, PPC_FEATURE_HAS_DFP},
 	{"ibm,purr", 1, CPU_FTR_PURR, 0},
Index: linux-2.6-ozlabs/include/asm-powerpc/cputable.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/cputable.h
+++ linux-2.6-ozlabs/include/asm-powerpc/cputable.h
@@ -27,6 +27,7 @@
 #define PPC_FEATURE_HAS_DFP		0x0400
 #define PPC_FEATURE_POWER6_EXT		0x0200
 #define PPC_FEATURE_ARCH_2_06		0x0100
+#define PPC_FEATURE_HAS_VSX		0x0080
 #define PPC_FEATURE_TRUE_LE		0x0002
 #define PPC_FEATURE_PPC_LE		0x0001
 
@@ -181,6 +182,7 @@ extern void do_feature_fixups(unsigned l
 #define CPU_FTR_DSCR			LONG_ASM_CONST(0x0002)
 #define CPU_FTR_1T_SEGMENT		LONG_ASM_CONST(0x0004)
 #define CPU_FTR_NO_SLBIE_B		LONG_ASM_CONST(0x0008)
+#define CPU_FTR_VSX			LONG_ASM_CONST(0x0010)
 
 #ifndef __ASSEMBLY__
 
@@ -199,6 +201,17 @@ extern void do_feature_fixups(unsigned l
 #define PPC_FEATURE_HAS_ALTIVEC_COMP	0
 #endif
 
+/* We only set the VSX features if the kernel was compiled with VSX
+ * support
+ */
+#ifdef CONFIG_VSX
+#define CPU_FTR_VSX_COMP	CPU_FTR_VSX
+#define PPC_FEATURE_HAS_VSX_COMP	PPC_FEATURE_HAS_VSX
+#else
+#define CPU_FTR_VSX_COMP	0
+#define PPC_FEATURE_HAS_VSX_COMP	0
+#endif
+
 /* We only set the spe features if the kernel was compiled with spe
  * support
  */
@@ -399,7 +412,7 @@ extern void do_feature_fixups(unsigned l
 	    (CPU_FTRS_POWER3 | CPU_FTRS_RS64 | CPU_FTRS_POWER4 |	\
 	     CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | CPU_FTRS_POWER6 |	\
 	     CPU_FTRS_POWER7 | CPU_FTRS_CELL | CPU_FTRS_PA6T |		\
-	     CPU_FTR_1T_SEGMENT)
+	     CPU_FTR_1T_SEGMENT | CPU_FTR_VSX)
 #else
 enum {
 	CPU_FTRS_POSSIBLE =
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH 7/9] powerpc: Add VSX assembler code macros
This adds the macros for the VSX load/store instructions as most
binutils are not going to support this for a while.  Also add VSX
register save/restore macros and vsr[0-63] register definitions.

Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---
 include/asm-powerpc/ppc_asm.h |  127 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)

Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
 					REST_10GPRS(22, base)
 #endif
 
+/*
+ * Define what the VSX XX1 form instructions will look like, then add
+ * the 128 bit load store instructions based on that.
+ */
+#define VSX_XX1(xs, ra, rb)	(((xs) & 0x1f) << 21 | ((ra) << 16) |	\
+				 ((rb) << 11) | (((xs) >> 5)))
+
+#define STXVD2X(xs, ra, rb)	.long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
+#define LXVD2X(xs, ra, rb)	.long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
 
 #define SAVE_2GPRS(n, base)	SAVE_GPR(n, base); SAVE_GPR(n+1, base)
 #define SAVE_4GPRS(n, base)	SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
@@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
 #define REST_16VRS(n,b,base)	REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
 #define REST_32VRS(n,b,base)	REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
 
+/* Save the lower 32 VSRs in the thread VSR region */
+#define SAVE_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n)); STXVD2X(n,b,base)
+#define SAVE_2VSRS(n,b,base)	SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
+#define SAVE_4VSRS(n,b,base)	SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
+#define SAVE_8VSRS(n,b,base)	SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
+#define SAVE_16VSRS(n,b,base)	SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
+#define SAVE_32VSRS(n,b,base)	SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
+#define REST_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
+#define REST_2VSRS(n,b,base)	REST_VSR(n,b,base); REST_VSR(n+1,b,base)
+#define REST_4VSRS(n,b,base)	REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
+#define REST_8VSRS(n,b,base)	REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
+#define REST_16VSRS(n,b,base)	REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
+#define REST_32VSRS(n,b,base)	REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
+/* Save the upper 32 VSRs (32-63) in the thread VSX region (0-31) */
+#define SAVE_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n)); STXVD2X(n+32,b,base)
+#define SAVE_2VSRSU(n,b,base)	SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
+#define SAVE_4VSRSU(n,b,base)	SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
+#define SAVE_8VSRSU(n,b,base)	SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
+#define SAVE_16VSRSU(n,b,base)	SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
+#define SAVE_32VSRSU(n,b,base)	SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
+#define REST_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
+#define REST_2VSRSU(n,b,base)	REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
+#define REST_4VSRSU(n,b,base)	REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
+#define REST_8VSRSU(n,b,base)	REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
+#define REST_16VSRSU(n,b,base)	REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
+#define REST_32VSRSU(n,b,base)	REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
+
+#ifdef CONFIG_VSX
+#define REST_32FPVSRS(n,c,base)						\
+BEGIN_FTR_SECTION							\
+	b	2f;							\
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
+	REST_32FPRS(n,base);						\
+	b	3f;							\
+2:	REST_32VSRS(n,c,base);						\
+3:
+
+#define SAVE_32FPVSRS(n,c,base)						\
+BEGIN_FTR_SECTION							\
+	b	2f;							\
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
+	SAVE_32FPRS(n,base);						\
+	b	3f;							\
+2:	SAVE_32VSRS(n,c,base);						\
+3:
+
+#else
+#define REST_32FPVSRS(n,b,base)	REST_32FPRS(n, base)
+#define SAVE_32FPVSRS(n,b,base)	SAVE_32FPRS(n, base)
+#endif
+
 #define SAVE_EVR(n,s,base)	evmergehi s,s,n; stw s,THREAD_EVR0+4*(n)(base)
 #define SAVE_2EVRS(n,s,base)	SAVE_EVR(n,s,base); SAVE_EVR(n+1,s,base)
 #define SAVE_4EVRS(n,s,base)	SAVE_2EVRS(n,s,base);
[PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support
This patch extends the floating point save and restore code to use the
VSX load/stores when VSX is available.  This will make FP context
save/restore marginally slower on FP-only code, when VSX is available,
as it has to load/store 128 bits rather than just 64 bits.

Mixing FP, VMX and VSX code will get constant architected state.

The signals interface is extended to enable access to VSR 0-31
doubleword 1 after discussions with tool chain maintainers.  Backward
compatibility is maintained.

The ptrace interface is also extended to allow access to the full
VSR 0-31 registers.

Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---
 arch/powerpc/kernel/entry_64.S   |    5 +
 arch/powerpc/kernel/fpu.S        |   16 -
 arch/powerpc/kernel/head_64.S    |   65 +++
 arch/powerpc/kernel/misc_64.S    |   33 +++
 arch/powerpc/kernel/ppc32.h      |    1
 arch/powerpc/kernel/ppc_ksyms.c  |    3 +
 arch/powerpc/kernel/process.c    |  109 ++-
 arch/powerpc/kernel/ptrace.c     |   70 +
 arch/powerpc/kernel/signal_32.c  |   33 +++
 arch/powerpc/kernel/signal_64.c  |   31 ++-
 arch/powerpc/kernel/traps.c      |   29 ++
 include/asm-powerpc/elf.h        |    6 +-
 include/asm-powerpc/ptrace.h     |   12
 include/asm-powerpc/reg.h        |    2
 include/asm-powerpc/sigcontext.h |   37 -
 include/asm-powerpc/system.h     |    9 +++
 include/linux/elf.h              |    1
 17 files changed, 454 insertions(+), 8 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/entry_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
@@ -353,6 +353,11 @@ _GLOBAL(_switch)
 	mflr	r20		/* Return to switch caller */
 	mfmsr	r22
 	li	r0, MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r0,r0,MSR_VSX@h	/* Disable VSX */
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_ALTIVEC
 BEGIN_FTR_SECTION
 	oris	r0,r0,MSR_VEC@h	/* Disable altivec */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -34,6 +34,11 @@ _GLOBAL(load_up_fpu)
 	mfmsr	r5
 	ori	r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
 	SYNC
 	MTMSRD(r5)			/* enable use of fpu now */
 	isync
@@ -50,7 +55,7 @@ _GLOBAL(load_up_fpu)
 	beq	1f
 	toreal(r4)
 	addi	r4,r4,THREAD		/* want last_task_used_math->thread */
-	SAVE_32FPRS(0, r4)
+	SAVE_32FPVSRS(0, r5, r4)
 	mffs	fr0
 	stfd	fr0,THREAD_FPSCR(r4)
 	PPC_LL	r5,PT_REGS(r4)
@@ -77,7 +82,7 @@ _GLOBAL(load_up_fpu)
 #endif
 	lfd	fr0,THREAD_FPSCR(r5)
 	MTFSF_L(fr0)
-	REST_32FPRS(0, r5)
+	REST_32FPVSRS(0, r4, r5)
 #ifndef CONFIG_SMP
 	subi	r4,r5,THREAD
 	fromreal(r4)
@@ -96,6 +101,11 @@ _GLOBAL(load_up_fpu)
 _GLOBAL(giveup_fpu)
 	mfmsr	r5
 	ori	r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
 	SYNC_601
 	ISYNC_601
 	MTMSRD(r5)			/* enable use of fpu now */
@@ -106,7 +116,7 @@ _GLOBAL(giveup_fpu)
 	addi	r3,r3,THREAD		/* want THREAD of task */
 	PPC_LL	r5,PT_REGS(r3)
 	PPC_LCMPI	0,r5,0
-	SAVE_32FPRS(0, r3)
+	SAVE_32FPVSRS(0, r4, r3)
 	mffs	fr0
 	stfd	fr0,THREAD_FPSCR(r3)
 	beq	1f
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -278,6 +278,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	. = 0xf20
 	b	altivec_unavailable_pSeries
 
+	. = 0xf40
+	b	vsx_unavailable_pSeries
+
 #ifdef CONFIG_CBE_RAS
 	HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
 #endif /* CONFIG_CBE_RAS */
@@ -297,6 +300,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	/* moved from 0xf00 */
 	STD_EXCEPTION_PSERIES(., performance_monitor)
 	STD_EXCEPTION_PSERIES(., altivec_unavailable)
+	STD_EXCEPTION_PSERIES(., vsx_unavailable)
 
 /*
  * An interrupt came in while soft-disabled; clear EE in SRR1,
@@ -834,6 +838,67 @@ _STATIC(load_up_altivec)
 	blr
 #endif /* CONFIG_ALTIVEC */
 
+	.align	7
+	.globl vsx_unavailable_common
+vsx_unavailable_common:
+	EXCEPTION_PROLOG_COMMON(0xf40, PACA_EXGEN)
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION