Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX

2008-06-19 Thread Michael Neuling
In message [EMAIL PROTECTED] you wrote:
  +   } fpvsr __attribute__((aligned(16)));
 
  Do we really need a union here?  what would happen if you just
  changed
  the type of fpr[32] from double to vector if #CONFIG_VSX?
 
  I really don't like the union and think we can just make the storage
  look opaque which is the key.  I doubt we ever really care about
  using fpr[] as a double in the kernel.
 
  I did something similar to this for the first cut of this patch, but
  it
  made the code accessing this structure much less readable.
 
  really, what code is that?
 
  Any code that has to read/write the top or bottom 64 bits _only_ of
  the 128 bit vector.
 
  The signals code is a good example where, for backwards compatibility,
  we need to read/write the old 64 bit FP regs, from the 128 bit value
  in the struct.
 
  Similarly, the way we've extended the signals interface for VSX, you
  need to read/write out the bottom 64 bits (vsrlow) of a 128 bit value.
 
  eg. the simple:
  current->thread.fpvsr.fp[i].vsrlow = buf[i]
  would turn into some abomination/macro.
 
 it would turn into something like:
 
 current->thread.fpr[i][2] = buf[i];
 current->thread.fpr[i][3] = buf[i+1];

Maybe abomination was going too far :-) 

I still think using the union makes it easier to read than what you
have here.  Also, it better reflects the structure of what's being
stored there.

Mikey

 if you look at your code you'll see there are only a few places you're
 accessing the union as fpvsr.vsr[] and those places could easily be
 fpr[], since they are already #CONFIG_VSX protected.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX

2008-06-19 Thread Kumar Gala


On Jun 19, 2008, at 1:01 AM, Michael Neuling wrote:

In message B0E87874-BC65-4037-[EMAIL PROTECTED] you wrote:

+   } fpvsr __attribute__((aligned(16)));


Do we really need a union here?  what would happen if you just
changed
the type of fpr[32] from double to vector if #CONFIG_VSX?

I really don't like the union and think we can just make the storage
look opaque which is the key.  I doubt we ever really care about
using fpr[] as a double in the kernel.


I did something similar to this for the first cut of this patch, but
it made the code accessing this structure much less readable.


really, what code is that?


Any code that has to read/write the top or bottom 64 bits _only_ of
the 128 bit vector.

The signals code is a good example where, for backwards compatibility,
we need to read/write the old 64 bit FP regs, from the 128 bit value
in the struct.

Similarly, the way we've extended the signals interface for VSX, you
need to read/write out the bottom 64 bits (vsrlow) of a 128 bit value.


eg. the simple:
   current->thread.fpvsr.fp[i].vsrlow = buf[i]
would turn into some abomination/macro.


it would turn into something like:

current->thread.fpr[i][2] = buf[i];
current->thread.fpr[i][3] = buf[i+1];


Maybe abomination was going too far :-)

I still think using the union makes it easier to read than what you
have here.  Also, it better reflects the structure of what's being
stored there.


I don't think that holds much weight with me.  We don't union the  
vector128 type to show it also supports float, u16, and u8 types.


I stick by the fact that the ONLY place it looks like you access the
union via the .vsr member is for memset or memcpy, so you clearly know
if the size should be sizeof(double) or sizeof(vector).


Also, I can see the case in the future that fprs become 128 bits wide
and allow for native long double support.


- k


Re: [PATCH 1/6] Move code patching code into arch/powerpc/lib/code-patching.c

2008-06-19 Thread Kumar Gala


On May 29, 2008, at 1:20 AM, Michael Ellerman wrote:

We currently have a few routines for patching code in asm/system.h,
because they didn't fit anywhere else. I'd like to clean them up a
little and add some more, so first move them into a dedicated C file -
they don't need to be inlined.

While we're moving the code, drop create_function_call(), it's
intended caller never got merged and will be replaced in future with
something different.

Signed-off-by: Michael Ellerman [EMAIL PROTECTED]
---
arch/powerpc/kernel/crash_dump.c  |1 +
arch/powerpc/lib/Makefile |2 +
arch/powerpc/lib/code-patching.c  |   33 
arch/powerpc/platforms/86xx/mpc86xx_smp.c |1 +
arch/powerpc/platforms/powermac/smp.c |1 +
include/asm-powerpc/code-patching.h   |   25 +++
include/asm-powerpc/system.h  |   48 -

7 files changed, 63 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/kernel/crash_dum


what's the state of these patches and getting them into powerpc-next?

I'm looking at some runtime fix ups that I was thinking of basing on  
this code.


- k


Re: [PATCH 6/9] powerpc: Add VSX CPU feature

2008-06-19 Thread David Woodhouse
On Wed, 2008-06-18 at 10:47 +1000, Michael Neuling wrote:
 {"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
  #endif /* CONFIG_ALTIVEC */
 +#ifdef CONFIG_VSX
 +   {"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
 +#endif /* CONFIG_VSX */

Should that be "ibm,vsx"?

-- 
dwmw2



Re: [PATCH 1/6] Move code patching code into arch/powerpc/lib/code-patching.c

2008-06-19 Thread Michael Ellerman
On Thu, 2008-06-19 at 01:15 -0500, Kumar Gala wrote:
 On May 29, 2008, at 1:20 AM, Michael Ellerman wrote:
 
  We currently have a few routines for patching code in asm/system.h,
  because they didn't fit anywhere else. I'd like to clean them up a
  little and add some more, so first move them into a dedicated C file
  - they don't need to be inlined.
 
  While we're moving the code, drop create_function_call(), it's
  intended caller never got merged and will be replaced in future with
  something different.
 
  Signed-off-by: Michael Ellerman [EMAIL PROTECTED]
  ---
  arch/powerpc/kernel/crash_dump.c  |1 +
  arch/powerpc/lib/Makefile |2 +
  arch/powerpc/lib/code-patching.c  |   33 
  arch/powerpc/platforms/86xx/mpc86xx_smp.c |1 +
  arch/powerpc/platforms/powermac/smp.c |1 +
  include/asm-powerpc/code-patching.h   |   25 +++
  include/asm-powerpc/system.h  |   48  
  -
  7 files changed, 63 insertions(+), 48 deletions(-)
 
  diff --git a/arch/powerpc/kernel/crash_dum
 
 what's the state of these patches and getting them into powerpc-next?

I think what I posted is reasonably solid; I've added some more routines
for the stuff I'm working on. I'll repost today or tomorrow.

 I'm looking at some runtime fix ups that I was thinking of basing on  
 this code.

What have you got in mind? I'm working on some runtime fixups too :)

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person



Re: [PATCH 6/9] powerpc: Add VSX CPU feature

2008-06-19 Thread Michael Neuling
 On Wed, 2008-06-18 at 10:47 +1000, Michael Neuling wrote:
  {"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
   #endif /* CONFIG_ALTIVEC */
  +#ifdef CONFIG_VSX
  +   {"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
  +#endif /* CONFIG_VSX */
 
 Should that be "ibm,vsx"?

Nope, "ibm,vmx" == 2 is correct for VSX.

You're not the first to think it looks wrong, so I should add a
comment.  

Mikey


[RFC 3/3] powerpc: copy_4K_page tweaked for Cell

2008-06-19 Thread Mark Nelson
/*
 * Copyright (C) 2008 Gunnar von Boehn, IBM Corp.
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version
 * 2 of the License, or (at your option) any later version.
 *
 *
 * copy_4K_page routine optimized for CELL-BE-PPC
 *
 * The CELL PPC core has 1 integer unit and 1 load/store unit
 * CELL: 1st level data cache = 32K -> 2nd level data cache = 512K
 * -> 3rd level data cache = 0K
 * To improve copy performance we need to prefetch source data
 * far ahead to hide this latency
 * For best performance, instruction forms ending in "." like "andi."
 * should be avoided as they are implemented in microcode on CELL.
 *
 * The below code is loop unrolled for the CELL cache line of 128 bytes.
 */

#include <asm/processor.h>
#include <asm/ppc_asm.h>

#define PREFETCH_AHEAD 6
#define ZERO_AHEAD 4

.align  7
_GLOBAL(copy_4K_page)
	dcbt	0,r4		/* Prefetch ONE SRC cacheline */

	addi	r6,r3,-8	/* prepare for stdu */
	addi	r4,r4,-8	/* prepare for ldu */

	li	r10,32		/* copy 32 cache lines for a 4K page */
	li	r12,128+8	/* prefetch distance */

	subi	r11,r10,PREFETCH_AHEAD
	li	r10,PREFETCH_AHEAD

	mtctr	r10
.LprefetchSRC:
	dcbt	r12,r4
	addi	r12,r12,128
	bdnz	.LprefetchSRC

.Louterloop:				/* copy whole cache lines */
	mtctr	r11

	li	r11,128*ZERO_AHEAD+8	/* DCBZ dist */

	.align	4
	/* Copy whole cachelines, optimized by prefetching SRC cacheline */
.Lloop:					/* Copy aligned body */
	dcbt	r12,r4			/* PREFETCH SOURCE some cache lines ahead */
	ld	r9, 0x08(r4)
	dcbz	r11,r6
	ld	r7, 0x10(r4)		/* 4 register stride copy */
	ld	r8, 0x18(r4)		/* 4 are optimal to hide 1st level cache latency */
ld  r0, 0x20(r4)
std r9, 0x08(r6)
std r7, 0x10(r6)
std r8, 0x18(r6)
std r0, 0x20(r6)
ld  r9, 0x28(r4)
ld  r7, 0x30(r4)
ld  r8, 0x38(r4)
ld  r0, 0x40(r4)
std r9, 0x28(r6)
std r7, 0x30(r6)
std r8, 0x38(r6)
std r0, 0x40(r6)
ld  r9, 0x48(r4)
ld  r7, 0x50(r4)
ld  r8, 0x58(r4)
ld  r0, 0x60(r4)
std r9, 0x48(r6)
std r7, 0x50(r6)
std r8, 0x58(r6)
std r0, 0x60(r6)
ld  r9, 0x68(r4)
ld  r7, 0x70(r4)
ld  r8, 0x78(r4)
ldu r0, 0x80(r4)
std r9, 0x68(r6)
std r7, 0x70(r6)
std r8, 0x78(r6)
	stdu	r0, 0x80(r6)

	bdnz	.Lloop

	sldi	r10,r10,2	/* adjust from 128 to 32 byte stride */
	mtctr	r10
.Lloop2:			/* Copy aligned body */
ld  r9, 0x08(r4)
ld  r7, 0x10(r4)
ld  r8, 0x18(r4)
ldu r0, 0x20(r4)
std r9, 0x08(r6)
std r7, 0x10(r6)
std r8, 0x18(r6)
	stdu	r0, 0x20(r6)

	bdnz	.Lloop2

.Lendloop2:
blr


[RFC 2/3] powerpc: memcpy tweaked for Cell

2008-06-19 Thread Mark Nelson
/*
 * Copyright (C) 2008 Gunnar von Boehn, IBM Corp.
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version
 * 2 of the License, or (at your option) any later version.
 *
 *
 * memcpy (and copy_4K_page) routine optimized for CELL-BE-PPC
 *
 * The CELL PPC core has 1 integer unit and 1 load/store unit
 * CELL: 1st level data cache = 32K -> 2nd level data cache = 512K
 * -> 3rd level data cache = 0K
 * To improve copy performance we need to prefetch source data
 * far ahead to hide this latency
 * For best performance, instruction forms ending in "." like "andi."
 * should be avoided as they are implemented in microcode on CELL.
 *
 * The below code is loop unrolled for the CELL cache line of 128 bytes.
 */

#include <asm/processor.h>
#include <asm/ppc_asm.h>

#define PREFETCH_AHEAD 6
#define ZERO_AHEAD 4

.align  7
_GLOBAL(memcpy)
	dcbt	0,r4		/* Prefetch ONE SRC cacheline */
	cmpldi	cr1,r5,16	/* is size < 16 ? */
mr  r6,r3
blt+cr1,.Lshortcopy

.Lbigcopy:
	neg	r8,r3		/* LS 3 bits = # bytes to 8-byte dest bdry */
	clrldi	r8,r8,64-4	/* align to 16-byte boundary */
sub r7,r4,r3
cmpldi  cr0,r8,0
beq+.Ldst_aligned

.Ldst_unaligned:
	mtcrf	0x01,r8		/* put #bytes to boundary into cr7 */
	subf	r5,r8,r5

	bf	cr7*4+3,1f
	lbzx	r0,r7,r6	/* copy 1 byte */
	stb	r0,0(r6)
	addi	r6,r6,1
1:	bf	cr7*4+2,2f
	lhzx	r0,r7,r6	/* copy 2 byte */
	sth	r0,0(r6)
	addi	r6,r6,2
2:	bf	cr7*4+1,4f
	lwzx	r0,r7,r6	/* copy 4 byte */
	stw	r0,0(r6)
	addi	r6,r6,4
4:	bf	cr7*4+0,8f
	ldx	r0,r7,r6	/* copy 8 byte */
	std	r0,0(r6)
	addi	r6,r6,8
8:
	add	r4,r7,r6

.Ldst_aligned:

cmpdi   cr5,r5,128-1

	neg	r7,r6
	addi	r6,r6,-8	/* prepare for stdu */
	addi	r4,r4,-8	/* prepare for ldu */

	clrldi	r7,r7,64-7	/* align to cacheline boundary */
	ble+	cr5,.Llessthancacheline


	cmpldi	cr6,r7,0
	subf	r5,r7,r5
	srdi	r7,r7,4		/* divide size by 16 */
	srdi	r10,r5,7	/* number of cache lines to copy */


	cmpldi	r10,0
	li	r11,0		/* number cachelines to copy with prefetch */
	beq	.Lnocacheprefetch

	cmpldi	r10,PREFETCH_AHEAD
	li	r12,128+8	/* prefetch distance */
	ble	.Llessthanmaxprefetch

	subi	r11,r10,PREFETCH_AHEAD
	li	r10,PREFETCH_AHEAD
.Llessthanmaxprefetch:

	mtctr	r10
.LprefetchSRC:
	dcbt	r12,r4
	addi	r12,r12,128
	bdnz	.LprefetchSRC
.Lnocacheprefetch:


mtctr   r7
cmpldi  cr1,r5,128
clrldi  r5,r5,64-7

	beq	cr6,.Lcachelinealigned
.Laligntocacheline:
	ld	r9,0x08(r4)
	ldu	r7,0x10(r4)
	std	r9,0x08(r6)
	stdu	r7,0x10(r6)
	bdnz	.Laligntocacheline


.Lcachelinealigned:		/* copy whole cache lines */


	blt-	cr1,.Llessthancacheline	/* size < 128 */

.Louterloop:
	cmpdi	r11,0
	mtctr	r11
	beq-	.Lendloop

li  r11,128*ZERO_AHEAD +8   /* DCBZ dist */

.align  4
/* Copy whole cachelines, optimized by prefetching SRC cacheline */
.Lloop:				/* Copy aligned body */
	dcbt	r12,r4		/* PREFETCH SOURCE some cache lines ahead */
	ld	r9, 0x08(r4)
	dcbz	r11,r6
	ld	r7, 0x10(r4)	/* 4 register stride copy */
	ld	r8, 0x18(r4)	/* 4 are optimal to hide 1st level cache latency */
ld  r0, 0x20(r4)
std r9, 0x08(r6)
std r7, 0x10(r6)
std r8, 0x18(r6)
std r0, 0x20(r6)
ld  r9, 0x28(r4)
ld  r7, 0x30(r4)
ld  r8, 0x38(r4)
ld  r0, 0x40(r4)
std r9, 0x28(r6)
std r7, 0x30(r6)
std r8, 0x38(r6)
std r0, 0x40(r6)
ld  r9, 0x48(r4)
ld  r7, 0x50(r4)
ld  r8, 0x58(r4)
ld  r0, 0x60(r4)
std r9, 0x48(r6)
std r7, 0x50(r6)
std r8, 0x58(r6)
std r0, 0x60(r6)
ld  r9, 0x68(r4)
ld  r7, 0x70(r4)
ld  r8, 0x78(r4)
ldu r0, 0x80(r4)
std r9, 0x68(r6)
std r7, 0x70(r6)
std r8, 0x78(r6)
	stdu	r0, 0x80(r6)

	bdnz	.Lloop
.Lendloop:


	cmpdi	r10,0
	sldi	r10,r10,2	/* adjust from 128

Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX

2008-06-19 Thread Benjamin Herrenschmidt
On Thu, 2008-06-19 at 01:10 -0500, Kumar Gala wrote:
  I still think using the union makes it easier to read than what you
  have here.  Also, it better reflects the structure of what's being
  stored there.
 
 I don't think that holds much weight with me.  We don't union the  
 vector128 type to show it also supports float, u16, and u8 types.

But this is different. The same registers are either basic FP regs or 
full VSX regs.

I don't see what's wrong with union, it's a nice way to express things.
 
 I stick by the fact that the ONLY place it looks like you access the  
 union via the .vsr member is for memset or memcpy so you clearly know  
 if the size should be sizeof(double) or sizeof(vector).
 
 Also, I can see the case in the future that 'fpr's become 

What's wrong with the union ? there's nothing ugly about them..

Cheers,
Ben.




Re: [RFC/PATCH 0/3] sched: allow arch override of cpu power

2008-06-19 Thread Ingo Molnar

* Nathan Lynch [EMAIL PROTECTED] wrote:

 There is an interesting quality of POWER6 cores, which each have 2 
 hardware threads: assuming one thread on the core is idle, the primary 
 thread is a little faster than the secondary thread.  To illustrate:
 
 for cpumask in 0x1 0x2 ; do
 taskset $cpumask /usr/bin/time -f "%e elapsed, %U user, %S sys" \
 /bin/sh -c "i=100 ; while (( i-- )) ; do : ; done"
 done
 
 17.05 elapsed, 16.83 user, 0.22 sys
 17.54 elapsed, 17.32 user, 0.22 sys
 
 (The first result is for a primary thread; the second result for a 
 secondary thread.)
 
 So it would be nice to have the scheduler slightly prefer primary 
 threads on POWER6 machines.  These patches, which allow the 
 architecture to override the scheduler's CPU power calculation, are 
 one possible approach, but I'm open to others.  Please note: these
 seemed to have the desired effect on 2.6.25-rc kernels (2-3%
 improvement in a kernbench-like "make -j nr_cores"), but I'm not
 seeing this improvement with 2.6.26-rc kernels for some reason I am
 still trying to track down.

ok, i guess that discrepancy has to be tracked down before we can think 
about these patches - but the principle is OK.

One problem is that the whole cpu-power balancing code in sched.c is a 
bit ... unclear and under-documented. So any change to this area should 
begin at documenting the basics: what do the units mean exactly, how are 
they used in balancing and what is the desired effect.

I'd not be surprised if there were a few buglets in this area, SMT is 
not at the forefront of testing at the moment. There's nothing 
spectacularly broken in it (i have a HT machine myself), but the 
concepts have bitrotten a bit. Patches - even if they just add comments 
- are welcome :-)

Ingo


Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell

2008-06-19 Thread Arnd Bergmann
On Thursday 19 June 2008, Mark Nelson wrote:
 The plan is to use Michael Ellerman's code patching work so that at runtime
 if we're running on a Cell machine the new routines are called but otherwise
 the existing memory copy routines are used.

Have you tried running this code on other platforms to see if it
actually performs worse on any of them? I would guess that the
older code also doesn't work too well on Power 5 and Power 6, so the
cell optimized version could give us a significant advantage as well,
albeit less than another CPU specific version.

Arnd 


Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell

2008-06-19 Thread Paul Mackerras
Arnd Bergmann writes:

 Have you tried running this code on other platforms to see if it
 actually performs worse on any of them? I would guess that the
 older code also doesn't work too well on Power 5 and Power 6,

Why would you guess that?

Paul.


Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell

2008-06-19 Thread Gunnar von Boehn
Hi Arnd,

I have no results for P5/P6, but I did some tests on JS21 aka PPC-970.
On PPC-970 the CELL memcpy is faster than the current Linux routine.
This becomes really visible when you really copy memory-to-memory and
are not only working in the 2nd-level cache.


Kind regards

Gunnar von Boehn




   
From: Arnd Bergmann [EMAIL PROTECTED], 19/06/2008 13:53
To: linuxppc-dev@ozlabs.org
Cc: Mark Nelson [EMAIL PROTECTED], Gunnar von Boehn [EMAIL PROTECTED], Michael Ellerman [EMAIL PROTECTED]
Subject: Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell

On Thursday 19 June 2008, Mark Nelson wrote:
 The plan is to use Michael Ellerman's code patching work so that at
runtime
 if we're running on a Cell machine the new routines are called but
otherwise
 the existing memory copy routines are used.

Have you tried running this code on other platforms to see if it
actually performs worse on any of them? I would guess that the
older code also doesn't work too well on Power 5 and Power 6, so the
cell optimized version could give us a significant advantage as well,
albeit less than another CPU specific version.

 Arnd 




Re: [PATCH 1/6] Move code patching code into arch/powerpc/lib/code-patching.c

2008-06-19 Thread Kumar Gala


On Jun 19, 2008, at 1:55 AM, Michael Ellerman wrote:


On Thu, 2008-06-19 at 01:15 -0500, Kumar Gala wrote:

On May 29, 2008, at 1:20 AM, Michael Ellerman wrote:


We currently have a few routines for patching code in asm/system.h,
because
they didn't fit anywhere else. I'd like to clean them up a little
and add
some more, so first move them into a dedicated C file - they don't
need to
be inlined.

While we're moving the code, drop create_function_call(), it's
intended
caller never got merged and will be replaced in future with  
something

different.

Signed-off-by: Michael Ellerman [EMAIL PROTECTED]
---
arch/powerpc/kernel/crash_dump.c  |1 +
arch/powerpc/lib/Makefile |2 +
arch/powerpc/lib/code-patching.c  |   33 ++ 
++

arch/powerpc/platforms/86xx/mpc86xx_smp.c |1 +
arch/powerpc/platforms/powermac/smp.c |1 +
include/asm-powerpc/code-patching.h   |   25 +++
include/asm-powerpc/system.h  |   48
-
7 files changed, 63 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/kernel/crash_dum


what's the state of these patches and getting them into powerpc-next?


I think what I posted is reasonably solid; I've added some more
routines for the stuff I'm working on. I'll repost today or tomorrow.


I'm looking at some runtime fix ups that I was thinking of basing on
this code.


What have you got in mind? I'm working on some runtime fixups too :)


I want to be able to fix up lwsync at runtime and remove it as a
compile-time thing.


- k


Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX

2008-06-19 Thread Kumar Gala


On Jun 19, 2008, at 4:33 AM, Benjamin Herrenschmidt wrote:


On Thu, 2008-06-19 at 01:10 -0500, Kumar Gala wrote:
I still think using the union makes it easier to read than what you
have here.  Also, it better reflects the structure of what's being
stored there.


I don't think that holds much weight with me.  We don't union the
vector128 type to show it also supports float, u16, and u8 types.


But this is different. The same registers are either basic FP regs or
full VSX regs.

I don't see what's wrong with union, it's a nice way to express  
things.


We also don't do this for SPE (the Freescale version).


I stick by the fact that the ONLY place it looks like you access the
union via the .vsr member is for memset or memcpy so you clearly know
if the size should be sizeof(double) or sizeof(vector).

Also, I can see the case in the future that 'fpr's become


What's wrong with the union ? there's nothing ugly about them..


I'll wait for the next version and see how many places .vsr is  
actually accessed.


- k


Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell

2008-06-19 Thread Arnd Bergmann
On Thursday 19 June 2008, Paul Mackerras wrote:
 Arnd Bergmann writes:
 
  Have you tried running this code on other platforms to see if it
  actually performs worse on any of them? I would guess that the
  older code also doesn't work too well on Power 5 and Power 6,
 
 Why would you guess that?

I remembered that Gunnar had done some tests on other CPUs showing
that an earlier version of the code was better than the kernel
memcpy.
Also, I had tried to trace the history of the usercopy function
and found that it predates most of the CPUs in current use, so
I assume it has suffered from bitrot and nobody tried to do better
since the Power3 days. AFAICT, it hasn't seen any update since your
original Power4 version from 2002.

Arnd 


Re: [RFC/PATCH] powerpc: rework 4xx PTE access and TLB miss

2008-06-19 Thread Josh Boyer
On Wed, 11 Jun 2008 10:50:31 +1000
Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:

 This is some preliminary work to improve TLB management on SW loaded
 TLB powerpc platforms. This introduce support for non-atomic PTE
 operations in pgtable-ppc32.h and removes write back to the PTE from
 the TLB miss handlers. In addition, the DSI interrupt code no longer
 tries to fixup write permission, this is left to generic code, and
 _PAGE_HWWRITE is gone.
 
 Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED]
 ---
 
 This is a first step, plan is to do the same for FSL BookE, 405 and
 possibly 8xx too. From there, I want to rework a bit the execute
 permission handling to avoid multiple faults, add support for
 _PAGE_EXEC (no executable mappings), for prefaulting (especially
 for kmap) and proper SMP support for future SMP capable BookE
 platforms.

I've looked this over quite a bit and can't find anything wrong with
it.  As soon as I get my boards set back up next week, I will try it
out on a few and see if I can find a good stress test as well.

If you could add the comments that Kumar suggested and send out an
updated patch, I'm inclined to get this into 2.6.27, but we should do
that soon if that is our target.

josh


Re: [OOPS] RT kernel on PowerPC

2008-06-19 Thread Sebastien Dugue

  Hi Chirag

On Thu, 19 Jun 2008 18:16:34 +0530 Chirag Jog [EMAIL PROTECTED] wrote:

 Hi,
 I was trying out the realtime linux kernel 2.6.25.4-rt3 on a powerpc box.
 The kernel booted fine.
 On running the matrix_mult testcase from the real-time testsuite
 in ltp (ltp/testcases/realtime/func), I get the following Oops,
 after which the machine just freezes.

  I do get the same thing on a JS22 blade, and you will find that some other
tests are hanging or oopsing. For example the sbrk_mutex testcase suffers from
missed hrtimer wakeups. I also get loads of:

  BUG: using smp_processor_id() in preemptible [] code

all triggered from the sys_munmap -> ... -> free_pgtables code path.

  Currently I'm trying to debug some networking problems where the whole
stack gets stuck under heavy receive load.

  As you can see, -rt is far from stable on the Power architecture. Sorry
for not having an answer for you, but I just wanted to show some of the
obstacles lying ahead.

  Sebastien.

 I tried setting the panic_on_oops but that didn't help strangely.
 Also, attached is the config file
 
 Oops: Kernel access of bad area, sig: 11 [#1]
 PREEMPT SMP NR_CPUS=64 NUMA pSeries
 Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ehea inet_lro 
 iptable_filter ip_tables xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 
 dm_mirror dm_multipath dm_mod parport_pc lp parport sg sr_mod ibmvscsic 
 scsi_transport_srp sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
 NIP: c0050fa0 LR: c0053db4 CTR: c005ab98
 REGS: c0009224fe50 TRAP: 0300   Not tainted  (2.6.25.4-rt3)
 MSR: 80001032 ME,IR,DR  CR: 2888  XER: 
 DAR: c180004fc4b0, DSISR: 4000
 TASK = c00092200ad0[59] 'events/3' THREAD: c0009225 CPU: 3
 GPR00: c04fc480 c000922500d0 c05bca00 c00092200ad0 
 GPR04: 0002 0038  000f 
 GPR08: 0001 c180004fc480 0f0f0f0f0f0f0f0f c04d9a00 
 GPR12: 80009032 c04fca80  c0413910 
 GPR16: 40c0 c0412108  00284000 
 GPR20: c04cb9b0 010cb9b0  001fdaa40b13 
 GPR24:  001f c000922501d0  
 GPR28: 0001 c00092200ad0 c056a680 c00090b18ad0 
 NIP [c0050fa0] .__resched_task+0x38/0xfc
 LR [c0053db4] .try_to_wake_up+0x168/0x200
 Call Trace:
 Instruction dump:
 fbc1fff0 fbe1fff8 ebc2af70 7c7d1b78 f8010010 f821ff71 e81e8008 e97e8000 
 e9230008 81290010 79294da4 7d290214 e8090030 7c0b002e 7c74 7800d182 
 
 Disassembling the __resched_task,
 
 Dump of assembler code for function __resched_task:
 0xc0050f68 <__resched_task+0>:   mflr    r0
 0xc0050f6c <__resched_task+4>:   std     r29,-24(r1)
 0xc0050f70 <__resched_task+8>:   std     r30,-16(r1)
 0xc0050f74 <__resched_task+12>:  std     r31,-8(r1)
 0xc0050f78 <__resched_task+16>:  ld      r30,-20624(r2)
 0xc0050f7c <__resched_task+20>:  mr      r29,r3
 0xc0050f80 <__resched_task+24>:  std     r0,16(r1)
 0xc0050f84 <__resched_task+28>:  stdu    r1,-144(r1)
 0xc0050f88 <__resched_task+32>:  ld      r0,-32760(r30)
 0xc0050f8c <__resched_task+36>:  ld      r11,-32768(r30)
 0xc0050f90 <__resched_task+40>:  ld      r9,8(r3)
 0xc0050f94 <__resched_task+44>:  lwz     r9,16(r9)
 0xc0050f98 <__resched_task+48>:  rldicr  r9,r9,9,54
 0xc0050f9c <__resched_task+52>:  add     r9,r9,r0
 0xc0050fa0 <__resched_task+56>:  ld      r0,48(r9)   <-- offending instruction
 0xc0050fa4 <__resched_task+60>:  lwzx    r0,r11,r0
 
 
 
 
 
 -- 
 Cheers,
 Chirag Jog
 


Re: [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell

2008-06-19 Thread Arnd Bergmann
On Thursday 19 June 2008, Mark Nelson wrote:

  * __copy_tofrom_user routine optimized for CELL-BE-PPC

A few things I noticed:

* You don't have a page wise user copy, which the regular code
has. This is probably not so noticeable in iperf, but should
have a significant impact on lmbench and on a number of file
system tests that copy large amounts of data. Have you checked
that the loop around cache lines is just as fast?

* You don't align the source to word size, only the target.
Does this get handled correctly when the source is a noncacheable
mapping, e.g. an unaligned copy_from_user where the source points
to a physical local store mapping of an SPU? I don't think we
need to optimize this case for performance, but I'm not sure
if it would crash. AFAIR, unaligned loads from noncacheable storage
give you an alignment exception that you need to handle, right?

* The naming of the labels (with just numbers) is rather confusing,
it would be good to have something better, but I must admit that
I don't have a good idea either.

* The trick of using the condition code in cr7 for the last bytes
is really cute, but are the four branches actually better than a
single computed branch into the middle of 15 byte wise copies?

Arnd 


Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell

2008-06-19 Thread Olof Johansson


On Jun 19, 2008, at 8:59 AM, Arnd Bergmann wrote:


I assume it has suffered from bitrot and nobody tried to do better
since the Power3 days. AFAICT, it hasn't seen any update since your
original Power4 version from 2002.


I've got an out-of-tree optimized version for pa6t as well that I  
haven't bothered posting yet.


The real pain with the usercopy code is all the exception cases. If  
anyone has made a test harness to make sure they're all right, please  
do post it for others to use as well...



-Olof


Re: [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell

2008-06-19 Thread Gunnar von Boehn
Hi Arnd,

 You don't have a page wise user copy,
 which the regular code has.

The new code does not need two versions IMHO.
The regular code was much slower for the normal case and has a special
version for the 4K optimized case.
The new code is equally good in both cases, so adding an extra 4K
routine will increase the code size for very minor gain. I'm not sure
if it's worth it.

Benchmark result on QS22 for well-aligned copy:
Old code                 : 1300 MB/sec
Old code 4k special case : 2600 MB/sec
New code                 : 4000 MB/sec (always)


 You don't align the source to word size, only the target.
 Does this get handled correctly when the source
 is a noncacheable mapping, e.g.

The problem is that on CELL the required shift instructions
for SRC alignment are microcoded, in other words really slow.
You are right that the main copy2user requires that the SRC is cacheable.
IMHO because of the exception on load, the routine should fall back to
the byte copy loop.

Arnd, could you verify that it works on localstore?


Cheers
Gunnar





   
From: Arnd Bergmann [EMAIL PROTECTED]
Date: 19/06/2008 16:43
To: linuxppc-dev@ozlabs.org
Cc: Mark Nelson [EMAIL PROTECTED], [EMAIL PROTECTED],
    Gunnar von Boehn/Germany/Contr/[EMAIL PROTECTED],
    Michael Ellerman [EMAIL PROTECTED]
Subject: Re: [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell




On Thursday 19 June 2008, Mark Nelson wrote:

  * __copy_tofrom_user routine optimized for CELL-BE-PPC

A few things I noticed:

* You don't have a page wise user copy, which the regular code
has. This is probably not so noticable in iperf, but should
have a significant impact on lmbench and on a number of file
system tests that copy large amounts of data. Have you checked
that the loop around cache lines is just as fast?

* You don't align the source to word size, only the target.
Does this get handled correctly when the source is a noncacheable
mapping, e.g. an unaligned copy_from_user where the source points
to a physical local store mapping of an SPU? I don't think we
need to optimize this case for performance, but I'm not sure
if it would crash. AFAIR, unaligned loads from noncacheable storage
give you an alignment exception that you need to handle, right?

* The naming of the labels (with just numbers) is rather confusing,
it would be good to have something better, but I must admit that
I don't have a good idea either.

* The trick of using the condition code in cr7 for the last bytes
is really cute, but are the four branches actually better than a
single computed branch into the middle of 15 byte wise copies?

 Arnd 
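Arnd's alternative can be sketched in plain C: a minimal stand-in comparing the branch-per-bit tail copy (the cr7 trick) against a single computed dispatch into 15 byte-wise copies. The function names and the 16-byte limit are illustrative only, not taken from the actual assembler.

```c
#include <string.h>
#include <stddef.h>

/* Branch-per-bit tail copy, mirroring the cr7 trick: test each bit
 * of the 0-15 byte remainder (8, 4, 2, 1) and branch on it. */
static void tail_copy_bits(unsigned char *dst, const unsigned char *src,
                           size_t n)                     /* n < 16 */
{
    if (n & 8) { memcpy(dst, src, 8); dst += 8; src += 8; }
    if (n & 4) { memcpy(dst, src, 4); dst += 4; src += 4; }
    if (n & 2) { memcpy(dst, src, 2); dst += 2; src += 2; }
    if (n & 1) { *dst = *src; }
}

/* Computed-branch alternative: one dispatch on n that lands in the
 * middle of 15 byte-wise copies (modelled as a fall-through switch). */
static void tail_copy_computed(unsigned char *dst, const unsigned char *src,
                               size_t n)                 /* n < 16 */
{
    switch (n) {
    case 15: dst[14] = src[14]; /* fall through */
    case 14: dst[13] = src[13]; /* fall through */
    case 13: dst[12] = src[12]; /* fall through */
    case 12: dst[11] = src[11]; /* fall through */
    case 11: dst[10] = src[10]; /* fall through */
    case 10: dst[9]  = src[9];  /* fall through */
    case 9:  dst[8]  = src[8];  /* fall through */
    case 8:  dst[7]  = src[7];  /* fall through */
    case 7:  dst[6]  = src[6];  /* fall through */
    case 6:  dst[5]  = src[5];  /* fall through */
    case 5:  dst[4]  = src[4];  /* fall through */
    case 4:  dst[3]  = src[3];  /* fall through */
    case 3:  dst[2]  = src[2];  /* fall through */
    case 2:  dst[1]  = src[1];  /* fall through */
    case 1:  dst[0]  = src[0];
    case 0:  break;
    }
}
```

Both variants produce identical results; the question is only how the hardware predicts four conditional branches versus one indirect one.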




Re: [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell

2008-06-19 Thread Sanjay Patel



--- On Thu, 6/19/08, Gunnar von Boehn [EMAIL PROTECTED] wrote:

 You are right the main copy2user requires that the SRC is
 cacheable.
 IMHO because of the exception on load, the routine should
 fallback to the
 byte copy loop.
 
 Arnd, could you verify that it works on localstore?

Since the main loops use 'dcbz', the destination must also be cacheable. IIRC, 
if the destination is write-through or cache-inhibited, the 'dcbz' will cause 
an alignment exception. I suppose it would still function correctly via the 
handler, but horribly slowly.

--Sanjay
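The fallback policy under discussion can be modelled roughly in C. This is not the actual kernel routine: `dst_cacheable`/`src_cacheable` are hypothetical flags standing in for whatever test (or exception-driven fallback) the real code would use, since portable C cannot query mapping attributes, and `copy_cacheline` merely stands in for the dcbz-plus-wide-loads inner loop.

```c
#include <string.h>
#include <stdint.h>
#include <stddef.h>

#define CACHELINE 128   /* Cell PPE cache line size */

/* Stand-in for the optimized inner loop: the real routine would
 * dcbz the destination line and copy with wide loads/stores. */
static void copy_cacheline(void *dst, const void *src)
{
    memcpy(dst, src, CACHELINE);
}

/* Byte loop: safe for any mapping, since it issues no dcbz and no
 * unaligned wide accesses. */
static void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n--)
        *dst++ = *src++;
}

static void copy_with_fallback(void *dst, const void *src, size_t n,
                               int dst_cacheable, int src_cacheable)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Fast path only when both sides are cacheable and line-aligned. */
    if (dst_cacheable && src_cacheable &&
        !((uintptr_t)d % CACHELINE) && !((uintptr_t)s % CACHELINE)) {
        while (n >= CACHELINE) {
            copy_cacheline(d, s);
            d += CACHELINE;
            s += CACHELINE;
            n -= CACHELINE;
        }
    }
    copy_bytes(d, s, n);   /* remainder, or the whole copy on fallback */
}
```

Either path yields the same bytes; the fallback only trades speed for never touching dcbz on a mapping that cannot take it.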




  


DTC 1.2.0-rc1 Tagged

2008-06-19 Thread Jon Loeliger
Folks,

I've pushed out a freshly tagged DTC 1.2.0-rc1 to jdl.com.

Please feel free to test it!

Thanks,
jdl



David Gibson (34):
  libfdt: Add and use a node iteration helper function.
  libfdt: Fix NOP handling bug in fdt_add_subnode_namelen()
  dtc: Fold comment handling test into testsuite
  libfdt: More tests of NOP handling behaviour
  libfdt: Trivial cleanup for CHECK_HEADER()
  libfdt: Remove no longer used code from fdt_node_offset_by_compatible()
  dtc: Fix error reporting in push_input_file()
  dtc: Implement checks for the format of node and property names
  dtc: Fix indentation of fixup_phandle_references
  dtc: Strip redundant name properties
  dtc: Test and fix conversion to/from old dtb versions
  dtc: Use for_each_marker_of_type in asm_emit_data()
  dtc: Make -I dtb mode use fill_fullpaths()
  dtc: Make eval_literal() static
  dtc: Assorted improvements to test harness
  dtc: Testcases for input handling
  dtc: Make dtc_open_file() die() if unable to open requested file
  dtc: Remove ugly include stack abuse
  dtc: Abolish asize field of struct data
  dtc: Add some documentation for the dts format
  dtc: Cleanup \nnn and \xNN string escape handling
  dtc: Change exit code for usage message
  dtc: Simplify error handling for unparseable input
  dtc: Clean up included Makefile fragments
  dtc: Trivial formatting fixes
  dtc: Make dt_from_blob() open its own input file, like the other input 
formats
  dtc: Rework handling of boot_cpuid_phys
  dtc: Add program to convert dts files from v0 to v1
  dtc: Remove reference to dead Makefile variables
  libfdt: Several cleanups to parameter checking
  dtc: Remove some small bashisms from test scripts
  dtc: Fix some printf() format warnings when compiling 64-bit
  dtc: Add a testcase for 'reg' or 'ranges' in /
  dtc: Add support for binary includes.

Jon Loeliger (1):
  Tag Version 1.2.0-rc1


Re: [RFC/PATCH 0/3] sched: allow arch override of cpu power

2008-06-19 Thread Nathan Lynch
Ingo Molnar wrote:
 
 * Nathan Lynch [EMAIL PROTECTED] wrote:
  So it would be nice to have the scheduler slightly prefer primary 
  threads on POWER6 machines.  These patches, which allow the 
  architecture to override the scheduler's CPU power calculation, are 
  one possible approach, but I'm open to others.  Please note: these 
  seemed to have the desired effect on 2.6.25-rc kernels (2-3% 
  improvement in a kernbench-like make -j nr_cores), but I'm not 
  seeing this improvement with 2.6.26-rc kernels for some reason I am 
  still trying to track down.
 
 ok, i guess that discrepancy has to be tracked down before we can think 
 about these patches - but the principle is OK.

Great.  I'll keep trying to figure out what's going on there.


 One problem is that the whole cpu-power balancing code in sched.c is a 
 bit ... unclear and under-documented. So any change to this area should 
 begin at documenting the basics: what do the units mean exactly, how are 
 they used in balancing and what is the desired effect.

 I'd not be surprised if there were a few buglets in this area, SMT is 
 not at the forefront of testing at the moment. There's nothing 
 spectacularly broken in it (i have a HT machine myself), but the 
 concepts have bitrotten a bit. Patches - even if they just add comments 
 - are welcome :-)

Okay, I'll have a look.  Thanks Ingo.


cell: add support for power button of future IBM cell blades

2008-06-19 Thread Christian Krafft
This patch adds support for the power button on future IBM cell blades.
It actually doesn't shut down the machine. Instead it exposes an
input device /dev/input/event0 to userspace which sends KEY_POWER
if power button has been pressed.
haldaemon actually recognizes the button, so a platform-independent acpid
replacement should handle it correctly.

Signed-off-by: Christian Krafft [EMAIL PROTECTED]

Index: linux.git/arch/powerpc/platforms/cell/pervasive.c
===
--- linux.git.orig/arch/powerpc/platforms/cell/pervasive.c
+++ linux.git/arch/powerpc/platforms/cell/pervasive.c
@@ -24,8 +24,10 @@
 #undef DEBUG
 
 #include <linux/interrupt.h>
+#include <linux/input.h>
 #include <linux/irq.h>
 #include <linux/percpu.h>
+#include <linux/platform_device.h>
 #include <linux/types.h>
 #include <linux/kallsyms.h>
 
@@ -40,6 +42,9 @@
 
 static int sysreset_hack;
 
+static struct input_dev *button_dev;
+static struct platform_device *button_pdev;
+
 static void cbe_power_save(void)
 {
unsigned long ctrl, thread_switch_control;
@@ -105,10 +110,21 @@ static int cbe_system_reset_exception(st
 	 */
 	if (sysreset_hack && (cpu = smp_processor_id()) == 0) {
 		pmd = cbe_get_cpu_pmd_regs(cpu);
-		if (in_be64(&pmd->ras_esc_0) & 0x) {
+		if (in_be64(&pmd->ras_esc_0) & 0x) {
 			out_be64(&pmd->ras_esc_0, 0);
 			return 0;
 		}
+		if (in_be64(&pmd->ras_esc_0) & 0x0001) {
+			out_be64(&pmd->ras_esc_0, 0);
+			if (!button_dev)
+				return 0;
+
+			input_report_key(button_dev, KEY_POWER, 1);
+			input_sync(button_dev);
+			input_report_key(button_dev, KEY_POWER, 0);
+			input_sync(button_dev);
+			return 1;
+		}
 	}
 	break;
 #ifdef CONFIG_CBE_RAS
break;
 #ifdef CONFIG_CBE_RAS
@@ -155,3 +171,55 @@ void __init cbe_pervasive_init(void)
ppc_md.power_save = cbe_power_save;
ppc_md.system_reset_exception = cbe_system_reset_exception;
 }
+
+static int __init cbe_power_button_init(void)
+{
+	int ret;
+	struct input_dev *dev;
+
+	if (!sysreset_hack)
+		return 0;
+
+	dev = input_allocate_device();
+	if (!dev) {
+		ret = -ENOMEM;
+		printk(KERN_ERR "%s: Not enough memory\n", __func__);
+		goto out;
+	}
+
+	set_bit(EV_KEY, dev->evbit);
+	set_bit(KEY_POWER, dev->keybit);
+
+	dev->name = "Power Button";
+	dev->id.bustype = BUS_HOST;
+
+	/* this makes the button look like an acpi power button
+	 * no clue whether anyone relies on that though */
+	dev->id.product = 0x02;
+	dev->phys = "LNXPWRBN/button/input0";
+
+	button_pdev = platform_device_register_simple("power_button", 0, NULL, 0);
+	if (IS_ERR(button_pdev)) {
+		ret = PTR_ERR(button_pdev);
+		goto out_free_input;
+	}
+
+	dev->dev.parent = &button_pdev->dev;
+
+	ret = input_register_device(dev);
+	if (ret) {
+		printk(KERN_ERR "%s: Failed to register device\n", __func__);
+		goto out_free_pdev;
+	}
+
+	button_dev = dev;
+	return ret;
+
+out_free_pdev:
+	platform_device_unregister(button_pdev);
+out_free_input:
+	input_free_device(dev);
+out:
+	return ret;
+}
+device_initcall(cbe_power_button_init);


-- 
Mit freundlichen Gruessen,
kind regards,

Christian Krafft
IBM Systems  Technology Group,
Linux Kernel Development
IT Specialist


Vorsitzender des Aufsichtsrats: Martin Jetter
Geschaeftsfuehrung: Herbert Kircher
Sitz der Gesellschaft:  Boeblingen
Registriergericht:  Amtsgericht Stuttgart, HRB 243294



Please pull 'next' branch of 4xx tree

2008-06-19 Thread Josh Boyer
Hi Paul,

Please pull from:

 master.kernel.org:/pub/scm/linux/kernel/git/jwboyer/powerpc-4xx.git next

to get some more changes for 2.6.27.  A new board port, a revert, and a
few fixes.

I'll have a few more after this as well, most notably Ben's rework
patch.

josh

Giuseppe Coviello (2):
  powerpc/4xx: Sam440ep support
  powerpc/4xx: Convert Sam440ep DTS to dts-v1

Imre Kaloz (1):
  powerpc/4xx: MTD support for the AMCC Taishan Board

Josh Boyer (2):
  Revert "[POWERPC] 4xx: Fix 460GT support to not enable FPU"
  powerpc/4xx: Workaround for PPC440EPx/GRx PCI_28 Errata

Stefan Roese (1):
  powerpc/4xx: PCIe driver now detects if a port is disabled via the dev-tree

Valentine Barshak (1):
  powerpc/4xx: Fix resource issue in warp-nand.c

 arch/powerpc/boot/Makefile  |3 +-
 arch/powerpc/boot/cuboot-sam440ep.c |   49 ++
 arch/powerpc/boot/dts/sam440ep.dts  |  293 +++
 arch/powerpc/boot/dts/taishan.dts   |   29 +-
 arch/powerpc/configs/44x/sam440ep_defconfig | 1192 +++
 arch/powerpc/configs/44x/taishan_defconfig  |   79 ++-
 arch/powerpc/kernel/cpu_setup_44x.S |1 +
 arch/powerpc/kernel/cputable.c  |4 +-
 arch/powerpc/platforms/44x/Kconfig  |9 +
 arch/powerpc/platforms/44x/Makefile |1 +
 arch/powerpc/platforms/44x/sam440ep.c   |   79 ++
 arch/powerpc/platforms/44x/warp-nand.c  |3 +-
 arch/powerpc/sysdev/indirect_pci.c  |6 +
 arch/powerpc/sysdev/ppc4xx_pci.c|   14 +
 include/asm-powerpc/pci-bridge.h|3 +
 15 files changed, 1759 insertions(+), 6 deletions(-)


Re: [PATCH] fs_enet: restore promiscuous and multicast settings in restart()

2008-06-19 Thread Vitaly Bordug
On Wed, 18 Jun 2008 22:45:57 +0400
Matvejchikov Ilya [EMAIL PROTECTED] wrote:

 I'm glad that you have corrected it. Half a year ago I pointed out
 that there was such a mistake:
 http://patchwork.ozlabs.org/linuxppc/patch?id=10700
 
You've used -embedded ML, and patch wasn't noticed... I can add your S-O-B line 
if that will make you feel better :) 

-Vitaly
 Thanks.
 
 2008/6/18 Laurent Pinchart [EMAIL PROTECTED]:
  The restart() function is called when the link state changes and resets
  multicast and promiscuous settings. This patch restores those settings at the
  end of restart().
 
  Signed-off-by: Laurent Pinchart [EMAIL PROTECTED]
  ---
   drivers/net/fs_enet/mac-fcc.c |3 +++
   2 files changed, 4 insertions(+), 1 deletions(-)
 
  diff --git a/drivers/net/fs_enet/mac-fcc.c b/drivers/net/fs_enet/mac-fcc.c
  index ce40cf9..1a95cf1 100644
  --- a/drivers/net/fs_enet/mac-fcc.c
  +++ b/drivers/net/fs_enet/mac-fcc.c
  @@ -464,6 +464,9 @@ static void restart(struct net_device *dev)
 C32(fccp, fcc_fpsmr, FCC_PSMR_FDE | FCC_PSMR_LPB);
 
 S32(fccp, fcc_gfmr, FCC_GFMR_ENR | FCC_GFMR_ENT);
  +
  +   /* Restore multicast and promiscous settings */
  +   set_multicast_list(dev);
   }
 
   static void stop(struct net_device *dev)
  --
  1.5.0
 
  --
  Laurent Pinchart
  CSE Semaphore Belgium
 
  Chaussee de Bruxelles, 732A
  B-1410 Waterloo
  Belgium
 
  T +32 (2) 387 42 59
  F +32 (2) 387 42 75
 


-- 
Sincerely, 
Vitaly


Re: [PATCH] fs_enet: restore promiscuous and multicast settings in restart()

2008-06-19 Thread Jon Loeliger

Vitaly Bordug wrote:

On Wed, 18 Jun 2008 22:45:57 +0400
Matvejchikov Ilya [EMAIL PROTECTED] wrote:


I'm glad that you have corrected it. Half a year ago I pointed out
that there was such a mistake:
http://patchwork.ozlabs.org/linuxppc/patch?id=10700

You've used -embedded ML, and patch wasn't noticed... 


*sigh*

We should merge the -embedded list into -dev
and retire the -embedded list finally.

jdl




Re: [PATCH] fs_enet: restore promiscuous and multicast settings in restart()

2008-06-19 Thread Olof Johansson

On Jun 19, 2008, at 1:47 PM, Jon Loeliger wrote:


We should merge the -embedded list into -dev
and retire the -embedded list finally.


I used to be an opponent to this given the amount of "help, my board
doesn't work" questions on -embedded, but the volume isn't that great,
and much lower than the -dev list anyway.


So yes, I agree.


-Olof


Re: [PATCH] fs_enet: restore promiscuous and multicast settings in restart()

2008-06-19 Thread Matvejchikov Ilya
Yes, please. =)

2008/6/19 Vitaly Bordug [EMAIL PROTECTED]:
 On Wed, 18 Jun 2008 22:45:57 +0400
 Matvejchikov Ilya [EMAIL PROTECTED] wrote:

 I'm glad that you have corrected it. Half a year ago I pointed out
 that there was such a mistake:
 http://patchwork.ozlabs.org/linuxppc/patch?id=10700

 You've used -embedded ML, and patch wasn't noticed... I can add your S-O-B 
 line if that will make you feel better :)

 -Vitaly
 Thanks.

 2008/6/18 Laurent Pinchart [EMAIL PROTECTED]:
  The restart() function is called when the link state changes and resets
  multicast and promiscous settings. This patch restores those settings at 
  the
  end of restart().
 
  Signed-off-by: Laurent Pinchart [EMAIL PROTECTED]
  ---
   drivers/net/fs_enet/mac-fcc.c |3 +++
   2 files changed, 4 insertions(+), 1 deletions(-)
 
  diff --git a/drivers/net/fs_enet/mac-fcc.c b/drivers/net/fs_enet/mac-fcc.c
  index ce40cf9..1a95cf1 100644
  --- a/drivers/net/fs_enet/mac-fcc.c
  +++ b/drivers/net/fs_enet/mac-fcc.c
  @@ -464,6 +464,9 @@ static void restart(struct net_device *dev)
 C32(fccp, fcc_fpsmr, FCC_PSMR_FDE | FCC_PSMR_LPB);
 
 S32(fccp, fcc_gfmr, FCC_GFMR_ENR | FCC_GFMR_ENT);
  +
  +   /* Restore multicast and promiscous settings */
  +   set_multicast_list(dev);
   }
 
   static void stop(struct net_device *dev)
  --
  1.5.0
 
  --
  Laurent Pinchart
  CSE Semaphore Belgium
 
  Chaussee de Bruxelles, 732A
  B-1410 Waterloo
  Belgium
 
  T +32 (2) 387 42 59
  F +32 (2) 387 42 75
 


 --
 Sincerely,
 Vitaly



Re: [Cbe-oss-dev] [RFC 3/3] powerpc: copy_4K_page tweaked for Cell

2008-06-19 Thread Arnd Bergmann
On Thursday 19 June 2008, Mark Nelson wrote:
 	.align	7
 _GLOBAL(copy_4K_page)
 	dcbt	0,r4		/* Prefetch ONE SRC cacheline */
 
 	addi	r6,r3,-8	/* prepare for stdu */
 	addi	r4,r4,-8	/* prepare for ldu */
 
 	li	r10,32		/* copy 32 cache lines for a 4K page */
 	li	r12,128+8	/* prefetch distance */

Since you have a loop here anyway instead of the fully unrolled
code, why not provide a copy_64K_page function as well, jumping in
here?

The inline 64k copy_page function otherwise just adds code size,
as well as being a tiny bit slower. It may even be good to
have an out-of-line copy_64K_page for the regular code, just
calling copy_4K_page repeatedly.

Arnd 
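Arnd's suggestion of an out-of-line 64K page copy built from the 4K routine can be sketched as below. This is a plain C stand-in: the real `copy_4K_page` is hand-written assembler, and the `memcpy` body here is only a placeholder for it.

```c
#include <string.h>

#define PAGE_4K 4096u

/* Stand-in for the hand-tuned assembler routine. */
static void copy_4K_page(void *to, const void *from)
{
    memcpy(to, from, PAGE_4K);
}

/* Out-of-line 64K page copy built from 16 back-to-back 4K copies,
 * instead of inlining a fully unrolled 64K variant. */
static void copy_64K_page(void *to, const void *from)
{
    char *t = to;
    const char *f = from;
    int i;

    for (i = 0; i < 16; i++) {
        copy_4K_page(t, f);
        t += PAGE_4K;
        f += PAGE_4K;
    }
}
```

The design point is code size: one loop body shared by both page sizes, at the cost of a little call overhead on the 64K path.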


Re: [PATCH REPOST #2] IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts

2008-06-19 Thread Roland Dreier
  During corner case testing, we noticed that some versions of ehca 
  do not properly transition to interrupt done in special load situations.
  This can be resolved by periodically triggering EOI through H_EOI, 
  if eqes are pending.
  
  Signed-off-by: Stefan Roscher [EMAIL PROTECTED]
  ---
  As firmware team suggested I moved the call of the EOI h_call into 
  the handler function, this ensures that we will call EOI only when we 
  find a valid eqe on the event queue.
  Additionally I changed the calculation of the xirr value as Roland suggested.

paulus / benh -- does this version still get your ack?  Seems that fw
team is OK with it according to Stefan...

If so I will add this to my tree for 2.6.27.

  diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c 
  b/drivers/infiniband/hw/ehca/ehca_irq.c
  index ce1ab05..0792d93 100644
  --- a/drivers/infiniband/hw/ehca/ehca_irq.c
  +++ b/drivers/infiniband/hw/ehca/ehca_irq.c
  @@ -531,7 +531,7 @@ void ehca_process_eq(struct ehca_shca *shca, int is_irq)
   {
   	struct ehca_eq *eq = &shca->eq;
   	struct ehca_eqe_cache_entry *eqe_cache = eq->eqe_cache;
   -	u64 eqe_value;
   +	u64 eqe_value, ret;
   	unsigned long flags;
   	int eqe_cnt, i;
   	int eq_empty = 0;
   @@ -583,8 +583,13 @@ void ehca_process_eq(struct ehca_shca *shca, int is_irq)
   		ehca_dbg(shca->ib_device,
   			 "No eqe found for irq event");
   		goto unlock_irq_spinlock;
   -	} else if (!is_irq)
   +	} else if (!is_irq) {
   +		ret = hipz_h_eoi(eq->ist);
   +		if (ret != H_SUCCESS)
   +			ehca_err(shca->ib_device,
   +				 "bad return code EOI -rc = %ld\n", ret);
   		ehca_dbg(shca->ib_device, "deadman found %x eqe", eqe_cnt);
   +	}
   	if (unlikely(eqe_cnt == EHCA_EQE_CACHE_SIZE))
   		ehca_dbg(shca->ib_device, "too many eqes for one irq event");
   	/* enable irq for new packets */
  diff --git a/drivers/infiniband/hw/ehca/hcp_if.c 
  b/drivers/infiniband/hw/ehca/hcp_if.c
  index 5245e13..415d3a4 100644
  --- a/drivers/infiniband/hw/ehca/hcp_if.c
  +++ b/drivers/infiniband/hw/ehca/hcp_if.c
  @@ -933,3 +933,13 @@ u64 hipz_h_error_data(const struct ipz_adapter_handle 
  adapter_handle,
  r_cb,
  0, 0, 0, 0);
   }
   +
   +u64 hipz_h_eoi(int irq)
   +{
   +	unsigned long xirr;
   +
   +	iosync();
   +	xirr = (0xffULL << 24) | irq;
   +
   +	return plpar_hcall_norets(H_EOI, xirr);
   +}
  diff --git a/drivers/infiniband/hw/ehca/hcp_if.h 
  b/drivers/infiniband/hw/ehca/hcp_if.h
  index 60ce02b..2c3c6e0 100644
  --- a/drivers/infiniband/hw/ehca/hcp_if.h
  +++ b/drivers/infiniband/hw/ehca/hcp_if.h
  @@ -260,5 +260,6 @@ u64 hipz_h_error_data(const struct ipz_adapter_handle 
  adapter_handle,
 const u64 ressource_handle,
 void *rblock,
 unsigned long *byte_count);
  +u64 hipz_h_eoi(int irq);
   
   #endif /* __HCP_IF_H__ */
  -- 
  1.5.5
  
  


Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell

2008-06-19 Thread Paul Mackerras
Gunnar von Boehn writes:

 I have no results for P5/P6, but I did some tests on JS21 aka PPC-970.
 On PPC-970 the CELL memcpy is faster than the current Linux routine.
This becomes really visible when you actually copy memory-to-memory and are
not only working in the 2nd-level cache.

Could you send some more details, like the actual copy speed you
measured and how you did the tests?

Thanks,
Paul.


Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell

2008-06-19 Thread Mark Nelson
On Thu, 19 Jun 2008 09:53:16 pm Arnd Bergmann wrote:
 On Thursday 19 June 2008, Mark Nelson wrote:
  The plan is to use Michael Ellerman's code patching work so that at runtime
  if we're running on a Cell machine the new routines are called but otherwise
  the existing memory copy routines are used.
 
 Have you tried running this code on other platforms to see if it
 actually performs worse on any of them? I would guess that the
 older code also doesn't work too well on Power 5 and Power 6, so the
 cell optimized version could give us a significant advantage as well,
 albeit less than another CPU specific version.
 
   Arnd 
 

I did run the tests on Power 5 and Power 6, and on Power 5 with the
new routines, the iperf bandwidth increased to 7.9 GBits/sec up from
7.5 GBits/sec; but on Power 6 the bandwidth with the old routines
was 13.6 GBits/sec compared to 12.8 GBits/sec...

I also couldn't get the updated routines to boot on 970MP without
removing the dcbz instructions.

I'll investigate more and also rerun the tests again

Thanks!

Mark


Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell

2008-06-19 Thread Mark Nelson
On Fri, 20 Jun 2008 12:53:49 am Olof Johansson wrote:
 
 On Jun 19, 2008, at 8:59 AM, Arnd Bergmann wrote:
 
  I assume it has suffered from bitrot and nobody tried to do better
  since the Power3 days. AFAICT, it hasn't seen any update since your
  original Power4 version from 2002.
 
 I've got an out-of-tree optimized version for pa6t as well that I  
 haven't bothered posting yet.
 
 The real pain with the usercopy code is all the exception cases. If  
 anyone has made a test harness to make sure they're all right, please  
 do post it for others to use as well...

I second that request - I verified (to the best that I could) with
pen and paper that the exception handling on this new version
is correct but it would be great to have a better way to test it.

Mark


Re: [Cbe-oss-dev] [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell

2008-06-19 Thread Paul Mackerras
Gunnar von Boehn writes:

 The regular code was much slower for the normal case and has a special
 version for the 4K optimized case.

That's a slightly inaccurate view...

The reason for having the two cases is that when I profiled the
distribution of sizes and alignments of memory copies in the kernel,
the result was that almost all copies (something like 99%, IIRC) were
either 128 bytes or less, or else a whole page at a page-aligned
address.

Thus we get the best performance by having a simple copy routine with
minimal setup overhead for the small copy case, plus an aggressively
optimized page copy routine.  Spending time setting up for a
multi-cacheline copy that's not a whole page is just going to hurt the
small copy case without providing any real benefit.

Transferring data over loopback is possibly an exception to that.
However, it's very rare to transfer large amounts of data over
loopback, unless you're running a benchmark like iperf or netperf. :-/

Paul.
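The kind of profiling Paul describes can be approximated by bucketing every copy request at its call site. A minimal sketch, where only the bucket boundaries (128 bytes and a 4K page) come from the message and everything else is illustrative:

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE_4K 4096u

/* Buckets matching the observed distribution: small copies
 * (<= 128 bytes), page-aligned whole-page copies, everything else. */
struct copy_stats {
    unsigned long small;
    unsigned long whole_page;
    unsigned long other;
};

static void record_copy(struct copy_stats *st,
                        uintptr_t dst, uintptr_t src, size_t len)
{
    if (len <= 128)
        st->small++;
    else if (len == PAGE_SIZE_4K &&
             !(dst % PAGE_SIZE_4K) && !(src % PAGE_SIZE_4K))
        st->whole_page++;
    else
        st->other++;
}
```

If `small` and `whole_page` together dominate, as the message reports (around 99%), then tuning only the low-overhead small path and the aggressive page path covers nearly all traffic.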


Re: [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell

2008-06-19 Thread Mark Nelson
 * The naming of the labels (with just numbers) is rather confusing,
 it would be good to have something better, but I must admit that
 I don't have a good idea either.

I will admit that at first glance the label naming with numbers
does look confusing, but when you notice that all the loads start
at 20, all the stores start at 60, and that to get the exception
handler for those instructions you just add 100, I think it makes
sense. But that could be because I've been looking at it way too
long...

(I thought I had a comment in there to that effect but it must
have gotten lost along the way. I'll add a new comment
explaining the above, that should help)

 
 * The trick of using the condition code in cr7 for the last bytes
 is really cute, but are the four branches actually better than a
 single computed branch into the middle of 15 byte wise copies?

The original copy_tofrom_user does this also, which I guess is
carried over to this new version...

Gunnar did you have an old version that did something similar
to this?

Mark


Re: [Cbe-oss-dev] [RFC 3/3] powerpc: copy_4K_page tweaked for Cell

2008-06-19 Thread Mark Nelson
On Fri, 20 Jun 2008 07:28:50 am Arnd Bergmann wrote:
 On Thursday 19 June 2008, Mark Nelson wrote:
  	.align	7
  _GLOBAL(copy_4K_page)
  	dcbt	0,r4		/* Prefetch ONE SRC cacheline */
  
  	addi	r6,r3,-8	/* prepare for stdu */
  	addi	r4,r4,-8	/* prepare for ldu */
  
  	li	r10,32		/* copy 32 cache lines for a 4K page */
  	li	r12,128+8	/* prefetch distance */
 
 Since you have a loop here anyway instead of the fully unrolled
 code, why not provide a copy_64K_page function as well, jumping in
 here?

That is a good idea. What effect will that have on how the code
patching will work?

 
 The inline 64k copy_page function otherwise just adds code size,
 as well as being a tiny bit slower. It may even be good to
 have an out-of-line copy_64K_page for the regular code, just
 calling copy_4K_page repeatedly.

Doing that sounds like it'll make the code patching easier.

Thanks!

Mark


Re: [PATCH] fs_enet: restore promiscuous and multicast settings in restart()

2008-06-19 Thread Bill Fink
On Wed, 18 Jun 2008, Laurent Pinchart wrote:

 The restart() function is called when the link state changes and resets
 multicast and promiscuous settings. This patch restores those settings at the
 end of restart().
 
 Signed-off-by: Laurent Pinchart [EMAIL PROTECTED]
 ---
  drivers/net/fs_enet/mac-fcc.c |3 +++
  2 files changed, 4 insertions(+), 1 deletions(-)

Is the whole patch here?  The above says 2 files changed and 5 lines
changed, but what's included here is only 1 file and 3 line changes.

 diff --git a/drivers/net/fs_enet/mac-fcc.c b/drivers/net/fs_enet/mac-fcc.c
 index ce40cf9..1a95cf1 100644
 --- a/drivers/net/fs_enet/mac-fcc.c
 +++ b/drivers/net/fs_enet/mac-fcc.c
 @@ -464,6 +464,9 @@ static void restart(struct net_device *dev)
   C32(fccp, fcc_fpsmr, FCC_PSMR_FDE | FCC_PSMR_LPB);
  
   S32(fccp, fcc_gfmr, FCC_GFMR_ENR | FCC_GFMR_ENT);
 +
 + /* Restore multicast and promiscous settings */
 + set_multicast_list(dev);
  }
  
  static void stop(struct net_device *dev)

-Bill


[PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.

2008-06-19 Thread Michael Neuling
The following set of patches adds Vector Scalar Extension (VSX)
support for POWER7.  Includes context switch, ptrace and signals support.

Signed-off-by: Michael Neuling [EMAIL PROTECTED]
--- 
Paulus: please consider for your 2.6.27 tree.

Updated with comments from Kumar, Milton, Dave Woodhouse and Mark
'NKOTB' Nelson.
- Changed thread_struct array definition to be cleaner
- Updated CPU_FTRS_POSSIBLE 
- Updated Kconfig typo and duplicate
- Added comment to clarify ibm,vmx = 2 really means VSX. 



[PATCH 3/9] powerpc: Move altivec_unavailable

2008-06-19 Thread Michael Neuling
Move the altivec_unavailable code, to make room at 0xf40 where the
vsx_unavailable exception will be.

Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---

 arch/powerpc/kernel/head_64.S |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -275,7 +275,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
. = 0xf00
b   performance_monitor_pSeries
 
-   STD_EXCEPTION_PSERIES(0xf20, altivec_unavailable)
+   . = 0xf20
+   b   altivec_unavailable_pSeries
 
 #ifdef CONFIG_CBE_RAS
HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
@@ -295,6 +296,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 
/* moved from 0xf00 */
STD_EXCEPTION_PSERIES(., performance_monitor)
+   STD_EXCEPTION_PSERIES(., altivec_unavailable)
 
 /*
  * An interrupt came in while soft-disabled; clear EE in SRR1,


[PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.

2008-06-19 Thread Michael Neuling
We are going to change where the floating point registers are stored
in the thread_struct, so in preparation add some macros to access the
floating point registers.  Update all code to use these new macros.

Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---

 arch/powerpc/kernel/align.c   |6 ++--
 arch/powerpc/kernel/asm-offsets.c |2 -
 arch/powerpc/kernel/process.c |5 ++-
 arch/powerpc/kernel/ptrace.c  |   14 +
 arch/powerpc/kernel/ptrace32.c|9 --
 arch/powerpc/kernel/signal_32.c   |6 ++--
 arch/powerpc/kernel/signal_64.c   |   13 +---
 arch/powerpc/kernel/softemu8xx.c  |4 +-
 arch/powerpc/math-emu/math.c  |   56 +++---
 include/asm-powerpc/ppc_asm.h |5 ++-
 include/asm-powerpc/processor.h   |7 
 11 files changed, 71 insertions(+), 56 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/align.c
===
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/align.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/align.c
@@ -366,7 +366,7 @@ static int emulate_multiple(struct pt_re
 static int emulate_fp_pair(struct pt_regs *regs, unsigned char __user *addr,
   unsigned int reg, unsigned int flags)
 {
-	char *ptr = (char *) &current->thread.fpr[reg];
+	char *ptr = (char *) &current->thread.TS_FPR(reg);
int i, ret;
 
if (!(flags  F))
@@ -784,7 +784,7 @@ int fix_alignment(struct pt_regs *regs)
return -EFAULT;
}
} else if (flags  F) {
-		data.dd = current->thread.fpr[reg];
+		data.dd = current->thread.TS_FPR(reg);
if (flags  S) {
/* Single-precision FP store requires conversion... */
 #ifdef CONFIG_PPC_FPU
@@ -862,7 +862,7 @@ int fix_alignment(struct pt_regs *regs)
if (unlikely(ret))
return -EFAULT;
} else if (flags  F)
-		current->thread.fpr[reg] = data.dd;
+		current->thread.TS_FPR(reg) = data.dd;
 	else
 		regs->gpr[reg] = data.ll;
 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -66,7 +66,7 @@ int main(void)
DEFINE(KSP_LIMIT, offsetof(struct thread_struct, ksp_limit));
DEFINE(PT_REGS, offsetof(struct thread_struct, regs));
DEFINE(THREAD_FPEXC_MODE, offsetof(struct thread_struct, fpexc_mode));
-   DEFINE(THREAD_FPR0, offsetof(struct thread_struct, fpr[0]));
+   DEFINE(THREAD_FPR0, offsetof(struct thread_struct, TS_FPR(0)));
DEFINE(THREAD_FPSCR, offsetof(struct thread_struct, fpscr));
 #ifdef CONFIG_ALTIVEC
DEFINE(THREAD_VR0, offsetof(struct thread_struct, vr[0]));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -110,7 +110,7 @@ int dump_task_fpu(struct task_struct *ts
return 0;
flush_fp_to_thread(current);
 
-	memcpy(fpregs, &tsk->thread.fpr[0], sizeof(*fpregs));
+	memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
 
return 1;
 }
@@ -689,7 +689,8 @@ void start_thread(struct pt_regs *regs, 
 #endif
 
discard_lazy_cpu_state();
-	memset(current->thread.fpr, 0, sizeof(current->thread.fpr));
+	memset(current->thread.TS_FPRSTART, 0,
+	       sizeof(current->thread.TS_FPRSTART));
current-thread.fpscr.val = 0;
 #ifdef CONFIG_ALTIVEC
memset(current-thread.vr, 0, sizeof(current-thread.vr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -218,10 +218,10 @@ static int fpr_get(struct task_struct *t
flush_fp_to_thread(target);
 
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
-offsetof(struct thread_struct, fpr[32]));
+offsetof(struct thread_struct, TS_FPR(32)));
 
return user_regset_copyout(pos, count, kbuf, ubuf,
-				   target->thread.fpr, 0, -1);
+				   target->thread.TS_FPRSTART, 0, -1);
 }
 
static int fpr_set(struct task_struct *target, const struct user_regset *regset,
@@ -231,10 +231,10 @@ static int fpr_set(struct task_struct *t
flush_fp_to_thread(target);
 
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
-offsetof(struct thread_struct, fpr[32]));
+offsetof(struct thread_struct, TS_FPR(32)));
 
return user_regset_copyin(pos, 

[PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code

2008-06-19 Thread Michael Neuling
If we set the SPE MSR bit in save_user_regs we can blow away the VEC
bit.  This will never happen in reality (VMX and SPE will never be in
the same processor as their opcodes overlap), but it looks bad.  Also
when we add VSX here in a later patch, we can hit two of these at the
same time.  

Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---

 arch/powerpc/kernel/signal_32.c |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -336,6 +336,8 @@ struct rt_sigframe {
 static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame,
int sigret)
 {
+	unsigned long msr = regs->msr;
+
/* Make sure floating point registers are stored in regs */
flush_fp_to_thread(current);
 
@@ -354,8 +356,7 @@ static int save_user_regs(struct pt_regs
return 1;
/* set MSR_VEC in the saved MSR value to indicate that
   frame-mc_vregs contains valid data */
-	if (__put_user(regs->msr | MSR_VEC, &frame->mc_gregs[PT_MSR]))
-		return 1;
+	msr |= MSR_VEC;
}
	/* else assert((regs->msr & MSR_VEC) == 0) */
 
@@ -377,8 +378,7 @@ static int save_user_regs(struct pt_regs
return 1;
/* set MSR_SPE in the saved MSR value to indicate that
   frame-mc_vregs contains valid data */
-	if (__put_user(regs->msr | MSR_SPE, &frame->mc_gregs[PT_MSR]))
-		return 1;
+	msr |= MSR_SPE;
}
	/* else assert((regs->msr & MSR_SPE) == 0) */
 
@@ -387,6 +387,8 @@ static int save_user_regs(struct pt_regs
return 1;
 #endif /* CONFIG_SPE */
 
+	if (__put_user(msr, &frame->mc_gregs[PT_MSR]))
+   return 1;
if (sigret) {
/* Set up the sigreturn trampoline: li r0,sigret; sc */
		if (__put_user(0x38000000UL + sigret, &frame->tramp[0])
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable

2008-06-19 Thread Michael Neuling
Make load_up_fpu and load_up_altivec callable so they can be reused by
the VSX code.  

Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---

 arch/powerpc/kernel/fpu.S|2 +-
 arch/powerpc/kernel/head_32.S|6 --
 arch/powerpc/kernel/head_64.S|8 +---
 arch/powerpc/kernel/head_booke.h |6 --
 4 files changed, 14 insertions(+), 8 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -85,7 +85,7 @@ _GLOBAL(load_up_fpu)
 #endif /* CONFIG_SMP */
/* restore registers and return */
/* we haven't used ctr or xer or lr */
-   b   fast_exception_return
+   blr
 
 /*
  * giveup_fpu(tsk)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
===
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_32.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
@@ -421,8 +421,10 @@ BEGIN_FTR_SECTION
b   ProgramCheck
 END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
EXCEPTION_PROLOG
-	bne	load_up_fpu		/* if from user, just load it up */
-	addi	r3,r1,STACK_FRAME_OVERHEAD
+	beq	1f
+	bl	load_up_fpu		/* if from user, just load it up */
+	b	fast_exception_return
+1:	addi	r3,r1,STACK_FRAME_OVERHEAD
EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
 
 /* Decrementer */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -741,7 +741,8 @@ fp_unavailable_common:
ENABLE_INTS
bl  .kernel_fp_unavailable_exception
BUG_OPCODE
-1: b   .load_up_fpu
+1: bl  .load_up_fpu
+   b   fast_exception_return
 
.align  7
.globl altivec_unavailable_common
@@ -749,7 +750,8 @@ altivec_unavailable_common:
EXCEPTION_PROLOG_COMMON(0xf20, PACA_EXGEN)
 #ifdef CONFIG_ALTIVEC
 BEGIN_FTR_SECTION
-	bne	.load_up_altivec	/* if from user, just load it up */
+	bnel	.load_up_altivec
+   b   fast_exception_return
 END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 #endif
bl  .save_nvgprs
@@ -829,7 +831,7 @@ _STATIC(load_up_altivec)
std r4,0(r3)
 #endif /* CONFIG_SMP */
/* restore registers and return */
-   b   fast_exception_return
+   blr
 #endif /* CONFIG_ALTIVEC */
 
 /*
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
===
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_booke.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
@@ -363,8 +363,10 @@ label:
 #define FP_UNAVAILABLE_EXCEPTION \
START_EXCEPTION(FloatingPointUnavailable) \
NORMAL_EXCEPTION_PROLOG;  \
-	bne	load_up_fpu;	/* if from user, just load it up */	      \
-	addi	r3,r1,STACK_FRAME_OVERHEAD;				      \
+	beq	1f;							      \
+	bl	load_up_fpu;	/* if from user, just load it up */	      \
+	b	fast_exception_return;					      \
+1:	addi	r3,r1,STACK_FRAME_OVERHEAD;				      \
EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
 
 #endif /* __HEAD_BOOKE_H__ */


[PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX

2008-06-19 Thread Michael Neuling
The layout of the new VSR registers and how they overlap on top of the
legacy FPR and VR registers is:

                   VSR doubleword 0               VSR doubleword 1
          ----------------------------------------------------------------
  VSR[0]  |             FPR[0]            |                              |
          ----------------------------------------------------------------
  VSR[1]  |             FPR[1]            |                              |
          ----------------------------------------------------------------
          |              ...              |                              |
          |              ...              |                              |
          ----------------------------------------------------------------
  VSR[30] |             FPR[30]           |                              |
          ----------------------------------------------------------------
  VSR[31] |             FPR[31]           |                              |
          ----------------------------------------------------------------
  VSR[32] |                            VR[0]                             |
          ----------------------------------------------------------------
  VSR[33] |                            VR[1]                             |
          ----------------------------------------------------------------
          |                             ...                              |
          |                             ...                              |
          ----------------------------------------------------------------
  VSR[62] |                            VR[30]                            |
          ----------------------------------------------------------------
  VSR[63] |                            VR[31]                            |
          ----------------------------------------------------------------

VSX has 64 128-bit registers.  The first 32 registers overlap with the FP
registers and hence extend them with an additional 64 bits.  The
second 32 registers overlap with the VMX registers.

This patch introduces the thread_struct changes required to reflect
this register layout.  Ptrace and signals code is updated so that the
floating point registers are correctly accessed from the thread_struct
when CONFIG_VSX is enabled.

Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---

 arch/powerpc/kernel/asm-offsets.c |4 ++
 arch/powerpc/kernel/ptrace.c  |   28 ++
 arch/powerpc/kernel/signal_32.c   |   59 +-
 arch/powerpc/kernel/signal_64.c   |   36 +++
 include/asm-powerpc/processor.h   |   31 +++
 5 files changed, 139 insertions(+), 19 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -74,6 +74,10 @@ int main(void)
DEFINE(THREAD_VSCR, offsetof(struct thread_struct, vscr));
DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+   DEFINE(THREAD_VSR0, offsetof(struct thread_struct, fpvsr[0].vsr));
+   DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_PPC64
DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid));
 #else /* CONFIG_PPC64 */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -215,26 +215,54 @@ static int fpr_get(struct task_struct *t
   unsigned int pos, unsigned int count,
   void *kbuf, void __user *ubuf)
 {
+#ifdef CONFIG_VSX
+   double buf[33];
+   int i;
+#endif
flush_fp_to_thread(target);
 
+#ifdef CONFIG_VSX
+   /* copy to local buffer then write that out */
+	for (i = 0; i < 32 ; i++)
+		buf[i] = target->thread.TS_FPR(i);
+	memcpy(&buf[32], &target->thread.fpscr, sizeof(double));
+   return user_regset_copyout(pos, count, kbuf, ubuf, buf, 0, -1);
+
+#else
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
 offsetof(struct thread_struct, TS_FPR(32)));
 
return user_regset_copyout(pos, count, kbuf, ubuf,
				   target->thread.TS_FPRSTART, 0, -1);
+#endif
 }
 
static int fpr_set(struct task_struct *target, const struct user_regset *regset,
   unsigned int pos, unsigned int count,
   const void *kbuf, const void __user *ubuf)
 {
+#ifdef CONFIG_VSX
+   double buf[33];
+   int i;
+#endif
flush_fp_to_thread(target);
 
+#ifdef CONFIG_VSX
+   /* copy to local buffer then write that out */
+   i = user_regset_copyin(pos, count, 

[PATCH 9/9] powerpc: Add CONFIG_VSX config option

2008-06-19 Thread Michael Neuling
Add CONFIG_VSX config build option.  Must compile with POWER4, FPU and ALTIVEC.

Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---

 arch/powerpc/platforms/Kconfig.cputype |   16 
 1 file changed, 16 insertions(+)

Index: linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
===
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
@@ -155,6 +155,22 @@ config ALTIVEC
 
  If in doubt, say Y here.
 
+config VSX
+	bool "VSX Support"
+	depends on POWER4 && ALTIVEC && PPC_FPU
+	---help---
+
+	  This option enables kernel support for the Vector Scalar extensions
+	  to the PowerPC processor.  The kernel currently supports saving and
+	  restoring VSX registers, and turning on the 'VSX enable' bit so user
+	  processes can execute VSX instructions.
+
+	  This option is only useful if you have a processor that supports
+	  VSX (P7 and above), but it does not have any effect on non-VSX
+	  CPUs (it does, however, add code to the kernel).
+
+	  If in doubt, say Y here.
+
 config SPE
	bool "SPE Support"
depends on E200 || E500


[PATCH 6/9] powerpc: Add VSX CPU feature

2008-06-19 Thread Michael Neuling
Add a VSX CPU feature.  Also add code to detect if VSX is available
from the device tree.

Signed-off-by: Michael Neuling [EMAIL PROTECTED]
Signed-off-by: Joel Schopp [EMAIL PROTECTED]

---

 arch/powerpc/kernel/prom.c |4 
 include/asm-powerpc/cputable.h |   15 ++-
 2 files changed, 18 insertions(+), 1 deletion(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
===
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/prom.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
@@ -609,6 +609,10 @@ static struct feature_property {
	{"altivec", 0, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
	{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+   /* Yes, this _really_ is ibm,vmx == 2 to enable VSX */
+	{"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_PPC64
	{"ibm,dfp", 1, 0, PPC_FEATURE_HAS_DFP},
	{"ibm,purr", 1, CPU_FTR_PURR, 0},
Index: linux-2.6-ozlabs/include/asm-powerpc/cputable.h
===
--- linux-2.6-ozlabs.orig/include/asm-powerpc/cputable.h
+++ linux-2.6-ozlabs/include/asm-powerpc/cputable.h
@@ -27,6 +27,7 @@
 #define PPC_FEATURE_HAS_DFP		0x0400
 #define PPC_FEATURE_POWER6_EXT		0x0200
 #define PPC_FEATURE_ARCH_2_06		0x0100
+#define PPC_FEATURE_HAS_VSX		0x0080
 
 #define PPC_FEATURE_TRUE_LE0x0002
 #define PPC_FEATURE_PPC_LE 0x0001
@@ -181,6 +182,7 @@ extern void do_feature_fixups(unsigned l
 #define CPU_FTR_DSCR   LONG_ASM_CONST(0x0002)
 #define CPU_FTR_1T_SEGMENT LONG_ASM_CONST(0x0004)
 #define CPU_FTR_NO_SLBIE_B LONG_ASM_CONST(0x0008)
+#define CPU_FTR_VSX			LONG_ASM_CONST(0x0010)
 
 #ifndef __ASSEMBLY__
 
@@ -199,6 +201,17 @@ extern void do_feature_fixups(unsigned l
 #define PPC_FEATURE_HAS_ALTIVEC_COMP0
 #endif
 
+/* We only set the VSX features if the kernel was compiled with VSX
+ * support
+ */
+#ifdef CONFIG_VSX
+#define CPU_FTR_VSX_COMP   CPU_FTR_VSX
+#define PPC_FEATURE_HAS_VSX_COMP PPC_FEATURE_HAS_VSX
+#else
+#define CPU_FTR_VSX_COMP   0
+#define PPC_FEATURE_HAS_VSX_COMP0
+#endif
+
 /* We only set the spe features if the kernel was compiled with spe
  * support
  */
@@ -399,7 +412,7 @@ extern void do_feature_fixups(unsigned l
(CPU_FTRS_POWER3 | CPU_FTRS_RS64 | CPU_FTRS_POWER4 |\
CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | CPU_FTRS_POWER6 |   \
CPU_FTRS_POWER7 | CPU_FTRS_CELL | CPU_FTRS_PA6T |   \
-   CPU_FTR_1T_SEGMENT)
+   CPU_FTR_1T_SEGMENT | CPU_FTR_VSX)
 #else
 enum {
CPU_FTRS_POSSIBLE =


[PATCH 7/9] powerpc: Add VSX assembler code macros

2008-06-19 Thread Michael Neuling
This adds the macros for the VSX load/store instructions, as most
binutils versions will not support them for a while.

Also add VSX register save/restore macros and vsr[0-63] register definitions.

Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---

 include/asm-powerpc/ppc_asm.h |  127 ++
 1 file changed, 127 insertions(+)

Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR); 
REST_10GPRS(22, base)
 #endif
 
+/*
+ * Define what the VSX XX1 form instructions will look like, then add
+ * the 128 bit load store instructions based on that.
+ */
+#define VSX_XX1(xs, ra, rb)	(((xs) & 0x1f) << 21 | ((ra) << 16) |	\
+				 ((rb) << 11) | (((xs) >> 5)))
+
+#define STXVD2X(xs, ra, rb)	.long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
+#define LXVD2X(xs, ra, rb)	.long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
 
 #define SAVE_2GPRS(n, base)SAVE_GPR(n, base); SAVE_GPR(n+1, base)
 #define SAVE_4GPRS(n, base)SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
@@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);   

 #define REST_16VRS(n,b,base)   REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
 #define REST_32VRS(n,b,base)   REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
 
+/* Save the lower 32 VSRs in the thread VSR region */
+#define SAVE_VSR(n,b,base) li b,THREAD_VSR0+(16*(n));  STXVD2X(n,b,base)
+#define SAVE_2VSRS(n,b,base)   SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
+#define SAVE_4VSRS(n,b,base)   SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
+#define SAVE_8VSRS(n,b,base)   SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
+#define SAVE_16VSRS(n,b,base)  SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
+#define SAVE_32VSRS(n,b,base)  SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
+#define REST_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
+#define REST_2VSRS(n,b,base)   REST_VSR(n,b,base); REST_VSR(n+1,b,base)
+#define REST_4VSRS(n,b,base)   REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
+#define REST_8VSRS(n,b,base)   REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
+#define REST_16VSRS(n,b,base)  REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
+#define REST_32VSRS(n,b,base)  REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
+/* Save the upper 32 VSRs (32-63) in the thread VSX region (0-31) */
+#define SAVE_VSRU(n,b,base)li b,THREAD_VR0+(16*(n));  STXVD2X(n+32,b,base)
+#define SAVE_2VSRSU(n,b,base)  SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
+#define SAVE_4VSRSU(n,b,base)  SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
+#define SAVE_8VSRSU(n,b,base)  SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
+#define SAVE_16VSRSU(n,b,base) SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
+#define SAVE_32VSRSU(n,b,base)	SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
+#define REST_VSRU(n,b,base)li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
+#define REST_2VSRSU(n,b,base)  REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
+#define REST_4VSRSU(n,b,base)  REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
+#define REST_8VSRSU(n,b,base)  REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
+#define REST_16VSRSU(n,b,base) REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
+#define REST_32VSRSU(n,b,base)	REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
+
+#ifdef CONFIG_VSX
+#define REST_32FPVSRS(n,c,base)						\
+BEGIN_FTR_SECTION  \
+   b   2f; \
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);\
+   REST_32FPRS(n,base);\
+   b   3f; \
+2: REST_32VSRS(n,c,base);  \
+3:
+
+#define SAVE_32FPVSRS(n,c,base)						\
+BEGIN_FTR_SECTION  \
+   b   2f; \
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);\
+   SAVE_32FPRS(n,base);\
+   b   3f; \
+2: SAVE_32VSRS(n,c,base);  \
+3:
+
+#else
+#define REST_32FPVSRS(n,b,base)REST_32FPRS(n, base)
+#define SAVE_32FPVSRS(n,b,base)SAVE_32FPRS(n, base)
+#endif
+
 #define SAVE_EVR(n,s,base) evmergehi s,s,n; stw s,THREAD_EVR0+4*(n)(base)
 #define SAVE_2EVRS(n,s,base)   SAVE_EVR(n,s,base); SAVE_EVR(n+1,s,base)
 #define SAVE_4EVRS(n,s,base)   SAVE_2EVRS(n,s,base); 

[PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support

2008-06-19 Thread Michael Neuling
This patch extends the floating point save and restore code to use the
VSX load/stores when VSX is available.  This will make FP context
save/restore marginally slower on FP-only code when VSX is available,
as it has to load/store 128 bits rather than just 64 bits.

Mixing FP, VMX and VSX code will get constant architected state.

The signals interface is extended to enable access to VSR 0-31
doubleword 1 after discussions with tool chain maintainers.  Backward
compatibility is maintained.  

The ptrace interface is also extended to allow access to VSR 0-31 full
registers.

Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---

 arch/powerpc/kernel/entry_64.S   |5 +
 arch/powerpc/kernel/fpu.S|   16 -
 arch/powerpc/kernel/head_64.S|   65 +++
 arch/powerpc/kernel/misc_64.S|   33 +++
 arch/powerpc/kernel/ppc32.h  |1 
 arch/powerpc/kernel/ppc_ksyms.c  |3 +
 arch/powerpc/kernel/process.c|  109 ++-
 arch/powerpc/kernel/ptrace.c |   70 +
 arch/powerpc/kernel/signal_32.c  |   33 +++
 arch/powerpc/kernel/signal_64.c  |   31 ++-
 arch/powerpc/kernel/traps.c  |   29 ++
 include/asm-powerpc/elf.h|6 +-
 include/asm-powerpc/ptrace.h |   12 
 include/asm-powerpc/reg.h|2 
 include/asm-powerpc/sigcontext.h |   37 -
 include/asm-powerpc/system.h |9 +++
 include/linux/elf.h  |1 
 17 files changed, 454 insertions(+), 8 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
===
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/entry_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
@@ -353,6 +353,11 @@ _GLOBAL(_switch)
mflrr20 /* Return to switch caller */
mfmsr   r22
li  r0, MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r0,r0,MSR_VSX@h	/* Disable VSX */
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_ALTIVEC
 BEGIN_FTR_SECTION
	oris	r0,r0,MSR_VEC@h	/* Disable altivec */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -34,6 +34,11 @@
 _GLOBAL(load_up_fpu)
mfmsr   r5
ori r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
SYNC
MTMSRD(r5)  /* enable use of fpu now */
isync
@@ -50,7 +55,7 @@ _GLOBAL(load_up_fpu)
beq 1f
toreal(r4)
	addi	r4,r4,THREAD		/* want last_task_used_math->thread */
-   SAVE_32FPRS(0, r4)
+   SAVE_32FPVSRS(0, r5, r4)
	mffs	fr0
	stfd	fr0,THREAD_FPSCR(r4)
PPC_LL  r5,PT_REGS(r4)
@@ -77,7 +82,7 @@ _GLOBAL(load_up_fpu)
 #endif
lfd fr0,THREAD_FPSCR(r5)
MTFSF_L(fr0)
-   REST_32FPRS(0, r5)
+   REST_32FPVSRS(0, r4, r5)
 #ifndef CONFIG_SMP
	subi	r4,r5,THREAD
fromreal(r4)
@@ -96,6 +101,11 @@ _GLOBAL(load_up_fpu)
 _GLOBAL(giveup_fpu)
mfmsr   r5
ori r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
SYNC_601
ISYNC_601
MTMSRD(r5)  /* enable use of fpu now */
@@ -106,7 +116,7 @@ _GLOBAL(giveup_fpu)
	addi	r3,r3,THREAD		/* want THREAD of task */
PPC_LL  r5,PT_REGS(r3)
PPC_LCMPI   0,r5,0
-   SAVE_32FPRS(0, r3)
+	SAVE_32FPVSRS(0, r4, r3)
	mffs	fr0
	stfd	fr0,THREAD_FPSCR(r3)
beq 1f
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -278,6 +278,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
. = 0xf20
b   altivec_unavailable_pSeries
 
+   . = 0xf40
+   b   vsx_unavailable_pSeries
+
 #ifdef CONFIG_CBE_RAS
HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
 #endif /* CONFIG_CBE_RAS */
@@ -297,6 +300,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
/* moved from 0xf00 */
STD_EXCEPTION_PSERIES(., performance_monitor)
STD_EXCEPTION_PSERIES(., altivec_unavailable)
+   STD_EXCEPTION_PSERIES(., vsx_unavailable)
 
 /*
  * An interrupt came in while soft-disabled; clear EE in SRR1,
@@ -834,6 +838,67 @@ _STATIC(load_up_altivec)
blr
 #endif /* CONFIG_ALTIVEC */
 
+   .align  7
+   .globl vsx_unavailable_common
+vsx_unavailable_common:
+   EXCEPTION_PROLOG_COMMON(0xf40, PACA_EXGEN)
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION