[PATCH] mm/hugetlb: sanity check gigantic hugepage count

2014-11-14 Thread James Yang
Limit the number of gigantic hugepages specified by the
hugepages= parameter to MAX_NUMBER_GPAGES.

Signed-off-by: James Yang <james.y...@freescale.com>
---
 arch/powerpc/mm/hugetlbpage.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index d67db4b..5ef84a2 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -362,6 +362,13 @@ static int __init do_gpage_early_setup(char *param, char *val,
 	if (size != 0) {
 		if (sscanf(val, "%lu", &npages) <= 0)
 			npages = 0;
+		if (npages > MAX_NUMBER_GPAGES) {
+			pr_warn("MMU: %lu pages requested for page "
+				"size %llu KB, limiting to "
+				__stringify(MAX_NUMBER_GPAGES) "\n",
+				npages, size / 1024);
+			npages = MAX_NUMBER_GPAGES;
+		}
gpage_npages[shift_to_mmu_psize(__ffs(size))] = npages;
size = 0;
}
-- 
1.7.0.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2] powerpc: Add cpu family documentation

2014-05-14 Thread James Yang
On Wed, 30 Apr 2014, Scott Wood wrote:

 On Wed, 2014-04-30 at 16:45 +1000, Michael Ellerman wrote:
  On Tue, 2014-02-04 at 16:43 -0600, Scott Wood wrote: 
+Motorola/Freescale 8xx
+----------------------
+
+ - Software loaded with hardware assist.
+ - All 32 bit
+
+   +-------+
+   |  8xx  |
+   +-------+
+       |
+       |
+       v
+   +-------+
+   |  850  |
+   +-------+
   
   Is the core of MPC850 different from other MPC8xx?
  
  Dunno, maybe someone who works at Freescale knows ;)
 
 I think they're the same -- I was just wondering if you had some
 difference in mind that led you to single it out.


They are the same.  

There should not be a separate box that singles out 850.  (Still don't 
know why the diagram was drawn to single out 850 in the first place.)

The CPU core should be called MPC8xx Core.  


Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?

2014-02-10 Thread James Yang
On Sun, 9 Feb 2014, Stephen N Chivers wrote:

 James Yang james.y...@freescale.com wrote on 02/08/2014 07:49:40 AM:
 
  From: James Yang james.y...@freescale.com
  To: Gabriel Paubert paub...@iram.es
  Cc: Stephen N Chivers schiv...@csc.com.au, Chris Proctor 
  cproc...@csc.com.au, linuxppc-dev@lists.ozlabs.org
  Date: 02/08/2014 07:49 AM
  Subject: Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?
  
  On Fri, 7 Feb 2014, Gabriel Paubert wrote:
  
  Hi Stephen,
   
   On Fri, Feb 07, 2014 at 11:27:57AM +1000, Stephen N Chivers wrote:
Gabriel Paubert paub...@iram.es wrote on 02/06/2014 07:26:37 PM:

 From: Gabriel Paubert paub...@iram.es
 To: Stephen N Chivers schiv...@csc.com.au
 Cc: linuxppc-dev@lists.ozlabs.org, Chris Proctor 
 cproc...@csc.com.au
 Date: 02/06/2014 07:26 PM
 Subject: Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?
 
 On Thu, Feb 06, 2014 at 12:09:00PM +1000, Stephen N Chivers wrote:
  With the above mask computation I get consistent results for 
  both the MPC8548 and MPC7410 boards.
  
  Am I missing something subtle?
 
 No I think you are correct. This said, this code may probably be optimized
 to eliminate a lot of the conditional branches. I think that:
  
  
  If the compiler is enabled to generate isel instructions, it would not 
  use a conditional branch for this code. (ignore the andi's values, 
  this is an old compile)
  
 From limited research, the 440GP is a processor
 that doesn't implement the isel instruction and it does
 not implement floating point.
 
 The kernel emulates isel and so using that instruction
 for the 440GP would have a double trap penalty.

Are you writing about something outside the scope of this thread?  OP 
was using MPC8548 not a 440GP.  The compiler should not be using or 
targeting 8548 for a 440GP so having to emulate isel shouldn't be an 
issue because there wouldn't be any.  (The assembly listing I posted 
was generated by gcc targeting 8548.)  Anyway, I had measured the 
non-isel routines to be faster and that works without illop traps.

 
 Correct me if I am wrong, the isel instruction first appears
 in PowerPC ISA v2.04 around mid 2007. 


isel appeared in 2003 in the e500 (v1) core that is in the MPC8540.  
The instruction is Power ISA 2.03 (9/2006).



Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?

2014-02-10 Thread James Yang
On Mon, 10 Feb 2014, Gabriel Paubert wrote:

 On Fri, Feb 07, 2014 at 02:49:40PM -0600, James Yang wrote:
  On Fri, 7 Feb 2014, Gabriel Paubert wrote:
  
 Hi Stephen,
   
   On Fri, Feb 07, 2014 at 11:27:57AM +1000, Stephen N Chivers wrote:
Gabriel Paubert paub...@iram.es wrote on 02/06/2014 07:26:37 PM:

 From: Gabriel Paubert paub...@iram.es
 To: Stephen N Chivers schiv...@csc.com.au
 Cc: linuxppc-dev@lists.ozlabs.org, Chris Proctor cproc...@csc.com.au
 Date: 02/06/2014 07:26 PM
 Subject: Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?
 
 On Thu, Feb 06, 2014 at 12:09:00PM +1000, Stephen N Chivers wrote:


  
  
  mask = 0;
  if (FM & (1 << 0))
  	mask |= 0x0000000f;
  if (FM & (1 << 1))
  	mask |= 0x000000f0;
  if (FM & (1 << 2))
  	mask |= 0x00000f00;
  if (FM & (1 << 3))
  	mask |= 0x0000f000;
  if (FM & (1 << 4))
  	mask |= 0x000f0000;
  if (FM & (1 << 5))
  	mask |= 0x00f00000;
  if (FM & (1 << 6))
  	mask |= 0x0f000000;
  if (FM & (1 << 7))
  	mask |= 0x90000000;
  
  With the above mask computation I get consistent results for 
  both the MPC8548 and MPC7410 boards.

 Ok, if you have measured that method1 is faster than method2, let us go
 for it.  I believe method2 would be faster if you had a large out-of-order
 execution window, because more parallelism can be extracted from it, but
 this is probably only true for high end cores, which do not need FPU
 emulation in the first place.

Yeah, 8548 can issue 2 SFX instructions per cycle which is what the 
compiler generated.   

 
 The other place where we can optimize is the generation of FEX. Here is 
 my current patch:
 
 
 diff --git a/arch/powerpc/math-emu/mtfsf.c b/arch/powerpc/math-emu/mtfsf.c
 index dbce92e..b57b3fa8 100644
 --- a/arch/powerpc/math-emu/mtfsf.c
 +++ b/arch/powerpc/math-emu/mtfsf.c
 @@ -11,48 +11,35 @@ mtfsf(unsigned int FM, u32 *frB)
  	u32 mask;
  	u32 fpscr;
  
 -	if (FM == 0)
 +	if (likely(FM == 0xff))
 +		mask = 0xffffffff;
 +	else if (unlikely(FM == 0))
  		return 0;
 -
 -	if (FM == 0xff)
 -		mask = 0x9fffffff;
  	else {
 -		mask = 0;
 -		if (FM & (1 << 0))
 -			mask |= 0x90000000;
 -		if (FM & (1 << 1))
 -			mask |= 0x0f000000;
 -		if (FM & (1 << 2))
 -			mask |= 0x00f00000;
 -		if (FM & (1 << 3))
 -			mask |= 0x000f0000;
 -		if (FM & (1 << 4))
 -			mask |= 0x0000f000;
 -		if (FM & (1 << 5))
 -			mask |= 0x00000f00;
 -		if (FM & (1 << 6))
 -			mask |= 0x000000f0;
 -		if (FM & (1 << 7))
 -			mask |= 0x0000000f;
 +		mask = (FM & 1);
 +		mask |= (FM << 3) & 0x10;
 +		mask |= (FM << 6) & 0x100;
 +		mask |= (FM << 9) & 0x1000;
 +		mask |= (FM << 12) & 0x10000;
 +		mask |= (FM << 15) & 0x100000;
 +		mask |= (FM << 18) & 0x1000000;
 +		mask |= (FM << 21) & 0x10000000;
 +		mask *= 15;


Needs to also mask out bits 1 and 2, they aren't to be set from frB.

	mask &= 0x9FFFFFFF;




   }
  
 -	__FPU_FPSCR &= ~(mask);
 -	__FPU_FPSCR |= (frB[1] & mask);
 +	fpscr = ((__FPU_FPSCR & ~mask) | (frB[1] & mask)) &
 +		~(FPSCR_VX | FPSCR_FEX);
  
 -	__FPU_FPSCR &= ~(FPSCR_VX);
 -	if (__FPU_FPSCR & (FPSCR_VXSNAN | FPSCR_VXISI | FPSCR_VXIDI |
 +	if (fpscr & (FPSCR_VXSNAN | FPSCR_VXISI | FPSCR_VXIDI |
  		      FPSCR_VXZDZ | FPSCR_VXIMZ | FPSCR_VXVC |
  		      FPSCR_VXSOFT | FPSCR_VXSQRT | FPSCR_VXCVI))
 -		__FPU_FPSCR |= FPSCR_VX;
 -
 -	fpscr = __FPU_FPSCR;
 -	fpscr &= ~(FPSCR_FEX);
 -	if (((fpscr & FPSCR_VX) && (fpscr & FPSCR_VE)) ||
 -	    ((fpscr & FPSCR_OX) && (fpscr & FPSCR_OE)) ||
 -	    ((fpscr & FPSCR_UX) && (fpscr & FPSCR_UE)) ||
 -	    ((fpscr & FPSCR_ZX) && (fpscr & FPSCR_ZE)) ||
 -	    ((fpscr & FPSCR_XX) && (fpscr & FPSCR_XE)))
 -		fpscr |= FPSCR_FEX;
 +		fpscr |= FPSCR_VX;
 +
 +	/* The bit order of exception enables and exception status
 +	 * is the same.  Simply shift and mask to check for enabled
 +	 * exceptions.
 +	 */
 +	if (fpscr & (fpscr >> 22) & 0xf8) fpscr |= FPSCR_FEX;
  	__FPU_FPSCR = fpscr;
  
  #ifdef DEBUG
  mtfsf.c |   57 +++++++++++++++++++++++++-----------------------------
  1 file changed, 22 insertions(+), 35 deletions(-)
 
 
 Notes: 
 
 1) I'm really unsure on whether 0xff is frequent or not. So

Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?

2014-02-07 Thread James Yang
On Fri, 7 Feb 2014, Gabriel Paubert wrote:

   Hi Stephen,
 
 On Fri, Feb 07, 2014 at 11:27:57AM +1000, Stephen N Chivers wrote:
  Gabriel Paubert paub...@iram.es wrote on 02/06/2014 07:26:37 PM:
  
   From: Gabriel Paubert paub...@iram.es
   To: Stephen N Chivers schiv...@csc.com.au
   Cc: linuxppc-dev@lists.ozlabs.org, Chris Proctor cproc...@csc.com.au
   Date: 02/06/2014 07:26 PM
   Subject: Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?
   
   On Thu, Feb 06, 2014 at 12:09:00PM +1000, Stephen N Chivers wrote:


mask = 0;
if (FM & (1 << 0))
	mask |= 0x0000000f;
if (FM & (1 << 1))
	mask |= 0x000000f0;
if (FM & (1 << 2))
	mask |= 0x00000f00;
if (FM & (1 << 3))
	mask |= 0x0000f000;
if (FM & (1 << 4))
	mask |= 0x000f0000;
if (FM & (1 << 5))
	mask |= 0x00f00000;
if (FM & (1 << 6))
	mask |= 0x0f000000;
if (FM & (1 << 7))
	mask |= 0x90000000;

With the above mask computation I get consistent results for 
both the MPC8548 and MPC7410 boards.

Am I missing something subtle?
   
   No I think you are correct. This said, this code may probably be optimized
   to eliminate a lot of the conditional branches. I think that:


If the compiler is enabled to generate isel instructions, it would not 
use a conditional branch for this code. (ignore the andi's values, 
this is an old compile)

c0037c2c <mtfsf>:
c0037c2c:   2c 03 00 00 cmpwi   r3,0
c0037c30:   41 82 01 1c beq-    c0037d4c <mtfsf+0x120>
c0037c34:   2f 83 00 ff cmpwi   cr7,r3,255
c0037c38:   41 9e 01 28 beq-    cr7,c0037d60 <mtfsf+0x134>
c0037c3c:   70 66 00 01 andi.   r6,r3,1
c0037c40:   3d 00 90 00 lis     r8,-28672
c0037c44:   7d 20 40 9e iseleq  r9,r0,r8
c0037c48:   70 6a 00 02 andi.   r10,r3,2
c0037c4c:   65 28 0f 00 oris    r8,r9,3840
c0037c50:   7d 29 40 9e iseleq  r9,r9,r8
c0037c54:   70 66 00 04 andi.   r6,r3,4
c0037c58:   65 28 00 f0 oris    r8,r9,240
c0037c5c:   7d 29 40 9e iseleq  r9,r9,r8
c0037c60:   70 6a 00 08 andi.   r10,r3,8
c0037c64:   65 28 00 0f oris    r8,r9,15
c0037c68:   7d 29 40 9e iseleq  r9,r9,r8
c0037c6c:   70 66 00 10 andi.   r6,r3,16
c0037c70:   61 28 f0 00 ori     r8,r9,61440
c0037c74:   7d 29 40 9e iseleq  r9,r9,r8
c0037c78:   70 6a 00 20 andi.   r10,r3,32
c0037c7c:   61 28 0f 00 ori     r8,r9,3840
c0037c80:   54 6a cf fe rlwinm  r10,r3,25,31,31
c0037c84:   7d 29 40 9e iseleq  r9,r9,r8
c0037c88:   2f 8a 00 00 cmpwi   cr7,r10,0
c0037c8c:   70 66 00 40 andi.   r6,r3,64
c0037c90:   61 28 00 f0 ori     r8,r9,240
c0037c94:   7d 29 40 9e iseleq  r9,r9,r8
c0037c98:   41 9e 00 08 beq-    cr7,c0037ca0 <mtfsf+0x74>
c0037c9c:   61 29 00 0f ori     r9,r9,15
...
 
However, your other solutions are better.


   
   mask = (FM & 1);
   mask |= (FM << 3) & 0x10;
   mask |= (FM << 6) & 0x100;
   mask |= (FM << 9) & 0x1000;
   mask |= (FM << 12) & 0x10000;
   mask |= (FM << 15) & 0x100000;
   mask |= (FM << 18) & 0x1000000;
   mask |= (FM << 21) & 0x10000000;
   mask *= 15;
   
   should do the job, in less code space and without a single branch.
   
   Each one of the mask |= lines should be translated into an
   rlwinm instruction followed by an or. Actually it should be possible
   to transform each of these lines into a single rlwimi instruction
   but I don't know how to coerce gcc to reach this level of optimization.
   
   Another way of optimizing this could be:
   
   mask = (FM & 0x0f) | ((FM << 12) & 0x000f0000);
   mask = (mask & 0x00030003) | ((mask << 6) & 0x03030303);
   mask = (mask & 0x01010101) | ((mask << 3) & 0x10101010);
   mask *= 15;
   


 Ok, I finally edited my sources and test compiled the suggestions
 I gave. I'd say that method2 is the best overall indeed. You can
 actually save one more instruction by setting mask to all ones in 
 the case FM=0xff, but that's about all in this area.


My measurements show method1 to be smaller and faster than method2 due 
to the number of instructions needed to generate the constant masks in 
method2, but it may depend upon your compiler and hardware.  Both are 
faster than the original with isel.




Re: [PATCH v2] powerpc: Add cpu family documentation

2014-02-06 Thread James Yang
On Sat, 1 Feb 2014, Michael Ellerman wrote:

 This patch adds some documentation on the different cpu families
 supported by arch/powerpc.
 
 Signed-off-by: Michael Ellerman m...@ellerman.id.au
 ---
 v2: Reworked formatting to avoid wrapping.
 Fixed up Freescale details.
 
 
  Documentation/powerpc/cpu_families.txt | 227 +++++++++++++++++++++++++++++
  1 file changed, 227 insertions(+)
  create mode 100644 Documentation/powerpc/cpu_families.txt
 
 diff --git a/Documentation/powerpc/cpu_families.txt 
 b/Documentation/powerpc/cpu_families.txt
 new file mode 100644
 index 000..fa4f159
 --- /dev/null
 +++ b/Documentation/powerpc/cpu_families.txt
 @@ -0,0 +1,227 @@
 +CPU Families
 +
 +
 +This document tries to summarise some of the different cpu families
 +that exist and are supported by arch/powerpc.

I think there are more CPUs that exist(ed), but aren't supported by 
arch/powerpc.  602, Exponential, and there were a few others.  Did you 
want to include those?

Do the arrows' endpoints have a particular meaning?  

Does the direction of the arrows (left vs. down) have a meaning?

What's the meaning of the box?

Do you want to have all of the minor derivatives or just major 
implementations?  Some derivatives were just faster versions, others 
have microarchitectural changes, and some have additional registers or 
removed interfaces.  

There are a lot of ways to delineate this. 



 +
 +
 +Book3S (aka sPAPR)
 +------------------
 +
 + - Hash MMU
 + - Mix of 32 & 64 bit
 +
 +   +--------------+      +----------------+
 +   |  Old POWER   | ---> | RS64 (threads) |
 +   +--------------+      +----------------+
 +          |
 +          |
 +          v
 +   +--------------+      +----------+      +-------+
 +   |     601      | ---> |   603    | ---> |  740  |
 +   +--------------+      +----------+      +-------+

There was also at least 603e, 603ev, EC603e, and then offshoot 
G2 core, then the offshoot of that e300c1/e300c2/e300c3/e300c4.  I 
might be missing one here.

740 is a 750 but without the L2, so it would probably fit 
better off pointing out of the 750 box. Then there is the 755/745 
which have more BATs, L2 features, etc.  Missing from the IBM side is 
750CXe, 750GX, 750GL, but I don't know the lineage of those.  Do you 
want the Nintendo stuff too?  RAD750?


 +          |                 |
 +          |                 |
 +          v                 v
 +   +--------------+      +----------+      +-------+
 +   |     604      |      | 750 (G3) | ---> | 750CX |
 +   +--------------+      +----------+      +-------+

Also 604e, and I guess 604ev?



 +          |                 |               |
 +          |                 |               |
 +          v                 v               v
 +   +--------------+      +----------+      +-------+
 +   | 620 (64 bit) |      |   7400   |      | 750CL |
 +   +--------------+      +----------+      +-------+
 +          |                 |               |
 +          |                 |               |
 +          v                 v               v
 +   +--------------+      +----------+      +-------+
 +   |  POWER3/630  |      |   7410   |      | 750FX |
 +   +--------------+      +----------+      +-------+


7400/7410 mostly the same thing


 +          |                 |
 +          |                 |
 +          v                 v
 +   +--------------+      +----------+
 +   |   POWER3+    |      |   7450   |
 +   +--------------+      +----------+
 +          |                 |
 +          |                 |
 +          v                 v
 +   +--------------+      +----------+
 +   |    POWER4    |      |   7455   |
 +   +--------------+      +----------+
 +          |                 |
 +          |                 |
 +          v                 v
 +   +--------------+

Re: BookE branch taken behavior vis-a-vis updating the NIP register

2013-11-13 Thread James Yang
On Tue, 12 Nov 2013, pegasus wrote:

 So, off I went and downloaded the latest version of 
 arch/powerpc/kernel/traps.c hoping to see those very changes in 
 them. However it didn't match one on one with what was written in 
 that thread. Ditto for the other files in your patch. Looks like 
 your patch didn't make it to upstream but it looks exactly like what 
 I need here. 

The patches were posted RFC -- request for comment.  Thank you for 
posting your comments.


  So requesting PTRACE_CONT has to happen inside the SIGTRAP signal 
 handler right? 

After you get the SIGTRAP from the branch, yeah.


 Now as for the second patch, as far as I can see, implements the 
 same functionality. However it makes the change permanent and any 
 tool which is used to the NIP pointing to the branch target will be 
 broken.

The second patch places the burden of determining if you are BookE or 
server on the tool.  I feel this is important because the Server and 
Book-E branch exceptions are fundamentally incompatible, and the 
behavior of PTRACE_SINGLEBLOCK can not be made to be identical by the 
kernel for both subarch, so a tool will have to adjust accordingly.


 Anyways, for me either of them will work. But I think the first patch makes
 everyone happy by using the 'addr' field of ptrace. This also means I will
 have to make my (broken) ptrace working which, it seems is not as easy
 adding an enum field as you suggested. May be theres a check somewhere in
 the actual ptrace code which checks for illegal values and hence even after
 adding an enum, it is being reported as illegal in my case. However getting
 that to work is another story.

Actually, I provided the first patch to show why it probably should 
not be done even though technically possible, since it requires ptrace 
API to be extended to now recognize addr field, needs a TIF flag, etc.  
The second patch seems to me more reasonable to support, since nobody 
has come forth to say that there are actually any tools that rely upon 
the existing hack for BookE or even PTRACE_SINGLEBLOCK.  And the 
existing hack's behavior makes things somewhat useless on BookE, as 
you and I have confirmed.

 
 Please confirm my understanding of your patches and since these 
 patches have not made their way to the upstream kernel, will have to 
 use them myself directly. By the way, I'm using 2.6.32.10 (you 
 know..the long-term kernel) and I couldn't find any of your changes 
 in them but then again I couldn't find it in the latest 3.12 version 
 either.


You will have to backport the patches to your kernel if you want to 
use them.  Both patches change the behavior of existing API, and I 
guess we are still waiting to hear if anyone cares which way it should 
be fixed.




Re: BookE branch taken behavior vis-a-vis updating the NIP register

2013-11-11 Thread James Yang
On Mon, 11 Nov 2013, pegasus wrote:

 And this is due to the pipelining feature that is inherent in all 
 processors.

No, it has nothing to do with pipelining at all.  It is just the 
convention that IBM defined as to how the branch taken exception works 
in BookE Power ISA.  The pipeline behavior is not visible from the 
architected state.

 
 But I still have a question about how one would then be able to 
 signal to the userspace who might be interpreting this information 
 differently? I mean if SRR0 contains, not the branch instruction 
 address but the address of the branch target, how would any debugger 
 be able to catch function calls?

On Server and the BookE-mimicking-Server, PT_NIP would point to the 
start of the function you are calling.  However, you do not get the 
callsite address unless you know you stopped after a function-calling 
branch-and-link executed, single step again and read the LR and 
subtract 4.  If you don't know if the branch that caused the exception 
was actually a branch-and-link, you don't know if you are at the 
beginning of a function or not, unless you keep a table of the address 
of all possible functions and look up each PT_NIP to see if you are.  
But that wouldn't be 100% accurate either, since a non-linking branch 
could just redirect to the beginning of the function.  In the other 
thread, Ben said they have the CFAR register, but from the way it is 
described in the ISA document, you will not always get the address of 
the callsite, e.g. if your target is in the same cache block, the CFAR 
might not change.  So, I don't think it can work in a simple way.

On BookE with my patch to NOT mimic server, you will stop before the 
branch executes, so PT_NIP points to the branch itself, so that there 
is your callsite information.  If you want the function target, you 
decode the instruction to see if it is a branch-and-link, and then 
compute the target address or manually singlestep and grab the new 
PT_NIP.


 May be there is a trick involved here and hence gdb or for that 
 matter the other debuggers are still in the market. 

Per the other thread, and my inquiry through my company's gdb experts 
and upstream on the gdb mailing list, gdb does NOT use hardware branch 
tracing, at least not using PTRACE_SINGLEBLOCK.  I don't personally 
know of any debuggers that use it in the form that it exists in Linux.


 But then I would be immensely obliged if you could shed some light 
 on how is this accomplished. Lets say I am waiting at the userspace 
 in my own sigtrap, to watch out for branch instructions. Lets say I 
 want to profile my code to get to know how many branch instructions 
 it has generated. How could I ever do that using my own custom 
 SIGTRAP handler?

Lots of ways to skin this cat, but I don't think any of them are easy 
or simple.

- Using valgrind is probably the easiest way to get this information 
on an unmodified executable.

- You may want to recompile your program with call-arc profiling if 
you have source code:  gcc -fprofile-arcs, but read the docs.

- Another method is to set up performance monitor counter to interrupt 
after a branch has executed, but callsite information may be lost as 
well and you still have the issue of discerning whether you just 
called a function or not.  

- You can use my patch for BookE, but it's just an RFC.  Also, using 
PTRACE_SINGLEBLOCK is slow.  You could also stop on every 
PTRACE_SINGLESTEP, but this is even slower.



 Coming on to PTRACE_SINGLESTEP, the sysroot that has been provided 
 to us by our vendor does not include a PTRACE_SINGLEBLOCK in 
 sys/ptrace.h:

 Although I can clearly see that PTRACE_SINGLEBLOCK is supported in the
 kernel. 
 
 Hence I am not able to compile this simple program in userspace:

 Heres the error I get:
 testptrace.c: In function 'int main()':
 testptrace.c:47: error: invalid conversion from 'int' to '__ptrace_request'
 testptrace.c:47: error:   initializing argument 1 of 'long int
 ptrace(__ptrace_request, ...)'
 make: *** [testptrace] Error 1
 
 How should I go about using ptrace to test this? 

I don't know. You'll probably have to request support from your vendor 
or whoever provided you with those headers.  If you are using a glibc 
based toolchain, your glibc is probably out-of-date. You can try 
adding PTRACE_SINGLEBLOCK into the enum definition just to get past 
the compiler syntax error, but I don't know if that will break things 
or not in your libc ptrace().  You may have to write your own function 
that calls the __ptrace syscall directly if the libc ptrace() does 
something to the request or the response.




Re: BookE branch taken behavior vis-a-vis updating the NIP register

2013-11-10 Thread James Yang
On Fri, 8 Nov 2013, pegasus wrote:

 Hello.
 
 I was reading the source code for the debug exception under powerpc. I saw
 that BookE processors stop before (actually) taking a branch. Hence in order
 to force it to take that branch and then stop, the source code for it had to
 be 'hacked' to (temporarily) enable single step until the branch
 instruction has been taken, thereby mimicking the BookS behavior. 

Have a look at this thread:

https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-July/108618.html


 By doing this, I believe we would want the exception to be triggered after
 the branch has been successfully taken. Hence I put a printk to print the
 value of the instruction that actually caused the exception. I was assuming
 that initially a debug breakpoint would be hit on the branch instruction
 (assuming the branch is supposed to be taken). Now since at this point in
 time, the branch instruction has NOT finished, the kernel, after merely
 disabling the BT bit in DBCR0 and enabling IC bit in DBCR0, returns. At this
 point I was assuming we will see another exception at the very same
 instruction in userspace. However, when printing the NIP it becomes clear
 that when it gets to the debug exception handler after being (temporarily)
 set to single step, NIP points to the instruction after the branch
 instruction. To me, it appears that, after disabling BT (branch taken) debug
 event monitoring (and enabling single stepping), it does not catch an
 exception at that very same branch instruction, instead it catches an
 exception for the subsequent instruction. Sorry for the repetition but I
 wanted to clarify what I am seeing here.

 May be this IS the way it is supposed to behave (which means my thinking
 about it is flawed). I am a bit confused here. 

You have it correct, that is the behavior of what is there.  The 
Server branch execute debug exception occurs after the branch 
completes, and the SRR0 (the value in PT_NIP) points to the address of 
the instruction that is to be executed after the branch.  BookE branch 
exception occurs before the branch is executed, but only if the branch 
will be taken.  The hack tries to make BookE return an exception to 
the process with the same timing as Server: exception signal sent to 
process after the branch completes.

I think some of the confusion is due to the existence of the hack.  
Its existence implies that BookE can perfectly emulate Server, but 
this is not possible because BookE would only take the exception if 
the branch's condition would cause the branch to be actually taken 
(that the branch condition, if present, is true). Therefore, even with 
this hack, untaken branches are not signaled at all on BookE while 
they would on Server.

Note that PTRACE_SINGLESTEP works the same on both Server and BookE. 
The exception occurs after an instruction completes, and the SRR0 / 
NIP points to the instruction that will be executed next.




RE: [PATCH] [RFC] Emulate lwsync to run standard user land on e500 cores

2013-10-25 Thread James Yang
On Fri, 25 Oct 2013, David Laight wrote:

  This is not a distro issue. It's a libstdc++ portability issue. libstdc++
  hardcodes lwsync unless __NO_LWSYNC__ is explicitly defined,
  which you only get with -mcpu=8540/-mcpu=8548.  When compiled
  for any powerpc target other than -mcpu=8540/-mcpu=8548, including
  the default -mcpu=common,  libstdc++ will end up containing lwsync.
  There is no way to explicitly request libstdc++ to be built without lwsync
  with an -mcpu target other than 8540/8548.
  
  The issue is easily demonstrated by running a program that throws a
  C++ exception: __cxa_throw() is called, which has an lwsync.  This
  results in an illegal instruction exception when run on an e500v1/e500v2.
 
 Perhaps libstdc++ should be working out at run time whether lwsync is 
 valid?

lwsync has been in libstdc++ for over 8 years so there's a substantial 
amount of legacy binary code that exists such that changing libstdc++ 
now won't solve the problem.

This isn't a performance issue, it's a functional issue -- libstdc++ 
for -mcpu=common targets doesn't work on e500v1/e500v2. (QorIQ P4080 
that have e500mc and newer cores support lwsync so this is no longer 
an issue.)

If one /really/ cared about losing performance while running common 
target binaries, emulating lwsync is an inconsequential performance 
loss compared to the kernel doing floating point emulation.

Recompiling libstdc++ (and all your other userland) with -mcpu=8548 is 
what you'd have to do to avoid classic FP and use Embedded FP or SPE, 
and if you are doing that you'll get sync instead of lwsync in 
libstdc++.




Re: [PATCH] [RFC] Emulate lwsync to run standard user land on e500 cores

2013-10-25 Thread James Yang
On Fri, 25 Oct 2013, Scott Wood wrote:

 Has anyone measured how much this slows things down with a typical
 userspace?

Not a measurement of the patch in question but an older similar patch 
on 3.0.51 (8572 running Debian 5.0.3):

$ ./test-lwsync 
cycles per emulated lwsync = 371
cycles per sync = 36

--
#include <stdio.h>

int main (void) {
	unsigned long atb_start, atb_stop;
	unsigned long i;

	asm volatile ("mfspr %0,526" : "=r" (atb_start));
	for (i = 0; i < 1000000; i++) {
		asm volatile ("lwsync");
	}
	asm volatile ("mfspr %0,526" : "=r" (atb_stop));
	printf("cycles per emulated lwsync = %lu\n",
	       (atb_stop - atb_start) / 1000000);

	asm volatile ("mfspr %0,526" : "=r" (atb_start));
	for (i = 0; i < 1000000; i++) {
		asm volatile ("sync");
	}
	asm volatile ("mfspr %0,526" : "=r" (atb_stop));

	printf("cycles per sync = %lu\n",
	       (atb_stop - atb_start) / 1000000);

	return 0;
}
--







Re: [PATCH] [RFC] Emulate lwsync to run standard user land on e500 cores

2013-10-24 Thread James Yang
On Thu, 24 Oct 2013, Kumar Gala wrote:

 On Oct 24, 2013, at 4:45 AM, Benjamin Herrenschmidt wrote:
 
  On Wed, 2013-10-23 at 23:06 -0500, Kumar Gala wrote:
  On Oct 23, 2013, at 5:15 AM, Scott Wood wrote:
  
  On Wed, 2013-10-23 at 00:07 -0500, Kumar Gala wrote:
  On Oct 18, 2013, at 2:38 AM, Wolfgang Denk wrote:
  diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
  index f783c93..f330374 100644
  --- a/arch/powerpc/kernel/traps.c
  +++ b/arch/powerpc/kernel/traps.c
   @@ -986,6 +986,13 @@ static int emulate_instruction(struct pt_regs *regs)
   		return 0;
   	}
   
   +	/* Emulating the lwsync insn as a sync insn */
   +	if (instword == PPC_INST_LWSYNC) {
   +		PPC_WARN_EMULATED(lwsync, regs);
   +		asm volatile("sync" : : : "memory");
  
  Do we really need the inline asm?  Doesn't the fact of just taking an 
  exception and returning from it equate to a sync.
  
  No, it doesn't equate to a sync.  See the discussion here:
  http://patchwork.ozlabs.org/patch/256747/
  
  
  Thanks. 
  
  I'm not sure I'm a fan of doing this as it silently hides a 
  significant performance impact.
  
  Could we possible re-write the userspace instruction to be a 
  'sync' when we hit this?
  
  Rewriting user space is a can of worms I wouldn't get into ... is 
  any other arch doing it ?
 
 Fair enough
  
  I'm not too worried as long as we warn and account them.
 
 Than, I'd ask this be under a Kconfig option that is disabled by 
 default.  Users should have to explicitly enable this so they know 
 what they are doing.


I think it should be enabled by default, rather than disabled, so that 
users would actually see a warning rather than get a sig 4.  Or, let 
it not be Kconfig-able so that this doesn't become a problem any more. 
(It's been 4 years since I sent to you an earlier version of this 
patch.)



Re: [PATCH] powerpc/ppc64: remove __volatile__ in get_current()

2013-08-23 Thread James Yang
On Sat, 10 Aug 2013, James Yang wrote:

 Uses of get_current() that normally get optimized away still result in
 a load instruction of the current pointer in 64-bit because the inline
 asm uses __volatile__.  This patch removes __volatile__ so that nop-ed
 uses of get_current() don't actually result in a load of the pointer.
 
 Signed-off-by: James Yang <james.y...@freescale.com>
 ---
  arch/powerpc/include/asm/current.h |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/current.h 
 b/arch/powerpc/include/asm/current.h
 index e2c7f06..bb250c8 100644
 --- a/arch/powerpc/include/asm/current.h
 +++ b/arch/powerpc/include/asm/current.h
 @@ -19,7 +19,7 @@ static inline struct task_struct *get_current(void)
  {
   struct task_struct *task;
  
 -	__asm__ __volatile__("ld %0,%1(13)"
 +	__asm__ ("ld %0,%1(13)"
  	: "=r" (task)
  	: "i" (offsetof(struct paca_struct, __current)));


Hello, 

Scott's been able to put enough doubt in me to think that this is not 
entirely safe, even though the testing and code generation show it to 
work.  Please reject this patch.

I think there is still value in getting the unnecessary loads to be 
removed since it would also allow unnecessary conditional branches to 
be removed.  I'll think about alternate ways to do this.

Regards,

--James



[PATCH] powerpc/ppc64: remove __volatile__ in get_current()

2013-08-09 Thread James Yang
Uses of get_current() that normally get optimized away still result in
a load instruction of the current pointer in 64-bit because the inline
asm uses __volatile__.  This patch removes __volatile__ so that nop-ed
uses of get_current() don't actually result in a load of the pointer.

Signed-off-by: James Yang james.y...@freescale.com
---
 arch/powerpc/include/asm/current.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/current.h 
b/arch/powerpc/include/asm/current.h
index e2c7f06..bb250c8 100644
--- a/arch/powerpc/include/asm/current.h
+++ b/arch/powerpc/include/asm/current.h
@@ -19,7 +19,7 @@ static inline struct task_struct *get_current(void)
 {
struct task_struct *task;
 
-   __asm__ __volatile__("ld %0,%1(13)"
+   __asm__ ("ld %0,%1(13)"
    : "=r" (task)
    : "i" (offsetof(struct paca_struct, __current)));
 
-- 
1.7.0.4
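The rationale in the commit message — that an asm statement without
__volatile__ can be optimized away when its result is unused — can be
illustrated with a small, architecture-neutral userspace sketch.  The
helper names are hypothetical and the empty asm template stands in for
the "ld %0,%1(13)" above; only the constraint shape matches get_current():

```c
#include <assert.h>

/* Without __volatile__ the compiler may delete the asm entirely when
 * the caller never uses the result (the "nop-ed get_current()" case
 * the patch targets). */
static int load_current(int paca_current)
{
	int task;
	__asm__ ("" : "=r" (task) : "0" (paca_current));
	return task;
}

/* With __volatile__ the asm must be emitted even when the result is
 * discarded, which is why 64-bit builds kept the redundant load. */
static int load_current_volatile(int paca_current)
{
	int task;
	__asm__ __volatile__ ("" : "=r" (task) : "0" (paca_current));
	return task;
}
```

When the return value is actually used, both variants behave
identically; the difference only shows up in the generated code (e.g.
compare "gcc -O2 -S" output of a caller that ignores the result).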




[PATCH] powerpc/booke: clear DBCR0_BT in user_disable_single_step()

2013-07-05 Thread James Yang
BookE version of user_disable_single_step() clears DBCR0_IC for the
instruction completion debug, but did not also clear DBCR0_BT for the
branch taken exception.  This behavior was lost by the 2/2010 patch.

Signed-off-by: James Yang james.y...@freescale.com
---
 arch/powerpc/kernel/ptrace.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 98c2fc1..c634970 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -894,7 +894,7 @@ void user_disable_single_step(struct task_struct *task)
 * And, after doing so, if all debug flags are off, turn
 * off DBCR0(IDM) and MSR(DE)  Torez
 */
-   task->thread.dbcr0 &= ~DBCR0_IC;
+   task->thread.dbcr0 &= ~(DBCR0_IC|DBCR0_BT);
/*
 * Test to see if any of the DBCR_ACTIVE_EVENTS bits are set.
 */
-- 
1.7.0.4




[RFC][PATCH 0/2] powerpc/booke: PTRACE_SINGLEBLOCK support for BookE

2013-07-05 Thread James Yang
PTRACE_SINGLEBLOCK support for BookE currently stops on the instruction
after taken branches.  This is different from the behavior on Server
where it stops after all branches.  

BookE was made to simulate Server by taking a single step after the
branch taken exception.  It is understood that the reason for making
PTRACE_SINGLEBLOCK on BookE simulate Server was to make the semantics
exposed to user space identical on both, but this is not really
possible due to the fundamental difference that untaken branches do not
trigger the branch taken exception in BookE.

BookE ISA's branch taken exception triggers before a branch that will be
taken executes.  This allows software to examine the branch and the
conditions under which it will be taken.  It also means software can
tell where basic blocks end (at least the ones which are terminated by
taken branches).  There are no architected registers that report the
address of the branch instruction after it has executed.

Server's branch trace exception triggers after a branch executes
regardless of whether or not it was taken.  The exception stops on the
instruction after fall-through branches.
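
To make the distinction concrete, the "taken" decision for a Power
conditional branch (bc) can be sketched in userspace from the BO-field
rules of the Power ISA.  This is a hedged illustration — the helper name
and register representation are made up — but it shows that "taken" is
resolved from CR/CTR state before the branch executes, and that the
false (fall-through) case is exactly what BookE's exception never sees:

```c
#include <assert.h>

/* Power ISA conditional-branch resolution, BO bits numbered MSB-first
 * (BO[0] = 0x10 ... BO[4] = 0x01).  Hypothetical helper:
 *
 *   ctr_ok  = BO[2] | ((CTR != 0) ^ BO[3])   (CTR decremented if !BO[2])
 *   cond_ok = BO[0] | (CR[BI] == BO[1])
 *   taken   = ctr_ok & cond_ok
 */
static int branch_taken(unsigned bo, int cr_bit, unsigned long *ctr)
{
	int ctr_ok, cond_ok;

	if (!(bo & 0x04))	/* BO[2] == 0: decrement CTR first */
		--*ctr;
	ctr_ok  = (bo & 0x04) || ((*ctr != 0) ^ ((bo >> 1) & 1));
	cond_ok = (bo & 0x10) || (cr_bit == ((bo >> 3) & 1));
	return ctr_ok && cond_ok;
}
```

BookE raises the branch taken exception only when this predicate is
true; Server's trace exception fires after the bc either way, which is
why only Server can stop on fall-through branches.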

Two mutually-exclusive patches are provided for RFC that expose BookE's
branch taken debug exception behavior accessible through
PTRACE_SINGLEBLOCK:  

- The first patch keeps the semantic behavior of the existing support by
  using the ptrace() addr parameter to select between the modes.  This
  requires a new bit in the TIF as well as changes in kernel/ptrace.c.

- The second patch makes PTRACE_SINGLEBLOCK reflect the BookE native
  behavior, which stops on the branch instruction.  The changes are
  isolated to arch/powerpc/kernel/traps.c.

IMHO, the only reason not to do the 2nd patch would be to maintain
compatibility for any tools that actually rely on the inaccurate
simulation of Server's behavior when run on a BookE system.  Are there
any tools that actually rely upon the behavior currently implemented for
BookE in Linux -- SIGTRAP only after taken branches?  Even if there are,
it should be possible to modify such a tool to issue a PTRACE_SINGLESTEP
after receiving the SIGTRAP on the branch to retain equivalent
functionality.



[RFC][PATCH 2/2] powerpc/booke: revert PTRACE_SINGLEBLOCK to BookE behavior

2013-07-05 Thread James Yang
A BookE branch taken debug exception followed by a single step does not
accurately simulate Server's branch execute debug exception.  BookE's
branch taken debug exception stops before the branch is to be executed
and only happens if the branch will actually be taken.  Server's branch
execute trace exception stops on the instruction after the branch
executes, regardless of whether or not the branch redirected the program
counter.

The existing PTRACE_SINGLEBLOCK support for BookE hardcodes a single
step after the branch taken exception is taken in order to simulate
Server's behavior, but this misses fall-through branch instructions
(i.e., branches that are NOT taken).  Also, the si_code became masked as
TRAP_TRACE instead of TRAP_BRANCH.

This patch provides native support for the BookE branch taken debug
exception's behavior:  PTRACE_SINGLEBLOCK stops with a SIGTRAP before a
branch-that-would-be-taken would execute.  Userspace software will be
able to examine the process state upon catching the SIGTRAP, and it
will need to issue a PTRACE_SINGLESTEP or PTRACE_CONT to resume program
execution past the branch.

Signed-off-by: James Yang james.y...@freescale.com
---
 arch/powerpc/kernel/traps.c |   40 +++-
 1 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index c3ceaa2..5837d7f 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1518,12 +1518,21 @@ void __kprobes DebugException(struct pt_regs *regs, 
unsigned long debug_status)
 {
current->thread.dbsr = debug_status;
 
-   /* Hack alert: On BookE, Branch Taken stops on the branch itself, while
-* on server, it stops on the target of the branch. In order to simulate
-* the server behaviour, we thus restart right away with a single step
-* instead of stopping here when hitting a BT
+   /* BookE's Branch Taken Debug Exception stops on the branch iff the
+* branch is resolved to be taken.  No exception occurs if the branch
+* is not taken (no exception if the branch does not redirect the PC).
+* This is unlike Classic/Server's behavior where the exception occurs
+* after the branch executes, regardless of whether or not the branch
+* redirected the PC.
+*
+* The past behavior of this function was to simulate Classic/Server's
+* behavior by performing a single-step upon a branch taken exception.
+* However, the simulation is not accurate because fall-through non-
+* taken branches would not result in a SIGTRAP.  Also, that SIGTRAP's
+* si_code would be reported as a TRAP_TRACE instead of a TRAP_BRANCH.
 */
-   if (debug_status & DBSR_BT) {
+
+   if (debug_status & DBSR_BT) {   /* Branch Taken */
regs->msr &= ~MSR_DE;
 
/* Disable BT */
@@ -1531,20 +1540,25 @@ void __kprobes DebugException(struct pt_regs *regs, 
unsigned long debug_status)
/* Clear the BT event */
mtspr(SPRN_DBSR, DBSR_BT);
 
-   /* Do the single step trick only when coming from userspace */
-   if (user_mode(regs)) {
-   current->thread.dbcr0 &= ~DBCR0_BT;
-   current->thread.dbcr0 |= DBCR0_IDM | DBCR0_IC;
-   regs->msr |= MSR_DE;
-   return;
-   }
-
if (notify_die(DIE_SSTEP, "block_step", regs, 5,
   5, SIGTRAP) == NOTIFY_STOP) {
return;
}
+
if (debugger_sstep(regs))
return;
+
+   if (user_mode(regs)) {
+   current->thread.dbcr0 &= ~DBCR0_BT;
+   if (DBCR_ACTIVE_EVENTS(current->thread.dbcr0,
+  current->thread.dbcr1))
+   regs->msr |= MSR_DE;
+   else
+   /* Make sure the IDM bit is off */
+   current->thread.dbcr0 &= ~DBCR0_IDM;
+   }
+
+   _exception(SIGTRAP, regs, TRAP_BRANCH, regs->nip);
} else if (debug_status & DBSR_IC) {/* Instruction complete */
regs->msr &= ~MSR_DE;
 
-- 
1.7.0.4




[RFC][PATCH 1/2] powerpc/booke: extend PTRACE_SINGLEBLOCK for BookE Branch Taken Debug

2013-07-05 Thread James Yang
A BookE branch taken debug exception followed by a single step does not
accurately simulate Server's branch execute debug exception.  BookE's
branch taken debug exception stops before the branch is to be executed
and only happens if the branch will actually be taken.  Server's branch
execute trace exception stops on the instruction after the branch
executes, regardless of whether or not the branch redirected the program
counter.

The existing PTRACE_SINGLEBLOCK support for BookE hardcodes a single
step after the branch taken exception is taken in order to simulate
Server's behavior, but this misses fall-through branch instructions
(i.e., branches that are NOT taken).  Also, the si_code became masked as
TRAP_TRACE instead of TRAP_BRANCH.

This patch makes available the unmodified BookE branch taken debug
exception through PTRACE_SINGLEBLOCK if the ptrace() addr parameter is
set to 2.  (The existing behavior of PTRACE_SINGLEBLOCK is retained for
any other addr parameter value, e.g., 0.)  SIGTRAP will be signaled with
the NIP pointing to the branch instruction before it has executed.  The
ptrace-calling program can then examine the program state.  It should
then request a PTRACE_SINGLESTEP in order to advance the program to the
next instruction or a PTRACE_CONT to resume normal program execution.
The si_code now also reports TRAP_BRANCH.

Signed-off-by: James Yang james.y...@freescale.com
---
 arch/powerpc/include/asm/thread_info.h |3 ++
 arch/powerpc/kernel/traps.c|   51 ---
 kernel/ptrace.c|   13 ++--
 3 files changed, 52 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index ba7b197..ab7c257 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -107,6 +107,8 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_EMULATE_STACK_STORE 16  /* Is an instruction emulation
for stack store? */
 #define TIF_MEMDIE 17  /* is terminating due to OOM killer */
+#define TIF_BOOKE_SINGLEBLOCK  18  /* Do not advance PC after Branch Taken
+  debug exception for BookE */
 
 /* as above, but as bit values */
#define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE)
@@ -125,6 +127,7 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_UPROBE (1<<TIF_UPROBE)
#define _TIF_SYSCALL_TRACEPOINT (1<<TIF_SYSCALL_TRACEPOINT)
#define _TIF_EMULATE_STACK_STORE   (1<<TIF_EMULATE_STACK_STORE)
+#define _TIF_BOOKE_SINGLEBLOCK (1<<TIF_BOOKE_SINGLEBLOCK)
#define _TIF_NOHZ  (1<<TIF_NOHZ)
 #define _TIF_SYSCALL_T_OR_A(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index c3ceaa2..ee899b3 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1518,11 +1518,14 @@ void __kprobes DebugException(struct pt_regs *regs, 
unsigned long debug_status)
 {
current->thread.dbsr = debug_status;
 
-   /* Hack alert: On BookE, Branch Taken stops on the branch itself, while
-* on server, it stops on the target of the branch. In order to simulate
-* the server behaviour, we thus restart right away with a single step
-* instead of stopping here when hitting a BT
+   /* Hack alert: On BookE, Branch Taken stops on the branch itself iff the
+* branch will be taken, while on server, it stops on the target of the
+* branch, regardless of whether or not the branch was taken. In order
+* to simulate the server behaviour (at least for taken branches), we
+* thus restart right away with a single step instead of stopping here
+* when hitting a BT.
 */
+
if (debug_status & DBSR_BT) {
regs->msr &= ~MSR_DE;
 
@@ -1531,20 +1534,44 @@ void __kprobes DebugException(struct pt_regs *regs, 
unsigned long debug_status)
/* Clear the BT event */
mtspr(SPRN_DBSR, DBSR_BT);
 
-   /* Do the single step trick only when coming from userspace */
-   if (user_mode(regs)) {
-   current->thread.dbcr0 &= ~DBCR0_BT;
-   current->thread.dbcr0 |= DBCR0_IDM | DBCR0_IC;
-   regs->msr |= MSR_DE;
-   return;
-   }
-
if (notify_die(DIE_SSTEP, "block_step", regs, 5,
   5, SIGTRAP) == NOTIFY_STOP) {
return;
}
+
if (debugger_sstep(regs))
return;
+
+   if (user_mode(regs)) {
+
+   /* Do the single step trick only when coming from
+* userspace

Re: [RFC][PATCH 2/2] powerpc/booke: revert PTRACE_SINGLEBLOCK to BookE behavior

2013-07-05 Thread James Yang
On Sat, 6 Jul 2013, Benjamin Herrenschmidt wrote:

 On Fri, 2013-07-05 at 17:11 -0500, James Yang wrote:
  A BookE branch taken debug exception followed by a single step does not
  accurately simulate Server's branch execute debug exception.  BookE's
  branch taken debug exception stops before the branch is to be executed
  and only happens if the branch will actually be taken.  Server's branch
  execute trace exception stops on the instruction after the branch
  executes, regardless of whether or not the branch redirected the program
  counter.
  
  The existing PTRACE_SINGLEBLOCK support for BookE hardcodes a single
  step after the branch taken exception is taken in order to simulate
  Server's behavior, but this misses fall-through branch instructions
  (i.e., branches that are NOT taken).  Also, the si_code became masked as
  TRAP_TRACE instead of TRAP_BRANCH.
 
 But that changes the user visible behaviour, won't that break gdb
 expectations ?

I am having a difficult time finding any use of PTRACE_SINGLEBLOCK in 
any arch in the various versions of gdb source trees I downloaded.  
Would you please provide a pointer or at least a hint as to where you 
think it would be?  I don't know gdb internals at all, but grepping 
the sources for PTRACE_SINGLEBLOCK returns nothing.


 Another way to fix it is to instead use lib/sstep.c to emulate the
 single step maybe ?

I don't think there's any issue any more with the hard coded single 
step with the fixes that Scott and Bharat recently provided.  
Actually, I don't even know if this feature was practically useful on 
BookE before those fixes due to the hangs and non-deterministic 
behavior.

Hypothetically, sstep.c could let Server emulate BookE's behavior.


 On the other hand, I tend to think that trapping before the branch is
 actually more useful especially if you don't have the CFAR register.

And there's no exception for fall-through branches.

So, yeah, this really is the question:  are there actually any tools 
that rely on PTRACE_SINGLEBLOCK behaving in the different ways it 
currently does on the two Power subarchitectures?  For BookE targets, 
how do they handle not being able to catch fall-through branches?  
For Server targets, how do they capture the old LR value for blrl 
after the branch?  (I'm guessing not even sstep emulation can provide 
this information, though it might not be practically useful.)



[PATCH] powerpc/math-emu: fix load/store indexed emulation

2013-07-04 Thread James Yang
Load/store indexed instructions where the index register RA=R0, such
as "lfdx f1,0,r3", are not illegal.

Load/store indexed with update instructions where the index register
RA=R0, such as "lfdux f1,0,r3", are invalid, and, to be consistent
with existing math-emu behavior for other invalid instruction forms,
will signal as illegal.
---
 arch/powerpc/math-emu/math.c |   15 +--
 1 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/math-emu/math.c b/arch/powerpc/math-emu/math.c
index 164d559..eabce90 100644
--- a/arch/powerpc/math-emu/math.c
+++ b/arch/powerpc/math-emu/math.c
@@ -410,21 +410,16 @@ do_mathemu(struct pt_regs *regs)
case XE:
idx = (insn >> 16) & 0x1f;
op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
-   if (!idx) {
-   if (((insn >> 1) & 0x3ff) == STFIWX)
-   op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
-   else
-   goto illegal;
-   } else {
-   op1 = (void *)(regs->gpr[idx] + regs->gpr[(insn >> 11) & 0x1f]);
-   }
-
+   op1 = (void *)((idx ? regs->gpr[idx] : 0)
+   + regs->gpr[(insn >> 11) & 0x1f]);
break;

case XEU:
idx = (insn >> 16) & 0x1f;
+   if (!idx)
+   goto illegal;
op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
-   op1 = (void *)((idx ? regs->gpr[idx] : 0)
+   op1 = (void *)(regs->gpr[idx]
+ regs->gpr[(insn >> 11) & 0x1f]);
break;
 
-- 
1.7.0.4




[PATCH v2] powerpc/math-emu: fix load/store indexed emulation

2013-07-04 Thread James Yang
Load/store indexed instructions where the index register RA=R0, such
as "lfdx f1,0,r3", are not illegal.

Load/store indexed with update instructions where the index register
RA=R0, such as "lfdux f1,0,r3", are invalid, and, to be consistent
with existing math-emu behavior for other invalid instruction forms,
will signal as illegal.

Signed-off-by: James Yang james.y...@freescale.com
---
 arch/powerpc/math-emu/math.c |   15 +--
 1 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/math-emu/math.c b/arch/powerpc/math-emu/math.c
index 164d559..eabce90 100644
--- a/arch/powerpc/math-emu/math.c
+++ b/arch/powerpc/math-emu/math.c
@@ -410,21 +410,16 @@ do_mathemu(struct pt_regs *regs)
case XE:
idx = (insn >> 16) & 0x1f;
op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
-   if (!idx) {
-   if (((insn >> 1) & 0x3ff) == STFIWX)
-   op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
-   else
-   goto illegal;
-   } else {
-   op1 = (void *)(regs->gpr[idx] + regs->gpr[(insn >> 11) & 0x1f]);
-   }
-
+   op1 = (void *)((idx ? regs->gpr[idx] : 0)
+   + regs->gpr[(insn >> 11) & 0x1f]);
break;

case XEU:
idx = (insn >> 16) & 0x1f;
+   if (!idx)
+   goto illegal;
op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
-   op1 = (void *)((idx ? regs->gpr[idx] : 0)
+   op1 = (void *)(regs->gpr[idx]
+ regs->gpr[(insn >> 11) & 0x1f]);
break;
 
-- 
1.7.0.4
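
The form-validity rule the patch enforces can be sketched in isolation.
This is a hypothetical helper, not kernel code; the field extraction
uses the same shifts and masks as do_mathemu() above.  For the indexed
(X) form, RA=0 means a base of zero, while for the update (XEU) form
RA=0 is an invalid form because there is no register to write back:

```c
#include <stdint.h>

/* Hypothetical sketch of indexed-form EA computation for lfdx/lfdux.
 * RA is insn bits 16..20, RB is bits 11..15, as in math.c. */
static uint64_t ea_indexed(uint32_t insn, const uint64_t *gpr,
			   int update_form, int *invalid)
{
	uint32_t ra = (insn >> 16) & 0x1f;
	uint32_t rb = (insn >> 11) & 0x1f;

	/* lfdux with RA=0 is an invalid form: no register to update */
	*invalid = update_form && ra == 0;

	/* lfdx with RA=0 uses 0 as the base, not GPR[0] */
	return (ra ? gpr[ra] : 0) + gpr[rb];
}
```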




Re: [PATCH] Emulate sync instruction variants

2013-07-04 Thread James Yang
On Thu, 4 Jul 2013, Benjamin Herrenschmidt wrote:

 On Thu, 2013-07-04 at 09:31 +0100, David Laight wrote:
  Do you need to execute 'sync' here?
  It is worth checking whether the trap entry/exit doesn't
  do an implicit one for you.
 
 Not really. It does an implicit isync (more than one even) but not a 
 sync.

The execution of a sync is required because the original sync variant 
instruction is not actually executed, and, per Ben, no other explicit 
sync exists in the exception handler.  If the ISA extends the sync 
instruction using its reserved bits, the intent would also be that 
executing a heavyweight sync is suitably compatible, otherwise they 
should have modified some other synchronization instruction.  This 
patch ensures at least one sync instruction is executed for each sync 
variant instruction executed that triggers a program exception.
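
For reference, the decode such an emulation path would do can be
sketched as follows.  This is a hedged illustration (helper name made
up); the field positions follow the Power ISA encoding of sync: primary
opcode 31, extended opcode 598, with the variant selector in the L
field.  A reserved-bit extension of sync would still match this
template, which is why falling back to a full heavyweight sync is a
compatible way to emulate an unrecognized variant:

```c
#include <stdint.h>

/* Hypothetical sketch: recognize sync-family instructions and extract
 * the L field that selects the variant (0 = hwsync, 1 = lwsync, ...). */
static int is_sync_variant(uint32_t insn, unsigned *l_field)
{
	if ((insn >> 26) != 31 || ((insn >> 1) & 0x3ff) != 598)
		return 0;		/* not a sync at all */
	*l_field = (insn >> 21) & 0x3;	/* variant selector */
	return 1;
}
```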



[PATCH] Fix string emulation for 32-bit process on ppc64

2013-06-25 Thread James Yang
String instruction emulation would erroneously result in a segfault if
the upper bits of the EA are set such that the address is high enough to
fail the access check.  Truncate the EA to 32 bits if the process is
32-bit.

Signed-off-by: James Yang james.y...@freescale.com
---
 arch/powerpc/kernel/traps.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index dce1bea..c72e7e9 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -840,6 +840,10 @@ static int emulate_string_inst(struct pt_regs *regs, u32 
instword)
u8 val;
u32 shift = 8 * (3 - (pos & 0x3));
 
+   /* if process is 32-bit, clear upper 32 bits of EA */
+   if ((regs->msr & MSR_64BIT) == 0)
+   EA &= 0xffffffff;
+
switch ((instword & PPC_INST_STRING_MASK)) {
case PPC_INST_LSWX:
case PPC_INST_LSWI:
-- 
1.7.0.4
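
The masking above can be sketched as a standalone helper (hypothetical
name), mirroring what the patch does before the string load/store is
emulated:

```c
#include <stdint.h>

/* Hypothetical sketch: a 32-bit task's effective-address arithmetic
 * wraps modulo 2^32, so the emulation must discard the upper 32 bits
 * before performing the access check, as the patch above does. */
static uint64_t string_inst_ea(uint64_t ea, int msr_64bit)
{
	if (!msr_64bit)
		ea &= 0xffffffffULL;	/* clear upper 32 bits of EA */
	return ea;
}
```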

