[PATCH] mm/hugetlb: sanity check gigantic hugepage count
Limit the number of gigantic hugepages specified by the hugepages= parameter to MAX_NUMBER_GPAGES.

Signed-off-by: James Yang james.y...@freescale.com
---
 arch/powerpc/mm/hugetlbpage.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index d67db4b..5ef84a2 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -362,6 +362,13 @@ static int __init do_gpage_early_setup(char *param, char *val,
 	if (size != 0) {
 		if (sscanf(val, "%lu", &npages) <= 0)
 			npages = 0;
+		if (npages > MAX_NUMBER_GPAGES) {
+			pr_warn("MMU: %lu pages requested for page "
+				"size %llu KB, limiting to "
+				__stringify(MAX_NUMBER_GPAGES) "\n",
+				npages, size / 1024);
+			npages = MAX_NUMBER_GPAGES;
+		}
 		gpage_npages[shift_to_mmu_psize(__ffs(size))] = npages;
 		size = 0;
 	}
--
1.7.0.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2] powerpc: Add cpu family documentation
On Wed, 30 Apr 2014, Scott Wood wrote:
On Wed, 2014-04-30 at 16:45 +1000, Michael Ellerman wrote:
On Tue, 2014-02-04 at 16:43 -0600, Scott Wood wrote:

+Motorola/Freescale 8xx
+----------------------
+
+ - Software loaded with hardware assist.
+ - All 32 bit
+
+ +--------------+
+ |     8xx      |
+ +--------------+
+        |
+        |
+        v
+ +--------------+
+ |     850      |
+ +--------------+

Is the core of MPC850 different from other MPC8xx?

Dunno, maybe someone who works at Freescale knows ;)

I think they're the same -- I was just wondering if you had some difference in mind that led you to single it out.

They are the same. There should not be a separate box that singles out 850. (Still don't know why the diagram was drawn to single out 850 in the first place.) The CPU core should be called MPC8xx Core.
Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?
On Sun, 9 Feb 2014, Stephen N Chivers wrote: James Yang james.y...@freescale.com wrote on 02/08/2014 07:49:40 AM: From: James Yang james.y...@freescale.com To: Gabriel Paubert paub...@iram.es Cc: Stephen N Chivers schiv...@csc.com.au, Chris Proctor cproc...@csc.com.au, linuxppc-dev@lists.ozlabs.org Date: 02/08/2014 07:49 AM Subject: Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask? On Fri, 7 Feb 2014, Gabriel Paubert wrote: Hi Stephen, On Fri, Feb 07, 2014 at 11:27:57AM +1000, Stephen N Chivers wrote: Gabriel Paubert paub...@iram.es wrote on 02/06/2014 07:26:37 PM: From: Gabriel Paubert paub...@iram.es To: Stephen N Chivers schiv...@csc.com.au Cc: linuxppc-dev@lists.ozlabs.org, Chris Proctor cproc...@csc.com.au Date: 02/06/2014 07:26 PM Subject: Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask? On Thu, Feb 06, 2014 at 12:09:00PM +1000, Stephen N Chivers wrote: With the above mask computation I get consistent results for both the MPC8548 and MPC7410 boards. Am I missing something subtle? No I think you are correct. This said, this code may probably be optimized to eliminate a lot of the conditional branches. I think that: If the compiler is enabled to generate isel instructions, it would not use a conditional branch for this code. (ignore the andi's values, this is an old compile) From limited research, the 440GP is a processor that doesn't implement the isel instruction and it does not implement floating point. The kernel emulates isel and so using that instruction for the 440GP would have a double trap penalty. Are you writing about something outside the scope of this thread? OP was using MPC8548 not a 440GP. The compiler should not be using or targeting 8548 for a 440GP so having to emulate isel shouldn't be an issue because there wouldn't be any. (The assembly listing I posted was generated by gcc targeting 8548.) Anyway, I had measured the non-isel routines to be faster and that works without illop traps. 
Correct me if I am wrong, the isel instruction first appears in PowerPC ISA v2.04 around mid 2007.

isel appeared in 2003 in the e500 (v1) core that is in the MPC8540. The instruction is in Power ISA 2.03 (9/2006).
Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?
On Mon, 10 Feb 2014, Gabriel Paubert wrote:
On Fri, Feb 07, 2014 at 02:49:40PM -0600, James Yang wrote:
On Fri, 7 Feb 2014, Gabriel Paubert wrote:
Hi Stephen,
On Fri, Feb 07, 2014 at 11:27:57AM +1000, Stephen N Chivers wrote:
Gabriel Paubert paub...@iram.es wrote on 02/06/2014 07:26:37 PM:
From: Gabriel Paubert paub...@iram.es
To: Stephen N Chivers schiv...@csc.com.au
Cc: linuxppc-dev@lists.ozlabs.org, Chris Proctor cproc...@csc.com.au
Date: 02/06/2014 07:26 PM
Subject: Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?
On Thu, Feb 06, 2014 at 12:09:00PM +1000, Stephen N Chivers wrote:

	mask = 0;
	if (FM & (1 << 0))
		mask |= 0x0000000f;
	if (FM & (1 << 1))
		mask |= 0x000000f0;
	if (FM & (1 << 2))
		mask |= 0x00000f00;
	if (FM & (1 << 3))
		mask |= 0x0000f000;
	if (FM & (1 << 4))
		mask |= 0x000f0000;
	if (FM & (1 << 5))
		mask |= 0x00f00000;
	if (FM & (1 << 6))
		mask |= 0x0f000000;
	if (FM & (1 << 7))
		mask |= 0x90000000;

With the above mask computation I get consistent results for both the MPC8548 and MPC7410 boards.

Ok, if you have measured that method1 is faster than method2, let us go for it. I believe method2 would be faster if you had a large out-of-order execution window, because more parallelism can be extracted from it, but this is probably only true for high end cores, which do not need FPU emulation in the first place.

Yeah, 8548 can issue 2 SFX instructions per cycle which is what the compiler generated. The other place where we can optimize is the generation of FEX.
Here is my current patch:

diff --git a/arch/powerpc/math-emu/mtfsf.c b/arch/powerpc/math-emu/mtfsf.c
index dbce92e..b57b3fa8 100644
--- a/arch/powerpc/math-emu/mtfsf.c
+++ b/arch/powerpc/math-emu/mtfsf.c
@@ -11,48 +11,35 @@ mtfsf(unsigned int FM, u32 *frB)
 	u32 mask;
 	u32 fpscr;

-	if (FM == 0)
+	if (likely(FM == 0xff))
+		mask = 0xffffffff;
+	else if (unlikely(FM == 0))
 		return 0;
-
-	if (FM == 0xff)
-		mask = 0x9fffffff;
 	else {
-		mask = 0;
-		if (FM & (1 << 0))
-			mask |= 0x90000000;
-		if (FM & (1 << 1))
-			mask |= 0x0f000000;
-		if (FM & (1 << 2))
-			mask |= 0x00f00000;
-		if (FM & (1 << 3))
-			mask |= 0x000f0000;
-		if (FM & (1 << 4))
-			mask |= 0x0000f000;
-		if (FM & (1 << 5))
-			mask |= 0x00000f00;
-		if (FM & (1 << 6))
-			mask |= 0x000000f0;
-		if (FM & (1 << 7))
-			mask |= 0x0000000f;
+		mask = (FM & 1);
+		mask |= (FM << 3) & 0x10;
+		mask |= (FM << 6) & 0x100;
+		mask |= (FM << 9) & 0x1000;
+		mask |= (FM << 12) & 0x10000;
+		mask |= (FM << 15) & 0x100000;
+		mask |= (FM << 18) & 0x1000000;
+		mask |= (FM << 21) & 0x10000000;
+		mask *= 15;

Needs to also mask out bits 1 and 2, they aren't to be set from frB:

+		mask &= 0x9FFFFFFF;

 	}

-	__FPU_FPSCR &= ~(mask);
-	__FPU_FPSCR |= (frB[1] & mask);
+	fpscr = ((__FPU_FPSCR & ~mask) | (frB[1] & mask)) &
+		~(FPSCR_VX | FPSCR_FEX);

-	__FPU_FPSCR &= ~(FPSCR_VX);
-	if (__FPU_FPSCR & (FPSCR_VXSNAN | FPSCR_VXISI | FPSCR_VXIDI |
+	if (fpscr & (FPSCR_VXSNAN | FPSCR_VXISI | FPSCR_VXIDI |
 		     FPSCR_VXZDZ | FPSCR_VXIMZ | FPSCR_VXVC |
 		     FPSCR_VXSOFT | FPSCR_VXSQRT | FPSCR_VXCVI))
-		__FPU_FPSCR |= FPSCR_VX;
-
-	fpscr = __FPU_FPSCR;
-	fpscr &= ~(FPSCR_FEX);
-	if (((fpscr & FPSCR_VX) && (fpscr & FPSCR_VE)) ||
-	    ((fpscr & FPSCR_OX) && (fpscr & FPSCR_OE)) ||
-	    ((fpscr & FPSCR_UX) && (fpscr & FPSCR_UE)) ||
-	    ((fpscr & FPSCR_ZX) && (fpscr & FPSCR_ZE)) ||
-	    ((fpscr & FPSCR_XX) && (fpscr & FPSCR_XE)))
-		fpscr |= FPSCR_FEX;
+		fpscr |= FPSCR_VX;
+
+	/* The bit order of exception enables and exception status
+	 * is the same. Simply shift and mask to check for enabled
+	 * exceptions.
+	 */
+	if (fpscr & (fpscr >> 22) & 0xf8)
+		fpscr |= FPSCR_FEX;
 	__FPU_FPSCR = fpscr;

 #ifdef DEBUG

 mtfsf.c | 57 ++++++++++++++++++---------------------------------
 1 file changed, 22 insertions(+), 35 deletions(-)

Notes:
1) I'm really unsure on whether 0xff is frequent or not. So
Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?
On Fri, 7 Feb 2014, Gabriel Paubert wrote:
Hi Stephen,
On Fri, Feb 07, 2014 at 11:27:57AM +1000, Stephen N Chivers wrote:
Gabriel Paubert paub...@iram.es wrote on 02/06/2014 07:26:37 PM:
From: Gabriel Paubert paub...@iram.es
To: Stephen N Chivers schiv...@csc.com.au
Cc: linuxppc-dev@lists.ozlabs.org, Chris Proctor cproc...@csc.com.au
Date: 02/06/2014 07:26 PM
Subject: Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?
On Thu, Feb 06, 2014 at 12:09:00PM +1000, Stephen N Chivers wrote:

	mask = 0;
	if (FM & (1 << 0))
		mask |= 0x0000000f;
	if (FM & (1 << 1))
		mask |= 0x000000f0;
	if (FM & (1 << 2))
		mask |= 0x00000f00;
	if (FM & (1 << 3))
		mask |= 0x0000f000;
	if (FM & (1 << 4))
		mask |= 0x000f0000;
	if (FM & (1 << 5))
		mask |= 0x00f00000;
	if (FM & (1 << 6))
		mask |= 0x0f000000;
	if (FM & (1 << 7))
		mask |= 0x90000000;

With the above mask computation I get consistent results for both the MPC8548 and MPC7410 boards. Am I missing something subtle?

No I think you are correct. This said, this code may probably be optimized to eliminate a lot of the conditional branches.

I think that: If the compiler is enabled to generate isel instructions, it would not use a conditional branch for this code. (ignore the andi's values, this is an old compile)

c0037c2c <mtfsf>:
c0037c2c:	2c 03 00 00 	cmpwi   r3,0
c0037c30:	41 82 01 1c 	beq-    c0037d4c <mtfsf+0x120>
c0037c34:	2f 83 00 ff 	cmpwi   cr7,r3,255
c0037c38:	41 9e 01 28 	beq-    cr7,c0037d60 <mtfsf+0x134>
c0037c3c:	70 66 00 01 	andi.   r6,r3,1
c0037c40:	3d 00 90 00 	lis     r8,-28672
c0037c44:	7d 20 40 9e 	iseleq  r9,r0,r8
c0037c48:	70 6a 00 02 	andi.   r10,r3,2
c0037c4c:	65 28 0f 00 	oris    r8,r9,3840
c0037c50:	7d 29 40 9e 	iseleq  r9,r9,r8
c0037c54:	70 66 00 04 	andi.   r6,r3,4
c0037c58:	65 28 00 f0 	oris    r8,r9,240
c0037c5c:	7d 29 40 9e 	iseleq  r9,r9,r8
c0037c60:	70 6a 00 08 	andi.   r10,r3,8
c0037c64:	65 28 00 0f 	oris    r8,r9,15
c0037c68:	7d 29 40 9e 	iseleq  r9,r9,r8
c0037c6c:	70 66 00 10 	andi.   r6,r3,16
c0037c70:	61 28 f0 00 	ori     r8,r9,61440
c0037c74:	7d 29 40 9e 	iseleq  r9,r9,r8
c0037c78:	70 6a 00 20 	andi.
r10,r3,32
c0037c7c:	61 28 0f 00 	ori     r8,r9,3840
c0037c80:	54 6a cf fe 	rlwinm  r10,r3,25,31,31
c0037c84:	7d 29 40 9e 	iseleq  r9,r9,r8
c0037c88:	2f 8a 00 00 	cmpwi   cr7,r10,0
c0037c8c:	70 66 00 40 	andi.   r6,r3,64
c0037c90:	61 28 00 f0 	ori     r8,r9,240
c0037c94:	7d 29 40 9e 	iseleq  r9,r9,r8
c0037c98:	41 9e 00 08 	beq-    cr7,c0037ca0 <mtfsf+0x74>
c0037c9c:	61 29 00 0f 	ori     r9,r9,15
...

However, your other solutions are better.

	mask = (FM & 1);
	mask |= (FM << 3) & 0x10;
	mask |= (FM << 6) & 0x100;
	mask |= (FM << 9) & 0x1000;
	mask |= (FM << 12) & 0x10000;
	mask |= (FM << 15) & 0x100000;
	mask |= (FM << 18) & 0x1000000;
	mask |= (FM << 21) & 0x10000000;
	mask *= 15;

should do the job, in less code space and without a single branch. Each one of the mask |= lines should be translated into an rlwinm instruction followed by an or. Actually it should be possible to transform each of these lines into a single rlwimi instruction but I don't know how to coerce gcc to reach this level of optimization.

Another way of optimizing this could be:

	mask = (FM & 0x0f) | ((FM << 12) & 0x000f0000);
	mask = (mask & 0x00030003) | ((mask << 6) & 0x03030303);
	mask = (mask & 0x01010101) | ((mask << 3) & 0x10101010);
	mask *= 15;

Ok, I finally edited my sources and test compiled the suggestions I gave. I'd say that method2 is the best overall indeed. You can actually save one more instruction by setting mask to all ones in the case FM == 0xff, but that's about all in this area.

My measurements show method1 to be smaller and faster than method2 due to the number of instructions needed to generate the constant masks in method2, but it may depend upon your compiler and hardware. Both are faster than the original with isel.
Re: [PATCH v2] powerpc: Add cpu family documentation
On Sat, 1 Feb 2014, Michael Ellerman wrote:

This patch adds some documentation on the different cpu families supported by arch/powerpc.

Signed-off-by: Michael Ellerman m...@ellerman.id.au
---
v2: Reworked formatting to avoid wrapping. Fixed up Freescale details.

 Documentation/powerpc/cpu_families.txt | 227 +++++++++++++++++++++++++++++
 1 file changed, 227 insertions(+)
 create mode 100644 Documentation/powerpc/cpu_families.txt

diff --git a/Documentation/powerpc/cpu_families.txt b/Documentation/powerpc/cpu_families.txt
new file mode 100644
index 000..fa4f159
--- /dev/null
+++ b/Documentation/powerpc/cpu_families.txt
@@ -0,0 +1,227 @@
+CPU Families
+============
+
+This document tries to summarise some of the different cpu families that exist
+and are supported by arch/powerpc.

I think there are more CPUs that exist(ed), but aren't supported by arch/powerpc. 602, Exponential, and there were a few others. Did you want to include those?

Do the arrows' endpoints have a particular meaning? Does the direction of the arrows (left vs. down) have a meaning? What's the meaning of the box?

Do you want to have all of the minor derivatives or just major implementations? Some derivatives were just faster versions, others have microarchitectural changes, and some have additional registers or removed interfaces. There are a lot of ways to delineate this.

+
+
+Book3S (aka sPAPR)
+------------------
+
+ - Hash MMU
+ - Mix of 32 & 64 bit
+
+ +--------------+      +----------------+
+ |  Old POWER   | ---> | RS64 (threads) |
+ +--------------+      +----------------+
+        |
+        |
+        v
+ +--------------+      +----------------+      +-------+
+ |     601      | ---> |      603       | ---> |  740  |
+ +--------------+      +----------------+      +-------+

There was also at least 603e, 603ev, EC603e, and then offshoot G2 core, then the offshoot of that e300c1/e300c2/e300c3/e300c4. I might be missing one here. 740 is a 750 but without the L2, so it would probably fit better off pointing out of the 750 box. Then there is the 755/745 which have more BATs, L2 features, etc. Missing from the IBM side is 750CXe, 750GX, 750GL, but I don't know the lineage of those. Do you want the Nintendo stuff too? RAD750?
+        |                   |
+        |                   |
+        v                   v
+ +--------------+      +----------------+      +--------+
+ |     604      |      |    750 (G3)    | ---> | 750CX  |
+ +--------------+      +----------------+      +--------+

Also 604e, and I guess 604ev?

+        |                   |                      |
+        |                   |                      |
+        v                   v                      v
+ +--------------+      +----------------+      +--------+
+ | 620 (64 bit) |      |      7400      |      | 750CL  |
+ +--------------+      +----------------+      +--------+
+        |                   |                      |
+        |                   |                      |
+        v                   v                      v
+ +--------------+      +----------------+      +--------+
+ |  POWER3/630  |      |      7410      |      | 750FX  |
+ +--------------+      +----------------+      +--------+

7400/7410 mostly the same thing

+        |                   |
+        |                   |
+        v                   v
+ +--------------+      +----------------+
+ |   POWER3+    |      |      7450      |
+ +--------------+      +----------------+
+        |                   |
+        |                   |
+        v                   v
+ +--------------+      +----------------+
+ |    POWER4    |      |      7455      |
+ +--------------+      +----------------+
+        |                   |
+        |                   |
+        v                   v
+ +--------------+
Re: BookE branch taken behavior vis-a-vis updating the NIP register
On Tue, 12 Nov 2013, pegasus wrote:

So, off I went and downloaded the latest version of arch/powerpc/kernel/traps.c hoping to see those very changes in them. However it didn't match one to one with what was written in that thread. Ditto for the other files in your patch. Looks like your patch didn't make it upstream, but it looks exactly like what I need here.

The patches were posted RFC -- request for comments. Thank you for posting your comments.

So requesting PTRACE_CONT has to happen inside the SIGTRAP signal handler, right?

After you get the SIGTRAP from the branch, yeah.

Now as for the second patch, as far as I can see, it implements the same functionality. However it makes the change permanent, and any tool which is used to the NIP pointing to the branch target will be broken.

The second patch places the burden of determining if you are BookE or server on the tool. I feel this is important because the Server and Book-E branch exceptions are fundamentally incompatible, and the behavior of PTRACE_SINGLEBLOCK cannot be made identical by the kernel for both subarchs, so a tool will have to adjust accordingly.

Anyways, for me either of them will work. But I think the first patch makes everyone happy by using the 'addr' field of ptrace. This also means I will have to get my (broken) ptrace working, which, it seems, is not as easy as adding an enum field as you suggested. Maybe there's a check somewhere in the actual ptrace code which checks for illegal values, and hence even after adding an enum, it is being reported as illegal in my case. However, getting that to work is another story.

Actually, I provided the first patch to show why it probably should not be done even though technically possible, since it requires the ptrace API to be extended to now recognize the addr field, needs a TIF flag, etc.
The second patch seems to me more reasonable to support, since nobody has come forth to say that there are actually any tools that rely upon the existing hack for BookE or even PTRACE_SINGLEBLOCK. And the existing hack's behavior makes things somewhat useless on BookE, as you and I have confirmed.

Please confirm my understanding of your patches and since these patches have not made their way to the upstream kernel, will have to use them myself directly. By the way, I'm using 2.6.32.10 (you know..the long-term kernel) and I couldn't find any of your changes in them but then again I couldn't find it in the latest 3.12 version either.

You will have to backport the patches to your kernel if you want to use them. Both patches change the behavior of existing API, and I guess we are still waiting to hear if anyone cares which way it should be fixed.
Re: BookE branch taken behavior vis-a-vis updating the NIP register
On Mon, 11 Nov 2013, pegasus wrote:

And this is due to the pipelining feature that is inherent in all processors.

No, it has nothing to do with pipelining at all. It is just the convention that IBM defined as to how the branch taken exception works in BookE Power ISA. The pipeline behavior is not visible from the architected state.

But I still have a question about how one would then be able to signal to the userspace who might be interpreting this information differently? I mean if SRR0 contains, not the branch instruction address but the address of the branch target, how would any debugger be able to catch function calls?

On Server and the BookE-mimicking-Server, PT_NIP would point to the start of the function you are calling. However, you do not get the callsite address unless you know you stopped after a function-calling branch-and-link executed, single step again and read the LR and subtract 4. If you don't know whether the branch that caused the exception was actually a branch-and-link, you don't know if you are at the beginning of a function or not, unless you keep a table of the addresses of all possible functions and look up each PT_NIP to see if you are. But that wouldn't be 100% accurate either, since a non-linking branch could just redirect to the beginning of the function. In the other thread, Ben said they have the CFAR register, but from the way it is described in the ISA document, you will not always get the address of the callsite, e.g. if your target is in the same cache block, the CFAR might not change. So, I don't think it can work in a simple way.

On BookE with my patch to NOT mimic server, you will stop before the branch executes, so PT_NIP points to the branch itself, so there is your callsite information. If you want the function target, you decode the instruction to see if it is a branch-and-link, and then compute the target address or manually singlestep and grab the new PT_NIP.
Maybe there is a trick involved here and hence gdb, or for that matter the other debuggers, are still in the market.

Per the other thread, and my inquiry through my company's gdb experts and upstream on the gdb mailing list, gdb does NOT use hardware branch tracing, at least not using PTRACE_SINGLEBLOCK. I don't personally know of any debuggers that use it in the form that it exists in Linux.

But then I would be immensely obliged if you could shed some light on how this is accomplished. Let's say I am waiting in userspace in my own sigtrap handler, to watch out for branch instructions. Let's say I want to profile my code to find out how many branch instructions it has generated. How could I ever do that using my own custom SIGTRAP handler?

Lots of ways to skin this cat, but I don't think any of them are easy or simple.

 - Using valgrind is probably the easiest way to get this information on an unmodified executable.
 - You may want to recompile your program with call-arc profiling if you have source code: gcc -fprofile-arcs, but read the docs.
 - Another method is to set up a performance monitor counter to interrupt after a branch has executed, but callsite information may be lost as well, and you still have the issue of discerning whether you just called a function or not.
 - You can use my patch for BookE, but it's just an RFC. Also, using PTRACE_SINGLEBLOCK is slow. You could also stop on every PTRACE_SINGLESTEP, but this is even slower.

Coming on to PTRACE_SINGLESTEP, the sysroot that has been provided to us by our vendor does not include a PTRACE_SINGLEBLOCK in sys/ptrace.h, although I can clearly see that PTRACE_SINGLEBLOCK is supported in the kernel.
Hence I am not able to compile this simple program in userspace. Here's the error I get:

testptrace.c: In function 'int main()':
testptrace.c:47: error: invalid conversion from 'int' to '__ptrace_request'
testptrace.c:47: error: initializing argument 1 of 'long int ptrace(__ptrace_request, ...)'
make: *** [testptrace] Error 1

How should I go about using ptrace to test this?

I don't know. You'll probably have to request support from your vendor or whoever provided you with those headers. If you are using a glibc based toolchain, your glibc is probably out-of-date. You can try adding PTRACE_SINGLEBLOCK into the enum definition just to get past the compiler syntax error, but I don't know if that will break things or not in your libc ptrace(). You may have to write your own function that calls the __ptrace syscall directly if the libc ptrace() does something to the request or the response.
Re: BookE branch taken behavior vis-a-vis updating the NIP register
On Fri, 8 Nov 2013, pegasus wrote:

Hello. I was reading the source code for the debug exception under powerpc. I saw that BookE processors stop before (actually) taking a branch. Hence in order to force it to take that branch and then stop, the source code for it had to be 'hacked' to (temporarily) enable single step until the branch instruction has been taken, thereby mimicking the BookS behavior. Have a look at this thread: https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-July/108618.html

By doing this, I believe we would want the exception to be triggered after the branch has been successfully taken. Hence I put a printk to print the value of the instruction that actually caused the exception. I was assuming that initially a debug breakpoint would be hit on the branch instruction (assuming the branch is supposed to be taken). Now since at this point in time the branch instruction has NOT finished, the kernel, after merely disabling the BT bit in DBCR0 and enabling the IC bit in DBCR0, returns. At this point I was assuming we would see another exception at the very same instruction in userspace. However, when printing the NIP it becomes clear that when it gets to the debug exception handler after being (temporarily) set to single step, NIP points to the instruction after the branch instruction.

To me, it appears that, after disabling BT (branch taken) debug event monitoring (and enabling single stepping), it does not catch an exception at that very same branch instruction; instead it catches an exception for the subsequent instruction. Sorry for the repetition, but I wanted to clarify what I am seeing here. Maybe this IS the way it is supposed to behave (which means my thinking about it is flawed). I am a bit confused here.

You have it correct, that is the behavior of what is there.
The Server branch execute debug exception occurs after the branch completes, and the SRR0 (the value in PT_NIP) points to the address of the instruction that is to be executed after the branch. BookE branch exception occurs before the branch is executed, but only if the branch will be taken. The hack tries to make BookE return an exception to the process with the same timing as Server: exception signal sent to process after the branch completes.

I think some of the confusion is due to the existence of the hack. Its existence implies that BookE can perfectly emulate Server, but this is not possible because BookE would only take the exception if the branch's condition would cause the branch to be actually taken (that the branch condition, if present, is true). Therefore, even with this hack, untaken branches are not signaled at all on BookE while they would on Server.

Note that PTRACE_SINGLESTEP works the same on both Server and BookE. The exception occurs after an instruction completes, and the SRR0 / NIP points to the instruction that will be executed next.
RE: [PATCH] [RFC] Emulate lwsync to run standard user land on e500 cores
On Fri, 25 Oct 2013, David Laight wrote:

This is not a distro issue. It's a libstdc++ portability issue. libstdc++ hardcodes lwsync unless __NO_LWSYNC__ is explicitly defined, which you only get with -mcpu=8540/-mcpu=8548. When compiled for any powerpc target other than -mcpu=8540/-mcpu=8548, including the default -mcpu=common, libstdc++ will end up containing lwsync. There is no way to explicitly request libstdc++ to be built without lwsync with an -mcpu target other than 8540/8548. The issue is easily demonstrated by running a program that throws a C++ exception: __cxa_throw() is called, which has an lwsync. This results in an illegal instruction exception when run on an e500v1/e500v2.

Perhaps libstdc++ should be working out at run time whether lwsync is valid?

lwsync has been in libstdc++ for over 8 years so there's a substantial amount of legacy binary code that exists such that changing libstdc++ now won't solve the problem. This isn't a performance issue, it's a functional issue -- libstdc++ for -mcpu=common targets doesn't work on e500v1/e500v2. (QorIQ P4080 and newer parts have e500mc and newer cores that support lwsync, so this is no longer an issue there.)

If one /really/ cared about losing performance while running common target binaries, emulating lwsync is an inconsequential performance loss compared to the kernel doing floating point emulation. Recompiling libstdc++ (and all your other userland) with -mcpu=8548 is what you'd have to do to avoid classic FP and use Embedded FP or SPE, and if you are doing that you'll get sync instead of lwsync in libstdc++.
Re: [PATCH] [RFC] Emulate lwsync to run standard user land on e500 cores
On Fri, 25 Oct 2013, Scott Wood wrote:

Has anyone measured how much this slows things down with a typical userspace?

Not a measurement of the patch in question, but an older similar patch on 3.0.51 (8572 running Debian 5.0.3):

$ ./test-lwsync
cycles per emulated lwsync = 371
cycles per sync = 36

--
#include <stdio.h>

int main (void)
{
	unsigned long atb_start, atb_stop;
	unsigned long i;

	asm volatile ("mfspr %0,526" : "=r" (atb_start));
	for (i = 0; i < 1000000; i++) {
		asm volatile ("lwsync");
	}
	asm volatile ("mfspr %0,526" : "=r" (atb_stop));
	printf("cycles per emulated lwsync = %lu\n",
	       (atb_stop - atb_start) / 1000000);

	asm volatile ("mfspr %0,526" : "=r" (atb_start));
	for (i = 0; i < 1000000; i++) {
		asm volatile ("sync");
	}
	asm volatile ("mfspr %0,526" : "=r" (atb_stop));
	printf("cycles per sync = %lu\n",
	       (atb_stop - atb_start) / 1000000);

	return 0;
}
--
Re: [PATCH] [RFC] Emulate lwsync to run standard user land on e500 cores
On Thu, 24 Oct 2013, Kumar Gala wrote:
On Oct 24, 2013, at 4:45 AM, Benjamin Herrenschmidt wrote:
On Wed, 2013-10-23 at 23:06 -0500, Kumar Gala wrote:
On Oct 23, 2013, at 5:15 AM, Scott Wood wrote:
On Wed, 2013-10-23 at 00:07 -0500, Kumar Gala wrote:
On Oct 18, 2013, at 2:38 AM, Wolfgang Denk wrote:

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index f783c93..f330374 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -986,6 +986,13 @@ static int emulate_instruction(struct pt_regs *regs)
 		return 0;
 	}

+	/* Emulating the lwsync insn as a sync insn */
+	if (instword == PPC_INST_LWSYNC) {
+		PPC_WARN_EMULATED(lwsync, regs);
+		asm volatile("sync" : : : "memory");

Do we really need the inline asm? Doesn't the fact of just taking an exception and returning from it equate to a sync?

No, it doesn't equate to a sync. See the discussion here: http://patchwork.ozlabs.org/patch/256747/

Thanks. I'm not sure I'm a fan of doing this as it silently hides a significant performance impact. Could we possibly rewrite the userspace instruction to be a 'sync' when we hit this?

Rewriting user space is a can of worms I wouldn't get into ... is any other arch doing it?

Fair enough. I'm not too worried as long as we warn and account them. Then I'd ask that this be under a Kconfig option that is disabled by default. Users should have to explicitly enable this so they know what they are doing.

I think it should be enabled by default, rather than disabled, so that users would actually see a warning rather than get a sig 4. Or, let it not be Kconfig-able so that this doesn't become a problem any more. (It's been 4 years since I sent to you an earlier version of this patch.)
Re: [PATCH] powerpc/ppc64: remove __volatile__ in get_current()
On Sat, 10 Aug 2013, James Yang wrote:

Uses of get_current() that normally get optimized away still result in a load instruction of the current pointer in 64-bit because the inline asm uses __volatile__. This patch removes __volatile__ so that nop-ed uses of get_current() don't actually result in a load of the pointer.

Signed-off-by: James Yang james.y...@freescale.com
---
 arch/powerpc/include/asm/current.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/current.h b/arch/powerpc/include/asm/current.h
index e2c7f06..bb250c8 100644
--- a/arch/powerpc/include/asm/current.h
+++ b/arch/powerpc/include/asm/current.h
@@ -19,7 +19,7 @@ static inline struct task_struct *get_current(void)
 {
 	struct task_struct *task;

-	__asm__ __volatile__("ld %0,%1(13)"
+	__asm__ ("ld %0,%1(13)"
 	: "=r" (task)
 	: "i" (offsetof(struct paca_struct, __current)));

Hello, Scott's been able to put enough doubt in me to think that this is not entirely safe, even though the testing and code generation show it to work. Please reject this patch.

I think there is still value in getting the unnecessary loads to be removed since it would also allow unnecessary conditional branches to be removed. I'll think about alternate ways to do this.

Regards,
--James
[PATCH] powerpc/ppc64: remove __volatile__ in get_current()
Uses of get_current() that normally get optimized away still result in a load instruction of the current pointer in 64-bit because the inline asm uses __volatile__. This patch removes __volatile__ so that nop-ed uses of get_current() don't actually result in a load of the pointer.

Signed-off-by: James Yang james.y...@freescale.com
---
 arch/powerpc/include/asm/current.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/current.h b/arch/powerpc/include/asm/current.h
index e2c7f06..bb250c8 100644
--- a/arch/powerpc/include/asm/current.h
+++ b/arch/powerpc/include/asm/current.h
@@ -19,7 +19,7 @@ static inline struct task_struct *get_current(void)
 {
 	struct task_struct *task;

-	__asm__ __volatile__("ld %0,%1(13)"
+	__asm__ ("ld %0,%1(13)"
 	: "=r" (task)
 	: "i" (offsetof(struct paca_struct, __current)));
--
1.7.0.4
[PATCH] powerpc/booke: clear DBCR0_BT in user_disable_single_step()
BookE version of user_disable_single_step() clears DBCR0_IC for the instruction completion debug exception, but did not also clear DBCR0_BT for the branch taken exception. This behavior was lost by the 2/2010 patch.

Signed-off-by: James Yang james.y...@freescale.com
---
 arch/powerpc/kernel/ptrace.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 98c2fc1..c634970 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -894,7 +894,7 @@ void user_disable_single_step(struct task_struct *task)
 	 * And, after doing so, if all debug flags are off, turn
 	 * off DBCR0(IDM) and MSR(DE) Torez
 	 */
-	task->thread.dbcr0 &= ~DBCR0_IC;
+	task->thread.dbcr0 &= ~(DBCR0_IC|DBCR0_BT);
 	/*
 	 * Test to see if any of the DBCR_ACTIVE_EVENTS bits are set.
 	 */
--
1.7.0.4
[RFC][PATCH 0/2] powerpc/booke: PTRACE_SINGLEBLOCK support for BookE
PTRACE_SINGLEBLOCK support for BookE currently stops on the instruction after taken branches. This is different from the behavior on Server, where it stops after all branches. BookE was made to simulate Server by taking a single step after the branch taken exception.

It is understood that the reason for making PTRACE_SINGLEBLOCK on BookE simulate Server was to make the semantics exposed to user space identical on both, but this is not really possible due to the fundamental difference that untaken branches do not trigger the branch taken exception on BookE.

BookE ISA's branch taken exception triggers before a branch that will be taken executes. This allows software to examine the branch and the conditions under which it will be taken. It also means software can tell where basic blocks end (at least the ones which are terminated by taken branches). There are no architected registers that report the address of the branch instruction after it has executed.

Server's branch trace exception triggers after a branch executes, regardless of whether or not it was taken. The exception stops on the instruction after fall-through branches.

Two mutually-exclusive patches are provided for RFC that make BookE's branch taken debug exception behavior accessible through PTRACE_SINGLEBLOCK:

- The first patch keeps the semantic behavior of the existing support by using the ptrace() addr parameter to select between the modes. This requires a new bit in the TIF as well as changes in kernel/ptrace.c.

- The second patch makes PTRACE_SINGLEBLOCK reflect the BookE native behavior, which stops on the branch instruction. The changes are isolated to arch/powerpc/kernel/traps.c.

IMHO, the only reason not to do the 2nd patch would be to maintain compatibility for any tools that actually rely on the inaccurate simulation of Server's behavior when run on a BookE system.
Are there any tools that actually rely upon the behavior currently implemented for BookE in Linux -- SIGTRAP only after taken branches? Even if there are, it should be possible to modify such a tool to issue a PTRACE_SINGLESTEP after receiving the SIGTRAP on the branch to retain equivalent functionality.
[RFC][PATCH 2/2] powerpc/booke: revert PTRACE_SINGLEBLOCK to BookE behavior
A BookE branch taken debug exception followed by a single step does not accurately simulate Server's branch execute debug exception. BookE's branch taken debug exception stops before the branch is to be executed and only happens if the branch will actually be taken. Server's branch execute trace exception stops on the instruction after the branch executes, regardless of whether or not the branch redirected the program counter.

The existing PTRACE_SINGLEBLOCK support for BookE hardcodes a single step after the branch taken exception is taken in order to simulate Server's behavior, but this misses fall-through branch instructions (i.e., branches that are NOT taken). Also, the si_code became masked as TRAP_TRACE instead of TRAP_BRANCH.

This patch provides native support for the BookE branch taken debug exception's behavior: PTRACE_SINGLEBLOCK stops with a SIGTRAP before a branch-that-would-be-taken would execute. Userspace software will be able to examine the process state upon catching the SIGTRAP, and it will need to issue a PTRACE_SINGLESTEP or PTRACE_CONT to resume program execution past the branch.

Signed-off-by: James Yang james.y...@freescale.com
---
 arch/powerpc/kernel/traps.c |   40 +++++++++++++++++++++++++-------------
 1 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index c3ceaa2..5837d7f 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1518,12 +1518,21 @@ void __kprobes DebugException(struct pt_regs *regs, unsigned long debug_status)
 {
 	current->thread.dbsr = debug_status;

-	/* Hack alert: On BookE, Branch Taken stops on the branch itself, while
-	 * on server, it stops on the target of the branch. In order to simulate
-	 * the server behaviour, we thus restart right away with a single step
-	 * instead of stopping here when hitting a BT
+	/* BookE's Branch Taken Debug Exception stops on the branch iff the
+	 * branch is resolved to be taken.  No exception occurs if the branch
+	 * is not taken (no exception if the branch does not redirect the PC).
+	 * This is unlike Classic/Server's behavior where the exception occurs
+	 * after the branch executes, regardless of whether or not the branch
+	 * redirected the PC.
+	 *
+	 * The past behavior of this function was to simulate Classic/Server's
+	 * behavior by performing a single-step upon a branch taken exception.
+	 * However, the simulation is not accurate because fall-through non-
+	 * taken branches would not result in a SIGTRAP.  Also, that SIGTRAP's
+	 * si_code would be reported as a TRAP_TRACE instead of a TRAP_BRANCH.
 	 */
-	if (debug_status & DBSR_BT) {
+
+	if (debug_status & DBSR_BT) {	/* Branch Taken */
 		regs->msr &= ~MSR_DE;

 		/* Disable BT */
@@ -1531,20 +1540,25 @@ void __kprobes DebugException(struct pt_regs *regs, unsigned long debug_status)
 		/* Clear the BT event */
 		mtspr(SPRN_DBSR, DBSR_BT);

-		/* Do the single step trick only when coming from userspace */
-		if (user_mode(regs)) {
-			current->thread.dbcr0 &= ~DBCR0_BT;
-			current->thread.dbcr0 |= DBCR0_IDM | DBCR0_IC;
-			regs->msr |= MSR_DE;
-			return;
-		}
-
 		if (notify_die(DIE_SSTEP, "block_step", regs, 5, 5, SIGTRAP) == NOTIFY_STOP) {
 			return;
 		}
+
 		if (debugger_sstep(regs))
 			return;
+
+		if (user_mode(regs)) {
+			current->thread.dbcr0 &= ~DBCR0_BT;
+			if (DBCR_ACTIVE_EVENTS(current->thread.dbcr0,
+					       current->thread.dbcr1))
+				regs->msr |= MSR_DE;
+			else
+				/* Make sure the IDM bit is off */
+				current->thread.dbcr0 &= ~DBCR0_IDM;
+		}
+
+		_exception(SIGTRAP, regs, TRAP_BRANCH, regs->nip);
 	} else if (debug_status & DBSR_IC) {	/* Instruction complete */
 		regs->msr &= ~MSR_DE;
--
1.7.0.4
[RFC][PATCH 1/2] powerpc/booke: extend PTRACE_SINGLEBLOCK for BookE Branch Taken Debug
A BookE branch taken debug exception followed by a single step does not accurately simulate Server's branch execute debug exception. BookE's branch taken debug exception stops before the branch is to be executed and only happens if the branch will actually be taken. Server's branch execute trace exception stops on the instruction after the branch executes, regardless of whether or not the branch redirected the program counter.

The existing PTRACE_SINGLEBLOCK support for BookE hardcodes a single step after the branch taken exception is taken in order to simulate Server's behavior, but this misses fall-through branch instructions (i.e., branches that are NOT taken). Also, the si_code became masked as TRAP_TRACE instead of TRAP_BRANCH.

This patch makes available the unmodified BookE branch taken debug exception through PTRACE_SINGLEBLOCK if the ptrace() addr parameter is set to 2. (The existing behavior of PTRACE_SINGLEBLOCK is retained for any other addr parameter value, e.g., 0.) SIGTRAP will be signaled with the NIP pointing to the branch instruction before it has executed. The ptrace-calling program can then examine the program state. It should then request a PTRACE_SINGLESTEP in order to advance the program to the next instruction, or a PTRACE_CONT to resume normal program execution. The si_code now also reports TRAP_BRANCH.

Signed-off-by: James Yang james.y...@freescale.com
---
 arch/powerpc/include/asm/thread_info.h |    3 ++
 arch/powerpc/kernel/traps.c            |   51 +++++++++++++++++++++++--------
 kernel/ptrace.c                        |   13 ++++++--
 3 files changed, 52 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index ba7b197..ab7c257 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -107,6 +107,8 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_EMULATE_STACK_STORE	16	/* Is an instruction emulation
 						for stack store? */
 #define TIF_MEMDIE		17	/* is terminating due to OOM killer */
+#define TIF_BOOKE_SINGLEBLOCK	18	/* Do not advance PC after Branch Taken
+					   debug exception for BookE */

 /* as above, but as bit values */
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
@@ -125,6 +127,7 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_UPROBE		(1<<TIF_UPROBE)
 #define _TIF_SYSCALL_TRACEPOINT	(1<<TIF_SYSCALL_TRACEPOINT)
 #define _TIF_EMULATE_STACK_STORE	(1<<TIF_EMULATE_STACK_STORE)
+#define _TIF_BOOKE_SINGLEBLOCK	(1<<TIF_BOOKE_SINGLEBLOCK)
 #define _TIF_NOHZ		(1<<TIF_NOHZ)
 #define _TIF_SYSCALL_T_OR_A	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 				 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index c3ceaa2..ee899b3 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1518,11 +1518,14 @@ void __kprobes DebugException(struct pt_regs *regs, unsigned long debug_status)
 {
 	current->thread.dbsr = debug_status;

-	/* Hack alert: On BookE, Branch Taken stops on the branch itself, while
-	 * on server, it stops on the target of the branch. In order to simulate
-	 * the server behaviour, we thus restart right away with a single step
-	 * instead of stopping here when hitting a BT
+	/* Hack alert: On BookE, Branch Taken stops on the branch itself iff the
+	 * branch will be taken, while on server, it stops on the target of the
+	 * branch, regardless of whether or not the branch was taken. In order
+	 * to simulate the server behaviour (at least for taken branches), we
+	 * thus restart right away with a single step instead of stopping here
+	 * when hitting a BT.
 	 */
+
 	if (debug_status & DBSR_BT) {
 		regs->msr &= ~MSR_DE;

@@ -1531,20 +1534,44 @@ void __kprobes DebugException(struct pt_regs *regs, unsigned long debug_status)
 		/* Clear the BT event */
 		mtspr(SPRN_DBSR, DBSR_BT);

-		/* Do the single step trick only when coming from userspace */
-		if (user_mode(regs)) {
-			current->thread.dbcr0 &= ~DBCR0_BT;
-			current->thread.dbcr0 |= DBCR0_IDM | DBCR0_IC;
-			regs->msr |= MSR_DE;
-			return;
-		}
-
 		if (notify_die(DIE_SSTEP, "block_step", regs, 5, 5, SIGTRAP) == NOTIFY_STOP) {
 			return;
 		}
+
 		if (debugger_sstep(regs))
 			return;
+
+		if (user_mode(regs)) {
+
+			/* Do the single step trick only when coming from
+			 * userspace
Re: [RFC][PATCH 2/2] powerpc/booke: revert PTRACE_SINGLEBLOCK to BookE behavior
On Sat, 6 Jul 2013, Benjamin Herrenschmidt wrote:

> On Fri, 2013-07-05 at 17:11 -0500, James Yang wrote:
> > A BookE branch taken debug exception followed by a single step does not
> > accurately simulate Server's branch execute debug exception. BookE's
> > branch taken debug exception stops before the branch is to be executed
> > and only happens if the branch will actually be taken. Server's branch
> > execute trace exception stops on the instruction after the branch
> > executes, regardless of whether or not the branch redirected the
> > program counter.
> >
> > The existing PTRACE_SINGLEBLOCK support for BookE hardcodes a single
> > step after the branch taken exception is taken in order to simulate
> > Server's behavior, but this misses fall-through branch instructions
> > (i.e., branches that are NOT taken). Also, the si_code became masked
> > as TRAP_TRACE instead of TRAP_BRANCH.
>
> But that changes the user visible behaviour, won't that break gdb
> expectations ?

I am having a difficult time finding any use of PTRACE_SINGLEBLOCK in any arch in the various versions of gdb source trees I downloaded. Would you please provide a pointer, or at least a hint, as to where you think it would be? I don't know gdb internals at all, but grepping the sources for PTRACE_SINGLEBLOCK returns nothing.

> Another way to fix it is to instead use lib/sstep.c to emulate the
> single step maybe ?

I don't think there's any issue any more with the hard-coded single step, given the fixes that Scott and Bharat recently provided. Actually, I don't even know if this feature was practically useful on BookE before those fixes, due to the hangs and non-deterministic behavior. Hypothetically, sstep.c could let Server emulate BookE's behavior.

> On the other hand, I tend to think that trapping before the branch is
> actually more useful especially if you don't have the CFAR register.

And there's no exception for fall-through branches.
So, yeah, this really is the question: are there actually any tools that rely on PTRACE_SINGLEBLOCK behaving in the different ways it currently does on the two Power subarchitectures? For BookE targets, how do they handle not being able to catch fall-through branches? For Server targets, how do they capture the old LR value for blrl after the branch? (I'm guessing not even sstep emulation can provide this information, though it might not be practically useful.)
[PATCH] powerpc/math-emu: fix load/store indexed emulation
Load/store indexed instructions where the index register RA=R0, such as "lfdx f1,0,r3", are not illegal. Load/store indexed with update instructions where the index register RA=R0, such as "lfdux f1,0,r3", are invalid, and, to be consistent with existing math-emu behavior for other invalid instruction forms, will signal as illegal.
---
 arch/powerpc/math-emu/math.c |   15 +++++----------
 1 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/math-emu/math.c b/arch/powerpc/math-emu/math.c
index 164d559..eabce90 100644
--- a/arch/powerpc/math-emu/math.c
+++ b/arch/powerpc/math-emu/math.c
@@ -410,21 +410,16 @@ do_mathemu(struct pt_regs *regs)
 	case XE:
 		idx = (insn >> 16) & 0x1f;
 		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
-		if (!idx) {
-			if (((insn >> 1) & 0x3ff) == STFIWX)
-				op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
-			else
-				goto illegal;
-		} else {
-			op1 = (void *)(regs->gpr[idx] + regs->gpr[(insn >> 11) & 0x1f]);
-		}
-
+		op1 = (void *)((idx ? regs->gpr[idx] : 0)
+				+ regs->gpr[(insn >> 11) & 0x1f]);
 		break;

 	case XEU:
 		idx = (insn >> 16) & 0x1f;
+		if (!idx)
+			goto illegal;
 		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
-		op1 = (void *)((idx ? regs->gpr[idx] : 0)
+		op1 = (void *)(regs->gpr[idx]
 				+ regs->gpr[(insn >> 11) & 0x1f]);
 		break;
--
1.7.0.4
[PATCH v2] powerpc/math-emu: fix load/store indexed emulation
Load/store indexed instructions where the index register RA=R0, such as "lfdx f1,0,r3", are not illegal. Load/store indexed with update instructions where the index register RA=R0, such as "lfdux f1,0,r3", are invalid, and, to be consistent with existing math-emu behavior for other invalid instruction forms, will signal as illegal.

Signed-off-by: James Yang james.y...@freescale.com
---
 arch/powerpc/math-emu/math.c |   15 +++++----------
 1 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/math-emu/math.c b/arch/powerpc/math-emu/math.c
index 164d559..eabce90 100644
--- a/arch/powerpc/math-emu/math.c
+++ b/arch/powerpc/math-emu/math.c
@@ -410,21 +410,16 @@ do_mathemu(struct pt_regs *regs)
 	case XE:
 		idx = (insn >> 16) & 0x1f;
 		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
-		if (!idx) {
-			if (((insn >> 1) & 0x3ff) == STFIWX)
-				op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
-			else
-				goto illegal;
-		} else {
-			op1 = (void *)(regs->gpr[idx] + regs->gpr[(insn >> 11) & 0x1f]);
-		}
-
+		op1 = (void *)((idx ? regs->gpr[idx] : 0)
+				+ regs->gpr[(insn >> 11) & 0x1f]);
 		break;

 	case XEU:
 		idx = (insn >> 16) & 0x1f;
+		if (!idx)
+			goto illegal;
 		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
-		op1 = (void *)((idx ? regs->gpr[idx] : 0)
+		op1 = (void *)(regs->gpr[idx]
 				+ regs->gpr[(insn >> 11) & 0x1f]);
 		break;
--
1.7.0.4
Re: [PATCH] Emulate sync instruction variants
On Thu, 4 Jul 2013, Benjamin Herrenschmidt wrote:

> On Thu, 2013-07-04 at 09:31 +0100, David Laight wrote:
> > Do you need to execute 'sync' here? It is worth checking whether the
> > trap entry/exit doesn't do an implicit one for you.
>
> Not really. It does an implicit isync (more than one even) but not a
> sync.

The execution of a sync is required because the original sync variant instruction is not actually executed, and, per Ben, no other explicit sync exists in the exception handler. If the ISA extends the sync instruction using its reserved bits, the intent would also be that executing a heavyweight sync is suitably compatible; otherwise they should have modified some other synchronization instruction. This patch ensures at least one sync instruction is executed for each sync-variant instruction that triggers a program exception.
[PATCH] Fix string emulation for 32-bit process on ppc64
String instruction emulation would erroneously result in a segfault if the upper bits of the EA are set so high that the address fails the access check. Truncate the EA to 32 bits if the process is 32-bit.

Signed-off-by: James Yang james.y...@freescale.com
---
 arch/powerpc/kernel/traps.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index dce1bea..c72e7e9 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -840,6 +840,10 @@ static int emulate_string_inst(struct pt_regs *regs, u32 instword)
 		u8 val;
 		u32 shift = 8 * (3 - (pos & 0x3));

+		/* if process is 32-bit, clear upper 32 bits of EA */
+		if ((regs->msr & MSR_64BIT) == 0)
+			EA &= 0xffffffff;
+
 		switch ((instword & PPC_INST_STRING_MASK)) {
 		case PPC_INST_LSWX:
 		case PPC_INST_LSWI:
--
1.7.0.4