Re: [PATCH v2 7/9] powerpc/powernv: Add platform support for stop instruction

2016-05-19 Thread Paul Mackerras
On Tue, May 03, 2016 at 01:54:36PM +0530, Shreyas B. Prabhu wrote:
> POWER ISA v3 defines a new idle processor core mechanism. In summary,
>  a) new instruction named stop is added. This instruction replaces
>   instructions like nap, sleep, rvwinkle.
>  b) new per thread SPR named PSSCR is added which controls the behavior
>   of stop instruction.
> 
> PSSCR has following key fields
>   Bits 0:3  - Power-Saving Level Status. This field indicates the lowest
>   power-saving state the thread entered since stop instruction was last
>   executed.
> 
>   Bit 42 - Enable State Loss
>   0 - No state is lost irrespective of other fields
>   1 - Allows state loss
> 
>   Bits 44:47 - Power-Saving Level Limit
>   This limits the power-saving level that can be entered into.
> 
>   Bits 60:63 - Requested Level
>   Used to specify which power-saving level must be entered on executing
>   stop instruction
> 
> This patch adds support for stop instruction and PSSCR handling.
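
The field layout quoted above can be sketched as C masks. This is only an illustration, assuming the usual Power convention that bit 0 is the most significant bit of the 64-bit register; the SK_* macro names and sk_* helpers are made up for this sketch and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of a 64-bit register. */
#define SK_BIT(n)	(1ULL << (63 - (n)))
/* Mask covering bits first..last inclusive (first is the more significant end). */
#define SK_BITMASK(first, last) \
	((SK_BIT(first) - SK_BIT(last)) | SK_BIT(first))

/* Hypothetical names for the PSSCR fields listed in the patch description. */
#define SK_PSSCR_PLS	SK_BITMASK(0, 3)	/* Power-Saving Level Status */
#define SK_PSSCR_ESL	SK_BIT(42)		/* Enable State Loss */
#define SK_PSSCR_PSLL	SK_BITMASK(44, 47)	/* Power-Saving Level Limit */
#define SK_PSSCR_RL	SK_BITMASK(60, 63)	/* Requested Level */

/* Build a PSSCR value from a requested level, a level limit and the ESL flag. */
static uint64_t sk_psscr_build(unsigned int rl, unsigned int psll, int allow_loss)
{
	uint64_t v;

	assert(rl <= 0xF && psll <= 0xF);
	v = (uint64_t)rl;		/* bits 60:63 sit at shift 0 */
	v |= (uint64_t)psll << 16;	/* bits 44:47 sit at shift 63 - 47 = 16 */
	if (allow_loss)
		v |= SK_PSSCR_ESL;
	return v;
}

/* Extract the Power-Saving Level Status field (bits 0:3). */
static unsigned int sk_psscr_status(uint64_t psscr)
{
	return (unsigned int)((psscr & SK_PSSCR_PLS) >> 60);
}
```

With these definitions, requesting level 3 with limit 5 and state loss allowed gives 0x250003.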

I notice that you have duplicated a whole lot of assembly code
relating to synchronizing between threads going into and out of
power-saving modes, saving/restoring SPRs, resyncing the timebase, and
so on.

Two questions arise:

- Are we really going to have to do all of that in the same way for
  POWER9 as we did for POWER8?  You even copied over a comment about
  the fastsleep workaround, which I really hope we won't have to do on
  POWER9.  Also, on POWER9, the threads are much more independent, so
  I was not expecting that there would still be shared registers.

- If we do have to do all that, could we use the same code as on
  POWER8 rather than having another copy of all that code?

Paul.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 2/9] powerpc/kvm: make hypervisor state restore a function

2016-05-19 Thread Paul Mackerras
On Tue, May 03, 2016 at 01:54:31PM +0530, Shreyas B. Prabhu wrote:
> In the current code, when the thread wakes up in reset vector, some
> of the state restore code and check for whether a thread needs to
> branch to kvm is duplicated. Reorder the code such that this
> duplication is avoided.

This is a nice cleanup.  The one minor comment I have is that since
power7_restore_hyp_resource has some unusual entry requirements (such
as requiring cr3 to be set a certain way), those requirements should
be documented in the comment just above the function entry point.  I
didn't see any unusual exit conditions, but if there are any they
should be documented too.

Reviewed-by: Paul Mackerras 

Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking

2016-05-19 Thread Andy Lutomirski
On Thu, May 19, 2016 at 4:15 PM, Josh Poimboeuf  wrote:
> On Mon, May 02, 2016 at 08:52:41AM -0700, Andy Lutomirski wrote:
>> On Mon, May 2, 2016 at 6:52 AM, Josh Poimboeuf  wrote:
>> > On Fri, Apr 29, 2016 at 05:08:50PM -0700, Andy Lutomirski wrote:
>> >> On Apr 29, 2016 3:41 PM, "Josh Poimboeuf"  wrote:
>> >> >
>> >> > On Fri, Apr 29, 2016 at 02:37:41PM -0700, Andy Lutomirski wrote:
>> >> > > On Fri, Apr 29, 2016 at 2:25 PM, Josh Poimboeuf  wrote:
>> >> > > >> I suppose we could try to rejigger the code so that rbp points to
>> >> > > >> pt_regs or similar.
>> >> > > >
>> >> > > > I think we should avoid doing something like that because it would break
>> >> > > > gdb and all the other unwinders who don't know about it.
>> >> > >
>> >> > > How so?
>> >> > >
>> >> > > Currently, rbp in the entry code is meaningless.  I'm suggesting that,
>> >> > > when we do, for example, 'call \do_sym' in idtentry, we point rbp to
>> >> > > the pt_regs.  Currently it points to something stale (which the
>> >> > > dump_stack code might be relying on.  Hmm.)  But it's probably also
>> >> > > safe to assume that if you unwind to the 'call \do_sym', then pt_regs
>> >> > > is the next thing on the stack, so just doing the section thing would
>> >> > > work.
>> >> >
>> >> > Yes, rbp is meaningless on the entry from user space.  But if an
>> >> > in-kernel interrupt occurs (e.g. page fault, preemption) and you have
>> >> > nested entry, rbp keeps its old value, right?  So the unwinder can walk
>> >> > past the nested entry frame and keep going until it gets to the original
>> >> > entry.
>> >>
>> >> Yes.
>> >>
>> >> It would be nice if we could do better, though, and actually notice
>> >> the pt_regs and identify the entry.  For example, I'd love to see
>> >> "page fault, RIP=xyz" printed in the middle of a stack dump on a
>> >> crash.
>> >>
>> >> Also, I think that just following rbp links will lose the
>> >> actual function that took the page fault (or whatever function
>> >> pt_regs->ip actually points to).
>> >
>> > Hm.  I think we could fix all that in a more standard way.  Whenever a
>> > new pt_regs frame gets saved on entry, we could also create a new stack
>> > frame which points to a fake kernel_entry() function.  That would tell
>> > the unwinder there's a pt_regs frame without otherwise breaking frame
>> > pointers across the frame.
>> >
>> > Then I guess we wouldn't need my other solution of putting the idt
>> > entries in a special section.
>> >
>> > How does that sound?
>>
>> Let me try to understand.
>>
>> The normal call sequence is call; push %rbp; mov %rsp, %rbp.  So rbp
>> points to (prev rbp, prev rip) on the stack, and you can follow the
>> chain back.  Right now, on a user access page fault or similar, we
>> have rbp (probably) pointing to the interrupted frame, and the
>> interrupted rip isn't saved anywhere that a naive unwinder can find
>> it.  (It's in pt_regs, but the rbp chain skips right over that.)
>>
>> We could change the entry code so that an interrupt / idtentry does:
>>
>> push pt_regs
>> push kernel_entry
>> push %rbp
>> mov %rsp, %rbp
>> call handler
>> pop %rbp
>> addq $8, %rsp
>>
>> or similar.  That would make it appear that the actual C handler was
>> caused by a dummy function "kernel_entry".  Now the unwinder would get
>> to kernel_entry, but it *still* wouldn't find its way to the calling
>> frame, which only solves part of the problem.  We could at least teach
>> the unwinder how kernel_entry works and let it decode pt_regs to
>> continue unwinding.  This would be nice, and I think it could work.
>>
>> I think I like this, except that, if it used a separate section, it
>> could potentially be faster, as, for each actual entry type, the
>> offset from the C handler frame to pt_regs is a foregone conclusion.
>> But this is pretty simple and performance is already abysmal in most
>> handlers.
>>
>> There's an added benefit to using a separate section, though: we could
>> also annotate the calls with what type of entry they were so the
>> unwinder could print it out nicely.
>>
>> I could be convinced either way.
>
> Ok, I took a stab at this.  See the patch below.
>
> In addition to annotating interrupt/exception pt_regs frames, I also
> annotated all the syscall pt_regs, for consistency.
>
> As you mentioned, it will affect performance a bit, but I think it will
> be insignificant.
>
> I think I like this approach better than putting the
> interrupt/idtentry's in a special section, because this is much more
> precise.  Especially now that I'm annotating pt_regs syscalls.
>
> Also I think with a few minor changes we could implement your idea of
> annotating the calls with what type of entry they are.  But I don't
> think that's really needed, because the name of the interrupt/idtentry
> is already on the stack trace.
>
> Before:
>
>   [] dump_stack+0x85/0xc2
>   [] 

[PATCH] ps3_gelic: use kmemdup

2016-05-19 Thread Muhammad Falak R Wani
Use kmemdup when some other buffer is immediately copied into the
allocated region. This replaces an allocation followed by a memcpy with
a single call to kmemdup.
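
The pattern being replaced can be illustrated in userspace C. This is a minimal sketch, not the kernel implementation: sk_memdup is a hypothetical stand-in for kmemdup that uses malloc in place of the kernel allocator and takes no GFP flags.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Userspace stand-in for kmemdup(): allocate len bytes and copy src
 * into them in one step. Returns NULL if the allocation fails; the
 * caller releases the copy with free().
 */
static void *sk_memdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}
```

Since the copy overwrites the whole region, the zeroing done by the original kzalloc call was redundant, which is why the single call loses nothing.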

Signed-off-by: Muhammad Falak R Wani 
---
 drivers/net/ethernet/toshiba/ps3_gelic_wireless.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c b/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c
index 743b182..446ea58 100644
--- a/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c
+++ b/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c
@@ -1616,13 +1616,13 @@ static void gelic_wl_scan_complete_event(struct gelic_wl_info *wl)
target->valid = 1;
target->eurus_index = i;
kfree(target->hwinfo);
-   target->hwinfo = kzalloc(be16_to_cpu(scan_info->size),
+   target->hwinfo = kmemdup(scan_info,
+be16_to_cpu(scan_info->size),
 GFP_KERNEL);
if (!target->hwinfo)
continue;
 
/* copy hw scan info */
-   memcpy(target->hwinfo, scan_info, be16_to_cpu(scan_info->size));
target->essid_len = strnlen(scan_info->essid,
sizeof(scan_info->essid));
target->rate_len = 0;
-- 
1.9.1


Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking

2016-05-19 Thread Josh Poimboeuf
On Mon, May 02, 2016 at 08:52:41AM -0700, Andy Lutomirski wrote:
> On Mon, May 2, 2016 at 6:52 AM, Josh Poimboeuf  wrote:
> > On Fri, Apr 29, 2016 at 05:08:50PM -0700, Andy Lutomirski wrote:
> >> On Apr 29, 2016 3:41 PM, "Josh Poimboeuf"  wrote:
> >> >
> >> > On Fri, Apr 29, 2016 at 02:37:41PM -0700, Andy Lutomirski wrote:
> >> > > On Fri, Apr 29, 2016 at 2:25 PM, Josh Poimboeuf  wrote:
> >> > > >> I suppose we could try to rejigger the code so that rbp points to
> >> > > >> pt_regs or similar.
> >> > > >
> >> > > > I think we should avoid doing something like that because it would break
> >> > > > gdb and all the other unwinders who don't know about it.
> >> > >
> >> > > How so?
> >> > >
> >> > > Currently, rbp in the entry code is meaningless.  I'm suggesting that,
> >> > > when we do, for example, 'call \do_sym' in idtentry, we point rbp to
> >> > > the pt_regs.  Currently it points to something stale (which the
> >> > > dump_stack code might be relying on.  Hmm.)  But it's probably also
> >> > > safe to assume that if you unwind to the 'call \do_sym', then pt_regs
> >> > > is the next thing on the stack, so just doing the section thing would
> >> > > work.
> >> >
> >> > Yes, rbp is meaningless on the entry from user space.  But if an
> >> > in-kernel interrupt occurs (e.g. page fault, preemption) and you have
> >> > nested entry, rbp keeps its old value, right?  So the unwinder can walk
> >> > past the nested entry frame and keep going until it gets to the original
> >> > entry.
> >>
> >> Yes.
> >>
> >> It would be nice if we could do better, though, and actually notice
> >> the pt_regs and identify the entry.  For example, I'd love to see
> >> "page fault, RIP=xyz" printed in the middle of a stack dump on a
> >> crash.
> >>
> >> Also, I think that just following rbp links will lose the
> >> actual function that took the page fault (or whatever function
> >> pt_regs->ip actually points to).
> >
> > Hm.  I think we could fix all that in a more standard way.  Whenever a
> > new pt_regs frame gets saved on entry, we could also create a new stack
> > frame which points to a fake kernel_entry() function.  That would tell
> > the unwinder there's a pt_regs frame without otherwise breaking frame
> > pointers across the frame.
> >
> > Then I guess we wouldn't need my other solution of putting the idt
> > entries in a special section.
> >
> > How does that sound?
> 
> Let me try to understand.
> 
> The normal call sequence is call; push %rbp; mov %rsp, %rbp.  So rbp
> points to (prev rbp, prev rip) on the stack, and you can follow the
> chain back.  Right now, on a user access page fault or similar, we
> have rbp (probably) pointing to the interrupted frame, and the
> interrupted rip isn't saved anywhere that a naive unwinder can find
> it.  (It's in pt_regs, but the rbp chain skips right over that.)
> 
> We could change the entry code so that an interrupt / idtentry does:
> 
> push pt_regs
> push kernel_entry
> push %rbp
> mov %rsp, %rbp
> call handler
> pop %rbp
> addq $8, %rsp
> 
> or similar.  That would make it appear that the actual C handler was
> caused by a dummy function "kernel_entry".  Now the unwinder would get
> to kernel_entry, but it *still* wouldn't find its way to the calling
> frame, which only solves part of the problem.  We could at least teach
> the unwinder how kernel_entry works and let it decode pt_regs to
> continue unwinding.  This would be nice, and I think it could work.
> 
> I think I like this, except that, if it used a separate section, it
> could potentially be faster, as, for each actual entry type, the
> offset from the C handler frame to pt_regs is a foregone conclusion.
> But this is pretty simple and performance is already abysmal in most
> handlers.
> 
> There's an added benefit to using a separate section, though: we could
> also annotate the calls with what type of entry they were so the
> unwinder could print it out nicely.
> 
> I could be convinced either way.
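
The frame-pointer walk and the proposed kernel_entry marker discussed above can be modelled in plain C. This is only a sketch under stated assumptions: the stack is simulated with ordinary structs, SK_KERNEL_ENTRY is a made-up marker value standing in for the address of the dummy kernel_entry function, and struct sk_pt_regs keeps just the two fields the walk needs. None of these names come from the actual patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker standing in for the address of kernel_entry. */
#define SK_KERNEL_ENTRY	((uintptr_t)0xdeadbeef)

/* Layout left by "push %rbp; mov %rsp, %rbp": saved rbp, then return address. */
struct sk_frame {
	const struct sk_frame *prev;	/* saved rbp of the caller */
	uintptr_t ret;			/* return address */
};

/* Simplified stand-in for pt_regs: only what the unwinder reads. */
struct sk_pt_regs {
	uintptr_t bp;	/* rbp of the interrupted code */
	uintptr_t ip;	/* rip of the interrupted code */
};

/* Entry frame: (saved rbp, kernel_entry) followed directly by pt_regs. */
struct sk_entry_frame {
	struct sk_frame base;
	struct sk_pt_regs regs;
};

/*
 * Walk the rbp chain, recording one address per frame. When a frame's
 * return address is the kernel_entry marker, decode the pt_regs behind
 * it: record the interrupted ip and continue from the interrupted rbp.
 */
static size_t sk_unwind(const struct sk_frame *fp, uintptr_t *out, size_t max)
{
	size_t n = 0;

	while (fp && n < max) {
		if (fp->ret == SK_KERNEL_ENTRY) {
			const struct sk_entry_frame *ef =
				(const struct sk_entry_frame *)fp;

			out[n++] = ef->regs.ip;
			fp = (const struct sk_frame *)ef->regs.bp;
			continue;
		}
		out[n++] = fp->ret;
		fp = fp->prev;
	}
	return n;
}
```

Without the marker check, the walk would follow the saved rbp straight past the entry frame and never report the interrupted ip, which is the loss described in the quoted text.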

Ok, I took a stab at this.  See the patch below.

In addition to annotating interrupt/exception pt_regs frames, I also
annotated all the syscall pt_regs, for consistency.

As you mentioned, it will affect performance a bit, but I think it will
be insignificant.

I think I like this approach better than putting the
interrupt/idtentry's in a special section, because this is much more
precise.  Especially now that I'm annotating pt_regs syscalls.

Also I think with a few minor changes we could implement your idea of
annotating the calls with what type of entry they are.  But I don't
think that's really needed, because the name of the interrupt/idtentry
is already on the stack trace.

Before:

  [] dump_stack+0x85/0xc2
  [] __do_page_fault+0x576/0x5a0
  [] trace_do_page_fault+0x5c/0x2e0
  [] do_async_page_fault+0x2c/0xa0
  [] async_page_fault+0x28/0x30
  [] ? copy_page_to_iter+0x70/0x440
  [] ? pagecache_get_page+0x2c/0x290
  [] 

Re: [PATCH 3/4] rcutorture: Make -soundhw a x86 specific option

2016-05-19 Thread Josh Triplett
On Thu, May 19, 2016 at 12:38:47PM -0700, Paul E. McKenney wrote:
> On Thu, May 19, 2016 at 09:23:39AM -0700, Paul E. McKenney wrote:
> > On Thu, May 19, 2016 at 08:40:42AM -0700, Josh Triplett wrote:
> > > On Thu, May 19, 2016 at 07:10:13AM -0700, Paul E. McKenney wrote:
> > > > On Wed, May 18, 2016 at 09:23:10PM -0700, Josh Triplett wrote:
> > > > > On Thu, May 19, 2016 at 11:42:23AM +0800, Boqun Feng wrote:
> > > > > > > The option "-soundhw pcspk" gives me an error on PPC as follows:
> > > > > > > 
> > > > > > > qemu-system-ppc64: ISA bus not available for pcspk
> > > > > > > 
> > > > > > > , which means this option doesn't work on ppc by default. So simply make
> > > > > > > this an x86-specific option via identify_qemu_args().
> > > > > > 
> > > > > > Signed-off-by: Boqun Feng 
> > > > > 
> > > > > > The emulated system for RCU testing does not need sound hardware at all.
> > > > > Paul added this option in commit
> > > > > 16c77ea7d0f4a74e49009aa2d26c275f7f93de7c to disable the default sound
> > > > > hardware, saying that '"-soundhw pcspk" makes the script a bit less
> > > > > dependent on odd audio libraries being installed'.  Unfortunately, it
> > > > > looks like there isn't a "-soundhw none".  As far as I can tell,
> > > > > > currently the only way to completely eliminate sound hardware is to pass
> > > > > "-nodefaults" and then explicitly specify each desired device; while
> > > > > that would solve the issue, it would likely introduce *more*
> > > > > hardware-specific command-line options...
> > > > > 
> > > > > I've filed two feature requests on upstream qemu to make this simpler:
> > > > > https://bugs.launchpad.net/qemu/+bug/1583420 and
> > > > > https://bugs.launchpad.net/qemu/+bug/1583421 .
> > > > > 
> > > > > > Paul, what did you mean by "dependent on odd audio libraries"?  Did you
> > > > > mean in the guest or the host?  And either way, is this something that
> > > > > could potentially be solved another way?
> > > > 
> > > > If I remember correctly, Ubuntu 14.04 qemu refused to run the guest
> > > > without this option, but I don't recall the exact error message.
> > > > I chalked it up to my ignorance of qemu, but I would very much welcome
> > > > some way to not have to specify irrelevant hardware.  So thank you very
> > > > much for filing the bugs!
> > > 
> > > According to qemu upstream, qemu doesn't enable any sound hardware by
> > > default, so I can't think of any obvious reason why adding "-soundhw
> > > > pcspk" would make the rcutorture VM boot.  Did qemu refuse to run at
> > > all, or did the VM start but fail during the boot process?
> > > 
> > > Could you check if you can currently run without this option?  If so,
> > > perhaps we should just drop it for now.
> > 
> > Will do!  As soon as the current test completes.
> 
> And it now works just fine without the "-soundhw pcspk".  Search me!

In that case, can you replace the patch in this series making "-soundhw
pcspk" target-specific with one removing "-soundhw pcspk"?

- Josh Triplett

Re: [PATCH 3/4] rcutorture: Make -soundhw a x86 specific option

2016-05-19 Thread Paul E. McKenney
On Thu, May 19, 2016 at 09:23:39AM -0700, Paul E. McKenney wrote:
> On Thu, May 19, 2016 at 08:40:42AM -0700, Josh Triplett wrote:
> > On Thu, May 19, 2016 at 07:10:13AM -0700, Paul E. McKenney wrote:
> > > On Wed, May 18, 2016 at 09:23:10PM -0700, Josh Triplett wrote:
> > > > On Thu, May 19, 2016 at 11:42:23AM +0800, Boqun Feng wrote:
> > > > > > The option "-soundhw pcspk" gives me an error on PPC as follows:
> > > > > > 
> > > > > > qemu-system-ppc64: ISA bus not available for pcspk
> > > > > > 
> > > > > > , which means this option doesn't work on ppc by default. So simply make
> > > > > > this an x86-specific option via identify_qemu_args().
> > > > > 
> > > > > Signed-off-by: Boqun Feng 
> > > > 
> > > > The emulated system for RCU testing does not need sound hardware at all.
> > > > Paul added this option in commit
> > > > 16c77ea7d0f4a74e49009aa2d26c275f7f93de7c to disable the default sound
> > > > hardware, saying that '"-soundhw pcspk" makes the script a bit less
> > > > dependent on odd audio libraries being installed'.  Unfortunately, it
> > > > looks like there isn't a "-soundhw none".  As far as I can tell,
> > > > currently the only way to completely eliminate sound hardware is to pass
> > > > "-nodefaults" and then explicitly specify each desired device; while
> > > > that would solve the issue, it would likely introduce *more*
> > > > hardware-specific command-line options...
> > > > 
> > > > I've filed two feature requests on upstream qemu to make this simpler:
> > > > https://bugs.launchpad.net/qemu/+bug/1583420 and
> > > > https://bugs.launchpad.net/qemu/+bug/1583421 .
> > > > 
> > > > Paul, what did you mean by "dependent on odd audio libraries"?  Did you
> > > > mean in the guest or the host?  And either way, is this something that
> > > > could potentially be solved another way?
> > > 
> > > If I remember correctly, Ubuntu 14.04 qemu refused to run the guest
> > > without this option, but I don't recall the exact error message.
> > > I chalked it up to my ignorance of qemu, but I would very much welcome
> > > some way to not have to specify irrelevant hardware.  So thank you very
> > > much for filing the bugs!
> > 
> > According to qemu upstream, qemu doesn't enable any sound hardware by
> > default, so I can't think of any obvious reason why adding "-soundhw
> > > pcspk" would make the rcutorture VM boot.  Did qemu refuse to run at
> > all, or did the VM start but fail during the boot process?
> > 
> > Could you check if you can currently run without this option?  If so,
> > perhaps we should just drop it for now.
> 
> Will do!  As soon as the current test completes.

And it now works just fine without the "-soundhw pcspk".  Search me!

> BTW, am I the only one getting "interesting" failures in the merge
> window?

I will be chasing these down, but am likely to be off the grid until
Monday morning, Pacific time.  Looks like the same failure to awaken
as before, but much higher probability.

Thanx, Paul


[PATCH] powerpc: Improve comment explaining why we modify VRSAVE

2016-05-19 Thread Anton Blanchard via Linuxppc-dev
The comment explaining why we modify VRSAVE is misleading: glibc
does rely on this behaviour. Update the comment.
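
The behaviour the new comment describes can be modelled in a few lines of C. This is a sketch only: sk_fixup_vrsave is a hypothetical helper, and it assumes that the code following the compare shown in the hunk rewrites VRSAVE only when it reads as zero.

```c
#include <stdint.h>

/*
 * Model of the fix-up applied on an AltiVec unavailable exception.
 * A zero VRSAVE would make a boolean "is AltiVec in use?" check read
 * false, so zero is replaced with all 1s and any other value is kept.
 */
static uint32_t sk_fixup_vrsave(uint32_t vrsave)
{
	return vrsave ? vrsave : 0xffffffffu;
}
```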

Signed-off-by: Anton Blanchard 
---

diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 1c2e7a3..3907fcf 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -70,10 +70,11 @@ _GLOBAL(load_up_altivec)
MTMSRD(r5)  /* enable use of AltiVec now */
isync
 
-   /* Hack: if we get an altivec unavailable trap with VRSAVE
-* set to all zeros, we assume this is a broken application
-* that fails to set it properly, and thus we switch it to
-* all 1's
+   /*
+* While userspace in general ignores VRSAVE, glibc uses it as a
+* boolean to optimise userspace context save/restore. Whenever we
+* take an altivec unavailable exception we must set VRSAVE to
+* something non zero. Set it to all 1s.
 */
mfspr   r4,SPRN_VRSAVE
cmpwi   0,r4,0

[RFC PATCH 2/2] powerpc/mm: Support segment table for Power9

2016-05-19 Thread Aneesh Kumar K.V
PowerISA 3.0 adds an in-memory table for storing segment translation
information. This mode is enabled by setting both the HOST RADIX and
GUEST RADIX bits in the partition table to 0 and setting UPRT to 1.
In this mode we have a per-process segment table. The segment table
details are stored in the process table, indexed by PID value.

Segment table mode also requires us to map the process table at the
beginning of a 1TB segment.

On the Linux kernel side we enable this mode if we find that radix is
explicitly disabled, that is, the ibm,pa-features radix bit (byte 40
bit 0) is set to 0. If the size of the ibm,pa-features property is less
than 40 bytes, we enable the legacy HPT mode using SLB. If the radix bit
is set to 1, we use the radix mode.
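
The three-way selection described above can be sketched as a small C function. This is an illustration under assumptions, not the kernel code: the sk_* names are hypothetical, "bit 0" is taken to mean the most significant bit of the byte, and the property is treated as too short whenever it does not actually contain byte index 40, so that the sketch never reads out of bounds.

```c
#include <stddef.h>
#include <stdint.h>

enum sk_mmu_mode { SK_MMU_HPT_SLB, SK_MMU_SEG_TABLE, SK_MMU_RADIX };

#define SK_RADIX_BYTE	40	/* byte index named in the description */
#define SK_RADIX_BIT	0x80	/* assumption: bit 0 is the MSB of that byte */

/* Pick an MMU mode from the raw bytes of the pa-features property. */
static enum sk_mmu_mode sk_select_mmu(const uint8_t *pa_features, size_t len)
{
	/* Property missing or too short to carry the radix byte: legacy HPT. */
	if (!pa_features || len <= SK_RADIX_BYTE)
		return SK_MMU_HPT_SLB;
	/* Radix bit set: use radix mode. */
	if (pa_features[SK_RADIX_BYTE] & SK_RADIX_BIT)
		return SK_MMU_RADIX;
	/* Radix bit present and explicitly 0: use the segment table. */
	return SK_MMU_SEG_TABLE;
}
```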

With respect to SLB mapping, we bolt-map the entire kernel range and
only handle user space segment faults.

We also have access to 4 SLB registers in software, so we continue to use
3 of them for bolted kernel SLB entries, as we do currently.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h |  10 +
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  17 ++
 arch/powerpc/include/asm/book3s/64/mmu.h  |   5 +
 arch/powerpc/include/asm/mmu.h|   6 +-
 arch/powerpc/include/asm/mmu_context.h|   5 +-
 arch/powerpc/kernel/prom.c|   1 +
 arch/powerpc/mm/hash_utils_64.c   |  84 ++-
 arch/powerpc/mm/mmu_context_book3s64.c|  32 ++-
 arch/powerpc/mm/slb.c | 327 +-
 9 files changed, 470 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index f61cad3de4e6..5f0deeda7884 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -58,6 +58,16 @@
 #define H_VMALLOC_END  (H_VMALLOC_START + H_VMALLOC_SIZE)
 
 /*
+ * Process table with ISA 3.0 need to be mapped at the beginning of a 1TB segment
+ * We put that in the top of VMALLOC region. For each region we can go upto 64TB
+ * for now. Hence we have space to put process table there. We should not get
+ * an SLB miss for this address, because the VSID for this is placed in the
+ * partition table.
+ */
+#define H_SEG_PROC_TBL_START   ASM_CONST(0xD0002000)
+#define H_SEG_PROC_TBL_END ASM_CONST(0xD00020ff)
+
+/*
  * Region IDs
  */
 #define REGION_SHIFT   60UL
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index a5fa6be7d5ae..75016f8cbd51 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -101,6 +101,18 @@
 #define HPTE_V_1TB_SEG ASM_CONST(0x4000)
 #define HPTE_V_VRMA_MASK   ASM_CONST(0x4001ff00)
 
+/* segment table entry masks/bits */
+/* Upper 64 bit */
+#define STE_VALID  ASM_CONST(0x800)
+/*
+ * lower 64 bit
+ * 64th bit become 0 bit
+ */
+/*
+ * Software defined bolted bit
+ */
+#define STE_BOLTED ASM_CONST(0x1)
+
 /* Values for PP (assumes Ks=0, Kp=1) */
 #define PP_RWXX0   /* Supervisor read/write, User none */
 #define PP_RWRX 1  /* Supervisor read/write, User read */
@@ -128,6 +140,11 @@ struct hash_pte {
__be64 r;
 };
 
+struct seg_entry {
+   __be64 ste_e;
+   __be64 ste_v;
+};
+
 extern struct hash_pte *htab_address;
 extern unsigned long htab_size_bytes;
 extern unsigned long htab_hash_mask;
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index c6b1ff795632..b7464bc013c9 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -60,7 +60,9 @@ extern struct patb_entry *partition_tb;
  * Power9 currently only support 64K partition table size.
  */
 #define PATB_SIZE_SHIFT16
+#define SEGTB_SIZE_SHIFT   PAGE_SHIFT
 
+extern unsigned long segment_table_initialize(struct prtb_entry *prtb);
 typedef unsigned long mm_context_id_t;
 struct spinlock;
 
@@ -90,6 +92,9 @@ typedef struct {
 #ifdef CONFIG_SPAPR_TCE_IOMMU
struct list_head iommu_group_mem_list;
 #endif
+   unsigned long seg_table;
+   struct spinlock *seg_tbl_lock;
+
 } mm_context_t;
 
 /*
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 4ad66a547d4c..4c58b470f9c9 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -24,6 +24,10 @@
  * Radix page table available
  */
 #define MMU_FTR_TYPE_RADIX ASM_CONST(0x0040)
+
+/* Seg table only supported for book3s 64 */
+#define MMU_FTR_TYPE_SEG_TABLE ASM_CONST(0x0080)
+
 /*
  * individual features
  */
@@ -124,7 +128,7 @@ enum {
MMU_FTR_LOCKLESS_TLBIE | MMU_FTR_CI_LARGE_PAGE |
MMU_FTR_1T_SEGMENT |
 #ifdef CONFIG_PPC_RADIX_MMU
-   MMU_FTR_TYPE_RADIX |
+

[RFC PATCH 1/2] powerpc/mm: Switch user slb fault handling to translation enabled

2016-05-19 Thread Aneesh Kumar K.V
We also handle the fault with a proper stack initialized. This enables us
to call out to C in the fault handling routines. We don't do this for
kernel mappings, because of the possibility of taking a recursive fault if
the kernel stack is not yet mapped by an SLB entry.

This enables us to handle Power9 SLB faults better. We will add bolted
entries for the entire kernel mapping in the segment table; for user SLB
entries we take a fault and insert them on demand. With translation on, we
should be able to access the segment table from the fault handler.
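
The user/kernel split described above comes down to a sign test on the faulting address, which the patch does with "cmpdi r3,0" followed by "bge". A minimal C sketch of that decision, with hypothetical sk_* names that do not appear in the patch:

```c
#include <stdint.h>

enum sk_slb_path {
	SK_SLB_KERNEL_REALMODE,	/* stay on the existing real-mode miss path */
	SK_SLB_USER_RELON	/* switch translation on and call into C */
};

/*
 * Kernel effective addresses (for example the 0xc... linear map and the
 * 0xd... vmalloc region) have the top bit set, so they read as negative
 * when treated as signed 64-bit values. Testing the top bit is the same
 * check as the signed compare against zero in the assembly.
 */
static enum sk_slb_path sk_slb_miss_path(uint64_t ea)
{
	return (ea >> 63) ? SK_SLB_KERNEL_REALMODE : SK_SLB_USER_RELON;
}
```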

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/kernel/exceptions-64s.S | 55 
 arch/powerpc/mm/slb.c| 11 
 2 files changed, 61 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index f2bd375b9a4e..2f2c52559ea9 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -794,7 +794,7 @@ data_access_slb_relon_pSeries:
mfspr   r3,SPRN_DAR
mfspr   r12,SPRN_SRR1
 #ifndef CONFIG_RELOCATABLE
-   b   slb_miss_realmode
+   b   handle_slb_miss_relon
 #else
/*
 * We can't just use a direct branch to slb_miss_realmode
@@ -803,7 +803,7 @@ data_access_slb_relon_pSeries:
 */
mfctr   r11
ld  r10,PACAKBASE(r13)
-   LOAD_HANDLER(r10, slb_miss_realmode)
+   LOAD_HANDLER(r10, handle_slb_miss_relon)
mtctr   r10
bctr
 #endif
@@ -819,11 +819,11 @@ instruction_access_slb_relon_pSeries:
mfspr   r3,SPRN_SRR0/* SRR0 is faulting address */
mfspr   r12,SPRN_SRR1
 #ifndef CONFIG_RELOCATABLE
-   b   slb_miss_realmode
+   b   handle_slb_miss_relon
 #else
mfctr   r11
ld  r10,PACAKBASE(r13)
-   LOAD_HANDLER(r10, slb_miss_realmode)
+   LOAD_HANDLER(r10, handle_slb_miss_relon)
mtctr   r10
bctr
 #endif
@@ -961,7 +961,23 @@ h_data_storage_common:
bl  unknown_exception
b   ret_from_except
 
+/* r3 point to DAR */
.align  7
+   .globl slb_miss_user
+slb_miss_user:
+   std r3,PACA_EXSLB+EX_DAR(r13)
+   /* Restore r3 as expected by PROLOG_COMMON below */
+   ld  r3,PACA_EXSLB+EX_R3(r13)
+   EXCEPTION_PROLOG_COMMON(0x380, PACA_EXSLB)
+   RECONCILE_IRQ_STATE(r10, r11)
+   ld  r4,PACA_EXSLB+EX_DAR(r13)
+   li  r5,0x380
+   std r4,_DAR(r1)
+   addi    r3,r1,STACK_FRAME_OVERHEAD
+   bl  handle_slb_miss
+   b   ret_from_except_lite
+
+.align 7
.globl instruction_access_common
 instruction_access_common:
EXCEPTION_PROLOG_COMMON(0x400, PACA_EXGEN)
@@ -1379,11 +1395,17 @@ unrecover_mce:
  * We assume we aren't going to take any exceptions during this procedure.
  */
 slb_miss_realmode:
-   mflr    r10
 #ifdef CONFIG_RELOCATABLE
mtctr   r11
 #endif
+   /*
+* Handle user slb miss with translation enabled
+*/
+   cmpdi   r3,0
+   bge 3f
 
+slb_miss_kernel:
+   mflr    r10
stw r9,PACA_EXSLB+EX_CCR(r13)   /* save CR in exc. frame */
std r10,PACA_EXSLB+EX_LR(r13)   /* save LR */
 
@@ -1428,6 +1450,29 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_RADIX)
mtspr   SPRN_SRR1,r10
rfid
b   .
+3:
+   /*
+* Enable IR/DR and handle the fault
+*/
+   EXCEPTION_PROLOG_PSERIES_1(slb_miss_user, EXC_STD)
+   /*
+* handler with relocation on
+*/
+handle_slb_miss_relon:
+#ifdef CONFIG_RELOCATABLE
+   mtctr   r11
+#endif
+   /*
+* Handle user slb miss with stack initialized.
+*/
+   cmpdi   r3,0
+   bge 4f
+   /*
+* go back to slb_miss_realmode
+*/
+   b   slb_miss_kernel
+4:
+   EXCEPTION_RELON_PROLOG_PSERIES_1(slb_miss_user, EXC_STD)
 
 unrecov_slb:
EXCEPTION_PROLOG_COMMON(0x4100, PACA_EXSLB)
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 48fc28bab544..b18d7df5601d 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -25,6 +25,8 @@
 #include 
 #include 
 
+#include 
+
 enum slb_index {
LINEAR_INDEX= 0, /* Kernel linear map  (0xc000) */
VMALLOC_INDEX   = 1, /* Kernel virtual map (0xd000) */
@@ -346,3 +348,12 @@ void slb_initialize(void)
 
asm volatile("isync":::"memory");
 }
+
+void handle_slb_miss(struct pt_regs *regs,
+unsigned long address, unsigned long trap)
+{
+   enum ctx_state prev_state = exception_enter();
+
+   slb_allocate(address);
+   exception_exit(prev_state);
+}
-- 
2.7.4


Re: [PATCH 3/4] rcutorture: Make -soundhw a x86 specific option

2016-05-19 Thread Paul E. McKenney
On Thu, May 19, 2016 at 08:40:42AM -0700, Josh Triplett wrote:
> On Thu, May 19, 2016 at 07:10:13AM -0700, Paul E. McKenney wrote:
> > On Wed, May 18, 2016 at 09:23:10PM -0700, Josh Triplett wrote:
> > > On Thu, May 19, 2016 at 11:42:23AM +0800, Boqun Feng wrote:
> > > > The option "-soundhw pcspk" gives me an error on PPC as follows:
> > > > 
> > > > qemu-system-ppc64: ISA bus not available for pcspk
> > > > 
> > > > , which means this option doesn't work on ppc by default. So simply make
> > > > this an x86-specific option via identify_qemu_args().
> > > > 
> > > > Signed-off-by: Boqun Feng 
> > > 
> > > The emulated system for RCU testing does not need sound hardware at all.
> > > Paul added this option in commit
> > > 16c77ea7d0f4a74e49009aa2d26c275f7f93de7c to disable the default sound
> > > hardware, saying that '"-soundhw pcspk" makes the script a bit less
> > > dependent on odd audio libraries being installed'.  Unfortunately, it
> > > looks like there isn't a "-soundhw none".  As far as I can tell,
> > > currently the only way to completely eliminate sound hardware is to pass
> > > "-nodefaults" and then explicitly specify each desired device; while
> > > that would solve the issue, it would likely introduce *more*
> > > hardware-specific command-line options...
> > > 
> > > I've filed two feature requests on upstream qemu to make this simpler:
> > > https://bugs.launchpad.net/qemu/+bug/1583420 and
> > > https://bugs.launchpad.net/qemu/+bug/1583421 .
> > > 
> > > Paul, what did you mean by "dependent on odd audio libraries"?  Did you
> > > mean in the guest or the host?  And either way, is this something that
> > > could potentially be solved another way?
> > 
> > If I remember correctly, Ubuntu 14.04 qemu refused to run the guest
> > without this option, but I don't recall the exact error message.
> > I chalked it up to my ignorance of qemu, but I would very much welcome
> > some way to not have to specify irrelevant hardware.  So thank you very
> > much for filing the bugs!
> 
> According to qemu upstream, qemu doesn't enable any sound hardware by
> default, so I can't think of any obvious reason why adding "-soundhw
> pcspk" would make the rcutorture VM boot.  Did qemu refuse to run at
> all, or did the VM start but fail during the boot process?
> 
> Could you check if you can currently run without this option?  If so,
> perhaps we should just drop it for now.

Will do!  As soon as the current test completes.

BTW, am I the only one getting "interesting" failures in the merge
window?

Thanx, Paul


Re: [PATCH 3/4] rcutorture: Make -soundhw a x86 specific option

2016-05-19 Thread Josh Triplett
On Thu, May 19, 2016 at 07:10:13AM -0700, Paul E. McKenney wrote:
> On Wed, May 18, 2016 at 09:23:10PM -0700, Josh Triplett wrote:
> > On Thu, May 19, 2016 at 11:42:23AM +0800, Boqun Feng wrote:
> > > The option "-soundhw pcspk" gives me an error on PPC as follows:
> > > 
> > > qemu-system-ppc64: ISA bus not available for pcspk
> > > 
> > > , which means this option doesn't work on ppc by default. So simply make
> > > this an x86-specific option via identify_qemu_args().
> > > 
> > > Signed-off-by: Boqun Feng 
> > 
> > The emulated system for RCU testing does not need sound hardware at all.
> > Paul added this option in commit
> > 16c77ea7d0f4a74e49009aa2d26c275f7f93de7c to disable the default sound
> > hardware, saying that '"-soundhw pcspk" makes the script a bit less
> > dependent on odd audio libraries being installed'.  Unfortunately, it
> > looks like there isn't a "-soundhw none".  As far as I can tell,
> > currently the only way to completely eliminate sound hardware is to pass
> > "-nodefaults" and then explicitly specify each desired device; while
> > that would solve the issue, it would likely introduce *more*
> > hardware-specific command-line options...
> > 
> > I've filed two feature requests on upstream qemu to make this simpler:
> > https://bugs.launchpad.net/qemu/+bug/1583420 and
> > https://bugs.launchpad.net/qemu/+bug/1583421 .
> > 
> > Paul, what did you mean by "dependent on odd audio libraries"?  Did you
> > mean in the guest or the host?  And either way, is this something that
> > could potentially be solved another way?
> 
> If I remember correctly, Ubuntu 14.04 qemu refused to run the guest
> without this option, but I don't recall the exact error message.
> I chalked it up to my ignorance of qemu, but I would very much welcome
> some way to not have to specify irrelevant hardware.  So thank you very
> much for filing the bugs!

According to qemu upstream, qemu doesn't enable any sound hardware by
default, so I can't think of any obvious reason why adding "-soundhw
pcspk" would make the rcutorture VM boot.  Did qemu refuse to run at
all, or did the VM start but fail during the boot process?

Could you check if you can currently run without this option?  If so,
perhaps we should just drop it for now.

- Josh Triplett

[PATCH v3 4/8] powerpc: add io{read,write}64 accessors

2016-05-19 Thread Horia Geantă
This will allow device drivers to consistently use io{read,write}XX
also for 64-bit accesses.

Acked-by: Michael Ellerman 
Signed-off-by: Horia Geantă 
---
 arch/powerpc/kernel/iomap.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/arch/powerpc/kernel/iomap.c b/arch/powerpc/kernel/iomap.c
index 12e48d56f771..3963f0b68d52 100644
--- a/arch/powerpc/kernel/iomap.c
+++ b/arch/powerpc/kernel/iomap.c
@@ -38,6 +38,18 @@ EXPORT_SYMBOL(ioread16);
 EXPORT_SYMBOL(ioread16be);
 EXPORT_SYMBOL(ioread32);
 EXPORT_SYMBOL(ioread32be);
+#ifdef __powerpc64__
+u64 ioread64(void __iomem *addr)
+{
+   return readq(addr);
+}
+u64 ioread64be(void __iomem *addr)
+{
+   return readq_be(addr);
+}
+EXPORT_SYMBOL(ioread64);
+EXPORT_SYMBOL(ioread64be);
+#endif /* __powerpc64__ */
 
 void iowrite8(u8 val, void __iomem *addr)
 {
@@ -64,6 +76,18 @@ EXPORT_SYMBOL(iowrite16);
 EXPORT_SYMBOL(iowrite16be);
 EXPORT_SYMBOL(iowrite32);
 EXPORT_SYMBOL(iowrite32be);
+#ifdef __powerpc64__
+void iowrite64(u64 val, void __iomem *addr)
+{
+   writeq(val, addr);
+}
+void iowrite64be(u64 val, void __iomem *addr)
+{
+   writeq_be(val, addr);
+}
+EXPORT_SYMBOL(iowrite64);
+EXPORT_SYMBOL(iowrite64be);
+#endif /* __powerpc64__ */
 
 /*
  * These are the "repeat read/write" functions. Note the
-- 
2.4.4
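
For readers trying this outside the kernel, here is a minimal userspace
sketch of the byte order the two read accessors are expected to produce.
mock_ioread64() and mock_ioread64be() are hypothetical stand-ins that read
from an ordinary byte buffer; they are not the kernel functions and ignore
barriers and the __iomem annotation. The sketch assumes ioread64() follows
the little-endian convention of readq() and ioread64be() the big-endian
convention of readq_be().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for ioread64(): assemble 8 bytes from a plain
 * buffer as a little-endian 64-bit value (byte 0 is least significant).
 */
static uint64_t mock_ioread64(const volatile uint8_t *addr)
{
	uint64_t val = 0;

	assert(addr != NULL);
	for (int i = 7; i >= 0; i--)
		val = (val << 8) | addr[i];
	return val;
}

/*
 * Hypothetical stand-in for ioread64be(): same bytes, but byte 0 is the
 * most significant one.
 */
static uint64_t mock_ioread64be(const volatile uint8_t *addr)
{
	uint64_t val = 0;

	assert(addr != NULL);
	for (int i = 0; i < 8; i++)
		val = (val << 8) | addr[i];
	return val;
}
```

A driver that talks to a big-endian device register would pick the "be"
variant, which gives the same result on either host endianness.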


[RFC PATCH v2 3/3] arch/powerpc : Enable optprobes support in powerpc

2016-05-19 Thread Anju T
Signed-off-by: Anju T 
---
 Documentation/features/debug/optprobes/arch-support.txt | 2 +-
 arch/powerpc/Kconfig| 1 +
 arch/powerpc/kernel/Makefile| 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/features/debug/optprobes/arch-support.txt b/Documentation/features/debug/optprobes/arch-support.txt
index b8999d8..45bc99d 100644
--- a/Documentation/features/debug/optprobes/arch-support.txt
+++ b/Documentation/features/debug/optprobes/arch-support.txt
@@ -27,7 +27,7 @@
 |   nios2: | TODO |
 |openrisc: | TODO |
 |  parisc: | TODO |
-| powerpc: | TODO |
+| powerpc: |  ok  |
 |s390: | TODO |
 |   score: | TODO |
 |  sh: | TODO |
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 7cd32c0..a87c9b1 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -104,6 +104,7 @@ config PPC
select HAVE_IOREMAP_PROT
select HAVE_EFFICIENT_UNALIGNED_ACCESS if !CPU_LITTLE_ENDIAN
select HAVE_KPROBES
+   select HAVE_OPTPROBES
select HAVE_ARCH_KGDB
select HAVE_KRETPROBES
select HAVE_ARCH_TRACEHOOK
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2da380f..7994e22 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -99,6 +99,7 @@ endif
 obj-$(CONFIG_BOOTX_TEXT)   += btext.o
 obj-$(CONFIG_SMP)  += smp.o
 obj-$(CONFIG_KPROBES)  += kprobes.o
+obj-$(CONFIG_OPTPROBES)    += optprobes.o optprobes_head.o
 obj-$(CONFIG_UPROBES)  += uprobes.o
 obj-$(CONFIG_PPC_UDBG_16550)   += legacy_serial.o udbg_16550.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
-- 
2.1.0


[RFC PATCH v2 2/3] arch/powerpc : optprobes for powerpc core

2016-05-19 Thread Anju T
ppc_get_optinsn_slot() and ppc_free_optinsn_slot() allocate and
free memory from the area reserved for the detour buffer.

Signed-off-by: Anju T 
---
 arch/powerpc/kernel/optprobes.c | 480 
 1 file changed, 480 insertions(+)
 create mode 100644 arch/powerpc/kernel/optprobes.c

diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
new file mode 100644
index 000..bb61e18
--- /dev/null
+++ b/arch/powerpc/kernel/optprobes.c
@@ -0,0 +1,480 @@
+/*
+ * Code for Kernel probes Jump optimization.
+ *
+ * Copyright 2016, Anju T, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define SLOT_SIZE 65536
+#define TMPL_CALL_HDLR_IDX \
+   (optprobe_template_call_handler - optprobe_template_entry)
+#define TMPL_EMULATE_IDX   \
+   (optprobe_template_call_emulate - optprobe_template_entry)
+#define TMPL_RET_BRANCH_IDX \
+   (optprobe_template_ret_branch - optprobe_template_entry)
+#define TMPL_RET_IDX   \
+   (optprobe_template_ret - optprobe_template_entry)
+#define TMPL_OP1_IDX   \
+   (optprobe_template_op_address1 - optprobe_template_entry)
+#define TMPL_OP2_IDX   \
+   (optprobe_template_op_address2 - optprobe_template_entry)
+#define TMPL_INSN_IDX  \
+   (optprobe_template_insn - optprobe_template_entry)
+#define TMPL_END_IDX   \
+   (optprobe_template_end - optprobe_template_entry)
+
+struct kprobe_ppc_insn_page {
+   struct list_head list;
+   kprobe_opcode_t *insns; /* Page of instruction slots */
+   struct kprobe_insn_cache *cache;
+   int nused;
+   int ngarbage;
+   char slot_used[];
+};
+
+#define PPC_KPROBE_INSN_PAGE_SIZE(slots)   \
+   (offsetof(struct kprobe_ppc_insn_page, slot_used) + \
+   (sizeof(char) * (slots)))
+
+enum ppc_kprobe_slot_state {
+   SLOT_CLEAN = 0,
+   SLOT_DIRTY = 1,
+   SLOT_USED = 2,
+};
+
+static struct kprobe_insn_cache kprobe_ppc_optinsn_slots = {
+   .mutex = __MUTEX_INITIALIZER(kprobe_ppc_optinsn_slots.mutex),
+   .pages = LIST_HEAD_INIT(kprobe_ppc_optinsn_slots.pages),
+   /* .insn_size is initialized later */
+   .nr_garbage = 0,
+};
+
+static int ppc_slots_per_page(struct kprobe_insn_cache *c)
+{
+   /*
+* Here the #slots per page differs from x86 as we have
+* only 64KB reserved.
+*/
+   return SLOT_SIZE / (c->insn_size * sizeof(kprobe_opcode_t));
+}
+
+/* Return 1 if all garbages are collected, otherwise 0. */
+static int collect_one_slot(struct kprobe_ppc_insn_page *kip, int idx)
+{
+   kip->slot_used[idx] = SLOT_CLEAN;
+   kip->nused--;
+   return 0;
+}
+
+static int collect_garbage_slots(struct kprobe_insn_cache *c)
+{
+   struct kprobe_ppc_insn_page *kip, *next;
+
+   /* Ensure no-one is interrupted on the garbages */
+   synchronize_sched();
+
+   list_for_each_entry_safe(kip, next, &c->pages, list) {
+   int i;
+
+   if (kip->ngarbage == 0)
+   continue;
+   kip->ngarbage = 0;  /* we will collect all garbages */
+   for (i = 0; i < ppc_slots_per_page(c); i++) {
+   if (kip->slot_used[i] == SLOT_DIRTY &&
+   collect_one_slot(kip, i))
+   break;
+   }
+   }
+   c->nr_garbage = 0;
+   return 0;
+}
+
+kprobe_opcode_t  *__ppc_get_optinsn_slot(struct kprobe_insn_cache *c)
+{
+   struct kprobe_ppc_insn_page *kip;
+   kprobe_opcode_t *slot = NULL;
+
+   mutex_lock(&c->mutex);
+   list_for_each_entry(kip, &c->pages, list) {
+   if (kip->nused < ppc_slots_per_page(c)) {
+   int i;
+
+   for (i = 0; i < ppc_slots_per_page(c); i++) {
+   if (kip->slot_used[i] == SLOT_CLEAN) {
+   kip->slot_used[i] = SLOT_USED;
+   kip->nused++;
+   slot = kip->insns + (i * c->insn_size);
+   goto out;
+   }
+   }
+   /* kip->nused reached max value. */
+   kip->nused = ppc_slots_per_page(c);
+   WARN_ON(1);
+   }
+   if (!list_empty(&c->pages)) {
+   pr_info("No more slots to allocate\n");
+   return NULL;
+   }
+   }
+   kip = kmalloc(PPC_KPROBE_INSN_PAGE_SIZE(ppc_slots_per_page(c)),
+ 

[RFC PATCH v2 1/3] arch/powerpc : Add detour buffer support for optprobes

2016-05-19 Thread Anju T
The detour buffer contains instructions to create an in-memory pt_regs.
After the pre-handler has executed, a call is made for instruction
emulation. The NIP is known only after the probed instruction has been
emulated, so a branch instruction is then created to the NIP computed by
emulate_step().

Instruction slot for detour buffer is allocated from
the reserved area. For the time being 64KB is reserved
in memory for this purpose.

Signed-off-by: Anju T 
---
 arch/powerpc/include/asm/kprobes.h   |  25 
 arch/powerpc/kernel/optprobes_head.S | 108 +++
 2 files changed, 133 insertions(+)
 create mode 100644 arch/powerpc/kernel/optprobes_head.S

diff --git a/arch/powerpc/include/asm/kprobes.h b/arch/powerpc/include/asm/kprobes.h
index 039b583..3e4c998 100644
--- a/arch/powerpc/include/asm/kprobes.h
+++ b/arch/powerpc/include/asm/kprobes.h
@@ -38,7 +38,25 @@ struct pt_regs;
 struct kprobe;
 
 typedef ppc_opcode_t kprobe_opcode_t;
+
+extern kprobe_opcode_t optinsn_slot;
+/* Optinsn template address */
+extern kprobe_opcode_t optprobe_template_entry[];
+extern kprobe_opcode_t optprobe_template_call_handler[];
+extern kprobe_opcode_t optprobe_template_call_emulate[];
+extern kprobe_opcode_t optprobe_template_ret_branch[];
+extern kprobe_opcode_t optprobe_template_ret[];
+extern kprobe_opcode_t optprobe_template_insn[];
+extern kprobe_opcode_t optprobe_template_op_address1[];
+extern kprobe_opcode_t optprobe_template_op_address2[];
+extern kprobe_opcode_t optprobe_template_end[];
+
 #define MAX_INSN_SIZE 1
+#define MAX_OPTIMIZED_LENGTH    4
+#define MAX_OPTINSN_SIZE   \
+   ((unsigned long)&optprobe_template_end -    \
+   (unsigned long)&optprobe_template_entry)
+#define RELATIVEJUMP_SIZE   4
 
 #ifdef CONFIG_PPC64
 #if defined(_CALL_ELF) && _CALL_ELF == 2
@@ -129,5 +147,12 @@ struct kprobe_ctlblk {
 extern int kprobe_exceptions_notify(struct notifier_block *self,
unsigned long val, void *data);
 extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
+
+struct arch_optimized_insn {
+   kprobe_opcode_t copied_insn[1];
+   /* detour buffer */
+   kprobe_opcode_t *insn;
+};
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_KPROBES_H */
diff --git a/arch/powerpc/kernel/optprobes_head.S b/arch/powerpc/kernel/optprobes_head.S
new file mode 100644
index 000..ce32aec
--- /dev/null
+++ b/arch/powerpc/kernel/optprobes_head.S
@@ -0,0 +1,108 @@
+/*
+ * Code to prepare detour buffer for optprobes in kernel.
+ *
+ * Copyright 2016, Anju T, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+
+.global optinsn_slot
+optinsn_slot:
+   /* Reserve an area to allocate slots for detour buffer */
+   .space  65536
+.global optprobe_template_entry
+optprobe_template_entry:
+   stdu    r1,-INT_FRAME_SIZE(r1)
+   SAVE_GPR(0,r1)
+   /* Save the previous SP into stack */
+   addi    r0,r1,INT_FRAME_SIZE
+   std 0,GPR1(r1)
+   SAVE_2GPRS(2,r1)
+   SAVE_8GPRS(4,r1)
+   SAVE_10GPRS(12,r1)
+   SAVE_10GPRS(22,r1)
+   /* Save SPRS */
+   mfcfar  r5
+   std r5,_NIP(r1)
+   mfmsr   r5
+   std r5,_MSR(r1)
+   mfctr   r5
+   std r5,_CTR(r1)
+   mflr    r5
+   std r5,_LINK(r1)
+   mfspr   r5,SPRN_XER
+   std r5,_XER(r1)
+   li  r5,0
+   std r5,_TRAP(r1)
+   mfdar   r5
+   std r5,_DAR(r1)
+   mfdsisr r5
+   std r5,_DSISR(r1)
+   /* Pass parameters for optimized_callback */
+.global optprobe_template_op_address1
+optprobe_template_op_address1:
+   nop
+   nop
+   nop
+   nop
+   nop
+   addi    r4,r1,STACK_FRAME_OVERHEAD
+   /* Branch to the prehandler */
+.global optprobe_template_call_handler
+optprobe_template_call_handler:
+   nop
+   /* Pass parameters for instruction emulation */
+   addi    r3,r1,STACK_FRAME_OVERHEAD
+.global optprobe_template_insn
+optprobe_template_insn:
+   nop
+   nop
+   /* Branch to instruction emulation  */
+.global optprobe_template_call_emulate
+optprobe_template_call_emulate:
+   nop
+.global optprobe_template_op_address2
+optprobe_template_op_address2:
+   nop
+   nop
+   nop
+   nop
+   nop
+   addi    r4,r1,STACK_FRAME_OVERHEAD
+   /* Branch to create_return_branch() function */
+.global optprobe_template_ret_branch
+optprobe_template_ret_branch:
+   nop
+   /* Restore the registers */
+   ld  r5,_MSR(r1)
+   mtmsr   r5
+   ld  r5,_CTR(r1)
+   mtctr   r5
+   ld  r5,_LINK(r1)
+   mtlr    r5
+   ld  r5,_XER(r1)
+   mtxer   r5
+

[RFC PATCH v2 0/3] OPTPROBES for powerpc

2016-05-19 Thread Anju T
Here is the RFC patchset for kprobes jump optimization
(a.k.a. OPTPROBES) for powerpc. Kprobes is an essential tool
for kernel developers, so improving its performance matters.

Currently kprobes inserts a trap instruction to probe a running kernel.
Jump optimization allows kprobes to replace the trap with a branch, reducing
the probe overhead drastically.

Performance:
============
An optimized kprobe on powerpc is 1.05 to 4.7 times faster than an
unoptimized kprobe.

Example:

A probe was placed at offset 0x50 in _do_fork().
"Time Diff" below is the difference between the time just before hitting
the probe and the time just after the probed instruction.
mftb() calls were added to kernel/fork.c for this purpose.


# echo 0 > /proc/sys/debug/kprobes-optimization 
Kprobes globally unoptimized

[  233.607120] Time Diff = 0x1f0
[  233.608273] Time Diff = 0x1ee
[  233.609228] Time Diff = 0x203
[  233.610400] Time Diff = 0x1ec
[  233.611335] Time Diff = 0x200
[  233.612552] Time Diff = 0x1f0
[  233.613386] Time Diff = 0x1ee
[  233.614547] Time Diff = 0x212
[  233.615570] Time Diff = 0x206
[  233.616819] Time Diff = 0x1f3
[  233.617773] Time Diff = 0x1ec
[  233.618944] Time Diff = 0x1fb
[  233.619879] Time Diff = 0x1f0
[  233.621066] Time Diff = 0x1f9
[  233.621999] Time Diff = 0x283
[  233.623281] Time Diff = 0x24d
[  233.624172] Time Diff = 0x1ea
[  233.625381] Time Diff = 0x1f0
[  233.626358] Time Diff = 0x200
[  233.627572] Time Diff = 0x1ed

# echo 1 > /proc/sys/debug/kprobes-optimization 
Kprobes globally optimized

[   70.797075] Time Diff = 0x103
[   70.799102] Time Diff = 0x181
[   70.801861] Time Diff = 0x15e
[   70.803466] Time Diff = 0xf0
[   70.804348] Time Diff = 0xd0
[   70.805653] Time Diff = 0xad
[   70.806477] Time Diff = 0xe0
[   70.807725] Time Diff = 0xbe
[   70.808541] Time Diff = 0xc3
[   70.810191] Time Diff = 0xc7
[   70.811007] Time Diff = 0xc0
[   70.812629] Time Diff = 0xc0
[   70.813640] Time Diff = 0xda
[   70.814915] Time Diff = 0xbb
[   70.815726] Time Diff = 0xc4
[   70.816955] Time Diff = 0xc0
[   70.817778] Time Diff = 0xcd
[   70.818999] Time Diff = 0xcd
[   70.820099] Time Diff = 0xcb
[   70.821333] Time Diff = 0xf0
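
As a rough cross-check of the two logs above, the sketch below averages the
20 samples from each run and divides the means. This is only one way to
summarise the data, assuming every "Time Diff" is in the same timebase
units; it is not necessarily how the 1.05 to 4.7 range quoted earlier was
derived. The helper names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* "Time Diff" samples copied from the unoptimized log above. */
static const unsigned int unoptimized[] = {
	0x1f0, 0x1ee, 0x203, 0x1ec, 0x200, 0x1f0, 0x1ee, 0x212, 0x206, 0x1f3,
	0x1ec, 0x1fb, 0x1f0, 0x1f9, 0x283, 0x24d, 0x1ea, 0x1f0, 0x200, 0x1ed,
};

/* "Time Diff" samples copied from the optimized log above. */
static const unsigned int optimized[] = {
	0x103, 0x181, 0x15e, 0xf0, 0xd0, 0xad, 0xe0, 0xbe, 0xc3, 0xc7,
	0xc0, 0xc0, 0xda, 0xbb, 0xc4, 0xc0, 0xcd, 0xcd, 0xcb, 0xf0,
};

#define NSAMPLES(a) (sizeof(a) / sizeof((a)[0]))

/* Arithmetic mean of n samples. */
static double mean(const unsigned int *v, size_t n)
{
	unsigned long sum = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return (double)sum / (double)n;
}

/* Ratio of the mean unoptimized time to the mean optimized time. */
static double mean_speedup(void)
{
	return mean(unoptimized, NSAMPLES(unoptimized)) /
	       mean(optimized, NSAMPLES(optimized));
}
```

For these particular samples the ratio of the means comes out at roughly
2.3, which sits inside the range quoted above.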

Implementation:
===============

The trap instruction is replaced by a branch to a detour buffer.
Because a branch instruction on the Power architecture can only reach
+/- 32 MB, the detour buffer slot is allocated from a reserved area. This
ensures that the branch target is within range. Patch 2/3 implements this.
The current kprobes insn caches allocate the memory area for insn slots
with module_alloc(). This will always be beyond the +/- 32 MB range.
Hence ppc_get_optinsn_slot() and ppc_free_optinsn_slot() are introduced
to allocate and free slots from this reserved area.
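
The +/- 32 MB limit comes from the 24-bit word displacement of the
unconditional branch. A minimal userspace sketch of the range check and the
encoding follows; branch_reachable() and encode_branch() are hypothetical
names chosen for this example and are not functions from the patch or from
the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * An I-form branch holds a 24-bit word displacement (LI), i.e. a signed
 * 26-bit byte offset: -0x2000000 .. +0x1fffffc, always a multiple of 4.
 */
static bool branch_reachable(uint64_t from, uint64_t to)
{
	int64_t offset = (int64_t)(to - from);

	/* Instructions are word aligned, so the displacement must be too. */
	if (offset & 0x3)
		return false;
	return offset >= -(int64_t)0x2000000 && offset <= (int64_t)0x1fffffc;
}

/* Encode a relative "b target" (primary opcode 18, AA=0, LK=0). */
static uint32_t encode_branch(uint64_t from, uint64_t to)
{
	assert(branch_reachable(from, to));
	return 0x48000000u | ((uint32_t)(to - from) & 0x03fffffcu);
}
```

A detour slot is only usable for a given probe if branch_reachable() would
hold in both directions: from the probe point into the slot, and from the
end of the slot back to the instruction after the probe.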

The detour buffer contains a call to optimized_callback(), which in turn
calls the pre_handler(). Once the pre-handler has run, the original
instruction is emulated from the detour buffer itself. The detour buffer
also ends with a branch back to the normal flow after the probed
instruction is emulated.

Before preparing the optimization, kprobes inserts the original
(user-defined) kprobe at the specified address. So even if the kprobe
cannot be optimized, it still works as a normal kprobe.

Limitations:
============

- The number of probes that can be optimized is limited by the size of
  the reserved area.

* TODO: Have a template-based implementation that eases the probe count
  limit by using less space from the reserved area per optimized probe.

- Currently only instructions that can be emulated are candidates for
  optimization.
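
To make the first limitation concrete, here is a small userspace sketch of a
fixed-area slot allocator in the spirit of patch 2/3. It is not the code
from the patch: struct slot_area, slot_get(), slot_free() and MAX_SLOTS are
hypothetical, and locking and garbage collection are left out. Only the
64KB area size and the slot state names are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AREA_SIZE 65536u	/* the 64KB reserved area from the patch */
#define MAX_SLOTS 1024u		/* hypothetical cap for this sketch */

enum slot_state { SLOT_CLEAN = 0, SLOT_DIRTY = 1, SLOT_USED = 2 };

struct slot_area {
	uint32_t *base;		/* start of the reserved area */
	size_t insn_words;	/* slot size in 4-byte instructions */
	unsigned char state[MAX_SLOTS];
};

/* How many slots of the configured size fit in the reserved area. */
static size_t slots_in_area(const struct slot_area *a)
{
	size_t n;

	assert(a->insn_words > 0);
	n = AREA_SIZE / (a->insn_words * sizeof(uint32_t));
	return n < MAX_SLOTS ? n : MAX_SLOTS;
}

/* Hand out the first clean slot, or NULL once the area is exhausted. */
static uint32_t *slot_get(struct slot_area *a)
{
	size_t n = slots_in_area(a);

	for (size_t i = 0; i < n; i++) {
		if (a->state[i] == SLOT_CLEAN) {
			a->state[i] = SLOT_USED;
			return a->base + i * a->insn_words;
		}
	}
	return NULL;
}

/* Return a slot previously obtained from slot_get(). */
static void slot_free(struct slot_area *a, uint32_t *slot)
{
	size_t idx = (size_t)(slot - a->base) / a->insn_words;

	assert(idx < slots_in_area(a) && a->state[idx] == SLOT_USED);
	a->state[idx] = SLOT_CLEAN;
}
```

When slot_get() returns NULL the probe simply stays a normal trap-based
kprobe, which is why a smaller per-probe template would directly raise the
number of probes that can be optimized.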



Changes from RFC-v1:
--------------------
- Detour buffer memory reservation code moved to optprobes.c
- optimized_callback() is marked as NOKPROBE_SYMBOL.
- Return NULL when there are no more slots to allocate from the detour buffer.
- Other comments by Masami are addressed.


Kindly let me know your suggestions and comments.

Thanks
-Anju


Anju T (3):
  arch/powerpc : Add detour buffer support for optprobes
  arch/powerpc : optprobes for powerpc core
  arch/powerpc : Enable optprobes support in powerpc

 .../features/debug/optprobes/arch-support.txt  |   2 +-
 arch/powerpc/Kconfig   |   1 +
 arch/powerpc/include/asm/kprobes.h |  25 ++
 arch/powerpc/kernel/Makefile   |   1 +
 arch/powerpc/kernel/optprobes.c| 474 +
 arch/powerpc/kernel/optprobes_head.S   | 108 +
 6 files changed, 610 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/kernel/optprobes.c
 create mode 100644 arch/powerpc/kernel/optprobes_head.S

-- 
2.1.0


Re: [PATCH v2 2/9] powerpc/kvm: make hypervisor state restore a function

2016-05-19 Thread Gautham R Shenoy
Hi Shreyas,

On Wed, May 18, 2016 at 12:37:56PM +0530, Shreyas B Prabhu wrote:

[..snip..]
> >> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> >> b/arch/powerpc/kernel/exceptions-64s.S
> >> index 7716ceb..7ebfbb0 100644
> >> --- a/arch/powerpc/kernel/exceptions-64s.S
> >> +++ b/arch/powerpc/kernel/exceptions-64s.S
> >> @@ -107,25 +107,8 @@ BEGIN_FTR_SECTION
> >>beq 9f
> >>
> >>cmpwi   cr3,r13,2
> >> +  bl  power7_restore_hyp_resource
> >>
> >> -  /*
> >> -   * Check if last bit of HSPGR0 is set. This indicates whether we are
> >> -   * waking up from winkle.
> >> -   */
> >> -  GET_PACA(r13)
> >> -  clrldi  r5,r13,63
> >> -  clrrdi  r13,r13,1
> >> -  cmpwi   cr4,r5,1
> >> -  mtspr   SPRN_HSPRG0,r13
> >> -
> >> -  lbz r0,PACA_THREAD_IDLE_STATE(r13)
> >> -  cmpwi   cr2,r0,PNV_THREAD_NAP
> >> -  bgt cr2,8f  /* Either sleep or Winkle */
> >> -
> >> -  /* Waking up from nap should not cause hypervisor state loss */
> >> -  bgt cr3,.
> >> -
> >> -  /* Waking up from nap */
> >>li  r0,PNV_THREAD_RUNNING
> >>stb r0,PACA_THREAD_IDLE_STATE(r13)  /* Clear thread state */
> >>
> >> @@ -143,13 +126,9 @@ BEGIN_FTR_SECTION
> >>
> >>/* Return SRR1 from power7_nap() */
> >>mfspr   r3,SPRN_SRR1
> >> -  beq cr3,2f
> >> -  b   power7_wakeup_noloss
> >> -2:    b   power7_wakeup_loss
> >> -
> >> -  /* Fast Sleep wakeup on PowerNV */
> >> -8:    GET_PACA(r13)
> > 
> > In the old code, we do a GET_PACA(r13) before invoking the
> > power7_wakeup_tb_loss. In the new code we don't. Can you explain
> > this omission ?
> 
> GET_PACA(r13) is called at the beginning of
> power7_restore_hyp_resource. So r13 contains the pointer to the PACA
> when power7_wakeup_tb_loss is invoked later in the same function.

Ah, I see it now. So the GET_PACA(r13) at 8: was anyway redundant in
the older code.

You can add my Reviewed-by: to this patch.
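
For reference, the clrldi/clrrdi pair in the removed lines quoted above
splits the value read by GET_PACA(r13) into a flag bit and a pointer. A C
rendering of just that bit manipulation might look like the sketch below;
struct hsprg0_fields and split_hsprg0() are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct hsprg0_fields {
	uint64_t paca;		/* value with the flag bit cleared */
	unsigned int flag;	/* low bit; per the quoted comment, set when
				 * waking up from winkle */
};

static struct hsprg0_fields split_hsprg0(uint64_t raw)
{
	struct hsprg0_fields f;

	/* clrldi r5,r13,63: clear the upper 63 bits, keep the lowest bit */
	f.flag = (unsigned int)(raw & 1u);
	/* clrrdi r13,r13,1: clear the lowest bit */
	f.paca = raw & ~(uint64_t)1;

	/* The two parts together must reproduce the original value. */
	assert((f.paca | f.flag) == raw);
	return f;
}
```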

--
Thanks and Regards
gautham.


Re: [RFC PATCH] Increase in idle power with schedutil

2016-05-19 Thread Rafael J. Wysocki
On Thu, May 19, 2016 at 1:40 PM, Peter Zijlstra  wrote:
> On Wed, May 18, 2016 at 11:11:51PM +0200, Rafael J. Wysocki wrote:
>> On Wed, May 18, 2016 at 2:53 PM, Shilpasri G Bhat
>>  wrote:
>> > This patch adds driver callback for fast_switch and below observations
>> > on schedutil governor are done with this patch.
>> >
>> > In POWER8 there is a regression observed with schedutil compared to
>> > ondemand. With schedutil the frequency is not ramping down and is
>> > mostly stuck at max frequency during idle . This is because of the
>> > watchdog timer, an RT task which is fired every 4 seconds which
>> > results in requesting max frequency.
>>
>> Well, yes, that would be problematic.
>>
>
> Right; we need to come up with something for RT tasks;

I think we need the hints thing for that to be able to distinguish
between RT and the rest.

Also in this particular case it looks like an RT task is the only task
that wakes up often enough and we don't drop the frequency when going
idle.  Do we need a hook somewhere in the idle path?

Re: [PATCH v2 5/9] powerpc/powernv: Move idle related macros to cpuidle.h

2016-05-19 Thread Gautham R Shenoy
On Tue, May 03, 2016 at 01:54:34PM +0530, Shreyas B. Prabhu wrote:
> Move idle related macros to a common location asm/cpuidle.h so that
> they can be used for stop instruction support.
> 
> Signed-off-by: Shreyas B. Prabhy 

Reviewed-by: Gautham R. Shenoy 

--
Thanks and Regards
gautham.


Re: [PATCH v2 4/9] powerpc/powernv: Make power7_powersave_common more generic

2016-05-19 Thread Gautham R Shenoy
On Wed, May 18, 2016 at 12:21:17PM +0530, Shreyas B Prabhu wrote:
> With this patch, r5 which is the third parameter to
> power_powersave_common contains the return address that needs to be
> written to SRR0. So here I'm keeping r5 unaltered and using r7 for the MSR.

Ok.

Reviewed-by: Gautham R. Shenoy 

> 
> Thanks,
> Shreyas


Re: [PATCH 0/4] rcutorture: Several fixes to run selftest scripts on PPC

2016-05-19 Thread Paul E. McKenney
On Wed, May 18, 2016 at 09:25:17PM -0700, Josh Triplett wrote:
> On Thu, May 19, 2016 at 11:42:20AM +0800, Boqun Feng wrote:
> > I spent some time making tools/testing/selftests/rcutorture run on PPC;
> > here are some documentation and fixes made while I was trying.
> > 
> > The scripts are able to run and get results on PPC, however please
> > note there are some stalls even build errors that could be found
> > by the tests currently.
> > 
> > As I'm certainly not an expert of qemu or bash programming, there
> > may be something I am missing in those patches. So tests and comments
> > are welcome ;-)
> > 
> > Regards,
> > Boqun
> > 
> > Boqun Feng (4):
> >   rcutorture/doc: Add a new way to create initrd using dracut
> >   rcutorture: Use vmlinux as the fallback kernel image
> >   rcutorture: Make -soundhw a x86 specific option
> >   rcutorture: Don't specify the cpu type of QEMU on PPC
> 
> All four of these seem reasonable to me:
> Reviewed-by: Josh Triplett 

Thank you both!  I have queued all four for further review and for
testing.

> I responded to the -soundhw patch, trying to track down why that option
> was needed in the first place, and seeking a solution that doesn't
> require adding to the set of target-specific options.  But I don't think
> that investigation should block your fix.

Agreed, it should work better now than it did before!

Thanx, Paul


Re: [PATCH 3/4] rcutorture: Make -soundhw a x86 specific option

2016-05-19 Thread Paul E. McKenney
On Wed, May 18, 2016 at 09:23:10PM -0700, Josh Triplett wrote:
> On Thu, May 19, 2016 at 11:42:23AM +0800, Boqun Feng wrote:
> > The option "-soundhw pcspk" gives me an error on PPC as follows:
> > 
> > qemu-system-ppc64: ISA bus not available for pcspk
> > 
> > , which means this option doesn't work on ppc by default. So simply make
> > this an x86-specific option via identify_qemu_args().
> > 
> > Signed-off-by: Boqun Feng 
> 
> The emulated system for RCU testing does not need sound hardware at all.
> Paul added this option in commit
> 16c77ea7d0f4a74e49009aa2d26c275f7f93de7c to disable the default sound
> hardware, saying that '"-soundhw pcspk" makes the script a bit less
> dependent on odd audio libraries being installed'.  Unfortunately, it
> looks like there isn't a "-soundhw none".  As far as I can tell,
> currently the only way to completely eliminate sound hardware is to pass
> "-nodefaults" and then explicitly specify each desired device; while
> that would solve the issue, it would likely introduce *more*
> hardware-specific command-line options...
> 
> I've filed two feature requests on upstream qemu to make this simpler:
> https://bugs.launchpad.net/qemu/+bug/1583420 and
> https://bugs.launchpad.net/qemu/+bug/1583421 .
> 
> Paul, what did you mean by "dependent on odd audio libraries"?  Did you
> mean in the guest or the host?  And either way, is this something that
> could potentially be solved another way?

If I remember correctly, Ubuntu 14.04 qemu refused to run the guest
without this option, but I don't recall the exact error message.
I chalked it up to my ignorance of qemu, but I would very much welcome
some way to not have to specify irrelevant hardware.  So thank you very
much for filing the bugs!

Thanx, Paul


Re: next build: 37 warnings 2 failures (next/next-20160519)

2016-05-19 Thread Arnd Bergmann
On Thursday 19 May 2016 15:03:34 Kishon Vijay Abraham I wrote:
> > 
> >>   1 drivers/phy/phy-exynos-mipi-video.c:238:13: warning: 'val' may be 
> >> used uninitialized in this function [-Wmaybe-uninitialized]
> > 
> > I sent a patch on May 11, it was reviewed by Krzysztof Kozlowski, but not 
> > yet
> > applied.
> 
> Is it okay if I send this during the -rc cycle?
> 

Yes, it's a bug fix, so it should just go in as soon as possible, that's what
the -rc cycle is for after all.

Arnd

Re: next build: 37 warnings 2 failures (next/next-20160519)

2016-05-19 Thread Arnd Bergmann
On Thursday 19 May 2016 11:20:44 Michal Hocko wrote:
> On Thu 19-05-16 11:07:09, Arnd Bergmann wrote:
> [...]
> > >   6 mm/page_alloc.c:3651:6: warning: 'compact_result' may be used 
> > > uninitialized in this function [-Wmaybe-uninitialized]
> > 
> > I'm surprised this one is still there, I sent a patch but Michal Hocko came 
> > up with
> > a better fix on May 12, which was not applied yet.
> > 
> > Michael, can you resend this one to Andrew? I suspect he missed it as it was
> > sent as a reply to mine.
> 
> Andrew has taken the patch IIRC but he hasn't released any mmotm since
> then so it didn't get to the linux-next.
> 

Ok, cool, that explains it.

Arnd

Re: WARNING at kernel/sched/core.c:1166 while booting 4.6.0 mainline on ppc64le bare metal

2016-05-19 Thread Gavin Shan
On Thu, May 19, 2016 at 04:27:49PM +0530, abdhalee wrote:
>Hi
>
>Today's mainline stable 4.6 on ppc64le bare metal booted with the following
>warning.
>
>[0.080615] EEH: PowerNV platform initialized
>[0.080709] POWER8 performance monitor hardware support registered
>[0.080791] power8-pmu: PMAO restore workaround active.
>[0.100780] [ cut here ]
>[0.100869] WARNING: CPU: 40 PID: 248 at kernel/sched/core.c:1166
>__set_cpus_allowed_ptr+0x21c/0x290

I ran into the same issue on yesterday's linux-next. I added some logs,
and it seems the CPU isn't marked as active in time. The stack trace
pops up in this situation: CPU#80 is online, but not active yet.

==> cpuhp_thread_fun: CPU=80
cpuhp_thread_fun: state=10 target=45
cpuhp_ap_online: CPU=80, state=10 target=45
smpboot_unpark_threads: CPU=80
notify_online: CPU=80  <-- CPU#80 isn't active yet.
[ cut here ]
WARNING: CPU: 80 PID: 408 at kernel/sched/core.c:1166 
__set_cpus_allowed_ptr+0x22c/0x290
Modules linked in:
CPU: 80 PID: 408 Comm: cpuhp/80 Not tainted 
4.6.0-next-20160517-gavin-00020-g176bf86-dirty #35
task: c01e5243de00 ti: c01ffc10c000 task.ti: c01ffc10c000
NIP: c00d923c LR: c00d9224 CTR: 
REGS: c01ffc10f730 TRAP: 0700   Not tainted  
(4.6.0-next-20160517-gavin-00020-g176bf86-dirty)
MSR: 90029033   CR: 28002044  XER: 2000
CFAR: c047135c SOFTE: 0 
GPR00: c00d9138 c01ffc10f9b0 c1321300  
GPR04: c135aa18 0400 0010  
GPR08:  0050 c135aa90  
GPR12: 2200 cff14000 c00ffa60c5d0 c1292800 
GPR16: 0001 c12780a8 c139b678 0001 
GPR20: c01e523b c1278048 0008 c12cfa8e 
GPR24: c12780c8 c01ffc10fa40 c1278048 c135a898 
GPR28: c00ff133ff08 c00ff9c0c780 c01e5240  
NIP [c00d923c] __set_cpus_allowed_ptr+0x22c/0x290
LR [c00d9224] __set_cpus_allowed_ptr+0x214/0x290
Call Trace:
[c01ffc10f9b0] [c00d9138] __set_cpus_allowed_ptr+0x128/0x290 
(unreliable)
[c01ffc10fa20] [c00c65e0] workqueue_cpu_up_callback+0x460/0x5d0
[c01ffc10faf0] [c00cee6c] notifier_call_chain+0xac/0x110
[c01ffc10fb40] [c009fc64] __cpu_notify+0x54/0xa0
[c01ffc10fb60] [c009fd9c] notify_online+0x4c/0x70
[c01ffc10fbd0] [c009f5b4] cpuhp_up_callbacks+0x74/0x1a0
[c01ffc10fc20] [c00a0100] cpuhp_thread_fun+0x1e0/0x2a0
[c01ffc10fcc0] [c00d2ac0] smpboot_thread_fn+0x290/0x2a0
[c01ffc10fd20] [c00cd578] kthread+0x108/0x130
[c01ffc10fe30] [c0009578] ret_from_kernel_thread+0x5c/0x64
Instruction dump:
419eff38 3c820004 38849718 7f83e378 38a00400 483980f1 6000 2fa3 
409eff18 813e0254 2f890001 419eff0c <0fe0> 4b04 80810038 387d0018 
---[ end trace 5cf6676167cdd41c ]---
sched_cpu_activate: CPU=80  < CPU#80 is marked as active
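
For illustration only, the ordering problem in the log above can be modelled in
userspace C. This is a sketch under assumptions, not the kernel code: it assumes
the check at kernel/sched/core.c:1166 fires for a kernel thread whose new
affinity mask intersects the online CPUs but none of the active CPUs, while the
thread is allowed on more than one CPU. The names cpu_state_sketch, would_warn
and mark_active are hypothetical, and the 64-bit masks stand in for the kernel's
cpumasks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpu_online_mask / cpu_active_mask (bit n = CPU n). */
struct cpu_state_sketch {
	uint64_t online;
	uint64_t active;
};

/*
 * Assumed shape of the check: a kernel thread that can run on an online CPU
 * but on no active CPU is only tolerated if it is a strict per-CPU thread
 * (exactly one allowed CPU).
 */
static bool would_warn(const struct cpu_state_sketch *s, uint64_t new_mask,
		       bool is_kthread, int nr_cpus_allowed)
{
	if (!is_kthread)
		return false;
	return (new_mask & s->online) != 0 &&
	       (new_mask & s->active) == 0 &&
	       nr_cpus_allowed != 1;
}

/* Models the later hotplug step that marks the CPU active. */
static void mark_active(struct cpu_state_sketch *s, int cpu)
{
	s->active |= UINT64_C(1) << cpu;
}
```

With one CPU online but not yet active and a two-CPU mask, would_warn() returns
true; once mark_active() has run for that CPU it returns false, which matches
the "online, but not active yet" window seen in the log.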

Thanks,
Gavin


Re: Pull request: scottwood/linux.git next

2016-05-19 Thread Michael Ellerman
On Mon, 2016-05-16 at 20:37 -0500, Scott Wood wrote:

> Sorry for the lateness...
> 
> Contains include 86xx fixes, minor device tree fixes, an erratum
> workaround, and a kconfig dependency fix.

Thanks, merged into next and pushed.

Will send a pull request to Linus late tomorrow once the whole assemblage has
been through linux-next.

cheers


Re: [RFC PATCH] Increase in idle power with schedutil

2016-05-19 Thread Peter Zijlstra
On Wed, May 18, 2016 at 11:11:51PM +0200, Rafael J. Wysocki wrote:
> On Wed, May 18, 2016 at 2:53 PM, Shilpasri G Bhat
>  wrote:
> > This patch adds driver callback for fast_switch and below observations
> > on schedutil governor are done with this patch.
> >
> > In POWER8 there is a regression observed with schedutil compared to
> > ondemand. With schedutil the frequency is not ramping down and is
> > mostly stuck at max frequency during idle . This is because of the
> > watchdog timer, an RT task which is fired every 4 seconds which
> > results in requesting max frequency.
> 
> Well, yes, that would be problematic.
> 

Right; we need to come up with something for RT tasks; but what happens
if you disable the watchdog? This should be entirely doable and might
give a better comparison.

Re: [kernel-hardening] [PATCH v8 2/4] GCC plugin infrastructure

2016-05-19 Thread PaX Team
On 19 May 2016 at 16:22, Michael Ellerman wrote:

> On Wed, 2016-05-18 at 12:33 +0200, Emese Revfy wrote:
> > Did you test the plugins with all gcc versions (4.5-6)?
> 
> What's the concern about gcc versions? Just not breaking the build on old
> compilers?

the earlier plugin capable gcc versions used to install gcc headers in a somewhat
ad-hoc manner resulting in compile time breakage for plugins and since some of
those potentially missing headers are target specific, each target arch should
be verified before enabling plugin support on them. things have much improved with
gcc 5 (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61176) though there's still
an occasional missing header but with wider use of plugins they will hopefully be
discovered earlier now. perhaps linux-arch should be cc'ed on the plugin infrastructure
so that arch maintainers are aware of this?

> I'm pretty sure powerpc big endian still builds with gcc 4.4.
> 
> However if Andrew's only tested on little endian, then that select should be
> guarded with an "if CPU_LITTLE_ENDIAN". And to build LE you need gcc >= 4.9.

i guess that's part of the target tuple so in general arch maintainers should test
the target tuples used on their arch with all the supported gcc versions (speaking
of CC, not HOSTCC/HOSTCXX).

cheers,
 PaX Team


Re: next build: 37 warnings 2 failures (next/next-20160519)

2016-05-19 Thread Stefano Stabellini
On Thu, 19 May 2016, Arnd Bergmann wrote:
> >   2 drivers/xen/balloon.c:154:13: warning: 'release_memory_resource' declared 'static' but never defined [-Wunused-function]
> 
> I sent a patch on May 11, subject "xen: remove incorrect forward declaration" and
> Stefano Stabellini reviewed it. Ross Lagerwall did the same patch a day earlier,
> but neither of them has made it into linux-next so far. According to Ross, this
> one should be backported to v4.4.

It's on our radar, the patch hasn't been lost.

Re: next build: 37 warnings 2 failures (next/next-20160519)

2016-05-19 Thread Kishon Vijay Abraham I
Hi Arnd,

On Thursday 19 May 2016 02:37 PM, Arnd Bergmann wrote:
> On Thursday 19 May 2016 00:45:16 Olof's autobuilder wrote:
>> Errors:
>>
>> arm64.allmodconfig:
>> samples/seccomp/bpf-fancy.c:13:27: fatal error: linux/seccomp.h: No such file or directory
>> samples/seccomp/dropper.c:20:27: fatal error: linux/seccomp.h: No such file or directory
>> samples/seccomp/bpf-helper.h:20:50: fatal error: linux/seccomp.h: No such file or directory
>> samples/seccomp/bpf-direct.c:21:27: fatal error: linux/seccomp.h: No such file or directory
> 
> This one is interesting: the same header dependency seems to be present for samples/bpf,
> but only samples/seccomp fails. Can you check if both are attempted to be built?
> 
> samples/bpf/README.rst says about this:
> 
> |Kernel headers
> |--
> |
> |There are usually dependencies to header files of the current kernel.
> |To avoid installing devel kernel headers system wide, as a normal
> |user, simply call::
> |
> | make headers_install
> |
> |This will creates a local "usr/include" directory in the git/build top
> |level directory, that the make system automatically pickup first.
> 
> which I assume would fix the problem, but it would be better if Kbuild was smart enough
> to do this implicitly when building these samples.
> 
>> powerpc.pasemi_defconfig:
>> arch/powerpc/kernel/ptrace.c:380:24: error: index 32 denotes an offset greater than size of 'u64[32][1] {aka long long unsigned int[32][1]}' [-Werror=array-bounds]
>> arch/powerpc/kernel/ptrace.c:408:24: error: index 32 denotes an offset greater than size of 'u64[32][1] {aka long long unsigned int[32][1]}' [-Werror=array-bounds]
> 
> I don't see a good way to avoid the warning other than dropping the
> 
>BUILD_BUG_ON(offsetof(struct thread_fp_state, fpscr) !=
> offsetof(struct thread_fp_state, fpr[32][0]));
> 
> statements in the powerpc ptrace implementation. It doesn't seem too
> important to check for though.
> 
> 
>> Warnings:
> 
>>   2 drivers/net/wireless/intel/iwlegacy/3945.c:1022:5: warning: suggest explicit braces to avoid ambiguous 'else' [-Wparentheses]
> 
> I had not seen this before, sent a patch now.
> 
>>   3 drivers/pinctrl/stm32/pinctrl-stm32.c:797:17: warning: too many arguments for format [-Wformat-extra-args]
> 
> sent a fix yesterday, got an ack but it wasn't applied yet. I'm sure Linus Walleij
> will take care of it soon.
> 
>>   6 mm/page_alloc.c:3651:6: warning: 'compact_result' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 
> I'm surprised this one is still there, I sent a patch but Michal Hocko came up with
> a better fix on May 12, which was not applied yet.
> 
> Michael, can you resend this one to Andrew? I suspect he missed it as it was
> sent as a reply to mine.
> 
>>   2 drivers/xen/balloon.c:154:13: warning: 'release_memory_resource' declared 'static' but never defined [-Wunused-function]
> 
> I sent a patch on May 11, subject "xen: remove incorrect forward declaration" and
> Stefano Stabellini reviewed it. Ross Lagerwall did the same patch a day earlier,
> but neither of them has made it into linux-next so far. According to Ross, this
> one should be backported to v4.4.
> 
>>   3 fs/xfs/xfs_aops.c:97:16: warning: unused variable 'blockmask' [-Wunused-variable]
> 
> I sent a patch on April 16, but got no reply. Resending it now.
> 
>>   2 arch/arm/mach-lpc32xx/include/mach/irqs.h:115:0: warning: "NR_IRQS" redefined
> 
> I missed this one, as I have some other patches for lp32xx in my randconfig
> fixup tree that hides it.
> 
> I've created a fix now and applied it to the arm-soc fixes branch.
> 
>>   1 drivers/soc/mediatek/mtk-pmic-wrap.c:1062:16: warning: large integer implicitly truncated to unsigned type [-Woverflow]
>>   1 drivers/soc/mediatek/mtk-pmic-wrap.c:1074:16: warning: large integer implicitly truncated to unsigned type [-Woverflow]
>>   1 drivers/soc/mediatek/mtk-pmic-wrap.c:1086:16: warning: large integer implicitly truncated to unsigned type [-Woverflow]
> 
> I sent out a patch on May 12 for this, got no reply. I've applied my own patch
> now on the arm-soc fixes branch.
> 
>>   1 drivers/phy/phy-exynos-mipi-video.c:238:13: warning: 'val' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 
> I sent a patch on May 11, it was reviewed by Krzysztof Kozlowski, but not yet
> applied.

Is it okay if I send this during the -rc cycle?

Thanks
Kishon

Re: next build: 37 warnings 2 failures (next/next-20160519)

2016-05-19 Thread Michal Hocko
On Thu 19-05-16 11:07:09, Arnd Bergmann wrote:
[...]
> >   6 mm/page_alloc.c:3651:6: warning: 'compact_result' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 
> I'm surprised this one is still there, I sent a patch but Michal Hocko came up with
> a better fix on May 12, which was not applied yet.
> 
> Michael, can you resend this one to Andrew? I suspect he missed it as it was
> sent as a reply to mine.

Andrew has taken the patch IIRC but he hasn't released any mmotm since
then so it didn't get to the linux-next.
-- 
Michal Hocko
SUSE Labs

Re: next build: 37 warnings 2 failures (next/next-20160519)

2016-05-19 Thread Arnd Bergmann
On Thursday 19 May 2016 00:45:16 Olof's autobuilder wrote:
> Errors:
> 
> arm64.allmodconfig:
> samples/seccomp/bpf-fancy.c:13:27: fatal error: linux/seccomp.h: No such file or directory
> samples/seccomp/dropper.c:20:27: fatal error: linux/seccomp.h: No such file or directory
> samples/seccomp/bpf-helper.h:20:50: fatal error: linux/seccomp.h: No such file or directory
> samples/seccomp/bpf-direct.c:21:27: fatal error: linux/seccomp.h: No such file or directory

This one is interesting: the same header dependency seems to be present for samples/bpf,
but only samples/seccomp fails. Can you check if both are attempted to be built?

samples/bpf/README.rst says about this:

|Kernel headers
|--
|
|There are usually dependencies to header files of the current kernel.
|To avoid installing devel kernel headers system wide, as a normal
|user, simply call::
|
| make headers_install
|
|This will creates a local "usr/include" directory in the git/build top
|level directory, that the make system automatically pickup first.

which I assume would fix the problem, but it would be better if Kbuild was smart enough
to do this implicitly when building these samples.

> powerpc.pasemi_defconfig:
> arch/powerpc/kernel/ptrace.c:380:24: error: index 32 denotes an offset greater than size of 'u64[32][1] {aka long long unsigned int[32][1]}' [-Werror=array-bounds]
> arch/powerpc/kernel/ptrace.c:408:24: error: index 32 denotes an offset greater than size of 'u64[32][1] {aka long long unsigned int[32][1]}' [-Werror=array-bounds]

I don't see a good way to avoid the warning other than dropping the

   BUILD_BUG_ON(offsetof(struct thread_fp_state, fpscr) !=
                offsetof(struct thread_fp_state, fpr[32][0]));

statements in the powerpc ptrace implementation. It doesn't seem too
important to check for though.
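
One possible alternative, as a sketch only and untested against the kernel
build: the same layout property can be expressed without indexing one past the
end of the array, by adding the array's size to its starting offset. The struct
fp_state_sketch below is a hypothetical stand-in for thread_fp_state (it
assumes a u64 fpr[32][1] array directly followed by fpscr), and
fpr_end_offset() is a helper invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of struct thread_fp_state. */
struct fp_state_sketch {
	uint64_t fpr[32][1];
	uint64_t fpscr;
};

/* Offset of the first byte after fpr[], computed without fpr[32][0]. */
static size_t fpr_end_offset(void)
{
	return offsetof(struct fp_state_sketch, fpr) +
	       sizeof(((struct fp_state_sketch *)0)->fpr);
}

/* Compile-time form of the same check (C11). */
_Static_assert(offsetof(struct fp_state_sketch, fpscr) ==
	       offsetof(struct fp_state_sketch, fpr) +
	       sizeof(((struct fp_state_sketch *)0)->fpr),
	       "fpscr must immediately follow fpr[]");
```

Whether this form keeps gcc quiet in the actual ptrace code would still need to
be checked with the compiler that produced the error.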


> Warnings:

>   2 drivers/net/wireless/intel/iwlegacy/3945.c:1022:5: warning: suggest explicit braces to avoid ambiguous 'else' [-Wparentheses]

I had not seen this before, sent a patch now.

>   3 drivers/pinctrl/stm32/pinctrl-stm32.c:797:17: warning: too many arguments for format [-Wformat-extra-args]

sent a fix yesterday, got an ack but it wasn't applied yet. I'm sure Linus Walleij
will take care of it soon.

>   6 mm/page_alloc.c:3651:6: warning: 'compact_result' may be used uninitialized in this function [-Wmaybe-uninitialized]

I'm surprised this one is still there, I sent a patch but Michal Hocko came up with
a better fix on May 12, which was not applied yet.

Michal, can you resend this one to Andrew? I suspect he missed it as it was
sent as a reply to mine.

>   2 drivers/xen/balloon.c:154:13: warning: 'release_memory_resource' declared 'static' but never defined [-Wunused-function]

I sent a patch on May 11, subject "xen: remove incorrect forward declaration" and
Stefano Stabellini reviewed it. Ross Lagerwall did the same patch a day earlier,
but neither of them has made it into linux-next so far. According to Ross, this
one should be backported to v4.4.

>   3 fs/xfs/xfs_aops.c:97:16: warning: unused variable 'blockmask' [-Wunused-variable]

I sent a patch on April 16, but got no reply. Resending it now.

>   2 arch/arm/mach-lpc32xx/include/mach/irqs.h:115:0: warning: "NR_IRQS" redefined

I missed this one, as I have some other patches for lpc32xx in my randconfig
fixup tree that hide it.

I've created a fix now and applied it to the arm-soc fixes branch.

>   1 drivers/soc/mediatek/mtk-pmic-wrap.c:1062:16: warning: large integer implicitly truncated to unsigned type [-Woverflow]
>   1 drivers/soc/mediatek/mtk-pmic-wrap.c:1074:16: warning: large integer implicitly truncated to unsigned type [-Woverflow]
>   1 drivers/soc/mediatek/mtk-pmic-wrap.c:1086:16: warning: large integer implicitly truncated to unsigned type [-Woverflow]

I sent out a patch on May 12 for this, got no reply. I've applied my own patch
now on the arm-soc fixes branch.

>   1 drivers/phy/phy-exynos-mipi-video.c:238:13: warning: 'val' may be used uninitialized in this function [-Wmaybe-uninitialized]

I sent a patch on May 11, it was reviewed by Krzysztof Kozlowski, but not yet
applied.

>   1 include/soc/nps/common.h:148:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>   1 include/soc/nps/common.h:162:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

I sent a patch on May 12, but it hasn't appeared in linux-next yet.

>   1 drivers/infiniband/core/cma.c:1253:12: warning: 'src_addr_storage.sin_addr.s_addr' may be used uninitialized in this function [-Wmaybe-uninitialized]

This seems to only happen on powerpc. What compiler version are you using there? If it's
an older compiler, we might not necessarily care about the warnings but you may want to
upgrade.

I've confirmed that this is a false positive, but I see 

[PATCH 7/7] powerpc/mm: remove flush_tlb_page_nohash

2016-05-19 Thread Aneesh Kumar K.V
This should be the same as flush_tlb_page except for hash32. For hash32,
I guess the existing code is wrong, because we don't seem to be
flushing the tlb for the Hash != 0 case at all. Fix this by switching to
calling flush_tlb_page(), which does the right thing by flushing the
tlb for both the hash and nohash cases with hash32.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/tlbflush-hash.h |  5 -
 arch/powerpc/include/asm/book3s/64/tlbflush.h  |  8 
 arch/powerpc/include/asm/tlbflush.h|  1 -
 arch/powerpc/mm/pgtable.c  |  2 +-
 arch/powerpc/mm/tlb_hash32.c   | 11 ---
 5 files changed, 1 insertion(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
index f12ddf5e8de5..2f6373144e2c 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
@@ -75,11 +75,6 @@ static inline void hash__flush_tlb_page(struct vm_area_struct *vma,
 {
 }
 
-static inline void hash__flush_tlb_page_nohash(struct vm_area_struct *vma,
-  unsigned long vmaddr)
-{
-}
-
 static inline void hash__flush_tlb_range(struct vm_area_struct *vma,
 unsigned long start, unsigned long end)
 {
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index 3b3e5e944af7..ea29cc3318d2 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -57,14 +57,6 @@ static inline void local_flush_tlb_page(struct vm_area_struct *vma,
return hash__local_flush_tlb_page(vma, vmaddr);
 }
 
-static inline void flush_tlb_page_nohash(struct vm_area_struct *vma,
-unsigned long vmaddr)
-{
-   if (radix_enabled())
-   return radix__flush_tlb_page(vma, vmaddr);
-   return hash__flush_tlb_page_nohash(vma, vmaddr);
-}
-
 static inline void tlb_flush(struct mmu_gather *tlb)
 {
if (radix_enabled())
diff --git a/arch/powerpc/include/asm/tlbflush.h b/arch/powerpc/include/asm/tlbflush.h
index 1b38eea28e5a..13dbcd41885e 100644
--- a/arch/powerpc/include/asm/tlbflush.h
+++ b/arch/powerpc/include/asm/tlbflush.h
@@ -54,7 +54,6 @@ extern void __flush_tlb_page(struct mm_struct *mm, unsigned long vmaddr,
 #define flush_tlb_page(vma,addr)   local_flush_tlb_page(vma,addr)
 #define __flush_tlb_page(mm,addr,p,i)  __local_flush_tlb_page(mm,addr,p,i)
 #endif
-#define flush_tlb_page_nohash(vma,addr)	flush_tlb_page(vma,addr)
 
 #elif defined(CONFIG_PPC_STD_MMU_32)
 
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 88a307504b5a..0b6fb244d0a1 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -225,7 +225,7 @@ int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
if (!is_vm_hugetlb_page(vma))
assert_pte_locked(vma->vm_mm, address);
__ptep_set_access_flags(ptep, entry);
-   flush_tlb_page_nohash(vma, address);
+   flush_tlb_page(vma, address);
}
return changed;
 }
diff --git a/arch/powerpc/mm/tlb_hash32.c b/arch/powerpc/mm/tlb_hash32.c
index 558e30cce33e..702d7689d714 100644
--- a/arch/powerpc/mm/tlb_hash32.c
+++ b/arch/powerpc/mm/tlb_hash32.c
@@ -49,17 +49,6 @@ void flush_hash_entry(struct mm_struct *mm, pte_t *ptep, unsigned long addr)
 EXPORT_SYMBOL(flush_hash_entry);
 
 /*
- * Called by ptep_set_access_flags, must flush on CPUs for which the
- * DSI handler can't just "fixup" the TLB on a write fault
- */
-void flush_tlb_page_nohash(struct vm_area_struct *vma, unsigned long addr)
-{
-   if (Hash != 0)
-   return;
-   _tlbie(addr);
-}
-
-/*
  * Called at the end of a mmu_gather operation to make sure the
  * TLB flush is completely done.
  */
-- 
2.7.4


[PATCH 1/7] powerpc/mm: Use hugetlb flush functions

2016-05-19 Thread Aneesh Kumar K.V
Use flush_hugetlb_page instead of flush_tlb_page when we clear and flush the
pte.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/hugetlb.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index e2d9f4996e5c..c5517f463ec7 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -147,7 +147,7 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 {
pte_t pte;
pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
-   flush_tlb_page(vma, addr);
+   flush_hugetlb_page(vma, addr);
 }
 
 static inline int huge_pte_none(pte_t pte)
-- 
2.7.4


[PATCH 6/7] powerpc/mm/hugetlb: Add flush_hugetlb_tlb_range

2016-05-19 Thread Aneesh Kumar K.V
Some archs like ppc64 need to do special things when flushing the tlb for
hugepages. Add a new helper to flush a hugetlb tlb range. This helps us
avoid flushing the entire tlb mapping for the pid.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/tlbflush-radix.h |  2 ++
 arch/powerpc/include/asm/book3s/64/tlbflush.h   | 10 ++
 arch/powerpc/mm/hugetlbpage-radix.c | 10 ++
 mm/hugetlb.c| 10 +-
 4 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 68839e6adcf1..73953a44d4e3 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -10,6 +10,8 @@ static inline int mmu_get_ap(int psize)
return mmu_psize_defs[psize].ap;
 }
 
+extern void radix__flush_hugetlb_tlb_range(struct vm_area_struct *vma,
+  unsigned long start, unsigned long end);
 extern void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
 unsigned long end, int psize);
 extern void radix__flush_pmd_tlb_range(struct vm_area_struct *vma,
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index f0d6c9d38916..3b3e5e944af7 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -16,6 +16,16 @@ static inline void flush_pmd_tlb_range(struct vm_area_struct *vma,
return hash__flush_tlb_range(vma, start, end);
 }
 
+#define __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE
+static inline void flush_hugetlb_tlb_range(struct vm_area_struct *vma,
+  unsigned long start,
+  unsigned long end)
+{
+   if (radix_enabled())
+   return radix__flush_hugetlb_tlb_range(vma, start, end);
+   return hash__flush_tlb_range(vma, start, end);
+}
+
 static inline void flush_tlb_range(struct vm_area_struct *vma,
   unsigned long start, unsigned long end)
 {
diff --git a/arch/powerpc/mm/hugetlbpage-radix.c b/arch/powerpc/mm/hugetlbpage-radix.c
index 1eca0deaf89b..35254a678456 100644
--- a/arch/powerpc/mm/hugetlbpage-radix.c
+++ b/arch/powerpc/mm/hugetlbpage-radix.c
@@ -25,6 +25,16 @@ void radix__local_flush_hugetlb_page(struct vm_area_struct *vma, unsigned long v
radix__local_flush_tlb_page_psize(vma->vm_mm, vmaddr, psize);
 }
 
+void radix__flush_hugetlb_tlb_range(struct vm_area_struct *vma, unsigned long start,
+  unsigned long end)
+{
+   int psize;
+   struct hstate *hstate = hstate_file(vma->vm_file);
+
+   psize = hstate_get_psize(hstate);
+   radix__flush_tlb_range_psize(vma->vm_mm, start, end, psize);
+}
+
 /*
  * A vairant of hugetlb_get_unmapped_area doing topdown search
  * FIXME!! should we do as x86 does or non hugetlb area does ?
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 19d0d08b396f..076a57ee8790 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3893,6 +3893,14 @@ same_page:
return i ? i : -EFAULT;
 }
 
+#ifndef __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE
+/*
+ * ARCHes with special requirements for evicting HUGETLB backing TLB entries can
+ * implement this.
+ */
+#define flush_hugetlb_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
+#endif
+
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
unsigned long address, unsigned long end, pgprot_t newprot)
 {
@@ -3953,7 +3961,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 * once we release i_mmap_rwsem, another task can do the final put_page
 * and that page table be reused and filled with junk.
 */
-   flush_tlb_range(vma, start, end);
+   flush_hugetlb_tlb_range(vma, start, end);
mmu_notifier_invalidate_range(mm, start, end);
i_mmap_unlock_write(vma->vm_file->f_mapping);
mmu_notifier_invalidate_range_end(mm, start, end);
-- 
2.7.4
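
The generic fallback added to mm/hugetlb.c above follows a common override
pattern, which can be sketched in standalone C roughly as follows. This is an
illustration under assumptions, not kernel code: vma_sketch, the two counters
and SKETCH_ARCH_HAS_HUGETLB_FLUSH are hypothetical, and the guard macro is
spelled HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE_SKETCH to stay out of the reserved
identifier namespace in userspace.

```c
/* Hypothetical stand-in for struct vm_area_struct. */
struct vma_sketch {
	int unused;
};

static int generic_flushes;	/* counts calls to the generic range flush */
static int arch_flushes;	/* counts calls to an arch-specific override */

/* Generic range flush that every architecture is assumed to provide. */
static void flush_tlb_range(struct vma_sketch *vma, unsigned long start,
			    unsigned long end)
{
	(void)vma; (void)start; (void)end;
	generic_flushes++;
}

/* An architecture opts in by defining the guard and its own helper. */
#ifdef SKETCH_ARCH_HAS_HUGETLB_FLUSH
#define HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE_SKETCH
static void flush_hugetlb_tlb_range(struct vma_sketch *vma,
				    unsigned long start, unsigned long end)
{
	(void)vma; (void)start; (void)end;
	arch_flushes++;
}
#endif

/* Everyone else falls back to the generic range flush. */
#ifndef HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE_SKETCH
#define flush_hugetlb_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end)
#endif

/* Caller side: always uses the hugetlb-specific name. */
static void change_protection_sketch(struct vma_sketch *vma,
				     unsigned long start, unsigned long end)
{
	flush_hugetlb_tlb_range(vma, start, end);
}
```

Built without SKETCH_ARCH_HAS_HUGETLB_FLUSH, the caller ends up in
flush_tlb_range(); built with it, the arch helper is used instead, which is the
behaviour the patch relies on for radix.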


[PATCH 5/7] powerpc/mm/radix/hugetlb: Add helper for finding page size from hstate

2016-05-19 Thread Aneesh Kumar K.V
Use the helper instead of open coding the same logic in multiple places.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hugetlb-radix.h | 15 +++
 .../powerpc/include/asm/book3s/64/tlbflush-radix.h |  4 +--
 arch/powerpc/mm/hugetlbpage-radix.c| 29 ++
 arch/powerpc/mm/tlb-radix.c| 10 +---
 4 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb-radix.h b/arch/powerpc/include/asm/book3s/64/hugetlb-radix.h
index 60f47649306f..c45189aa7476 100644
--- a/arch/powerpc/include/asm/book3s/64/hugetlb-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/hugetlb-radix.h
@@ -11,4 +11,19 @@ extern unsigned long
 radix__hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
unsigned long len, unsigned long pgoff,
unsigned long flags);
+
+static inline int hstate_get_psize(struct hstate *hstate)
+{
+   unsigned long shift;
+
+   shift = huge_page_shift(hstate);
+   if (shift == mmu_psize_defs[MMU_PAGE_2M].shift)
+   return MMU_PAGE_2M;
+   else if (shift == mmu_psize_defs[MMU_PAGE_1G].shift)
+   return MMU_PAGE_1G;
+   else {
+   WARN(1, "Wrong huge page shift\n");
+   return mmu_virtual_psize;
+   }
+}
 #endif
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 07b2e0031dad..68839e6adcf1 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -21,13 +21,13 @@ extern void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end
 extern void radix__local_flush_tlb_mm(struct mm_struct *mm);
 extern void radix__local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
 extern void radix__local_flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
- unsigned long ap);
+ int psize);
 extern void radix__tlb_flush(struct mmu_gather *tlb);
 #ifdef CONFIG_SMP
 extern void radix__flush_tlb_mm(struct mm_struct *mm);
 extern void radix__flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
 extern void radix__flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
-   unsigned long ap);
+   int psize);
 #else
 #define radix__flush_tlb_mm(mm)	radix__local_flush_tlb_mm(mm)
 #define radix__flush_tlb_page(vma,addr)	radix__local_flush_tlb_page(vma,addr)
diff --git a/arch/powerpc/mm/hugetlbpage-radix.c b/arch/powerpc/mm/hugetlbpage-radix.c
index 0dfa1816f0c6..1eca0deaf89b 100644
--- a/arch/powerpc/mm/hugetlbpage-radix.c
+++ b/arch/powerpc/mm/hugetlbpage-radix.c
@@ -5,39 +5,24 @@
 #include 
 #include 
 #include 
+#include 
 
 void radix__flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
 {
-   unsigned long ap, shift;
+   int psize;
struct hstate *hstate = hstate_file(vma->vm_file);
 
-   shift = huge_page_shift(hstate);
-   if (shift == mmu_psize_defs[MMU_PAGE_2M].shift)
-   ap = mmu_get_ap(MMU_PAGE_2M);
-   else if (shift == mmu_psize_defs[MMU_PAGE_1G].shift)
-   ap = mmu_get_ap(MMU_PAGE_1G);
-   else {
-   WARN(1, "Wrong huge page shift\n");
-   return ;
-   }
-   radix__flush_tlb_page_psize(vma->vm_mm, vmaddr, ap);
+   psize = hstate_get_psize(hstate);
+   radix__flush_tlb_page_psize(vma->vm_mm, vmaddr, psize);
 }
 
 void radix__local_flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
 {
-   unsigned long ap, shift;
+   int psize;
struct hstate *hstate = hstate_file(vma->vm_file);
 
-   shift = huge_page_shift(hstate);
-   if (shift == mmu_psize_defs[MMU_PAGE_2M].shift)
-   ap = mmu_get_ap(MMU_PAGE_2M);
-   else if (shift == mmu_psize_defs[MMU_PAGE_1G].shift)
-   ap = mmu_get_ap(MMU_PAGE_1G);
-   else {
-   WARN(1, "Wrong huge page shift\n");
-   return ;
-   }
-   radix__local_flush_tlb_page_psize(vma->vm_mm, vmaddr, ap);
+   psize = hstate_get_psize(hstate);
+   radix__local_flush_tlb_page_psize(vma->vm_mm, vmaddr, psize);
 }
 
 /*
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index b1dc4675925d..7bc3d1402c63 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -128,9 +128,10 @@ void radix__local_flush_tlb_mm(struct mm_struct *mm)
 EXPORT_SYMBOL(radix__local_flush_tlb_mm);
 
 void radix__local_flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
-  unsigned long ap)
+  int psize)

[PATCH 4/7] powerpc/mm/radix: Rename function and drop unused arg

2016-05-19 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/tlbflush-radix.h | 10 +-
 arch/powerpc/mm/hugetlbpage-radix.c |  4 ++--
 arch/powerpc/mm/tlb-radix.c | 16 
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 823528d34688..07b2e0031dad 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -20,18 +20,18 @@ extern void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end
 
 extern void radix__local_flush_tlb_mm(struct mm_struct *mm);
 extern void radix__local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
-extern void radix___local_flush_tlb_page(struct mm_struct *mm, unsigned long vmaddr,
-   unsigned long ap, int nid);
+extern void radix__local_flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
+ unsigned long ap);
 extern void radix__tlb_flush(struct mmu_gather *tlb);
 #ifdef CONFIG_SMP
 extern void radix__flush_tlb_mm(struct mm_struct *mm);
 extern void radix__flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
-extern void radix___flush_tlb_page(struct mm_struct *mm, unsigned long vmaddr,
- unsigned long ap, int nid);
+extern void radix__flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
+   unsigned long ap);
 #else
 #define radix__flush_tlb_mm(mm)	radix__local_flush_tlb_mm(mm)
 #define radix__flush_tlb_page(vma,addr)	radix__local_flush_tlb_page(vma,addr)
-#define radix___flush_tlb_page(mm,addr,p,i)	radix___local_flush_tlb_page(mm,addr,p,i)
+#define radix__flush_tlb_page_psize(mm,addr,p)	radix__local_flush_tlb_page_psize(mm,addr,p)
 #endif
 
 #endif
diff --git a/arch/powerpc/mm/hugetlbpage-radix.c b/arch/powerpc/mm/hugetlbpage-radix.c
index 1e11559e1aac..0dfa1816f0c6 100644
--- a/arch/powerpc/mm/hugetlbpage-radix.c
+++ b/arch/powerpc/mm/hugetlbpage-radix.c
@@ -20,7 +20,7 @@ void radix__flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
WARN(1, "Wrong huge page shift\n");
return ;
}
-   radix___flush_tlb_page(vma->vm_mm, vmaddr, ap, 0);
+   radix__flush_tlb_page_psize(vma->vm_mm, vmaddr, ap);
 }
 
 void radix__local_flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
@@ -37,7 +37,7 @@ void radix__local_flush_hugetlb_page(struct vm_area_struct *vma, unsigned long v
WARN(1, "Wrong huge page shift\n");
return ;
}
-   radix___local_flush_tlb_page(vma->vm_mm, vmaddr, ap, 0);
+   radix__local_flush_tlb_page_psize(vma->vm_mm, vmaddr, ap);
 }
 
 /*
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index fe2fc58d2e00..b1dc4675925d 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -127,8 +127,8 @@ void radix__local_flush_tlb_mm(struct mm_struct *mm)
 }
 EXPORT_SYMBOL(radix__local_flush_tlb_mm);
 
-void radix___local_flush_tlb_page(struct mm_struct *mm, unsigned long vmaddr,
-   unsigned long ap, int nid)
+void radix__local_flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
+  unsigned long ap)
 {
unsigned int pid;
 
@@ -146,8 +146,8 @@ void radix__local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmadd
if (vma && is_vm_hugetlb_page(vma))
return __local_flush_hugetlb_page(vma, vmaddr);
 #endif
-   radix___local_flush_tlb_page(vma ? vma->vm_mm : NULL, vmaddr,
-  mmu_get_ap(mmu_virtual_psize), 0);
+   radix__local_flush_tlb_page_psize(vma ? vma->vm_mm : NULL, vmaddr,
+ mmu_get_ap(mmu_virtual_psize));
 }
 EXPORT_SYMBOL(radix__local_flush_tlb_page);
 
@@ -176,8 +176,8 @@ no_context:
 }
 EXPORT_SYMBOL(radix__flush_tlb_mm);
 
-void radix___flush_tlb_page(struct mm_struct *mm, unsigned long vmaddr,
-  unsigned long ap, int nid)
+void radix__flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
+unsigned long ap)
 {
unsigned int pid;
 
@@ -205,8 +205,8 @@ void radix__flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
if (vma && is_vm_hugetlb_page(vma))
return flush_hugetlb_page(vma, vmaddr);
 #endif
-   radix___flush_tlb_page(vma ? vma->vm_mm : NULL, vmaddr,
-mmu_get_ap(mmu_virtual_psize), 0);
+   radix__flush_tlb_page_psize(vma ? vma->vm_mm : NULL, vmaddr,
+   mmu_get_ap(mmu_virtual_psize));
 }
 

[PATCH 3/7] powerpc/mm/radix: Add tlb flush of THP ptes

2016-05-19 Thread Aneesh Kumar K.V
Instead of flushing the entire mm, implement flush_pmd_tlb_range.

Signed-off-by: Aneesh Kumar K.V 
---
 .../powerpc/include/asm/book3s/64/tlbflush-radix.h |  4 ++
 arch/powerpc/include/asm/book3s/64/tlbflush.h  |  9 
 arch/powerpc/mm/pgtable-book3s64.c |  4 +-
 arch/powerpc/mm/tlb-radix.c| 54 ++
 4 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 13ef38828dfe..823528d34688 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -10,6 +10,10 @@ static inline int mmu_get_ap(int psize)
return mmu_psize_defs[psize].ap;
 }
 
+extern void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
+unsigned long end, int psize);
+extern void radix__flush_pmd_tlb_range(struct vm_area_struct *vma,
+  unsigned long start, unsigned long end);
 extern void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
unsigned long end);
 extern void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end);
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index d98424ae356c..f0d6c9d38916 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -7,6 +7,15 @@
 #include 
 #include 
 
+#define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
+static inline void flush_pmd_tlb_range(struct vm_area_struct *vma,
+  unsigned long start, unsigned long end)
+{
+   if (radix_enabled())
+   return radix__flush_pmd_tlb_range(vma, start, end);
+   return hash__flush_tlb_range(vma, start, end);
+}
+
 static inline void flush_tlb_range(struct vm_area_struct *vma,
   unsigned long start, unsigned long end)
 {
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 670318766545..7bb8acffe876 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -33,7 +33,7 @@ int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
changed = !pmd_same(*(pmdp), entry);
if (changed) {
__ptep_set_access_flags(pmdp_ptep(pmdp), pmd_pte(entry));
-   flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+   flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
}
return changed;
 }
@@ -66,7 +66,7 @@ void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 pmd_t *pmdp)
 {
pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, 0);
-   flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+   flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
/*
 * This ensures that generic code that rely on IRQ disabling
 * to prevent a parallel THP split work as expected.
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 5807f5d72e1b..fe2fc58d2e00 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -243,3 +243,57 @@ void radix__tlb_flush(struct mmu_gather *tlb)
struct mm_struct *mm = tlb->mm;
radix__flush_tlb_mm(mm);
 }
+
+#define TLB_FLUSH_ALL -1UL
+/*
+ * Number of pages above which we will do a bcast tlbie. Just a
+ * number at this point copied from x86
+ */
+static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
+
+void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
+ unsigned long end, int psize)
+{
+   unsigned int pid;
+   unsigned long addr;
+   int local = mm_is_core_local(mm);
+   unsigned long ap = mmu_get_ap(psize);
+   int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
+   unsigned long page_size = 1UL << mmu_psize_defs[psize].shift;
+
+
+   preempt_disable();
+   pid = mm ? mm->context.id : 0;
+   if (unlikely(pid == MMU_NO_CONTEXT))
+   goto err_out;
+
+   if (end == TLB_FLUSH_ALL ||
+   (end - start) > tlb_single_page_flush_ceiling * page_size) {
+   if (local)
+   _tlbiel_pid(pid);
+   else
+   _tlbie_pid(pid);
+   goto err_out;
+   }
+   for (addr = start; addr < end; addr += page_size) {
+
+   if (local)
+   _tlbiel_va(addr, pid, ap);
+   else {
+   if (lock_tlbie)
+   raw_spin_lock(_tlbie_lock);
+   _tlbie_va(addr, pid, ap);
+   if (lock_tlbie)
+  
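
A note on the heuristic in the patch above: the per-page loop is used only
while the range stays at or below a ceiling of 33 pages; a larger range, or
an end of TLB_FLUSH_ALL, falls back to flushing the whole PID. What follows
is a minimal user-space sketch of that decision only, not the kernel code.
All sketch_* names are hypothetical, and the hardware invalidations are
replaced by simple counting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the patch's TLB_FLUSH_ALL sentinel. */
#define SKETCH_FLUSH_ALL (~0UL)

/* Same number the patch borrows from x86; treated here as a tunable. */
static const unsigned long sketch_flush_ceiling = 33;

/*
 * Return true when the whole PID should be flushed instead of
 * walking the range one page at a time.
 */
static bool sketch_should_flush_all(unsigned long start, unsigned long end,
                                    unsigned int page_shift)
{
    unsigned long page_size = 1UL << page_shift;

    assert(end >= start);
    if (end == SKETCH_FLUSH_ALL)
        return true;
    return (end - start) > sketch_flush_ceiling * page_size;
}

/*
 * Count the per-page invalidations the loop would issue. Only meant
 * to be called when sketch_should_flush_all() returned false.
 */
static unsigned long sketch_pages_to_flush(unsigned long start,
                                           unsigned long end,
                                           unsigned int page_shift)
{
    unsigned long page_size = 1UL << page_shift;
    unsigned long count = 0;
    unsigned long addr;

    for (addr = start; addr < end; addr += page_size)
        count++;
    return count;
}
```

With a 2M page size (shift 21), a single THP range of HPAGE_PMD_SIZE is one
page in this model, so it always takes the per-page path.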

[PATCH 2/7] powerpc/mm: Drop multiple definition of mm_is_core_local

2016-05-19 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/tlb.h | 13 +
 arch/powerpc/mm/tlb-radix.c|  6 --
 arch/powerpc/mm/tlb_nohash.c   |  6 --
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index 20733fa518ae..f6f68f73e858 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -46,5 +46,18 @@ static inline void __tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep,
 #endif
 }
 
+#ifdef CONFIG_SMP
+static inline int mm_is_core_local(struct mm_struct *mm)
+{
+   return cpumask_subset(mm_cpumask(mm),
+ topology_sibling_cpumask(smp_processor_id()));
+}
+#else
+static inline int mm_is_core_local(struct mm_struct *mm)
+{
+   return 1;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_TLB_H */
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 0fdaf93a3e09..5807f5d72e1b 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -152,12 +152,6 @@ void radix__local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmadd
 EXPORT_SYMBOL(radix__local_flush_tlb_page);
 
 #ifdef CONFIG_SMP
-static int mm_is_core_local(struct mm_struct *mm)
-{
-   return cpumask_subset(mm_cpumask(mm),
- topology_sibling_cpumask(smp_processor_id()));
-}
-
 void radix__flush_tlb_mm(struct mm_struct *mm)
 {
unsigned int pid;
diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index f4668488512c..050badc0ebd3 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -215,12 +215,6 @@ EXPORT_SYMBOL(local_flush_tlb_page);
 
 static DEFINE_RAW_SPINLOCK(tlbivax_lock);
 
-static int mm_is_core_local(struct mm_struct *mm)
-{
-   return cpumask_subset(mm_cpumask(mm),
- topology_sibling_cpumask(smp_processor_id()));
-}
-
 struct tlb_flush_param {
unsigned long addr;
unsigned int pid;
-- 
2.7.4
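
For readers unfamiliar with the helper being consolidated here:
mm_is_core_local() asks whether every CPU the mm has run on is a thread of
the current core. A rough user-space model of that subset test follows. It
assumes at most 64 CPUs held in one 64-bit word and cores made of
consecutively numbered threads; the sketch_* names are hypothetical, and the
kernel uses struct cpumask and topology_sibling_cpumask() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-CPU mask; bit n set means CPU n is in the set. */
typedef uint64_t sketch_cpumask_t;

/* True when every CPU in 'a' is also in 'b' (the cpumask_subset() idea). */
static bool sketch_mask_subset(sketch_cpumask_t a, sketch_cpumask_t b)
{
    return (a & ~b) == 0;
}

/* Mask of all threads on the core that contains 'cpu'. */
static sketch_cpumask_t sketch_sibling_mask(unsigned int cpu,
                                            unsigned int threads_per_core)
{
    unsigned int first;
    sketch_cpumask_t threads;

    assert(cpu < 64);
    assert(threads_per_core > 0 && threads_per_core <= 64);

    first = cpu - (cpu % threads_per_core);
    threads = (threads_per_core == 64) ? ~0ULL
                                       : ((1ULL << threads_per_core) - 1);
    return threads << first;
}

/* Model of mm_is_core_local(): is the mm confined to this CPU's core? */
static bool sketch_mm_is_core_local(sketch_cpumask_t mm_cpus,
                                    unsigned int cpu,
                                    unsigned int threads_per_core)
{
    return sketch_mask_subset(mm_cpus,
                              sketch_sibling_mask(cpu, threads_per_core));
}
```

When the test is true, a local invalidation is enough; otherwise the flush
has to be broadcast. That is the choice the radix flush code makes with this
helper.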


[PATCH 0/7] TLB flush update for radix

2016-05-19 Thread Aneesh Kumar K.V
Hi,

This patch series introduces a range-based TLB flush and uses it for the
radix implementation. We still need to handle the mmu_gather-related TLB
flush; that will be done in a later patch.

Aneesh Kumar K.V (7):
  powerpc/mm: Use hugetlb flush functions
  powerpc/mm: Drop multiple definition of mm_is_core_local
  powerpc/mm/radix: Add tlb flush of THP ptes
  powerpc/mm/radix: Rename function and drop unused arg
  powerpc/mm/radix/hugetlb: Add helper for finding page size from hstate
  powerpc/mm/hugetlb: Add flush_hugetlb_tlb_range
  powerpc/mm: remove flush_tlb_page_nohash

 arch/powerpc/include/asm/book3s/64/hugetlb-radix.h | 15 +
 arch/powerpc/include/asm/book3s/64/tlbflush-hash.h |  5 --
 .../powerpc/include/asm/book3s/64/tlbflush-radix.h | 16 +++--
 arch/powerpc/include/asm/book3s/64/tlbflush.h  | 27 +---
 arch/powerpc/include/asm/hugetlb.h |  2 +-
 arch/powerpc/include/asm/tlb.h | 13 
 arch/powerpc/include/asm/tlbflush.h|  1 -
 arch/powerpc/mm/hugetlbpage-radix.c| 39 +--
 arch/powerpc/mm/pgtable-book3s64.c |  4 +-
 arch/powerpc/mm/pgtable.c  |  2 +-
 arch/powerpc/mm/tlb-radix.c| 78 ++
 arch/powerpc/mm/tlb_hash32.c   | 11 ---
 arch/powerpc/mm/tlb_nohash.c   |  6 --
 mm/hugetlb.c   | 10 ++-
 14 files changed, 152 insertions(+), 77 deletions(-)

-- 
2.7.4


[PATCH 5/6] powerpc/mm: Make MMU_FTR_RADIX a MMU family feature

2016-05-19 Thread Aneesh Kumar K.V
MMU feature bits are defined such that we use the lower half to
represent MMU family features. Remove the strict split into halves and
also make Radix an MMU family feature. Radix introduces a new MMU
model and, strictly speaking, it is a new MMU family. This also frees
up bits which can be used for individual features later.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/mmu.h |  2 +-
 arch/powerpc/include/asm/mmu.h   | 16 +++-
 arch/powerpc/kernel/entry_64.S   |  2 +-
 arch/powerpc/kernel/exceptions-64s.S |  8 
 arch/powerpc/kernel/prom.c   |  2 +-
 5 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 5854263d4d6e..c6b1ff795632 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -23,7 +23,7 @@ struct mmu_psize_def {
 };
 extern struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
 
-#define radix_enabled() mmu_has_feature(MMU_FTR_RADIX)
+#define radix_enabled() mmu_has_feature(MMU_FTR_TYPE_RADIX)
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index e53ebebff474..4ad66a547d4c 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -12,7 +12,7 @@
  */
 
 /*
- * First half is MMU families
+ * MMU families
  */
 #define MMU_FTR_HPTE_TABLE ASM_CONST(0x0001)
 #define MMU_FTR_TYPE_8xx   ASM_CONST(0x0002)
@@ -20,9 +20,12 @@
 #define MMU_FTR_TYPE_44x   ASM_CONST(0x0008)
 #define MMU_FTR_TYPE_FSL_E ASM_CONST(0x0010)
 #define MMU_FTR_TYPE_47x   ASM_CONST(0x0020)
-
 /*
- * This is individual features
+ * Radix page table available
+ */
+#define MMU_FTR_TYPE_RADIX ASM_CONST(0x0040)
+/*
+ * individual features
  */
 
 /* Enable use of high BAT registers */
@@ -88,11 +91,6 @@
  */
 #define MMU_FTR_1T_SEGMENT ASM_CONST(0x4000)
 
-/*
- * Radix page table available
- */
-#define MMU_FTR_RADIX  ASM_CONST(0x8000)
-
 /* MMU feature bit sets for various CPUs */
 #define MMU_FTRS_DEFAULT_HPTE_ARCH_V2  \
MMU_FTR_HPTE_TABLE | MMU_FTR_PPCAS_ARCH_V2
@@ -126,7 +124,7 @@ enum {
MMU_FTR_LOCKLESS_TLBIE | MMU_FTR_CI_LARGE_PAGE |
MMU_FTR_1T_SEGMENT |
 #ifdef CONFIG_PPC_RADIX_MMU
-   MMU_FTR_RADIX |
+   MMU_FTR_TYPE_RADIX |
 #endif
0,
 };
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 73e461a3dfbb..dd26d4ed7513 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -532,7 +532,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 #ifdef CONFIG_PPC_STD_MMU_64
 BEGIN_MMU_FTR_SECTION
b   2f
-END_MMU_FTR_SECTION_IFSET(MMU_FTR_RADIX)
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_RADIX)
 BEGIN_FTR_SECTION
clrrdi  r6,r8,28/* get its ESID */
clrrdi  r9,r1,28/* get current sp ESID */
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 4c9440629128..f2bd375b9a4e 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -945,7 +945,7 @@ BEGIN_MMU_FTR_SECTION
b   do_hash_page/* Try to handle as hpte fault */
 MMU_FTR_SECTION_ELSE
b   handle_page_fault
-ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_RADIX)
+ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
.align  7
.globl  h_data_storage_common
@@ -976,7 +976,7 @@ BEGIN_MMU_FTR_SECTION
b   do_hash_page/* Try to handle as hpte fault */
 MMU_FTR_SECTION_ELSE
b   handle_page_fault
-ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_RADIX)
+ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
STD_EXCEPTION_COMMON(0xe20, h_instr_storage, unknown_exception)
 
@@ -1390,7 +1390,7 @@ slb_miss_realmode:
 #ifdef CONFIG_PPC_STD_MMU_64
 BEGIN_MMU_FTR_SECTION
bl  slb_allocate_realmode
-END_MMU_FTR_SECTION_IFCLR(MMU_FTR_RADIX)
+END_MMU_FTR_SECTION_IFCLR(MMU_FTR_TYPE_RADIX)
 #endif
/* All done -- return from exception. */
 
@@ -1401,7 +1401,7 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_RADIX)
mtlr    r10
 BEGIN_MMU_FTR_SECTION
b   2f
-END_MMU_FTR_SECTION_IFSET(MMU_FTR_RADIX)
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_RADIX)
andi.   r10,r12,MSR_RI  /* check for unrecoverable exception */
beq-    2f
 
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index d924cd60fc8e..8d5579b5b6c8 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -166,7 +166,7 @@ static struct ibm_pa_feature {
 * which is 0 if the kernel doesn't support TM.
 */
{CPU_FTR_TM_COMP, 0, 0, 22, 0, 0},
-   {0, MMU_FTR_RADIX, 0,   

[PATCH 4/6] powerpc/mm/hash: Compute the segment size correctly for ISA 3.0

2016-05-19 Thread Aneesh Kumar K.V
PowerISA 3.0 encodes the segment size in the second half of the hash page
table entry. Update hpte_decode accordingly.

Fixes: 50de596de8be ("powerpc/mm/hash: Add support for Power9 Hash")

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/hash_native_64.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index d873f6507f72..c9715fc99d68 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -550,7 +550,10 @@ static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
}
}
/* This works for all page sizes, and for 256M and 1T segments */
-   *ssize = hpte_v >> HPTE_V_SSIZE_SHIFT;
+   if (cpu_has_feature(CPU_FTR_ARCH_300))
+   *ssize = hpte_r >> HPTE_R_3_0_SSIZE_SHIFT;
+   else
+   *ssize = hpte_v >> HPTE_V_SSIZE_SHIFT;
shift = mmu_psize_defs[size].shift;
 
avpn = (HPTE_V_AVPN_VAL(hpte_v) & ~mmu_psize_defs[size].avpnm);
-- 
2.7.4
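
To illustrate the change: before ISA 3.0 the segment size sits in the top
bits of the first HPTE doubleword, and from ISA 3.0 on it is read from the
second doubleword. The sketch below is a user-space model of that selection.
The shift values 62 and 58 are assumptions standing in for
HPTE_V_SSIZE_SHIFT and HPTE_R_3_0_SSIZE_SHIFT, whose definitions are not
shown in this patch, and the two-bit mask on the ISA 3.0 path is an addition
of the sketch, not something the patch does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; the real definitions live in the kernel headers. */
#define SKETCH_V_SSIZE_SHIFT      62
#define SKETCH_R_3_0_SSIZE_SHIFT  58

/*
 * Pick the segment-size field from the right doubleword.
 * hpte_v is the first doubleword, hpte_r the second.
 */
static unsigned int sketch_hpte_ssize(uint64_t hpte_v, uint64_t hpte_r,
                                      bool isa_300)
{
    if (isa_300) {
        /* Mask added by this sketch so higher bits cannot leak in. */
        return (unsigned int)((hpte_r >> SKETCH_R_3_0_SSIZE_SHIFT) & 0x3);
    }
    return (unsigned int)(hpte_v >> SKETCH_V_SSIZE_SHIFT);
}
```

In this model a return value of 0 or 1 would correspond to the 256M and 1T
segment sizes mentioned in the comment above the changed line.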


[PATCH 2/6] powerpc/mm/radix: Update PID switch sequence

2016-05-19 Thread Aneesh Kumar K.V
Update the PID switch sequence as per the ISA document. slbia is needed in
radix to invalidate any implementation-specific lookaside information.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/mmu_context_book3s64.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index 227b2a6c4544..565f1b1da33b 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -181,7 +181,10 @@ void destroy_context(struct mm_struct *mm)
 #ifdef CONFIG_PPC_RADIX_MMU
 void radix__switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
 {
-   mtspr(SPRN_PID, next->context.id);
asm volatile("isync": : :"memory");
+   mtspr(SPRN_PID, next->context.id);
+   asm volatile("isync \n"
+"slbia 0x7 \n"
+: : :"memory");
 }
 #endif
-- 
2.7.4


[PATCH 6/6] powerpc/mm/hash: Add helper for finding SLBE LLP encoding

2016-05-19 Thread Aneesh Kumar K.V
Replace the open-coded computation in multiple places with a helper.
No functional change with this patch.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h | 9 +
 arch/powerpc/include/asm/kvm_book3s_64.h  | 3 +--
 arch/powerpc/mm/hash_native_64.c  | 6 ++
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 290157e8d5b2..a5fa6be7d5ae 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -150,6 +150,15 @@ static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)
BUG();
 }
 
+static inline unsigned long get_sllp_encoding(int psize)
+{
+   unsigned long sllp;
+
+   sllp = ((mmu_psize_defs[psize].sllp & SLB_VSID_L) >> 6) |
+   ((mmu_psize_defs[psize].sllp & SLB_VSID_LP) >> 4);
+   return sllp;
+}
+
 #endif /* __ASSEMBLY__ */
 
 /*
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 1f4497fb5b83..88d17b4ea9c8 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -181,8 +181,7 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 
switch (b_psize) {
case MMU_PAGE_4K:
-   sllp = ((mmu_psize_defs[a_psize].sllp & SLB_VSID_L) >> 6) |
-   ((mmu_psize_defs[a_psize].sllp & SLB_VSID_LP) >> 4);
+   sllp = get_sllp_encoding(a_psize);
rb |= sllp << 5;/*  AP field */
rb |= (va_low & 0x7ff) << 12;   /* remaining 11 bits of AVA */
break;
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index c9715fc99d68..db108e478c80 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -71,8 +71,7 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
/* clear out bits after (52) [052.63] */
va &= ~((1ul << (64 - 52)) - 1);
va |= ssize << 8;
-   sllp = ((mmu_psize_defs[apsize].sllp & SLB_VSID_L) >> 6) |
-   ((mmu_psize_defs[apsize].sllp & SLB_VSID_LP) >> 4);
+   sllp = get_sllp_encoding(apsize);
va |= sllp << 5;
asm volatile(ASM_FTR_IFCLR("tlbie %0,0", PPC_TLBIE(%1,%0), %2)
 : : "r" (va), "r"(0), "i" (CPU_FTR_ARCH_206)
@@ -120,8 +119,7 @@ static inline void __tlbiel(unsigned long vpn, int psize, int apsize, int ssize)
/* clear out bits after(52) [052.63] */
va &= ~((1ul << (64 - 52)) - 1);
va |= ssize << 8;
-   sllp = ((mmu_psize_defs[apsize].sllp & SLB_VSID_L) >> 6) |
-   ((mmu_psize_defs[apsize].sllp & SLB_VSID_LP) >> 4);
+   sllp = get_sllp_encoding(apsize);
va |= sllp << 5;
asm volatile(".long 0x7c000224 | (%0 << 11) | (0 << 21)"
 : : "r"(va) : "memory");
-- 
2.7.4
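
As a worked example of what the new helper computes: it packs the L bit and
the two LP bits of an sllp value into one three-bit field. The sketch below
redoes the arithmetic in user space. The mask values 0x100 and 0x030 are
assumptions standing in for SLB_VSID_L and SLB_VSID_LP, whose definitions
are not part of this patch, and the function takes the sllp value directly
instead of looking it up in mmu_psize_defs[].

```c
/* Assumed mask values; the real ones come from the kernel headers. */
#define SKETCH_SLB_VSID_L   0x100UL
#define SKETCH_SLB_VSID_LP  0x030UL

/*
 * Same shifts as get_sllp_encoding() in the patch:
 * L moves from bit 8 down to bit 2, LP from bits 5:4 down to bits 1:0.
 */
static unsigned long sketch_sllp_encoding(unsigned long sllp)
{
    return ((sllp & SKETCH_SLB_VSID_L) >> 6) |
           ((sllp & SKETCH_SLB_VSID_LP) >> 4);
}
```

Under these assumptions an sllp of 0x110 (L set, LP = 01) encodes to 5, and
an sllp of 0x100 (L set, LP = 00) encodes to 4. The callers in the patch
then shift the result left by 5 before OR-ing it into the tlbie operand.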


[PATCH 3/6] powerpc/mm/hash: Update SDR1 size encoding as documented in ISA 3.0

2016-05-19 Thread Aneesh Kumar K.V
ISA 3.0 documents the hash table size in bytes as 2^(HTABSIZE + 18).

No functional change with this patch.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/hash_utils_64.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 59268969a0bc..3849de15b65f 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -677,10 +677,9 @@ int remove_section_mapping(unsigned long start, unsigned long end)
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 static void __init hash_init_partition_table(phys_addr_t hash_table,
-unsigned long pteg_count)
+unsigned long htab_size)
 {
unsigned long ps_field;
-   unsigned long htab_size;
unsigned long patb_size = 1UL << PATB_SIZE_SHIFT;
 
/*
@@ -688,7 +687,7 @@ static void __init hash_init_partition_table(phys_addr_t hash_table,
 * We can ignore that for lpid 0
 */
ps_field = 0;
-   htab_size =  __ilog2(pteg_count) - 11;
+   htab_size =  __ilog2(htab_size) - 18;
 
BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 24), "Partition table size too large.");
partition_tb = __va(memblock_alloc_base(patb_size, patb_size,
@@ -774,7 +773,7 @@ static void __init htab_initialize(void)
htab_address = __va(table);
 
/* htab absolute addr + encoded htabsize */
-   _SDR1 = table + __ilog2(pteg_count) - 11;
+   _SDR1 = table + __ilog2(htab_size_bytes) - 18;
 
/* Initialize the HPT with no entries */
memset((void *)table, 0, htab_size_bytes);
@@ -783,7 +782,7 @@ static void __init htab_initialize(void)
/* Set SDR1 */
mtspr(SPRN_SDR1, _SDR1);
else
-   hash_init_partition_table(table, pteg_count);
+   hash_init_partition_table(table, htab_size_bytes);
}
 
prot = pgprot_val(PAGE_KERNEL);
-- 
2.7.4
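
Why this is not a functional change: a PTEG holds 8 HPTEs of 16 bytes each,
so it is 128 = 2^7 bytes, and therefore log2(bytes) - 18 equals
log2(pteg_count) - 11 for any power-of-two table. The sketch below checks
that identity in user space. The sketch_* names are hypothetical and
sketch_ilog2() is a plain stand-in for the kernel's __ilog2().

```c
#include <assert.h>

/* Bytes per PTEG: 8 entries of 16 bytes each. */
#define SKETCH_PTEG_BYTES 128UL

/* Floor of log2(x) for x > 0; stand-in for __ilog2(). */
static unsigned int sketch_ilog2(unsigned long x)
{
    unsigned int r = 0;

    assert(x != 0);
    while (x >>= 1)
        r++;
    return r;
}

/* Old form: encode from the number of PTEGs (at least 2^11). */
static unsigned int sketch_htabsize_from_ptegs(unsigned long pteg_count)
{
    assert(pteg_count >= (1UL << 11));
    return sketch_ilog2(pteg_count) - 11;
}

/* New form: encode from the table size in bytes (at least 2^18). */
static unsigned int sketch_htabsize_from_bytes(unsigned long bytes)
{
    assert(bytes >= (1UL << 18));
    return sketch_ilog2(bytes) - 18;
}
```

For example, 2^19 PTEGs occupy 2^26 bytes (64MB), and both forms give an
encoded size of 8.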


[PATCH 1/6] powerpc/mm/radix: Update LPCR only if it is powernv

2016-05-19 Thread Aneesh Kumar K.V
LPCR cannot be updated when running in guest mode.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/pgtable-radix.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 18b2c11604fa..c939e6e57a9e 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -296,11 +296,6 @@ found:
 void __init radix__early_init_mmu(void)
 {
unsigned long lpcr;
-   /*
-* setup LPCR UPRT based on mmu_features
-*/
-   lpcr = mfspr(SPRN_LPCR);
-   mtspr(SPRN_LPCR, lpcr | LPCR_UPRT);
 
 #ifdef CONFIG_PPC_64K_PAGES
/* PAGE_SIZE mappings */
@@ -343,8 +338,11 @@ void __init radix__early_init_mmu(void)
__pte_frag_size_shift = H_PTE_FRAG_SIZE_SHIFT;
 
radix_init_page_sizes();
-   if (!firmware_has_feature(FW_FEATURE_LPAR))
+   if (!firmware_has_feature(FW_FEATURE_LPAR)) {
+   lpcr = mfspr(SPRN_LPCR);
+   mtspr(SPRN_LPCR, lpcr | LPCR_UPRT);
radix_init_partition_table();
+   }
 
radix_init_pgtable();
 }
@@ -353,16 +351,15 @@ void radix__early_init_mmu_secondary(void)
 {
unsigned long lpcr;
/*
-* setup LPCR UPRT based on mmu_features
+* update partition table control register and UPRT
 */
-   lpcr = mfspr(SPRN_LPCR);
-   mtspr(SPRN_LPCR, lpcr | LPCR_UPRT);
-   /*
-* update partition table control register, 64 K size.
-*/
-   if (!firmware_has_feature(FW_FEATURE_LPAR))
+   if (!firmware_has_feature(FW_FEATURE_LPAR)) {
+   lpcr = mfspr(SPRN_LPCR);
+   mtspr(SPRN_LPCR, lpcr | LPCR_UPRT);
+
mtspr(SPRN_PTCR,
  __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
+   }
 }
 
 void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
-- 
2.7.4


Re: [RFC PATCH 2/3] arch/powerpc : optprobes for powerpc core

2016-05-19 Thread Anju T

Hi Masami,

 Thank you for reviewing the patch.

On Wednesday 18 May 2016 08:43 PM, Masami Hiramatsu wrote:

On Wed, 18 May 2016 02:09:37 +0530
Anju T  wrote:


Instruction slot for detour buffer is allocated from
the reserved area. For the time being 64KB is reserved
in memory for this purpose. ppc_get_optinsn_slot() and
ppc_free_optinsn_slot() are geared towards the allocation and freeing
of memory from this area.

Thank you for porting optprobe on ppc!!

I have some comments on this patch.


Signed-off-by: Anju T 
---
  arch/powerpc/kernel/optprobes.c | 463 
  1 file changed, 463 insertions(+)
  create mode 100644 arch/powerpc/kernel/optprobes.c

diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
new file mode 100644
index 000..50a60c1
--- /dev/null
+++ b/arch/powerpc/kernel/optprobes.c
@@ -0,0 +1,463 @@
+/*
+ * Code for Kernel probes Jump optimization.
+ *
+ * Copyright 2016, Anju T, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Reserve an area to allocate slots for detour buffer */
+extern void  optprobe_trampoline_holder(void)
+{
+   asm volatile(".global optinsn_slot\n"
+   "optinsn_slot:\n"
+   ".space 65536");
+}

Would it be better to move this into optprobes_head.S?


Yes. Will do.

+
+#define SLOT_SIZE 65536
+#define TMPL_CALL_HDLR_IDX \
+   (optprobe_template_call_handler - optprobe_template_entry)
+#define TMPL_EMULATE_IDX   \
+   (optprobe_template_call_emulate - optprobe_template_entry)
+#define TMPL_RET_BRANCH_IDX\
+   (optprobe_template_ret_branch - optprobe_template_entry)
+#define TMPL_RET_IDX   \
+   (optprobe_template_ret - optprobe_template_entry)
+#define TMPL_OP1_IDX   \
+   (optprobe_template_op_address1 - optprobe_template_entry)
+#define TMPL_OP2_IDX   \
+   (optprobe_template_op_address2 - optprobe_template_entry)
+#define TMPL_INSN_IDX  \
+   (optprobe_template_insn - optprobe_template_entry)
+#define TMPL_END_IDX   \
+   (optprobe_template_end - optprobe_template_entry)
+
+struct kprobe_ppc_insn_page {
+   struct list_head list;
+   kprobe_opcode_t *insns; /* Page of instruction slots */
+   struct kprobe_insn_cache *cache;
+   int nused;
+   int ngarbage;
+   char slot_used[];
+};
+
+#define PPC_KPROBE_INSN_PAGE_SIZE(slots)   \
+   (offsetof(struct kprobe_ppc_insn_page, slot_used) + \
+   (sizeof(char) * (slots)))
+
+enum ppc_kprobe_slot_state {
+   SLOT_CLEAN = 0,
+   SLOT_DIRTY = 1,
+   SLOT_USED = 2,
+};
+
+static struct kprobe_insn_cache kprobe_ppc_optinsn_slots = {
+   .mutex = __MUTEX_INITIALIZER(kprobe_ppc_optinsn_slots.mutex),
+   .pages = LIST_HEAD_INIT(kprobe_ppc_optinsn_slots.pages),
+   /* .insn_size is initialized later */
+   .nr_garbage = 0,
+};
+
+static int ppc_slots_per_page(struct kprobe_insn_cache *c)
+{
+   /*
+* Here the #slots per page differs from x86 as we have
+* only 64KB reserved.
+*/
+   return SLOT_SIZE / (c->insn_size * sizeof(kprobe_opcode_t));
+}
+
+/* Return 1 if all garbages are collected, otherwise 0. */
+static int collect_one_slot(struct kprobe_ppc_insn_page *kip, int idx)
+{
+   kip->slot_used[idx] = SLOT_CLEAN;
+   kip->nused--;
+   return 0;
+}
+
+static int collect_garbage_slots(struct kprobe_insn_cache *c)
+{
+   struct kprobe_ppc_insn_page *kip, *next;
+
+   /* Ensure no-one is interrupted on the garbages */
+   synchronize_sched();
+
+   list_for_each_entry_safe(kip, next, &c->pages, list) {
+   int i;
+
+   if (kip->ngarbage == 0)
+   continue;
+   kip->ngarbage = 0;   /* we will collect all garbages */
+   for (i = 0; i < ppc_slots_per_page(c); i++) {
+   if (kip->slot_used[i] == SLOT_DIRTY &&
+   collect_one_slot(kip, i))
+   break;
+   }
+   }
+   c->nr_garbage = 0;
+   return 0;
+}
+
+kprobe_opcode_t  *__ppc_get_optinsn_slot(struct kprobe_insn_cache *c)
+{
+   struct kprobe_ppc_insn_page *kip;
+   kprobe_opcode_t *slot = NULL;
+
+   mutex_lock(&c->mutex);
+   list_for_each_entry(kip, &c->pages, list) {
+   if (kip->nused < ppc_slots_per_page(c)) {
+   int i;
+
+   for (i = 0; i < ppc_slots_per_page(c); i++) {
+   if (kip->slot_used[i] == SLOT_CLEAN) {
+   kip->slot_used[i] = 
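
The quoted patch is cut off here in the archive. As a rough illustration of
the idea described at the top of the mail (fixed-size slots handed out from
a 64KB reserved area and tracked by a per-slot state byte), here is a
user-space sketch. It is not the patch's code: the sketch_* names and the
1024-slot bound are hypothetical, there is no locking, and the DIRTY
(garbage) state is declared but not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_AREA_BYTES 65536u  /* matches the 64KB reserved in the patch */
#define SKETCH_MAX_SLOTS  1024u   /* arbitrary bound for this sketch */

enum sketch_slot_state { SKETCH_CLEAN = 0, SKETCH_DIRTY = 1, SKETCH_USED = 2 };

struct sketch_slot_area {
    size_t slot_bytes;                      /* size of one slot */
    size_t nslots;                          /* slots that fit in the area */
    size_t nused;                           /* slots currently handed out */
    unsigned char state[SKETCH_MAX_SLOTS];  /* one state byte per slot */
};

static void sketch_area_init(struct sketch_slot_area *a, size_t slot_bytes)
{
    size_t i;

    assert(slot_bytes > 0);
    a->slot_bytes = slot_bytes;
    a->nslots = SKETCH_AREA_BYTES / slot_bytes;
    if (a->nslots > SKETCH_MAX_SLOTS)
        a->nslots = SKETCH_MAX_SLOTS;
    a->nused = 0;
    for (i = 0; i < a->nslots; i++)
        a->state[i] = SKETCH_CLEAN;
}

/* Return the byte offset of the first clean slot, or -1 if none is left. */
static long sketch_get_slot(struct sketch_slot_area *a)
{
    size_t i;

    for (i = 0; i < a->nslots; i++) {
        if (a->state[i] == SKETCH_CLEAN) {
            a->state[i] = SKETCH_USED;
            a->nused++;
            return (long)(i * a->slot_bytes);
        }
    }
    return -1;
}

/* Give a slot back; 'offset' must come from sketch_get_slot(). */
static void sketch_free_slot(struct sketch_slot_area *a, long offset)
{
    size_t idx;

    assert(offset >= 0 && (size_t)offset % a->slot_bytes == 0);
    idx = (size_t)offset / a->slot_bytes;
    assert(idx < a->nslots && a->state[idx] == SKETCH_USED);
    a->state[idx] = SKETCH_CLEAN;
    a->nused--;
}
```

With 256-byte slots the area holds 256 of them, and a freed slot is the
first one handed out again because the search always starts from index 0.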

Re: [PATCH] kvm-pr: manage illegal instructions

2016-05-19 Thread Thomas Huth
On 18.05.2016 12:53, Thomas Huth wrote:
> On 18.05.2016 12:18, Thomas Huth wrote:
>> On 17.05.2016 19:49, Laurent Vivier wrote:
>>>
>>>
>>> On 17/05/2016 10:37, Alexander Graf wrote:
 On 05/17/2016 10:35 AM, Laurent Vivier wrote:
>
> On 12/05/2016 16:23, Laurent Vivier wrote:
>>
>> On 12/05/2016 11:27, Alexander Graf wrote:
>>> On 05/12/2016 11:10 AM, Laurent Vivier wrote:
 On 11/05/2016 13:49, Alexander Graf wrote:
> On 05/11/2016 01:14 PM, Laurent Vivier wrote:
>> On 11/05/2016 12:35, Alexander Graf wrote:
>>> On 03/15/2016 09:18 PM, Laurent Vivier wrote:
 While writing some instruction tests for kvm-unit-tests for
 powerpc,
 I've found that illegal instructions are not managed correctly
 with
 kvm-pr,
 while it is fine with kvm-hv.

 When an illegal instruction (like ".long 0") is processed by
 kvm-pr,
 the kernel logs are filled with:

  Couldn't emulate instruction 0x (op 0 xop 0)
  kvmppc_handle_exit_pr: emulation at 700 failed ()

 While the exception handler receives an interrupt for each
 instruction
 executed after the illegal instruction.

 Signed-off-by: Laurent Vivier 
 ---
  arch/powerpc/kvm/book3s_emulate.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

 diff --git a/arch/powerpc/kvm/book3s_emulate.c
 b/arch/powerpc/kvm/book3s_emulate.c
 index 2afdb9c..4ee969d 100644
 --- a/arch/powerpc/kvm/book3s_emulate.c
 +++ b/arch/powerpc/kvm/book3s_emulate.c
 @@ -99,7 +99,6 @@ int kvmppc_core_emulate_op_pr(struct kvm_run
 *run,
 struct kvm_vcpu *vcpu,
switch (get_op(inst)) {
  case 0:
 -emulated = EMULATE_FAIL;
  if ((kvmppc_get_msr(vcpu) & MSR_LE) &&
  (inst == swab32(inst_sc))) {
  /*
 @@ -112,6 +111,9 @@ int kvmppc_core_emulate_op_pr(struct kvm_run
 *run,
 struct kvm_vcpu *vcpu,
  kvmppc_set_gpr(vcpu, 3, EV_UNIMPLEMENTED);
  kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
  emulated = EMULATE_DONE;
 +} else {
 +kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
>>> But isn't that exactly what the semantic of EMULATE_FAIL is?
>>> Fixing it
>>> up in book3s_emulate.c is definitely the wrong spot.
>>>
>>> So what is the problem you're trying to solve? Is the SRR0 at the
>>> wrong
>>> spot or are the log messages the problem?
>> No, the problem is the host kernel logs are filled by the message
>> and
>> the execution hangs. And the host becomes unresponsive, even
>> after
>> the end of the tests.
>>
>> Please, try to run kvm-unit-tests (the emulator test) on a KVM-PR
>> host,
>> and check the kernel logs (dmesg), then try to ssh to the host...
> Ok, so the log messages are the problem. Please fix the message
> output
> then - or remove it altogether. Or if you like, create a module
> parameter that allows you to emit them.
>
> I personally think the best solution would be to just convert the
> message into a trace point.
>
> While at it, please see whether the guest can trigger similar host
> log
> output excess in other code paths.
 The problem is not really with the log messages: they are a
 consequence of
 the bug I am trying to fix.

 What happens is once kvm_pr decodes an invalid instruction all the
 valid
 following instructions trigger a Program exception to the guest
 (but are
 executed correctly). It has no real consequence on a big machine like
 POWER8, except that the guest becomes very slow and the log files of
 the
 host are filled with messages (and qemu uses 100% of the CPU). On a
 smaller machine like a  PowerMac G5, the machine becomes simply
 unusable.
>>> It's probably more related to your verbosity level of kernel messages.
>>> If you pass loglevel=0 (or quiet) to your kernel cmdline you won't get
>>> the messages printed to serial which is what's slowing you down.
>>>
>>> The other problem sounds pretty severe, but the only thing your patch
>>> does any different from the current code flow would be the patch below.
>>> Or did I miss anything?
>>>

Re: [kernel-hardening] [PATCH v8 2/4] GCC plugin infrastructure

2016-05-19 Thread Andrew Donnellan

On 19/05/16 16:22, Michael Ellerman wrote:

Did you test the plugins with all gcc versions (4.5-6)?


What's the concern about gcc versions? Just not breaking the build on old
compilers?
I'm pretty sure powerpc big endian still builds with gcc 4.4.


gcc's plugin support only landed in 4.5, so we don't care about <=4.4.


However if Andrew's only tested on little endian, then that select should be
guarded with an "if CPU_LITTLE_ENDIAN". And to build LE you need gcc >= 4.9.


I'm going to give BE a test too just to be sure.

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited


Re: [kernel-hardening] [PATCH v8 2/4] GCC plugin infrastructure

2016-05-19 Thread Michael Ellerman
On Wed, 2016-05-18 at 12:33 +0200, Emese Revfy wrote:

> > I've done some basic sanity testing on powerpc with the cyclomatic 
> > complexity plugin (with LE native + cross-compilers) and it seems to 
> > work with the patch below.
> > 
> > Signed-off-by: Andrew Donnellan 
> > 
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index a18a0dc..0cfed5b 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -97,6 +97,7 @@ config PPC
> >  select HAVE_DYNAMIC_FTRACE_WITH_REGS if MPROFILE_KERNEL
> >  select HAVE_FUNCTION_TRACER
> >  select HAVE_FUNCTION_GRAPH_TRACER
> > +   select HAVE_GCC_PLUGINS
> >  select SYSCTL_EXCEPTION_TRACE
> >  select ARCH_WANT_OPTIONAL_GPIOLIB
> >  select VIRT_TO_BUS if !PPC64
> 
> Hi,
> 
> Did you test the plugins with all gcc versions (4.5-6)?

What's the concern about gcc versions? Just not breaking the build on old
compilers?

I'm pretty sure powerpc big endian still builds with gcc 4.4.

However if Andrew's only tested on little endian, then that select should be
guarded with an "if CPU_LITTLE_ENDIAN". And to build LE you need gcc >= 4.9.

cheers
