Re: [PATCH v20 00/20] perf, tools: Add support for PMU events in JSON format

2016-09-13 Thread Ingo Molnar

* Michael Ellerman wrote:

> Jiri Olsa writes:
> 
> > On Wed, Aug 31, 2016 at 09:15:30AM -0700, Andi Kleen wrote:
> >> > > 
> >> > > > 
> >> > > > I've already made some changes in pmu-events/* to support
> >> > > > this hierarchy to see how bad the change would be.. and
> >> > > > it's not that bad ;-)
> >> > > 
> >> > > Everything has to be automated, please no manual changes.
> >> > 
> >> > sure
> >> > 
> >> > so, if you're ok with the layout, how do you want to proceed further?
> >> 
> >> If the split version is acceptable it's fine for me to merge it.
> >> 
> >> I'll add split-json to my scripting, so the next update would
> >> be split too.
> >
> > ok, I'll wait for patches then
> 
> Who are you waiting for patches from?
> 
> Would be great if this could go in for 4.9 still.

No objections from me - the latest bits were good:

  Acked-by: Ingo Molnar 

Thanks,

Ingo


Re: [RFC] KVM: PPC: Book3S HV: Fall back to same size HPT in allocation ioctl

2016-09-13 Thread Anshuman Khandual
On 09/14/2016 05:27 AM, Michael Ellerman wrote:
> Anshuman Khandual  writes:

Not sure whether this mail ever went. Sending it again.

> 
>> When the HPT size is explicitly passed on from the userspace, currently
>> the KVM_PPC_ALLOCATE_HTAB will try to allocate the requested size of HPT
>> from reserved CMA area and if that is not possible, the allocation just
>> fails. With the commit 572abd563befd56 ("KVM: PPC: Book3S HV: Don't fall
>> back to smaller HPT size in allocation ioctl"), it does not even try to
>> allocate the same order pages from the page allocator before failing for
>> good. Same order allocation should be attempted from the page allocator
>> as a fallback option when the CMA allocation attempt fails.
> It looks like if CMA is not configured we will just fail instantly.

Right, and we already have this fallback registered anyway. I wonder
why we are still debating the need for a fallback mechanism when we
already have one.

> 
> So this does look like something we should fix.
> 
> But I think it is just a bug in commit 572abd563bef ("KVM: PPC: Book3S
> HV: Don't fall back to smaller HPT size in allocation ioctl"), which did:

Hmm, I think it's something the commit failed to account for.
But yes, maybe it's a bug in the commit.

> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 1f9c0a17f445..10722b1e38b5 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -70,7 +70,8 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
> }
> 
> /* Lastly try successively smaller sizes from the page allocator */
> -   while (!hpt && order > PPC_MIN_HPT_ORDER) {
> +   /* Only do this if userspace didn't specify a size via ioctl */
> +   while (!hpt && order > PPC_MIN_HPT_ORDER && !htab_orderp) {
> hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
>__GFP_NOWARN, order - PAGE_SHIFT);
> if (!hpt)
> 
> 
> Instead of guarding the loop entry with !htab_orderp, it should have
> allowed the loop to enter, but prevented it from iterating if the
> allocation fails and htab_orderp != 0.

Right, and that's what Aneesh's proposed patch (in the other thread) does.
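
To make the intended control flow concrete, here is a minimal userspace sketch of the loop as Michael describes it: the loop is entered in both cases, but a failed attempt only shrinks the order when userspace did not specify a size. The names alloc_hpt_fallback, page_alloc_fn, MIN_HPT_ORDER (value 18 picked arbitrarily) and the stub allocator are made up for illustration. This is neither the kernel code nor the patch posted in the other thread.

```c
#include <stdbool.h>
#include <stddef.h>

#define MIN_HPT_ORDER 18	/* illustrative stand-in for PPC_MIN_HPT_ORDER */

/* Hypothetical allocator hook: returns NULL on failure. */
typedef void *(*page_alloc_fn)(unsigned int order);

/*
 * Try the page allocator at *order. If the size came from userspace,
 * stop after the first failure instead of shrinking the request.
 * The loop condition mirrors the one in the quoted diff.
 */
void *alloc_hpt_fallback(unsigned int *order, bool size_from_user,
			 page_alloc_fn alloc)
{
	void *hpt = NULL;

	while (!hpt && *order > MIN_HPT_ORDER) {
		hpt = alloc(*order);
		if (!hpt) {
			if (size_from_user)
				break;	/* same-size attempt failed: give up */
			--*order;	/* kernel-chosen size: try a smaller one */
		}
	}
	return hpt;
}

/* Stub allocator for experimenting: only orders <= 20 succeed. */
static char fake_table;

void *alloc_up_to_20(unsigned int order)
{
	return order <= 20 ? (void *)&fake_table : NULL;
}
```

With the stub, a userspace-specified order of 24 fails outright and leaves the order untouched, while a kernel-chosen order of 24 shrinks until order 20 succeeds.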



Re: [PATCH v14 13/15] selftests/powerpc: Add ptrace tests for TM SPR registers

2016-09-13 Thread Cyril Bur
On Mon, 2016-09-12 at 15:33 +0800, wei.guo.si...@gmail.com wrote:
> From: Anshuman Khandual 
> 
> This patch adds ptrace interface test for TM SPR registers. This
> also adds ptrace interface based helper functions related to TM
> SPR registers access.
> 

I'm seeing this one fail a lot on my test setup. It does occasionally
succeed, but failures are frequent.

I use qemu on a power8 for most of my testing:
qemu-system-ppc64 --enable-kvm -machine pseries,accel=kvm,usb=off -m
4096 -realtime mlock=off -smp 4,sockets=1,cores=2,threads=2 -nographic
-vga none


> Signed-off-by: Anshuman Khandual 
> Signed-off-by: Simon Guo 
> ---
>  tools/testing/selftests/powerpc/ptrace/Makefile|   3 +-
>  .../selftests/powerpc/ptrace/ptrace-tm-spr.c   | 186
> +
>  tools/testing/selftests/powerpc/ptrace/ptrace.h|  35 
>  3 files changed, 223 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-tm-
> spr.c
> 
> diff --git a/tools/testing/selftests/powerpc/ptrace/Makefile
> b/tools/testing/selftests/powerpc/ptrace/Makefile
> index 797840a..f34670e 100644
> --- a/tools/testing/selftests/powerpc/ptrace/Makefile
> +++ b/tools/testing/selftests/powerpc/ptrace/Makefile
> @@ -1,7 +1,8 @@
>  TEST_PROGS := ptrace-ebb ptrace-gpr ptrace-tm-gpr ptrace-tm-spd-gpr
> \
>  ptrace-tar ptrace-tm-tar ptrace-tm-spd-tar ptrace-vsx ptrace-tm-vsx
> \
> -ptrace-tm-spd-vsx
> +ptrace-tm-spd-vsx ptrace-tm-spr
>  
> +include ../../lib.mk
>  
>  all: $(TEST_PROGS)
>  CFLAGS += -m64
> diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spr.c
> b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spr.c
> new file mode 100644
> index 000..2863070
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spr.c
> @@ -0,0 +1,186 @@
> +/*
> + * Ptrace test TM SPR registers
> + *
> + * Copyright (C) 2015 Anshuman Khandual, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +#include "ptrace.h"
> +
> +/* Tracee and tracer shared data */
> +struct shared {
> + int flag;
> + struct tm_spr_regs regs;
> +};
> +unsigned long tfhar;
> +
> +int shm_id;
> +volatile struct shared *cptr, *pptr;
> +
> +int shm_id1;
> +volatile int *cptr1, *pptr1;
> +
> +#define TM_SCHED 0xde0000018c000001
> +#define TM_KVM_SCHED 0xe0000001ac000001
> +
> +int validate_tm_spr(struct tm_spr_regs *regs)
> +{
> + if (regs->tm_tfhar != tfhar)
> + return TEST_FAIL;
> +
> + if ((regs->tm_texasr != TM_SCHED) && (regs->tm_texasr !=
> TM_KVM_SCHED))
> + return TEST_FAIL;

The above condition is the one that fails. Should the test try again
when this condition is true, rather than fail?

> +
> + if ((regs->tm_texasr == TM_KVM_SCHED) && (regs->tm_tfiar !=
> 0))
> + return TEST_FAIL;
> +
> + return TEST_PASS;
> +}
> +

[snip]



Re: [PATCH] powerpc/powernv: Initialise nest mmu

2016-09-13 Thread Alistair Popple
On Mon, 15 Aug 2016 04:51:59 PM Alistair Popple wrote:
> POWER9 contains an off core mmu called the nest mmu (NMMU). This is
> used by other hardware units on the chip to translate virtual
> addresses into real addresses. The unit attempting an address
> translation provides the majority of the context required for the
> translation request except for the base address of the partition table
> (ie. the PTCR) which needs to be programmed into the NMMU.
> 
> This patch adds a call to OPAL to set the PTCR for the nest mmu in
> opal_init().
> 
> Signed-off-by: Alistair Popple 
> ---
> 
> This patch depends on a new OPAL call which has yet to be added to
> skiboot, although the patch to do so has been posted to
> http://patchwork.ozlabs.org/patch/659106/

This has now gone into skiboot with the same OPAL call number :-
https://github.com/open-power/skiboot/commit/84e63a8d4fd9eb3efc872099d579c49fef6a5810

>  arch/powerpc/include/asm/opal-api.h| 3 ++-
>  arch/powerpc/include/asm/opal.h| 1 +
>  arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
>  arch/powerpc/platforms/powernv/opal.c  | 3 +++
>  4 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/opal-api.h 
> b/arch/powerpc/include/asm/opal-api.h
> index 0e2e57b..a0aa285 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -167,7 +167,8 @@
>  #define OPAL_INT_EOI         124
>  #define OPAL_INT_SET_MFRR    125
>  #define OPAL_PCI_TCE_KILL    126
> -#define OPAL_LAST            126
> +#define OPAL_NMMU_SET_PTCR   127
> +#define OPAL_LAST            127
> 
>  /* Device tree flags */
> 
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index ee05bd2..433df5e 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -228,6 +228,7 @@ int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t 
> kill_type,
>  int64_t opal_rm_pci_tce_kill(uint64_t phb_id, uint32_t kill_type,
>uint32_t pe_num, uint32_t tce_size,
>uint64_t dma_addr, uint32_t npages);
> +int64_t opal_nmmu_set_ptcr(uint64_t chip_id, uint64_t ptcr);
> 
>  /* Internal functions */
>  extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
> diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S 
> b/arch/powerpc/platforms/powernv/opal-wrappers.S
> index 3d29d40..a955649 100644
> --- a/arch/powerpc/platforms/powernv/opal-wrappers.S
> +++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
> @@ -308,3 +308,4 @@ OPAL_CALL(opal_int_eoi,   
> OPAL_INT_EOI);
>  OPAL_CALL(opal_int_set_mfrr, OPAL_INT_SET_MFRR);
>  OPAL_CALL(opal_pci_tce_kill, OPAL_PCI_TCE_KILL);
>  OPAL_CALL_REAL(opal_rm_pci_tce_kill, OPAL_PCI_TCE_KILL);
> +OPAL_CALL(opal_nmmu_set_ptcr,OPAL_NMMU_SET_PTCR);
> diff --git a/arch/powerpc/platforms/powernv/opal.c 
> b/arch/powerpc/platforms/powernv/opal.c
> index 8b4fc68..b533245 100644
> --- a/arch/powerpc/platforms/powernv/opal.c
> +++ b/arch/powerpc/platforms/powernv/opal.c
> @@ -762,6 +762,9 @@ static int __init opal_init(void)
>   /* Initialise OPAL kmsg dumper for flushing console on panic */
>   opal_kmsg_init();
> 
> + /* Update partition table control register on all Nest MMUs */
> + opal_nmmu_set_ptcr(-1UL, __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
> +
>   return 0;
>  }
>  machine_subsys_initcall(powernv, opal_init);
> --
> 2.1.4
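
For reference, the value passed to opal_nmmu_set_ptcr() above packs two fields into one word: the physical base address of the partition table and, in the low bits, the table size encoded as log2(size) - 12. A small sketch of that arithmetic follows. make_ptcr is a hypothetical helper, and the shift of 16 (a 64K table) used in the note below is an assumption about PATB_SIZE_SHIFT, not something stated in this patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: combine a partition table base address with its
 * size field. The size is stored as log2(table size in bytes) - 12,
 * which mirrors the "PATB_SIZE_SHIFT - 12" expression in the patch.
 * The base is assumed to be aligned to the table size, so the low
 * bits are free to hold the size field.
 */
uint64_t make_ptcr(uint64_t table_pa, unsigned int size_shift)
{
	return table_pa | (uint64_t)(size_shift - 12);
}
```

Under the assumed shift of 16, a table at physical address 0x10000000 would give make_ptcr(0x10000000, 16) == 0x10000004.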



Re: [PATCH v14 15/15] selftests/powerpc: Fix a build issue

2016-09-13 Thread Cyril Bur
On Mon, 2016-09-12 at 15:33 +0800, wei.guo.si...@gmail.com wrote:
> From: Anshuman Khandual 
> 
> Fixes the following build failure -
> 
> cp_abort.c:90:3: error: ‘for’ loop initial declarations are only
> allowed in C99 or C11 mode
>    for (int i = 0; i < NUM_LOOPS; i++) {
>    ^
> cp_abort.c:90:3: note: use option -std=c99, -std=gnu99, -std=c11 or
> -std=gnu11 to compile your code
> cp_abort.c:97:3: error: ‘for’ loop initial declarations are only
> allowed in C99 or C11 mode
>    for (int i = 0; i < NUM_LOOPS; i++) {
> 
> Signed-off-by: Anshuman Khandual 
> Signed-off-by: Simon Guo 

Reviewed-by: Cyril Bur 
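
For anyone hitting the same error elsewhere, the fix pattern is simply to hoist the loop counter to the top of the enclosing block, which compilers defaulting to gnu89 accept. A tiny standalone sketch, where count_iterations is an illustrative name and not part of cp_abort.c:

```c
/*
 * Declare the counter at the top of the block instead of inside the
 * for statement, so the code also builds without -std=c99 or later.
 */
int count_iterations(int num_loops)
{
	int i, done = 0;

	for (i = 0; i < num_loops; i++)
		done++;

	return done;
}
```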

> ---
>  tools/testing/selftests/powerpc/context_switch/cp_abort.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git
> a/tools/testing/selftests/powerpc/context_switch/cp_abort.c
> b/tools/testing/selftests/powerpc/context_switch/cp_abort.c
> index 5a5b55a..1ce7dce 100644
> --- a/tools/testing/selftests/powerpc/context_switch/cp_abort.c
> +++ b/tools/testing/selftests/powerpc/context_switch/cp_abort.c
> @@ -67,7 +67,7 @@ int test_cp_abort(void)
>   /* 128 bytes for a full cache line */
>   char buf[128] __cacheline_aligned;
>   cpu_set_t cpuset;
> - int fd1[2], fd2[2], pid;
> + int fd1[2], fd2[2], pid, i;
>   char c;
>  
>   /* only run this test on a P9 or later */
> @@ -87,14 +87,14 @@ int test_cp_abort(void)
>   FAIL_IF(pid < 0);
>  
>   if (!pid) {
> - for (int i = 0; i < NUM_LOOPS; i++) {
> + for (i = 0; i < NUM_LOOPS; i++) {
>   FAIL_IF((write(fd1[WRITE_FD], &c, 1)) != 1);
>   FAIL_IF((read(fd2[READ_FD], &c, 1)) != 1);
>   /* A paste succeeds if CR0 EQ bit is set */
>   FAIL_IF(paste(buf) & 0x20000000);
>   }
>   } else {
> - for (int i = 0; i < NUM_LOOPS; i++) {
> + for (i = 0; i < NUM_LOOPS; i++) {
>   FAIL_IF((read(fd1[READ_FD], &c, 1)) != 1);
>   copy(buf);
>   FAIL_IF((write(fd2[WRITE_FD], &c, 1) != 1));


Re: [PATCH v14 07/15] selftests/powerpc: Add ptrace tests for TAR, PPR, DSCR registers

2016-09-13 Thread Cyril Bur
On Mon, 2016-09-12 at 15:33 +0800, wei.guo.si...@gmail.com wrote:
> From: Anshuman Khandual 
> 
> This patch adds ptrace interface test for TAR, PPR, DSCR
> registers. This also adds ptrace interface based helper
> functions related to TAR, PPR, DSCR register access.
> 
> Signed-off-by: Anshuman Khandual 
> Signed-off-by: Simon Guo 
> ---
>  tools/testing/selftests/powerpc/ptrace/Makefile|   3 +-
>  .../testing/selftests/powerpc/ptrace/ptrace-tar.c  | 159
> ++
>  .../testing/selftests/powerpc/ptrace/ptrace-tar.h  |  50 ++
>  tools/testing/selftests/powerpc/ptrace/ptrace.h| 181
> +
>  4 files changed, 392 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-
> tar.c
>  create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-
> tar.h
> 

[snip]

> +
> +void tar(void)
> +{
> + unsigned long reg[3];
> + int ret;
> +
> + cptr = (int *)shmat(shm_id, NULL, 0);
> + printf("%-30s TAR: %u PPR: %lx DSCR: %u\n",
> + user_write, TAR_1, PPR_1, DSCR_1);
> +
> + mtspr(SPRN_TAR, TAR_1);
> + mtspr(SPRN_PPR, PPR_1);
> + mtspr(SPRN_DSCR, DSCR_1);
> +
> + cptr[2] = 1;
> +
> + /* Wait on parent */
> + while (!cptr[0]);
asm volatile("" ::: "memory");

> +
> + reg[0] = mfspr(SPRN_TAR);
> + reg[1] = mfspr(SPRN_PPR);
> + reg[2] = mfspr(SPRN_DSCR);
> +
> + printf("%-30s TAR: %lu PPR: %lx DSCR: %lu\n",
> + user_read, reg[0], reg[1], reg[2]);
> +
> + /* Unblock the parent now */
> + cptr[1] = 1;
> + shmdt((int *)cptr);
> +
> + ret = validate_tar_registers(reg, TAR_2, PPR_2, DSCR_2);
> + if (ret)
> + exit(1);
> + exit(0);
> +}
> +
> +int trace_tar(pid_t child)
> +{
> + unsigned long reg[3];
> + int ret;
> +
> + ret = start_trace(child);
> + if (ret)
> + return TEST_FAIL;
> +
> + ret = show_tar_registers(child, reg);
> + if (ret)
> + return TEST_FAIL;
> +
> + printf("%-30s TAR: %lu PPR: %lx DSCR: %lu\n",
> + ptrace_read_running, reg[0], reg[1],
> reg[2]);
> +
> + ret = validate_tar_registers(reg, TAR_1, PPR_1, DSCR_1);
> + if (ret)
> + return TEST_FAIL;
> +
> + ret = stop_trace(child);
> + if (ret)
> + return TEST_FAIL;
> +
> + return TEST_PASS;
> +}
> +
> +int trace_tar_write(pid_t child)
> +{
> + int ret;
> +
> + ret = start_trace(child);
> + if (ret)
> + return TEST_FAIL;
> +
> + ret = write_tar_registers(child, TAR_2, PPR_2, DSCR_2);
> + if (ret)
> + return TEST_FAIL;
> +
> + printf("%-30s TAR: %u PPR: %lx DSCR: %u\n",
> + ptrace_write_running, TAR_2, PPR_2, DSCR_2);
> +
> + ret = stop_trace(child);
> + if (ret)
> + return TEST_FAIL;
> +
> + return TEST_PASS;
> +}

More comments about calling TEST_FAIL(x)

> +
> +int ptrace_tar(void)
> +{
> + pid_t pid;
> + int ret, status;
> +
> + shm_id = shmget(IPC_PRIVATE, sizeof(int) * 3,
> 0777|IPC_CREAT);
> + pid = fork();
> + if (pid < 0) {
> + perror("fork() failed");
> + return TEST_FAIL;
> + }
> +
> + if (pid == 0)
> + tar();
> +
> + if (pid) {
> 

[snip]


Re: [PATCH v14 05/15] selftests/powerpc: Add ptrace tests for GPR/FPR registers in TM

2016-09-13 Thread Cyril Bur
On Mon, 2016-09-12 at 15:33 +0800, wei.guo.si...@gmail.com wrote:
> From: Anshuman Khandual 
> 
> This patch adds ptrace interface test for GPR/FPR registers
> inside TM context. This adds ptrace interface based helper
> functions related to checkpointed GPR/FPR access.
> 
> Signed-off-by: Anshuman Khandual 
> Signed-off-by: Simon Guo 
> ---
>  tools/testing/selftests/powerpc/ptrace/Makefile|   3 +-
>  .../selftests/powerpc/ptrace/ptrace-tm-gpr.c   | 296
> +
>  2 files changed, 298 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-tm-
> gpr.c
> 
> diff --git a/tools/testing/selftests/powerpc/ptrace/Makefile
> b/tools/testing/selftests/powerpc/ptrace/Makefile
> index 31e8e33..170683a 100644
> --- a/tools/testing/selftests/powerpc/ptrace/Makefile
> +++ b/tools/testing/selftests/powerpc/ptrace/Makefile
> @@ -1,8 +1,9 @@
> -TEST_PROGS := ptrace-ebb ptrace-gpr
> +TEST_PROGS := ptrace-ebb ptrace-gpr ptrace-tm-gpr
>  
>  all: $(TEST_PROGS)
>  CFLAGS += -m64
>  $(TEST_PROGS): ../harness.c ptrace.S ../utils.c ptrace.h
>  ptrace-ebb: ../pmu/event.c ../pmu/lib.c ../pmu/ebb/ebb_handler.S
> ../pmu/ebb/busy_loop.S
> +
>  clean:
>   rm -f $(TEST_PROGS) *.o
> diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
> b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
> new file mode 100644
> index 000..8417d04
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
> @@ -0,0 +1,296 @@
> +/*
> + * Ptrace test for GPR/FPR registers in TM context
> + *
> + * Copyright (C) 2015 Anshuman Khandual, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +#include "ptrace.h"
> +#include "ptrace-gpr.h"
> +
> +/* Tracer and Tracee Shared Data */
> +int shm_id;
> +volatile unsigned long *cptr, *pptr;
> +
> +float a = FPR_1;
> +float b = FPR_2;
> +float c = FPR_3;
> +
> +void tm_gpr(void)
> +{
> + unsigned long gpr_buf[18];
> + unsigned long result, texasr;
> + float fpr_buf[32];
> +
> + printf("Starting the child\n");
> + cptr = (unsigned long *)shmat(shm_id, NULL, 0);
> +
> +trans:
> + cptr[1] = 0;
> + asm __volatile__(
> +
> + "li 14, %[gpr_1];"
> + "li 15, %[gpr_1];"
> + "li 16, %[gpr_1];"
> + "li 17, %[gpr_1];"
> + "li 18, %[gpr_1];"
> + "li 19, %[gpr_1];"
> + "li 20, %[gpr_1];"
> + "li 21, %[gpr_1];"
> + "li 22, %[gpr_1];"
> + "li 23, %[gpr_1];"
> + "li 24, %[gpr_1];"
> + "li 25, %[gpr_1];"
> + "li 26, %[gpr_1];"
> + "li 27, %[gpr_1];"
> + "li 28, %[gpr_1];"
> + "li 29, %[gpr_1];"
> + "li 30, %[gpr_1];"
> + "li 31, %[gpr_1];"
> +
> + "lfs 0, 0(%[flt_1]);"
> + "lfs 1, 0(%[flt_1]);"
> + "lfs 2, 0(%[flt_1]);"
> + "lfs 3, 0(%[flt_1]);"
> + "lfs 4, 0(%[flt_1]);"
> + "lfs 5, 0(%[flt_1]);"
> + "lfs 6, 0(%[flt_1]);"
> + "lfs 7, 0(%[flt_1]);"
> + "lfs 8, 0(%[flt_1]);"
> + "lfs 9, 0(%[flt_1]);"
> + "lfs 10, 0(%[flt_1]);"
> + "lfs 11, 0(%[flt_1]);"
> + "lfs 12, 0(%[flt_1]);"
> + "lfs 13, 0(%[flt_1]);"
> + "lfs 14, 0(%[flt_1]);"
> + "lfs 15, 0(%[flt_1]);"
> + "lfs 16, 0(%[flt_1]);"
> + "lfs 17, 0(%[flt_1]);"
> + "lfs 18, 0(%[flt_1]);"
> + "lfs 19, 0(%[flt_1]);"
> + "lfs 20, 0(%[flt_1]);"
> + "lfs 21, 0(%[flt_1]);"
> + "lfs 22, 0(%[flt_1]);"
> + "lfs 23, 0(%[flt_1]);"
> + "lfs 24, 0(%[flt_1]);"
> + "lfs 25, 0(%[flt_1]);"
> + "lfs 26, 0(%[flt_1]);"
> + "lfs 27, 0(%[flt_1]);"
> + "lfs 28, 0(%[flt_1]);"
> + "lfs 29, 0(%[flt_1]);"
> + "lfs 30, 0(%[flt_1]);"
> + "lfs 31, 0(%[flt_1]);"
> +

Wasn't this in the previous patch? Can we consolidate?

> + "1: ;"
> + TBEGIN

tbegin. is probably fine

> + "beq 2f;"
> +
> + "li 14, %[gpr_2];"
> + "li 15, %[gpr_2];"
> + "li 16, %[gpr_2];"
> + "li 17, %[gpr_2];"
> + "li 18, %[gpr_2];"
> + "li 19, %[gpr_2];"
> + "li 20, %[gpr_2];"
> + "li 21, %[gpr_2];"
> + "li 22, %[gpr_2];"
> + "li 23, %[gpr_2];"
> + "li 24, %[gpr_2];"
> + "li 25, %[gpr_2];"
> + "li 26, 

Re: [PATCH v14 04/15] selftests/powerpc: Add ptrace tests for GPR/FPR registers

2016-09-13 Thread Cyril Bur
On Mon, 2016-09-12 at 15:33 +0800, wei.guo.si...@gmail.com wrote:
> From: Anshuman Khandual 
> 
> This patch adds ptrace interface test for GPR/FPR registers.
> This adds ptrace interface based helper functions related to
> GPR/FPR access and some assembly helper functions related to
> GPR/FPR registers.
> 
> Signed-off-by: Anshuman Khandual 
> Signed-off-by: Simon Guo 
> ---
>  tools/testing/selftests/powerpc/ptrace/Makefile|   3 +-
>  .../testing/selftests/powerpc/ptrace/ptrace-gpr.c  | 196
> +++
>  .../testing/selftests/powerpc/ptrace/ptrace-gpr.h  |  74 
>  tools/testing/selftests/powerpc/ptrace/ptrace.S| 131
> +
>  tools/testing/selftests/powerpc/ptrace/ptrace.h| 211
> +
>  5 files changed, 614 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-
> gpr.c
>  create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-
> gpr.h
>  create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace.S
> 

[snip]

> new file mode 100644
> index 000..193beea
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/ptrace/ptrace.S
> @@ -0,0 +1,131 @@
> +/*
> + * Ptrace interface test helper assembly functions
> + *
> + * Copyright (C) 2015 Anshuman Khandual, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +#include 
> +#include "../reg.h"
> +
> +
> +/* Non volatile GPR - unsigned long buf[18] */
> +FUNC_START(load_gpr)
> + ld  14, 0*8(3)
> + ld  15, 1*8(3)
> + ld  16, 2*8(3)
> + ld  17, 3*8(3)
> + ld  18, 4*8(3)
> + ld  19, 5*8(3)
> + ld  20, 6*8(3)
> + ld  21, 7*8(3)
> + ld  22, 8*8(3)
> + ld  23, 9*8(3)
> + ld  24, 10*8(3)
> + ld  25, 11*8(3)
> + ld  26, 12*8(3)
> + ld  27, 13*8(3)
> + ld  28, 14*8(3)
> + ld  29, 15*8(3)
> + ld  30, 16*8(3)
> + ld  31, 17*8(3)
> + blr
> +FUNC_END(load_gpr)
> +
> +FUNC_START(store_gpr)
> + std 14, 0*8(3)
> + std 15, 1*8(3)
> + std 16, 2*8(3)
> + std 17, 3*8(3)
> + std 18, 4*8(3)
> + std 19, 5*8(3)
> + std 20, 6*8(3)
> + std 21, 7*8(3)
> + std 22, 8*8(3)
> + std 23, 9*8(3)
> + std 24, 10*8(3)
> + std 25, 11*8(3)
> + std 26, 12*8(3)
> + std 27, 13*8(3)
> + std 28, 14*8(3)
> + std 29, 15*8(3)
> + std 30, 16*8(3)
> + std 31, 17*8(3)
> + blr
> +FUNC_END(store_gpr)
> +
> +/* Single Precision Float - float buf[32] */
> +FUNC_START(load_fpr)
> + lfs 0, 0*4(3)
> + lfs 1, 1*4(3)
> + lfs 2, 2*4(3)
> + lfs 3, 3*4(3)
> + lfs 4, 4*4(3)
> + lfs 5, 5*4(3)
> + lfs 6, 6*4(3)
> + lfs 7, 7*4(3)
> + lfs 8, 8*4(3)
> + lfs 9, 9*4(3)
> + lfs 10, 10*4(3)
> + lfs 11, 11*4(3)
> + lfs 12, 12*4(3)
> + lfs 13, 13*4(3)
> + lfs 14, 14*4(3)
> + lfs 15, 15*4(3)
> + lfs 16, 16*4(3)
> + lfs 17, 17*4(3)
> + lfs 18, 18*4(3)
> + lfs 19, 19*4(3)
> + lfs 20, 20*4(3)
> + lfs 21, 21*4(3)
> + lfs 22, 22*4(3)
> + lfs 23, 23*4(3)
> + lfs 24, 24*4(3)
> + lfs 25, 25*4(3)
> + lfs 26, 26*4(3)
> + lfs 27, 27*4(3)
> + lfs 28, 28*4(3)
> + lfs 29, 29*4(3)
> + lfs 30, 30*4(3)
> + lfs 31, 31*4(3)
> + blr
> +FUNC_END(load_fpr)
> +
> +FUNC_START(store_fpr)
> + stfs 0, 0*4(3)
> + stfs 1, 1*4(3)
> + stfs 2, 2*4(3)
> + stfs 3, 3*4(3)
> + stfs 4, 4*4(3)
> + stfs 5, 5*4(3)
> + stfs 6, 6*4(3)
> + stfs 7, 7*4(3)
> + stfs 8, 8*4(3)
> + stfs 9, 9*4(3)
> + stfs 10, 10*4(3)
> + stfs 11, 11*4(3)
> + stfs 12, 12*4(3)
> + stfs 13, 13*4(3)
> + stfs 14, 14*4(3)
> + stfs 15, 15*4(3)
> + stfs 16, 16*4(3)
> + stfs 17, 17*4(3)
> + stfs 18, 18*4(3)
> + stfs 19, 19*4(3)
> + stfs 20, 20*4(3)
> + stfs 21, 21*4(3)
> + stfs 22, 22*4(3)
> + stfs 23, 23*4(3)
> + stfs 24, 24*4(3)
> + stfs 25, 25*4(3)
> + stfs 26, 26*4(3)
> + stfs 27, 27*4(3)
> + stfs 28, 28*4(3)
> + stfs 29, 29*4(3)
> + stfs 30, 30*4(3)
> + stfs 31, 31*4(3)
> + blr
> +FUNC_END(store_fpr)

I wrote similar functions in math/fpu_asm.S. Perhaps it is time to
consolidate those and these into an fpu_asm.S file at a higher level;
the TM-related tests would benefit as well.

> diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace.h
> b/tools/testing/selftests/powerpc/ptrace/ptrace.h
> index fbf73ca..1019004 100644
> --- a/tools/testing/selftests/powerpc/ptrace/ptrace.h
> +++ 

Re: [PATCH v14 01/15] selftests/powerpc: Add more SPR numbers, TM & VMX instructions to 'reg.h'

2016-09-13 Thread Cyril Bur
On Mon, 2016-09-12 at 15:33 +0800, wei.guo.si...@gmail.com wrote:
> From: Anshuman Khandual 
> 
> This patch adds SPR number for TAR, PPR, DSCR special
> purpose registers. It also adds TM, VSX, VMX related
> instructions which will then be used by patches later
> in the series.
> 
> Signed-off-by: Anshuman Khandual 
> Signed-off-by: Simon Guo 
> ---
>  tools/testing/selftests/powerpc/reg.h | 42
> ---
>  1 file changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/testing/selftests/powerpc/reg.h
> b/tools/testing/selftests/powerpc/reg.h
> index fddf368..fee7be9 100644
> --- a/tools/testing/selftests/powerpc/reg.h
> +++ b/tools/testing/selftests/powerpc/reg.h
> @@ -18,6 +18,19 @@
>  
>  #define mb() asm volatile("sync" : : : "memory");
>  
> +/* Vector Instructions */
> +#define VSX_XX1(xs, ra, rb)  (((xs) & 0x1f) << 21 | ((ra) <<
> 16) |  \
> +  ((rb) << 11) | (((xs) >> 5)))
> +#define STXVD2X(xs, ra, rb)  .long (0x7c000798 | VSX_XX1((xs),
> (ra), (rb)))
> +#define LXVD2X(xs, ra, rb)   .long (0x7c000698 | VSX_XX1((xs),
> (ra), (rb)))

There's an instructions.h file in tools/testing/selftests/powerpc/.
Would these be better suited there?

> +
> +/* TM instructions */
> +#define TBEGIN   ".long 0x7C00051D;"
> +#define TABORT   ".long 0x7C00071D;"
> +#define TEND ".long 0x7C00055D;"
> +#define TSUSPEND ".long 0x7C0005DD;"
> +#define TRESUME  ".long 0x7C2005DD;"
> +

These are only useful on old compilers that don't know about TM. For
selftests I would discourage creating these and suggest using the actual
instructions instead; most compilers know them well by now.

>  #define SPRN_MMCR2 769
>  #define SPRN_MMCRA 770
>  #define SPRN_MMCR0 779
> @@ -51,10 +64,33 @@
>  #define SPRN_SDAR  781
>  #define SPRN_SIER  768
>  
> -#define SPRN_TEXASR 0x82
> +#define SPRN_TEXASR 0x82/* Transaction Exception and Status
> Register */
>  #define SPRN_TFIAR  0x81/* Transaction Failure Inst
> Addr*/
>  #define SPRN_TFHAR  0x80/* Transaction Failure Handler Addr
> */
> -#define TEXASR_FS   0x08000000
> -#define SPRN_TAR0x32f
> +#define SPRN_TAR0x32f/* Target Address Register */
> +
> +#define SPRN_DSCR_PRIV 0x11  /* Privilege State DSCR */
> +#define SPRN_DSCR  0x03  /* Data Stream Control Register
> */
> +#define SPRN_PPR   896   /* Program Priority Register */
> +
> +/* TEXASR register bits */
> +#define TEXASR_FC    0xFE00000000000000
> +#define TEXASR_FP    0x0100000000000000
> +#define TEXASR_DA    0x0080000000000000
> +#define TEXASR_NO    0x0040000000000000
> +#define TEXASR_FO    0x0020000000000000
> +#define TEXASR_SIC   0x0010000000000000
> +#define TEXASR_NTC   0x0008000000000000
> +#define TEXASR_TC    0x0004000000000000
> +#define TEXASR_TIC   0x0002000000000000
> +#define TEXASR_IC    0x0001000000000000
> +#define TEXASR_IFC   0x0000800000000000
> +#define TEXASR_ABT   0x0000000100000000
> +#define TEXASR_SPD   0x0000000080000000
> +#define TEXASR_HV    0x0000000020000000
> +#define TEXASR_PR    0x0000000010000000
> +#define TEXASR_FS    0x0000000008000000
> +#define TEXASR_TE    0x0000000004000000
> +#define TEXASR_ROT   0x0000000002000000
>  
>  #endif /* _SELFTESTS_POWERPC_REG_H */
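
As a worked example of the VSX_XX1 encoding quoted above: the low five bits of the VSX register number are shifted left by 21, and the sixth bit lands in the lowest bit of the word. The sketch below mirrors the quoted macros in plain C so the resulting instruction words can be inspected. vsx_xx1, stxvd2x_word and lxvd2x_word are illustrative names only.

```c
#include <stdint.h>

/* Same field layout as the VSX_XX1(xs, ra, rb) macro in the patch. */
uint32_t vsx_xx1(uint32_t xs, uint32_t ra, uint32_t rb)
{
	return ((xs & 0x1f) << 21) | (ra << 16) | (rb << 11) | (xs >> 5);
}

/* Instruction word that STXVD2X(xs, ra, rb) would emit via .long */
uint32_t stxvd2x_word(uint32_t xs, uint32_t ra, uint32_t rb)
{
	return 0x7c000798u | vsx_xx1(xs, ra, rb);
}

/* Instruction word that LXVD2X(xs, ra, rb) would emit via .long */
uint32_t lxvd2x_word(uint32_t xs, uint32_t ra, uint32_t rb)
{
	return 0x7c000698u | vsx_xx1(xs, ra, rb);
}
```

For example, stxvd2x_word(0, 0, 3) evaluates to 0x7c001f98, and using VSX register 32 instead only adds the lowest bit, giving 0x7c001f99.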


Re: [PATCH v14 00/15] selftests/powerpc: Add ptrace tests for ppc registers

2016-09-13 Thread Cyril Bur
On Tue, 2016-09-13 at 01:01 +0800, Simon Guo wrote:
> Hi Cyril,
> On Tue, Sep 13, 2016 at 03:49:10PM +1000, Cyril Bur wrote:
> > 
> > Thanks for putting the effort in to get these merged! I have a few
> > remarks that apply to more than one patch which I'll say here.
> > 
> > I'm not sure #defining the TM instructions as .long for the
> > selftests
> > is useful. Compilers these days know about the instructions
> > 'tbegin.'
> > 'tsuspend.' and the like, I would question anyone using a compiler
> > old
> > enough not to know about these...
> I agree. But let me check with the original author, Anshuman, first.
> 

Great!

I'll send through my other emails that I actually wrote before this
one, please ignore if I've repeated myself.

> > 
> > 
> > There are a few assembly fpu register load functions that could be
> > consolidated into those in math/ and even some in tm/
> Will rework that.
> 

Thanks

> > 
> > 
> > Doing while(ptr); to wait for another thread should be 
> > 
> > while(ptr)
> >     asm volatile("" : : : "memory");
> > 
> > See Documentation/volatile-considered-harmful.txt for the reasons why.
> > Even knowing this I did it your way without thinking in a selftest
> > I
> > wrote doing similar things and it turns out that it didn't work
> > [the
> > way we both expect it would].
> You are right.
> 

Thanks
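
The busy-wait pattern discussed above can be sketched as follows. The empty asm with a "memory" clobber is a compiler barrier: it forces the flag to be re-read on every iteration without relying on volatile. spin_until_set is an illustrative name, and the returned spin count is only there to make the sketch easy to poke at. It assumes GCC or Clang inline asm syntax.

```c
/* Compiler barrier: the compiler may not cache memory values across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Spin until *flag becomes non-zero (set by another thread or process
 * through shared memory). Returns how many times the loop body ran.
 */
unsigned long spin_until_set(const int *flag)
{
	unsigned long spins = 0;

	while (!*flag) {
		barrier();
		spins++;
	}
	return spins;
}
```

Note that this only constrains the compiler; it is not a hardware memory barrier.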

> > 
> > 
> > Having said all that, I'm aware that these are selftests and this
> > series could be nicer but I won't lose any sleep if they were
> > merged
> > almost as is. Thanks for your work!
> > 
> > Finally, they didn't compile for me, I did a git rebase --exec with
> > my
> > build scripts and:
> > 
> > selftests/powerpc: Add ptrace tests for EBB
> > [snip]
> > *** No rule to make target 'ptrace.S', needed by 'ptrace-ebb'.
> > (that appears fixed by subsequent patch)
> > 
> > selftests/powerpc: Add ptrace tests for GPR/FPR registers
> > Seems to have failed horribly and those problems continue...
> > 
> > I applied these to powerpc-next at:
> > commit c6935931c1894ff857616ff8549b61236a19148f
> > Author: Linus Torvalds 
> > Date:   Sun Sep 4 14:31:46 2016 -0700
> > 
> > Linux 4.8-rc5
> > 
> > Should I have based on something else?
> I didn't reproduce the latter error and I also applied on c69359.
> My build script is only one line:
> make -C tools/testing/selftests TARGETS=powerpc 1>/dev/null
> 

I do the same except without TARGETS=powerpc (I test with all the
selftests). I suspect the difference is that I'm cross-compiling and
the #include <linux/elf.h> in ptrace.h is the problem. On my x86
machine this includes an x86 version, which has different defines.


> Did I miss anything with your build script?
> Anyway I need to fix that.

It's messy, but I think the accepted solution for kselftests is to do:

#include "../../../../../usr/include/linux/elf.h"

which I believe will pick up the headers generated for the target by
`make headers_install`, and which should therefore match the target the
kselftests are being compiled for.

> 
> Thanks for sharing. Most are good comments and I will rework
> accordingly.
> 
> BR,
> - Simon


Re: [RFC] KVM: PPC: Book3S HV: Fall back to same size HPT in allocation ioctl

2016-09-13 Thread Anshuman Khandual
On 09/14/2016 05:42 AM, Paul Mackerras wrote:
> On Wed, Sep 14, 2016 at 09:57:48AM +1000, Michael Ellerman wrote:
>> Anshuman Khandual writes:
>>
>>> When the HPT size is explicitly passed on from the userspace, currently
>>> the KVM_PPC_ALLOCATE_HTAB will try to allocate the requested size of HPT
>>> from reserved CMA area and if that is not possible, the allocation just
>>> fails. With the commit 572abd563befd56 ("KVM: PPC: Book3S HV: Don't fall
>>> back to smaller HPT size in allocation ioctl"), it does not even try to
>>> allocate the same order pages from the page allocator before failing for
>>> good. Same order allocation should be attempted from the page allocator
>>> as a fallback option when the CMA allocation attempt fails.
>>
>> It looks like if CMA is not configured we will just fail instantly.
>>
>> So this does look like something we should fix.
>>
>> But I think it is just a bug in commit 572abd563bef ("KVM: PPC: Book3S
>> HV: Don't fall back to smaller HPT size in allocation ioctl"), which did:
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> index 1f9c0a17f445..10722b1e38b5 100644
>> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> @@ -70,7 +70,8 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
>> }
>>
>> /* Lastly try successively smaller sizes from the page allocator */
>> -   while (!hpt && order > PPC_MIN_HPT_ORDER) {
>> +   /* Only do this if userspace didn't specify a size via ioctl */
>> +   while (!hpt && order > PPC_MIN_HPT_ORDER && !htab_orderp) {
>> hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
>> __GFP_NOWARN, order - PAGE_SHIFT);
>> if (!hpt)
>>
>>
>> Instead of guarding the loop entry with !htab_orderp, it should have
>> allowed the loop to enter, but prevented it from iterating if the
>> allocation fails and htab_orderp != 0.
> You're right.  I'll fix it.

Thanks Paul, so I will not be sending a follow-up patch on this.



Re: [RFC] KVM: PPC: Book3S HV: Fall back to same size HPT in allocation ioctl

2016-09-13 Thread Anshuman Khandual
On 09/14/2016 05:27 AM, Michael Ellerman wrote:
> Anshuman Khandual  writes:
> 
>> When the HPT size is explicitly passed on from the userspace, currently
>> the KVM_PPC_ALLOCATE_HTAB will try to allocate the requested size of HPT
>> from reserved CMA area and if that is not possible, the allocation just
>> fails. With the commit 572abd563befd56 ("KVM: PPC: Book3S HV: Don't fall
>> back to smaller HPT size in allocation ioctl"), it does not even try to
>> allocate the same order pages from the page allocator before failing for
>> good. Same order allocation should be attempted from the page allocator
>> as a fallback option when the CMA allocation attempt fails.
> 
> It looks like if CMA is not configured we will just fail instantly.

Right, and we also have this fallback registered anyway. I wonder
why we are still debating the need for a fallback mechanism
when we have already got one.

> 
> So this does look like something we should fix.
> 
> But I think it is just a bug in commit 572abd563bef ("KVM: PPC: Book3S
> HV: Don't fall back to smaller HPT size in allocation ioctl"), which did:

Hmm, I think it's something the commit failed to account for.
But maybe yes, it's a bug in the commit.

> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 1f9c0a17f445..10722b1e38b5 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -70,7 +70,8 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
> }
> 
> /* Lastly try successively smaller sizes from the page allocator */
> -   while (!hpt && order > PPC_MIN_HPT_ORDER) {
> +   /* Only do this if userspace didn't specify a size via ioctl */
> +   while (!hpt && order > PPC_MIN_HPT_ORDER && !htab_orderp) {
> hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
>__GFP_NOWARN, order - PAGE_SHIFT);
> if (!hpt)
> 
> 
> Instead of guarding the loop entry with !htab_orderp, it should have
> allowed the loop to enter, but prevented it from iterating if the
> allocation fails and htab_orderp != 0.

Right, and that's what Aneesh's proposed patch (in the other thread) does.
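
For illustration only, here is a minimal user-space sketch of the loop shape described above: enter the loop unconditionally, but stop after the first failed attempt when the size was fixed by userspace. This is not the kernel fix itself. try_alloc(), alloc_hpt_order(), MIN_ORDER and the largest_ok parameter are hypothetical stand-ins for __get_free_pages(), the allocation path, PPC_MIN_HPT_ORDER and memory availability.

```c
#include <assert.h>
#include <stdbool.h>

#define MIN_ORDER 18 /* stand-in for PPC_MIN_HPT_ORDER; the value is illustrative */

/* Hypothetical allocator: succeeds only for orders <= largest_ok. */
static bool try_alloc(int order, int largest_ok)
{
	return order <= largest_ok;
}

/*
 * Returns the order that was allocated, or -1 on failure.
 * size_fixed models "userspace specified a size via ioctl".
 */
static int alloc_hpt_order(int order, bool size_fixed, int largest_ok)
{
	assert(order >= 0 && largest_ok >= 0);

	while (order > MIN_ORDER) {
		if (try_alloc(order, largest_ok))
			return order;
		/* Fixed size: one attempt at the requested order, no shrinking. */
		if (size_fixed)
			break;
		order--;
	}
	return -1;
}
```

With size_fixed set, the requested order is still attempted once from the (simulated) page allocator, which is the fallback the original report asks for; only the step down to smaller orders is suppressed.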



[PATCH] powerpc/64: syscall ABI

2016-09-13 Thread Nicholas Piggin
Add some documentation for the 64-bit syscall ABI, which doesn't seem
to be documented elsewhere.

This attempts to document existing practice. The only small
discrepancy is glibc clobbers not quite matching the kernel (e.g.,
xer, some vsyscalls trash cr1 whereas glibc only clobbers cr0). These
will be resolved after this document is merged.

Signed-off-by: Nicholas Piggin 
---
 Documentation/powerpc/syscall64-abi.txt | 103 
 1 file changed, 103 insertions(+)
 create mode 100644 Documentation/powerpc/syscall64-abi.txt

diff --git a/Documentation/powerpc/syscall64-abi.txt 
b/Documentation/powerpc/syscall64-abi.txt
new file mode 100644
index 000..30d1843
--- /dev/null
+++ b/Documentation/powerpc/syscall64-abi.txt
@@ -0,0 +1,103 @@
+Power Architecture 64-bit Linux system call ABI
+
+===
+syscall
+===
+syscall calling sequence[*] matches the Power Architecture 64-bit ELF ABI
+specification C function calling sequence, including register preservation
+rules, with the following differences.
+
+[*] Some syscalls (typically low-level management functions) may have
+different calling sequences (e.g., rt_sigreturn).
+
+Parameters and return value
+---
+The system call number is specified in r0.
+
+There is a maximum of 6 integer parameters to a syscall, passed in r3-r8.
+
+Both a return value and a return error code are returned. cr0.SO is the return
+error code, and r3 is the return value or error code. When cr0.SO is clear,
+the syscall succeeded and r3 is the return value. When cr0.SO is set, the
+syscall failed and r3 is the error code that generally corresponds to errno.
+
+Stack
+-
+System calls do not modify the caller's stack frame. For example, the caller's
+stack frame LR and CR save fields are not used.
+
+Register preservation rules
+---
+Register preservation rules match the ELF ABI calling sequence with the
+following differences:
+
+r0:         Volatile.   (System call number.)
+r3:         Volatile.   (Parameter 1, and return value.)
+r4-r8:      Volatile.   (Parameters 2-6.)
+cr0:        Volatile.   (cr0.SO is the return error condition)
+cr1, cr5-7: Nonvolatile.
+lr:         Nonvolatile.
+
+All floating point and vector data registers as well as control and status
+registers are nonvolatile.
+
+Invocation
+--
+The syscall is performed with the sc instruction, and returns with execution
+continuing at the instruction following the sc instruction.
+
+Transactional Memory
+
+Syscall behavior can change if the processor is in transactional or suspended
+transaction state, and the syscall can affect the behavior of the transaction.
+
+If the processor is in suspended state when a syscall is made, the syscall
+will be performed as normal, and will return as normal. The syscall will be
+performed in suspended state, so its side effects will be persistent according
+to the usual transactional memory semantics. A syscall may or may not result
+in the transaction being doomed by hardware.
+
+If the processor is in transactional state when a syscall is made, then the
+behavior depends on the presence of PPC_FEATURE2_HTM_NOSC in the AT_HWCAP2 ELF
+auxiliary vector.
+
+- If present, which is the case for newer kernels, then the syscall will not
+  be performed and the transaction will be doomed by the kernel with the
+  failure code TM_CAUSE_SYSCALL | TM_CAUSE_PERSISTENT in the TEXASR SPR.
+
+- If not present (older kernels), then the kernel will suspend the
+  transactional state and the syscall will proceed as in the case of a
+  suspended state syscall, and will resume the transactional state before
+  returning to the caller. This case is not well defined or supported, so this
+  behavior should not be relied upon.
+
+===
+vsyscall
+===
+vsyscall calling sequence matches the syscall calling sequence, with the
+following differences. Some vsyscalls may have different calling sequences.
+
+Parameters and return value
+---
+r0 is not used as an input. The vsyscall is selected by its address.
+
+Stack
+-
+The vsyscall may or may not use the caller's stack frame save areas.
+
+Register preservation rules
+---
+r0: Volatile.
+cr1, cr5-7: Volatile.
+lr: Volatile.
+
+Invocation
+--
+The vsyscall is performed with a branch-with-link instruction to the vsyscall
+function address.
+
+Transactional Memory
+
+vsyscalls will run in the same transactional state as the caller. A vsyscall
+may or may not result in the transaction being doomed by hardware.
+
-- 
2.9.3
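
As a hedged illustration of the return convention in the document above (cr0.SO as the error flag, r3 as the value or error code), the following sketch maps a raw result to the usual C library convention of returning -1 and storing the error code. struct sc_result and sc_to_libc() are hypothetical names used for this example only; real code would obtain cr0.SO and r3 from inline assembly around the sc instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Raw result of a syscall as the ABI text describes it. */
struct sc_result {
	bool cr0_so; /* cr0.SO after the sc instruction */
	long r3;     /* return value, or error code if cr0_so is set */
};

/*
 * Convert to the C library convention: return the value on success,
 * or -1 with *err set to the error code on failure.
 */
static long sc_to_libc(struct sc_result res, int *err)
{
	assert(err != NULL);

	if (res.cr0_so) {
		*err = (int)res.r3;
		return -1;
	}
	return res.r3;
}
```

Because success and failure are signalled out of band in cr0.SO, a successful call may return any value in r3, including values that would look like negative error codes under other conventions.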



[PATCH] powerpc/64: replay hypervisor maintenance priority

2016-09-13 Thread Nicholas Piggin
The hmi is defined to be higher priority than other maskable
interrupts, so replay it first, as a best-effort to replay according
to hardware priorities.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/irq.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 3cb46a3..ad1a930 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -155,6 +155,15 @@ notrace unsigned int __check_irq_replay(void)
}
 
/*
+* Check if an hypervisor Maintenance interrupt happened.
+* This is a higher priority interrupt than the others, so
+* replay it first.
+*/
+   local_paca->irq_happened &= ~PACA_IRQ_HMI;
+   if (happened & PACA_IRQ_HMI)
+   return 0xe60;
+
+   /*
 * We may have missed a decrementer interrupt. We check the
 * decrementer itself rather than the paca irq_happened field
 * in case we also had a rollover while hard disabled
@@ -189,11 +198,6 @@ notrace unsigned int __check_irq_replay(void)
}
 #endif /* CONFIG_PPC_BOOK3E */
 
-   /* Check if an hypervisor Maintenance interrupt happened */
-   local_paca->irq_happened &= ~PACA_IRQ_HMI;
-   if (happened & PACA_IRQ_HMI)
-   return 0xe60;
-
/* There should be nothing left ! */
BUG_ON(local_paca->irq_happened != 0);
 
-- 
2.9.3
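
The ordering idea in this patch can be sketched in user space as a priority-ordered check of a pending mask, with the highest-priority source tested first. This is a simplified model, not __check_irq_replay() itself: the IRQ_* flag values and check_replay() are assumptions made for the example, and only the 0xe60 vector is taken from the patch (0x900 and 0x500 are the usual decrementer and external interrupt vectors).

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative pending-interrupt flags; the values are assumptions. */
#define IRQ_DEC 0x01u /* decrementer */
#define IRQ_EE  0x02u /* external interrupt */
#define IRQ_HMI 0x04u /* hypervisor maintenance interrupt */

/*
 * Return the vector of the highest-priority pending interrupt and
 * clear it from the mask, or 0 if nothing is pending.
 */
static unsigned int check_replay(unsigned int *pending)
{
	assert(pending != NULL);

	/* Highest priority first: hypervisor maintenance. */
	if (*pending & IRQ_HMI) {
		*pending &= ~IRQ_HMI;
		return 0xe60;
	}
	if (*pending & IRQ_DEC) {
		*pending &= ~IRQ_DEC;
		return 0x900;
	}
	if (*pending & IRQ_EE) {
		*pending &= ~IRQ_EE;
		return 0x500;
	}
	return 0;
}
```

Calling check_replay() repeatedly drains the mask in priority order, which is the best-effort behaviour the commit message describes.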



[PATCH] powerpc/64: whitelist unresolved modversions CRCs

2016-09-13 Thread Nicholas Piggin
These are a symptom of CRC generation failure in generic
build code, and not powerpc specific.

Signed-off-by: Nicholas Piggin 
---

Hi Michal,

Please merge this via your trees with Al's patches. I have
a patch that can catch crc generation failure generically,
and both Arnd and I have some ideas to fix crc generation
for .S exports. This should get powerpc building again with
your tree.

Thanks,
Nick


 arch/powerpc/relocs_check.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/relocs_check.sh b/arch/powerpc/relocs_check.sh
index 2e4ebd0..ec2d5c8 100755
--- a/arch/powerpc/relocs_check.sh
+++ b/arch/powerpc/relocs_check.sh
@@ -30,6 +30,7 @@ bad_relocs=$(
# On PPC64:
#   R_PPC64_RELATIVE, R_PPC64_NONE
#   R_PPC64_ADDR64 mach_
+   #   R_PPC64_ADDR64 __crc_
# On PPC:
#   R_PPC_RELATIVE, R_PPC_ADDR16_HI,
#   R_PPC_ADDR16_HA,R_PPC_ADDR16_LO,
@@ -41,7 +42,8 @@ R_PPC_ADDR16_HI
 R_PPC_ADDR16_HA
 R_PPC_RELATIVE
 R_PPC_NONE' |
-   grep -E -v '\

Re: [RFC] fs: add userspace critical mounts event support

2016-09-13 Thread Rob Landley
On 09/02/2016 07:20 PM, Luis R. Rodriguez wrote:
> kernel_read_file_from_path() can try to read a file from
> the system's filesystem. This is typically done for firmware
> for instance, which lives in /lib/firmware. One issue with
> this is that the kernel cannot know for sure when the real
> final /lib/firmware/ is ready, and even if you use initramfs
> drivers are currently initialized *first* prior to the initramfs
> kicking off.

Why?

> During init we run through all init calls first
> (do_initcalls()) and finally the initramfs is processed via
> prepare_namespace():

What's the downside of moving initramfs cpio extraction earlier in the boot?

I did some shuffling around of that code to make initmpfs work, does
anybody know why initramfs extraction _before_ we initialize drivers
would be a bad thing? (The cpio is in memory, either linked into the
kernel or from the bootloader. No drivers are needed to extract it,
that's sort of the point.)

The only things I can think of are memory churn (large contiguous
physical page allocations), or if a driver somehow got us access to more
physical memory?

Rob


Re: [PATCH] hwrng: pasemi-rng - Use linux/io.h instead of asm/io.h

2016-09-13 Thread Michael Ellerman
Herbert Xu  writes:

> On Tue, Sep 06, 2016 at 01:58:39PM +0530, PrasannaKumar Muralidharan wrote:
>> Checkpatch.pl warns about usage of asm/io.h. Use linux/io.h instead.
>> 
>> Signed-off-by: PrasannaKumar Muralidharan 
>
> Patch applied.  Thanks.

Oops I merged it too, my bad.

Hopefully git will work out the resolution.

cheers


Re: [PATCH v20 00/20] perf, tools: Add support for PMU events in JSON format

2016-09-13 Thread Michael Ellerman
Jiri Olsa  writes:

> On Wed, Aug 31, 2016 at 09:15:30AM -0700, Andi Kleen wrote:
>> > > 
>> > > > 
>> > > > I've already made some changes in pmu-events/* to support
>> > > > this hierarchy to see how bad the change would be.. and
>> > > > it's not that bad ;-)
>> > > 
>> > > Everything has to be automated, please no manual changes.
>> > 
>> > sure
>> > 
>> > so, if you're ok with the layout, how do you want to proceed further?
>> 
>> If the split version is acceptable it's fine for me to merge it.
>> 
>> I'll add split-json to my scripting, so the next update would
>> be split too.
>
> ook, I'll wait for patches then

Who are you waiting for patches from?

Would be great if this could go in for 4.9 still.

cheers


Re: [RFC] KVM: PPC: Book3S HV: Fall back to same size HPT in allocation ioctl

2016-09-13 Thread Paul Mackerras
On Wed, Sep 14, 2016 at 09:57:48AM +1000, Michael Ellerman wrote:
> Anshuman Khandual  writes:
> 
> > When the HPT size is explicitly passed on from the userspace, currently
> > the KVM_PPC_ALLOCATE_HTAB will try to allocate the requested size of HPT
> > from reserved CMA area and if that is not possible, the allocation just
> > fails. With the commit 572abd563befd56 ("KVM: PPC: Book3S HV: Don't fall
> > back to smaller HPT size in allocation ioctl"), it does not even try to
> > allocate the same order pages from the page allocator before failing for
> > good. Same order allocation should be attempted from the page allocator
> > as a fallback option when the CMA allocation attempt fails.
> 
> It looks like if CMA is not configured we will just fail instantly.
> 
> So this does look like something we should fix.
> 
> But I think it is just a bug in commit 572abd563bef ("KVM: PPC: Book3S
> HV: Don't fall back to smaller HPT size in allocation ioctl"), which did:
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 1f9c0a17f445..10722b1e38b5 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -70,7 +70,8 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
> }
>  
> /* Lastly try successively smaller sizes from the page allocator */
> -   while (!hpt && order > PPC_MIN_HPT_ORDER) {
> +   /* Only do this if userspace didn't specify a size via ioctl */
> +   while (!hpt && order > PPC_MIN_HPT_ORDER && !htab_orderp) {
> hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
>__GFP_NOWARN, order - PAGE_SHIFT);
> if (!hpt)
> 
> 
> Instead of guarding the loop entry with !htab_orderp, it should have
> allowed the loop to enter, but prevented it from iterating if the
> allocation fails and htab_orderp != 0.

You're right.  I'll fix it.

Paul.


Re: [PATCH v2 2/5] powerpc/sparse: Make a bunch of things static

2016-09-13 Thread Michael Ellerman
Daniel Axtens  writes:
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 05f09ae82587..95abca69b168 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -1608,7 +1608,7 @@ static ssize_t debugfs_htab_read(struct file *file, 
> char __user *buf,
>   return ret;
>  }
>  
> -ssize_t debugfs_htab_write(struct file *file, const char __user *buf,
> +static ssize_t debugfs_htab_write(struct file *file, const char __user *buf,
>  size_t len, loff_t *ppos)
>  {
>   return -EACCES;

I dropped these hunks, because they touch arch/powerpc/kvm and so should
technically go via Paul.

Can you resend them to him?

cheers


Re: [RFC] KVM: PPC: Book3S HV: Fall back to same size HPT in allocation ioctl

2016-09-13 Thread Michael Ellerman
Anshuman Khandual  writes:

> When the HPT size is explicitly passed on from the userspace, currently
> the KVM_PPC_ALLOCATE_HTAB will try to allocate the requested size of HPT
> from reserved CMA area and if that is not possible, the allocation just
> fails. With the commit 572abd563befd56 ("KVM: PPC: Book3S HV: Don't fall
> back to smaller HPT size in allocation ioctl"), it does not even try to
> allocate the same order pages from the page allocator before failing for
> good. Same order allocation should be attempted from the page allocator
> as a fallback option when the CMA allocation attempt fails.

It looks like if CMA is not configured we will just fail instantly.

So this does look like something we should fix.

But I think it is just a bug in commit 572abd563bef ("KVM: PPC: Book3S
HV: Don't fall back to smaller HPT size in allocation ioctl"), which did:

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 1f9c0a17f445..10722b1e38b5 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -70,7 +70,8 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
}
 
/* Lastly try successively smaller sizes from the page allocator */
-   while (!hpt && order > PPC_MIN_HPT_ORDER) {
+   /* Only do this if userspace didn't specify a size via ioctl */
+   while (!hpt && order > PPC_MIN_HPT_ORDER && !htab_orderp) {
hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
   __GFP_NOWARN, order - PAGE_SHIFT);
if (!hpt)


Instead of guarding the loop entry with !htab_orderp, it should have
allowed the loop to enter, but prevented it from iterating if the
allocation fails and htab_orderp != 0.

cheers


Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms

2016-09-13 Thread Scott Wood
On Tue, 2016-09-13 at 07:23 +, Y.B. Lu wrote:
> > 


> > 
> > -Original Message-
> > From: linux-mmc-ow...@vger.kernel.org [mailto:linux-mmc-
> > ow...@vger.kernel.org] On Behalf Of Scott Wood
> > Sent: Tuesday, September 13, 2016 7:25 AM
> > To: Y.B. Lu; linux-...@vger.kernel.org; ulf.hans...@linaro.org; Arnd
> > Bergmann
> > Cc: linuxppc-dev@lists.ozlabs.org; devicet...@vger.kernel.org; linux-arm-
> > ker...@lists.infradead.org; linux-ker...@vger.kernel.org; linux-
> > c...@vger.kernel.org; linux-...@vger.kernel.org; iommu@lists.linux-
> > foundation.org; net...@vger.kernel.org; Mark Rutland; Rob Herring;
> > Russell King; Jochen Friedrich; Joerg Roedel; Claudiu Manoil; Bhupesh
> > Sharma; Qiang Zhao; Kumar Gala; Santosh Shilimkar; Leo Li; X.B. Xie
> > Subject: Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms
> > 
> > BTW, aren't ls2080a and ls2085a the same die?  And is there no non-E
> > version of LS2080A/LS2040A?
> [Lu Yangbo-B47093] I checked all the svr values in chip errata doc "Revision
> level to part marking cross-reference" table.
> I found ls2080a and ls2085a were in two separate docs. And I didn’t find a non-
> E version of LS2080A/LS2040A in the chip errata doc.
> Do you know if there is any other doc we can use to confirm this?

No.  Traditionally we've always had E and non-E versions of each chip, but I
have no knowledge of whether that has changed (I do note that the way that E-
status is indicated in SVR has changed).

But please label LS2080A and LS2085A as the same die (or provide strong
evidence that they are not).

> 
> > 
> > 
> > > 
> > > > > 
> > > > > + do {
> > > > > + if (!matches->soc_id)
> > > > > + return NULL;
> > > > > + if (glob_match(svr_match, matches->soc_id))
> > > > > + break;
> > > > > + } while (matches++);
> > > > Are you expecting "matches++" to ever evaluate as false?
> > > [Lu Yangbo-B47093] Yes, this is used to match the soc we use in
> > > qoriq_soc array until getting true.
> > > We need to get the name and die information defined in array.
> > I'm not asking whether the glob_match will ever return true.  I'm saying
> > that "matches++" will never become NULL.
> [Lu Yangbo-B47093] The matches++ will never become NULL while it will return
> NULL after matching for all the members in array.

"matches++" will never "return NULL".  It's just an incrementing address.  It
won't be null until you wrap around the address space, and even if the other
loop terminators never kicked in you'd crash long before that happens.

Please rewrite the loop as something like:

	while (matches->soc_id) {
		if (glob_match(...))
			return matches;

		matches++;
	}

	return NULL;


> > > > > + /* Register soc device */
> > > > > + soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL);
> > > > > + if (!soc_dev_attr) {
> > > > > + ret = -ENOMEM;
> > > > > + goto out_unmap;
> > > > > + }
> > > > Couldn't this be statically allocated?
> > > [Lu Yangbo-B47093] Do you mean we define this struct statically ?
> > > 
> > > static struct soc_device_attribute soc_dev_attr;
> > Yes.
> > 
> [Lu Yangbo-B47093] It's ok to define it statically. Is there any need to do
> that?

It's simpler.

-Scott
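
A self-contained user-space sketch of the sentinel-terminated lookup suggested above, for readers who want to try it outside the kernel. It uses POSIX fnmatch() as a stand-in for the kernel's glob_match(), and it assumes the table entry holds the glob pattern. struct soc_entry, soc_match() and the table contents are hypothetical and do not reflect real SVR values.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

struct soc_entry {
	const char *soc_id; /* glob pattern; NULL terminates the table */
	const char *die;
};

/* Made-up sample data; the last entry is the sentinel. */
static const struct soc_entry sample_table[] = {
	{ "soc-a*", "die-a" },
	{ "soc-b",  "die-b" },
	{ NULL,     NULL    },
};

/* Walk the table until the sentinel; return the first match or NULL. */
static const struct soc_entry *soc_match(const struct soc_entry *matches,
					 const char *id)
{
	assert(matches != NULL && id != NULL);

	while (matches->soc_id) {
		/* fnmatch() returns 0 when the string matches the pattern. */
		if (fnmatch(matches->soc_id, id, 0) == 0)
			return matches;
		matches++;
	}
	return NULL;
}
```

The loop terminates on the sentinel entry rather than on the value of the incremented pointer, which is the point made in the review.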



Re: [PATCH] net: Remove NO_IRQ from powerpc-only network drivers

2016-09-13 Thread David Miller
From: Michael Ellerman 
Date: Sat, 10 Sep 2016 19:59:05 +1000

> We'd like to eventually remove NO_IRQ on powerpc, so remove usages of it
> from powerpc-only drivers.
> 
> Signed-off-by: Michael Ellerman 

Applied to net-next, thanks.


Re: [PATCH] hwrng: pasemi-rng - Use linux/io.h instead of asm/io.h

2016-09-13 Thread Herbert Xu
On Tue, Sep 06, 2016 at 01:58:39PM +0530, PrasannaKumar Muralidharan wrote:
> Checkpatch.pl warns about usage of asm/io.h. Use linux/io.h instead.
> 
> Signed-off-by: PrasannaKumar Muralidharan 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: powerpc: set used_vsr/used_vr/used_spe in sigreturn path when MSR bits are active

2016-09-13 Thread Michael Ellerman
On Tue, 2016-26-07 at 08:06:01 UTC, Simon Guo wrote:
> From: Simon Guo 
> 
> Normally, when MSR[VSX/VR/SPE] bits = 1, the used_vsr/used_vr/used_spe
> bits have already been set. However, the signal frame is located in user
> space and is controlled by the user application. It is up to the kernel
> to make sure used_vsr/used_vr/used_spe (in kernel) = 1 and is consistent
> with the MSR bits.
> 
> For example, the CRIU application, which uses sigreturn to restore a
> checkpointed process, will lead to the case where the MSR[VSX] bit is
> active in the signal frame, but the used_vsx bit is not set (the same
> applies to VR/SPE).
> 
> This patch enforces this in the kernel by always setting the used_* bit
> when the related MSR bits are active in the signal frame and we are doing
> sigreturn.
> 
> This patch is based on Ben's Proposal.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> Signed-off-by: Simon Guo 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e1c0d66fcb179a1737b3d5cc11

cheers


Re: hwrng: pasemi-rng - Use linux/io.h instead of asm/io.h

2016-09-13 Thread Michael Ellerman
On Tue, 2016-06-09 at 08:28:39 UTC, PrasannaKumar Muralidharan wrote:
> Checkpatch.pl warns about usage of asm/io.h. Use linux/io.h instead.
> 
> Signed-off-by: PrasannaKumar Muralidharan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/63019f3cab99c7acd27df5a5b8

cheers


Re: powerpc/ps3: fix spelling mistake in function name

2016-09-13 Thread Michael Ellerman
On Sun, 2016-28-08 at 10:59:00 UTC, Colin King wrote:
> From: Colin Ian King 
> 
> Trivial fix to spelling mistake in dev_warn message and remove
> extraneous trailing whitespace at end of the message.
> 
> Signed-off-by: Colin Ian King 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/6f95d4b2f655faaf38804999de

cheers


Re: [v2] powerpc/ptrace: Fix cppcheck issue in gpr32_set_common/gpr32_get_common()

2016-09-13 Thread Michael Ellerman
On Sun, 2016-11-09 at 13:44:13 UTC, Simon Guo wrote:
> From: Simon Guo 
> 
> The ckpt_regs usage in gpr32_set_common/gpr32_get_common() leads to the
> following cppcheck errors when CONFIG_PPC_TRANSACTIONAL_MEM is not defined:
> 
> [arch/powerpc/kernel/ptrace.c:2062]:
> (error) Uninitialized variable: ckpt_regs
> [arch/powerpc/kernel/ptrace.c:2130]:
> (error) Uninitialized variable: ckpt_regs
> 
> The problem is that gpr32_set_common() uses the ckpt_regs variable, which
> only makes sense under #ifdef CONFIG_PPC_TRANSACTIONAL_MEM.
> 
> This patch fixes this issue by passing in the "regs" parameter instead.
> 
> Reported-by: Daniel Axtens 
> Signed-off-by: Simon Guo 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/261831160d4df6bafe2f0e12e6

cheers


Re: powerpc/xmon: Don't use ld on 32-bit

2016-09-13 Thread Michael Ellerman
On Fri, 2016-09-09 at 05:54:37 UTC, Michael Ellerman wrote:
> In commit 31cdd0c39c75 ("powerpc/xmon: Fix SPR read/write commands and
> add command to dump SPRs") I added two uses of the "ld" instruction in
> spr_access.S. "ld" is a 64-bit instruction, so shouldn't be used on
> 32-bit CPUs.
> 
> Replace it with PPC_LL which is a macro that gives us either "ld" or
> "lwz" depending on whether we're 64 or 32-bit.
> 
> Fixes: 31cdd0c39c75 ("powerpc/xmon: Fix SPR read/write commands and add 
> command to dump SPRs")
> Cc: sta...@vger.kernel.org # v4.7+
> Reported-by: John Paul Adrian Glaubitz 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/b42d9023a31e384504f5b53fc9

cheers


Re: cxl: Fix informational message

2016-09-13 Thread Michael Ellerman
On Mon, 2016-12-09 at 10:37:43 UTC, Frederic Barrat wrote:
> When set_sl_ops() is called, the adapter data structure is not fully
> initialized yet. Therefore the device name is not showing up in the
> trace. Fix is simply to get the device name from the pci_dev
> structure.
> 
> Fixes: 6d382616ac22 ("cxl: Abstract the differences between the PSL and XSL")
> Signed-off-by: Frederic Barrat 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b135077b83f01549c2a0685b16

cheers


Re: powerpc/32: add missing \n at end of printk warning message

2016-09-13 Thread Michael Ellerman
On Mon, 2016-12-09 at 10:12:24 UTC, Colin King wrote:
> From: Colin Ian King 
> 
> The message is missing a \n, add it.
> 
> Signed-off-by: Colin Ian King 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/3daf3c206992891ac0cec6a54a

cheers


Re: [v2,5/5] powerpc/sparse: Add more assembler prototypes

2016-09-13 Thread Michael Ellerman
On Tue, 2016-06-09 at 05:32:43 UTC, Daniel Axtens wrote:
> Another set of things that are only called from assembler and so need
> prototypes to keep sparse happy.
> 
> Signed-off-by: Daniel Axtens 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/0545d5436aefddff7ca417adc1

cheers


Re: [v2,4/5] powerpc/fadump: Make ELF eflags depend on endian

2016-09-13 Thread Michael Ellerman
On Tue, 2016-06-09 at 05:32:42 UTC, Daniel Axtens wrote:
> Firmware Assisted Dump is a facility to dump kernel core with assistance
> from firmware.  As part of this process the kernel ELF version is
> stored.
> 
> Currently, fadump.h defines this to 0 if it is not already defined. This
> clashes with a define in elf.h which sets it based on the current task -
> not based on the kernel.
> 
> When the kernel is compiled on LE, the kernel will always be version
> 2. Otherwise it will be version 0. So the correct behaviour is to set
> the ELF eflags based on the endianness of the kernel. Do that.
> 
> Remove the definition in fadump.h, which becomes unused.
> 
> Cc: Mahesh Salgaonkar 
> Cc: Hari Bathini 
> Signed-off-by: Daniel Axtens 
> Reviewed-by: Mahesh Salgaonkar 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/d8bced27be25537bde3714cbdb

cheers


Re: [v2,2/5] powerpc/sparse: Make a bunch of things static

2016-09-13 Thread Michael Ellerman
On Tue, 2016-06-09 at 05:32:40 UTC, Daniel Axtens wrote:
> Squash a bunch of sparse warnings by making things static.
> 
> Reviewed-by: Andrew Donnellan 
> Signed-off-by: Daniel Axtens 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/7c98bd72081c44670e2d0b60ae

cheers


Re: [2/3] powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address

2016-09-13 Thread Michael Ellerman
On Fri, 2016-02-09 at 11:49:21 UTC, Paul Mackerras wrote:
> Currently, if userspace or the kernel accesses a completely bogus address,
> for example with any of bits 46-59 set, we first take an SLB miss interrupt,
> install a corresponding SLB entry with VSID 0, retry the instruction, then
> take a DSI/ISI interrupt because there is no HPT entry mapping the address.
> However, by the time of the second interrupt, the Come-From Address Register
> (CFAR) has been overwritten by the rfid instruction at the end of the SLB
> miss interrupt handler.  Since bogus accesses can often be caused by a
> function return after the stack has been overwritten, the CFAR value would
> be very useful as it could indicate which function it was whose return had
> led to the bogus address.
> 
> This patch adds code to create a full exception frame in the SLB miss handler
> in the case of a bogus address, rather than inserting an SLB entry with a
> zero VSID field.  Then we call a new slb_miss_bad_addr() function in C code,
> which delivers a signal for a user access or creates an oops for a kernel
> access.  In the latter case the oops message will show the CFAR value at the
> time of the access.
> 
> In the case of the radix MMU, a segment miss interrupt indicates an access
> outside the ranges mapped by the page tables.  Previously this was handled
> by the code for an unrecoverable SLB miss (one with MSR[RI] = 0), which is
> not really correct.  With this patch, we now handle these interrupts with
> slb_miss_bad_addr(), which is much more consistent.
> 
> Signed-off-by: Paul Mackerras 
> Reviewed-by: Aneesh Kumar K.V 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f0f558b131db0e793fd90aac5d

cheers


Re: [2/2] powerpc/64: Do load of PACAKBASE in LOAD_HANDLER

2016-09-13 Thread Michael Ellerman
On Tue, 2016-26-07 at 05:29:30 UTC, Michael Ellerman wrote:
> The LOAD_HANDLER macro requires that you have previously loaded "reg"
> with PACAKBASE. Although that gives callers flexibility to get PACAKBASE
> in some interesting way, none of the callers actually do that. So fold
> the load of PACAKBASE into the macro, making it simpler for callers to
> use correctly.
> 
> Signed-off-by: Michael Ellerman 
> Reviewed-by: Nick Piggin 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/d8d42b0511fefc78165ee9b4c2

cheers


Re: [v2,1/5] powerpc/cell: drop unused iic_get_irq_host()

2016-09-13 Thread Michael Ellerman
On Tue, 2016-06-09 at 05:32:39 UTC, Daniel Axtens wrote:
> Sparse checking revealed that it is no longer used.
> There is an EXPORT_SYMBOL_GPL, but there's no header that
> provides a prototype, so nothing should be using it anyway.
> 
> Remove it.
> 
> Signed-off-by: Daniel Axtens 
> Reviewed-by: Andrew Donnellan 
> Acked-by: Arnd Bergmann 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/bc42f1d9f5b31060a3c6b83983

cheers


Re: [1/5] powerpc/Makefile: CROSS32AS is unused, remove it

2016-09-13 Thread Michael Ellerman
On Thu, 2016-11-08 at 06:03:11 UTC, Michael Ellerman wrote:
> In fact it makes no sense at all to have this defined on little endian
> builds. Since we disabled the 32-bit VDSO on little endian, we don't
> build any 32-bit code when building a little endian kernel.
> 
> Signed-off-by: Michael Ellerman 

Series applied to powerpc next.

https://git.kernel.org/powerpc/c/d312603a44eb9dc0dbb0a642a6

cheers


Re: [1/4] powerpc/book3s: Add a cpu table entry for different POWER9 revs

2016-09-13 Thread Michael Ellerman
On Wed, 2016-24-08 at 09:33:36 UTC, "Aneesh Kumar K.V" wrote:
> Signed-off-by: Aneesh Kumar K.V 
> Acked-by: Michael Neuling 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/7dccfbc325bb59f94521d544a8

cheers


Re: [RFC,1/3] powerpc/pasemi: Add Nemo motherboard config option.

2016-09-13 Thread Michael Ellerman
On Wed, 2016-31-08 at 12:24:34 UTC, Darren Stevens wrote:
> Add config option for the Nemo motherboard used in the Amigaone X1000.
> This is a custom PASemi board with an AMD SB600 southbridge, and needs
> some patches to its device tree. This option will be used to build these
> into the kernel.
> 
> Signed-off-by: Darren Stevens 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/88c13e2f4f0487a67f9eed044b

cheers


Re: [1/2] powerpc/64: Correct comment on LOAD_HANDLER()

2016-09-13 Thread Michael Ellerman
On Tue, 2016-26-07 at 05:29:29 UTC, Michael Ellerman wrote:
> The comment for LOAD_HANDLER() was wrong. The part about kdump has not
> been true since 1f6a93e4c35e ("powerpc: Make it possible to move the
> interrupt handlers away from the kernel").
> 
> Describe how it currently works, and combine the two separate comments
> into one.
> 
> Signed-off-by: Michael Ellerman 
> Reviewed-by: Nick Piggin 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/27510235dd2bb1ab01d27b01f0

cheers


Re: [PATCH v14 00/15] selftests/powerpc: Add ptrace tests for ppc registers

2016-09-13 Thread Simon Guo
Hi Cyril,
On Tue, Sep 13, 2016 at 03:49:10PM +1000, Cyril Bur wrote:
> Thanks for putting the effort in to get these merged! I have a few
> remarks that apply to more than one patch which I'll say here.
> 
> I'm not sure #defining the TM instructions as .long for the selftests
> is useful. Compilers these days know about the instructions 'tbegin.'
> 'tsuspend.' and the like, I would question anyone using a compiler old
> enough not to know about these...
I agree. But let me check with the original author, Anshuman, first.

> 
> There are a few assembly fpu register load functions that could be
> consolidated into those in math/ and even some in tm/
Will rework that.

> 
> Doing while(ptr); to wait for another thread should be 
> 
> while(ptr)
>     asm volatile("" : : : "memory");
> 
> Documentation/volatile-considered-harmful.txt for reasons why.
> Even knowing this I did it your way without thinking in a selftest I
> wrote doing similar things and it turns out that it didn't work [the
> way we both expect it would].
You are right.
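
For reference, a minimal user-space sketch of the pattern Cyril describes, assuming GCC or Clang inline-asm syntax. The names compiler_barrier() and wait_for_clear() are made up for this illustration and are not taken from the selftests. The empty asm with a "memory" clobber emits no instruction; it only stops the compiler from keeping the flag in a register across iterations. It is not a CPU memory barrier.

```c
#include <stddef.h>

/*
 * Compiler-only barrier (assumes GCC/Clang syntax): emits no instruction,
 * but tells the compiler that memory may have changed, so cached values
 * must be reloaded afterwards.
 */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

/*
 * Hypothetical helper: spin while *flag is non-zero, giving up after
 * max_spins iterations. Returns the number of iterations spent waiting.
 */
static unsigned long wait_for_clear(const int *flag, unsigned long max_spins)
{
	unsigned long spins = 0;

	while (*flag && spins < max_spins) {
		compiler_barrier();	/* force *flag to be re-read next time round */
		spins++;
	}
	return spins;
}
```

The max_spins bound is only there so the sketch cannot hang; a real test would more likely wait without a bound or use a timeout.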

> 
> Having said all that, I'm aware that these are selftests and this
> series could be nicer but I won't lose any sleep if they were merged
> almost as is. Thanks for your work!
> 
> Finally, they didn't compile for me, I did a git rebase --exec with my
> build scripts and:
> 
> selftests/powerpc: Add ptrace tests for EBB
>   [snip]
>   *** No rule to make target 'ptrace.S', needed by 'ptrace-ebb'.
> (that appears fixed by subsequent patch)
> 
> selftests/powerpc: Add ptrace tests for GPR/FPR registers
>   Seems to have failed horribly and those problems continue...
> 
> I applied these to powerpc-next at:
> commit c6935931c1894ff857616ff8549b61236a19148f
> Author: Linus Torvalds 
> Date:   Sun Sep 4 14:31:46 2016 -0700
> 
> Linux 4.8-rc5
> 
> Should I have based on something else?
I could not reproduce the latter error, and I also applied the series on c69359.
My build script is only one line:
make -C tools/testing/selftests TARGETS=powerpc 1>/dev/null

Does your build script do anything that mine misses?
Anyway, I need to fix that.

Thanks for sharing. Most of these are good comments and I will rework accordingly.

BR,
- Simon


Re: [Openipmi-developer] [PATCH 3/4] ipmi: allow dynamic BMC version information

2016-09-13 Thread Jeremy Kerr
Hi Corey,

> In all, this looks good.  I have two minor nits inline below, and
> two more major comments here:
> 
> I would prefer if it always queried the data, even if the device id
> information is provided by the lower level driver.

OK, can do. I was concerned that the SMI driver may want to provide its
own device identification details (and not want to override those with
the ones provided by Get Device Id), but if that's not the case then it
makes sense to store the original values but allow requery.

> Since the values can change while you are querying them, do
> we need some sort of mutex on them?

Which values are you referring to here? The device ID, or the members of
struct bmc_device?

If it's the latter, I'd be happy to take your guidance on the locking
protocol there. Is there an existing mutex that would be suitable for
that?

From the comments inline:

> > +/* Do we have enough data to parse the device ID details? This doesn't
> > + * inclde the optional auxilliary version data. */
> 
> Minor nit: include is misspelled. 

D'oh, will fix in v2.

> > @@ -2450,6 +2590,8 @@ static int ipmi_bmc_register(ipmi_smi_t intf, int ifnum)
> >  
> >  	mutex_lock(&ipmidriver_mutex);
> >  
> > +	bmc_update_device_id(bmc);
> > +
> 
> What happens if this fails?  You can return an error from this function.
> It's also probably not necessary to have this inside the mutex.

I didn't think we'd want to fail registration on query failure there; it
could be a transient error.

Cheers,


Jeremy


Re: [RFC] KVM: PPC: Book3S HV: Fall back to same size HPT in allocation ioctl

2016-09-13 Thread Balbir Singh
On Tue, Sep 13, 2016 at 3:49 PM, Anshuman Khandual
 wrote:
> On 09/13/2016 10:04 AM, Balbir Singh wrote:
>>
>>
>> On 13/09/16 14:07, Anshuman Khandual wrote:
>>> On 09/12/2016 05:03 PM, Balbir Singh wrote:
>>>> On Mon, Sep 12, 2016 at 9:13 PM, Anshuman Khandual
>>>> wrote:
>> When the HPT size is explicitly passed on from the userspace, currently
>> the KVM_PPC_ALLOCATE_HTAB will try to allocate the requested size of HPT
>> from reserved CMA area and if that is not possible, the allocation just
>> fails. With the commit 572abd563befd56 ("KVM: PPC: Book3S HV: Don't fall
>> back to smaller HPT size in allocation ioctl"), it does not even try to
>> allocate the same order pages from the page allocator before failing for
>> good. Same order allocation should be attempted from the page allocator
>> as a fallback option when the CMA allocation attempt fails.
>>
>> Signed-off-by: Anshuman Khandual 
>> ---
>> - This change saves guests from failing to start after migration
>>
>>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> index 05f09ae..0a30eb4 100644
>> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> @@ -78,6 +78,14 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
>>  			--order;
>>  	}
>>
>> +	/*
>> +	 * Fallback in case the userspace has provided a size via ioctl.
>> +	 * Try allocating the same order pages from the page allocator.
>> +	 */
>> +	if (!hpt && order > PPC_MIN_HPT_ORDER && htab_orderp)
>> +		hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
>> +				       __GFP_NOWARN, order - PAGE_SHIFT);
>> +
>>>> How often does this succeed? Please provide data. I presume this for
>>>
>>> During continuous guest VM migration test from source host to destination
>>> host this patch was able to prevent guest creation failure after migration
>>> on the destination host which was failing after 2-3 days. We have not seen
>>> the failure till now even after 3-4 days.
>>>
>>
>> OK.. the CMA failures need analysis. Are we just ignoring a CMA bug? IOW, why
>
> Sure, it does need analysis. But there will be situations where CMA
> allocation request can fail, thats why we will need fallback option.

Please elaborate on those situations. This patch needs more explanation
as to why we should fall back: what are the shortcomings of CMA
allocation? Can anyone using CMA hit them and have to design a fallback?

> That the same reason why we have fall back options of attempting from
> page allocator (in decreasing order every time) when the size is not
> specified as part of the ioctl. Why the case should be any different
> when the size is specified in the ioctl().
>
>> would CMA allocation fail -- CMA size is too small to accommodate the
>> required number of allocations?
>
> The same size seems to be good enough for first couple of days and
> then it fails. Probably some __GFP_MOVABLE allocation got pinned
> later on.
>

Please analyze and let us know

>>
>>>> the case where guest pages are pinned?
>>>
>>> Hmm, need to check that in the test setup. There was nothing running inside
>>> the guests though. IIUC, HPT size of the guest is computed based on the max
>>> memory the guest is ever going to have irrespective of the RAM usage before
>>> migration. How does pinning affect the HPT size ?
>>>
>>
>> If the pinned pages (from anywhere) belong to CMA, then CMA allocations
>> would start failing
>
> Right and with the current design of CMA we can do nothing about it,
> unless we make sure the pages allocated to satisfy guest real memory
> do not come from CMA area at all.
>

I have patches to move non-THP pages out of CMA

Balbir
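
To make the disputed behaviour concrete, here is a minimal user-space sketch of a same-order fallback, under stated assumptions: struct fake_cma, fake_cma_alloc() and alloc_hpt_same_order() are hypothetical stand-ins, not kernel APIs, and malloc()/calloc() merely imitate the CMA pool and the page allocator (calloc() standing in for __GFP_ZERO). It models only the order of attempts: reserved pool first, then the general allocator at the same size, with no retry at a smaller order.

```c
#include <stdlib.h>

enum hpt_source { HPT_NONE, HPT_FROM_CMA, HPT_FROM_PAGE_ALLOC };

struct hpt_result {
	void *hpt;
	enum hpt_source source;
};

/* Hypothetical stand-in for the reserved CMA pool: only tracks free bytes. */
struct fake_cma {
	size_t free_bytes;
};

static void *fake_cma_alloc(struct fake_cma *cma, size_t bytes)
{
	if (bytes > cma->free_bytes)
		return NULL;		/* pool exhausted (or pages pinned) */
	cma->free_bytes -= bytes;
	return malloc(bytes);
}

/*
 * Try the reserved pool first; if that fails, try the general allocator
 * at the SAME order rather than failing or shrinking the table.
 */
static struct hpt_result alloc_hpt_same_order(struct fake_cma *cma,
					      unsigned int order)
{
	size_t bytes = (size_t)1 << order;
	struct hpt_result res = { NULL, HPT_NONE };

	res.hpt = fake_cma_alloc(cma, bytes);
	if (res.hpt) {
		res.source = HPT_FROM_CMA;
		return res;
	}

	res.hpt = calloc(1, bytes);	/* zeroed, like __GFP_ZERO */
	if (res.hpt)
		res.source = HPT_FROM_PAGE_ALLOC;
	return res;
}
```

The sketch says nothing about why the pool allocation fails, which is the open question in this thread.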


Re: [PATCH 3/8] powerpc/pseries: exception vector macros

2016-09-13 Thread Nicholas Piggin
On Tue, 13 Sep 2016 14:56:47 +0800
kbuild test robot <l...@intel.com> wrote:

> Hi Nicholas,
> 
> [auto build test ERROR on powerpc/next]
> [also build test ERROR on v4.8-rc6 next-20160912]
> [if your patch is applied to the wrong git tree, please drop us a note to
> help improve the system]
> [Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for
> convenience) to record what (public, well-known) commit your patch series was
> built on]
> [Check https://git-scm.com/docs/git-format-patch for more information]
> 
> url:    https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-64-use-asm-sections-for-head-exception-layout/20160913-113052
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> config: powerpc-defconfig (attached as .config)
> compiler: powerpc64-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
> reproduce:
>         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=powerpc
> 
> All errors (new ones prefixed by >>):
> 
>arch/powerpc/kernel/built-in.o: In function `.arch_local_irq_restore':
> >> (.text+0x7390): undefined reference to `.__replay_interrupt'  

Ah, __replay_interrupt lost its _GLOBAL annotation, that must be
it. I'm not sure why I didn't see this -- I tested big endian build...
I'll fix that up.


Re: linux-next: manual merge of the kbuild tree with Linus' tree

2016-09-13 Thread Nicholas Piggin
On Tue, 13 Sep 2016 09:48:03 +0200
Arnd Bergmann  wrote:

> On Tuesday, September 13, 2016 2:02:57 PM CEST Stephen Rothwell wrote:
> > [For the new cc's, we are discussing the "thin archives" and "link dead
> > code/data elimination" patches in the kbuild tree.]
> > 
> > On Tue, 13 Sep 2016 09:39:45 +1000 Stephen Rothwell  
> > wrote:  
> > >
> > > On Mon, 12 Sep 2016 11:03:08 +0200 Michal Marek  wrote:  
> > > >
> > > > On 2016-09-12 04:53, Nicholas Piggin wrote:
> > > > > Question, what is the best way to merge dependent patches? Considering
> > > > > they will need a good amount of architecture testing, I think they
> > > > > will have to go via arch trees. But it also does not make sense to
> > > > > merge these kbuild changes upstream first, without having tested them.
> > > > 
> > > > I think it makes sense to merge the kbuild changes via kbuild.git, even
> > > > if they are unused and untested. Any follow-up fixes required to enable
> > > > the first architecture can go through the respective architecture tree.
> > > > Does that sound OK?
> > > 
> > > And if you guarantee not to rebase the kbuild tree (or at least the
> > > subset containing these patches), then each of the architecture trees
> > > can just merge your tree (or a tag?) and then implement any necessary
> > > arch dependent changes.  If fixes are necessary, they can also be merged
> > > into the architecture trees.  
> > 
> > Except, of course, the kbuild tree still has the asm EXPORT_SYMBOL
> > patches that produce warnings on PowerPC  (And I am still reverting
> > the PowerPC specific one of those patches).  
> 
> Is that really powerpc specific? I have the same problem on ARM
> and I don't see how any architecture would not have it.
> 
> I prototyped the patch below, which fixes it for me, but I have
> not dared submit that workaround because it's butt ugly.

No it's not powerpc specific, it's just that powerpc build dies
if there are unresolved relocations.

Interesting approach. I have something different that may rival
yours for ugliness, but maybe keeps the muck a bit more contained.
I was just about to submit it, but now I'll wait to see if there is
a preference between the approaches:

(Note this patch alone does not resolve all export symbols, each
arch next needs to add C prototypes for their .S exports)

 scripts/Makefile.build | 71 +-
 1 file changed, 65 insertions(+), 6 deletions(-)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 11602e5..1e89908 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -158,7 +158,8 @@ cmd_cpp_i_c   = $(CPP) $(c_flags) -o $@ $<
 $(obj)/%.i: $(src)/%.c FORCE
$(call if_changed_dep,cpp_i_c)
 
-cmd_gensymtypes =   \
+# These mirror gensymtypes_S and co below, keep them in synch.
+cmd_gensymtypes_c = \
 $(CPP) -D__GENKSYMS__ $(c_flags) $< |   \
 $(GENKSYMS) $(if $(1), -T $(2)) \
  $(patsubst y,-s _,$(CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX)) \
@@ -168,7 +169,7 @@ cmd_gensymtypes =                                            \
 quiet_cmd_cc_symtypes_c = SYM $(quiet_modtag) $@
 cmd_cc_symtypes_c = \
 set -e; \
-$(call cmd_gensymtypes,true,$@) >/dev/null; \
+$(call cmd_gensymtypes_c,true,$@) >/dev/null;   \
 test -s $@ || rm -f $@
 
 $(obj)/%.symtypes : $(src)/%.c FORCE
@@ -197,9 +198,10 @@ else
 #   the actual value of the checksum generated by genksyms
 
 cmd_cc_o_c = $(CC) $(c_flags) -c -o $(@D)/.tmp_$(@F) $<
-cmd_modversions =								\
+
+cmd_modversions_c =								\
 	if $(OBJDUMP) -h $(@D)/.tmp_$(@F) | grep -q __ksymtab; then		\
-		$(call cmd_gensymtypes,$(KBUILD_SYMTYPES),$(@:.o=.symtypes))	\
+		$(call cmd_gensymtypes_c,$(KBUILD_SYMTYPES),$(@:.o=.symtypes))	\
 		    > $(@D)/.tmp_$(@F:.o=.ver);					\
 										\
 		$(LD) $(LDFLAGS) -r -o $@ $(@D)/.tmp_$(@F)			\
@@ -267,13 +269,14 @@ endif # CONFIG_STACK_VALIDATION
 define rule_cc_o_c
$(call echo-cmd,checksrc) $(cmd_checksrc) \
$(call cmd_and_fixdep,cc_o_c) \
-   $(cmd_modversions)\
+   $(cmd_modversions_c)   

[PATCH v5 4/4] PCI: Add a macro to set default alignment for all PCI devices

2016-09-13 Thread Yongji Xie
When VFIO passes through a PCI device whose MMIO BARs are smaller
than PAGE_SIZE, the guest cannot access those BARs directly; every
MMIO access to them is emulated in the host.

This is because VFIO does not allow a BAR's MMIO page to be passed
through when that page may be shared with other BARs. Otherwise the
guest would have a backdoor for accessing the BARs of other guests.

This patch adds a macro that sets the default alignment for all PCI
devices. Platforms that easily hit this issue because of their 64K
page size, such as PowerNV, can then solve it by defining the macro
as PAGE_SIZE.

Signed-off-by: Yongji Xie 
---
 arch/powerpc/include/asm/pci.h |    4 ++++
 drivers/pci/pci.c              |    4 ++++
 2 files changed, 8 insertions(+)

diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index e9bd6cf..5e31bc2 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -28,6 +28,10 @@
 #define PCIBIOS_MIN_IO		0x1000
 #define PCIBIOS_MIN_MEM		0x10000000
 
+#ifdef CONFIG_PPC_POWERNV
+#define PCIBIOS_DEFAULT_ALIGNMENT  PAGE_SIZE
+#endif
+
 struct pci_dev;
 
 /* Values for the `which' argument to sys_pciconfig_iobase syscall.  */
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 37f8062..9c61cbe 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4959,6 +4959,10 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
 	resource_size_t align = 0;
 	char *p;
 
+#ifdef PCIBIOS_DEFAULT_ALIGNMENT
+	align = PCIBIOS_DEFAULT_ALIGNMENT;
+	*resize = false;
+#endif
 	spin_lock(&resource_alignment_lock);
 	p = resource_alignment_param;
 	if (pci_has_flag(PCI_PROBE_ONLY)) {
-- 
1.7.9.5
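
As a rough illustration of why a page-sized default alignment helps, the sketch below places small BARs one after another and checks whether two of them land in the same page. It is a simplified model, not the kernel's resource allocator: place_bars(), bars_share_page() and the example addresses are assumptions made up for this note, and BAR sizes and alignments are assumed to be powers of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical placement model: put n BARs one after another starting at
 * base, aligning each start to max(size, min_align).
 */
static void place_bars(uint64_t base, const uint64_t *sizes, int n,
		       uint64_t min_align, uint64_t *starts)
{
	uint64_t next = base;

	for (int i = 0; i < n; i++) {
		uint64_t align = sizes[i] > min_align ? sizes[i] : min_align;

		starts[i] = align_up(next, align);
		next = starts[i] + sizes[i];
	}
}

/* True if the two BARs touch at least one common page. */
static bool bars_share_page(uint64_t s0, uint64_t sz0,
			    uint64_t s1, uint64_t sz1, uint64_t page_size)
{
	uint64_t first0 = s0 / page_size, last0 = (s0 + sz0 - 1) / page_size;
	uint64_t first1 = s1 / page_size, last1 = (s1 + sz1 - 1) / page_size;

	return first0 <= last1 && first1 <= last0;
}
```

With two 4K BARs and a 64K page, natural alignment puts both BARs in one page, whereas a minimum alignment of 64K gives each BAR a page of its own.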



[PATCH v5 3/4] PCI: Add a new option for resource_alignment to reassign alignment

2016-09-13 Thread Yongji Xie
With the resource_alignment kernel parameter, the current
implementation reassigns the alignment by changing the resources'
sizes, which can break some drivers: for example, a driver may use
the size to locate a register whose length is related to that size.

This patch adds a new option, "noresize", to the parameter to
solve this problem.

Signed-off-by: Yongji Xie 
---
 Documentation/kernel-parameters.txt |    9 ++++++---
 drivers/pci/pci.c                   |   37 +++++++++++++++++++++++++----------
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index a4f4d69..d6a340d 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3023,9 +3023,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 				window. The default value is 64 megabytes.
 		resource_alignment=
 				Format:
-				[<order of align>@][<domain>:]<bus>:<slot>.<func>[; ...]
-				[<order of align>@]pci:<vendor>:<device>\
-						[:<subvendor>:<subdevice>][; ...]
+				[<order of align>@][noresize@][<domain>:]
+				<bus>:<slot>.<func>[; ...]
+				[<order of align>@][noresize@]pci:<vendor>:<device>
+				[:<subvendor>:<subdevice>][; ...]
 				Specifies alignment and device to reassign
 				aligned memory resources.
 				If <order of align> is not specified,
@@ -3036,6 +3037,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 				instances of a device, the PCI vendor,
 				device, subvendor, and subdevice may be
 				specified, e.g., 4096@pci:8086:9c22:103c:198f
+				noresize: Don't change the resources' sizes when
+					reassigning alignment.
 		ecrc=		Enable/disable PCIe ECRC (transaction layer
 				end-to-end CRC checking).
 				bios: Use BIOS/firmware settings. This is the
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b8357d7..37f8062 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4946,11 +4946,13 @@ static DEFINE_SPINLOCK(resource_alignment_lock);
 /**
  * pci_specified_resource_alignment - get resource alignment specified by user.
  * @dev: the PCI device to get
+ * @resize: whether or not to change resources' size when reassigning alignment
  *
  * RETURNS: Resource alignment if it is specified.
  *          Zero if it is not specified.
  */
-static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
+static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
+							bool *resize)
 {
 	int seg, bus, slot, func, align_order, count;
 	unsigned short vendor, device, subsystem_vendor, subsystem_device;
@@ -4974,6 +4976,13 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
 		} else {
 			align_order = -1;
 		}
+
+		if (!strncmp(p, "noresize@", 9)) {
+			*resize = false;
+			p += 9;
+		} else
+			*resize = true;
+
 		if (strncmp(p, "pci:", 4) == 0) {
 			/* PCI vendor/device (subvendor/subdevice) ids are specified */
 			p += 4;
@@ -5045,6 +5054,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
 {
 	int i;
 	struct resource *r;
+	bool resize = true;
 	resource_size_t align, size;
 	u16 command;
 
@@ -5058,7 +5068,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
 		return;
 
 	/* check if specified PCI is target device to reassign */
-	align = pci_specified_resource_alignment(dev);
+	align = pci_specified_resource_alignment(dev, &resize);
 	if (!align)
 		return;
 
@@ -5086,15 +5096,22 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
 		}
 
 		size = resource_size(r);
-		if (size < align) {
-			size = align;
-			dev_info(&dev->dev,
-				"Rounding up size of resource #%d to %#llx.\n",
-				i, (unsigned long long)size);
+		if (resize) {
+			if (size < align) {
+				size = align;
+				dev_info(&dev->dev,
+					"Rounding up size of resource #%d to %#llx.\n",
+					i, (unsigned long long)size);
+			}
+			r->flags |= IORESOURCE_UNSET;
+			r->end = size - 

[PATCH v5 2/4] PCI: Ignore enforced alignment to VF BARs

2016-09-13 Thread Yongji Xie
VF BARs are read-only zeroes according to the SR-IOV spec, so the
normal way of allocating resources (writing the BARs) does not apply
to VFs. The VFs' resources are allocated when we enable the SR-IOV
capability, so we should not try to reassign alignment after VFs are
enabled. Doing so is meaningless and releases the already allocated
resources, which leads to a bug.

Signed-off-by: Yongji Xie 
---
 drivers/pci/pci.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2d85a96..b8357d7 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5048,6 +5048,15 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
 	resource_size_t align, size;
 	u16 command;
 
+	/*
+	 * VF BARs are RO zero according to SR-IOV spec 3.4.1.11. Their
+	 * resources would be allocated when we enable them and not be
+	 * re-allocated any more. So we should never try to reassign
+	 * VF's alignment here.
+	 */
+	if (dev->is_virtfn)
+		return;
+
 	/* check if specified PCI is target device to reassign */
 	align = pci_specified_resource_alignment(dev);
 	if (!align)
-- 
1.7.9.5



[PATCH v5 1/4] PCI: Ignore enforced alignment when kernel uses existing firmware setup

2016-09-13 Thread Yongji Xie
The PCI resource allocator uses the firmware setup and does not try
to reassign resources when PCI_PROBE_ONLY or IORESOURCE_PCI_FIXED
is set.

The enforced alignment in pci_reassigndev_resource_alignment()
should be ignored in this case. Otherwise, some PCI devices'
resources would be released here and never re-allocated.

Signed-off-by: Yongji Xie 
---
 drivers/pci/pci.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index aab9d51..2d85a96 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4959,6 +4959,13 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
 
 	spin_lock(&resource_alignment_lock);
 	p = resource_alignment_param;
+	if (pci_has_flag(PCI_PROBE_ONLY)) {
+		if (*p)
+			pr_info_once("PCI: resource_alignment ignored with PCI_PROBE_ONLY\n");
+		spin_unlock(&resource_alignment_lock);
+		return 0;
+	}
+
 	while (*p) {
 		count = 0;
 		if (sscanf(p, "%d%n", &align_order, &count) == 1 &&
@@ -5063,6 +5070,12 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
 		r = &dev->resource[i];
 		if (!(r->flags & IORESOURCE_MEM))
 			continue;
+		if (r->flags & IORESOURCE_PCI_FIXED) {
+			dev_info(&dev->dev, "No alignment for fixed BAR%d: %pR\n",
+				i, r);
+			continue;
+		}
+
 		size = resource_size(r);
 		if (size < align) {
 			size = align;
-- 
1.7.9.5



[PATCH v5 0/4] PCI: Introduce a way to enforce all MMIO BARs not to share PAGE_SIZE

2016-09-13 Thread Yongji Xie
This series introduces a way for the PCI resource allocator to force
MMIO BARs not to share a page. This is useful for the VFIO driver:
for security reasons, the current VFIO implementation does not allow
mmap of sub-page (size < PAGE_SIZE) MMIO BARs, which may share a page
with other BARs. As a result, MMIO accesses to these BARs have to be
emulated in QEMU rather than handled in the guest, which costs some
performance.

Our solution reuses the existing code path of the resource_alignment
kernel parameter and adds a macro to set its default alignment. The
macro can then be defined by default on archs that easily hit the
performance issue because of their 64K page size.

In this series, patches 1 and 2 fix bugs in the use of
resource_alignment; patch 3 adds a new option for resource_alignment
that uses IORESOURCE_STARTALIGN to specify the alignment of PCI BARs;
patch 4 adds a macro to set the default alignment of all MMIO BARs.

Changelog v5:
- Rebased against v4.8-rc6
- Drop the patch that forbade disabling memory decoding in
  pci_reassigndev_resource_alignment()

Changelog v4:
- Rebased against v4.8-rc1
- Drop one irrelevant patch
- Drop the patch that added a wildcard to resource_alignment to enforce
  the alignment of all MMIO BARs to be at least PAGE_SIZE
- Change the format of option "noresize" of resource_alignment
- Code style improvements

Changelog v3:
- Ignore enforced alignment to fixed BARs
- Fix issue that disabling memory decoding when reassigning the alignment
- Only enable default alignment on PowerNV platform

Changelog v2:
- Ignore enforced alignment to VF BARs on pci_reassigndev_resource_alignment()

Yongji Xie (4):
  PCI: Ignore enforced alignment when kernel uses existing firmware setup
  PCI: Ignore enforced alignment to VF BARs
  PCI: Add a new option for resource_alignment to reassign alignment
  PCI: Add a macro to set default alignment for all PCI devices

 Documentation/kernel-parameters.txt |    9 +++--
 arch/powerpc/include/asm/pci.h      |    4 +++
 drivers/pci/pci.c                   |   63 +++++++++++++++++++++++++++++------
 3 files changed, 63 insertions(+), 13 deletions(-)

-- 
1.7.9.5
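
For readers following the "noresize" option from patch 3, here is a minimal user-space sketch of how the optional prefixes of one resource_alignment entry could be parsed. struct align_opt and parse_align_prefix() are hypothetical names for this illustration; only the "<order>@" and "noresize@" prefix handling is modelled, and the device specification that follows is returned untouched.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
struct align_opt {
	int order;		/* alignment order, or -1 when not given */
	bool resize;		/* false when "noresize@" was present */
	const char *rest;	/* remaining device specification */
};

static struct align_opt parse_align_prefix(const char *p)
{
	struct align_opt opt = { -1, true, p };
	int order = 0, count = 0;

	/* Optional "<order>@": a number only counts if '@' follows it. */
	if (sscanf(p, "%d%n", &order, &count) == 1 && p[count] == '@') {
		opt.order = order;
		p += count + 1;
	}

	/* Optional "noresize@": keep the resource size, change only alignment. */
	if (strncmp(p, "noresize@", 9) == 0) {
		opt.resize = false;
		p += 9;
	}

	opt.rest = p;
	return opt;
}
```

For example, "16@noresize@pci:8086:9c22" yields order 16 with resize disabled, while "0000:00:1f.0" has no prefix at all: the leading number is followed by ':' rather than '@', so it is left as part of the device address.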



Re: linux-next: manual merge of the kbuild tree with Linus' tree

2016-09-13 Thread Arnd Bergmann
On Tuesday, September 13, 2016 2:02:57 PM CEST Stephen Rothwell wrote:
> [For the new cc's, we are discussing the "thin archives" and "link dead
> code/data elimination" patches in the kbuild tree.]
> 
> On Tue, 13 Sep 2016 09:39:45 +1000 Stephen Rothwell  
> wrote:
> >
> > On Mon, 12 Sep 2016 11:03:08 +0200 Michal Marek  wrote:
> > >
> > > On 2016-09-12 04:53, Nicholas Piggin wrote:  
> > > > Question, what is the best way to merge dependent patches? Considering
> > > > they will need a good amount of architecture testing, I think they will
> > > > have to go via arch trees. But it also does not make sense to merge these
> > > > kbuild changes upstream first, without having tested them.
> > > 
> > > I think it makes sense to merge the kbuild changes via kbuild.git, even
> > > if they are unused and untested. Any follow-up fixes required to enable
> > > the first architecture can go through the respective architecture tree.
> > > Does that sound OK?  
> > 
> > And if you guarantee not to rebase the kbuild tree (or at least the
> > subset containing these patches), then each of the architecture trees
> > can just merge your tree (or a tag?) and then implement any necessary
> > arch dependent changes.  If fixes are necessary, they can also be merged
> > into the architecture trees.
> 
> Except, of course, the kbuild tree still has the asm EXPORT_SYMBOL
> patches that produce warnings on PowerPC  (And I am still reverting
> the PowerPC specific one of those patches).

Is that really powerpc specific? I have the same problem on ARM
and I don't see how any architecture would not have it.

I prototyped the patch below, which fixes it for me, but I have
not dared submit that workaround because it's butt ugly.

Arnd

 arch/arm/include/asm/io.h |  7 ---
 arch/arm/kernel/entry-ftrace.S| 12 +---
 arch/arm/kernel/head.S| 12 ++--
 arch/arm/kernel/smccc-call.S  |  6 +-
 arch/arm/lib/ashldi3.S|  7 ++-
 arch/arm/lib/ashrdi3.S|  6 +-
 arch/arm/lib/bitops.h | 19 +++
 arch/arm/lib/bswapsdi2.S  |  5 +
 arch/arm/lib/changebit.S  |  6 ++
 arch/arm/lib/clear_user.S | 10 +++---
 arch/arm/lib/clearbit.S   |  6 ++
 arch/arm/lib/copy_from_user.S |  7 +--
 arch/arm/lib/copy_page.S  |  5 +
 arch/arm/lib/copy_to_user.S   | 11 +++
 arch/arm/lib/csumipv6.S   |  5 +
 arch/arm/lib/csumpartial.S|  4 
 arch/arm/lib/csumpartialcopy.S|  7 ++-
 arch/arm/lib/csumpartialcopygeneric.S |  1 -
 arch/arm/lib/csumpartialcopyuser.S|  7 ++-
 arch/arm/lib/div64.S  |  7 ++-
 arch/arm/lib/findbit.S| 23 ++-
 arch/arm/lib/getuser.S| 23 +++
 arch/arm/lib/io-readsb.S  |  4 
 arch/arm/lib/io-readsl.S  |  4 
 arch/arm/lib/io-readsw-armv3.S|  4 
 arch/arm/lib/io-readsw-armv4.S|  5 +
 arch/arm/lib/io-writesb.S |  5 +
 arch/arm/lib/io-writesl.S |  5 +
 arch/arm/lib/io-writesw-armv3.S   |  5 +
 arch/arm/lib/io-writesw-armv4.S   |  4 
 arch/arm/lib/lib1funcs.S  | 33 ++---
 arch/arm/lib/lshrdi3.S|  7 ++-
 arch/arm/lib/memchr.S |  4 
 arch/arm/lib/memcpy.S |  5 +
 arch/arm/lib/memmove.S|  7 ++-
 arch/arm/lib/memset.S |  7 +++
 arch/arm/lib/memzero.S|  6 ++
 arch/arm/lib/muldi3.S |  7 ++-
 arch/arm/lib/putuser.S| 13 +
 arch/arm/lib/setbit.S |  6 ++
 arch/arm/lib/strchr.S |  4 
 arch/arm/lib/strrchr.S|  4 
 arch/arm/lib/testchangebit.S  |  6 ++
 arch/arm/lib/testclearbit.S   |  6 ++
 arch/arm/lib/testsetbit.S |  6 ++
 arch/arm/lib/ucmpdi2.S| 12 +---
 arch/arm/mach-imx/ssi-fiq.S   |  6 +-
 scripts/Makefile.build| 15 ---
 48 files changed, 288 insertions(+), 98 deletions(-)

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 51458d8273ad..95ca0beda6a9 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -317,10 +317,13 @@ extern void _memset_io(volatile void __iomem *, int, size_t);
 #define writesl(p,d,l)		__raw_writesl(p,d,l)
 
 #ifndef __ARMBE__
+
+extern void mmioset(void *, unsigned int, size_t);
+extern void mmiocpy(void *, const void *, size_t);
+
 static inline void memset_io(volatile void __iomem *dst, unsigned c,
 	size_t count)
 {
-	extern void mmioset(void *, unsigned int, size_t);

RE: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms

2016-09-13 Thread Y.B. Lu
> -----Original Message-----
> From: linux-mmc-ow...@vger.kernel.org [mailto:linux-mmc-
> ow...@vger.kernel.org] On Behalf Of Scott Wood
> Sent: Tuesday, September 13, 2016 7:25 AM
> To: Y.B. Lu; linux-...@vger.kernel.org; ulf.hans...@linaro.org; Arnd
> Bergmann
> Cc: linuxppc-dev@lists.ozlabs.org; devicet...@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; linux-ker...@vger.kernel.org; linux-
> c...@vger.kernel.org; linux-...@vger.kernel.org; iommu@lists.linux-
> foundation.org; net...@vger.kernel.org; Mark Rutland; Rob Herring;
> Russell King; Jochen Friedrich; Joerg Roedel; Claudiu Manoil; Bhupesh
> Sharma; Qiang Zhao; Kumar Gala; Santosh Shilimkar; Leo Li; X.B. Xie
> Subject: Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms
> 
> On Mon, 2016-09-12 at 06:39 +0000, Y.B. Lu wrote:
> > Hi Scott,
> >
> > Thanks for your review :)
> > See my comment inline.
> >
> > >
> > > -----Original Message-----
> > > From: Scott Wood [mailto:o...@buserror.net]
> > > Sent: Friday, September 09, 2016 11:47 AM
> > > To: Y.B. Lu; linux-...@vger.kernel.org; ulf.hans...@linaro.org; Arnd
> > > Bergmann
> > > Cc: linuxppc-dev@lists.ozlabs.org; devicet...@vger.kernel.org;
> > > linux-arm- ker...@lists.infradead.org; linux-ker...@vger.kernel.org;
> > > linux- c...@vger.kernel.org; linux-...@vger.kernel.org;
> > > iommu@lists.linux- foundation.org; net...@vger.kernel.org; Mark
> > > Rutland; Rob Herring; Russell King; Jochen Friedrich; Joerg Roedel;
> > > Claudiu Manoil; Bhupesh Sharma; Qiang Zhao; Kumar Gala; Santosh
> > > Shilimkar; Leo Li; X.B. Xie
> > > Subject: Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ
> > > platforms
> > >
> > > On Tue, 2016-09-06 at 16:28 +0800, Yangbo Lu wrote:
> > > >
> > > > The global utilities block controls power management, I/O device
> > > > enabling, power-onreset(POR) configuration monitoring, alternate
> > > > function selection for multiplexed signals,and clock control.
> > > >
> > > > This patch adds a driver to manage and access global utilities
> block.
> > > > Initially only reading SVR and registering soc device are supported.
> > > > Other guts accesses, such as reading RCW, should eventually be
> > > > moved into this driver as well.
> > > >
> > > > Signed-off-by: Yangbo Lu 
> > > > Signed-off-by: Scott Wood 
> > > Don't put my signoff on patches that I didn't put it on myself.
> > > Definitely don't put mine *after* yours on patches that were last
> > > modified by you.
> > >
> > > If you want to mention that the soc_id encoding was my suggestion,
> > > then do so explicitly.
> > >
> > [Lu Yangbo-B47093] I found your 'signoff' on this patch at below link.
> > http://patchwork.ozlabs.org/patch/649211/
> >
> > So, let me just change the order in next version ?
> > Signed-off-by: Scott Wood 
> > Signed-off-by: Yangbo Lu 
> 
> No.  This isn't my patch so my signoff shouldn't be on it.

[Lu Yangbo-B47093] Ok, will remove it.

> 
> > [Lu Yangbo-B47093] It's a good idea to move die into .family I think.
> > In my opinion, it's better to keep svr and name in soc_id just like
> > your suggestion above.
> > >
> > >   {
> > >   .soc_id = "svr:0x85490010,name:T1023E,",
> > >   .family = "QorIQ T1024",
> > >   }
> > The user probably don’t like to learn the svr value. What they want is
> > just to match the soc they use.
> > It's convenient to use name+rev for them to match a soc.
> 
> What the user should want 99% of the time is to match the die (plus
> revision), not the soc.
> 
> > Regarding shrinking the table, I think it's hard to use svr+mask.
> > Because I find many platforms use different masks.
> > We couldn’t know the mask according svr value.
> 
> The mask would be part of the table:
> 
> {
> 	{
> 		.die = "T1024",
> 		.svr = 0x85400000,
> 		.mask = 0xfff00000,
> 	},
> 	{
> 		.die = "T1040",
> 		.svr = 0x85200000,
> 		.mask = 0xfff00000,
> 	},
> 	{
> 		.die = "LS1088A",
> 		.svr = 0x87030000,
> 		.mask = 0xffff0000,
> 	},
> 	...
> }
> 
> There's a small risk that we get the mask wrong and a different die is
> created that matches an existing table, but it doesn't seem too likely,
> and can easily be fixed with a kernel update if it happens.
> 

[Lu Yangbo-B47093] You mean we will not define soc device attribute for each
soc and we will define attribute for each die instead, right?
If so, when we want to match a specific soc we need to use its svr value in
code. If it's acceptable, I can try in next version.
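
A minimal sketch of the per-die svr+mask lookup being discussed, under stated assumptions: struct die_entry, die_table and find_die() are hypothetical names, and the SVR and mask values are illustrative only and have not been checked against any reference manual. An entry matches when the masked SVR equals the entry's svr field.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one row per die, not per SoC variant. */
struct die_entry {
	const char *die;
	uint32_t svr;
	uint32_t mask;
};

/* Illustrative values only (assumption, not verified against hardware docs). */
static const struct die_entry die_table[] = {
	{ "T1024",   0x85400000u, 0xfff00000u },
	{ "T1040",   0x85200000u, 0xfff00000u },
	{ "LS1088A", 0x87030000u, 0xffff0000u },
};

/* Return the first entry whose masked SVR matches, or NULL if none does. */
static const struct die_entry *find_die(uint32_t svr)
{
	size_t n = sizeof(die_table) / sizeof(die_table[0]);

	for (size_t i = 0; i < n; i++) {
		if ((svr & die_table[i].mask) == die_table[i].svr)
			return &die_table[i];
	}
	return NULL;
}
```

With these values the SVR 0x85490010 quoted earlier for T1023E resolves to the T1024 entry, so a single row covers several SoC variants of one die. Matching a specific SoC would still need the full SVR, as noted above.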

> BTW, aren't ls2080a and ls2085a the same die?  And is there no non-E
> version of LS2080A/LS2040A?

[Lu Yangbo-B47093] I checked all the svr values in chip errata doc "Revision
level to part marking cross-reference" table.
I found ls2080a and ls2085a were in two separate doc. And I didn’t find non-E
version of 

Re: [PATCH 3/8] powerpc/pseries: exception vector macros

2016-09-13 Thread kbuild test robot
Hi Nicholas,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.8-rc6 next-20160912]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]
[Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for 
convenience) to record what (public, well-known) commit your patch series was 
built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:
https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-64-use-asm-sections-for-head-exception-layout/20160913-113052
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/built-in.o: In function `.arch_local_irq_restore':
>> (.text+0x7390): undefined reference to `.__replay_interrupt'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



[PATCH v2] Fix __tlbiel in hash_native_64

2016-09-13 Thread Balbir Singh
__tlbie and __tlbiel are out of sync. __tlbie does the right thing:
it calls tlbie with "tlbie rb, L" if CPU_FTR_ARCH_206 (cpu feature) is clear
and with "tlbie rb" otherwise. During the cleanup of __tlbiel I noticed
that __tlbiel was setting bit 11 PPC_BIT(21) independent of the ISA
version for non-4k (L) pages. This patch fixes that issue. It also changes
the current PPC_TLBIEL to PPC_TLBIEL_5 and introduces a new PPC_TLBIEL similar
to PPC_TLBIE.

The arguments to PPC_TLBIE have also been changed/switched in order
to be consistent with the actual assembly usage for clearer reading
of code.

Cc: Paul Mackerras 
Cc: Aneesh Kumar K.V 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 

Signed-off-by: Balbir Singh 
---
 arch/powerpc/include/asm/ppc-opcode.h |  9 ++---
 arch/powerpc/mm/hash_native_64.c  | 14 --
 arch/powerpc/mm/tlb-radix.c   |  4 ++--
 3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 127ebf5..308004a 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -354,14 +354,17 @@
 #define PPC_TLBILX_VA(a, b) PPC_TLBILX(3, a, b)
 #define PPC_WAIT(w) stringify_in_c(.long PPC_INST_WAIT | \
__PPC_WC(w))
-#define PPC_TLBIE(lp,a) stringify_in_c(.long PPC_INST_TLBIE | \
-  ___PPC_RB(a) | ___PPC_RS(lp))
+#define PPC_TLBIE(rb,lp)   stringify_in_c(.long PPC_INST_TLBIE | \
+  ___PPC_RB(rb) | ___PPC_RS(lp))
 #define PPC_TLBIE_5(rb,rs,ric,prs,r) \
stringify_in_c(.long PPC_INST_TLBIE | \
___PPC_RB(rb) | ___PPC_RS(rs) | \
___PPC_RIC(ric) | ___PPC_PRS(prs) | \
___PPC_R(r))
-#define PPC_TLBIEL(rb,rs,ric,prs,r) \
+#define PPC_TLBIEL(rb,lp) \
+   stringify_in_c(.long PPC_INST_TLBIEL | \
+   ___PPC_RB(rb) | ___PPC_RS(lp))
+#define PPC_TLBIEL_5(rb,rs,ric,prs,r) \
stringify_in_c(.long PPC_INST_TLBIEL | \
___PPC_RB(rb) | ___PPC_RS(rs) | \
___PPC_RIC(ric) | ___PPC_PRS(prs) | \
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 0e4e965..b3c34c8 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -74,7 +74,7 @@ static inline void __tlbie(unsigned long vpn, int psize, int 
apsize, int ssize)
va |= ssize << 8;
sllp = get_sllp_encoding(apsize);
va |= sllp << 5;
-   asm volatile(ASM_FTR_IFCLR("tlbie %0,0", PPC_TLBIE(%1,%0), %2)
+   asm volatile(ASM_FTR_IFCLR("tlbie %0,0", PPC_TLBIE(%0,%1), %2)
 : : "r" (va), "r"(0), "i" (CPU_FTR_ARCH_206)
 : "memory");
break;
@@ -93,7 +93,7 @@ static inline void __tlbie(unsigned long vpn, int psize, int 
apsize, int ssize)
 */
va |= (vpn & 0xfe); /* AVAL */
va |= 1; /* L */
-   asm volatile(ASM_FTR_IFCLR("tlbie %0,1", PPC_TLBIE(%1,%0), %2)
+   asm volatile(ASM_FTR_IFCLR("tlbie %0,1", PPC_TLBIE(%0,%1), %2)
 : : "r" (va), "r"(0), "i" (CPU_FTR_ARCH_206)
 : "memory");
break;
@@ -123,8 +123,9 @@ static inline void __tlbiel(unsigned long vpn, int psize, 
int apsize, int ssize)
va |= ssize << 8;
sllp = get_sllp_encoding(apsize);
va |= sllp << 5;
-   asm volatile(".long 0x7c000224 | (%0 << 11) | (0 << 21)"
-: : "r"(va) : "memory");
+   asm volatile(ASM_FTR_IFCLR("tlbiel %0,0", PPC_TLBIEL(%0,0), %1)
+: : "r" (va),  "i" (CPU_FTR_ARCH_206)
+: "memory");
break;
default:
/* We need 14 to 14 + i bits of va */
@@ -141,8 +142,9 @@ static inline void __tlbiel(unsigned long vpn, int psize, 
int apsize, int ssize)
 */
va |= (vpn & 0xfe);
va |= 1; /* L */
-   asm volatile(".long 0x7c000224 | (%0 << 11) | (1 << 21)"
-: : "r"(va) : "memory");
+   asm volatile(ASM_FTR_IFCLR("tlbiel %0,1", PPC_TLBIEL(%0,0), %1)
+: : "r" (va), "i" (CPU_FTR_ARCH_206)
+: "memory");
break;
}
 
diff --git a/arch/powerpc/mm/tlb-radix.c 

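For readers following the encoding change in the patch above, here is a
minimal user-space sketch of how a two-operand tlbiel instruction word is
composed. The base opcode 0x7c000224 and the shifts by 11 and 21 are taken
from the removed ".long" lines in the diff; the helper names field_rb(),
field_rs() and encode_tlbiel() are hypothetical and are not kernel macros.

```c
#include <assert.h>
#include <stdint.h>

/* Base opcode, as it appears in the removed ".long 0x7c000224 | ..." lines. */
#define INST_TLBIEL 0x7c000224u

/* Place a 5-bit register number in the RB field (shifted left by 11). */
static uint32_t field_rb(unsigned int reg)
{
	assert(reg < 32);
	return (uint32_t)reg << 11;
}

/* Place a 5-bit value in the RS field (shifted left by 21). */
static uint32_t field_rs(unsigned int val)
{
	assert(val < 32);
	return (uint32_t)val << 21;
}

/* Compose the instruction word by OR-ing the fields into the base opcode. */
static uint32_t encode_tlbiel(unsigned int rb, unsigned int lp)
{
	return INST_TLBIEL | field_rb(rb) | field_rs(lp);
}
```

With this layout, encode_tlbiel(3, 1) differs from encode_tlbiel(3, 0) only in
the bit contributed by the shift by 21, which is the bit the old open-coded
".long" lines set unconditionally for non-4k pages.
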
[PATCH] powerpc/powernv: Fix the state of root PE

2016-09-13 Thread Gavin Shan
The PE for root bus (root PE) can be removed because of PCI hot
remove in EEH recovery path for fenced PHB error. We need to update
@phb->root_pe_populated accordingly so that the root PE can be
populated again in the forthcoming PCI hot add path. Also, the PE
shouldn't be destroyed as it's a global and reserved resource.

Fixes: c5f7700bbd2e ("powerpc/powernv: Dynamically release PE")
Reported-by: Frederic Barrat 
Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index e98f4a8..c38a6a1 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3426,7 +3426,16 @@ static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe)
}
}
 
-   pnv_ioda_free_pe(pe);
+   /* The PE for root bus can be removed because of hotplug in EEH
+* recovery for fenced PHB error. We need mark the PE dead so
+* that it can be populated again in PCI hot add path. The PE
+* shouldn't be destroyed as it's the global reserved resource.
+*/
+   if (phb->ioda.root_pe_populated &&
+   phb->ioda.root_pe_idx == pe->pe_number)
+   phb->ioda.root_pe_populated = false;
+   else
+   pnv_ioda_free_pe(pe);
 }
 
 static void pnv_pci_release_device(struct pci_dev *pdev)
-- 
2.1.0



Re: [RFC] KVM: PPC: Book3S HV: Fall back to same size HPT in allocation ioctl

2016-09-13 Thread Anshuman Khandual
On 09/12/2016 05:35 PM, Aneesh Kumar K.V wrote:
> Anshuman Khandual  writes:
> 
>> When the HPT size is explicitly passed on from the userspace, currently
>> the KVM_PPC_ALLOCATE_HTAB will try to allocate the requested size of HPT
>> from reserved CMA area and if that is not possible, the allocation just
>> fails. With the commit 572abd563befd56 ("KVM: PPC: Book3S HV: Don't fall
>> back to smaller HPT size in allocation ioctl"), it does not even try to
>> allocate the same order pages from the page allocator before failing for
>> good. Same order allocation should be attempted from the page allocator
>> as a fallback option when the CMA allocation attempt fails.
> 
> IMO we should fix the reason for these CMA allocation failure. We are just

IMHO, irrespective of the ability of CMA to satisfy allocation requests,
a fallback option from the page allocator should always be there when the
guest is failing to start due to unavailability of memory. In my previous
response in this thread, I also pointed out that there needs to be parity
in fallback behaviour between the cases where the ioctl call specifies a
size and where it does not.

> doing work around here.
> 
>>
>> Signed-off-by: Anshuman Khandual 
>> ---
>> - This change saves guests from failing to start after migration
>>
>>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 8 
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
>> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> index 05f09ae..0a30eb4 100644
>> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> @@ -78,6 +78,14 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
>>  --order;
>>  }
>>
>> +/*
>> + * Fallback in case the userspace has provided a size via ioctl.
>> + * Try allocating the same order pages from the page allocator.
>> + */
>> +if (!hpt && order > PPC_MIN_HPT_ORDER && htab_orderp)
>> +hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
>> +__GFP_NOWARN, order - PAGE_SHIFT);
>> +
>>  if (!hpt)
>>  return -ENOMEM;
> 
> A better way to do that would be (not even compile tested) ?
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 65b2b00d93d7..3f8995f27339 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -68,16 +68,18 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
>   memset((void *)hpt, 0, (1ul << order));
>   kvm->arch.hpt_cma_alloc = 1;
>   }
> -
> - /* Lastly try successively smaller sizes from the page allocator */
> - /* Only do this if userspace didn't specify a size via ioctl */
> - while (!hpt && order > PPC_MIN_HPT_ORDER && !htab_orderp) {
> + /*
> +  * Try successively smaller sizes from the page allocator.
> +  * If a size was specified via an ioctl, we just try that
> +  * specific size
> +  */
> +-while (!hpt && order > PPC_MIN_HPT_ORDER) {
>   hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
>  __GFP_NOWARN, order - PAGE_SHIFT);
> - if (!hpt)
> - --order;
> + if (htab_orderp)
> + break;
> + --order;
>   }
> -
>   if (!hpt)
>   return -ENOMEM;
> 

I initially thought about doing it this way, but then decided not to change
the existing code block and to add one more instead. Either is fine; I
can change this in the next version.
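
To make the control flow being discussed concrete, here is a small user-space
sketch of the single-loop variant. It is an illustration under assumptions,
not the kernel code: try_alloc_fn, alloc_with_fallback() and small_only() are
hypothetical stand-ins for the page allocator call and its result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook: returns true if an allocation of this order succeeds. */
typedef bool (*try_alloc_fn)(unsigned int order);

/*
 * Try the requested order first. If the size came from userspace, try only
 * that order; otherwise fall back to successively smaller orders.
 * Returns the order that was allocated, or 0 on failure.
 */
static unsigned int alloc_with_fallback(unsigned int order,
					unsigned int min_order,
					bool size_from_user,
					try_alloc_fn try_alloc)
{
	assert(try_alloc != NULL);

	while (order > min_order) {
		if (try_alloc(order))
			return order;	/* success: order is left untouched */
		if (size_from_user)
			break;		/* caller asked for exactly this size */
		--order;
	}
	return 0;
}

/* Example hook standing in for a fragmented system: orders above 20 fail. */
static bool small_only(unsigned int order)
{
	return order <= 20;
}
```

Note that the sketch returns as soon as an allocation succeeds, so the order
is not decremented after a successful attempt; the quoted (not compile tested)
snippet decrements unconditionally, which would need the same care.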