Re: [PATCH 11/16] powerpc: vio_cmo: use dev_groups and not dev_attrs for bus_type

2017-06-08 Thread Greg Kroah-Hartman
On Fri, Jun 09, 2017 at 08:53:22AM +1000, Michael Ellerman wrote:
> Greg Kroah-Hartman  writes:
> 
> > On Thu, Jun 08, 2017 at 11:12:10PM +1000, Michael Ellerman wrote:
> >> Greg Kroah-Hartman  writes:
> >> 
> >> > The dev_attrs field has long been "depreciated" and is finally being
> >> > removed, so move the driver to use the "correct" dev_groups field
> >> > instead for struct bus_type.
> >> >
> >> > Cc: Benjamin Herrenschmidt 
> >> > Cc: Paul Mackerras 
> >> > Cc: Michael Ellerman 
> >> > Cc: Vineet Gupta 
> >> > Cc: Bart Van Assche 
> >> > Cc: Robin Murphy 
> >> > Cc: Joerg Roedel 
> >> > Cc: Johan Hovold 
> >> > Cc: Alexey Kardashevskiy 
> >> > Cc: Krzysztof Kozlowski 
> >> > Cc: 
> >> > Signed-off-by: Greg Kroah-Hartman 
> >> > ---
> >> >  arch/powerpc/platforms/pseries/vio.c | 37 
> >> > +---
> >> >  1 file changed, 22 insertions(+), 15 deletions(-)
> >> 
> >> This one needed a bit more work to get building, the incremental diff is
> >> below. We need a forward declaration of name, devspec and modalias,
> >> which is a bit weird, but that's how the code is currently structured.
> >> And there's dev and bus attributes with the same name, so that needed an
> >> added "bus".
> >> 
> >> I booted v2 of patch 10 and this one and everything looks identical to
> >> upstream.
> >
> > Ah, many thanks, this was on my todo list to fix up today.
> >
> > But you renamed the sysfs files when you added "bus" to the function
> > names, are you sure you want to do that?  I don't mind, but if you
> > happen to have userspace tools that look at those files, they just broke
> > :(
> 
> Ugh crap, no that won't work.
> 
> I didn't see it when I tested because my machine doesn't have the CMO
> feature enabled.
> 
> I guess we have to open code some of the BUS_ATTR_RO() etc. so we can
> avoid the name clash.

Or split it into multiple files, I've solved this that way in the past.
You shouldn't have to "open code" BUS_ATTR_RO().

thanks,

greg k-h


Re: [PATCH] macintosh: move mac_hid driver to input/mouse.

2017-06-08 Thread Dmitry Torokhov
On Thu, Jun 8, 2017 at 4:07 PM, Peter Hutterer  wrote:
> On Thu, Jun 08, 2017 at 03:18:42PM +0200, Michal Suchánek wrote:
>> This is what evtest reports about my keyboard:
>>
>> Select the device event number [0-12]: 2
>> Input driver version is 1.0.1
>> Input device ID: bus 0x3 vendor 0x413c product 0x2107 version 0x111
>> Input device name: "DELL Dell USB Entry Keyboard"
>> Supported events:
>>   Event type 0 (EV_SYN)
>>   Event type 1 (EV_KEY)
>> Event code 1 (KEY_ESC)
>> Event code 2 (KEY_1)
>> Event code 3 (KEY_2)
>> Event code 4 (KEY_3)
>> ...
>> Event code 193 (KEY_F23)
>> Event code 194 (KEY_F24)
>> Event code 240 (KEY_UNKNOWN)
>> Event code 272 (BTN_LEFT)
>> Event code 273 (BTN_RIGHT)
>> Event code 274 (BTN_MIDDLE)
>>   Event type 4 (EV_MSC)
>> Event code 4 (MSC_SCAN)
>>   Event type 17 (EV_LED)
>> Event code 0 (LED_NUML) state 1
>> Event code 1 (LED_CAPSL) state 0
>> Event code 2 (LED_SCROLLL) state 0
>> Event code 3 (LED_COMPOSE) state 0
>> Event code 4 (LED_KANA) state 0
>> Key repeat handling:
>>   Repeat type 20 (EV_REP)
>> Repeat code 0 (REP_DELAY)
>>   Value250
>> Repeat code 1 (REP_PERIOD)
>>   Value 33
>> Properties:
>
> looks like it's not tagged as ID_INPUT_MOUSE by the default udev rules
> because for that we need x/y axes (either relative for real mice or absolute
> ones for the VMWare USB mouse). This keyboard only has buttons though. So it
> gets ID_INPUT_KEYBOARD for the keys, but no ID_INPUT_MOUSE.
>
> Google isn't overly forthcoming on "DELL Dell USB Entry Keyboard" but the
> few pictures I can find all point to a keyboard that doesn't have any
> physical mouse buttons at all. Does yours have buttons? Can you post a
> picture of it somewhere?
>

Michal is using udev/hwdb to replace some of the keys on his keyboard
to generate BTN_RIGHT/BTN_MIDDLE trying to achive the same end result
as with mac_hid. It is not the default keyboard behavior. Having
another custom udev rule to mark the device as ID_INPUT_MOUSE is a
fair requirement in this case I think.

-- 
Dmitry


Re: [PATCH] macintosh: move mac_hid driver to input/mouse.

2017-06-08 Thread Peter Hutterer
On Thu, Jun 08, 2017 at 03:18:42PM +0200, Michal Suchánek wrote:
> This is what evtest reports about my keyboard:
> 
> Select the device event number [0-12]: 2
> Input driver version is 1.0.1
> Input device ID: bus 0x3 vendor 0x413c product 0x2107 version 0x111
> Input device name: "DELL Dell USB Entry Keyboard"
> Supported events:
>   Event type 0 (EV_SYN)
>   Event type 1 (EV_KEY)
> Event code 1 (KEY_ESC)
> Event code 2 (KEY_1)
> Event code 3 (KEY_2)
> Event code 4 (KEY_3)
> ...
> Event code 193 (KEY_F23)
> Event code 194 (KEY_F24)
> Event code 240 (KEY_UNKNOWN)
> Event code 272 (BTN_LEFT)
> Event code 273 (BTN_RIGHT)
> Event code 274 (BTN_MIDDLE)
>   Event type 4 (EV_MSC)
> Event code 4 (MSC_SCAN)
>   Event type 17 (EV_LED)
> Event code 0 (LED_NUML) state 1
> Event code 1 (LED_CAPSL) state 0
> Event code 2 (LED_SCROLLL) state 0
> Event code 3 (LED_COMPOSE) state 0
> Event code 4 (LED_KANA) state 0
> Key repeat handling:
>   Repeat type 20 (EV_REP)
> Repeat code 0 (REP_DELAY)
>   Value250
> Repeat code 1 (REP_PERIOD)
>   Value 33
> Properties:

looks like it's not tagged as ID_INPUT_MOUSE by the default udev rules
because for that we need x/y axes (either relative for real mice or absolute
ones for the VMWare USB mouse). This keyboard only has buttons though. So it
gets ID_INPUT_KEYBOARD for the keys, but no ID_INPUT_MOUSE.

Google isn't overly forthcoming on "DELL Dell USB Entry Keyboard" but the
few pictures I can find all point to a keyboard that doesn't have any
physical mouse buttons at all. Does yours have buttons? Can you post a
picture of it somewhere?

Cheers,
   Peter


Re: [PATCH 11/16] powerpc: vio_cmo: use dev_groups and not dev_attrs for bus_type

2017-06-08 Thread Michael Ellerman
Greg Kroah-Hartman  writes:

> On Thu, Jun 08, 2017 at 11:12:10PM +1000, Michael Ellerman wrote:
>> Greg Kroah-Hartman  writes:
>> 
>> > The dev_attrs field has long been "depreciated" and is finally being
>> > removed, so move the driver to use the "correct" dev_groups field
>> > instead for struct bus_type.
>> >
>> > Cc: Benjamin Herrenschmidt 
>> > Cc: Paul Mackerras 
>> > Cc: Michael Ellerman 
>> > Cc: Vineet Gupta 
>> > Cc: Bart Van Assche 
>> > Cc: Robin Murphy 
>> > Cc: Joerg Roedel 
>> > Cc: Johan Hovold 
>> > Cc: Alexey Kardashevskiy 
>> > Cc: Krzysztof Kozlowski 
>> > Cc: 
>> > Signed-off-by: Greg Kroah-Hartman 
>> > ---
>> >  arch/powerpc/platforms/pseries/vio.c | 37 
>> > +---
>> >  1 file changed, 22 insertions(+), 15 deletions(-)
>> 
>> This one needed a bit more work to get building, the incremental diff is
>> below. We need a forward declaration of name, devspec and modalias,
>> which is a bit weird, but that's how the code is currently structured.
>> And there's dev and bus attributes with the same name, so that needed an
>> added "bus".
>> 
>> I booted v2 of patch 10 and this one and everything looks identical to
>> upstream.
>
> Ah, many thanks, this was on my todo list to fix up today.
>
> But you renamed the sysfs files when you added "bus" to the function
> names, are you sure you want to do that?  I don't mind, but if you
> happen to have userspace tools that look at those files, they just broke
> :(

Ugh crap, no that won't work.

I didn't see it when I tested because my machine doesn't have the CMO
feature enabled.

I guess we have to open code some of the BUS_ATTR_RO() etc. so we can
avoid the name clash.

I'll try and get it fixed later today.

cheers


Re: powerpc/book3s64: Move PPC_DT_CPU_FTRs and enable it by default

2017-06-08 Thread Michael Ellerman
On Thu, 2017-06-08 at 06:51:41 UTC, Michael Ellerman wrote:
> The PPC_DT_CPU_FTRs is a bit misplaced in menuconfig, it shows up with
> other general kernel options. It's really more at home in the "Platform
> Support" section, so move it there.
> 
> Also enable it by default, for Book3s 64. It does mostly nothing unless
> the device tree properties are found, and we will want it enabled
> eventually in distro kernels, so turn it on to start getting more
> testing.
> 
> Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for 
> discovering CPU features")
> Signed-off-by: Michael Ellerman 

Applied to powerpc fixes.

https://git.kernel.org/powerpc/c/c6ee9619e2edd9912316f7e2eaf9ff

cheers


Re: [v2] cxl: Fix error path on bad ioctl

2017-06-08 Thread Michael Ellerman
On Tue, 2017-06-06 at 09:43:41 UTC, Frederic Barrat wrote:
> Fix error path if we can't copy user structure on CXL_IOCTL_START_WORK
> ioctl. We shouldn't unlock the context status mutex as it was not
> locked (yet).
> 
> Signed-off-by: Frederic Barrat 
> Cc: sta...@vger.kernel.org
> Fixes: 0712dc7e73e5 ("cxl: Fix issues when unmapping contexts")

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/cec422c11caeeccae709e9942058b6

cheers


Re: powerpc/mm/4k: Limit 4k page size config to 64TB virtual address space

2017-06-08 Thread Michael Ellerman
On Thu, 2017-06-01 at 14:35:04 UTC, "Aneesh Kumar K.V" wrote:
> Supporting 512TB requires us to do a order 3 allocation for level 1 page
> table(pgd). This results in page allocation failures with certain workloads.
> For now limit 4k linux page size config to 64TB.
> 
> Signed-off-by: Aneesh Kumar K.V 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/92d9dfda8b547cc292af27e11e11c9

cheers


Re: Resend [PATCH] powerpc/hotplug-mem: Fix aa_index match bug for hotplug

2017-06-08 Thread Michael Ellerman
Michael Bringmann  writes:

> When adding or removing memory, the aa_index (affinity value) for the
> memblock must also be converted to match the endianness of the rest
> of the 'ibm,dynamic-memory' property.  Otherwise, subsequent retrieval
> of the attribute will likely lead to non-existent nodes, followed by
> using the default node in the code inappropriately.
>
> Note: This is a bug that we would likely want to see fixed in stable
>   circa v4.1+.

I merged this a week ago:

https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-June/158426.html

It was annotated:

Fixes: 5f97b2a0d176 ("powerpc/pseries: Implement memory hotplug add in the 
kernel")
Cc: sta...@vger.kernel.org # v4.1+

cheers


Re: [PATCH 03/44] dmaengine: ioat: don't use DMA_ERROR_CODE

2017-06-08 Thread Dave Jiang
On 06/08/2017 06:25 AM, Christoph Hellwig wrote:
> DMA_ERROR_CODE is not a public API and will go away.  Instead properly
> unwind based on the loop counter.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: Dave Jiang 

> ---
>  drivers/dma/ioat/init.c | 24 +++-
>  1 file changed, 7 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c
> index 6ad4384b3fa8..ed8ed1192775 100644
> --- a/drivers/dma/ioat/init.c
> +++ b/drivers/dma/ioat/init.c
> @@ -839,8 +839,6 @@ static int ioat_xor_val_self_test(struct ioatdma_device 
> *ioat_dma)
>   goto free_resources;
>   }
>  
> - for (i = 0; i < IOAT_NUM_SRC_TEST; i++)
> - dma_srcs[i] = DMA_ERROR_CODE;
>   for (i = 0; i < IOAT_NUM_SRC_TEST; i++) {
>   dma_srcs[i] = dma_map_page(dev, xor_srcs[i], 0, PAGE_SIZE,
>  DMA_TO_DEVICE);
> @@ -910,8 +908,6 @@ static int ioat_xor_val_self_test(struct ioatdma_device 
> *ioat_dma)
>  
>   xor_val_result = 1;
>  
> - for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++)
> - dma_srcs[i] = DMA_ERROR_CODE;
>   for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++) {
>   dma_srcs[i] = dma_map_page(dev, xor_val_srcs[i], 0, PAGE_SIZE,
>  DMA_TO_DEVICE);
> @@ -965,8 +961,6 @@ static int ioat_xor_val_self_test(struct ioatdma_device 
> *ioat_dma)
>   op = IOAT_OP_XOR_VAL;
>  
>   xor_val_result = 0;
> - for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++)
> - dma_srcs[i] = DMA_ERROR_CODE;
>   for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++) {
>   dma_srcs[i] = dma_map_page(dev, xor_val_srcs[i], 0, PAGE_SIZE,
>  DMA_TO_DEVICE);
> @@ -1017,18 +1011,14 @@ static int ioat_xor_val_self_test(struct 
> ioatdma_device *ioat_dma)
>   goto free_resources;
>  dma_unmap:
>   if (op == IOAT_OP_XOR) {
> - if (dest_dma != DMA_ERROR_CODE)
> - dma_unmap_page(dev, dest_dma, PAGE_SIZE,
> -DMA_FROM_DEVICE);
> - for (i = 0; i < IOAT_NUM_SRC_TEST; i++)
> - if (dma_srcs[i] != DMA_ERROR_CODE)
> - dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
> -DMA_TO_DEVICE);
> + while (--i >= 0)
> + dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
> +DMA_TO_DEVICE);
> + dma_unmap_page(dev, dest_dma, PAGE_SIZE, DMA_FROM_DEVICE);
>   } else if (op == IOAT_OP_XOR_VAL) {
> - for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++)
> - if (dma_srcs[i] != DMA_ERROR_CODE)
> - dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
> -DMA_TO_DEVICE);
> + while (--i >= 0)
> + dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
> +DMA_TO_DEVICE);
>   }
>  free_resources:
>   dma->device_free_chan_resources(dma_chan);
> 


[PATCH 00/35] defconfig: Cleanup from old entries

2017-06-08 Thread Krzysztof Kozlowski
Hi,

While cleaning Samsung ARM defconfigs with savedefconfig, I encountered
similar obsolete entries in other files.

Except the ARM, no dependencies.
For ARM, the rest of patches depend on the first change (otherwise
it might not apply cleanly).


Best regards,
Krzysztof

Krzysztof Kozlowski (35):
  ARM: defconfig: Cleanup from old Kconfig options
  MIPS: defconfig: Cleanup from old Kconfig options
  blackfin: defconfig: Cleanup from old Kconfig options
  c6x: defconfig: Cleanup from old Kconfig options
  m32r: defconfig: Cleanup from old Kconfig options
  sh: defconfig: Cleanup from old Kconfig options
  x86: defconfig: Cleanup from old Kconfig options
  cris: defconfig: Cleanup from old Kconfig options
  parisc: defconfig: Cleanup from old Kconfig options
  sparc: defconfig: Cleanup from old Kconfig options
  hexagon: defconfig: Cleanup from old Kconfig options
  microblaze: defconfig: Cleanup from old Kconfig options
  mn10300: defconfig: Cleanup from old Kconfig options
  score: defconfig: Cleanup from old Kconfig options
  unicore32: defconfig: Cleanup from old Kconfig options
  alpha: defconfig: Cleanup from old Kconfig options
  ARC: defconfig: Cleanup from old Kconfig options
  m68k: defconfig: Cleanup from old Kconfig options
  nios2: defconfig: Cleanup from old Kconfig options
  openrisc: defconfig: Cleanup from old Kconfig options
  powerpc: defconfig: Cleanup from old Kconfig options
  um: defconfig: Cleanup from old Kconfig options
  tile: defconfig: Cleanup from old Kconfig options
  ARM: defconfig: samsung: Re-order entries to match savedefconfig
  ARM: mini2440_defconfig: Bring back lost (but wanted) options
  ARM: tct_hammer_defconfig: Bring back lost (but wanted) options
  ARM: s3c2410_defconfig: Bring back lost (but wanted) options
  ARM: s3c6400_defconfig: Bring back lost (but wanted) options
  ARM: s5pv210_defconfig: Bring back lost (but wanted) options
  ARM: exynos_defconfig: Save defconfig
  ARM: s3c2410_defconfig: Save defconfig
  ARM: mini2440_defconfig: Save defconfig
  ARM: s3c6400_defconfig: Save defconfig
  ARM: s5pv210_defconfig: Save defconfig
  ARM: tct_hammer_defconfig: Save defconfig

 arch/alpha/defconfig  |  2 -
 arch/arc/configs/nps_defconfig|  1 -
 arch/arc/configs/tb10x_defconfig  |  1 -
 arch/arm/configs/acs5k_defconfig  |  8 --
 arch/arm/configs/acs5k_tiny_defconfig | 10 ---
 arch/arm/configs/am200epdkit_defconfig|  7 --
 arch/arm/configs/assabet_defconfig|  3 -
 arch/arm/configs/axm55xx_defconfig|  1 -
 arch/arm/configs/badge4_defconfig |  5 --
 arch/arm/configs/cerfcube_defconfig   |  4 -
 arch/arm/configs/cm_x2xx_defconfig| 13 
 arch/arm/configs/cm_x300_defconfig| 10 ---
 arch/arm/configs/cns3420vb_defconfig  |  7 --
 arch/arm/configs/colibri_pxa270_defconfig | 13 
 arch/arm/configs/colibri_pxa300_defconfig |  9 ---
 arch/arm/configs/collie_defconfig |  4 -
 arch/arm/configs/corgi_defconfig  | 13 
 arch/arm/configs/dove_defconfig   |  1 -
 arch/arm/configs/ebsa110_defconfig|  1 -
 arch/arm/configs/efm32_defconfig  |  1 -
 arch/arm/configs/em_x270_defconfig| 13 
 arch/arm/configs/ep93xx_defconfig |  1 -
 arch/arm/configs/eseries_pxa_defconfig|  8 --
 arch/arm/configs/exynos_defconfig |  6 +-
 arch/arm/configs/ezx_defconfig| 18 -
 arch/arm/configs/footbridge_defconfig |  1 -
 arch/arm/configs/h5000_defconfig  |  6 --
 arch/arm/configs/hackkit_defconfig|  3 -
 arch/arm/configs/imote2_defconfig | 16 
 arch/arm/configs/imx_v4_v5_defconfig  |  1 -
 arch/arm/configs/iop13xx_defconfig|  4 -
 arch/arm/configs/iop32x_defconfig |  5 --
 arch/arm/configs/iop33x_defconfig |  6 --
 arch/arm/configs/ixp4xx_defconfig |  9 ---
 arch/arm/configs/jornada720_defconfig |  8 --
 arch/arm/configs/ks8695_defconfig |  7 --
 arch/arm/configs/lart_defconfig   |  3 -
 arch/arm/configs/lpc18xx_defconfig|  1 -
 arch/arm/configs/lpd270_defconfig |  5 --
 arch/arm/configs/lubbock_defconfig|  4 -
 arch/arm/configs/magician_defconfig   | 15 
 arch/arm/configs/mainstone_defconfig  |  4 -
 arch/arm/configs/mini2440_defconfig   | 91 +++
 arch/arm/configs/mmp2_defconfig   |  9 ---
 arch/arm/configs/moxart_defconfig |  1 -
 arch/arm/configs/mps2_defconfig   |  1 -
 arch/arm/configs/mv78xx0_defconfig|  8 --
 arch/arm/configs/mxs_defconfig

[PATCH v10 10/10] powerpc/perf: Thread imc cpuhotplug support

2017-06-08 Thread Madhavan Srinivasan
From: Anju T Sudhakar 

Code to add support for thread IMC on cpuhotplug.
When a cpu goes offline, the LDBAR for that cpu is disabled, and when it comes
back online the previous ldbar value is written back to the LDBAR for that cpu.

To register the hotplug functions for thread_imc, a new state
CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE is added to the list of existing
states.

Signed-off-by: Anju T Sudhakar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/imc-pmu.c | 42 +++---
 include/linux/cpuhotplug.h  |  1 +
 2 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index ca8c0b8e157a..69bc411a82af 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -377,6 +377,37 @@ static int imc_mem_init(struct imc_pmu *pmu_ptr)
return 0;
 }
 
+static int ppc_thread_imc_cpu_online(unsigned int cpu)
+{
+   int rc = 0;
+   u64 ldbar_value;
+
+   if (per_cpu(thread_imc_mem, cpu) == NULL)
+   rc = thread_imc_mem_alloc(cpu, thread_imc_mem_size);
+
+   if (rc)
+   mtspr(SPRN_LDBAR, 0);
+
+   ldbar_value = ((u64)per_cpu(thread_imc_mem, cpu) & 
(u64)THREAD_IMC_LDBAR_MASK) |
+   (u64)THREAD_IMC_ENABLE;
+   mtspr(SPRN_LDBAR, ldbar_value);
+   return 0;
+}
+
+static int ppc_thread_imc_cpu_offline(unsigned int cpu)
+{
+   mtspr(SPRN_LDBAR, 0);
+   return 0;
+}
+
+void thread_imc_cpu_init(void)
+{
+   cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE,
+ "perf/powerpc/imc_thread:online",
+ ppc_thread_imc_cpu_online,
+ ppc_thread_imc_cpu_offline);
+}
+
 static int ppc_core_imc_cpu_online(unsigned int cpu)
 {
const struct cpumask *l_cpumask;
@@ -860,8 +891,8 @@ static void cleanup_all_thread_imc_memory(void)
int i;
 
for_each_online_cpu(i) {
-   if (per_cpu(thread_imc_mem, cpu))
-   free_pages(per_cpu(thread_imc_mem, cpu), 0);
+   if (per_cpu(thread_imc_mem, i))
+   free_pages((u64)per_cpu(thread_imc_mem, i), 0);
}
 }
 
@@ -899,6 +930,9 @@ int __init init_imc_pmu(struct imc_events *events, int idx,
if (ret)
return ret;
break;
+   case IMC_DOMAIN_THREAD:
+   thread_imc_cpu_init();
+   break;
default:
return -1;  /* Unknown domain */
}
@@ -933,8 +967,10 @@ int __init init_imc_pmu(struct imc_events *events, int idx,
}
 
/* For thread_imc, we have allocated memory, we need to free it */
-   if (pmu_ptr->domain == IMC_DOMAIN_THREAD)
+   if (pmu_ptr->domain == IMC_DOMAIN_THREAD) {
cleanup_all_thread_imc_memory();
+   cpuhp_remove_state(CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE);
+   }
 
return ret;
 }
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index e145fffec093..937d1ec8c3e9 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -141,6 +141,7 @@ enum cpuhp_state {
CPUHP_AP_PERF_ARM_QCOM_L3_ONLINE,
CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE,
CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE,
+   CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE,
CPUHP_AP_WORKQUEUE_ONLINE,
CPUHP_AP_RCUTREE_ONLINE,
CPUHP_AP_ONLINE_DYN,
-- 
2.7.4



[PATCH v10 09/10] powerpc/perf: Thread IMC PMU functions

2017-06-08 Thread Madhavan Srinivasan
From: Anju T Sudhakar 

Code to add PMU functions required for event initialization,
read, update, add, del etc. for thread IMC PMU. Thread IMC PMUs are used
for per-task monitoring.

For each CPU, a page of memory is allocated and is kept static i.e.,
these pages will exist till the machine shuts down. The base address of
this page is assigned to the ldbar of that cpu. As soon as we do that,
the thread IMC counters start running for that cpu and the data of these
counters are assigned to the page allocated. But we use this for
per-task monitoring. Whenever we start monitoring a task, the event is
added is onto the task. At that point, we read the initial value of the
event. Whenever, we stop monitoring the task, the final value is taken
and the difference is the event data.

Now, a task can move to a different cpu. Suppose a task X is moving from
cpu A to cpu B. When the task is scheduled out of A, we get an
event_del for A, and hence, the event data is updated. And, we stop
updating the X's event data. As soon as X moves on to B, event_add is
called for B, and we again update the event_data. And this is how it
keeps on updating the event data even when the task is scheduled on to
different cpus.

Signed-off-by: Anju T Sudhakar 
Signed-off-by: Hemant Kumar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/imc-pmu.h|   4 +
 arch/powerpc/perf/imc-pmu.c   | 219 +-
 arch/powerpc/platforms/powernv/opal-imc.c |   2 +
 3 files changed, 218 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
index 6659846c21ab..56b2ace565f4 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -43,6 +43,9 @@
 #define IMC_DTB_COMPAT "ibm,opal-in-memory-counters"
 #define IMC_DTB_UNIT_COMPAT"ibm,imc-counters"
 
+#define THREAD_IMC_LDBAR_MASK   0x0003e000
+#define THREAD_IMC_ENABLE   0x8000
+
 /*
  * Structure to hold memory address information for imc units.
  */
@@ -102,4 +105,5 @@ extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 extern struct imc_pmu *core_imc_pmu;
 extern int imc_control(unsigned long type, bool operation);
 extern int __init init_imc_pmu(struct imc_events *events, int idx, struct 
imc_pmu *pmu_ptr);
+void thread_imc_disable(void);
 #endif /* PPC_POWERNV_IMC_PMU_DEF_H */
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index e15a8df3d3b7..ca8c0b8e157a 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -18,6 +18,9 @@
 #include 
 #include 
 
+/* Maintains base address for all the cpus */
+static DEFINE_PER_CPU(u64 *, thread_imc_mem);
+
 /* Needed for sanity check */
 extern u64 nest_max_offset;
 extern u64 core_max_offset;
@@ -39,6 +42,7 @@ static struct cpumask imc_result_mask;
 static DEFINE_MUTEX(imc_control_mutex);
 
 struct imc_pmu *core_imc_pmu;
+static int thread_imc_mem_size;
 
 struct imc_pmu *imc_event_to_pmu(struct perf_event *event)
 {
@@ -317,18 +321,59 @@ bool is_core_imc_mem_inited(int cpu)
 }
 
 /*
- * imc_mem_init : Function to support memory allocation for core imc.
+ * Allocates a page of memory for each of the online cpus, and, writes the
+ * physical base address of that page to the LDBAR for that cpu. This starts
+ * the thread IMC counters.
+ */
+static int thread_imc_mem_alloc(int cpu_id, int size)
+{
+   u64 ldbar_value, *local_mem;
+   int phys_id = topology_physical_package_id(cpu_id);
+
+   if (per_cpu(thread_imc_mem, cpu_id) != NULL)
+   return 0;
+
+   local_mem = alloc_pages_exact_nid(phys_id,
+   (size_t)size, GFP_KERNEL | __GFP_ZERO);
+   if (!local_mem)
+   return -ENOMEM;
+
+   per_cpu(thread_imc_mem, cpu_id) = local_mem;
+
+   ldbar_value = ((u64)local_mem & (u64)THREAD_IMC_LDBAR_MASK) |
+   (u64)THREAD_IMC_ENABLE;
+
+   mtspr(SPRN_LDBAR, ldbar_value);
+   return 0;
+}
+
+/*
+ * imc_mem_init : Function to support memory allocation for core and thread 
imc.
  */
 static int imc_mem_init(struct imc_pmu *pmu_ptr)
 {
-   int nr_cores;
+   int nr_cores, cpu, res;
 
if (pmu_ptr->imc_counter_mmaped)
return 0;
-   nr_cores = num_present_cpus() / threads_per_core;
-   pmu_ptr->mem_info = kzalloc((sizeof(struct imc_mem_info) * nr_cores), 
GFP_KERNEL);
-   if (!pmu_ptr->mem_info)
-   return -ENOMEM;
+   switch (pmu_ptr->domain) {
+   case IMC_DOMAIN_CORE:
+   nr_cores = num_present_cpus() / threads_per_core;
+   pmu_ptr->mem_info = kzalloc((sizeof(struct imc_mem_info) * 
nr_cores), GFP_KERNEL);
+   if (!pmu_ptr->mem_info)
+   return -ENOMEM;
+

[PATCH v10 07/10] powerpc/perf: PMU functions for Core IMC and hotplugging

2017-06-08 Thread Madhavan Srinivasan
Code to add PMU function to initialize a core IMC event. It also
adds cpumask initialization function for core IMC PMU. For
initialization, memory is allocated per core where the data
for core IMC counters will be accumulated. The base address for this
page is sent to OPAL via an OPAL call which initializes various SCOMs
related to Core IMC initialization. Upon any errors, the pages are
free'ed and core IMC counters are disabled using the same OPAL call.

For CPU hotplugging, a cpumask is initialized which contains an online
CPU from each core. If a cpu goes offline, we check whether that cpu
belongs to the core imc cpumask, if yes, then, we migrate the PMU
context to any other online cpu (if available) in that core. If a cpu
comes back online, then this cpu will be added to the core imc cpumask
only if there was no other cpu from that core in the previous cpumask.

To register the hotplug functions for core_imc, a new state
CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE is added to the list of existing
states.

Patch also adds OPAL device shutdown callback. Needed to disable the
IMC core engine to handle kexec.

Signed-off-by: Hemant Kumar 
Signed-off-by: Anju T Sudhakar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/imc-pmu.h|   1 +
 arch/powerpc/include/asm/opal-api.h   |   1 +
 arch/powerpc/perf/imc-pmu.c   | 265 +-
 arch/powerpc/platforms/powernv/opal-imc.c |   7 +
 include/linux/cpuhotplug.h|   1 +
 5 files changed, 269 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
index 0f018b6510c0..88bc7ef6cc1e 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -99,5 +99,6 @@ struct imc_pmu {
 
 extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 extern struct imc_pmu *core_imc_pmu;
+extern int imc_control(unsigned long type, bool operation);
 extern int __init init_imc_pmu(struct imc_events *events, int idx, struct 
imc_pmu *pmu_ptr);
 #endif /* PPC_POWERNV_IMC_PMU_DEF_H */
diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 7056b977df53..7a58617490cf 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -1018,6 +1018,7 @@ enum {
 /* Argument to OPAL_IMC_COUNTERS_*  */
 enum {
OPAL_IMC_COUNTERS_NEST = 1,
+   OPAL_IMC_COUNTERS_CORE = 2,
 };
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 08b363ac35ac..cdee29b92945 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -1,5 +1,5 @@
 /*
- * Nest Performance Monitor counter support.
+ * IMC Performance Monitor counter support.
  *
  * Copyright (C) 2017 Madhavan Srinivasan, IBM Corporation.
  *   (C) 2017 Anju T Sudhakar, IBM Corporation.
@@ -24,11 +24,15 @@ extern u64 core_max_offset;
 
 struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 static cpumask_t nest_imc_cpumask;
+static cpumask_t core_imc_cpumask;
 static int nest_imc_cpumask_initialized;
 
 static atomic_t nest_events;
+static atomic_t core_events;
 /* Used to avoid races in calling enable/disable nest-pmu units */
 static DEFINE_MUTEX(imc_nest_reserve);
+/* Used to avoid races in calling enable/disable nest-pmu units */
+static DEFINE_MUTEX(imc_core_reserve);
 
 static struct cpumask imc_result_mask;
 static DEFINE_MUTEX(imc_control_mutex);
@@ -56,9 +60,15 @@ static ssize_t imc_pmu_cpumask_get_attr(struct device *dev,
struct device_attribute *attr,
char *buf)
 {
+   struct pmu *pmu = dev_get_drvdata(dev);
cpumask_t *active_mask;
 
-   active_mask = _imc_cpumask;
+   if (!strncmp(pmu->name, "nest_", strlen("nest_")))
+   active_mask = _imc_cpumask;
+   else if (!strncmp(pmu->name, "core_", strlen("core_")))
+   active_mask = _imc_cpumask;
+   else
+   return 0;
return cpumap_print_to_pagebuf(true, buf, active_mask);
 }
 
@@ -107,6 +117,9 @@ int imc_control(unsigned long type, bool operation)
case OPAL_IMC_COUNTERS_NEST:
imc_domain_mask = _imc_cpumask;
break;
+   case OPAL_IMC_COUNTERS_CORE:
+   imc_domain_mask = _imc_cpumask;
+   break;
default:
res = -EINVAL;
goto out;
@@ -223,6 +236,160 @@ static void nest_imc_counters_release(struct perf_event 
*event)
}
 }
 
+static void cleanup_all_core_imc_memory(struct imc_pmu *pmu_ptr)
+{
+   struct imc_mem_info *ptr;
+
+   for (ptr = pmu_ptr->mem_info; ptr; ptr++) {
+   if (ptr->vbase[0])
+   free_pages((u64)ptr->vbase[0], 0);
+   }
+
+   kfree(pmu_ptr->mem_info);
+}
+
+static void 

[PATCH v10 08/10] powerpc/powernv: Thread IMC events detection

2017-06-08 Thread Madhavan Srinivasan
From: Anju T Sudhakar 

Code to add support for detection of thread IMC events. It adds a new
domain IMC_DOMAIN_THREAD and it is determined with the help of the
"type" property in the imc device-tree.

Signed-off-by: Anju T Sudhakar 
Signed-off-by: Hemant Kumar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/imc-pmu.h| 1 +
 arch/powerpc/include/asm/opal-api.h   | 1 +
 arch/powerpc/perf/imc-pmu.c   | 1 +
 arch/powerpc/platforms/powernv/opal-imc.c | 9 -
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
index 88bc7ef6cc1e..6659846c21ab 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -96,6 +96,7 @@ struct imc_pmu {
  */
 #define IMC_DOMAIN_NEST1
 #define IMC_DOMAIN_CORE2
+#define IMC_DOMAIN_THREAD  3
 
 extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 extern struct imc_pmu *core_imc_pmu;
diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 7a58617490cf..40a4df6e161b 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -1008,6 +1008,7 @@ enum {
 
 /* In-Memory Collection Counters Type */
 enum {
+   IMC_COUNTER_PER_THREAD  = 0x1,
IMC_COUNTER_PER_SUB_CORE= 0x2,
IMC_COUNTER_PER_CORE= 0x4,
IMC_COUNTER_PER_QUAD= 0x8,
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index cdee29b92945..e15a8df3d3b7 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -21,6 +21,7 @@
 /* Needed for sanity check */
 extern u64 nest_max_offset;
 extern u64 core_max_offset;
+extern u64 thread_max_offset;
 
 struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 static cpumask_t nest_imc_cpumask;
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index a3f48866ec12..5fa008cb1b93 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -36,6 +36,7 @@
 
 u64 nest_max_offset;
 u64 core_max_offset;
+u64 thread_max_offset;
 
 static int imc_event_prop_update(char *name, struct imc_events *events)
 {
@@ -120,6 +121,10 @@ static void update_max_value(u32 value, int pmu_domain)
if (core_max_offset < value)
core_max_offset = value;
break;
+   case IMC_DOMAIN_THREAD:
+   if (thread_max_offset < value)
+   thread_max_offset = value;
+   break;
default:
/* Unknown domain, return */
return;
@@ -404,7 +409,7 @@ static int imc_get_mem_addr_nest(struct device_node *node,
 /*
  * imc_pmu_create : Takes the parent device which is the pmu unit, pmu_index
  * and domain as the inputs.
- * Allocates memory for the pmu, sets up its domain (NEST/CORE), and
+ * Allocates memory for the pmu, sets up its domain (NEST/CORE/THREAD), and
  * calls imc_events_setup() to allocate memory for the events supported
  * by this pmu. Assigns a name for the pmu.
  *
@@ -523,6 +528,8 @@ static int opal_imc_counters_probe(struct platform_device 
*pdev)
domain = IMC_DOMAIN_NEST;
else if (type == IMC_COUNTER_PER_CORE)
domain = IMC_DOMAIN_CORE;
+   else if (type == IMC_COUNTER_PER_THREAD)
+   domain = IMC_DOMAIN_THREAD;
else
continue;
if (!imc_pmu_create(imc_dev, pmu_count, domain))
-- 
2.7.4



[PATCH v10 06/10] powerpc/powernv: Core IMC events detection

2017-06-08 Thread Madhavan Srinivasan
This patch adds support for detection of core IMC events along with the
Nest IMC events. It adds a new domain IMC_DOMAIN_CORE and its determined
with the help of the "type" property in the IMC device tree.

Signed-off-by: Anju T Sudhakar 
Signed-off-by: Hemant Kumar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/imc-pmu.h|  2 ++
 arch/powerpc/include/asm/opal-api.h   |  3 +++
 arch/powerpc/perf/imc-pmu.c   |  4 
 arch/powerpc/platforms/powernv/opal-imc.c | 19 ---
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
index 865cd756bfb6..0f018b6510c0 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -95,7 +95,9 @@ struct imc_pmu {
  * Domains for IMC PMUs
  */
 #define IMC_DOMAIN_NEST1
+#define IMC_DOMAIN_CORE2
 
 extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+extern struct imc_pmu *core_imc_pmu;
 extern int __init init_imc_pmu(struct imc_events *events, int idx, struct 
imc_pmu *pmu_ptr);
 #endif /* PPC_POWERNV_IMC_PMU_DEF_H */
diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index ebe9d6bd3f06..7056b977df53 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -1008,6 +1008,9 @@ enum {
 
 /* In-Memory Collection Counters Type */
 enum {
+   IMC_COUNTER_PER_SUB_CORE= 0x2,
+   IMC_COUNTER_PER_CORE= 0x4,
+   IMC_COUNTER_PER_QUAD= 0x8,
IMC_COUNTER_PER_CHIP= 0x10,
IMC_COUNTER_PER_SOCKET  = 0x20,
 };
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index d3691d4d5f4b..08b363ac35ac 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -20,6 +20,8 @@
 
 /* Needed for sanity check */
 extern u64 nest_max_offset;
+extern u64 core_max_offset;
+
 struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 static cpumask_t nest_imc_cpumask;
 static int nest_imc_cpumask_initialized;
@@ -31,6 +33,8 @@ static DEFINE_MUTEX(imc_nest_reserve);
 static struct cpumask imc_result_mask;
 static DEFINE_MUTEX(imc_control_mutex);
 
+struct imc_pmu *core_imc_pmu;
+
 struct imc_pmu *imc_event_to_pmu(struct perf_event *event)
 {
return container_of(event->pmu, struct imc_pmu, pmu);
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index c2a6019536a1..be6158a098dd 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -35,6 +35,7 @@
 #include 
 
 u64 nest_max_offset;
+u64 core_max_offset;
 
 static int imc_event_prop_update(char *name, struct imc_events *events)
 {
@@ -115,6 +116,10 @@ static void update_max_value(u32 value, int pmu_domain)
if (nest_max_offset < value)
nest_max_offset = value;
break;
+   case IMC_DOMAIN_CORE:
+   if (core_max_offset < value)
+   core_max_offset = value;
+   break;
default:
/* Unknown domain, return */
return;
@@ -399,7 +404,7 @@ static int imc_get_mem_addr_nest(struct device_node *node,
 /*
  * imc_pmu_create : Takes the parent device which is the pmu unit, pmu_index
  * and domain as the inputs.
- * Allocates memory for the pmu, sets up its domain (NEST), and
+ * Allocates memory for the pmu, sets up its domain (NEST/CORE), and
  * calls imc_events_setup() to allocate memory for the events supported
  * by this pmu. Assigns a name for the pmu.
  *
@@ -426,7 +431,10 @@ static int imc_pmu_create(struct device_node *parent, int 
pmu_index, int domain)
pmu_ptr->domain = domain;
 
/* Needed for hotplug/migration */
-   per_nest_pmu_arr[pmu_index] = pmu_ptr;
+   if (pmu_ptr->domain == IMC_DOMAIN_CORE)
+   core_imc_pmu = pmu_ptr;
+   else if (pmu_ptr->domain == IMC_DOMAIN_NEST)
+   per_nest_pmu_arr[pmu_index] = pmu_ptr;
 
pp = of_find_property(parent, "name", NULL);
if (!pp) {
@@ -447,7 +455,10 @@ static int imc_pmu_create(struct device_node *parent, int 
pmu_index, int domain)
goto free_pmu;
}
/* Save the name to register it later */
-   sprintf(buf, "nest_%s", (char *)pp->value);
+   if (pmu_ptr->domain == IMC_DOMAIN_NEST)
+   sprintf(buf, "nest_%s", (char *)pp->value);
+   else
+   sprintf(buf, "%s_imc", (char *)pp->value);
pmu_ptr->pmu.name = (char *)buf;
 
if (of_property_read_u32(parent, "size", _ptr->counter_mem_size))
@@ -510,6 +521,8 @@ static int opal_imc_counters_probe(struct platform_device 
*pdev)
continue;
if (type == IMC_COUNTER_PER_CHIP)
 

[PATCH v10 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support

2017-06-08 Thread Madhavan Srinivasan
From: Anju T Sudhakar 

Adds cpumask attribute to be used by each IMC pmu. Only one cpu (any
online CPU) from each chip for nest PMUs is designated to read counters.

On CPU hotplug, dying CPU is checked to see whether it is one of the
designated cpus, if yes, next online cpu from the same chip (for nest
units) is designated as new cpu to read counters. For this purpose, we
introduce a new state : CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE.

Signed-off-by: Anju T Sudhakar 
Signed-off-by: Hemant Kumar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/opal-api.h|  14 +-
 arch/powerpc/include/asm/opal.h|   4 +
 arch/powerpc/perf/imc-pmu.c| 211 -
 arch/powerpc/platforms/powernv/opal-wrappers.S |   3 +
 include/linux/cpuhotplug.h |   1 +
 5 files changed, 229 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index aa150f03a2d7..ebe9d6bd3f06 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -190,7 +190,10 @@
 #define OPAL_NPU_INIT_CONTEXT  146
 #define OPAL_NPU_DESTROY_CONTEXT   147
 #define OPAL_NPU_MAP_LPAR  148
-#define OPAL_LAST  148
+#define OPAL_IMC_COUNTERS_INIT 149
+#define OPAL_IMC_COUNTERS_START150
+#define OPAL_IMC_COUNTERS_STOP 151
+#define OPAL_LAST  151
 
 /* Device tree flags */
 
@@ -1005,8 +1008,13 @@ enum {
 
 /* In-Memory Collection Counters Type */
 enum {
-   IMC_COUNTER_PER_CHIP= 0x10,
-   IMC_COUNTER_PER_SOCKET  = 0x20,
+   IMC_COUNTER_PER_CHIP= 0x10,
+   IMC_COUNTER_PER_SOCKET  = 0x20,
+};
+
+/* Argument to OPAL_IMC_COUNTERS_*  */
+enum {
+   OPAL_IMC_COUNTERS_NEST = 1,
 };
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 588fb1c23af9..cdf78b6a9245 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -268,6 +268,10 @@ int64_t opal_xive_free_irq(uint32_t girq);
 int64_t opal_xive_sync(uint32_t type, uint32_t id);
 int64_t opal_xive_dump(uint32_t type, uint32_t id);
 
+int64_t opal_imc_counters_init(uint32_t type, uint64_t address, uint64_t cpu);
+int64_t opal_imc_counters_start(uint32_t type);
+int64_t opal_imc_counters_stop(uint32_t type);
+
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
   int depth, void *data);
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index e410788af10f..d3691d4d5f4b 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -21,6 +21,15 @@
 /* Needed for sanity check */
 extern u64 nest_max_offset;
 struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+static cpumask_t nest_imc_cpumask;
+static int nest_imc_cpumask_initialized;
+
+static atomic_t nest_events;
+/* Used to avoid races in calling enable/disable nest-pmu units */
+static DEFINE_MUTEX(imc_nest_reserve);
+
+static struct cpumask imc_result_mask;
+static DEFINE_MUTEX(imc_control_mutex);
 
 struct imc_pmu *imc_event_to_pmu(struct perf_event *event)
 {
@@ -38,9 +47,181 @@ static struct attribute_group imc_format_group = {
.attrs = imc_format_attrs,
 };
 
+/* Get the cpumask printed to a buffer "buf" */
+static ssize_t imc_pmu_cpumask_get_attr(struct device *dev,
+   struct device_attribute *attr,
+   char *buf)
+{
+   cpumask_t *active_mask;
+
+   active_mask = _imc_cpumask;
+   return cpumap_print_to_pagebuf(true, buf, active_mask);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, imc_pmu_cpumask_get_attr, NULL);
+
+static struct attribute *imc_pmu_cpumask_attrs[] = {
+   _attr_cpumask.attr,
+   NULL,
+};
+
+static struct attribute_group imc_pmu_cpumask_attr_group = {
+   .attrs = imc_pmu_cpumask_attrs,
+};
+
+/*
+ * opal_imc_stop : Does OPAL call to stop imc engine.
+ */
+static void opal_imc_stop(void *type)
+{
+   if (opal_imc_counters_stop((unsigned long) type))
+   cpumask_set_cpu(smp_processor_id(), _result_mask);
+}
+
+/* opal_imc_start: Does the OPAL call to start imc engine */
+static void opal_imc_start(void *type)
+{
+   if (opal_imc_counters_start((unsigned long) type))
+   cpumask_set_cpu(smp_processor_id(), _result_mask);
+}
+
+/*
+ * imc_control: Function to start and stop imc engine.
+ * The function is called from event init, event destroy and
+ * device shutdown.
+ */
+int imc_control(unsigned long type, bool operation)
+{
+   smp_call_func_t fn;
+   int res;
+   cpumask_t *imc_domain_mask;
+
+ 

[PATCH v10 04/10] powerpc/perf: Add generic IMC pmu group and event functions

2017-06-08 Thread Madhavan Srinivasan
From: Anju T Sudhakar 

Device tree IMC driver code parses the IMC units and their events. It
passes the information to IMC pmu code which is placed in powerpc/perf
as "imc-pmu.c".

Patch adds a set of generic imc pmu related event functions to be
used  by each imc pmu unit. Add code to setup format attribute and to
register imc pmus. Add a event_init function for nest_imc events.

Since, the IMC counters' data are periodically fed to a memory location,
the functions to read/update, start/stop, add/del can be generic and can
be used by all IMC PMU units.

Signed-off-by: Anju T Sudhakar 
Signed-off-by: Hemant Kumar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/imc-pmu.h|   2 +
 arch/powerpc/perf/Makefile|   3 +
 arch/powerpc/perf/imc-pmu.c   | 268 ++
 arch/powerpc/platforms/powernv/opal-imc.c |  10 +-
 4 files changed, 282 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/perf/imc-pmu.c

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
index ffaea0b9c13e..865cd756bfb6 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -96,4 +96,6 @@ struct imc_pmu {
  */
 #define IMC_DOMAIN_NEST1
 
+extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+extern int __init init_imc_pmu(struct imc_events *events, int idx, struct 
imc_pmu *pmu_ptr);
 #endif /* PPC_POWERNV_IMC_PMU_DEF_H */
diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index 4d606b99a5cb..b29d918814d3 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -6,6 +6,9 @@ obj-$(CONFIG_PPC_PERF_CTRS) += core-book3s.o bhrb.o
 obj64-$(CONFIG_PPC_PERF_CTRS)  += power4-pmu.o ppc970-pmu.o power5-pmu.o \
   power5+-pmu.o power6-pmu.o power7-pmu.o \
   isa207-common.o power8-pmu.o power9-pmu.o
+
+obj-$(CONFIG_HV_PERF_IMC_CTRS) += imc-pmu.o
+
 obj32-$(CONFIG_PPC_PERF_CTRS)  += mpc7450-pmu.o
 
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
new file mode 100644
index ..e410788af10f
--- /dev/null
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -0,0 +1,268 @@
+/*
+ * Nest Performance Monitor counter support.
+ *
+ * Copyright (C) 2017 Madhavan Srinivasan, IBM Corporation.
+ *   (C) 2017 Anju T Sudhakar, IBM Corporation.
+ *   (C) 2017 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or later version.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Needed for sanity check */
+extern u64 nest_max_offset;
+struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+
+struct imc_pmu *imc_event_to_pmu(struct perf_event *event)
+{
+   return container_of(event->pmu, struct imc_pmu, pmu);
+}
+
+PMU_FORMAT_ATTR(event, "config:0-20");
+static struct attribute *imc_format_attrs[] = {
+   _attr_event.attr,
+   NULL,
+};
+
+static struct attribute_group imc_format_group = {
+   .name = "format",
+   .attrs = imc_format_attrs,
+};
+
+static int nest_imc_event_init(struct perf_event *event)
+{
+   int chip_id;
+   u32 config = event->attr.config;
+   struct imc_mem_info *pcni;
+   struct imc_pmu *pmu;
+   bool flag = false;
+
+   if (event->attr.type != event->pmu->type)
+   return -ENOENT;
+
+   /* Sampling not supported */
+   if (event->hw.sample_period)
+   return -EINVAL;
+
+   /* unsupported modes and filters */
+   if (event->attr.exclude_user   ||
+   event->attr.exclude_kernel ||
+   event->attr.exclude_hv ||
+   event->attr.exclude_idle   ||
+   event->attr.exclude_host   ||
+   event->attr.exclude_guest)
+   return -EINVAL;
+
+   if (event->cpu < 0)
+   return -EINVAL;
+
+   /* Sanity check for config (event offset) */
+   if (config > nest_max_offset)
+   return -EINVAL;
+
+   chip_id = topology_physical_package_id(event->cpu);
+   pmu = imc_event_to_pmu(event);
+   pcni = pmu->mem_info;
+   do {
+   if (pcni->id == chip_id) {
+   flag = true;
+   break;
+   }
+   pcni++;
+   }while(pcni);
+   if (!flag)
+   return -ENODEV;
+   /*
+* Memory for Nest HW counter data could be in multiple pages.
+* Hence check and pick the right event base page for chip with
+* "chip_id" and add "config" to it".
+*/
+   event->hw.event_base = 

[PATCH v10 03/10] powerpc/powernv: Detect supported IMC units and its events

2017-06-08 Thread Madhavan Srinivasan
Parse device tree to detect IMC units. Traverse through each IMC unit
node to find supported events and corresponding unit/scale files (if any).

The device tree for IMC counters starts at the node "imc-counters".
This node contains all the IMC PMU nodes and event nodes
for these IMC PMUs. The PMU nodes have an "events" property which has a
phandle value for the actual events node. The events are separated from
the PMU nodes to abstract out the common events. For example, PMU node
"mcs0", "mcs1" etc. will contain a pointer to "nest-mcs-events" since,
the events are common between these PMUs. These events have a different
prefix based on their relation to different PMUs, and hence, the PMU
nodes themselves contain an "events-prefix" property. The value for this
property concatenated to the event name, forms the actual event
name. Also, the PMU have a "reg" field as the base offset for the events
which belong to this PMU. This "reg" field is added to event's "reg" field
in the "events" node, which gives us the location of the counter data. Kernel
code uses this offset as event configuration value.

Device tree parser code also looks for scale/unit property in the event
node and passes on the value as an event attr for perf interface to use
in the post processing by the perf tool. Some PMUs may have common scale
and unit properties which implies that all events supported by this PMU
inherit the scale and unit properties of the PMU itself. For those
events, we need to set the common unit and scale values.

For failure to initialize any unit or any event, disable that unit and
continue setting up the rest of them.

Signed-off-by: Hemant Kumar 
Signed-off-by: Anju T Sudhakar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/opal-api.h   |   6 +
 arch/powerpc/platforms/powernv/opal-imc.c | 458 +-
 2 files changed, 463 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index cb3e6242a78c..aa150f03a2d7 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -1003,6 +1003,12 @@ enum {
XIVE_DUMP_EMU_STATE = 5,
 };
 
+/* In-Memory Collection Counters Type */
+enum {
+   IMC_COUNTER_PER_CHIP= 0x10,
+   IMC_COUNTER_PER_SOCKET  = 0x20,
+};
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __OPAL_API_H */
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index 5b1045c81af4..6a38f7c9ea66 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -34,9 +34,456 @@
 #include 
 #include 
 
+u64 nest_max_offset;
+struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+
+static int imc_event_prop_update(char *name, struct imc_events *events)
+{
+   char *buf;
+
+   if (!events || !name)
+   return -EINVAL;
+
+   /* memory for content */
+   buf = kzalloc(IMC_MAX_NAME_VAL_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   events->ev_name = name;
+   events->ev_value = buf;
+   return 0;
+}
+
+static int imc_event_prop_str(struct property *pp, char *name,
+ struct imc_events *events)
+{
+   int ret;
+
+   ret = imc_event_prop_update(name, events);
+   if (ret)
+   return ret;
+
+   if (!pp->value || (strnlen(pp->value, pp->length) == pp->length) ||
+  (pp->length > IMC_MAX_NAME_VAL_LEN))
+   return -EINVAL;
+   strncpy(events->ev_value, (const char *)pp->value, pp->length);
+
+   return 0;
+}
+
+static int imc_event_prop_val(char *name, u32 val,
+ struct imc_events *events)
+{
+   int ret;
+
+   ret = imc_event_prop_update(name, events);
+   if (ret)
+   return ret;
+   snprintf(events->ev_value, IMC_MAX_NAME_VAL_LEN, "event=0x%x", val);
+
+   return 0;
+}
+
+static int set_event_property(struct property *pp, char *event_prop,
+ struct imc_events *events, char *ev_name)
+{
+   char *buf;
+   int ret;
+
+   buf = kzalloc(IMC_MAX_NAME_VAL_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   sprintf(buf, "%s.%s", ev_name, event_prop);
+   ret = imc_event_prop_str(pp, buf, events);
+   if (ret) {
+   if (events->ev_name)
+   kfree(events->ev_name);
+   if (events->ev_value)
+   kfree(events->ev_value);
+   }
+   return ret;
+}
+
+/*
+ * Updates the maximum offset for an event in the pmu with domain
+ * "pmu_domain".
+ */
+static void update_max_value(u32 value, int pmu_domain)
+{
+   switch (pmu_domain) {
+   case IMC_DOMAIN_NEST:
+   if (nest_max_offset < value)
+   nest_max_offset = value;
+  

[PATCH v10 02/10] powerpc/powernv: Autoload IMC device driver module

2017-06-08 Thread Madhavan Srinivasan
From: Anju T Sudhakar 

Code to create platform device for the IMC counters.
Paltform devices are created based on the IMC compatibility
string.

New Config flag "CONFIG_HV_PERF_IMC_CTRS" add to contain the
IMC counter changes.

Signed-off-by: Anju T Sudhakar 
Signed-off-by: Hemant Kumar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/platforms/powernv/Kconfig| 10 +
 arch/powerpc/platforms/powernv/Makefile   |  1 +
 arch/powerpc/platforms/powernv/opal-imc.c | 73 +++
 arch/powerpc/platforms/powernv/opal.c | 18 
 4 files changed, 102 insertions(+)
 create mode 100644 arch/powerpc/platforms/powernv/opal-imc.c

diff --git a/arch/powerpc/platforms/powernv/Kconfig 
b/arch/powerpc/platforms/powernv/Kconfig
index 6a6f4ef46b9e..543c6cd5e8d3 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -30,3 +30,13 @@ config OPAL_PRD
help
  This enables the opal-prd driver, a facility to run processor
  recovery diagnostics on OpenPower machines
+
+config HV_PERF_IMC_CTRS
+   bool "Hypervisor supplied In Memory Collection PMU events (Nest & Core)"
+   default y
+   depends on PERF_EVENTS && PPC_POWERNV
+   help
+ Enable access to hypervisor supplied in-memory collection counters
+ in perf. IMC counters are available from Power9 systems.
+
+  If unsure, select Y.
diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index b5d98cb3f482..715e531f6711 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -12,3 +12,4 @@ obj-$(CONFIG_PPC_SCOM)+= opal-xscom.o
 obj-$(CONFIG_MEMORY_FAILURE)   += opal-memory-errors.o
 obj-$(CONFIG_TRACEPOINTS)  += opal-tracepoints.o
 obj-$(CONFIG_OPAL_PRD) += opal-prd.o
+obj-$(CONFIG_HV_PERF_IMC_CTRS) += opal-imc.o
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
new file mode 100644
index ..5b1045c81af4
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -0,0 +1,73 @@
+/*
+ * OPAL IMC interface detection driver
+ * Supported on POWERNV platform
+ *
+ * Copyright   (C) 2017 Madhavan Srinivasan, IBM Corporation.
+ * (C) 2017 Anju T Sudhakar, IBM Corporation.
+ * (C) 2017 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int opal_imc_counters_probe(struct platform_device *pdev)
+{
+   struct device_node *imc_dev = NULL;
+
+   if (!pdev || !pdev->dev.of_node)
+   return -ENODEV;
+
+   /*
+* Check whether this is kdump kernel. If yes, just return.
+*/
+   if (is_kdump_kernel())
+   return -ENODEV;
+
+   imc_dev = pdev->dev.of_node;
+   if (!imc_dev)
+   return -ENODEV;
+
+   return 0;
+}
+
+static const struct of_device_id opal_imc_match[] = {
+   { .compatible = IMC_DTB_COMPAT },
+   {},
+};
+
+static struct platform_driver opal_imc_driver = {
+   .driver = {
+   .name = "opal-imc-counters",
+   .of_match_table = opal_imc_match,
+   },
+   .probe = opal_imc_counters_probe,
+};
+
+MODULE_DEVICE_TABLE(of, opal_imc_match);
+module_platform_driver(opal_imc_driver);
+MODULE_DESCRIPTION("PowerNV OPAL IMC driver");
+MODULE_LICENSE("GPL");
diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index 59684b4af4d1..fbdca259ea76 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -30,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "powernv.h"
 
@@ -705,6 +707,17 @@ static void opal_pdev_init(const char *compatible)
of_platform_device_create(np, NULL, NULL);
 }
 
+#ifdef CONFIG_HV_PERF_IMC_CTRS
+static void __init opal_imc_init_dev(void)
+{
+   struct device_node *np;
+
+   np = of_find_compatible_node(NULL, NULL, IMC_DTB_COMPAT);
+   if (np)
+   of_platform_device_create(np, NULL, NULL);
+}
+#endif
+
 static int kopald(void *unused)
 

[PATCH v10 01/10] powerpc/powernv: Data structure and macros definitions for IMC

2017-06-08 Thread Madhavan Srinivasan
Create a new header file to add the data structures and
macros needed for In-Memory Collection (IMC) counter support.

Signed-off-by: Anju T Sudhakar 
Signed-off-by: Hemant Kumar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/imc-pmu.h | 99 ++
 1 file changed, 99 insertions(+)
 create mode 100644 arch/powerpc/include/asm/imc-pmu.h

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
new file mode 100644
index ..ffaea0b9c13e
--- /dev/null
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -0,0 +1,99 @@
+#ifndef PPC_POWERNV_IMC_PMU_DEF_H
+#define PPC_POWERNV_IMC_PMU_DEF_H
+
+/*
+ * IMC Nest Performance Monitor counter support.
+ *
+ * Copyright (C) 2017 Madhavan Srinivasan, IBM Corporation.
+ *   (C) 2017 Anju T Sudhakar, IBM Corporation.
+ *   (C) 2017 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * For static allocation of some of the structures.
+ */
+#define IMC_MAX_PMUS   32
+
+/*
+ * This macro is used for memory buffer allocation of
+ * event names and event string
+ */
+#define IMC_MAX_NAME_VAL_LEN   96
+
+/*
+ * Currently Microcode supports a max of 256KB of counter memory
+ * in the reserved memory region. Max pages to mmap (considering 4K PAGESIZE).
+ */
+#define IMC_MAX_PAGES  64
+
+/*
+ *Compatbility macros for IMC devices
+ */
+#define IMC_DTB_COMPAT "ibm,opal-in-memory-counters"
+#define IMC_DTB_UNIT_COMPAT"ibm,imc-counters"
+
+/*
+ * Structure to hold memory address information for imc units.
+ */
+struct imc_mem_info {
+   u32 id;
+   u64 *vbase[IMC_MAX_PAGES];
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct imc_events {
+   char *ev_name;
+   char *ev_value;
+};
+
+#define IMC_FORMAT_ATTR0
+#define IMC_CPUMASK_ATTR   1
+#define IMC_EVENT_ATTR 2
+#define IMC_NULL_ATTR  3
+
+/*
+ * Device tree parser code detects IMC pmu support and
+ * registers new IMC pmus. This structure will hold the
+ * pmu functions, events, counter memory information
+ * and attrs for each imc pmu and will be referenced at
+ * the time of pmu registration.
+ */
+struct imc_pmu {
+   struct pmu pmu;
+   int domain;
+   /*
+* flag to notify whether the memory is mmaped
+* or allocated by kernel.
+*/
+   int imc_counter_mmaped;
+   struct imc_mem_info *mem_info;
+   struct imc_events *events;
+   u32 counter_mem_size;
+   /*
+* Attribute groups for the PMU. Slot 0 used for
+* format attribute, slot 1 used for cpusmask attribute,
+* slot 2 used for event attribute. Slot 3 keep as
+* NULL.
+*/
+   const struct attribute_group *attr_groups[4];
+};
+
+/*
+ * Domains for IMC PMUs
+ */
+#define IMC_DOMAIN_NEST1
+
+#endif /* PPC_POWERNV_IMC_PMU_DEF_H */
-- 
2.7.4



[PATCH v10 00/10] IMC Instrumentation Support

2017-06-08 Thread Madhavan Srinivasan
Power9 has In-Memory-Collection (IMC) infrastructure which contains
various Performance Monitoring Units (PMUs) at Nest level (these are
on-chip but off-core), Core level and Thread level.

The Nest PMU counters are handled by a Nest IMC microcode which runs
in the OCC (On-Chip Controller) complex. The microcode collects the
counter data and moves the nest IMC counter data to memory.

The Core and Thread IMC PMU counters are handled in the core. Core
level PMU counters give us the IMC counters' data per core and thread
level PMU counters give us the IMC counters' data per CPU thread.

This patchset enables the nest IMC, core IMC and thread IMC
PMUs and is based on the initial work done by Madhavan Srinivasan.
"Nest Instrumentation Support" :
https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-August/132078.html

v1 for this patchset can be found here :
https://lwn.net/Articles/705475/

Nest events:
Per-chip nest instrumentation provides various per-chip metrics
such as memory, powerbus, Xlink and Alink bandwidth.

Core events:
Per-core IMC instrumentation provides various per-core metrics
such as non-idle cycles, non-idle instructions, various cache and
memory related metrics etc.

Thread events:
All the events for thread level are same as core level with the
difference being in the domain. These are per-cpu metrics.

PMU Events' Information:
OPAL obtains the IMC PMU and event information from the IMC Catalog
and passes on to the kernel via the device tree. The events' information
contains :
 - Event name
 - Event Offset
 - Event description
and, maybe :
 - Event scale
 - Event unit

Some PMUs may have a common scale and unit values for all their
supported events. For those cases, the scale and unit properties for
those events must be inherited from the PMU.

The event offset in the memory is where the counter data gets
accumulated.

The OPAL-side patches are posted upstream :
https://lists.ozlabs.org/pipermail/skiboot/2017-May/007360.html

The kernel discovers the IMC counters information in the device tree
at the "imc-counters" device node which has a compatible field
"ibm,opal-in-memory-counters".

Parsing of the Events' information:
To parse the IMC PMUs and events information, the kernel has to
discover the "imc-counters" node and walk through the pmu and event
nodes.

Here is an excerpt of the dt showing the imc-counters with
mcs0 (nest), core and thread node:

/dts-v1/;

/ {
name = "";
compatible = "ibm,opal-in-memory-counters";
#address-cells = <0x1>;
#size-cells = <0x1>;
version-id = "";

NEST_MCS: nest-mcs-events {
#address-cells = <0x1>;
#size-cells = <0x1>;

event at 0 {
event-name = "RRTO_QFULL_NO_DISP" ;
reg = <0x0 0x8>;
desc = "RRTO not dispatched in MCS0 due to capacity - 
pulses once for each time a valid RRTO op is not dispatched due to a command 
list full condition" ;
};
event at 8 {
event-name = "WRTO_QFULL_NO_DISP" ;
reg = <0x8 0x8>;
desc = "WRTO not dispatched in MCS0 due to capacity - 
pulses once for each time a valid WRTO op is not dispatched due to a command 
list full condition" ;
};
[...]
mcs0 {
compatible = "ibm,imc-counters";
events-prefix = "PM_MCS0_";
unit = "";
scale = "";
reg = <0x118 0x8>;
events = < _MCS >;
type = <0x10>;
};
   mcs1 {
compatible = "ibm,imc-counters";
events-prefix = "PM_MCS1_";
unit = "";
scale = "";
reg = <0x198 0x8>;
events = < _MCS >;
type = <0x10>;
};
[...]

CORE_EVENTS: core-events {
#address-cells = <0x1>;
#size-cells = <0x1>;

event at e0 {
event-name = "0THRD_NON_IDLE_PCYC" ;
reg = <0xe0 0x8>;
desc = "The number of processor cycles when all threads 
are idle" ;
};
event at 120 {
event-name = "1THRD_NON_IDLE_PCYC" ;
reg = <0x120 0x8>;
desc = "The number of processor cycles when exactly one 
SMT thread is executing non-idle code" ;
};
[...]
core {
compatible = "ibm,imc-counters";
events-prefix = "CPM_";
unit = "";
scale = "";
reg = <0x0 0x8>;
events = < _EVENTS >;
type = <0x4>;
};

thread {
compatible = "ibm,imc-counters";
events-prefix = "CPM_";
unit = 

Re: [PATCH 3/4] powerpc/64: context switch an hwsync instruction can be avoided

2017-06-08 Thread Peter Zijlstra
On Fri, Jun 09, 2017 at 01:36:08AM +1000, Nicholas Piggin wrote:
> The hwsync in the context switch code to prevent MMIO access being
> reordered from the point of view of a single process if it gets
> migrated to a different CPU is not required because there is an hwsync
> performed earlier in the context switch path.
> 
> Comment this so it's clear enough if anything changes on the scheduler
> or the powerpc sides. Remove the hwsync from _switch.
> 
> This improves context switch performance by 2-3% on POWER8.
> 
> Cc: Peter Zijlstra 
> Signed-off-by: Nicholas Piggin 

Acked-by: Peter Zijlstra (Intel) 


[PATCH] kernel/kprobes: Add test to validate pt_regs

2017-06-08 Thread Naveen N. Rao
Add a test to verify that the registers passed in pt_regs on kprobe
(trap), optprobe (jump) and kprobe_on_ftrace (ftrace_caller) are
accurate. The tests are exercized if KPROBES_SANITY_TEST is enabled.

Implemented for powerpc64. Other architectures will have to implement
the relevant arch_* helpers and define HAVE_KPROBES_REGS_SANITY_TEST.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/include/asm/kprobes.h  |   4 +
 arch/powerpc/lib/Makefile   |   3 +-
 arch/powerpc/lib/test_kprobe_regs.S |  62 
 arch/powerpc/lib/test_kprobes.c | 115 ++
 include/linux/kprobes.h |  11 +++
 kernel/test_kprobes.c   | 183 
 6 files changed, 377 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/lib/test_kprobe_regs.S
 create mode 100644 arch/powerpc/lib/test_kprobes.c

diff --git a/arch/powerpc/include/asm/kprobes.h 
b/arch/powerpc/include/asm/kprobes.h
index 566da372e02b..10c91d3132a1 100644
--- a/arch/powerpc/include/asm/kprobes.h
+++ b/arch/powerpc/include/asm/kprobes.h
@@ -124,6 +124,10 @@ static inline int skip_singlestep(struct kprobe *p, struct 
pt_regs *regs,
return 0;
 }
 #endif
+#if defined(CONFIG_KPROBES_SANITY_TEST) && defined(CONFIG_PPC64)
+#define HAVE_KPROBES_REGS_SANITY_TEST
+void arch_kprobe_regs_set_ptregs(struct pt_regs *regs);
+#endif
 #else
 static inline int kprobe_handler(struct pt_regs *regs) { return 0; }
 static inline int kprobe_post_handler(struct pt_regs *regs) { return 0; }
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 3c3146ba62da..8a0bb8e20179 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -27,7 +27,8 @@ obj64-y   += copypage_64.o copyuser_64.o mem_64.o 
hweight_64.o \
 
 obj64-$(CONFIG_SMP)+= locks.o
 obj64-$(CONFIG_ALTIVEC)+= vmx-helper.o
-obj64-$(CONFIG_KPROBES_SANITY_TEST) += test_emulate_step.o
+obj64-$(CONFIG_KPROBES_SANITY_TEST) += test_emulate_step.o test_kprobe_regs.o \
+  test_kprobes.o
 
 obj-y  += checksum_$(BITS).o checksum_wrappers.o
 
diff --git a/arch/powerpc/lib/test_kprobe_regs.S 
b/arch/powerpc/lib/test_kprobe_regs.S
new file mode 100644
index ..4e95eca6dcd3
--- /dev/null
+++ b/arch/powerpc/lib/test_kprobe_regs.S
@@ -0,0 +1,62 @@
+/*
+ * test_kprobe_regs: architectural helpers for validating pt_regs
+ *  received on a kprobe.
+ *
+ * Copyright 2017 Naveen N. Rao 
+ *   IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include 
+#include 
+#include 
+
+_GLOBAL(arch_kprobe_regs_function)
+   mflrr0
+   std r0, LRSAVE(r1)
+   stdur1, -SWITCH_FRAME_SIZE(r1)
+
+   /* Tell pre handler about our pt_regs location */
+   addir3, r1, STACK_FRAME_OVERHEAD
+   bl  arch_kprobe_regs_set_ptregs
+
+   /* Load back our true LR */
+   ld  r0, (SWITCH_FRAME_SIZE + LRSAVE)(r1)
+   mtlrr0
+
+   /* Save all SPRs that we care about */
+   mfctr   r0
+   std r0, _CTR(r1)
+   mflrr0
+   std r0, _LINK(r1)
+   mfspr   r0, SPRN_XER
+   std r0, _XER(r1)
+   mfcrr0
+   std r0, _CCR(r1)
+
+   /* Now, save all GPRs */
+   SAVE_2GPRS(0, r1)
+   SAVE_10GPRS(2, r1)
+   SAVE_10GPRS(12, r1)
+   SAVE_10GPRS(22, r1)
+
+   /* We're now ready to be probed */
+.global arch_kprobe_regs_probepoint
+arch_kprobe_regs_probepoint:
+   nop
+
+#ifdef CONFIG_KPROBES_ON_FTRACE
+   /* Let's also test KPROBES_ON_FTRACE */
+   bl  kprobe_regs_kp_on_ftrace_target
+   nop
+#endif
+
+   /* All done */
+   addir1, r1, SWITCH_FRAME_SIZE
+   ld  r0, LRSAVE(r1)
+   mtlrr0
+   blr
diff --git a/arch/powerpc/lib/test_kprobes.c b/arch/powerpc/lib/test_kprobes.c
new file mode 100644
index ..23f7a7ffcdd6
--- /dev/null
+++ b/arch/powerpc/lib/test_kprobes.c
@@ -0,0 +1,115 @@
+/*
+ * test_kprobes: architectural helpers for validating pt_regs
+ *  received on a kprobe.
+ *
+ * Copyright 2017 Naveen N. Rao 
+ *   IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#define pr_fmt(fmt) "Kprobe smoke test (regs): " fmt
+
+#include 
+#include 
+#include 
+
+static struct pt_regs *r;
+
+void arch_kprobe_regs_set_ptregs(struct pt_regs *regs)
+{
+   r = regs;
+}
+
+static int validate_regs(struct kprobe *p, struct pt_regs *regs,
+   

[PATCH] powerpc/kprobes: Don't save/restore DAR/DSISR to/from pt_regs for optprobes

2017-06-08 Thread Naveen N. Rao
We don't save/restore these across a trap, or with KPROBES_ON_FTRACE.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/optprobes_head.S | 8 
 1 file changed, 8 deletions(-)

diff --git a/arch/powerpc/kernel/optprobes_head.S 
b/arch/powerpc/kernel/optprobes_head.S
index 4937bef7652f..52fc864cdec4 100644
--- a/arch/powerpc/kernel/optprobes_head.S
+++ b/arch/powerpc/kernel/optprobes_head.S
@@ -60,10 +60,6 @@ optprobe_template_entry:
std r5,_CCR(r1)
lbz r5,PACASOFTIRQEN(r13)
std r5,SOFTE(r1)
-   mfdar   r5
-   std r5,_DAR(r1)
-   mfdsisr r5
-   std r5,_DSISR(r1)
 
/*
 * We may get here from a module, so load the kernel TOC in r2.
@@ -122,10 +118,6 @@ optprobe_template_call_emulate:
mtxer   r5
ld  r5,_CCR(r1)
mtcrr5
-   ld  r5,_DAR(r1)
-   mtdar   r5
-   ld  r5,_DSISR(r1)
-   mtdsisr r5
REST_GPR(0,r1)
REST_10GPRS(2,r1)
REST_10GPRS(12,r1)
-- 
2.12.2



Resend [PATCH] powerpc/hotplug-mem: Fix aa_index match bug for hotplug

2017-06-08 Thread Michael Bringmann

When adding or removing memory, the aa_index (affinity value) for the
memblock must also be converted to match the endianness of the rest
of the 'ibm,dynamic-memory' property.  Otherwise, subsequent retrieval
of the attribute will likely lead to non-existent nodes, followed by
using the default node in the code inappropriately.

Note: This is a bug that we would likely want to see fixed in stable
  circa v4.1+.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index e104c71..1fb162b 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -124,6 +124,7 @@ static struct property *dlpar_clone_drconf_property(struct 
device_node *dn)
for (i = 0; i < num_lmbs; i++) {
lmbs[i].base_addr = be64_to_cpu(lmbs[i].base_addr);
lmbs[i].drc_index = be32_to_cpu(lmbs[i].drc_index);
+   lmbs[i].aa_index = be32_to_cpu(lmbs[i].aa_index);
lmbs[i].flags = be32_to_cpu(lmbs[i].flags);
}
 
@@ -147,6 +148,7 @@ static void dlpar_update_drconf_property(struct device_node 
*dn,
for (i = 0; i < num_lmbs; i++) {
lmbs[i].base_addr = cpu_to_be64(lmbs[i].base_addr);
lmbs[i].drc_index = cpu_to_be32(lmbs[i].drc_index);
+   lmbs[i].aa_index = cpu_to_be32(lmbs[i].aa_index);
lmbs[i].flags = cpu_to_be32(lmbs[i].flags);
}
 



Re: [PATCH v9 10/10] powerpc/perf: Thread imc cpuhotplug support

2017-06-08 Thread Madhavan Srinivasan



On Tuesday 06 June 2017 03:57 PM, Thomas Gleixner wrote:

On Mon, 5 Jun 2017, Anju T Sudhakar wrote:

  static void thread_imc_mem_alloc(int cpu_id)
  {
-   u64 ldbar_addr, ldbar_value;
int phys_id = topology_physical_package_id(cpu_id);
  
  	per_cpu_add[cpu_id] = (u64)alloc_pages_exact_nid(phys_id,

(size_t)IMC_THREAD_COUNTER_MEM, GFP_KERNEL | 
__GFP_ZERO);

It took me a while to understand that per_cpu_add is an array and not a new
fangled per cpu function which adds something to a per cpu variable.

So why is that storing the address as u64?


Value that we need to load to the ldbar spr also has other bits
apart from memory addr. So we made it as an array. But this
should be a per_cpu pointer.  Will fix it.




And why is this a NR_CPUS sized array instead of a regular per cpu variable?


Yes. will fix it.


+}
+
+static void thread_imc_update_ldbar(unsigned int cpu_id)
+{
+   u64 ldbar_addr, ldbar_value;
+
+   if (per_cpu_add[cpu_id] == 0)
+   thread_imc_mem_alloc(cpu_id);

So if that allocation fails then you happily proceed. Optmistic
programming, right?


Will add return value check.




+
ldbar_addr = (u64)virt_to_phys((void *)per_cpu_add[cpu_id]);

Ah, it's stored as u64 because you have to convert it back to a void
pointer at the place where it is actually used. Interesting approach.


ldbar_value = (ldbar_addr & (u64)THREAD_IMC_LDBAR_MASK) |
-   (u64)THREAD_IMC_ENABLE;
+   (u64)THREAD_IMC_ENABLE;
mtspr(SPRN_LDBAR, ldbar_value);
  }
  
@@ -442,6 +450,26 @@ static void core_imc_change_cpu_context(int old_cpu, int new_cpu)

perf_pmu_migrate_context(_imc_pmu->pmu, old_cpu, new_cpu);
  }
  
+static int ppc_thread_imc_cpu_online(unsigned int cpu)

+{
+   thread_imc_update_ldbar(cpu);
+   return 0;
+}
+
+static int ppc_thread_imc_cpu_offline(unsigned int cpu)
+{
+   mtspr(SPRN_LDBAR, 0);
+   return 0;
+}
+
+void thread_imc_cpu_init(void)
+{
+   cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE,
+ "perf/powerpc/imc_thread:online",
+ ppc_thread_imc_cpu_online,
+ ppc_thread_imc_cpu_offline);
+}
+
  static int ppc_core_imc_cpu_online(unsigned int cpu)
  {
const struct cpumask *l_cpumask;
@@ -953,6 +981,9 @@ int __init init_imc_pmu(struct imc_events *events, int idx,
if (ret)
return ret;
break;
+   case IMC_DOMAIN_THREAD:
+   thread_imc_cpu_init();
+   break;
default:
return -1;  /* Unknown domain */
}

Just a general observation. If anything in init_imc_pmu() fails, then all
the hotplug callbacks are staying registered, at least I haven't seen
anything undoing it.

Yes. We did not add code to unregister the call back on failure.
Will fix that in next version.

Thanks for the review.
Maddy


Thanks,

tglx





Re: [PATCH v5 2/2] powerpc/fadump: update documentation about 'fadump_append=' parameter

2017-06-08 Thread Hari Bathini

Hi Michal,

Sorry for taking this long to respond. I was working on a few other things.

On Monday 15 May 2017 02:59 PM, Michal Suchánek wrote:

Hello,

On Mon, 15 May 2017 12:59:46 +0530
Hari Bathini  wrote:


On Friday 12 May 2017 09:12 PM, Michal Suchánek wrote:

On Fri, 12 May 2017 15:15:33 +0530
Hari Bathini  wrote:
  

On Thursday 11 May 2017 06:46 PM, Michal Suchánek wrote:

On Thu, 11 May 2017 02:00:11 +0530
Hari Bathini  wrote:
 

Hello Michal,

On Wednesday 10 May 2017 09:31 PM, Michal Suchánek wrote:

Hello,

On Wed, 03 May 2017 23:52:52 +0530
Hari Bathini  wrote:


With the introduction of 'fadump_append=' parameter to pass
additional parameters to fadump (capture) kernel, update
documentation about it.

Signed-off-by: Hari Bathini 
---

Changes from v4:
* Based on top of patchset that includes
  
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=akpm=05f383cdfba8793240e73f9a9fbff4e25d66003f


 Documentation/powerpc/firmware-assisted-dump.txt |   10
+- 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/powerpc/firmware-assisted-dump.txt
b/Documentation/powerpc/firmware-assisted-dump.txt index
8394bc8..6327193 100644 ---
a/Documentation/powerpc/firmware-assisted-dump.txt +++
b/Documentation/powerpc/firmware-assisted-dump.txt @@ -162,7
+162,15 @@ How to enable firmware-assisted dump (fadump):
 1. Set config option CONFIG_FA_DUMP=y and build kernel.
 2. Boot into linux kernel with 'fadump=on' kernel cmdline
option. -3. Optionally, user can also set 'crashkernel=' kernel
cmdline +3. A user can pass additional command line parameters
as a comma
+   separated list through 'fadump_append=' parameter, to be
enforced
+   when fadump is active. For example, if parameters like
nr_cpus=1,
+   numa=off & udev.children-max=2 are to be enforced when
fadump is
+   active,
'fadump_append=nr_cpus=1,numa=off,udev.children-max=2'
+   can be passed in command line, which will be replaced with
+   "nr_cpus=1 numa=off udev.children-max=2" when fadump is
active.
+   This helps in reducing memory consumption during dump
capture. +4. Optionally, user can also set 'crashkernel='
kernel cmdline to specify size of the memory to reserve for
boot memory dump preservation.
 


Writing your own deficient parser for comma separated arguments
when perfectly fine parser for space separated quoted arguments
exists in the kernel and the bootloader does not seem like a
good idea to me.

Couple of things that prompted me for v5 are:
  1. Using parse_early_options() limits the kind of parameters
 that can be passed to fadump capture kernel. Passing
parameters like systemd.unit= & udev.childern.max= has no effect
with v4. Updating
 boot_command_line parameter, when fadump is active,
seems a better alternative.

  2. Passing space-separated quoted arguments is not working
 as intended with lilo. Updating bootloader with the below
 entry in /etc/lilo.conf file results in a missing append
 entry in /etc/yaboot.conf file.

   append = "quiet sysrq=1 insmod=sym53c8xx insmod=ipr
crashkernel=512M-:256M fadump_append=\"nr_cpus=1 numa=off
udev.children-max=2\""

Meaning that a script that emulates LILO semantics on top of
yaboot which is completely capable of passing qouted space
separated arguments fails. IMHO it is more reasonable to fix the
script or whatever adaptation layer or use yaboot directly than
working around bug in said script by introducing a new argument
parser in the kernel.

 

Hmmm.. while trying to implement space-separated parameter list
with quotes as syntax for fadump_append parameter, noticed that it
can make implemenation
more vulnerable. Here are some problems I am facing while
implementing this..

How so?

presumably you can reuse parse_args even if you do not register with
early_param and call it yourself. Then your parsing of
fadump_append is


I wasn't aware of that. Thanks for pointing it out, Michal.
Will try to use parse_args and get back.


I was thinking a bit more about the uses of the commandline and how
fadump_append potentially breaks it.

The first thing that should be addressed and is the special -- argument
which denotes the start of init arguments that are not to be parsed by
the kernel. Incidentally the initial implementation using early_param
happened to handles that without issue. parse_args surely handles that
so adding a hook somewhere should give you location of that argument
(if any). And interesting thing that can happen is passing an -- inside
the fadump_append argument. It should be handled (or not) in some way
or other and the handling documented.



The intention with this patch is to replace

  "root=/dev/sda2 ro fadump_append=nr_cpus=1,numa=off crashkernel=1024M"

with

  "root=/dev/sda2 

Re: [PATCH 19/44] s390: implement ->mapping_error

2017-06-08 Thread Gerald Schaefer
On Thu,  8 Jun 2017 15:25:44 +0200
Christoph Hellwig  wrote:

> s390 can also use noop_dma_ops, and while that currently does not return
> errors it will so in the future.  Implementing the mapping_error method
> is the proper way to have per-ops error conditions.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: Gerald Schaefer 



[PATCH 21/35] powerpc: defconfig: Cleanup from old Kconfig options

2017-06-08 Thread Krzysztof Kozlowski
Remove old, dead Kconfig option USB_LED. It is gone since
commit a335aaf3125c ("usb: misc: remove outdated USB LED driver").

Signed-off-by: Krzysztof Kozlowski 
---
 arch/powerpc/configs/c2k_defconfig| 1 -
 arch/powerpc/configs/ppc6xx_defconfig | 1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/powerpc/configs/c2k_defconfig 
b/arch/powerpc/configs/c2k_defconfig
index 7c9d95370150..10f5ffd08956 100644
--- a/arch/powerpc/configs/c2k_defconfig
+++ b/arch/powerpc/configs/c2k_defconfig
@@ -298,7 +298,6 @@ CONFIG_USB_EMI62=m
 CONFIG_USB_RIO500=m
 CONFIG_USB_LEGOTOWER=m
 CONFIG_USB_LCD=m
-CONFIG_USB_LED=m
 CONFIG_USB_TEST=m
 CONFIG_USB_ATM=m
 CONFIG_USB_SPEEDTOUCH=m
diff --git a/arch/powerpc/configs/ppc6xx_defconfig 
b/arch/powerpc/configs/ppc6xx_defconfig
index 18d0d60dadbf..f170744738cf 100644
--- a/arch/powerpc/configs/ppc6xx_defconfig
+++ b/arch/powerpc/configs/ppc6xx_defconfig
@@ -967,7 +967,6 @@ CONFIG_USB_ADUTUX=m
 CONFIG_USB_SEVSEG=m
 CONFIG_USB_LEGOTOWER=m
 CONFIG_USB_LCD=m
-CONFIG_USB_LED=m
 CONFIG_USB_IDMOUSE=m
 CONFIG_USB_FTDI_ELAN=m
 CONFIG_USB_APPLEDISPLAY=m
-- 
2.9.3



[PATCH 14/14] powerpc/64s: idle runlatch switch is done with MSR[EE]=0

2017-06-08 Thread Nicholas Piggin
2*mfmsr and 2*mtmsr can be avoided in the idle sleep/wake code
because we know the MSR[EE] is clear.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/platforms/powernv/idle.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 7c83a95f929e..8db49f9c2bd8 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -263,9 +263,9 @@ static unsigned long __power7_idle_type(unsigned long type)
if (!prep_irq_for_idle())
return 0;
 
-   ppc64_runlatch_off();
+   __ppc64_runlatch_off();
srr1 = power7_idle_insn(type);
-   ppc64_runlatch_on();
+   __ppc64_runlatch_on();
 
return srr1;
 }
@@ -300,9 +300,9 @@ static unsigned long __power9_idle_type(unsigned long 
stop_psscr_val,
psscr = mfspr(SPRN_PSSCR);
psscr = (psscr & ~stop_psscr_mask) | stop_psscr_val;
 
-   ppc64_runlatch_off();
+   __ppc64_runlatch_off();
srr1 = power9_idle_stop(psscr);
-   ppc64_runlatch_on();
+   __ppc64_runlatch_on();
 
trace_hardirqs_off();
 
@@ -336,7 +336,7 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
unsigned long srr1;
u32 idle_states = pnv_get_supported_cpuidle_states();
 
-   ppc64_runlatch_off();
+   __ppc64_runlatch_off();
 
if (cpu_has_feature(CPU_FTR_ARCH_300) && deepest_stop_found) {
unsigned long psscr;
@@ -363,7 +363,7 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
HMT_medium();
}
 
-   ppc64_runlatch_on();
+   __ppc64_runlatch_on();
 
return srr1;
 }
-- 
2.11.0



[PATCH 13/14] powerpc/64: runlatch CTRL[RUN] set optimisation

2017-06-08 Thread Nicholas Piggin
The CTRL register is read-only except bit 63 which is the run latch
control. This means it can be updated with a mtspr rather than
mfspr/mtspr.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/process.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 6273b5d5baec..29865c817b02 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1991,12 +1991,8 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
 void notrace __ppc64_runlatch_on(void)
 {
struct thread_info *ti = current_thread_info();
-   unsigned long ctrl;
-
-   ctrl = mfspr(SPRN_CTRLF);
-   ctrl |= CTRL_RUNLATCH;
-   mtspr(SPRN_CTRLT, ctrl);
 
+   mtspr(SPRN_CTRLT, CTRL_RUNLATCH);
ti->local_flags |= _TLF_RUNLATCH;
 }
 
@@ -2004,13 +2000,9 @@ void notrace __ppc64_runlatch_on(void)
 void notrace __ppc64_runlatch_off(void)
 {
struct thread_info *ti = current_thread_info();
-   unsigned long ctrl;
 
ti->local_flags &= ~_TLF_RUNLATCH;
-
-   ctrl = mfspr(SPRN_CTRLF);
-   ctrl &= ~CTRL_RUNLATCH;
-   mtspr(SPRN_CTRLT, ctrl);
+   mtspr(SPRN_CTRLT, 0);
 }
 #endif /* CONFIG_PPC64 */
 
-- 
2.11.0



[PATCH 12/14] powerpc/64s: cpuidle no memory barrier after break from idle

2017-06-08 Thread Nicholas Piggin
A memory barrier is not required after the task wakes up,
only if we clear the polling flag before waking. The case
where we have work to do is the important one, so optimise
for it.

Signed-off-by: Nicholas Piggin 
---
 drivers/cpuidle/cpuidle-powernv.c | 11 +--
 drivers/cpuidle/cpuidle-pseries.c | 11 +--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index f0247652d91f..c53a8bb40471 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -59,14 +59,21 @@ static int snooze_loop(struct cpuidle_device *dev,
ppc64_runlatch_off();
HMT_very_low();
while (!need_resched()) {
-   if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time)
+   if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
+   /*
+* Task has not woken up but we are exiting the polling
+* loop anyway. Require a barrier after polling is
+* cleared to order subsequent test of need_resched().
+*/
+   clear_thread_flag(TIF_POLLING_NRFLAG);
+   smp_mb();
break;
+   }
}
 
HMT_medium();
ppc64_runlatch_on();
clear_thread_flag(TIF_POLLING_NRFLAG);
-   smp_mb();
 
return index;
 }
diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index a404f352d284..e9b3853d93ea 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -71,13 +71,20 @@ static int snooze_loop(struct cpuidle_device *dev,
while (!need_resched()) {
HMT_low();
HMT_very_low();
-   if (snooze_timeout_en && get_tb() > snooze_exit_time)
+   if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
+   /*
+* Task has not woken up but we are exiting the polling
+* loop anyway. Require a barrier after polling is
+* cleared to order subsequent test of need_resched().
+*/
+   clear_thread_flag(TIF_POLLING_NRFLAG);
+   smp_mb();
break;
+   }
}
 
HMT_medium();
clear_thread_flag(TIF_POLLING_NRFLAG);
-   smp_mb();
 
idle_loop_epilog(in_purr);
 
-- 
2.11.0



[PATCH 11/14] powerpc/64s: cpuidle read mostly for common globals

2017-06-08 Thread Nicholas Piggin
Ensure these don't get put into bouncing cachelines.

Signed-off-by: Nicholas Piggin 
---
 drivers/cpuidle/cpuidle-powernv.c | 10 +-
 drivers/cpuidle/cpuidle-pseries.c |  8 
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index 0ee4660efb5f..f0247652d91f 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -32,18 +32,18 @@ static struct cpuidle_driver powernv_idle_driver = {
.owner= THIS_MODULE,
 };
 
-static int max_idle_state;
-static struct cpuidle_state *cpuidle_state_table;
+static int max_idle_state __read_mostly;
+static struct cpuidle_state *cpuidle_state_table __read_mostly;
 
 struct stop_psscr_table {
u64 val;
u64 mask;
 };
 
-static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX];
+static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX] 
__read_mostly;
 
-static u64 snooze_timeout;
-static bool snooze_timeout_en;
+static u64 snooze_timeout __read_mostly;
+static bool snooze_timeout_en __read_mostly;
 
 static int snooze_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index 7b12bb2ea70f..a404f352d284 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -25,10 +25,10 @@ struct cpuidle_driver pseries_idle_driver = {
.owner= THIS_MODULE,
 };
 
-static int max_idle_state;
-static struct cpuidle_state *cpuidle_state_table;
-static u64 snooze_timeout;
-static bool snooze_timeout_en;
+static int max_idle_state __read_mostly;
+static struct cpuidle_state *cpuidle_state_table __read_mostly;
+static u64 snooze_timeout __read_mostly;
+static bool snooze_timeout_en __read_mostly;
 
 static inline void idle_loop_prolog(unsigned long *in_purr)
 {
-- 
2.11.0



[PATCH 10/14] powerpc/64s: cpuidle set polling before enabling irqs

2017-06-08 Thread Nicholas Piggin
local_irq_enable can cause interrupts to be taken which could
take significant amount of processing time. The idle process
should set its polling flag before this, so another process that
wakes it during this time will not have to send an IPI.

Expand the TIF_POLLING_NRFLAG coverage to as large as possible.

Signed-off-by: Nicholas Piggin 
---
 drivers/cpuidle/cpuidle-powernv.c | 4 +++-
 drivers/cpuidle/cpuidle-pseries.c | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index 150b971c303b..0ee4660efb5f 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -51,9 +51,10 @@ static int snooze_loop(struct cpuidle_device *dev,
 {
u64 snooze_exit_time;
 
-   local_irq_enable();
set_thread_flag(TIF_POLLING_NRFLAG);
 
+   local_irq_enable();
+
snooze_exit_time = get_tb() + snooze_timeout;
ppc64_runlatch_off();
HMT_very_low();
@@ -66,6 +67,7 @@ static int snooze_loop(struct cpuidle_device *dev,
ppc64_runlatch_on();
clear_thread_flag(TIF_POLLING_NRFLAG);
smp_mb();
+
return index;
 }
 
diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index 166ccd711ec9..7b12bb2ea70f 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -62,9 +62,10 @@ static int snooze_loop(struct cpuidle_device *dev,
unsigned long in_purr;
u64 snooze_exit_time;
 
+   set_thread_flag(TIF_POLLING_NRFLAG);
+
idle_loop_prolog(_purr);
local_irq_enable();
-   set_thread_flag(TIF_POLLING_NRFLAG);
snooze_exit_time = get_tb() + snooze_timeout;
 
while (!need_resched()) {
-- 
2.11.0



[PATCH 09/14] powerpc/64s: idle hmi wakeup is unlikely

2017-06-08 Thread Nicholas Piggin
In a busy system, idle wakeups can be expected from IPIs and device
interrupts.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/idle_book3s.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index 2efb88da8ba3..4004cdf72f42 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -303,7 +303,7 @@ FTR_SECTION_ELSE_NESTED(66);
\
rlwinm  r0,r12,45-31,0xe;  /* P7 wake reason field is 3 bits */ \
 ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);   \
cmpwi   r0,0xa; /* Hypervisor maintenance ? */  \
-   bne 20f;\
+   bne+20f;\
/* Invoke opal call to handle hmi */\
ld  r2,PACATOC(r13);\
ld  r1,PACAR1(r13); \
-- 
2.11.0



[PATCH 08/14] powerpc/64s: idle avoid SRR usage in idle sleep/wake paths

2017-06-08 Thread Nicholas Piggin
Idle code now always runs at the 0xc... effective address whether
in real or virtual mode. This means rfid can be ditched, along
with a lot of SRR manipulations.

In the wakeup path, carry SRR1 around in r12. Use mtmsrd to change
MSR states as required.

This also balances the return prediction for the idle call, by
doing blr rather than rfid to return to the idle caller.

On POWER9, 2-process context switch on different cores, with snooze
disabled, increases performance by 2%.
---
 arch/powerpc/kernel/exceptions-64s.S|  1 +
 arch/powerpc/kernel/idle_book3s.S   | 57 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  8 -
 3 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 153cd967554a..eb703c20f4ad 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -130,6 +130,7 @@ EXC_VIRT_NONE(0x4100, 0x100)
 
 #ifdef CONFIG_PPC_P7_NAP
 EXC_COMMON_BEGIN(system_reset_idle_common)
+   mfspr   r12,SPRN_SRR1
b   pnv_powersave_wakeup
 #endif
 
diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index c7edb374d1aa..2efb88da8ba3 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -108,7 +108,7 @@ core_idle_lock_held:
  * r3 - PNV_THREAD_NAP/SLEEP/WINKLE in POWER8
  *- Requested PSSCR value in POWER9
  *
- * Address of idle handler to 'rfid' to in r4
+ * Address of idle handler to branch to in realmode in r4
  */
 pnv_powersave_common:
/* Use r3 to pass state nap/sleep/winkle */
@@ -118,14 +118,14 @@ pnv_powersave_common:
 * need to save PC, some CR bits and the NV GPRs,
 * but for now an interrupt frame will do.
 */
+   mtctr   r4
+
mflrr0
std r0,16(r1)
stdur1,-INT_FRAME_SIZE(r1)
std r0,_LINK(r1)
std r0,_NIP(r1)
 
-   mfmsr   r9
-
/* We haven't lost state ... yet */
li  r0,0
stb r0,PACA_NAPSTATELOST(r13)
@@ -135,7 +135,6 @@ pnv_powersave_common:
SAVE_NVGPRS(r1)
mfcrr5
std r5,_CCR(r1)
-   std r9,_MSR(r1)
std r1,PACAR1(r13)
 
/*
@@ -145,12 +144,8 @@ pnv_powersave_common:
 * the MMU context to the guest.
 */
LOAD_REG_IMMEDIATE(r7, MSR_IDLE)
-   li  r6, MSR_RI
-   andcr6, r9, r6
-   mtmsrd  r6, 1   /* clear RI before setting SRR0/1 */
-   mtspr   SPRN_SRR0, r4
-   mtspr   SPRN_SRR1, r7
-   rfid
+   mtmsrd  r7,0
+   bctr
 
.globl pnv_enter_arch207_idle_mode
 pnv_enter_arch207_idle_mode:
@@ -302,11 +297,10 @@ _GLOBAL(power7_idle_insn)
b   pnv_powersave_common
 
 #define CHECK_HMI_INTERRUPT\
-   mfspr   r0,SPRN_SRR1;   \
 BEGIN_FTR_SECTION_NESTED(66);  \
-   rlwinm  r0,r0,45-31,0xf;  /* extract wake reason field (P8) */  \
+   rlwinm  r0,r12,45-31,0xf;  /* extract wake reason field (P8) */ \
 FTR_SECTION_ELSE_NESTED(66);   \
-   rlwinm  r0,r0,45-31,0xe;  /* P7 wake reason field is 3 bits */  \
+   rlwinm  r0,r12,45-31,0xe;  /* P7 wake reason field is 3 bits */ \
 ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);   \
cmpwi   r0,0xa; /* Hypervisor maintenance ? */  \
bne 20f;\
@@ -384,17 +378,17 @@ pnv_powersave_wakeup_mce:
 
/*
 * Now put the original SRR1 with SRR1_WAKEMCE_RESVD as the wake
-* reason into SRR1, which allows reuse of the system reset wakeup
+* reason into r12, which allows reuse of the system reset wakeup
 * code without being mistaken for another type of wakeup.
 */
-   orisr3,r3,SRR1_WAKEMCE_RESVD@h
-   mtspr   SPRN_SRR1,r3
+   orisr12,r3,SRR1_WAKEMCE_RESVD@h
 
b   pnv_powersave_wakeup
 
 /*
  * Called from reset vector for powersave wakeups.
  * cr3 - set to gt if waking up with partial/complete hypervisor state loss
+ * r12 - SRR1
  */
 .global pnv_powersave_wakeup
 pnv_powersave_wakeup:
@@ -404,8 +398,10 @@ BEGIN_FTR_SECTION
 BEGIN_FTR_SECTION_NESTED(70)
bl  power9_dd1_recover_paca
 END_FTR_SECTION_NESTED_IFSET(CPU_FTR_POWER9_DD1, 70)
+   ld  r1,PACAR1(r13)
bl  pnv_restore_hyp_resource_arch300
 FTR_SECTION_ELSE
+   ld  r1,PACAR1(r13)
bl  pnv_restore_hyp_resource_arch207
 ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
 
@@ -425,7 +421,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
 #endif
 
/* Return SRR1 from power7_nap() */
-   mfspr   r3,SPRN_SRR1
+   mr  r3,r12
blt cr3,pnv_wakeup_noloss
b   

[PATCH 07/14] powerpc/64s: idle branch to handler with virtual mode offset

2017-06-08 Thread Nicholas Piggin
Have the system reset idle wakeup handlers branched to in real mode
with the 0xc... kernel address applied. This allows simplifications of
avoiding rfid when switching to virtual mode in the wakeup handler.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 17 ++---
 arch/powerpc/kernel/exceptions-64s.S |  6 --
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 183d73b6ed99..0912e328e1d7 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -236,15 +236,26 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define kvmppc_interrupt kvmppc_interrupt_pr
 #endif
 
+/*
+ * Branch to label using its 0xC000 address. This gives the same real address
+ * when relocation is off, but allows mtmsr to set MSR[IR|DR]=1.
+ * This could set the 0xc bits for !RELOCATABLE rather than load KBASE for
+ * a slight optimisation.
+ */
+#define BRANCH_TO_C000(reg, label) \
+   __LOAD_HANDLER(reg, label); \
+   mtctr   reg;\
+   bctr
+
 #ifdef CONFIG_RELOCATABLE
 #define BRANCH_TO_COMMON(reg, label)   \
__LOAD_HANDLER(reg, label); \
mtctr   reg;\
bctr
 
-#define BRANCH_LINK_TO_FAR(label)  \
-   __LOAD_FAR_HANDLER(r12, label); \
-   mtctr   r12;\
+#define BRANCH_LINK_TO_FAR(reg, label) \
+   __LOAD_FAR_HANDLER(reg, label); \
+   mtctr   reg;\
bctrl
 
 /*
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 52ad0789fa89..153cd967554a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -99,7 +99,9 @@ EXC_VIRT_NONE(0x4000, 0x100)
 #ifdef CONFIG_PPC_P7_NAP
/*
 * If running native on arch 2.06 or later, check if we are waking up
-* from nap/sleep/winkle, and branch to idle handler.
+* from nap/sleep/winkle, and branch to idle handler. The idle wakeup
+* handler initially runs in real mode, but we branch to the 0xc000...
+* address so we can turn on relocation with mtmsr.
 */
 #define IDLETEST(n)\
BEGIN_FTR_SECTION ; \
@@ -107,7 +109,7 @@ EXC_VIRT_NONE(0x4000, 0x100)
rlwinm. r10,r10,47-31,30,31 ;   \
beq-1f ;\
cmpwi   cr3,r10,2 ; \
-   BRANCH_TO_COMMON(r10, system_reset_idle_common) ;   \
+   BRANCH_TO_C000(r10, system_reset_idle_common) ; \
 1: \
END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 #else
-- 
2.11.0



[PATCH 06/14] powerpc/64s: interrupt replay balance the return branch predictor

2017-06-08 Thread Nicholas Piggin
The __replay_interrupt code is branched to with bl, but the caller is
returned to directly with rfid from the interrupt.

Instead return to a return stub that returns to the caller with blr,
which should do better with the return predictor.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 9cc34547c3b6..52ad0789fa89 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1646,7 +1646,7 @@ _GLOBAL(__replay_interrupt)
 * we don't give a damn about, so we don't bother storing them.
 */
mfmsr   r12
-   mflrr11
+   LOAD_REG_ADDR(r11, __replay_interrupt_return)
mfcrr9
ori r12,r12,MSR_EE
cmpwi   r3,0x900
@@ -1664,6 +1664,7 @@ FTR_SECTION_ELSE
cmpwi   r3,0xa00
beq doorbell_super_common_msgclr
 ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
+__replay_interrupt_return:
blr
 
 /*
@@ -1673,7 +1674,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
 _GLOBAL(__replay_wakeup_interrupt)
extrdi  r3,r3,42,4  /* Get SRR1 wake reason in low bits */
mfmsr   r12
-   mflrr11
+   LOAD_REG_ADDR(r11, __replay_wakeup_interrupt_return)
mfcrr9
ori r12,r12,MSR_EE
cmpwi   r3,0x6
@@ -1692,4 +1693,5 @@ FTR_SECTION_ELSE
beq doorbell_super_common_msgclr
 ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
mtmsrd  r12,1
+__replay_wakeup_interrupt_return:
blr
-- 
2.11.0



[PATCH 05/14] powerpc/64s: msgclr when handling doorbell exceptions

2017-06-08 Thread Nicholas Piggin
msgsnd doorbell exceptions are cleared when the doorbell interrupt is
taken. However if a doorbell exception causes a system reset interrupt
wake from power saving state, the message is not cleared. Processing
the doorbell from the system reset interrupt requires msgclr to avoid
taking the exception again.

Testing this plus the previous wakup direct patch gives:

original wakeup direct msgclr
Different threads, same core:   315k/s   264k/s345k/s
Different cores:235k/s   242k/s242k/s

Net speedup is +10% for same core, and +3% for different core.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/dbell.h  | 13 +
 arch/powerpc/include/asm/ppc-opcode.h |  3 +++
 arch/powerpc/kernel/asm-offsets.c |  1 +
 arch/powerpc/kernel/exceptions-64s.S  | 27 +++
 4 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/dbell.h b/arch/powerpc/include/asm/dbell.h
index f70cbfe0ec04..9f2ae0d25e15 100644
--- a/arch/powerpc/include/asm/dbell.h
+++ b/arch/powerpc/include/asm/dbell.h
@@ -56,6 +56,19 @@ static inline void ppc_msgsync(void)
: : "i" (CPU_FTR_HVMODE|CPU_FTR_ARCH_300));
 }
 
+static inline void _ppc_msgclr(u32 msg)
+{
+   __asm__ __volatile__ (ASM_FTR_IFSET(PPC_MSGCLR(%1), PPC_MSGCLRP(%1), %0)
+   : : "i" (CPU_FTR_HVMODE), "r" (msg));
+}
+
+static inline void ppc_msgclr(enum ppc_dbell type)
+{
+   u32 msg = PPC_DBELL_TYPE(type);
+
+   _ppc_msgclr(msg);
+}
+
 #else /* CONFIG_PPC_BOOK3S */
 
 #define PPC_DBELL_MSGTYPE  PPC_DBELL
diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 3b6bbf5a8683..4e2cf719c9b2 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -220,6 +220,7 @@
 #define PPC_INST_MSGCLR0x7c0001dc
 #define PPC_INST_MSGSYNC   0x7c0006ec
 #define PPC_INST_MSGSNDP   0x7c00011c
+#define PPC_INST_MSGCLRP   0x7c00015c
 #define PPC_INST_MTTMR 0x7c0003dc
 #define PPC_INST_NOP   0x6000
 #define PPC_INST_PASTE 0x7c20070d
@@ -409,6 +410,8 @@
___PPC_RB(b))
 #define PPC_MSGSNDP(b) stringify_in_c(.long PPC_INST_MSGSNDP | \
___PPC_RB(b))
+#define PPC_MSGCLRP(b) stringify_in_c(.long PPC_INST_MSGCLRP | \
+   ___PPC_RB(b))
 #define PPC_POPCNTB(a, s)  stringify_in_c(.long PPC_INST_POPCNTB | \
__PPC_RA(a) | __PPC_RS(s))
 #define PPC_POPCNTD(a, s)  stringify_in_c(.long PPC_INST_POPCNTD | \
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 709e23425317..bd56c78ba87a 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -745,6 +745,7 @@ int main(void)
 #endif
 
DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
+   DEFINE(PPC_DBELL_MSGTYPE, PPC_DBELL_MSGTYPE);
 
 #ifdef CONFIG_PPC_8xx
DEFINE(VIRT_IMMR_BASE, (u64)__fix_to_virt(FIX_IMMR_BASE));
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 3d75641c2566..9cc34547c3b6 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1612,6 +1612,25 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
b   1b
 
 /*
+ * When doorbell is triggered from system reset wakeup, the message is
+ * not cleared, so it would fire again when EE is enabled.
+ *
+ * When coming from local_irq_enable, there may be the same problem if
+ * we were hard disabled.
+ *
+ * Execute msgclr to clear pending exceptions before handling it.
+ */
+h_doorbell_common_msgclr:
+   LOAD_REG_IMMEDIATE(r3, PPC_DBELL_MSGTYPE << (63-36))
+   PPC_MSGCLR(3)
+   b   h_doorbell_common
+
+doorbell_super_common_msgclr:
+   LOAD_REG_IMMEDIATE(r3, PPC_DBELL_MSGTYPE << (63-36))
+   PPC_MSGCLRP(3)
+   b   doorbell_super_common
+
+/*
  * Called from arch_local_irq_enable when an interrupt needs
  * to be resent. r3 contains 0x500, 0x900, 0xa00 or 0xe80 to indicate
  * which kind of interrupt. MSR:EE is already off. We generate a
@@ -1636,14 +1655,14 @@ _GLOBAL(__replay_interrupt)
beq hardware_interrupt_common
 BEGIN_FTR_SECTION
cmpwi   r3,0xe80
-   beq h_doorbell_common
+   beq h_doorbell_common_msgclr
cmpwi   r3,0xea0
beq h_virt_irq_common
cmpwi   r3,0xe60
beq hmi_exception_common
 FTR_SECTION_ELSE
cmpwi   r3,0xa00
-   beq doorbell_super_common
+   beq doorbell_super_common_msgclr
 ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
blr
 
@@ -1663,14 +1682,14 @@ 

[PATCH 04/14] powerpc/64s: idle process interrupts from system reset wakeup

2017-06-08 Thread Nicholas Piggin
When the CPU wakes from low power state, it begins at the system reset
interrupt with the exception that caused the wakeup encoded in SRR1.

Today, powernv idle wakeup ignores the wakeup reason (except a special
case for HMI), and the regular interrupt corresponding to the
exception will fire after the idle wakeup exits.

Change this to replay the interrupt from the idle wakeup before
interrupts are hard-enabled.

Test on POWER8 of context_switch selftests benchmark with polling idle
disabled (e.g., always nap, giving cross-CPU IPIs) gives the following
results:

original wakeup direct
Different threads, same core:   315k/s   264k/s
Different cores:235k/s   242k/s

There is a slowdown for doorbell IPI (same core) case because system
reset wakeup does not clear the message and the doorbell interrupt
fires again needlessly.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/hw_irq.h |  1 +
 arch/powerpc/kernel/exceptions-64s.S  | 28 
 arch/powerpc/platforms/powernv/idle.c | 12 
 3 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index eba60416536e..0ef9a33c139f 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -32,6 +32,7 @@
 #ifndef __ASSEMBLY__
 
 extern void __replay_interrupt(unsigned int vector);
+extern void __replay_wakeup_interrupt(unsigned long srr1);
 
 extern void timer_interrupt(struct pt_regs *);
 extern void performance_monitor_exception(struct pt_regs *regs);
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 2f700a15bfa3..3d75641c2566 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1646,3 +1646,31 @@ FTR_SECTION_ELSE
beq doorbell_super_common
 ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
blr
+
+/*
+ * Similar to __replay_interrupt but called from cpu idle wakeup
+ * with SRR1 wake value in r3.
+ */
+_GLOBAL(__replay_wakeup_interrupt)
+   extrdi  r3,r3,42,4  /* Get SRR1 wake reason in low bits */
+   mfmsr   r12
+   mflrr11
+   mfcrr9
+   ori r12,r12,MSR_EE
+   cmpwi   r3,0x6
+   beq decrementer_common
+   cmpwi   r3,0x8
+   beq hardware_interrupt_common
+BEGIN_FTR_SECTION
+   cmpwi   r3,0x3
+   beq h_doorbell_common
+   cmpwi   r3,0x9
+   beq h_virt_irq_common
+   cmpwi   r3,0xa
+   beq hmi_exception_common
+FTR_SECTION_ELSE
+   cmpwi   r3,0x5
+   beq doorbell_super_common
+ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
+   mtmsrd  r12,1
+   blr
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 78b4755b7947..7c83a95f929e 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -272,8 +272,10 @@ static unsigned long __power7_idle_type(unsigned long type)
 
 void power7_idle_type(unsigned long type)
 {
-   __power7_idle_type(type);
-   __hard_irq_enable();
+   unsigned long srr1;
+
+   srr1 = __power7_idle_type(type);
+   __replay_wakeup_interrupt(srr1);
 }
 
 void power7_idle(void)
@@ -310,8 +312,10 @@ static unsigned long __power9_idle_type(unsigned long 
stop_psscr_val,
 void power9_idle_type(unsigned long stop_psscr_val,
  unsigned long stop_psscr_mask)
 {
-   __power9_idle_type(stop_psscr_val, stop_psscr_mask);
-   __hard_irq_enable();
+   unsigned long srr1;
+
+   srr1 = __power9_idle_type(stop_psscr_val, stop_psscr_mask);
+   __replay_wakeup_interrupt(srr1);
 }
 
 /*
-- 
2.11.0



[PATCH 03/14] powerpc/64s: idle provide a default idle for POWER9

2017-06-08 Thread Nicholas Piggin
Before the cpuidle driver is enabled, provide a default idle
function similarly to POWER7/8.

This should not have much effect, because the cpuidle driver
for powernv is mandatory, but if that changes we should have
a fallback.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/platforms/powernv/idle.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 8562916b8cf7..78b4755b7947 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -649,6 +649,8 @@ static int __init pnv_init_idle_states(void)
 
if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
ppc_md.power_save = power7_idle;
+   else if (supported_cpuidle_states & OPAL_PM_STOP_INST_FAST)
+   ppc_md.power_save = power9_idle;
 
 out:
return 0;
-- 
2.11.0



[PATCH 02/14] powerpc/64s: idle hotplug lazy-irq simplification

2017-06-08 Thread Nicholas Piggin
Rather than concern ourselves with any soft-mask logic in the CPU
hotplug handler, just hard disable interrupts. This ensures there
are no lazy-irqs pending, which means we can call directly to idle
instruction in order to sleep.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/platforms/powernv/idle.c | 23 +++
 arch/powerpc/platforms/powernv/smp.c  | 29 ++---
 2 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index b82f3be23de4..8562916b8cf7 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -325,25 +325,31 @@ void power9_idle(void)
 /*
  * pnv_cpu_offline: A function that puts the CPU into the deepest
  * available platform idle state on a CPU-Offline.
+ * interrupts hard disabled and no lazy irq pending.
  */
 unsigned long pnv_cpu_offline(unsigned int cpu)
 {
unsigned long srr1;
-
u32 idle_states = pnv_get_supported_cpuidle_states();
 
+   ppc64_runlatch_off();
+
if (cpu_has_feature(CPU_FTR_ARCH_300) && deepest_stop_found) {
-   srr1 = power9_idle_stop(pnv_deepest_stop_psscr_val,
-   pnv_deepest_stop_psscr_mask);
+   unsigned long psscr;
+
+   psscr = mfspr(SPRN_PSSCR);
+   psscr = (psscr & ~pnv_deepest_stop_psscr_mask) |
+   pnv_deepest_stop_psscr_val;
+   srr1 = power9_idle_stop(psscr);
+
} else if (idle_states & OPAL_PM_WINKLE_ENABLED) {
-   srr1 = power7_idle_type(PNV_THREAD_WINKLE);
+   srr1 = power7_idle_insn(PNV_THREAD_WINKLE);
} else if ((idle_states & OPAL_PM_SLEEP_ENABLED) ||
   (idle_states & OPAL_PM_SLEEP_ENABLED_ER1)) {
-   srr1 = power7_idle_type(PNV_THREAD_SLEEP);
+   srr1 = power7_idle_insn(PNV_THREAD_SLEEP);
} else if (idle_states & OPAL_PM_NAP_ENABLED) {
-   srr1 = power7_idle_type(PNV_THREAD_NAP);
+   srr1 = power7_idle_insn(PNV_THREAD_NAP);
} else {
-   ppc64_runlatch_off();
/* This is the fallback method. We emulate snooze */
while (!generic_check_cpu_restart(cpu)) {
HMT_low();
@@ -351,9 +357,10 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
}
srr1 = 0;
HMT_medium();
-   ppc64_runlatch_on();
}
 
+   ppc64_runlatch_on();
+
return srr1;
 }
 
diff --git a/arch/powerpc/platforms/powernv/smp.c 
b/arch/powerpc/platforms/powernv/smp.c
index f8752795decf..c04c87adad94 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -144,7 +144,14 @@ static void pnv_smp_cpu_kill_self(void)
unsigned long srr1, wmask;
 
/* Standard hot unplug procedure */
-   local_irq_disable();
+   /*
+* This hard disables local interurpts, ensuring we have no lazy
+* irqs pending.
+*/
+   WARN_ON(irqs_disabled());
+   hard_irq_disable();
+   WARN_ON(lazy_irq_pending());
+
idle_task_exit();
current->active_mm = NULL; /* for sanity */
cpu = smp_processor_id();
@@ -162,16 +169,6 @@ static void pnv_smp_cpu_kill_self(void)
 */
mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1);
 
-   /*
-* Hard-disable interrupts, and then clear irq_happened flags
-* that we can safely ignore while off-line, since they
-* are for things for which we do no processing when off-line
-* (or in the case of HMI, all the processing we need to do
-* is done in lower-level real-mode code).
-*/
-   hard_irq_disable();
-   local_paca->irq_happened &= ~(PACA_IRQ_DEC | PACA_IRQ_HMI);
-
while (!generic_check_cpu_restart(cpu)) {
/*
 * Clear IPI flag, since we don't handle IPIs while
@@ -184,6 +181,8 @@ static void pnv_smp_cpu_kill_self(void)
 
srr1 = pnv_cpu_offline(cpu);
 
+   WARN_ON(lazy_irq_pending());
+
/*
 * If the SRR1 value indicates that we woke up due to
 * an external interrupt, then clear the interrupt.
@@ -196,8 +195,7 @@ static void pnv_smp_cpu_kill_self(void)
 * contains 0.
 */
if (((srr1 & wmask) == SRR1_WAKEEE) ||
-   ((srr1 & wmask) == SRR1_WAKEHVI) ||
-   (local_paca->irq_happened & PACA_IRQ_EE)) {
+   ((srr1 & wmask) == SRR1_WAKEHVI)) {
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
if (xive_enabled())
xive_flush_interrupt();
@@ -209,14 +207,15 @@ static void pnv_smp_cpu_kill_self(void)

[PATCH 01/14] powerpc/64s: idle move soft interrupt mask logic into C code

2017-06-08 Thread Nicholas Piggin
This simplifies the asm and fixes irq-off tracing over sleep
instructions.

Also move powersave_nap check for POWER8 into C code, and move
PSSCR register value calculation for POWER9 into C.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/machdep.h   |  1 +
 arch/powerpc/include/asm/processor.h | 10 ++--
 arch/powerpc/kernel/idle_book3s.S| 84 ++-
 arch/powerpc/kernel/irq.c|  3 +-
 arch/powerpc/platforms/powernv/idle.c| 85 +++-
 arch/powerpc/platforms/powernv/smp.c |  2 -
 arch/powerpc/platforms/powernv/subcore.c |  3 +-
 drivers/cpuidle/cpuidle-powernv.c| 12 ++---
 8 files changed, 105 insertions(+), 95 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index f90b22c722e1..cd2fc1cc1cc7 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -226,6 +226,7 @@ struct machdep_calls {
 extern void e500_idle(void);
 extern void power4_idle(void);
 extern void power7_idle(void);
+extern void power9_idle(void);
 extern void ppc6xx_idle(void);
 extern void book3e_idle(void);
 
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 586c0b72a155..832775771bd3 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -501,11 +501,11 @@ extern unsigned long cpuidle_disable;
 enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
 
 extern int powersave_nap;  /* set if nap mode can be used in idle loop */
-extern unsigned long power7_nap(int check_irq);
-extern unsigned long power7_sleep(void);
-extern unsigned long power7_winkle(void);
-extern unsigned long power9_idle_stop(unsigned long stop_psscr_val,
- unsigned long stop_psscr_mask);
+extern unsigned long power7_idle_insn(unsigned long type); /* 
PNV_THREAD_NAP/etc*/
+extern void power7_idle_type(unsigned long type);
+extern unsigned long power9_idle_stop(unsigned long psscr_val);
+extern void power9_idle_type(unsigned long stop_psscr_val,
+ unsigned long stop_psscr_mask);
 
 extern void flush_instruction_cache(void);
 extern void hard_reset_now(void);
diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index 4898d676dcae..c7edb374d1aa 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -106,13 +106,9 @@ core_idle_lock_held:
 /*
  * Pass requested state in r3:
  * r3 - PNV_THREAD_NAP/SLEEP/WINKLE in POWER8
- *- Requested STOP state in POWER9
+ *- Requested PSSCR value in POWER9
  *
- * To check IRQ_HAPPENED in r4
- * 0 - don't check
- * 1 - check
- *
- * Address to 'rfid' to in r5
+ * Address of idle handler to 'rfid' to in r4
  */
 pnv_powersave_common:
/* Use r3 to pass state nap/sleep/winkle */
@@ -128,30 +124,7 @@ pnv_powersave_common:
std r0,_LINK(r1)
std r0,_NIP(r1)
 
-   /* Hard disable interrupts */
-   mfmsr   r9
-   rldicl  r9,r9,48,1
-   rotldi  r9,r9,16
-   mtmsrd  r9,1/* hard-disable interrupts */
-
-   /* Check if something happened while soft-disabled */
-   lbz r0,PACAIRQHAPPENED(r13)
-   andi.   r0,r0,~PACA_IRQ_HARD_DIS@l
-   beq 1f
-   cmpwi   cr0,r4,0
-   beq 1f
-   addir1,r1,INT_FRAME_SIZE
-   ld  r0,16(r1)
-   li  r3,0/* Return 0 (no nap) */
-   mtlrr0
-   blr
-
-1: /* We mark irqs hard disabled as this is the state we'll
-* be in when returning and we need to tell arch_local_irq_restore()
-* about it
-*/
-   li  r0,PACA_IRQ_HARD_DIS
-   stb r0,PACAIRQHAPPENED(r13)
+   mfmsr   r9
 
/* We haven't lost state ... yet */
li  r0,0
@@ -160,8 +133,8 @@ pnv_powersave_common:
/* Continue saving state */
SAVE_GPR(2, r1)
SAVE_NVGPRS(r1)
-   mfcrr4
-   std r4,_CCR(r1)
+   mfcrr5
+   std r5,_CCR(r1)
std r9,_MSR(r1)
std r1,PACAR1(r13)
 
@@ -175,7 +148,7 @@ pnv_powersave_common:
li  r6, MSR_RI
andcr6, r9, r6
mtmsrd  r6, 1   /* clear RI before setting SRR0/1 */
-   mtspr   SPRN_SRR0, r5
+   mtspr   SPRN_SRR0, r4
mtspr   SPRN_SRR1, r7
rfid
 
@@ -319,35 +292,14 @@ lwarx_loop_stop:
 
IDLE_STATE_ENTER_SEQ_NORET(PPC_STOP)
 
-_GLOBAL(power7_idle)
+/*
+ * Entered with MSR[EE]=0 and no soft-masked interrupts pending.
+ * r3 contains desired idle state (PNV_THREAD_NAP/SLEEP/WINKLE).
+ */
+_GLOBAL(power7_idle_insn)
/* Now check if user or arch enabled NAP mode */
-   LOAD_REG_ADDRBASE(r3,powersave_nap)
-   lwz r4,ADDROFF(powersave_nap)(r3)
-   cmpwi   0,r4,0
-   beqlr
-   li  r3, 1
-

[PATCH 00/14] idle performance improvements

2017-06-08 Thread Nicholas Piggin
These patches improve performance of idle sleep and wake. The
first patches rework the lazy-irq handling of idle code a bit
to make it simpler first.

Any review would be welcome. I've tested this with some
performance and simple correctness tests on POWER8, POWER9,
and with KVM on POWER8, so it's about ready to review now
I hope.

Thanks,
Nick

Nicholas Piggin (14):
  powerpc/64s: idle move soft interrupt mask logic into C code
  powerpc/64s: idle hotplug lazy-irq simplification
  powerpc/64s: idle provide a default idle for POWER9
  powerpc/64s: idle process interrupts from system reset wakeup
  powerpc/64s: msgclr when handling doorbell exceptions
  powerpc/64s: interrupt replay balance the return branch predictor
  powerpc/64s: idle branch to handler with virtual mode offset
  powerpc/64s: idle avoid SRR usage in idle sleep/wake paths
  powerpc/64s: idle hmi wakeup is unlikely
  powerpc/64s: cpuidle set polling before enabling irqs
  powerpc/64s: cpuidle read mostly for common globals
  powerpc/64s: cpuidle no memory barrier after break from idle
  powerpc/64: runlatch CTRL[RUN] set optimisation
  powerpc/64s: idle runlatch switch is done with MSR[EE]=0

 arch/powerpc/include/asm/dbell.h |  13 +++
 arch/powerpc/include/asm/exception-64s.h |  17 +++-
 arch/powerpc/include/asm/hw_irq.h|   1 +
 arch/powerpc/include/asm/machdep.h   |   1 +
 arch/powerpc/include/asm/ppc-opcode.h|   3 +
 arch/powerpc/include/asm/processor.h |  10 +--
 arch/powerpc/kernel/asm-offsets.c|   1 +
 arch/powerpc/kernel/exceptions-64s.S |  62 --
 arch/powerpc/kernel/idle_book3s.S| 137 +--
 arch/powerpc/kernel/irq.c|   3 +-
 arch/powerpc/kernel/process.c|  12 +--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |   8 +-
 arch/powerpc/platforms/powernv/idle.c| 104 +++
 arch/powerpc/platforms/powernv/smp.c |  31 ---
 arch/powerpc/platforms/powernv/subcore.c |   3 +-
 drivers/cpuidle/cpuidle-powernv.c|  37 +
 drivers/cpuidle/cpuidle-pseries.c|  22 +++--
 17 files changed, 288 insertions(+), 177 deletions(-)

-- 
2.11.0



[PATCH 4/4] powerpc/64s: context switch avoid cpabort

2017-06-08 Thread Nicholas Piggin
The ISA v3.0B copy-paste facility only requires cpabort when switching
to a process that has foreign real addresses mapped (direct access to
accelerators), to clear a potential copy buffer filled by a previous
thread. There is no accelerator driver implemented yet, so cpabort can
be removed. It can be be re-added when a driver is implemented.

POWER9 DD1 requires the copy buffer to always be cleared on context
switch, but if accelerators are not in use, then an unpaired copy from
a dummy region is sufficient to clear data out of the copy buffer.

This increases context switch performance by about 5% on POWER9.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ppc-opcode.h |  8 
 arch/powerpc/kernel/entry_64.S|  9 -
 arch/powerpc/kernel/process.c | 27 ++-
 3 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 3a8d278e7421..3b6bbf5a8683 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -189,8 +189,7 @@
 /* sorted alphabetically */
 #define PPC_INST_BHRBE 0x7c00025c
 #define PPC_INST_CLRBHRB   0x7c00035c
-#define PPC_INST_COPY  0x7c00060c
-#define PPC_INST_COPY_FIRST0x7c20060c
+#define PPC_INST_COPY  0x7c20060c
 #define PPC_INST_CP_ABORT  0x7c00068c
 #define PPC_INST_DCBA  0x7c0005ec
 #define PPC_INST_DCBA_MASK 0xfc0007fe
@@ -223,8 +222,7 @@
 #define PPC_INST_MSGSNDP   0x7c00011c
 #define PPC_INST_MTTMR 0x7c0003dc
 #define PPC_INST_NOP   0x6000
-#define PPC_INST_PASTE 0x7c00070c
-#define PPC_INST_PASTE_LAST0x7c20070d
+#define PPC_INST_PASTE 0x7c20070d
 #define PPC_INST_POPCNTB   0x7cf4
 #define PPC_INST_POPCNTB_MASK  0xfc0007fe
 #define PPC_INST_POPCNTD   0x7c0003f4
@@ -392,6 +390,8 @@
 
 /* Deal with instructions that older assemblers aren't aware of */
 #definePPC_CP_ABORTstringify_in_c(.long PPC_INST_CP_ABORT)
+#definePPC_COPY(a, b)  stringify_in_c(.long PPC_INST_COPY | \
+   ___PPC_RA(a) | ___PPC_RB(b))
 #definePPC_DCBAL(a, b) stringify_in_c(.long PPC_INST_DCBAL | \
__PPC_RA(a) | __PPC_RB(b))
 #definePPC_DCBZL(a, b) stringify_in_c(.long PPC_INST_DCBZL | \
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index fb143859cc68..da9486e2fd89 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -536,15 +536,6 @@ _GLOBAL(_switch)
 * which contains larx/stcx, which will clear any reservation
 * of the task being switched.
 */
-
-BEGIN_FTR_SECTION
-/*
- * A cp_abort (copy paste abort) here ensures that when context switching, a
- * copy from one process can't leak into the paste of another.
- */
-   PPC_CP_ABORT
-END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
-
 #ifdef CONFIG_PPC_BOOK3S
 /* Cancel all explict user streams as they will have no use after context
  * switch and will stop the HW from creating streams itself
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 45faa9a32a01..6273b5d5baec 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1137,6 +1137,11 @@ static inline void restore_sprs(struct thread_struct 
*old_thread,
 #endif
 }
 
+#ifdef CONFIG_PPC_BOOK3S_64
+#define CP_SIZE 128
+static const u8 dummy_copy_buffer[CP_SIZE] __attribute__((aligned(CP_SIZE)));
+#endif
+
 struct task_struct *__switch_to(struct task_struct *prev,
struct task_struct *new)
 {
@@ -1226,8 +1231,28 @@ struct task_struct *__switch_to(struct task_struct *prev,
batch->active = 1;
}
 
-   if (current_thread_info()->task->thread.regs)
+   if (current_thread_info()->task->thread.regs) {
restore_math(current_thread_info()->task->thread.regs);
+
+   /*
+* The copy-paste buffer can only store into foreign real
+* addresses, so unprivileged processes can not see the
+* data or use it in any way unless they have foreign real
+* mappings. We don't have a VAS driver that allocates those
+* yet, so no cpabort is required.
+*/
+   if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
+   /*
+* DD1 allows paste into normal system memory, so we
+* do an unpaired copy here to clear the buffer and
+* prevent a covert channel being set up.
+*
+* cpabort is not used because it is quite expensive.
+ 

[PATCH 3/4] powerpc/64: context switch an hwsync instruction can be avoided

2017-06-08 Thread Nicholas Piggin
The hwsync in the context switch code to prevent MMIO access being
reordered from the point of view of a single process if it gets
migrated to a different CPU is not required because there is an hwsync
performed earlier in the context switch path.

Comment this so it's clear enough if anything changes on the scheduler
or the powerpc sides. Remove the hwsync from _switch.

This improves context switch performance by 2-3% on POWER8.

Cc: Peter Zijlstra 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/barrier.h |  5 +
 arch/powerpc/kernel/entry_64.S | 23 +--
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/barrier.h 
b/arch/powerpc/include/asm/barrier.h
index c0deafc212b8..25d42bd3f114 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -74,6 +74,11 @@ do { 
\
___p1;  \
 })
 
+/*
+ * This must resolve to hwsync on SMP for the context switch path.
+ * See _switch, and core scheduler context switch memory ordering
+ * comments.
+ */
 #define smp_mb__before_spinlock()   smp_mb()
 
 #include 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 273a35926534..fb143859cc68 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -512,13 +512,24 @@ _GLOBAL(_switch)
std r23,_CCR(r1)
std r1,KSP(r3)  /* Set old stack pointer */
 
-#ifdef CONFIG_SMP
-   /* We need a sync somewhere here to make sure that if the
-* previous task gets rescheduled on another CPU, it sees all
-* stores it has performed on this one.
+   /*
+* On SMP kernels, care must be taken because a task may be
+* scheduled off CPUx and on to CPUy. Memory ordering must be
+* considered.
+*
+* Cacheable stores on CPUx will be visible when the task is
+* scheduled on CPUy by virtue of the core scheduler barriers
+* (see "Notes on Program-Order guarantees on SMP systems." in
+* kernel/sched/core.c).
+*
+* Uncacheable stores in the case of involuntary preemption must
+* be taken care of. The smp_mb__before_spin_lock() in __schedule()
+* is implemented as hwsync on powerpc, which orders MMIO too. So
+* long as there is an hwsync in the context switch path, it will
+* be executed on the source CPU after the task has performed
+* all MMIO ops on that CPU, and on the destination CPU before the
+* task performs any MMIO ops there.
 */
-   sync
-#endif /* CONFIG_SMP */
 
/*
 * The kernel context switch path must contain a spin_lock,
-- 
2.11.0



[PATCH 2/4] powerpc/64: context switch avoid reservation-clearing instruction

2017-06-08 Thread Nicholas Piggin
There is no need to break reservation in _switch, because we are
guranteed that context switch path will include a larx/stcx.

Comment the guarantee and remove the reservation clear from _switch.

This is worth 1-2% in context switch performance.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/entry_64.S | 11 +++
 kernel/sched/core.c|  6 ++
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 91f9fdc2d027..273a35926534 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -521,15 +521,10 @@ _GLOBAL(_switch)
 #endif /* CONFIG_SMP */
 
/*
-* If we optimise away the clear of the reservation in system
-* calls because we know the CPU tracks the address of the
-* reservation, then we need to clear it here to cover the
-* case that the kernel context switch path has no larx
-* instructions.
+* The kernel context switch path must contain a spin_lock,
+* which contains larx/stcx, which will clear any reservation
+* of the task being switched.
 */
-BEGIN_FTR_SECTION
-   ldarx   r6,0,r1
-END_FTR_SECTION_IFSET(CPU_FTR_STCX_CHECKS_ADDRESS)
 
 BEGIN_FTR_SECTION
 /*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 803c3bc274c4..1f0688ad09d7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2875,6 +2875,12 @@ context_switch(struct rq *rq, struct task_struct *prev,
rq_unpin_lock(rq, rf);
spin_release(>lock.dep_map, 1, _THIS_IP_);
 
+   /*
+* Some architectures require that a spin lock is taken before
+* _switch. The rq_lock satisfies this condition. See powerpc
+* _switch for details.
+*/
+
/* Here we just switch the register state and the stack. */
switch_to(prev, next, prev);
barrier();
-- 
2.11.0



[PATCH 1/4] powerpc/64s: context switch leave interrupts hard enabled for radix

2017-06-08 Thread Nicholas Piggin
Commit 4387e9ff25 ("[POWERPC] Fix PMU + soft interrupt disable bug")
hard disabled interrupts over the low level context switch, because
the SLB management can't cope with a PMU interrupt accesing the stack
in that window.

Radix based kernel mapping does not use the SLB so it does not require
interrupts hard disabled here.

This is worth 1-2% in context switch performance on POWER9.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/entry_64.S |  8 
 arch/powerpc/kernel/process.c  | 14 --
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6f70ea821a07..91f9fdc2d027 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -607,6 +607,14 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
   top of the kernel stack. */
addir7,r7,THREAD_SIZE-SWITCH_FRAME_SIZE
 
+   /*
+* PMU interrupts in radix may come in here. They will use r1, not
+* PACAKSAVE, so this stack switch will not cause a problem. They
+* will store to the process stack, which may then be migrated to
+* another CPU. However the rq lock release on this CPU paired with
+* the rq lock acquire on the new CPU before the stack becomes
+* active on the new CPU, will order those stores.
+*/
mr  r1,r8   /* start using new stack pointer */
std r7,PACAKSAVE(r13)
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 5cbb8b1faf7e..45faa9a32a01 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1199,12 +1199,14 @@ struct task_struct *__switch_to(struct task_struct 
*prev,
 
__switch_to_tm(prev, new);
 
-   /*
-* We can't take a PMU exception inside _switch() since there is a
-* window where the kernel stack SLB and the kernel stack are out
-* of sync. Hard disable here.
-*/
-   hard_irq_disable();
+   if (!radix_enabled()) {
+   /*
+* We can't take a PMU exception inside _switch() since there
+* is a window where the kernel stack SLB and the kernel stack
+* are out of sync. Hard disable here.
+*/
+   hard_irq_disable();
+   }
 
/*
 * Call restore_sprs() before calling _switch(). If we move it after
-- 
2.11.0



[PATCH 2/2] powerpc/64: syscall avoid restore_math call if possible

2017-06-08 Thread Nicholas Piggin
The syscall exit code that branches to restore_math is quite heavy on
Book3S, consisting of 2 mtmsr instructions. Threads that don't use both
FP and vector can get caught here if the kernel ever uses FP or vector.
Lazy-FP/vec context switching also trips this case.

So check for lazy FP and vector before switching RI for restore_math.
Move most of this case out of line.

For threads that do want to restore math registers, the MSR switches are
still suboptimal. Future direction may be to use a soft-RI bit to avoid
MSR switches in kernel (similar to soft-EE), but for now at least the
no-restore

POWER9 context switch rate increases by about 5% due to sched_yield(2)
return performance. I haven't constructed a test to measure the syscall
cost.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/entry_64.S | 62 +-
 arch/powerpc/kernel/process.c  |  4 +++
 2 files changed, 47 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index bfbad08a1207..6f70ea821a07 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -210,27 +210,17 @@ system_call:  /* label this so stack 
traces look sane */
andi.   
r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
bne-syscall_exit_work
 
-   andi.   r0,r8,MSR_FP
-   beq 2f
+   /* If MSR_FP and MSR_VEC are set in user msr, then no need to restore */
+   li  r7,MSR_FP
 #ifdef CONFIG_ALTIVEC
-   andis.  r0,r8,MSR_VEC@h
-   bne 3f
+   orisr7,r7,MSR_VEC@h
 #endif
-2: addir3,r1,STACK_FRAME_OVERHEAD
-#ifdef CONFIG_PPC_BOOK3S
-   li  r10,MSR_RI
-   mtmsrd  r10,1   /* Restore RI */
-#endif
-   bl  restore_math
-#ifdef CONFIG_PPC_BOOK3S
-   li  r11,0
-   mtmsrd  r11,1
-#endif
-   ld  r8,_MSR(r1)
-   ld  r3,RESULT(r1)
-   li  r11,-MAX_ERRNO
+   and r0,r8,r7
+   cmpdr0,r7
+   bne syscall_restore_math
+.Lsyscall_restore_math_cont:
 
-3: cmpld   r3,r11
+   cmpld   r3,r11
ld  r5,_CCR(r1)
bge-syscall_error
 .Lsyscall_error_cont:
@@ -263,7 +253,41 @@ syscall_error:
neg r3,r3
std r5,_CCR(r1)
b   .Lsyscall_error_cont
-   
+
+syscall_restore_math:
+   /*
+* Some initial tests from restore_math to avoid the heavyweight
+* C code entry and MSR manipulations.
+*/
+   LOAD_REG_IMMEDIATE(r0, MSR_TS_MASK)
+   and.r0,r0,r8
+   bne 1f
+
+   ld  r7,PACACURRENT(r13)
+   lbz r0,THREAD+THREAD_LOAD_FP(r7)
+#ifdef CONFIG_ALTIVEC
+   lbz r6,THREAD+THREAD_LOAD_VEC(r7)
+   add r0,r0,r6
+#endif
+   cmpdi   r0,0
+   beq .Lsyscall_restore_math_cont
+
+1: addir3,r1,STACK_FRAME_OVERHEAD
+#ifdef CONFIG_PPC_BOOK3S
+   li  r10,MSR_RI
+   mtmsrd  r10,1   /* Restore RI */
+#endif
+   bl  restore_math
+#ifdef CONFIG_PPC_BOOK3S
+   li  r11,0
+   mtmsrd  r11,1
+#endif
+   /* Restore volatiles, reload MSR from updated one */
+   ld  r8,_MSR(r1)
+   ld  r3,RESULT(r1)
+   li  r11,-MAX_ERRNO
+   b   .Lsyscall_restore_math_cont
+
 /* Traced system call support */
 syscall_dotrace:
bl  save_nvgprs
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index baae104b16c7..5cbb8b1faf7e 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -511,6 +511,10 @@ void restore_math(struct pt_regs *regs)
 {
unsigned long msr;
 
+   /*
+* Syscall exit makes a similar initial check before branching
+* to restore_math. Keep them in synch.
+*/
if (!msr_tm_active(regs->msr) &&
!current->thread.load_fp && !loadvec(current->thread))
return;
-- 
2.11.0



[PATCH 1/2] powerpc/64s: syscall optimize hypercall/syscall entry

2017-06-08 Thread Nicholas Piggin
After bc3551257a ("powerpc/64: Allow for relocation-on interrupts from
guest to host"), a getppid() system call goes from 307 cycles to 358
cycles (+17%) on POWER8. This is due significantly to the scratch SPR
used by the hypercall check.

It turns out there are a some volatile registers common to both system
call and hypercall (in particular, r12, cr0, ctr), which can be used to
avoid the SPR and some other overheads. This brings getppid to 320 cycles
(+4%).

Testing hcall entry performance by running "sc 1" in guest userspace
before this patch is 854 cycles, afterwards is 826. Also a small win
there.

POWER9 syscall is improved by about the same amount, hcall not tested.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 134 +--
 1 file changed, 97 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index ae418b85c17c..2f700a15bfa3 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -821,46 +821,80 @@ EXC_VIRT(trap_0b, 0x4b00, 0x100, 0xb00)
 TRAMP_KVM(PACA_EXGEN, 0xb00)
 EXC_COMMON(trap_0b_common, 0xb00, unknown_exception)
 
+/*
+ * system call / hypercall (0xc00, 0x4c00)
+ *
+ * The system call exception is invoked with "sc 0" and does not alter HV bit.
+ * There is support for kernel code to invoke system calls but there are no
+ * in-tree users.
+ *
+ * The hypercall is invoked with "sc 1" and sets HV=1.
+ *
+ * In HPT, sc 1 always goes to 0xc00 real mode. In RADIX, sc 1 can go to
+ * 0x4c00 virtual mode.
+ *
+ * Call convention:
+ *
+ * syscall register convention is in Documentation/powerpc/syscall64-abi.txt
+ *
+ * For hypercalls, the register convention is as follows:
+ * r0 volatile
+ * r1-2 nonvolatile
+ * r3 volatile parameter and return value for status
+ * r4-r10 volatile input and output value
+ * r11 volatile hypercall number and output value
+ * r12 volatile
+ * r13-r31 nonvolatile
+ * LR nonvolatile
+ * CTR volatile
+ * XER volatile
+ * CR0-1 CR5-7 volatile
+ * CR2-4 nonvolatile
+ * Other registers nonvolatile
+ *
+ * The intersection of volatile registers that don't contain possible
+ * inputs is: r12, cr0, xer, ctr. We may use these as scratch regs
+ * upon entry without saving.
+ */
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
-/*
- * If CONFIG_KVM_BOOK3S_64_HANDLER is set, save the PPR (on systems
- * that support it) before changing to HMT_MEDIUM. That allows the KVM
- * code to save that value into the guest state (it is the guest's PPR
- * value). Otherwise just change to HMT_MEDIUM as userspace has
- * already saved the PPR.
- */
+   /*
+* There is a little bit of juggling to get syscall and hcall
+* working well. Save r10 in ctr to be restored in case it is a
+* hcall.
+*
+* Userspace syscalls have already saved the PPR, hcalls must save
+* it before setting HMT_MEDIUM.
+*/
 #define SYSCALL_KVMTEST
\
-   SET_SCRATCH0(r13);  \
+   mr  r12,r13;\
GET_PACA(r13);  \
-   std r9,PACA_EXGEN+EX_R9(r13);   \
-   OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR); \
+   mtctr   r10;\
+   KVMTEST_PR(0xc00); /* uses r10, branch to do_kvm_0xc00_system_call */ \
HMT_MEDIUM; \
-   std r10,PACA_EXGEN+EX_R10(r13); \
-   OPT_SAVE_REG_TO_PACA(PACA_EXGEN+EX_PPR, r9, CPU_FTR_HAS_PPR);   \
-   mfcrr9; \
-   KVMTEST_PR(0xc00);  \
-   GET_SCRATCH0(r13)
+   mr  r9,r12; \
 
 #else
 #define SYSCALL_KVMTEST
\
-   HMT_MEDIUM
+   HMT_MEDIUM; \
+   mr  r9,r13; \
+   GET_PACA(r13);
 #endif

 #define LOAD_SYSCALL_HANDLER(reg)  \
__LOAD_HANDLER(reg, system_call_common)
 
-/* Syscall routine is used twice, in reloc-off and reloc-on paths */
-#define SYSCALL_PSERIES_1  \
+#define SYSCALL_FASTENDIAN_TEST\
 BEGIN_FTR_SECTION  \
cmpdi   r0,0x1ebe ; \
beq-1f ;\
 END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)   

Re: [PATCH v2 1/4] powerpc/kprobes: Pause function_graph tracing during jprobes handling

2017-06-08 Thread Steven Rostedt
On Thu, 8 Jun 2017 22:43:54 +0900
Masami Hiramatsu  wrote:

> On Thu,  1 Jun 2017 16:18:15 +0530
> "Naveen N. Rao"  wrote:
> 
> > This fixes a crash when function_graph and jprobes are used together.
> > This is essentially commit 237d28db036e ("ftrace/jprobes/x86: Fix
> > conflict between jprobes and function graph tracing"), but for powerpc.
> > 
> > Jprobes breaks function_graph tracing since the jprobe hook needs to use
> > jprobe_return(), which never returns back to the hook, but instead to
> > the original jprobe'd function. The solution is to momentarily pause
> > function_graph tracing before invoking the jprobe hook and re-enable it
> > when returning back to the original jprobe'd function.
> > 
> > Fixes: 6794c78243bfd ("powerpc64: port of the function graph tracer")
> > Signed-off-by: Naveen N. Rao   
> 
> Acked-by: Masami Hiramatsu 

For what it's worth...

Acked-by: Steven Rostedt (VMware) 

-- Steve

> 
> Thanks!
> 
> > ---
> > v2: No changes since v1.
> > 
> >  arch/powerpc/kernel/kprobes.c | 11 +++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> > index e1fb95385a9e..0554d6e66194 100644
> > --- a/arch/powerpc/kernel/kprobes.c
> > +++ b/arch/powerpc/kernel/kprobes.c
> > @@ -610,6 +610,15 @@ int setjmp_pre_handler(struct kprobe *p, struct 
> > pt_regs *regs)
> > regs->gpr[2] = (unsigned long)(((func_descr_t *)jp->entry)->toc);
> >  #endif
> >  
> > +   /*
> > +* jprobes use jprobe_return() which skips the normal return
> > +* path of the function, and this messes up the accounting of the
> > +* function graph tracer.
> > +*
> > +* Pause function graph tracing while performing the jprobe function.
> > +*/
> > +   pause_graph_tracing();
> > +
> > return 1;
> >  }
> >  NOKPROBE_SYMBOL(setjmp_pre_handler);
> > @@ -635,6 +644,8 @@ int longjmp_break_handler(struct kprobe *p, struct 
> > pt_regs *regs)
> >  * saved regs...
> >  */
> > memcpy(regs, >jprobe_saved_regs, sizeof(struct pt_regs));
> > +   /* It's OK to start function graph tracing again */
> > +   unpause_graph_tracing();
> > preempt_enable_no_resched();
> > return 1;
> >  }
> > -- 
> > 2.12.2
> >   
> 
> 



Re: [PATCH] include/linux/vfio.h: Guard powerpc-specific functions with CONFIG_VFIO_SPAPR_EEH

2017-06-08 Thread Murilo Opsfelder Araújo
On 06/08/2017 10:10 AM, Alexey Kardashevskiy wrote:
[...]
> The config you attached in the first mail has CONFIG_VFIO_SPAPR_EEH=m, here
> is my confusion. The config from the link below does not have KVM_BOOK3S_64
> which selects SPAPR_TCE_IOMMU and which in turn selects VFIO_IOMMU_SPAPR_TCE.
> 
> So
> https://github.com/0day-ci/linux/commit/36ed1ddb05e132aa3cfbb610f0f8402a0774da12
> looks correct.

It wasn't me that attached the .config.gz, it was this 0dayci robot.

When CONFIG_VFIO_SPAPR_EEH=m, there is no definition of it in autoconf.h, only
CONFIG_VFIO_SPAPR_EEH_MODULE is defined:

$ grep 'VFIO_SPAPR_EEH' ./include/generated/autoconf.h
#define CONFIG_VFIO_SPAPR_EEH_MODULE 1

In this case, `#ifdef CONFIG_VFIO_SPAPR_EEH` will be false. That's why my v1
patch failed with the 0dayci .config and robot reported back.

This was addressed in my v2 patch using the IS_ENABLED() macro, which checks for
both CONFIG_ and CONFIG__MODULE definitions.

-- 
Murilo


Re: [PATCH 25/44] arm: implement ->mapping_error

2017-06-08 Thread Russell King - ARM Linux
BOn Thu, Jun 08, 2017 at 03:25:50PM +0200, Christoph Hellwig wrote:
> +static int dmabounce_mapping_error(struct device *dev, dma_addr_t dma_addr)
> +{
> + if (dev->archdata.dmabounce)
> + return 0;

I'm not convinced that we need this check here:

dev->archdata.dmabounce = device_info;
set_dma_ops(dev, _ops);

There shouldn't be any chance of dev->archdata.dmabounce being NULL if
the dmabounce_ops has been set as the current device DMA ops.  So I
think that test can be killed.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH 28/44] sparc: remove arch specific dma_supported implementations

2017-06-08 Thread David Miller
From: Christoph Hellwig 
Date: Thu,  8 Jun 2017 15:25:53 +0200

> Usually dma_supported decisions are done by the dma_map_ops instance.
> Switch sparc to that model by providing a ->dma_supported instance for
> sbus that always returns false, and implementations tailored to the sun4u
> and sun4v cases for sparc64, and leave it unimplemented for PCI on
> sparc32, which means always supported.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: David S. Miller 


Re: [PATCH 20/44] sparc: implement ->mapping_error

2017-06-08 Thread David Miller
From: Christoph Hellwig 
Date: Thu,  8 Jun 2017 15:25:45 +0200

> DMA_ERROR_CODE is going to go away, so don't rely on it.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: David S. Miller 


Re: [PATCH 28/44] sparc: remove arch specific dma_supported implementations

2017-06-08 Thread Julian Calaby
Hi Christoph,

On Thu, Jun 8, 2017 at 11:25 PM, Christoph Hellwig  wrote:
> Usually dma_supported decisions are done by the dma_map_ops instance.
> Switch sparc to that model by providing a ->dma_supported instance for
> sbus that always returns false, and implementations tailored to the sun4u
> and sun4v cases for sparc64, and leave it unimplemented for PCI on
> sparc32, which means always supported.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/sparc/include/asm/dma-mapping.h |  3 ---
>  arch/sparc/kernel/iommu.c| 40 
> +++-
>  arch/sparc/kernel/ioport.c   | 22 ++--
>  arch/sparc/kernel/pci_sun4v.c| 17 +++
>  4 files changed, 39 insertions(+), 43 deletions(-)
>
> diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
> index dd081d557609..12894f259bea 100644
> --- a/arch/sparc/kernel/ioport.c
> +++ b/arch/sparc/kernel/ioport.c
> @@ -401,6 +401,11 @@ static void sbus_sync_sg_for_device(struct device *dev, 
> struct scatterlist *sg,
> BUG();
>  }
>
> +static int sbus_dma_supported(struct device *dev, u64 mask)
> +{
> +   return 0;
> +}
> +

I'm guessing there's a few places that have DMA ops but DMA isn't
actually supported. Why not have a common method for this, maybe
"dma_not_supported"?

>  static const struct dma_map_ops sbus_dma_ops = {
> .alloc  = sbus_alloc_coherent,
> .free   = sbus_free_coherent,
> @@ -410,6 +415,7 @@ static const struct dma_map_ops sbus_dma_ops = {
> .unmap_sg   = sbus_unmap_sg,
> .sync_sg_for_cpu= sbus_sync_sg_for_cpu,
> .sync_sg_for_device = sbus_sync_sg_for_device,
> +   .dma_supported  = sbus_dma_supported,
>  };
>
>  static int __init sparc_register_ioport(void)

Thanks,

-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/


Re: [PATCH 27/44] sparc: remove leon_dma_ops

2017-06-08 Thread David Miller
From: Christoph Hellwig 
Date: Thu,  8 Jun 2017 15:25:52 +0200

> We can just use pci32_dma_ops.
> 
> Btw, given that leon is 32-bit and appears to be PCI based, do even need
> the special case for it in get_arch_dma_ops at all?

I would need to defer to the LEON developers on that, but they haven't
been very actively lately so whether you'll get a response or not is
hard to predict.

> Signed-off-by: Christoph Hellwig 

Acked-by: David S. Miller 


Re: [PATCH 02/44] ibmveth: properly unwind on init errors

2017-06-08 Thread David Miller
From: Christoph Hellwig 
Date: Thu,  8 Jun 2017 15:25:27 +0200

> That way the driver doesn't have to rely on DMA_ERROR_CODE, which
> is not a public API and going away.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: David S. Miller 


Re: clean up and modularize arch dma_mapping interface

2017-06-08 Thread David Miller
From: Christoph Hellwig 
Date: Thu,  8 Jun 2017 15:25:25 +0200

> for a while we have a generic implementation of the dma mapping routines
> that call into per-arch or per-device operations.  But right now there
> still are various bits in the interfaces where don't clearly operate
> on these ops.  This series tries to clean up a lot of those (but not all
> yet, but the series is big enough).  It gets rid of the DMA_ERROR_CODE
> way of signaling failures of the mapping routines from the
> implementations to the generic code (and cleans up various drivers that
> were incorrectly using it), and gets rid of the ->set_dma_mask routine
> in favor of relying on the ->dma_capable method that can be used in
> the same way, but which requires less code duplication.

There is unlikely to be conflicts for the sparc and net changes, so I
will simply ACK them.

Thanks Christoph.


Re: [PATCH 16/44] arm64: remove DMA_ERROR_CODE

2017-06-08 Thread Robin Murphy
On 08/06/17 14:25, Christoph Hellwig wrote:
> The dma alloc interface returns an error by return NULL, and the
> mapping interfaces rely on the mapping_error method, which the dummy
> ops already implement correctly.
> 
> Thus remove the DMA_ERROR_CODE define.

Reviewed-by: Robin Murphy 

> Signed-off-by: Christoph Hellwig 
> ---
>  arch/arm64/include/asm/dma-mapping.h | 1 -
>  arch/arm64/mm/dma-mapping.c  | 3 +--
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/dma-mapping.h 
> b/arch/arm64/include/asm/dma-mapping.h
> index 5392dbeffa45..cf8fc8f05580 100644
> --- a/arch/arm64/include/asm/dma-mapping.h
> +++ b/arch/arm64/include/asm/dma-mapping.h
> @@ -24,7 +24,6 @@
>  #include 
>  #include 
>  
> -#define DMA_ERROR_CODE   (~(dma_addr_t)0)
>  extern const struct dma_map_ops dummy_dma_ops;
>  
>  static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type 
> *bus)
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 3216e098c058..147fbb907a2f 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -184,7 +184,6 @@ static void *__dma_alloc(struct device *dev, size_t size,
>  no_map:
>   __dma_free_coherent(dev, size, ptr, *dma_handle, attrs);
>  no_mem:
> - *dma_handle = DMA_ERROR_CODE;
>   return NULL;
>  }
>  
> @@ -487,7 +486,7 @@ static dma_addr_t __dummy_map_page(struct device *dev, 
> struct page *page,
>  enum dma_data_direction dir,
>  unsigned long attrs)
>  {
> - return DMA_ERROR_CODE;
> + return 0;
>  }
>  
>  static void __dummy_unmap_page(struct device *dev, dma_addr_t dev_addr,
> 



Re: [PATCH v2 3/4] powerpc/kprobes_on_ftrace: Skip livepatch_handler() for jprobes

2017-06-08 Thread Masami Hiramatsu
On Thu,  1 Jun 2017 16:18:17 +0530
"Naveen N. Rao"  wrote:

> ftrace_caller() depends on a modified regs->nip to detect if a certain
> function has been livepatched. However, with KPROBES_ON_FTRACE, it is
> possible for regs->nip to have been modified by the kprobes pre_handler
> (jprobes, for instance). In this case, we do not want to invoke the
> livepatch_handler so as not to consume the livepatch stack.
> 
> To distinguish between the two (kprobes and livepatch), we check if
> there is an active kprobe on the current function. If there is, then we
> know for sure that it must have modified the NIP as we don't support
> livepatching a kprobe'd function. In this case, we simply skip the
> livepatch_handler and branch to the new NIP. Otherwise, the
> livepatch_handler is invoked.

OK, this logic seems good to me, but I weould like to leave it for
PPC64 maintainer too.

Reviewed-by: Masami Hiramatsu 

Thanks!
> 
> Fixes: ead514d5fb30a ("powerpc/kprobes: Add support for
> KPROBES_ON_FTRACE")
> Signed-off-by: Naveen N. Rao 
> ---
> v2: Changed to use current_kprobe->addr to determine jprobe vs.
> livepatch.
> 
>  arch/powerpc/include/asm/kprobes.h |  1 +
>  arch/powerpc/kernel/kprobes.c  |  6 +
>  arch/powerpc/kernel/trace/ftrace_64_mprofile.S | 32 
> ++
>  3 files changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kprobes.h 
> b/arch/powerpc/include/asm/kprobes.h
> index a83821f33ea3..8814a7249ceb 100644
> --- a/arch/powerpc/include/asm/kprobes.h
> +++ b/arch/powerpc/include/asm/kprobes.h
> @@ -103,6 +103,7 @@ extern int kprobe_exceptions_notify(struct notifier_block 
> *self,
>  extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
>  extern int kprobe_handler(struct pt_regs *regs);
>  extern int kprobe_post_handler(struct pt_regs *regs);
> +extern int is_current_kprobe_addr(unsigned long addr);
>  #ifdef CONFIG_KPROBES_ON_FTRACE
>  extern int skip_singlestep(struct kprobe *p, struct pt_regs *regs,
>  struct kprobe_ctlblk *kcb);
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index 0554d6e66194..ec9d94c82fbd 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -43,6 +43,12 @@ DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
>  
>  struct kretprobe_blackpoint kretprobe_blacklist[] = {{NULL, NULL}};
>  
> +int is_current_kprobe_addr(unsigned long addr)
> +{
> + struct kprobe *p = kprobe_running();
> + return (p && (unsigned long)p->addr == addr) ? 1 : 0;
> +}
> +
>  bool arch_within_kprobe_blacklist(unsigned long addr)
>  {
>   return  (addr >= (unsigned long)__kprobes_text_start &&
> diff --git a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S 
> b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
> index fa0921410fa4..e6837e85ec28 100644
> --- a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
> +++ b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
> @@ -99,13 +99,37 @@ ftrace_call:
>   bl  ftrace_stub
>   nop
>  
> - /* Load ctr with the possibly modified NIP */
> - ld  r3, _NIP(r1)
> - mtctr   r3
> + /* Load the possibly modified NIP */
> + ld  r15, _NIP(r1)
> +
>  #ifdef CONFIG_LIVEPATCH
> - cmpdr14,r3  /* has NIP been altered? */
> + cmpdr14, r15/* has NIP been altered? */
> +#endif
> +
> +#if defined(CONFIG_LIVEPATCH) && defined(CONFIG_KPROBES_ON_FTRACE)
> + beq 1f
> +
> + /* Check if there is an active kprobe on us */
> + subir3, r14, 4
> + bl  is_current_kprobe_addr
> + nop
> +
> + /*
> +  * If r3 == 1, then this is a kprobe/jprobe.
> +  * else, this is livepatched function.
> +  *
> +  * The subsequent conditional branch for livepatch_handler
> +  * will use the result of this compare. For kprobe/jprobe,
> +  * we just need to branch to the new NIP, so nothing special
> +  * is needed.
> +  */
> + cmpdi   r3, 1
> +1:
>  #endif
>  
> + /* Load CTR with the possibly modified NIP */
> + mtctr   r15
> +
>   /* Restore gprs */
>   REST_GPR(0,r1)
>   REST_10GPRS(2,r1)
> -- 
> 2.12.2
> 


-- 
Masami Hiramatsu 


Re: [PATCH 06/44] iommu/dma: don't rely on DMA_ERROR_CODE

2017-06-08 Thread Robin Murphy
Hi Christoph,

On 08/06/17 14:25, Christoph Hellwig wrote:
> DMA_ERROR_CODE is not a public API and will go away soon.  dma dma-iommu
> driver already implements a proper ->mapping_error method, so it's only
> using the value internally.  Add a new local define using the value
> that arm64 which is the only current user of dma-iommu.

It would be fine to just use 0, since dma-iommu already makes sure that
that will never be allocated for a valid DMA address.

Otherwise, looks good!

Robin.

> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/iommu/dma-iommu.c | 18 ++
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 62618e77bedc..638aea814b94 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -31,6 +31,8 @@
>  #include 
>  #include 
>  
> +#define IOMMU_MAPPING_ERROR  (~(dma_addr_t)0)
> +
>  struct iommu_dma_msi_page {
>   struct list_headlist;
>   dma_addr_t  iova;
> @@ -500,7 +502,7 @@ void iommu_dma_free(struct device *dev, struct page 
> **pages, size_t size,
>  {
>   __iommu_dma_unmap(iommu_get_domain_for_dev(dev), *handle, size);
>   __iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
> - *handle = DMA_ERROR_CODE;
> + *handle = IOMMU_MAPPING_ERROR;
>  }
>  
>  /**
> @@ -533,7 +535,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t 
> size, gfp_t gfp,
>   dma_addr_t iova;
>   unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
>  
> - *handle = DMA_ERROR_CODE;
> + *handle = IOMMU_MAPPING_ERROR;
>  
>   min_size = alloc_sizes & -alloc_sizes;
>   if (min_size < PAGE_SIZE) {
> @@ -627,11 +629,11 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
> phys_addr_t phys,
>  
>   iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);
>   if (!iova)
> - return DMA_ERROR_CODE;
> + return IOMMU_MAPPING_ERROR;
>  
>   if (iommu_map(domain, iova, phys - iova_off, size, prot)) {
>   iommu_dma_free_iova(cookie, iova, size);
> - return DMA_ERROR_CODE;
> + return IOMMU_MAPPING_ERROR;
>   }
>   return iova + iova_off;
>  }
> @@ -671,7 +673,7 @@ static int __finalise_sg(struct device *dev, struct 
> scatterlist *sg, int nents,
>  
>   s->offset += s_iova_off;
>   s->length = s_length;
> - sg_dma_address(s) = DMA_ERROR_CODE;
> + sg_dma_address(s) = IOMMU_MAPPING_ERROR;
>   sg_dma_len(s) = 0;
>  
>   /*
> @@ -714,11 +716,11 @@ static void __invalidate_sg(struct scatterlist *sg, int 
> nents)
>   int i;
>  
>   for_each_sg(sg, s, nents, i) {
> - if (sg_dma_address(s) != DMA_ERROR_CODE)
> + if (sg_dma_address(s) != IOMMU_MAPPING_ERROR)
>   s->offset += sg_dma_address(s);
>   if (sg_dma_len(s))
>   s->length = sg_dma_len(s);
> - sg_dma_address(s) = DMA_ERROR_CODE;
> + sg_dma_address(s) = IOMMU_MAPPING_ERROR;
>   sg_dma_len(s) = 0;
>   }
>  }
> @@ -836,7 +838,7 @@ void iommu_dma_unmap_resource(struct device *dev, 
> dma_addr_t handle,
>  
>  int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
>  {
> - return dma_addr == DMA_ERROR_CODE;
> + return dma_addr == IOMMU_MAPPING_ERROR;
>  }
>  
>  static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
> 



Re: [RFC][PATCH 5/5] powerpc: Remove SYNC from _switch

2017-06-08 Thread Peter Zijlstra
On Thu, Jun 08, 2017 at 11:18:13PM +1000, Nicholas Piggin wrote:
> I guess I see your point... okay, will constrain the comment to powerpc
> context switch and primitives code. Any fundamental change to such
> scheduler barriers I guess would require at least a glance over arch
> switch code :)

Just so.

> My plan is to send the powerpc sync removal patch for hopefully 4.13
> merge. I'm pretty sure it will be equally happy with your patches.
> Unless you can see any problems with it? More eyes would be welcome.

Feel free to Cc me. I don't see a problem with removing that SYNC now,
I'll rebase my patches on top.


Re: [PATCH v2 1/4] powerpc/kprobes: Pause function_graph tracing during jprobes handling

2017-06-08 Thread Masami Hiramatsu
On Thu,  1 Jun 2017 16:18:15 +0530
"Naveen N. Rao"  wrote:

> This fixes a crash when function_graph and jprobes are used together.
> This is essentially commit 237d28db036e ("ftrace/jprobes/x86: Fix
> conflict between jprobes and function graph tracing"), but for powerpc.
> 
> Jprobes breaks function_graph tracing since the jprobe hook needs to use
> jprobe_return(), which never returns back to the hook, but instead to
> the original jprobe'd function. The solution is to momentarily pause
> function_graph tracing before invoking the jprobe hook and re-enable it
> when returning back to the original jprobe'd function.
> 
> Fixes: 6794c78243bfd ("powerpc64: port of the function graph tracer")
> Signed-off-by: Naveen N. Rao 

Acked-by: Masami Hiramatsu 

Thanks!

> ---
> v2: No changes since v1.
> 
>  arch/powerpc/kernel/kprobes.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index e1fb95385a9e..0554d6e66194 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -610,6 +610,15 @@ int setjmp_pre_handler(struct kprobe *p, struct pt_regs 
> *regs)
>   regs->gpr[2] = (unsigned long)(((func_descr_t *)jp->entry)->toc);
>  #endif
>  
> + /*
> +  * jprobes use jprobe_return() which skips the normal return
> +  * path of the function, and this messes up the accounting of the
> +  * function graph tracer.
> +  *
> +  * Pause function graph tracing while performing the jprobe function.
> +  */
> + pause_graph_tracing();
> +
>   return 1;
>  }
>  NOKPROBE_SYMBOL(setjmp_pre_handler);
> @@ -635,6 +644,8 @@ int longjmp_break_handler(struct kprobe *p, struct 
> pt_regs *regs)
>* saved regs...
>*/
>   memcpy(regs, >jprobe_saved_regs, sizeof(struct pt_regs));
> + /* It's OK to start function graph tracing again */
> + unpause_graph_tracing();
>   preempt_enable_no_resched();
>   return 1;
>  }
> -- 
> 2.12.2
> 


-- 
Masami Hiramatsu 


Re: [PATCH 11/16] powerpc: vio_cmo: use dev_groups and not dev_attrs for bus_type

2017-06-08 Thread Greg Kroah-Hartman
On Thu, Jun 08, 2017 at 11:12:10PM +1000, Michael Ellerman wrote:
> Greg Kroah-Hartman  writes:
> 
> > The dev_attrs field has long been "depreciated" and is finally being
> > removed, so move the driver to use the "correct" dev_groups field
> > instead for struct bus_type.
> >
> > Cc: Benjamin Herrenschmidt 
> > Cc: Paul Mackerras 
> > Cc: Michael Ellerman 
> > Cc: Vineet Gupta 
> > Cc: Bart Van Assche 
> > Cc: Robin Murphy 
> > Cc: Joerg Roedel 
> > Cc: Johan Hovold 
> > Cc: Alexey Kardashevskiy 
> > Cc: Krzysztof Kozlowski 
> > Cc: 
> > Signed-off-by: Greg Kroah-Hartman 
> > ---
> >  arch/powerpc/platforms/pseries/vio.c | 37 
> > +---
> >  1 file changed, 22 insertions(+), 15 deletions(-)
> 
> This one needed a bit more work to get building, the incremental diff is
> below. We need a forward declaration of name, devspec and modalias,
> which is a bit weird, but that's how the code is currently structured.
> And there's dev and bus attributes with the same name, so that needed an
> added "bus".
> 
> I booted v2 of patch 10 and this one and everything looks identical to
> upstream.

Ah, many thanks, this was on my todo list to fix up today.

But you renamed the sysfs files when you added "bus" to the function
names, are you sure you want to do that?  I don't mind, but if you
happen to have userspace tools that look at those files, they just broke
:(

thanks,

greg k-h


[PATCH 44/44] powerpc: merge __dma_set_mask into dma_set_mask

2017-06-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/dma-mapping.h |  1 -
 arch/powerpc/kernel/dma.c  | 13 -
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h 
b/arch/powerpc/include/asm/dma-mapping.h
index 73aedbe6c977..eaece3d3e225 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -112,7 +112,6 @@ static inline void set_dma_offset(struct device *dev, 
dma_addr_t off)
 #define HAVE_ARCH_DMA_SET_MASK 1
 extern int dma_set_mask(struct device *dev, u64 dma_mask);
 
-extern int __dma_set_mask(struct device *dev, u64 dma_mask);
 extern u64 __dma_get_required_mask(struct device *dev);
 
 static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t 
size)
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 466c9f07b288..4194db10 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -314,14 +314,6 @@ EXPORT_SYMBOL(dma_set_coherent_mask);
 
 #define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
 
-int __dma_set_mask(struct device *dev, u64 dma_mask)
-{
-   if (!dev->dma_mask || !dma_supported(dev, dma_mask))
-   return -EIO;
-   *dev->dma_mask = dma_mask;
-   return 0;
-}
-
 int dma_set_mask(struct device *dev, u64 dma_mask)
 {
if (ppc_md.dma_set_mask)
@@ -334,7 +326,10 @@ int dma_set_mask(struct device *dev, u64 dma_mask)
return phb->controller_ops.dma_set_mask(pdev, dma_mask);
}
 
-   return __dma_set_mask(dev, dma_mask);
+   if (!dev->dma_mask || !dma_supported(dev, dma_mask))
+   return -EIO;
+   *dev->dma_mask = dma_mask;
+   return 0;
 }
 EXPORT_SYMBOL(dma_set_mask);
 
-- 
2.11.0



[PATCH 43/44] dma-mapping: remove the set_dma_mask method

2017-06-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/kernel/dma.c   | 4 
 include/linux/dma-mapping.h | 6 --
 2 files changed, 10 deletions(-)

diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 41c749586bd2..466c9f07b288 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -316,10 +316,6 @@ EXPORT_SYMBOL(dma_set_coherent_mask);
 
 int __dma_set_mask(struct device *dev, u64 dma_mask)
 {
-   const struct dma_map_ops *dma_ops = get_dma_ops(dev);
-
-   if ((dma_ops != NULL) && (dma_ops->set_dma_mask != NULL))
-   return dma_ops->set_dma_mask(dev, dma_mask);
if (!dev->dma_mask || !dma_supported(dev, dma_mask))
return -EIO;
*dev->dma_mask = dma_mask;
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 3e5908656226..527f2ed8c645 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -127,7 +127,6 @@ struct dma_map_ops {
   enum dma_data_direction dir);
int (*mapping_error)(struct device *dev, dma_addr_t dma_addr);
int (*dma_supported)(struct device *dev, u64 mask);
-   int (*set_dma_mask)(struct device *dev, u64 mask);
 #ifdef ARCH_HAS_DMA_GET_REQUIRED_MASK
u64 (*get_required_mask)(struct device *dev);
 #endif
@@ -563,11 +562,6 @@ static inline int dma_supported(struct device *dev, u64 
mask)
 #ifndef HAVE_ARCH_DMA_SET_MASK
 static inline int dma_set_mask(struct device *dev, u64 mask)
 {
-   const struct dma_map_ops *ops = get_dma_ops(dev);
-
-   if (ops->set_dma_mask)
-   return ops->set_dma_mask(dev, mask);
-
if (!dev->dma_mask || !dma_supported(dev, mask))
return -EIO;
*dev->dma_mask = mask;
-- 
2.11.0



[PATCH 42/44] powerpc/cell: use the dma_supported method for ops switching

2017-06-08 Thread Christoph Hellwig
Besides removing the last instance of the set_dma_mask method this also
reduced the code duplication.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/platforms/cell/iommu.c | 25 +
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/cell/iommu.c 
b/arch/powerpc/platforms/cell/iommu.c
index 497bfbdbd967..29d4f96ed33e 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -644,20 +644,14 @@ static void dma_fixed_unmap_sg(struct device *dev, struct 
scatterlist *sg,
   direction, attrs);
 }
 
-static int dma_fixed_dma_supported(struct device *dev, u64 mask)
-{
-   return mask == DMA_BIT_MASK(64);
-}
-
-static int dma_set_mask_and_switch(struct device *dev, u64 dma_mask);
+static int dma_suported_and_switch(struct device *dev, u64 dma_mask);
 
 static const struct dma_map_ops dma_iommu_fixed_ops = {
.alloc  = dma_fixed_alloc_coherent,
.free   = dma_fixed_free_coherent,
.map_sg = dma_fixed_map_sg,
.unmap_sg   = dma_fixed_unmap_sg,
-   .dma_supported  = dma_fixed_dma_supported,
-   .set_dma_mask   = dma_set_mask_and_switch,
+   .dma_supported  = dma_suported_and_switch,
.map_page   = dma_fixed_map_page,
.unmap_page = dma_fixed_unmap_page,
.mapping_error  = dma_iommu_mapping_error,
@@ -952,11 +946,8 @@ static u64 cell_iommu_get_fixed_address(struct device *dev)
return dev_addr;
 }
 
-static int dma_set_mask_and_switch(struct device *dev, u64 dma_mask)
+static int dma_suported_and_switch(struct device *dev, u64 dma_mask)
 {
-   if (!dev->dma_mask || !dma_supported(dev, dma_mask))
-   return -EIO;
-
if (dma_mask == DMA_BIT_MASK(64) &&
cell_iommu_get_fixed_address(dev) != OF_BAD_ADDR) {
u64 addr = cell_iommu_get_fixed_address(dev) +
@@ -965,14 +956,16 @@ static int dma_set_mask_and_switch(struct device *dev, 
u64 dma_mask)
dev_dbg(dev, "iommu: fixed addr = %llx\n", addr);
set_dma_ops(dev, _iommu_fixed_ops);
set_dma_offset(dev, addr);
-   } else {
+   return 1;
+   }
+
+   if (dma_iommu_dma_supported(dev, dma_mask)) {
dev_dbg(dev, "iommu: not 64-bit, using default ops\n");
set_dma_ops(dev, get_pci_dma_ops());
cell_dma_dev_setup(dev);
+   return 1;
}
 
-   *dev->dma_mask = dma_mask;
-
return 0;
 }
 
@@ -1127,7 +1120,7 @@ static int __init cell_iommu_fixed_mapping_init(void)
cell_iommu_setup_window(iommu, np, dbase, dsize, 0);
}
 
-   dma_iommu_ops.set_dma_mask = dma_set_mask_and_switch;
+   dma_iommu_ops.dma_supported = dma_suported_and_switch;
set_pci_dma_ops(_iommu_ops);
 
return 0;
-- 
2.11.0



[PATCH 41/44] powerpc/cell: clean up fixed mapping dma_ops initialization

2017-06-08 Thread Christoph Hellwig
By the time cell_pci_dma_dev_setup calls cell_dma_dev_setup no device can
have the fixed map_ops set yet as it's only set by the set_dma_mask
method.  So move the setup for the fixed case to be only called in that
place instead of indirecting through cell_dma_dev_setup.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/platforms/cell/iommu.c | 27 +++
 1 file changed, 7 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/platforms/cell/iommu.c 
b/arch/powerpc/platforms/cell/iommu.c
index 948086e33a0c..497bfbdbd967 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -663,14 +663,9 @@ static const struct dma_map_ops dma_iommu_fixed_ops = {
.mapping_error  = dma_iommu_mapping_error,
 };
 
-static void cell_dma_dev_setup_fixed(struct device *dev);
-
 static void cell_dma_dev_setup(struct device *dev)
 {
-   /* Order is important here, these are not mutually exclusive */
-   if (get_dma_ops(dev) == _iommu_fixed_ops)
-   cell_dma_dev_setup_fixed(dev);
-   else if (get_pci_dma_ops() == _iommu_ops)
+   if (get_pci_dma_ops() == _iommu_ops)
set_iommu_table_base(dev, cell_get_iommu_table(dev));
else if (get_pci_dma_ops() == _direct_ops)
set_dma_offset(dev, cell_dma_direct_offset);
@@ -963,32 +958,24 @@ static int dma_set_mask_and_switch(struct device *dev, 
u64 dma_mask)
return -EIO;
 
if (dma_mask == DMA_BIT_MASK(64) &&
-   cell_iommu_get_fixed_address(dev) != OF_BAD_ADDR)
-   {
+   cell_iommu_get_fixed_address(dev) != OF_BAD_ADDR) {
+   u64 addr = cell_iommu_get_fixed_address(dev) +
+   dma_iommu_fixed_base;
dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
+   dev_dbg(dev, "iommu: fixed addr = %llx\n", addr);
set_dma_ops(dev, _iommu_fixed_ops);
+   set_dma_offset(dev, addr);
} else {
dev_dbg(dev, "iommu: not 64-bit, using default ops\n");
set_dma_ops(dev, get_pci_dma_ops());
+   cell_dma_dev_setup(dev);
}
 
-   cell_dma_dev_setup(dev);
-
*dev->dma_mask = dma_mask;
 
return 0;
 }
 
-static void cell_dma_dev_setup_fixed(struct device *dev)
-{
-   u64 addr;
-
-   addr = cell_iommu_get_fixed_address(dev) + dma_iommu_fixed_base;
-   set_dma_offset(dev, addr);
-
-   dev_dbg(dev, "iommu: fixed addr = %llx\n", addr);
-}
-
 static void insert_16M_pte(unsigned long addr, unsigned long *ptab,
   unsigned long base_pte)
 {
-- 
2.11.0



[PATCH 40/44] tile: remove dma_supported and mapping_error methods

2017-06-08 Thread Christoph Hellwig
These just duplicate the default behavior if no method is provided.

Signed-off-by: Christoph Hellwig 
---
 arch/tile/kernel/pci-dma.c | 30 --
 1 file changed, 30 deletions(-)

diff --git a/arch/tile/kernel/pci-dma.c b/arch/tile/kernel/pci-dma.c
index 569bb6dd154a..f2abedc8a080 100644
--- a/arch/tile/kernel/pci-dma.c
+++ b/arch/tile/kernel/pci-dma.c
@@ -317,18 +317,6 @@ static void tile_dma_sync_sg_for_device(struct device *dev,
}
 }
 
-static inline int
-tile_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
-{
-   return 0;
-}
-
-static inline int
-tile_dma_supported(struct device *dev, u64 mask)
-{
-   return 1;
-}
-
 static const struct dma_map_ops tile_default_dma_map_ops = {
.alloc = tile_dma_alloc_coherent,
.free = tile_dma_free_coherent,
@@ -340,8 +328,6 @@ static const struct dma_map_ops tile_default_dma_map_ops = {
.sync_single_for_device = tile_dma_sync_single_for_device,
.sync_sg_for_cpu = tile_dma_sync_sg_for_cpu,
.sync_sg_for_device = tile_dma_sync_sg_for_device,
-   .mapping_error = tile_dma_mapping_error,
-   .dma_supported = tile_dma_supported
 };
 
 const struct dma_map_ops *tile_dma_map_ops = _default_dma_map_ops;
@@ -504,18 +490,6 @@ static void tile_pci_dma_sync_sg_for_device(struct device 
*dev,
}
 }
 
-static inline int
-tile_pci_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
-{
-   return 0;
-}
-
-static inline int
-tile_pci_dma_supported(struct device *dev, u64 mask)
-{
-   return 1;
-}
-
 static const struct dma_map_ops tile_pci_default_dma_map_ops = {
.alloc = tile_pci_dma_alloc_coherent,
.free = tile_pci_dma_free_coherent,
@@ -527,8 +501,6 @@ static const struct dma_map_ops 
tile_pci_default_dma_map_ops = {
.sync_single_for_device = tile_pci_dma_sync_single_for_device,
.sync_sg_for_cpu = tile_pci_dma_sync_sg_for_cpu,
.sync_sg_for_device = tile_pci_dma_sync_sg_for_device,
-   .mapping_error = tile_pci_dma_mapping_error,
-   .dma_supported = tile_pci_dma_supported
 };
 
 const struct dma_map_ops *gx_pci_dma_map_ops = _pci_default_dma_map_ops;
@@ -578,8 +550,6 @@ static const struct dma_map_ops pci_hybrid_dma_ops = {
.sync_single_for_device = tile_pci_dma_sync_single_for_device,
.sync_sg_for_cpu = tile_pci_dma_sync_sg_for_cpu,
.sync_sg_for_device = tile_pci_dma_sync_sg_for_device,
-   .mapping_error = tile_pci_dma_mapping_error,
-   .dma_supported = tile_pci_dma_supported
 };
 
 const struct dma_map_ops *gx_legacy_pci_dma_map_ops = _swiotlb_dma_ops;
-- 
2.11.0



[PATCH 39/44] xen-swiotlb: remove xen_swiotlb_set_dma_mask

2017-06-08 Thread Christoph Hellwig
This just duplicates the generic implementation.

Signed-off-by: Christoph Hellwig 
---
 drivers/xen/swiotlb-xen.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index c3a04b2d7532..82fc54f8eb77 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -661,17 +661,6 @@ xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
return xen_virt_to_bus(xen_io_tlb_end - 1) <= mask;
 }
 
-static int
-xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask)
-{
-   if (!dev->dma_mask || !xen_swiotlb_dma_supported(dev, dma_mask))
-   return -EIO;
-
-   *dev->dma_mask = dma_mask;
-
-   return 0;
-}
-
 /*
  * Create userspace mapping for the DMA-coherent memory.
  * This function should be called with the pages from the current domain only,
@@ -734,7 +723,6 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
.map_page = xen_swiotlb_map_page,
.unmap_page = xen_swiotlb_unmap_page,
.dma_supported = xen_swiotlb_dma_supported,
-   .set_dma_mask = xen_swiotlb_set_dma_mask,
.mmap = xen_swiotlb_dma_mmap,
.get_sgtable = xen_swiotlb_get_sgtable,
.mapping_error  = xen_swiotlb_mapping_error,
-- 
2.11.0



[PATCH 38/44] arm: implement ->dma_supported instead of ->set_dma_mask

2017-06-08 Thread Christoph Hellwig
Same behavior, less code duplication.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/common/dmabounce.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index 4aabf117e136..d89a0b56b245 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -445,12 +445,12 @@ static void dmabounce_sync_for_device(struct device *dev,
arm_dma_ops.sync_single_for_device(dev, handle, size, dir);
 }
 
-static int dmabounce_set_mask(struct device *dev, u64 dma_mask)
+static int dmabounce_dma_supported(struct device *dev, u64 dma_mask)
 {
if (dev->archdata.dmabounce)
return 0;
 
-   return arm_dma_ops.set_dma_mask(dev, dma_mask);
+   return arm_dma_ops.dma_supported(dev, dma_mask);
 }
 
 static int dmabounce_mapping_error(struct device *dev, dma_addr_t dma_addr)
@@ -474,9 +474,8 @@ static const struct dma_map_ops dmabounce_ops = {
.unmap_sg   = arm_dma_unmap_sg,
.sync_sg_for_cpu= arm_dma_sync_sg_for_cpu,
.sync_sg_for_device = arm_dma_sync_sg_for_device,
-   .set_dma_mask   = dmabounce_set_mask,
+   .dma_supported  = dmabounce_dma_supported,
.mapping_error  = dmabounce_mapping_error,
-   .dma_supported  = arm_dma_supported,
 };
 
 static int dmabounce_init_pool(struct dmabounce_pool *pool, struct device *dev,
-- 
2.11.0



[PATCH 37/44] mips/loongson64: implement ->dma_supported instead of ->set_dma_mask

2017-06-08 Thread Christoph Hellwig
Same behavior, less code duplication.

Signed-off-by: Christoph Hellwig 
---
 arch/mips/loongson64/common/dma-swiotlb.c | 19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/arch/mips/loongson64/common/dma-swiotlb.c 
b/arch/mips/loongson64/common/dma-swiotlb.c
index 178ca17a5667..34486c138206 100644
--- a/arch/mips/loongson64/common/dma-swiotlb.c
+++ b/arch/mips/loongson64/common/dma-swiotlb.c
@@ -75,19 +75,11 @@ static void loongson_dma_sync_sg_for_device(struct device 
*dev,
mb();
 }
 
-static int loongson_dma_set_mask(struct device *dev, u64 mask)
+static int loongson_dma_supported(struct device *dev, u64 mask)
 {
-   if (!dev->dma_mask || !dma_supported(dev, mask))
-   return -EIO;
-
-   if (mask > DMA_BIT_MASK(loongson_sysconf.dma_mask_bits)) {
-   *dev->dma_mask = DMA_BIT_MASK(loongson_sysconf.dma_mask_bits);
-   return -EIO;
-   }
-
-   *dev->dma_mask = mask;
-
-   return 0;
+   if (mask > DMA_BIT_MASK(loongson_sysconf.dma_mask_bits))
+   return 0;
+   return swiotlb_dma_supported(dev, mask);
 }
 
 dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
@@ -126,8 +118,7 @@ static const struct dma_map_ops loongson_dma_map_ops = {
.sync_sg_for_cpu = swiotlb_sync_sg_for_cpu,
.sync_sg_for_device = loongson_dma_sync_sg_for_device,
.mapping_error = swiotlb_dma_mapping_error,
-   .dma_supported = swiotlb_dma_supported,
-   .set_dma_mask = loongson_dma_set_mask
+   .dma_supported = loongson_dma_supported,
 };
 
 void __init plat_swiotlb_setup(void)
-- 
2.11.0



[PATCH 35/44] x86: remove arch specific dma_supported implementation

2017-06-08 Thread Christoph Hellwig
And instead wire it up as method for all the dma_map_ops instances.

Note that this also means the arch specific check will be fully instead
of partially applied in the AMD iommu driver.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/include/asm/dma-mapping.h | 3 ---
 arch/x86/include/asm/iommu.h   | 2 ++
 arch/x86/kernel/amd_gart_64.c  | 1 +
 arch/x86/kernel/pci-calgary_64.c   | 1 +
 arch/x86/kernel/pci-dma.c  | 7 +--
 arch/x86/kernel/pci-nommu.c| 1 +
 arch/x86/pci/sta2x11-fixup.c   | 3 ++-
 drivers/iommu/amd_iommu.c  | 2 ++
 drivers/iommu/intel-iommu.c| 3 +++
 9 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/dma-mapping.h 
b/arch/x86/include/asm/dma-mapping.h
index c35d228aa381..398c79889f5c 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -33,9 +33,6 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
 bool arch_dma_alloc_attrs(struct device **dev, gfp_t *gfp);
 #define arch_dma_alloc_attrs arch_dma_alloc_attrs
 
-#define HAVE_ARCH_DMA_SUPPORTED 1
-extern int dma_supported(struct device *hwdev, u64 mask);
-
 extern void *dma_generic_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_addr, gfp_t flag,
unsigned long attrs);
diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index 793869879464..fca144a104e4 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -6,6 +6,8 @@ extern int force_iommu, no_iommu;
 extern int iommu_detected;
 extern int iommu_pass_through;
 
+int x86_dma_supported(struct device *dev, u64 mask);
+
 /* 10 seconds */
 #define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000)
 
diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
index 815dd63f49d0..cc0e8bc0ea3f 100644
--- a/arch/x86/kernel/amd_gart_64.c
+++ b/arch/x86/kernel/amd_gart_64.c
@@ -704,6 +704,7 @@ static const struct dma_map_ops gart_dma_ops = {
.alloc  = gart_alloc_coherent,
.free   = gart_free_coherent,
.mapping_error  = gart_mapping_error,
+   .dma_supported  = x86_dma_supported,
 };
 
 static void gart_iommu_shutdown(void)
diff --git a/arch/x86/kernel/pci-calgary_64.c b/arch/x86/kernel/pci-calgary_64.c
index e75b490f2b0b..5286a4a92cf7 100644
--- a/arch/x86/kernel/pci-calgary_64.c
+++ b/arch/x86/kernel/pci-calgary_64.c
@@ -493,6 +493,7 @@ static const struct dma_map_ops calgary_dma_ops = {
.map_page = calgary_map_page,
.unmap_page = calgary_unmap_page,
.mapping_error = calgary_mapping_error,
+   .dma_supported = x86_dma_supported,
 };
 
 static inline void __iomem * busno_to_bbar(unsigned char num)
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 3a216ec869cd..b6f5684be3b5 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -213,10 +213,8 @@ static __init int iommu_setup(char *p)
 }
 early_param("iommu", iommu_setup);
 
-int dma_supported(struct device *dev, u64 mask)
+int x86_dma_supported(struct device *dev, u64 mask)
 {
-   const struct dma_map_ops *ops = get_dma_ops(dev);
-
 #ifdef CONFIG_PCI
if (mask > 0x && forbid_dac > 0) {
dev_info(dev, "PCI: Disallowing DAC for device\n");
@@ -224,9 +222,6 @@ int dma_supported(struct device *dev, u64 mask)
}
 #endif
 
-   if (ops->dma_supported)
-   return ops->dma_supported(dev, mask);
-
/* Copied from i386. Doesn't make much sense, because it will
   only work for pci_alloc_coherent.
   The caller just has to use GFP_DMA in this case. */
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index 085fe6ce4049..a6d404087fe3 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -104,4 +104,5 @@ const struct dma_map_ops nommu_dma_ops = {
.sync_sg_for_device = nommu_sync_sg_for_device,
.is_phys= 1,
.mapping_error  = nommu_mapping_error,
+   .dma_supported  = x86_dma_supported,
 };
diff --git a/arch/x86/pci/sta2x11-fixup.c b/arch/x86/pci/sta2x11-fixup.c
index ec008e800b45..53d600217973 100644
--- a/arch/x86/pci/sta2x11-fixup.c
+++ b/arch/x86/pci/sta2x11-fixup.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define STA2X11_SWIOTLB_SIZE (4*1024*1024)
 extern int swiotlb_late_init_with_default_size(size_t default_size);
@@ -191,7 +192,7 @@ static const struct dma_map_ops sta2x11_dma_ops = {
.sync_sg_for_cpu = swiotlb_sync_sg_for_cpu,
.sync_sg_for_device = swiotlb_sync_sg_for_device,
.mapping_error = swiotlb_dma_mapping_error,
-   .dma_supported = NULL, /* FIXME: we should use this instead! */
+   .dma_supported = x86_dma_supported,
 };
 

[PATCH 36/44] dma-mapping: remove HAVE_ARCH_DMA_SUPPORTED

2017-06-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index a57875309bfd..3e5908656226 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -549,7 +549,6 @@ static inline int dma_mapping_error(struct device *dev, 
dma_addr_t dma_addr)
return 0;
 }
 
-#ifndef HAVE_ARCH_DMA_SUPPORTED
 static inline int dma_supported(struct device *dev, u64 mask)
 {
const struct dma_map_ops *ops = get_dma_ops(dev);
@@ -560,7 +559,6 @@ static inline int dma_supported(struct device *dev, u64 
mask)
return 1;
return ops->dma_supported(dev, mask);
 }
-#endif
 
 #ifndef HAVE_ARCH_DMA_SET_MASK
 static inline int dma_set_mask(struct device *dev, u64 mask)
-- 
2.11.0



[PATCH 34/44] arm: remove arch specific dma_supported implementation

2017-06-08 Thread Christoph Hellwig
And instead wire it up as method for all the dma_map_ops instances.

Note that the code seems a little fishy for dmabounce and iommu, but
for now I'd like to preserve the existing behavior 1:1.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/common/dmabounce.c| 1 +
 arch/arm/include/asm/dma-iommu.h   | 2 ++
 arch/arm/include/asm/dma-mapping.h | 3 ---
 arch/arm/mm/dma-mapping.c  | 7 +--
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index bad457395ff1..4aabf117e136 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -476,6 +476,7 @@ static const struct dma_map_ops dmabounce_ops = {
.sync_sg_for_device = arm_dma_sync_sg_for_device,
.set_dma_mask   = dmabounce_set_mask,
.mapping_error  = dmabounce_mapping_error,
+   .dma_supported  = arm_dma_supported,
 };
 
 static int dmabounce_init_pool(struct dmabounce_pool *pool, struct device *dev,
diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h
index 389a26a10ea3..c090ec675eac 100644
--- a/arch/arm/include/asm/dma-iommu.h
+++ b/arch/arm/include/asm/dma-iommu.h
@@ -35,5 +35,7 @@ int arm_iommu_attach_device(struct device *dev,
struct dma_iommu_mapping *mapping);
 void arm_iommu_detach_device(struct device *dev);
 
+int arm_dma_supported(struct device *dev, u64 mask);
+
 #endif /* __KERNEL__ */
 #endif
diff --git a/arch/arm/include/asm/dma-mapping.h 
b/arch/arm/include/asm/dma-mapping.h
index 52a8fd5a8edb..8dabcfdf4505 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -20,9 +20,6 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
return _dma_ops;
 }
 
-#define HAVE_ARCH_DMA_SUPPORTED 1
-extern int dma_supported(struct device *dev, u64 mask);
-
 #ifdef __arch_page_to_dma
 #error Please update to __arch_pfn_to_dma
 #endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 2dbc94b5fe5c..2938b724826e 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -199,6 +199,7 @@ const struct dma_map_ops arm_dma_ops = {
.sync_sg_for_cpu= arm_dma_sync_sg_for_cpu,
.sync_sg_for_device = arm_dma_sync_sg_for_device,
.mapping_error  = arm_dma_mapping_error,
+   .dma_supported  = arm_dma_supported,
 };
 EXPORT_SYMBOL(arm_dma_ops);
 
@@ -218,6 +219,7 @@ const struct dma_map_ops arm_coherent_dma_ops = {
.map_page   = arm_coherent_dma_map_page,
.map_sg = arm_dma_map_sg,
.mapping_error  = arm_dma_mapping_error,
+   .dma_supported  = arm_dma_supported,
 };
 EXPORT_SYMBOL(arm_coherent_dma_ops);
 
@@ -1184,11 +1186,10 @@ void arm_dma_sync_sg_for_device(struct device *dev, 
struct scatterlist *sg,
  * during bus mastering, then you would pass 0x00ff as the mask
  * to this function.
  */
-int dma_supported(struct device *dev, u64 mask)
+int arm_dma_supported(struct device *dev, u64 mask)
 {
return __dma_supported(dev, mask, false);
 }
-EXPORT_SYMBOL(dma_supported);
 
 #define PREALLOC_DMA_DEBUG_ENTRIES 4096
 
@@ -2149,6 +2150,7 @@ const struct dma_map_ops iommu_ops = {
.unmap_resource = arm_iommu_unmap_resource,
 
.mapping_error  = arm_dma_mapping_error,
+   .dma_supported  = arm_dma_supported,
 };
 
 const struct dma_map_ops iommu_coherent_ops = {
@@ -2167,6 +2169,7 @@ const struct dma_map_ops iommu_coherent_ops = {
.unmap_resource = arm_iommu_unmap_resource,
 
.mapping_error  = arm_dma_mapping_error,
+   .dma_supported  = arm_dma_supported,
 };
 
 /**
-- 
2.11.0



[PATCH 32/44] hexagon: remove the unused dma_is_consistent prototype

2017-06-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/hexagon/include/asm/dma-mapping.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/hexagon/include/asm/dma-mapping.h 
b/arch/hexagon/include/asm/dma-mapping.h
index 9c15cb5271a6..463dbc18f853 100644
--- a/arch/hexagon/include/asm/dma-mapping.h
+++ b/arch/hexagon/include/asm/dma-mapping.h
@@ -37,7 +37,6 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
return dma_ops;
 }
 
-extern int dma_is_consistent(struct device *dev, dma_addr_t dma_handle);
 extern void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
   enum dma_data_direction direction);
 
-- 
2.11.0



[PATCH 33/44] openrisc: remove arch-specific dma_supported implementation

2017-06-08 Thread Christoph Hellwig
This implementation is simply bogus - hexagon only has a simple
direct mapped DMA implementation and thus doesn't care about the
address.

Signed-off-by: Christoph Hellwig 
---
 arch/openrisc/include/asm/dma-mapping.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/arch/openrisc/include/asm/dma-mapping.h 
b/arch/openrisc/include/asm/dma-mapping.h
index a4ea139c2ef9..f41bd3cb76d9 100644
--- a/arch/openrisc/include/asm/dma-mapping.h
+++ b/arch/openrisc/include/asm/dma-mapping.h
@@ -33,11 +33,4 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
return _dma_map_ops;
 }
 
-#define HAVE_ARCH_DMA_SUPPORTED 1
-static inline int dma_supported(struct device *dev, u64 dma_mask)
-{
-   /* Support 32 bit DMA mask exclusively */
-   return dma_mask == DMA_BIT_MASK(32);
-}
-
 #endif /* __ASM_OPENRISC_DMA_MAPPING_H */
-- 
2.11.0



[PATCH 31/44] hexagon: remove arch-specific dma_supported implementation

2017-06-08 Thread Christoph Hellwig
This implementation is simply bogus - hexagon only has a simple
direct mapped DMA implementation and thus doesn't care about the
address.

Signed-off-by: Christoph Hellwig 
---
 arch/hexagon/include/asm/dma-mapping.h | 2 --
 arch/hexagon/kernel/dma.c  | 9 -
 2 files changed, 11 deletions(-)

diff --git a/arch/hexagon/include/asm/dma-mapping.h 
b/arch/hexagon/include/asm/dma-mapping.h
index 00e3f10113b0..9c15cb5271a6 100644
--- a/arch/hexagon/include/asm/dma-mapping.h
+++ b/arch/hexagon/include/asm/dma-mapping.h
@@ -37,8 +37,6 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
return dma_ops;
 }
 
-#define HAVE_ARCH_DMA_SUPPORTED 1
-extern int dma_supported(struct device *dev, u64 mask);
 extern int dma_is_consistent(struct device *dev, dma_addr_t dma_handle);
 extern void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
   enum dma_data_direction direction);
diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
index 71269dc0f225..9ff1b2041f85 100644
--- a/arch/hexagon/kernel/dma.c
+++ b/arch/hexagon/kernel/dma.c
@@ -35,15 +35,6 @@ static inline void *dma_addr_to_virt(dma_addr_t dma_addr)
return phys_to_virt((unsigned long) dma_addr);
 }
 
-int dma_supported(struct device *dev, u64 mask)
-{
-   if (mask == DMA_BIT_MASK(32))
-   return 1;
-   else
-   return 0;
-}
-EXPORT_SYMBOL(dma_supported);
-
 static struct gen_pool *coherent_pool;
 
 
-- 
2.11.0



[PATCH 30/44] dma-virt: remove dma_supported and mapping_error methods

2017-06-08 Thread Christoph Hellwig
These just duplicate the default behavior if no method is provided.

Signed-off-by: Christoph Hellwig 
---
 lib/dma-virt.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/lib/dma-virt.c b/lib/dma-virt.c
index dcd4df1f7174..5c4f11329721 100644
--- a/lib/dma-virt.c
+++ b/lib/dma-virt.c
@@ -51,22 +51,10 @@ static int dma_virt_map_sg(struct device *dev, struct 
scatterlist *sgl,
return nents;
 }
 
-static int dma_virt_mapping_error(struct device *dev, dma_addr_t dma_addr)
-{
-   return false;
-}
-
-static int dma_virt_supported(struct device *dev, u64 mask)
-{
-   return true;
-}
-
 const struct dma_map_ops dma_virt_ops = {
.alloc  = dma_virt_alloc,
.free   = dma_virt_free,
.map_page   = dma_virt_map_page,
.map_sg = dma_virt_map_sg,
-   .mapping_error  = dma_virt_mapping_error,
-   .dma_supported  = dma_virt_supported,
 };
 EXPORT_SYMBOL(dma_virt_ops);
-- 
2.11.0



[PATCH 29/44] dma-noop: remove dma_supported and mapping_error methods

2017-06-08 Thread Christoph Hellwig
These just duplicate the default behavior if no method is provided.

Signed-off-by: Christoph Hellwig 
---
 lib/dma-noop.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/lib/dma-noop.c b/lib/dma-noop.c
index de26c8b68f34..643a074f139d 100644
--- a/lib/dma-noop.c
+++ b/lib/dma-noop.c
@@ -54,23 +54,11 @@ static int dma_noop_map_sg(struct device *dev, struct 
scatterlist *sgl, int nent
return nents;
 }
 
-static int dma_noop_mapping_error(struct device *dev, dma_addr_t dma_addr)
-{
-   return 0;
-}
-
-static int dma_noop_supported(struct device *dev, u64 mask)
-{
-   return 1;
-}
-
 const struct dma_map_ops dma_noop_ops = {
.alloc  = dma_noop_alloc,
.free   = dma_noop_free,
.map_page   = dma_noop_map_page,
.map_sg = dma_noop_map_sg,
-   .mapping_error  = dma_noop_mapping_error,
-   .dma_supported  = dma_noop_supported,
 };
 
 EXPORT_SYMBOL(dma_noop_ops);
-- 
2.11.0



[PATCH 28/44] sparc: remove arch specific dma_supported implementations

2017-06-08 Thread Christoph Hellwig
Usually dma_supported decisions are done by the dma_map_ops instance.
Switch sparc to that model by providing a ->dma_supported instance for
sbus that always returns false, and implementations tailored to the sun4u
and sun4v cases for sparc64, and leave it unimplemented for PCI on
sparc32, which means always supported.

Signed-off-by: Christoph Hellwig 
---
 arch/sparc/include/asm/dma-mapping.h |  3 ---
 arch/sparc/kernel/iommu.c| 40 +++-
 arch/sparc/kernel/ioport.c   | 22 ++--
 arch/sparc/kernel/pci_sun4v.c| 17 +++
 4 files changed, 39 insertions(+), 43 deletions(-)

diff --git a/arch/sparc/include/asm/dma-mapping.h 
b/arch/sparc/include/asm/dma-mapping.h
index 98da9f92c318..60bf1633d554 100644
--- a/arch/sparc/include/asm/dma-mapping.h
+++ b/arch/sparc/include/asm/dma-mapping.h
@@ -5,9 +5,6 @@
 #include 
 #include 
 
-#define HAVE_ARCH_DMA_SUPPORTED 1
-int dma_supported(struct device *dev, u64 mask);
-
 static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
  enum dma_data_direction dir)
 {
diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index dafa316d978d..fcbcc031f615 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -746,6 +746,21 @@ static int dma_4u_mapping_error(struct device *dev, 
dma_addr_t dma_addr)
return dma_addr == SPARC_MAPPING_ERROR;
 }
 
+static int dma_4u_supported(struct device *dev, u64 device_mask)
+{
+   struct iommu *iommu = dev->archdata.iommu;
+
+   if (device_mask > DMA_BIT_MASK(32))
+   return 0;
+   if ((device_mask & iommu->dma_addr_mask) == iommu->dma_addr_mask)
+   return 1;
+#ifdef CONFIG_PCI
+   if (dev_is_pci(dev))
+   return pci64_dma_supported(to_pci_dev(dev), device_mask);
+#endif
+   return 0;
+}
+
 static const struct dma_map_ops sun4u_dma_ops = {
.alloc  = dma_4u_alloc_coherent,
.free   = dma_4u_free_coherent,
@@ -755,32 +770,9 @@ static const struct dma_map_ops sun4u_dma_ops = {
.unmap_sg   = dma_4u_unmap_sg,
.sync_single_for_cpu= dma_4u_sync_single_for_cpu,
.sync_sg_for_cpu= dma_4u_sync_sg_for_cpu,
+   .dma_supported  = dma_4u_supported,
.mapping_error  = dma_4u_mapping_error,
 };
 
 const struct dma_map_ops *dma_ops = _dma_ops;
 EXPORT_SYMBOL(dma_ops);
-
-int dma_supported(struct device *dev, u64 device_mask)
-{
-   struct iommu *iommu = dev->archdata.iommu;
-   u64 dma_addr_mask = iommu->dma_addr_mask;
-
-   if (device_mask > DMA_BIT_MASK(32)) {
-   if (iommu->atu)
-   dma_addr_mask = iommu->atu->dma_addr_mask;
-   else
-   return 0;
-   }
-
-   if ((device_mask & dma_addr_mask) == dma_addr_mask)
-   return 1;
-
-#ifdef CONFIG_PCI
-   if (dev_is_pci(dev))
-   return pci64_dma_supported(to_pci_dev(dev), device_mask);
-#endif
-
-   return 0;
-}
-EXPORT_SYMBOL(dma_supported);
diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index dd081d557609..12894f259bea 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -401,6 +401,11 @@ static void sbus_sync_sg_for_device(struct device *dev, 
struct scatterlist *sg,
BUG();
 }
 
+static int sbus_dma_supported(struct device *dev, u64 mask)
+{
+   return 0;
+}
+
 static const struct dma_map_ops sbus_dma_ops = {
.alloc  = sbus_alloc_coherent,
.free   = sbus_free_coherent,
@@ -410,6 +415,7 @@ static const struct dma_map_ops sbus_dma_ops = {
.unmap_sg   = sbus_unmap_sg,
.sync_sg_for_cpu= sbus_sync_sg_for_cpu,
.sync_sg_for_device = sbus_sync_sg_for_device,
+   .dma_supported  = sbus_dma_supported,
 };
 
 static int __init sparc_register_ioport(void)
@@ -655,22 +661,6 @@ EXPORT_SYMBOL(pci32_dma_ops);
 const struct dma_map_ops *dma_ops = _dma_ops;
 EXPORT_SYMBOL(dma_ops);
 
-
-/*
- * Return whether the given PCI device DMA address mask can be
- * supported properly.  For example, if your device can only drive the
- * low 24-bits during PCI bus mastering, then you would pass
- * 0x00ff as the mask to this function.
- */
-int dma_supported(struct device *dev, u64 mask)
-{
-   if (dev_is_pci(dev))
-   return 1;
-
-   return 0;
-}
-EXPORT_SYMBOL(dma_supported);
-
 #ifdef CONFIG_PROC_FS
 
 static int sparc_io_proc_show(struct seq_file *m, void *v)
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index 8e2a56f4c03a..24f21c726dfa 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -24,6 +24,7 @@
 
 #include "pci_impl.h"
 #include "iommu_common.h"
+#include "kernel.h"
 
 #include "pci_sun4v.h"
 
@@ -669,6 

[PATCH 26/44] dma-mapping: remove DMA_ERROR_CODE

2017-06-08 Thread Christoph Hellwig
And update the documentation - dma_mapping_error has been supported
everywhere for a long time.

Signed-off-by: Christoph Hellwig 
---
 Documentation/DMA-API-HOWTO.txt | 31 +--
 include/linux/dma-mapping.h |  5 -
 2 files changed, 5 insertions(+), 31 deletions(-)

diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt
index 979228bc9035..4ed388356898 100644
--- a/Documentation/DMA-API-HOWTO.txt
+++ b/Documentation/DMA-API-HOWTO.txt
@@ -550,32 +550,11 @@ and to unmap it:
dma_unmap_single(dev, dma_handle, size, direction);
 
 You should call dma_mapping_error() as dma_map_single() could fail and return
-error. Not all DMA implementations support the dma_mapping_error() interface.
-However, it is a good practice to call dma_mapping_error() interface, which
-will invoke the generic mapping error check interface. Doing so will ensure
-that the mapping code will work correctly on all DMA implementations without
-any dependency on the specifics of the underlying implementation. Using the
-returned address without checking for errors could result in failures ranging
-from panics to silent data corruption. A couple of examples of incorrect ways
-to check for errors that make assumptions about the underlying DMA
-implementation are as follows and these are applicable to dma_map_page() as
-well.
-
-Incorrect example 1:
-   dma_addr_t dma_handle;
-
-   dma_handle = dma_map_single(dev, addr, size, direction);
-   if ((dma_handle & 0x != 0) || (dma_handle >= 0x100)) {
-   goto map_error;
-   }
-
-Incorrect example 2:
-   dma_addr_t dma_handle;
-
-   dma_handle = dma_map_single(dev, addr, size, direction);
-   if (dma_handle == DMA_ERROR_CODE) {
-   goto map_error;
-   }
+error.  Doing so will ensure that the mapping code will work correctly on all
+DMA implementations without any dependency on the specifics of the underlying
+implementation. Using the returned address without checking for errors could
+result in failures ranging from panics to silent data corruption.  The same
+applies to dma_map_page() as well.
 
 You should call dma_unmap_single() when the DMA activity is finished, e.g.,
 from the interrupt which told you that the DMA transfer is done.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4f3eecedca2d..a57875309bfd 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -546,12 +546,7 @@ static inline int dma_mapping_error(struct device *dev, 
dma_addr_t dma_addr)
 
if (get_dma_ops(dev)->mapping_error)
return get_dma_ops(dev)->mapping_error(dev, dma_addr);
-
-#ifdef DMA_ERROR_CODE
-   return dma_addr == DMA_ERROR_CODE;
-#else
return 0;
-#endif
 }
 
 #ifndef HAVE_ARCH_DMA_SUPPORTED
-- 
2.11.0



[PATCH 27/44] sparc: remove leon_dma_ops

2017-06-08 Thread Christoph Hellwig
We can just use pci32_dma_ops.

Btw, given that leon is 32-bit and appears to be PCI based, do even need
the special case for it in get_arch_dma_ops at all?

Signed-off-by: Christoph Hellwig 
---
 arch/sparc/include/asm/dma-mapping.h | 3 +--
 arch/sparc/kernel/ioport.c   | 5 +
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/sparc/include/asm/dma-mapping.h 
b/arch/sparc/include/asm/dma-mapping.h
index b8e8dfcd065d..98da9f92c318 100644
--- a/arch/sparc/include/asm/dma-mapping.h
+++ b/arch/sparc/include/asm/dma-mapping.h
@@ -17,7 +17,6 @@ static inline void dma_cache_sync(struct device *dev, void 
*vaddr, size_t size,
 }
 
 extern const struct dma_map_ops *dma_ops;
-extern const struct dma_map_ops *leon_dma_ops;
 extern const struct dma_map_ops pci32_dma_ops;
 
 extern struct bus_type pci_bus_type;
@@ -26,7 +25,7 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
 {
 #ifdef CONFIG_SPARC_LEON
if (sparc_cpu_model == sparc_leon)
-   return leon_dma_ops;
+   return _dma_ops;
 #endif
 #if defined(CONFIG_SPARC32) && defined(CONFIG_PCI)
if (bus == _bus_type)
diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index cf20033a1458..dd081d557609 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -637,6 +637,7 @@ static void pci32_sync_sg_for_device(struct device *device, 
struct scatterlist *
}
 }
 
+/* note: leon re-uses pci32_dma_ops */
 const struct dma_map_ops pci32_dma_ops = {
.alloc  = pci32_alloc_coherent,
.free   = pci32_free_coherent,
@@ -651,10 +652,6 @@ const struct dma_map_ops pci32_dma_ops = {
 };
 EXPORT_SYMBOL(pci32_dma_ops);
 
-/* leon re-uses pci32_dma_ops */
-const struct dma_map_ops *leon_dma_ops = _dma_ops;
-EXPORT_SYMBOL(leon_dma_ops);
-
 const struct dma_map_ops *dma_ops = _dma_ops;
 EXPORT_SYMBOL(dma_ops);
 
-- 
2.11.0



[PATCH 25/44] arm: implement ->mapping_error

2017-06-08 Thread Christoph Hellwig
DMA_ERROR_CODE is going to go away, so don't rely on it.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/common/dmabounce.c| 16 ---
 arch/arm/include/asm/dma-iommu.h   |  2 ++
 arch/arm/include/asm/dma-mapping.h |  1 -
 arch/arm/mm/dma-mapping.c  | 41 --
 4 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index 9b1b7be2ec0e..bad457395ff1 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -33,6 +33,7 @@
 #include 
 
 #include 
+#include 
 
 #undef STATS
 
@@ -256,7 +257,7 @@ static inline dma_addr_t map_single(struct device *dev, 
void *ptr, size_t size,
if (buf == NULL) {
dev_err(dev, "%s: unable to map unsafe buffer %p!\n",
   __func__, ptr);
-   return DMA_ERROR_CODE;
+   return ARM_MAPPING_ERROR;
}
 
dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x) mapped to %p (dma=%#x)\n",
@@ -326,7 +327,7 @@ static dma_addr_t dmabounce_map_page(struct device *dev, 
struct page *page,
 
ret = needs_bounce(dev, dma_addr, size);
if (ret < 0)
-   return DMA_ERROR_CODE;
+   return ARM_MAPPING_ERROR;
 
if (ret == 0) {
arm_dma_ops.sync_single_for_device(dev, dma_addr, size, dir);
@@ -335,7 +336,7 @@ static dma_addr_t dmabounce_map_page(struct device *dev, 
struct page *page,
 
if (PageHighMem(page)) {
dev_err(dev, "DMA buffer bouncing of HIGHMEM pages is not 
supported\n");
-   return DMA_ERROR_CODE;
+   return ARM_MAPPING_ERROR;
}
 
return map_single(dev, page_address(page) + offset, size, dir, attrs);
@@ -452,6 +453,14 @@ static int dmabounce_set_mask(struct device *dev, u64 
dma_mask)
return arm_dma_ops.set_dma_mask(dev, dma_mask);
 }
 
+static int dmabounce_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   if (dev->archdata.dmabounce)
+   return 0;
+
+   return arm_dma_ops.mapping_error(dev, dma_addr);
+}
+
 static const struct dma_map_ops dmabounce_ops = {
.alloc  = arm_dma_alloc,
.free   = arm_dma_free,
@@ -466,6 +475,7 @@ static const struct dma_map_ops dmabounce_ops = {
.sync_sg_for_cpu= arm_dma_sync_sg_for_cpu,
.sync_sg_for_device = arm_dma_sync_sg_for_device,
.set_dma_mask   = dmabounce_set_mask,
+   .mapping_error  = dmabounce_mapping_error,
 };
 
 static int dmabounce_init_pool(struct dmabounce_pool *pool, struct device *dev,
diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h
index 2ef282f96651..389a26a10ea3 100644
--- a/arch/arm/include/asm/dma-iommu.h
+++ b/arch/arm/include/asm/dma-iommu.h
@@ -9,6 +9,8 @@
 #include 
 #include 
 
+#define ARM_MAPPING_ERROR  (~(dma_addr_t)0x0)
+
 struct dma_iommu_mapping {
/* iommu specific data */
struct iommu_domain *domain;
diff --git a/arch/arm/include/asm/dma-mapping.h 
b/arch/arm/include/asm/dma-mapping.h
index 680d3f3889e7..52a8fd5a8edb 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -12,7 +12,6 @@
 #include 
 #include 
 
-#define DMA_ERROR_CODE (~(dma_addr_t)0x0)
 extern const struct dma_map_ops arm_dma_ops;
 extern const struct dma_map_ops arm_coherent_dma_ops;
 
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index c742dfd2967b..2dbc94b5fe5c 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -180,6 +180,11 @@ static void arm_dma_sync_single_for_device(struct device 
*dev,
__dma_page_cpu_to_dev(page, offset, size, dir);
 }
 
+static int arm_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == ARM_MAPPING_ERROR;
+}
+
 const struct dma_map_ops arm_dma_ops = {
.alloc  = arm_dma_alloc,
.free   = arm_dma_free,
@@ -193,6 +198,7 @@ const struct dma_map_ops arm_dma_ops = {
.sync_single_for_device = arm_dma_sync_single_for_device,
.sync_sg_for_cpu= arm_dma_sync_sg_for_cpu,
.sync_sg_for_device = arm_dma_sync_sg_for_device,
+   .mapping_error  = arm_dma_mapping_error,
 };
 EXPORT_SYMBOL(arm_dma_ops);
 
@@ -211,6 +217,7 @@ const struct dma_map_ops arm_coherent_dma_ops = {
.get_sgtable= arm_dma_get_sgtable,
.map_page   = arm_coherent_dma_map_page,
.map_sg = arm_dma_map_sg,
+   .mapping_error  = arm_dma_mapping_error,
 };
 EXPORT_SYMBOL(arm_coherent_dma_ops);
 
@@ -799,7 +806,7 @@ static void *__dma_alloc(struct device *dev, size_t size, 
dma_addr_t *handle,
gfp &= ~(__GFP_COMP);
args.gfp = gfp;
 
-   *handle = DMA_ERROR_CODE;
+   *handle = ARM_MAPPING_ERROR;
allowblock = 

[PATCH 22/44] x86/pci-nommu: implement ->mapping_error

2017-06-08 Thread Christoph Hellwig
DMA_ERROR_CODE is going to go away, so don't rely on it.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/kernel/pci-nommu.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index a88952ef371c..085fe6ce4049 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -11,6 +11,8 @@
 #include 
 #include 
 
+#define NOMMU_MAPPING_ERROR0
+
 static int
 check_addr(char *name, struct device *hwdev, dma_addr_t bus, size_t size)
 {
@@ -33,7 +35,7 @@ static dma_addr_t nommu_map_page(struct device *dev, struct 
page *page,
dma_addr_t bus = page_to_phys(page) + offset;
WARN_ON(size == 0);
if (!check_addr("map_single", dev, bus, size))
-   return DMA_ERROR_CODE;
+   return NOMMU_MAPPING_ERROR;
flush_write_buffers();
return bus;
 }
@@ -88,6 +90,11 @@ static void nommu_sync_sg_for_device(struct device *dev,
flush_write_buffers();
 }
 
+static int nommu_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == NOMMU_MAPPING_ERROR;
+}
+
 const struct dma_map_ops nommu_dma_ops = {
.alloc  = dma_generic_alloc_coherent,
.free   = dma_generic_free_coherent,
@@ -96,4 +103,5 @@ const struct dma_map_ops nommu_dma_ops = {
.sync_single_for_device = nommu_sync_single_for_device,
.sync_sg_for_device = nommu_sync_sg_for_device,
.is_phys= 1,
+   .mapping_error  = nommu_mapping_error,
 };
-- 
2.11.0



[PATCH 24/44] x86: remove DMA_ERROR_CODE

2017-06-08 Thread Christoph Hellwig
All dma_map_ops instances now handle their errors through
->mapping_error.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/include/asm/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/include/asm/dma-mapping.h 
b/arch/x86/include/asm/dma-mapping.h
index 08a0838b83fb..c35d228aa381 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -19,8 +19,6 @@
 # define ISA_DMA_BIT_MASK DMA_BIT_MASK(32)
 #endif
 
-#define DMA_ERROR_CODE 0
-
 extern int iommu_merge;
 extern struct device x86_dma_fallback_dev;
 extern int panic_on_overflow;
-- 
2.11.0



[PATCH 23/44] x86/calgary: implement ->mapping_error

2017-06-08 Thread Christoph Hellwig
DMA_ERROR_CODE is going to go away, so don't rely on it.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/kernel/pci-calgary_64.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/pci-calgary_64.c b/arch/x86/kernel/pci-calgary_64.c
index fda7867046d0..e75b490f2b0b 100644
--- a/arch/x86/kernel/pci-calgary_64.c
+++ b/arch/x86/kernel/pci-calgary_64.c
@@ -50,6 +50,8 @@
 #include 
 #include 
 
+#define CALGARY_MAPPING_ERROR  0
+
 #ifdef CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT
 int use_calgary __read_mostly = 1;
 #else
@@ -252,7 +254,7 @@ static unsigned long iommu_range_alloc(struct device *dev,
if (panic_on_overflow)
panic("Calgary: fix the allocator.\n");
else
-   return DMA_ERROR_CODE;
+   return CALGARY_MAPPING_ERROR;
}
}
 
@@ -272,10 +274,10 @@ static dma_addr_t iommu_alloc(struct device *dev, struct 
iommu_table *tbl,
 
entry = iommu_range_alloc(dev, tbl, npages);
 
-   if (unlikely(entry == DMA_ERROR_CODE)) {
+   if (unlikely(entry == CALGARY_MAPPING_ERROR)) {
pr_warn("failed to allocate %u pages in iommu %p\n",
npages, tbl);
-   return DMA_ERROR_CODE;
+   return CALGARY_MAPPING_ERROR;
}
 
/* set the return dma address */
@@ -295,7 +297,7 @@ static void iommu_free(struct iommu_table *tbl, dma_addr_t 
dma_addr,
unsigned long flags;
 
/* were we called with bad_dma_address? */
-   badend = DMA_ERROR_CODE + (EMERGENCY_PAGES * PAGE_SIZE);
+   badend = CALGARY_MAPPING_ERROR + (EMERGENCY_PAGES * PAGE_SIZE);
if (unlikely(dma_addr < badend)) {
WARN(1, KERN_ERR "Calgary: driver tried unmapping bad DMA "
   "address 0x%Lx\n", dma_addr);
@@ -380,7 +382,7 @@ static int calgary_map_sg(struct device *dev, struct 
scatterlist *sg,
npages = iommu_num_pages(vaddr, s->length, PAGE_SIZE);
 
entry = iommu_range_alloc(dev, tbl, npages);
-   if (entry == DMA_ERROR_CODE) {
+   if (entry == CALGARY_MAPPING_ERROR) {
/* makes sure unmap knows to stop */
s->dma_length = 0;
goto error;
@@ -398,7 +400,7 @@ static int calgary_map_sg(struct device *dev, struct 
scatterlist *sg,
 error:
calgary_unmap_sg(dev, sg, nelems, dir, 0);
for_each_sg(sg, s, nelems, i) {
-   sg->dma_address = DMA_ERROR_CODE;
+   sg->dma_address = CALGARY_MAPPING_ERROR;
sg->dma_length = 0;
}
return 0;
@@ -453,7 +455,7 @@ static void* calgary_alloc_coherent(struct device *dev, 
size_t size,
 
/* set up tces to cover the allocated range */
mapping = iommu_alloc(dev, tbl, ret, npages, DMA_BIDIRECTIONAL);
-   if (mapping == DMA_ERROR_CODE)
+   if (mapping == CALGARY_MAPPING_ERROR)
goto free;
*dma_handle = mapping;
return ret;
@@ -478,6 +480,11 @@ static void calgary_free_coherent(struct device *dev, 
size_t size,
free_pages((unsigned long)vaddr, get_order(size));
 }
 
+static int calgary_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == CALGARY_MAPPING_ERROR;
+}
+
 static const struct dma_map_ops calgary_dma_ops = {
.alloc = calgary_alloc_coherent,
.free = calgary_free_coherent,
@@ -485,6 +492,7 @@ static const struct dma_map_ops calgary_dma_ops = {
.unmap_sg = calgary_unmap_sg,
.map_page = calgary_map_page,
.unmap_page = calgary_unmap_page,
+   .mapping_error = calgary_mapping_error,
 };
 
 static inline void __iomem * busno_to_bbar(unsigned char num)
@@ -732,7 +740,7 @@ static void __init calgary_reserve_regions(struct pci_dev 
*dev)
struct iommu_table *tbl = pci_iommu(dev->bus);
 
/* reserve EMERGENCY_PAGES from bad_dma_address and up */
-   iommu_range_reserve(tbl, DMA_ERROR_CODE, EMERGENCY_PAGES);
+   iommu_range_reserve(tbl, CALGARY_MAPPING_ERROR, EMERGENCY_PAGES);
 
/* avoid the BIOS/VGA first 640KB-1MB region */
/* for CalIOC2 - avoid the entire first MB */
-- 
2.11.0



[PATCH 21/44] powerpc: implement ->mapping_error

2017-06-08 Thread Christoph Hellwig
DMA_ERROR_CODE is going to go away, so don't rely on it.  Instead
define a ->mapping_error method for all IOMMU based dma operation
instances.  The direct ops don't ever return an error and don't
need a ->mapping_error method.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/dma-mapping.h |  4 
 arch/powerpc/include/asm/iommu.h   |  4 
 arch/powerpc/kernel/dma-iommu.c|  6 ++
 arch/powerpc/kernel/iommu.c| 28 ++--
 arch/powerpc/platforms/cell/iommu.c|  1 +
 arch/powerpc/platforms/pseries/vio.c   |  3 ++-
 6 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h 
b/arch/powerpc/include/asm/dma-mapping.h
index 181a095468e4..73aedbe6c977 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -17,10 +17,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_PPC64
-#define DMA_ERROR_CODE (~(dma_addr_t)0x0)
-#endif
-
 /* Some dma direct funcs must be visible for use in other dma_ops */
 extern void *__dma_direct_alloc_coherent(struct device *dev, size_t size,
 dma_addr_t *dma_handle, gfp_t flag,
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 8a8ce220d7d0..20febe0b7f32 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -139,6 +139,8 @@ struct scatterlist;
 
 #ifdef CONFIG_PPC64
 
+#define IOMMU_MAPPING_ERROR(~(dma_addr_t)0x0)
+
 static inline void set_iommu_table_base(struct device *dev,
struct iommu_table *base)
 {
@@ -238,6 +240,8 @@ static inline int __init tce_iommu_bus_notifier_init(void)
 }
 #endif /* !CONFIG_IOMMU_API */
 
+int dma_iommu_mapping_error(struct device *dev, dma_addr_t dma_addr);
+
 #else
 
 static inline void *get_iommu_table_base(struct device *dev)
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index fb7cbaa37658..8f7abf9baa63 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -105,6 +105,11 @@ static u64 dma_iommu_get_required_mask(struct device *dev)
return mask;
 }
 
+int dma_iommu_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == IOMMU_MAPPING_ERROR;
+}
+
 struct dma_map_ops dma_iommu_ops = {
.alloc  = dma_iommu_alloc_coherent,
.free   = dma_iommu_free_coherent,
@@ -115,5 +120,6 @@ struct dma_map_ops dma_iommu_ops = {
.map_page   = dma_iommu_map_page,
.unmap_page = dma_iommu_unmap_page,
.get_required_mask  = dma_iommu_get_required_mask,
+   .mapping_error  = dma_iommu_mapping_error,
 };
 EXPORT_SYMBOL(dma_iommu_ops);
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index f2b724cd9e64..233ca3fe4754 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -198,11 +198,11 @@ static unsigned long iommu_range_alloc(struct device *dev,
if (unlikely(npages == 0)) {
if (printk_ratelimit())
WARN_ON(1);
-   return DMA_ERROR_CODE;
+   return IOMMU_MAPPING_ERROR;
}
 
if (should_fail_iommu(dev))
-   return DMA_ERROR_CODE;
+   return IOMMU_MAPPING_ERROR;
 
/*
 * We don't need to disable preemption here because any CPU can
@@ -278,7 +278,7 @@ static unsigned long iommu_range_alloc(struct device *dev,
} else {
/* Give up */
spin_unlock_irqrestore(&(pool->lock), flags);
-   return DMA_ERROR_CODE;
+   return IOMMU_MAPPING_ERROR;
}
}
 
@@ -310,13 +310,13 @@ static dma_addr_t iommu_alloc(struct device *dev, struct 
iommu_table *tbl,
  unsigned long attrs)
 {
unsigned long entry;
-   dma_addr_t ret = DMA_ERROR_CODE;
+   dma_addr_t ret = IOMMU_MAPPING_ERROR;
int build_fail;
 
entry = iommu_range_alloc(dev, tbl, npages, NULL, mask, align_order);
 
-   if (unlikely(entry == DMA_ERROR_CODE))
-   return DMA_ERROR_CODE;
+   if (unlikely(entry == IOMMU_MAPPING_ERROR))
+   return IOMMU_MAPPING_ERROR;
 
entry += tbl->it_offset;/* Offset into real TCE table */
ret = entry << tbl->it_page_shift;  /* Set the return dma address */
@@ -328,12 +328,12 @@ static dma_addr_t iommu_alloc(struct device *dev, struct 
iommu_table *tbl,
 
/* tbl->it_ops->set() only returns non-zero for transient errors.
 * Clean up the table bitmap in this case and return
-* DMA_ERROR_CODE. For all other errors the functionality is
+* IOMMU_MAPPING_ERROR. For all other errors the functionality is
 * not altered.
 */
if 

[PATCH 20/44] sparc: implement ->mapping_error

2017-06-08 Thread Christoph Hellwig
DMA_ERROR_CODE is going to go away, so don't rely on it.

Signed-off-by: Christoph Hellwig 
---
 arch/sparc/include/asm/dma-mapping.h |  2 --
 arch/sparc/kernel/iommu.c| 12 +---
 arch/sparc/kernel/iommu_common.h |  2 ++
 arch/sparc/kernel/pci_sun4v.c| 14 ++
 4 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/sparc/include/asm/dma-mapping.h 
b/arch/sparc/include/asm/dma-mapping.h
index 69cc627779f2..b8e8dfcd065d 100644
--- a/arch/sparc/include/asm/dma-mapping.h
+++ b/arch/sparc/include/asm/dma-mapping.h
@@ -5,8 +5,6 @@
 #include 
 #include 
 
-#define DMA_ERROR_CODE (~(dma_addr_t)0x0)
-
 #define HAVE_ARCH_DMA_SUPPORTED 1
 int dma_supported(struct device *dev, u64 mask);
 
diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index c63ba99ca551..dafa316d978d 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -314,7 +314,7 @@ static dma_addr_t dma_4u_map_page(struct device *dev, 
struct page *page,
 bad_no_ctx:
if (printk_ratelimit())
WARN_ON(1);
-   return DMA_ERROR_CODE;
+   return SPARC_MAPPING_ERROR;
 }
 
 static void strbuf_flush(struct strbuf *strbuf, struct iommu *iommu,
@@ -547,7 +547,7 @@ static int dma_4u_map_sg(struct device *dev, struct 
scatterlist *sglist,
 
if (outcount < incount) {
outs = sg_next(outs);
-   outs->dma_address = DMA_ERROR_CODE;
+   outs->dma_address = SPARC_MAPPING_ERROR;
outs->dma_length = 0;
}
 
@@ -573,7 +573,7 @@ static int dma_4u_map_sg(struct device *dev, struct 
scatterlist *sglist,
iommu_tbl_range_free(>tbl, vaddr, npages,
 IOMMU_ERROR_CODE);
 
-   s->dma_address = DMA_ERROR_CODE;
+   s->dma_address = SPARC_MAPPING_ERROR;
s->dma_length = 0;
}
if (s == outs)
@@ -741,6 +741,11 @@ static void dma_4u_sync_sg_for_cpu(struct device *dev,
spin_unlock_irqrestore(>lock, flags);
 }
 
+static int dma_4u_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == SPARC_MAPPING_ERROR;
+}
+
 static const struct dma_map_ops sun4u_dma_ops = {
.alloc  = dma_4u_alloc_coherent,
.free   = dma_4u_free_coherent,
@@ -750,6 +755,7 @@ static const struct dma_map_ops sun4u_dma_ops = {
.unmap_sg   = dma_4u_unmap_sg,
.sync_single_for_cpu= dma_4u_sync_single_for_cpu,
.sync_sg_for_cpu= dma_4u_sync_sg_for_cpu,
+   .mapping_error  = dma_4u_mapping_error,
 };
 
 const struct dma_map_ops *dma_ops = _dma_ops;
diff --git a/arch/sparc/kernel/iommu_common.h b/arch/sparc/kernel/iommu_common.h
index 828493329f68..5ea5c192b1d9 100644
--- a/arch/sparc/kernel/iommu_common.h
+++ b/arch/sparc/kernel/iommu_common.h
@@ -47,4 +47,6 @@ static inline int is_span_boundary(unsigned long entry,
return iommu_is_span_boundary(entry, nr, shift, boundary_size);
 }
 
+#define SPARC_MAPPING_ERROR(~(dma_addr_t)0x0)
+
 #endif /* _IOMMU_COMMON_H */
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index 68bec7c97cb8..8e2a56f4c03a 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -412,12 +412,12 @@ static dma_addr_t dma_4v_map_page(struct device *dev, 
struct page *page,
 bad:
if (printk_ratelimit())
WARN_ON(1);
-   return DMA_ERROR_CODE;
+   return SPARC_MAPPING_ERROR;
 
 iommu_map_fail:
local_irq_restore(flags);
iommu_tbl_range_free(tbl, bus_addr, npages, IOMMU_ERROR_CODE);
-   return DMA_ERROR_CODE;
+   return SPARC_MAPPING_ERROR;
 }
 
 static void dma_4v_unmap_page(struct device *dev, dma_addr_t bus_addr,
@@ -590,7 +590,7 @@ static int dma_4v_map_sg(struct device *dev, struct 
scatterlist *sglist,
 
if (outcount < incount) {
outs = sg_next(outs);
-   outs->dma_address = DMA_ERROR_CODE;
+   outs->dma_address = SPARC_MAPPING_ERROR;
outs->dma_length = 0;
}
 
@@ -607,7 +607,7 @@ static int dma_4v_map_sg(struct device *dev, struct 
scatterlist *sglist,
iommu_tbl_range_free(tbl, vaddr, npages,
 IOMMU_ERROR_CODE);
/* XXX demap? XXX */
-   s->dma_address = DMA_ERROR_CODE;
+   s->dma_address = SPARC_MAPPING_ERROR;
s->dma_length = 0;
}
if (s == outs)
@@ -669,6 +669,11 @@ static void dma_4v_unmap_sg(struct device *dev, struct 
scatterlist *sglist,
local_irq_restore(flags);
 }
 
+static int dma_4v_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == SPARC_MAPPING_ERROR;
+}
+
 static const struct dma_map_ops 

[PATCH 19/44] s390: implement ->mapping_error

2017-06-08 Thread Christoph Hellwig
s390 can also use noop_dma_ops, and while that currently does not return
errors it will so in the future.  Implementing the mapping_error method
is the proper way to have per-ops error conditions.

Signed-off-by: Christoph Hellwig 
---
 arch/s390/include/asm/dma-mapping.h |  2 --
 arch/s390/pci/pci_dma.c | 18 +-
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/dma-mapping.h 
b/arch/s390/include/asm/dma-mapping.h
index 3108b8dbe266..512ad0eaa11a 100644
--- a/arch/s390/include/asm/dma-mapping.h
+++ b/arch/s390/include/asm/dma-mapping.h
@@ -8,8 +8,6 @@
 #include 
 #include 
 
-#define DMA_ERROR_CODE (~(dma_addr_t) 0x0)
-
 extern const struct dma_map_ops s390_pci_dma_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 9081a57fa340..ea623faab525 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -14,6 +14,8 @@
 #include 
 #include 
 
+#define S390_MAPPING_ERROR (~(dma_addr_t) 0x0)
+
 static struct kmem_cache *dma_region_table_cache;
 static struct kmem_cache *dma_page_table_cache;
 static int s390_iommu_strict;
@@ -281,7 +283,7 @@ static dma_addr_t dma_alloc_address(struct device *dev, int 
size)
 
 out_error:
spin_unlock_irqrestore(>iommu_bitmap_lock, flags);
-   return DMA_ERROR_CODE;
+   return S390_MAPPING_ERROR;
 }
 
 static void dma_free_address(struct device *dev, dma_addr_t dma_addr, int size)
@@ -329,7 +331,7 @@ static dma_addr_t s390_dma_map_pages(struct device *dev, 
struct page *page,
/* This rounds up number of pages based on size and offset */
nr_pages = iommu_num_pages(pa, size, PAGE_SIZE);
dma_addr = dma_alloc_address(dev, nr_pages);
-   if (dma_addr == DMA_ERROR_CODE) {
+   if (dma_addr == S390_MAPPING_ERROR) {
ret = -ENOSPC;
goto out_err;
}
@@ -352,7 +354,7 @@ static dma_addr_t s390_dma_map_pages(struct device *dev, 
struct page *page,
 out_err:
zpci_err("map error:\n");
zpci_err_dma(ret, pa);
-   return DMA_ERROR_CODE;
+   return S390_MAPPING_ERROR;
 }
 
 static void s390_dma_unmap_pages(struct device *dev, dma_addr_t dma_addr,
@@ -429,7 +431,7 @@ static int __s390_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
int ret;
 
dma_addr_base = dma_alloc_address(dev, nr_pages);
-   if (dma_addr_base == DMA_ERROR_CODE)
+   if (dma_addr_base == S390_MAPPING_ERROR)
return -ENOMEM;
 
dma_addr = dma_addr_base;
@@ -476,7 +478,7 @@ static int s390_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
for (i = 1; i < nr_elements; i++) {
s = sg_next(s);
 
-   s->dma_address = DMA_ERROR_CODE;
+   s->dma_address = S390_MAPPING_ERROR;
s->dma_length = 0;
 
if (s->offset || (size & ~PAGE_MASK) ||
@@ -525,6 +527,11 @@ static void s390_dma_unmap_sg(struct device *dev, struct 
scatterlist *sg,
s->dma_length = 0;
}
 }
+   
+static int s390_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == S390_MAPPING_ERROR;
+}
 
 int zpci_dma_init_device(struct zpci_dev *zdev)
 {
@@ -657,6 +664,7 @@ const struct dma_map_ops s390_pci_dma_ops = {
.unmap_sg   = s390_dma_unmap_sg,
.map_page   = s390_dma_map_pages,
.unmap_page = s390_dma_unmap_pages,
+   .mapping_error  = s390_mapping_error,
/* if we support direct DMA this must be conditional */
.is_phys= 0,
/* dma_supported is unconditionally true without a callback */
-- 
2.11.0



[PATCH 18/44] iommu/amd: implement ->mapping_error

2017-06-08 Thread Christoph Hellwig
DMA_ERROR_CODE is going to go away, so don't rely on it.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/amd_iommu.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 63cacf5d6cf2..d41280e869de 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -54,6 +54,8 @@
 #include "amd_iommu_types.h"
 #include "irq_remapping.h"
 
+#define AMD_IOMMU_MAPPING_ERROR0
+
 #define CMD_SET_TYPE(cmd, t) ((cmd)->data[1] |= ((t) << 28))
 
 #define LOOP_TIMEOUT   10
@@ -2394,7 +2396,7 @@ static dma_addr_t __map_single(struct device *dev,
paddr &= PAGE_MASK;
 
address = dma_ops_alloc_iova(dev, dma_dom, pages, dma_mask);
-   if (address == DMA_ERROR_CODE)
+   if (address == AMD_IOMMU_MAPPING_ERROR)
goto out;
 
prot = dir2prot(direction);
@@ -2431,7 +2433,7 @@ static dma_addr_t __map_single(struct device *dev,
 
dma_ops_free_iova(dma_dom, address, pages);
 
-   return DMA_ERROR_CODE;
+   return AMD_IOMMU_MAPPING_ERROR;
 }
 
 /*
@@ -2483,7 +2485,7 @@ static dma_addr_t map_page(struct device *dev, struct 
page *page,
if (PTR_ERR(domain) == -EINVAL)
return (dma_addr_t)paddr;
else if (IS_ERR(domain))
-   return DMA_ERROR_CODE;
+   return AMD_IOMMU_MAPPING_ERROR;
 
dma_mask = *dev->dma_mask;
dma_dom = to_dma_ops_domain(domain);
@@ -2560,7 +2562,7 @@ static int map_sg(struct device *dev, struct scatterlist 
*sglist,
npages = sg_num_pages(dev, sglist, nelems);
 
address = dma_ops_alloc_iova(dev, dma_dom, npages, dma_mask);
-   if (address == DMA_ERROR_CODE)
+   if (address == AMD_IOMMU_MAPPING_ERROR)
goto out_err;
 
prot = dir2prot(direction);
@@ -2683,7 +2685,7 @@ static void *alloc_coherent(struct device *dev, size_t 
size,
*dma_addr = __map_single(dev, dma_dom, page_to_phys(page),
 size, DMA_BIDIRECTIONAL, dma_mask);
 
-   if (*dma_addr == DMA_ERROR_CODE)
+   if (*dma_addr == AMD_IOMMU_MAPPING_ERROR)
goto out_free;
 
return page_address(page);
@@ -2732,6 +2734,11 @@ static int amd_iommu_dma_supported(struct device *dev, 
u64 mask)
return check_device(dev);
 }
 
+static int amd_iommu_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == AMD_IOMMU_MAPPING_ERROR;
+}
+
 static const struct dma_map_ops amd_iommu_dma_ops = {
.alloc  = alloc_coherent,
.free   = free_coherent,
@@ -2740,6 +2747,7 @@ static const struct dma_map_ops amd_iommu_dma_ops = {
.map_sg = map_sg,
.unmap_sg   = unmap_sg,
.dma_supported  = amd_iommu_dma_supported,
+   .mapping_error  = amd_iommu_mapping_error,
 };
 
 static int init_reserved_iova_ranges(void)
-- 
2.11.0



[PATCH 16/44] arm64: remove DMA_ERROR_CODE

2017-06-08 Thread Christoph Hellwig
The dma alloc interface returns an error by return NULL, and the
mapping interfaces rely on the mapping_error method, which the dummy
ops already implement correctly.

Thus remove the DMA_ERROR_CODE define.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/dma-mapping.h | 1 -
 arch/arm64/mm/dma-mapping.c  | 3 +--
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/dma-mapping.h 
b/arch/arm64/include/asm/dma-mapping.h
index 5392dbeffa45..cf8fc8f05580 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -24,7 +24,6 @@
 #include 
 #include 
 
-#define DMA_ERROR_CODE (~(dma_addr_t)0)
 extern const struct dma_map_ops dummy_dma_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 3216e098c058..147fbb907a2f 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -184,7 +184,6 @@ static void *__dma_alloc(struct device *dev, size_t size,
 no_map:
__dma_free_coherent(dev, size, ptr, *dma_handle, attrs);
 no_mem:
-   *dma_handle = DMA_ERROR_CODE;
return NULL;
 }
 
@@ -487,7 +486,7 @@ static dma_addr_t __dummy_map_page(struct device *dev, 
struct page *page,
   enum dma_data_direction dir,
   unsigned long attrs)
 {
-   return DMA_ERROR_CODE;
+   return 0;
 }
 
 static void __dummy_unmap_page(struct device *dev, dma_addr_t dev_addr,
-- 
2.11.0



[PATCH 17/44] hexagon: switch to use ->mapping_error for error reporting

2017-06-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/hexagon/include/asm/dma-mapping.h |  2 --
 arch/hexagon/kernel/dma.c  | 12 +---
 arch/hexagon/kernel/hexagon_ksyms.c|  1 -
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/hexagon/include/asm/dma-mapping.h 
b/arch/hexagon/include/asm/dma-mapping.h
index d3a87bd9b686..00e3f10113b0 100644
--- a/arch/hexagon/include/asm/dma-mapping.h
+++ b/arch/hexagon/include/asm/dma-mapping.h
@@ -29,8 +29,6 @@
 #include 
 
 struct device;
-extern int bad_dma_address;
-#define DMA_ERROR_CODE bad_dma_address
 
 extern const struct dma_map_ops *dma_ops;
 
diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
index e74b65009587..71269dc0f225 100644
--- a/arch/hexagon/kernel/dma.c
+++ b/arch/hexagon/kernel/dma.c
@@ -25,11 +25,11 @@
 #include 
 #include 
 
+#define HEXAGON_MAPPING_ERROR  0
+
 const struct dma_map_ops *dma_ops;
 EXPORT_SYMBOL(dma_ops);
 
-int bad_dma_address;  /*  globals are automatically initialized to zero  */
-
 static inline void *dma_addr_to_virt(dma_addr_t dma_addr)
 {
return phys_to_virt((unsigned long) dma_addr);
@@ -181,7 +181,7 @@ static dma_addr_t hexagon_map_page(struct device *dev, 
struct page *page,
WARN_ON(size == 0);
 
if (!check_addr("map_single", dev, bus, size))
-   return bad_dma_address;
+   return HEXAGON_MAPPING_ERROR;
 
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
dma_sync(dma_addr_to_virt(bus), size, dir);
@@ -203,6 +203,11 @@ static void hexagon_sync_single_for_device(struct device 
*dev,
dma_sync(dma_addr_to_virt(dma_handle), size, dir);
 }
 
+static int hexagon_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == HEXAGON_MAPPING_ERROR;
+}
+
 const struct dma_map_ops hexagon_dma_ops = {
.alloc  = hexagon_dma_alloc_coherent,
.free   = hexagon_free_coherent,
@@ -210,6 +215,7 @@ const struct dma_map_ops hexagon_dma_ops = {
.map_page   = hexagon_map_page,
.sync_single_for_cpu = hexagon_sync_single_for_cpu,
.sync_single_for_device = hexagon_sync_single_for_device,
+   .mapping_error  = hexagon_mapping_error;
.is_phys= 1,
 };
 
diff --git a/arch/hexagon/kernel/hexagon_ksyms.c 
b/arch/hexagon/kernel/hexagon_ksyms.c
index 00bcad9cbd8f..aa248f595431 100644
--- a/arch/hexagon/kernel/hexagon_ksyms.c
+++ b/arch/hexagon/kernel/hexagon_ksyms.c
@@ -40,7 +40,6 @@ EXPORT_SYMBOL(memset);
 /* Additional variables */
 EXPORT_SYMBOL(__phys_offset);
 EXPORT_SYMBOL(_dflt_cache_att);
-EXPORT_SYMBOL(bad_dma_address);
 
 #define DECLARE_EXPORT(name) \
extern void name(void); EXPORT_SYMBOL(name)
-- 
2.11.0



[PATCH 15/44] xtensa: remove DMA_ERROR_CODE

2017-06-08 Thread Christoph Hellwig
xtensa already implements the mapping_error method for its only
dma_map_ops instance.

Signed-off-by: Christoph Hellwig 
---
 arch/xtensa/include/asm/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/xtensa/include/asm/dma-mapping.h 
b/arch/xtensa/include/asm/dma-mapping.h
index c6140fa8c0be..269738dc9d1d 100644
--- a/arch/xtensa/include/asm/dma-mapping.h
+++ b/arch/xtensa/include/asm/dma-mapping.h
@@ -16,8 +16,6 @@
 #include 
 #include 
 
-#define DMA_ERROR_CODE (~(dma_addr_t)0x0)
-
 extern const struct dma_map_ops xtensa_dma_map_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-- 
2.11.0



[PATCH 14/44] sh: remove DMA_ERROR_CODE

2017-06-08 Thread Christoph Hellwig
sh does not return errors for dma_map_page.

Signed-off-by: Christoph Hellwig 
---
 arch/sh/include/asm/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/sh/include/asm/dma-mapping.h 
b/arch/sh/include/asm/dma-mapping.h
index d99008af5f73..9b06be07db4d 100644
--- a/arch/sh/include/asm/dma-mapping.h
+++ b/arch/sh/include/asm/dma-mapping.h
@@ -9,8 +9,6 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct 
bus_type *bus)
return dma_ops;
 }
 
-#define DMA_ERROR_CODE 0
-
 void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
enum dma_data_direction dir);
 
-- 
2.11.0



[PATCH 11/44] m32r: remove DMA_ERROR_CODE

2017-06-08 Thread Christoph Hellwig
dma-noop is the only dma_mapping_ops instance for m32r and does not return
errors.

Signed-off-by: Christoph Hellwig 
---
 arch/m32r/include/asm/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/m32r/include/asm/dma-mapping.h 
b/arch/m32r/include/asm/dma-mapping.h
index c01d9f52d228..aff3ae8b62f7 100644
--- a/arch/m32r/include/asm/dma-mapping.h
+++ b/arch/m32r/include/asm/dma-mapping.h
@@ -8,8 +8,6 @@
 #include 
 #include 
 
-#define DMA_ERROR_CODE (~(dma_addr_t)0x0)
-
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
return _noop_ops;
-- 
2.11.0



[PATCH 10/44] ia64: remove DMA_ERROR_CODE

2017-06-08 Thread Christoph Hellwig
All ia64 dma_mapping_ops instances already have a mapping_error member.

Signed-off-by: Christoph Hellwig 
---
 arch/ia64/include/asm/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/ia64/include/asm/dma-mapping.h 
b/arch/ia64/include/asm/dma-mapping.h
index 73ec3c6f4cfe..3ce5ab4339f3 100644
--- a/arch/ia64/include/asm/dma-mapping.h
+++ b/arch/ia64/include/asm/dma-mapping.h
@@ -12,8 +12,6 @@
 
 #define ARCH_HAS_DMA_GET_REQUIRED_MASK
 
-#define DMA_ERROR_CODE 0
-
 extern const struct dma_map_ops *dma_ops;
 extern struct ia64_machine_vector ia64_mv;
 extern void set_iommu_machvec(void);
-- 
2.11.0



[PATCH 12/44] microblaze: remove DMA_ERROR_CODE

2017-06-08 Thread Christoph Hellwig
microblaze does not return errors for dma_map_page.

Signed-off-by: Christoph Hellwig 
---
 arch/microblaze/include/asm/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/microblaze/include/asm/dma-mapping.h 
b/arch/microblaze/include/asm/dma-mapping.h
index 3fad5e722a66..e15cd2f76e23 100644
--- a/arch/microblaze/include/asm/dma-mapping.h
+++ b/arch/microblaze/include/asm/dma-mapping.h
@@ -28,8 +28,6 @@
 #include 
 #include 
 
-#define DMA_ERROR_CODE (~(dma_addr_t)0x0)
-
 #define __dma_alloc_coherent(dev, gfp, size, handle)   NULL
 #define __dma_free_coherent(size, addr)((void)0)
 
-- 
2.11.0



[PATCH 13/44] openrisc: remove DMA_ERROR_CODE

2017-06-08 Thread Christoph Hellwig
openrisc does not return errors for dma_map_page.

Signed-off-by: Christoph Hellwig 
---
 arch/openrisc/include/asm/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/openrisc/include/asm/dma-mapping.h 
b/arch/openrisc/include/asm/dma-mapping.h
index 0c0075f17145..a4ea139c2ef9 100644
--- a/arch/openrisc/include/asm/dma-mapping.h
+++ b/arch/openrisc/include/asm/dma-mapping.h
@@ -26,8 +26,6 @@
 #include 
 #include 
 
-#define DMA_ERROR_CODE (~(dma_addr_t)0x0)
-
 extern const struct dma_map_ops or1k_dma_map_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-- 
2.11.0



  1   2   >