date:20180730

Re: [GIT PULL 0/5] perf/urgent fixes

2018-07-30 Thread Ingo Molnar



* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling, just to get the build without warnings
> and finishing successfully in all my test environments,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit 7f635ff187ab6be0b350b3ec06791e376af238ab:
> 
>   perf/core: Fix crash when using HW tracing kernel filters (2018-07-25 
> 11:46:22 +0200)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git 
> tags/perf-urgent-for-mingo-4.18-20180730
> 
> for you to fetch changes up to 44fe619b1418ff4e9d2f9518a940fbe2fb686a08:
> 
>   perf tools: Fix the build on the alpine:edge distro (2018-07-30 13:15:03 
> -0300)
> 
> 
> perf/urgent fixes: (Arnaldo Carvalho de Melo)
> 
> - Update the tools copy of several files, including perf_event.h,
>   powerpc's asm/unistd.h (new io_pgetevents syscall), bpf.h and
>   x86's memcpy_64.s (used in 'perf bench mem'), silencing the
>   respective warnings during the perf tools build.
> 
> - Fix the build on the alpine:edge distro.
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Arnaldo Carvalho de Melo (5):
>   tools headers uapi: Update tools's copy of linux/perf_event.h
>   tools headers powerpc: Update asm/unistd.h copy to pick new
>   tools headers uapi: Refresh linux/bpf.h copy
>   tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench 
> mem memcpy'
>   perf tools: Fix the build on the alpine:edge distro
> 
>  tools/arch/powerpc/include/uapi/asm/unistd.h |   1 +
>  tools/arch/x86/include/asm/mcsafe_test.h |  13 
>  tools/arch/x86/lib/memcpy_64.S   | 112 
> +--
>  tools/include/uapi/linux/bpf.h   |  28 +--
>  tools/include/uapi/linux/perf_event.h|   2 +
>  tools/perf/arch/x86/util/pmu.c   |   1 +
>  tools/perf/arch/x86/util/tsc.c   |   1 +
>  tools/perf/bench/Build   |   1 +
>  tools/perf/bench/mem-memcpy-x86-64-asm.S |   1 +
>  tools/perf/bench/mem-memcpy-x86-64-lib.c |  24 ++
>  tools/perf/perf.h|   1 +
>  tools/perf/util/header.h |   1 +
>  tools/perf/util/namespaces.h |   1 +
>  13 files changed, 124 insertions(+), 63 deletions(-)
>  create mode 100644 tools/arch/x86/include/asm/mcsafe_test.h
>  create mode 100644 tools/perf/bench/mem-memcpy-x86-64-lib.c

Pulled, thanks a lot Arnaldo!

Ingo

Re: [PATCH 1/2] selftests/powerpc: Skip earlier in alignment_handler test

2018-07-30 Thread Andrew Donnellan


On 31/07/18 14:41, Michael Ellerman wrote:

Currently the alignment_handler test prints "Can't open /dev/fb0"
about 80 times per run, which is a little annoying.

Refactor it to check earlier if it can open /dev/fb0 and skip if not,
this results in each test printing something like:

   test: test_alignment_handler_vsx_206
   tags: git_version:v4.18-rc3-134-gfb21a48904aa
   [SKIP] Test skipped on line 291
   skip: test_alignment_handler_vsx_206

Signed-off-by: Michael Ellerman 


This seems sensible.

Acked-by: Andrew Donnellan 


---
  .../powerpc/alignment/alignment_handler.c  | 40 +++---
  1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c 
b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index 0f2698f9fd6d..0eddd16af49f 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -40,6 +40,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -191,7 +192,7 @@ int test_memcmp(void *s1, void *s2, int n, int offset, char 
*test_name)
   */
  int do_test(char *test_name, void (*test_func)(char *, char *))
  {
-   int offset, width, fd, rc = 0, r;
+   int offset, width, fd, rc, r;
void *mem0, *mem1, *ci0, *ci1;
  
  	printf("\tDoing %s:\t", test_name);

@@ -199,8 +200,8 @@ int do_test(char *test_name, void (*test_func)(char *, char 
*))
fd = open("/dev/fb0", O_RDWR);
if (fd < 0) {
printf("\n");
-   perror("Can't open /dev/fb0");
-   SKIP_IF(1);
+   perror("Can't open /dev/fb0 now?");
+   return 1;
}
  
  	ci0 = mmap(NULL, bufsize, PROT_WRITE, MAP_SHARED,

@@ -226,6 +227,7 @@ int do_test(char *test_name, void (*test_func)(char *, char 
*))
return rc;
}
  
+	rc = 0;

/* offset = 0 no alignment fault, so skip */
for (offset = 1; offset < 16; offset++) {
width = 16; /* vsx == 16 bytes */
@@ -244,32 +246,50 @@ int do_test(char *test_name, void (*test_func)(char *, 
char *))
r |= test_memcpy(mem1, mem0, width, offset, test_func);
if (r && !debug) {
printf("FAILED: Got signal");
+   rc = 1;
break;
}
  
  		r |= test_memcmp(mem1, ci1, width, offset, test_name);

-   rc |= r;
if (r && !debug) {
printf("FAILED: Wrong Data");
+   rc = 1;
break;
}
}
-   if (!r)
+
+   if (rc == 0)
printf("PASSED");
+
printf("\n");
  
  	munmap(ci0, bufsize);

munmap(ci1, bufsize);
free(mem0);
free(mem1);
+   close(fd);


Good catch :D

  
  	return rc;

  }
  
+static bool can_open_fb0(void)

+{
+   int fd;
+
+   fd = open("/dev/fb0", O_RDWR);
+   if (fd < 0)
+   return false;
+
+   close(fd);
+   return true;
+}
+
  int test_alignment_handler_vsx_206(void)
  {
int rc = 0;
  
+	SKIP_IF(!can_open_fb0());

+
printf("VSX: 2.06B\n");
LOAD_VSX_XFORM_TEST(lxvd2x);
LOAD_VSX_XFORM_TEST(lxvw4x);
@@ -285,6 +305,8 @@ int test_alignment_handler_vsx_207(void)
  {
int rc = 0;
  
+	SKIP_IF(!can_open_fb0());

+
printf("VSX: 2.07B\n");
LOAD_VSX_XFORM_TEST(lxsspx);
LOAD_VSX_XFORM_TEST(lxsiwax);
@@ -298,6 +320,8 @@ int test_alignment_handler_vsx_300(void)
  {
int rc = 0;
  
+	SKIP_IF(!can_open_fb0());

+
SKIP_IF(!have_hwcap2(PPC_FEATURE2_ARCH_3_00));
printf("VSX: 3.00B\n");
LOAD_VMX_DFORM_TEST(lxsd);
@@ -328,6 +352,8 @@ int test_alignment_handler_integer(void)
  {
int rc = 0;
  
+	SKIP_IF(!can_open_fb0());

+
printf("Integer\n");
LOAD_DFORM_TEST(lbz);
LOAD_DFORM_TEST(lbzu);
@@ -383,6 +409,8 @@ int test_alignment_handler_vmx(void)
  {
int rc = 0;
  
+	SKIP_IF(!can_open_fb0());

+
printf("VMX\n");
LOAD_VMX_XFORM_TEST(lvx);
  
@@ -408,6 +436,8 @@ int test_alignment_handler_fp(void)

  {
int rc = 0;
  
+	SKIP_IF(!can_open_fb0());

+
printf("Floating point\n");
LOAD_FLOAT_DFORM_TEST(lfd);
LOAD_FLOAT_XFORM_TEST(lfdx);



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

[PATCH 2/2] selftests/powerpc: Add more version checks to alignment_handler test

2018-07-30 Thread Michael Ellerman

The alignment_handler is documented to only work on Power8/Power9, but
we can make it run on older CPUs by guarding more of the tests with
feature checks.

Signed-off-by: Michael Ellerman 
---
 .../powerpc/alignment/alignment_handler.c  | 76 +++---
 1 file changed, 68 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c 
b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index 0eddd16af49f..bcecc44627af 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -49,6 +49,8 @@
 #include 
 #include 
 
+#include 
+
 #include "utils.h"
 
 int bufsize;
@@ -289,6 +291,7 @@ int test_alignment_handler_vsx_206(void)
int rc = 0;
 
SKIP_IF(!can_open_fb0());
+   SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
 
printf("VSX: 2.06B\n");
LOAD_VSX_XFORM_TEST(lxvd2x);
@@ -306,6 +309,7 @@ int test_alignment_handler_vsx_207(void)
int rc = 0;
 
SKIP_IF(!can_open_fb0());
+   SKIP_IF(!have_hwcap2(PPC_FEATURE2_ARCH_2_07));
 
printf("VSX: 2.07B\n");
LOAD_VSX_XFORM_TEST(lxsspx);
@@ -380,7 +384,6 @@ int test_alignment_handler_integer(void)
LOAD_DFORM_TEST(ldu);
LOAD_XFORM_TEST(ldx);
LOAD_XFORM_TEST(ldux);
-   LOAD_XFORM_TEST(ldbrx);
LOAD_DFORM_TEST(lmw);
STORE_DFORM_TEST(stb);
STORE_XFORM_TEST(stbx);
@@ -400,8 +403,28 @@ int test_alignment_handler_integer(void)
STORE_XFORM_TEST(stdx);
STORE_DFORM_TEST(stdu);
STORE_XFORM_TEST(stdux);
-   STORE_XFORM_TEST(stdbrx);
STORE_DFORM_TEST(stmw);
+
+   if (have_hwcap(PPC_FEATURE_ARCH_2_06)) {
+   LOAD_XFORM_TEST(ldbrx);
+   STORE_XFORM_TEST(stdbrx);
+   }
+
+   return rc;
+}
+
+int test_alignment_handler_integer_206(void)
+{
+   int rc = 0;
+
+   SKIP_IF(!can_open_fb0());
+   SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
+
+   printf("Integer: 2.06\n");
+
+   LOAD_XFORM_TEST(ldbrx);
+   STORE_XFORM_TEST(stdbrx);
+
return rc;
 }
 
@@ -410,6 +433,7 @@ int test_alignment_handler_vmx(void)
int rc = 0;
 
SKIP_IF(!can_open_fb0());
+   SKIP_IF(!have_hwcap(PPC_FEATURE_HAS_ALTIVEC));
 
printf("VMX\n");
LOAD_VMX_XFORM_TEST(lvx);
@@ -441,20 +465,14 @@ int test_alignment_handler_fp(void)
printf("Floating point\n");
LOAD_FLOAT_DFORM_TEST(lfd);
LOAD_FLOAT_XFORM_TEST(lfdx);
-   LOAD_FLOAT_DFORM_TEST(lfdp);
-   LOAD_FLOAT_XFORM_TEST(lfdpx);
LOAD_FLOAT_DFORM_TEST(lfdu);
LOAD_FLOAT_XFORM_TEST(lfdux);
LOAD_FLOAT_DFORM_TEST(lfs);
LOAD_FLOAT_XFORM_TEST(lfsx);
LOAD_FLOAT_DFORM_TEST(lfsu);
LOAD_FLOAT_XFORM_TEST(lfsux);
-   LOAD_FLOAT_XFORM_TEST(lfiwzx);
-   LOAD_FLOAT_XFORM_TEST(lfiwax);
STORE_FLOAT_DFORM_TEST(stfd);
STORE_FLOAT_XFORM_TEST(stfdx);
-   STORE_FLOAT_DFORM_TEST(stfdp);
-   STORE_FLOAT_XFORM_TEST(stfdpx);
STORE_FLOAT_DFORM_TEST(stfdu);
STORE_FLOAT_XFORM_TEST(stfdux);
STORE_FLOAT_DFORM_TEST(stfs);
@@ -466,6 +484,42 @@ int test_alignment_handler_fp(void)
return rc;
 }
 
+int test_alignment_handler_fp_205(void)
+{
+   int rc = 0;
+
+   SKIP_IF(!can_open_fb0());
+   SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_05));
+
+   printf("Floating point: 2.05\n");
+
+   LOAD_FLOAT_DFORM_TEST(lfdp);
+   LOAD_FLOAT_XFORM_TEST(lfdpx);
+   LOAD_FLOAT_XFORM_TEST(lfiwax);
+   STORE_FLOAT_DFORM_TEST(stfdp);
+   STORE_FLOAT_XFORM_TEST(stfdpx);
+
+   if (have_hwcap(PPC_FEATURE_ARCH_2_06)) {
+   LOAD_FLOAT_XFORM_TEST(lfiwzx);
+   }
+
+   return rc;
+}
+
+int test_alignment_handler_fp_206(void)
+{
+   int rc = 0;
+
+   SKIP_IF(!can_open_fb0());
+   SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
+
+   printf("Floating point: 2.06\n");
+
+   LOAD_FLOAT_XFORM_TEST(lfiwzx);
+
+   return rc;
+}
+
 void usage(char *prog)
 {
printf("Usage: %s [options]\n", prog);
@@ -513,9 +567,15 @@ int main(int argc, char *argv[])
   "test_alignment_handler_vsx_300");
rc |= test_harness(test_alignment_handler_integer,
   "test_alignment_handler_integer");
+   rc |= test_harness(test_alignment_handler_integer_206,
+  "test_alignment_handler_integer_206");
rc |= test_harness(test_alignment_handler_vmx,
   "test_alignment_handler_vmx");
rc |= test_harness(test_alignment_handler_fp,
   "test_alignment_handler_fp");
+   rc |= test_harness(test_alignment_handler_fp_205,
+  "test_alignment_handler_fp_205");
+   rc |= test_harness(test_alignment_handler_fp_206,
+  "test_alignment_handler_fp_206");

[PATCH 1/2] selftests/powerpc: Skip earlier in alignment_handler test

2018-07-30 Thread Michael Ellerman

Currently the alignment_handler test prints "Can't open /dev/fb0"
about 80 times per run, which is a little annoying.

Refactor it to check earlier if it can open /dev/fb0 and skip if not,
this results in each test printing something like:

  test: test_alignment_handler_vsx_206
  tags: git_version:v4.18-rc3-134-gfb21a48904aa
  [SKIP] Test skipped on line 291
  skip: test_alignment_handler_vsx_206

Signed-off-by: Michael Ellerman 
---
 .../powerpc/alignment/alignment_handler.c  | 40 +++---
 1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c 
b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index 0f2698f9fd6d..0eddd16af49f 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -191,7 +192,7 @@ int test_memcmp(void *s1, void *s2, int n, int offset, char 
*test_name)
  */
 int do_test(char *test_name, void (*test_func)(char *, char *))
 {
-   int offset, width, fd, rc = 0, r;
+   int offset, width, fd, rc, r;
void *mem0, *mem1, *ci0, *ci1;
 
printf("\tDoing %s:\t", test_name);
@@ -199,8 +200,8 @@ int do_test(char *test_name, void (*test_func)(char *, char 
*))
fd = open("/dev/fb0", O_RDWR);
if (fd < 0) {
printf("\n");
-   perror("Can't open /dev/fb0");
-   SKIP_IF(1);
+   perror("Can't open /dev/fb0 now?");
+   return 1;
}
 
ci0 = mmap(NULL, bufsize, PROT_WRITE, MAP_SHARED,
@@ -226,6 +227,7 @@ int do_test(char *test_name, void (*test_func)(char *, char 
*))
return rc;
}
 
+   rc = 0;
/* offset = 0 no alignment fault, so skip */
for (offset = 1; offset < 16; offset++) {
width = 16; /* vsx == 16 bytes */
@@ -244,32 +246,50 @@ int do_test(char *test_name, void (*test_func)(char *, 
char *))
r |= test_memcpy(mem1, mem0, width, offset, test_func);
if (r && !debug) {
printf("FAILED: Got signal");
+   rc = 1;
break;
}
 
r |= test_memcmp(mem1, ci1, width, offset, test_name);
-   rc |= r;
if (r && !debug) {
printf("FAILED: Wrong Data");
+   rc = 1;
break;
}
}
-   if (!r)
+
+   if (rc == 0)
printf("PASSED");
+
printf("\n");
 
munmap(ci0, bufsize);
munmap(ci1, bufsize);
free(mem0);
free(mem1);
+   close(fd);
 
return rc;
 }
 
+static bool can_open_fb0(void)
+{
+   int fd;
+
+   fd = open("/dev/fb0", O_RDWR);
+   if (fd < 0)
+   return false;
+
+   close(fd);
+   return true;
+}
+
 int test_alignment_handler_vsx_206(void)
 {
int rc = 0;
 
+   SKIP_IF(!can_open_fb0());
+
printf("VSX: 2.06B\n");
LOAD_VSX_XFORM_TEST(lxvd2x);
LOAD_VSX_XFORM_TEST(lxvw4x);
@@ -285,6 +305,8 @@ int test_alignment_handler_vsx_207(void)
 {
int rc = 0;
 
+   SKIP_IF(!can_open_fb0());
+
printf("VSX: 2.07B\n");
LOAD_VSX_XFORM_TEST(lxsspx);
LOAD_VSX_XFORM_TEST(lxsiwax);
@@ -298,6 +320,8 @@ int test_alignment_handler_vsx_300(void)
 {
int rc = 0;
 
+   SKIP_IF(!can_open_fb0());
+
SKIP_IF(!have_hwcap2(PPC_FEATURE2_ARCH_3_00));
printf("VSX: 3.00B\n");
LOAD_VMX_DFORM_TEST(lxsd);
@@ -328,6 +352,8 @@ int test_alignment_handler_integer(void)
 {
int rc = 0;
 
+   SKIP_IF(!can_open_fb0());
+
printf("Integer\n");
LOAD_DFORM_TEST(lbz);
LOAD_DFORM_TEST(lbzu);
@@ -383,6 +409,8 @@ int test_alignment_handler_vmx(void)
 {
int rc = 0;
 
+   SKIP_IF(!can_open_fb0());
+
printf("VMX\n");
LOAD_VMX_XFORM_TEST(lvx);
 
@@ -408,6 +436,8 @@ int test_alignment_handler_fp(void)
 {
int rc = 0;
 
+   SKIP_IF(!can_open_fb0());
+
printf("Floating point\n");
LOAD_FLOAT_DFORM_TEST(lfd);
LOAD_FLOAT_XFORM_TEST(lfdx);
-- 
2.14.1

Re: [RFC PATCH kernel 0/5] powerpc/P9/vfio: Pass through NVIDIA Tesla V100

2018-07-30 Thread Alexey Kardashevskiy




On 31/07/2018 02:29, Alex Williamson wrote:
> On Mon, 30 Jul 2018 18:58:49 +1000
> Alexey Kardashevskiy  wrote:
> 
>> On 11/07/2018 19:26, Alexey Kardashevskiy wrote:
>>> On Tue, 10 Jul 2018 16:37:15 -0600
>>> Alex Williamson  wrote:
>>>   
 On Tue, 10 Jul 2018 14:10:20 +1000
 Alexey Kardashevskiy  wrote:
  
> On Thu, 7 Jun 2018 23:03:23 -0600
> Alex Williamson  wrote:
> 
>> On Fri, 8 Jun 2018 14:14:23 +1000
>> Alexey Kardashevskiy  wrote:
>>   
>>> On 8/6/18 1:44 pm, Alex Williamson wrote:
 On Fri, 8 Jun 2018 13:08:54 +1000
 Alexey Kardashevskiy  wrote:
   
> On 8/6/18 8:15 am, Alex Williamson wrote:  
>> On Fri, 08 Jun 2018 07:54:02 +1000
>> Benjamin Herrenschmidt  wrote:
>> 
>>> On Thu, 2018-06-07 at 11:04 -0600, Alex Williamson wrote:   
>>>  

 Can we back up and discuss whether the IOMMU grouping of NVLink
 connected devices makes sense?  AIUI we have a PCI view of these
 devices and from that perspective they're isolated.  That's the 
 view of
 the device used to generate the grouping.  However, not visible to 
 us,
 these devices are interconnected via NVLink.  What isolation 
 properties
 does NVLink provide given that its entire purpose for existing 
 seems to
 be to provide a high performance link for p2p between devices? 
  
>>>
>>> Not entire. On POWER chips, we also have an nvlink between the 
>>> device
>>> and the CPU which is running significantly faster than PCIe.
>>>
>>> But yes, there are cross-links and those should probably be 
>>> accounted
>>> for in the grouping.
>>
>> Then after we fix the grouping, can we just let the host driver 
>> manage
>> this coherent memory range and expose vGPUs to guests?  The use case 
>> of
>> assigning all 6 GPUs to one VM seems pretty limited.  (Might need to
>> convince NVIDIA to support more than a single vGPU per VM though)
>> 
>
> These are physical GPUs, not virtual sriov-alike things they are
> implementing as well elsewhere.  

 vGPUs as implemented on M- and P-series Teslas aren't SR-IOV like
 either.  That's why we have mdev devices now to implement software
 defined devices.  I don't have first hand experience with V-series, but
 I would absolutely expect a PCIe-based Tesla V100 to support vGPU. 
  
>>>
>>> So assuming V100 can do vGPU, you are suggesting ditching this patchset 
>>> and
>>> using mediated vGPUs instead, correct?
>>
>> If it turns out that our PCIe-only-based IOMMU grouping doesn't
>> account for lack of isolation on the NVLink side and we correct that,
>> limiting assignment to sets of 3 interconnected GPUs, is that still a
>> useful feature?  OTOH, it's entirely an NVIDIA proprietary decision
>> whether they choose to support vGPU on these GPUs or whether they can
>> be convinced to support multiple vGPUs per VM.
>>   
> My current understanding is that every P9 chip in that box has some 
> NVLink2
> logic on it so each P9 is directly connected to 3 GPUs via PCIe and
> 2xNVLink2, and GPUs in that big group are interconnected by NVLink2 
> links
> as well.
>
> From small bits of information I have it seems that a GPU can 
> perfectly
> work alone and if the NVIDIA driver does not see these interconnects
> (because we do not pass the rest of the big 3xGPU group to this 
> guest), it
> continues with a single GPU. There is an "nvidia-smi -r" big reset 
> hammer
> which simply refuses to work until all 3 GPUs are passed so there is 
> some
> distinction between passing 1 or 3 GPUs, and I am trying (as we 
> speak) to
> get a confirmation from NVIDIA that it is ok to pass just a single 
> GPU.
>
> So we will either have 6 groups (one per GPU) or 2 groups (one per
> interconnected group).  

 I'm not gaining much confidence that we can rely on isolation between
 NVLink connected GPUs, it sounds like you're simply expecting that
 proprietary code from NVIDIA on a proprietary interconnect from NVIDIA
 is going to play nice and nobody will figure out how to do bad things
 because... obfuscation?  Thanks,  
>>>
>>> Well, we already believe that a proprietary firmware of a sriov-capable
>>> adapter like Mellanox ConnextX is not doing bad things, how is

Re: [RFC 1/4] virtio: Define virtio_direct_dma_ops structure

2018-07-30 Thread Anshuman Khandual

On 07/30/2018 02:54 PM, Christoph Hellwig wrote:
>> +/*
>> + * Virtio direct mapping DMA API operations structure
>> + *
>> + * This defines DMA API structure for all virtio devices which would not
>> + * either bring in their own DMA OPS from architecture or they would not
>> + * like to use architecture specific IOMMU based DMA OPS because QEMU
>> + * expects GPA instead of an IOVA in absence of VIRTIO_F_IOMMU_PLATFORM.
>> + */
>> +dma_addr_t virtio_direct_map_page(struct device *dev, struct page *page,
>> +unsigned long offset, size_t size,
>> +enum dma_data_direction dir,
>> +unsigned long attrs)
> 
> All these functions should probably be marked static.

Sure.

> 
>> +void virtio_direct_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
>> +size_t size, enum dma_data_direction dir,
>> +unsigned long attrs)
>> +{
>> +}
> 
> No need to implement no-op callbacks in struct dma_map_ops.

Okay.

> 
>> +
>> +int virtio_direct_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
>> +{
>> +return 0;
>> +}
> 
> Including this one.
> 
>> +void *virtio_direct_alloc(struct device *dev, size_t size, dma_addr_t 
>> *dma_handle,
>> +gfp_t gfp, unsigned long attrs)
>> +{
>> +void *queue = alloc_pages_exact(PAGE_ALIGN(size), gfp);
>> +
>> +if (queue) {
>> +phys_addr_t phys_addr = virt_to_phys(queue);
>> +*dma_handle = (dma_addr_t)phys_addr;
>> +
>> +if (WARN_ON_ONCE(*dma_handle != phys_addr)) {
>> +free_pages_exact(queue, PAGE_ALIGN(size));
>> +return NULL;
>> +}
>> +}
>> +return queue;
> 
> queue is a very odd name in a generic memory allocator.

Will change it to addr.

> 
>> +void virtio_direct_free(struct device *dev, size_t size, void *vaddr,
>> +dma_addr_t dma_addr, unsigned long attrs)
>> +{
>> +free_pages_exact(vaddr, PAGE_ALIGN(size));
>> +}
>> +
>> +const struct dma_map_ops virtio_direct_dma_ops = {
>> +.alloc  = virtio_direct_alloc,
>> +.free   = virtio_direct_free,
>> +.map_page   = virtio_direct_map_page,
>> +.unmap_page = virtio_direct_unmap_page,
>> +.mapping_error  = virtio_direct_mapping_error,
>> +};
> 
> This is missing a dma_map_sg implementation.  In general this is
> mandatory for dma_ops.  So either you implement it or explain in
> a common why you think you can skip it.

Hmm. IIUC virtio core never used dma_map_sg(). Am I missing something
here ? The only reference to dma_map_sg() is inside a comment.

$git grep dma_map_sg drivers/virtio/
drivers/virtio/virtio_ring.c:* We can't use dma_map_sg, because we don't 
use scatterlists in

> 
>> +EXPORT_SYMBOL(virtio_direct_dma_ops);
> 
> EXPORT_SYMBOL_GPL like all virtio symbols, please.

I am planning to drop EXPORT_SYMBOL from virtio_direct_dma_ops structure.

Re: [PATCH] powerpc: Add a checkpatch wrapper with our preferred settings

2018-07-30 Thread Russell Currey

On Wed, 2018-07-25 at 00:03 +1000, Michael Ellerman wrote:
> This makes it easy to run checkpatch with settings that we have
> agreed
> on (bwhahahahah).
> 
> Usage is eg:
> 
>   $ ./arch/powerpc/tools/checkpatch.sh -g origin/master..
> 
> To check all commits since origin/master.
> 
> Signed-off-by: Michael Ellerman 

Reviewed-by: Russell Currey

Re: [PATCH] powerpc/mobility: Fix node detach/rename problem

2018-07-30 Thread Tyrel Datwyler

On 07/30/2018 05:59 PM, Tyrel Datwyler wrote:
> On 07/29/2018 06:11 AM, Michael Bringmann wrote:
>> During LPAR migration, the content of the device tree/sysfs may
>> be updated including deletion and replacement of nodes in the
>> tree.  When nodes are added to the internal node structures, they
>> are appended in FIFO order to a list of nodes maintained by the
>> OF code APIs.  When nodes are removed from the device tree, they
>> are marked OF_DETACHED, but not actually deleted from the system
>> to allow for pointers cached elsewhere in the kernel.  The order
>> and content of the entries in the list of nodes is not altered,
>> though.
>>
>> During LPAR migration some common nodes are deleted and re-added
>> e.g. "ibm,platform-facilities".  If a node is re-added to the OF
>> node lists, the of_attach_node function checks to make sure that
>> the name + ibm,phandle of the to-be-added data is unique.  As the
>> previous copy of a re-added node is not modified beyond the addition
>> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING
>> notice to the console, (3) renames the to-be-added node to avoid
>> filename collisions within a directory, and (3) adds entries to
>> the sysfs/kernfs.
> 
> So, this patch actually just band aids over the real problem. This is a long 
> standing problem with several PFO drivers leaking references. The issue here 
> is that, during the device tree update that follows a migration. the update 
> of the ibm,platform-facilities node and friends below are always deleted and 
> re-added on the destination lpar and subsequently the leaked references 
> prevent the devices nodes from every actually being properly cleaned up after 
> detach. Thus, leading to the issue you are observing.
> 
> As and example a quick look at nx-842-pseries.c reveals several issues.
> 
> static int __init nx842_pseries_init(void)
> {
> struct nx842_devdata *new_devdata;
> int ret;
> 
> if (!of_find_compatible_node(NULL, NULL, "ibm,compression"))
> return -ENODEV;
> 
> This call to of_find_compatible_node() results in a node returned with 
> refcount incremented and therefore immediately leaked.
> 
> Further, the reconfig notifier logic makes the assumption that it only needs 
> to deal with node updates, but as I outlined above the post migration device 
> tree update always results in PFO nodes and properties being deleted and 
> re-added.
> 
> /**
>  * nx842_OF_notifier - Process updates to OF properties for the device
>  *
>  * @np: notifier block
>  * @action: notifier action
>  * @update: struct pSeries_reconfig_prop_update pointer if action is
>  *  PSERIES_UPDATE_PROPERTY
>  *
>  * Returns:
>  *  NOTIFY_OK on success
>  *  NOTIFY_BAD encoded with error number on failure, use
>  *  notifier_to_errno() to decode this value
>  */
> static int nx842_OF_notifier(struct notifier_block *np, unsigned long action,
>  void *data)
> {
> struct of_reconfig_data *upd = data;
> struct nx842_devdata *local_devdata;
> struct device_node *node = NULL;
> 
> rcu_read_lock();
> local_devdata = rcu_dereference(devdata);
> if (local_devdata)
> node = local_devdata->dev->of_node;
> 
> if (local_devdata &&
> action == OF_RECONFIG_UPDATE_PROPERTY &&
> !strcmp(upd->dn->name, node->name)) {
> rcu_read_unlock();
> nx842_OF_upd(upd->prop);
> } else
> rcu_read_unlock();
> 
> return NOTIFY_OK;
> }
> 
> I expect to find the same problems in pseries-rng.c and nx.c.

So, in actuality the main root of the problem is really in the vio core. A node 
reference for each PFO device under "ibm,platform-facilities" is taken during 
vio_register_device_node(). We need a reconfig notifier to call 
vio_unregister_device() for each PFO device on detach, and 
vio_bus_scan_register_devices("ibm,platform-facilities") on attach. This will 
make sure the PFO vio devices are released such that vio_dev_release() gets 
called to put the node reference that was taken at original registration time.

/* vio_dev refcount hit 0 */
static void vio_dev_release(struct device *dev)
{
struct iommu_table *tbl = get_iommu_table_base(dev);

if (tbl)
iommu_tce_table_put(tbl);
of_node_put(dev->of_node);
kfree(to_vio_dev(dev));
}

-Tyrel

> 
> -Tyrel
> 
>>
>> This patch fixes the 'migration add' problem by changing the
>> stored 'phandle' of the OF_DETACHed node to 0 (reserved value for
>> of_find_node_by_phandle), so that subsequent re-add operations,
>> such as those during migration, do not find the detached node,
>> do not observe duplicate names, do not rename them,  and the
>> extra WARNING notices are removed from the console output.
>>
>> In addition, it erases the 'name' field of the OF_DETACHED node,
>> to prevent any

[PATCH] powerpc/4xx: Fix error return path in ppc4xx_msi_probe()

2018-07-30 Thread Guenter Roeck

An arbitrary error in ppc4xx_msi_probe() quite likely results in a
crash similar to the following, seen after dma_alloc_coherent()
returned an error.

Unable to handle kernel paging request for data at address 0x
Faulting instruction address: 0xc001bff0
Oops: Kernel access of bad area, sig: 11 [#1]
BE Canyonlands
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Tainted: GW
4.18.0-rc6-00010-gff33d1030a6c #1
NIP:  c001bff0 LR: c001c418 CTR: c01faa7c
REGS: cf82db40 TRAP: 0300   Tainted: GW
(4.18.0-rc6-00010-gff33d1030a6c)
MSR:  00029000   CR: 28002024  XER: 
DEAR:  ESR: 
GPR00: c001c418 cf82dbf0 cf828000 cf8de400   00c4 00c4
GPR08: c0481ea4   00c4 22002024  c00025e8 
GPR16:       c0492380 004a
GPR24: 00029000 000c  cf8de410 c0494d60 c0494d60 cf8bebc0 0001
NIP [c001bff0] ppc4xx_of_msi_remove+0x48/0xa0
LR [c001c418] ppc4xx_msi_probe+0x294/0x3b8
Call Trace:
[cf82dbf0] [00029000] 0x29000 (unreliable)
[cf82dc10] [c001c418] ppc4xx_msi_probe+0x294/0x3b8
[cf82dc70] [c0209fbc] platform_drv_probe+0x40/0x9c
[cf82dc90] [c0208240] driver_probe_device+0x2a8/0x350
[cf82dcc0] [c0206204] bus_for_each_drv+0x60/0xac
[cf82dcf0] [c0207e88] __device_attach+0xe8/0x160
[cf82dd20] [c02071e0] bus_probe_device+0xa0/0xbc
[cf82dd40] [c02050c8] device_add+0x404/0x5c4
[cf82dd90] [c0288978] of_platform_device_create_pdata+0x88/0xd8
[cf82ddb0] [c0288b70] of_platform_bus_create+0x134/0x220
[cf82de10] [c0288bcc] of_platform_bus_create+0x190/0x220
[cf82de70] [c0288cf4] of_platform_bus_probe+0x98/0xec
[cf82de90] [c0449650] 
__machine_initcall_canyonlands_ppc460ex_device_probe+0x38/0x54
[cf82dea0] [c0002404] do_one_initcall+0x40/0x188
[cf82df00] [c043daec] kernel_init_freeable+0x130/0x1d0
[cf82df30] [c0002600] kernel_init+0x18/0x104
[cf82df40] [c000c23c] ret_from_kernel_thread+0x14/0x1c
Instruction dump:
90010024 813d0024 2f89 83c30058 41bd0014 4838 813d0024 7f89f800
409d002c 813e000c 57ea103a 3bff0001 <7c69502e> 2f83 419effe0 4803b26d
---[ end trace 8cf551077ecfc42a ]---

Fix it up. Specifically,

- Return valid error codes from ppc4xx_setup_pcieh_hw(), have it clean
  up after itself, and only access hardware after all possible error
  conditions have been handled.
- Use devm_kzalloc() instead of kzalloc() in ppc4xx_msi_probe()

Signed-off-by: Guenter Roeck 
---
 arch/powerpc/platforms/4xx/msi.c | 51 +++-
 1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/platforms/4xx/msi.c b/arch/powerpc/platforms/4xx/msi.c
index 81b2cbce7df8..7c324eff2f22 100644
--- a/arch/powerpc/platforms/4xx/msi.c
+++ b/arch/powerpc/platforms/4xx/msi.c
@@ -146,13 +146,19 @@ static int ppc4xx_setup_pcieh_hw(struct platform_device 
*dev,
const u32 *sdr_addr;
dma_addr_t msi_phys;
void *msi_virt;
+   int err;
 
sdr_addr = of_get_property(dev->dev.of_node, "sdr-base", NULL);
if (!sdr_addr)
-   return -1;
+   return -EINVAL;
 
-   mtdcri(SDR0, *sdr_addr, upper_32_bits(res.start));  /*HIGH addr */
-   mtdcri(SDR0, *sdr_addr + 1, lower_32_bits(res.start));  /* Low addr */
+   msi_data = of_get_property(dev->dev.of_node, "msi-data", NULL);
+   if (!msi_data)
+   return -EINVAL;
+
+   msi_mask = of_get_property(dev->dev.of_node, "msi-mask", NULL);
+   if (!msi_mask)
+   return -EINVAL;
 
msi->msi_dev = of_find_node_by_name(NULL, "ppc4xx-msi");
if (!msi->msi_dev)
@@ -160,30 +166,30 @@ static int ppc4xx_setup_pcieh_hw(struct platform_device 
*dev,
 
msi->msi_regs = of_iomap(msi->msi_dev, 0);
if (!msi->msi_regs) {
-   dev_err(>dev, "of_iomap problem failed\n");
-   return -ENOMEM;
+   dev_err(>dev, "of_iomap failed\n");
+   err = -ENOMEM;
+   goto node_put;
}
dev_dbg(>dev, "PCIE-MSI: msi register mapped 0x%x 0x%x\n",
(u32) (msi->msi_regs + PEIH_TERMADH), (u32) (msi->msi_regs));
 
msi_virt = dma_alloc_coherent(>dev, 64, _phys, GFP_KERNEL);
-   if (!msi_virt)
-   return -ENOMEM;
+   if (!msi_virt) {
+   err = -ENOMEM;
+   goto iounmap;
+   }
msi->msi_addr_hi = upper_32_bits(msi_phys);
msi->msi_addr_lo = lower_32_bits(msi_phys & 0x);
dev_dbg(>dev, "PCIE-MSI: msi address high 0x%x, low 0x%x\n",
msi->msi_addr_hi, msi->msi_addr_lo);
 
+   mtdcri(SDR0, *sdr_addr, upper_32_bits(res.start));  /*HIGH addr */
+   mtdcri(SDR0, *sdr_addr + 1, lower_32_bits(res.start));  /* Low addr */
+
/* Progam the Interrupt handler Termination addr registers */
out_be32(msi->msi_regs + PEIH_TERMADH, msi->msi_addr_hi);
out_be32(msi->msi_regs + PEIH_TERMADL, msi->msi_addr_lo);
 
-

Re: [PATCH 0/6] rapidio: move Kconfig menu definition to subsystem

2018-07-30 Thread Randy Dunlap

On 07/30/2018 03:50 PM, Alexei Colin wrote:
> The top-level Kconfig entry for RapidIO subsystem is currently
> duplicated in several architecture-specific Kconfig files. This set of
> patches does two things:
> 
> 1. Move the Kconfig menu definition into the RapidIO subsystem and
> remove the duplicate definitions from arch Kconfig files.
> 
> 2. Enable RapidIO Kconfig menu entry for arm and arm64 architectures,
> where it was not enabled before. I tested that subsystem and drivers
> build successfully for both architectures, and tested that the modules
> load on a custom arm64 Qemu model.
> 
> For all architectures, RapidIO menu should be offered when either:
> (1) The platform has a PCI bus (which host a RapidIO module on the bus).
> (2) The platform has a RapidIO IP block (connected to a system bus, e.g.
> AXI on ARM). In this case, 'select HAS_RAPIDIO' should be added to the
> 'config ARCH_*' menu entry for the SoCs that offer the IP block.
> 
> Prior to this patchset, different architectures used different criteria:
> * powerpc: (1) and (2)
> * mips: (1) and (2) after recent commit into next that added (2):
>   https://www.linux-mips.org/archives/linux-mips/2018-07/msg00596.html
>   fc5d988878942e9b42a4de5204bdd452f3f1ce47
>   491ec1553e0075f345fbe476a93775eabcbc40b6
> * x86: (1)
> * arm,arm64: none (RapidIO menus never offered)
> 
> Responses to feedback from prior submission (thanks for the reviews!):
> http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593347.html
> http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593349.html
> 
> Changelog:
>   * Moved Kconfig entry into RapidIO subsystem instead of duplicating
> 
> In the current patchset, I took the approach of adding '|| PCI' to the
> depends in the subsystem. I did try the alterantive approach mentioned
> in the reviews for v1 of this patch, where the subsystem Kconfig does
> not add a '|| PCI' and each per-architecture Kconfig has to add a
> 'select HAS_RAPIDIO if PCI' and SoCs with IP blocks have to also add
> 'select HAS_RAPIDIO'. This works too but requires each architecture's
> Kconfig to add the line for RapidIO (whereas current approach does not
> require that involvement) and also may create a false impression that
> the dependency on PCI is strict.
> 
> We appreciate the suggestion for also selecting the RapdiIO subsystem for
> compilation with COMPILE_TEST, but hope to address it in a separate
> patchset, localized to the subsystem, since it will need to change
> depends on all drivers, not just on the top level, and since this
> patch now spans multiple architectures.
> 
> 
> Alexei Colin (6):
>   rapidio: define top Kconfig menu in driver subtree
>   x86: factor out RapidIO Kconfig menu
>   powerpc: factor out RapidIO Kconfig menu entry
>   mips: factor out RapidIO Kconfig entry
>   arm: enable RapidIO menu in Kconfig
>   arm64: enable RapidIO menu in Kconfig
> 
>  arch/arm/Kconfig|  2 ++
>  arch/arm64/Kconfig  |  2 ++
>  arch/mips/Kconfig   | 11 ---
>  arch/powerpc/Kconfig| 13 +
>  arch/x86/Kconfig|  8 
>  drivers/rapidio/Kconfig | 15 +++
>  6 files changed, 20 insertions(+), 31 deletions(-)
> 

LGTM.

Acked-by: Randy Dunlap  # for the series

thanks,
-- 
~Randy

Re: [PATCH] powerpc/mobility: Fix node detach/rename problem

2018-07-30 Thread Tyrel Datwyler

On 07/29/2018 06:11 AM, Michael Bringmann wrote:
> During LPAR migration, the content of the device tree/sysfs may
> be updated including deletion and replacement of nodes in the
> tree.  When nodes are added to the internal node structures, they
> are appended in FIFO order to a list of nodes maintained by the
> OF code APIs.  When nodes are removed from the device tree, they
> are marked OF_DETACHED, but not actually deleted from the system
> to allow for pointers cached elsewhere in the kernel.  The order
> and content of the entries in the list of nodes is not altered,
> though.
> 
> During LPAR migration some common nodes are deleted and re-added
> e.g. "ibm,platform-facilities".  If a node is re-added to the OF
> node lists, the of_attach_node function checks to make sure that
> the name + ibm,phandle of the to-be-added data is unique.  As the
> previous copy of a re-added node is not modified beyond the addition
> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING
> notice to the console, (3) renames the to-be-added node to avoid
> filename collisions within a directory, and (3) adds entries to
> the sysfs/kernfs.

So, this patch actually just band aids over the real problem. This is a long 
standing problem with several PFO drivers leaking references. The issue here is 
that, during the device tree update that follows a migration. the update of the 
ibm,platform-facilities node and friends below are always deleted and re-added 
on the destination lpar and subsequently the leaked references prevent the 
devices nodes from every actually being properly cleaned up after detach. Thus, 
leading to the issue you are observing.

As and example a quick look at nx-842-pseries.c reveals several issues.

static int __init nx842_pseries_init(void)
{
struct nx842_devdata *new_devdata;
int ret;

if (!of_find_compatible_node(NULL, NULL, "ibm,compression"))
return -ENODEV;

This call to of_find_compatible_node() results in a node returned with refcount 
incremented and therefore immediately leaked.

Further, the reconfig notifier logic makes the assumption that it only needs to 
deal with node updates, but as I outlined above the post migration device tree 
update always results in PFO nodes and properties being deleted and re-added.

/**
 * nx842_OF_notifier - Process updates to OF properties for the device
 *
 * @np: notifier block
 * @action: notifier action
 * @update: struct pSeries_reconfig_prop_update pointer if action is
 *  PSERIES_UPDATE_PROPERTY
 *
 * Returns:
 *  NOTIFY_OK on success
 *  NOTIFY_BAD encoded with error number on failure, use
 *  notifier_to_errno() to decode this value
 */
static int nx842_OF_notifier(struct notifier_block *np, unsigned long action,
 void *data)
{
struct of_reconfig_data *upd = data;
struct nx842_devdata *local_devdata;
struct device_node *node = NULL;

rcu_read_lock();
local_devdata = rcu_dereference(devdata);
if (local_devdata)
node = local_devdata->dev->of_node;

if (local_devdata &&
action == OF_RECONFIG_UPDATE_PROPERTY &&
!strcmp(upd->dn->name, node->name)) {
rcu_read_unlock();
nx842_OF_upd(upd->prop);
} else
rcu_read_unlock();

return NOTIFY_OK;
}

I expect to find the same problems in pseries-rng.c and nx.c.

-Tyrel

> 
> This patch fixes the 'migration add' problem by changing the
> stored 'phandle' of the OF_DETACHed node to 0 (reserved value for
> of_find_node_by_phandle), so that subsequent re-add operations,
> such as those during migration, do not find the detached node,
> do not observe duplicate names, do not rename them,  and the
> extra WARNING notices are removed from the console output.
> 
> In addition, it erases the 'name' field of the OF_DETACHED node,
> to prevent any future calls to of_find_node_by_name() or
> of_find_node_by_path() from matching this node.
> 
> Signed-off-by: Michael Bringmann 
> ---
>  arch/powerpc/platforms/pseries/dlpar.c |3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/dlpar.c 
> b/arch/powerpc/platforms/pseries/dlpar.c
> index 2de0f0d..9d82c28 100644
> --- a/arch/powerpc/platforms/pseries/dlpar.c
> +++ b/arch/powerpc/platforms/pseries/dlpar.c
> @@ -274,6 +274,9 @@ int dlpar_detach_node(struct device_node *dn)
>   if (rc)
>   return rc;
> 
> + dn->phandle = 0;
> + memset(dn->name, 0, strlen(dn->name));
> +
>   return 0;
>  }
> 
>

Re: [PATCH 2/3] selftests/powerpc: Only run some tests on ppc64le

2018-07-30 Thread Gustavo Romero


Hi Michael,

On 07/27/2018 02:51 AM, Michael Ellerman wrote:

Gustavo Romero  writes:


Hi Michael,

On 07/26/2018 09:24 AM, Michael Ellerman wrote:

These tests are currently failing on (some) big endian systems. Until
we can fix that, skip them unless we're on ppc64le.


I can take a look at this.


Thanks!


Is that the same issue related to the gcc version we discussed a month ago?


Maybe, I've forgotten :)


If not, could you provide the crash logs as a starting point?


There's not much:

   test: tm_tar
   tags: git_version:9794259
   failure: tm_tar
   run-parts: ./tm-tar exited with return code 1
   
   test: tm_vmxcopy

   tags: git_version:9794259
   !! child died by signal 11
   failure: tm_vmxcopy
   run-parts: ./tm-vmxcopy exited with return code 1

And now I can't get the sigreturn one to fail :/


OK. I _think_ there is a chance it's the same issue we've discussed.

Hence, just for the record (in case somebody else wants to take a look at it as
well), on the occasion we had the following log about tm-sigreturn, that
although is not listed above (only tm-tar and tm-vmxcopy are ), it's marked to
run only on LE as per this patch, just like tm-tar and tm-vmxcopy:

[  403.458953] Unexpected TM Bad Thing exception at c006da4c (msr 
0x201032)
13:59:41 console: [  403.459865] Oops: Unrecoverable exception, sig: 6 [#1]
13:59:41 console: [  403.460286] BE SMP NR_CPUS=32 NUMA pSeries
13:59:41 console: [  403.460611] Modules linked in:
13:59:41 console: [  403.460929] CPU: 2 PID: 25233 Comm: tm-sigreturn Not 
tainted 4.16.0-gcc6x-g709b973 #1
13:59:41 console: [  403.461530] NIP:  c006da4c LR: c001d6a4 
CTR: 
13:59:41 console: [  403.461782] REGS: c0003ffcbd80 TRAP: 0700   Not 
tainted  (4.16.0-gcc6x-g709b973)
13:59:41 console: [  403.461943] MSR:  800300201032  
 CR: 48004288  XER: 2000
13:59:41 console: [  403.462112] CFAR: c001d6a0 SOFTE: 3
13:59:41 console: [  403.462112] PACATMSCRATCH: 00034280f032
13:59:41 console: [  403.462112] GPR00: d4018c01 c000f09bfc20 
c165bd00 c000fbd5c900
13:59:41 console: [  403.462112] GPR04: 80039032  
 
13:59:41 console: [  403.462112] GPR08:  0001 
0001 
13:59:41 console: [  403.462112] GPR12:  c0003fffdf00 
000f 1003a338
13:59:41 console: [  403.462112] GPR16: 10002e5c 10002e44 
10002d08 10002e24
13:59:41 console: [  403.462112] GPR20: 10002eac  
c000fbd5cdc8 c0df6578
13:59:41 console: [  403.462112] GPR24:  bfff 
fffef394 fffeee10
13:59:41 console: [  403.462112] GPR28: c000f09bfea0 0280d032 
 c000fbd5c900
13:59:41 console: [  403.463642] NIP [c006da4c] 
.tm_restore_sprs+0xc/0x1c
13:59:41 console: [  403.463782] LR [c001d6a4] 
.tm_recheckpoint.part.7+0x54/0x90
13:59:41 console: [  403.463912] Call Trace:
13:59:41 console: [  403.464003] [c000f09bfc20] [c0312ae0] 
.__might_fault+0x70/0x90 (unreliable)
13:59:41 console: [  403.464165] [c000f09bfca0] [c001b2b8] 
.restore_tm_user_regs.part.0+0x418/0x6c0
13:59:41 console: [  403.464326] [c000f09bfd70] [c001c55c] 
.compat_sys_sigreturn+0x14c/0x490
13:59:41 console: [  403.464495] [c000f09bfe30] [c000c070] 
system_call+0x58/0x6c
13:59:41 console: [  403.464624] Instruction dump:
13:59:41 console: [  403.464705] 7c800164 4e800020 7c0022a6 f80304b0 7c0222a6 
f80304b8 7c0122a6 f80304c0
13:59:41 console: [  403.464868] 4e800020 e80304b0 7c0023a6 e80304b8 <7c0223a6> 
e80304c0 7c0123a6 4e800020
13:59:41 console: [  403.465030] ---[ end trace 0045efc572a679cf ]---

and some initial debugging by mpe using xmon:

Unexpected TM Bad Thing exception at c0069e6c (msr 0x201032)
cpu 0x6: Vector: 700 (Program Check) at [c0003ff9bd80]
pc: c0069e6c: .tm_restore_sprs+0xc/0x1c
lr: c001e2a4: .tm_recheckpoint.part.11+0x64/0xf0
sp: c000f2a2bc30
   msr: 800300201032
  current = 0xc000fa68b000
  paca= 0xc0003fff8f00 softe: 3 irq_happened: 0x01
pid   = 3879, comm = tm-sigreturn
Linux version 4.17.0-rc1-gcc-7.0.1-5-g56376c5864f8 
(mich...@ka3.ozlabs.ibm.com) (gcc version 7.0.1 20170213 (experimental) (Custom 
8e8a14c238db56c7)) #172 SMP Thu Apr 19 21:48:31 AEST 2018
enter ? for help
[link register   ] c001e2a4 .tm_recheckpoint.part.11+0x64/0xf0
[c000f2a2bc30] c000f2a2bcb0 (unreliable)
[c000f2a2bcb0] c001c30c .restore_tm_user_regs.part.0+0x55c/0x610
[c000f2a2bd70] c001d400 .compat_sys_sigreturn+0x130/0x410
[c000f2a2be30] c000c268 system_call+0x58/0x6c
--- Exception: 300 (Data Access) at 19c8
SP (ffd75e90) is in userspace
6:mon> di %pc
c0069e6c  7c0223a6mtspr

[PATCH 0/6] rapidio: move Kconfig menu definition to subsystem

2018-07-30 Thread Alexei Colin

The top-level Kconfig entry for RapidIO subsystem is currently
duplicated in several architecture-specific Kconfig files. This set of
patches does two things:

1. Move the Kconfig menu definition into the RapidIO subsystem and
remove the duplicate definitions from arch Kconfig files.

2. Enable RapidIO Kconfig menu entry for arm and arm64 architectures,
where it was not enabled before. I tested that subsystem and drivers
build successfully for both architectures, and tested that the modules
load on a custom arm64 Qemu model.

For all architectures, RapidIO menu should be offered when either:
(1) The platform has a PCI bus (which host a RapidIO module on the bus).
(2) The platform has a RapidIO IP block (connected to a system bus, e.g.
AXI on ARM). In this case, 'select HAS_RAPIDIO' should be added to the
'config ARCH_*' menu entry for the SoCs that offer the IP block.

Prior to this patchset, different architectures used different criteria:
* powerpc: (1) and (2)
* mips: (1) and (2) after recent commit into next that added (2):
  https://www.linux-mips.org/archives/linux-mips/2018-07/msg00596.html
  fc5d988878942e9b42a4de5204bdd452f3f1ce47
  491ec1553e0075f345fbe476a93775eabcbc40b6
* x86: (1)
* arm,arm64: none (RapidIO menus never offered)

Responses to feedback from prior submission (thanks for the reviews!):
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593347.html
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593349.html

Changelog:
  * Moved Kconfig entry into RapidIO subsystem instead of duplicating

In the current patchset, I took the approach of adding '|| PCI' to the
depends in the subsystem. I did try the alterantive approach mentioned
in the reviews for v1 of this patch, where the subsystem Kconfig does
not add a '|| PCI' and each per-architecture Kconfig has to add a
'select HAS_RAPIDIO if PCI' and SoCs with IP blocks have to also add
'select HAS_RAPIDIO'. This works too but requires each architecture's
Kconfig to add the line for RapidIO (whereas current approach does not
require that involvement) and also may create a false impression that
the dependency on PCI is strict.

We appreciate the suggestion for also selecting the RapdiIO subsystem for
compilation with COMPILE_TEST, but hope to address it in a separate
patchset, localized to the subsystem, since it will need to change
depends on all drivers, not just on the top level, and since this
patch now spans multiple architectures.


Alexei Colin (6):
  rapidio: define top Kconfig menu in driver subtree
  x86: factor out RapidIO Kconfig menu
  powerpc: factor out RapidIO Kconfig menu entry
  mips: factor out RapidIO Kconfig entry
  arm: enable RapidIO menu in Kconfig
  arm64: enable RapidIO menu in Kconfig

 arch/arm/Kconfig|  2 ++
 arch/arm64/Kconfig  |  2 ++
 arch/mips/Kconfig   | 11 ---
 arch/powerpc/Kconfig| 13 +
 arch/x86/Kconfig|  8 
 drivers/rapidio/Kconfig | 15 +++
 6 files changed, 20 insertions(+), 31 deletions(-)

-- 
2.18.0

[PATCH 3/6] powerpc: factor out RapidIO Kconfig menu entry

2018-07-30 Thread Alexei Colin

The menu entry is now defined in the rapidio subtree.  Also, re-order
the bus menu so tha the platform-specific RapidIO controller appears
after the entry for the RapidIO subsystem.

Platforms with a PCI bus will be offered the RapidIO menu since they may
be want support for a RapidIO PCI device. Platforms without a PCI bus
that might include a RapidIO IP block will need to "select HAS_RAPIDIO"
in the platform-/machine-specific "config ARCH_*" Kconfig entry.

Cc: Andrew Morton 
Cc: John Paul Walters 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Alexei Colin 
---
 arch/powerpc/Kconfig | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 25d005af0a5b..17ea8a5f90a0 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -993,16 +993,7 @@ source "drivers/pci/Kconfig"
 
 source "drivers/pcmcia/Kconfig"
 
-config HAS_RAPIDIO
-   bool
-   default n
-
-config RAPIDIO
-   tristate "RapidIO support"
-   depends on HAS_RAPIDIO || PCI
-   help
- If you say Y here, the kernel will include drivers and
- infrastructure code to support RapidIO interconnect devices.
+source "drivers/rapidio/Kconfig"
 
 config FSL_RIO
bool "Freescale Embedded SRIO Controller support"
@@ -1012,8 +1003,6 @@ config FSL_RIO
  Include support for RapidIO controller on Freescale embedded
  processors (MPC8548, MPC8641, etc).
 
-source "drivers/rapidio/Kconfig"
-
 endmenu
 
 config NONSTATIC_KERNEL
-- 
2.18.0

Re: [PATCH v2 04/10] powerpc/traps: Use REG_FMT in show_signal_msg()

2018-07-30 Thread Murilo Opsfelder Araujo

Hi, Christophe.

On Mon, Jul 30, 2018 at 06:30:47PM +0200, LEROY Christophe wrote:
> Murilo Opsfelder Araujo  a écrit :
>
> > Hi, Christophe.
> >
> > On Fri, Jul 27, 2018 at 06:40:23PM +0200, LEROY Christophe wrote:
> > > Murilo Opsfelder Araujo  a écrit :
> > >
> > > > Simplify the message format by using REG_FMT as the register format.  
> > > > This
> > > > avoids having two different formats and avoids checking for MSR_64BIT.
> > >
> > > Are you sure it is what we want ?
> >
> > Yes.
> >
> > > Won't it change the behaviour for a 32 bits app running on a 64bits 
> > > kernel ?
> >
> > In fact, this changes how many zeroes are prefixed when displaying the
> > registers
> > (%016lx vs. %08lx format).  For example, 32-bits userspace, 64-bits kernel:
>
> Indeed that's what I suspected. What is the real benefit of this change ?
> Why not keep the current format for 32bits userspace ? All those leading
> zeroes are pointless to me.

One of the benefits is simplifying the code by removing some checks.  Another is
deduplicating almost identical format strings in favor of a unified one.

After reading Joe's comment [1], %px seems to be the format we're looking for.
An extract from Documentation/core-api/printk-formats.rst:

  "%px is functionally equivalent to %lx (or %lu). %px is preferred because it
  is more uniquely grep'able."

So I guess we don't need to worry about the format (%016lx vs. %08lx), let's
just use %px, as per the guideline.

[1] 
https://lore.kernel.org/lkml/26f07092cdde378ebb42c1034badde1b56521c36.ca...@perches.com/

Cheers
Murilo

Re: [PATCH 0/6] rapidio: move Kconfig menu definition to subsystem

2018-07-30 Thread Russell King - ARM Linux

I only have this message, and patches 5 and 6.  This is meaningless
for me to review - I can't tell what you've done as far as my comments
on your previous iteration.

Please arrange to _at least_ copy all patches the appropriate mailing
lists for the set with your complete patch set if you aren't going to
copy everyone on all patches in the set.

On Mon, Jul 30, 2018 at 06:50:28PM -0400, Alexei Colin wrote:
> In the current patchset, I took the approach of adding '|| PCI' to the
> depends in the subsystem. I did try the alterantive approach mentioned
> in the reviews for v1 of this patch, where the subsystem Kconfig does
> not add a '|| PCI' and each per-architecture Kconfig has to add a
> 'select HAS_RAPIDIO if PCI' and SoCs with IP blocks have to also add
> 'select HAS_RAPIDIO'. This works too but requires each architecture's
> Kconfig to add the line for RapidIO (whereas current approach does not
> require that involvement) and also may create a false impression that
> the dependency on PCI is strict.

... which means, as it stands, I've no idea what you actually came up
with for this.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up

Re: [PATCH v3 1/1] powerpc/pseries: fix EEH recovery of some IOV devices

2018-07-30 Thread Bjorn Helgaas

On Mon, Jul 30, 2018 at 11:59:14AM +1000, Sam Bobroff wrote:
> EEH recovery currently fails on pSeries for some IOV capable PCI
> devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide
> certain device tree properties for the device. (Found on an IOV
> capable device using the ipr driver.)
> 
> Recovery fails in pci_enable_resources() at the check on r->parent,
> because r->flags is set and r->parent is not.  This state is due to
> sriov_init() setting the start, end and flags members of the IOV BARs
> but the parent not being set later in
> pseries_pci_fixup_iov_resources(), because the
> "ibm,open-sriov-vf-bar-info" property is missing.
> 
> Correct this by zeroing the resource flags for IOV BARs when they
> can't be configured (this is the same method used by sriov_init() and
> __pci_read_base()).
> 
> VFs cleared this way can't be enabled later, because that requires
> another device tree property, "ibm,number-of-configurable-vfs" as well
> as support for the RTAS function "ibm_map_pes". These are all part of
> hypervisor support for IOV and it seems unlikely that a hypervisor
> would ever partially, but not fully, support it. (None are currently
> provided by QEMU/KVM.)
> 
> Signed-off-by: Sam Bobroff 

Michael, I assume you'll take this, since it only touches powerpc.
Let me know if you need anything from me.

> ---
> Hi,
> 
> This is a fix to allow EEH recovery to succeed in a specific situation,
> which I've tried to explain in the commit message.
> 
> As with the RFC version, the IOV BARs are disabled by setting the resource
> flags to 0 but the other fields are now left as-is because that is what is 
> done
> elsewhere (see sriov_init() and __pci_read_base()).
> 
> I've also examined the concern raised by Bjorn Helgaas, that VFs could be
> enabled later after the BARs are disabled, and it already seems safe: enabling
> VFs (on pseries) depends on another device tree property,
> "ibm,number-of-configurable-vfs" as well as support for the RTAS function
> "ibm_map_pes". Since these are all part of the hypervisor's support for IOV it
> seems unlikely that we would ever see some of them but not all. (None are
> currently provided by QEMU/KVM.) (Additionally, the ipr driver on which the 
> EEH
> recovery failure was discovered doesn't even seem to have SR-IOV support so it
> certainly can't enable VFs.)
> 
> Cheers,
> Sam.
> 
> Patch set v3:
> Patch 1/1: powerpc/pseries: fix EEH recovery of some IOV devices
> * Moved some useful information from the cover letter to the commit log.
> 
> Patch set v2:
> Patch 1/1: powerpc/pseries: fix EEH recovery of some IOV devices
> * Moved the BAR disabling code to a function.
> * Also check in pseries_pci_fixup_resources().
> 
> Patch set v1:
> Patch 1/1: powerpc/pseries: fix EEH recovery of IOV devices
> 
>  arch/powerpc/platforms/pseries/setup.c | 25 +
>  1 file changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/setup.c 
> b/arch/powerpc/platforms/pseries/setup.c
> index b55ad4286dc7..0a9e4243ae1d 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -645,6 +645,15 @@ void of_pci_parse_iov_addrs(struct pci_dev *dev, const 
> int *indexes)
>   }
>  }
>  
> +static void pseries_disable_sriov_resources(struct pci_dev *pdev)
> +{
> + int i;
> +
> + pci_warn(pdev, "No hypervisor support for SR-IOV on this device, IOV 
> BARs disabled.\n");
> + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
> + pdev->resource[i + PCI_IOV_RESOURCES].flags = 0;
> +}
> +
>  static void pseries_pci_fixup_resources(struct pci_dev *pdev)
>  {
>   const int *indexes;
> @@ -652,10 +661,10 @@ static void pseries_pci_fixup_resources(struct pci_dev 
> *pdev)
>  
>   /*Firmware must support open sriov otherwise dont configure*/
>   indexes = of_get_property(dn, "ibm,open-sriov-vf-bar-info", NULL);
> - if (!indexes)
> - return;
> - /* Assign the addresses from device tree*/
> - of_pci_set_vf_bar_size(pdev, indexes);
> + if (indexes)
> + of_pci_set_vf_bar_size(pdev, indexes);
> + else
> + pseries_disable_sriov_resources(pdev);
>  }
>  
>  static void pseries_pci_fixup_iov_resources(struct pci_dev *pdev)
> @@ -667,10 +676,10 @@ static void pseries_pci_fixup_iov_resources(struct 
> pci_dev *pdev)
>   return;
>   /*Firmware must support open sriov otherwise dont configure*/
>   indexes = of_get_property(dn, "ibm,open-sriov-vf-bar-info", NULL);
> - if (!indexes)
> - return;
> - /* Assign the addresses from device tree*/
> - of_pci_parse_iov_addrs(pdev, indexes);
> + if (indexes)
> + of_pci_parse_iov_addrs(pdev, indexes);
> + else
> + pseries_disable_sriov_resources(pdev);
>  }
>  
>  static resource_size_t pseries_pci_iov_resource_alignment(struct pci_dev 
> *pdev,
> -- 
> 2.16.1.74.g9b0b1f47b
>

Re: [PATCH] PCI: call dma_debug_add_bus for pci_bus_type in common code

2018-07-30 Thread Bjorn Helgaas

[+cc Joerg]

On Mon, Jul 30, 2018 at 09:38:42AM +0200, Christoph Hellwig wrote:
> There is nothing arch specific about PCI or dma-debug, so move this
> call to common code just after registering the bus type.

I assume that previously, even if the user set CONFIG_DMA_API_DEBUG=y
we only got PCI DMA debug on powerpc, sh, and x86.  And after this
patch, we'll get PCI DMA debug on *all* arches?

If that's true, I'll add a comment to that effect to the commitlog
since that new functionality might be of interest to other arches.

> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/kernel/dma.c | 3 ---
>  arch/sh/drivers/pci/pci.c | 2 --
>  arch/x86/kernel/pci-dma.c | 3 ---
>  drivers/pci/pci-driver.c  | 2 +-
>  4 files changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index 155170d70324..dbfc7056d7df 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -357,9 +357,6 @@ EXPORT_SYMBOL_GPL(dma_get_required_mask);
>  
>  static int __init dma_init(void)
>  {
> -#ifdef CONFIG_PCI
> - dma_debug_add_bus(_bus_type);
> -#endif
>  #ifdef CONFIG_IBMVIO
>   dma_debug_add_bus(_bus_type);
>  #endif
> diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
> index e5b7437ab4af..8256626bc53c 100644
> --- a/arch/sh/drivers/pci/pci.c
> +++ b/arch/sh/drivers/pci/pci.c
> @@ -160,8 +160,6 @@ static int __init pcibios_init(void)
>   for (hose = hose_head; hose; hose = hose->next)
>   pcibios_scanbus(hose);
>  
> - dma_debug_add_bus(_bus_type);
> -
>   pci_initialized = 1;
>  
>   return 0;
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index ab5d9dd668d2..43f58632f123 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -155,9 +155,6 @@ static int __init pci_iommu_init(void)
>  {
>   struct iommu_table_entry *p;
>  
> -#ifdef CONFIG_PCI
> - dma_debug_add_bus(_bus_type);
> -#endif
>   x86_init.iommu.iommu_init();
>  
>   for (p = __iommu_table; p < __iommu_table_end; p++) {
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 6792292b5fc7..bef17c3fca67 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1668,7 +1668,7 @@ static int __init pci_driver_init(void)
>   if (ret)
>   return ret;
>  #endif
> -
> + dma_debug_add_bus(_bus_type);
>   return 0;
>  }
>  postcore_initcall(pci_driver_init);
> -- 
> 2.18.0
>

Re: [PATCH] powerpc/mobility: Fix node detach/rename problem

2018-07-30 Thread Michael Bringmann

See below.

On 07/30/2018 01:31 AM, Michael Ellerman wrote:
> Michael Bringmann  writes:
> 
>> During LPAR migration, the content of the device tree/sysfs may
>> be updated including deletion and replacement of nodes in the
>> tree.  When nodes are added to the internal node structures, they
>> are appended in FIFO order to a list of nodes maintained by the
>> OF code APIs.
> 
> That hasn't been true for several years. The data structure is an n-ary
> tree. What kernel version are you working on?

Sorry for an error in my description.  I oversimplified based on the
name of a search iterator.  Let me try to provide a better explanation
of the problem, here.

This is the problem.  The PPC mobility code receives RTAS requests to
delete nodes with platform-/hardware-specific attributes when restarting
the kernel after a migration.  My example is for migration between a
P8 Alpine and a P8 Brazos.   Nodes to be deleted may include 'ibm,random-v1',
'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1',
or others.

The mobility.c code calls 'of_detach_node' for the nodes and their children.
This makes calls to detach the properties and to try to remove the associated
sysfs/kernfs files.

Then new copies of the same nodes are next provided by the PHYP, local
copies are built, and a pointer to the 'struct device_node' is passed to
of_attach_node.  Before the call to of_attach_node, the phandle is initialized
to 0 when the data structure is alloced.  During the call to of_attach_node,
it calls __of_attach_node which pulls the actual name and phandle from just
created sub-properties named something like 'name' and 'ibm,phandle'.

This is all fine for the first migration.  The problem occurs with the
second and subsequent migrations when the PHYP on the new system wants to
replace the same set of nodes again, referenced with the same names and
phandle values.

> 
>> When nodes are removed from the device tree, they
>> are marked OF_DETACHED, but not actually deleted from the system
>> to allow for pointers cached elsewhere in the kernel.  The order
>> and content of the entries in the list of nodes is not altered,
>> though.
> 
> Something is going wrong if this is actually happening.
> 
> When the node is detached it should be *detached* from the tree of all
> nodes, so it should not be discoverable other than by having an existing
> pointer to it.
On the second and subsequent migrations, the PHYP tells the system
to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
'ibm,compression-v1', 'ibm,sym-encryption-v1'.  It specifies these
nodes by its known set of phandle values -- the same handles used
by the PHYP on the source system are known on the target system.
The mobility.c code calls of_find_node_by_phandle() with these values
and ends up locating the first instance of each node that was added
during the original boot, instead of the second instance of each node
created after the first migration.  The detach during the second
migration fails with errors like,

[ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 
__of_detach_node+0x8/0xa0
[ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag 
unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto 
sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi 
scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
[ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: GW 
4.18.0-rc1-wi107836-v05-120+ #201
[ 4565.030737] NIP:  c07c1ea8 LR: c07c1fb4 CTR: 00655170
[ 4565.030741] REGS: c003f302b690 TRAP: 0700   Tainted: GW  
(4.18.0-rc1-wi107836-v05-120+)
[ 4565.030745] MSR:  80010282b033   
CR: 22288822  XER: 000a
[ 4565.030757] CFAR: c07c1fb0 IRQMASK: 1
[ 4565.030757] GPR00: c07c1fa4 c003f302b910 c114bf00 
c0038e68
[ 4565.030757] GPR04: 0001  80c008e0b4b8 

[ 4565.030757] GPR08:  0001 8003 
2843
[ 4565.030757] GPR12: 8800 c0001ec9ae00 4000 

[ 4565.030757] GPR16:  0008  
f6ff
[ 4565.030757] GPR20: 0007  c003e9f1f034 
0001
[ 4565.030757] GPR24:    

[ 4565.030757] GPR28: c1549d28 c1134828 c0038e68 
c003f302b930
[ 4565.030804] NIP [c07c1ea8] __of_detach_node+0x8/0xa0
[ 4565.030808] LR [c07c1fb4] of_detach_node+0x74/0xd0
[ 4565.030811] Call Trace:
[ 4565.030815] [c003f302b910] [c07c1fa4] of_detach_node+0x64/0xd0 
(unreliable)
[ 4565.030821] [c003f302b980] [c00c33c4] 
dlpar_detach_node+0xb4/0x150
[ 4565.030826] [c003f302ba10] [c00c3ffc] delete_dt_node+0x3c/0x80
[

[PATCH 2/5] tools headers powerpc: Update asm/unistd.h copy to pick new

2018-07-30 Thread Arnaldo Carvalho de Melo

From: Arnaldo Carvalho de Melo 

The new 'io_pgetevents' syscall was wired up in PowerPC in the following
cset:

  b2f82565f2ca ("powerpc: Wire up io_pgetevents")

Update tools/arch/powerpc/ copy of the asm/unistd.h file so that 'perf
trace' on PowerPC gets it in its syscall table.

This elliminated the following perf build warning:

  Warning: Kernel ABI header at 'tools/arch/powerpc/include/uapi/asm/unistd.h' 
differs from latest version at 'arch/powerpc/include/uapi/asm/unistd.h'

Cc: Alexander Shishkin 
Cc: Breno Leitao 
Cc: Hendrik Brueckner 
Cc: Jiri Olsa 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman 
Cc: Namhyung Kim 
Cc: Ravi Bangoria 
Cc: Thomas Richter 
Link: https://lkml.kernel.org/n/tip-9uvu7tz4ud3bxxfyxwryu...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/arch/powerpc/include/uapi/asm/unistd.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/arch/powerpc/include/uapi/asm/unistd.h 
b/tools/arch/powerpc/include/uapi/asm/unistd.h
index ac5ba55066dd..985534d0b448 100644
--- a/tools/arch/powerpc/include/uapi/asm/unistd.h
+++ b/tools/arch/powerpc/include/uapi/asm/unistd.h
@@ -399,5 +399,6 @@
 #define __NR_pkey_free 385
 #define __NR_pkey_mprotect 386
 #define __NR_rseq  387
+#define __NR_io_pgetevents 388
 
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
2.14.4

[GIT PULL 0/5] perf/urgent fixes

2018-07-30 Thread Arnaldo Carvalho de Melo

Hi Ingo,

Please consider pulling, just to get the build without warnings
and finishing successfully in all my test environments,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 7f635ff187ab6be0b350b3ec06791e376af238ab:

  perf/core: Fix crash when using HW tracing kernel filters (2018-07-25 
11:46:22 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git 
tags/perf-urgent-for-mingo-4.18-20180730

for you to fetch changes up to 44fe619b1418ff4e9d2f9518a940fbe2fb686a08:

  perf tools: Fix the build on the alpine:edge distro (2018-07-30 13:15:03 
-0300)


perf/urgent fixes: (Arnaldo Carvalho de Melo)

- Update the tools copy of several files, including perf_event.h,
  powerpc's asm/unistd.h (new io_pgetevents syscall), bpf.h and
  x86's memcpy_64.s (used in 'perf bench mem'), silencing the
  respective warnings during the perf tools build.

- Fix the build on the alpine:edge distro.

Signed-off-by: Arnaldo Carvalho de Melo 


Arnaldo Carvalho de Melo (5):
  tools headers uapi: Update tools's copy of linux/perf_event.h
  tools headers powerpc: Update asm/unistd.h copy to pick new
  tools headers uapi: Refresh linux/bpf.h copy
  tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem 
memcpy'
  perf tools: Fix the build on the alpine:edge distro

 tools/arch/powerpc/include/uapi/asm/unistd.h |   1 +
 tools/arch/x86/include/asm/mcsafe_test.h |  13 
 tools/arch/x86/lib/memcpy_64.S   | 112 +--
 tools/include/uapi/linux/bpf.h   |  28 +--
 tools/include/uapi/linux/perf_event.h|   2 +
 tools/perf/arch/x86/util/pmu.c   |   1 +
 tools/perf/arch/x86/util/tsc.c   |   1 +
 tools/perf/bench/Build   |   1 +
 tools/perf/bench/mem-memcpy-x86-64-asm.S |   1 +
 tools/perf/bench/mem-memcpy-x86-64-lib.c |  24 ++
 tools/perf/perf.h|   1 +
 tools/perf/util/header.h |   1 +
 tools/perf/util/namespaces.h |   1 +
 13 files changed, 124 insertions(+), 63 deletions(-)
 create mode 100644 tools/arch/x86/include/asm/mcsafe_test.h
 create mode 100644 tools/perf/bench/mem-memcpy-x86-64-lib.c

Test results:

The first ones are container (docker) based builds of tools/perf with
and without libelf support.  Where clang is available, it is also used
to build perf with/without libelf, and building with LIBCLANGLLVM=1
(built-in clang) with gcc and clang when clang and its devel libraries
are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from a http server to
build it inside the container, to make it easier to build in a container 
cluster.
Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those
may not have all the features built, due to lack of multi-arch devel packages,
available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
with a variety of command line event specifications to then intercept the
sys_perf_event syscall to check that the perf_event_attr fields are set up as
expected, among a variety of other unit tests.

Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/
with a variety of feature sets, exercising the build with an incomplete set of
features as well as with a complete one. It is planned to have it run on each
of the containers mentioned above, using some container orchestration
infrastructure. Get in contact if interested in helping having this in place.

  # dm
   1 alpine:3.4: Ok   gcc (Alpine 5.3.0) 5.3.0
   2 alpine:3.5: Ok   gcc (Alpine 6.2.1) 6.2.1 20160822
   3 alpine:3.6: Ok   gcc (Alpine 6.3.0) 6.3.0
   4 alpine:3.7: Ok   gcc (Alpine 6.4.0) 6.4.0
   5 alpine:edge   : Ok   gcc (Alpine 6.4.0) 6.4.0
   6 amazonlinux:1 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 
4.8.5-28)
   7 amazonlinux:2 : Ok   gcc (GCC) 7.3.1 20180303 (Red Hat 
7.3.1-5)
   8 android-ndk:r12b-arm  : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 
20150123 (prerelease)
   9 android-ndk:r15c-arm  : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 
20150123 (prerelease)
  10 centos:5  : Ok   gcc (GCC) 4.1.2 20080704 (Red Hat 
4.1.2-55)
  11 centos:6  : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 
4.4.7-18)
  12 centos:7  : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat

Re: [PATCH] of/fdt: Remove PPC32 longtrail hack in memory scan

2018-07-30 Thread Rob Herring

On Mon, Jul 30, 2018 at 4:47 AM Michael Ellerman  wrote:
>
> Rob Herring  writes:
> > On Thu, Jul 26, 2018 at 11:36 PM Michael Ellerman  
> > wrote:
> >> When the OF code was originally made common by Grant in commit
> >> 51975db0b733 ("of/flattree: merge early_init_dt_scan_memory() common
> >> code") (Feb 2010), the common code inherited a hack to handle
> >> PPC "longtrail" machines, which had a "memory@0" node with no
> >> device_type.
> >>
> >> That check was then made to only apply to PPC32 in b44aa25d20e2 ("of:
> >> Handle memory@0 node on PPC32 only") (May 2014).
> >>
> >> But according to Paul Mackerras the "longtrail" machines are long
> >> dead, if they were ever seen in the wild at all. If someone does still
> >> have one, we can handle this firmware wart in powerpc platform code.
> >>
> >> So remove the hack once and for all.
> >
> > Yay. I guess Power Macs and other quirks will never die...
>
> Not soon.
>
> In base.c I see:
>  - the hack in arch_find_n_match_cpu_physical_id()
>- we should just move that into arch code, it's a __weak arch hook
>  after all.

Except then we'd have to export __of_find_n_match_cpu_property. I
somewhat prefer it like it is because arch specific functions tend to
encourage duplication. But if the implementation is completely
different like sparc, then yes, a separate implementation makes sense.

>  - a PPC hack in of_alias_scan(), I guess we need to retain that
>behaviour, but it's pretty minor anyway.

It would be nice to know what platform(s) needs this as I don't have a
clue. It would also be nice if I had some dumps of DTs for some of
these OpenFirmware systems.

Rob

[PATCH 20/20] powerpc/dma: remove dma_nommu_mmap_coherent

2018-07-30 Thread Christoph Hellwig

The remaining implementation for coherent caches is functionally
identical to the default provided in common code.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/dma-mapping.h |  7 ---
 arch/powerpc/kernel/dma-iommu.c|  1 -
 arch/powerpc/kernel/dma.c  | 13 -
 arch/powerpc/platforms/pseries/vio.c   |  1 -
 4 files changed, 22 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h 
b/arch/powerpc/include/asm/dma-mapping.h
index 879c4efba785..e62e23aa3714 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -18,13 +18,6 @@
 #include 
 #include 
 
-/* Some dma direct funcs must be visible for use in other dma_ops */
-extern int dma_nommu_mmap_coherent(struct device *dev,
-   struct vm_area_struct *vma,
-   void *cpu_addr, dma_addr_t handle,
-   size_t size, unsigned long attrs);
-
-
 static inline unsigned long device_to_mask(struct device *dev)
 {
if (dev->dma_mask && *dev->dma_mask)
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index f9fe2080ceb9..bf5234e1f71b 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -114,7 +114,6 @@ int dma_iommu_mapping_error(struct device *dev, dma_addr_t 
dma_addr)
 struct dma_map_ops dma_iommu_ops = {
.alloc  = dma_iommu_alloc_coherent,
.free   = dma_iommu_free_coherent,
-   .mmap   = dma_nommu_mmap_coherent,
.map_sg = dma_iommu_map_sg,
.unmap_sg   = dma_iommu_unmap_sg,
.dma_supported  = dma_iommu_dma_supported,
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 08b12cbd7abf..5b71c9d1b8cc 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -70,18 +70,6 @@ static void dma_nommu_free_coherent(struct device *dev, 
size_t size,
iommu_free_coherent(iommu, size, vaddr, dma_handle);
 }
 
-int dma_nommu_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
-void *cpu_addr, dma_addr_t handle, size_t size,
-unsigned long attrs)
-{
-   unsigned long pfn = page_to_pfn(virt_to_page(cpu_addr));
-
-   return remap_pfn_range(vma, vma->vm_start,
-  pfn + vma->vm_pgoff,
-  vma->vm_end - vma->vm_start,
-  vma->vm_page_prot);
-}
-
 /* note: needs to be called arch_get_required_mask for dma-noncoherent.c */
 u64 arch_get_required_mask(struct device *dev)
 {
@@ -98,7 +86,6 @@ u64 arch_get_required_mask(struct device *dev)
 const struct dma_map_ops dma_nommu_ops = {
.alloc  = dma_nommu_alloc_coherent,
.free   = dma_nommu_free_coherent,
-   .mmap   = dma_nommu_mmap_coherent,
.map_sg = dma_direct_map_sg,
.map_page   = dma_direct_map_page,
.get_required_mask  = arch_get_required_mask,
diff --git a/arch/powerpc/platforms/pseries/vio.c 
b/arch/powerpc/platforms/pseries/vio.c
index 49e04ec19238..51d564313bd0 100644
--- a/arch/powerpc/platforms/pseries/vio.c
+++ b/arch/powerpc/platforms/pseries/vio.c
@@ -618,7 +618,6 @@ static u64 vio_dma_get_required_mask(struct device *dev)
 static const struct dma_map_ops vio_dma_mapping_ops = {
.alloc = vio_dma_iommu_alloc_coherent,
.free  = vio_dma_iommu_free_coherent,
-   .mmap  = dma_nommu_mmap_coherent,
.map_sg= vio_dma_iommu_map_sg,
.unmap_sg  = vio_dma_iommu_unmap_sg,
.map_page  = vio_dma_iommu_map_page,
-- 
2.18.0

[PATCH 19/20] powerpc/dma: use the generic dma-direct map_page and map_sg routines

2018-07-30 Thread Christoph Hellwig

These are indentical except for additional error checking, so migrate
to the common code, and wire up the get_mapping_error method as well.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/kernel/dma.c | 32 
 1 file changed, 4 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index b2e88075b2ea..08b12cbd7abf 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -82,21 +82,6 @@ int dma_nommu_mmap_coherent(struct device *dev, struct 
vm_area_struct *vma,
   vma->vm_page_prot);
 }
 
-static int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
-int nents, enum dma_data_direction direction,
-unsigned long attrs)
-{
-   struct scatterlist *sg;
-   int i;
-
-   for_each_sg(sgl, sg, nents, i) {
-   sg->dma_address = phys_to_dma(dev, sg_phys(sg));
-   sg->dma_length = sg->length;
-   }
-
-   return nents;
-}
-
 /* note: needs to be called arch_get_required_mask for dma-noncoherent.c */
 u64 arch_get_required_mask(struct device *dev)
 {
@@ -110,24 +95,15 @@ u64 arch_get_required_mask(struct device *dev)
return mask;
 }
 
-static inline dma_addr_t dma_nommu_map_page(struct device *dev,
-struct page *page,
-unsigned long offset,
-size_t size,
-enum dma_data_direction dir,
-unsigned long attrs)
-{
-   return phys_to_dma(dev, page_to_phys(page)) + offset;
-}
-
 const struct dma_map_ops dma_nommu_ops = {
.alloc  = dma_nommu_alloc_coherent,
.free   = dma_nommu_free_coherent,
.mmap   = dma_nommu_mmap_coherent,
-   .map_sg = dma_nommu_map_sg,
-   .dma_supported  = dma_direct_supported,
-   .map_page   = dma_nommu_map_page,
+   .map_sg = dma_direct_map_sg,
+   .map_page   = dma_direct_map_page,
.get_required_mask  = arch_get_required_mask,
+   .dma_supported  = dma_direct_supported,
+   .mapping_error  = dma_direct_mapping_error,
 };
 
 #ifndef CONFIG_NOT_COHERENT_CACHE
-- 
2.18.0

[PATCH 18/20] powerpc/dma-noncoherent: use generic dma_noncoherent_ops

2018-07-30 Thread Christoph Hellwig

The generic dma-noncoherent code provides all that is needed by powerpc.

Note that the cache maintainance in the existing code is a bit odd
as it implements both the sync_to_device and sync_to_cpu callouts,
but never flushes caches when unmapping.  This patch keeps both
directions arounds, which will lead to more flushing than the previous
implementation.  Someone more familar with the required CPUs should
eventually take a look and optimize the cache flush handling if needed.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/Kconfig   |  2 +-
 arch/powerpc/include/asm/dma-mapping.h | 29 -
 arch/powerpc/kernel/dma.c  | 59 +++---
 arch/powerpc/kernel/pci-common.c   |  5 ++-
 arch/powerpc/kernel/setup-common.c |  4 ++
 arch/powerpc/mm/dma-noncoherent.c  | 52 +--
 arch/powerpc/platforms/44x/warp.c  |  2 +-
 arch/powerpc/platforms/Kconfig.cputype |  6 ++-
 8 files changed, 60 insertions(+), 99 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index bbfa6a8df4da..33c6017ffce6 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -129,7 +129,7 @@ config PPC
# Please keep this list sorted alphabetically.
#
select ARCH_HAS_DEVMEM_IS_ALLOWED
-   select ARCH_HAS_DMA_SET_COHERENT_MASK
+   select ARCH_HAS_DMA_SET_COHERENT_MASK if !NOT_COHERENT_CACHE
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/powerpc/include/asm/dma-mapping.h 
b/arch/powerpc/include/asm/dma-mapping.h
index f0bf7ac2686c..879c4efba785 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -19,40 +19,11 @@
 #include 
 
 /* Some dma direct funcs must be visible for use in other dma_ops */
-extern void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
-dma_addr_t *dma_handle, gfp_t flag,
-unsigned long attrs);
-extern void __dma_nommu_free_coherent(struct device *dev, size_t size,
-  void *vaddr, dma_addr_t dma_handle,
-  unsigned long attrs);
 extern int dma_nommu_mmap_coherent(struct device *dev,
struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t handle,
size_t size, unsigned long attrs);
 
-#ifdef CONFIG_NOT_COHERENT_CACHE
-/*
- * DMA-consistent mapping functions for PowerPCs that don't support
- * cache snooping.  These allocate/free a region of uncached mapped
- * memory space for use with DMA devices.  Alternatively, you could
- * allocate the space "normally" and use the cache management functions
- * to ensure it is consistent.
- */
-struct device;
-extern void __dma_sync(void *vaddr, size_t size, int direction);
-extern void __dma_sync_page(struct page *page, unsigned long offset,
-size_t size, int direction);
-extern unsigned long __dma_get_coherent_pfn(unsigned long cpu_addr);
-
-#else /* ! CONFIG_NOT_COHERENT_CACHE */
-/*
- * Cache coherent cores.
- */
-
-#define __dma_sync(addr, size, rw) ((void)0)
-#define __dma_sync_page(pg, off, sz, rw)   ((void)0)
-
-#endif /* ! CONFIG_NOT_COHERENT_CACHE */
 
 static inline unsigned long device_to_mask(struct device *dev)
 {
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 2b90a403cdac..b2e88075b2ea 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -36,12 +36,7 @@ static void *dma_nommu_alloc_coherent(struct device *dev, 
size_t size,
 * we can really use the direct ops
 */
if (dma_direct_supported(dev, dev->coherent_dma_mask))
-#ifdef CONFIG_NOT_COHERENT_CACHE
-   return __dma_nommu_alloc_coherent(dev, size, dma_handle,
-  flag, attrs);
-#else
return dma_direct_alloc(dev, size, dma_handle, flag, attrs);
-#endif
 
/* Ok we can't ... do we have an iommu ? If not, fail */
iommu = get_iommu_table_base(dev);
@@ -62,12 +57,7 @@ static void dma_nommu_free_coherent(struct device *dev, 
size_t size,
 
/* See comments in dma_nommu_alloc_coherent() */
if (dma_direct_supported(dev, dev->coherent_dma_mask))
-#ifdef CONFIG_NOT_COHERENT_CACHE
-   return __dma_nommu_free_coherent(dev, size, vaddr, dma_handle,
- attrs);
-#else
return dma_direct_free(dev, size, vaddr, dma_handle, attrs);
-#endif
 
/* Maybe we used an iommu ... */
iommu = get_iommu_table_base(dev);
@@ -84,14 +74,8 @@ int dma_nommu_mmap_coherent(struct device *dev, struct 
vm_area_struct *vma,
 void *cpu_addr, dma_addr_t handle, size_t size,

[PATCH 17/20] powerpc/dma-swiotlb: use generic swiotlb_dma_ops

2018-07-30 Thread Christoph Hellwig

These are identical to the arch specific ones, so remove them.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/dma-direct.h |  4 
 arch/powerpc/include/asm/swiotlb.h|  2 --
 arch/powerpc/kernel/dma-swiotlb.c | 28 ++-
 arch/powerpc/sysdev/fsl_pci.c |  2 +-
 4 files changed, 7 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-direct.h 
b/arch/powerpc/include/asm/dma-direct.h
index 0fba19445ae8..657f84ddb20d 100644
--- a/arch/powerpc/include/asm/dma-direct.h
+++ b/arch/powerpc/include/asm/dma-direct.h
@@ -30,4 +30,8 @@ static inline phys_addr_t __dma_to_phys(struct device *dev, 
dma_addr_t daddr)
return daddr - PCI_DRAM_OFFSET;
return daddr - dev->archdata.dma_offset;
 }
+
+u64 swiotlb_powerpc_get_required(struct device *dev);
+#define swiotlb_get_required_mask swiotlb_powerpc_get_required
+
 #endif /* ASM_POWERPC_DMA_DIRECT_H */
diff --git a/arch/powerpc/include/asm/swiotlb.h 
b/arch/powerpc/include/asm/swiotlb.h
index f65ecf57b66c..1d8c1da26ab3 100644
--- a/arch/powerpc/include/asm/swiotlb.h
+++ b/arch/powerpc/include/asm/swiotlb.h
@@ -13,8 +13,6 @@
 
 #include 
 
-extern const struct dma_map_ops powerpc_swiotlb_dma_ops;
-
 extern unsigned int ppc_swiotlb_enable;
 int __init swiotlb_setup_bus_notifier(void);
 
diff --git a/arch/powerpc/kernel/dma-swiotlb.c 
b/arch/powerpc/kernel/dma-swiotlb.c
index 25986fcd1e5e..0c269de61f39 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -24,7 +24,7 @@
 
 unsigned int ppc_swiotlb_enable;
 
-static u64 swiotlb_powerpc_get_required(struct device *dev)
+u64 swiotlb_powerpc_get_required(struct device *dev)
 {
u64 end, mask, max_direct_dma_addr = dev->archdata.max_direct_dma_addr;
 
@@ -38,30 +38,6 @@ static u64 swiotlb_powerpc_get_required(struct device *dev)
return mask;
 }
 
-/*
- * At the moment, all platforms that use this code only require
- * swiotlb to be used if we're operating on HIGHMEM.  Since
- * we don't ever call anything other than map_sg, unmap_sg,
- * map_page, and unmap_page on highmem, use normal dma_ops
- * for everything else.
- */
-const struct dma_map_ops powerpc_swiotlb_dma_ops = {
-   .alloc = dma_direct_alloc,
-   .free = dma_direct_free,
-   .mmap = dma_nommu_mmap_coherent,
-   .map_sg = swiotlb_map_sg_attrs,
-   .unmap_sg = swiotlb_unmap_sg_attrs,
-   .dma_supported = swiotlb_dma_supported,
-   .map_page = swiotlb_map_page,
-   .unmap_page = swiotlb_unmap_page,
-   .sync_single_for_cpu = swiotlb_sync_single_for_cpu,
-   .sync_single_for_device = swiotlb_sync_single_for_device,
-   .sync_sg_for_cpu = swiotlb_sync_sg_for_cpu,
-   .sync_sg_for_device = swiotlb_sync_sg_for_device,
-   .mapping_error = swiotlb_dma_mapping_error,
-   .get_required_mask = swiotlb_powerpc_get_required,
-};
-
 void pci_dma_dev_setup_swiotlb(struct pci_dev *pdev)
 {
struct pci_controller *hose;
@@ -88,7 +64,7 @@ static int ppc_swiotlb_bus_notify(struct notifier_block *nb,
 
/* May need to bounce if the device can't address all of DRAM */
if ((dma_get_mask(dev) + 1) < memblock_end_of_DRAM())
-   set_dma_ops(dev, _swiotlb_dma_ops);
+   set_dma_ops(dev, _dma_ops);
 
return NOTIFY_DONE;
 }
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 918be816b097..daf44bc0108d 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -118,7 +118,7 @@ static void setup_swiotlb_ops(struct pci_controller *hose)
 {
if (ppc_swiotlb_enable) {
hose->controller_ops.dma_dev_setup = pci_dma_dev_setup_swiotlb;
-   set_pci_dma_ops(_swiotlb_dma_ops);
+   set_pci_dma_ops(_dma_ops);
}
 }
 #else
-- 
2.18.0

[PATCH 15/20] powerpc/dma: remove the unused unmap_page and unmap_sg methods

2018-07-30 Thread Christoph Hellwig

These methods are optional to start with, no need to implement no-op
versions.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/kernel/dma.c | 16 
 1 file changed, 16 deletions(-)

diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 511a4972560d..2cfc45acbb52 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -178,12 +178,6 @@ static int dma_nommu_map_sg(struct device *dev, struct 
scatterlist *sgl,
return nents;
 }
 
-static void dma_nommu_unmap_sg(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction direction,
-   unsigned long attrs)
-{
-}
-
 static u64 dma_nommu_get_required_mask(struct device *dev)
 {
u64 end, mask;
@@ -209,14 +203,6 @@ static inline dma_addr_t dma_nommu_map_page(struct device 
*dev,
return phys_to_dma(dev, page_to_phys(page)) + offset;
 }
 
-static inline void dma_nommu_unmap_page(struct device *dev,
-dma_addr_t dma_address,
-size_t size,
-enum dma_data_direction direction,
-unsigned long attrs)
-{
-}
-
 #ifdef CONFIG_NOT_COHERENT_CACHE
 static inline void dma_nommu_sync_sg(struct device *dev,
struct scatterlist *sgl, int nents,
@@ -242,10 +228,8 @@ const struct dma_map_ops dma_nommu_ops = {
.free   = dma_nommu_free_coherent,
.mmap   = dma_nommu_mmap_coherent,
.map_sg = dma_nommu_map_sg,
-   .unmap_sg   = dma_nommu_unmap_sg,
.dma_supported  = dma_direct_supported,
.map_page   = dma_nommu_map_page,
-   .unmap_page = dma_nommu_unmap_page,
.get_required_mask  = dma_nommu_get_required_mask,
 #ifdef CONFIG_NOT_COHERENT_CACHE
.sync_single_for_cpu= dma_nommu_sync_single,
-- 
2.18.0

[PATCH 16/20] powerpc/dma: use dma_direct_{alloc,free}

2018-07-30 Thread Christoph Hellwig

These do the same functionality as the existing helpers, but do it
simpler, and also allow the (optional) use of CMA.

Note that the swiotlb code now calls into the dma_direct code directly,
given that it doesn't work with noncoherent caches at all, and isn't called
when we have an iommu either, so the iommu special case in
dma_nommu_alloc_coherent isn't required for swiotlb.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/pgtable.h |  1 -
 arch/powerpc/kernel/dma-swiotlb.c  |  4 +-
 arch/powerpc/kernel/dma.c  | 78 --
 arch/powerpc/mm/mem.c  | 19 
 4 files changed, 11 insertions(+), 91 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 14c79a7dc855..123de4958d2e 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -38,7 +38,6 @@ extern unsigned long empty_zero_page[];
 extern pgd_t swapper_pg_dir[];
 
 void limit_zone_pfn(enum zone_type zone, unsigned long max_pfn);
-int dma_pfn_limit_to_zone(u64 pfn_limit);
 extern void paging_init(void);
 
 /*
diff --git a/arch/powerpc/kernel/dma-swiotlb.c 
b/arch/powerpc/kernel/dma-swiotlb.c
index f6e0701c5303..25986fcd1e5e 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -46,8 +46,8 @@ static u64 swiotlb_powerpc_get_required(struct device *dev)
  * for everything else.
  */
 const struct dma_map_ops powerpc_swiotlb_dma_ops = {
-   .alloc = __dma_nommu_alloc_coherent,
-   .free = __dma_nommu_free_coherent,
+   .alloc = dma_direct_alloc,
+   .free = dma_direct_free,
.mmap = dma_nommu_mmap_coherent,
.map_sg = swiotlb_map_sg_attrs,
.unmap_sg = swiotlb_unmap_sg_attrs,
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 2cfc45acbb52..2b90a403cdac 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -26,75 +26,6 @@
  * can set archdata.dma_data to an unsigned long holding the offset. By
  * default the offset is PCI_DRAM_OFFSET.
  */
-
-static u64 __maybe_unused get_pfn_limit(struct device *dev)
-{
-   u64 pfn = (dev->coherent_dma_mask >> PAGE_SHIFT) + 1;
-   struct dev_archdata __maybe_unused *sd = >archdata;
-
-#ifdef CONFIG_SWIOTLB
-   if (sd->max_direct_dma_addr && dev->dma_ops == _swiotlb_dma_ops)
-   pfn = min_t(u64, pfn, sd->max_direct_dma_addr >> PAGE_SHIFT);
-#endif
-
-   return pfn;
-}
-
-#ifndef CONFIG_NOT_COHERENT_CACHE
-void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t flag,
- unsigned long attrs)
-{
-   void *ret;
-   struct page *page;
-   int node = dev_to_node(dev);
-#ifdef CONFIG_FSL_SOC
-   u64 pfn = get_pfn_limit(dev);
-   int zone;
-
-   /*
-* This code should be OK on other platforms, but we have drivers that
-* don't set coherent_dma_mask. As a workaround we just ifdef it. This
-* whole routine needs some serious cleanup.
-*/
-
-   zone = dma_pfn_limit_to_zone(pfn);
-   if (zone < 0) {
-   dev_err(dev, "%s: No suitable zone for pfn %#llx\n",
-   __func__, pfn);
-   return NULL;
-   }
-
-   switch (zone) {
-   case ZONE_DMA:
-   flag |= GFP_DMA;
-   break;
-#ifdef CONFIG_ZONE_DMA32
-   case ZONE_DMA32:
-   flag |= GFP_DMA32;
-   break;
-#endif
-   };
-#endif /* CONFIG_FSL_SOC */
-
-   page = alloc_pages_node(node, flag, get_order(size));
-   if (page == NULL)
-   return NULL;
-   ret = page_address(page);
-   memset(ret, 0, size);
-   *dma_handle = phys_to_dma(dev,__pa(ret));
-
-   return ret;
-}
-
-void __dma_nommu_free_coherent(struct device *dev, size_t size,
-   void *vaddr, dma_addr_t dma_handle,
-   unsigned long attrs)
-{
-   free_pages((unsigned long)vaddr, get_order(size));
-}
-#endif /* !CONFIG_NOT_COHERENT_CACHE */
-
 static void *dma_nommu_alloc_coherent(struct device *dev, size_t size,
   dma_addr_t *dma_handle, gfp_t flag,
   unsigned long attrs)
@@ -105,8 +36,12 @@ static void *dma_nommu_alloc_coherent(struct device *dev, 
size_t size,
 * we can really use the direct ops
 */
if (dma_direct_supported(dev, dev->coherent_dma_mask))
+#ifdef CONFIG_NOT_COHERENT_CACHE
return __dma_nommu_alloc_coherent(dev, size, dma_handle,
   flag, attrs);
+#else
+   return dma_direct_alloc(dev, size, dma_handle, flag, attrs);
+#endif
 
/* Ok we can't ... do we have an iommu ? If not, fail */
iommu = get_iommu_table_base(dev);
@@ -127,8 +62,13 @@ static void

[PATCH 14/20] powerpc/dma: replace dma_nommu_dma_supported with dma_direct_supported

2018-07-30 Thread Christoph Hellwig

The ppc32 case of dma_nommu_dma_supported already was a no-op, and the
64-bit case came to the same conclusion as dma_direct_supported, so
replace it with the generic version.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/kernel/dma.c | 28 +++-
 2 files changed, 4 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index f9cae7edd735..bbfa6a8df4da 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -158,6 +158,7 @@ config PPC
select CLONE_BACKWARDS
select DCACHE_WORD_ACCESS   if PPC64 && CPU_LITTLE_ENDIAN
select DYNAMIC_FTRACE   if FUNCTION_TRACER
+   select DMA_DIRECT_OPS
select EDAC_ATOMIC_SCRUB
select EDAC_SUPPORT
select GENERIC_ATOMIC64 if PPC32
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 3487de83bb37..511a4972560d 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -40,28 +40,6 @@ static u64 __maybe_unused get_pfn_limit(struct device *dev)
return pfn;
 }
 
-static int dma_nommu_dma_supported(struct device *dev, u64 mask)
-{
-#ifdef CONFIG_PPC64
-   u64 limit = phys_to_dma(dev, (memblock_end_of_DRAM() - 1));
-
-   /* Limit fits in the mask, we are good */
-   if (mask >= limit)
-   return 1;
-
-#ifdef CONFIG_FSL_SOC
-   /* Freescale gets another chance via ZONE_DMA/ZONE_DMA32, however
-* that will have to be refined if/when they support iommus
-*/
-   return 1;
-#endif
-   /* Sorry ... */
-   return 0;
-#else
-   return 1;
-#endif
-}
-
 #ifndef CONFIG_NOT_COHERENT_CACHE
 void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
  dma_addr_t *dma_handle, gfp_t flag,
@@ -126,7 +104,7 @@ static void *dma_nommu_alloc_coherent(struct device *dev, 
size_t size,
/* The coherent mask may be smaller than the real mask, check if
 * we can really use the direct ops
 */
-   if (dma_nommu_dma_supported(dev, dev->coherent_dma_mask))
+   if (dma_direct_supported(dev, dev->coherent_dma_mask))
return __dma_nommu_alloc_coherent(dev, size, dma_handle,
   flag, attrs);
 
@@ -148,7 +126,7 @@ static void dma_nommu_free_coherent(struct device *dev, 
size_t size,
struct iommu_table *iommu;
 
/* See comments in dma_nommu_alloc_coherent() */
-   if (dma_nommu_dma_supported(dev, dev->coherent_dma_mask))
+   if (dma_direct_supported(dev, dev->coherent_dma_mask))
return __dma_nommu_free_coherent(dev, size, vaddr, dma_handle,
  attrs);
/* Maybe we used an iommu ... */
@@ -265,7 +243,7 @@ const struct dma_map_ops dma_nommu_ops = {
.mmap   = dma_nommu_mmap_coherent,
.map_sg = dma_nommu_map_sg,
.unmap_sg   = dma_nommu_unmap_sg,
-   .dma_supported  = dma_nommu_dma_supported,
+   .dma_supported  = dma_direct_supported,
.map_page   = dma_nommu_map_page,
.unmap_page = dma_nommu_unmap_page,
.get_required_mask  = dma_nommu_get_required_mask,
-- 
2.18.0

[PATCH 10/20] powerpc/dma-noncoherent: don't disable irqs over kmap_atomic

2018-07-30 Thread Christoph Hellwig

The requirement to disable local irqs over kmap_atomic is long gone,
so remove those calls.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/mm/dma-noncoherent.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/dma-noncoherent.c 
b/arch/powerpc/mm/dma-noncoherent.c
index 382528475433..d1c16456abac 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -357,12 +357,10 @@ static inline void __dma_sync_page_highmem(struct page 
*page,
 {
size_t seg_size = min((size_t)(PAGE_SIZE - offset), size);
size_t cur_size = seg_size;
-   unsigned long flags, start, seg_offset = offset;
+   unsigned long start, seg_offset = offset;
int nr_segs = 1 + ((size - seg_size) + PAGE_SIZE - 1)/PAGE_SIZE;
int seg_nr = 0;
 
-   local_irq_save(flags);
-
do {
start = (unsigned long)kmap_atomic(page + seg_nr) + seg_offset;
 
@@ -378,8 +376,6 @@ static inline void __dma_sync_page_highmem(struct page 
*page,
cur_size += seg_size;
seg_offset = 0;
} while (seg_nr < nr_segs);
-
-   local_irq_restore(flags);
 }
 #endif /* CONFIG_HIGHMEM */
 
-- 
2.18.0

[PATCH 13/20] powerpc/dma: remove get_dma_offset

2018-07-30 Thread Christoph Hellwig

Just fold the calculation into __phys_to_dma/__dma_to_phys as those are
the only places that should know about it.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/dma-direct.h  |  8 ++--
 arch/powerpc/include/asm/dma-mapping.h | 16 
 2 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-direct.h 
b/arch/powerpc/include/asm/dma-direct.h
index 7702875aabb7..0fba19445ae8 100644
--- a/arch/powerpc/include/asm/dma-direct.h
+++ b/arch/powerpc/include/asm/dma-direct.h
@@ -19,11 +19,15 @@ static inline bool dma_capable(struct device *dev, 
dma_addr_t addr, size_t size)
 
 static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
-   return paddr + get_dma_offset(dev);
+   if (!dev)
+   return paddr + PCI_DRAM_OFFSET;
+   return paddr + dev->archdata.dma_offset;
 }
 
 static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr)
 {
-   return daddr - get_dma_offset(dev);
+   if (!dev)
+   return daddr - PCI_DRAM_OFFSET;
+   return daddr - dev->archdata.dma_offset;
 }
 #endif /* ASM_POWERPC_DMA_DIRECT_H */
diff --git a/arch/powerpc/include/asm/dma-mapping.h 
b/arch/powerpc/include/asm/dma-mapping.h
index dacd0f93f2b2..f0bf7ac2686c 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -80,22 +80,6 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
return NULL;
 }
 
-/*
- * get_dma_offset()
- *
- * Get the dma offset on configurations where the dma address can be determined
- * from the physical address by looking at a simple offset.  Direct dma and
- * swiotlb use this function, but it is typically not used by implementations
- * with an iommu.
- */
-static inline dma_addr_t get_dma_offset(struct device *dev)
-{
-   if (dev)
-   return dev->archdata.dma_offset;
-
-   return PCI_DRAM_OFFSET;
-}
-
 static inline void set_dma_offset(struct device *dev, dma_addr_t off)
 {
if (dev)
-- 
2.18.0

[PATCH 12/20] powerpc/dma: use phys_to_dma instead of get_dma_offset

2018-07-30 Thread Christoph Hellwig

Use the standard portable helper instead of the powerpc specific one,
which is about to go away.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/kernel/dma-swiotlb.c |  5 ++---
 arch/powerpc/kernel/dma.c | 12 ++--
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/dma-swiotlb.c 
b/arch/powerpc/kernel/dma-swiotlb.c
index 88f3963ca30f..f6e0701c5303 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -11,7 +11,7 @@
  *
  */
 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -31,9 +31,8 @@ static u64 swiotlb_powerpc_get_required(struct device *dev)
end = memblock_end_of_DRAM();
if (max_direct_dma_addr && end > max_direct_dma_addr)
end = max_direct_dma_addr;
-   end += get_dma_offset(dev);
 
-   mask = 1ULL << (fls64(end) - 1);
+   mask = 1ULL << (fls64(phys_to_dma(dev, end)) - 1);
mask += mask - 1;
 
return mask;
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index eceaa92e6986..3487de83bb37 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -6,7 +6,7 @@
  */
 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -43,7 +43,7 @@ static u64 __maybe_unused get_pfn_limit(struct device *dev)
 static int dma_nommu_dma_supported(struct device *dev, u64 mask)
 {
 #ifdef CONFIG_PPC64
-   u64 limit = get_dma_offset(dev) + (memblock_end_of_DRAM() - 1);
+   u64 limit = phys_to_dma(dev, (memblock_end_of_DRAM() - 1));
 
/* Limit fits in the mask, we are good */
if (mask >= limit)
@@ -104,7 +104,7 @@ void *__dma_nommu_alloc_coherent(struct device *dev, size_t 
size,
return NULL;
ret = page_address(page);
memset(ret, 0, size);
-   *dma_handle = __pa(ret) + get_dma_offset(dev);
+   *dma_handle = phys_to_dma(dev,__pa(ret));
 
return ret;
 }
@@ -188,7 +188,7 @@ static int dma_nommu_map_sg(struct device *dev, struct 
scatterlist *sgl,
int i;
 
for_each_sg(sgl, sg, nents, i) {
-   sg->dma_address = sg_phys(sg) + get_dma_offset(dev);
+   sg->dma_address = phys_to_dma(dev, sg_phys(sg));
sg->dma_length = sg->length;
 
if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
@@ -210,7 +210,7 @@ static u64 dma_nommu_get_required_mask(struct device *dev)
 {
u64 end, mask;
 
-   end = memblock_end_of_DRAM() + get_dma_offset(dev);
+   end = phys_to_dma(dev, memblock_end_of_DRAM());
 
mask = 1ULL << (fls64(end) - 1);
mask += mask - 1;
@@ -228,7 +228,7 @@ static inline dma_addr_t dma_nommu_map_page(struct device 
*dev,
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
__dma_sync_page(page, offset, size, dir);
 
-   return page_to_phys(page) + offset + get_dma_offset(dev);
+   return phys_to_dma(dev, page_to_phys(page)) + offset;
 }
 
 static inline void dma_nommu_unmap_page(struct device *dev,
-- 
2.18.0

[PATCH 11/20] powerpc/dma: split the two __dma_alloc_coherent implementations

2018-07-30 Thread Christoph Hellwig

The implemementation for the CONFIG_NOT_COHERENT_CACHE case doesn't share
any code with the one for systems with coherent caches.  Split it off
and merge it with the helpers in dma-noncoherent.c that have no other
callers.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/dma-mapping.h |  5 -
 arch/powerpc/kernel/dma.c  | 14 ++
 arch/powerpc/mm/dma-noncoherent.c  | 15 +++
 arch/powerpc/platforms/44x/warp.c  |  2 +-
 4 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h 
b/arch/powerpc/include/asm/dma-mapping.h
index f2a4a7142b1e..dacd0f93f2b2 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -39,9 +39,6 @@ extern int dma_nommu_mmap_coherent(struct device *dev,
  * to ensure it is consistent.
  */
 struct device;
-extern void *__dma_alloc_coherent(struct device *dev, size_t size,
- dma_addr_t *handle, gfp_t gfp);
-extern void __dma_free_coherent(size_t size, void *vaddr);
 extern void __dma_sync(void *vaddr, size_t size, int direction);
 extern void __dma_sync_page(struct page *page, unsigned long offset,
 size_t size, int direction);
@@ -52,8 +49,6 @@ extern unsigned long __dma_get_coherent_pfn(unsigned long 
cpu_addr);
  * Cache coherent cores.
  */
 
-#define __dma_alloc_coherent(dev, gfp, size, handle)   NULL
-#define __dma_free_coherent(size, addr)((void)0)
 #define __dma_sync(addr, size, rw) ((void)0)
 #define __dma_sync_page(pg, off, sz, rw)   ((void)0)
 
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 3939589aab04..eceaa92e6986 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -62,18 +62,12 @@ static int dma_nommu_dma_supported(struct device *dev, u64 
mask)
 #endif
 }
 
+#ifndef CONFIG_NOT_COHERENT_CACHE
 void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
  dma_addr_t *dma_handle, gfp_t flag,
  unsigned long attrs)
 {
void *ret;
-#ifdef CONFIG_NOT_COHERENT_CACHE
-   ret = __dma_alloc_coherent(dev, size, dma_handle, flag);
-   if (ret == NULL)
-   return NULL;
-   *dma_handle += get_dma_offset(dev);
-   return ret;
-#else
struct page *page;
int node = dev_to_node(dev);
 #ifdef CONFIG_FSL_SOC
@@ -113,19 +107,15 @@ void *__dma_nommu_alloc_coherent(struct device *dev, 
size_t size,
*dma_handle = __pa(ret) + get_dma_offset(dev);
 
return ret;
-#endif
 }
 
 void __dma_nommu_free_coherent(struct device *dev, size_t size,
void *vaddr, dma_addr_t dma_handle,
unsigned long attrs)
 {
-#ifdef CONFIG_NOT_COHERENT_CACHE
-   __dma_free_coherent(size, vaddr);
-#else
free_pages((unsigned long)vaddr, get_order(size));
-#endif
 }
+#endif /* !CONFIG_NOT_COHERENT_CACHE */
 
 static void *dma_nommu_alloc_coherent(struct device *dev, size_t size,
   dma_addr_t *dma_handle, gfp_t flag,
diff --git a/arch/powerpc/mm/dma-noncoherent.c 
b/arch/powerpc/mm/dma-noncoherent.c
index d1c16456abac..cfc48a253707 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -29,7 +29,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 #include 
@@ -151,8 +151,8 @@ static struct ppc_vm_region *ppc_vm_region_find(struct 
ppc_vm_region *head, unsi
  * Allocate DMA-coherent memory space and return both the kernel remapped
  * virtual and bus address for that space.
  */
-void *
-__dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, 
gfp_t gfp)
+void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
 {
struct page *page;
struct ppc_vm_region *c;
@@ -223,7 +223,7 @@ __dma_alloc_coherent(struct device *dev, size_t size, 
dma_addr_t *handle, gfp_t
/*
 * Set the "dma handle"
 */
-   *handle = page_to_phys(page);
+   *dma_handle = phys_to_dma(dev, page_to_phys(page));
 
do {
SetPageReserved(page);
@@ -249,12 +249,12 @@ __dma_alloc_coherent(struct device *dev, size_t size, 
dma_addr_t *handle, gfp_t
  no_page:
return NULL;
 }
-EXPORT_SYMBOL(__dma_alloc_coherent);
 
 /*
  * free a page as defined by the above mapping.
  */
-void __dma_free_coherent(size_t size, void *vaddr)
+void __dma_nommu_free_coherent(struct device *dev, size_t size, void *vaddr,
+   dma_addr_t dma_handle, unsigned long attrs)
 {
struct ppc_vm_region *c;
unsigned long flags, addr;
@@ -309,7 +309,6 @@ void __dma_free_coherent(size_t size, void *vaddr)
   __func__, vaddr);
dump_stack();
 }

[PATCH 09/20] powerpc/dma: remove the unused ISA_DMA_THRESHOLD export

2018-07-30 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/kernel/setup_32.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 74457485574b..3c2d093f74c7 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -55,7 +55,6 @@ unsigned long ISA_DMA_THRESHOLD;
 unsigned int DMA_MODE_READ;
 unsigned int DMA_MODE_WRITE;
 
-EXPORT_SYMBOL(ISA_DMA_THRESHOLD);
 EXPORT_SYMBOL(DMA_MODE_READ);
 EXPORT_SYMBOL(DMA_MODE_WRITE);
 
-- 
2.18.0

[PATCH 08/20] powerpc/dma: remove the unused dma_nommu_ops export

2018-07-30 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/kernel/dma.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index dbfc7056d7df..3939589aab04 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -286,7 +286,6 @@ const struct dma_map_ops dma_nommu_ops = {
.sync_sg_for_device = dma_nommu_sync_sg,
 #endif
 };
-EXPORT_SYMBOL(dma_nommu_ops);
 
 int dma_set_coherent_mask(struct device *dev, u64 mask)
 {
-- 
2.18.0

[PATCH 07/20] powerpc/dma: remove the unused ARCH_HAS_DMA_MMAP_COHERENT define

2018-07-30 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h 
b/arch/powerpc/include/asm/dma-mapping.h
index 8fa394520af6..f2a4a7142b1e 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -112,7 +112,5 @@ extern int dma_set_mask(struct device *dev, u64 dma_mask);
 
 extern u64 __dma_get_required_mask(struct device *dev);
 
-#define ARCH_HAS_DMA_MMAP_COHERENT
-
 #endif /* __KERNEL__ */
 #endif /* _ASM_DMA_MAPPING_H */
-- 
2.18.0

[PATCH 06/20] dma-noncoherent: add an optional arch hook for ->get_required_mask

2018-07-30 Thread Christoph Hellwig

This is need for powerpc for now.  Hopefully we can come up with a clean
generic implementation mid-term.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-noncoherent.h | 6 ++
 kernel/dma/Kconfig  | 4 
 kernel/dma/noncoherent.c| 1 +
 3 files changed, 11 insertions(+)

diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index 10b2654d549b..61394c6e56df 100644
--- a/include/linux/dma-noncoherent.h
+++ b/include/linux/dma-noncoherent.h
@@ -17,6 +17,12 @@ int arch_dma_mmap(struct device *dev, struct vm_area_struct 
*vma,
 #define arch_dma_mmap NULL
 #endif /* CONFIG_DMA_NONCOHERENT_MMAP */
 
+#ifdef CONFIG_DMA_NONCOHERENT_GET_REQUIRED
+u64 arch_get_required_mask(struct device *dev);
+#else
+#define arch_get_required_mask NULL
+#endif
+
 #ifdef CONFIG_DMA_NONCOHERENT_CACHE_SYNC
 void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
enum dma_data_direction direction);
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 9bd54304446f..b523104d6199 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -36,6 +36,10 @@ config DMA_NONCOHERENT_MMAP
bool
depends on DMA_NONCOHERENT_OPS
 
+config DMA_NONCOHERENT_GET_REQUIRED
+   bool
+   depends on DMA_NONCOHERENT_OPS
+
 config DMA_NONCOHERENT_CACHE_SYNC
bool
depends on DMA_NONCOHERENT_OPS
diff --git a/kernel/dma/noncoherent.c b/kernel/dma/noncoherent.c
index 79e9a757387f..cf4ffbe4a09d 100644
--- a/kernel/dma/noncoherent.c
+++ b/kernel/dma/noncoherent.c
@@ -98,5 +98,6 @@ const struct dma_map_ops dma_noncoherent_ops = {
.dma_supported  = dma_direct_supported,
.mapping_error  = dma_direct_mapping_error,
.cache_sync = arch_dma_cache_sync,
+   .get_required_mask  = arch_get_required_mask,
 };
 EXPORT_SYMBOL(dma_noncoherent_ops);
-- 
2.18.0

[PATCH 05/20] swiotlb: allow the architecture to provide a get_required_mask hook

2018-07-30 Thread Christoph Hellwig

For now this allows consolidating the powerpc code.  In the long run
we should grow a generic implementation of dma_get_required_mask that
returns the dma mask required to avoid bounce buffering.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 904541055792..1bb420244753 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -1084,5 +1084,9 @@ const struct dma_map_ops swiotlb_dma_ops = {
.map_page   = swiotlb_map_page,
.unmap_page = swiotlb_unmap_page,
.dma_supported  = dma_direct_supported,
+#ifdef swiotlb_get_required_mask
+   .get_required_mask  = swiotlb_get_required_mask,
+#endif
+
 };
 EXPORT_SYMBOL(swiotlb_dma_ops);
-- 
2.18.0

[PATCH 04/20] ia64: remove get_required_mask implementation

2018-07-30 Thread Christoph Hellwig

ia64 can use the generic implementation in general, and SN2 can just
override it in the dma_map_ops now.

Signed-off-by: Christoph Hellwig 
---
 arch/ia64/include/asm/dma-mapping.h  |  2 --
 arch/ia64/include/asm/machvec.h  |  7 ---
 arch/ia64/include/asm/machvec_init.h |  1 -
 arch/ia64/include/asm/machvec_sn2.h  |  2 --
 arch/ia64/pci/pci.c  | 26 --
 arch/ia64/sn/pci/pci_dma.c   |  4 ++--
 6 files changed, 2 insertions(+), 40 deletions(-)

diff --git a/arch/ia64/include/asm/dma-mapping.h 
b/arch/ia64/include/asm/dma-mapping.h
index 76e4d6632d68..522745ae67bb 100644
--- a/arch/ia64/include/asm/dma-mapping.h
+++ b/arch/ia64/include/asm/dma-mapping.h
@@ -10,8 +10,6 @@
 #include 
 #include 
 
-#define ARCH_HAS_DMA_GET_REQUIRED_MASK
-
 extern const struct dma_map_ops *dma_ops;
 extern struct ia64_machine_vector ia64_mv;
 extern void set_iommu_machvec(void);
diff --git a/arch/ia64/include/asm/machvec.h b/arch/ia64/include/asm/machvec.h
index 267f4f170191..5133739966bc 100644
--- a/arch/ia64/include/asm/machvec.h
+++ b/arch/ia64/include/asm/machvec.h
@@ -44,7 +44,6 @@ typedef void ia64_mv_kernel_launch_event_t(void);
 
 /* DMA-mapping interface: */
 typedef void ia64_mv_dma_init (void);
-typedef u64 ia64_mv_dma_get_required_mask (struct device *);
 typedef const struct dma_map_ops *ia64_mv_dma_get_ops(struct device *);
 
 /*
@@ -127,7 +126,6 @@ extern void machvec_tlb_migrate_finish (struct mm_struct *);
 #  define platform_global_tlb_purgeia64_mv.global_tlb_purge
 #  define platform_tlb_migrate_finish  ia64_mv.tlb_migrate_finish
 #  define platform_dma_initia64_mv.dma_init
-#  define platform_dma_get_required_mask ia64_mv.dma_get_required_mask
 #  define platform_dma_get_ops ia64_mv.dma_get_ops
 #  define platform_irq_to_vector   ia64_mv.irq_to_vector
 #  define platform_local_vector_to_irq ia64_mv.local_vector_to_irq
@@ -171,7 +169,6 @@ struct ia64_machine_vector {
ia64_mv_global_tlb_purge_t *global_tlb_purge;
ia64_mv_tlb_migrate_finish_t *tlb_migrate_finish;
ia64_mv_dma_init *dma_init;
-   ia64_mv_dma_get_required_mask *dma_get_required_mask;
ia64_mv_dma_get_ops *dma_get_ops;
ia64_mv_irq_to_vector *irq_to_vector;
ia64_mv_local_vector_to_irq *local_vector_to_irq;
@@ -211,7 +208,6 @@ struct ia64_machine_vector {
platform_global_tlb_purge,  \
platform_tlb_migrate_finish,\
platform_dma_init,  \
-   platform_dma_get_required_mask, \
platform_dma_get_ops,   \
platform_irq_to_vector, \
platform_local_vector_to_irq,   \
@@ -286,9 +282,6 @@ extern const struct dma_map_ops *dma_get_ops(struct device 
*);
 #ifndef platform_dma_get_ops
 # define platform_dma_get_ops  dma_get_ops
 #endif
-#ifndef platform_dma_get_required_mask
-# define  platform_dma_get_required_mask   ia64_dma_get_required_mask
-#endif
 #ifndef platform_irq_to_vector
 # define platform_irq_to_vector__ia64_irq_to_vector
 #endif
diff --git a/arch/ia64/include/asm/machvec_init.h 
b/arch/ia64/include/asm/machvec_init.h
index 2b32fd06b7c6..2aafb69a3787 100644
--- a/arch/ia64/include/asm/machvec_init.h
+++ b/arch/ia64/include/asm/machvec_init.h
@@ -4,7 +4,6 @@
 
 extern ia64_mv_send_ipi_t ia64_send_ipi;
 extern ia64_mv_global_tlb_purge_t ia64_global_tlb_purge;
-extern ia64_mv_dma_get_required_mask ia64_dma_get_required_mask;
 extern ia64_mv_irq_to_vector __ia64_irq_to_vector;
 extern ia64_mv_local_vector_to_irq __ia64_local_vector_to_irq;
 extern ia64_mv_pci_get_legacy_mem_t ia64_pci_get_legacy_mem;
diff --git a/arch/ia64/include/asm/machvec_sn2.h 
b/arch/ia64/include/asm/machvec_sn2.h
index ece9fa85be88..b5153d300289 100644
--- a/arch/ia64/include/asm/machvec_sn2.h
+++ b/arch/ia64/include/asm/machvec_sn2.h
@@ -55,7 +55,6 @@ extern ia64_mv_readb_t __sn_readb_relaxed;
 extern ia64_mv_readw_t __sn_readw_relaxed;
 extern ia64_mv_readl_t __sn_readl_relaxed;
 extern ia64_mv_readq_t __sn_readq_relaxed;
-extern ia64_mv_dma_get_required_mask   sn_dma_get_required_mask;
 extern ia64_mv_dma_initsn_dma_init;
 extern ia64_mv_migrate_t   sn_migrate;
 extern ia64_mv_kernel_launch_event_t   sn_kernel_launch_event;
@@ -100,7 +99,6 @@ extern ia64_mv_pci_fixup_bus_t   
sn_pci_fixup_bus;
 #define platform_pci_get_legacy_memsn_pci_get_legacy_mem
 #define platform_pci_legacy_read   sn_pci_legacy_read
 #define platform_pci_legacy_write  sn_pci_legacy_write
-#define platform_dma_get_required_mask sn_dma_get_required_mask
 #define platform_dma_init  sn_dma_init
 #define platform_migrate   sn_migrate
 #define platform_kernel_launch_eventsn_kernel_launch_event
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 7ccc64d5fe3e..5d71800df431 100644
--- a/arch/ia64/pci/pci.c
+++

[PATCH 03/20] dma-mapping: make the get_required_mask method available unconditionally

2018-07-30 Thread Christoph Hellwig

This save some duplication for ia64.  In the long run this method will
need some additional work including moving over to kernel/dma, but that
will require some additional prep work, so let's do this minimal change
for now.

Signed-off-by: Christoph Hellwig 
---
 drivers/base/platform.c | 11 ++-
 include/linux/dma-mapping.h |  2 --
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index dff82a3c2caa..921ddb0c051b 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1180,7 +1180,7 @@ int __init platform_bus_init(void)
 }
 
 #ifndef ARCH_HAS_DMA_GET_REQUIRED_MASK
-u64 dma_get_required_mask(struct device *dev)
+static u64 default_dma_get_required_mask(struct device *dev)
 {
u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT);
u32 high_totalram = ((max_pfn - 1) >> (32 - PAGE_SHIFT));
@@ -1198,6 +1198,15 @@ u64 dma_get_required_mask(struct device *dev)
}
return mask;
 }
+
+u64 dma_get_required_mask(struct device *dev)
+{
+   const struct dma_map_ops *ops = get_dma_ops(dev);
+
+   if (ops->get_required_mask)
+   return ops->get_required_mask(dev);
+   return default_dma_get_required_mask(dev);
+}
 EXPORT_SYMBOL_GPL(dma_get_required_mask);
 #endif
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f9cc309507d9..00d3065e1510 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -130,9 +130,7 @@ struct dma_map_ops {
enum dma_data_direction direction);
int (*mapping_error)(struct device *dev, dma_addr_t dma_addr);
int (*dma_supported)(struct device *dev, u64 mask);
-#ifdef ARCH_HAS_DMA_GET_REQUIRED_MASK
u64 (*get_required_mask)(struct device *dev);
-#endif
 };
 
 extern const struct dma_map_ops dma_direct_ops;
-- 
2.18.0

[PATCH 01/20] kernel/dma/direct: take DMA offset into account in dma_direct_supported

2018-07-30 Thread Christoph Hellwig

When a device has a DMA offset the dma capable result will change due
to the difference between the physical and DMA address.  Take that into
account.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 8be8106270c2..d32d4f0d2c0c 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -167,7 +167,7 @@ int dma_direct_map_sg(struct device *dev, struct 
scatterlist *sgl, int nents,
 int dma_direct_supported(struct device *dev, u64 mask)
 {
 #ifdef CONFIG_ZONE_DMA
-   if (mask < DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
+   if (mask < phys_to_dma(dev, DMA_BIT_MASK(ARCH_ZONE_DMA_BITS)))
return 0;
 #else
/*
@@ -176,14 +176,14 @@ int dma_direct_supported(struct device *dev, u64 mask)
 * memory, or by providing a ZONE_DMA32.  If neither is the case, the
 * architecture needs to use an IOMMU instead of the direct mapping.
 */
-   if (mask < DMA_BIT_MASK(32))
+   if (mask < phys_to_dma(dev, DMA_BIT_MASK(32)))
return 0;
 #endif
/*
 * Various PCI/PCIe bridges have broken support for > 32bit DMA even
 * if the device itself might support it.
 */
-   if (dev->dma_32bit_limit && mask > DMA_BIT_MASK(32))
+   if (dev->dma_32bit_limit && mask > phys_to_dma(dev, DMA_BIT_MASK(32)))
return 0;
return 1;
 }
-- 
2.18.0

[PATCH 02/20] kernel/dma/direct: refine dma_direct_alloc zone selection

2018-07-30 Thread Christoph Hellwig

We need to take the DMA offset and encryption bit into account when selecting
a zone.  Add a helper that takes those into account and use it.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index d32d4f0d2c0c..c2c1df8827f2 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -58,6 +58,14 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t 
phys, size_t size)
return addr + size - 1 <= dev->coherent_dma_mask;
 }
 
+static bool dma_coherent_below(struct device *dev, u64 mask)
+{
+   dma_addr_t addr = force_dma_unencrypted() ?
+   __phys_to_dma(dev, mask) : phys_to_dma(dev, mask);
+
+   return dev->coherent_dma_mask <= addr;
+}
+
 void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs)
 {
@@ -70,9 +78,9 @@ void *dma_direct_alloc(struct device *dev, size_t size, 
dma_addr_t *dma_handle,
gfp &= ~__GFP_ZERO;
 
/* GFP_DMA32 and GFP_DMA are no ops without the corresponding zones: */
-   if (dev->coherent_dma_mask <= DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
+   if (dma_coherent_below(dev, DMA_BIT_MASK(ARCH_ZONE_DMA_BITS)))
gfp |= GFP_DMA;
-   if (dev->coherent_dma_mask <= DMA_BIT_MASK(32) && !(gfp & GFP_DMA))
+   if (dma_coherent_below(dev, DMA_BIT_MASK(32) && !(gfp & GFP_DMA)))
gfp |= GFP_DMA32;
 
 again:
@@ -92,14 +100,14 @@ void *dma_direct_alloc(struct device *dev, size_t size, 
dma_addr_t *dma_handle,
page = NULL;
 
if (IS_ENABLED(CONFIG_ZONE_DMA32) &&
-   dev->coherent_dma_mask < DMA_BIT_MASK(64) &&
+   dma_coherent_below(dev, DMA_BIT_MASK(64)) &&
!(gfp & (GFP_DMA32 | GFP_DMA))) {
gfp |= GFP_DMA32;
goto again;
}
 
if (IS_ENABLED(CONFIG_ZONE_DMA) &&
-   dev->coherent_dma_mask < DMA_BIT_MASK(32) &&
+   dma_coherent_below(dev, DMA_BIT_MASK(32)) &&
!(gfp & GFP_DMA)) {
gfp = (gfp & ~GFP_DMA32) | GFP_DMA;
goto again;
-- 
2.18.0

use generic DMA mapping code in powerpc

2018-07-30 Thread Christoph Hellwig

Hi all,

this series switches the powerpc port to use the generic swiotlb
and noncoherent dma ops, and to use more generic code for the
coherent direct mapping, as well as removing dead code.

Re: [PATCH v2 04/10] powerpc/traps: Use REG_FMT in show_signal_msg()

2018-07-30 Thread LEROY Christophe

Murilo Opsfelder Araujo  a écrit :

Hi, Christophe.

On Fri, Jul 27, 2018 at 06:40:23PM +0200, LEROY Christophe wrote:

Murilo Opsfelder Araujo  a écrit :

> Simplify the message format by using REG_FMT as the register format.  This
> avoids having two different formats and avoids checking for MSR_64BIT.

Are you sure it is what we want ?

Yes.

Won't it change the behaviour for a 32 bits app running on a 64bits kernel ?

In fact, this changes how many zeroes are prefixed when displaying  
the registers

(%016lx vs. %08lx format).  For example, 32-bits userspace, 64-bits kernel:

Indeed that's what I suspected. What is the real benefit of this  
change ? Why not keep the current format for 32bits userspace ? All  
those leading zeroes are pointless to me.

before this series:
  [66475.002900] segv[4599]: unhandled signal 11 at  nip  
1420 lr 0fe61854 code 1

after this series:
  [   73.414535] segv[3759]: segfault (11) at  nip  
1420 lr 0fe61854 code 1 in segv[1000+1]
  [   73.414641] segv[3759]: code: 4e800421 80010014 38210010  
7c0803a6 4b30 9421ffd0 93e1002c 7c3f0b78
  [   73.414665] segv[3759]: code: 3920 913f001c 813f001c  
3941 <9149> 3920 7d234b78 397f0030

Have you spotted any other behaviour change?

Not as of today

Christophe

Cheers
Murilo

Re: [RFC PATCH kernel 0/5] powerpc/P9/vfio: Pass through NVIDIA Tesla V100

2018-07-30 Thread Alex Williamson

On Mon, 30 Jul 2018 18:58:49 +1000
Alexey Kardashevskiy  wrote:

> On 11/07/2018 19:26, Alexey Kardashevskiy wrote:
> > On Tue, 10 Jul 2018 16:37:15 -0600
> > Alex Williamson  wrote:
> >   
> >> On Tue, 10 Jul 2018 14:10:20 +1000
> >> Alexey Kardashevskiy  wrote:
> >>  
> >>> On Thu, 7 Jun 2018 23:03:23 -0600
> >>> Alex Williamson  wrote:
> >>> 
>  On Fri, 8 Jun 2018 14:14:23 +1000
>  Alexey Kardashevskiy  wrote:
>    
> > On 8/6/18 1:44 pm, Alex Williamson wrote:
> >> On Fri, 8 Jun 2018 13:08:54 +1000
> >> Alexey Kardashevskiy  wrote:
> >>   
> >>> On 8/6/18 8:15 am, Alex Williamson wrote:  
>  On Fri, 08 Jun 2018 07:54:02 +1000
>  Benjamin Herrenschmidt  wrote:
>  
> > On Thu, 2018-06-07 at 11:04 -0600, Alex Williamson wrote:   
> >  
> >>
> >> Can we back up and discuss whether the IOMMU grouping of NVLink
> >> connected devices makes sense?  AIUI we have a PCI view of these
> >> devices and from that perspective they're isolated.  That's the 
> >> view of
> >> the device used to generate the grouping.  However, not visible to 
> >> us,
> >> these devices are interconnected via NVLink.  What isolation 
> >> properties
> >> does NVLink provide given that its entire purpose for existing 
> >> seems to
> >> be to provide a high performance link for p2p between devices? 
> >>  
> >
> > Not entire. On POWER chips, we also have an nvlink between the 
> > device
> > and the CPU which is running significantly faster than PCIe.
> >
> > But yes, there are cross-links and those should probably be 
> > accounted
> > for in the grouping.
> 
>  Then after we fix the grouping, can we just let the host driver 
>  manage
>  this coherent memory range and expose vGPUs to guests?  The use case 
>  of
>  assigning all 6 GPUs to one VM seems pretty limited.  (Might need to
>  convince NVIDIA to support more than a single vGPU per VM though)
>  
> >>>
> >>> These are physical GPUs, not virtual sriov-alike things they are
> >>> implementing as well elsewhere.  
> >>
> >> vGPUs as implemented on M- and P-series Teslas aren't SR-IOV like
> >> either.  That's why we have mdev devices now to implement software
> >> defined devices.  I don't have first hand experience with V-series, but
> >> I would absolutely expect a PCIe-based Tesla V100 to support vGPU. 
> >>  
> >
> > So assuming V100 can do vGPU, you are suggesting ditching this patchset 
> > and
> > using mediated vGPUs instead, correct?
> 
>  If it turns out that our PCIe-only-based IOMMU grouping doesn't
>  account for lack of isolation on the NVLink side and we correct that,
>  limiting assignment to sets of 3 interconnected GPUs, is that still a
>  useful feature?  OTOH, it's entirely an NVIDIA proprietary decision
>  whether they choose to support vGPU on these GPUs or whether they can
>  be convinced to support multiple vGPUs per VM.
>    
> >>> My current understanding is that every P9 chip in that box has some 
> >>> NVLink2
> >>> logic on it so each P9 is directly connected to 3 GPUs via PCIe and
> >>> 2xNVLink2, and GPUs in that big group are interconnected by NVLink2 
> >>> links
> >>> as well.
> >>>
> >>> From small bits of information I have it seems that a GPU can 
> >>> perfectly
> >>> work alone and if the NVIDIA driver does not see these interconnects
> >>> (because we do not pass the rest of the big 3xGPU group to this 
> >>> guest), it
> >>> continues with a single GPU. There is an "nvidia-smi -r" big reset 
> >>> hammer
> >>> which simply refuses to work until all 3 GPUs are passed so there is 
> >>> some
> >>> distinction between passing 1 or 3 GPUs, and I am trying (as we 
> >>> speak) to
> >>> get a confirmation from NVIDIA that it is ok to pass just a single 
> >>> GPU.
> >>>
> >>> So we will either have 6 groups (one per GPU) or 2 groups (one per
> >>> interconnected group).  
> >>
> >> I'm not gaining much confidence that we can rely on isolation between
> >> NVLink connected GPUs, it sounds like you're simply expecting that
> >> proprietary code from NVIDIA on a proprietary interconnect from NVIDIA
> >> is going to play nice and nobody will figure out how to do bad things
> >> because... obfuscation?  Thanks,  
> >
> > Well, we already believe that a proprietary firmware of a sriov-capable
> > adapter like Mellanox ConnextX is not doing bad things, how is this
> > different in principle?
> 
>

Re: [PATCH 3/3] ptp_qoriq: convert to use module parameters for initialization

2018-07-30 Thread David Miller

From: Yangbo Lu 
Date: Mon, 30 Jul 2018 18:01:54 +0800

> +static unsigned int cksel = DEFAULT_CKSEL;
> +module_param(cksel, uint, 0644);
> +MODULE_PARM_DESC(cksel, "Select reference clock");
> +
> +static unsigned int clk_src;
> +module_param(clk_src, uint, 0644);
> +MODULE_PARM_DESC(clk_src, "Reference clock frequency (if clocks property not 
> provided in dts)");
> +
> +static unsigned int tmr_prsc = 2;
> +module_param(tmr_prsc, uint, 0644);
> +MODULE_PARM_DESC(tmr_prsc, "Output clock division/prescale factor");
> +
> +static unsigned int tmr_fiper1 = 10;
> +module_param(tmr_fiper1, uint, 0644);
> +MODULE_PARM_DESC(tmr_fiper1, "Desired fixed interval pulse period (ns)");
> +
> +static unsigned int tmr_fiper2 = 10;
> +module_param(tmr_fiper2, uint, 0644);
> +MODULE_PARM_DESC(tmr_fiper2, "Desired fixed interval pulse period (ns)");

Sorry, there is no way I am every applying something like this.  Module
parameters are to be avoided at all costs.

And you don't need it here, you have DTS, please use it.

You are required to support the existing DTS cases, in order to
avoid breaking things, anyways.

Re: [PATCH v2 04/10] powerpc/traps: Use REG_FMT in show_signal_msg()

2018-07-30 Thread Murilo Opsfelder Araujo

Hi, Christophe.

On Fri, Jul 27, 2018 at 06:40:23PM +0200, LEROY Christophe wrote:
> Murilo Opsfelder Araujo  a écrit :
>
> > Simplify the message format by using REG_FMT as the register format.  This
> > avoids having two different formats and avoids checking for MSR_64BIT.
>
> Are you sure it is what we want ?

Yes.

> Won't it change the behaviour for a 32 bits app running on a 64bits kernel ?

In fact, this changes how many zeroes are prefixed when displaying the registers
(%016lx vs. %08lx format).  For example, 32-bits userspace, 64-bits kernel:

before this series:
  [66475.002900] segv[4599]: unhandled signal 11 at  nip 1420 lr 
0fe61854 code 1

after this series:
  [   73.414535] segv[3759]: segfault (11) at  nip 
1420 lr 0fe61854 code 1 in segv[1000+1]
  [   73.414641] segv[3759]: code: 4e800421 80010014 38210010 7c0803a6 4b30 
9421ffd0 93e1002c 7c3f0b78
  [   73.414665] segv[3759]: code: 3920 913f001c 813f001c 3941 
<9149> 3920 7d234b78 397f0030

Have you spotted any other behaviour change?

Cheers
Murilo

Re: [PATCH v3 1/1] powerpc/pseries: fix EEH recovery of some IOV devices

2018-07-30 Thread Bryant G. Ly

On 7/29/18 8:59 PM, Sam Bobroff wrote:

> EEH recovery currently fails on pSeries for some IOV capable PCI
> devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide
> certain device tree properties for the device. (Found on an IOV
> capable device using the ipr driver.)
>
> Recovery fails in pci_enable_resources() at the check on r->parent,
> because r->flags is set and r->parent is not.  This state is due to
> sriov_init() setting the start, end and flags members of the IOV BARs
> but the parent not being set later in
> pseries_pci_fixup_iov_resources(), because the
> "ibm,open-sriov-vf-bar-info" property is missing.
>
> Correct this by zeroing the resource flags for IOV BARs when they
> can't be configured (this is the same method used by sriov_init() and
> __pci_read_base()).
>
> VFs cleared this way can't be enabled later, because that requires
> another device tree property, "ibm,number-of-configurable-vfs" as well
> as support for the RTAS function "ibm_map_pes". These are all part of
> hypervisor support for IOV and it seems unlikely that a hypervisor
> would ever partially, but not fully, support it. (None are currently
> provided by QEMU/KVM.)
>
> Signed-off-by: Sam Bobroff 
> ---
> Hi,
>
> This is a fix to allow EEH recovery to succeed in a specific situation,
> which I've tried to explain in the commit message.
>
> As with the RFC version, the IOV BARs are disabled by setting the resource
> flags to 0 but the other fields are now left as-is because that is what is 
> done
> elsewhere (see sriov_init() and __pci_read_base()).
>
> I've also examined the concern raised by Bjorn Helgaas, that VFs could be
> enabled later after the BARs are disabled, and it already seems safe: enabling
> VFs (on pseries) depends on another device tree property,
> "ibm,number-of-configurable-vfs" as well as support for the RTAS function
> "ibm_map_pes". Since these are all part of the hypervisor's support for IOV it
> seems unlikely that we would ever see some of them but not all. (None are
> currently provided by QEMU/KVM.) (Additionally, the ipr driver on which the 
> EEH
> recovery failure was discovered doesn't even seem to have SR-IOV support so it
> certainly can't enable VFs.)
>
> Cheers,
> Sam.
>
> Patch set v3:
> Patch 1/1: powerpc/pseries: fix EEH recovery of some IOV devices
> * Moved some useful information from the cover letter to the commit log.
>
> Patch set v2:
> Patch 1/1: powerpc/pseries: fix EEH recovery of some IOV devices
> * Moved the BAR disabling code to a function.
> * Also check in pseries_pci_fixup_resources().
>
> Patch set v1:
> Patch 1/1: powerpc/pseries: fix EEH recovery of IOV devices
>
>  arch/powerpc/platforms/pseries/setup.c | 25 +
>  1 file changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/setup.c 
> b/arch/powerpc/platforms/pseries/setup.c
> index b55ad4286dc7..0a9e4243ae1d 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -645,6 +645,15 @@ void of_pci_parse_iov_addrs(struct pci_dev *dev, const 
> int *indexes)
>   }
>  }

Reviewed-by: Bryant G. Ly

Re: [PATCH 3/3] ptp_qoriq: convert to use module parameters for initialization

2018-07-30 Thread Richard Cochran

On Mon, Jul 30, 2018 at 06:01:54PM +0800, Yangbo Lu wrote:
> The ptp_qoriq driver initialized the 1588 timer with the
> configurations provided by the properties of device tree
> node. For example,
> 
>   fsl,tclk-period = <5>;
>   fsl,tmr-prsc= <2>;
>   fsl,tmr-add = <0xaaab>;
>   fsl,tmr-fiper1  = <5>;
>   fsl,tmr-fiper2  = <0>;
>   fsl,max-adj = <4>;
> 
> These things actually were runtime configurations which
> were not proper to be put into dts.

That is debatable.  While I agree that the dts isn't ideal for these,
still it is the lesser of two or more evils.

> This patch is to convert
> to use module parameters for 1588 timer initialization, and
> to support initial register values calculation.

It is hard for me to understand how using module parameters improves
the situation.

> If the parameters are not provided, the driver will calculate
> register values with a set of default parameters. With this
> patch, those dts properties are no longer needed for new
> platform to support 1588 timer, and many QorIQ DPAA platforms
> (some P series and T series platforms of PowerPC, and some
> LS series platforms of ARM64) could use this driver for their
> fman ptp timer with default module parameters. However, this
> patch didn't remove the dts method. Because there were still
> many old platforms using the dts method. We need to clean up
> their dts files, verify module parameters on them, and convert
> them to the new method gradually in case of breaking any
> function.

In addition, like it or not, because the dts is an ABI, you must
continue support of the dts values as a legacy option.

Thanks,
Richard

Re: [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE

2018-07-30 Thread Aneesh Kumar K.V

Michael Ellerman  writes:

> Hi Laurent,
>
> Just one comment below.
>
> Laurent Dufour  writes:
>> diff --git a/arch/powerpc/platforms/pseries/lpar.c 
>> b/arch/powerpc/platforms/pseries/lpar.c
>> index 96b8cd8a802d..41ed03245eb4 100644
>> --- a/arch/powerpc/platforms/pseries/lpar.c
>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>> @@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long 
>> slot, unsigned long vpn,
>>  BUG_ON(lpar_rc != H_SUCCESS);
>>  }
>>  
>> +
>> +/*
>> + * As defined in the PAPR's section 14.5.4.1.8
>> + * The control mask doesn't include the returned reference and change bit 
>> from
>> + * the processed PTE.
>> + */
>> +#define HBLKR_AVPN  0x0100UL
>> +#define HBLKR_CTRL_MASK 0xf800UL
>> +#define HBLKR_CTRL_SUCCESS  0x8000UL
>> +#define HBLKR_CTRL_ERRNOTFOUND  0x8800UL
>> +#define HBLKR_CTRL_ERRBUSY  0xa000UL
>> +
>> +/**
>> + * H_BLOCK_REMOVE caller.
>> + * @idx should point to the latest @param entry set with a PTEX.
>> + * If PTE cannot be processed because another CPUs has already locked that
>> + * group, those entries are put back in @param starting at index 1.
>> + * If entries has to be retried and @retry_busy is set to true, these 
>> entries
>> + * are retried until success. If @retry_busy is set to false, the returned
>> + * is the number of entries yet to process.
>> + */
>> +static unsigned long call_block_remove(unsigned long idx, unsigned long 
>> *param,
>> +   bool retry_busy)
>> +{
>> +unsigned long i, rc, new_idx;
>> +unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
>> +
>> +again:
>> +new_idx = 0;
>> +BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE));
>
> I count 1 ..
>
>> +if (idx < PLPAR_HCALL9_BUFSIZE)
>> +param[idx] = HBR_END;
>> +
>> +rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
>> +  param[0], /* AVA */
>> +  param[1],  param[2],  param[3],  param[4], /* TS0-7 */
>> +  param[5],  param[6],  param[7],  param[8]);
>> +if (rc == H_SUCCESS)
>> +return 0;
>> +
>> +BUG_ON(rc != H_PARTIAL);
>
> 2 ...
>
>> +/* Check that the unprocessed entries were 'not found' or 'busy' */
>> +for (i = 0; i < idx-1; i++) {
>> +unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
>> +
>> +if (ctrl == HBLKR_CTRL_ERRBUSY) {
>> +param[++new_idx] = param[i+1];
>> +continue;
>> +}
>> +
>> +BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
>> +   && ctrl != HBLKR_CTRL_ERRNOTFOUND);
>
> 3 ...
>
> BUG_ON()s.
>
> I know the code in this file is already pretty liberal with the use of
> BUG_ON() but I'd prefer if we don't make it any worse.
>
> Given this is an optimisation it seems like we should be able to fall
> back to the existing implementation in the case of error (which will
> probably then BUG_ON() )
>
> If there's some reason we can't then I guess I can live with it.

It would be nice to log the error in case we are not expecting the
error return. We recently did
https://marc.info/?i=20180629083904.29250-1-aneesh.ku...@linux.ibm.com

-aneesh

Re: [PATCH 2/3] powerpc/pseries/mm: factorize PTE slot computation

2018-07-30 Thread Aneesh Kumar K.V

Laurent Dufour  writes:

> This part of code will be called also when dealing with H_BLOCK_REMOVE.
>
> Cc: "Aneesh Kumar K.V" 
> Cc: Nicholas Piggin 
> Cc: Michael Ellerman 
> Cc: Paul Mackerras 
> Cc: Benjamin Herrenschmidt 
> Signed-off-by: Laurent Dufour 

Reviewed-by: Aneesh Kumar K.V 

> ---
>  arch/powerpc/platforms/pseries/lpar.c | 27 ---
>  1 file changed, 20 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/lpar.c 
> b/arch/powerpc/platforms/pseries/lpar.c
> index 52eeff1297f4..96b8cd8a802d 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -547,6 +547,24 @@ static int pSeries_lpar_hpte_removebolted(unsigned long 
> ea,
>   return 0;
>  }
>  
> +
> +static inline unsigned long compute_slot(real_pte_t pte,
> +  unsigned long vpn,
> +  unsigned long index,
> +  unsigned long shift,
> +  int ssize)
> +{
> + unsigned long slot, hash, hidx;
> +
> + hash = hpt_hash(vpn, shift, ssize);
> + hidx = __rpte_to_hidx(pte, index);
> + if (hidx & _PTEIDX_SECONDARY)
> + hash = ~hash;
> + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
> + slot += hidx & _PTEIDX_GROUP_IX;
> + return slot;
> +}
> +
>  /*
>   * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
>   * lock.
> @@ -559,7 +577,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long 
> number, int local)
>   struct ppc64_tlb_batch *batch = this_cpu_ptr(_tlb_batch);
>   int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
>   unsigned long param[PLPAR_HCALL9_BUFSIZE];
> - unsigned long hash, index, shift, hidx, slot;
> + unsigned long index, shift, slot;
>   real_pte_t pte;
>   int psize, ssize;
>  
> @@ -573,12 +591,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long 
> number, int local)
>   vpn = batch->vpn[i];
>   pte = batch->pte[i];
>   pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
> - hash = hpt_hash(vpn, shift, ssize);
> - hidx = __rpte_to_hidx(pte, index);
> - if (hidx & _PTEIDX_SECONDARY)
> - hash = ~hash;
> - slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
> - slot += hidx & _PTEIDX_GROUP_IX;
> + slot = compute_slot(pte, vpn, index, shift, ssize);
>   if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
>   /*
>* lpar doesn't use the passed actual page size
> -- 
> 2.7.4

Re: [PATCH 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE

2018-07-30 Thread Aneesh Kumar K.V

Laurent Dufour  writes:

> This feature tells if the hcall H_BLOCK_REMOVE is available.
>
> Cc: "Aneesh Kumar K.V" 
> Cc: Nicholas Piggin 
> Cc: Michael Ellerman 
> Cc: Paul Mackerras 
> Cc: Benjamin Herrenschmidt 
> Signed-off-by: Laurent Dufour 

Reviewed-by: Aneesh Kumar K.V 

> ---
>  arch/powerpc/include/asm/firmware.h   | 3 ++-
>  arch/powerpc/platforms/pseries/firmware.c | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/firmware.h 
> b/arch/powerpc/include/asm/firmware.h
> index 535add3f7791..360ba197f9d2 100644
> --- a/arch/powerpc/include/asm/firmware.h
> +++ b/arch/powerpc/include/asm/firmware.h
> @@ -53,6 +53,7 @@
>  #define FW_FEATURE_PRRN  ASM_CONST(0x0002)
>  #define FW_FEATURE_DRMEM_V2  ASM_CONST(0x0004)
>  #define FW_FEATURE_DRC_INFO  ASM_CONST(0x0008)
> +#define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0010)
>  
>  #ifndef __ASSEMBLY__
>  
> @@ -70,7 +71,7 @@ enum {
>   FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
>   FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
>   FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
> - FW_FEATURE_DRC_INFO,
> + FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE,
>   FW_FEATURE_PSERIES_ALWAYS = 0,
>   FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
>   FW_FEATURE_POWERNV_ALWAYS = 0,
> diff --git a/arch/powerpc/platforms/pseries/firmware.c 
> b/arch/powerpc/platforms/pseries/firmware.c
> index a3bbeb43689e..1624501386f4 100644
> --- a/arch/powerpc/platforms/pseries/firmware.c
> +++ b/arch/powerpc/platforms/pseries/firmware.c
> @@ -65,6 +65,7 @@ hypertas_fw_features_table[] = {
>   {FW_FEATURE_SET_MODE,   "hcall-set-mode"},
>   {FW_FEATURE_BEST_ENERGY,"hcall-best-energy-1*"},
>   {FW_FEATURE_HPT_RESIZE, "hcall-hpt-resize"},
> + {FW_FEATURE_BLOCK_REMOVE,   "hcall-block-remove"},
>  };
>  
>  /* Build up the firmware features bitmask using the contents of
> -- 
> 2.7.4

Re: [PATCH 2/2] powerpc: Use of_machine_compatible_match()

2018-07-30 Thread Rob Herring

On Mon, Jul 30, 2018 at 7:15 AM Michael Ellerman  wrote:
>
> Use of_machine_compatible_match() in platform code rather than open
> coding.

This is good because I want to get rid of of_root being used outside
of drivers/of/. Can you also convert this one:

drivers/clk/clk-qoriq.c:of_device_is_compatible(of_root,
"fsl,ls1021a")) {

>
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/platforms/40x/ppc40x_simple.c| 2 +-
>  arch/powerpc/platforms/512x/mpc512x_generic.c | 2 +-
>  arch/powerpc/platforms/52xx/lite5200.c| 2 +-
>  arch/powerpc/platforms/52xx/media5200.c   | 2 +-
>  arch/powerpc/platforms/52xx/mpc5200_simple.c  | 2 +-
>  arch/powerpc/platforms/83xx/mpc830x_rdb.c | 2 +-
>  arch/powerpc/platforms/83xx/mpc831x_rdb.c | 2 +-
>  arch/powerpc/platforms/83xx/mpc837x_rdb.c | 2 +-
>  arch/powerpc/platforms/85xx/corenet_generic.c | 2 +-
>  arch/powerpc/platforms/85xx/tqm85xx.c | 2 +-
>  10 files changed, 10 insertions(+), 10 deletions(-)

Acked-by: Rob Herring 

Rob

Re: [PATCH 1/2] of: Add of_machine_compatible_match()

2018-07-30 Thread Rob Herring

On Mon, Jul 30, 2018 at 7:15 AM Michael Ellerman  wrote:
>
> We have of_machine_is_compatible() to check if a machine is compatible
> with a single compatible string. However some code is able to support
> multiple compatible boards, and so wants to check for one of many
> compatible strings.
>
> So add of_machine_compatible_match() which takes a NULL terminated
> array of compatible strings to check against the root node's
> compatible property.
>
> Compared to an open coded match this is slightly more self
> documenting, and also avoids the caller needing to juggle the root
> node either directly or via of_find_node_by_path().
>
> Signed-off-by: Michael Ellerman 
> ---
>  drivers/of/base.c  | 21 +
>  include/linux/of.h |  6 ++
>  2 files changed, 27 insertions(+)
>
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 848f549164cd..603716ba8513 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -505,6 +505,27 @@ int of_device_compatible_match(struct device_node 
> *device,
> return score;
>  }
>
> +/**
> + * of_machine_compatible_match - Test root of device tree against a 
> compatible array
> + * @compats: NULL terminated array of compatible strings to look for in root 
> node's compatible property.
> + *
> + * Returns true if the root node has any of the given compatible values in 
> its
> + * compatible property.
> + */
> +bool of_machine_compatible_match(const char *const *compats)
> +{
> +   struct device_node *root;
> +   int rc = 0;
> +
> +   root = of_node_get(of_root);
> +   if (root) {
> +   rc = of_device_compatible_match(root, compats);
> +   of_node_put(root);
> +   }
> +
> +   return rc != 0;
> +}
> +
>  /**
>   * of_machine_is_compatible - Test root of device tree for a given 
> compatible value
>   * @compat: compatible string to look for in root node's compatible property.
> diff --git a/include/linux/of.h b/include/linux/of.h
> index 4d25e4f952d9..05e3e23a3a57 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -389,6 +389,7 @@ extern int of_alias_get_id(struct device_node *np, const 
> char *stem);
>  extern int of_alias_get_highest_id(const char *stem);
>
>  extern int of_machine_is_compatible(const char *compat);
> +extern bool of_machine_compatible_match(const char *const *compats);

Would be nice if of_machine_is_compatible could be implemented in
terms of of_machine_compatible_match like this:

int of_machine_is_compatible(const char *compat)
{
  const char *compats[] = { compat, NULL };
  return of_machine_is_compatible(compats);
}

Probably can be inline too.

Rob

Re: [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE

2018-07-30 Thread Michael Ellerman

Hi Laurent,

Just one comment below.

Laurent Dufour  writes:
> diff --git a/arch/powerpc/platforms/pseries/lpar.c 
> b/arch/powerpc/platforms/pseries/lpar.c
> index 96b8cd8a802d..41ed03245eb4 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long 
> slot, unsigned long vpn,
>   BUG_ON(lpar_rc != H_SUCCESS);
>  }
>  
> +
> +/*
> + * As defined in the PAPR's section 14.5.4.1.8
> + * The control mask doesn't include the returned reference and change bit 
> from
> + * the processed PTE.
> + */
> +#define HBLKR_AVPN   0x0100UL
> +#define HBLKR_CTRL_MASK  0xf800UL
> +#define HBLKR_CTRL_SUCCESS   0x8000UL
> +#define HBLKR_CTRL_ERRNOTFOUND   0x8800UL
> +#define HBLKR_CTRL_ERRBUSY   0xa000UL
> +
> +/**
> + * H_BLOCK_REMOVE caller.
> + * @idx should point to the latest @param entry set with a PTEX.
> + * If PTE cannot be processed because another CPUs has already locked that
> + * group, those entries are put back in @param starting at index 1.
> + * If entries has to be retried and @retry_busy is set to true, these entries
> + * are retried until success. If @retry_busy is set to false, the returned
> + * is the number of entries yet to process.
> + */
> +static unsigned long call_block_remove(unsigned long idx, unsigned long 
> *param,
> +bool retry_busy)
> +{
> + unsigned long i, rc, new_idx;
> + unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
> +
> +again:
> + new_idx = 0;
> + BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE));

I count 1 ..

> + if (idx < PLPAR_HCALL9_BUFSIZE)
> + param[idx] = HBR_END;
> +
> + rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
> +   param[0], /* AVA */
> +   param[1],  param[2],  param[3],  param[4], /* TS0-7 */
> +   param[5],  param[6],  param[7],  param[8]);
> + if (rc == H_SUCCESS)
> + return 0;
> +
> + BUG_ON(rc != H_PARTIAL);

2 ...

> + /* Check that the unprocessed entries were 'not found' or 'busy' */
> + for (i = 0; i < idx-1; i++) {
> + unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
> +
> + if (ctrl == HBLKR_CTRL_ERRBUSY) {
> + param[++new_idx] = param[i+1];
> + continue;
> + }
> +
> + BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
> +&& ctrl != HBLKR_CTRL_ERRNOTFOUND);

3 ...

BUG_ON()s.

I know the code in this file is already pretty liberal with the use of
BUG_ON() but I'd prefer if we don't make it any worse.

Given this is an optimisation it seems like we should be able to fall
back to the existing implementation in the case of error (which will
probably then BUG_ON() )

If there's some reason we can't then I guess I can live with it.

cheers

Re: [PATCH v4 09/11] macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver

2018-07-30 Thread Michael Ellerman

Geert Uytterhoeven  writes:

> Hi Michael,
>
> On Mon, Jul 30, 2018 at 8:47 AM Michael Ellerman  wrote:
>> Finn Thain  writes:
>> > Now that the PowerMac via-pmu driver supports m68k PowerBooks,
>> > switch over to that driver and remove the via-pmu68k driver.
>> >
>> > Cc: Geert Uytterhoeven 
>> > Tested-by: Stan Johnson 
>> > Signed-off-by: Finn Thain 
>> > ---
>> >  arch/m68k/configs/mac_defconfig   |   2 +-
>> >  arch/m68k/configs/multi_defconfig |   2 +-
>> >  arch/m68k/mac/config.c|   2 +-
>> >  arch/m68k/mac/misc.c  |  48 +--
>> >  drivers/macintosh/Kconfig |  13 +-
>> >  drivers/macintosh/Makefile|   1 -
>> >  drivers/macintosh/adb.c   |   2 +-
>> >  drivers/macintosh/via-pmu68k.c| 846 
>> > --
>> >  include/uapi/linux/pmu.h  |   2 +-
>> >  9 files changed, 14 insertions(+), 904 deletions(-)
>> >  delete mode 100644 drivers/macintosh/via-pmu68k.c
>>
>> Geert are you OK with this and the other one that touches arch/m68k ?
>
> Sure, feel free to take them through the PPC tree.
> Acked-by: Geert Uytterhoeven 

Thanks will do.

cheers

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-07-30 Thread Michael S. Tsirkin

On Mon, Jul 30, 2018 at 04:18:02AM -0700, Christoph Hellwig wrote:
> On Mon, Jul 30, 2018 at 01:28:03PM +0300, Michael S. Tsirkin wrote:
> > Let me reply to the "crappy" part first:
> > So virtio devices can run on another CPU or on a PCI bus. Configuration
> > can happen over mupltiple transports.  There is a discovery protocol to
> > figure out where it is. It has some warts but any real system has warts.
> > 
> > So IMHO virtio running on another CPU isn't "legacy virtual crappy
> > virtio". virtio devices that actually sit on a PCI bus aren't "sane"
> > simply because the DMA is more convoluted on some architectures.
> 
> All of what you said would be true if virtio didn't claim to be
> a PCI device.  

There's nothing virtio claims to be.  It's a PV device that uses PCI for
its configuration.  Configuration is enumerated on the virtual PCI bus.
That part of the interface is emulated PCI. Data path is through a
PV device enumerated on the virtio bus.

> Once it claims to be a PCI device and we also see
> real hardware written to the interface I stand to all what I said
> above.

Real hardware would reuse parts of the interface but by necessity it
needs to behave slightly differently on some platforms.  However for
some platforms (such as x86) a PV virtio driver will by luck work with a
PCI device backend without changes. As these platforms and drivers are
widely deployed, some people will deploy hardware like that.  Should be
a non issue as by definition it's transparent to guests.

> > With this out of my system:
> > I agree these approaches are hacky. I think it is generally better to
> > have virtio feature negotiation tell you whether device runs on a CPU or
> > not rather than rely on platform specific ways for this. To this end
> > there was a recent proposal to rename VIRTIO_F_IO_BARRIER to
> > VIRTIO_F_REAL_DEVICE.  It got stuck since "real" sounds vague to people,
> > e.g.  what if it's a VF - is that real or not? But I can see something
> > like e.g. VIRTIO_F_PLATFORM_DMA gaining support.
> > 
> > We would then rename virtio_has_iommu_quirk to virtio_has_dma_quirk
> > and test VIRTIO_F_PLATFORM_DMA in addition to the IOMMU thing.
> 
> I don't really care about the exact naming, and indeed a device that
> sets the flag doesn't have to be a 'real' device - it just has to act
> like one.  I explained all the issues that this means (at least relating
> to DMA) in one of the previous threads.

I believe you refer to this:
https://lkml.org/lkml/2018/6/7/15
that was a very helpful list outlining the problems we need to solve,
thanks a lot for that!

> The important bit is that we can specify exact behavior for both
> devices that sets the "I'm real!" flag and that ones that don't exactly
> in the spec.

I would very much like that, yes.

> And that very much excludes arch-specific (or
> Xen-specific) overrides.

We already committed to a xen specific hack but generally I prefer
devices that describe how they work instead of platforms magically
guessing, yes.

However the question people raise is that DMA API is already full of
arch-specific tricks the likes of which are outlined in your post linked
above. How is this one much worse?

-- 
MST

[PATCH 2/2] powerpc: Use of_machine_compatible_match()

2018-07-30 Thread Michael Ellerman

Use of_machine_compatible_match() in platform code rather than open
coding.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/40x/ppc40x_simple.c| 2 +-
 arch/powerpc/platforms/512x/mpc512x_generic.c | 2 +-
 arch/powerpc/platforms/52xx/lite5200.c| 2 +-
 arch/powerpc/platforms/52xx/media5200.c   | 2 +-
 arch/powerpc/platforms/52xx/mpc5200_simple.c  | 2 +-
 arch/powerpc/platforms/83xx/mpc830x_rdb.c | 2 +-
 arch/powerpc/platforms/83xx/mpc831x_rdb.c | 2 +-
 arch/powerpc/platforms/83xx/mpc837x_rdb.c | 2 +-
 arch/powerpc/platforms/85xx/corenet_generic.c | 2 +-
 arch/powerpc/platforms/85xx/tqm85xx.c | 2 +-
 10 files changed, 10 insertions(+), 10 deletions(-)


Tested the corenet_generic.c change on a p5020ds, the rest are untested owing to
lack of hardware.

diff --git a/arch/powerpc/platforms/40x/ppc40x_simple.c 
b/arch/powerpc/platforms/40x/ppc40x_simple.c
index 2a050007bbae..f401a8add141 100644
--- a/arch/powerpc/platforms/40x/ppc40x_simple.c
+++ b/arch/powerpc/platforms/40x/ppc40x_simple.c
@@ -63,7 +63,7 @@ static const char * const board[] __initconst = {
 
 static int __init ppc40x_probe(void)
 {
-   if (of_device_compatible_match(of_root, board)) {
+   if (of_machine_compatible_match(board)) {
pci_set_flags(PCI_REASSIGN_ALL_RSRC);
return 1;
}
diff --git a/arch/powerpc/platforms/512x/mpc512x_generic.c 
b/arch/powerpc/platforms/512x/mpc512x_generic.c
index bf884d3075e4..952ea53d4829 100644
--- a/arch/powerpc/platforms/512x/mpc512x_generic.c
+++ b/arch/powerpc/platforms/512x/mpc512x_generic.c
@@ -38,7 +38,7 @@ static const char * const board[] __initconst = {
  */
 static int __init mpc512x_generic_probe(void)
 {
-   if (!of_device_compatible_match(of_root, board))
+   if (!of_machine_compatible_match(board))
return 0;
 
mpc512x_init_early();
diff --git a/arch/powerpc/platforms/52xx/lite5200.c 
b/arch/powerpc/platforms/52xx/lite5200.c
index c94c385cc919..5f11e906e02c 100644
--- a/arch/powerpc/platforms/52xx/lite5200.c
+++ b/arch/powerpc/platforms/52xx/lite5200.c
@@ -183,7 +183,7 @@ static const char * const board[] __initconst = {
  */
 static int __init lite5200_probe(void)
 {
-   return of_device_compatible_match(of_root, board);
+   return of_machine_compatible_match(board);
 }
 
 define_machine(lite5200) {
diff --git a/arch/powerpc/platforms/52xx/media5200.c 
b/arch/powerpc/platforms/52xx/media5200.c
index 1fcab233d2f2..8641bb55c8e8 100644
--- a/arch/powerpc/platforms/52xx/media5200.c
+++ b/arch/powerpc/platforms/52xx/media5200.c
@@ -242,7 +242,7 @@ static const char * const board[] __initconst = {
  */
 static int __init media5200_probe(void)
 {
-   return of_device_compatible_match(of_root, board);
+   return of_machine_compatible_match(board);
 }
 
 define_machine(media5200_platform) {
diff --git a/arch/powerpc/platforms/52xx/mpc5200_simple.c 
b/arch/powerpc/platforms/52xx/mpc5200_simple.c
index a80c6278d515..7b8b3be0f159 100644
--- a/arch/powerpc/platforms/52xx/mpc5200_simple.c
+++ b/arch/powerpc/platforms/52xx/mpc5200_simple.c
@@ -70,7 +70,7 @@ static const char *board[] __initdata = {
  */
 static int __init mpc5200_simple_probe(void)
 {
-   return of_device_compatible_match(of_root, board);
+   return of_machine_compatible_match(board);
 }
 
 define_machine(mpc5200_simple_platform) {
diff --git a/arch/powerpc/platforms/83xx/mpc830x_rdb.c 
b/arch/powerpc/platforms/83xx/mpc830x_rdb.c
index 272c41c387b9..7c0cd27e71e7 100644
--- a/arch/powerpc/platforms/83xx/mpc830x_rdb.c
+++ b/arch/powerpc/platforms/83xx/mpc830x_rdb.c
@@ -43,7 +43,7 @@ static const char *board[] __initdata = {
  */
 static int __init mpc830x_rdb_probe(void)
 {
-   return of_device_compatible_match(of_root, board);
+   return of_machine_compatible_match(board);
 }
 
 machine_device_initcall(mpc830x_rdb, mpc83xx_declare_of_platform_devices);
diff --git a/arch/powerpc/platforms/83xx/mpc831x_rdb.c 
b/arch/powerpc/platforms/83xx/mpc831x_rdb.c
index fd80fd570e67..3718db1c74e4 100644
--- a/arch/powerpc/platforms/83xx/mpc831x_rdb.c
+++ b/arch/powerpc/platforms/83xx/mpc831x_rdb.c
@@ -43,7 +43,7 @@ static const char *board[] __initdata = {
  */
 static int __init mpc831x_rdb_probe(void)
 {
-   return of_device_compatible_match(of_root, board);
+   return of_machine_compatible_match(board);
 }
 
 machine_device_initcall(mpc831x_rdb, mpc83xx_declare_of_platform_devices);
diff --git a/arch/powerpc/platforms/83xx/mpc837x_rdb.c 
b/arch/powerpc/platforms/83xx/mpc837x_rdb.c
index 0c55fa6af2d5..08c8434d895c 100644
--- a/arch/powerpc/platforms/83xx/mpc837x_rdb.c
+++ b/arch/powerpc/platforms/83xx/mpc837x_rdb.c
@@ -70,7 +70,7 @@ static const char * const board[] __initconst = {
  */
 static int __init mpc837x_rdb_probe(void)
 {
-   return of_device_compatible_match(of_root, board);
+   return of_machine_compatible_match(board);
 }
 
 define_machine(mpc837x_rdb) {
diff --git

[PATCH 1/2] of: Add of_machine_compatible_match()

2018-07-30 Thread Michael Ellerman

We have of_machine_is_compatible() to check if a machine is compatible
with a single compatible string. However some code is able to support
multiple compatible boards, and so wants to check for one of many
compatible strings.

So add of_machine_compatible_match() which takes a NULL terminated
array of compatible strings to check against the root node's
compatible property.

Compared to an open coded match this is slightly more self
documenting, and also avoids the caller needing to juggle the root
node either directly or via of_find_node_by_path().

Signed-off-by: Michael Ellerman 
---
 drivers/of/base.c  | 21 +
 include/linux/of.h |  6 ++
 2 files changed, 27 insertions(+)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 848f549164cd..603716ba8513 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -505,6 +505,27 @@ int of_device_compatible_match(struct device_node *device,
return score;
 }
 
+/**
+ * of_machine_compatible_match - Test root of device tree against a compatible 
array
+ * @compats: NULL terminated array of compatible strings to look for in root 
node's compatible property.
+ *
+ * Returns true if the root node has any of the given compatible values in its
+ * compatible property.
+ */
+bool of_machine_compatible_match(const char *const *compats)
+{
+   struct device_node *root;
+   int rc = 0;
+
+   root = of_node_get(of_root);
+   if (root) {
+   rc = of_device_compatible_match(root, compats);
+   of_node_put(root);
+   }
+
+   return rc != 0;
+}
+
 /**
  * of_machine_is_compatible - Test root of device tree for a given compatible 
value
  * @compat: compatible string to look for in root node's compatible property.
diff --git a/include/linux/of.h b/include/linux/of.h
index 4d25e4f952d9..05e3e23a3a57 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -389,6 +389,7 @@ extern int of_alias_get_id(struct device_node *np, const 
char *stem);
 extern int of_alias_get_highest_id(const char *stem);
 
 extern int of_machine_is_compatible(const char *compat);
+extern bool of_machine_compatible_match(const char *const *compats);
 
 extern int of_add_property(struct device_node *np, struct property *prop);
 extern int of_remove_property(struct device_node *np, struct property *prop);
@@ -877,6 +878,11 @@ static inline int of_machine_is_compatible(const char 
*compat)
return 0;
 }
 
+static inline bool of_machine_compatible_match(const char *const *compats)
+{
+   return false;
+}
+
 static inline bool of_console_check(const struct device_node *dn, const char 
*name, int index)
 {
return false;
-- 
2.14.1

Re: [PATCH] PCI: call dma_debug_add_bus for pci_bus_type in common code

2018-07-30 Thread Thomas Gleixner

On Mon, 30 Jul 2018, Christoph Hellwig wrote:

> There is nothing arch specific about PCI or dma-debug, so move this
> call to common code just after registering the bus type.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: Thomas Gleixner

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-07-30 Thread Christoph Hellwig

On Mon, Jul 30, 2018 at 01:28:03PM +0300, Michael S. Tsirkin wrote:
> Let me reply to the "crappy" part first:
> So virtio devices can run on another CPU or on a PCI bus. Configuration
> can happen over mupltiple transports.  There is a discovery protocol to
> figure out where it is. It has some warts but any real system has warts.
> 
> So IMHO virtio running on another CPU isn't "legacy virtual crappy
> virtio". virtio devices that actually sit on a PCI bus aren't "sane"
> simply because the DMA is more convoluted on some architectures.

All of what you said would be true if virtio didn't claim to be
a PCI device.  Once it claims to be a PCI device and we also see
real hardware written to the interface I stand to all what I said
above.

> With this out of my system:
> I agree these approaches are hacky. I think it is generally better to
> have virtio feature negotiation tell you whether device runs on a CPU or
> not rather than rely on platform specific ways for this. To this end
> there was a recent proposal to rename VIRTIO_F_IO_BARRIER to
> VIRTIO_F_REAL_DEVICE.  It got stuck since "real" sounds vague to people,
> e.g.  what if it's a VF - is that real or not? But I can see something
> like e.g. VIRTIO_F_PLATFORM_DMA gaining support.
> 
> We would then rename virtio_has_iommu_quirk to virtio_has_dma_quirk
> and test VIRTIO_F_PLATFORM_DMA in addition to the IOMMU thing.

I don't really care about the exact naming, and indeed a device that
sets the flag doesn't have to be a 'real' device - it just has to act
like one.  I explained all the issues that this means (at least relating
to DMA) in one of the previous threads.

The important bit is that we can specify exact behavior for both
devices that sets the "I'm real!" flag and that ones that don't exactly
in the spec.  And that very much excludes arch-specific (or
Xen-specific) overrides.

Re: [PATCH] of/fdt: Remove PPC32 longtrail hack in memory scan

2018-07-30 Thread Michael Ellerman

Rob Herring  writes:
> On Thu, Jul 26, 2018 at 11:36 PM Michael Ellerman  wrote:
>> When the OF code was originally made common by Grant in commit
>> 51975db0b733 ("of/flattree: merge early_init_dt_scan_memory() common
>> code") (Feb 2010), the common code inherited a hack to handle
>> PPC "longtrail" machines, which had a "memory@0" node with no
>> device_type.
>>
>> That check was then made to only apply to PPC32 in b44aa25d20e2 ("of:
>> Handle memory@0 node on PPC32 only") (May 2014).
>>
>> But according to Paul Mackerras the "longtrail" machines are long
>> dead, if they were ever seen in the wild at all. If someone does still
>> have one, we can handle this firmware wart in powerpc platform code.
>>
>> So remove the hack once and for all.
>
> Yay. I guess Power Macs and other quirks will never die...

Not soon.

In base.c I see:
 - the hack in arch_find_n_match_cpu_physical_id()
   - we should just move that into arch code, it's a __weak arch hook
 after all.
 - a PPC hack in of_alias_scan(), I guess we need to retain that
   behaviour, but it's pretty minor anyway.

In address.c there's the powermac empty ranges hack. Seems like we could
fix that just by creating empty `ranges` properties in fixup_device_tree().
I don't think we support booting powermacs other than via prom_init(). (Ben?)

> I'll queue this up.

cheers

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-07-30 Thread Michael S. Tsirkin

On Mon, Jul 30, 2018 at 02:34:14AM -0700, Christoph Hellwig wrote:
> We really need to distinguish between legacy virtual crappy
> virtio (and that includes v1) that totally ignores the bus it pretends
> to be on, and sane virtio (to be defined) that sit on a real (or
> properly emulated including iommu and details for dma mapping) bus.

Let me reply to the "crappy" part first:
So virtio devices can run on another CPU or on a PCI bus. Configuration
can happen over mupltiple transports.  There is a discovery protocol to
figure out where it is. It has some warts but any real system has warts.

So IMHO virtio running on another CPU isn't "legacy virtual crappy
virtio". virtio devices that actually sit on a PCI bus aren't "sane"
simply because the DMA is more convoluted on some architectures.

Performance impact of the optimizations possible when you know
your "device" is in fact just another CPU has been measured,
it is real, so we aren't interested in adding all that overhead back
just so we can use DMA API. The "correct then fast" mantra doesn't
apply to something that is as widely deployed as virtio.

And I can accept an argument that maybe the DMA API isn't designed to
support such virtual DMA. Whether it should I don't know.

With this out of my system:
I agree these approaches are hacky. I think it is generally better to
have virtio feature negotiation tell you whether device runs on a CPU or
not rather than rely on platform specific ways for this. To this end
there was a recent proposal to rename VIRTIO_F_IO_BARRIER to
VIRTIO_F_REAL_DEVICE.  It got stuck since "real" sounds vague to people,
e.g.  what if it's a VF - is that real or not? But I can see something
like e.g. VIRTIO_F_PLATFORM_DMA gaining support.

We would then rename virtio_has_iommu_quirk to virtio_has_dma_quirk
and test VIRTIO_F_PLATFORM_DMA in addition to the IOMMU thing.

-- 
MST

[PATCH 1/3] arm64: dts: fsl: add clocks property for fman ptp timer node

2018-07-30 Thread Yangbo Lu

This patch is to add clocks property for fman ptp timer node.

Signed-off-by: Yangbo Lu 
---
 arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi 
b/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi
index a56a408..4664c33 100644
--- a/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi
+++ b/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi
@@ -80,4 +80,5 @@ ptp_timer0: ptp-timer@1afe000 {
compatible = "fsl,fman-ptp-timer";
reg = <0x0 0x1afe000 0x0 0x1000>;
interrupts = ;
+   clocks = < 3 0>;
 };
-- 
1.7.1

[PATCH 2/3] powerpc/mpc85xx: add clocks property for fman ptp timer node

2018-07-30 Thread Yangbo Lu

This patch is to add clocks property for fman ptp timer node.

Signed-off-by: Yangbo Lu 
---
 arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi   |1 +
 arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi   |1 +
 arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi  |1 +
 arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi  |1 +
 arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi |1 +
 5 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi 
b/arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi
index 6b124f7..9b6cf91 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi
@@ -100,4 +100,5 @@ ptp_timer0: ptp-timer@4fe000 {
compatible = "fsl,fman-ptp-timer";
reg = <0x4fe000 0x1000>;
interrupts = <96 2 0 0>;
+   clocks = < 3 0>;
 };
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi 
b/arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi
index b80aaf5..e95c11f 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi
@@ -100,4 +100,5 @@ ptp_timer1: ptp-timer@5fe000 {
compatible = "fsl,fman-ptp-timer";
reg = <0x5fe000 0x1000>;
interrupts = <97 2 0 0>;
+   clocks = < 3 1>;
 };
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi 
b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi
index d3720fd..d62b36c 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi
@@ -105,4 +105,5 @@ ptp_timer0: ptp-timer@4fe000 {
compatible = "fsl,fman-ptp-timer";
reg = <0x4fe000 0x1000>;
interrupts = <96 2 0 0>;
+   clocks = < 3 0>;
 };
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi 
b/arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi
index ae34c20..3102324 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi
@@ -105,4 +105,5 @@ ptp_timer1: ptp-timer@5fe000 {
compatible = "fsl,fman-ptp-timer";
reg = <0x5fe000 0x1000>;
interrupts = <97 2 0 0>;
+   clocks = < 3 1>;
 };
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi 
b/arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi
index 02f2755..c90702b 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi
@@ -93,4 +93,5 @@ ptp_timer0: ptp-timer@4fe000 {
compatible = "fsl,fman-ptp-timer";
reg = <0x4fe000 0x1000>;
interrupts = <96 2 0 0>;
+   clocks = < 3 0>;
 };
-- 
1.7.1

[PATCH 3/3] ptp_qoriq: convert to use module parameters for initialization

2018-07-30 Thread Yangbo Lu

The ptp_qoriq driver initialized the 1588 timer with the
configurations provided by the properties of device tree
node. For example,

  fsl,tclk-period = <5>;
  fsl,tmr-prsc= <2>;
  fsl,tmr-add = <0xaaab>;
  fsl,tmr-fiper1  = <5>;
  fsl,tmr-fiper2  = <0>;
  fsl,max-adj = <4>;

These things actually were runtime configurations which
were not proper to be put into dts. This patch is to convert
to use module parameters for 1588 timer initialization, and
to support initial register values calculation.
If the parameters are not provided, the driver will calculate
register values with a set of default parameters. With this
patch, those dts properties are no longer needed for new
platform to support 1588 timer, and many QorIQ DPAA platforms
(some P series and T series platforms of PowerPC, and some
LS series platforms of ARM64) could use this driver for their
fman ptp timer with default module parameters. However, this
patch didn't remove the dts method. Because there were still
many old platforms using the dts method. We need to clean up
their dts files, verify module parameters on them, and convert
them to the new method gradually in case of breaking any
function.

Signed-off-by: Yangbo Lu 
---
 drivers/ptp/ptp_qoriq.c   |  117 +++-
 include/linux/fsl/ptp_qoriq.h |1 +
 2 files changed, 115 insertions(+), 3 deletions(-)

diff --git a/drivers/ptp/ptp_qoriq.c b/drivers/ptp/ptp_qoriq.c
index a14c317..22baf83 100644
--- a/drivers/ptp/ptp_qoriq.c
+++ b/drivers/ptp/ptp_qoriq.c
@@ -29,9 +29,30 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
+static unsigned int cksel = DEFAULT_CKSEL;
+module_param(cksel, uint, 0644);
+MODULE_PARM_DESC(cksel, "Select reference clock");
+
+static unsigned int clk_src;
+module_param(clk_src, uint, 0644);
+MODULE_PARM_DESC(clk_src, "Reference clock frequency (if clocks property not 
provided in dts)");
+
+static unsigned int tmr_prsc = 2;
+module_param(tmr_prsc, uint, 0644);
+MODULE_PARM_DESC(tmr_prsc, "Output clock division/prescale factor");
+
+static unsigned int tmr_fiper1 = 10;
+module_param(tmr_fiper1, uint, 0644);
+MODULE_PARM_DESC(tmr_fiper1, "Desired fixed interval pulse period (ns)");
+
+static unsigned int tmr_fiper2 = 10;
+module_param(tmr_fiper2, uint, 0644);
+MODULE_PARM_DESC(tmr_fiper2, "Desired fixed interval pulse period (ns)");
+
 /*
  * Register access functions
  */
@@ -317,6 +338,91 @@ static int ptp_qoriq_enable(struct ptp_clock_info *ptp,
.enable = ptp_qoriq_enable,
 };
 
+/**
+ * qoriq_ptp_nominal_freq - calculate nominal frequency by reference clock
+ * frequency
+ *
+ * @clk_src: reference clock frequency
+ *
+ * The nominal frequency is the desired clock frequency.
+ * It should be less than the reference clock frequency.
+ * It should be a factor of 1000MHz.
+ *
+ * Return the nominal frequency
+ */
+static u32 qoriq_ptp_nominal_freq(u32 clk_src)
+{
+   u32 remainder = 0;
+
+   clk_src /= 100;
+   remainder = clk_src % 100;
+   if (remainder) {
+   clk_src -= remainder;
+   clk_src += 100;
+   }
+
+   do {
+   clk_src -= 100;
+
+   } while (1000 % clk_src);
+
+   return clk_src * 100;
+}
+
+static int qoriq_ptp_config(struct qoriq_ptp *qoriq_ptp,
+   struct device_node *node)
+{
+   struct clk *clk;
+   u64 freq_comp;
+   u64 max_adj;
+   u32 nominal_freq;
+
+   qoriq_ptp->cksel = cksel;
+
+   if (clk_src) {
+   qoriq_ptp->clk_src = clk_src;
+   } else {
+   clk = of_clk_get(node, 0);
+   if (!IS_ERR(clk)) {
+   qoriq_ptp->clk_src = clk_get_rate(clk);
+   clk_put(clk);
+   }
+   }
+
+   if (qoriq_ptp->clk_src <= 1UL) {
+   pr_err("error reference clock value, or lower than 100MHz\n");
+   return -EINVAL;
+   }
+
+   nominal_freq = qoriq_ptp_nominal_freq(qoriq_ptp->clk_src);
+   if (!nominal_freq)
+   return -EINVAL;
+
+   qoriq_ptp->tclk_period = 10UL / nominal_freq;
+   qoriq_ptp->tmr_prsc = tmr_prsc;
+
+   /* Calculate initial frequency compensation value for TMR_ADD register.
+* freq_comp = ceil(2^32 / freq_ratio)
+* freq_ratio = reference_clock_freq / nominal_freq
+*/
+   freq_comp = ((u64)1 << 32) * nominal_freq;
+   if (do_div(freq_comp, qoriq_ptp->clk_src))
+   freq_comp++;
+
+   qoriq_ptp->tmr_add = freq_comp;
+   qoriq_ptp->tmr_fiper1 = tmr_fiper1 - qoriq_ptp->tclk_period;
+   qoriq_ptp->tmr_fiper2 = tmr_fiper2 - qoriq_ptp->tclk_period;
+
+   /* max_adj = 10 * (freq_ratio - 1.0) - 1
+* freq_ratio = reference_clock_freq / nominal_freq
+*/
+   max_adj = 10ULL * (qoriq_ptp->clk_src - nominal_freq);
+   max_adj = max_adj

Re: [PATCH] powerpc/mm: Don't report PUDs as memory leaks when using kmemleak

2018-07-30 Thread Paul Menzel

Dear Michael,


On 07/30/18 08:43, Michael Ellerman wrote:
> Paul Menzel  writes:
>> Am 19.07.2018 um 16:33 schrieb Michael Ellerman:
> ...
>>>
>>> The fix is fairly simple. We need to tell kmemleak to ignore PUD
>>> allocations and never report them as leaks. We can also tell it not to
>>> scan the PGD, because it will never find pointers in there. However it
>>> will still notice if we allocate a PGD and then leak it.
>>>
>>> Reported-by: Paul Menzel 
>>> Signed-off-by: Michael Ellerman  > ---
>>>   arch/powerpc/include/asm/book3s/64/pgalloc.h | 23 +--
>>>   1 file changed, 21 insertions(+), 2 deletions(-)
>>
>> […]
>>
>> Tested-by: Paul Menzel  on IBM S822LC
> 
> Thanks.

No problem. I forgot to add, that it’d be great, if you tagged this
for the stable series too.

Cc: sta...@vger.kernel.org


Kind regards,

Paul



smime.p7s
Description: S/MIME Cryptographic Signature

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-07-30 Thread Christoph Hellwig

On Fri, Jul 27, 2018 at 10:58:05AM +0100, Will Deacon wrote:
> 
> I just wanted to say that this patch series provides a means for us to
> force the coherent DMA ops for legacy virtio devices on arm64, which in turn
> means that we can enable the SMMU with legacy devices in our fastmodel
> emulation platform (which is slowly being upgraded to virtio 1.0) without
> hanging during boot. Patch below.

Yikes, this is a nightmare.  That is exactly where I do not want things
to end up.  We really need to distinguish between legacy virtual crappy
virtio (and that includes v1) that totally ignores the bus it pretends
to be on, and sane virtio (to be defined) that sit on a real (or
properly emulated including iommu and details for dma mapping) bus.

Having a mumble jumble of arch specific undocumented magic as in
the powerpc patch replied to or this arm patch is a complete no-go.

Nacked-by: Christoph Hellwig 

for both.

Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively

2018-07-30 Thread Christoph Hellwig

> > +
> > +   if (xen_domain())
> > +   goto skip_override;
> > +
> > +   if (virtio_has_iommu_quirk(dev))
> > +   set_dma_ops(dev->dev.parent, _direct_dma_ops);
> > +
> > + skip_override:
> > +
> 
> I prefer normal if scoping as opposed to goto spaghetti pls.
> Better yet move vring_use_dma_api here and use it.
> Less of a chance something will break.

I agree about avoid pointless gotos here, but we can do things
perfectly well without either gotos or a confusing helper here
if we structure it right. E.g.:

// suitably detailed comment here
if (!xen_domain() &&
!virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
set_dma_ops(dev->dev.parent, _direct_dma_ops);

and while we're at it - modifying dma ops for the parent looks very
dangerous.  I don't think we can do that, as it could break iommu
setup interactions.  IFF we set a specific dma map ops it has to be
on the virtio device itself, of which we have full control.

Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively

2018-07-30 Thread Christoph Hellwig

> +const struct dma_map_ops virtio_direct_dma_ops;

This belongs into a header if it is non-static.  If you only
use it in this file anyway please mark it static and avoid a forward
declaration.

> +
>  int virtio_finalize_features(struct virtio_device *dev)
>  {
>   int ret = dev->config->finalize_features(dev);
> @@ -174,6 +176,9 @@ int virtio_finalize_features(struct virtio_device *dev)
>   if (ret)
>   return ret;
>  
> + if (virtio_has_iommu_quirk(dev))
> + set_dma_ops(dev->dev.parent, _direct_dma_ops);

This needs a big fat comment explaining what is going on here.

Also not new, but I find the existance of virtio_has_iommu_quirk and its
name horribly confusing.  It might be better to open code it here once
only a single caller is left.

Re: [RFC 1/4] virtio: Define virtio_direct_dma_ops structure

2018-07-30 Thread Christoph Hellwig

> +/*
> + * Virtio direct mapping DMA API operations structure
> + *
> + * This defines DMA API structure for all virtio devices which would not
> + * either bring in their own DMA OPS from architecture or they would not
> + * like to use architecture specific IOMMU based DMA OPS because QEMU
> + * expects GPA instead of an IOVA in absence of VIRTIO_F_IOMMU_PLATFORM.
> + */
> +dma_addr_t virtio_direct_map_page(struct device *dev, struct page *page,
> + unsigned long offset, size_t size,
> + enum dma_data_direction dir,
> + unsigned long attrs)

All these functions should probably be marked static.

> +void virtio_direct_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
> + size_t size, enum dma_data_direction dir,
> + unsigned long attrs)
> +{
> +}

No need to implement no-op callbacks in struct dma_map_ops.

> +
> +int virtio_direct_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
> +{
> + return 0;
> +}

Including this one.

> +void *virtio_direct_alloc(struct device *dev, size_t size, dma_addr_t 
> *dma_handle,
> + gfp_t gfp, unsigned long attrs)
> +{
> + void *queue = alloc_pages_exact(PAGE_ALIGN(size), gfp);
> +
> + if (queue) {
> + phys_addr_t phys_addr = virt_to_phys(queue);
> + *dma_handle = (dma_addr_t)phys_addr;
> +
> + if (WARN_ON_ONCE(*dma_handle != phys_addr)) {
> + free_pages_exact(queue, PAGE_ALIGN(size));
> + return NULL;
> + }
> + }
> + return queue;

queue is a very odd name in a generic memory allocator.

> +void virtio_direct_free(struct device *dev, size_t size, void *vaddr,
> + dma_addr_t dma_addr, unsigned long attrs)
> +{
> + free_pages_exact(vaddr, PAGE_ALIGN(size));
> +}
> +
> +const struct dma_map_ops virtio_direct_dma_ops = {
> + .alloc  = virtio_direct_alloc,
> + .free   = virtio_direct_free,
> + .map_page   = virtio_direct_map_page,
> + .unmap_page = virtio_direct_unmap_page,
> + .mapping_error  = virtio_direct_mapping_error,
> +};

This is missing a dma_map_sg implementation.  In general this is
mandatory for dma_ops.  So either you implement it or explain in
a common why you think you can skip it.

> +EXPORT_SYMBOL(virtio_direct_dma_ops);

EXPORT_SYMBOL_GPL like all virtio symbols, please.

Re: Infinite looping observed in __offline_pages

2018-07-30 Thread Michal Hocko

On Fri 27-07-18 12:32:59, John Allen wrote:
> On Wed, Jul 25, 2018 at 10:03:36PM +0200, Michal Hocko wrote:
> > On Wed 25-07-18 13:11:15, John Allen wrote:
> > [...]
> > > Does a failure in do_migrate_range indicate that the range is unmigratable
> > > and the loop in __offline_pages should terminate and goto failed_removal? 
> > > Or
> > > should we allow a certain number of retrys before we
> > > give up on migrating the range?
> > 
> > Unfortunatelly not. Migration code doesn't tell a difference between
> > ephemeral and permanent failures. We are relying on
> > start_isolate_page_range to tell us this. So the question is, what kind
> > of page is not migratable and for what reason.
> > 
> > Are you able to add some debugging to give us more information. The
> > current debugging code in the hotplug/migration sucks...
> 
> After reproducing the problem a couple times, it seems that it can occur for
> different types of pages. Running page-types on the offending page over two
> separate instances produced the following:
> 
> # tools/vm/page-types -a 307968-308224
> flags page-count   MB  symbolic-flags 
> long-symbolic-flags
> 0x0400 10  
> __Bbuddy
>total   10

Huh! How come a buddy page has non zero reference count.
> 
> And the following on a separate run:
> 
> # tools/vm/page-types -a 313088-313344
> flags page-count   MB  symbolic-flags 
> long-symbolic-flags
> 0x006c 10  
> __RU_lA
> referenced,uptodate,lru,active
> total  10

Hmm, what is the expected page count in this case? Seeing 1 doesn't look
particularly wrong.
-- 
Michal Hocko
SUSE Labs

Re: [RFC PATCH kernel 0/5] powerpc/P9/vfio: Pass through NVIDIA Tesla V100

2018-07-30 Thread Alexey Kardashevskiy




On 11/07/2018 19:26, Alexey Kardashevskiy wrote:
> On Tue, 10 Jul 2018 16:37:15 -0600
> Alex Williamson  wrote:
> 
>> On Tue, 10 Jul 2018 14:10:20 +1000
>> Alexey Kardashevskiy  wrote:
>>
>>> On Thu, 7 Jun 2018 23:03:23 -0600
>>> Alex Williamson  wrote:
>>>   
 On Fri, 8 Jun 2018 14:14:23 +1000
 Alexey Kardashevskiy  wrote:
 
> On 8/6/18 1:44 pm, Alex Williamson wrote:  
>> On Fri, 8 Jun 2018 13:08:54 +1000
>> Alexey Kardashevskiy  wrote:
>> 
>>> On 8/6/18 8:15 am, Alex Williamson wrote:
 On Fri, 08 Jun 2018 07:54:02 +1000
 Benjamin Herrenschmidt  wrote:
   
> On Thu, 2018-06-07 at 11:04 -0600, Alex Williamson wrote:  
>>
>> Can we back up and discuss whether the IOMMU grouping of NVLink
>> connected devices makes sense?  AIUI we have a PCI view of these
>> devices and from that perspective they're isolated.  That's the view 
>> of
>> the device used to generate the grouping.  However, not visible to 
>> us,
>> these devices are interconnected via NVLink.  What isolation 
>> properties
>> does NVLink provide given that its entire purpose for existing seems 
>> to
>> be to provide a high performance link for p2p between devices?   
>>  
>
> Not entire. On POWER chips, we also have an nvlink between the device
> and the CPU which is running significantly faster than PCIe.
>
> But yes, there are cross-links and those should probably be accounted
> for in the grouping.  

 Then after we fix the grouping, can we just let the host driver manage
 this coherent memory range and expose vGPUs to guests?  The use case of
 assigning all 6 GPUs to one VM seems pretty limited.  (Might need to
 convince NVIDIA to support more than a single vGPU per VM though)  
 
>>>
>>> These are physical GPUs, not virtual sriov-alike things they are
>>> implementing as well elsewhere.
>>
>> vGPUs as implemented on M- and P-series Teslas aren't SR-IOV like
>> either.  That's why we have mdev devices now to implement software
>> defined devices.  I don't have first hand experience with V-series, but
>> I would absolutely expect a PCIe-based Tesla V100 to support vGPU.   
>>  
>
> So assuming V100 can do vGPU, you are suggesting ditching this patchset 
> and
> using mediated vGPUs instead, correct?  

 If it turns out that our PCIe-only-based IOMMU grouping doesn't
 account for lack of isolation on the NVLink side and we correct that,
 limiting assignment to sets of 3 interconnected GPUs, is that still a
 useful feature?  OTOH, it's entirely an NVIDIA proprietary decision
 whether they choose to support vGPU on these GPUs or whether they can
 be convinced to support multiple vGPUs per VM.
 
>>> My current understanding is that every P9 chip in that box has some 
>>> NVLink2
>>> logic on it so each P9 is directly connected to 3 GPUs via PCIe and
>>> 2xNVLink2, and GPUs in that big group are interconnected by NVLink2 
>>> links
>>> as well.
>>>
>>> From small bits of information I have it seems that a GPU can perfectly
>>> work alone and if the NVIDIA driver does not see these interconnects
>>> (because we do not pass the rest of the big 3xGPU group to this guest), 
>>> it
>>> continues with a single GPU. There is an "nvidia-smi -r" big reset 
>>> hammer
>>> which simply refuses to work until all 3 GPUs are passed so there is 
>>> some
>>> distinction between passing 1 or 3 GPUs, and I am trying (as we speak) 
>>> to
>>> get a confirmation from NVIDIA that it is ok to pass just a single GPU.
>>>
>>> So we will either have 6 groups (one per GPU) or 2 groups (one per
>>> interconnected group).
>>
>> I'm not gaining much confidence that we can rely on isolation between
>> NVLink connected GPUs, it sounds like you're simply expecting that
>> proprietary code from NVIDIA on a proprietary interconnect from NVIDIA
>> is going to play nice and nobody will figure out how to do bad things
>> because... obfuscation?  Thanks,
>
> Well, we already believe that a proprietary firmware of a sriov-capable
> adapter like Mellanox ConnextX is not doing bad things, how is this
> different in principle?  

 It seems like the scope and hierarchy are different.  Here we're
 talking about exposing big discrete devices, which are peers of one
 another (and have history of being reverse engineered), to userspace
 drivers.  Once handed to userspace, each of those devices needs to be
 considered untrusted.  In the case of SR-IOV, we typically have a
 trusted host driver

Re: powerpc: 32BIT vs. 64BIT (PPC32 vs. PPC64)

2018-07-30 Thread Masahiro Yamada

2018-07-07 23:59 GMT+09:00 Randy Dunlap :
> On 07/07/2018 05:13 AM, Nicholas Piggin wrote:
>> On Fri, 6 Jul 2018 21:58:29 -0700
>> Randy Dunlap  wrote:
>>
>>> On 07/06/2018 06:45 PM, Benjamin Herrenschmidt wrote:
 On Thu, 2018-07-05 at 14:30 -0700, Randy Dunlap wrote:
> Hi,
>
> Is there a good way (or a shortcut) to do something like:
>
> $ make ARCH=powerpc O=PPC32 [other_options] allmodconfig
>   to get a PPC32/32BIT allmodconfig
>
> and also be able to do:
>
> $make ARCH=powerpc O=PPC64 [other_options] allmodconfig
>   to get a PPC64/64BIT allmodconfig?

 Hrm... O= is for the separate build dir, so there much be something
 else.

 You mean having ARCH= aliases like ppc/ppc32 and ppc64 ?
>>>
>>> Yes.
>>>
 That would be a matter of overriding some .config defaults I suppose, I
 don't know how this is done on other archs.

 I see the aliasing trick in the Makefile but that's about it.

> Note that arch/x86, arch/sh, and arch/sparc have ways to do
> some flavor(s) of this (from Documentation/kbuild/kbuild.txt;
> sh and sparc based on a recent "fix" patch from me):

 I fail to see what you are actually talking about here ... sorry. Do
 you have concrete examples on x86 or sparc ? From what I can tell the
 "i386" or "sparc32/sparc64" aliases just change SRCARCH in Makefile and
 32 vs 64-bit is just a Kconfig option...
>>>
>>> Yes, your summary is mostly correct.
>>>
>>> I'm just looking for a way to do cross-compile builds that are close to
>>> ppc32 allmodconfig and ppc64 allmodconfig.
>>
>> Would there a problem with adding ARCH=ppc32 / ppc64 matching? This
>> seems to work...
>>
>> Thanks,
>> Nick
>
> Yes, this mostly works and is similar to a patch (my patch) on my test 
> machine.
> And they both work for allmodconfig, which is my primary build target.
>
> And they both have one little quirk that is confusing when the build target
> is defconfig:
>
> When ARCH=ppc32, the terminal output (stdout) is: (using O=PPC32)
>
> make[1]: Entering directory '/home/rdunlap/lnx/lnx-418-rc3/PPC32'
>   GEN ./Makefile
> *** Default configuration is based on 'ppc64_defconfig'   < NOTE <
> #
> # configuration written to .config
> #
> make[1]: Leaving directory '/home/rdunlap/lnx/lnx-418-rc3/PPC32'
>


Maybe, we can set one of ppc32 defconfigs to KBUILD_DEFCONFIG
if ARCH is ppc32 ?


ifeq ($(ARCH),ppc32)
   KBUILD_DEFCONFIG := (some reasonable 32bit machine _defconfig)
else
   KBUILD_DEFCONFIG := ppc64_defconfig
endif

ifeq ($(CROSS_COMPILE),)
KBUILD_DEFCONFIG := $(shell uname -m)_defconfig
endif


> I expect that can be fixed also.  :)
>
> And the written .config file is indeed for 32BIT, not 64BIT.
>
> Thanks, Nick.
>
>> ---
>>  Makefile   | 8 
>>  arch/powerpc/Kconfig   | 9 +
>>  arch/powerpc/platforms/Kconfig.cputype | 8 
>>  3 files changed, 17 insertions(+), 8 deletions(-)
>>
>> diff --git a/Makefile b/Makefile
>> index c5ce55cbc543..f97204aed17a 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -345,6 +345,14 @@ ifeq ($(ARCH),sh64)
>> SRCARCH := sh
>>  endif
>>
>> +# Additional ARCH settings for powerpc
>> +ifeq ($(ARCH),ppc32)
>> +   SRCARCH := powerpc
>> +endif
>> +ifeq ($(ARCH),ppc64)
>> +   SRCARCH := powerpc
>> +endif
>> +
>>  KCONFIG_CONFIG   ?= .config
>>  export KCONFIG_CONFIG
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 9f2b75fe2c2d..3405b1b122be 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -1,4 +1,13 @@
>>  # SPDX-License-Identifier: GPL-2.0
>> +
>> +config PPC64
>> + bool "64-bit kernel" if "$(ARCH)" = "powerpc"
>> + default "$(ARCH)" != "ppc32"
>> + select ZLIB_DEFLATE
>> + help
>> +   This option selects whether a 32-bit or a 64-bit kernel
>> +   will be built.
>> +
>>  source "arch/powerpc/platforms/Kconfig.cputype"
>>
>>  config PPC32
>> diff --git a/arch/powerpc/platforms/Kconfig.cputype 
>> b/arch/powerpc/platforms/Kconfig.cputype
>> index e6a1de521319..f6e5d6ef9782 100644
>> --- a/arch/powerpc/platforms/Kconfig.cputype
>> +++ b/arch/powerpc/platforms/Kconfig.cputype
>> @@ -1,12 +1,4 @@
>>  # SPDX-License-Identifier: GPL-2.0
>> -config PPC64
>> - bool "64-bit kernel"
>> - default n
>> - select ZLIB_DEFLATE
>> - help
>> -   This option selects whether a 32-bit or a 64-bit kernel
>> -   will be built.
>> -
>>  menu "Processor support"
>>  choice
>>   prompt "Processor Type"
>>
>
>
> --
> ~Randy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Regards
Masahiro Yamada

[PATCH] powerpc: do not redefined NEED_DMA_MAP_STATE

2018-07-30 Thread Christoph Hellwig

kernel/dma/Kconfig already defines NEED_DMA_MAP_STATE, just select it
from PPC64 and NOT_COHERENT_CACHE instead.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/Kconfig   | 3 ---
 arch/powerpc/platforms/Kconfig.cputype | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9f2b75fe2c2d..f9cae7edd735 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -884,9 +884,6 @@ config ZONE_DMA
bool
default y
 
-config NEED_DMA_MAP_STATE
-   def_bool (PPC64 || NOT_COHERENT_CACHE)
-
 config GENERIC_ISA_DMA
bool
depends on ISA_DMA_API
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index e6a1de521319..a2578bf8d560 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -3,6 +3,7 @@ config PPC64
bool "64-bit kernel"
default n
select ZLIB_DEFLATE
+   select NEED_DMA_MAP_STATE
help
  This option selects whether a 32-bit or a 64-bit kernel
  will be built.
@@ -386,6 +387,7 @@ config NOT_COHERENT_CACHE
depends on 4xx || PPC_8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
default n if PPC_47x
default y
+   select NEED_DMA_MAP_STATE
 
 config CHECK_CACHE_COHERENCY
bool
-- 
2.18.0

[PATCH] PCI: call dma_debug_add_bus for pci_bus_type in common code

2018-07-30 Thread Christoph Hellwig

There is nothing arch specific about PCI or dma-debug, so move this
call to common code just after registering the bus type.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/kernel/dma.c | 3 ---
 arch/sh/drivers/pci/pci.c | 2 --
 arch/x86/kernel/pci-dma.c | 3 ---
 drivers/pci/pci-driver.c  | 2 +-
 4 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 155170d70324..dbfc7056d7df 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -357,9 +357,6 @@ EXPORT_SYMBOL_GPL(dma_get_required_mask);
 
 static int __init dma_init(void)
 {
-#ifdef CONFIG_PCI
-   dma_debug_add_bus(_bus_type);
-#endif
 #ifdef CONFIG_IBMVIO
dma_debug_add_bus(_bus_type);
 #endif
diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
index e5b7437ab4af..8256626bc53c 100644
--- a/arch/sh/drivers/pci/pci.c
+++ b/arch/sh/drivers/pci/pci.c
@@ -160,8 +160,6 @@ static int __init pcibios_init(void)
for (hose = hose_head; hose; hose = hose->next)
pcibios_scanbus(hose);
 
-   dma_debug_add_bus(_bus_type);
-
pci_initialized = 1;
 
return 0;
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index ab5d9dd668d2..43f58632f123 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -155,9 +155,6 @@ static int __init pci_iommu_init(void)
 {
struct iommu_table_entry *p;
 
-#ifdef CONFIG_PCI
-   dma_debug_add_bus(_bus_type);
-#endif
x86_init.iommu.iommu_init();
 
for (p = __iommu_table; p < __iommu_table_end; p++) {
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 6792292b5fc7..bef17c3fca67 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1668,7 +1668,7 @@ static int __init pci_driver_init(void)
if (ret)
return ret;
 #endif
-
+   dma_debug_add_bus(_bus_type);
return 0;
 }
 postcore_initcall(pci_driver_init);
-- 
2.18.0

Re: [PATCH v4 09/11] macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver

2018-07-30 Thread Geert Uytterhoeven

Hi Michael,

On Mon, Jul 30, 2018 at 8:47 AM Michael Ellerman  wrote:
> Finn Thain  writes:
> > Now that the PowerMac via-pmu driver supports m68k PowerBooks,
> > switch over to that driver and remove the via-pmu68k driver.
> >
> > Cc: Geert Uytterhoeven 
> > Tested-by: Stan Johnson 
> > Signed-off-by: Finn Thain 
> > ---
> >  arch/m68k/configs/mac_defconfig   |   2 +-
> >  arch/m68k/configs/multi_defconfig |   2 +-
> >  arch/m68k/mac/config.c|   2 +-
> >  arch/m68k/mac/misc.c  |  48 +--
> >  drivers/macintosh/Kconfig |  13 +-
> >  drivers/macintosh/Makefile|   1 -
> >  drivers/macintosh/adb.c   |   2 +-
> >  drivers/macintosh/via-pmu68k.c| 846 
> > --
> >  include/uapi/linux/pmu.h  |   2 +-
> >  9 files changed, 14 insertions(+), 904 deletions(-)
> >  delete mode 100644 drivers/macintosh/via-pmu68k.c
>
> Geert are you OK with this and the other one that touches arch/m68k ?

Sure, feel free to take them through the PPC tree.
Acked-by: Geert Uytterhoeven 

Thanks!

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH v4 09/11] macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver

2018-07-30 Thread Geert Uytterhoeven

On Mon, Jul 30, 2018 at 9:26 AM Geert Uytterhoeven  wrote:
> On Mon, Jul 2, 2018 at 10:21 AM Finn Thain  wrote:
> > Now that the PowerMac via-pmu driver supports m68k PowerBooks,
> > switch over to that driver and remove the via-pmu68k driver.
> >
> > Cc: Geert Uytterhoeven 
> > Tested-by: Stan Johnson 
> > Signed-off-by: Finn Thain 
>
> Acked-by: Geert Uytterhoeven 

Reviewed-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH v4 09/11] macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver

2018-07-30 Thread Geert Uytterhoeven

On Mon, Jul 2, 2018 at 10:21 AM Finn Thain  wrote:
> Now that the PowerMac via-pmu driver supports m68k PowerBooks,
> switch over to that driver and remove the via-pmu68k driver.
>
> Cc: Geert Uytterhoeven 
> Tested-by: Stan Johnson 
> Signed-off-by: Finn Thain 

Acked-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH v4 08/11] macintosh/via-pmu68k: Don't load driver on unsupported hardware

2018-07-30 Thread Geert Uytterhoeven

On Mon, Jul 2, 2018 at 10:21 AM Finn Thain  wrote:
> Don't load the via-pmu68k driver on early PowerBooks. The M50753 PMU
> device found in those models was never supported by this driver.
> Attempting to load the driver usually causes a boot hang.
>
> Cc: Geert Uytterhoeven 
> Signed-off-by: Finn Thain 
> Reviewed-by: Michael Schmitz 

Acked-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

RE: [PATCH] Adds __init annotation at mmu_init_secondary func

2018-07-30 Thread Michael Ellerman

Alexey Spirkov  writes:

> Without any additional option 
>
> WARNING: modpost: Found 1 section mismatch(es).
>
> If detailed debug is switched on than:
>
> WARNING: vmlinux.o(.text+0x142ac): Section mismatch in reference from the 
> function mmu_init_secondary() to the function .init.text:ppc44x_pin_tlb()
> The function mmu_init_secondary() references
> the function __init ppc44x_pin_tlb().
> This is often because mmu_init_secondary lacks a __init 
> annotation or the annotation of ppc44x_pin_tlb is wrong.

Ah right, thanks.

I checked ppc47x_pin_tlb() but didn't spot the call to ppc44x_pin_tlb().

cheers

Re: [PATCH v4 09/11] macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver

2018-07-30 Thread Michael Ellerman

Finn Thain  writes:

> Now that the PowerMac via-pmu driver supports m68k PowerBooks,
> switch over to that driver and remove the via-pmu68k driver.
>
> Cc: Geert Uytterhoeven 
> Tested-by: Stan Johnson 
> Signed-off-by: Finn Thain 
> ---
>  arch/m68k/configs/mac_defconfig   |   2 +-
>  arch/m68k/configs/multi_defconfig |   2 +-
>  arch/m68k/mac/config.c|   2 +-
>  arch/m68k/mac/misc.c  |  48 +--
>  drivers/macintosh/Kconfig |  13 +-
>  drivers/macintosh/Makefile|   1 -
>  drivers/macintosh/adb.c   |   2 +-
>  drivers/macintosh/via-pmu68k.c| 846 
> --
>  include/uapi/linux/pmu.h  |   2 +-
>  9 files changed, 14 insertions(+), 904 deletions(-)
>  delete mode 100644 drivers/macintosh/via-pmu68k.c

Geert are you OK with this and the other one that touches arch/m68k ?

cheers

Re: [PATCH] powerpc/mm: Don't report PUDs as memory leaks when using kmemleak

2018-07-30 Thread Michael Ellerman

Paul Menzel  writes:
> Am 19.07.2018 um 16:33 schrieb Michael Ellerman:
...
>> 
>> The fix is fairly simple. We need to tell kmemleak to ignore PUD
>> allocations and never report them as leaks. We can also tell it not to
>> scan the PGD, because it will never find pointers in there. However it
>> will still notice if we allocate a PGD and then leak it.
>> 
>> Reported-by: Paul Menzel 
>> Signed-off-by: Michael Ellerman  > ---
>>   arch/powerpc/include/asm/book3s/64/pgalloc.h | 23 +--
>>   1 file changed, 21 insertions(+), 2 deletions(-)
>
> […]
>
> Tested-by: Paul Menzel  on IBM S822LC

Thanks.

cheers

Re: [PATCH] powerpc/mobility: Fix node detach/rename problem

2018-07-30 Thread Michael Ellerman

Michael Bringmann  writes:

> During LPAR migration, the content of the device tree/sysfs may
> be updated including deletion and replacement of nodes in the
> tree.  When nodes are added to the internal node structures, they
> are appended in FIFO order to a list of nodes maintained by the
> OF code APIs.

That hasn't been true for several years. The data structure is an n-ary
tree. What kernel version are you working on?

> When nodes are removed from the device tree, they
> are marked OF_DETACHED, but not actually deleted from the system
> to allow for pointers cached elsewhere in the kernel.  The order
> and content of the entries in the list of nodes is not altered,
> though.

Something is going wrong if this is actually happening.

When the node is detached it should be *detached* from the tree of all
nodes, so it should not be discoverable other than by having an existing
pointer to it.

That's what __of_detach_node() does:

parent = np->parent;
if (WARN_ON(!parent))
return;

if (parent->child == np)
parent->child = np->sibling;
else {
struct device_node *prevsib;
for (prevsib = np->parent->child;
 prevsib->sibling != np;
 prevsib = prevsib->sibling)
;
prevsib->sibling = np->sibling;
}

ie. the node must already have a NULL parent, and then it is spliced out
of its parent's child list.

Please give us more info so we can work out what's actually happening.

cheers

84 matches

Mail list logo