Re: [GIT PULL 0/5] perf/urgent fixes
* Arnaldo Carvalho de Melo wrote:

> Hi Ingo,
>
> Please consider pulling, just to get the build without warnings
> and finishing successfully in all my test environments,
>
> - Arnaldo
>
> Test results at the end of this message, as usual.
>
> The following changes since commit 7f635ff187ab6be0b350b3ec06791e376af238ab:
>
>   perf/core: Fix crash when using HW tracing kernel filters (2018-07-25 11:46:22 +0200)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-urgent-for-mingo-4.18-20180730
>
> for you to fetch changes up to 44fe619b1418ff4e9d2f9518a940fbe2fb686a08:
>
>   perf tools: Fix the build on the alpine:edge distro (2018-07-30 13:15:03 -0300)
>
> perf/urgent fixes: (Arnaldo Carvalho de Melo)
>
> - Update the tools copy of several files, including perf_event.h,
>   powerpc's asm/unistd.h (new io_pgetevents syscall), bpf.h and
>   x86's memcpy_64.S (used in 'perf bench mem'), silencing the
>   respective warnings during the perf tools build.
>
> - Fix the build on the alpine:edge distro.
>
> Signed-off-by: Arnaldo Carvalho de Melo
>
> Arnaldo Carvalho de Melo (5):
>       tools headers uapi: Update tools's copy of linux/perf_event.h
>       tools headers powerpc: Update asm/unistd.h copy to pick new
>       tools headers uapi: Refresh linux/bpf.h copy
>       tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'
>       perf tools: Fix the build on the alpine:edge distro
>
>  tools/arch/powerpc/include/uapi/asm/unistd.h |   1 +
>  tools/arch/x86/include/asm/mcsafe_test.h     |  13
>  tools/arch/x86/lib/memcpy_64.S               | 112 +--
>  tools/include/uapi/linux/bpf.h               |  28 +--
>  tools/include/uapi/linux/perf_event.h        |   2 +
>  tools/perf/arch/x86/util/pmu.c               |   1 +
>  tools/perf/arch/x86/util/tsc.c               |   1 +
>  tools/perf/bench/Build                       |   1 +
>  tools/perf/bench/mem-memcpy-x86-64-asm.S     |   1 +
>  tools/perf/bench/mem-memcpy-x86-64-lib.c     |  24 ++
>  tools/perf/perf.h                            |   1 +
>  tools/perf/util/header.h                     |   1 +
>  tools/perf/util/namespaces.h                 |   1 +
>  13 files changed, 124 insertions(+), 63 deletions(-)
>  create mode 100644 tools/arch/x86/include/asm/mcsafe_test.h
>  create mode 100644 tools/perf/bench/mem-memcpy-x86-64-lib.c

Pulled, thanks a lot Arnaldo!

	Ingo
Re: [PATCH 1/2] selftests/powerpc: Skip earlier in alignment_handler test
On 31/07/18 14:41, Michael Ellerman wrote:
> Currently the alignment_handler test prints "Can't open /dev/fb0" about
> 80 times per run, which is a little annoying.
>
> Refactor it to check earlier if it can open /dev/fb0 and skip if not.
> This results in each test printing something like:
>
>   test: test_alignment_handler_vsx_206
>   tags: git_version:v4.18-rc3-134-gfb21a48904aa
>   [SKIP] Test skipped on line 291
>   skip: test_alignment_handler_vsx_206
>
> Signed-off-by: Michael Ellerman

This seems sensible.

Acked-by: Andrew Donnellan

> ---
>  .../powerpc/alignment/alignment_handler.c | 40 +++---
>  1 file changed, 35 insertions(+), 5 deletions(-)
>
> diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
> index 0f2698f9fd6d..0eddd16af49f 100644
> --- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
> +++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
> @@ -40,6 +40,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -191,7 +192,7 @@ int test_memcmp(void *s1, void *s2, int n, int offset, char *test_name)
>   */
>  int do_test(char *test_name, void (*test_func)(char *, char *))
>  {
> -    int offset, width, fd, rc = 0, r;
> +    int offset, width, fd, rc, r;
>      void *mem0, *mem1, *ci0, *ci1;
>
>      printf("\tDoing %s:\t", test_name);
> @@ -199,8 +200,8 @@ int do_test(char *test_name, void (*test_func)(char *, char *))
>      fd = open("/dev/fb0", O_RDWR);
>      if (fd < 0) {
>          printf("\n");
> -        perror("Can't open /dev/fb0");
> -        SKIP_IF(1);
> +        perror("Can't open /dev/fb0 now?");
> +        return 1;
>      }
>
>      ci0 = mmap(NULL, bufsize, PROT_WRITE, MAP_SHARED,
> @@ -226,6 +227,7 @@ int do_test(char *test_name, void (*test_func)(char *, char *))
>          return rc;
>      }
>
> +    rc = 0;
>      /* offset = 0 no alignment fault, so skip */
>      for (offset = 1; offset < 16; offset++) {
>          width = 16; /* vsx == 16 bytes */
> @@ -244,32 +246,50 @@ int do_test(char *test_name, void (*test_func)(char *, char *))
>          r |= test_memcpy(mem1, mem0, width, offset, test_func);
>          if (r && !debug) {
>              printf("FAILED: Got signal");
> +            rc = 1;
>              break;
>          }
>          r |= test_memcmp(mem1, ci1, width, offset, test_name);
> -        rc |= r;
>          if (r && !debug) {
>              printf("FAILED: Wrong Data");
> +            rc = 1;
>              break;
>          }
>      }
> -    if (!r)
> +
> +    if (rc == 0)
>          printf("PASSED");
> +
>      printf("\n");
>      munmap(ci0, bufsize);
>      munmap(ci1, bufsize);
>      free(mem0);
>      free(mem1);
> +    close(fd);

Good catch :D

>      return rc;
>  }
>
> +static bool can_open_fb0(void)
> +{
> +    int fd;
> +
> +    fd = open("/dev/fb0", O_RDWR);
> +    if (fd < 0)
> +        return false;
> +
> +    close(fd);
> +    return true;
> +}
> +
>  int test_alignment_handler_vsx_206(void)
>  {
>      int rc = 0;
>
> +    SKIP_IF(!can_open_fb0());
> +
>      printf("VSX: 2.06B\n");
>      LOAD_VSX_XFORM_TEST(lxvd2x);
>      LOAD_VSX_XFORM_TEST(lxvw4x);
> @@ -285,6 +305,8 @@ int test_alignment_handler_vsx_207(void)
>  {
>      int rc = 0;
>
> +    SKIP_IF(!can_open_fb0());
> +
>      printf("VSX: 2.07B\n");
>      LOAD_VSX_XFORM_TEST(lxsspx);
>      LOAD_VSX_XFORM_TEST(lxsiwax);
> @@ -298,6 +320,8 @@ int test_alignment_handler_vsx_300(void)
>  {
>      int rc = 0;
>
> +    SKIP_IF(!can_open_fb0());
> +
>      SKIP_IF(!have_hwcap2(PPC_FEATURE2_ARCH_3_00));
>      printf("VSX: 3.00B\n");
>      LOAD_VMX_DFORM_TEST(lxsd);
> @@ -328,6 +352,8 @@ int test_alignment_handler_integer(void)
>  {
>      int rc = 0;
>
> +    SKIP_IF(!can_open_fb0());
> +
>      printf("Integer\n");
>      LOAD_DFORM_TEST(lbz);
>      LOAD_DFORM_TEST(lbzu);
> @@ -383,6 +409,8 @@ int test_alignment_handler_vmx(void)
>  {
>      int rc = 0;
>
> +    SKIP_IF(!can_open_fb0());
> +
>      printf("VMX\n");
>      LOAD_VMX_XFORM_TEST(lvx);
> @@ -408,6 +436,8 @@ int test_alignment_handler_fp(void)
>  {
>      int rc = 0;
>
> +    SKIP_IF(!can_open_fb0());
> +
>      printf("Floating point\n");
>      LOAD_FLOAT_DFORM_TEST(lfd);
>      LOAD_FLOAT_XFORM_TEST(lfdx);

--
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited
[PATCH 2/2] selftests/powerpc: Add more version checks to alignment_handler test
The alignment_handler is documented to only work on Power8/Power9, but
we can make it run on older CPUs by guarding more of the tests with
feature checks.

Signed-off-by: Michael Ellerman
---
 .../powerpc/alignment/alignment_handler.c | 76 +++---
 1 file changed, 68 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index 0eddd16af49f..bcecc44627af 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -49,6 +49,8 @@
 #include
 #include

+#include
+
 #include "utils.h"

 int bufsize;
@@ -289,6 +291,7 @@ int test_alignment_handler_vsx_206(void)
     int rc = 0;

     SKIP_IF(!can_open_fb0());
+    SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));

     printf("VSX: 2.06B\n");
     LOAD_VSX_XFORM_TEST(lxvd2x);
@@ -306,6 +309,7 @@ int test_alignment_handler_vsx_207(void)
     int rc = 0;

     SKIP_IF(!can_open_fb0());
+    SKIP_IF(!have_hwcap2(PPC_FEATURE2_ARCH_2_07));

     printf("VSX: 2.07B\n");
     LOAD_VSX_XFORM_TEST(lxsspx);
@@ -380,7 +384,6 @@ int test_alignment_handler_integer(void)
     LOAD_DFORM_TEST(ldu);
     LOAD_XFORM_TEST(ldx);
     LOAD_XFORM_TEST(ldux);
-    LOAD_XFORM_TEST(ldbrx);
     LOAD_DFORM_TEST(lmw);
     STORE_DFORM_TEST(stb);
     STORE_XFORM_TEST(stbx);
@@ -400,8 +403,28 @@ int test_alignment_handler_integer(void)
     STORE_XFORM_TEST(stdx);
     STORE_DFORM_TEST(stdu);
     STORE_XFORM_TEST(stdux);
-    STORE_XFORM_TEST(stdbrx);
     STORE_DFORM_TEST(stmw);
+
+    if (have_hwcap(PPC_FEATURE_ARCH_2_06)) {
+        LOAD_XFORM_TEST(ldbrx);
+        STORE_XFORM_TEST(stdbrx);
+    }
+
+    return rc;
+}
+
+int test_alignment_handler_integer_206(void)
+{
+    int rc = 0;
+
+    SKIP_IF(!can_open_fb0());
+    SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
+
+    printf("Integer: 2.06\n");
+
+    LOAD_XFORM_TEST(ldbrx);
+    STORE_XFORM_TEST(stdbrx);
+
     return rc;
 }

@@ -410,6 +433,7 @@ int test_alignment_handler_vmx(void)
     int rc = 0;

     SKIP_IF(!can_open_fb0());
+    SKIP_IF(!have_hwcap(PPC_FEATURE_HAS_ALTIVEC));

     printf("VMX\n");
     LOAD_VMX_XFORM_TEST(lvx);
@@ -441,20 +465,14 @@ int test_alignment_handler_fp(void)
     printf("Floating point\n");
     LOAD_FLOAT_DFORM_TEST(lfd);
     LOAD_FLOAT_XFORM_TEST(lfdx);
-    LOAD_FLOAT_DFORM_TEST(lfdp);
-    LOAD_FLOAT_XFORM_TEST(lfdpx);
     LOAD_FLOAT_DFORM_TEST(lfdu);
     LOAD_FLOAT_XFORM_TEST(lfdux);
     LOAD_FLOAT_DFORM_TEST(lfs);
     LOAD_FLOAT_XFORM_TEST(lfsx);
     LOAD_FLOAT_DFORM_TEST(lfsu);
     LOAD_FLOAT_XFORM_TEST(lfsux);
-    LOAD_FLOAT_XFORM_TEST(lfiwzx);
-    LOAD_FLOAT_XFORM_TEST(lfiwax);
     STORE_FLOAT_DFORM_TEST(stfd);
     STORE_FLOAT_XFORM_TEST(stfdx);
-    STORE_FLOAT_DFORM_TEST(stfdp);
-    STORE_FLOAT_XFORM_TEST(stfdpx);
     STORE_FLOAT_DFORM_TEST(stfdu);
     STORE_FLOAT_XFORM_TEST(stfdux);
     STORE_FLOAT_DFORM_TEST(stfs);
@@ -466,6 +484,42 @@ int test_alignment_handler_fp(void)
     return rc;
 }

+int test_alignment_handler_fp_205(void)
+{
+    int rc = 0;
+
+    SKIP_IF(!can_open_fb0());
+    SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_05));
+
+    printf("Floating point: 2.05\n");
+
+    LOAD_FLOAT_DFORM_TEST(lfdp);
+    LOAD_FLOAT_XFORM_TEST(lfdpx);
+    LOAD_FLOAT_XFORM_TEST(lfiwax);
+    STORE_FLOAT_DFORM_TEST(stfdp);
+    STORE_FLOAT_XFORM_TEST(stfdpx);
+
+    if (have_hwcap(PPC_FEATURE_ARCH_2_06)) {
+        LOAD_FLOAT_XFORM_TEST(lfiwzx);
+    }
+
+    return rc;
+}
+
+int test_alignment_handler_fp_206(void)
+{
+    int rc = 0;
+
+    SKIP_IF(!can_open_fb0());
+    SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
+
+    printf("Floating point: 2.06\n");
+
+    LOAD_FLOAT_XFORM_TEST(lfiwzx);
+
+    return rc;
+}
+
 void usage(char *prog)
 {
     printf("Usage: %s [options]\n", prog);
@@ -513,9 +567,15 @@ int main(int argc, char *argv[])
                    "test_alignment_handler_vsx_300");
     rc |= test_harness(test_alignment_handler_integer,
                "test_alignment_handler_integer");
+    rc |= test_harness(test_alignment_handler_integer_206,
+               "test_alignment_handler_integer_206");
     rc |= test_harness(test_alignment_handler_vmx,
                "test_alignment_handler_vmx");
     rc |= test_harness(test_alignment_handler_fp,
                "test_alignment_handler_fp");
+    rc |= test_harness(test_alignment_handler_fp_205,
+               "test_alignment_handler_fp_205");
+    rc |= test_harness(test_alignment_handler_fp_206,
+               "test_alignment_handler_fp_206");
[PATCH 1/2] selftests/powerpc: Skip earlier in alignment_handler test
Currently the alignment_handler test prints "Can't open /dev/fb0" about
80 times per run, which is a little annoying.

Refactor it to check earlier if it can open /dev/fb0 and skip if not.
This results in each test printing something like:

  test: test_alignment_handler_vsx_206
  tags: git_version:v4.18-rc3-134-gfb21a48904aa
  [SKIP] Test skipped on line 291
  skip: test_alignment_handler_vsx_206

Signed-off-by: Michael Ellerman
---
 .../powerpc/alignment/alignment_handler.c | 40 +++---
 1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index 0f2698f9fd6d..0eddd16af49f 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -40,6 +40,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -191,7 +192,7 @@ int test_memcmp(void *s1, void *s2, int n, int offset, char *test_name)
  */
 int do_test(char *test_name, void (*test_func)(char *, char *))
 {
-    int offset, width, fd, rc = 0, r;
+    int offset, width, fd, rc, r;
     void *mem0, *mem1, *ci0, *ci1;

     printf("\tDoing %s:\t", test_name);
@@ -199,8 +200,8 @@ int do_test(char *test_name, void (*test_func)(char *, char *))
     fd = open("/dev/fb0", O_RDWR);
     if (fd < 0) {
         printf("\n");
-        perror("Can't open /dev/fb0");
-        SKIP_IF(1);
+        perror("Can't open /dev/fb0 now?");
+        return 1;
     }

     ci0 = mmap(NULL, bufsize, PROT_WRITE, MAP_SHARED,
@@ -226,6 +227,7 @@ int do_test(char *test_name, void (*test_func)(char *, char *))
         return rc;
     }

+    rc = 0;
     /* offset = 0 no alignment fault, so skip */
     for (offset = 1; offset < 16; offset++) {
         width = 16; /* vsx == 16 bytes */
@@ -244,32 +246,50 @@ int do_test(char *test_name, void (*test_func)(char *, char *))
         r |= test_memcpy(mem1, mem0, width, offset, test_func);
         if (r && !debug) {
             printf("FAILED: Got signal");
+            rc = 1;
             break;
         }
         r |= test_memcmp(mem1, ci1, width, offset, test_name);
-        rc |= r;
         if (r && !debug) {
             printf("FAILED: Wrong Data");
+            rc = 1;
             break;
         }
     }
-    if (!r)
+
+    if (rc == 0)
         printf("PASSED");
+
     printf("\n");
     munmap(ci0, bufsize);
     munmap(ci1, bufsize);
     free(mem0);
     free(mem1);
+    close(fd);

     return rc;
 }

+static bool can_open_fb0(void)
+{
+    int fd;
+
+    fd = open("/dev/fb0", O_RDWR);
+    if (fd < 0)
+        return false;
+
+    close(fd);
+    return true;
+}
+
 int test_alignment_handler_vsx_206(void)
 {
     int rc = 0;

+    SKIP_IF(!can_open_fb0());
+
     printf("VSX: 2.06B\n");
     LOAD_VSX_XFORM_TEST(lxvd2x);
     LOAD_VSX_XFORM_TEST(lxvw4x);
@@ -285,6 +305,8 @@ int test_alignment_handler_vsx_207(void)
 {
     int rc = 0;

+    SKIP_IF(!can_open_fb0());
+
     printf("VSX: 2.07B\n");
     LOAD_VSX_XFORM_TEST(lxsspx);
     LOAD_VSX_XFORM_TEST(lxsiwax);
@@ -298,6 +320,8 @@ int test_alignment_handler_vsx_300(void)
 {
     int rc = 0;

+    SKIP_IF(!can_open_fb0());
+
     SKIP_IF(!have_hwcap2(PPC_FEATURE2_ARCH_3_00));
     printf("VSX: 3.00B\n");
     LOAD_VMX_DFORM_TEST(lxsd);
@@ -328,6 +352,8 @@ int test_alignment_handler_integer(void)
 {
     int rc = 0;

+    SKIP_IF(!can_open_fb0());
+
     printf("Integer\n");
     LOAD_DFORM_TEST(lbz);
     LOAD_DFORM_TEST(lbzu);
@@ -383,6 +409,8 @@ int test_alignment_handler_vmx(void)
 {
     int rc = 0;

+    SKIP_IF(!can_open_fb0());
+
     printf("VMX\n");
     LOAD_VMX_XFORM_TEST(lvx);
@@ -408,6 +436,8 @@ int test_alignment_handler_fp(void)
 {
     int rc = 0;

+    SKIP_IF(!can_open_fb0());
+
     printf("Floating point\n");
     LOAD_FLOAT_DFORM_TEST(lfd);
     LOAD_FLOAT_XFORM_TEST(lfdx);
--
2.14.1
Re: [RFC PATCH kernel 0/5] powerpc/P9/vfio: Pass through NVIDIA Tesla V100
On 31/07/2018 02:29, Alex Williamson wrote:
> On Mon, 30 Jul 2018 18:58:49 +1000 Alexey Kardashevskiy wrote:
>
>> On 11/07/2018 19:26, Alexey Kardashevskiy wrote:
>>> On Tue, 10 Jul 2018 16:37:15 -0600 Alex Williamson wrote:
>>> On Tue, 10 Jul 2018 14:10:20 +1000 Alexey Kardashevskiy wrote:
> On Thu, 7 Jun 2018 23:03:23 -0600 Alex Williamson wrote:
>
>> On Fri, 8 Jun 2018 14:14:23 +1000 Alexey Kardashevskiy wrote:
>>
>>> On 8/6/18 1:44 pm, Alex Williamson wrote:
On Fri, 8 Jun 2018 13:08:54 +1000 Alexey Kardashevskiy wrote:
> On 8/6/18 8:15 am, Alex Williamson wrote:
>> On Fri, 08 Jun 2018 07:54:02 +1000 Benjamin Herrenschmidt wrote:
>>
>>> On Thu, 2018-06-07 at 11:04 -0600, Alex Williamson wrote:
>>>
Can we back up and discuss whether the IOMMU grouping of NVLink
connected devices makes sense? AIUI we have a PCI view of these
devices and from that perspective they're isolated. That's the view
of the device used to generate the grouping. However, not visible
to us, these devices are interconnected via NVLink. What isolation
properties does NVLink provide given that its entire purpose for
existing seems to be to provide a high performance link for p2p
between devices?
>>>
>>> Not entire. On POWER chips, we also have an nvlink between the device
>>> and the CPU which is running significantly faster than PCIe.
>>>
>>> But yes, there are cross-links and those should probably be accounted
>>> for in the grouping.
>>
>> Then after we fix the grouping, can we just let the host driver manage
>> this coherent memory range and expose vGPUs to guests? The use case of
>> assigning all 6 GPUs to one VM seems pretty limited. (Might need to
>> convince NVIDIA to support more than a single vGPU per VM though)
>>
> These are physical GPUs, not virtual sriov-alike things they are
> implementing as well elsewhere.

vGPUs as implemented on M- and P-series Teslas aren't SR-IOV like
either. That's why we have mdev devices now to implement software
defined devices. I don't have first hand experience with V-series,
but I would absolutely expect a PCIe-based Tesla V100 to support vGPU.
>>>
>>> So assuming V100 can do vGPU, you are suggesting ditching this patchset
>>> and using mediated vGPUs instead, correct?
>>
>> If it turns out that our PCIe-only-based IOMMU grouping doesn't
>> account for lack of isolation on the NVLink side and we correct that,
>> limiting assignment to sets of 3 interconnected GPUs, is that still a
>> useful feature? OTOH, it's entirely an NVIDIA proprietary decision
>> whether they choose to support vGPU on these GPUs or whether they can
>> be convinced to support multiple vGPUs per VM.
>>
> My current understanding is that every P9 chip in that box has some
> NVLink2 logic on it so each P9 is directly connected to 3 GPUs via
> PCIe and 2xNVLink2, and GPUs in that big group are interconnected by
> NVLink2 links as well.
>
> From small bits of information I have it seems that a GPU can
> perfectly work alone and if the NVIDIA driver does not see these
> interconnects (because we do not pass the rest of the big 3xGPU group
> to this guest), it continues with a single GPU. There is an
> "nvidia-smi -r" big reset hammer which simply refuses to work until
> all 3 GPUs are passed so there is some distinction between passing 1
> or 3 GPUs, and I am trying (as we speak) to get a confirmation from
> NVIDIA that it is ok to pass just a single GPU.
>
> So we will either have 6 groups (one per GPU) or 2 groups (one per
> interconnected group).

I'm not gaining much confidence that we can rely on isolation between
NVLink connected GPUs, it sounds like you're simply expecting that
proprietary code from NVIDIA on a proprietary interconnect from NVIDIA
is going to play nice and nobody will figure out how to do bad things
because... obfuscation?

Thanks,
>>>
>>> Well, we already believe that a proprietary firmware of a sriov-capable
>>> adapter like Mellanox ConnectX is not doing bad things, how is
Re: [RFC 1/4] virtio: Define virtio_direct_dma_ops structure
On 07/30/2018 02:54 PM, Christoph Hellwig wrote:
>> +/*
>> + * Virtio direct mapping DMA API operations structure
>> + *
>> + * This defines DMA API structure for all virtio devices which would not
>> + * either bring in their own DMA OPS from architecture or they would not
>> + * like to use architecture specific IOMMU based DMA OPS because QEMU
>> + * expects GPA instead of an IOVA in absence of VIRTIO_F_IOMMU_PLATFORM.
>> + */
>> +dma_addr_t virtio_direct_map_page(struct device *dev, struct page *page,
>> +                                  unsigned long offset, size_t size,
>> +                                  enum dma_data_direction dir,
>> +                                  unsigned long attrs)
>
> All these functions should probably be marked static.

Sure.

>> +void virtio_direct_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
>> +                              size_t size, enum dma_data_direction dir,
>> +                              unsigned long attrs)
>> +{
>> +}
>
> No need to implement no-op callbacks in struct dma_map_ops.

Okay.

>> +int virtio_direct_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
>> +{
>> +    return 0;
>> +}
>
> Including this one.
>
>> +void *virtio_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
>> +                          gfp_t gfp, unsigned long attrs)
>> +{
>> +    void *queue = alloc_pages_exact(PAGE_ALIGN(size), gfp);
>> +
>> +    if (queue) {
>> +        phys_addr_t phys_addr = virt_to_phys(queue);
>> +        *dma_handle = (dma_addr_t)phys_addr;
>> +
>> +        if (WARN_ON_ONCE(*dma_handle != phys_addr)) {
>> +            free_pages_exact(queue, PAGE_ALIGN(size));
>> +            return NULL;
>> +        }
>> +    }
>> +    return queue;
>
> queue is a very odd name in a generic memory allocator.

Will change it to addr.

>> +void virtio_direct_free(struct device *dev, size_t size, void *vaddr,
>> +                        dma_addr_t dma_addr, unsigned long attrs)
>> +{
>> +    free_pages_exact(vaddr, PAGE_ALIGN(size));
>> +}
>> +
>> +const struct dma_map_ops virtio_direct_dma_ops = {
>> +    .alloc = virtio_direct_alloc,
>> +    .free = virtio_direct_free,
>> +    .map_page = virtio_direct_map_page,
>> +    .unmap_page = virtio_direct_unmap_page,
>> +    .mapping_error = virtio_direct_mapping_error,
>> +};
>
> This is missing a dma_map_sg implementation. In general this is
> mandatory for dma_ops. So either you implement it or explain in
> a comment why you think you can skip it.

Hmm. IIUC the virtio core never uses dma_map_sg(). Am I missing something
here? The only reference to dma_map_sg() is inside a comment.

$ git grep dma_map_sg drivers/virtio/
drivers/virtio/virtio_ring.c:  * We can't use dma_map_sg, because we don't use scatterlists in

>> +EXPORT_SYMBOL(virtio_direct_dma_ops);
>
> EXPORT_SYMBOL_GPL like all virtio symbols, please.

I am planning to drop EXPORT_SYMBOL from the virtio_direct_dma_ops structure.
Re: [PATCH] powerpc: Add a checkpatch wrapper with our preferred settings
On Wed, 2018-07-25 at 00:03 +1000, Michael Ellerman wrote:
> This makes it easy to run checkpatch with settings that we have
> agreed on (bwhahahahah).
>
> Usage is eg:
>
>   $ ./arch/powerpc/tools/checkpatch.sh -g origin/master..
>
> To check all commits since origin/master.
>
> Signed-off-by: Michael Ellerman

Reviewed-by: Russell Currey
Re: [PATCH] powerpc/mobility: Fix node detach/rename problem
On 07/30/2018 05:59 PM, Tyrel Datwyler wrote:
> On 07/29/2018 06:11 AM, Michael Bringmann wrote:
>> During LPAR migration, the content of the device tree/sysfs may
>> be updated including deletion and replacement of nodes in the
>> tree. When nodes are added to the internal node structures, they
>> are appended in FIFO order to a list of nodes maintained by the
>> OF code APIs. When nodes are removed from the device tree, they
>> are marked OF_DETACHED, but not actually deleted from the system
>> to allow for pointers cached elsewhere in the kernel. The order
>> and content of the entries in the list of nodes is not altered,
>> though.
>>
>> During LPAR migration some common nodes are deleted and re-added
>> e.g. "ibm,platform-facilities". If a node is re-added to the OF
>> node lists, the of_attach_node function checks to make sure that
>> the name + ibm,phandle of the to-be-added data is unique. As the
>> previous copy of a re-added node is not modified beyond the addition
>> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING
>> notice to the console, (3) renames the to-be-added node to avoid
>> filename collisions within a directory, and (4) adds entries to
>> the sysfs/kernfs.
>
> So, this patch actually just band-aids over the real problem. This is a
> long-standing problem with several PFO drivers leaking references. The
> issue here is that, during the device tree update that follows a
> migration, the ibm,platform-facilities node and friends below it are
> always deleted and re-added on the destination LPAR, and subsequently
> the leaked references prevent the device nodes from ever actually being
> properly cleaned up after detach. Thus, leading to the issue you are
> observing.
>
> As an example, a quick look at nx-842-pseries.c reveals several issues.
>
> static int __init nx842_pseries_init(void)
> {
>     struct nx842_devdata *new_devdata;
>     int ret;
>
>     if (!of_find_compatible_node(NULL, NULL, "ibm,compression"))
>         return -ENODEV;
>
> This call to of_find_compatible_node() results in a node returned with
> refcount incremented and therefore immediately leaked.
>
> Further, the reconfig notifier logic makes the assumption that it only
> needs to deal with node updates, but as I outlined above the post
> migration device tree update always results in PFO nodes and properties
> being deleted and re-added.
>
> /**
>  * nx842_OF_notifier - Process updates to OF properties for the device
>  *
>  * @np: notifier block
>  * @action: notifier action
>  * @update: struct pSeries_reconfig_prop_update pointer if action is
>  * PSERIES_UPDATE_PROPERTY
>  *
>  * Returns:
>  * NOTIFY_OK on success
>  * NOTIFY_BAD encoded with error number on failure, use
>  * notifier_to_errno() to decode this value
>  */
> static int nx842_OF_notifier(struct notifier_block *np, unsigned long action,
>                              void *data)
> {
>     struct of_reconfig_data *upd = data;
>     struct nx842_devdata *local_devdata;
>     struct device_node *node = NULL;
>
>     rcu_read_lock();
>     local_devdata = rcu_dereference(devdata);
>     if (local_devdata)
>         node = local_devdata->dev->of_node;
>
>     if (local_devdata &&
>         action == OF_RECONFIG_UPDATE_PROPERTY &&
>         !strcmp(upd->dn->name, node->name)) {
>         rcu_read_unlock();
>         nx842_OF_upd(upd->prop);
>     } else
>         rcu_read_unlock();
>
>     return NOTIFY_OK;
> }
>
> I expect to find the same problems in pseries-rng.c and nx.c.

So, in actuality the main root of the problem is really in the vio core.
A node reference for each PFO device under "ibm,platform-facilities" is
taken during vio_register_device_node(). We need a reconfig notifier to
call vio_unregister_device() for each PFO device on detach, and
vio_bus_scan_register_devices("ibm,platform-facilities") on attach.
This will make sure the PFO vio devices are released such that
vio_dev_release() gets called to put the node reference that was taken
at original registration time.

/* vio_dev refcount hit 0 */
static void vio_dev_release(struct device *dev)
{
    struct iommu_table *tbl = get_iommu_table_base(dev);

    if (tbl)
        iommu_tce_table_put(tbl);
    of_node_put(dev->of_node);
    kfree(to_vio_dev(dev));
}

-Tyrel

>
> -Tyrel
>
>> This patch fixes the 'migration add' problem by changing the
>> stored 'phandle' of the OF_DETACHed node to 0 (reserved value for
>> of_find_node_by_phandle), so that subsequent re-add operations,
>> such as those during migration, do not find the detached node,
>> do not observe duplicate names, do not rename them, and the
>> extra WARNING notices are removed from the console output.
>>
>> In addition, it erases the 'name' field of the OF_DETACHED node,
>> to prevent any
[PATCH] powerpc/4xx: Fix error return path in ppc4xx_msi_probe()
An arbitrary error in ppc4xx_msi_probe() quite likely results in a crash
similar to the following, seen after dma_alloc_coherent() returned an
error.

  Unable to handle kernel paging request for data at address 0x
  Faulting instruction address: 0xc001bff0
  Oops: Kernel access of bad area, sig: 11 [#1]
  BE Canyonlands
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper Tainted: G W 4.18.0-rc6-00010-gff33d1030a6c #1
  NIP: c001bff0 LR: c001c418 CTR: c01faa7c
  REGS: cf82db40 TRAP: 0300 Tainted: G W (4.18.0-rc6-00010-gff33d1030a6c)
  MSR: 00029000 CR: 28002024 XER: DEAR: ESR:
  GPR00: c001c418 cf82dbf0 cf828000 cf8de400 00c4 00c4
  GPR08: c0481ea4 00c4 22002024 c00025e8
  GPR16: c0492380 004a
  GPR24: 00029000 000c cf8de410 c0494d60 c0494d60 cf8bebc0 0001
  NIP [c001bff0] ppc4xx_of_msi_remove+0x48/0xa0
  LR [c001c418] ppc4xx_msi_probe+0x294/0x3b8
  Call Trace:
  [cf82dbf0] [00029000] 0x29000 (unreliable)
  [cf82dc10] [c001c418] ppc4xx_msi_probe+0x294/0x3b8
  [cf82dc70] [c0209fbc] platform_drv_probe+0x40/0x9c
  [cf82dc90] [c0208240] driver_probe_device+0x2a8/0x350
  [cf82dcc0] [c0206204] bus_for_each_drv+0x60/0xac
  [cf82dcf0] [c0207e88] __device_attach+0xe8/0x160
  [cf82dd20] [c02071e0] bus_probe_device+0xa0/0xbc
  [cf82dd40] [c02050c8] device_add+0x404/0x5c4
  [cf82dd90] [c0288978] of_platform_device_create_pdata+0x88/0xd8
  [cf82ddb0] [c0288b70] of_platform_bus_create+0x134/0x220
  [cf82de10] [c0288bcc] of_platform_bus_create+0x190/0x220
  [cf82de70] [c0288cf4] of_platform_bus_probe+0x98/0xec
  [cf82de90] [c0449650] __machine_initcall_canyonlands_ppc460ex_device_probe+0x38/0x54
  [cf82dea0] [c0002404] do_one_initcall+0x40/0x188
  [cf82df00] [c043daec] kernel_init_freeable+0x130/0x1d0
  [cf82df30] [c0002600] kernel_init+0x18/0x104
  [cf82df40] [c000c23c] ret_from_kernel_thread+0x14/0x1c
  Instruction dump:
  90010024 813d0024 2f89 83c30058 41bd0014 4838 813d0024 7f89f800
  409d002c 813e000c 57ea103a 3bff0001 <7c69502e> 2f83 419effe0 4803b26d
  ---[ end trace 8cf551077ecfc42a ]---

Fix it up. Specifically,

- Return valid error codes from ppc4xx_setup_pcieh_hw(), have it clean
  up after itself, and only access hardware after all possible error
  conditions have been handled.
- Use devm_kzalloc() instead of kzalloc() in ppc4xx_msi_probe()

Signed-off-by: Guenter Roeck
---
 arch/powerpc/platforms/4xx/msi.c | 51 +++-
 1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/platforms/4xx/msi.c b/arch/powerpc/platforms/4xx/msi.c
index 81b2cbce7df8..7c324eff2f22 100644
--- a/arch/powerpc/platforms/4xx/msi.c
+++ b/arch/powerpc/platforms/4xx/msi.c
@@ -146,13 +146,19 @@ static int ppc4xx_setup_pcieh_hw(struct platform_device *dev,
     const u32 *sdr_addr;
     dma_addr_t msi_phys;
     void *msi_virt;
+    int err;

     sdr_addr = of_get_property(dev->dev.of_node, "sdr-base", NULL);
     if (!sdr_addr)
-        return -1;
+        return -EINVAL;

-    mtdcri(SDR0, *sdr_addr, upper_32_bits(res.start));     /*HIGH addr */
-    mtdcri(SDR0, *sdr_addr + 1, lower_32_bits(res.start)); /* Low addr */
+    msi_data = of_get_property(dev->dev.of_node, "msi-data", NULL);
+    if (!msi_data)
+        return -EINVAL;
+
+    msi_mask = of_get_property(dev->dev.of_node, "msi-mask", NULL);
+    if (!msi_mask)
+        return -EINVAL;

     msi->msi_dev = of_find_node_by_name(NULL, "ppc4xx-msi");
     if (!msi->msi_dev)
@@ -160,30 +166,30 @@ static int ppc4xx_setup_pcieh_hw(struct platform_device *dev,
     msi->msi_regs = of_iomap(msi->msi_dev, 0);
     if (!msi->msi_regs) {
-        dev_err(&dev->dev, "of_iomap problem failed\n");
-        return -ENOMEM;
+        dev_err(&dev->dev, "of_iomap failed\n");
+        err = -ENOMEM;
+        goto node_put;
     }
     dev_dbg(&dev->dev, "PCIE-MSI: msi register mapped 0x%x 0x%x\n",
         (u32) (msi->msi_regs + PEIH_TERMADH), (u32) (msi->msi_regs));

     msi_virt = dma_alloc_coherent(&dev->dev, 64, &msi_phys, GFP_KERNEL);
-    if (!msi_virt)
-        return -ENOMEM;
+    if (!msi_virt) {
+        err = -ENOMEM;
+        goto iounmap;
+    }
     msi->msi_addr_hi = upper_32_bits(msi_phys);
     msi->msi_addr_lo = lower_32_bits(msi_phys & 0xffffffff);
     dev_dbg(&dev->dev, "PCIE-MSI: msi address high 0x%x, low 0x%x\n",
         msi->msi_addr_hi, msi->msi_addr_lo);

+    mtdcri(SDR0, *sdr_addr, upper_32_bits(res.start));     /*HIGH addr */
+    mtdcri(SDR0, *sdr_addr + 1, lower_32_bits(res.start)); /* Low addr */
+
     /* Progam the Interrupt handler Termination addr registers */
     out_be32(msi->msi_regs + PEIH_TERMADH, msi->msi_addr_hi);
     out_be32(msi->msi_regs + PEIH_TERMADL, msi->msi_addr_lo);
Re: [PATCH 0/6] rapidio: move Kconfig menu definition to subsystem
On 07/30/2018 03:50 PM, Alexei Colin wrote:
> The top-level Kconfig entry for the RapidIO subsystem is currently
> duplicated in several architecture-specific Kconfig files. This set of
> patches does two things:
>
> 1. Move the Kconfig menu definition into the RapidIO subsystem and
>    remove the duplicate definitions from arch Kconfig files.
>
> 2. Enable the RapidIO Kconfig menu entry for the arm and arm64
>    architectures, where it was not enabled before. I tested that the
>    subsystem and drivers build successfully for both architectures, and
>    tested that the modules load on a custom arm64 QEMU model.
>
> For all architectures, the RapidIO menu should be offered when either:
> (1) The platform has a PCI bus (which can host a RapidIO module on the bus).
> (2) The platform has a RapidIO IP block (connected to a system bus, e.g.
>     AXI on ARM). In this case, 'select HAS_RAPIDIO' should be added to the
>     'config ARCH_*' menu entry for the SoCs that offer the IP block.
>
> Prior to this patchset, different architectures used different criteria:
>  * powerpc: (1) and (2)
>  * mips: (1) and (2) after a recent commit into -next that added (2):
>    https://www.linux-mips.org/archives/linux-mips/2018-07/msg00596.html
>    fc5d988878942e9b42a4de5204bdd452f3f1ce47
>    491ec1553e0075f345fbe476a93775eabcbc40b6
>  * x86: (1)
>  * arm, arm64: none (RapidIO menus never offered)
>
> Responses to feedback from the prior submission (thanks for the reviews!):
>  http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593347.html
>  http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593349.html
>
> Changelog:
>  * Moved the Kconfig entry into the RapidIO subsystem instead of duplicating it
>
> In the current patchset, I took the approach of adding '|| PCI' to the
> depends in the subsystem. I did try the alternative approach mentioned
> in the reviews for v1 of this patch, where the subsystem Kconfig does
> not add a '|| PCI' and each per-architecture Kconfig has to add a
> 'select HAS_RAPIDIO if PCI', and SoCs with IP blocks have to also add
> 'select HAS_RAPIDIO'. This works too, but requires each architecture's
> Kconfig to add the line for RapidIO (whereas the current approach does
> not require that involvement) and also may create a false impression
> that the dependency on PCI is strict.
>
> We appreciate the suggestion for also selecting the RapidIO subsystem for
> compilation with COMPILE_TEST, but hope to address it in a separate
> patchset, localized to the subsystem, since it will need to change
> depends on all drivers, not just on the top level, and since this
> patch now spans multiple architectures.
>
> Alexei Colin (6):
>   rapidio: define top Kconfig menu in driver subtree
>   x86: factor out RapidIO Kconfig menu
>   powerpc: factor out RapidIO Kconfig menu entry
>   mips: factor out RapidIO Kconfig entry
>   arm: enable RapidIO menu in Kconfig
>   arm64: enable RapidIO menu in Kconfig
>
>  arch/arm/Kconfig        |  2 ++
>  arch/arm64/Kconfig      |  2 ++
>  arch/mips/Kconfig       | 11 ---
>  arch/powerpc/Kconfig    | 13 +
>  arch/x86/Kconfig        |  8
>  drivers/rapidio/Kconfig | 15 +++
>  6 files changed, 20 insertions(+), 31 deletions(-)

LGTM.

Acked-by: Randy Dunlap # for the series

thanks,
--
~Randy
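A sketch of what the consolidated entry described in the cover letter could look like in drivers/rapidio/Kconfig. This is illustrative of the approach ('|| PCI' on the subsystem entry, HAS_RAPIDIO selected by SoCs with an on-chip IP block), not the exact committed text:

```kconfig
# Selected by 'config ARCH_*' entries for SoCs that carry a RapidIO
# IP block on a system bus (e.g. AXI on ARM).
config HAS_RAPIDIO
	bool

# Single top-level menu entry, visible either because the platform can
# host a RapidIO module on a PCI bus, or because an SoC selected
# HAS_RAPIDIO for its IP block.
config RAPIDIO
	tristate "RapidIO support"
	depends on HAS_RAPIDIO || PCI
	help
	  If you say Y here, the kernel will include the core
	  infrastructure and drivers for RapidIO interconnect devices.
```

With this in the subsystem, the per-arch Kconfig files only need the HAS_RAPIDIO selects, which is the "no involvement required" property the cover letter argues for.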
Re: [PATCH] powerpc/mobility: Fix node detach/rename problem
On 07/29/2018 06:11 AM, Michael Bringmann wrote:
> During LPAR migration, the content of the device tree/sysfs may
> be updated, including deletion and replacement of nodes in the
> tree. When nodes are added to the internal node structures, they
> are appended in FIFO order to a list of nodes maintained by the
> OF code APIs. When nodes are removed from the device tree, they
> are marked OF_DETACHED, but not actually deleted from the system,
> to allow for pointers cached elsewhere in the kernel. The order
> and content of the entries in the list of nodes is not altered,
> though.
>
> During LPAR migration some common nodes are deleted and re-added,
> e.g. "ibm,platform-facilities". If a node is re-added to the OF
> node lists, the of_attach_node function checks to make sure that
> the name + ibm,phandle of the to-be-added data is unique. As the
> previous copy of a re-added node is not modified beyond the addition
> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING
> notice to the console, (3) renames the to-be-added node to avoid
> filename collisions within a directory, and (4) adds entries to
> the sysfs/kernfs.

So, this patch actually just band-aids over the real problem. This is a
long-standing problem with several PFO drivers leaking references. The issue
here is that, during the device tree update that follows a migration, the
ibm,platform-facilities node and friends below it are always deleted and
re-added on the destination LPAR, and subsequently the leaked references
prevent the device nodes from ever actually being properly cleaned up after
detach. That leads to the issue you are observing.

As an example, a quick look at nx-842-pseries.c reveals several issues.
static int __init nx842_pseries_init(void)
{
	struct nx842_devdata *new_devdata;
	int ret;

	if (!of_find_compatible_node(NULL, NULL, "ibm,compression"))
		return -ENODEV;

This call to of_find_compatible_node() results in a node returned with its
refcount incremented and therefore immediately leaked.

Further, the reconfig notifier logic makes the assumption that it only needs
to deal with node updates, but as I outlined above the post-migration device
tree update always results in PFO nodes and properties being deleted and
re-added.

/**
 * nx842_OF_notifier - Process updates to OF properties for the device
 *
 * @np: notifier block
 * @action: notifier action
 * @update: struct pSeries_reconfig_prop_update pointer if action is
 *          PSERIES_UPDATE_PROPERTY
 *
 * Returns:
 *         NOTIFY_OK on success
 *         NOTIFY_BAD encoded with error number on failure, use
 *                    notifier_to_errno() to decode this value
 */
static int nx842_OF_notifier(struct notifier_block *np, unsigned long action,
			     void *data)
{
	struct of_reconfig_data *upd = data;
	struct nx842_devdata *local_devdata;
	struct device_node *node = NULL;

	rcu_read_lock();
	local_devdata = rcu_dereference(devdata);
	if (local_devdata)
		node = local_devdata->dev->of_node;

	if (local_devdata &&
	    action == OF_RECONFIG_UPDATE_PROPERTY &&
	    !strcmp(upd->dn->name, node->name)) {
		rcu_read_unlock();
		nx842_OF_upd(upd->prop);
	} else
		rcu_read_unlock();

	return NOTIFY_OK;
}

I expect to find the same problems in pseries-rng.c and nx.c.

-Tyrel

> 
> This patch fixes the 'migration add' problem by changing the
> stored 'phandle' of the OF_DETACHed node to 0 (reserved value for
> of_find_node_by_phandle), so that subsequent re-add operations,
> such as those during migration, do not find the detached node,
> do not observe duplicate names, do not rename them, and the
> extra WARNING notices are removed from the console output.
> 
> In addition, it erases the 'name' field of the OF_DETACHED node,
> to prevent any future calls to of_find_node_by_name() or
> of_find_node_by_path() from matching this node.
>
> Signed-off-by: Michael Bringmann
> ---
>  arch/powerpc/platforms/pseries/dlpar.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
> index 2de0f0d..9d82c28 100644
> --- a/arch/powerpc/platforms/pseries/dlpar.c
> +++ b/arch/powerpc/platforms/pseries/dlpar.c
> @@ -274,6 +274,9 @@ int dlpar_detach_node(struct device_node *dn)
>  	if (rc)
>  		return rc;
>  
> +	dn->phandle = 0;
> +	memset(dn->name, 0, strlen(dn->name));
> +
>  	return 0;
>  }
>
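The leak Tyrel points out above follows a common device-tree pattern. As a hypothetical sketch (not the actual nx-842 fix), a presence-only check would drop the reference as soon as it is done with the node, since of_find_compatible_node() returns the node with its refcount already raised:

```
/* Hypothetical sketch of a refcount-safe presence check; not the
 * posted nx-842 patch. of_find_compatible_node() takes a reference
 * on the returned node, so the caller must release it with
 * of_node_put() once the node is no longer needed. */
static int __init nx842_pseries_init(void)
{
	struct device_node *np;

	np = of_find_compatible_node(NULL, NULL, "ibm,compression");
	if (!np)
		return -ENODEV;
	of_node_put(np);	/* only probing for presence; release it */

	/* ... rest of the init sequence ... */
	return 0;
}
```

With the reference released here, a later of_detach_node() on the ibm,compression node can actually reach refcount zero and be cleaned up.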
Re: [PATCH 2/3] selftests/powerpc: Only run some tests on ppc64le
Hi Michael,

On 07/27/2018 02:51 AM, Michael Ellerman wrote:
> Gustavo Romero writes:
>> Hi Michael,
>>
>> On 07/26/2018 09:24 AM, Michael Ellerman wrote:
>>> These tests are currently failing on (some) big endian systems. Until
>>> we can fix that, skip them unless we're on ppc64le.
>>
>> I can take a look at this.
>
> Thanks!
>
>> Is that the same issue related to the gcc version we discussed a month ago?
>
> Maybe, I've forgotten :)
>
>> If not, could you provide the crash logs as a starting point?
>
> There's not much:
>
> test: tm_tar
> tags: git_version:9794259
> failure: tm_tar
> run-parts: ./tm-tar exited with return code 1
>
> test: tm_vmxcopy
> tags: git_version:9794259
> !! child died by signal 11
> failure: tm_vmxcopy
> run-parts: ./tm-vmxcopy exited with return code 1
>
> And now I can't get the sigreturn one to fail :/

OK. I _think_ there is a chance it's the same issue we've discussed. Hence,
just for the record (in case somebody else wants to take a look at it as
well): on that occasion we had the following log about tm-sigreturn, which,
although not listed above (only tm-tar and tm-vmxcopy are), is marked to run
only on LE as per this patch, just like tm-tar and tm-vmxcopy:

[  403.458953] Unexpected TM Bad Thing exception at c006da4c (msr 0x201032)
13:59:41 console: [  403.459865] Oops: Unrecoverable exception, sig: 6 [#1]
13:59:41 console: [  403.460286] BE SMP NR_CPUS=32 NUMA pSeries
13:59:41 console: [  403.460611] Modules linked in:
13:59:41 console: [  403.460929] CPU: 2 PID: 25233 Comm: tm-sigreturn Not tainted 4.16.0-gcc6x-g709b973 #1
13:59:41 console: [  403.461530] NIP: c006da4c LR: c001d6a4 CTR:
13:59:41 console: [  403.461782] REGS: c0003ffcbd80 TRAP: 0700 Not tainted (4.16.0-gcc6x-g709b973)
13:59:41 console: [  403.461943] MSR: 800300201032 CR: 48004288 XER: 2000
13:59:41 console: [  403.462112] CFAR: c001d6a0 SOFTE: 3
13:59:41 console: [  403.462112] PACATMSCRATCH: 00034280f032
13:59:41 console: [  403.462112] GPR00: d4018c01 c000f09bfc20 c165bd00 c000fbd5c900
13:59:41 console: [  403.462112] GPR04: 80039032
13:59:41 console: [  403.462112] GPR08: 0001 0001
13:59:41 console: [  403.462112] GPR12: c0003fffdf00 000f 1003a338
13:59:41 console: [  403.462112] GPR16: 10002e5c 10002e44 10002d08 10002e24
13:59:41 console: [  403.462112] GPR20: 10002eac c000fbd5cdc8 c0df6578
13:59:41 console: [  403.462112] GPR24: bfff fffef394 fffeee10
13:59:41 console: [  403.462112] GPR28: c000f09bfea0 0280d032 c000fbd5c900
13:59:41 console: [  403.463642] NIP [c006da4c] .tm_restore_sprs+0xc/0x1c
13:59:41 console: [  403.463782] LR [c001d6a4] .tm_recheckpoint.part.7+0x54/0x90
13:59:41 console: [  403.463912] Call Trace:
13:59:41 console: [  403.464003] [c000f09bfc20] [c0312ae0] .__might_fault+0x70/0x90 (unreliable)
13:59:41 console: [  403.464165] [c000f09bfca0] [c001b2b8] .restore_tm_user_regs.part.0+0x418/0x6c0
13:59:41 console: [  403.464326] [c000f09bfd70] [c001c55c] .compat_sys_sigreturn+0x14c/0x490
13:59:41 console: [  403.464495] [c000f09bfe30] [c000c070] system_call+0x58/0x6c
13:59:41 console: [  403.464624] Instruction dump:
13:59:41 console: [  403.464705] 7c800164 4e800020 7c0022a6 f80304b0 7c0222a6 f80304b8 7c0122a6 f80304c0
13:59:41 console: [  403.464868] 4e800020 e80304b0 7c0023a6 e80304b8 <7c0223a6> e80304c0 7c0123a6 4e800020
13:59:41 console: [  403.465030] ---[ end trace 0045efc572a679cf ]---

and some initial debugging by mpe using xmon:

Unexpected TM Bad Thing exception at c0069e6c (msr 0x201032)
cpu 0x6: Vector: 700 (Program Check) at [c0003ff9bd80]
    pc: c0069e6c: .tm_restore_sprs+0xc/0x1c
    lr: c001e2a4: .tm_recheckpoint.part.11+0x64/0xf0
    sp: c000f2a2bc30
   msr: 800300201032
  current = 0xc000fa68b000
  paca    = 0xc0003fff8f00
   softe: 3  irq_happened: 0x01
    pid = 3879, comm = tm-sigreturn
Linux version 4.17.0-rc1-gcc-7.0.1-5-g56376c5864f8 (mich...@ka3.ozlabs.ibm.com) (gcc version 7.0.1 20170213 (experimental) (Custom 8e8a14c238db56c7)) #172 SMP Thu Apr 19 21:48:31 AEST 2018
enter ? for help
[link register   ] c001e2a4 .tm_recheckpoint.part.11+0x64/0xf0
[c000f2a2bc30] c000f2a2bcb0 (unreliable)
[c000f2a2bcb0] c001c30c .restore_tm_user_regs.part.0+0x55c/0x610
[c000f2a2bd70] c001d400 .compat_sys_sigreturn+0x130/0x410
[c000f2a2be30] c000c268 system_call+0x58/0x6c
--- Exception: 300 (Data Access) at 19c8
SP (ffd75e90) is in userspace
6:mon> di %pc
c0069e6c 7c0223a6	mtspr
[PATCH 0/6] rapidio: move Kconfig menu definition to subsystem
The top-level Kconfig entry for the RapidIO subsystem is currently duplicated
in several architecture-specific Kconfig files. This set of patches does two
things:

1. Move the Kconfig menu definition into the RapidIO subsystem and remove
the duplicate definitions from arch Kconfig files.

2. Enable the RapidIO Kconfig menu entry for the arm and arm64
architectures, where it was not enabled before. I tested that the subsystem
and drivers build successfully for both architectures, and tested that the
modules load on a custom arm64 Qemu model.

For all architectures, the RapidIO menu should be offered when either:

(1) The platform has a PCI bus (which hosts a RapidIO module on the bus).
(2) The platform has a RapidIO IP block (connected to a system bus, e.g. AXI
on ARM). In this case, 'select HAS_RAPIDIO' should be added to the
'config ARCH_*' menu entry for the SoCs that offer the IP block.

Prior to this patchset, different architectures used different criteria:

* powerpc: (1) and (2)
* mips: (1) and (2) after a recent commit into next that added (2):
  https://www.linux-mips.org/archives/linux-mips/2018-07/msg00596.html
  fc5d988878942e9b42a4de5204bdd452f3f1ce47
  491ec1553e0075f345fbe476a93775eabcbc40b6
* x86: (1)
* arm, arm64: none (RapidIO menus never offered)

Responses to feedback from the prior submission (thanks for the reviews!):
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593347.html
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593349.html

Changelog:
* Moved Kconfig entry into RapidIO subsystem instead of duplicating

In the current patchset, I took the approach of adding '|| PCI' to the
depends in the subsystem. I did try the alternative approach mentioned in
the reviews for v1 of this patch, where the subsystem Kconfig does not add a
'|| PCI' and each per-architecture Kconfig has to add a
'select HAS_RAPIDIO if PCI', and SoCs with IP blocks have to also add
'select HAS_RAPIDIO'. This works too, but it requires each architecture's
Kconfig to add the line for RapidIO (whereas the current approach does not
require that involvement) and it may also create a false impression that the
dependency on PCI is strict.

We appreciate the suggestion of also selecting the RapidIO subsystem for
compilation with COMPILE_TEST, but hope to address it in a separate
patchset, localized to the subsystem, since it will need to change the
depends of all drivers, not just the top level, and since this patch now
spans multiple architectures.

Alexei Colin (6):
  rapidio: define top Kconfig menu in driver subtree
  x86: factor out RapidIO Kconfig menu
  powerpc: factor out RapidIO Kconfig menu entry
  mips: factor out RapidIO Kconfig entry
  arm: enable RapidIO menu in Kconfig
  arm64: enable RapidIO menu in Kconfig

 arch/arm/Kconfig        |  2 ++
 arch/arm64/Kconfig      |  2 ++
 arch/mips/Kconfig       | 11 ---
 arch/powerpc/Kconfig    | 13 +
 arch/x86/Kconfig        |  8
 drivers/rapidio/Kconfig | 15 +++
 6 files changed, 20 insertions(+), 31 deletions(-)

-- 
2.18.0
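For reference, the consolidated top-level entry described above would look roughly like this in drivers/rapidio/Kconfig. This is a sketch based on the powerpc entry being removed in patch 3/6; the exact help text and symbol layout in the actual patch may differ:

```
# Sketch of the consolidated entry (drivers/rapidio/Kconfig), based on
# the powerpc entry removed in patch 3/6; details may differ.
config HAS_RAPIDIO
	bool

config RAPIDIO
	tristate "RapidIO support"
	depends on HAS_RAPIDIO || PCI
	help
	  If you say Y here, the kernel will include drivers and
	  infrastructure code to support RapidIO interconnect devices.
```

With this shape, an arch never has to mention RAPIDIO itself: PCI platforms get the prompt automatically via the '|| PCI' alternative, and SoCs with a RapidIO IP block opt in with a one-line 'select HAS_RAPIDIO'.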
[PATCH 3/6] powerpc: factor out RapidIO Kconfig menu entry
The menu entry is now defined in the rapidio subtree. Also, re-order the bus
menu so that the platform-specific RapidIO controller appears after the entry
for the RapidIO subsystem.

Platforms with a PCI bus will be offered the RapidIO menu since they may want
support for a RapidIO PCI device. Platforms without a PCI bus that might
include a RapidIO IP block will need to "select HAS_RAPIDIO" in the
platform-/machine-specific "config ARCH_*" Kconfig entry.

Cc: Andrew Morton
Cc: John Paul Walters
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Alexei Colin
---
 arch/powerpc/Kconfig | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 25d005af0a5b..17ea8a5f90a0 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -993,16 +993,7 @@
 source "drivers/pci/Kconfig"
 
 source "drivers/pcmcia/Kconfig"
 
-config HAS_RAPIDIO
-	bool
-	default n
-
-config RAPIDIO
-	tristate "RapidIO support"
-	depends on HAS_RAPIDIO || PCI
-	help
-	  If you say Y here, the kernel will include drivers and
-	  infrastructure code to support RapidIO interconnect devices.
+source "drivers/rapidio/Kconfig"
 
 config FSL_RIO
 	bool "Freescale Embedded SRIO Controller support"
@@ -1012,8 +1003,6 @@ config FSL_RIO
 	  Include support for RapidIO controller on Freescale embedded
 	  processors (MPC8548, MPC8641, etc).
 
-source "drivers/rapidio/Kconfig"
-
 endmenu
 
 config NONSTATIC_KERNEL
-- 
2.18.0
Re: [PATCH v2 04/10] powerpc/traps: Use REG_FMT in show_signal_msg()
Hi, Christophe.

On Mon, Jul 30, 2018 at 06:30:47PM +0200, LEROY Christophe wrote:
> Murilo Opsfelder Araujo wrote:
>
> > Hi, Christophe.
> >
> > On Fri, Jul 27, 2018 at 06:40:23PM +0200, LEROY Christophe wrote:
> > > Murilo Opsfelder Araujo wrote:
> > >
> > > > Simplify the message format by using REG_FMT as the register
> > > > format. This avoids having two different formats and avoids
> > > > checking for MSR_64BIT.
> > >
> > > Are you sure it is what we want ?
> >
> > Yes.
> >
> > > Won't it change the behaviour for a 32 bits app running on a 64bits
> > > kernel ?
> >
> > In fact, this changes how many zeroes are prefixed when displaying the
> > registers (%016lx vs. %08lx format). For example, 32-bits userspace,
> > 64-bits kernel:
>
> Indeed that's what I suspected. What is the real benefit of this change ?
> Why not keep the current format for 32bits userspace ? All those leading
> zeroes are pointless to me.

One of the benefits is simplifying the code by removing some checks. Another
is deduplicating almost identical format strings in favor of a unified one.

After reading Joe's comment [1], %px seems to be the format we're looking
for. An extract from Documentation/core-api/printk-formats.rst:

  "%px is functionally equivalent to %lx (or %lu). %px is preferred
  because it is more uniquely grep'able."

So I guess we don't need to worry about the format (%016lx vs. %08lx), let's
just use %px, as per the guideline.

[1] https://lore.kernel.org/lkml/26f07092cdde378ebb42c1034badde1b56521c36.ca...@perches.com/

Cheers
Murilo
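To make the padding difference being debated concrete, here is a small userspace sketch. It uses an ordinary snprintf as a stand-in for the two kernel printk formats under discussion; the helper name is invented for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative userspace stand-in for the two kernel formats being
 * discussed: %016lx (the 64-bit REG_FMT) vs. %08lx (the 32-bit one).
 * Writes the formatted register into buf and returns the length. */
static int fmt_reg(char *buf, size_t len, unsigned long val, int is_64bit)
{
	return snprintf(buf, len, is_64bit ? "%016lx" : "%08lx", val);
}
```

For the register value 0x10002e5c this yields "0000000010002e5c" with the 64-bit format and "10002e5c" with the 32-bit one: the value is identical and only the zero padding differs, which is the duplication that a single unified format (or %px) removes.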
Re: [PATCH 0/6] rapidio: move Kconfig menu definition to subsystem
I only have this message, and patches 5 and 6. This is meaningless for me to
review - I can't tell what you've done as far as my comments on your previous
iteration are concerned.

Please arrange to _at least_ copy the appropriate mailing lists with your
complete patch set if you aren't going to copy everyone on all patches in
the set.

On Mon, Jul 30, 2018 at 06:50:28PM -0400, Alexei Colin wrote:
> In the current patchset, I took the approach of adding '|| PCI' to the
> depends in the subsystem. I did try the alternative approach mentioned
> in the reviews for v1 of this patch, where the subsystem Kconfig does
> not add a '|| PCI' and each per-architecture Kconfig has to add a
> 'select HAS_RAPIDIO if PCI' and SoCs with IP blocks have to also add
> 'select HAS_RAPIDIO'. This works too but requires each architecture's
> Kconfig to add the line for RapidIO (whereas current approach does not
> require that involvement) and also may create a false impression that
> the dependency on PCI is strict.

... which means, as it stands, I've no idea what you actually came up with
for this.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up
Re: [PATCH v3 1/1] powerpc/pseries: fix EEH recovery of some IOV devices
On Mon, Jul 30, 2018 at 11:59:14AM +1000, Sam Bobroff wrote: > EEH recovery currently fails on pSeries for some IOV capable PCI > devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide > certain device tree properties for the device. (Found on an IOV > capable device using the ipr driver.) > > Recovery fails in pci_enable_resources() at the check on r->parent, > because r->flags is set and r->parent is not. This state is due to > sriov_init() setting the start, end and flags members of the IOV BARs > but the parent not being set later in > pseries_pci_fixup_iov_resources(), because the > "ibm,open-sriov-vf-bar-info" property is missing. > > Correct this by zeroing the resource flags for IOV BARs when they > can't be configured (this is the same method used by sriov_init() and > __pci_read_base()). > > VFs cleared this way can't be enabled later, because that requires > another device tree property, "ibm,number-of-configurable-vfs" as well > as support for the RTAS function "ibm_map_pes". These are all part of > hypervisor support for IOV and it seems unlikely that a hypervisor > would ever partially, but not fully, support it. (None are currently > provided by QEMU/KVM.) > > Signed-off-by: Sam Bobroff Michael, I assume you'll take this, since it only touches powerpc. Let me know if you need anything from me. > --- > Hi, > > This is a fix to allow EEH recovery to succeed in a specific situation, > which I've tried to explain in the commit message. > > As with the RFC version, the IOV BARs are disabled by setting the resource > flags to 0 but the other fields are now left as-is because that is what is > done > elsewhere (see sriov_init() and __pci_read_base()). 
> > I've also examined the concern raised by Bjorn Helgaas, that VFs could be > enabled later after the BARs are disabled, and it already seems safe: enabling > VFs (on pseries) depends on another device tree property, > "ibm,number-of-configurable-vfs" as well as support for the RTAS function > "ibm_map_pes". Since these are all part of the hypervisor's support for IOV it > seems unlikely that we would ever see some of them but not all. (None are > currently provided by QEMU/KVM.) (Additionally, the ipr driver on which the > EEH > recovery failure was discovered doesn't even seem to have SR-IOV support so it > certainly can't enable VFs.) > > Cheers, > Sam. > > Patch set v3: > Patch 1/1: powerpc/pseries: fix EEH recovery of some IOV devices > * Moved some useful information from the cover letter to the commit log. > > Patch set v2: > Patch 1/1: powerpc/pseries: fix EEH recovery of some IOV devices > * Moved the BAR disabling code to a function. > * Also check in pseries_pci_fixup_resources(). 
> 
> Patch set v1:
> Patch 1/1: powerpc/pseries: fix EEH recovery of IOV devices
> 
>  arch/powerpc/platforms/pseries/setup.c | 25 +
>  1 file changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
> index b55ad4286dc7..0a9e4243ae1d 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -645,6 +645,15 @@ void of_pci_parse_iov_addrs(struct pci_dev *dev, const int *indexes)
>  	}
>  }
>  
> +static void pseries_disable_sriov_resources(struct pci_dev *pdev)
> +{
> +	int i;
> +
> +	pci_warn(pdev, "No hypervisor support for SR-IOV on this device, IOV BARs disabled.\n");
> +	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
> +		pdev->resource[i + PCI_IOV_RESOURCES].flags = 0;
> +}
> +
>  static void pseries_pci_fixup_resources(struct pci_dev *pdev)
>  {
>  	const int *indexes;
> @@ -652,10 +661,10 @@ static void pseries_pci_fixup_resources(struct pci_dev *pdev)
>  
>  	/*Firmware must support open sriov otherwise dont configure*/
>  	indexes = of_get_property(dn, "ibm,open-sriov-vf-bar-info", NULL);
> -	if (!indexes)
> -		return;
> -	/* Assign the addresses from device tree*/
> -	of_pci_set_vf_bar_size(pdev, indexes);
> +	if (indexes)
> +		of_pci_set_vf_bar_size(pdev, indexes);
> +	else
> +		pseries_disable_sriov_resources(pdev);
>  }
>  
>  static void pseries_pci_fixup_iov_resources(struct pci_dev *pdev)
> @@ -667,10 +676,10 @@ static void pseries_pci_fixup_iov_resources(struct pci_dev *pdev)
>  		return;
>  	/*Firmware must support open sriov otherwise dont configure*/
>  	indexes = of_get_property(dn, "ibm,open-sriov-vf-bar-info", NULL);
> -	if (!indexes)
> -		return;
> -	/* Assign the addresses from device tree*/
> -	of_pci_parse_iov_addrs(pdev, indexes);
> +	if (indexes)
> +		of_pci_parse_iov_addrs(pdev, indexes);
> +	else
> +		pseries_disable_sriov_resources(pdev);
>  }
>  
>  static resource_size_t pseries_pci_iov_resource_alignment(struct pci_dev *pdev,
> -- 
> 2.16.1.74.g9b0b1f47b
>
Re: [PATCH] PCI: call dma_debug_add_bus for pci_bus_type in common code
[+cc Joerg]

On Mon, Jul 30, 2018 at 09:38:42AM +0200, Christoph Hellwig wrote:
> There is nothing arch specific about PCI or dma-debug, so move this
> call to common code just after registering the bus type.

I assume that previously, even if the user set CONFIG_DMA_API_DEBUG=y, we
only got PCI DMA debug on powerpc, sh, and x86. And after this patch, we'll
get PCI DMA debug on *all* arches?

If that's true, I'll add a comment to that effect to the commit log, since
that new functionality might be of interest to other arches.

> Signed-off-by: Christoph Hellwig
> ---
>  arch/powerpc/kernel/dma.c | 3 ---
>  arch/sh/drivers/pci/pci.c | 2 --
>  arch/x86/kernel/pci-dma.c | 3 ---
>  drivers/pci/pci-driver.c  | 2 +-
>  4 files changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index 155170d70324..dbfc7056d7df 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -357,9 +357,6 @@ EXPORT_SYMBOL_GPL(dma_get_required_mask);
>  
>  static int __init dma_init(void)
>  {
> -#ifdef CONFIG_PCI
> -	dma_debug_add_bus(&pci_bus_type);
> -#endif
>  #ifdef CONFIG_IBMVIO
>  	dma_debug_add_bus(&vio_bus_type);
>  #endif
> diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
> index e5b7437ab4af..8256626bc53c 100644
> --- a/arch/sh/drivers/pci/pci.c
> +++ b/arch/sh/drivers/pci/pci.c
> @@ -160,8 +160,6 @@ static int __init pcibios_init(void)
>  	for (hose = hose_head; hose; hose = hose->next)
>  		pcibios_scanbus(hose);
>  
> -	dma_debug_add_bus(&pci_bus_type);
> -
>  	pci_initialized = 1;
>  
>  	return 0;
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index ab5d9dd668d2..43f58632f123 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -155,9 +155,6 @@ static int __init pci_iommu_init(void)
>  {
>  	struct iommu_table_entry *p;
>  
> -#ifdef CONFIG_PCI
> -	dma_debug_add_bus(&pci_bus_type);
> -#endif
>  	x86_init.iommu.iommu_init();
>  
>  	for (p = __iommu_table; p < __iommu_table_end; p++) {
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 6792292b5fc7..bef17c3fca67 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1668,7 +1668,7 @@ static int __init pci_driver_init(void)
>  	if (ret)
>  		return ret;
>  #endif
> -
> +	dma_debug_add_bus(&pci_bus_type);
>  	return 0;
>  }
>  postcore_initcall(pci_driver_init);
> -- 
> 2.18.0
>
Re: [PATCH] powerpc/mobility: Fix node detach/rename problem
See below.

On 07/30/2018 01:31 AM, Michael Ellerman wrote:
> Michael Bringmann writes:
>
>> During LPAR migration, the content of the device tree/sysfs may
>> be updated including deletion and replacement of nodes in the
>> tree. When nodes are added to the internal node structures, they
>> are appended in FIFO order to a list of nodes maintained by the
>> OF code APIs.
>
> That hasn't been true for several years. The data structure is an n-ary
> tree. What kernel version are you working on?

Sorry for an error in my description. I oversimplified based on the name of
a search iterator. Let me try to provide a better explanation of the problem
here.

This is the problem. The PPC mobility code receives RTAS requests to delete
nodes with platform-/hardware-specific attributes when restarting the kernel
after a migration. My example is for migration between a P8 Alpine and a P8
Brazos. Nodes to be deleted may include 'ibm,random-v1',
'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1', or
others.

The mobility.c code calls 'of_detach_node' for the nodes and their children.
This makes calls to detach the properties and to try to remove the
associated sysfs/kernfs files. Then new copies of the same nodes are next
provided by the PHYP, local copies are built, and a pointer to the
'struct device_node' is passed to of_attach_node. Before the call to
of_attach_node, the phandle is initialized to 0 when the data structure is
allocated. During the call to of_attach_node, it calls __of_attach_node,
which pulls the actual name and phandle from the just-created sub-properties
named something like 'name' and 'ibm,phandle'.

This is all fine for the first migration. The problem occurs with the second
and subsequent migrations, when the PHYP on the new system wants to replace
the same set of nodes again, referenced with the same names and phandle
values.
>
>> When nodes are removed from the device tree, they
>> are marked OF_DETACHED, but not actually deleted from the system
>> to allow for pointers cached elsewhere in the kernel. The order
>> and content of the entries in the list of nodes is not altered,
>> though.
>
> Something is going wrong if this is actually happening.
>
> When the node is detached it should be *detached* from the tree of all
> nodes, so it should not be discoverable other than by having an existing
> pointer to it.

On the second and subsequent migrations, the PHYP tells the system to again
delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
'ibm,compression-v1', 'ibm,sym-encryption-v1'. It specifies these nodes by
its known set of phandle values -- the same handles used by the PHYP on the
source system are known on the target system. The mobility.c code calls
of_find_node_by_phandle() with these values and ends up locating the first
instance of each node that was added during the original boot, instead of
the second instance of each node created after the first migration.
The detach during the second migration fails with errors like,

[ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0
[ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
[ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: GW 4.18.0-rc1-wi107836-v05-120+ #201
[ 4565.030737] NIP: c07c1ea8 LR: c07c1fb4 CTR: 00655170
[ 4565.030741] REGS: c003f302b690 TRAP: 0700 Tainted: GW (4.18.0-rc1-wi107836-v05-120+)
[ 4565.030745] MSR: 80010282b033 CR: 22288822 XER: 000a
[ 4565.030757] CFAR: c07c1fb0 IRQMASK: 1
[ 4565.030757] GPR00: c07c1fa4 c003f302b910 c114bf00 c0038e68
[ 4565.030757] GPR04: 0001 80c008e0b4b8
[ 4565.030757] GPR08: 0001 8003 2843
[ 4565.030757] GPR12: 8800 c0001ec9ae00 4000
[ 4565.030757] GPR16: 0008 f6ff
[ 4565.030757] GPR20: 0007 c003e9f1f034 0001
[ 4565.030757] GPR24:
[ 4565.030757] GPR28: c1549d28 c1134828 c0038e68 c003f302b930
[ 4565.030804] NIP [c07c1ea8] __of_detach_node+0x8/0xa0
[ 4565.030808] LR [c07c1fb4] of_detach_node+0x74/0xd0
[ 4565.030811] Call Trace:
[ 4565.030815] [c003f302b910] [c07c1fa4] of_detach_node+0x64/0xd0 (unreliable)
[ 4565.030821] [c003f302b980] [c00c33c4] dlpar_detach_node+0xb4/0x150
[ 4565.030826] [c003f302ba10] [c00c3ffc] delete_dt_node+0x3c/0x80
[
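Given the stale-lookup behaviour described above, one alternative to the posted patch (which clears the phandle on detach) would be to refuse detached nodes at lookup time. A hypothetical sketch, not part of any posted patch:

```
/* Hypothetical helper, not from the posted patch: resolve a phandle
 * but treat nodes already marked OF_DETACHED as not found, so a stale
 * first-boot instance is never returned to the mobility code. */
static struct device_node *find_live_node_by_phandle(phandle handle)
{
	struct device_node *np = of_find_node_by_phandle(handle);

	if (np && of_node_check_flag(np, OF_DETACHED)) {
		of_node_put(np);	/* drop the stale reference */
		return NULL;
	}
	return np;
}
```

The posted patch instead zeroes dn->phandle in dlpar_detach_node(), which achieves the same end for of_find_node_by_phandle() callers without touching the lookup path.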
[PATCH 2/5] tools headers powerpc: Update asm/unistd.h copy to pick new
From: Arnaldo Carvalho de Melo

The new 'io_pgetevents' syscall was wired up in PowerPC in the following
cset:

  b2f82565f2ca ("powerpc: Wire up io_pgetevents")

Update the tools/arch/powerpc/ copy of the asm/unistd.h file so that
'perf trace' on PowerPC gets it in its syscall table. This eliminated the
following perf build warning:

  Warning: Kernel ABI header at 'tools/arch/powerpc/include/uapi/asm/unistd.h' differs from latest version at 'arch/powerpc/include/uapi/asm/unistd.h'

Cc: Alexander Shishkin
Cc: Breno Leitao
Cc: Hendrik Brueckner
Cc: Jiri Olsa
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman
Cc: Namhyung Kim
Cc: Ravi Bangoria
Cc: Thomas Richter
Link: https://lkml.kernel.org/n/tip-9uvu7tz4ud3bxxfyxwryu...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/arch/powerpc/include/uapi/asm/unistd.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/arch/powerpc/include/uapi/asm/unistd.h b/tools/arch/powerpc/include/uapi/asm/unistd.h
index ac5ba55066dd..985534d0b448 100644
--- a/tools/arch/powerpc/include/uapi/asm/unistd.h
+++ b/tools/arch/powerpc/include/uapi/asm/unistd.h
@@ -399,5 +399,6 @@
 #define __NR_pkey_free 385
 #define __NR_pkey_mprotect 386
 #define __NR_rseq 387
+#define __NR_io_pgetevents 388
 
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
2.14.4
[GIT PULL 0/5] perf/urgent fixes
Hi Ingo,

Please consider pulling, just to get the build without warnings and finishing
successfully in all my test environments,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 7f635ff187ab6be0b350b3ec06791e376af238ab:

  perf/core: Fix crash when using HW tracing kernel filters (2018-07-25 11:46:22 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-urgent-for-mingo-4.18-20180730

for you to fetch changes up to 44fe619b1418ff4e9d2f9518a940fbe2fb686a08:

  perf tools: Fix the build on the alpine:edge distro (2018-07-30 13:15:03 -0300)

perf/urgent fixes: (Arnaldo Carvalho de Melo)

- Update the tools copy of several files, including perf_event.h,
  powerpc's asm/unistd.h (new io_pgetevents syscall), bpf.h and
  x86's memcpy_64.S (used in 'perf bench mem'), silencing the
  respective warnings during the perf tools build.

- Fix the build on the alpine:edge distro.

Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo (5):
      tools headers uapi: Update tools's copy of linux/perf_event.h
      tools headers powerpc: Update asm/unistd.h copy to pick new
      tools headers uapi: Refresh linux/bpf.h copy
      tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'
      perf tools: Fix the build on the alpine:edge distro

 tools/arch/powerpc/include/uapi/asm/unistd.h |   1 +
 tools/arch/x86/include/asm/mcsafe_test.h     |  13
 tools/arch/x86/lib/memcpy_64.S               | 112 +--
 tools/include/uapi/linux/bpf.h               |  28 +--
 tools/include/uapi/linux/perf_event.h        |   2 +
 tools/perf/arch/x86/util/pmu.c               |   1 +
 tools/perf/arch/x86/util/tsc.c               |   1 +
 tools/perf/bench/Build                       |   1 +
 tools/perf/bench/mem-memcpy-x86-64-asm.S     |   1 +
 tools/perf/bench/mem-memcpy-x86-64-lib.c     |  24 ++
 tools/perf/perf.h                            |   1 +
 tools/perf/util/header.h                     |   1 +
 tools/perf/util/namespaces.h                 |   1 +
 13 files changed, 124 insertions(+), 63 deletions(-)

 create mode 100644 tools/arch/x86/include/asm/mcsafe_test.h
 create mode 100644
tools/perf/bench/mem-memcpy-x86-64-lib.c

Test results:

The first ones are container (docker) based builds of tools/perf with and
without libelf support. Where clang is available, it is also used to build
perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang)
with gcc and clang when clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from a http server to
build it inside the container, to make it easier to build in a container
cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and
those may not have all the features built, due to lack of multi-arch devel
packages, available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf
commands with a variety of command line event specifications to then
intercept the sys_perf_event syscall to check that the perf_event_attr
fields are set up as expected, among a variety of other unit tests.

Then there is the 'make -C tools/perf build-test' ones, that build
tools/perf/ with a variety of feature sets, exercising the build with an
incomplete set of features as well as with a complete one. It is planned to
have it run on each of the containers mentioned above, using some container
orchestration infrastructure. Get in contact if interested in helping
having this in place.
# dm 1 alpine:3.4: Ok gcc (Alpine 5.3.0) 5.3.0 2 alpine:3.5: Ok gcc (Alpine 6.2.1) 6.2.1 20160822 3 alpine:3.6: Ok gcc (Alpine 6.3.0) 6.3.0 4 alpine:3.7: Ok gcc (Alpine 6.4.0) 6.4.0 5 alpine:edge : Ok gcc (Alpine 6.4.0) 6.4.0 6 amazonlinux:1 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28) 7 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5) 8 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 9 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 10 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55) 11 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18) 12 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat
Re: [PATCH] of/fdt: Remove PPC32 longtrail hack in memory scan
On Mon, Jul 30, 2018 at 4:47 AM Michael Ellerman wrote: > > Rob Herring writes: > > On Thu, Jul 26, 2018 at 11:36 PM Michael Ellerman > > wrote: > >> When the OF code was originally made common by Grant in commit > >> 51975db0b733 ("of/flattree: merge early_init_dt_scan_memory() common > >> code") (Feb 2010), the common code inherited a hack to handle > >> PPC "longtrail" machines, which had a "memory@0" node with no > >> device_type. > >> > >> That check was then made to only apply to PPC32 in b44aa25d20e2 ("of: > >> Handle memory@0 node on PPC32 only") (May 2014). > >> > >> But according to Paul Mackerras the "longtrail" machines are long > >> dead, if they were ever seen in the wild at all. If someone does still > >> have one, we can handle this firmware wart in powerpc platform code. > >> > >> So remove the hack once and for all. > > > > Yay. I guess Power Macs and other quirks will never die... > > Not soon. > > In base.c I see: > - the hack in arch_find_n_match_cpu_physical_id() >- we should just move that into arch code, it's a __weak arch hook > after all. Except then we'd have to export __of_find_n_match_cpu_property. I somewhat prefer it like it is because arch specific functions tend to encourage duplication. But if the implementation is completely different like sparc, then yes, a separate implementation makes sense. > - a PPC hack in of_alias_scan(), I guess we need to retain that >behaviour, but it's pretty minor anyway. It would be nice to know what platform(s) needs this as I don't have a clue. It would also be nice if I had some dumps of DTs for some of these OpenFirmware systems. Rob
[PATCH 20/20] powerpc/dma: remove dma_nommu_mmap_coherent
The remaining implementation for coherent caches is functionally identical to the default provided in common code. Signed-off-by: Christoph Hellwig --- arch/powerpc/include/asm/dma-mapping.h | 7 --- arch/powerpc/kernel/dma-iommu.c| 1 - arch/powerpc/kernel/dma.c | 13 - arch/powerpc/platforms/pseries/vio.c | 1 - 4 files changed, 22 deletions(-) diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h index 879c4efba785..e62e23aa3714 100644 --- a/arch/powerpc/include/asm/dma-mapping.h +++ b/arch/powerpc/include/asm/dma-mapping.h @@ -18,13 +18,6 @@ #include #include -/* Some dma direct funcs must be visible for use in other dma_ops */ -extern int dma_nommu_mmap_coherent(struct device *dev, - struct vm_area_struct *vma, - void *cpu_addr, dma_addr_t handle, - size_t size, unsigned long attrs); - - static inline unsigned long device_to_mask(struct device *dev) { if (dev->dma_mask && *dev->dma_mask) diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c index f9fe2080ceb9..bf5234e1f71b 100644 --- a/arch/powerpc/kernel/dma-iommu.c +++ b/arch/powerpc/kernel/dma-iommu.c @@ -114,7 +114,6 @@ int dma_iommu_mapping_error(struct device *dev, dma_addr_t dma_addr) struct dma_map_ops dma_iommu_ops = { .alloc = dma_iommu_alloc_coherent, .free = dma_iommu_free_coherent, - .mmap = dma_nommu_mmap_coherent, .map_sg = dma_iommu_map_sg, .unmap_sg = dma_iommu_unmap_sg, .dma_supported = dma_iommu_dma_supported, diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c index 08b12cbd7abf..5b71c9d1b8cc 100644 --- a/arch/powerpc/kernel/dma.c +++ b/arch/powerpc/kernel/dma.c @@ -70,18 +70,6 @@ static void dma_nommu_free_coherent(struct device *dev, size_t size, iommu_free_coherent(iommu, size, vaddr, dma_handle); } -int dma_nommu_mmap_coherent(struct device *dev, struct vm_area_struct *vma, -void *cpu_addr, dma_addr_t handle, size_t size, -unsigned long attrs) -{ - unsigned long pfn = page_to_pfn(virt_to_page(cpu_addr)); - - 
return remap_pfn_range(vma, vma->vm_start, - pfn + vma->vm_pgoff, - vma->vm_end - vma->vm_start, - vma->vm_page_prot); -} - /* note: needs to be called arch_get_required_mask for dma-noncoherent.c */ u64 arch_get_required_mask(struct device *dev) { @@ -98,7 +86,6 @@ u64 arch_get_required_mask(struct device *dev) const struct dma_map_ops dma_nommu_ops = { .alloc = dma_nommu_alloc_coherent, .free = dma_nommu_free_coherent, - .mmap = dma_nommu_mmap_coherent, .map_sg = dma_direct_map_sg, .map_page = dma_direct_map_page, .get_required_mask = arch_get_required_mask, diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c index 49e04ec19238..51d564313bd0 100644 --- a/arch/powerpc/platforms/pseries/vio.c +++ b/arch/powerpc/platforms/pseries/vio.c @@ -618,7 +618,6 @@ static u64 vio_dma_get_required_mask(struct device *dev) static const struct dma_map_ops vio_dma_mapping_ops = { .alloc = vio_dma_iommu_alloc_coherent, .free = vio_dma_iommu_free_coherent, - .mmap = dma_nommu_mmap_coherent, .map_sg= vio_dma_iommu_map_sg, .unmap_sg = vio_dma_iommu_unmap_sg, .map_page = vio_dma_iommu_map_page, -- 2.18.0
[PATCH 19/20] powerpc/dma: use the generic dma-direct map_page and map_sg routines
These are identical except for additional error checking, so migrate to
the common code, and wire up the mapping_error method as well.

Signed-off-by: Christoph Hellwig
---
 arch/powerpc/kernel/dma.c | 32
 1 file changed, 4 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index b2e88075b2ea..08b12cbd7abf 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -82,21 +82,6 @@ int dma_nommu_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
 			       vma->vm_page_prot);
 }
 
-static int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
-		int nents, enum dma_data_direction direction,
-		unsigned long attrs)
-{
-	struct scatterlist *sg;
-	int i;
-
-	for_each_sg(sgl, sg, nents, i) {
-		sg->dma_address = phys_to_dma(dev, sg_phys(sg));
-		sg->dma_length = sg->length;
-	}
-
-	return nents;
-}
-
 /* note: needs to be called arch_get_required_mask for dma-noncoherent.c */
 u64 arch_get_required_mask(struct device *dev)
 {
@@ -110,24 +95,15 @@ u64 arch_get_required_mask(struct device *dev)
 	return mask;
 }
 
-static inline dma_addr_t dma_nommu_map_page(struct device *dev,
-		struct page *page,
-		unsigned long offset,
-		size_t size,
-		enum dma_data_direction dir,
-		unsigned long attrs)
-{
-	return phys_to_dma(dev, page_to_phys(page)) + offset;
-}
-
 const struct dma_map_ops dma_nommu_ops = {
 	.alloc = dma_nommu_alloc_coherent,
 	.free = dma_nommu_free_coherent,
 	.mmap = dma_nommu_mmap_coherent,
-	.map_sg = dma_nommu_map_sg,
-	.dma_supported = dma_direct_supported,
-	.map_page = dma_nommu_map_page,
+	.map_sg = dma_direct_map_sg,
+	.map_page = dma_direct_map_page,
 	.get_required_mask = arch_get_required_mask,
+	.dma_supported = dma_direct_supported,
+	.mapping_error = dma_direct_mapping_error,
 };
 
 #ifndef CONFIG_NOT_COHERENT_CACHE
-- 
2.18.0
[PATCH 18/20] powerpc/dma-noncoherent: use generic dma_noncoherent_ops
The generic dma-noncoherent code provides all that is needed by powerpc.

Note that the cache maintenance in the existing code is a bit odd as it
implements both the sync_to_device and sync_to_cpu callouts, but never
flushes caches when unmapping. This patch keeps both directions around,
which will lead to more flushing than the previous implementation.
Someone more familiar with the required CPUs should eventually take a
look and optimize the cache flush handling if needed.

Signed-off-by: Christoph Hellwig
---
 arch/powerpc/Kconfig                   |  2 +-
 arch/powerpc/include/asm/dma-mapping.h | 29 -
 arch/powerpc/kernel/dma.c              | 59 +++---
 arch/powerpc/kernel/pci-common.c       |  5 ++-
 arch/powerpc/kernel/setup-common.c     |  4 ++
 arch/powerpc/mm/dma-noncoherent.c      | 52 +--
 arch/powerpc/platforms/44x/warp.c      |  2 +-
 arch/powerpc/platforms/Kconfig.cputype |  6 ++-
 8 files changed, 60 insertions(+), 99 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index bbfa6a8df4da..33c6017ffce6 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -129,7 +129,7 @@ config PPC
 	# Please keep this list sorted alphabetically.
# select ARCH_HAS_DEVMEM_IS_ALLOWED - select ARCH_HAS_DMA_SET_COHERENT_MASK + select ARCH_HAS_DMA_SET_COHERENT_MASK if !NOT_COHERENT_CACHE select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_GCOV_PROFILE_ALL diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h index f0bf7ac2686c..879c4efba785 100644 --- a/arch/powerpc/include/asm/dma-mapping.h +++ b/arch/powerpc/include/asm/dma-mapping.h @@ -19,40 +19,11 @@ #include /* Some dma direct funcs must be visible for use in other dma_ops */ -extern void *__dma_nommu_alloc_coherent(struct device *dev, size_t size, -dma_addr_t *dma_handle, gfp_t flag, -unsigned long attrs); -extern void __dma_nommu_free_coherent(struct device *dev, size_t size, - void *vaddr, dma_addr_t dma_handle, - unsigned long attrs); extern int dma_nommu_mmap_coherent(struct device *dev, struct vm_area_struct *vma, void *cpu_addr, dma_addr_t handle, size_t size, unsigned long attrs); -#ifdef CONFIG_NOT_COHERENT_CACHE -/* - * DMA-consistent mapping functions for PowerPCs that don't support - * cache snooping. These allocate/free a region of uncached mapped - * memory space for use with DMA devices. Alternatively, you could - * allocate the space "normally" and use the cache management functions - * to ensure it is consistent. - */ -struct device; -extern void __dma_sync(void *vaddr, size_t size, int direction); -extern void __dma_sync_page(struct page *page, unsigned long offset, -size_t size, int direction); -extern unsigned long __dma_get_coherent_pfn(unsigned long cpu_addr); - -#else /* ! CONFIG_NOT_COHERENT_CACHE */ -/* - * Cache coherent cores. - */ - -#define __dma_sync(addr, size, rw) ((void)0) -#define __dma_sync_page(pg, off, sz, rw) ((void)0) - -#endif /* ! 
CONFIG_NOT_COHERENT_CACHE */ static inline unsigned long device_to_mask(struct device *dev) { diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c index 2b90a403cdac..b2e88075b2ea 100644 --- a/arch/powerpc/kernel/dma.c +++ b/arch/powerpc/kernel/dma.c @@ -36,12 +36,7 @@ static void *dma_nommu_alloc_coherent(struct device *dev, size_t size, * we can really use the direct ops */ if (dma_direct_supported(dev, dev->coherent_dma_mask)) -#ifdef CONFIG_NOT_COHERENT_CACHE - return __dma_nommu_alloc_coherent(dev, size, dma_handle, - flag, attrs); -#else return dma_direct_alloc(dev, size, dma_handle, flag, attrs); -#endif /* Ok we can't ... do we have an iommu ? If not, fail */ iommu = get_iommu_table_base(dev); @@ -62,12 +57,7 @@ static void dma_nommu_free_coherent(struct device *dev, size_t size, /* See comments in dma_nommu_alloc_coherent() */ if (dma_direct_supported(dev, dev->coherent_dma_mask)) -#ifdef CONFIG_NOT_COHERENT_CACHE - return __dma_nommu_free_coherent(dev, size, vaddr, dma_handle, - attrs); -#else return dma_direct_free(dev, size, vaddr, dma_handle, attrs); -#endif /* Maybe we used an iommu ... */ iommu = get_iommu_table_base(dev); @@ -84,14 +74,8 @@ int dma_nommu_mmap_coherent(struct device *dev, struct vm_area_struct *vma, void *cpu_addr, dma_addr_t handle, size_t size,
[PATCH 17/20] powerpc/dma-swiotlb: use generic swiotlb_dma_ops
These are identical to the arch specific ones, so remove them.

Signed-off-by: Christoph Hellwig
---
 arch/powerpc/include/asm/dma-direct.h |  4
 arch/powerpc/include/asm/swiotlb.h    |  2 --
 arch/powerpc/kernel/dma-swiotlb.c     | 28 ++-
 arch/powerpc/sysdev/fsl_pci.c         |  2 +-
 4 files changed, 7 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-direct.h b/arch/powerpc/include/asm/dma-direct.h
index 0fba19445ae8..657f84ddb20d 100644
--- a/arch/powerpc/include/asm/dma-direct.h
+++ b/arch/powerpc/include/asm/dma-direct.h
@@ -30,4 +30,8 @@ static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr)
 		return daddr - PCI_DRAM_OFFSET;
 	return daddr - dev->archdata.dma_offset;
 }
+
+u64 swiotlb_powerpc_get_required(struct device *dev);
+#define swiotlb_get_required_mask swiotlb_powerpc_get_required
+
 #endif /* ASM_POWERPC_DMA_DIRECT_H */
diff --git a/arch/powerpc/include/asm/swiotlb.h b/arch/powerpc/include/asm/swiotlb.h
index f65ecf57b66c..1d8c1da26ab3 100644
--- a/arch/powerpc/include/asm/swiotlb.h
+++ b/arch/powerpc/include/asm/swiotlb.h
@@ -13,8 +13,6 @@
 
 #include
 
-extern const struct dma_map_ops powerpc_swiotlb_dma_ops;
-
 extern unsigned int ppc_swiotlb_enable;
 
 int __init swiotlb_setup_bus_notifier(void);
diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c
index 25986fcd1e5e..0c269de61f39 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -24,7 +24,7 @@
 
 unsigned int ppc_swiotlb_enable;
 
-static u64 swiotlb_powerpc_get_required(struct device *dev)
+u64 swiotlb_powerpc_get_required(struct device *dev)
 {
 	u64 end, mask, max_direct_dma_addr = dev->archdata.max_direct_dma_addr;
 
@@ -38,30 +38,6 @@ static u64 swiotlb_powerpc_get_required(struct device *dev)
 	return mask;
 }
 
-/*
- * At the moment, all platforms that use this code only require
- * swiotlb to be used if we're operating on HIGHMEM.  Since
- * we don't ever call anything other than map_sg, unmap_sg,
- * map_page, and unmap_page on highmem, use normal dma_ops
- * for everything else.
- */
-const struct dma_map_ops powerpc_swiotlb_dma_ops = {
-	.alloc = dma_direct_alloc,
-	.free = dma_direct_free,
-	.mmap = dma_nommu_mmap_coherent,
-	.map_sg = swiotlb_map_sg_attrs,
-	.unmap_sg = swiotlb_unmap_sg_attrs,
-	.dma_supported = swiotlb_dma_supported,
-	.map_page = swiotlb_map_page,
-	.unmap_page = swiotlb_unmap_page,
-	.sync_single_for_cpu = swiotlb_sync_single_for_cpu,
-	.sync_single_for_device = swiotlb_sync_single_for_device,
-	.sync_sg_for_cpu = swiotlb_sync_sg_for_cpu,
-	.sync_sg_for_device = swiotlb_sync_sg_for_device,
-	.mapping_error = swiotlb_dma_mapping_error,
-	.get_required_mask = swiotlb_powerpc_get_required,
-};
-
 void pci_dma_dev_setup_swiotlb(struct pci_dev *pdev)
 {
 	struct pci_controller *hose;
@@ -88,7 +64,7 @@ static int ppc_swiotlb_bus_notify(struct notifier_block *nb,
 
 	/* May need to bounce if the device can't address all of DRAM */
 	if ((dma_get_mask(dev) + 1) < memblock_end_of_DRAM())
-		set_dma_ops(dev, &powerpc_swiotlb_dma_ops);
+		set_dma_ops(dev, &swiotlb_dma_ops);
 
 	return NOTIFY_DONE;
 }
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 918be816b097..daf44bc0108d 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -118,7 +118,7 @@ static void setup_swiotlb_ops(struct pci_controller *hose)
 {
 	if (ppc_swiotlb_enable) {
 		hose->controller_ops.dma_dev_setup = pci_dma_dev_setup_swiotlb;
-		set_pci_dma_ops(&powerpc_swiotlb_dma_ops);
+		set_pci_dma_ops(&swiotlb_dma_ops);
 	}
 }
 #else
-- 
2.18.0
[PATCH 15/20] powerpc/dma: remove the unused unmap_page and unmap_sg methods
These methods are optional to start with, no need to implement no-op versions. Signed-off-by: Christoph Hellwig --- arch/powerpc/kernel/dma.c | 16 1 file changed, 16 deletions(-) diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c index 511a4972560d..2cfc45acbb52 100644 --- a/arch/powerpc/kernel/dma.c +++ b/arch/powerpc/kernel/dma.c @@ -178,12 +178,6 @@ static int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl, return nents; } -static void dma_nommu_unmap_sg(struct device *dev, struct scatterlist *sg, - int nents, enum dma_data_direction direction, - unsigned long attrs) -{ -} - static u64 dma_nommu_get_required_mask(struct device *dev) { u64 end, mask; @@ -209,14 +203,6 @@ static inline dma_addr_t dma_nommu_map_page(struct device *dev, return phys_to_dma(dev, page_to_phys(page)) + offset; } -static inline void dma_nommu_unmap_page(struct device *dev, -dma_addr_t dma_address, -size_t size, -enum dma_data_direction direction, -unsigned long attrs) -{ -} - #ifdef CONFIG_NOT_COHERENT_CACHE static inline void dma_nommu_sync_sg(struct device *dev, struct scatterlist *sgl, int nents, @@ -242,10 +228,8 @@ const struct dma_map_ops dma_nommu_ops = { .free = dma_nommu_free_coherent, .mmap = dma_nommu_mmap_coherent, .map_sg = dma_nommu_map_sg, - .unmap_sg = dma_nommu_unmap_sg, .dma_supported = dma_direct_supported, .map_page = dma_nommu_map_page, - .unmap_page = dma_nommu_unmap_page, .get_required_mask = dma_nommu_get_required_mask, #ifdef CONFIG_NOT_COHERENT_CACHE .sync_single_for_cpu= dma_nommu_sync_single, -- 2.18.0
[PATCH 16/20] powerpc/dma: use dma_direct_{alloc,free}
These provide the same functionality as the existing helpers, but are
simpler, and also allow the (optional) use of CMA.

Note that the swiotlb code now calls into the dma_direct code directly,
given that it doesn't work with noncoherent caches at all, and isn't
called when we have an iommu either, so the iommu special case in
dma_nommu_alloc_coherent isn't required for swiotlb.

Signed-off-by: Christoph Hellwig
---
 arch/powerpc/include/asm/pgtable.h |  1 -
 arch/powerpc/kernel/dma-swiotlb.c  |  4 +-
 arch/powerpc/kernel/dma.c          | 78 --
 arch/powerpc/mm/mem.c              | 19
 4 files changed, 11 insertions(+), 91 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 14c79a7dc855..123de4958d2e 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -38,7 +38,6 @@ extern unsigned long empty_zero_page[];
 extern pgd_t swapper_pg_dir[];
 
 void limit_zone_pfn(enum zone_type zone, unsigned long max_pfn);
-int dma_pfn_limit_to_zone(u64 pfn_limit);
 extern void paging_init(void);
 
 /*
diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c
index f6e0701c5303..25986fcd1e5e 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -46,8 +46,8 @@ static u64 swiotlb_powerpc_get_required(struct device *dev)
  * for everything else.
  */
 const struct dma_map_ops powerpc_swiotlb_dma_ops = {
-	.alloc = __dma_nommu_alloc_coherent,
-	.free = __dma_nommu_free_coherent,
+	.alloc = dma_direct_alloc,
+	.free = dma_direct_free,
 	.mmap = dma_nommu_mmap_coherent,
 	.map_sg = swiotlb_map_sg_attrs,
 	.unmap_sg = swiotlb_unmap_sg_attrs,
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 2cfc45acbb52..2b90a403cdac 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -26,75 +26,6 @@
  * can set archdata.dma_data to an unsigned long holding the offset. By
  * default the offset is PCI_DRAM_OFFSET.
  */
-
-static u64 __maybe_unused get_pfn_limit(struct device *dev)
-{
-	u64 pfn = (dev->coherent_dma_mask >> PAGE_SHIFT) + 1;
-	struct dev_archdata __maybe_unused *sd = &dev->archdata;
-
-#ifdef CONFIG_SWIOTLB
-	if (sd->max_direct_dma_addr && dev->dma_ops == &powerpc_swiotlb_dma_ops)
-		pfn = min_t(u64, pfn, sd->max_direct_dma_addr >> PAGE_SHIFT);
-#endif
-
-	return pfn;
-}
-
-#ifndef CONFIG_NOT_COHERENT_CACHE
-void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
-				dma_addr_t *dma_handle, gfp_t flag,
-				unsigned long attrs)
-{
-	void *ret;
-	struct page *page;
-	int node = dev_to_node(dev);
-#ifdef CONFIG_FSL_SOC
-	u64 pfn = get_pfn_limit(dev);
-	int zone;
-
-	/*
-	 * This code should be OK on other platforms, but we have drivers that
-	 * don't set coherent_dma_mask. As a workaround we just ifdef it. This
-	 * whole routine needs some serious cleanup.
-	 */
-
-	zone = dma_pfn_limit_to_zone(pfn);
-	if (zone < 0) {
-		dev_err(dev, "%s: No suitable zone for pfn %#llx\n",
-			__func__, pfn);
-		return NULL;
-	}
-
-	switch (zone) {
-	case ZONE_DMA:
-		flag |= GFP_DMA;
-		break;
-#ifdef CONFIG_ZONE_DMA32
-	case ZONE_DMA32:
-		flag |= GFP_DMA32;
-		break;
-#endif
-	};
-#endif /* CONFIG_FSL_SOC */
-
-	page = alloc_pages_node(node, flag, get_order(size));
-	if (page == NULL)
-		return NULL;
-	ret = page_address(page);
-	memset(ret, 0, size);
-	*dma_handle = phys_to_dma(dev, __pa(ret));
-
-	return ret;
-}
-
-void __dma_nommu_free_coherent(struct device *dev, size_t size,
-				void *vaddr, dma_addr_t dma_handle,
-				unsigned long attrs)
-{
-	free_pages((unsigned long)vaddr, get_order(size));
-}
-#endif /* !CONFIG_NOT_COHERENT_CACHE */
-
 static void *dma_nommu_alloc_coherent(struct device *dev, size_t size,
 				dma_addr_t *dma_handle, gfp_t flag,
 				unsigned long attrs)
@@ -105,8 +36,12 @@ static void *dma_nommu_alloc_coherent(struct device *dev, size_t size,
 	 * we can really use the direct ops
 	 */
 	if (dma_direct_supported(dev, dev->coherent_dma_mask))
+#ifdef CONFIG_NOT_COHERENT_CACHE
 		return __dma_nommu_alloc_coherent(dev, size, dma_handle,
 						  flag, attrs);
+#else
+		return dma_direct_alloc(dev, size, dma_handle, flag, attrs);
+#endif
 
 	/* Ok we can't ... do we have an iommu ? If not, fail */
 	iommu = get_iommu_table_base(dev);
@@ -127,8 +62,13 @@ static void
[PATCH 14/20] powerpc/dma: replace dma_nommu_dma_supported with dma_direct_supported
The ppc32 case of dma_nommu_dma_supported already was a no-op, and the 64-bit case came to the same conclusion as dma_direct_supported, so replace it with the generic version. Signed-off-by: Christoph Hellwig --- arch/powerpc/Kconfig | 1 + arch/powerpc/kernel/dma.c | 28 +++- 2 files changed, 4 insertions(+), 25 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index f9cae7edd735..bbfa6a8df4da 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -158,6 +158,7 @@ config PPC select CLONE_BACKWARDS select DCACHE_WORD_ACCESS if PPC64 && CPU_LITTLE_ENDIAN select DYNAMIC_FTRACE if FUNCTION_TRACER + select DMA_DIRECT_OPS select EDAC_ATOMIC_SCRUB select EDAC_SUPPORT select GENERIC_ATOMIC64 if PPC32 diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c index 3487de83bb37..511a4972560d 100644 --- a/arch/powerpc/kernel/dma.c +++ b/arch/powerpc/kernel/dma.c @@ -40,28 +40,6 @@ static u64 __maybe_unused get_pfn_limit(struct device *dev) return pfn; } -static int dma_nommu_dma_supported(struct device *dev, u64 mask) -{ -#ifdef CONFIG_PPC64 - u64 limit = phys_to_dma(dev, (memblock_end_of_DRAM() - 1)); - - /* Limit fits in the mask, we are good */ - if (mask >= limit) - return 1; - -#ifdef CONFIG_FSL_SOC - /* Freescale gets another chance via ZONE_DMA/ZONE_DMA32, however -* that will have to be refined if/when they support iommus -*/ - return 1; -#endif - /* Sorry ... 
*/ - return 0; -#else - return 1; -#endif -} - #ifndef CONFIG_NOT_COHERENT_CACHE void *__dma_nommu_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t flag, @@ -126,7 +104,7 @@ static void *dma_nommu_alloc_coherent(struct device *dev, size_t size, /* The coherent mask may be smaller than the real mask, check if * we can really use the direct ops */ - if (dma_nommu_dma_supported(dev, dev->coherent_dma_mask)) + if (dma_direct_supported(dev, dev->coherent_dma_mask)) return __dma_nommu_alloc_coherent(dev, size, dma_handle, flag, attrs); @@ -148,7 +126,7 @@ static void dma_nommu_free_coherent(struct device *dev, size_t size, struct iommu_table *iommu; /* See comments in dma_nommu_alloc_coherent() */ - if (dma_nommu_dma_supported(dev, dev->coherent_dma_mask)) + if (dma_direct_supported(dev, dev->coherent_dma_mask)) return __dma_nommu_free_coherent(dev, size, vaddr, dma_handle, attrs); /* Maybe we used an iommu ... */ @@ -265,7 +243,7 @@ const struct dma_map_ops dma_nommu_ops = { .mmap = dma_nommu_mmap_coherent, .map_sg = dma_nommu_map_sg, .unmap_sg = dma_nommu_unmap_sg, - .dma_supported = dma_nommu_dma_supported, + .dma_supported = dma_direct_supported, .map_page = dma_nommu_map_page, .unmap_page = dma_nommu_unmap_page, .get_required_mask = dma_nommu_get_required_mask, -- 2.18.0
[PATCH 10/20] powerpc/dma-noncoherent: don't disable irqs over kmap_atomic
The requirement to disable local irqs over kmap_atomic is long gone, so remove those calls. Signed-off-by: Christoph Hellwig --- arch/powerpc/mm/dma-noncoherent.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c index 382528475433..d1c16456abac 100644 --- a/arch/powerpc/mm/dma-noncoherent.c +++ b/arch/powerpc/mm/dma-noncoherent.c @@ -357,12 +357,10 @@ static inline void __dma_sync_page_highmem(struct page *page, { size_t seg_size = min((size_t)(PAGE_SIZE - offset), size); size_t cur_size = seg_size; - unsigned long flags, start, seg_offset = offset; + unsigned long start, seg_offset = offset; int nr_segs = 1 + ((size - seg_size) + PAGE_SIZE - 1)/PAGE_SIZE; int seg_nr = 0; - local_irq_save(flags); - do { start = (unsigned long)kmap_atomic(page + seg_nr) + seg_offset; @@ -378,8 +376,6 @@ static inline void __dma_sync_page_highmem(struct page *page, cur_size += seg_size; seg_offset = 0; } while (seg_nr < nr_segs); - - local_irq_restore(flags); } #endif /* CONFIG_HIGHMEM */ -- 2.18.0
[PATCH 13/20] powerpc/dma: remove get_dma_offset
Just fold the calculation into __phys_to_dma/__dma_to_phys as those are the only places that should know about it. Signed-off-by: Christoph Hellwig --- arch/powerpc/include/asm/dma-direct.h | 8 ++-- arch/powerpc/include/asm/dma-mapping.h | 16 2 files changed, 6 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/include/asm/dma-direct.h b/arch/powerpc/include/asm/dma-direct.h index 7702875aabb7..0fba19445ae8 100644 --- a/arch/powerpc/include/asm/dma-direct.h +++ b/arch/powerpc/include/asm/dma-direct.h @@ -19,11 +19,15 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size) static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr) { - return paddr + get_dma_offset(dev); + if (!dev) + return paddr + PCI_DRAM_OFFSET; + return paddr + dev->archdata.dma_offset; } static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr) { - return daddr - get_dma_offset(dev); + if (!dev) + return daddr - PCI_DRAM_OFFSET; + return daddr - dev->archdata.dma_offset; } #endif /* ASM_POWERPC_DMA_DIRECT_H */ diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h index dacd0f93f2b2..f0bf7ac2686c 100644 --- a/arch/powerpc/include/asm/dma-mapping.h +++ b/arch/powerpc/include/asm/dma-mapping.h @@ -80,22 +80,6 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus) return NULL; } -/* - * get_dma_offset() - * - * Get the dma offset on configurations where the dma address can be determined - * from the physical address by looking at a simple offset. Direct dma and - * swiotlb use this function, but it is typically not used by implementations - * with an iommu. - */ -static inline dma_addr_t get_dma_offset(struct device *dev) -{ - if (dev) - return dev->archdata.dma_offset; - - return PCI_DRAM_OFFSET; -} - static inline void set_dma_offset(struct device *dev, dma_addr_t off) { if (dev) -- 2.18.0
[PATCH 12/20] powerpc/dma: use phys_to_dma instead of get_dma_offset
Use the standard portable helper instead of the powerpc specific one, which is about to go away. Signed-off-by: Christoph Hellwig --- arch/powerpc/kernel/dma-swiotlb.c | 5 ++--- arch/powerpc/kernel/dma.c | 12 ++-- 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c index 88f3963ca30f..f6e0701c5303 100644 --- a/arch/powerpc/kernel/dma-swiotlb.c +++ b/arch/powerpc/kernel/dma-swiotlb.c @@ -11,7 +11,7 @@ * */ -#include +#include #include #include #include @@ -31,9 +31,8 @@ static u64 swiotlb_powerpc_get_required(struct device *dev) end = memblock_end_of_DRAM(); if (max_direct_dma_addr && end > max_direct_dma_addr) end = max_direct_dma_addr; - end += get_dma_offset(dev); - mask = 1ULL << (fls64(end) - 1); + mask = 1ULL << (fls64(phys_to_dma(dev, end)) - 1); mask += mask - 1; return mask; diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c index eceaa92e6986..3487de83bb37 100644 --- a/arch/powerpc/kernel/dma.c +++ b/arch/powerpc/kernel/dma.c @@ -6,7 +6,7 @@ */ #include -#include +#include #include #include #include @@ -43,7 +43,7 @@ static u64 __maybe_unused get_pfn_limit(struct device *dev) static int dma_nommu_dma_supported(struct device *dev, u64 mask) { #ifdef CONFIG_PPC64 - u64 limit = get_dma_offset(dev) + (memblock_end_of_DRAM() - 1); + u64 limit = phys_to_dma(dev, (memblock_end_of_DRAM() - 1)); /* Limit fits in the mask, we are good */ if (mask >= limit) @@ -104,7 +104,7 @@ void *__dma_nommu_alloc_coherent(struct device *dev, size_t size, return NULL; ret = page_address(page); memset(ret, 0, size); - *dma_handle = __pa(ret) + get_dma_offset(dev); + *dma_handle = phys_to_dma(dev,__pa(ret)); return ret; } @@ -188,7 +188,7 @@ static int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl, int i; for_each_sg(sgl, sg, nents, i) { - sg->dma_address = sg_phys(sg) + get_dma_offset(dev); + sg->dma_address = phys_to_dma(dev, sg_phys(sg)); sg->dma_length = sg->length; if 
(attrs & DMA_ATTR_SKIP_CPU_SYNC) @@ -210,7 +210,7 @@ static u64 dma_nommu_get_required_mask(struct device *dev) { u64 end, mask; - end = memblock_end_of_DRAM() + get_dma_offset(dev); + end = phys_to_dma(dev, memblock_end_of_DRAM()); mask = 1ULL << (fls64(end) - 1); mask += mask - 1; @@ -228,7 +228,7 @@ static inline dma_addr_t dma_nommu_map_page(struct device *dev, if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) __dma_sync_page(page, offset, size, dir); - return page_to_phys(page) + offset + get_dma_offset(dev); + return phys_to_dma(dev, page_to_phys(page)) + offset; } static inline void dma_nommu_unmap_page(struct device *dev, -- 2.18.0
[PATCH 11/20] powerpc/dma: split the two __dma_alloc_coherent implementations
The implementation for the CONFIG_NOT_COHERENT_CACHE case doesn't share
any code with the one for systems with coherent caches. Split it off
and merge it with the helpers in dma-noncoherent.c that have no other
callers.

Signed-off-by: Christoph Hellwig
---
 arch/powerpc/include/asm/dma-mapping.h |  5 -
 arch/powerpc/kernel/dma.c              | 14 ++
 arch/powerpc/mm/dma-noncoherent.c      | 15 +++
 arch/powerpc/platforms/44x/warp.c      |  2 +-
 4 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index f2a4a7142b1e..dacd0f93f2b2 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -39,9 +39,6 @@ extern int dma_nommu_mmap_coherent(struct device *dev,
  * to ensure it is consistent.
  */
 struct device;
-extern void *__dma_alloc_coherent(struct device *dev, size_t size,
-				  dma_addr_t *handle, gfp_t gfp);
-extern void __dma_free_coherent(size_t size, void *vaddr);
 extern void __dma_sync(void *vaddr, size_t size, int direction);
 extern void __dma_sync_page(struct page *page, unsigned long offset,
 			    size_t size, int direction);
@@ -52,8 +49,6 @@ extern unsigned long __dma_get_coherent_pfn(unsigned long cpu_addr);
 * Cache coherent cores.
*/ -#define __dma_alloc_coherent(dev, gfp, size, handle) NULL -#define __dma_free_coherent(size, addr)((void)0) #define __dma_sync(addr, size, rw) ((void)0) #define __dma_sync_page(pg, off, sz, rw) ((void)0) diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c index 3939589aab04..eceaa92e6986 100644 --- a/arch/powerpc/kernel/dma.c +++ b/arch/powerpc/kernel/dma.c @@ -62,18 +62,12 @@ static int dma_nommu_dma_supported(struct device *dev, u64 mask) #endif } +#ifndef CONFIG_NOT_COHERENT_CACHE void *__dma_nommu_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t flag, unsigned long attrs) { void *ret; -#ifdef CONFIG_NOT_COHERENT_CACHE - ret = __dma_alloc_coherent(dev, size, dma_handle, flag); - if (ret == NULL) - return NULL; - *dma_handle += get_dma_offset(dev); - return ret; -#else struct page *page; int node = dev_to_node(dev); #ifdef CONFIG_FSL_SOC @@ -113,19 +107,15 @@ void *__dma_nommu_alloc_coherent(struct device *dev, size_t size, *dma_handle = __pa(ret) + get_dma_offset(dev); return ret; -#endif } void __dma_nommu_free_coherent(struct device *dev, size_t size, void *vaddr, dma_addr_t dma_handle, unsigned long attrs) { -#ifdef CONFIG_NOT_COHERENT_CACHE - __dma_free_coherent(size, vaddr); -#else free_pages((unsigned long)vaddr, get_order(size)); -#endif } +#endif /* !CONFIG_NOT_COHERENT_CACHE */ static void *dma_nommu_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t flag, diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c index d1c16456abac..cfc48a253707 100644 --- a/arch/powerpc/mm/dma-noncoherent.c +++ b/arch/powerpc/mm/dma-noncoherent.c @@ -29,7 +29,7 @@ #include #include #include -#include +#include #include #include @@ -151,8 +151,8 @@ static struct ppc_vm_region *ppc_vm_region_find(struct ppc_vm_region *head, unsi * Allocate DMA-coherent memory space and return both the kernel remapped * virtual and bus address for that space. 
*/ -void * -__dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp) +void *__dma_nommu_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs) { struct page *page; struct ppc_vm_region *c; @@ -223,7 +223,7 @@ __dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t /* * Set the "dma handle" */ - *handle = page_to_phys(page); + *dma_handle = phys_to_dma(dev, page_to_phys(page)); do { SetPageReserved(page); @@ -249,12 +249,12 @@ __dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t no_page: return NULL; } -EXPORT_SYMBOL(__dma_alloc_coherent); /* * free a page as defined by the above mapping. */ -void __dma_free_coherent(size_t size, void *vaddr) +void __dma_nommu_free_coherent(struct device *dev, size_t size, void *vaddr, + dma_addr_t dma_handle, unsigned long attrs) { struct ppc_vm_region *c; unsigned long flags, addr; @@ -309,7 +309,6 @@ void __dma_free_coherent(size_t size, void *vaddr) __func__, vaddr); dump_stack(); }
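The split above follows a standard kernel pattern: one prototype serves two mutually exclusive implementations, selected at compile time by a config symbol. A minimal sketch of that pattern — the function name and the macro are invented for illustration, this is not the kernel code:

```c
#include <string.h>

/* Stand-in for a Kconfig symbol such as CONFIG_NOT_COHERENT_CACHE;
 * left undefined here, so the coherent variant is compiled. */
/* #define NOT_COHERENT_CACHE 1 */

/* One prototype shared by both implementations. */
const char *alloc_strategy(void);

#ifdef NOT_COHERENT_CACHE
/* Non-coherent variant: would remap pages into an uncached region. */
const char *alloc_strategy(void) { return "remap-uncached"; }
#else
/* Coherent variant: a plain page allocation is sufficient. */
const char *alloc_strategy(void) { return "plain-pages"; }
#endif
```

Because only one body is ever compiled, callers link against a single symbol and no runtime dispatch is needed.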
[PATCH 09/20] powerpc/dma: remove the unused ISA_DMA_THRESHOLD export
Signed-off-by: Christoph Hellwig --- arch/powerpc/kernel/setup_32.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c index 74457485574b..3c2d093f74c7 100644 --- a/arch/powerpc/kernel/setup_32.c +++ b/arch/powerpc/kernel/setup_32.c @@ -55,7 +55,6 @@ unsigned long ISA_DMA_THRESHOLD; unsigned int DMA_MODE_READ; unsigned int DMA_MODE_WRITE; -EXPORT_SYMBOL(ISA_DMA_THRESHOLD); EXPORT_SYMBOL(DMA_MODE_READ); EXPORT_SYMBOL(DMA_MODE_WRITE); -- 2.18.0
[PATCH 08/20] powerpc/dma: remove the unused dma_nommu_ops export
Signed-off-by: Christoph Hellwig --- arch/powerpc/kernel/dma.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c index dbfc7056d7df..3939589aab04 100644 --- a/arch/powerpc/kernel/dma.c +++ b/arch/powerpc/kernel/dma.c @@ -286,7 +286,6 @@ const struct dma_map_ops dma_nommu_ops = { .sync_sg_for_device = dma_nommu_sync_sg, #endif }; -EXPORT_SYMBOL(dma_nommu_ops); int dma_set_coherent_mask(struct device *dev, u64 mask) { -- 2.18.0
[PATCH 07/20] powerpc/dma: remove the unused ARCH_HAS_DMA_MMAP_COHERENT define
Signed-off-by: Christoph Hellwig --- arch/powerpc/include/asm/dma-mapping.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h index 8fa394520af6..f2a4a7142b1e 100644 --- a/arch/powerpc/include/asm/dma-mapping.h +++ b/arch/powerpc/include/asm/dma-mapping.h @@ -112,7 +112,5 @@ extern int dma_set_mask(struct device *dev, u64 dma_mask); extern u64 __dma_get_required_mask(struct device *dev); -#define ARCH_HAS_DMA_MMAP_COHERENT - #endif /* __KERNEL__ */ #endif /* _ASM_DMA_MAPPING_H */ -- 2.18.0
[PATCH 06/20] dma-noncoherent: add an optional arch hook for ->get_required_mask
This is needed for powerpc for now. Hopefully we can come up with a clean generic implementation mid-term. Signed-off-by: Christoph Hellwig --- include/linux/dma-noncoherent.h | 6 ++ kernel/dma/Kconfig | 4 kernel/dma/noncoherent.c| 1 + 3 files changed, 11 insertions(+) diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h index 10b2654d549b..61394c6e56df 100644 --- a/include/linux/dma-noncoherent.h +++ b/include/linux/dma-noncoherent.h @@ -17,6 +17,12 @@ int arch_dma_mmap(struct device *dev, struct vm_area_struct *vma, #define arch_dma_mmap NULL #endif /* CONFIG_DMA_NONCOHERENT_MMAP */ +#ifdef CONFIG_DMA_NONCOHERENT_GET_REQUIRED +u64 arch_get_required_mask(struct device *dev); +#else +#define arch_get_required_mask NULL +#endif + #ifdef CONFIG_DMA_NONCOHERENT_CACHE_SYNC void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size, enum dma_data_direction direction); diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig index 9bd54304446f..b523104d6199 100644 --- a/kernel/dma/Kconfig +++ b/kernel/dma/Kconfig @@ -36,6 +36,10 @@ config DMA_NONCOHERENT_MMAP bool depends on DMA_NONCOHERENT_OPS +config DMA_NONCOHERENT_GET_REQUIRED + bool + depends on DMA_NONCOHERENT_OPS + config DMA_NONCOHERENT_CACHE_SYNC bool depends on DMA_NONCOHERENT_OPS diff --git a/kernel/dma/noncoherent.c b/kernel/dma/noncoherent.c index 79e9a757387f..cf4ffbe4a09d 100644 --- a/kernel/dma/noncoherent.c +++ b/kernel/dma/noncoherent.c @@ -98,5 +98,6 @@ const struct dma_map_ops dma_noncoherent_ops = { .dma_supported = dma_direct_supported, .mapping_error = dma_direct_mapping_error, .cache_sync = arch_dma_cache_sync, + .get_required_mask = arch_get_required_mask, }; EXPORT_SYMBOL(dma_noncoherent_ops); -- 2.18.0
[PATCH 05/20] swiotlb: allow the architecture to provide a get_required_mask hook
For now this allows consolidating the powerpc code. In the long run we should grow a generic implementation of dma_get_required_mask that returns the dma mask required to avoid bounce buffering. Signed-off-by: Christoph Hellwig --- kernel/dma/swiotlb.c | 4 1 file changed, 4 insertions(+) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 904541055792..1bb420244753 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -1084,5 +1084,9 @@ const struct dma_map_ops swiotlb_dma_ops = { .map_page = swiotlb_map_page, .unmap_page = swiotlb_unmap_page, .dma_supported = dma_direct_supported, +#ifdef swiotlb_get_required_mask + .get_required_mask = swiotlb_get_required_mask, +#endif + }; EXPORT_SYMBOL(swiotlb_dma_ops); -- 2.18.0
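The `#ifdef swiotlb_get_required_mask` guard works because any member omitted from a designated initializer is zero-initialized, so the core code can test the function pointer for NULL. A hedged sketch of that pattern with invented names:

```c
#include <stddef.h>

struct demo_ops {
    int (*get_required_mask)(void); /* optional; NULL means no override */
    int (*dma_supported)(void);
};

static int demo_supported(void) { return 1; }

/* Define this macro (to an arch-provided function) to simulate an
 * architecture that supplies the hook; left undefined here. */
/* #define demo_get_required_mask some_arch_fn */

static const struct demo_ops ops = {
#ifdef demo_get_required_mask
    .get_required_mask = demo_get_required_mask,
#endif
    .dma_supported = demo_supported,
};

/* Caller checks for NULL and falls back to a generic default (32 here,
 * an arbitrary placeholder value). */
static int required_mask_bits(void)
{
    return ops.get_required_mask ? ops.get_required_mask() : 32;
}
```

This keeps the ops table definition in one place while letting each architecture opt in purely at compile time.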
[PATCH 04/20] ia64: remove get_required_mask implementation
ia64 can use the generic implementation in general, and SN2 can just override it in the dma_map_ops now. Signed-off-by: Christoph Hellwig --- arch/ia64/include/asm/dma-mapping.h | 2 -- arch/ia64/include/asm/machvec.h | 7 --- arch/ia64/include/asm/machvec_init.h | 1 - arch/ia64/include/asm/machvec_sn2.h | 2 -- arch/ia64/pci/pci.c | 26 -- arch/ia64/sn/pci/pci_dma.c | 4 ++-- 6 files changed, 2 insertions(+), 40 deletions(-) diff --git a/arch/ia64/include/asm/dma-mapping.h b/arch/ia64/include/asm/dma-mapping.h index 76e4d6632d68..522745ae67bb 100644 --- a/arch/ia64/include/asm/dma-mapping.h +++ b/arch/ia64/include/asm/dma-mapping.h @@ -10,8 +10,6 @@ #include #include -#define ARCH_HAS_DMA_GET_REQUIRED_MASK - extern const struct dma_map_ops *dma_ops; extern struct ia64_machine_vector ia64_mv; extern void set_iommu_machvec(void); diff --git a/arch/ia64/include/asm/machvec.h b/arch/ia64/include/asm/machvec.h index 267f4f170191..5133739966bc 100644 --- a/arch/ia64/include/asm/machvec.h +++ b/arch/ia64/include/asm/machvec.h @@ -44,7 +44,6 @@ typedef void ia64_mv_kernel_launch_event_t(void); /* DMA-mapping interface: */ typedef void ia64_mv_dma_init (void); -typedef u64 ia64_mv_dma_get_required_mask (struct device *); typedef const struct dma_map_ops *ia64_mv_dma_get_ops(struct device *); /* @@ -127,7 +126,6 @@ extern void machvec_tlb_migrate_finish (struct mm_struct *); # define platform_global_tlb_purgeia64_mv.global_tlb_purge # define platform_tlb_migrate_finish ia64_mv.tlb_migrate_finish # define platform_dma_initia64_mv.dma_init -# define platform_dma_get_required_mask ia64_mv.dma_get_required_mask # define platform_dma_get_ops ia64_mv.dma_get_ops # define platform_irq_to_vector ia64_mv.irq_to_vector # define platform_local_vector_to_irq ia64_mv.local_vector_to_irq @@ -171,7 +169,6 @@ struct ia64_machine_vector { ia64_mv_global_tlb_purge_t *global_tlb_purge; ia64_mv_tlb_migrate_finish_t *tlb_migrate_finish; ia64_mv_dma_init *dma_init; - ia64_mv_dma_get_required_mask 
*dma_get_required_mask; ia64_mv_dma_get_ops *dma_get_ops; ia64_mv_irq_to_vector *irq_to_vector; ia64_mv_local_vector_to_irq *local_vector_to_irq; @@ -211,7 +208,6 @@ struct ia64_machine_vector { platform_global_tlb_purge, \ platform_tlb_migrate_finish,\ platform_dma_init, \ - platform_dma_get_required_mask, \ platform_dma_get_ops, \ platform_irq_to_vector, \ platform_local_vector_to_irq, \ @@ -286,9 +282,6 @@ extern const struct dma_map_ops *dma_get_ops(struct device *); #ifndef platform_dma_get_ops # define platform_dma_get_ops dma_get_ops #endif -#ifndef platform_dma_get_required_mask -# define platform_dma_get_required_mask ia64_dma_get_required_mask -#endif #ifndef platform_irq_to_vector # define platform_irq_to_vector__ia64_irq_to_vector #endif diff --git a/arch/ia64/include/asm/machvec_init.h b/arch/ia64/include/asm/machvec_init.h index 2b32fd06b7c6..2aafb69a3787 100644 --- a/arch/ia64/include/asm/machvec_init.h +++ b/arch/ia64/include/asm/machvec_init.h @@ -4,7 +4,6 @@ extern ia64_mv_send_ipi_t ia64_send_ipi; extern ia64_mv_global_tlb_purge_t ia64_global_tlb_purge; -extern ia64_mv_dma_get_required_mask ia64_dma_get_required_mask; extern ia64_mv_irq_to_vector __ia64_irq_to_vector; extern ia64_mv_local_vector_to_irq __ia64_local_vector_to_irq; extern ia64_mv_pci_get_legacy_mem_t ia64_pci_get_legacy_mem; diff --git a/arch/ia64/include/asm/machvec_sn2.h b/arch/ia64/include/asm/machvec_sn2.h index ece9fa85be88..b5153d300289 100644 --- a/arch/ia64/include/asm/machvec_sn2.h +++ b/arch/ia64/include/asm/machvec_sn2.h @@ -55,7 +55,6 @@ extern ia64_mv_readb_t __sn_readb_relaxed; extern ia64_mv_readw_t __sn_readw_relaxed; extern ia64_mv_readl_t __sn_readl_relaxed; extern ia64_mv_readq_t __sn_readq_relaxed; -extern ia64_mv_dma_get_required_mask sn_dma_get_required_mask; extern ia64_mv_dma_initsn_dma_init; extern ia64_mv_migrate_t sn_migrate; extern ia64_mv_kernel_launch_event_t sn_kernel_launch_event; @@ -100,7 +99,6 @@ extern ia64_mv_pci_fixup_bus_t sn_pci_fixup_bus; 
#define platform_pci_get_legacy_memsn_pci_get_legacy_mem #define platform_pci_legacy_read sn_pci_legacy_read #define platform_pci_legacy_write sn_pci_legacy_write -#define platform_dma_get_required_mask sn_dma_get_required_mask #define platform_dma_init sn_dma_init #define platform_migrate sn_migrate #define platform_kernel_launch_eventsn_kernel_launch_event diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c index 7ccc64d5fe3e..5d71800df431 100644 --- a/arch/ia64/pci/pci.c +++
[PATCH 03/20] dma-mapping: make the get_required_mask method available unconditionally
This saves some duplication for ia64. In the long run this method will need some additional work including moving over to kernel/dma, but that will require some additional prep work, so let's do this minimal change for now. Signed-off-by: Christoph Hellwig --- drivers/base/platform.c | 11 ++- include/linux/dma-mapping.h | 2 -- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/base/platform.c b/drivers/base/platform.c index dff82a3c2caa..921ddb0c051b 100644 --- a/drivers/base/platform.c +++ b/drivers/base/platform.c @@ -1180,7 +1180,7 @@ int __init platform_bus_init(void) } #ifndef ARCH_HAS_DMA_GET_REQUIRED_MASK -u64 dma_get_required_mask(struct device *dev) +static u64 default_dma_get_required_mask(struct device *dev) { u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT); u32 high_totalram = ((max_pfn - 1) >> (32 - PAGE_SHIFT)); @@ -1198,6 +1198,15 @@ u64 dma_get_required_mask(struct device *dev) } return mask; } + +u64 dma_get_required_mask(struct device *dev) +{ + const struct dma_map_ops *ops = get_dma_ops(dev); + + if (ops->get_required_mask) + return ops->get_required_mask(dev); + return default_dma_get_required_mask(dev); +} EXPORT_SYMBOL_GPL(dma_get_required_mask); #endif diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index f9cc309507d9..00d3065e1510 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -130,9 +130,7 @@ struct dma_map_ops { enum dma_data_direction direction); int (*mapping_error)(struct device *dev, dma_addr_t dma_addr); int (*dma_supported)(struct device *dev, u64 mask); -#ifdef ARCH_HAS_DMA_GET_REQUIRED_MASK u64 (*get_required_mask)(struct device *dev); -#endif }; extern const struct dma_map_ops dma_direct_ops; -- 2.18.0
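The new dma_get_required_mask() body is a plain method-table dispatch with a fallback. A self-contained model of that shape — types and mask values are invented for illustration, and the real default is computed from max_pfn rather than hard-coded:

```c
typedef unsigned long long u64;

struct demo_ops {
    u64 (*get_required_mask)(void); /* may be NULL */
};

/* Stand-in for the generic totalram-based default (32-bit mask). */
static u64 default_required_mask(void) { return 0xffffffffULL; }

/* Stand-in for an arch override reporting a 36-bit requirement. */
static u64 arch_required_mask(void) { return 0xfffffffffULL; }

/* Dispatch: prefer the ops hook, fall back to the generic default. */
static u64 get_required_mask(const struct demo_ops *ops)
{
    if (ops->get_required_mask)
        return ops->get_required_mask();
    return default_required_mask();
}

static u64 demo_with_hook(void)
{
    struct demo_ops ops = { arch_required_mask };
    return get_required_mask(&ops);
}

static u64 demo_without_hook(void)
{
    struct demo_ops ops = { 0 };
    return get_required_mask(&ops);
}
```

Making the pointer unconditional in the struct (instead of `#ifdef`-guarded) is what allows this one generic dispatcher to serve every architecture.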
[PATCH 01/20] kernel/dma/direct: take DMA offset into account in dma_direct_supported
When a device has a DMA offset the dma capable result will change due to the difference between the physical and DMA address. Take that into account. Signed-off-by: Christoph Hellwig --- kernel/dma/direct.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 8be8106270c2..d32d4f0d2c0c 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -167,7 +167,7 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents, int dma_direct_supported(struct device *dev, u64 mask) { #ifdef CONFIG_ZONE_DMA - if (mask < DMA_BIT_MASK(ARCH_ZONE_DMA_BITS)) + if (mask < phys_to_dma(dev, DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))) return 0; #else /* @@ -176,14 +176,14 @@ int dma_direct_supported(struct device *dev, u64 mask) * memory, or by providing a ZONE_DMA32. If neither is the case, the * architecture needs to use an IOMMU instead of the direct mapping. */ - if (mask < DMA_BIT_MASK(32)) + if (mask < phys_to_dma(dev, DMA_BIT_MASK(32))) return 0; #endif /* * Various PCI/PCIe bridges have broken support for > 32bit DMA even * if the device itself might support it. */ - if (dev->dma_32bit_limit && mask > DMA_BIT_MASK(32)) + if (dev->dma_32bit_limit && mask > phys_to_dma(dev, DMA_BIT_MASK(32))) return 0; return 1; } -- 2.18.0
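Each check changes only on the right-hand side: the physical limit is translated to a bus address first, so a constant bus offset shifts the comparison point. A toy model of the idea — the offset field, its direction, and all values are assumptions for illustration, not the kernel's phys_to_dma():

```c
typedef unsigned long long u64;

struct fake_dev {
    u64 dma_offset; /* modeled as: bus address = physical address + offset */
};

/* Minimal model of phys_to_dma() for a constant-offset bus. */
static u64 fake_phys_to_dma(const struct fake_dev *dev, u64 paddr)
{
    return paddr + dev->dma_offset;
}

/* Offset-aware form of the 32-bit check from the patch: the mask must
 * cover the bus address of the 4 GiB physical boundary. */
static int supports_32bit(const struct fake_dev *dev, u64 mask)
{
    return mask >= fake_phys_to_dma(dev, 0xffffffffULL);
}

static int demo_no_offset(void)
{
    struct fake_dev d = { 0 };
    return supports_32bit(&d, 0xffffffffULL);
}

static int demo_with_offset(void)
{
    struct fake_dev d = { 0x100000000ULL }; /* 4 GiB bus offset */
    return supports_32bit(&d, 0xffffffffULL);
}
```

With the offset in place, a device advertising a bare 32-bit mask no longer passes, which is exactly the behavioral change the commit message describes.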
[PATCH 02/20] kernel/dma/direct: refine dma_direct_alloc zone selection
We need to take the DMA offset and encryption bit into account when selecting a zone. Add a helper that takes those into account and use it. Signed-off-by: Christoph Hellwig --- kernel/dma/direct.c | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index d32d4f0d2c0c..c2c1df8827f2 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -58,6 +58,14 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size) return addr + size - 1 <= dev->coherent_dma_mask; } +static bool dma_coherent_below(struct device *dev, u64 mask) +{ + dma_addr_t addr = force_dma_unencrypted() ? + __phys_to_dma(dev, mask) : phys_to_dma(dev, mask); + + return dev->coherent_dma_mask <= addr; +} + void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs) { @@ -70,9 +78,9 @@ void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp &= ~__GFP_ZERO; /* GFP_DMA32 and GFP_DMA are no ops without the corresponding zones: */ - if (dev->coherent_dma_mask <= DMA_BIT_MASK(ARCH_ZONE_DMA_BITS)) + if (dma_coherent_below(dev, DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))) gfp |= GFP_DMA; - if (dev->coherent_dma_mask <= DMA_BIT_MASK(32) && !(gfp & GFP_DMA)) + if (dma_coherent_below(dev, DMA_BIT_MASK(32)) && !(gfp & GFP_DMA)) gfp |= GFP_DMA32; again: @@ -92,14 +100,14 @@ void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, page = NULL; if (IS_ENABLED(CONFIG_ZONE_DMA32) && - dev->coherent_dma_mask < DMA_BIT_MASK(64) && + dma_coherent_below(dev, DMA_BIT_MASK(64)) && !(gfp & (GFP_DMA32 | GFP_DMA))) { gfp |= GFP_DMA32; goto again; } if (IS_ENABLED(CONFIG_ZONE_DMA) && - dev->coherent_dma_mask < DMA_BIT_MASK(32) && + dma_coherent_below(dev, DMA_BIT_MASK(32)) && !(gfp & GFP_DMA)) { gfp = (gfp & ~GFP_DMA32) | GFP_DMA; goto again; -- 2.18.0
use generic DMA mapping code in powerpc
Hi all, this series switches the powerpc port to use the generic swiotlb and noncoherent dma ops, and to use more generic code for the coherent direct mapping, as well as removing dead code.
Re: [PATCH v2 04/10] powerpc/traps: Use REG_FMT in show_signal_msg()
Murilo Opsfelder Araujo wrote: Hi, Christophe. On Fri, Jul 27, 2018 at 06:40:23PM +0200, LEROY Christophe wrote: Murilo Opsfelder Araujo wrote: > Simplify the message format by using REG_FMT as the register format. This > avoids having two different formats and avoids checking for MSR_64BIT. Are you sure it is what we want ? Yes. Won't it change the behaviour for a 32 bits app running on a 64bits kernel ? In fact, this changes how many zeroes are prefixed when displaying the registers (%016lx vs. %08lx format). For example, 32-bits userspace, 64-bits kernel: Indeed that's what I suspected. What is the real benefit of this change ? Why not keep the current format for 32bits userspace ? All those leading zeroes are pointless to me. before this series: [66475.002900] segv[4599]: unhandled signal 11 at nip 1420 lr 0fe61854 code 1 after this series: [ 73.414535] segv[3759]: segfault (11) at nip 1420 lr 0fe61854 code 1 in segv[1000+1] [ 73.414641] segv[3759]: code: 4e800421 80010014 38210010 7c0803a6 4b30 9421ffd0 93e1002c 7c3f0b78 [ 73.414665] segv[3759]: code: 3920 913f001c 813f001c 3941 <9149> 3920 7d234b78 397f0030 Have you spotted any other behaviour change? Not as of today Christophe Cheers Murilo
Re: [RFC PATCH kernel 0/5] powerpc/P9/vfio: Pass through NVIDIA Tesla V100
On Mon, 30 Jul 2018 18:58:49 +1000 Alexey Kardashevskiy wrote: > On 11/07/2018 19:26, Alexey Kardashevskiy wrote: > > On Tue, 10 Jul 2018 16:37:15 -0600 > > Alex Williamson wrote: > > > >> On Tue, 10 Jul 2018 14:10:20 +1000 > >> Alexey Kardashevskiy wrote: > >> > >>> On Thu, 7 Jun 2018 23:03:23 -0600 > >>> Alex Williamson wrote: > >>> > On Fri, 8 Jun 2018 14:14:23 +1000 > Alexey Kardashevskiy wrote: > > > On 8/6/18 1:44 pm, Alex Williamson wrote: > >> On Fri, 8 Jun 2018 13:08:54 +1000 > >> Alexey Kardashevskiy wrote: > >> > >>> On 8/6/18 8:15 am, Alex Williamson wrote: > On Fri, 08 Jun 2018 07:54:02 +1000 > Benjamin Herrenschmidt wrote: > > > On Thu, 2018-06-07 at 11:04 -0600, Alex Williamson wrote: > > > >> > >> Can we back up and discuss whether the IOMMU grouping of NVLink > >> connected devices makes sense? AIUI we have a PCI view of these > >> devices and from that perspective they're isolated. That's the > >> view of > >> the device used to generate the grouping. However, not visible to > >> us, > >> these devices are interconnected via NVLink. What isolation > >> properties > >> does NVLink provide given that its entire purpose for existing > >> seems to > >> be to provide a high performance link for p2p between devices? > >> > > > > Not entire. On POWER chips, we also have an nvlink between the > > device > > and the CPU which is running significantly faster than PCIe. > > > > But yes, there are cross-links and those should probably be > > accounted > > for in the grouping. > > Then after we fix the grouping, can we just let the host driver > manage > this coherent memory range and expose vGPUs to guests? The use case > of > assigning all 6 GPUs to one VM seems pretty limited. (Might need to > convince NVIDIA to support more than a single vGPU per VM though) > > >>> > >>> These are physical GPUs, not virtual sriov-alike things they are > >>> implementing as well elsewhere. 
> >> > >> vGPUs as implemented on M- and P-series Teslas aren't SR-IOV like > >> either. That's why we have mdev devices now to implement software > >> defined devices. I don't have first hand experience with V-series, but > >> I would absolutely expect a PCIe-based Tesla V100 to support vGPU. > >> > > > > So assuming V100 can do vGPU, you are suggesting ditching this patchset > > and > > using mediated vGPUs instead, correct? > > If it turns out that our PCIe-only-based IOMMU grouping doesn't > account for lack of isolation on the NVLink side and we correct that, > limiting assignment to sets of 3 interconnected GPUs, is that still a > useful feature? OTOH, it's entirely an NVIDIA proprietary decision > whether they choose to support vGPU on these GPUs or whether they can > be convinced to support multiple vGPUs per VM. > > >>> My current understanding is that every P9 chip in that box has some > >>> NVLink2 > >>> logic on it so each P9 is directly connected to 3 GPUs via PCIe and > >>> 2xNVLink2, and GPUs in that big group are interconnected by NVLink2 > >>> links > >>> as well. > >>> > >>> From small bits of information I have it seems that a GPU can > >>> perfectly > >>> work alone and if the NVIDIA driver does not see these interconnects > >>> (because we do not pass the rest of the big 3xGPU group to this > >>> guest), it > >>> continues with a single GPU. There is an "nvidia-smi -r" big reset > >>> hammer > >>> which simply refuses to work until all 3 GPUs are passed so there is > >>> some > >>> distinction between passing 1 or 3 GPUs, and I am trying (as we > >>> speak) to > >>> get a confirmation from NVIDIA that it is ok to pass just a single > >>> GPU. > >>> > >>> So we will either have 6 groups (one per GPU) or 2 groups (one per > >>> interconnected group). 
> >> > >> I'm not gaining much confidence that we can rely on isolation between > >> NVLink connected GPUs, it sounds like you're simply expecting that > >> proprietary code from NVIDIA on a proprietary interconnect from NVIDIA > >> is going to play nice and nobody will figure out how to do bad things > >> because... obfuscation? Thanks, > > > > Well, we already believe that a proprietary firmware of a sriov-capable > > adapter like Mellanox ConnextX is not doing bad things, how is this > > different in principle? > >
Re: [PATCH 3/3] ptp_qoriq: convert to use module parameters for initialization
From: Yangbo Lu Date: Mon, 30 Jul 2018 18:01:54 +0800 > +static unsigned int cksel = DEFAULT_CKSEL; > +module_param(cksel, uint, 0644); > +MODULE_PARM_DESC(cksel, "Select reference clock"); > + > +static unsigned int clk_src; > +module_param(clk_src, uint, 0644); > +MODULE_PARM_DESC(clk_src, "Reference clock frequency (if clocks property not > provided in dts)"); > + > +static unsigned int tmr_prsc = 2; > +module_param(tmr_prsc, uint, 0644); > +MODULE_PARM_DESC(tmr_prsc, "Output clock division/prescale factor"); > + > +static unsigned int tmr_fiper1 = 10; > +module_param(tmr_fiper1, uint, 0644); > +MODULE_PARM_DESC(tmr_fiper1, "Desired fixed interval pulse period (ns)"); > + > +static unsigned int tmr_fiper2 = 10; > +module_param(tmr_fiper2, uint, 0644); > +MODULE_PARM_DESC(tmr_fiper2, "Desired fixed interval pulse period (ns)"); Sorry, there is no way I am ever applying something like this. Module parameters are to be avoided at all costs. And you don't need it here, you have DTS, please use it. You are required to support the existing DTS cases, in order to avoid breaking things, anyways.
Re: [PATCH v2 04/10] powerpc/traps: Use REG_FMT in show_signal_msg()
Hi, Christophe. On Fri, Jul 27, 2018 at 06:40:23PM +0200, LEROY Christophe wrote: > Murilo Opsfelder Araujo wrote: > > > Simplify the message format by using REG_FMT as the register format. This > > avoids having two different formats and avoids checking for MSR_64BIT. > > Are you sure it is what we want ? Yes. > Won't it change the behaviour for a 32 bits app running on a 64bits kernel ? In fact, this changes how many zeroes are prefixed when displaying the registers (%016lx vs. %08lx format). For example, 32-bits userspace, 64-bits kernel: before this series: [66475.002900] segv[4599]: unhandled signal 11 at nip 1420 lr 0fe61854 code 1 after this series: [ 73.414535] segv[3759]: segfault (11) at nip 1420 lr 0fe61854 code 1 in segv[1000+1] [ 73.414641] segv[3759]: code: 4e800421 80010014 38210010 7c0803a6 4b30 9421ffd0 93e1002c 7c3f0b78 [ 73.414665] segv[3759]: code: 3920 913f001c 813f001c 3941 <9149> 3920 7d234b78 397f0030 Have you spotted any other behaviour change? Cheers Murilo
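The whole disagreement above comes down to the zero-padding width of the register format. A standalone illustration of the two formats, using the `nip` value from the quoted log:

```c
#include <stdio.h>
#include <string.h>

/* REG_FMT on a 64-bit kernel: 16 hex digits, zero padded. */
static const char *fmt_wide(unsigned long v)
{
    static char buf[32];
    snprintf(buf, sizeof(buf), "%016lx", v);
    return buf;
}

/* The old format used for 32-bit tasks: 8 hex digits. */
static const char *fmt_narrow(unsigned long v)
{
    static char buf[32];
    snprintf(buf, sizeof(buf), "%08lx", v);
    return buf;
}
```

So for a 32-bit task's nip of 0x1420, the unified REG_FMT simply prints eight extra leading zeroes; the value itself is unchanged.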
Re: [PATCH v3 1/1] powerpc/pseries: fix EEH recovery of some IOV devices
On 7/29/18 8:59 PM, Sam Bobroff wrote: > EEH recovery currently fails on pSeries for some IOV capable PCI > devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide > certain device tree properties for the device. (Found on an IOV > capable device using the ipr driver.) > > Recovery fails in pci_enable_resources() at the check on r->parent, > because r->flags is set and r->parent is not. This state is due to > sriov_init() setting the start, end and flags members of the IOV BARs > but the parent not being set later in > pseries_pci_fixup_iov_resources(), because the > "ibm,open-sriov-vf-bar-info" property is missing. > > Correct this by zeroing the resource flags for IOV BARs when they > can't be configured (this is the same method used by sriov_init() and > __pci_read_base()). > > VFs cleared this way can't be enabled later, because that requires > another device tree property, "ibm,number-of-configurable-vfs" as well > as support for the RTAS function "ibm_map_pes". These are all part of > hypervisor support for IOV and it seems unlikely that a hypervisor > would ever partially, but not fully, support it. (None are currently > provided by QEMU/KVM.) > > Signed-off-by: Sam Bobroff > --- > Hi, > > This is a fix to allow EEH recovery to succeed in a specific situation, > which I've tried to explain in the commit message. > > As with the RFC version, the IOV BARs are disabled by setting the resource > flags to 0 but the other fields are now left as-is because that is what is > done > elsewhere (see sriov_init() and __pci_read_base()). > > I've also examined the concern raised by Bjorn Helgaas, that VFs could be > enabled later after the BARs are disabled, and it already seems safe: enabling > VFs (on pseries) depends on another device tree property, > "ibm,number-of-configurable-vfs" as well as support for the RTAS function > "ibm_map_pes". 
Since these are all part of the hypervisor's support for IOV it > seems unlikely that we would ever see some of them but not all. (None are > currently provided by QEMU/KVM.) (Additionally, the ipr driver on which the > EEH > recovery failure was discovered doesn't even seem to have SR-IOV support so it > certainly can't enable VFs.) > > Cheers, > Sam. > > Patch set v3: > Patch 1/1: powerpc/pseries: fix EEH recovery of some IOV devices > * Moved some useful information from the cover letter to the commit log. > > Patch set v2: > Patch 1/1: powerpc/pseries: fix EEH recovery of some IOV devices > * Moved the BAR disabling code to a function. > * Also check in pseries_pci_fixup_resources(). > > Patch set v1: > Patch 1/1: powerpc/pseries: fix EEH recovery of IOV devices > > arch/powerpc/platforms/pseries/setup.c | 25 + > 1 file changed, 17 insertions(+), 8 deletions(-) > > diff --git a/arch/powerpc/platforms/pseries/setup.c > b/arch/powerpc/platforms/pseries/setup.c > index b55ad4286dc7..0a9e4243ae1d 100644 > --- a/arch/powerpc/platforms/pseries/setup.c > +++ b/arch/powerpc/platforms/pseries/setup.c > @@ -645,6 +645,15 @@ void of_pci_parse_iov_addrs(struct pci_dev *dev, const > int *indexes) > } > } Reviewed-by: Bryant G. Ly
Re: [PATCH 3/3] ptp_qoriq: convert to use module parameters for initialization
On Mon, Jul 30, 2018 at 06:01:54PM +0800, Yangbo Lu wrote: > The ptp_qoriq driver initialized the 1588 timer with the > configurations provided by the properties of device tree > node. For example, > > fsl,tclk-period = <5>; > fsl,tmr-prsc= <2>; > fsl,tmr-add = <0xaaab>; > fsl,tmr-fiper1 = <5>; > fsl,tmr-fiper2 = <0>; > fsl,max-adj = <4>; > > These things actually were runtime configurations which > were not proper to be put into dts. That is debatable. While I agree that the dts isn't ideal for these, still it is the lesser of two or more evils. > This patch is to convert > to use module parameters for 1588 timer initialization, and > to support initial register values calculation. It is hard for me to understand how using module parameters improves the situation. > If the parameters are not provided, the driver will calculate > register values with a set of default parameters. With this > patch, those dts properties are no longer needed for new > platform to support 1588 timer, and many QorIQ DPAA platforms > (some P series and T series platforms of PowerPC, and some > LS series platforms of ARM64) could use this driver for their > fman ptp timer with default module parameters. However, this > patch didn't remove the dts method. Because there were still > many old platforms using the dts method. We need to clean up > their dts files, verify module parameters on them, and convert > them to the new method gradually in case of breaking any > function. In addition, like it or not, because the dts is an ABI, you must continue support of the dts values as a legacy option. Thanks, Richard
Re: [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE
Michael Ellerman writes: > Hi Laurent, > > Just one comment below. > > Laurent Dufour writes: >> diff --git a/arch/powerpc/platforms/pseries/lpar.c >> b/arch/powerpc/platforms/pseries/lpar.c >> index 96b8cd8a802d..41ed03245eb4 100644 >> --- a/arch/powerpc/platforms/pseries/lpar.c >> +++ b/arch/powerpc/platforms/pseries/lpar.c >> @@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long >> slot, unsigned long vpn, >> BUG_ON(lpar_rc != H_SUCCESS); >> } >> >> + >> +/* >> + * As defined in the PAPR's section 14.5.4.1.8 >> + * The control mask doesn't include the returned reference and change bit >> from >> + * the processed PTE. >> + */ >> +#define HBLKR_AVPN 0x0100UL >> +#define HBLKR_CTRL_MASK 0xf800UL >> +#define HBLKR_CTRL_SUCCESS 0x8000UL >> +#define HBLKR_CTRL_ERRNOTFOUND 0x8800UL >> +#define HBLKR_CTRL_ERRBUSY 0xa000UL >> + >> +/** >> + * H_BLOCK_REMOVE caller. >> + * @idx should point to the latest @param entry set with a PTEX. >> + * If PTE cannot be processed because another CPUs has already locked that >> + * group, those entries are put back in @param starting at index 1. >> + * If entries has to be retried and @retry_busy is set to true, these >> entries >> + * are retried until success. If @retry_busy is set to false, the returned >> + * is the number of entries yet to process. >> + */ >> +static unsigned long call_block_remove(unsigned long idx, unsigned long >> *param, >> + bool retry_busy) >> +{ >> +unsigned long i, rc, new_idx; >> +unsigned long retbuf[PLPAR_HCALL9_BUFSIZE]; >> + >> +again: >> +new_idx = 0; >> +BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE)); > > I count 1 .. > >> +if (idx < PLPAR_HCALL9_BUFSIZE) >> +param[idx] = HBR_END; >> + >> +rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf, >> + param[0], /* AVA */ >> + param[1], param[2], param[3], param[4], /* TS0-7 */ >> + param[5], param[6], param[7], param[8]); >> +if (rc == H_SUCCESS) >> +return 0; >> + >> +BUG_ON(rc != H_PARTIAL); > > 2 ... 
> >> +/* Check that the unprocessed entries were 'not found' or 'busy' */ >> +for (i = 0; i < idx-1; i++) { >> +unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK; >> + >> +if (ctrl == HBLKR_CTRL_ERRBUSY) { >> +param[++new_idx] = param[i+1]; >> +continue; >> +} >> + >> +BUG_ON(ctrl != HBLKR_CTRL_SUCCESS >> + && ctrl != HBLKR_CTRL_ERRNOTFOUND); > > 3 ... > > BUG_ON()s. > > I know the code in this file is already pretty liberal with the use of > BUG_ON() but I'd prefer if we don't make it any worse. > > Given this is an optimisation it seems like we should be able to fall > back to the existing implementation in the case of error (which will > probably then BUG_ON() ) > > If there's some reason we can't then I guess I can live with it. It would be nice to log the error in case we are not expecting the error return. We recently did https://marc.info/?i=20180629083904.29250-1-aneesh.ku...@linux.ibm.com -aneesh
Re: [PATCH 2/3] powerpc/pseries/mm: factorize PTE slot computation
Laurent Dufour writes: > This part of the code will also be called when dealing with H_BLOCK_REMOVE. > > Cc: "Aneesh Kumar K.V" > Cc: Nicholas Piggin > Cc: Michael Ellerman > Cc: Paul Mackerras > Cc: Benjamin Herrenschmidt > Signed-off-by: Laurent Dufour Reviewed-by: Aneesh Kumar K.V > --- > arch/powerpc/platforms/pseries/lpar.c | 27 --- > 1 file changed, 20 insertions(+), 7 deletions(-) > > diff --git a/arch/powerpc/platforms/pseries/lpar.c > b/arch/powerpc/platforms/pseries/lpar.c > index 52eeff1297f4..96b8cd8a802d 100644 > --- a/arch/powerpc/platforms/pseries/lpar.c > +++ b/arch/powerpc/platforms/pseries/lpar.c > @@ -547,6 +547,24 @@ static int pSeries_lpar_hpte_removebolted(unsigned long > ea, > return 0; > } > > + > +static inline unsigned long compute_slot(real_pte_t pte, > + unsigned long vpn, > + unsigned long index, > + unsigned long shift, > + int ssize) > +{ > + unsigned long slot, hash, hidx; > + > + hash = hpt_hash(vpn, shift, ssize); > + hidx = __rpte_to_hidx(pte, index); > + if (hidx & _PTEIDX_SECONDARY) > + hash = ~hash; > + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; > + slot += hidx & _PTEIDX_GROUP_IX; > + return slot; > +} > + > /* > * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie > * lock.
> @@ -559,7 +577,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long > number, int local) > struct ppc64_tlb_batch *batch = this_cpu_ptr(_tlb_batch); > int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE); > unsigned long param[PLPAR_HCALL9_BUFSIZE]; > - unsigned long hash, index, shift, hidx, slot; > + unsigned long index, shift, slot; > real_pte_t pte; > int psize, ssize; > > @@ -573,12 +591,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long > number, int local) > vpn = batch->vpn[i]; > pte = batch->pte[i]; > pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) { > - hash = hpt_hash(vpn, shift, ssize); > - hidx = __rpte_to_hidx(pte, index); > - if (hidx & _PTEIDX_SECONDARY) > - hash = ~hash; > - slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; > - slot += hidx & _PTEIDX_GROUP_IX; > + slot = compute_slot(pte, vpn, index, shift, ssize); > if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) { > /* >* lpar doesn't use the passed actual page size > -- > 2.7.4
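For readers skimming the archive, the slot arithmetic being factored out above can be modelled in user space. The 0x8/0x7 masks and the group size of 8 stand in for _PTEIDX_SECONDARY, _PTEIDX_GROUP_IX and HPTES_PER_GROUP, and the hash/hidx inputs in the sketch are hypothetical — in the kernel they come from hpt_hash() and __rpte_to_hidx():

```c
#include <assert.h>

/* User-space model of compute_slot(): pick the primary or secondary hash
 * bucket, then index into the 8-entry HPTE group. */
static unsigned long slot_from_hash(unsigned long hash, unsigned long hidx,
				    unsigned long htab_hash_mask)
{
	unsigned long slot;

	if (hidx & 0x8)				/* _PTEIDX_SECONDARY */
		hash = ~hash;			/* use the secondary bucket */
	slot = (hash & htab_hash_mask) * 8;	/* HPTES_PER_GROUP */
	slot += hidx & 0x7;			/* _PTEIDX_GROUP_IX */
	return slot;
}
```

The point of the factorization is that this computation lives in exactly one place, shared by the bulk-flush path and the upcoming H_BLOCK_REMOVE path.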
Re: [PATCH 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE
Laurent Dufour writes: > This feature tells if the hcall H_BLOCK_REMOVE is available. > > Cc: "Aneesh Kumar K.V" > Cc: Nicholas Piggin > Cc: Michael Ellerman > Cc: Paul Mackerras > Cc: Benjamin Herrenschmidt > Signed-off-by: Laurent Dufour Reviewed-by: Aneesh Kumar K.V > --- > arch/powerpc/include/asm/firmware.h | 3 ++- > arch/powerpc/platforms/pseries/firmware.c | 1 + > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/include/asm/firmware.h > b/arch/powerpc/include/asm/firmware.h > index 535add3f7791..360ba197f9d2 100644 > --- a/arch/powerpc/include/asm/firmware.h > +++ b/arch/powerpc/include/asm/firmware.h > @@ -53,6 +53,7 @@ > #define FW_FEATURE_PRRN ASM_CONST(0x0002) > #define FW_FEATURE_DRMEM_V2 ASM_CONST(0x0004) > #define FW_FEATURE_DRC_INFO ASM_CONST(0x0008) > +#define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0010) > > #ifndef __ASSEMBLY__ > > @@ -70,7 +71,7 @@ enum { > FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY | > FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN | > FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 | > - FW_FEATURE_DRC_INFO, > + FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE, > FW_FEATURE_PSERIES_ALWAYS = 0, > FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL, > FW_FEATURE_POWERNV_ALWAYS = 0, > diff --git a/arch/powerpc/platforms/pseries/firmware.c > b/arch/powerpc/platforms/pseries/firmware.c > index a3bbeb43689e..1624501386f4 100644 > --- a/arch/powerpc/platforms/pseries/firmware.c > +++ b/arch/powerpc/platforms/pseries/firmware.c > @@ -65,6 +65,7 @@ hypertas_fw_features_table[] = { > {FW_FEATURE_SET_MODE, "hcall-set-mode"}, > {FW_FEATURE_BEST_ENERGY,"hcall-best-energy-1*"}, > {FW_FEATURE_HPT_RESIZE, "hcall-hpt-resize"}, > + {FW_FEATURE_BLOCK_REMOVE, "hcall-block-remove"}, > }; > > /* Build up the firmware features bitmask using the contents of > -- > 2.7.4
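As background for the hunk above: the firmware feature bitmask is built by matching entries of hypertas_fw_features_table against the strings firmware puts in the ibm,hypertas-functions property. A simplified user-space model of that lookup is below — the mask values are illustrative, not the kernel's ASM_CONST values, and the trailing-'*' prefix-wildcard handling (as in "hcall-best-energy-1*") is an assumption about the matcher's behaviour:

```c
#include <assert.h>
#include <string.h>

struct fw_feature {
	unsigned long mask;
	const char *name;
};

/* Illustrative table; mask values do not match the kernel's. */
static const struct fw_feature table[] = {
	{ 0x1, "hcall-hpt-resize" },
	{ 0x2, "hcall-block-remove" },
	{ 0x4, "hcall-best-energy-1*" },	/* trailing '*' = prefix match */
};

static unsigned long probe_features(const char *const *hypertas, size_t n)
{
	unsigned long features = 0;
	size_t i, j;

	for (i = 0; i < n; i++) {
		for (j = 0; j < sizeof(table) / sizeof(table[0]); j++) {
			const char *name = table[j].name;
			size_t len = strlen(name);

			if (name[len - 1] == '*' ?
			    !strncmp(hypertas[i], name, len - 1) :
			    !strcmp(hypertas[i], name))
				features |= table[j].mask;
		}
	}
	return features;
}
```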
Re: [PATCH 2/2] powerpc: Use of_machine_compatible_match()
On Mon, Jul 30, 2018 at 7:15 AM Michael Ellerman wrote: > > Use of_machine_compatible_match() in platform code rather than open > coding. This is good because I want to get rid of of_root being used outside of drivers/of/. Can you also convert this one: drivers/clk/clk-qoriq.c:of_device_is_compatible(of_root, "fsl,ls1021a")) { > > Signed-off-by: Michael Ellerman > --- > arch/powerpc/platforms/40x/ppc40x_simple.c| 2 +- > arch/powerpc/platforms/512x/mpc512x_generic.c | 2 +- > arch/powerpc/platforms/52xx/lite5200.c| 2 +- > arch/powerpc/platforms/52xx/media5200.c | 2 +- > arch/powerpc/platforms/52xx/mpc5200_simple.c | 2 +- > arch/powerpc/platforms/83xx/mpc830x_rdb.c | 2 +- > arch/powerpc/platforms/83xx/mpc831x_rdb.c | 2 +- > arch/powerpc/platforms/83xx/mpc837x_rdb.c | 2 +- > arch/powerpc/platforms/85xx/corenet_generic.c | 2 +- > arch/powerpc/platforms/85xx/tqm85xx.c | 2 +- > 10 files changed, 10 insertions(+), 10 deletions(-) Acked-by: Rob Herring Rob
Re: [PATCH 1/2] of: Add of_machine_compatible_match()
On Mon, Jul 30, 2018 at 7:15 AM Michael Ellerman wrote: > > We have of_machine_is_compatible() to check if a machine is compatible > with a single compatible string. However some code is able to support > multiple compatible boards, and so wants to check for one of many > compatible strings. > > So add of_machine_compatible_match() which takes a NULL terminated > array of compatible strings to check against the root node's > compatible property. > > Compared to an open coded match this is slightly more self > documenting, and also avoids the caller needing to juggle the root > node either directly or via of_find_node_by_path(). > > Signed-off-by: Michael Ellerman > --- > drivers/of/base.c | 21 + > include/linux/of.h | 6 ++ > 2 files changed, 27 insertions(+) > > diff --git a/drivers/of/base.c b/drivers/of/base.c > index 848f549164cd..603716ba8513 100644 > --- a/drivers/of/base.c > +++ b/drivers/of/base.c > @@ -505,6 +505,27 @@ int of_device_compatible_match(struct device_node > *device, > return score; > } > > +/** > + * of_machine_compatible_match - Test root of device tree against a > compatible array > + * @compats: NULL terminated array of compatible strings to look for in root > node's compatible property. > + * > + * Returns true if the root node has any of the given compatible values in > its > + * compatible property. > + */ > +bool of_machine_compatible_match(const char *const *compats) > +{ > + struct device_node *root; > + int rc = 0; > + > + root = of_node_get(of_root); > + if (root) { > + rc = of_device_compatible_match(root, compats); > + of_node_put(root); > + } > + > + return rc != 0; > +} > + > /** > * of_machine_is_compatible - Test root of device tree for a given > compatible value > * @compat: compatible string to look for in root node's compatible property. 
> diff --git a/include/linux/of.h b/include/linux/of.h > index 4d25e4f952d9..05e3e23a3a57 100644 > --- a/include/linux/of.h > +++ b/include/linux/of.h > @@ -389,6 +389,7 @@ extern int of_alias_get_id(struct device_node *np, const > char *stem); > extern int of_alias_get_highest_id(const char *stem); > > extern int of_machine_is_compatible(const char *compat); > +extern bool of_machine_compatible_match(const char *const *compats);

Would be nice if of_machine_is_compatible could be implemented in terms of of_machine_compatible_match, like this:

int of_machine_is_compatible(const char *compat)
{
	const char *compats[] = { compat, NULL };

	return of_machine_compatible_match(compats);
}

Probably can be inline too.

Rob
Re: [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE
Hi Laurent, Just one comment below. Laurent Dufour writes: > diff --git a/arch/powerpc/platforms/pseries/lpar.c > b/arch/powerpc/platforms/pseries/lpar.c > index 96b8cd8a802d..41ed03245eb4 100644 > --- a/arch/powerpc/platforms/pseries/lpar.c > +++ b/arch/powerpc/platforms/pseries/lpar.c > @@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long > slot, unsigned long vpn, > BUG_ON(lpar_rc != H_SUCCESS); > } > > + > +/* > + * As defined in the PAPR's section 14.5.4.1.8 > + * The control mask doesn't include the returned reference and change bit > from > + * the processed PTE. > + */ > +#define HBLKR_AVPN 0x0100UL > +#define HBLKR_CTRL_MASK 0xf800UL > +#define HBLKR_CTRL_SUCCESS 0x8000UL > +#define HBLKR_CTRL_ERRNOTFOUND 0x8800UL > +#define HBLKR_CTRL_ERRBUSY 0xa000UL > + > +/** > + * H_BLOCK_REMOVE caller. > + * @idx should point to the latest @param entry set with a PTEX. > + * If a PTE cannot be processed because another CPU has already locked that > + * group, those entries are put back in @param starting at index 1. > + * If entries have to be retried and @retry_busy is set to true, these entries > + * are retried until success. If @retry_busy is set to false, the returned > + * value is the number of entries yet to process. > + */ > +static unsigned long call_block_remove(unsigned long idx, unsigned long > *param, > +bool retry_busy) > +{ > + unsigned long i, rc, new_idx; > + unsigned long retbuf[PLPAR_HCALL9_BUFSIZE]; > + > +again: > + new_idx = 0; > + BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE)); I count 1 .. > + if (idx < PLPAR_HCALL9_BUFSIZE) > + param[idx] = HBR_END; > + > + rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf, > + param[0], /* AVA */ > + param[1], param[2], param[3], param[4], /* TS0-7 */ > + param[5], param[6], param[7], param[8]); > + if (rc == H_SUCCESS) > + return 0; > + > + BUG_ON(rc != H_PARTIAL); 2 ... 
> + /* Check that the unprocessed entries were 'not found' or 'busy' */ > + for (i = 0; i < idx-1; i++) { > + unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK; > + > + if (ctrl == HBLKR_CTRL_ERRBUSY) { > + param[++new_idx] = param[i+1]; > + continue; > + } > + > + BUG_ON(ctrl != HBLKR_CTRL_SUCCESS > +&& ctrl != HBLKR_CTRL_ERRNOTFOUND); 3 ... BUG_ON()s. I know the code in this file is already pretty liberal with the use of BUG_ON() but I'd prefer if we don't make it any worse. Given this is an optimisation it seems like we should be able to fall back to the existing implementation in the case of error (which will probably then BUG_ON() ) If there's some reason we can't then I guess I can live with it. cheers
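To make the retry/compaction logic concrete, here is a stand-alone user-space sketch of the loop Michael quotes. The HBLKR_* control values follow the PAPR top-nibble encoding from the patch (written out as full 64-bit constants, since the archive has truncated the literals), and the param/retbuf contents in the test are invented:

```c
#include <assert.h>

#define HBLKR_CTRL_MASK		0xf800000000000000UL
#define HBLKR_CTRL_SUCCESS	0x8000000000000000UL
#define HBLKR_CTRL_ERRNOTFOUND	0x8800000000000000UL
#define HBLKR_CTRL_ERRBUSY	0xa000000000000000UL

/* param[0] is the AVA; param[1..idx-1] are the PTEX entries passed to
 * H_BLOCK_REMOVE, so retbuf[i] reports on param[i + 1].  Busy entries are
 * compacted to the front (param[1..new_idx]) ready for the retry.
 * Returns the number of entries still to process. */
static unsigned long compact_busy(unsigned long idx, unsigned long *param,
				  const unsigned long *retbuf)
{
	unsigned long i, new_idx = 0;

	for (i = 0; i < idx - 1; i++) {
		unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;

		if (ctrl == HBLKR_CTRL_ERRBUSY) {
			param[++new_idx] = param[i + 1];
			continue;
		}
		/* anything else must be success or 'not found' */
		assert(ctrl == HBLKR_CTRL_SUCCESS ||
		       ctrl == HBLKR_CTRL_ERRNOTFOUND);
	}
	return new_idx;
}
```

The assert here is the BUG_ON under discussion; a fallback along the lines Michael suggests would return an error at that point instead and let the caller redo the flush with plain H_REMOVE.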
Re: [PATCH v4 09/11] macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver
Geert Uytterhoeven writes: > Hi Michael, > > On Mon, Jul 30, 2018 at 8:47 AM Michael Ellerman wrote: >> Finn Thain writes: >> > Now that the PowerMac via-pmu driver supports m68k PowerBooks, >> > switch over to that driver and remove the via-pmu68k driver. >> > >> > Cc: Geert Uytterhoeven >> > Tested-by: Stan Johnson >> > Signed-off-by: Finn Thain >> > --- >> > arch/m68k/configs/mac_defconfig | 2 +- >> > arch/m68k/configs/multi_defconfig | 2 +- >> > arch/m68k/mac/config.c| 2 +- >> > arch/m68k/mac/misc.c | 48 +-- >> > drivers/macintosh/Kconfig | 13 +- >> > drivers/macintosh/Makefile| 1 - >> > drivers/macintosh/adb.c | 2 +- >> > drivers/macintosh/via-pmu68k.c| 846 >> > -- >> > include/uapi/linux/pmu.h | 2 +- >> > 9 files changed, 14 insertions(+), 904 deletions(-) >> > delete mode 100644 drivers/macintosh/via-pmu68k.c >> >> Geert are you OK with this and the other one that touches arch/m68k ? > > Sure, feel free to take them through the PPC tree. > Acked-by: Geert Uytterhoeven Thanks will do. cheers
Re: [RFC 0/4] Virtio uses DMA API for all devices
On Mon, Jul 30, 2018 at 04:18:02AM -0700, Christoph Hellwig wrote: > On Mon, Jul 30, 2018 at 01:28:03PM +0300, Michael S. Tsirkin wrote: > > Let me reply to the "crappy" part first: > > So virtio devices can run on another CPU or on a PCI bus. Configuration > > can happen over mupltiple transports. There is a discovery protocol to > > figure out where it is. It has some warts but any real system has warts. > > > > So IMHO virtio running on another CPU isn't "legacy virtual crappy > > virtio". virtio devices that actually sit on a PCI bus aren't "sane" > > simply because the DMA is more convoluted on some architectures. > > All of what you said would be true if virtio didn't claim to be > a PCI device. There's nothing virtio claims to be. It's a PV device that uses PCI for its configuration. Configuration is enumerated on the virtual PCI bus. That part of the interface is emulated PCI. Data path is through a PV device enumerated on the virtio bus. > Once it claims to be a PCI device and we also see > real hardware written to the interface I stand to all what I said > above. Real hardware would reuse parts of the interface but by necessity it needs to behave slightly differently on some platforms. However for some platforms (such as x86) a PV virtio driver will by luck work with a PCI device backend without changes. As these platforms and drivers are widely deployed, some people will deploy hardware like that. Should be a non issue as by definition it's transparent to guests. > > With this out of my system: > > I agree these approaches are hacky. I think it is generally better to > > have virtio feature negotiation tell you whether device runs on a CPU or > > not rather than rely on platform specific ways for this. To this end > > there was a recent proposal to rename VIRTIO_F_IO_BARRIER to > > VIRTIO_F_REAL_DEVICE. It got stuck since "real" sounds vague to people, > > e.g. what if it's a VF - is that real or not? But I can see something > > like e.g. 
VIRTIO_F_PLATFORM_DMA gaining support. > > > > We would then rename virtio_has_iommu_quirk to virtio_has_dma_quirk > > and test VIRTIO_F_PLATFORM_DMA in addition to the IOMMU thing. > > I don't really care about the exact naming, and indeed a device that > sets the flag doesn't have to be a 'real' device - it just has to act > like one. I explained all the issues that this means (at least relating > to DMA) in one of the previous threads. I believe you refer to this: https://lkml.org/lkml/2018/6/7/15 that was a very helpful list outlining the problems we need to solve, thanks a lot for that! > The important bit is that we can specify exact behavior for both > devices that sets the "I'm real!" flag and that ones that don't exactly > in the spec. I would very much like that, yes. > And that very much excludes arch-specific (or > Xen-specific) overrides. We already committed to a xen specific hack but generally I prefer devices that describe how they work instead of platforms magically guessing, yes. However the question people raise is that DMA API is already full of arch-specific tricks the likes of which are outlined in your post linked above. How is this one much worse? -- MST
[PATCH 2/2] powerpc: Use of_machine_compatible_match()
Use of_machine_compatible_match() in platform code rather than open coding. Signed-off-by: Michael Ellerman --- arch/powerpc/platforms/40x/ppc40x_simple.c| 2 +- arch/powerpc/platforms/512x/mpc512x_generic.c | 2 +- arch/powerpc/platforms/52xx/lite5200.c| 2 +- arch/powerpc/platforms/52xx/media5200.c | 2 +- arch/powerpc/platforms/52xx/mpc5200_simple.c | 2 +- arch/powerpc/platforms/83xx/mpc830x_rdb.c | 2 +- arch/powerpc/platforms/83xx/mpc831x_rdb.c | 2 +- arch/powerpc/platforms/83xx/mpc837x_rdb.c | 2 +- arch/powerpc/platforms/85xx/corenet_generic.c | 2 +- arch/powerpc/platforms/85xx/tqm85xx.c | 2 +- 10 files changed, 10 insertions(+), 10 deletions(-) Tested the corenet_generic.c change on a p5020ds, the rest are untested owing to lack of hardware. diff --git a/arch/powerpc/platforms/40x/ppc40x_simple.c b/arch/powerpc/platforms/40x/ppc40x_simple.c index 2a050007bbae..f401a8add141 100644 --- a/arch/powerpc/platforms/40x/ppc40x_simple.c +++ b/arch/powerpc/platforms/40x/ppc40x_simple.c @@ -63,7 +63,7 @@ static const char * const board[] __initconst = { static int __init ppc40x_probe(void) { - if (of_device_compatible_match(of_root, board)) { + if (of_machine_compatible_match(board)) { pci_set_flags(PCI_REASSIGN_ALL_RSRC); return 1; } diff --git a/arch/powerpc/platforms/512x/mpc512x_generic.c b/arch/powerpc/platforms/512x/mpc512x_generic.c index bf884d3075e4..952ea53d4829 100644 --- a/arch/powerpc/platforms/512x/mpc512x_generic.c +++ b/arch/powerpc/platforms/512x/mpc512x_generic.c @@ -38,7 +38,7 @@ static const char * const board[] __initconst = { */ static int __init mpc512x_generic_probe(void) { - if (!of_device_compatible_match(of_root, board)) + if (!of_machine_compatible_match(board)) return 0; mpc512x_init_early(); diff --git a/arch/powerpc/platforms/52xx/lite5200.c b/arch/powerpc/platforms/52xx/lite5200.c index c94c385cc919..5f11e906e02c 100644 --- a/arch/powerpc/platforms/52xx/lite5200.c +++ b/arch/powerpc/platforms/52xx/lite5200.c @@ -183,7 +183,7 @@ static const 
char * const board[] __initconst = { */ static int __init lite5200_probe(void) { - return of_device_compatible_match(of_root, board); + return of_machine_compatible_match(board); } define_machine(lite5200) { diff --git a/arch/powerpc/platforms/52xx/media5200.c b/arch/powerpc/platforms/52xx/media5200.c index 1fcab233d2f2..8641bb55c8e8 100644 --- a/arch/powerpc/platforms/52xx/media5200.c +++ b/arch/powerpc/platforms/52xx/media5200.c @@ -242,7 +242,7 @@ static const char * const board[] __initconst = { */ static int __init media5200_probe(void) { - return of_device_compatible_match(of_root, board); + return of_machine_compatible_match(board); } define_machine(media5200_platform) { diff --git a/arch/powerpc/platforms/52xx/mpc5200_simple.c b/arch/powerpc/platforms/52xx/mpc5200_simple.c index a80c6278d515..7b8b3be0f159 100644 --- a/arch/powerpc/platforms/52xx/mpc5200_simple.c +++ b/arch/powerpc/platforms/52xx/mpc5200_simple.c @@ -70,7 +70,7 @@ static const char *board[] __initdata = { */ static int __init mpc5200_simple_probe(void) { - return of_device_compatible_match(of_root, board); + return of_machine_compatible_match(board); } define_machine(mpc5200_simple_platform) { diff --git a/arch/powerpc/platforms/83xx/mpc830x_rdb.c b/arch/powerpc/platforms/83xx/mpc830x_rdb.c index 272c41c387b9..7c0cd27e71e7 100644 --- a/arch/powerpc/platforms/83xx/mpc830x_rdb.c +++ b/arch/powerpc/platforms/83xx/mpc830x_rdb.c @@ -43,7 +43,7 @@ static const char *board[] __initdata = { */ static int __init mpc830x_rdb_probe(void) { - return of_device_compatible_match(of_root, board); + return of_machine_compatible_match(board); } machine_device_initcall(mpc830x_rdb, mpc83xx_declare_of_platform_devices); diff --git a/arch/powerpc/platforms/83xx/mpc831x_rdb.c b/arch/powerpc/platforms/83xx/mpc831x_rdb.c index fd80fd570e67..3718db1c74e4 100644 --- a/arch/powerpc/platforms/83xx/mpc831x_rdb.c +++ b/arch/powerpc/platforms/83xx/mpc831x_rdb.c @@ -43,7 +43,7 @@ static const char *board[] __initdata = { 
*/ static int __init mpc831x_rdb_probe(void) { - return of_device_compatible_match(of_root, board); + return of_machine_compatible_match(board); } machine_device_initcall(mpc831x_rdb, mpc83xx_declare_of_platform_devices); diff --git a/arch/powerpc/platforms/83xx/mpc837x_rdb.c b/arch/powerpc/platforms/83xx/mpc837x_rdb.c index 0c55fa6af2d5..08c8434d895c 100644 --- a/arch/powerpc/platforms/83xx/mpc837x_rdb.c +++ b/arch/powerpc/platforms/83xx/mpc837x_rdb.c @@ -70,7 +70,7 @@ static const char * const board[] __initconst = { */ static int __init mpc837x_rdb_probe(void) { - return of_device_compatible_match(of_root, board); + return of_machine_compatible_match(board); } define_machine(mpc837x_rdb) { diff --git
[PATCH 1/2] of: Add of_machine_compatible_match()
We have of_machine_is_compatible() to check if a machine is compatible
with a single compatible string. However some code is able to support
multiple compatible boards, and so wants to check for one of many
compatible strings.

So add of_machine_compatible_match() which takes a NULL terminated
array of compatible strings to check against the root node's
compatible property.

Compared to an open coded match this is slightly more self
documenting, and also avoids the caller needing to juggle the root
node either directly or via of_find_node_by_path().

Signed-off-by: Michael Ellerman
---
 drivers/of/base.c  | 21 +++++++++++++++++++++
 include/linux/of.h |  6 ++++++
 2 files changed, 27 insertions(+)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 848f549164cd..603716ba8513 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -505,6 +505,27 @@ int of_device_compatible_match(struct device_node *device,
 	return score;
 }
 
+/**
+ * of_machine_compatible_match - Test root of device tree against a compatible array
+ * @compats: NULL terminated array of compatible strings to look for in root node's compatible property.
+ *
+ * Returns true if the root node has any of the given compatible values in its
+ * compatible property.
+ */
+bool of_machine_compatible_match(const char *const *compats)
+{
+	struct device_node *root;
+	int rc = 0;
+
+	root = of_node_get(of_root);
+	if (root) {
+		rc = of_device_compatible_match(root, compats);
+		of_node_put(root);
+	}
+
+	return rc != 0;
+}
+
 /**
  * of_machine_is_compatible - Test root of device tree for a given compatible value
  * @compat: compatible string to look for in root node's compatible property.
diff --git a/include/linux/of.h b/include/linux/of.h
index 4d25e4f952d9..05e3e23a3a57 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -389,6 +389,7 @@ extern int of_alias_get_id(struct device_node *np, const char *stem);
 extern int of_alias_get_highest_id(const char *stem);
 
 extern int of_machine_is_compatible(const char *compat);
+extern bool of_machine_compatible_match(const char *const *compats);
 
 extern int of_add_property(struct device_node *np, struct property *prop);
 extern int of_remove_property(struct device_node *np, struct property *prop);
@@ -877,6 +878,11 @@ static inline int of_machine_is_compatible(const char *compat)
 	return 0;
 }
 
+static inline bool of_machine_compatible_match(const char *const *compats)
+{
+	return false;
+}
+
 static inline bool of_console_check(const struct device_node *dn, const char *name, int index)
 {
 	return false;
 }
-- 
2.14.1
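A quick user-space model of the new helper's semantics — true if any entry of a NULL-terminated array appears in the root node's compatible list. The root compatible strings below are made up for the test; the real of_device_compatible_match() returns a score (so more specific matches win) rather than a bool, and the wrapper only cares whether that score is non-zero:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for the root node's "compatible" property. */
static const char *const root_compatible[] = {
	"fsl,p5020ds", "fsl,qoriq", NULL
};

static bool machine_compatible_match(const char *const *compats)
{
	const char *const *c;

	for (; *compats; compats++)
		for (c = root_compatible; *c; c++)
			if (!strcmp(*compats, *c))
				return true;
	return false;
}
```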
Re: [PATCH] PCI: call dma_debug_add_bus for pci_bus_type in common code
On Mon, 30 Jul 2018, Christoph Hellwig wrote: > There is nothing arch specific about PCI or dma-debug, so move this > call to common code just after registering the bus type. > > Signed-off-by: Christoph Hellwig Acked-by: Thomas Gleixner
Re: [RFC 0/4] Virtio uses DMA API for all devices
On Mon, Jul 30, 2018 at 01:28:03PM +0300, Michael S. Tsirkin wrote: > Let me reply to the "crappy" part first: > So virtio devices can run on another CPU or on a PCI bus. Configuration > can happen over mupltiple transports. There is a discovery protocol to > figure out where it is. It has some warts but any real system has warts. > > So IMHO virtio running on another CPU isn't "legacy virtual crappy > virtio". virtio devices that actually sit on a PCI bus aren't "sane" > simply because the DMA is more convoluted on some architectures. All of what you said would be true if virtio didn't claim to be a PCI device. Once it claims to be a PCI device and we also see real hardware written to the interface I stand to all what I said above. > With this out of my system: > I agree these approaches are hacky. I think it is generally better to > have virtio feature negotiation tell you whether device runs on a CPU or > not rather than rely on platform specific ways for this. To this end > there was a recent proposal to rename VIRTIO_F_IO_BARRIER to > VIRTIO_F_REAL_DEVICE. It got stuck since "real" sounds vague to people, > e.g. what if it's a VF - is that real or not? But I can see something > like e.g. VIRTIO_F_PLATFORM_DMA gaining support. > > We would then rename virtio_has_iommu_quirk to virtio_has_dma_quirk > and test VIRTIO_F_PLATFORM_DMA in addition to the IOMMU thing. I don't really care about the exact naming, and indeed a device that sets the flag doesn't have to be a 'real' device - it just has to act like one. I explained all the issues that this means (at least relating to DMA) in one of the previous threads. The important bit is that we can specify exact behavior for both devices that sets the "I'm real!" flag and that ones that don't exactly in the spec. And that very much excludes arch-specific (or Xen-specific) overrides.
Re: [PATCH] of/fdt: Remove PPC32 longtrail hack in memory scan
Rob Herring writes: > On Thu, Jul 26, 2018 at 11:36 PM Michael Ellerman wrote: >> When the OF code was originally made common by Grant in commit >> 51975db0b733 ("of/flattree: merge early_init_dt_scan_memory() common >> code") (Feb 2010), the common code inherited a hack to handle >> PPC "longtrail" machines, which had a "memory@0" node with no >> device_type. >> >> That check was then made to only apply to PPC32 in b44aa25d20e2 ("of: >> Handle memory@0 node on PPC32 only") (May 2014). >> >> But according to Paul Mackerras the "longtrail" machines are long >> dead, if they were ever seen in the wild at all. If someone does still >> have one, we can handle this firmware wart in powerpc platform code. >> >> So remove the hack once and for all. > > Yay. I guess Power Macs and other quirks will never die... Not soon. In base.c I see: - the hack in arch_find_n_match_cpu_physical_id() - we should just move that into arch code, it's a __weak arch hook after all. - a PPC hack in of_alias_scan(), I guess we need to retain that behaviour, but it's pretty minor anyway. In address.c there's the powermac empty ranges hack. Seems like we could fix that just by creating empty `ranges` properties in fixup_device_tree(). I don't think we support booting powermacs other than via prom_init(). (Ben?) > I'll queue this up. cheers
Re: [RFC 0/4] Virtio uses DMA API for all devices
On Mon, Jul 30, 2018 at 02:34:14AM -0700, Christoph Hellwig wrote:
> We really need to distinguish between legacy virtual crappy
> virtio (and that includes v1) that totally ignores the bus it pretends
> to be on, and sane virtio (to be defined) that sit on a real (or
> properly emulated including iommu and details for dma mapping) bus.

Let me reply to the "crappy" part first:

So virtio devices can run on another CPU or on a PCI bus. Configuration can happen over multiple transports. There is a discovery protocol to figure out where it is. It has some warts but any real system has warts.

So IMHO virtio running on another CPU isn't "legacy virtual crappy virtio". virtio devices that actually sit on a PCI bus aren't "sane" simply because the DMA is more convoluted on some architectures.

Performance impact of the optimizations possible when you know your "device" is in fact just another CPU has been measured; it is real, so we aren't interested in adding all that overhead back just so we can use the DMA API. The "correct then fast" mantra doesn't apply to something that is as widely deployed as virtio.

And I can accept an argument that maybe the DMA API isn't designed to support such virtual DMA. Whether it should, I don't know.

With this out of my system: I agree these approaches are hacky. I think it is generally better to have virtio feature negotiation tell you whether the device runs on a CPU or not, rather than rely on platform specific ways for this. To this end there was a recent proposal to rename VIRTIO_F_IO_BARRIER to VIRTIO_F_REAL_DEVICE. It got stuck since "real" sounds vague to people, e.g. what if it's a VF - is that real or not? But I can see something like e.g. VIRTIO_F_PLATFORM_DMA gaining support.

We would then rename virtio_has_iommu_quirk to virtio_has_dma_quirk and test VIRTIO_F_PLATFORM_DMA in addition to the IOMMU thing.

-- 
MST
[PATCH 1/3] arm64: dts: fsl: add clocks property for fman ptp timer node
Add the clocks property for the fman ptp timer node.

Signed-off-by: Yangbo Lu
---
 arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi | 1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi b/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi
index a56a408..4664c33 100644
--- a/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi
+++ b/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi
@@ -80,4 +80,5 @@ ptp_timer0: ptp-timer@1afe000 {
 	compatible = "fsl,fman-ptp-timer";
 	reg = <0x0 0x1afe000 0x0 0x1000>;
 	interrupts = ;
+	clocks = < 3 0>;
 };
-- 
1.7.1
[PATCH 2/3] powerpc/mpc85xx: add clocks property for fman ptp timer node
This patch is to add clocks property for fman ptp timer node. Signed-off-by: Yangbo Lu --- arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi |1 + arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi |1 + arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi |1 + arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi |1 + arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi |1 + 5 files changed, 5 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi index 6b124f7..9b6cf91 100644 --- a/arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi +++ b/arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi @@ -100,4 +100,5 @@ ptp_timer0: ptp-timer@4fe000 { compatible = "fsl,fman-ptp-timer"; reg = <0x4fe000 0x1000>; interrupts = <96 2 0 0>; + clocks = < 3 0>; }; diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi index b80aaf5..e95c11f 100644 --- a/arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi +++ b/arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi @@ -100,4 +100,5 @@ ptp_timer1: ptp-timer@5fe000 { compatible = "fsl,fman-ptp-timer"; reg = <0x5fe000 0x1000>; interrupts = <97 2 0 0>; + clocks = < 3 1>; }; diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi index d3720fd..d62b36c 100644 --- a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi +++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi @@ -105,4 +105,5 @@ ptp_timer0: ptp-timer@4fe000 { compatible = "fsl,fman-ptp-timer"; reg = <0x4fe000 0x1000>; interrupts = <96 2 0 0>; + clocks = < 3 0>; }; diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi index ae34c20..3102324 100644 --- a/arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi +++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi @@ -105,4 +105,5 @@ ptp_timer1: ptp-timer@5fe000 { compatible = "fsl,fman-ptp-timer"; reg = <0x5fe000 0x1000>; interrupts = <97 2 0 0>; + clocks = < 3 1>; }; diff --git 
a/arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi index 02f2755..c90702b 100644 --- a/arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi +++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi @@ -93,4 +93,5 @@ ptp_timer0: ptp-timer@4fe000 { compatible = "fsl,fman-ptp-timer"; reg = <0x4fe000 0x1000>; interrupts = <96 2 0 0>; + clocks = < 3 0>; }; -- 1.7.1
[PATCH 3/3] ptp_qoriq: convert to use module parameters for initialization
The ptp_qoriq driver initialized the 1588 timer with the configuration provided by device tree properties. For example:

	fsl,tclk-period = <5>;
	fsl,tmr-prsc    = <2>;
	fsl,tmr-add     = <0xaaab>;
	fsl,tmr-fiper1  = <5>;
	fsl,tmr-fiper2  = <0>;
	fsl,max-adj     = <4>;

These are really runtime configuration values and are not appropriate for the device tree. Convert the driver to use module parameters for 1588 timer initialization, and support calculating the initial register values. If the parameters are not provided, the driver calculates the register values from a set of default parameters.

With this patch, those dts properties are no longer needed for a new platform to support the 1588 timer, and many QorIQ DPAA platforms (some P series and T series PowerPC platforms, and some LS series ARM64 platforms) can use this driver for their fman ptp timer with the default module parameters.

However, this patch does not remove the dts method, because many old platforms still use it. We need to clean up their dts files, verify the module parameters on them, and convert them to the new method gradually, without breaking anything. 
Signed-off-by: Yangbo Lu
---
 drivers/ptp/ptp_qoriq.c       | 117 +++-
 include/linux/fsl/ptp_qoriq.h |   1 +
 2 files changed, 115 insertions(+), 3 deletions(-)

diff --git a/drivers/ptp/ptp_qoriq.c b/drivers/ptp/ptp_qoriq.c
index a14c317..22baf83 100644
--- a/drivers/ptp/ptp_qoriq.c
+++ b/drivers/ptp/ptp_qoriq.c
@@ -29,9 +29,30 @@
 #include
 #include
 #include
+#include
 #include

+static unsigned int cksel = DEFAULT_CKSEL;
+module_param(cksel, uint, 0644);
+MODULE_PARM_DESC(cksel, "Select reference clock");
+
+static unsigned int clk_src;
+module_param(clk_src, uint, 0644);
+MODULE_PARM_DESC(clk_src, "Reference clock frequency (if clocks property not provided in dts)");
+
+static unsigned int tmr_prsc = 2;
+module_param(tmr_prsc, uint, 0644);
+MODULE_PARM_DESC(tmr_prsc, "Output clock division/prescale factor");
+
+static unsigned int tmr_fiper1 = 1000000000;
+module_param(tmr_fiper1, uint, 0644);
+MODULE_PARM_DESC(tmr_fiper1, "Desired fixed interval pulse period (ns)");
+
+static unsigned int tmr_fiper2 = 1000000000;
+module_param(tmr_fiper2, uint, 0644);
+MODULE_PARM_DESC(tmr_fiper2, "Desired fixed interval pulse period (ns)");
+
 /*
  * Register access functions
  */
@@ -317,6 +338,91 @@ static int ptp_qoriq_enable(struct ptp_clock_info *ptp,
 	.enable		= ptp_qoriq_enable,
 };

+/**
+ * qoriq_ptp_nominal_freq - calculate nominal frequency by reference clock
+ *			    frequency
+ *
+ * @clk_src: reference clock frequency
+ *
+ * The nominal frequency is the desired clock frequency.
+ * It should be less than the reference clock frequency.
+ * It should be a factor of 1000MHz.
+ *
+ * Return the nominal frequency
+ */
+static u32 qoriq_ptp_nominal_freq(u32 clk_src)
+{
+	u32 remainder = 0;
+
+	clk_src /= 1000000;
+	remainder = clk_src % 100;
+	if (remainder) {
+		clk_src -= remainder;
+		clk_src += 100;
+	}
+
+	do {
+		clk_src -= 100;
+
+	} while (1000 % clk_src);
+
+	return clk_src * 1000000;
+}
+
+static int qoriq_ptp_config(struct qoriq_ptp *qoriq_ptp,
+			    struct device_node *node)
+{
+	struct clk *clk;
+	u64 freq_comp;
+	u64 max_adj;
+	u32 nominal_freq;
+
+	qoriq_ptp->cksel = cksel;
+
+	if (clk_src) {
+		qoriq_ptp->clk_src = clk_src;
+	} else {
+		clk = of_clk_get(node, 0);
+		if (!IS_ERR(clk)) {
+			qoriq_ptp->clk_src = clk_get_rate(clk);
+			clk_put(clk);
+		}
+	}
+
+	if (qoriq_ptp->clk_src <= 100000000UL) {
+		pr_err("error reference clock value, or lower than 100MHz\n");
+		return -EINVAL;
+	}
+
+	nominal_freq = qoriq_ptp_nominal_freq(qoriq_ptp->clk_src);
+	if (!nominal_freq)
+		return -EINVAL;
+
+	qoriq_ptp->tclk_period = 1000000000UL / nominal_freq;
+	qoriq_ptp->tmr_prsc = tmr_prsc;
+
+	/* Calculate initial frequency compensation value for TMR_ADD register.
+	 * freq_comp = ceil(2^32 / freq_ratio)
+	 * freq_ratio = reference_clock_freq / nominal_freq
+	 */
+	freq_comp = ((u64)1 << 32) * nominal_freq;
+	if (do_div(freq_comp, qoriq_ptp->clk_src))
+		freq_comp++;
+
+	qoriq_ptp->tmr_add = freq_comp;
+	qoriq_ptp->tmr_fiper1 = tmr_fiper1 - qoriq_ptp->tclk_period;
+	qoriq_ptp->tmr_fiper2 = tmr_fiper2 - qoriq_ptp->tclk_period;
+
+	/* max_adj = 1000000000 * (freq_ratio - 1.0) - 1
+	 * freq_ratio = reference_clock_freq / nominal_freq
+	 */
+	max_adj = 1000000000ULL * (qoriq_ptp->clk_src - nominal_freq);
+	max_adj = max_adj
Re: [PATCH] powerpc/mm: Don't report PUDs as memory leaks when using kmemleak
Dear Michael, On 07/30/18 08:43, Michael Ellerman wrote: > Paul Menzel writes: >> Am 19.07.2018 um 16:33 schrieb Michael Ellerman: > ... >>> >>> The fix is fairly simple. We need to tell kmemleak to ignore PUD >>> allocations and never report them as leaks. We can also tell it not to >>> scan the PGD, because it will never find pointers in there. However it >>> will still notice if we allocate a PGD and then leak it. >>> >>> Reported-by: Paul Menzel >>> Signed-off-by: Michael Ellerman > --- >>> arch/powerpc/include/asm/book3s/64/pgalloc.h | 23 +-- >>> 1 file changed, 21 insertions(+), 2 deletions(-) >> >> […] >> >> Tested-by: Paul Menzel on IBM S822LC > > Thanks. No problem. I forgot to add that it'd be great if you tagged this for the stable series too. Cc: sta...@vger.kernel.org Kind regards, Paul
Re: [RFC 0/4] Virtio uses DMA API for all devices
On Fri, Jul 27, 2018 at 10:58:05AM +0100, Will Deacon wrote: > > I just wanted to say that this patch series provides a means for us to > force the coherent DMA ops for legacy virtio devices on arm64, which in turn > means that we can enable the SMMU with legacy devices in our fastmodel > emulation platform (which is slowly being upgraded to virtio 1.0) without > hanging during boot. Patch below. Yikes, this is a nightmare. That is exactly where I do not want things to end up. We really need to distinguish between legacy virtual crappy virtio (and that includes v1) that totally ignores the bus it pretends to be on, and sane virtio (to be defined) that sits on a real (or properly emulated, including iommu and details for dma mapping) bus. Having a mumble jumble of arch specific undocumented magic as in the powerpc patch replied to or this arm patch is a complete no-go. Nacked-by: Christoph Hellwig for both.
Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
> > + > > + if (xen_domain()) > > + goto skip_override; > > + > > + if (virtio_has_iommu_quirk(dev)) > > + set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops); > > + > > + skip_override: > > + > > I prefer normal if scoping as opposed to goto spaghetti pls. > Better yet move vring_use_dma_api here and use it. > Less of a chance something will break. I agree about avoiding pointless gotos here, but we can do things perfectly well without either gotos or a confusing helper here if we structure it right. E.g.: // suitably detailed comment here if (!xen_domain() && !virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops); and while we're at it - modifying dma ops for the parent looks very dangerous. I don't think we can do that, as it could break iommu setup interactions. IFF we set a specific dma map ops it has to be on the virtio device itself, of which we have full control.
Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
> +const struct dma_map_ops virtio_direct_dma_ops; This belongs in a header if it is non-static. If you only use it in this file anyway please mark it static and avoid a forward declaration. > + > int virtio_finalize_features(struct virtio_device *dev) > { > int ret = dev->config->finalize_features(dev); > @@ -174,6 +176,9 @@ int virtio_finalize_features(struct virtio_device *dev) > if (ret) > return ret; > > + if (virtio_has_iommu_quirk(dev)) > + set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops); This needs a big fat comment explaining what is going on here. Also not new, but I find the existence of virtio_has_iommu_quirk and its name horribly confusing. It might be better to open code it here once only a single caller is left.
Re: [RFC 1/4] virtio: Define virtio_direct_dma_ops structure
> +/* > + * Virtio direct mapping DMA API operations structure > + * > + * This defines DMA API structure for all virtio devices which would not > + * either bring in their own DMA OPS from architecture or they would not > + * like to use architecture specific IOMMU based DMA OPS because QEMU > + * expects GPA instead of an IOVA in absence of VIRTIO_F_IOMMU_PLATFORM. > + */ > +dma_addr_t virtio_direct_map_page(struct device *dev, struct page *page, > + unsigned long offset, size_t size, > + enum dma_data_direction dir, > + unsigned long attrs) All these functions should probably be marked static. > +void virtio_direct_unmap_page(struct device *hwdev, dma_addr_t dev_addr, > + size_t size, enum dma_data_direction dir, > + unsigned long attrs) > +{ > +} No need to implement no-op callbacks in struct dma_map_ops. > + > +int virtio_direct_mapping_error(struct device *hwdev, dma_addr_t dma_addr) > +{ > + return 0; > +} Including this one. > +void *virtio_direct_alloc(struct device *dev, size_t size, dma_addr_t > *dma_handle, > + gfp_t gfp, unsigned long attrs) > +{ > + void *queue = alloc_pages_exact(PAGE_ALIGN(size), gfp); > + > + if (queue) { > + phys_addr_t phys_addr = virt_to_phys(queue); > + *dma_handle = (dma_addr_t)phys_addr; > + > + if (WARN_ON_ONCE(*dma_handle != phys_addr)) { > + free_pages_exact(queue, PAGE_ALIGN(size)); > + return NULL; > + } > + } > + return queue; queue is a very odd name in a generic memory allocator. > +void virtio_direct_free(struct device *dev, size_t size, void *vaddr, > + dma_addr_t dma_addr, unsigned long attrs) > +{ > + free_pages_exact(vaddr, PAGE_ALIGN(size)); > +} > + > +const struct dma_map_ops virtio_direct_dma_ops = { > + .alloc = virtio_direct_alloc, > + .free = virtio_direct_free, > + .map_page = virtio_direct_map_page, > + .unmap_page = virtio_direct_unmap_page, > + .mapping_error = virtio_direct_mapping_error, > +}; This is missing a dma_map_sg implementation. In general this is mandatory for dma_ops. 
So either you implement it or explain in a comment why you think you can skip it. > +EXPORT_SYMBOL(virtio_direct_dma_ops); EXPORT_SYMBOL_GPL like all virtio symbols, please.
Re: Infinite looping observed in __offline_pages
On Fri 27-07-18 12:32:59, John Allen wrote: > On Wed, Jul 25, 2018 at 10:03:36PM +0200, Michal Hocko wrote: > > On Wed 25-07-18 13:11:15, John Allen wrote: > > [...] > > > Does a failure in do_migrate_range indicate that the range is unmigratable and the loop in __offline_pages should terminate and goto failed_removal? Or should we allow a certain number of retries before we give up on migrating the range? > > > > Unfortunately not. The migration code doesn't tell the difference between ephemeral and permanent failures. We are relying on start_isolate_page_range to tell us this. So the question is, what kind of page is not migratable and for what reason. > > > > Are you able to add some debugging to give us more information? The current debugging code in hotplug/migration sucks... > > After reproducing the problem a couple of times, it seems that it can occur for different types of pages. Running page-types on the offending page over two separate instances produced the following:
>
> # tools/vm/page-types -a 307968-308224
> flags      page-count  MB  symbolic-flags  long-symbolic-flags
> 0x0400     1           0   __Bbuddy
> total      1

Huh! How come a buddy page has a non-zero reference count?

> And the following on a separate run:
>
> # tools/vm/page-types -a 313088-313344
> flags      page-count  MB  symbolic-flags  long-symbolic-flags
> 0x006c     1           0   __RU_lA         referenced,uptodate,lru,active
> total      1

Hmm, what is the expected page count in this case? Seeing 1 doesn't look particularly wrong.
-- 
Michal Hocko SUSE Labs
Re: [RFC PATCH kernel 0/5] powerpc/P9/vfio: Pass through NVIDIA Tesla V100
On 11/07/2018 19:26, Alexey Kardashevskiy wrote: > On Tue, 10 Jul 2018 16:37:15 -0600 > Alex Williamson wrote: > >> On Tue, 10 Jul 2018 14:10:20 +1000 >> Alexey Kardashevskiy wrote: >> >>> On Thu, 7 Jun 2018 23:03:23 -0600 >>> Alex Williamson wrote: >>> On Fri, 8 Jun 2018 14:14:23 +1000 Alexey Kardashevskiy wrote: > On 8/6/18 1:44 pm, Alex Williamson wrote: >> On Fri, 8 Jun 2018 13:08:54 +1000 >> Alexey Kardashevskiy wrote: >> >>> On 8/6/18 8:15 am, Alex Williamson wrote: On Fri, 08 Jun 2018 07:54:02 +1000 Benjamin Herrenschmidt wrote: > On Thu, 2018-06-07 at 11:04 -0600, Alex Williamson wrote: >> >> Can we back up and discuss whether the IOMMU grouping of NVLink >> connected devices makes sense? AIUI we have a PCI view of these >> devices and from that perspective they're isolated. That's the view >> of >> the device used to generate the grouping. However, not visible to >> us, >> these devices are interconnected via NVLink. What isolation >> properties >> does NVLink provide given that its entire purpose for existing seems >> to >> be to provide a high performance link for p2p between devices? >> > > Not entire. On POWER chips, we also have an nvlink between the device > and the CPU which is running significantly faster than PCIe. > > But yes, there are cross-links and those should probably be accounted > for in the grouping. Then after we fix the grouping, can we just let the host driver manage this coherent memory range and expose vGPUs to guests? The use case of assigning all 6 GPUs to one VM seems pretty limited. (Might need to convince NVIDIA to support more than a single vGPU per VM though) >>> >>> These are physical GPUs, not virtual sriov-alike things they are >>> implementing as well elsewhere. >> >> vGPUs as implemented on M- and P-series Teslas aren't SR-IOV like >> either. That's why we have mdev devices now to implement software >> defined devices. 
I don't have first hand experience with V-series, but >> I would absolutely expect a PCIe-based Tesla V100 to support vGPU. >> > > So assuming V100 can do vGPU, you are suggesting ditching this patchset > and > using mediated vGPUs instead, correct? If it turns out that our PCIe-only-based IOMMU grouping doesn't account for lack of isolation on the NVLink side and we correct that, limiting assignment to sets of 3 interconnected GPUs, is that still a useful feature? OTOH, it's entirely an NVIDIA proprietary decision whether they choose to support vGPU on these GPUs or whether they can be convinced to support multiple vGPUs per VM. >>> My current understanding is that every P9 chip in that box has some >>> NVLink2 >>> logic on it so each P9 is directly connected to 3 GPUs via PCIe and >>> 2xNVLink2, and GPUs in that big group are interconnected by NVLink2 >>> links >>> as well. >>> >>> From small bits of information I have it seems that a GPU can perfectly >>> work alone and if the NVIDIA driver does not see these interconnects >>> (because we do not pass the rest of the big 3xGPU group to this guest), >>> it >>> continues with a single GPU. There is an "nvidia-smi -r" big reset >>> hammer >>> which simply refuses to work until all 3 GPUs are passed so there is >>> some >>> distinction between passing 1 or 3 GPUs, and I am trying (as we speak) >>> to >>> get a confirmation from NVIDIA that it is ok to pass just a single GPU. >>> >>> So we will either have 6 groups (one per GPU) or 2 groups (one per >>> interconnected group). >> >> I'm not gaining much confidence that we can rely on isolation between >> NVLink connected GPUs, it sounds like you're simply expecting that >> proprietary code from NVIDIA on a proprietary interconnect from NVIDIA >> is going to play nice and nobody will figure out how to do bad things >> because... obfuscation? 
Thanks, > > Well, we already believe that a proprietary firmware of a sriov-capable > adapter like Mellanox ConnextX is not doing bad things, how is this > different in principle? It seems like the scope and hierarchy are different. Here we're talking about exposing big discrete devices, which are peers of one another (and have history of being reverse engineered), to userspace drivers. Once handed to userspace, each of those devices needs to be considered untrusted. In the case of SR-IOV, we typically have a trusted host driver
Re: powerpc: 32BIT vs. 64BIT (PPC32 vs. PPC64)
2018-07-07 23:59 GMT+09:00 Randy Dunlap : > On 07/07/2018 05:13 AM, Nicholas Piggin wrote: >> On Fri, 6 Jul 2018 21:58:29 -0700 >> Randy Dunlap wrote: >> >>> On 07/06/2018 06:45 PM, Benjamin Herrenschmidt wrote: On Thu, 2018-07-05 at 14:30 -0700, Randy Dunlap wrote: > Hi, > > Is there a good way (or a shortcut) to do something like: > > $ make ARCH=powerpc O=PPC32 [other_options] allmodconfig > to get a PPC32/32BIT allmodconfig > > and also be able to do: > > $make ARCH=powerpc O=PPC64 [other_options] allmodconfig > to get a PPC64/64BIT allmodconfig? Hrm... O= is for the separate build dir, so there must be something else. You mean having ARCH= aliases like ppc/ppc32 and ppc64 ? >>> >>> Yes. >>> That would be a matter of overriding some .config defaults I suppose, I don't know how this is done on other archs. I see the aliasing trick in the Makefile but that's about it. > Note that arch/x86, arch/sh, and arch/sparc have ways to do > some flavor(s) of this (from Documentation/kbuild/kbuild.txt; > sh and sparc based on a recent "fix" patch from me): I fail to see what you are actually talking about here ... sorry. Do you have concrete examples on x86 or sparc ? From what I can tell the "i386" or "sparc32/sparc64" aliases just change SRCARCH in Makefile and 32 vs 64-bit is just a Kconfig option... >>> >>> Yes, your summary is mostly correct. >>> >>> I'm just looking for a way to do cross-compile builds that are close to >>> ppc32 allmodconfig and ppc64 allmodconfig. >> >> Would there be a problem with adding ARCH=ppc32 / ppc64 matching? This >> seems to work... >> >> Thanks, >> Nick > > Yes, this mostly works and is similar to a patch (my patch) on my test > machine. > And they both work for allmodconfig, which is my primary build target.
> > And they both have one little quirk that is confusing when the build target > is defconfig: > > When ARCH=ppc32, the terminal output (stdout) is: (using O=PPC32) > > make[1]: Entering directory '/home/rdunlap/lnx/lnx-418-rc3/PPC32' > GEN ./Makefile > *** Default configuration is based on 'ppc64_defconfig' < NOTE < > # > # configuration written to .config > # > make[1]: Leaving directory '/home/rdunlap/lnx/lnx-418-rc3/PPC32' > Maybe, we can set one of ppc32 defconfigs to KBUILD_DEFCONFIG if ARCH is ppc32 ? ifeq ($(ARCH),ppc32) KBUILD_DEFCONFIG := (some reasonable 32bit machine _defconfig) else KBUILD_DEFCONFIG := ppc64_defconfig endif ifeq ($(CROSS_COMPILE),) KBUILD_DEFCONFIG := $(shell uname -m)_defconfig endif > I expect that can be fixed also. :) > > And the written .config file is indeed for 32BIT, not 64BIT. > > Thanks, Nick. > >> --- >> Makefile | 8 >> arch/powerpc/Kconfig | 9 + >> arch/powerpc/platforms/Kconfig.cputype | 8 >> 3 files changed, 17 insertions(+), 8 deletions(-) >> >> diff --git a/Makefile b/Makefile >> index c5ce55cbc543..f97204aed17a 100644 >> --- a/Makefile >> +++ b/Makefile >> @@ -345,6 +345,14 @@ ifeq ($(ARCH),sh64) >> SRCARCH := sh >> endif >> >> +# Additional ARCH settings for powerpc >> +ifeq ($(ARCH),ppc32) >> + SRCARCH := powerpc >> +endif >> +ifeq ($(ARCH),ppc64) >> + SRCARCH := powerpc >> +endif >> + >> KCONFIG_CONFIG ?= .config >> export KCONFIG_CONFIG >> >> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig >> index 9f2b75fe2c2d..3405b1b122be 100644 >> --- a/arch/powerpc/Kconfig >> +++ b/arch/powerpc/Kconfig >> @@ -1,4 +1,13 @@ >> # SPDX-License-Identifier: GPL-2.0 >> + >> +config PPC64 >> + bool "64-bit kernel" if "$(ARCH)" = "powerpc" >> + default "$(ARCH)" != "ppc32" >> + select ZLIB_DEFLATE >> + help >> + This option selects whether a 32-bit or a 64-bit kernel >> + will be built. 
>> + >> source "arch/powerpc/platforms/Kconfig.cputype" >> >> config PPC32 >> diff --git a/arch/powerpc/platforms/Kconfig.cputype >> b/arch/powerpc/platforms/Kconfig.cputype >> index e6a1de521319..f6e5d6ef9782 100644 >> --- a/arch/powerpc/platforms/Kconfig.cputype >> +++ b/arch/powerpc/platforms/Kconfig.cputype >> @@ -1,12 +1,4 @@ >> # SPDX-License-Identifier: GPL-2.0 >> -config PPC64 >> - bool "64-bit kernel" >> - default n >> - select ZLIB_DEFLATE >> - help >> - This option selects whether a 32-bit or a 64-bit kernel >> - will be built. >> - >> menu "Processor support" >> choice >> prompt "Processor Type" >> > > > -- > ~Randy > -- > To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Best Regards Masahiro Yamada
[PATCH] powerpc: do not redefine NEED_DMA_MAP_STATE
kernel/dma/Kconfig already defines NEED_DMA_MAP_STATE, just select it from PPC64 and NOT_COHERENT_CACHE instead.

Signed-off-by: Christoph Hellwig
---
 arch/powerpc/Kconfig                   | 3 ---
 arch/powerpc/platforms/Kconfig.cputype | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9f2b75fe2c2d..f9cae7edd735 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -884,9 +884,6 @@ config ZONE_DMA
 	bool
 	default y

-config NEED_DMA_MAP_STATE
-	def_bool (PPC64 || NOT_COHERENT_CACHE)
-
 config GENERIC_ISA_DMA
 	bool
 	depends on ISA_DMA_API
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index e6a1de521319..a2578bf8d560 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -3,6 +3,7 @@ config PPC64
 	bool "64-bit kernel"
 	default n
 	select ZLIB_DEFLATE
+	select NEED_DMA_MAP_STATE
 	help
 	  This option selects whether a 32-bit or a 64-bit kernel
 	  will be built.
@@ -386,6 +387,7 @@ config NOT_COHERENT_CACHE
 	depends on 4xx || PPC_8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
 	default n if PPC_47x
 	default y
+	select NEED_DMA_MAP_STATE

 config CHECK_CACHE_COHERENCY
 	bool
-- 
2.18.0
[PATCH] PCI: call dma_debug_add_bus for pci_bus_type in common code
There is nothing arch specific about PCI or dma-debug, so move this call to common code just after registering the bus type.

Signed-off-by: Christoph Hellwig
---
 arch/powerpc/kernel/dma.c | 3 ---
 arch/sh/drivers/pci/pci.c | 2 --
 arch/x86/kernel/pci-dma.c | 3 ---
 drivers/pci/pci-driver.c  | 2 +-
 4 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 155170d70324..dbfc7056d7df 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -357,9 +357,6 @@ EXPORT_SYMBOL_GPL(dma_get_required_mask);

 static int __init dma_init(void)
 {
-#ifdef CONFIG_PCI
-	dma_debug_add_bus(&pci_bus_type);
-#endif
 #ifdef CONFIG_IBMVIO
 	dma_debug_add_bus(&vio_bus_type);
 #endif
diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
index e5b7437ab4af..8256626bc53c 100644
--- a/arch/sh/drivers/pci/pci.c
+++ b/arch/sh/drivers/pci/pci.c
@@ -160,8 +160,6 @@ static int __init pcibios_init(void)
 	for (hose = hose_head; hose; hose = hose->next)
 		pcibios_scanbus(hose);

-	dma_debug_add_bus(&pci_bus_type);
-
 	pci_initialized = 1;

 	return 0;
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index ab5d9dd668d2..43f58632f123 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -155,9 +155,6 @@ static int __init pci_iommu_init(void)
 {
 	struct iommu_table_entry *p;

-#ifdef CONFIG_PCI
-	dma_debug_add_bus(&pci_bus_type);
-#endif
 	x86_init.iommu.iommu_init();

 	for (p = __iommu_table; p < __iommu_table_end; p++) {
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 6792292b5fc7..bef17c3fca67 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1668,7 +1668,7 @@ static int __init pci_driver_init(void)
 	if (ret)
 		return ret;
 #endif
-
+	dma_debug_add_bus(&pci_bus_type);
 	return 0;
 }
 postcore_initcall(pci_driver_init);
-- 
2.18.0
Re: [PATCH v4 09/11] macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver
Hi Michael, On Mon, Jul 30, 2018 at 8:47 AM Michael Ellerman wrote: > Finn Thain writes: > > Now that the PowerMac via-pmu driver supports m68k PowerBooks, > > switch over to that driver and remove the via-pmu68k driver. > > > > Cc: Geert Uytterhoeven > > Tested-by: Stan Johnson > > Signed-off-by: Finn Thain > > --- > > arch/m68k/configs/mac_defconfig | 2 +- > > arch/m68k/configs/multi_defconfig | 2 +- > > arch/m68k/mac/config.c| 2 +- > > arch/m68k/mac/misc.c | 48 +-- > > drivers/macintosh/Kconfig | 13 +- > > drivers/macintosh/Makefile| 1 - > > drivers/macintosh/adb.c | 2 +- > > drivers/macintosh/via-pmu68k.c| 846 > > -- > > include/uapi/linux/pmu.h | 2 +- > > 9 files changed, 14 insertions(+), 904 deletions(-) > > delete mode 100644 drivers/macintosh/via-pmu68k.c > > Geert are you OK with this and the other one that touches arch/m68k ? Sure, feel free to take them through the PPC tree. Acked-by: Geert Uytterhoeven Thanks! Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH v4 09/11] macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver
On Mon, Jul 30, 2018 at 9:26 AM Geert Uytterhoeven wrote: > On Mon, Jul 2, 2018 at 10:21 AM Finn Thain wrote: > > Now that the PowerMac via-pmu driver supports m68k PowerBooks, > > switch over to that driver and remove the via-pmu68k driver. > > > > Cc: Geert Uytterhoeven > > Tested-by: Stan Johnson > > Signed-off-by: Finn Thain > > Acked-by: Geert Uytterhoeven Reviewed-by: Geert Uytterhoeven Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH v4 09/11] macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver
On Mon, Jul 2, 2018 at 10:21 AM Finn Thain wrote: > Now that the PowerMac via-pmu driver supports m68k PowerBooks, > switch over to that driver and remove the via-pmu68k driver. > > Cc: Geert Uytterhoeven > Tested-by: Stan Johnson > Signed-off-by: Finn Thain Acked-by: Geert Uytterhoeven Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH v4 08/11] macintosh/via-pmu68k: Don't load driver on unsupported hardware
On Mon, Jul 2, 2018 at 10:21 AM Finn Thain wrote: > Don't load the via-pmu68k driver on early PowerBooks. The M50753 PMU > device found in those models was never supported by this driver. > Attempting to load the driver usually causes a boot hang. > > Cc: Geert Uytterhoeven > Signed-off-by: Finn Thain > Reviewed-by: Michael Schmitz Acked-by: Geert Uytterhoeven Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
RE: [PATCH] Adds __init annotation at mmu_init_secondary func
Alexey Spirkov writes: > Without any additional option > > WARNING: modpost: Found 1 section mismatch(es). > > If detailed debug is switched on then: > > WARNING: vmlinux.o(.text+0x142ac): Section mismatch in reference from the > function mmu_init_secondary() to the function .init.text:ppc44x_pin_tlb() > The function mmu_init_secondary() references > the function __init ppc44x_pin_tlb(). > This is often because mmu_init_secondary lacks a __init > annotation or the annotation of ppc44x_pin_tlb is wrong. Ah right, thanks. I checked ppc47x_pin_tlb() but didn't spot the call to ppc44x_pin_tlb(). cheers
Re: [PATCH v4 09/11] macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver
Finn Thain writes: > Now that the PowerMac via-pmu driver supports m68k PowerBooks, > switch over to that driver and remove the via-pmu68k driver. > > Cc: Geert Uytterhoeven > Tested-by: Stan Johnson > Signed-off-by: Finn Thain > --- > arch/m68k/configs/mac_defconfig | 2 +- > arch/m68k/configs/multi_defconfig | 2 +- > arch/m68k/mac/config.c| 2 +- > arch/m68k/mac/misc.c | 48 +-- > drivers/macintosh/Kconfig | 13 +- > drivers/macintosh/Makefile| 1 - > drivers/macintosh/adb.c | 2 +- > drivers/macintosh/via-pmu68k.c| 846 > -- > include/uapi/linux/pmu.h | 2 +- > 9 files changed, 14 insertions(+), 904 deletions(-) > delete mode 100644 drivers/macintosh/via-pmu68k.c Geert are you OK with this and the other one that touches arch/m68k ? cheers
Re: [PATCH] powerpc/mm: Don't report PUDs as memory leaks when using kmemleak
Paul Menzel writes: > Am 19.07.2018 um 16:33 schrieb Michael Ellerman: ... >> >> The fix is fairly simple. We need to tell kmemleak to ignore PUD >> allocations and never report them as leaks. We can also tell it not to >> scan the PGD, because it will never find pointers in there. However it >> will still notice if we allocate a PGD and then leak it. >> >> Reported-by: Paul Menzel >> Signed-off-by: Michael Ellerman > --- >> arch/powerpc/include/asm/book3s/64/pgalloc.h | 23 +-- >> 1 file changed, 21 insertions(+), 2 deletions(-) > > […] > > Tested-by: Paul Menzel on IBM S822LC Thanks. cheers
Re: [PATCH] powerpc/mobility: Fix node detach/rename problem
Michael Bringmann writes: > During LPAR migration, the content of the device tree/sysfs may > be updated including deletion and replacement of nodes in the > tree. When nodes are added to the internal node structures, they > are appended in FIFO order to a list of nodes maintained by the > OF code APIs. That hasn't been true for several years. The data structure is an n-ary tree. What kernel version are you working on? > When nodes are removed from the device tree, they > are marked OF_DETACHED, but not actually deleted from the system > to allow for pointers cached elsewhere in the kernel. The order > and content of the entries in the list of nodes is not altered, > though. Something is going wrong if this is actually happening. When the node is detached it should be *detached* from the tree of all nodes, so it should not be discoverable other than by having an existing pointer to it. That's what __of_detach_node() does:

	parent = np->parent;
	if (WARN_ON(!parent))
		return;

	if (parent->child == np)
		parent->child = np->sibling;
	else {
		struct device_node *prevsib;

		for (prevsib = np->parent->child;
		     prevsib->sibling != np;
		     prevsib = prevsib->sibling)
			;
		prevsib->sibling = np->sibling;
	}

ie. the node must have a non-NULL parent, and then it is spliced out of its parent's child list. Please give us more info so we can work out what's actually happening. cheers