[PATCH 2/5] kernel/dma: remove unnecessary unmap_kernel_range

2021-01-26 Thread Nicholas Piggin
vunmap will remove ptes. Cc: Christoph Hellwig Cc: Marek Szyprowski Cc: Robin Murphy Cc: io...@lists.linux-foundation.org Signed-off-by: Nicholas Piggin --- kernel/dma/remap.c | 1 - 1 file changed, 1 deletion(-) diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c index 905c3fa005f1

[PATCH 1/5] mm/vmalloc: remove map_kernel_range

2021-01-26 Thread Nicholas Piggin
This is a shim around vmap_pages_range, get rid of it. Move the main API comment from the _noflush variant to the normal variant, and make _noflush internal to mm/. Signed-off-by: Nicholas Piggin --- Documentation/core-api/cachetlb.rst | 2 +- include/linux/vmalloc.h | 11

[PATCH 0/5] mm/vmalloc: cleanup after hugepage series

2021-01-26 Thread Nicholas Piggin
Christoph pointed out some overdue cleanups required after the huge page series, and I had some other comment and warning changes. Thanks, Nick Nicholas Piggin (5): mm/vmalloc: remove map_kernel_range kernel/dma: remove unnecessary unmap_kernel_range powerpc/xive: remove unnecessary

[PATCH v11 13/13] powerpc/64s/radix: Enable huge vmalloc mappings

2021-01-26 Thread Nicholas Piggin
Cc: linuxppc-...@lists.ozlabs.org Signed-off-by: Nicholas Piggin --- .../admin-guide/kernel-parameters.txt | 2 ++ arch/powerpc/Kconfig | 1 + arch/powerpc/kernel/module.c | 21 +++ 3 files changed, 20 insertions(+), 4 deletions

[PATCH v11 12/13] mm/vmalloc: Hugepage vmalloc mappings

2021-01-26 Thread Nicholas Piggin
misses by nearly 30x on a `git diff` workload on a 2-node POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%. This can result in more internal fragmentation and memory overhead for a given allocation, so an option, nohugevmalloc, is added to disable it at boot. Signed-off-by: Nicholas Pig

[PATCH v11 11/13] mm/vmalloc: add vmap_range_noflush variant

2021-01-26 Thread Nicholas Piggin
As a side-effect, the order of flush_cache_vmap() and arch_sync_kernel_mappings() calls is switched, but that now matches the other callers in this file. Reviewed-by: Christoph Hellwig Signed-off-by: Nicholas Piggin --- mm/vmalloc.c | 16 +--- 1 file changed, 13 insertions(+), 3

[PATCH v11 10/13] mm: Move vmap_range from mm/ioremap.c to mm/vmalloc.c

2021-01-26 Thread Nicholas Piggin
This is a generic kernel virtual memory mapper, not specific to ioremap. Code is unchanged other than making vmap_range non-static. Reviewed-by: Christoph Hellwig Signed-off-by: Nicholas Piggin --- include/linux/vmalloc.h | 3 + mm/ioremap.c| 203

[PATCH v11 09/13] mm/vmalloc: provide fallback arch huge vmap support functions

2021-01-26 Thread Nicholas Piggin
If an architecture doesn't support a particular page table level as a huge vmap page size then allow it to skip defining the support query function. Suggested-by: Christoph Hellwig Signed-off-by: Nicholas Piggin --- arch/arm64/include/asm/vmalloc.h | 7 +++ arch/powerpc/include/asm
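The fallback idea described in this patch can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: `toy_vmap_pmd_supported`, `toy_vmap_pud_supported` and `toy_largest_mapping_shift` are hypothetical names, and the shift values assume 4 KiB base pages with 2 MiB and 1 GiB large sizes.

```c
#include <stdbool.h>

/* Hypothetical "architecture header": this toy arch only provides a
 * PMD-level query and says nothing about the PUD level. The #define
 * lets the generic code detect that the function exists. */
#define toy_vmap_pmd_supported toy_vmap_pmd_supported
static inline bool toy_vmap_pmd_supported(void)
{
	return true;
}

/* Hypothetical "generic header": any level the architecture did not
 * define falls back to "unsupported". */
#ifndef toy_vmap_pud_supported
static inline bool toy_vmap_pud_supported(void)
{
	return false;
}
#endif

#ifndef toy_vmap_pmd_supported
static inline bool toy_vmap_pmd_supported(void)
{
	return false;
}
#endif

/* Assumed shifts: 4 KiB pages, 2 MiB PMD mappings, 1 GiB PUD mappings. */
enum { TOY_PAGE_SHIFT = 12, TOY_PMD_SHIFT = 21, TOY_PUD_SHIFT = 30 };

/* Pick the largest level the toy arch claims to support. Because the
 * query functions are inline and constant, a compiler can fold the
 * unsupported branches away. */
static inline int toy_largest_mapping_shift(void)
{
	if (toy_vmap_pud_supported())
		return TOY_PUD_SHIFT;
	if (toy_vmap_pmd_supported())
		return TOY_PMD_SHIFT;
	return TOY_PAGE_SHIFT;
}
```

With this layout, an architecture only has to define the levels it actually supports; everything else resolves to the generic `false` stub.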

[PATCH v11 07/13] arm64: inline huge vmap supported functions

2021-01-26 Thread Nicholas Piggin
This allows unsupported levels to be constant folded away, and so p4d_free_pud_page can be removed because it's no longer linked to. Cc: Catalin Marinas Cc: Will Deacon Cc: linux-arm-ker...@lists.infradead.org Acked-by: Catalin Marinas Signed-off-by: Nicholas Piggin --- arch/arm64/include

[PATCH v11 06/13] powerpc: inline huge vmap supported functions

2021-01-26 Thread Nicholas Piggin
This allows unsupported levels to be constant folded away, and so p4d_free_pud_page can be removed because it's no longer linked to. Cc: linuxppc-...@lists.ozlabs.org Acked-by: Michael Ellerman Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/vmalloc.h | 19

[PATCH v11 05/13] mm: HUGE_VMAP arch support cleanup

2021-01-26 Thread Nicholas Piggin
Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: x...@kernel.org Cc: "H. Peter Anvin" Acked-by: Catalin Marinas [arm64] Signed-off-by: Nicholas Piggin --- arch/arm64/include/asm/vmalloc.h | 8 ++ arch/arm64/mm/mmu.c | 10 +-- arch/powerpc/i

[PATCH v11 04/13] mm/ioremap: rename ioremap_*_range to vmap_*_range

2021-01-26 Thread Nicholas Piggin
This will be used as a generic kernel virtual mapping function, so re-name it in preparation. Signed-off-by: Nicholas Piggin --- mm/ioremap.c | 64 +++- 1 file changed, 33 insertions(+), 31 deletions(-) diff --git a/mm/ioremap.c b/mm/ioremap.c

[PATCH v11 03/13] mm/vmalloc: rename vmap_*_range vmap_pages_*_range

2021-01-26 Thread Nicholas Piggin
The vmalloc mapper operates on a struct page * array rather than a linear physical address, re-name it to make this distinction clear. Reviewed-by: Christoph Hellwig Signed-off-by: Nicholas Piggin --- mm/vmalloc.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff
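The distinction this rename draws, between a mapper fed a linear physical address and one fed an array of pages, can be illustrated with a toy model. Everything below is an assumption made for illustration (`struct toy_page`, `toy_map_linear`, `toy_map_pages`, and a flat array standing in for page table entries); it is not the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12

/* Hypothetical stand-in for struct page: only records a frame number. */
struct toy_page {
	uint64_t pfn;
};

/* Linear mapper: one starting physical address, frames are contiguous. */
static void toy_map_linear(uint64_t *ptes, size_t nr, uint64_t phys)
{
	for (size_t i = 0; i < nr; i++)
		ptes[i] = (phys >> TOY_PAGE_SHIFT) + i;
}

/* Page-array mapper: each slot takes its frame from pages[i], so the
 * backing frames may be scattered anywhere in memory. */
static void toy_map_pages(uint64_t *ptes, size_t nr,
			  struct toy_page *const *pages)
{
	for (size_t i = 0; i < nr; i++)
		ptes[i] = pages[i]->pfn;
}
```

The two functions fill the same kind of table, but only the second can describe non-contiguous backing memory, which is why a distinct name is useful.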

[PATCH v11 02/13] mm: apply_to_pte_range warn and fail if a large pte is encountered

2021-01-26 Thread Nicholas Piggin
apply_to_pte_range might mistake a large pte for bad, or treat it as a page table, resulting in a crash or corruption. Add a test to warn and return error if large entries are found. Reviewed-by: Christoph Hellwig Signed-off-by: Nicholas Piggin --- mm/memory.c | 66
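The "warn and return error" behaviour can be sketched with a toy two-level table. This is a userspace model under assumptions, not the code in mm/memory.c: `struct toy_pmd`, `toy_apply_to_range` and `toy_count_pte` are hypothetical names, and a plain flag stands in for the architecture's large-entry test.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_PTRS_PER_TABLE 4

/* Hypothetical upper-level entry: either points to a table of small
 * entries, or is itself a large (leaf) mapping. */
struct toy_pmd {
	int is_large;      /* non-zero: large mapping, no table below */
	int *pte_table;    /* valid only when is_large == 0 */
};

typedef int (*toy_pte_fn)(int *pte, void *data);

/* Apply fn to every small entry. On a large entry, warn and fail
 * instead of treating it as a pointer to a lower-level table. */
static int toy_apply_to_range(struct toy_pmd *pmds, size_t nr_pmds,
			      toy_pte_fn fn, void *data)
{
	for (size_t i = 0; i < nr_pmds; i++) {
		if (pmds[i].is_large) {
			fprintf(stderr, "toy: large entry at index %zu\n", i);
			return -EINVAL;
		}
		for (size_t j = 0; j < TOY_PTRS_PER_TABLE; j++) {
			int err = fn(&pmds[i].pte_table[j], data);
			if (err)
				return err;
		}
	}
	return 0;
}

/* Example callback: count the entries visited. */
static int toy_count_pte(int *pte, void *data)
{
	(void)pte;
	(*(size_t *)data)++;
	return 0;
}
```

Failing early keeps the walker from dereferencing a large entry as if it were a table pointer, which is the crash or corruption case the patch description mentions.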

Re: [PATCH v10 06/12] powerpc: inline huge vmap supported functions

2021-01-26 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of January 25, 2021 6:42 pm: > > > On 24/01/2021 at 09:22, Nicholas Piggin wrote: >> This allows unsupported levels to be constant folded away, and so >> p4d_free_pud_page can be removed because it's no longer linked to. > >

Re: [PATCH v10 11/12] mm/vmalloc: Hugepage vmalloc mappings

2021-01-26 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of January 25, 2021 7:14 pm: > > > On 24/01/2021 at 09:22, Nicholas Piggin wrote: >> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC >> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and

Re: [PATCH v4 19/23] powerpc/syscall: Avoid stack frame in likely part of system_call_exception()

2021-01-26 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of January 26, 2021 12:48 am: > When r3 is not modified, reload it from regs->orig_r3 to free > volatile registers. This avoids a stack frame for the likely part > of system_call_exception() > > Before the patch: > > c000b4d4 : > c000b4d4: 7c 08 02 a6

Re: [PATCH v4 14/23] powerpc/syscall: Save r3 in regs->orig_r3

2021-01-26 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of January 26, 2021 12:48 am: > Save r3 in regs->orig_r3 in system_call_exception() > > Signed-off-by: Christophe Leroy > --- > arch/powerpc/kernel/entry_64.S | 1 - > arch/powerpc/kernel/syscall.c | 2 ++ > 2 files changed, 2 insertions(+), 1

Re: [PATCH v11 12/13] mm/vmalloc: Hugepage vmalloc mappings

2021-01-26 Thread Nicholas Piggin
Excerpts from Ding Tianhong's message of January 26, 2021 4:59 pm: > On 2021/1/26 12:45, Nicholas Piggin wrote: >> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC >> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and >> supports PM

Re: [PATCH v4 11/23] powerpc/syscall: Rename syscall_64.c into syscall.c

2021-01-26 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of January 26, 2021 12:48 am: > syscall_64.c will be reused almost as is for PPC32. > > Rename it syscall.c Could you rename it to interrupt.c instead? A system call is an interrupt, and the file now also has code to return from other interrupts as well,

Re: [PATCH v4 20/23] powerpc/syscall: Do not check unsupported scv vector on PPC32

2021-01-26 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of January 26, 2021 12:48 am: > Only PPC64 has scv. No need to check the 0x7ff0 trap on PPC32. > > And ignore the scv parameter in syscall_exit_prepare (Save 14 cycles > 346 => 332 cycles) > > Signed-off-by: Christophe Leroy > --- >

RE: [PATCH v10 11/12] mm/vmalloc: Hugepage vmalloc mappings

2021-01-26 Thread Nicholas Piggin
Excerpts from David Laight's message of January 25, 2021 10:24 pm: > From: Christophe Leroy >> Sent: 25 January 2021 09:15 >> >> On 24/01/2021 at 09:22, Nicholas Piggin wrote: >> > Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC >>

Re: [PATCH v10 11/12] mm/vmalloc: Hugepage vmalloc mappings

2021-01-24 Thread Nicholas Piggin
Excerpts from Christoph Hellwig's message of January 25, 2021 1:07 am: > On Sun, Jan 24, 2021 at 06:22:29PM +1000, Nicholas Piggin wrote: >> diff --git a/arch/Kconfig b/arch/Kconfig >> index 24862d15f3a3..f87feb616184 100644 >> --- a/arch/Kconfig >> +++ b/arch/K

Re: [PATCH v10 05/12] mm: HUGE_VMAP arch support cleanup

2021-01-24 Thread Nicholas Piggin
Excerpts from Christoph Hellwig's message of January 24, 2021 9:40 pm: >> diff --git a/arch/arm64/include/asm/vmalloc.h >> b/arch/arm64/include/asm/vmalloc.h >> index 2ca708ab9b20..597b40405319 100644 >> --- a/arch/arm64/include/asm/vmalloc.h >> +++ b/arch/arm64/include/asm/vmalloc.h >> @@ -1,4

Re: [PATCH v10 04/12] mm/ioremap: rename ioremap_*_range to vmap_*_range

2021-01-24 Thread Nicholas Piggin
Excerpts from Christoph Hellwig's message of January 24, 2021 9:36 pm: > On Sun, Jan 24, 2021 at 06:22:22PM +1000, Nicholas Piggin wrote: >> This will be used as a generic kernel virtual mapping function, so >> re-name it in preparation. > > The new name looks ok, but s

[PATCH v10 07/12] arm64: inline huge vmap supported functions

2021-01-24 Thread Nicholas Piggin
This allows unsupported levels to be constant folded away, and so p4d_free_pud_page can be removed because it's no longer linked to. Cc: Catalin Marinas Cc: Will Deacon Cc: linux-arm-ker...@lists.infradead.org Acked-by: Catalin Marinas Signed-off-by: Nicholas Piggin --- arch/arm64/include

[PATCH v10 12/12] powerpc/64s/radix: Enable huge vmalloc mappings

2021-01-24 Thread Nicholas Piggin
Cc: linuxppc-...@lists.ozlabs.org Signed-off-by: Nicholas Piggin --- Documentation/admin-guide/kernel-parameters.txt | 2 ++ arch/powerpc/Kconfig| 1 + arch/powerpc/kernel/module.c| 13 +++-- 3 files changed, 14 insertions(+), 2 deletions

[PATCH v10 11/12] mm/vmalloc: Hugepage vmalloc mappings

2021-01-24 Thread Nicholas Piggin
misses by nearly 30x on a `git diff` workload on a 2-node POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%. This can result in more internal fragmentation and memory overhead for a given allocation, so an option, nohugevmalloc, is added to disable it at boot. Signed-off-by: Nicholas Pig

[PATCH v10 10/12] mm/vmalloc: add vmap_range_noflush variant

2021-01-24 Thread Nicholas Piggin
As a side-effect, the order of flush_cache_vmap() and arch_sync_kernel_mappings() calls is switched, but that now matches the other callers in this file. Signed-off-by: Nicholas Piggin --- mm/vmalloc.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/mm

[PATCH v10 08/12] x86: inline huge vmap supported functions

2021-01-24 Thread Nicholas Piggin
This allows unsupported levels to be constant folded away, and so p4d_free_pud_page can be removed because it's no longer linked to. Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: x...@kernel.org Cc: "H. Peter Anvin" Signed-off-by: Nicholas Piggin --- arch/x86/i

[PATCH v10 09/12] mm: Move vmap_range from mm/ioremap.c to mm/vmalloc.c

2021-01-24 Thread Nicholas Piggin
This is a generic kernel virtual memory mapper, not specific to ioremap. Signed-off-by: Nicholas Piggin --- include/linux/vmalloc.h | 3 + mm/ioremap.c| 197 mm/vmalloc.c| 196 +++ 3 files

[PATCH v10 04/12] mm/ioremap: rename ioremap_*_range to vmap_*_range

2021-01-24 Thread Nicholas Piggin
This will be used as a generic kernel virtual mapping function, so re-name it in preparation. Signed-off-by: Nicholas Piggin --- mm/ioremap.c | 64 +++- 1 file changed, 33 insertions(+), 31 deletions(-) diff --git a/mm/ioremap.c b/mm/ioremap.c

[PATCH v10 02/12] mm: apply_to_pte_range warn and fail if a large pte is encountered

2021-01-24 Thread Nicholas Piggin
apply_to_pte_range might mistake a large pte for bad, or treat it as a page table, resulting in a crash or corruption. Add a test to warn and return error if large entries are found. Signed-off-by: Nicholas Piggin --- mm/memory.c | 66 +++-- 1

[PATCH v10 05/12] mm: HUGE_VMAP arch support cleanup

2021-01-24 Thread Nicholas Piggin
Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: x...@kernel.org Cc: "H. Peter Anvin" Acked-by: Catalin Marinas [arm64] Signed-off-by: Nicholas Piggin --- arch/arm64/include/asm/vmalloc.h | 8 +++ arch/arm64/mm/mmu.c | 10 +-- arch/powerpc/i

[PATCH v10 03/12] mm/vmalloc: rename vmap_*_range vmap_pages_*_range

2021-01-24 Thread Nicholas Piggin
The vmalloc mapper operates on a struct page * array rather than a linear physical address, re-name it to make this distinction clear. Signed-off-by: Nicholas Piggin --- mm/vmalloc.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c

[PATCH v10 06/12] powerpc: inline huge vmap supported functions

2021-01-24 Thread Nicholas Piggin
This allows unsupported levels to be constant folded away, and so p4d_free_pud_page can be removed because it's no longer linked to. Cc: linuxppc-...@lists.ozlabs.org Acked-by: Michael Ellerman Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/vmalloc.h | 19

[PATCH v10 01/12] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings

2021-01-24 Thread Nicholas Piggin
pings") Signed-off-by: Nicholas Piggin --- mm/vmalloc.c | 41 ++--- 1 file changed, 26 insertions(+), 15 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index e6f352bf0498..62372f9e0167 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -34,7 +34,7 @@

[PATCH v10 00/12] huge vmalloc mappings

2021-01-24 Thread Nicholas Piggin
ut. - Made an architecture config option, powerpc only for now. Since v3: - Fixed an off-by-one bug in a loop - Fix !CONFIG_HAVE_ARCH_HUGE_VMAP build fail *** BLURB HERE *** Nicholas Piggin (12): mm/vmalloc: fix vmalloc_to_page for huge vmap mappings mm: apply_to_pte_range warn and fail if a

Re: [PATCH v9 05/12] mm: HUGE_VMAP arch support cleanup

2021-01-23 Thread Nicholas Piggin
Excerpts from Ding Tianhong's message of January 4, 2021 10:33 pm: > On 2020/12/5 14:57, Nicholas Piggin wrote: >> This changes the awkward approach where architectures provide init >> functions to determine which levels they can provide large mappings for, >> to one wher

Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()

2020-12-30 Thread Nicholas Piggin
Excerpts from Russell King - ARM Linux admin's message of December 30, 2020 8:58 pm: > On Wed, Dec 30, 2020 at 10:00:28AM +, Russell King - ARM Linux admin > wrote: >> On Wed, Dec 30, 2020 at 12:33:02PM +1000, Nicholas Piggin wrote: >> > Excerpts from Russell King - ARM

Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()

2020-12-29 Thread Nicholas Piggin
Excerpts from Russell King - ARM Linux admin's message of December 29, 2020 8:44 pm: > On Tue, Dec 29, 2020 at 01:09:12PM +1000, Nicholas Piggin wrote: >> I think it should certainly be documented in terms of what guarantees >> it provides to application, _not_ the kinds of inst

Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()

2020-12-28 Thread Nicholas Piggin
Excerpts from Andy Lutomirski's message of December 29, 2020 10:36 am: > On Mon, Dec 28, 2020 at 4:11 PM Nicholas Piggin wrote: >> >> Excerpts from Andy Lutomirski's message of December 28, 2020 4:28 am: >> > The old sync_core_before_usermode() comments said that a non-ic

Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()

2020-12-28 Thread Nicholas Piggin
Excerpts from Andy Lutomirski's message of December 29, 2020 10:56 am: > On Mon, Dec 28, 2020 at 4:36 PM Nicholas Piggin wrote: >> >> Excerpts from Andy Lutomirski's message of December 29, 2020 7:06 am: >> > On Mon, Dec 28, 2020 at 12:32 PM Mathieu Desnoyers >> >

Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()

2020-12-28 Thread Nicholas Piggin
Excerpts from Andy Lutomirski's message of December 29, 2020 7:06 am: > On Mon, Dec 28, 2020 at 12:32 PM Mathieu Desnoyers > wrote: >> >> - On Dec 28, 2020, at 2:44 PM, Andy Lutomirski l...@kernel.org wrote: >> >> > On Mon, Dec 28, 2020 at 11:09 AM Russell King - ARM Linux admin >> > wrote:

Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()

2020-12-28 Thread Nicholas Piggin
> Cc: Michael Ellerman > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: linuxppc-...@lists.ozlabs.org > Cc: Nicholas Piggin > Cc: Catalin Marinas > Cc: Will Deacon > Cc: linux-arm-ker...@lists.infradead.org > Cc: Mathieu Desnoyers > Cc: x...@kernel.org > Cc

Re: [PATCH v1 05/15] powerpc: Remove address argument from bad_page_fault()

2020-12-26 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of December 22, 2020 11:28 pm: > The address argument is not used by bad_page_fault(). > > Remove it. > > Suggested-by: Nicholas Piggin > Signed-off-by: Christophe Leroy > --- > arch/powerpc/include/asm/bug.h | 4 ++-

Re: [PATCH v1 06/15] powerpc: Remove address and errorcode arguments from do_break()

2020-12-26 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of December 22, 2020 11:28 pm: > Let do_break() retrieve address and errorcode from regs. > > This simplifies the code and shouldn't impede performance as > address and errorcode are likely still hot in the cache. > > Suggested-b

Re: [PATCH 3/3] powerpc: rewrite atomics to use ARCH_ATOMIC

2020-12-21 Thread Nicholas Piggin
Excerpts from Boqun Feng's message of November 14, 2020 1:30 am: > Hi Nicholas, > > On Wed, Nov 11, 2020 at 09:07:23PM +1000, Nicholas Piggin wrote: >> All the cool kids are doing it. >> >> Signed-off-by: Nicholas Piggin >> --- >> a

Re: [PATCH 2/2] powerpc/64s: Trim offlined CPUs from mm_cpumasks

2020-12-14 Thread Nicholas Piggin
Excerpts from Michael Ellerman's message of December 14, 2020 8:43 pm: > Nicholas Piggin writes: >> Excerpts from Geert Uytterhoeven's message of December 10, 2020 7:06 pm: >>> Hi Nicholas, >>> >>> On Fri, Nov 20, 2020 at 4:01 AM Nicholas Piggin wrote: >

[PATCH v2 4/5] powerpc: use lazy mm refcount helper functions

2020-12-13 Thread Nicholas Piggin
Use _lazy_tlb functions for lazy mm refcounting in powerpc, to prepare to move to MMU_LAZY_TLB_SHOOTDOWN. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/smp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index

[PATCH v2 5/5] powerpc/64s: enable MMU_LAZY_TLB_SHOOTDOWN

2020-12-13 Thread Nicholas Piggin
On a 16-socket 192-core POWER8 system, a context switching benchmark with as many software threads as CPUs (so each switch will go in and out of idle), upstream can achieve a rate of about 1 million context switches per second. After this patch it goes up to 118 million. Signed-off-by: Nicholas

[PATCH v2 2/5] lazy tlb: allow lazy tlb mm switching to be configurable

2020-12-13 Thread Nicholas Piggin
provides an alternate scheme. Signed-off-by: Nicholas Piggin --- arch/Kconfig | 17 + include/linux/sched/mm.h | 13 +-- kernel/sched/core.c | 75 ++-- kernel/sched/sched.h | 4 ++- 4 files changed, 87 insertions(+), 22

[PATCH v2 3/5] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-12-13 Thread Nicholas Piggin
tional interrupts on a 144 CPU system during a kernel compile). There are a number of strategies that could be employed to reduce IPIs if they turn out to be a problem for some workload. Signed-off-by: Nicholas Piggin --- arch/Kconfig | 17 +++-- kernel/fork.

[PATCH v2 1/5] lazy tlb: introduce lazy mm refcount helper functions

2020-12-13 Thread Nicholas Piggin
Add explicit _lazy_tlb annotated functions for lazy mm refcounting. This makes things a bit more explicit, and allows explicit refcounting to be removed if it is not used. Signed-off-by: Nicholas Piggin --- arch/arm/mach-rpc/ecard.c| 2 +- arch/powerpc/mm/book3s64/radix_tlb.c | 4
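The point of routing lazy mm references through their own annotated helpers can be sketched as follows. This is a toy model under assumptions, not the kernel's implementation: `struct toy_mm`, the `toy_*` functions and the `TOY_LAZY_TLB_REFCOUNT` switch are hypothetical stand-ins for the mm structure, the refcount helpers and a build-time option.

```c
/* Set to 0 to model a configuration where lazy mm references are not
 * refcounted (an assumption standing in for a kernel config option). */
#ifndef TOY_LAZY_TLB_REFCOUNT
#define TOY_LAZY_TLB_REFCOUNT 1
#endif

/* Hypothetical stand-in for the mm structure: just a reference count. */
struct toy_mm {
	int count;
};

/* Ordinary reference helpers. */
static inline void toy_mmgrab(struct toy_mm *mm) { mm->count++; }
static inline void toy_mmdrop(struct toy_mm *mm) { mm->count--; }

/* Lazy-tlb references go through their own annotated helpers, so the
 * refcounting for that one use can be compiled out in a single place. */
static inline void toy_mmgrab_lazy_tlb(struct toy_mm *mm)
{
	if (TOY_LAZY_TLB_REFCOUNT)
		toy_mmgrab(mm);
}

static inline void toy_mmdrop_lazy_tlb(struct toy_mm *mm)
{
	if (TOY_LAZY_TLB_REFCOUNT)
		toy_mmdrop(mm);
}
```

Callers that hold a lazy reference use only the `_lazy_tlb` variants, so switching the option changes their behaviour without touching any call site.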

[PATCH v2 0/5] shoot lazy tlbs

2020-12-13 Thread Nicholas Piggin
This is another rebase, on top of mainline now (don't need the asm-generic tree), and without any x86 or membarrier changes. This makes the series far smaller and more manageable and without the controversial bits. Thanks, Nick Nicholas Piggin (5): lazy tlb: introduce lazy mm refcount helper

Re: [PATCH 2/8] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-12-13 Thread Nicholas Piggin
Excerpts from Nicholas Piggin's message of December 14, 2020 2:07 pm: > Excerpts from Andy Lutomirski's message of December 11, 2020 10:11 am: >>> On Dec 5, 2020, at 7:59 PM, Nicholas Piggin wrote: >>> >> >>> I'm still going to pursue shoot-lazies for the mer

Re: [PATCH 2/2] powerpc/64s: Trim offlined CPUs from mm_cpumasks

2020-12-13 Thread Nicholas Piggin
Excerpts from Geert Uytterhoeven's message of December 10, 2020 7:06 pm: > Hi Nicholas, > > On Fri, Nov 20, 2020 at 4:01 AM Nicholas Piggin wrote: >> >> When offlining a CPU, powerpc/64s does not flush TLBs, rather it just >> leaves the CPU set in mm_cpumasks, so it

Re: [PATCH 2/8] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-12-13 Thread Nicholas Piggin
Excerpts from Andy Lutomirski's message of December 11, 2020 10:11 am: >> On Dec 5, 2020, at 7:59 PM, Nicholas Piggin wrote: >> > >> I'm still going to pursue shoot-lazies for the merge window. As you >> see it's about a dozen lines and an if (IS_ENABLED(... in core cod

Re: [PATCH 2/8] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-12-05 Thread Nicholas Piggin
Excerpts from Andy Lutomirski's message of December 6, 2020 10:36 am: > On Sat, Dec 5, 2020 at 3:15 PM Nicholas Piggin wrote: >> >> Excerpts from Andy Lutomirski's message of December 6, 2020 2:11 am: >> > > >> If an mm was lazy tlb for a kernel

Re: [PATCH 2/8] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-12-05 Thread Nicholas Piggin
Excerpts from Andy Lutomirski's message of December 6, 2020 2:11 am: > >> On Dec 5, 2020, at 12:00 AM, Nicholas Piggin wrote: >> >> >> I disagree. Until now nobody following it noticed that the mm gets >> un-lazied in other cases, because that was not

Re: [PATCH 2/8] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-12-05 Thread Nicholas Piggin
Excerpts from Andy Lutomirski's message of December 3, 2020 3:09 pm: > On Tue, Dec 1, 2020 at 6:50 PM Nicholas Piggin wrote: >> >> Excerpts from Andy Lutomirski's message of November 29, 2020 3:55 am: >> > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin wrote:

[PATCH v9 10/12] mm/vmalloc: add vmap_range_noflush variant

2020-12-04 Thread Nicholas Piggin
As a side-effect, the order of flush_cache_vmap() and arch_sync_kernel_mappings() calls is switched, but that now matches the other callers in this file. Signed-off-by: Nicholas Piggin --- mm/vmalloc.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/mm

[PATCH v9 11/12] mm/vmalloc: Hugepage vmalloc mappings

2020-12-04 Thread Nicholas Piggin
misses by nearly 30x on a `git diff` workload on a 2-node POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%. This can result in more internal fragmentation and memory overhead for a given allocation, so an option, nohugevmalloc, is added to disable it at boot. Signed-off-by: Nicholas Pig

[PATCH v9 12/12] powerpc/64s/radix: Enable huge vmalloc mappings

2020-12-04 Thread Nicholas Piggin
Cc: linuxppc-...@lists.ozlabs.org Signed-off-by: Nicholas Piggin --- Documentation/admin-guide/kernel-parameters.txt | 2 ++ arch/powerpc/Kconfig| 1 + arch/powerpc/kernel/module.c| 13 +++-- 3 files changed, 14 insertions(+), 2 deletions

[PATCH v9 05/12] mm: HUGE_VMAP arch support cleanup

2020-12-04 Thread Nicholas Piggin
Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: x...@kernel.org Cc: "H. Peter Anvin" Acked-by: Catalin Marinas [arm64] Signed-off-by: Nicholas Piggin --- arch/arm64/include/asm/vmalloc.h | 8 +++ arch/arm64/mm/mmu.c | 10 +-- arch/powerpc/i

[PATCH v9 08/12] x86: inline huge vmap supported functions

2020-12-04 Thread Nicholas Piggin
This allows unsupported levels to be constant folded away, and so p4d_free_pud_page can be removed because it's no longer linked to. Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: x...@kernel.org Cc: "H. Peter Anvin" Signed-off-by: Nicholas Piggin --- arch/x86/i

[PATCH v9 07/12] arm64: inline huge vmap supported functions

2020-12-04 Thread Nicholas Piggin
This allows unsupported levels to be constant folded away, and so p4d_free_pud_page can be removed because it's no longer linked to. Cc: Catalin Marinas Cc: Will Deacon Cc: linux-arm-ker...@lists.infradead.org Acked-by: Catalin Marinas Signed-off-by: Nicholas Piggin --- arch/arm64/include

[PATCH v9 09/12] mm: Move vmap_range from mm/ioremap.c to mm/vmalloc.c

2020-12-04 Thread Nicholas Piggin
This is a generic kernel virtual memory mapper, not specific to ioremap. Signed-off-by: Nicholas Piggin --- include/linux/vmalloc.h | 3 + mm/ioremap.c| 197 mm/vmalloc.c| 196 +++ 3 files

[PATCH v9 06/12] powerpc: inline huge vmap supported functions

2020-12-04 Thread Nicholas Piggin
This allows unsupported levels to be constant folded away, and so p4d_free_pud_page can be removed because it's no longer linked to. Cc: linuxppc-...@lists.ozlabs.org Acked-by: Michael Ellerman Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/vmalloc.h | 19

[PATCH v9 04/12] mm/ioremap: rename ioremap_*_range to vmap_*_range

2020-12-04 Thread Nicholas Piggin
This will be used as a generic kernel virtual mapping function, so re-name it in preparation. Signed-off-by: Nicholas Piggin --- mm/ioremap.c | 64 +++- 1 file changed, 33 insertions(+), 31 deletions(-) diff --git a/mm/ioremap.c b/mm/ioremap.c

[PATCH v9 02/12] mm: apply_to_pte_range warn and fail if a large pte is encountered

2020-12-04 Thread Nicholas Piggin
apply_to_pte_range might mistake a large pte for bad, or treat it as a page table, resulting in a crash or corruption. Add a test to warn and return error if large entries are found. Signed-off-by: Nicholas Piggin --- mm/memory.c | 66 +++-- 1

[PATCH v9 03/12] mm/vmalloc: rename vmap_*_range vmap_pages_*_range

2020-12-04 Thread Nicholas Piggin
The vmalloc mapper operates on a struct page * array rather than a linear physical address, re-name it to make this distinction clear. Signed-off-by: Nicholas Piggin --- mm/vmalloc.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c

[PATCH v9 01/12] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings

2020-12-04 Thread Nicholas Piggin
pings") Signed-off-by: Nicholas Piggin --- mm/vmalloc.c | 41 ++--- 1 file changed, 26 insertions(+), 15 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 6ae491a8b210..f85124e88bdb 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -34,7 +34,7 @@

[PATCH v9 00/12] huge vmalloc mappings

2020-12-04 Thread Nicholas Piggin
- Rebased on vmalloc cleanups, split series into simpler pieces. - Fixed several compile errors and warnings - Keep the page array and accounting in small page units because struct vm_struct is an interface (this should fix x86 vmap stack debug assert). [Thanks Zefan] Nicholas Piggin (12):

Re: [RFC v2 2/2] [MOCKUP] sched/mm: Lightweight lazy mm refcounting

2020-12-04 Thread Nicholas Piggin
Excerpts from Andy Lutomirski's message of December 5, 2020 12:37 am: > > >> On Dec 3, 2020, at 11:54 PM, Nicholas Piggin wrote: >> >> Excerpts from Andy Lutomirski's message of December 4, 2020 3:26 pm: >>> This is a mockup. It's designed to illustrate t

Re: [PATCH v8 11/12] mm/vmalloc: Hugepage vmalloc mappings

2020-12-04 Thread Nicholas Piggin
Excerpts from Edgecombe, Rick P's message of December 5, 2020 4:33 am: > On Fri, 2020-12-04 at 18:12 +1000, Nicholas Piggin wrote: >> Excerpts from Edgecombe, Rick P's message of December 1, 2020 6:21 >> am: >> > On Sun, 2020-11-29 at 01:25 +1000, Nicholas Piggin wrote:

Re: [PATCH v8 11/12] mm/vmalloc: Hugepage vmalloc mappings

2020-12-04 Thread Nicholas Piggin
Excerpts from Edgecombe, Rick P's message of December 1, 2020 6:21 am: > On Sun, 2020-11-29 at 01:25 +1000, Nicholas Piggin wrote: >> Support huge page vmalloc mappings. Config option >> HAVE_ARCH_HUGE_VMALLOC >> enables support on architectures that define HAVE_ARCH_HUGE_VMA

Re: [RFC v2 2/2] [MOCKUP] sched/mm: Lightweight lazy mm refcounting

2020-12-03 Thread Nicholas Piggin
Excerpts from Andy Lutomirski's message of December 4, 2020 3:26 pm: > This is a mockup. It's designed to illustrate the algorithm and how the > code might be structured. There are several things blatantly wrong with > it: > > The coding style is not up to kernel standards. I have prototypes

Re: [RFC v2 1/2] [NEEDS HELP] x86/mm: Handle unlazying membarrier core sync in the arch code

2020-12-03 Thread Nicholas Piggin
Excerpts from Andy Lutomirski's message of December 4, 2020 3:26 pm: > The core scheduler isn't a great place for > membarrier_mm_sync_core_before_usermode() -- the core scheduler doesn't > actually know whether we are lazy. With the old code, if a CPU is > running a membarrier-registered task,

Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting

2020-12-03 Thread Nicholas Piggin
Excerpts from Peter Zijlstra's message of December 3, 2020 6:44 pm: > On Wed, Dec 02, 2020 at 09:25:51PM -0800, Andy Lutomirski wrote: > >> power: same as ARM, except that the loop may be rather larger since >> the systems are bigger. But I imagine it's still faster than Nick's >> approach -- a

Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-12-01 Thread Nicholas Piggin
> >> On Sat, Nov 28, 2020 at 7:54 PM Andy Lutomirski wrote: >> > >> > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin wrote: >> > > >> > > On big systems, the mm refcount can become highly contended when doing >> > > a lot of co

Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-12-01 Thread Nicholas Piggin
Excerpts from Andy Lutomirski's message of November 29, 2020 1:54 pm: > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin wrote: >> >> On big systems, the mm refcount can become highly contended when doing >> a lot of context switching with threaded applications (particularly

Re: [PATCH 1/8] lazy tlb: introduce exit_lazy_tlb

2020-12-01 Thread Nicholas Piggin
Excerpts from Andy Lutomirski's message of November 29, 2020 10:38 am: > On Sat, Nov 28, 2020 at 8:01 AM Nicholas Piggin wrote: >> >> This is called at points where a lazy mm is switched away or made not >> lazy (by its owner switching back). >> >> Signed-off-by:

Re: [PATCH 2/8] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-12-01 Thread Nicholas Piggin
Excerpts from Andy Lutomirski's message of November 29, 2020 3:55 am: > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin wrote: >> >> And get rid of the generic sync_core_before_usermode facility. This is >> functionally a no-op in the core scheduler code, but it also catc

Re: [PATCH 5/8] lazy tlb: allow lazy tlb mm switching to be configurable

2020-12-01 Thread Nicholas Piggin
Excerpts from Andy Lutomirski's message of November 29, 2020 10:36 am: > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin wrote: >> >> NOMMU systems could easily go without this and save a bit of code >> and the refcount atomics, because their mm switch is a no-op. I >>

[PATCH 3/8] x86: remove ARCH_HAS_SYNC_CORE_BEFORE_USERMODE

2020-11-28 Thread Nicholas Piggin
Switch remaining x86-specific users to asm/sync_core.h, remove the linux/sync_core.h header and ARCH_ option. Signed-off-by: Nicholas Piggin --- arch/x86/Kconfig| 1 - arch/x86/kernel/alternative.c | 2 +- arch/x86/kernel/cpu/mce/core.c | 2 +- drivers/misc/sgi

[PATCH 2/8] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-11-28 Thread Nicholas Piggin
). This makes lazy tlb code a bit more modular. Signed-off-by: Nicholas Piggin --- .../membarrier-sync-core/arch-support.txt | 6 - arch/x86/include/asm/mmu_context.h| 27 +++ include/linux/sched/mm.h | 14 -- kernel/cpu.c

[PATCH 1/8] lazy tlb: introduce exit_lazy_tlb

2020-11-28 Thread Nicholas Piggin
This is called at points where a lazy mm is switched away or made not lazy (by its owner switching back). Signed-off-by: Nicholas Piggin --- arch/arm/mach-rpc/ecard.c | 1 + arch/powerpc/mm/book3s64/radix_tlb.c | 1 + fs/exec.c | 6 -- include/asm

[PATCH 0/8] shoot lazy tlbs

2020-11-28 Thread Nicholas Piggin
go in a generic mm/scheduler series if we get arch acks because it's really just refactoring wrappers. The main result is reduced contention on lazy tlb mm refcount that helps very big systems. Thanks, Nick Nicholas Piggin (8): lazy tlb: introduce exit_lazy_tlb x86: use exit_lazy_tlb rather

[PATCH 5/8] lazy tlb: allow lazy tlb mm switching to be configurable

2020-11-28 Thread Nicholas Piggin
NOMMU systems could easily go without this and save a bit of code and the refcount atomics, because their mm switch is a no-op. I haven't flipped them over because I haven't audited all arch code to convert over to using the _lazy_tlb refcounting. Signed-off-by: Nicholas Piggin --- arch/Kconfig

[PATCH 4/8] lazy tlb: introduce lazy mm refcount helper functions

2020-11-28 Thread Nicholas Piggin
Add explicit _lazy_tlb annotated functions for lazy mm refcounting. This makes things a bit more explicit, and allows explicit refcounting to be removed if it is not used. Signed-off-by: Nicholas Piggin --- arch/arm/mach-rpc/ecard.c | 2 +- arch/powerpc/mm/book3s64/radix_tlb.c | 4
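
The idea in the excerpt above, wrapping lazy mm reference counting in explicitly named helpers so it can later be compiled out, can be sketched in userspace C. This is only an illustrative model, not the patch's code: `struct toy_mm`, the `toy_*` functions, and the `TOY_LAZY_TLB_REFCOUNT` switch are hypothetical names, and a plain `int` stands in for the kernel's atomic counter.

```c
#include <assert.h>

/* Hypothetical compile-time switch standing in for a Kconfig option. */
#ifndef TOY_LAZY_TLB_REFCOUNT
#define TOY_LAZY_TLB_REFCOUNT 1
#endif

/* Toy stand-in for an mm; the real structure uses an atomic refcount. */
struct toy_mm {
	int count;
};

/* Ordinary reference helpers, used by non-lazy holders. */
static inline void toy_mmgrab(struct toy_mm *mm)
{
	mm->count++;
}

static inline void toy_mmdrop(struct toy_mm *mm)
{
	assert(mm->count > 0);	/* dropping an unheld reference is a bug */
	mm->count--;
}

/*
 * Lazy-tlb annotated helpers. Because every lazy reference goes through
 * these two wrappers, setting TOY_LAZY_TLB_REFCOUNT to 0 turns them into
 * no-ops without touching any call site.
 */
static inline void toy_mmgrab_lazy_tlb(struct toy_mm *mm)
{
	if (TOY_LAZY_TLB_REFCOUNT)
		toy_mmgrab(mm);
}

static inline void toy_mmdrop_lazy_tlb(struct toy_mm *mm)
{
	if (TOY_LAZY_TLB_REFCOUNT)
		toy_mmdrop(mm);
}
```

With the switch left at 1, a grab followed by a drop through the lazy helpers returns the counter to its starting value; with it set to 0, the compiler can discard both calls entirely.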

[PATCH 7/8] powerpc: use lazy mm refcount helper functions

2020-11-28 Thread Nicholas Piggin
Use _lazy_tlb functions for lazy mm refcounting in powerpc, to prepare to move to MMU_LAZY_TLB_SHOOTDOWN. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/smp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index

[PATCH 8/8] powerpc/64s: enable MMU_LAZY_TLB_SHOOTDOWN

2020-11-28 Thread Nicholas Piggin
On a 16-socket 192-core POWER8 system, a context switching benchmark with as many software threads as CPUs (so each switch will go in and out of idle), upstream can achieve a rate of about 1 million context switches per second. After this patch it goes up to 118 million. Signed-off-by: Nicholas

[PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-11-28 Thread Nicholas Piggin
tional interrupts on a 144 CPU system during a kernel compile). There are a number of strategies that could be employed to reduce IPIs if they turn out to be a problem for some workload. Signed-off-by: Nicholas Piggin --- arch/Kconfig | 13 + kernel/fork.

[PATCH v8 02/12] mm: apply_to_pte_range warn and fail if a large pte is encountered

2020-11-28 Thread Nicholas Piggin
apply_to_pte_range might mistake a large pte for bad, or treat it as a page table, resulting in a crash or corruption. Add a test to warn and return error if large entries are found. Signed-off-by: Nicholas Piggin --- mm/memory.c | 66 +++-- 1
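
The check described above can be modelled in a few lines of userspace C: before descending into what should be a table of ptes, the walker tests whether the entry is itself a large mapping and, if so, warns and returns an error instead of dereferencing it. Everything here is a hypothetical sketch (`struct toy_pmd`, `toy_apply_to_pmd`, `TOY_PTRS`, and the choice of `-EINVAL` are assumptions), not the code in mm/memory.c.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_PTRS 4	/* entries per toy pte table */

/* Toy upper-level entry: either a large mapping or a pointer to ptes. */
struct toy_pmd {
	int is_leaf;			/* 1: maps a large page directly */
	unsigned long *pte_table;	/* valid only when is_leaf == 0 */
};

typedef int (*toy_pte_fn)(unsigned long *pte, void *data);

/*
 * Apply fn to every pte under pmd. A large entry has no pte table
 * beneath it, so treating it as one would read unrelated memory;
 * warn and fail instead.
 */
static int toy_apply_to_pmd(struct toy_pmd *pmd, toy_pte_fn fn, void *data)
{
	if (pmd->is_leaf) {
		fprintf(stderr, "warning: large entry where a pte table was expected\n");
		return -EINVAL;
	}
	if (pmd->pte_table == NULL)
		return 0;	/* nothing mapped, nothing to visit */

	for (size_t i = 0; i < TOY_PTRS; i++) {
		int err = fn(&pmd->pte_table[i], data);
		if (err)
			return err;
	}
	return 0;
}

/* Example callback: counts how many ptes were visited. */
static int toy_count_pte(unsigned long *pte, void *data)
{
	(void)pte;
	++*(int *)data;
	return 0;
}
```

Calling `toy_apply_to_pmd` on an entry with `is_leaf` set leaves the callback uninvoked and reports the error to the caller, which is the behaviour the patch description asks for.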

[PATCH v8 01/12] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings

2020-11-28 Thread Nicholas Piggin
pings") Signed-off-by: Nicholas Piggin --- mm/vmalloc.c | 41 ++--- 1 file changed, 26 insertions(+), 15 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 6ae491a8b210..f85124e88bdb 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -34,7 +34,7 @@

[PATCH v8 06/12] powerpc: inline huge vmap supported functions

2020-11-28 Thread Nicholas Piggin
This allows unsupported levels to be constant folded away, and so p4d_free_pud_page can be removed because it's no longer linked to. Cc: linuxppc-...@lists.ozlabs.org Acked-by: Michael Ellerman Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/vmalloc.h | 19

[PATCH v8 10/12] mm/vmalloc: add vmap_range_noflush variant

2020-11-28 Thread Nicholas Piggin
As a side-effect, the order of flush_cache_vmap() and arch_sync_kernel_mappings() calls is switched, but that now matches the other callers in this file. Signed-off-by: Nicholas Piggin --- mm/vmalloc.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/mm

[PATCH v8 12/12] powerpc/64s/radix: Enable huge vmalloc mappings

2020-11-28 Thread Nicholas Piggin
Cc: linuxppc-...@lists.ozlabs.org Signed-off-by: Nicholas Piggin --- Documentation/admin-guide/kernel-parameters.txt | 2 ++ arch/powerpc/Kconfig | 1 + 2 files changed, 3 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation

[PATCH v8 04/12] mm/ioremap: rename ioremap_*_range to vmap_*_range

2020-11-28 Thread Nicholas Piggin
This will be used as a generic kernel virtual mapping function, so re-name it in preparation. Signed-off-by: Nicholas Piggin --- mm/ioremap.c | 64 +++- 1 file changed, 33 insertions(+), 31 deletions(-) diff --git a/mm/ioremap.c b/mm/ioremap.c
