Re: [RFC v2 2/2] [MOCKUP] sched/mm: Lightweight lazy mm refcounting

2020-12-03 Thread Nicholas Piggin
Excerpts from Andy Lutomirski's message of December 4, 2020 3:26 pm: > This is a mockup. It's designed to illustrate the algorithm and how the > code might be structured. There are several things blatantly wrong with > it: > > The coding style is not up to kernel standards. I have prototypes

Re: [RFC v2 1/2] [NEEDS HELP] x86/mm: Handle unlazying membarrier core sync in the arch code

2020-12-03 Thread Nicholas Piggin
Excerpts from Andy Lutomirski's message of December 4, 2020 3:26 pm: > The core scheduler isn't a great place for > membarrier_mm_sync_core_before_usermode() -- the core scheduler doesn't > actually know whether we are lazy. With the old code, if a CPU is > running a membarrier-registered task,

Re: [PATCH] powerpc/mm: Don't see NULL pointer dereference as a KUAP fault

2020-12-03 Thread Christophe Leroy
On 03/12/2020 at 12:55, Michael Ellerman wrote: Christophe Leroy writes: Sometimes, NULL pointer dereferences are expected. Even when they are accidental they are unlikely to be an exploit attempt because the first page is never mapped. The first page can be mapped if mmap_min_addr is 0.

Re: [PATCH 3/7] powerpc/64s: flush L1D after user accesses

2020-12-03 Thread Christophe Leroy
Quoting Qian Cai : On Thu, 2020-12-03 at 12:17 -0500, Qian Cai wrote: [] > +static inline bool > +bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) > +{ > + return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) && > + (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE

[RFC v2 2/2] [MOCKUP] sched/mm: Lightweight lazy mm refcounting

2020-12-03 Thread Andy Lutomirski
This is a mockup. It's designed to illustrate the algorithm and how the code might be structured. There are several things blatantly wrong with it: The coding style is not up to kernel standards. I have prototypes in the wrong places and other hacks. There's a problem with mm_cpumask() not

[RFC v2 1/2] [NEEDS HELP] x86/mm: Handle unlazying membarrier core sync in the arch code

2020-12-03 Thread Andy Lutomirski
The core scheduler isn't a great place for membarrier_mm_sync_core_before_usermode() -- the core scheduler doesn't actually know whether we are lazy. With the old code, if a CPU is running a membarrier-registered task, goes idle, gets unlazied via a TLB shootdown IPI, and switches back to the

[RFC v2 0/2] lazy mm refcounting

2020-12-03 Thread Andy Lutomirski
This is part of a larger series here, but the beginning bit is irrelevant to the current discussion: https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/mm&id=203d39d11562575fd8bd6a094d97a3a332d8b265 This is IMO a lot better than v1. It's now almost entirely in generic

[PATCH 1/3] powerpc/smp: Parse ibm,thread-groups with multiple properties

2020-12-03 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" The "ibm,thread-groups" device-tree property is an array that is used to indicate if groups of threads within a core share certain properties. It provides details of which property is being shared by which groups of threads. This array can encode information about
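
The excerpt stops before describing the array encoding. As a rough user-space sketch, assume each property record is laid out as [property id, number of groups, threads per group, thread ids...] and that records for several properties are simply concatenated, with id 2 meaning the group shares an L2 cache as patch 2/3 of this series describes. The helper `find_thread_group()` and the sample array are hypothetical, not the kernel's parser.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: walk a flattened "ibm,thread-groups"-style array and
 * return the 0-based index of the group containing thread `tid` for the
 * requested property, or -1 if it is not listed or the array is malformed.
 * Assumed record layout: prop, ngroups, threads_per_group, ids[ngroups * tpg].
 */
int find_thread_group(const uint32_t *tg, size_t len,
                      uint32_t property, uint32_t tid)
{
    size_t i = 0;

    while (i + 3 <= len) {
        uint32_t prop = tg[i];
        uint32_t ngroups = tg[i + 1];
        uint32_t tpg = tg[i + 2];
        size_t nids = (size_t)ngroups * tpg;
        const uint32_t *ids = &tg[i + 3];

        if (nids > len - (i + 3))
            return -1;              /* record runs past the end of the array */

        if (prop == property) {
            for (size_t j = 0; j < nids; j++)
                if (ids[j] == tid)
                    return (int)(j / tpg);  /* ids are stored group by group */
            return -1;              /* property present, thread not listed */
        }
        i += 3 + nids;              /* skip to the next property record */
    }
    return -1;                      /* property not present at all */
}
```

With the hypothetical array `{1, 2, 4, 0, 2, 4, 6, 1, 3, 5, 7, 2, 1, 8, 0, 1, 2, 3, 4, 5, 6, 7}`, thread 5 falls in the second property-1 group (index 1), while all eight threads share the single property-2 group (index 0).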

[PATCH 3/3] powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache

2020-12-03 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" On POWER platforms where only some groups of threads within a core share the L2-cache (indicated by the ibm,thread-groups device-tree property), we currently print the incorrect shared_cpu_map/list for L2-cache in the sysfs. This patch reports the correct

[PATCH 2/3] powerpc/smp: Add support detecting thread-groups sharing L2 cache

2020-12-03 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" On POWER systems, groups of threads within a core sharing the L2-cache can be indicated by the "ibm,thread-groups" property array with the identifier "2". This patch adds support for detecting this, and when present, populates the cpu_l2_cache_mask of

[PATCH 0/3] Extend Parsing "ibm,thread-groups" for Shared-L2 information

2020-12-03 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" The "ibm,thread-groups" device-tree property is an array that is used to indicate if groups of threads within a core share certain properties. It provides details of which property is being shared by which groups of threads. This array can encode information about

Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting

2020-12-03 Thread Andy Lutomirski
> On Dec 3, 2020, at 2:13 PM, Nicholas Piggin wrote: > > Excerpts from Peter Zijlstra's message of December 3, 2020 6:44 pm: >>> On Wed, Dec 02, 2020 at 09:25:51PM -0800, Andy Lutomirski wrote: >>> >>> power: same as ARM, except that the loop may be rather larger since >>> the systems are

Re: [PATCH kernel v2] vfio/pci/nvlink2: Do not attempt NPU2 setup on POWER8NVL NPU

2020-12-03 Thread Alex Williamson
On Sun, 22 Nov 2020 18:39:50 +1100 Alexey Kardashevskiy wrote: > We execute certain NPU2 setup code (such as mapping an LPID to a device > in NPU2) unconditionally if an Nvlink bridge is detected. However this > cannot succeed on POWER8NVL machines as the init helpers return an error > other

Re: [PATCH 0/6] Add documentation for Documentation/features at the built docs

2020-12-03 Thread Jonathan Corbet
On Mon, 30 Nov 2020 16:36:29 +0100 Mauro Carvalho Chehab wrote: > This series got already submitted last year: > > > https://lore.kernel.org/lkml/cover.1561222784.git.mchehab+sams...@kernel.org/ > > Yet, on that time, there were too many other patches related to ReST > conversion floating

Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting

2020-12-03 Thread Nicholas Piggin
Excerpts from Peter Zijlstra's message of December 3, 2020 6:44 pm: > On Wed, Dec 02, 2020 at 09:25:51PM -0800, Andy Lutomirski wrote: > >> power: same as ARM, except that the loop may be rather larger since >> the systems are bigger. But I imagine it's still faster than Nick's >> approach -- a

Re: [PATCH 3/7] powerpc/64s: flush L1D after user accesses

2020-12-03 Thread Qian Cai
On Thu, 2020-12-03 at 12:17 -0500, Qian Cai wrote: > [] > > +static inline bool > > +bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) > > +{ > > + return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) && > > + (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : >

Re: [PATCH 3/7] powerpc/64s: flush L1D after user accesses

2020-12-03 Thread Qian Cai
On Thu, 2020-12-03 at 12:17 -0500, Qian Cai wrote: > A simple "echo t > /proc/sysrq-trigger" will trigger this warning almost > endlessly on Power8 NV. Correction -- POWER9 NV.

Re: [PATCH 3/7] powerpc/64s: flush L1D after user accesses

2020-12-03 Thread Qian Cai
/powerpc/include/asm/book3s/64/kup-radix.h:145 do_page_fault+0x8fc/0xb70 [ 391.734232][ T1986] Modules linked in: kvm_hv kvm ip_tables x_tables sd_mod ahci libahci tg3 libata firmware_class libphy dm_mirror dm_region_hash dm_log dm_mod [ 391.734425][ T1986] CPU: 80 PID: 1986 Comm: b

Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-12-03 Thread Alexander Gordeev
On Thu, Dec 03, 2020 at 09:14:22AM -0800, Andy Lutomirski wrote: > > > > On Dec 3, 2020, at 9:09 AM, Alexander Gordeev > > wrote: > > > > On Mon, Nov 30, 2020 at 10:31:51AM -0800, Andy Lutomirski wrote: > >> other arch folk: there's some background here: > > > > >> > >> power:

Re: [PATCH] powerpc/hotplug: assign hot added LMB to the right node

2020-12-03 Thread Greg KH
On Thu, Dec 03, 2020 at 11:15:14AM +0100, Laurent Dufour wrote: > This patch applies to 5.9 and earlier kernels only. > > Since 5.10, this has fortunately been fixed by the commit > e5e179aa3a39 ("pseries/drmem: don't cache node id in drmem_lmb struct"). Why can't we just backport that patch

Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-12-03 Thread Andy Lutomirski
> On Dec 3, 2020, at 9:09 AM, Alexander Gordeev wrote: > > On Mon, Nov 30, 2020 at 10:31:51AM -0800, Andy Lutomirski wrote: >> other arch folk: there's some background here: > >> >> power: Ridiculously complicated, seems to vary by system and kernel config. >> >> So, Nick, your

Re: [PATCH AUTOSEL 5.9 27/39] sched/idle: Fix arch_cpu_idle() vs tracing

2020-12-03 Thread Peter Zijlstra
On Thu, Dec 03, 2020 at 03:54:42PM +0100, Heiko Carstens wrote: > On Thu, Dec 03, 2020 at 08:28:21AM -0500, Sasha Levin wrote: > > From: Peter Zijlstra > > > > [ Upstream commit 58c644ba512cfbc2e39b758dd979edd1d6d00e27 ] > > > > We call arch_cpu_idle() with RCU disabled, but then use > >

Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-12-03 Thread Alexander Gordeev
On Mon, Nov 30, 2020 at 10:31:51AM -0800, Andy Lutomirski wrote: > other arch folk: there's some background here: > > https://lkml.kernel.org/r/calcetrvxube8lfnn-qs+dzroqaiw+sfug1j047ybyv31sat...@mail.gmail.com > > On Sun, Nov 29, 2020 at 12:16 PM Andy Lutomirski wrote: > > > > On Sat, Nov 28,

Re: [PATCH AUTOSEL 5.9 27/39] sched/idle: Fix arch_cpu_idle() vs tracing

2020-12-03 Thread Heiko Carstens
On Thu, Dec 03, 2020 at 08:28:21AM -0500, Sasha Levin wrote: > From: Peter Zijlstra > > [ Upstream commit 58c644ba512cfbc2e39b758dd979edd1d6d00e27 ] > > We call arch_cpu_idle() with RCU disabled, but then use > local_irq_{en,dis}able(), which invokes tracing, which relies on RCU. > > Switch

Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting

2020-12-03 Thread Rik van Riel
On Thu, 2020-12-03 at 12:31 +, Matthew Wilcox wrote: > And this just makes me think RCU freeing of mm_struct. I'm sure it's > more complicated than that (then, or now), but if an anonymous > process > is borrowing a freed mm, and the mm is freed by RCU then it will not > go > away until the

Re: powerpc 5.10-rcN boot failures with RCU_SCALE_TEST=m

2020-12-03 Thread Uladzislau Rezki
On Thu, Dec 03, 2020 at 05:22:20PM +1100, Michael Ellerman wrote: > Uladzislau Rezki writes: > > On Thu, Dec 03, 2020 at 01:03:32AM +1100, Michael Ellerman wrote: > ... > >> > >> The SMP bringup stalls because _cpu_up() is blocked trying to take > >> cpu_hotplug_lock for writing: > >> > >> [

[PATCH AUTOSEL 4.19 10/14] soc: fsl: dpio: Get the cpumask through cpumask_of(cpu)

2020-12-03 Thread Sasha Levin
From: Hao Si [ Upstream commit 2663b3388551230cbc4606a40fabf3331ceb59e4 ] The local variable 'cpumask_t mask' is in the stack memory, and its address is assigned to 'desc->affinity' in 'irq_set_affinity_hint()'. But the memory area where this variable is located is at risk of being modified.
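
The bug class here is a pointer to stack memory that outlives its stack frame. A minimal user-space sketch of the before/after pattern follows; `mask_sketch`, `irq_desc_sketch`, and `mask_of()` are hypothetical stand-ins for `cpumask_t`, the IRQ descriptor, and `cpumask_of()`, which in the kernel returns a pointer into constant, statically allocated storage.

```c
#include <stddef.h>

#define NR_CPUS_SKETCH 4

/* Hypothetical stand-ins for cpumask_t and the IRQ descriptor. */
struct mask_sketch {
    unsigned long bits;
};

struct irq_desc_sketch {
    const struct mask_sketch *affinity_hint; /* must stay valid after the setter returns */
};

/* Long-lived, statically allocated per-CPU masks (one bit per CPU). */
static const struct mask_sketch per_cpu_masks[NR_CPUS_SKETCH] = {
    { 1UL << 0 }, { 1UL << 1 }, { 1UL << 2 }, { 1UL << 3 },
};

/* Returns a pointer into static storage, so it never dangles. */
const struct mask_sketch *mask_of(unsigned int cpu)
{
    return cpu < NR_CPUS_SKETCH ? &per_cpu_masks[cpu] : NULL;
}

/*
 * Buggy pattern (kept as a comment so it is never compiled):
 *
 *     struct mask_sketch mask = { 1UL << cpu };   // on this function's stack
 *     desc->affinity_hint = &mask;                // dangles once we return
 */
void set_affinity_hint_sketch(struct irq_desc_sketch *desc, unsigned int cpu)
{
    desc->affinity_hint = mask_of(cpu);   /* fixed: point at long-lived storage */
}
```

The design point is only about lifetime: the stored pointer is safe because the object it refers to lives for the whole program, not because of anything specific to masks.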

[PATCH AUTOSEL 4.19 04/14] powerpc: Drop -me200 addition to build flags

2020-12-03 Thread Sasha Levin
From: Michael Ellerman [ Upstream commit e02152ba2810f7c88cb54e71cda096268dfa9241 ] Currently a build with CONFIG_E200=y will fail with: Error: invalid switch -me200 Error: unrecognized option -me200 Upstream binutils has never supported an -me200 option. Presumably it was supported at

[PATCH AUTOSEL 5.4 15/23] soc: fsl: dpio: Get the cpumask through cpumask_of(cpu)

2020-12-03 Thread Sasha Levin
From: Hao Si [ Upstream commit 2663b3388551230cbc4606a40fabf3331ceb59e4 ] The local variable 'cpumask_t mask' is in the stack memory, and its address is assigned to 'desc->affinity' in 'irq_set_affinity_hint()'. But the memory area where this variable is located is at risk of being modified.

[PATCH AUTOSEL 5.4 12/23] ibmvnic: skip tx timeout reset while in resetting

2020-12-03 Thread Sasha Levin
From: Lijun Pan [ Upstream commit 855a631a4c11458a9cef1ab79c1530436aa95fae ] Sometimes it takes longer than 5 seconds (watchdog timeout) to complete failover, migration, and other resets. Instead of scheduling another timeout reset, we wait for the current one to complete. Suggested-by: Brian
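
The idea in this backport can be sketched as a guard at the top of the watchdog callback. This is a hypothetical model, not the ibmvnic code: `adapter_sketch` and its fields are invented, and exactly where the real driver sets and clears its resetting state is an assumption.

```c
#include <stdbool.h>

/* Hypothetical adapter state, not the ibmvnic driver's real structure. */
struct adapter_sketch {
    bool resetting;          /* a failover/migration/other reset is in flight */
    int  resets_scheduled;   /* resets queued so far */
};

/* Hypothetical reset worker start/finish hooks. */
void reset_start_sketch(struct adapter_sketch *a) { a->resetting = true; }
void reset_done_sketch(struct adapter_sketch *a)  { a->resetting = false; }

/* Watchdog (tx timeout) callback: skip if a reset is already in progress. */
void tx_timeout_sketch(struct adapter_sketch *a)
{
    if (a->resetting)
        return;              /* wait for the current reset instead of queueing another */
    a->resets_scheduled++;
    reset_start_sketch(a);   /* assumption: the queued reset begins right away */
}
```

A second timeout that fires while the first reset is still running is ignored; once the reset finishes, the next timeout queues a new one.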

[PATCH AUTOSEL 5.4 05/23] powerpc: Drop -me200 addition to build flags

2020-12-03 Thread Sasha Levin
From: Michael Ellerman [ Upstream commit e02152ba2810f7c88cb54e71cda096268dfa9241 ] Currently a build with CONFIG_E200=y will fail with: Error: invalid switch -me200 Error: unrecognized option -me200 Upstream binutils has never supported an -me200 option. Presumably it was supported at

[PATCH AUTOSEL 5.9 27/39] sched/idle: Fix arch_cpu_idle() vs tracing

2020-12-03 Thread Sasha Levin
From: Peter Zijlstra [ Upstream commit 58c644ba512cfbc2e39b758dd979edd1d6d00e27 ] We call arch_cpu_idle() with RCU disabled, but then use local_irq_{en,dis}able(), which invokes tracing, which relies on RCU. Switch all arch_cpu_idle() implementations to use raw_local_irq_{en,dis}able() and

[PATCH AUTOSEL 5.9 26/39] soc: fsl: dpio: Get the cpumask through cpumask_of(cpu)

2020-12-03 Thread Sasha Levin
From: Hao Si [ Upstream commit 2663b3388551230cbc4606a40fabf3331ceb59e4 ] The local variable 'cpumask_t mask' is in the stack memory, and its address is assigned to 'desc->affinity' in 'irq_set_affinity_hint()'. But the memory area where this variable is located is at risk of being modified.

[PATCH AUTOSEL 5.9 18/39] ibmvnic: skip tx timeout reset while in resetting

2020-12-03 Thread Sasha Levin
From: Lijun Pan [ Upstream commit 855a631a4c11458a9cef1ab79c1530436aa95fae ] Sometimes it takes longer than 5 seconds (watchdog timeout) to complete failover, migration, and other resets. Instead of scheduling another timeout reset, we wait for the current one to complete. Suggested-by: Brian

[PATCH AUTOSEL 5.9 09/39] powerpc: Drop -me200 addition to build flags

2020-12-03 Thread Sasha Levin
From: Michael Ellerman [ Upstream commit e02152ba2810f7c88cb54e71cda096268dfa9241 ] Currently a build with CONFIG_E200=y will fail with: Error: invalid switch -me200 Error: unrecognized option -me200 Upstream binutils has never supported an -me200 option. Presumably it was supported at

Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting

2020-12-03 Thread Matthew Wilcox
On Wed, Dec 02, 2020 at 09:25:51PM -0800, Andy Lutomirski wrote: > This code compiles, but I haven't even tried to boot it. The earlier > part of the series isn't terribly interesting -- it's a handful of > cleanups that remove all reads of ->active_mm from arch/x86. I've > been meaning to do

Re: [PATCH] powerpc/mm: Don't see NULL pointer dereference as a KUAP fault

2020-12-03 Thread Michael Ellerman
Christophe Leroy writes: > Sometimes, NULL pointer dereferences are expected. Even when they > are accidental they are unlikely to be an exploit attempt because the > first page is never mapped. The first page can be mapped if mmap_min_addr is 0. Blocking all faults to the first page would
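
Whether page zero can be mapped is governed by the `vm.mmap_min_addr` sysctl, exposed as /proc/sys/vm/mmap_min_addr. A small user-space check is sketched below; the helper names are hypothetical, and it does not model the CAP_SYS_RAWIO exemption or any LSM policy, so the result only reflects the sysctl itself.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text of /proc/sys/vm/mmap_min_addr (e.g. "65536\n").
 * Returns 1 if the sysctl allows mapping page zero (value is 0),
 * 0 if it forbids it, and -1 if the text is not a number.
 */
int page_zero_mappable(const char *text)
{
    char *end;
    unsigned long v = strtoul(text, &end, 10);

    if (end == text)
        return -1;
    return v == 0;
}

/* Read the live sysctl; returns -1 if it cannot be read (e.g. not Linux). */
int page_zero_mappable_now(void)
{
    char buf[64];
    FILE *f = fopen("/proc/sys/vm/mmap_min_addr", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return page_zero_mappable(buf);
}
```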

Re: [PATCH] EDAC, mv64x60: Fix error return code in mv64x60_pci_err_probe()

2020-12-03 Thread Borislav Petkov
On Thu, Dec 03, 2020 at 10:27:25PM +1100, Michael Ellerman wrote: > It's dead code, so drop it. > > I can send a patch if no one else wants to. Yes please. I love patches removing code! :-) -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH] EDAC, mv64x60: Fix error return code in mv64x60_pci_err_probe()

2020-12-03 Thread Michael Ellerman
Borislav Petkov writes: > On Tue, Nov 24, 2020 at 02:30:09PM +0800, Wang ShaoBo wrote: >> Fix to return -ENODEV error code when edac_pci_add_device() fails instead >> of 0 in mv64x60_pci_err_probe(), as done elsewhere in this function. >> >> Fixes: 4f4aeeabc061 ("drivers-edac: add marvell
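
The fix follows the usual probe convention that every failure path returns a negative errno rather than 0. A hypothetical sketch of that shape (not the mv64x60 driver code; `probe_sketch()` and the stub callbacks are invented for illustration), with the add step passed in as a callback so both paths can be exercised:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for edac_pci_add_device(): 0 on success, <0 on failure. */
typedef int (*add_device_fn)(void *ctl);

/* Hypothetical stubs used to exercise both paths. */
int add_ok(void *ctl)   { (void)ctl; return 0; }
int add_fail(void *ctl) { (void)ctl; return -1; }

/*
 * Probe-style sketch: every failure path returns a negative errno.
 * The bug pattern is jumping to the error label with ret still 0,
 * which makes a failed probe look like a successful one.
 */
int probe_sketch(void *ctl, add_device_fn add_device)
{
    int ret;

    if (!ctl)
        return -ENOMEM;          /* e.g. an earlier allocation failed */

    if (add_device(ctl) < 0) {
        ret = -ENODEV;           /* the fix: report failure, not success */
        goto err;
    }
    return 0;

err:
    /* undo any partial setup here */
    return ret;
}
```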

[PATCH] powerpc/hotplug: assign hot added LMB to the right node

2020-12-03 Thread Laurent Dufour
This patch applies to 5.9 and earlier kernels only. Since 5.10, this has fortunately been fixed by the commit e5e179aa3a39 ("pseries/drmem: don't cache node id in drmem_lmb struct"). When LMBs are added to a running system, the node id assigned to the LMB is fetched from the temporary DT node

Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting

2020-12-03 Thread Peter Zijlstra
On Wed, Dec 02, 2020 at 09:25:51PM -0800, Andy Lutomirski wrote: > power: same as ARM, except that the loop may be rather larger since > the systems are bigger. But I imagine it's still faster than Nick's > approach -- a cmpxchg to a remote cacheline should still be faster than > an IPI

Re: [PATCH kernel v2] powerpc/kuap: Restore AMR after replaying soft interrupts

2020-12-03 Thread Alexey Kardashevskiy
On 03/12/2020 19:03, Christophe Leroy wrote: On 03/12/2020 at 06:47, Alexey Kardashevskiy wrote: When interrupted in raw_copy_from_user()/... after user memory access is enabled, a nested handler may also access user memory (perf is one example) and when it does so, it calls

Re: [PATCH kernel v2] powerpc/kuap: Restore AMR after replaying soft interrupts

2020-12-03 Thread Alexey Kardashevskiy
On 03/12/2020 17:38, Aneesh Kumar K.V wrote: Alexey Kardashevskiy writes: When interrupted in raw_copy_from_user()/... after user memory access is enabled, a nested handler may also access user memory (perf is one example) and when it does so, it calls prevent_read_from_user() which

Re: [PATCH kernel v2] powerpc/kuap: Restore AMR after replaying soft interrupts

2020-12-03 Thread Christophe Leroy
On 03/12/2020 at 06:47, Alexey Kardashevskiy wrote: When interrupted in raw_copy_from_user()/... after user memory access is enabled, a nested handler may also access user memory (perf is one example) and when it does so, it calls prevent_read_from_user() which prevents the upper handler
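
The failure mode described here can be modelled with a single flag standing in for the AMR. The sketch below is a toy, not the powerpc implementation: the helpers are hypothetical, and the fix is shown as saving the access state before the nested handler runs and restoring it afterwards, in the spirit of the patch title.

```c
#include <stdbool.h>

/*
 * Toy model: one global flag says whether user access is open, standing in
 * for the AMR. The allow/prevent helpers mimic the pairing of
 * allow_read_from_user()/prevent_read_from_user(); their names are hypothetical.
 */
static bool user_access_open;

static void allow_user_sketch(void)   { user_access_open = true; }
static void prevent_user_sketch(void) { user_access_open = false; }

/* A nested handler (think perf) that also touches user memory. */
static void nested_handler_sketch(void)
{
    allow_user_sketch();
    /* ... read user memory ... */
    prevent_user_sketch();            /* also closes access for the interrupted code */
}

/* Returns whether the interrupted copy could still access user memory. */
bool copy_with_interrupt(bool save_restore)
{
    allow_user_sketch();              /* outer copy opens user access */

    bool saved = user_access_open;    /* what saving the AMR would capture */
    nested_handler_sketch();          /* soft interrupt replayed mid-copy */
    if (save_restore)
        user_access_open = saved;     /* restore the interrupted context's state */

    bool can_continue = user_access_open;
    prevent_user_sketch();            /* outer copy closes user access */
    return can_continue;
}
```

Without the restore, the nested handler's closing call leaves user access blocked for the interrupted copy, so `copy_with_interrupt(false)` returns false; with it, the copy can continue.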