Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Fri, Apr 19, 2024 at 02:42:16PM -0700, Song Liu wrote: > On Fri, Apr 19, 2024 at 1:00 PM Mike Rapoport wrote: > > > > On Fri, Apr 19, 2024 at 10:32:39AM -0700, Song Liu wrote: > > > On Fri, Apr 19, 2024 at 10:03 AM Mike Rapoport wrote: > > > [...] > > > > > > > > > > > > [1] > > > > > > https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org > > > > > > > > > > For the ROX to work, we need different users (module text, kprobe, > > > > > etc.) to have > > > > > the same execmem_range. From [1]: > > > > > > > > > > static void *execmem_cache_alloc(struct execmem_range *range, size_t > > > > > size) > > > > > { > > > > > ... > > > > >p = __execmem_cache_alloc(size); > > > > >if (p) > > > > >return p; > > > > > err = execmem_cache_populate(range, size); > > > > > ... > > > > > } > > > > > > > > > > We are calling __execmem_cache_alloc() without range. For this to > > > > > work, > > > > > we can only call execmem_cache_alloc() with one execmem_range. > > > > > > > > Actually, on x86 this will "just work" because everything shares the > > > > same > > > > address space :) > > > > > > > > The 2M pages in the cache will be in the modules space, so > > > > __execmem_cache_alloc() will always return memory from that address > > > > space. > > > > > > > > For other architectures this indeed needs to be fixed with passing the > > > > range to __execmem_cache_alloc() and limiting search in the cache for > > > > that > > > > range. > > > > > > I think we at least need the "map to" concept (initially proposed by > > > Thomas) > > > to get this work. For example, EXECMEM_BPF and EXECMEM_KPROBE > > > maps to EXECMEM_MODULE_TEXT, so that all these actually share > > > the same range. > > > > Why? > > IIUC, we need to update __execmem_cache_alloc() to take a range pointer as > input. module text will use "range" for EXECMEM_MODULE_TEXT, while kprobe > will use "range" for EXECMEM_KPROBE. Without "map to" concept or sharing > the "range" object, we will have to compare different range parameters to > check > we can share cached pages between module text and kprobe, which is not > efficient. Did I miss something? We can always share large ROX pages as long as they are within the correct address space. The permissions for them are ROX and the alignment differences are due to KASAN and this is handled during allocation of the large page to refill the cache. __execmem_cache_alloc() only needs to limit the search for the address space of the range. And regardless, they way we deal with sharing of the cache can be sorted out later. > Thanks, > Song -- Sincerely yours, Mike.
[GIT PULL] Please pull powerpc/linux.git powerpc-6.9-3 tag
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Hi Linus, Please pull some more powerpc fixes for 6.9: The following changes since commit 39cd87c4eb2b893354f3b850f916353f2658ae6f: Linux 6.9-rc2 (2024-03-31 14:32:39 -0700) are available in the git repository at: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-6.9-3 for you to fetch changes up to 210cfef579260ed6c3b700e7baeae51a5e183f43: selftests/powerpc/papr-vpd: Fix missing variable initialization (2024-04-12 14:40:07 +1000) - -- powerpc fixes for 6.9 #3 - Fix wireguard loading failure on pre-Power10 due to Power10 crypto routines. - Fix papr-vpd selftest failure due to missing variable initialization. - Avoid unnecessary get/put in spapr_tce_platform_iommu_attach_dev(). Thanks to: Geetika Moolchandani, Jason Gunthorpe, Michal Suchánek, Nathan Lynch, Shivaprasad G Bhat. - -- Michael Ellerman (1): powerpc/crypto/chacha-p10: Fix failure on non Power10 Nathan Lynch (1): selftests/powerpc/papr-vpd: Fix missing variable initialization Shivaprasad G Bhat (1): powerpc/iommu: Refactor spapr_tce_platform_iommu_attach_dev() arch/powerpc/crypto/chacha-p10-glue.c | 8 +++- arch/powerpc/kernel/iommu.c | 7 +++ tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c | 2 +- 3 files changed, 11 insertions(+), 6 deletions(-) -BEGIN PGP SIGNATURE- iQIzBAEBCAAdFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmYjNHwACgkQUevqPMjh pYCbuA//VsCCZotHFjCuEMX4hKFE+FS7doJ6HAaj1lysJLVQsXW94pEufcy53hXX y4DQ06Y2vV/zJTrbYk/3vukeMDcc7ZSE9IKZlnSvZP1GizGF2dQE/AjsqNfJ5iV4 1YcE22fmjoSIL2pbD337r1BHqEwGvcd47Rm0SET6yIxQamTKHLAa7Q9OIindDMaq Qu/g6LFoa856idDkXRY96xNWRodGzNyhZ3+16F4jsE2pbGiYNbjqyua3ke8leWgI DInYs9KholNeg285Qbjq6VaXcMjJG6po8N/MDJDxdysVhF+9GTXlyKSN9oURDUtf IDWpbrF3n7WE3iUllucKWNXBQbfX5yoFlZ7PnOj+Tjm1RcyBZDBhoq05DNcosFY0 Mgbg7hKluxUP7ijSMOOKKYasMrNqJofo+f3S78L3sBDjIW2w/j6+ZP6EQ515t7he /oac+WfFr6m0rEtQtZuf0TMPWv60odfjkLuGFomcd5qpldXesMfR62rT+RSN3M7m 7kJj4duYpKJJZEdOdj9PzVMqQd/dd5cpeO2b1vJef7G6lHXRaB0tTaDMnvZBNtlL 7etRE5g1ErvOFGnoAzeXcU3GFU2ZO49cpKMjqwubivufNXDwZ7VKJfPjIsb4cfm0 lsquI0A5FhkIDr1G/0wL4VyCisKYCAuYj9NCSf9F9iM+92w+fKo= =gaHh -END PGP SIGNATURE-
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Fri, Apr 19, 2024 at 1:00 PM Mike Rapoport wrote: > > On Fri, Apr 19, 2024 at 10:32:39AM -0700, Song Liu wrote: > > On Fri, Apr 19, 2024 at 10:03 AM Mike Rapoport wrote: > > [...] > > > > > > > > > > [1] > > > > > https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org > > > > > > > > For the ROX to work, we need different users (module text, kprobe, > > > > etc.) to have > > > > the same execmem_range. From [1]: > > > > > > > > static void *execmem_cache_alloc(struct execmem_range *range, size_t > > > > size) > > > > { > > > > ... > > > >p = __execmem_cache_alloc(size); > > > >if (p) > > > >return p; > > > > err = execmem_cache_populate(range, size); > > > > ... > > > > } > > > > > > > > We are calling __execmem_cache_alloc() without range. For this to work, > > > > we can only call execmem_cache_alloc() with one execmem_range. > > > > > > Actually, on x86 this will "just work" because everything shares the same > > > address space :) > > > > > > The 2M pages in the cache will be in the modules space, so > > > __execmem_cache_alloc() will always return memory from that address space. > > > > > > For other architectures this indeed needs to be fixed with passing the > > > range to __execmem_cache_alloc() and limiting search in the cache for that > > > range. > > > > I think we at least need the "map to" concept (initially proposed by Thomas) > > to get this work. For example, EXECMEM_BPF and EXECMEM_KPROBE > > maps to EXECMEM_MODULE_TEXT, so that all these actually share > > the same range. > > Why? IIUC, we need to update __execmem_cache_alloc() to take a range pointer as input. module text will use "range" for EXECMEM_MODULE_TEXT, while kprobe will use "range" for EXECMEM_KPROBE. Without "map to" concept or sharing the "range" object, we will have to compare different range parameters to check we can share cached pages between module text and kprobe, which is not efficient. Did I miss something? Thanks, Song
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Fri, Apr 19, 2024 at 10:32:39AM -0700, Song Liu wrote: > On Fri, Apr 19, 2024 at 10:03 AM Mike Rapoport wrote: > [...] > > > > > > > > [1] https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org > > > > > > For the ROX to work, we need different users (module text, kprobe, etc.) > > > to have > > > the same execmem_range. From [1]: > > > > > > static void *execmem_cache_alloc(struct execmem_range *range, size_t size) > > > { > > > ... > > >p = __execmem_cache_alloc(size); > > >if (p) > > >return p; > > > err = execmem_cache_populate(range, size); > > > ... > > > } > > > > > > We are calling __execmem_cache_alloc() without range. For this to work, > > > we can only call execmem_cache_alloc() with one execmem_range. > > > > Actually, on x86 this will "just work" because everything shares the same > > address space :) > > > > The 2M pages in the cache will be in the modules space, so > > __execmem_cache_alloc() will always return memory from that address space. > > > > For other architectures this indeed needs to be fixed with passing the > > range to __execmem_cache_alloc() and limiting search in the cache for that > > range. > > I think we at least need the "map to" concept (initially proposed by Thomas) > to get this work. For example, EXECMEM_BPF and EXECMEM_KPROBE > maps to EXECMEM_MODULE_TEXT, so that all these actually share > the same range. Why? > Does this make sense? > > Song -- Sincerely yours, Mike.
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Fri, Apr 19, 2024 at 10:03 AM Mike Rapoport wrote: [...] > > > > > > [1] https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org > > > > For the ROX to work, we need different users (module text, kprobe, etc.) to > > have > > the same execmem_range. From [1]: > > > > static void *execmem_cache_alloc(struct execmem_range *range, size_t size) > > { > > ... > >p = __execmem_cache_alloc(size); > >if (p) > >return p; > > err = execmem_cache_populate(range, size); > > ... > > } > > > > We are calling __execmem_cache_alloc() without range. For this to work, > > we can only call execmem_cache_alloc() with one execmem_range. > > Actually, on x86 this will "just work" because everything shares the same > address space :) > > The 2M pages in the cache will be in the modules space, so > __execmem_cache_alloc() will always return memory from that address space. > > For other architectures this indeed needs to be fixed with passing the > range to __execmem_cache_alloc() and limiting search in the cache for that > range. I think we at least need the "map to" concept (initially proposed by Thomas) to get this work. For example, EXECMEM_BPF and EXECMEM_KPROBE maps to EXECMEM_MODULE_TEXT, so that all these actually share the same range. Does this make sense? Song
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Fri, Apr 19, 2024 at 08:54:40AM -0700, Song Liu wrote: > On Thu, Apr 18, 2024 at 11:56 PM Mike Rapoport wrote: > > > > On Thu, Apr 18, 2024 at 02:01:22PM -0700, Song Liu wrote: > > > On Thu, Apr 18, 2024 at 10:54 AM Mike Rapoport wrote: > > > > > > > > On Thu, Apr 18, 2024 at 09:13:27AM -0700, Song Liu wrote: > > > > > On Thu, Apr 18, 2024 at 8:37 AM Mike Rapoport wrote: > > > > > > > > > > > > > > > > I'm looking at execmem_types more as definition of the > > > > > > > > consumers, maybe I > > > > > > > > should have named the enum execmem_consumer at the first place. > > > > > > > > > > > > > > I think looking at execmem_type from consumers' point of view adds > > > > > > > unnecessary complexity. IIUC, for most (if not all) archs, > > > > > > > ftrace, kprobe, > > > > > > > and bpf (and maybe also module text) all have the same > > > > > > > requirements. > > > > > > > Did I miss something? > > > > > > > > > > > > It's enough to have one architecture with different constrains for > > > > > > kprobes > > > > > > and bpf to warrant a type for each. > > > > > > > > > > AFAICT, some of these constraints can be changed without too much > > > > > work. > > > > > > > > But why? > > > > I honestly don't understand what are you trying to optimize here. A few > > > > lines of initialization in execmem_info? > > > > > > IIUC, having separate EXECMEM_BPF and EXECMEM_KPROBE makes it > > > harder for bpf and kprobe to share the same ROX page. In many use cases, > > > a 2MiB page (assuming x86_64) is enough for all BPF, kprobe, ftrace, and > > > module text. It is not efficient if we have to allocate separate pages > > > for each > > > of these use cases. If this is not a problem, the current approach works. > > > > The caching of large ROX pages does not need to be per type. > > > > In the POC I've posted for caching of large ROX pages on x86 [1], the cache > > is > > global and to make kprobes and bpf use it it's enough to set a flag in > > execmem_info. > > > > [1] https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org > > For the ROX to work, we need different users (module text, kprobe, etc.) to > have > the same execmem_range. From [1]: > > static void *execmem_cache_alloc(struct execmem_range *range, size_t size) > { > ... >p = __execmem_cache_alloc(size); >if (p) >return p; > err = execmem_cache_populate(range, size); > ... > } > > We are calling __execmem_cache_alloc() without range. For this to work, > we can only call execmem_cache_alloc() with one execmem_range. Actually, on x86 this will "just work" because everything shares the same address space :) The 2M pages in the cache will be in the modules space, so __execmem_cache_alloc() will always return memory from that address space. For other architectures this indeed needs to be fixed with passing the range to __execmem_cache_alloc() and limiting search in the cache for that range. > Did I miss something? > > Thanks, > Song -- Sincerely yours, Mike.
Re: [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES
Le 19/04/2024 à 17:49, Mike Rapoport a écrit : > Hi Masami, > > On Thu, Apr 18, 2024 at 06:16:15AM +0900, Masami Hiramatsu wrote: >> Hi Mike, >> >> On Thu, 11 Apr 2024 19:00:50 +0300 >> Mike Rapoport wrote: >> >>> From: "Mike Rapoport (IBM)" >>> >>> kprobes depended on CONFIG_MODULES because it has to allocate memory for >>> code. >>> >>> Since code allocations are now implemented with execmem, kprobes can be >>> enabled in non-modular kernels. >>> >>> Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside >>> modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the >>> dependency of CONFIG_KPROBES on CONFIG_MODULES. >> >> Thanks for this work, but this conflicts with the latest fix in v6.9-rc4. >> Also, can you use IS_ENABLED(CONFIG_MODULES) instead of #ifdefs in >> function body? We have enough dummy functions for that, so it should >> not make a problem. > > The code in check_kprobe_address_safe() that gets the module and checks for > __init functions does not compile with IS_ENABLED(CONFIG_MODULES). > I can pull it out to a helper or leave #ifdef in the function body, > whichever you prefer. As far as I can see, the only problem is MODULE_STATE_COMING. Can we move 'enum module_state' out of #ifdef CONFIG_MODULES in module.h ? > >> -- >> Masami Hiramatsu >
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Thu, Apr 18, 2024 at 11:56 PM Mike Rapoport wrote: > > On Thu, Apr 18, 2024 at 02:01:22PM -0700, Song Liu wrote: > > On Thu, Apr 18, 2024 at 10:54 AM Mike Rapoport wrote: > > > > > > On Thu, Apr 18, 2024 at 09:13:27AM -0700, Song Liu wrote: > > > > On Thu, Apr 18, 2024 at 8:37 AM Mike Rapoport wrote: > > > > > > > > > > > > > > I'm looking at execmem_types more as definition of the consumers, > > > > > > > maybe I > > > > > > > should have named the enum execmem_consumer at the first place. > > > > > > > > > > > > I think looking at execmem_type from consumers' point of view adds > > > > > > unnecessary complexity. IIUC, for most (if not all) archs, ftrace, > > > > > > kprobe, > > > > > > and bpf (and maybe also module text) all have the same requirements. > > > > > > Did I miss something? > > > > > > > > > > It's enough to have one architecture with different constrains for > > > > > kprobes > > > > > and bpf to warrant a type for each. > > > > > > > > AFAICT, some of these constraints can be changed without too much work. > > > > > > But why? > > > I honestly don't understand what are you trying to optimize here. A few > > > lines of initialization in execmem_info? > > > > IIUC, having separate EXECMEM_BPF and EXECMEM_KPROBE makes it > > harder for bpf and kprobe to share the same ROX page. In many use cases, > > a 2MiB page (assuming x86_64) is enough for all BPF, kprobe, ftrace, and > > module text. It is not efficient if we have to allocate separate pages for > > each > > of these use cases. If this is not a problem, the current approach works. > > The caching of large ROX pages does not need to be per type. > > In the POC I've posted for caching of large ROX pages on x86 [1], the cache is > global and to make kprobes and bpf use it it's enough to set a flag in > execmem_info. > > [1] https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org For the ROX to work, we need different users (module text, kprobe, etc.) to have the same execmem_range. From [1]: static void *execmem_cache_alloc(struct execmem_range *range, size_t size) { ... p = __execmem_cache_alloc(size); if (p) return p; err = execmem_cache_populate(range, size); ... } We are calling __execmem_cache_alloc() without range. For this to work, we can only call execmem_cache_alloc() with one execmem_range. Did I miss something? Thanks, Song
Re: [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES
Hi Masami, On Thu, Apr 18, 2024 at 06:16:15AM +0900, Masami Hiramatsu wrote: > Hi Mike, > > On Thu, 11 Apr 2024 19:00:50 +0300 > Mike Rapoport wrote: > > > From: "Mike Rapoport (IBM)" > > > > kprobes depended on CONFIG_MODULES because it has to allocate memory for > > code. > > > > Since code allocations are now implemented with execmem, kprobes can be > > enabled in non-modular kernels. > > > > Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside > > modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the > > dependency of CONFIG_KPROBES on CONFIG_MODULES. > > Thanks for this work, but this conflicts with the latest fix in v6.9-rc4. > Also, can you use IS_ENABLED(CONFIG_MODULES) instead of #ifdefs in > function body? We have enough dummy functions for that, so it should > not make a problem. The code in check_kprobe_address_safe() that gets the module and checks for __init functions does not compile with IS_ENABLED(CONFIG_MODULES). I can pull it out to a helper or leave #ifdef in the function body, whichever you prefer. > -- > Masami Hiramatsu -- Sincerely yours, Mike.
Re: [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE
You are right. I think, the "reboot" should be replaced with just "boot up". If there are no other comments, or code changes, I can re-word the commit message and submit for review. Thanks, Gaurav On 4/19/24 6:11 AM, Michal Suchánek wrote: Hello, On Fri, Apr 19, 2024 at 04:12:46PM +1000, Michael Ellerman wrote: Gaurav Batra writes: At the time of LPAR reboot, partition firmware provides Open Firmware property ibm,dma-window for the PE. This property is provided on the PCI bus the PE is attached to. AFAICS you're actually describing a bug that happens during boot *up*? Describing it as "reboot" makes me think you're talking about the shutdown path. I think that will confuse people, me at least :) there is probably an assumption that it must have been running previously for the errors to happen in the first place but given the error state persists for a day it may be a very long 'reboot'. Thanks Michal cheers There are execptions where the partition firmware might not provide this property for the PE at the time of LPAR reboot. One of the scenario is where the firmware has frozen the PE due to some error conditions. This PE is frozen for 24 hours or unless the whole system is reinitialized. Within this time frame, if the LPAR is rebooted, the frozen PE will be presented to the LPAR but ibm,dma-window property could be missing. Today, under these circumstances, the LPAR oopses with NULL pointer dereference, when configuring the PCI bus the PE is attached to. BUG: Kernel NULL pointer dereference on read at 0x00c8 Faulting instruction address: 0xc01024c0 Oops: Kernel access of bad area, sig: 7 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: Supported: Yes CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1 Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries NIP: c01024c0 LR: c01024b0 CTR: c0102450 REGS: c37db5c0 TRAP: 0300 Not tainted (6.4.0-150600.9-default) MSR: 82009033 CR: 28000822 XER: CFAR: c010254c DAR: 00c8 DSISR: 0008 IRQMASK: 0 ... NIP [c01024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0 LR [c01024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 Call Trace: pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable) pcibios_setup_bus_self+0x1c0/0x370 __of_scan_bus+0x2f8/0x330 pcibios_scan_phb+0x280/0x3d0 pcibios_init+0x88/0x12c do_one_initcall+0x60/0x320 kernel_init_freeable+0x344/0x3e4 kernel_init+0x34/0x1d0 ret_from_kernel_user_thread+0x14/0x1c Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window") Signed-off-by: Gaurav Batra --- arch/powerpc/platforms/pseries/iommu.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index e8c4129697b1..e808d5b1fa49 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -786,8 +786,16 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus) * parent bus. During reboot, there will be ibm,dma-window property to * define DMA window. For kdump, there will at least be default window or DDW * or both. +* There is an exception to the above. In case the PE goes into frozen +* state, firmware may not provide ibm,dma-window property at the time +* of LPAR reboot. */ + if (!pdn) { + pr_debug(" no ibm,dma-window property !\n"); + return; + } + ppci = PCI_DN(pdn); pr_debug(" parent is %pOF, iommu_table: 0x%p\n", base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702 -- 2.39.3 (Apple Git-146)
Re: [PATCH 1/3] x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n
On Fri, Apr 19, 2024 at 07:06:00AM -0700, Sean Christopherson wrote: > On Fri, Apr 19, 2024, Will Deacon wrote: > > On Mon, Apr 15, 2024 at 07:31:23AM -0700, Sean Christopherson wrote: > > > On Mon, Apr 15, 2024, Geert Uytterhoeven wrote: > > > Oof. I completely missed that "cpu_mitigations" wasn't x86-only. I > > > can't think > > > of better solution than an on-by-default generic Kconfig, though can't > > > that it > > > more simply be: > > > > > > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig > > > index 2b8fd6bb7da0..5930cb56ee29 100644 > > > --- a/drivers/base/Kconfig > > > +++ b/drivers/base/Kconfig > > > @@ -191,6 +191,9 @@ config GENERIC_CPU_AUTOPROBE > > > config GENERIC_CPU_VULNERABILITIES > > > bool > > > > > > +config SPECULATION_MITIGATIONS > > > + def_bool !X86 > > > + > > > config SOC_BUS > > > bool > > > select GLOB > > > > I can't see this in -next yet. Do you plan to post it as a proper patch > > to collect acks etc? > > Sorry, I neglected to Cc everyone. > > https://lore.kernel.org/all/20240417001507.2264512-2-sea...@google.com Ah, thanks. I'll go Ack that... Will
[PATCH] ibmvnic: Use -EBUSY in __ibmvnic_reset()
From: Markus Elfring Date: Fri, 19 Apr 2024 15:46:17 +0200 Add a minus sign before the error code “EBUSY” so that a negative value will be used as in other cases. This issue was transformed by using the Coccinelle software. Signed-off-by: Markus Elfring --- drivers/net/ethernet/ibm/ibmvnic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 5e9a93bdb518..737ae83a836a 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -3212,7 +3212,7 @@ static void __ibmvnic_reset(struct work_struct *work) adapter->state == VNIC_REMOVED) { spin_unlock_irqrestore(>state_lock, flags); kfree(rwi); - rc = EBUSY; + rc = -EBUSY; break; } -- 2.44.0
Re: [PATCH 1/3] x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n
On Fri, Apr 19, 2024, Will Deacon wrote: > On Mon, Apr 15, 2024 at 07:31:23AM -0700, Sean Christopherson wrote: > > On Mon, Apr 15, 2024, Geert Uytterhoeven wrote: > > Oof. I completely missed that "cpu_mitigations" wasn't x86-only. I can't > > think > > of better solution than an on-by-default generic Kconfig, though can't that > > it > > more simply be: > > > > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig > > index 2b8fd6bb7da0..5930cb56ee29 100644 > > --- a/drivers/base/Kconfig > > +++ b/drivers/base/Kconfig > > @@ -191,6 +191,9 @@ config GENERIC_CPU_AUTOPROBE > > config GENERIC_CPU_VULNERABILITIES > > bool > > > > +config SPECULATION_MITIGATIONS > > + def_bool !X86 > > + > > config SOC_BUS > > bool > > select GLOB > > I can't see this in -next yet. Do you plan to post it as a proper patch > to collect acks etc? Sorry, I neglected to Cc everyone. https://lore.kernel.org/all/20240417001507.2264512-2-sea...@google.com
Re: [PATCH 1/3] x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n
On Mon, Apr 15, 2024 at 07:31:23AM -0700, Sean Christopherson wrote: > On Mon, Apr 15, 2024, Geert Uytterhoeven wrote: > > On Sat, Apr 13, 2024 at 11:38 AM Michael Ellerman > > wrote: > > > Michael Ellerman writes: > > > > Stephen Rothwell writes: > > > ... > > > >> On Tue, 9 Apr 2024 10:51:05 -0700 Sean Christopherson > > > >> wrote: > > > ... > > > >>> diff --git a/kernel/cpu.c b/kernel/cpu.c > > > >>> index 8f6affd051f7..07ad53b7f119 100644 > > > >>> --- a/kernel/cpu.c > > > >>> +++ b/kernel/cpu.c > > > >>> @@ -3207,7 +3207,8 @@ enum cpu_mitigations { > > > >>> }; > > > >>> > > > >>> static enum cpu_mitigations cpu_mitigations __ro_after_init = > > > >>> - CPU_MITIGATIONS_AUTO; > > > >>> + IS_ENABLED(CONFIG_SPECULATION_MITIGATIONS) ? CPU_MITIGATIONS_AUTO > > > >>> : > > > >>> +CPU_MITIGATIONS_OFF; > > > >>> > > > >>> static int __init mitigations_parse_cmdline(char *arg) > > > >>> { > > > > > > I think a minimal workaround/fix would be: > > > > > > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig > > > index 2b8fd6bb7da0..290be2f9e909 100644 > > > --- a/drivers/base/Kconfig > > > +++ b/drivers/base/Kconfig > > > @@ -191,6 +191,10 @@ config GENERIC_CPU_AUTOPROBE > > > config GENERIC_CPU_VULNERABILITIES > > > bool > > > > > > +config SPECULATION_MITIGATIONS > > > + def_bool y > > > + depends on !X86 > > > + > > > config SOC_BUS > > > bool > > > select GLOB > > > > Thanks, that works for me (on arm64), so > > Tested-by: Geert Uytterhoeven > > Oof. I completely missed that "cpu_mitigations" wasn't x86-only. I can't > think > of better solution than an on-by-default generic Kconfig, though can't that it > more simply be: > > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig > index 2b8fd6bb7da0..5930cb56ee29 100644 > --- a/drivers/base/Kconfig > +++ b/drivers/base/Kconfig > @@ -191,6 +191,9 @@ config GENERIC_CPU_AUTOPROBE > config GENERIC_CPU_VULNERABILITIES > bool > > +config SPECULATION_MITIGATIONS > + def_bool !X86 > + > config SOC_BUS > bool > select GLOB I can't see this in -next yet. Do you plan to post it as a proper patch to collect acks etc? Cheers, Will
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback
On Fri, Apr 19, 2024, Will Deacon wrote: > > @@ -663,10 +669,22 @@ static __always_inline kvm_mn_ret_t > > __kvm_handle_hva_range(struct kvm *kvm, > > break; > > } > > r.ret |= range->handler(kvm, _range); > > + > > + /* > > + * Use a precise gfn-based TLB flush when possible, as > > + * most mmu_notifier events affect a small-ish range. > > + * Fall back to a full TLB flush if the gfn-based flush > > + * fails, and don't bother trying the gfn-based flush > > + * if a full flush is already pending. > > + */ > > + if (range->flush_on_ret && !need_flush && r.ret && > > + kvm_arch_flush_remote_tlbs_range(kvm, > > gfn_range.start, > > + gfn_range.end - > > gfn_range.start + 1)) > > What's that '+ 1' needed for here? (a) To see if you're paying attention. (b) Because more is always better. (c) Because math is hard. (d) Because I haven't tested this. (e) Both (c) and (d).
Re: [PATCH] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings
On Thu, Apr 04, 2024 at 03:38:53PM +1100, Rohan McLure wrote: > Prior to this patch, data races are detectable by KCSAN of the following > forms: > > [1] Asynchronous calls to mmiowb_set_pending() from an interrupt context > or otherwise outside of a critical section > [2] Interrupted critical sections, where the interrupt will itself > acquire a lock > > In case [1], calling context does not need an mmiowb() call to be > issued, otherwise it would do so itself. Such calls to > mmiowb_set_pending() are either idempotent or no-ops. > > In case [2], irrespective of when the interrupt occurs, the interrupt > will acquire and release its locks prior to its return, nesting_count > will continue balanced. In the worst case, the interrupted critical > section during a mmiowb_spin_unlock() call observes an mmiowb to be > pending and afterward is interrupted, leading to an extraneous call to > mmiowb(). This data race is clearly innocuous. > > Resolve KCSAN warnings of type [1] by means of READ_ONCE, WRITE_ONCE. > As increments and decrements to nesting_count are balanced by interrupt > contexts, resolve type [2] warnings by simply revoking instrumentation, > with data_race() rather than READ_ONCE() and WRITE_ONCE(), the memory > consistency semantics of plain-accesses will still lead to correct > behaviour. > > Signed-off-by: Rohan McLure > Reported-by: Michael Ellerman > Reported-by: Gautam Menghani > Tested-by: Gautam Menghani > Acked-by: Arnd Bergmann > --- > Previously discussed here: > https://lore.kernel.org/linuxppc-dev/20230510033117.1395895-4-rmcl...@linux.ibm.com/ > But pushed back due to affecting other architectures. Reissuing, to > linuxppc-dev, as it does not enact a functional change. > --- > include/asm-generic/mmiowb.h | 15 +-- > 1 file changed, 9 insertions(+), 6 deletions(-) > > diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h > index 5698fca3bf56..f8c7c8a84e9e 100644 > --- a/include/asm-generic/mmiowb.h > +++ b/include/asm-generic/mmiowb.h > @@ -37,25 +37,28 @@ static inline void mmiowb_set_pending(void) > struct mmiowb_state *ms = __mmiowb_state(); > > if (likely(ms->nesting_count)) > - ms->mmiowb_pending = ms->nesting_count; > + WRITE_ONCE(ms->mmiowb_pending, ms->nesting_count); > } > > static inline void mmiowb_spin_lock(void) > { > struct mmiowb_state *ms = __mmiowb_state(); > - ms->nesting_count++; > + > + /* Increment need not be atomic. Nestedness is balanced over > interrupts. */ > + data_race(ms->nesting_count++); > } > > static inline void mmiowb_spin_unlock(void) > { > struct mmiowb_state *ms = __mmiowb_state(); > + u16 pending = READ_ONCE(ms->mmiowb_pending); > > - if (unlikely(ms->mmiowb_pending)) { > - ms->mmiowb_pending = 0; > + WRITE_ONCE(ms->mmiowb_pending, 0); Why are you changing this store to be unconditional? Will
[powerpc:next-test] BUILD REGRESSION 3cfdbc2274139bbcc09a44f8aa18d08c0abfd51e
tree/branch: https://github.com/linuxppc/linux next-test branch HEAD: 3cfdbc2274139bbcc09a44f8aa18d08c0abfd51e powerpc: drop port I/O helpers for CONFIG_HAS_IOPORT=n Error/Warning reports: https://lore.kernel.org/oe-kbuild-all/202404190958.9qodkytx-...@intel.com https://lore.kernel.org/oe-kbuild-all/202404191016.zzs7aptf-...@intel.com Error/Warning: (recently discovered and may have been fixed) arch/powerpc/include/asm/io.h:695:13: error: implicit declaration of function 'inb'; did you mean '_insb'? [-Werror=implicit-function-declaration] arch/powerpc/include/asm/io.h:698:14: error: implicit declaration of function 'outb'; did you mean '_outsb'? [-Werror=implicit-function-declaration] arch/powerpc/kernel/udbg_16550.c:167:9: error: call to undeclared function 'inb'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] arch/powerpc/kernel/udbg_16550.c:172:2: error: call to undeclared function 'outb'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] drivers/tty/serial/8250/8250_early.c:50:10: error: call to undeclared function 'inb'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] drivers/tty/serial/8250/8250_early.c:74:3: error: call to undeclared function 'outb'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] drivers/tty/serial/8250/8250_port.c:344:2: error: call to undeclared function 'outb'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] drivers/tty/serial/8250/8250_port.c:345:9: error: call to undeclared function 'inb'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] Error/Warning ids grouped by kconfigs: gcc_recent_errors `-- powerpc-ppa8548_defconfig |-- arch-powerpc-include-asm-io.h:error:implicit-declaration-of-function-inb `-- arch-powerpc-include-asm-io.h:error:implicit-declaration-of-function-outb clang_recent_errors `-- powerpc-sequoia_defconfig |-- arch-powerpc-kernel-udbg_16550.c:error:call-to-undeclared-function-inb-ISO-C99-and-later-do-not-support-implicit-function-declarations |-- arch-powerpc-kernel-udbg_16550.c:error:call-to-undeclared-function-outb-ISO-C99-and-later-do-not-support-implicit-function-declarations |-- drivers-tty-serial-8250_early.c:error:call-to-undeclared-function-inb-ISO-C99-and-later-do-not-support-implicit-function-declarations |-- drivers-tty-serial-8250_early.c:error:call-to-undeclared-function-outb-ISO-C99-and-later-do-not-support-implicit-function-declarations |-- drivers-tty-serial-8250_port.c:error:call-to-undeclared-function-inb-ISO-C99-and-later-do-not-support-implicit-function-declarations `-- drivers-tty-serial-8250_port.c:error:call-to-undeclared-function-outb-ISO-C99-and-later-do-not-support-implicit-function-declarations elapsed time: 1403m configs tested: 173 configs skipped: 5 tested configs: alpha allnoconfig gcc alphaallyesconfig gcc alpha defconfig gcc arc allmodconfig gcc arc allnoconfig gcc arc allyesconfig gcc arc axs103_defconfig gcc arc defconfig gcc arc haps_hs_smp_defconfig gcc arc randconfig-001-20240419 gcc arc randconfig-002-20240419 gcc arm allmodconfig gcc arm allnoconfig clang arm allyesconfig gcc arm defconfig clang arm ixp4xx_defconfig gcc arm omap2plus_defconfig gcc arm randconfig-001-20240419 gcc arm randconfig-002-20240419 clang arm randconfig-003-20240419 gcc arm randconfig-004-20240419 clang arm64allmodconfig clang arm64 allnoconfig gcc arm64allyesconfig clang arm64 defconfig gcc arm64 randconfig-001-20240419 clang arm64 randconfig-002-20240419 clang arm64 randconfig-003-20240419 clang arm64 randconfig-004-20240419 clang csky allmodconfig gcc csky allnoconfig gcc csky allyesconfig gcc cskydefconfig gcc csky randconfig-001-20240419 gcc csky randconfig-002-20240419 gcc hexagon allmodconfig clang
[PATCH] powerpc/dart: Drop unnecessary call to kmemleak_no_scan()
Erhard reported that kmemleak was showing a warning at boot: kmemleak: Not scanning unknown object at 0xc0007f00 CPU: 0 PID: 0 Comm: swapper Not tainted 5.19.0-rc3-PMacG5+ #2 Call Trace: .dump_stack_lvl+0x7c/0xc4 (unreliable) .kmemleak_no_scan+0xe0/0x100 .iommu_init_early_dart+0x2f0/0x924 .pmac_probe+0x1b0/0x20c .setup_arch+0x1b8/0x674 .start_kernel+0xdc/0xb74 start_here_common+0x1c/0x44 DART table allocated at: (ptrval) Which he bisected to a change in kmemleak, commit 23c2d497de21 ("mm: kmemleak: take a full lowmem check in kmemleak_*_phys()"). Because pmac_probe() is called before mem_topology_setup(), the min/ max PFN variables are still zero. That causes kmemleak_alloc_phys() to ignore the allocation, because the checks against the PFN fail. Then kmemleak_no_scan() can't find the allocation and prints warning. Given that kmemleak_alloc_phys() is ignoring the allocation to begin with, there's no need to call kmemleak_no_scan() at all, which avoids the warning. Reported-by: Erhard Furtner Closes: https://lore.kernel.org/all/bug-216156-206...@https.bugzilla.kernel.org%2F/ Signed-off-by: Michael Ellerman --- arch/powerpc/sysdev/dart_iommu.c | 4 1 file changed, 4 deletions(-) diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c index 98096bbfd62e..c0d10c149661 100644 --- a/arch/powerpc/sysdev/dart_iommu.c +++ b/arch/powerpc/sysdev/dart_iommu.c @@ -24,7 +24,6 @@ #include #include #include -#include #include #include #include @@ -243,9 +242,6 @@ static void __init allocate_dart(void) if (!dart_tablebase) panic("Failed to allocate 16MB below 2GB for DART table\n"); - /* There is no point scanning the DART space for leaks*/ - kmemleak_no_scan((void *)dart_tablebase); - /* Allocate a spare page to map all invalid DART pages. We need to do * that to work around what looks like a problem with the HT bridge * prefetching into invalid pages and corrupting data -- 2.44.0
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback
On Thu, Apr 18, 2024 at 12:53:26PM -0700, Sean Christopherson wrote: > On Thu, Apr 18, 2024, Will Deacon wrote: > > On Mon, Apr 15, 2024 at 10:03:51AM -0700, Sean Christopherson wrote: > > > On Sat, Apr 13, 2024, Marc Zyngier wrote: > > > > On Fri, 12 Apr 2024 15:54:22 +0100, Sean Christopherson > > > > wrote: > > > > > > > > > > On Fri, Apr 12, 2024, Marc Zyngier wrote: > > > > > > On Fri, 12 Apr 2024 11:44:09 +0100, Will Deacon > > > > > > wrote: > > > > > > > On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote: > > > > > > > Also, if you're in the business of hacking the MMU notifier code, > > > > > > > it > > > > > > > would be really great to change the .clear_flush_young() callback > > > > > > > so > > > > > > > that the architecture could handle the TLB invalidation. At the > > > > > > > moment, > > > > > > > the core KVM code invalidates the whole VMID courtesy of > > > > > > > 'flush_on_ret' > > > > > > > being set by kvm_handle_hva_range(), whereas we could do a much > > > > > > > lighter-weight and targetted TLBI in the architecture page-table > > > > > > > code > > > > > > > when we actually update the ptes for small ranges. > > > > > > > > > > > > Indeed, and I was looking at this earlier this week as it has a > > > > > > pretty > > > > > > devastating effect with NV (it blows the shadow S2 for that VMID, > > > > > > with > > > > > > costly consequences). > > > > > > > > > > > > In general, it feels like the TLB invalidation should stay with the > > > > > > code that deals with the page tables, as it has a pretty good idea > > > > > > of > > > > > > what needs to be invalidated and how -- specially on architectures > > > > > > that have a HW-broadcast facility like arm64. > > > > > > > > > > Would this be roughly on par with an in-line flush on arm64? The > > > > > simpler, more > > > > > straightforward solution would be to let architectures override > > > > > flush_on_ret, > > > > > but I would prefer something like the below as x86 can also utilize a > > > > > range-based > > > > > flush when running as a nested hypervisor. > > > > > > ... > > > > > > > I think this works for us on HW that has range invalidation, which > > > > would already be a positive move. > > > > > > > > For the lesser HW that isn't range capable, it also gives the > > > > opportunity to perform the iteration ourselves or go for the nuclear > > > > option if the range is larger than some arbitrary constant (though > > > > this is additional work). > > > > > > > > But this still considers the whole range as being affected by > > > > range->handler(). It'd be interesting to try and see whether more > > > > precise tracking is (or isn't) generally beneficial. > > > > > > I assume the idea would be to let arch code do single-page invalidations > > > of > > > stage-2 entries for each gfn? > > > > Right, as it's the only code which knows which ptes actually ended up > > being aged. > > > > > Unless I'm having a brain fart, x86 can't make use of that functionality. > > > Intel > > > doesn't provide any way to do targeted invalidation of stage-2 mappings. > > > AMD > > > provides an instruction to do broadcast invalidations, but it takes a > > > virtual > > > address, i.e. a stage-1 address. I can't tell if it's a host virtual > > > address or > > > a guest virtual address, but it's a moot point because KVM doen't have > > > the guest > > > virtual address, and if it's a host virtual address, there would need to > > > be valid > > > mappings in the host page tables for it to work, which KVM can't > > > guarantee. > > > > Ah, so it sounds like it would need to be an arch opt-in then. > > Even if x86 (or some other arch code) could use the precise tracking, I think > it > would make sense to have the behavior be arch specific. Adding infrastructure > to get information from arch code, only to turn around and give it back to > arch > code would be odd. Sorry, yes, that's what I had in mind. Basically, a way for the arch code to say "I've handled the TLBI, don't worry about it." > Unless arm64 can't do the invalidation immediately after aging the stage-2 > PTE, > the best/easiest solution would be to let arm64 opt out of the common TLB > flush > when a SPTE is made young. > > With the range-based flushing bundled in, this? > > --- > include/linux/kvm_host.h | 2 ++ > virt/kvm/kvm_main.c | 40 +--- > 2 files changed, 27 insertions(+), 15 deletions(-) > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index afbc99264ffa..8fe5f5e16919 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -2010,6 +2010,8 @@ extern const struct kvm_stats_header > kvm_vcpu_stats_header; > extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[]; > > #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER > +int kvm_arch_flush_tlb_if_young(void); > + > static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned
Re: [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE
Hello, On Fri, Apr 19, 2024 at 04:12:46PM +1000, Michael Ellerman wrote: > Gaurav Batra writes: > > At the time of LPAR reboot, partition firmware provides Open Firmware > > property ibm,dma-window for the PE. This property is provided on the PCI > > bus the PE is attached to. > > AFAICS you're actually describing a bug that happens during boot *up*? > > Describing it as "reboot" makes me think you're talking about the > shutdown path. I think that will confuse people, me at least :) there is probably an assumption that it must have been running previously for the errors to happen in the first place but given the error state persists for a day it may be a very long 'reboot'. Thanks Michal > > cheers > > > There are execptions where the partition firmware might not provide this > > property for the PE at the time of LPAR reboot. One of the scenario is > > where the firmware has frozen the PE due to some error conditions. This > > PE is frozen for 24 hours or unless the whole system is reinitialized. > > > > Within this time frame, if the LPAR is rebooted, the frozen PE will be > > presented to the LPAR but ibm,dma-window property could be missing. > > > > Today, under these circumstances, the LPAR oopses with NULL pointer > > dereference, when configuring the PCI bus the PE is attached to. > > > > BUG: Kernel NULL pointer dereference on read at 0x00c8 > > Faulting instruction address: 0xc01024c0 > > Oops: Kernel access of bad area, sig: 7 [#1] > > LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries > > Modules linked in: > > Supported: Yes > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1 > > Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf06 > > of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries > > NIP: c01024c0 LR: c01024b0 CTR: c0102450 > > REGS: c37db5c0 TRAP: 0300 Not tainted (6.4.0-150600.9-default) > > MSR: 82009033 CR: 28000822 XER: > > > > CFAR: c010254c DAR: 00c8 DSISR: 0008 IRQMASK: 0 > > ... > > NIP [c01024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0 > > LR [c01024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 > > Call Trace: > > pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable) > > pcibios_setup_bus_self+0x1c0/0x370 > > __of_scan_bus+0x2f8/0x330 > > pcibios_scan_phb+0x280/0x3d0 > > pcibios_init+0x88/0x12c > > do_one_initcall+0x60/0x320 > > kernel_init_freeable+0x344/0x3e4 > > kernel_init+0x34/0x1d0 > > ret_from_kernel_user_thread+0x14/0x1c > > > > Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of > > ibm,dma-window") > > Signed-off-by: Gaurav Batra > > --- > > arch/powerpc/platforms/pseries/iommu.c | 8 > > 1 file changed, 8 insertions(+) > > > > diff --git a/arch/powerpc/platforms/pseries/iommu.c > > b/arch/powerpc/platforms/pseries/iommu.c > > index e8c4129697b1..e808d5b1fa49 100644 > > --- a/arch/powerpc/platforms/pseries/iommu.c > > +++ b/arch/powerpc/platforms/pseries/iommu.c > > @@ -786,8 +786,16 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus > > *bus) > > * parent bus. During reboot, there will be ibm,dma-window property to > > * define DMA window. For kdump, there will at least be default window > > or DDW > > * or both. > > +* There is an exception to the above. In case the PE goes into frozen > > +* state, firmware may not provide ibm,dma-window property at the time > > +* of LPAR reboot. > > */ > > > > + if (!pdn) { > > + pr_debug(" no ibm,dma-window property !\n"); > > + return; > > + } > > + > > ppci = PCI_DN(pdn); > > > > pr_debug(" parent is %pOF, iommu_table: 0x%p\n", > > > > base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702 > > -- > > 2.39.3 (Apple Git-146)
[Bug 216715] kernel 6.1-rc5 + KASAN_OUTLINE fails to boot at very early stage when DEBUG_PAGEALLOC_ENABLE_DEFAULT is enabled (PowerMac G4 3,6)
https://bugzilla.kernel.org/show_bug.cgi?id=216715 Erhard F. (erhar...@mailbox.org) changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |OBSOLETE --- Comment #9 from Erhard F. (erhar...@mailbox.org) --- Cannot reproduce on kernels v6.8.x and current v6.9-rc4, outline + inline KASAN work as intended. So closing here. -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
[Bug 216156] [bisected] kmemleak: Not scanning unknown object at 0xc00000007f000000
https://bugzilla.kernel.org/show_bug.cgi?id=216156 --- Comment #17 from Erhard F. (erhar...@mailbox.org) --- Just noticed the issue is still there in v6.9-rc4. Does some side effects prevented the patch to get upstream or was it overlooked? ;) [..] Page orders: linear mapping = 12, virtual = 12, io = 12 hash-mmu: Initializing hash mmu with SLB Linux version 6.9.0-rc4-PMacG5-dirty (root@T1000) (gcc (Gentoo 13.2.1_p20240210 p14) 13.2.1 20240210, GNU ld (Gentoo 2.42 p3) 2.42.0) #1 SMP Fri Apr 19 01:23:36 CEST 2024 ioremap() called early from pmac_feature_init+0x5e8/0x12d8. Use early_ioremap() instead ioremap() called early from pmac_feature_init+0x1040/0x12d8. Use early_ioremap() instead Found U4 memory controller & host bridge @ 0xf800 revision: 0x42 Mapped at 0xc0003e008000 ioremap() called early from probe_one_macio+0x3e4/0x6cc. Use early_ioremap() instead Found a Shasta mac-io controller, rev: 0, mapped at 0x(ptrval) PowerMac motherboard: PowerMac G5 Dual Core ioremap() called early from btext_map+0x6c/0xf0. Use early_ioremap() instead ioremap() called early from iommu_init_early_dart+0x174/0xac4. Use early_ioremap() instead kmemleak: Not scanning unknown object at 0xc0007f00 CPU: 0 PID: 0 Comm: swapper Not tainted 6.9.0-rc4-PMacG5-dirty #1 Call Trace: [c22f3c50] [c0d6524c] dump_stack_lvl+0x84/0x10c (unreliable) [c22f3c80] [c0379118] kmemleak_no_scan+0x118/0x12c [c22f3cf0] [c2029110] iommu_init_early_dart+0x1e4/0xac4 [c22f3e00] [c202a3d8] pmac_probe+0x158/0x290 [c22f3e60] [c2011608] setup_arch+0x2b0/0xa18 [c22f3f30] [c200613c] start_kernel+0x98/0x81c [c22f3fe0] [c000c848] start_here_common+0x1c/0x20 DART table allocated at: (ptrval) DART IOMMU initialized for U4 type chipset Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac printk: legacy bootconsole [udbg0] enabled CPU maps initialized for 1 thread per core (thread shift is 0) Allocated 1040 bytes for 2 pacas - phys_mem_size = 0x4 dcache_bsize = 0x80 icache_bsize = 0x80 cpu_features = 0x0100900c218a possible= 0x001ffbebfbffb18f always = 0x0180 cpu_user_features = 0xdc08 0x mmu_features = 0x0c008001 firmware_features = 0x vmalloc start = 0xc0003d00 IO start = 0xc0003e00 vmemmap start = 0xc0003f00 hash-mmu: ppc64_pft_size= 0x0 hash-mmu: htab_hash_mask= 0x1f - -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
Re: [PATCH] i2c: mpc: Removal of of_node_put with __free for auto cleanup
Chris Packham writes: > On 16/04/24 08:54, Andi Shyti wrote: >> Hi Abhinav, >> >>> /* Enable I2C interrupts for mpc5121 */ >>> - node_ctrl = of_find_compatible_node(NULL, NULL, >>> - "fsl,mpc5121-i2c-ctrl"); >>> + struct device_node *node_ctrl __free(device_node) = >> How have you tested this? > > I'm not sure I know anyone that still has a mpc5121. Maybe someone on > linuxppc-dev? Not that I know of. The driver loads on my T4240, but doesn't hit the right paths to test this patch. No objection to it being merged though. cheers
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Thu, Apr 18, 2024 at 02:01:22PM -0700, Song Liu wrote: > On Thu, Apr 18, 2024 at 10:54 AM Mike Rapoport wrote: > > > > On Thu, Apr 18, 2024 at 09:13:27AM -0700, Song Liu wrote: > > > On Thu, Apr 18, 2024 at 8:37 AM Mike Rapoport wrote: > > > > > > > > > > > > I'm looking at execmem_types more as definition of the consumers, > > > > > > maybe I > > > > > > should have named the enum execmem_consumer at the first place. > > > > > > > > > > I think looking at execmem_type from consumers' point of view adds > > > > > unnecessary complexity. IIUC, for most (if not all) archs, ftrace, > > > > > kprobe, > > > > > and bpf (and maybe also module text) all have the same requirements. > > > > > Did I miss something? > > > > > > > > It's enough to have one architecture with different constrains for > > > > kprobes > > > > and bpf to warrant a type for each. > > > > > > AFAICT, some of these constraints can be changed without too much work. > > > > But why? > > I honestly don't understand what are you trying to optimize here. A few > > lines of initialization in execmem_info? > > IIUC, having separate EXECMEM_BPF and EXECMEM_KPROBE makes it > harder for bpf and kprobe to share the same ROX page. In many use cases, > a 2MiB page (assuming x86_64) is enough for all BPF, kprobe, ftrace, and > module text. It is not efficient if we have to allocate separate pages for > each > of these use cases. If this is not a problem, the current approach works. The caching of large ROX pages does not need to be per type. In the POC I've posted for caching of large ROX pages on x86 [1], the cache is global and to make kprobes and bpf use it it's enough to set a flag in execmem_info. [1] https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org > Thanks, > Song -- Sincerely yours, Mike.
Re: [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE
Gaurav Batra writes: > At the time of LPAR reboot, partition firmware provides Open Firmware > property ibm,dma-window for the PE. This property is provided on the PCI > bus the PE is attached to. AFAICS you're actually describing a bug that happens during boot *up*? Describing it as "reboot" makes me think you're talking about the shutdown path. I think that will confuse people, me at least :) cheers > There are execptions where the partition firmware might not provide this > property for the PE at the time of LPAR reboot. One of the scenario is > where the firmware has frozen the PE due to some error conditions. This > PE is frozen for 24 hours or unless the whole system is reinitialized. > > Within this time frame, if the LPAR is rebooted, the frozen PE will be > presented to the LPAR but ibm,dma-window property could be missing. > > Today, under these circumstances, the LPAR oopses with NULL pointer > dereference, when configuring the PCI bus the PE is attached to. > > BUG: Kernel NULL pointer dereference on read at 0x00c8 > Faulting instruction address: 0xc01024c0 > Oops: Kernel access of bad area, sig: 7 [#1] > LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries > Modules linked in: > Supported: Yes > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1 > Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1060.00 > (NM1060_023) hv:phyp pSeries > NIP: c01024c0 LR: c01024b0 CTR: c0102450 > REGS: c37db5c0 TRAP: 0300 Not tainted (6.4.0-150600.9-default) > MSR: 82009033 CR: 28000822 XER: > CFAR: c010254c DAR: 00c8 DSISR: 0008 IRQMASK: 0 > ... > NIP [c01024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0 > LR [c01024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 > Call Trace: > pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable) > pcibios_setup_bus_self+0x1c0/0x370 > __of_scan_bus+0x2f8/0x330 > pcibios_scan_phb+0x280/0x3d0 > pcibios_init+0x88/0x12c > do_one_initcall+0x60/0x320 > kernel_init_freeable+0x344/0x3e4 > kernel_init+0x34/0x1d0 > ret_from_kernel_user_thread+0x14/0x1c > > Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of > ibm,dma-window") > Signed-off-by: Gaurav Batra > --- > arch/powerpc/platforms/pseries/iommu.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/arch/powerpc/platforms/pseries/iommu.c > b/arch/powerpc/platforms/pseries/iommu.c > index e8c4129697b1..e808d5b1fa49 100644 > --- a/arch/powerpc/platforms/pseries/iommu.c > +++ b/arch/powerpc/platforms/pseries/iommu.c > @@ -786,8 +786,16 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus > *bus) >* parent bus. During reboot, there will be ibm,dma-window property to >* define DMA window. For kdump, there will at least be default window > or DDW >* or both. > + * There is an exception to the above. In case the PE goes into frozen > + * state, firmware may not provide ibm,dma-window property at the time > + * of LPAR reboot. >*/ > > + if (!pdn) { > + pr_debug(" no ibm,dma-window property !\n"); > + return; > + } > + > ppci = PCI_DN(pdn); > > pr_debug(" parent is %pOF, iommu_table: 0x%p\n", > > base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702 > -- > 2.39.3 (Apple Git-146)