Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-19 Thread Mike Rapoport
On Fri, Apr 19, 2024 at 02:42:16PM -0700, Song Liu wrote:
> On Fri, Apr 19, 2024 at 1:00 PM Mike Rapoport  wrote:
> >
> > On Fri, Apr 19, 2024 at 10:32:39AM -0700, Song Liu wrote:
> > > On Fri, Apr 19, 2024 at 10:03 AM Mike Rapoport  wrote:
> > > [...]
> > > > > >
> > > > > > [1] 
> > > > > > https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org
> > > > >
> > > > > For the ROX to work, we need different users (module text, kprobe, 
> > > > > etc.) to have
> > > > > the same execmem_range. From [1]:
> > > > >
> > > > > static void *execmem_cache_alloc(struct execmem_range *range, size_t 
> > > > > size)
> > > > > {
> > > > > ...
> > > > >p = __execmem_cache_alloc(size);
> > > > >if (p)
> > > > >return p;
> > > > >   err = execmem_cache_populate(range, size);
> > > > > ...
> > > > > }
> > > > >
> > > > > We are calling __execmem_cache_alloc() without range. For this to 
> > > > > work,
> > > > > we can only call execmem_cache_alloc() with one execmem_range.
> > > >
> > > > Actually, on x86 this will "just work" because everything shares the 
> > > > same
> > > > address space :)
> > > >
> > > > The 2M pages in the cache will be in the modules space, so
> > > > __execmem_cache_alloc() will always return memory from that address 
> > > > space.
> > > >
> > > > For other architectures this indeed needs to be fixed with passing the
> > > > range to __execmem_cache_alloc() and limiting search in the cache for 
> > > > that
> > > > range.
> > >
> > > I think we at least need the "map to" concept (initially proposed by 
> > > Thomas)
> > > to get this work. For example, EXECMEM_BPF and EXECMEM_KPROBE
> > > maps to EXECMEM_MODULE_TEXT, so that all these actually share
> > > the same range.
> >
> > Why?
> 
> IIUC, we need to update __execmem_cache_alloc() to take a range pointer as
> input. module text will use "range" for EXECMEM_MODULE_TEXT, while kprobe
> will use "range" for EXECMEM_KPROBE. Without "map to" concept or sharing
> the "range" object, we will have to compare different range parameters to 
> check
> we can share cached pages between module text and kprobe, which is not
> efficient. Did I miss something?

We can always share large ROX pages as long as they are within the correct
address space. The permissions for them are ROX and the alignment
differences are due to KASAN and this is handled during allocation of the
large page to refill the cache. __execmem_cache_alloc() only needs to limit
the search for the address space of the range.

And regardless, they way we deal with sharing of the cache can be sorted
out later.

> Thanks,
> Song

-- 
Sincerely yours,
Mike.


[GIT PULL] Please pull powerpc/linux.git powerpc-6.9-3 tag

2024-04-19 Thread Michael Ellerman
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Hi Linus,

Please pull some more powerpc fixes for 6.9:

The following changes since commit 39cd87c4eb2b893354f3b850f916353f2658ae6f:

  Linux 6.9-rc2 (2024-03-31 14:32:39 -0700)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-6.9-3

for you to fetch changes up to 210cfef579260ed6c3b700e7baeae51a5e183f43:

  selftests/powerpc/papr-vpd: Fix missing variable initialization (2024-04-12 
14:40:07 +1000)

- --
powerpc fixes for 6.9 #3

 - Fix wireguard loading failure on pre-Power10 due to Power10 crypto routines.

 - Fix papr-vpd selftest failure due to missing variable initialization.

 - Avoid unnecessary get/put in spapr_tce_platform_iommu_attach_dev().

Thanks to: Geetika Moolchandani, Jason Gunthorpe, Michal Suchánek, Nathan Lynch,
Shivaprasad G Bhat.

- --
Michael Ellerman (1):
  powerpc/crypto/chacha-p10: Fix failure on non Power10

Nathan Lynch (1):
  selftests/powerpc/papr-vpd: Fix missing variable initialization

Shivaprasad G Bhat (1):
  powerpc/iommu: Refactor spapr_tce_platform_iommu_attach_dev()


 arch/powerpc/crypto/chacha-p10-glue.c   | 8 +++-
 arch/powerpc/kernel/iommu.c | 7 +++
 tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c | 2 +-
 3 files changed, 11 insertions(+), 6 deletions(-)
-BEGIN PGP SIGNATURE-

iQIzBAEBCAAdFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmYjNHwACgkQUevqPMjh
pYCbuA//VsCCZotHFjCuEMX4hKFE+FS7doJ6HAaj1lysJLVQsXW94pEufcy53hXX
y4DQ06Y2vV/zJTrbYk/3vukeMDcc7ZSE9IKZlnSvZP1GizGF2dQE/AjsqNfJ5iV4
1YcE22fmjoSIL2pbD337r1BHqEwGvcd47Rm0SET6yIxQamTKHLAa7Q9OIindDMaq
Qu/g6LFoa856idDkXRY96xNWRodGzNyhZ3+16F4jsE2pbGiYNbjqyua3ke8leWgI
DInYs9KholNeg285Qbjq6VaXcMjJG6po8N/MDJDxdysVhF+9GTXlyKSN9oURDUtf
IDWpbrF3n7WE3iUllucKWNXBQbfX5yoFlZ7PnOj+Tjm1RcyBZDBhoq05DNcosFY0
Mgbg7hKluxUP7ijSMOOKKYasMrNqJofo+f3S78L3sBDjIW2w/j6+ZP6EQ515t7he
/oac+WfFr6m0rEtQtZuf0TMPWv60odfjkLuGFomcd5qpldXesMfR62rT+RSN3M7m
7kJj4duYpKJJZEdOdj9PzVMqQd/dd5cpeO2b1vJef7G6lHXRaB0tTaDMnvZBNtlL
7etRE5g1ErvOFGnoAzeXcU3GFU2ZO49cpKMjqwubivufNXDwZ7VKJfPjIsb4cfm0
lsquI0A5FhkIDr1G/0wL4VyCisKYCAuYj9NCSf9F9iM+92w+fKo=
=gaHh
-END PGP SIGNATURE-


Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-19 Thread Song Liu
On Fri, Apr 19, 2024 at 1:00 PM Mike Rapoport  wrote:
>
> On Fri, Apr 19, 2024 at 10:32:39AM -0700, Song Liu wrote:
> > On Fri, Apr 19, 2024 at 10:03 AM Mike Rapoport  wrote:
> > [...]
> > > > >
> > > > > [1] 
> > > > > https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org
> > > >
> > > > For the ROX to work, we need different users (module text, kprobe, 
> > > > etc.) to have
> > > > the same execmem_range. From [1]:
> > > >
> > > > static void *execmem_cache_alloc(struct execmem_range *range, size_t 
> > > > size)
> > > > {
> > > > ...
> > > >p = __execmem_cache_alloc(size);
> > > >if (p)
> > > >return p;
> > > >   err = execmem_cache_populate(range, size);
> > > > ...
> > > > }
> > > >
> > > > We are calling __execmem_cache_alloc() without range. For this to work,
> > > > we can only call execmem_cache_alloc() with one execmem_range.
> > >
> > > Actually, on x86 this will "just work" because everything shares the same
> > > address space :)
> > >
> > > The 2M pages in the cache will be in the modules space, so
> > > __execmem_cache_alloc() will always return memory from that address space.
> > >
> > > For other architectures this indeed needs to be fixed with passing the
> > > range to __execmem_cache_alloc() and limiting search in the cache for that
> > > range.
> >
> > I think we at least need the "map to" concept (initially proposed by Thomas)
> > to get this work. For example, EXECMEM_BPF and EXECMEM_KPROBE
> > maps to EXECMEM_MODULE_TEXT, so that all these actually share
> > the same range.
>
> Why?

IIUC, we need to update __execmem_cache_alloc() to take a range pointer as
input. module text will use "range" for EXECMEM_MODULE_TEXT, while kprobe
will use "range" for EXECMEM_KPROBE. Without "map to" concept or sharing
the "range" object, we will have to compare different range parameters to check
we can share cached pages between module text and kprobe, which is not
efficient. Did I miss something?

Thanks,
Song


Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-19 Thread Mike Rapoport
On Fri, Apr 19, 2024 at 10:32:39AM -0700, Song Liu wrote:
> On Fri, Apr 19, 2024 at 10:03 AM Mike Rapoport  wrote:
> [...]
> > > >
> > > > [1] https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org
> > >
> > > For the ROX to work, we need different users (module text, kprobe, etc.) 
> > > to have
> > > the same execmem_range. From [1]:
> > >
> > > static void *execmem_cache_alloc(struct execmem_range *range, size_t size)
> > > {
> > > ...
> > >p = __execmem_cache_alloc(size);
> > >if (p)
> > >return p;
> > >   err = execmem_cache_populate(range, size);
> > > ...
> > > }
> > >
> > > We are calling __execmem_cache_alloc() without range. For this to work,
> > > we can only call execmem_cache_alloc() with one execmem_range.
> >
> > Actually, on x86 this will "just work" because everything shares the same
> > address space :)
> >
> > The 2M pages in the cache will be in the modules space, so
> > __execmem_cache_alloc() will always return memory from that address space.
> >
> > For other architectures this indeed needs to be fixed with passing the
> > range to __execmem_cache_alloc() and limiting search in the cache for that
> > range.
> 
> I think we at least need the "map to" concept (initially proposed by Thomas)
> to get this work. For example, EXECMEM_BPF and EXECMEM_KPROBE
> maps to EXECMEM_MODULE_TEXT, so that all these actually share
> the same range.

Why?
 
> Does this make sense?
> 
> Song

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-19 Thread Song Liu
On Fri, Apr 19, 2024 at 10:03 AM Mike Rapoport  wrote:
[...]
> > >
> > > [1] https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org
> >
> > For the ROX to work, we need different users (module text, kprobe, etc.) to 
> > have
> > the same execmem_range. From [1]:
> >
> > static void *execmem_cache_alloc(struct execmem_range *range, size_t size)
> > {
> > ...
> >p = __execmem_cache_alloc(size);
> >if (p)
> >return p;
> >   err = execmem_cache_populate(range, size);
> > ...
> > }
> >
> > We are calling __execmem_cache_alloc() without range. For this to work,
> > we can only call execmem_cache_alloc() with one execmem_range.
>
> Actually, on x86 this will "just work" because everything shares the same
> address space :)
>
> The 2M pages in the cache will be in the modules space, so
> __execmem_cache_alloc() will always return memory from that address space.
>
> For other architectures this indeed needs to be fixed with passing the
> range to __execmem_cache_alloc() and limiting search in the cache for that
> range.

I think we at least need the "map to" concept (initially proposed by Thomas)
to get this work. For example, EXECMEM_BPF and EXECMEM_KPROBE
maps to EXECMEM_MODULE_TEXT, so that all these actually share
the same range.

Does this make sense?

Song


Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-19 Thread Mike Rapoport
On Fri, Apr 19, 2024 at 08:54:40AM -0700, Song Liu wrote:
> On Thu, Apr 18, 2024 at 11:56 PM Mike Rapoport  wrote:
> >
> > On Thu, Apr 18, 2024 at 02:01:22PM -0700, Song Liu wrote:
> > > On Thu, Apr 18, 2024 at 10:54 AM Mike Rapoport  wrote:
> > > >
> > > > On Thu, Apr 18, 2024 at 09:13:27AM -0700, Song Liu wrote:
> > > > > On Thu, Apr 18, 2024 at 8:37 AM Mike Rapoport  wrote:
> > > > > > > >
> > > > > > > > I'm looking at execmem_types more as definition of the 
> > > > > > > > consumers, maybe I
> > > > > > > > should have named the enum execmem_consumer at the first place.
> > > > > > >
> > > > > > > I think looking at execmem_type from consumers' point of view adds
> > > > > > > unnecessary complexity. IIUC, for most (if not all) archs, 
> > > > > > > ftrace, kprobe,
> > > > > > > and bpf (and maybe also module text) all have the same 
> > > > > > > requirements.
> > > > > > > Did I miss something?
> > > > > >
> > > > > > It's enough to have one architecture with different constrains for 
> > > > > > kprobes
> > > > > > and bpf to warrant a type for each.
> > > > >
> > > > > AFAICT, some of these constraints can be changed without too much 
> > > > > work.
> > > >
> > > > But why?
> > > > I honestly don't understand what are you trying to optimize here. A few
> > > > lines of initialization in execmem_info?
> > >
> > > IIUC, having separate EXECMEM_BPF and EXECMEM_KPROBE makes it
> > > harder for bpf and kprobe to share the same ROX page. In many use cases,
> > > a 2MiB page (assuming x86_64) is enough for all BPF, kprobe, ftrace, and
> > > module text. It is not efficient if we have to allocate separate pages 
> > > for each
> > > of these use cases. If this is not a problem, the current approach works.
> >
> > The caching of large ROX pages does not need to be per type.
> >
> > In the POC I've posted for caching of large ROX pages on x86 [1], the cache 
> > is
> > global and to make kprobes and bpf use it it's enough to set a flag in
> > execmem_info.
> >
> > [1] https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org
> 
> For the ROX to work, we need different users (module text, kprobe, etc.) to 
> have
> the same execmem_range. From [1]:
> 
> static void *execmem_cache_alloc(struct execmem_range *range, size_t size)
> {
> ...
>p = __execmem_cache_alloc(size);
>if (p)
>return p;
>   err = execmem_cache_populate(range, size);
> ...
> }
> 
> We are calling __execmem_cache_alloc() without range. For this to work,
> we can only call execmem_cache_alloc() with one execmem_range.

Actually, on x86 this will "just work" because everything shares the same
address space :)

The 2M pages in the cache will be in the modules space, so
__execmem_cache_alloc() will always return memory from that address space.

For other architectures this indeed needs to be fixed with passing the
range to __execmem_cache_alloc() and limiting search in the cache for that
range.
 
> Did I miss something?
> 
> Thanks,
> Song

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES

2024-04-19 Thread Christophe Leroy


Le 19/04/2024 à 17:49, Mike Rapoport a écrit :
> Hi Masami,
> 
> On Thu, Apr 18, 2024 at 06:16:15AM +0900, Masami Hiramatsu wrote:
>> Hi Mike,
>>
>> On Thu, 11 Apr 2024 19:00:50 +0300
>> Mike Rapoport  wrote:
>>
>>> From: "Mike Rapoport (IBM)" 
>>>
>>> kprobes depended on CONFIG_MODULES because it has to allocate memory for
>>> code.
>>>
>>> Since code allocations are now implemented with execmem, kprobes can be
>>> enabled in non-modular kernels.
>>>
>>> Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
>>> modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
>>> dependency of CONFIG_KPROBES on CONFIG_MODULES.
>>
>> Thanks for this work, but this conflicts with the latest fix in v6.9-rc4.
>> Also, can you use IS_ENABLED(CONFIG_MODULES) instead of #ifdefs in
>> function body? We have enough dummy functions for that, so it should
>> not make a problem.
> 
> The code in check_kprobe_address_safe() that gets the module and checks for
> __init functions does not compile with IS_ENABLED(CONFIG_MODULES).
> I can pull it out to a helper or leave #ifdef in the function body,
> whichever you prefer.

As far as I can see, the only problem is MODULE_STATE_COMING.
Can we move 'enum module_state' out of #ifdef CONFIG_MODULES in module.h  ?


>   
>> -- 
>> Masami Hiramatsu
> 


Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-19 Thread Song Liu
On Thu, Apr 18, 2024 at 11:56 PM Mike Rapoport  wrote:
>
> On Thu, Apr 18, 2024 at 02:01:22PM -0700, Song Liu wrote:
> > On Thu, Apr 18, 2024 at 10:54 AM Mike Rapoport  wrote:
> > >
> > > On Thu, Apr 18, 2024 at 09:13:27AM -0700, Song Liu wrote:
> > > > On Thu, Apr 18, 2024 at 8:37 AM Mike Rapoport  wrote:
> > > > > > >
> > > > > > > I'm looking at execmem_types more as definition of the consumers, 
> > > > > > > maybe I
> > > > > > > should have named the enum execmem_consumer at the first place.
> > > > > >
> > > > > > I think looking at execmem_type from consumers' point of view adds
> > > > > > unnecessary complexity. IIUC, for most (if not all) archs, ftrace, 
> > > > > > kprobe,
> > > > > > and bpf (and maybe also module text) all have the same requirements.
> > > > > > Did I miss something?
> > > > >
> > > > > It's enough to have one architecture with different constrains for 
> > > > > kprobes
> > > > > and bpf to warrant a type for each.
> > > >
> > > > AFAICT, some of these constraints can be changed without too much work.
> > >
> > > But why?
> > > I honestly don't understand what are you trying to optimize here. A few
> > > lines of initialization in execmem_info?
> >
> > IIUC, having separate EXECMEM_BPF and EXECMEM_KPROBE makes it
> > harder for bpf and kprobe to share the same ROX page. In many use cases,
> > a 2MiB page (assuming x86_64) is enough for all BPF, kprobe, ftrace, and
> > module text. It is not efficient if we have to allocate separate pages for 
> > each
> > of these use cases. If this is not a problem, the current approach works.
>
> The caching of large ROX pages does not need to be per type.
>
> In the POC I've posted for caching of large ROX pages on x86 [1], the cache is
> global and to make kprobes and bpf use it it's enough to set a flag in
> execmem_info.
>
> [1] https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org

For the ROX to work, we need different users (module text, kprobe, etc.) to have
the same execmem_range. From [1]:

static void *execmem_cache_alloc(struct execmem_range *range, size_t size)
{
...
   p = __execmem_cache_alloc(size);
   if (p)
   return p;
  err = execmem_cache_populate(range, size);
...
}

We are calling __execmem_cache_alloc() without range. For this to work,
we can only call execmem_cache_alloc() with one execmem_range.

Did I miss something?

Thanks,
Song


Re: [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES

2024-04-19 Thread Mike Rapoport
Hi Masami,

On Thu, Apr 18, 2024 at 06:16:15AM +0900, Masami Hiramatsu wrote:
> Hi Mike,
> 
> On Thu, 11 Apr 2024 19:00:50 +0300
> Mike Rapoport  wrote:
> 
> > From: "Mike Rapoport (IBM)" 
> > 
> > kprobes depended on CONFIG_MODULES because it has to allocate memory for
> > code.
> > 
> > Since code allocations are now implemented with execmem, kprobes can be
> > enabled in non-modular kernels.
> > 
> > Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
> > modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
> > dependency of CONFIG_KPROBES on CONFIG_MODULES.
> 
> Thanks for this work, but this conflicts with the latest fix in v6.9-rc4.
> Also, can you use IS_ENABLED(CONFIG_MODULES) instead of #ifdefs in
> function body? We have enough dummy functions for that, so it should
> not make a problem.

The code in check_kprobe_address_safe() that gets the module and checks for
__init functions does not compile with IS_ENABLED(CONFIG_MODULES). 
I can pull it out to a helper or leave #ifdef in the function body,
whichever you prefer.
 
> -- 
> Masami Hiramatsu

-- 
Sincerely yours,
Mike.


Re: [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE

2024-04-19 Thread Gaurav Batra
You are right. I think, the "reboot" should be replaced with just "boot 
up". If there are no other comments, or code changes, I can re-word the 
commit message and submit for review.


Thanks,

Gaurav

On 4/19/24 6:11 AM, Michal Suchánek wrote:

Hello,

On Fri, Apr 19, 2024 at 04:12:46PM +1000, Michael Ellerman wrote:

Gaurav Batra  writes:

At the time of LPAR reboot, partition firmware provides Open Firmware
property ibm,dma-window for the PE. This property is provided on the PCI
bus the PE is attached to.

AFAICS you're actually describing a bug that happens during boot *up*?

Describing it as "reboot" makes me think you're talking about the
shutdown path. I think that will confuse people, me at least :)

there is probably an assumption that it must have been running
previously for the errors to happen in the first place but given the
error state persists for a day it may be a very long 'reboot'.

Thanks

Michal

cheers


There are execptions where the partition firmware might not provide this
property for the PE at the time of LPAR reboot. One of the scenario is
where the firmware has frozen the PE due to some error conditions. This
PE is frozen for 24 hours or unless the whole system is reinitialized.

Within this time frame, if the LPAR is rebooted, the frozen PE will be
presented to the LPAR but ibm,dma-window property could be missing.

Today, under these circumstances, the LPAR oopses with NULL pointer
dereference, when configuring the PCI bus the PE is attached to.

BUG: Kernel NULL pointer dereference on read at 0x00c8
Faulting instruction address: 0xc01024c0
Oops: Kernel access of bad area, sig: 7 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
Supported: Yes
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1
Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1060.00 
(NM1060_023) hv:phyp pSeries
NIP:  c01024c0 LR: c01024b0 CTR: c0102450
REGS: c37db5c0 TRAP: 0300   Not tainted  (6.4.0-150600.9-default)
MSR:  82009033   CR: 28000822  XER: 
CFAR: c010254c DAR: 00c8 DSISR: 0008 IRQMASK: 0
...
NIP [c01024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0
LR [c01024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0
Call Trace:
pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable)
pcibios_setup_bus_self+0x1c0/0x370
__of_scan_bus+0x2f8/0x330
pcibios_scan_phb+0x280/0x3d0
pcibios_init+0x88/0x12c
do_one_initcall+0x60/0x320
kernel_init_freeable+0x344/0x3e4
kernel_init+0x34/0x1d0
ret_from_kernel_user_thread+0x14/0x1c

Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of 
ibm,dma-window")
Signed-off-by: Gaurav Batra 
---
  arch/powerpc/platforms/pseries/iommu.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index e8c4129697b1..e808d5b1fa49 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -786,8 +786,16 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus 
*bus)
 * parent bus. During reboot, there will be ibm,dma-window property to
 * define DMA window. For kdump, there will at least be default window 
or DDW
 * or both.
+* There is an exception to the above. In case the PE goes into frozen
+* state, firmware may not provide ibm,dma-window property at the time
+* of LPAR reboot.
 */
  
+	if (!pdn) {

+   pr_debug("  no ibm,dma-window property !\n");
+   return;
+   }
+
ppci = PCI_DN(pdn);
  
  	pr_debug("  parent is %pOF, iommu_table: 0x%p\n",


base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702
--
2.39.3 (Apple Git-146)


Re: [PATCH 1/3] x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n

2024-04-19 Thread Will Deacon
On Fri, Apr 19, 2024 at 07:06:00AM -0700, Sean Christopherson wrote:
> On Fri, Apr 19, 2024, Will Deacon wrote:
> > On Mon, Apr 15, 2024 at 07:31:23AM -0700, Sean Christopherson wrote:
> > > On Mon, Apr 15, 2024, Geert Uytterhoeven wrote:
> > > Oof.  I completely missed that "cpu_mitigations" wasn't x86-only.  I 
> > > can't think
> > > of better solution than an on-by-default generic Kconfig, though can't 
> > > that it
> > > more simply be:
> > > 
> > > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> > > index 2b8fd6bb7da0..5930cb56ee29 100644
> > > --- a/drivers/base/Kconfig
> > > +++ b/drivers/base/Kconfig
> > > @@ -191,6 +191,9 @@ config GENERIC_CPU_AUTOPROBE
> > >  config GENERIC_CPU_VULNERABILITIES
> > > bool
> > >  
> > > +config SPECULATION_MITIGATIONS
> > > +   def_bool !X86
> > > +
> > >  config SOC_BUS
> > > bool
> > > select GLOB
> > 
> > I can't see this in -next yet. Do you plan to post it as a proper patch
> > to collect acks etc?
> 
> Sorry, I neglected to Cc everyone.
> 
> https://lore.kernel.org/all/20240417001507.2264512-2-sea...@google.com

Ah, thanks. I'll go Ack that...

Will


[PATCH] ibmvnic: Use -EBUSY in __ibmvnic_reset()

2024-04-19 Thread Markus Elfring
From: Markus Elfring 
Date: Fri, 19 Apr 2024 15:46:17 +0200

Add a minus sign before the error code “EBUSY”
so that a negative value will be used as in other cases.

This issue was transformed by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 5e9a93bdb518..737ae83a836a 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -3212,7 +3212,7 @@ static void __ibmvnic_reset(struct work_struct *work)
adapter->state == VNIC_REMOVED) {
spin_unlock_irqrestore(>state_lock, flags);
kfree(rwi);
-   rc = EBUSY;
+   rc = -EBUSY;
break;
}

--
2.44.0



Re: [PATCH 1/3] x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n

2024-04-19 Thread Sean Christopherson
On Fri, Apr 19, 2024, Will Deacon wrote:
> On Mon, Apr 15, 2024 at 07:31:23AM -0700, Sean Christopherson wrote:
> > On Mon, Apr 15, 2024, Geert Uytterhoeven wrote:
> > Oof.  I completely missed that "cpu_mitigations" wasn't x86-only.  I can't 
> > think
> > of better solution than an on-by-default generic Kconfig, though can't that 
> > it
> > more simply be:
> > 
> > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> > index 2b8fd6bb7da0..5930cb56ee29 100644
> > --- a/drivers/base/Kconfig
> > +++ b/drivers/base/Kconfig
> > @@ -191,6 +191,9 @@ config GENERIC_CPU_AUTOPROBE
> >  config GENERIC_CPU_VULNERABILITIES
> > bool
> >  
> > +config SPECULATION_MITIGATIONS
> > +   def_bool !X86
> > +
> >  config SOC_BUS
> > bool
> > select GLOB
> 
> I can't see this in -next yet. Do you plan to post it as a proper patch
> to collect acks etc?

Sorry, I neglected to Cc everyone.

https://lore.kernel.org/all/20240417001507.2264512-2-sea...@google.com


Re: [PATCH 1/3] x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n

2024-04-19 Thread Will Deacon
On Mon, Apr 15, 2024 at 07:31:23AM -0700, Sean Christopherson wrote:
> On Mon, Apr 15, 2024, Geert Uytterhoeven wrote:
> > On Sat, Apr 13, 2024 at 11:38 AM Michael Ellerman  
> > wrote:
> > > Michael Ellerman  writes:
> > > > Stephen Rothwell  writes:
> > > ...
> > > >> On Tue,  9 Apr 2024 10:51:05 -0700 Sean Christopherson 
> > > >>  wrote:
> > > ...
> > > >>> diff --git a/kernel/cpu.c b/kernel/cpu.c
> > > >>> index 8f6affd051f7..07ad53b7f119 100644
> > > >>> --- a/kernel/cpu.c
> > > >>> +++ b/kernel/cpu.c
> > > >>> @@ -3207,7 +3207,8 @@ enum cpu_mitigations {
> > > >>>  };
> > > >>>
> > > >>>  static enum cpu_mitigations cpu_mitigations __ro_after_init =
> > > >>> -   CPU_MITIGATIONS_AUTO;
> > > >>> +   IS_ENABLED(CONFIG_SPECULATION_MITIGATIONS) ? CPU_MITIGATIONS_AUTO 
> > > >>> :
> > > >>> +CPU_MITIGATIONS_OFF;
> > > >>>
> > > >>>  static int __init mitigations_parse_cmdline(char *arg)
> > > >>>  {
> > >
> > > I think a minimal workaround/fix would be:
> > >
> > > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> > > index 2b8fd6bb7da0..290be2f9e909 100644
> > > --- a/drivers/base/Kconfig
> > > +++ b/drivers/base/Kconfig
> > > @@ -191,6 +191,10 @@ config GENERIC_CPU_AUTOPROBE
> > >  config GENERIC_CPU_VULNERABILITIES
> > > bool
> > >
> > > +config SPECULATION_MITIGATIONS
> > > +   def_bool y
> > > +   depends on !X86
> > > +
> > >  config SOC_BUS
> > > bool
> > > select GLOB
> > 
> > Thanks, that works for me (on arm64), so
> > Tested-by: Geert Uytterhoeven 
> 
> Oof.  I completely missed that "cpu_mitigations" wasn't x86-only.  I can't 
> think
> of better solution than an on-by-default generic Kconfig, though can't that it
> more simply be:
> 
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 2b8fd6bb7da0..5930cb56ee29 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -191,6 +191,9 @@ config GENERIC_CPU_AUTOPROBE
>  config GENERIC_CPU_VULNERABILITIES
> bool
>  
> +config SPECULATION_MITIGATIONS
> +   def_bool !X86
> +
>  config SOC_BUS
> bool
> select GLOB

I can't see this in -next yet. Do you plan to post it as a proper patch
to collect acks etc?

Cheers,

Will


Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-19 Thread Sean Christopherson
On Fri, Apr 19, 2024, Will Deacon wrote:
> > @@ -663,10 +669,22 @@ static __always_inline kvm_mn_ret_t 
> > __kvm_handle_hva_range(struct kvm *kvm,
> > break;
> > }
> > r.ret |= range->handler(kvm, _range);
> > +
> > +  /*
> > +   * Use a precise gfn-based TLB flush when possible, as
> > +   * most mmu_notifier events affect a small-ish range.
> > +   * Fall back to a full TLB flush if the gfn-based flush
> > +   * fails, and don't bother trying the gfn-based flush
> > +   * if a full flush is already pending.
> > +   */
> > +  if (range->flush_on_ret && !need_flush && r.ret &&
> > +  kvm_arch_flush_remote_tlbs_range(kvm, 
> > gfn_range.start,
> > +   gfn_range.end - 
> > gfn_range.start + 1))
> 
> What's that '+ 1' needed for here?

 (a) To see if you're paying attention.
 (b) Because more is always better.
 (c) Because math is hard.
 (d) Because I haven't tested this.
 (e) Both (c) and (d).


Re: [PATCH] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings

2024-04-19 Thread Will Deacon
On Thu, Apr 04, 2024 at 03:38:53PM +1100, Rohan McLure wrote:
> Prior to this patch, data races are detectable by KCSAN of the following
> forms:
> 
> [1] Asynchronous calls to mmiowb_set_pending() from an interrupt context
> or otherwise outside of a critical section
> [2] Interrupted critical sections, where the interrupt will itself
> acquire a lock
> 
> In case [1], calling context does not need an mmiowb() call to be
> issued, otherwise it would do so itself. Such calls to
> mmiowb_set_pending() are either idempotent or no-ops.
> 
> In case [2], irrespective of when the interrupt occurs, the interrupt
> will acquire and release its locks prior to its return, nesting_count
> will continue balanced. In the worst case, the interrupted critical
> section during a mmiowb_spin_unlock() call observes an mmiowb to be
> pending and afterward is interrupted, leading to an extraneous call to
> mmiowb(). This data race is clearly innocuous.
> 
> Resolve KCSAN warnings of type [1] by means of READ_ONCE, WRITE_ONCE.
> As increments and decrements to nesting_count are balanced by interrupt
> contexts, resolve type [2] warnings by simply revoking instrumentation,
> with data_race() rather than READ_ONCE() and WRITE_ONCE(), the memory
> consistency semantics of plain-accesses will still lead to correct
> behaviour.
> 
> Signed-off-by: Rohan McLure 
> Reported-by: Michael Ellerman 
> Reported-by: Gautam Menghani 
> Tested-by: Gautam Menghani 
> Acked-by: Arnd Bergmann 
> ---
> Previously discussed here:
> https://lore.kernel.org/linuxppc-dev/20230510033117.1395895-4-rmcl...@linux.ibm.com/
> But pushed back due to affecting other architectures. Reissuing, to
> linuxppc-dev, as it does not enact a functional change.
> ---
>  include/asm-generic/mmiowb.h | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h
> index 5698fca3bf56..f8c7c8a84e9e 100644
> --- a/include/asm-generic/mmiowb.h
> +++ b/include/asm-generic/mmiowb.h
> @@ -37,25 +37,28 @@ static inline void mmiowb_set_pending(void)
>   struct mmiowb_state *ms = __mmiowb_state();
>  
>   if (likely(ms->nesting_count))
> - ms->mmiowb_pending = ms->nesting_count;
> + WRITE_ONCE(ms->mmiowb_pending, ms->nesting_count);
>  }
>  
>  static inline void mmiowb_spin_lock(void)
>  {
>   struct mmiowb_state *ms = __mmiowb_state();
> - ms->nesting_count++;
> +
> + /* Increment need not be atomic. Nestedness is balanced over 
> interrupts. */
> + data_race(ms->nesting_count++);
>  }
>  
>  static inline void mmiowb_spin_unlock(void)
>  {
>   struct mmiowb_state *ms = __mmiowb_state();
> + u16 pending = READ_ONCE(ms->mmiowb_pending);
>  
> - if (unlikely(ms->mmiowb_pending)) {
> - ms->mmiowb_pending = 0;
> + WRITE_ONCE(ms->mmiowb_pending, 0);

Why are you changing this store to be unconditional?

Will


[powerpc:next-test] BUILD REGRESSION 3cfdbc2274139bbcc09a44f8aa18d08c0abfd51e

2024-04-19 Thread kernel test robot
tree/branch: https://github.com/linuxppc/linux next-test
branch HEAD: 3cfdbc2274139bbcc09a44f8aa18d08c0abfd51e  powerpc: drop port I/O 
helpers for CONFIG_HAS_IOPORT=n

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202404190958.9qodkytx-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202404191016.zzs7aptf-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

arch/powerpc/include/asm/io.h:695:13: error: implicit declaration of function 
'inb'; did you mean '_insb'? [-Werror=implicit-function-declaration]
arch/powerpc/include/asm/io.h:698:14: error: implicit declaration of function 
'outb'; did you mean '_outsb'? [-Werror=implicit-function-declaration]
arch/powerpc/kernel/udbg_16550.c:167:9: error: call to undeclared function 
'inb'; ISO C99 and later do not support implicit function declarations 
[-Wimplicit-function-declaration]
arch/powerpc/kernel/udbg_16550.c:172:2: error: call to undeclared function 
'outb'; ISO C99 and later do not support implicit function declarations 
[-Wimplicit-function-declaration]
drivers/tty/serial/8250/8250_early.c:50:10: error: call to undeclared function 
'inb'; ISO C99 and later do not support implicit function declarations 
[-Wimplicit-function-declaration]
drivers/tty/serial/8250/8250_early.c:74:3: error: call to undeclared function 
'outb'; ISO C99 and later do not support implicit function declarations 
[-Wimplicit-function-declaration]
drivers/tty/serial/8250/8250_port.c:344:2: error: call to undeclared function 
'outb'; ISO C99 and later do not support implicit function declarations 
[-Wimplicit-function-declaration]
drivers/tty/serial/8250/8250_port.c:345:9: error: call to undeclared function 
'inb'; ISO C99 and later do not support implicit function declarations 
[-Wimplicit-function-declaration]

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
`-- powerpc-ppa8548_defconfig
|-- arch-powerpc-include-asm-io.h:error:implicit-declaration-of-function-inb
`-- 
arch-powerpc-include-asm-io.h:error:implicit-declaration-of-function-outb
clang_recent_errors
`-- powerpc-sequoia_defconfig
|-- 
arch-powerpc-kernel-udbg_16550.c:error:call-to-undeclared-function-inb-ISO-C99-and-later-do-not-support-implicit-function-declarations
|-- 
arch-powerpc-kernel-udbg_16550.c:error:call-to-undeclared-function-outb-ISO-C99-and-later-do-not-support-implicit-function-declarations
|-- 
drivers-tty-serial-8250_early.c:error:call-to-undeclared-function-inb-ISO-C99-and-later-do-not-support-implicit-function-declarations
|-- 
drivers-tty-serial-8250_early.c:error:call-to-undeclared-function-outb-ISO-C99-and-later-do-not-support-implicit-function-declarations
|-- 
drivers-tty-serial-8250_port.c:error:call-to-undeclared-function-inb-ISO-C99-and-later-do-not-support-implicit-function-declarations
`-- 
drivers-tty-serial-8250_port.c:error:call-to-undeclared-function-outb-ISO-C99-and-later-do-not-support-implicit-function-declarations

elapsed time: 1403m

configs tested: 173
configs skipped: 5

tested configs:
alpha allnoconfig   gcc  
alphaallyesconfig   gcc  
alpha   defconfig   gcc  
arc  allmodconfig   gcc  
arc   allnoconfig   gcc  
arc  allyesconfig   gcc  
arc  axs103_defconfig   gcc  
arc defconfig   gcc  
arc haps_hs_smp_defconfig   gcc  
arc   randconfig-001-20240419   gcc  
arc   randconfig-002-20240419   gcc  
arm  allmodconfig   gcc  
arm   allnoconfig   clang
arm  allyesconfig   gcc  
arm defconfig   clang
arm  ixp4xx_defconfig   gcc  
arm   omap2plus_defconfig   gcc  
arm   randconfig-001-20240419   gcc  
arm   randconfig-002-20240419   clang
arm   randconfig-003-20240419   gcc  
arm   randconfig-004-20240419   clang
arm64allmodconfig   clang
arm64 allnoconfig   gcc  
arm64allyesconfig   clang
arm64   defconfig   gcc  
arm64 randconfig-001-20240419   clang
arm64 randconfig-002-20240419   clang
arm64 randconfig-003-20240419   clang
arm64 randconfig-004-20240419   clang
csky allmodconfig   gcc  
csky  allnoconfig   gcc  
csky allyesconfig   gcc  
cskydefconfig   gcc  
csky  randconfig-001-20240419   gcc  
csky  randconfig-002-20240419   gcc  
hexagon  allmodconfig   clang

[PATCH] powerpc/dart: Drop unnecessary call to kmemleak_no_scan()

2024-04-19 Thread Michael Ellerman
Erhard reported that kmemleak was showing a warning at boot:

  kmemleak: Not scanning unknown object at 0xc0007f00
  CPU: 0 PID: 0 Comm: swapper Not tainted 5.19.0-rc3-PMacG5+ #2
  Call Trace:
   .dump_stack_lvl+0x7c/0xc4 (unreliable)
   .kmemleak_no_scan+0xe0/0x100
   .iommu_init_early_dart+0x2f0/0x924
   .pmac_probe+0x1b0/0x20c
   .setup_arch+0x1b8/0x674
   .start_kernel+0xdc/0xb74
   start_here_common+0x1c/0x44
  DART table allocated at: (ptrval)

Which he bisected to a change in kmemleak, commit
23c2d497de21 ("mm: kmemleak: take a full lowmem check in kmemleak_*_phys()").

Because pmac_probe() is called before mem_topology_setup(), the min/
max PFN variables are still zero. That causes kmemleak_alloc_phys() to
ignore the allocation, because the checks against the PFN fail. Then
kmemleak_no_scan() can't find the allocation and prints warning.

Given that kmemleak_alloc_phys() is ignoring the allocation to begin
with, there's no need to call kmemleak_no_scan() at all, which avoids
the warning.

Reported-by: Erhard Furtner 
Closes: 
https://lore.kernel.org/all/bug-216156-206...@https.bugzilla.kernel.org%2F/
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/sysdev/dart_iommu.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index 98096bbfd62e..c0d10c149661 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -24,7 +24,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -243,9 +242,6 @@ static void __init allocate_dart(void)
if (!dart_tablebase)
panic("Failed to allocate 16MB below 2GB for DART table\n");
 
-   /* There is no point scanning the DART space for leaks*/
-   kmemleak_no_scan((void *)dart_tablebase);
-
/* Allocate a spare page to map all invalid DART pages. We need to do
 * that to work around what looks like a problem with the HT bridge
 * prefetching into invalid pages and corrupting data
-- 
2.44.0



Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-19 Thread Will Deacon
On Thu, Apr 18, 2024 at 12:53:26PM -0700, Sean Christopherson wrote:
> On Thu, Apr 18, 2024, Will Deacon wrote:
> > On Mon, Apr 15, 2024 at 10:03:51AM -0700, Sean Christopherson wrote:
> > > On Sat, Apr 13, 2024, Marc Zyngier wrote:
> > > > On Fri, 12 Apr 2024 15:54:22 +0100, Sean Christopherson 
> > > >  wrote:
> > > > > 
> > > > > On Fri, Apr 12, 2024, Marc Zyngier wrote:
> > > > > > On Fri, 12 Apr 2024 11:44:09 +0100, Will Deacon  
> > > > > > wrote:
> > > > > > > On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> > > > > > > Also, if you're in the business of hacking the MMU notifier code, 
> > > > > > > it
> > > > > > > would be really great to change the .clear_flush_young() callback 
> > > > > > > so
> > > > > > > that the architecture could handle the TLB invalidation. At the 
> > > > > > > moment,
> > > > > > > the core KVM code invalidates the whole VMID courtesy of 
> > > > > > > 'flush_on_ret'
> > > > > > > being set by kvm_handle_hva_range(), whereas we could do a much
> > > > > > > lighter-weight and targetted TLBI in the architecture page-table 
> > > > > > > code
> > > > > > > when we actually update the ptes for small ranges.
> > > > > > 
> > > > > > Indeed, and I was looking at this earlier this week as it has a 
> > > > > > pretty
> > > > > > devastating effect with NV (it blows the shadow S2 for that VMID, 
> > > > > > with
> > > > > > costly consequences).
> > > > > > 
> > > > > > In general, it feels like the TLB invalidation should stay with the
> > > > > > code that deals with the page tables, as it has a pretty good idea 
> > > > > > of
> > > > > > what needs to be invalidated and how -- specially on architectures
> > > > > > that have a HW-broadcast facility like arm64.
> > > > > 
> > > > > Would this be roughly on par with an in-line flush on arm64?  The 
> > > > > simpler, more
> > > > > straightforward solution would be to let architectures override 
> > > > > flush_on_ret,
> > > > > but I would prefer something like the below as x86 can also utilize a 
> > > > > range-based
> > > > > flush when running as a nested hypervisor.
> > > 
> > > ...
> > > 
> > > > I think this works for us on HW that has range invalidation, which
> > > > would already be a positive move.
> > > > 
> > > > For the lesser HW that isn't range capable, it also gives the
> > > > opportunity to perform the iteration ourselves or go for the nuclear
> > > > option if the range is larger than some arbitrary constant (though
> > > > this is additional work).
> > > > 
> > > > But this still considers the whole range as being affected by
> > > > range->handler(). It'd be interesting to try and see whether more
> > > > precise tracking is (or isn't) generally beneficial.
> > > 
> > > I assume the idea would be to let arch code do single-page invalidations 
> > > of
> > > stage-2 entries for each gfn?
> > 
> > Right, as it's the only code which knows which ptes actually ended up
> > being aged.
> > 
> > > Unless I'm having a brain fart, x86 can't make use of that functionality. 
> > >  Intel
> > > doesn't provide any way to do targeted invalidation of stage-2 mappings.  
> > > AMD
> > > provides an instruction to do broadcast invalidations, but it takes a 
> > > virtual
> > > address, i.e. a stage-1 address.  I can't tell if it's a host virtual 
> > > address or
> > > a guest virtual address, but it's a moot point because KVM doen't have 
> > > the guest
> > > virtual address, and if it's a host virtual address, there would need to 
> > > be valid
> > > mappings in the host page tables for it to work, which KVM can't 
> > > guarantee.
> > 
> > Ah, so it sounds like it would need to be an arch opt-in then.
> 
> Even if x86 (or some other arch code) could use the precise tracking, I think 
> it
> would make sense to have the behavior be arch specific.  Adding infrastructure
> to get information from arch code, only to turn around and give it back to 
> arch
> code would be odd.

Sorry, yes, that's what I had in mind. Basically, a way for the arch code
to say "I've handled the TLBI, don't worry about it."

> Unless arm64 can't do the invalidation immediately after aging the stage-2 
> PTE,
> the best/easiest solution would be to let arm64 opt out of the common TLB 
> flush
> when a SPTE is made young.
> 
> With the range-based flushing bundled in, this?
> 
> ---
>  include/linux/kvm_host.h |  2 ++
>  virt/kvm/kvm_main.c  | 40 +---
>  2 files changed, 27 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index afbc99264ffa..8fe5f5e16919 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2010,6 +2010,8 @@ extern const struct kvm_stats_header 
> kvm_vcpu_stats_header;
>  extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[];
>  
>  #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
> +int kvm_arch_flush_tlb_if_young(void);
> +
>  static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned 

Re: [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE

2024-04-19 Thread Michal Suchánek
Hello,

On Fri, Apr 19, 2024 at 04:12:46PM +1000, Michael Ellerman wrote:
> Gaurav Batra  writes:
> > At the time of LPAR reboot, partition firmware provides Open Firmware
> > property ibm,dma-window for the PE. This property is provided on the PCI
> > bus the PE is attached to.
> 
> AFAICS you're actually describing a bug that happens during boot *up*?
> 
> Describing it as "reboot" makes me think you're talking about the
> shutdown path. I think that will confuse people, me at least :)

there is probably an assumption that it must have been running
previously for the errors to happen in the first place but given the
error state persists for a day it may be a very long 'reboot'.

Thanks

Michal
> 
> cheers
> 
> > There are execptions where the partition firmware might not provide this
> > property for the PE at the time of LPAR reboot. One of the scenario is
> > where the firmware has frozen the PE due to some error conditions. This
> > PE is frozen for 24 hours or unless the whole system is reinitialized.
> >
> > Within this time frame, if the LPAR is rebooted, the frozen PE will be
> > presented to the LPAR but ibm,dma-window property could be missing.
> >
> > Today, under these circumstances, the LPAR oopses with NULL pointer
> > dereference, when configuring the PCI bus the PE is attached to.
> >
> > BUG: Kernel NULL pointer dereference on read at 0x00c8
> > Faulting instruction address: 0xc01024c0
> > Oops: Kernel access of bad area, sig: 7 [#1]
> > LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> > Modules linked in:
> > Supported: Yes
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1
> > Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf06 
> > of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries
> > NIP:  c01024c0 LR: c01024b0 CTR: c0102450
> > REGS: c37db5c0 TRAP: 0300   Not tainted  (6.4.0-150600.9-default)
> > MSR:  82009033   CR: 28000822  XER: 
> > 
> > CFAR: c010254c DAR: 00c8 DSISR: 0008 IRQMASK: 0
> > ...
> > NIP [c01024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0
> > LR [c01024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0
> > Call Trace:
> > pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable)
> > pcibios_setup_bus_self+0x1c0/0x370
> > __of_scan_bus+0x2f8/0x330
> > pcibios_scan_phb+0x280/0x3d0
> > pcibios_init+0x88/0x12c
> > do_one_initcall+0x60/0x320
> > kernel_init_freeable+0x344/0x3e4
> > kernel_init+0x34/0x1d0
> > ret_from_kernel_user_thread+0x14/0x1c
> >
> > Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of 
> > ibm,dma-window")
> > Signed-off-by: Gaurav Batra 
> > ---
> >  arch/powerpc/platforms/pseries/iommu.c | 8 
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> > b/arch/powerpc/platforms/pseries/iommu.c
> > index e8c4129697b1..e808d5b1fa49 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -786,8 +786,16 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus 
> > *bus)
> >  * parent bus. During reboot, there will be ibm,dma-window property to
> >  * define DMA window. For kdump, there will at least be default window 
> > or DDW
> >  * or both.
> > +* There is an exception to the above. In case the PE goes into frozen
> > +* state, firmware may not provide ibm,dma-window property at the time
> > +* of LPAR reboot.
> >  */
> >  
> > +   if (!pdn) {
> > +   pr_debug("  no ibm,dma-window property !\n");
> > +   return;
> > +   }
> > +
> > ppci = PCI_DN(pdn);
> >  
> > pr_debug("  parent is %pOF, iommu_table: 0x%p\n",
> >
> > base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702
> > -- 
> > 2.39.3 (Apple Git-146)


[Bug 216715] kernel 6.1-rc5 + KASAN_OUTLINE fails to boot at very early stage when DEBUG_PAGEALLOC_ENABLE_DEFAULT is enabled (PowerMac G4 3,6)

2024-04-19 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=216715

Erhard F. (erhar...@mailbox.org) changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |OBSOLETE

--- Comment #9 from Erhard F. (erhar...@mailbox.org) ---
Cannot reproduce on kernels v6.8.x and current v6.9-rc4, outline + inline KASAN
work as intended.

So closing here.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 216156] [bisected] kmemleak: Not scanning unknown object at 0xc00000007f000000

2024-04-19 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=216156

--- Comment #17 from Erhard F. (erhar...@mailbox.org) ---
Just noticed the issue is still there in v6.9-rc4. Does some side effects
prevented the patch to get upstream or was it overlooked? ;)

[..]
Page orders: linear mapping = 12, virtual = 12, io = 12
hash-mmu: Initializing hash mmu with SLB
Linux version 6.9.0-rc4-PMacG5-dirty (root@T1000) (gcc (Gentoo 13.2.1_p20240210
p14) 13.2.1 20240210, GNU ld (Gentoo 2.42 p3) 2.42.0) #1 SMP Fri Apr 19
01:23:36 CEST 2024
ioremap() called early from pmac_feature_init+0x5e8/0x12d8. Use early_ioremap()
instead
ioremap() called early from pmac_feature_init+0x1040/0x12d8. Use
early_ioremap() instead
Found U4 memory controller & host bridge @ 0xf800 revision: 0x42
Mapped at 0xc0003e008000
ioremap() called early from probe_one_macio+0x3e4/0x6cc. Use early_ioremap()
instead
Found a Shasta mac-io controller, rev: 0, mapped at 0x(ptrval)
PowerMac motherboard: PowerMac G5 Dual Core
ioremap() called early from btext_map+0x6c/0xf0. Use early_ioremap() instead
ioremap() called early from iommu_init_early_dart+0x174/0xac4. Use
early_ioremap() instead
kmemleak: Not scanning unknown object at 0xc0007f00
CPU: 0 PID: 0 Comm: swapper Not tainted 6.9.0-rc4-PMacG5-dirty #1
Call Trace:
[c22f3c50] [c0d6524c] dump_stack_lvl+0x84/0x10c (unreliable)
[c22f3c80] [c0379118] kmemleak_no_scan+0x118/0x12c
[c22f3cf0] [c2029110] iommu_init_early_dart+0x1e4/0xac4
[c22f3e00] [c202a3d8] pmac_probe+0x158/0x290
[c22f3e60] [c2011608] setup_arch+0x2b0/0xa18
[c22f3f30] [c200613c] start_kernel+0x98/0x81c
[c22f3fe0] [c000c848] start_here_common+0x1c/0x20
DART table allocated at: (ptrval)
DART IOMMU initialized for U4 type chipset
Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac
printk: legacy bootconsole [udbg0] enabled
CPU maps initialized for 1 thread per core
 (thread shift is 0)
Allocated 1040 bytes for 2 pacas
-
phys_mem_size = 0x4
dcache_bsize  = 0x80
icache_bsize  = 0x80
cpu_features  = 0x0100900c218a
  possible= 0x001ffbebfbffb18f
  always  = 0x0180
cpu_user_features = 0xdc08 0x
mmu_features  = 0x0c008001
firmware_features = 0x
vmalloc start = 0xc0003d00
IO start  = 0xc0003e00
vmemmap start = 0xc0003f00
hash-mmu: ppc64_pft_size= 0x0
hash-mmu: htab_hash_mask= 0x1f
-

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH] i2c: mpc: Removal of of_node_put with __free for auto cleanup

2024-04-19 Thread Michael Ellerman
Chris Packham  writes:
> On 16/04/24 08:54, Andi Shyti wrote:
>> Hi Abhinav,
>>
>>> /* Enable I2C interrupts for mpc5121 */
>>> -   node_ctrl = of_find_compatible_node(NULL, NULL,
>>> -   "fsl,mpc5121-i2c-ctrl");
>>> +   struct device_node *node_ctrl __free(device_node) =
>> How have you tested this?
>
> I'm not sure I know anyone that still has a mpc5121. Maybe someone on 
> linuxppc-dev?

Not that I know of.

The driver loads on my T4240, but doesn't hit the right paths to test
this patch.

No objection to it being merged though.

cheers


Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-19 Thread Mike Rapoport
On Thu, Apr 18, 2024 at 02:01:22PM -0700, Song Liu wrote:
> On Thu, Apr 18, 2024 at 10:54 AM Mike Rapoport  wrote:
> >
> > On Thu, Apr 18, 2024 at 09:13:27AM -0700, Song Liu wrote:
> > > On Thu, Apr 18, 2024 at 8:37 AM Mike Rapoport  wrote:
> > > > > >
> > > > > > I'm looking at execmem_types more as definition of the consumers, 
> > > > > > maybe I
> > > > > > should have named the enum execmem_consumer at the first place.
> > > > >
> > > > > I think looking at execmem_type from consumers' point of view adds
> > > > > unnecessary complexity. IIUC, for most (if not all) archs, ftrace, 
> > > > > kprobe,
> > > > > and bpf (and maybe also module text) all have the same requirements.
> > > > > Did I miss something?
> > > >
> > > > It's enough to have one architecture with different constrains for 
> > > > kprobes
> > > > and bpf to warrant a type for each.
> > >
> > > AFAICT, some of these constraints can be changed without too much work.
> >
> > But why?
> > I honestly don't understand what are you trying to optimize here. A few
> > lines of initialization in execmem_info?
> 
> IIUC, having separate EXECMEM_BPF and EXECMEM_KPROBE makes it
> harder for bpf and kprobe to share the same ROX page. In many use cases,
> a 2MiB page (assuming x86_64) is enough for all BPF, kprobe, ftrace, and
> module text. It is not efficient if we have to allocate separate pages for 
> each
> of these use cases. If this is not a problem, the current approach works.

The caching of large ROX pages does not need to be per type. 

In the POC I've posted for caching of large ROX pages on x86 [1], the cache is
global and to make kprobes and bpf use it it's enough to set a flag in
execmem_info.

[1] https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org

> Thanks,
> Song

-- 
Sincerely yours,
Mike.


Re: [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE

2024-04-19 Thread Michael Ellerman
Gaurav Batra  writes:
> At the time of LPAR reboot, partition firmware provides Open Firmware
> property ibm,dma-window for the PE. This property is provided on the PCI
> bus the PE is attached to.

AFAICS you're actually describing a bug that happens during boot *up*?

Describing it as "reboot" makes me think you're talking about the
shutdown path. I think that will confuse people, me at least :)

cheers

> There are execptions where the partition firmware might not provide this
> property for the PE at the time of LPAR reboot. One of the scenario is
> where the firmware has frozen the PE due to some error conditions. This
> PE is frozen for 24 hours or unless the whole system is reinitialized.
>
> Within this time frame, if the LPAR is rebooted, the frozen PE will be
> presented to the LPAR but ibm,dma-window property could be missing.
>
> Today, under these circumstances, the LPAR oopses with NULL pointer
> dereference, when configuring the PCI bus the PE is attached to.
>
> BUG: Kernel NULL pointer dereference on read at 0x00c8
> Faulting instruction address: 0xc01024c0
> Oops: Kernel access of bad area, sig: 7 [#1]
> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> Modules linked in:
> Supported: Yes
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1
> Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1060.00 
> (NM1060_023) hv:phyp pSeries
> NIP:  c01024c0 LR: c01024b0 CTR: c0102450
> REGS: c37db5c0 TRAP: 0300   Not tainted  (6.4.0-150600.9-default)
> MSR:  82009033   CR: 28000822  XER: 
> CFAR: c010254c DAR: 00c8 DSISR: 0008 IRQMASK: 0
> ...
> NIP [c01024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0
> LR [c01024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0
> Call Trace:
>   pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable)
>   pcibios_setup_bus_self+0x1c0/0x370
>   __of_scan_bus+0x2f8/0x330
>   pcibios_scan_phb+0x280/0x3d0
>   pcibios_init+0x88/0x12c
>   do_one_initcall+0x60/0x320
>   kernel_init_freeable+0x344/0x3e4
>   kernel_init+0x34/0x1d0
>   ret_from_kernel_user_thread+0x14/0x1c
>
> Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of 
> ibm,dma-window")
> Signed-off-by: Gaurav Batra 
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index e8c4129697b1..e808d5b1fa49 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -786,8 +786,16 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus 
> *bus)
>* parent bus. During reboot, there will be ibm,dma-window property to
>* define DMA window. For kdump, there will at least be default window 
> or DDW
>* or both.
> +  * There is an exception to the above. In case the PE goes into frozen
> +  * state, firmware may not provide ibm,dma-window property at the time
> +  * of LPAR reboot.
>*/
>  
> + if (!pdn) {
> + pr_debug("  no ibm,dma-window property !\n");
> + return;
> + }
> +
>   ppci = PCI_DN(pdn);
>  
>   pr_debug("  parent is %pOF, iommu_table: 0x%p\n",
>
> base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702
> -- 
> 2.39.3 (Apple Git-146)