[GIT PULL] Please pull powerpc/linux.git powerpc-6.10-3 tag

2024-06-22 Thread Michael Ellerman
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Linus,

Please pull some more powerpc fixes for 6.10:

The following changes since commit c3f38fa61af77b49866b006939479069cd451173:

  Linux 6.10-rc2 (2024-06-02 15:44:56 -0700)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-6.10-3

for you to fetch changes up to a986fa57fd81a1430e00b3c6cf8a325d6f894a63:

  KVM: PPC: Book3S HV: Prevent UAF in kvm_spapr_tce_attach_iommu_group() (2024-06-16 10:20:11 +1000)

- --
powerpc fixes for 6.10 #3

 - Prevent use-after-free in 64-bit KVM VFIO

 - Add generated Power8 crypto asm to .gitignore

Thanks to: Al Viro, Nathan Lynch.

- --
Michael Ellerman (1):
  KVM: PPC: Book3S HV: Prevent UAF in kvm_spapr_tce_attach_iommu_group()

Nathan Lynch (1):
  powerpc/crypto: Add generated P8 asm to .gitignore


 arch/powerpc/crypto/.gitignore   |  2 ++
 arch/powerpc/kvm/book3s_64_vio.c | 18 +++++++++++++-----
 2 files changed, 15 insertions(+), 5 deletions(-)
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmZ3gucACgkQUevqPMjh
pYC25xAAsSnTIzbZpHdg3Bdb0vqdgzQimS9TbXGm5Qnea6rZawXYcSEoLuUABWZY
ZkNDoezJoL9nks7JGO1lbTICUNFKleFSlBLye4WgRn7NQBlgFP2GYiH1hXJIlE0C
qNyqY8k9uhJQor6CYt4eygskVwjpdX6oeIcHcUNWQy7/p1jID48DFS7QHib0WWm+
wwuiPhdqBwlbJCQUO0zBmDgg8rAhPmeGSR1iHWJxk69CIcEOoXK8sxv1ZTKuE0YN
clOKeAPrlZ3dz2jDojcMUzckFxg9J/Wlozk+m4LVl4XVj5hV7TqBpT4BVJNoMk1i
qWV00bVg7sEWXQ9CGR71NKpdeE4pIeiN4EAEkW+nSmlJ0x9htadNychode+9cakb
E0U/fb/rB6T32UJsEUAFF2Dq8dG5wWXHPqn0rHh9v63tPvnteUisSFM9DN7Be9a4
UziItFANSmt3AK0uvBMgoYk8HM2USLb4WvigWdqtW9j6AGmO5NYPl1PgrLCDkFBA
Feevx5TAIs6GeGKrzbE5s9QHMAwtVhsN1g8lJgbCPZJfh9wcynIyrPI4K/Vu5J9A
tpNbRXGsfk/MCsNF6kgm+pAoseavXUwjsSNteFwq7eMSqZUgeG5a1hx/36b5mHdW
+YANpMmzS3Ae2HscZ6E8xTGxzfmmWmr6SczN0i0lBCR5cjUo7Cw=
=oWyE
-----END PGP SIGNATURE-----


Re: [PATCH] powerpc/pseries: Whitelist dtl slub object for copying to userspace

2024-06-21 Thread Michael Ellerman
Anjali K  writes:
> Hi Michael
>
> On 18/06/24 12:41, Michael Ellerman wrote:
>> I guess there isn't a kmem_cache_create_user_readonly() ?

> Thank you for your review.    
> 
> My understanding of the question is whether there's a way to whitelist a   
> region such that it can be copied to userspace, but not written to using   
> copy_from_user().     
 
Yes that's what I meant, and I pretty much worked that out from looking
at the implementation, but was hoping Kees would tell me it was there
somewhere, or implement it :)  Apologies for being cryptic.

> No, we don't have a function to whitelist only for copy_to_user() and not  
> copy_from_user().

Yep. I'll take this patch as-is, I think we've established that it's
pretty low risk to whitelist the whole cache.

cheers
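
To illustrate what "whitelisting the whole cache" means, here is a minimal
userspace sketch of a useroffset/usersize window check. It is a simplified
model, not the kernel's implementation: struct cache_window and
copy_allowed() are hypothetical names, and the 4096-byte object size in the
usage note is only an example value. Note that the check takes no copy
direction, which is the limitation discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the useroffset/usersize pair of a slab cache. */
struct cache_window {
	size_t object_size;	/* size of each object in the cache */
	size_t useroffset;	/* start of the region that may be copied */
	size_t usersize;	/* length of that region, 0 means nothing */
};

/* True if [offset, offset + len) lies entirely inside the window. */
static bool copy_allowed(const struct cache_window *w, size_t offset, size_t len)
{
	/* The window itself must fit inside the object. */
	assert(w->usersize <= w->object_size);
	assert(w->useroffset <= w->object_size - w->usersize);

	if (offset < w->useroffset)
		return false;
	if (len > w->usersize)
		return false;
	/* Written this way so that neither subtraction can underflow. */
	return offset - w->useroffset <= w->usersize - len;
}
```

With a window of { 4096, 0, 4096 } every in-bounds copy is accepted, while a
window whose usersize is 0 rejects any non-empty copy.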


Re: [PATCH v2 0/8] KVM: PPC: Book3S HV: Nested guest migration fixes

2024-06-20 Thread Michael Ellerman
On Wed, 05 Jun 2024 13:06:00 +, Shivaprasad G Bhat wrote:
> The series fixes the issues exposed by the kvm-unit-tests[1]
> sprs-migration test.
> 
> The SDAR, MMCR3 were seen to have some typo/refactoring bugs.
> The first two patches fix them.
> 
> The remaining patches take care of save-restoring the guest
> state elements for DEXCR, HASHKEYR and HASHPKEYR SPRs with PHYP
> during entry-exit. The KVM_PPC_REG entries for them are also missing;
> they are added for use by QEMU.
> 
> [...]

Applied to powerpc/topic/ppc-kvm.

[1/8] KVM: PPC: Book3S HV: Fix the set_one_reg for MMCR3
  https://git.kernel.org/powerpc/c/f9ca6a10be20479d526f27316cc32cfd1785ed39
[2/8] KVM: PPC: Book3S HV: Fix the get_one_reg of SDAR
  https://git.kernel.org/powerpc/c/009f6f42c67e9de737d6d3d199f92b21a8cb9622
[3/8] KVM: PPC: Book3S HV: Add one-reg interface for DEXCR register
  https://git.kernel.org/powerpc/c/1a1e6865f516696adcf6e94f286c7a0f84d78df3
[4/8] KVM: PPC: Book3S HV nestedv2: Keep nested guest DEXCR in sync
  https://git.kernel.org/powerpc/c/2d6be3ca3276ab30fb14f285d400461a718d45e7
[5/8] KVM: PPC: Book3S HV: Add one-reg interface for HASHKEYR register
  https://git.kernel.org/powerpc/c/e9eb790b25577a15d3f450ed585c59048e4e6c44
[6/8] KVM: PPC: Book3S HV nestedv2: Keep nested guest HASHKEYR in sync
  https://git.kernel.org/powerpc/c/1e97c1eb785fe2dc863c2bd570030d6fcf4b5e5b
[7/8] KVM: PPC: Book3S HV: Add one-reg interface for HASHPKEYR register
  https://git.kernel.org/powerpc/c/9a0d2f4995ddde3022c54e43f9ece4f71f76f6e8
[8/8] KVM: PPC: Book3S HV nestedv2: Keep nested guest HASHPKEYR in sync
  https://git.kernel.org/powerpc/c/0b65365f3fa95c2c5e2094739151a05cabb3c48a

cheers


Re: [PATCH v9] arch/powerpc/kvm: Add support for reading VPA counters for pseries guests

2024-06-20 Thread Michael Ellerman
On Mon, 20 May 2024 23:27:40 +0530, Gautam Menghani wrote:
> PAPR hypervisor has introduced three new counters in the VPA area of
> LPAR CPUs for KVM L2 guest (see [1] for terminology) observability - 2
> for context switches from host to guest and vice versa, and 1 counter
> for getting the total time spent inside the KVM guest. Add a tracepoint
> that enables reading the counters for use by ftrace/perf. Note that this
> tracepoint is only available for nestedv2 API (i.e, KVM on PowerVM).
> 
> [...]

Applied to powerpc/topic/ppc-kvm.

[1/1] arch/powerpc/kvm: Add support for reading VPA counters for pseries guests
  https://git.kernel.org/powerpc/c/e1f288d2f9c69bb8965db9fb99a19b58231a00dd

cheers


Re: [PATCH v2 0/2] Fix doorbell emulation for v2 API on PPC

2024-06-20 Thread Michael Ellerman
On Wed, 05 Jun 2024 17:09:08 +0530, Gautam Menghani wrote:
> Doorbell emulation for KVM on PAPR guests is broken as support for DPDES
> was not added in initial patch series [1].
> Add DPDES support and doorbell handling support for V2 API.
> 
> [1] lore.kernel.org/linuxppc-dev/20230914030600.16993-1-jniet...@gmail.com
> 
> Changes in v2:
> 1. Split DPDES support into its own patch
> 
> [...]

Applied to powerpc/topic/ppc-kvm.

[1/2] arch/powerpc/kvm: Add DPDES support in helper library for Guest state buffer
  https://git.kernel.org/powerpc/c/55dfb8bed6fe8bda390cc71cca878d11a9407099
[2/2] arch/powerpc/kvm: Fix doorbell emulation for v2 API
  https://git.kernel.org/powerpc/c/54ec2bd9e0173b75daf84675d07c56584f96564b

cheers


Re: [PATCH v2 0/2] powerpc: kexec fixes

2024-06-20 Thread Michael Ellerman
On Fri, 10 May 2024 15:52:33 +0530, Sourabh Jain wrote:
> Patch series fixes two kexec issues.
> 
> 01/02: Update extra size calculation for kexec FDT to avoid kexec load
> failure due to FDT_ERR_NOSPACE while including CPU nodes added post
> boot and reserved memory ranges.
> 
> 02/02: Fix update_cpus_node/core_64.c function to include missing device
> nodes under /cpus node with device_type != "cpu".
> 
> [...]

Applied to powerpc/next.

[1/2] powerpc/kexec_file: fix extra size calculation for kexec FDT
  https://git.kernel.org/powerpc/c/0d3ff067331ef84e7e7f49537d768881042ed5ba
[2/2] powerpc/kexec_file: fix cpus node update to FDT
  https://git.kernel.org/powerpc/c/932bed41217059638c78a75411b7893b121d2162

cheers


Re: [PATCH v2 0/1] powerpc/numa: Make cpu/memory less numa-node online

2024-06-20 Thread Michael Ellerman
On Fri, 17 May 2024 19:55:21 +0530, Nilay Shroff wrote:
> On a NUMA-aware system, we bring a numa-node online only if that node is
> attached to cpu/memory. However, it's possible that we have some PCI/IO
> device affinitized to a numa-node which is not currently online. In such a
> case we set the numa-node id of the corresponding PCI device to -1
> (NUMA_NO_NODE). Not assigning the correct numa-node id to a PCI device may
> impact the performance of that device. For instance, we have a multi
> controller NVMe disk where each controller of the disk is attached to a
> different PHB (PCI host bridge). Each of these PHBs has a numa-node id
> assigned during PCI enumeration. During PCI enumeration, if we find that
> the numa-node is not online then we set the numa-node id of the PHB to -1.
> If we create a shared namespace and attach it to a multi controller NVMe
> disk then that namespace can be accessed through each controller, and as
> each controller is connected to a different PHB, it's possible to access
> the same namespace using multiple PCI channels. While sending IO to a
> shared namespace, the NVMe driver calculates the optimal IO path using
> numa-node distance. However, if the numa-node id is not correctly assigned
> to the NVMe PCIe controller then the driver may calculate an incorrect
> NUMA distance and hence select a non-optimal path for sending IO. If
> this happens then we could observe degraded IO performance.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/numa: Online a node if PHB is attached.
  https://git.kernel.org/powerpc/c/11981816e3614156a1fe14a1e8e77094ea46c7d5

cheers


Re: [PATCH] powerpc/mm/drmem: Silence drmem_init() early return

2024-06-20 Thread Michael Ellerman
On Mon, 03 Jun 2024 14:31:32 -0500, Nathan Lynch wrote:
> It's not an error or noteworthy condition if the
> "ibm,dynamic-reconfiguration-memory" node isn't present.
> 
> Drop the needless message.
> 
> 

Applied to powerpc/next.

[1/1] powerpc/mm/drmem: Silence drmem_init() early return
  https://git.kernel.org/powerpc/c/11e6e6d8bf8f908468bac0447727e3f3923c8512

cheers


Re: [PATCH v6] powerpc/pseries/vas: Use usleep_range() to support HCALL delay

2024-06-20 Thread Michael Ellerman
On Mon, 15 Jan 2024 21:59:10 -0800, Haren Myneni wrote:
> The VAS allocate, modify and deallocate HCALLs return
> H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy
> delay and expect the OS to reissue the HCALL after that delay. But using
> msleep() will often sleep at least 20 msecs even though the
> hypervisor suggests the OS reissue these HCALLs after 1 or 10 msecs.
> 
> The open and close VAS window functions hold a mutex and then issue
> these HCALLs. So these operations can take longer than
> necessary when multiple threads issue open or close window APIs
> simultaneously, which especially affects performance when
> open/close is repeated for each compression request.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/pseries/vas: Use usleep_range() to support HCALL delay
  https://git.kernel.org/powerpc/c/43ac9f5cd457bb01930f87448ddaaae455f8a8cf

cheers
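
As a rough illustration of the idea in the quoted description (sleep for
about as long as the hypervisor asked, rather than a coarser msleep()), here
is a userspace sketch that maps a busy hint to a sleep range. The enum
values, struct sleep_range and both helpers are hypothetical, and the 10%
upper slack is an assumption made for the example; the real hint codes and
delays come from the kernel headers and the patch itself.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative hint codes only; not the kernel's numeric values. */
enum busy_hint { BUSY_NONE, BUSY_1_MSEC, BUSY_10_MSEC };

/* A [min, max] window in microseconds, in the style of usleep_range(). */
struct sleep_range {
	unsigned int min_us;
	unsigned int max_us;
};

/* Translate a hint into a sleep window; BUSY_NONE means no sleep. */
static struct sleep_range hint_to_range(enum busy_hint hint)
{
	struct sleep_range r = { 0, 0 };

	switch (hint) {
	case BUSY_1_MSEC:
		r.min_us = 1000;
		break;
	case BUSY_10_MSEC:
		r.min_us = 10000;
		break;
	case BUSY_NONE:
		break;
	}
	/* Assumed slack of 10% above the requested delay. */
	r.max_us = r.min_us + r.min_us / 10;
	assert(r.min_us <= r.max_us);
	return r;
}

/* Sum the minimum delay for a sequence of replies, up to the first success. */
static unsigned long total_min_delay_us(const enum busy_hint *replies, size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n && replies[i] != BUSY_NONE; i++)
		total += hint_to_range(replies[i]).min_us;
	return total;
}
```

For a reply sequence of one 10 msec hint followed by one 1 msec hint, the
minimum total delay in this model is 11 msecs, compared with at least
20 msecs per retry in the msleep() case described in the quoted text.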


Re: [PATCH] powerpc/pseries/iommu: Split Dynamic DMA Window to be used in Hybrid mode

2024-06-20 Thread Michael Ellerman
On Mon, 13 May 2024 20:46:08 -0500, Gaurav Batra wrote:
> Dynamic DMA Window (DDW) supports TCEs that are backed by 2MB page size.
> In most configurations, DDW is big enough to pre-map all of LPAR memory
> for IO. Pre-mapping of memory for DMA results in improvements in IO
> performance.
> 
> Persistent memory, vPMEM, can be assigned to an LPAR as well. vPMEM is not
> contiguous with LPAR memory and usually is assigned at high memory
> addresses.  This makes it impossible to pre-map both vPMEM and LPAR
> memory in the same DDW.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/pseries/iommu: Split Dynamic DMA Window to be used in Hybrid mode
  https://git.kernel.org/powerpc/c/ff5163bb7000a0254ffdd7b50cb6df43add94f33

cheers


Re: [PATCH v2] arch/powerpc: Remove unused cede related functions

2024-06-20 Thread Michael Ellerman
On Tue, 14 May 2024 18:54:55 +0530, Gautam Menghani wrote:
> Remove extended_cede_processor() and its helpers as
> extended_cede_processor() has no callers since
> commit 48f6e7f6d948("powerpc/pseries: remove cede offline state for CPUs")
> 
> 

Applied to powerpc/next.

[1/1] arch/powerpc: Remove unused cede related functions
  https://git.kernel.org/powerpc/c/214f33fcf656bf1be3f9f03d58fda067cdf7eecc

cheers


Re: (subset) [PATCH 0/6] defconfig: drop RT_GROUP_SCHED=y

2024-06-20 Thread Michael Ellerman
On Thu, 30 May 2024 19:19:48 +0800, Celeste Liu wrote:
> For cgroup v1, if turned on, and there's any cgroup in the "cpu" hierarchy it
> needs an RT budget assigned, otherwise the processes in it will not be able to
> get RT at all. The problem with RT group scheduling is that it requires the
> budget assigned but there's no way we could assign a default budget, since the
> values to assign are both upper and lower time limits, are absolute, and need
> to sum up to < 1 for each individual cgroup. That means we cannot really come
> up with values that would work by default in the general case.[1]
> 
> [...]

Patch 4 applied to powerpc/next.

[4/6] powerpc: defconfig: drop RT_GROUP_SCHED=y from ppc6xx_defconfig
  https://git.kernel.org/powerpc/c/2bac6caee94e25f59ee47e2d365d7e07465089ba

cheers


Re: [PATCH] powerpc/crypto: Add generated P8 asm to .gitignore

2024-06-20 Thread Michael Ellerman
On Mon, 03 Jun 2024 08:01:03 -0500, Nathan Lynch wrote:
> Looks like drivers/crypto/vmx/.gitignore should have been merged into
> arch/powerpc/crypto/.gitignore as part of commit
> 109303336a0c ("crypto: vmx - Move to arch/powerpc/crypto") so that all
> generated asm files are ignored.
> 
> 

Applied to powerpc/fixes.

[1/1] powerpc/crypto: Add generated P8 asm to .gitignore
  https://git.kernel.org/powerpc/c/2b85b7fb1376481f7d4c2cf92e5da942f06b2547

cheers


Re: [PATCH] KVM: PPC: Book3S HV: Prevent UAF in kvm_spapr_tce_attach_iommu_group()

2024-06-20 Thread Michael Ellerman
On Fri, 14 Jun 2024 22:29:10 +1000, Michael Ellerman wrote:
> Al reported a possible use-after-free (UAF) in kvm_spapr_tce_attach_iommu_group().
> 
> It looks up `stt` from tablefd, but then continues to use it after doing
> fdput() on the returned fd. After the fdput() the tablefd is free to be
> closed by another thread. The close calls kvm_spapr_tce_release() and
> then release_spapr_tce_table() (via call_rcu()) which frees `stt`.
> 
> [...]

Applied to powerpc/fixes.

[1/1] KVM: PPC: Book3S HV: Prevent UAF in kvm_spapr_tce_attach_iommu_group()
  https://git.kernel.org/powerpc/c/a986fa57fd81a1430e00b3c6cf8a325d6f894a63

cheers


Re: [PATCH AUTOSEL 6.9 18/23] powerpc: make fadump resilient with memory add/remove events

2024-06-19 Thread Michael Ellerman
Pavel Machek  writes:
>> Hello Sasha,
>> 
>> Thank you for considering this patch for the stable tree 6.9, 6.8, 6.6, and
>> 6.1.
>> 
>> This patch does two things:
>> 1. Fixes a potential memory corruption issue mentioned as the third point in
>> the commit message
>> 2. Enables the kernel to avoid unnecessary fadump re-registration on memory
>> add/remove events
>
> Actually, I'd suggest dropping this one, as it fixes two things and is
> over 200 lines long, as per stable kernel rules.

Yeah I agree, best to drop this one. It's a bit big and involved, and
has other dependencies.

cheers


Re: [PATCH] powerpc: Fixed duplicate copying in the early boot.

2024-06-18 Thread Michael Ellerman
Segher Boessenkool  writes:
> Hi!
>
> On Mon, Jun 17, 2024 at 10:35:09AM +0800, Jinglin Wen wrote:
>> +cmplwi  cr0,r4,0  /* runtime base addr is zero */
>
> Just write
>cmpwi r4,0
>
> cr0 is the default, also implicit in many other instructions, please
> don't clutter the source code.  All the extra stuff makes you miss the
> things that do matter!
>
> The "l" is unnecessary, you only care about equality here after all.

In my mind it's an unsigned comparison, so I'd use cmpld, even though as
you say all we actually care about is equality.

cheers
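
The distinctions being discussed (word versus doubleword, signed versus
unsigned) can be modelled in plain C. This is only a sketch of the
comparison semantics, not of the boot code: the helper names are
hypothetical, a 64-bit register is modelled as uint64_t, and whether the
upper 32 bits can ever be non-zero at that point in head_64.S is not
something this example claims.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models a word compare against 0: only the low 32 bits take part. */
static bool word_is_zero(uint64_t reg)
{
	return (uint32_t)reg == 0;
}

/* Models a doubleword compare against 0: all 64 bits take part. */
static bool dword_is_zero(uint64_t reg)
{
	return reg == 0;
}

/* Signed and unsigned "less than" disagree once the top bit is set. */
static bool signed_less(uint64_t a, uint64_t b)
{
	return (int64_t)a < (int64_t)b;
}

static bool unsigned_less(uint64_t a, uint64_t b)
{
	return a < b;
}

/* For equality the signed and unsigned views always agree. */
static bool is_equal(uint64_t a, uint64_t b)
{
	bool eq = (a == b);

	assert(eq == ((int64_t)a == (int64_t)b));
	return eq;
}
```

A value such as 0x100000000 is "zero" to the word compare but not to the
doubleword compare, and an address with the top bit set is "less than 0"
only in the signed view.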


Re: [PATCH] powerpc/pseries: Whitelist dtl slub object for copying to userspace

2024-06-18 Thread Michael Ellerman
Kees Cook  writes:
> On Fri, Jun 14, 2024 at 11:08:44PM +0530, Anjali K wrote:
>> Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-*
>> results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as
>> shown below.
>> 
>> kernel BUG at mm/usercopy.c:102!
>> Oops: Exception in kernel mode, sig: 5 [#1]
>> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
>> Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc
>> scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
>> CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 #85
>> Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries
>> NIP:  c05d23d4 LR: c05d23d0 CTR: 006ee6f8
>> REGS: c00120c078c0 TRAP: 0700   Not tainted  (6.10.0-rc3)
>> MSR:  80029033   CR: 2828220f  XER: 000e
>> CFAR: c01fdc80 IRQMASK: 0
>> [ ... GPRs omitted ... ]
>> NIP [c05d23d4] usercopy_abort+0x78/0xb0
>> LR [c05d23d0] usercopy_abort+0x74/0xb0
>> Call Trace:
>>  usercopy_abort+0x74/0xb0 (unreliable)
>>  __check_heap_object+0xf8/0x120
>>  check_heap_object+0x218/0x240
>>  __check_object_size+0x84/0x1a4
>>  dtl_file_read+0x17c/0x2c4
>>  full_proxy_read+0x8c/0x110
>>  vfs_read+0xdc/0x3a0
>>  ksys_read+0x84/0x144
>>  system_call_exception+0x124/0x330
>>  system_call_vectored_common+0x15c/0x2ec
>> --- interrupt: 3000 at 0x7fff81f3ab34
>> 
>> Commit 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0")
>> requires that only whitelisted areas in slab/slub objects can be copied to
>> userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY.
>> Dtl contains hypervisor dispatch events which are expected to be read by
>> privileged users. Hence mark this safe for user access.
>> Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the
>> entire object.
>> 
>> Co-developed-by: Vishal Chourasia 
>> Signed-off-by: Vishal Chourasia 
>> Signed-off-by: Anjali K 
>> ---
>>  arch/powerpc/platforms/pseries/setup.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
>> index 284a6fa04b0c..cba40d9d1284 100644
>> --- a/arch/powerpc/platforms/pseries/setup.c
>> +++ b/arch/powerpc/platforms/pseries/setup.c
>> @@ -343,8 +343,8 @@ static int alloc_dispatch_log_kmem_cache(void)
>>  {
>>  void (*ctor)(void *) = get_dtl_cache_ctor();
>>  
>> -dtl_cache = kmem_cache_create("dtl", DISPATCH_LOG_BYTES,
>> -DISPATCH_LOG_BYTES, 0, ctor);
>> +dtl_cache = kmem_cache_create_usercopy("dtl", DISPATCH_LOG_BYTES,
>> +DISPATCH_LOG_BYTES, 0, 0, DISPATCH_LOG_BYTES, ctor);
>>  if (!dtl_cache) {
>>  pr_warn("Failed to create dispatch trace log buffer cache\n");
>>  pr_warn("Stolen time statistics will be unreliable\n");
>
> Are you sure you want to universally expose this memory region? It
> sounds like it's only exposed via a debug interface. Maybe it'd be
> better to use a bounce buffer in the debug interface instead?

I'm not sure what the threat is?

The log entries are written by the hypervisor, but never read.

That kmem_cache is only used to allocate the array of dtl_entry structs,
the ring buffer itself (struct dtl) is allocated statically. So
overwriting the dtl_entries can't corrupt the structure of the ring
buffer, just the content.

An attacker could read the entries and see some kernel pointers, but
those are everywhere.

So it seems pretty harmless.

I guess there isn't a kmem_cache_create_user_readonly() ?

> diff --git a/arch/powerpc/platforms/pseries/dtl.c b/arch/powerpc/platforms/pseries/dtl.c
> index 3f1cdccebc9c..3adcff5cc4b2 100644
> --- a/arch/powerpc/platforms/pseries/dtl.c
> +++ b/arch/powerpc/platforms/pseries/dtl.c
> @@ -257,6 +257,22 @@ static int dtl_file_release(struct inode *inode, struct file *filp)
>   return 0;
>  }
>  
> +static inline int bounce_copy(char __user *buf, void *src, size_t size)
> +{
> + u8 *bounce;
> + int rc;
> +
> + bounce = kmalloc(size, GFP_KERNEL);
> + if (!bounce)
> + return -ENOMEM;
> +
> + memcpy(bounce, src, size);
> + rc = copy_to_user(buf, bounce, size);
> +
> + kfree(bounce);
> + return rc;
> +}

Is there no generic version of that?

> @@ -300,7 +316,7 @@ static ssize_t dtl_file_read(struct file *filp, char __user *buf, size_t len,
>   if (i + n_req > dtl->buf_entries) {
>   read_size = dtl->buf_entries - i;
>  
> - rc = copy_to_user(buf, &dtl->buf[i],
> + rc = bounce_copy(buf, &dtl->buf[i],
>   read_size * sizeof(struct dtl_entry));
> 

Re: [PATCH] Documentation: Remove the unused "topology_updates" from kernel-parameters.txt

2024-06-17 Thread Michael Ellerman
Thomas Huth  writes:
> The "topology_updates" switch was removed four years ago in commit
> c30f931e891e ("powerpc/numa: remove ability to enable topology updates"),
> so let's remove this from the documentation, too.
>
> Signed-off-by: Thomas Huth 
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 6 --
>  1 file changed, 6 deletions(-)

Oops, thanks for cleaning it up.

Acked-by: Michael Ellerman  (powerpc)

cheers

> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index f58001338860..b75852f1a789 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -6600,12 +6600,6 @@
>   e.g. base its process migration decisions on it.
>   Default is on.
>  
> - topology_updates= [KNL, PPC, NUMA]
> - Format: {off}
> - Specify if the kernel should ignore (off)
> - topology updates sent by the hypervisor to this
> - LPAR.
> -
>   torture.disable_onoff_at_boot= [KNL]
>   Prevent the CPU-hotplug component of torturing
>   until after init has spawned.
> -- 
> 2.45.2


Re: [PATCH] powerpc: Fixed duplicate copying in the early boot.

2024-06-17 Thread Michael Ellerman
Jinglin Wen  writes:
> According to the code logic, when the kernel is loaded to address 0,
> no copying operation should be performed, but it is currently being
> done.
>
> This patch fixes the issue where the kernel code was incorrectly
> duplicated to address 0 when booting from address 0.
>
> Signed-off-by: Jinglin Wen 
> ---
>  arch/powerpc/kernel/head_64.S | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Thanks for the improved change log.

The subject could probably still be clearer, maybe:
  Fix unnecessary copy to 0 when kernel is booted at address 0

Looks like this was introduced by:

  Fixes: b270bebd34e3 ("powerpc/64s: Run at the kernel virtual address earlier in boot")
  Cc: sta...@vger.kernel.org # v6.4+

Let me know if you think otherwise.

Just out of interest, how are you hitting this bug? AFAIK none of our
"normal" boot loaders will load the kernel at 0. 

> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
> index 4690c219bfa4..6c73551bdc50 100644
> --- a/arch/powerpc/kernel/head_64.S
> +++ b/arch/powerpc/kernel/head_64.S
> @@ -647,7 +647,9 @@ __after_prom_start:
>   * Note: This process overwrites the OF exception vectors.
>   */
>   LOAD_REG_IMMEDIATE(r3, PAGE_OFFSET)
> - mr. r4,r26  /* In some cases the loader may  */
> + tophys(r4,r26)
> + cmplwi  cr0,r4,0  /* runtime base addr is zero */
> + mr  r4,r26  /* In some cases the loader may */
>   beq 9f  /* have already put us at zero */

That is a pretty minimal fix, but I think the code would be clearer if
we just compared the source and destination addresses.

Something like the diff below. Can you confirm that works for you?

cheers

diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 4690c219bfa4..6ad1435303f9 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -647,8 +647,9 @@ __after_prom_start:
  * Note: This process overwrites the OF exception vectors.
  */
LOAD_REG_IMMEDIATE(r3, PAGE_OFFSET)
-   mr. r4,r26  /* In some cases the loader may  */
-   beq 9f  /* have already put us at zero */
+   mr  r4, r26 // Load the source address into r4
+   cmpld   cr0, r3, r4 // Check if source == dest
+   beq 9f  // If so skip the copy
li  r6,0x100    /* Start offset, the first 0x100 */
/* bytes were copied earlier. */
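
The logic of the suggested diff (compare source and destination, and skip
the copy when they are the same) can be sketched in userspace C. This is an
analogy only: relocate_image() and its parameters are hypothetical, with
start standing in for the 0x100 offset of bytes that were copied earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy an image of len bytes from src to dest, skipping the first
 * start bytes (assumed to have been copied already). Returns the
 * number of bytes copied, which is 0 when the image is already in place.
 */
static size_t relocate_image(void *dest, const void *src, size_t start, size_t len)
{
	assert(start <= len);

	/* Same address: the loader already put the image where it belongs. */
	if (dest == src)
		return 0;

	/* memmove() is used so overlapping regions are handled safely. */
	memmove((char *)dest + start, (const char *)src + start, len - start);
	return len - start;
}
```

Comparing the two addresses directly, rather than testing one of them
against zero, keeps the check correct wherever the destination happens to
be.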
 


[PATCH] KVM: PPC: Book3S HV: Prevent UAF in kvm_spapr_tce_attach_iommu_group()

2024-06-14 Thread Michael Ellerman
Al reported a possible use-after-free (UAF) in kvm_spapr_tce_attach_iommu_group().

It looks up `stt` from tablefd, but then continues to use it after doing
fdput() on the returned fd. After the fdput() the tablefd is free to be
closed by another thread. The close calls kvm_spapr_tce_release() and
then release_spapr_tce_table() (via call_rcu()) which frees `stt`.

Although there are calls to rcu_read_lock() in
kvm_spapr_tce_attach_iommu_group() they are not sufficient to prevent
the UAF, because `stt` is used outside the locked regions.

With an artificial delay after the fdput() and a userspace program which
triggers the race, KASAN detects the UAF:

  BUG: KASAN: slab-use-after-free in kvm_spapr_tce_attach_iommu_group+0x298/0x720 [kvm]
  Read of size 4 at addr c000200027552c30 by task kvm-vfio/2505
  CPU: 54 PID: 2505 Comm: kvm-vfio Not tainted 6.10.0-rc3-next-20240612-dirty #1
  Hardware name: 8335-GTH POWER9 0x4e1202 opal:skiboot-v6.5.3-35-g1851b2a06 PowerNV
  Call Trace:
dump_stack_lvl+0xb4/0x108 (unreliable)
print_report+0x2b4/0x6ec
kasan_report+0x118/0x2b0
__asan_load4+0xb8/0xd0
kvm_spapr_tce_attach_iommu_group+0x298/0x720 [kvm]
kvm_vfio_set_attr+0x524/0xac0 [kvm]
kvm_device_ioctl+0x144/0x240 [kvm]
sys_ioctl+0x62c/0x1810
system_call_exception+0x190/0x440
system_call_vectored_common+0x15c/0x2ec
  ...
  Freed by task 0:
   ...
   kfree+0xec/0x3e0
   release_spapr_tce_table+0xd4/0x11c [kvm]
   rcu_core+0x568/0x16a0
   handle_softirqs+0x23c/0x920
   do_softirq_own_stack+0x6c/0x90
   do_softirq_own_stack+0x58/0x90
   __irq_exit_rcu+0x218/0x2d0
   irq_exit+0x30/0x80
   arch_local_irq_restore+0x128/0x230
   arch_local_irq_enable+0x1c/0x30
   cpuidle_enter_state+0x134/0x5cc
   cpuidle_enter+0x6c/0xb0
   call_cpuidle+0x7c/0x100
   do_idle+0x394/0x410
   cpu_startup_entry+0x60/0x70
   start_secondary+0x3fc/0x410
   start_secondary_prolog+0x10/0x14

Fix it by delaying the fdput() until `stt` is no longer in use, which
is effectively the entire function. To keep the patch minimal add a call
to fdput() at each of the existing return paths. Future work can convert
the function to goto or __cleanup style cleanup.

With the fix in place the test case no longer triggers the UAF.

Reported-by: Al Viro 
Closes: https://lore.kernel.org/all/20240610024437.GA1464458@ZenIV/
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kvm/book3s_64_vio.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

I'll plan to merge this via the powerpc/fixes tree, unless anyone thinks otherwise.

cheers

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index b569ebaa590e..3ff3de9a52ac 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -130,14 +130,16 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
}
rcu_read_unlock();
 
-   fdput(f);
-
-   if (!found)
+   if (!found) {
+   fdput(f);
return -EINVAL;
+   }
 
table_group = iommu_group_get_iommudata(grp);
-   if (WARN_ON(!table_group))
+   if (WARN_ON(!table_group)) {
+   fdput(f);
return -EFAULT;
+   }
 
for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
struct iommu_table *tbltmp = table_group->tables[i];
@@ -158,8 +160,10 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
break;
}
}
-   if (!tbl)
+   if (!tbl) {
+   fdput(f);
return -EINVAL;
+   }
 
rcu_read_lock();
list_for_each_entry_rcu(stit, &stt->iommu_tables, next) {
@@ -170,6 +174,7 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
/* stit is being destroyed */
iommu_tce_table_put(tbl);
rcu_read_unlock();
+   fdput(f);
return -ENOTTY;
}
/*
@@ -177,6 +182,7 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 * its KVM reference counter and can return.
 */
rcu_read_unlock();
+   fdput(f);
return 0;
}
rcu_read_unlock();
@@ -184,6 +190,7 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
stit = kzalloc(sizeof(*stit), GFP_KERNEL);
if (!stit) {
iommu_tce_table_put(tbl);
+   fdput(f);
return -ENOMEM;
}
 
@@ -192,6 +199,7 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 
list_add_rcu(&stit->next, &stt->iommu_tables);
 
+   fdput(f);
return 0;
 }
 
-- 
2.45.1
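
The commit message above mentions goto-style cleanup as possible future
work. As a minimal userspace sketch of that pattern (not the kernel
function): the reference is taken once, held for as long as the object is
used, and dropped at a single exit label. struct table, table_get(),
table_put() and attach() are all hypothetical names for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for the table behind the fd. */
struct table {
	int refs;
	int window_size;
};

static struct table *table_get(struct table *t)
{
	if (t)
		t->refs++;
	return t;
}

static void table_put(struct table *t)
{
	assert(t->refs > 0);
	t->refs--;
}

/* Every path after table_get() leaves through out_put, so the put is never missed. */
static int attach(struct table *t, int requested)
{
	struct table *held = table_get(t);
	int ret;

	if (!held)
		return -EBADF;

	if (requested <= 0) {
		ret = -EINVAL;
		goto out_put;
	}
	/* The object is still safe to read here because the reference is held. */
	if (requested > held->window_size) {
		ret = -ERANGE;
		goto out_put;
	}
	ret = 0;

out_put:
	table_put(held);
	return ret;
}
```

Compared with adding a put at each return, as the minimal patch does with
fdput(), a single exit label makes it harder to add a new return path that
forgets to drop the reference.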



Re: Please backport 2d43cc701b96 to v6.9 and v6.6

2024-06-14 Thread Michael Ellerman
Greg KH  writes:
> On Fri, Jun 14, 2024 at 05:54:50PM +1000, Michael Ellerman wrote:
>> Hi stable team,
>> 
>> Can you please backport:
>>   2d43cc701b96 ("powerpc/uaccess: Fix build errors seen with GCC 13/14")
>> 
>> To v6.9 and v6.6.
>> 
>> It was marked for backporting, but hasn't been picked up AFAICS. I'm not
>> sure if it clashed with the asm_goto_output changes or something. But it
>> backports cleanly to the current stable branches.
>
> It's still in my "to get to queue" along with about 150+ other patches
> that were tagged for stable inclusion.  It's in good company, I'll get
> to it after this current round of -rc releases is out.

Thanks.

I also just sent three backports for that commit for v5.10, v5.15 and v6.1.

cheers


[PATCH v6.1] powerpc/uaccess: Fix build errors seen with GCC 13/14

2024-06-14 Thread Michael Ellerman
commit 2d43cc701b96f910f50915ac4c2a0cae5deb734c upstream.

Building ppc64le_defconfig with GCC 14 fails with assembler errors:

CC  fs/readdir.o
  /tmp/ccdQn0mD.s: Assembler messages:
  /tmp/ccdQn0mD.s:212: Error: operand out of domain (18 is not a multiple of 4)
  /tmp/ccdQn0mD.s:226: Error: operand out of domain (18 is not a multiple of 4)
  ... [6 lines]
  /tmp/ccdQn0mD.s:1699: Error: operand out of domain (18 is not a multiple of 4)

A snippet of the asm shows:

  # ../fs/readdir.c:210: unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end);
 ld 9,0(29)   # MEM[(u64 *)name_38(D) + _88 * 1], MEM[(u64 *)name_38(D) + _88 * 1]
  # 210 "../fs/readdir.c" 1
 1:  std 9,18(8) # put_user   # *__pus_addr_52, MEM[(u64 *)name_38(D) + _88 * 1]

The 'std' instruction requires a 4-byte aligned displacement because
it is a DS-form instruction, and as the assembler says, 18 is not a
multiple of 4.

A similar error is seen with GCC 13 and CONFIG_UBSAN_SIGNED_WRAP=y.

The fix is to change the constraint on the memory operand to put_user(),
from "m" which is a general memory reference to "YZ".

The "Z" constraint is documented in the GCC manual PowerPC machine
constraints, and specifies a "memory operand accessed with indexed or
indirect addressing". "Y" is not documented in the manual but specifies
a "memory operand for a DS-form instruction". Using both allows the
compiler to generate a DS-form "std" or X-form "stdx" as appropriate.

Unfortunately clang doesn't support the "Y" constraint so that has to be
behind an ifdef.

Although the build error is only seen with GCC 13/14, that appears
to just be luck. The constraint has been incorrect since it was first
added.

Fixes: c20beffeec3c ("powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()")
Suggested-by: Kewen Lin 
[mpe: Drop CONFIG_PPC_KERNEL_PREFIXED ifdef for backport]
Signed-off-by: Michael Ellerman 
Link: https://msgid.link/20240529123029.146953-1-...@ellerman.id.au
---
 arch/powerpc/include/asm/uaccess.h | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 45d4c9cf3f3a..661046150e49 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -80,9 +80,20 @@ __pu_failed:   \
:   \
: label)
 
+#ifdef CONFIG_CC_IS_CLANG
+#define DS_FORM_CONSTRAINT "Z<>"
+#else
+#define DS_FORM_CONSTRAINT "YZ<>"
+#endif
+
 #ifdef __powerpc64__
-#define __put_user_asm2_goto(x, ptr, label)\
-   __put_user_asm_goto(x, ptr, label, "std")
+#define __put_user_asm2_goto(x, addr, label)   \
+   asm goto ("1: std%U1%X1 %0,%1   # put_user\n"   \
+   EX_TABLE(1b, %l2)   \
+   :   \
+   : "r" (x), DS_FORM_CONSTRAINT (*addr)   \
+   :   \
+   : label)
 #else /* __powerpc64__ */
 #define __put_user_asm2_goto(x, addr, label)   \
asm goto(   \
-- 
2.45.1
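
The alignment rule described in the commit message can be written down as a
small check. This is a sketch of the encoding constraint only, not of how
GCC evaluates the "Y" and "Z" constraints: the helper names and the enum
are hypothetical. It assumes the usual signed 16-bit displacement range,
with DS-form additionally requiring the low two bits to be zero.

```c
#include <assert.h>
#include <stdbool.h>

/* C99 remainder keeps the sign of the dividend, so the test below is safe. */
static_assert(-2 % 4 != 0, "negative unaligned displacements must be rejected");

enum store_form { FORM_DS, FORM_X };

/* A D-form displacement is any signed 16-bit value. */
static bool fits_d_form(long disp)
{
	return disp >= -32768 && disp <= 32767;
}

/* A DS-form displacement must also be a multiple of 4. */
static bool fits_ds_form(long disp)
{
	return fits_d_form(disp) && disp % 4 == 0;
}

/* Use the displacement form when it is encodable, else fall back to indexed. */
static enum store_form pick_form(long disp)
{
	return fits_ds_form(disp) ? FORM_DS : FORM_X;
}
```

A displacement of 18, as in the failing "std 9,18(8)" above, passes the
D-form range check but fails the DS-form check, so in this model the store
would have to use the indexed form instead.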



[PATCH v5.15] powerpc/uaccess: Fix build errors seen with GCC 13/14

2024-06-14 Thread Michael Ellerman
commit 2d43cc701b96f910f50915ac4c2a0cae5deb734c upstream.

Building ppc64le_defconfig with GCC 14 fails with assembler errors:

CC  fs/readdir.o
  /tmp/ccdQn0mD.s: Assembler messages:
  /tmp/ccdQn0mD.s:212: Error: operand out of domain (18 is not a multiple of 4)
  /tmp/ccdQn0mD.s:226: Error: operand out of domain (18 is not a multiple of 4)
  ... [6 lines]
  /tmp/ccdQn0mD.s:1699: Error: operand out of domain (18 is not a multiple of 4)

A snippet of the asm shows:

  # ../fs/readdir.c:210: unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end);
 ld 9,0(29)   # MEM[(u64 *)name_38(D) + _88 * 1], MEM[(u64 *)name_38(D) + _88 * 1]
  # 210 "../fs/readdir.c" 1
 1:  std 9,18(8) # put_user   # *__pus_addr_52, MEM[(u64 *)name_38(D) + _88 * 1]

The 'std' instruction requires a 4-byte aligned displacement because
it is a DS-form instruction, and as the assembler says, 18 is not a
multiple of 4.

A similar error is seen with GCC 13 and CONFIG_UBSAN_SIGNED_WRAP=y.

The fix is to change the constraint on the memory operand to put_user(),
from "m" which is a general memory reference to "YZ".

The "Z" constraint is documented in the GCC manual PowerPC machine
constraints, and specifies a "memory operand accessed with indexed or
indirect addressing". "Y" is not documented in the manual but specifies
a "memory operand for a DS-form instruction". Using both allows the
compiler to generate a DS-form "std" or X-form "stdx" as appropriate.
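
To make the D-form versus X-form choice concrete, here is a toy model of
the decision the compiler is free to make once the operand is constrained
this way. This is only a sketch: pick_store_form() and the enum are
invented for this example, it assumes a DS-form displacement is a signed
16-bit value that is a multiple of 4, and real instruction selection in
GCC is more involved.

```c
#include <stdbool.h>

/* Invented for this example: which encoding a 64-bit store could use. */
enum store_form {
	STORE_DS_FORM,	/* std  rS, disp(rA): disp must be a multiple of 4 */
	STORE_X_FORM,	/* stdx rS, rA, rB: the offset lives in a register */
};

/* Assumption: DS-form takes a signed 16-bit displacement, low two bits zero. */
static enum store_form pick_store_form(long disp)
{
	bool fits = disp >= -32768 && disp <= 32767;
	bool aligned = (disp & 3) == 0;

	/* An unaligned or out-of-range offset has to go through a register. */
	return (fits && aligned) ? STORE_DS_FORM : STORE_X_FORM;
}
```

In this model the displacement 18 from the failing build maps to the
X-form, while a displacement of 16 can stay in the DS-form.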

Unfortunately clang doesn't support the "Y" constraint so that has to be
behind an ifdef.

Although the build error is only seen with GCC 13/14, that appears
to just be luck. The constraint has been incorrect since it was first
added.

Fixes: c20beffeec3c ("powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()")
Suggested-by: Kewen Lin 
[mpe: Drop CONFIG_PPC_KERNEL_PREFIXED ifdef for backport]
Signed-off-by: Michael Ellerman 
Link: https://msgid.link/20240529123029.146953-1-...@ellerman.id.au
---
 arch/powerpc/include/asm/uaccess.h | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index b2680070d65d..6013a7fc74ba 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -90,9 +90,20 @@ __pu_failed: 
\
:   \
: label)
 
+#ifdef CONFIG_CC_IS_CLANG
+#define DS_FORM_CONSTRAINT "Z<>"
+#else
+#define DS_FORM_CONSTRAINT "YZ<>"
+#endif
+
 #ifdef __powerpc64__
-#define __put_user_asm2_goto(x, ptr, label)\
-   __put_user_asm_goto(x, ptr, label, "std")
+#define __put_user_asm2_goto(x, addr, label)   \
+   asm goto ("1: std%U1%X1 %0,%1   # put_user\n"   \
+   EX_TABLE(1b, %l2)   \
+   :   \
+   : "r" (x), DS_FORM_CONSTRAINT (*addr)   \
+   :   \
+   : label)
 #else /* __powerpc64__ */
 #define __put_user_asm2_goto(x, addr, label)   \
asm_volatile_goto(  \
-- 
2.45.1



[PATCH v5.10] powerpc/uaccess: Fix build errors seen with GCC 13/14

2024-06-14 Thread Michael Ellerman
commit 2d43cc701b96f910f50915ac4c2a0cae5deb734c upstream.

Building ppc64le_defconfig with GCC 14 fails with assembler errors:

CC  fs/readdir.o
  /tmp/ccdQn0mD.s: Assembler messages:
  /tmp/ccdQn0mD.s:212: Error: operand out of domain (18 is not a multiple of 4)
  /tmp/ccdQn0mD.s:226: Error: operand out of domain (18 is not a multiple of 4)
  ... [6 lines]
  /tmp/ccdQn0mD.s:1699: Error: operand out of domain (18 is not a multiple of 4)

A snippet of the asm shows:

  # ../fs/readdir.c:210: unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end);
 ld 9,0(29)   # MEM[(u64 *)name_38(D) + _88 * 1], MEM[(u64 *)name_38(D) + _88 * 1]
  # 210 "../fs/readdir.c" 1
 1:  std 9,18(8) # put_user   # *__pus_addr_52, MEM[(u64 *)name_38(D) + _88 * 1]

The 'std' instruction requires a 4-byte aligned displacement because
it is a DS-form instruction, and as the assembler says, 18 is not a
multiple of 4.

A similar error is seen with GCC 13 and CONFIG_UBSAN_SIGNED_WRAP=y.

The fix is to change the constraint on the memory operand to put_user(),
from "m" which is a general memory reference to "YZ".

The "Z" constraint is documented in the GCC manual PowerPC machine
constraints, and specifies a "memory operand accessed with indexed or
indirect addressing". "Y" is not documented in the manual but specifies
a "memory operand for a DS-form instruction". Using both allows the
compiler to generate a DS-form "std" or X-form "stdx" as appropriate.

Unfortunately clang doesn't support the "Y" constraint so that has to be
behind an ifdef.

Although the build error is only seen with GCC 13/14, that appears
to just be luck. The constraint has been incorrect since it was first
added.

Fixes: c20beffeec3c ("powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()")
Suggested-by: Kewen Lin 
[mpe: Drop CONFIG_PPC_KERNEL_PREFIXED ifdef for backport]
Signed-off-by: Michael Ellerman 
Link: https://msgid.link/20240529123029.146953-1-...@ellerman.id.au
---
 arch/powerpc/include/asm/uaccess.h | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 6b808bcdecd5..6df110c1254e 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -186,9 +186,20 @@ do {   
\
:   \
: label)
 
+#ifdef CONFIG_CC_IS_CLANG
+#define DS_FORM_CONSTRAINT "Z<>"
+#else
+#define DS_FORM_CONSTRAINT "YZ<>"
+#endif
+
 #ifdef __powerpc64__
-#define __put_user_asm2_goto(x, ptr, label)\
-   __put_user_asm_goto(x, ptr, label, "std")
+#define __put_user_asm2_goto(x, addr, label)   \
+   asm goto ("1: std%U1%X1 %0,%1   # put_user\n"   \
+   EX_TABLE(1b, %l2)   \
+   :   \
+   : "r" (x), DS_FORM_CONSTRAINT (*addr)   \
+   :   \
+   : label)
 #else /* __powerpc64__ */
 #define __put_user_asm2_goto(x, addr, label)   \
asm_volatile_goto(  \
-- 
2.45.1



Please backport 2d43cc701b96 to v6.9 and v6.6

2024-06-14 Thread Michael Ellerman
Hi stable team,

Can you please backport:
  2d43cc701b96 ("powerpc/uaccess: Fix build errors seen with GCC 13/14")

To v6.9 and v6.6.

It was marked for backporting, but hasn't been picked up AFAICS. I'm not
sure if it clashed with the asm_goto_output changes or something. But it
backports cleanly to the current stable branches.

It needs a custom backport for earlier kernels, I'll send those.

cheers


Re: [PATCH 0/2] Skip offline cores when enabling SMT on PowerPC

2024-06-13 Thread Michael Ellerman
"Nysal Jan K.A."  writes:
> From: "Nysal Jan K.A" 
>
> After the addition of HOTPLUG_SMT support for PowerPC [1] there was a
> regression reported [2] when enabling SMT.

This implies it was a kernel regression. But it can't be a kernel
regression because previously there was no support at all for the sysfs
interface on powerpc.

IIUC the regression was in the ppc64_cpu userspace tool, which switched
to using the new kernel interface without taking into account the way it
behaves.

Or are you saying the kernel behaviour changed on x86 after the powerpc
HOTPLUG_SMT was added?

> On a system with at least
> one offline core, when enabling SMT, the expectation is that no CPUs
> of offline cores are made online.
>
> On a POWER9 system with 4 cores in SMT4 mode:
> $ ppc64_cpu --info
> Core   0:0*1*2*3*
> Core   1:4*5*6*7*
> Core   2:8*9*   10*   11*
> Core   3:   12*   13*   14*   15*
>
> Turn only one core on:
> $ ppc64_cpu --cores-on=1
> $ ppc64_cpu --info
> Core   0:0*1*2*3*
> Core   1:4 5 6 7
> Core   2:8 91011
> Core   3:   12131415
>
> Change the SMT level to 2:
> $ ppc64_cpu --smt=2
> $ ppc64_cpu --info
> Core   0:0*1*2 3
> Core   1:4 5 6 7
> Core   2:8 91011
> Core   3:   12131415
>
> As expected we see only two CPUs of core 0 are online
>
> Change the SMT level to 4:
> $ ppc64_cpu --smt=4
> $ ppc64_cpu --info
> Core   0:0*1*2*3*
> Core   1:4*5*6*7*
> Core   2:8*9*   10*   11*
> Core   3:   12*   13*   14*   15*
>
> The CPUs of offline cores are made online. If a core is offline then
> enabling SMT should not online CPUs of this core.

That's the way the ppc64_cpu tool behaves, but it's not necessarily what
other arches want.

> An arch specific
> function topology_is_core_online() is proposed to address this.
> Another approach is to check the topology_sibling_cpumask() for any
> online siblings. This avoids the need for an arch specific function
> but is less efficient and more importantly this introduces a change
> in existing behaviour on other architectures.

It's only x86 and powerpc right?

Having different behaviour on the only two arches that support the
interface does not seem like a good result.
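
For reference, the behaviour described in the quoted cover letter can be
modelled with a small userspace sketch. Everything here (the online[][]
array, core_is_online(), set_smt_level()) is invented for illustration
and is not the kernel's cpumask or hotplug code; it only shows the rule
"a core with no online threads is left alone".

```c
#include <stdbool.h>

#define NR_CORES		4
#define THREADS_PER_CORE	4

/* Toy topology: online[core][thread], not the kernel's cpumasks. */
static bool online[NR_CORES][THREADS_PER_CORE];

/* A core counts as online if at least one of its threads is online. */
static bool core_is_online(int core)
{
	for (int t = 0; t < THREADS_PER_CORE; t++)
		if (online[core][t])
			return true;
	return false;
}

/* Set the SMT level, leaving cores that are entirely offline untouched. */
static void set_smt_level(int level)
{
	for (int c = 0; c < NR_CORES; c++) {
		if (!core_is_online(c))
			continue;
		for (int t = 0; t < THREADS_PER_CORE; t++)
			online[c][t] = t < level;
	}
}

/* Helper for the example: count online threads across all cores. */
static int count_online(void)
{
	int n = 0;

	for (int c = 0; c < NR_CORES; c++)
		for (int t = 0; t < THREADS_PER_CORE; t++)
			n += online[c][t];
	return n;
}
```

Starting from core 0 fully online and the other cores offline,
set_smt_level(2) followed by set_smt_level(4) leaves cores 1-3 offline
in this model.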

> What is the expected behaviour on x86 when enabling SMT and certain cores
> are offline? 

AFAIK no one really touches SMT on x86 other than to turn it off for
security reasons.

cheers

> [1] https://lore.kernel.org/lkml/20230705145143.40545-1-lduf...@linux.ibm.com/
> [2] https://groups.google.com/g/powerpc-utils-devel/c/wrwVzAAnRlI/m/5KJSoqP4BAAJ
>
> Nysal Jan K.A (2):
>   cpu/SMT: Enable SMT only if a core is online
>   powerpc/topology: Check if a core is online
>
>  arch/powerpc/include/asm/topology.h | 13 +
>  kernel/cpu.c| 12 +++-
>  2 files changed, 24 insertions(+), 1 deletion(-)
>
>
> base-commit: c760b3725e52403dc1b28644fb09c47a83cacea6
> -- 
> 2.35.3


Re: [RFC] potential UAF in kvm_spapr_tce_attach_iommu_group() (was Re: [PATCH 11/19] switch simple users of fdget() to CLASS(fd, ...))

2024-06-13 Thread Michael Ellerman
Linus Torvalds  writes:
> On Sun, 9 Jun 2024 at 19:45, Al Viro  wrote:
>>
>> Unless I'm misreading that code (entirely possible), this fdput() shouldn't
>> be done until we are done with stt.
>
> Ack. That looks right to me.
>
> If I follow it right, the lifetime of stt is tied to the lifetime of
> the file (plus RCU), so doing fdput early and then dropping the RCU
> lock means that stt may not be valid any more later.

Yep. I added a sleep after the fdput and was able to get KASAN to catch
it (below).

I'll send a fix patch tomorrow, just using fdput(), and then the CLASS
conversion can go on top later.
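
The ordering problem can be shown with a toy refcounted object. This is
a simplified userspace sketch, not the KVM code: struct toy_table,
toy_get() and toy_put() are made up, and the real lifetime rules (file
reference plus RCU) are reduced to a plain counter. The point is only
that the put comes after the last access.

```c
#include <stdlib.h>

/* Toy stand-in, invented for this example; not the KVM or VFS structures. */
struct toy_table {
	int refs;
	int liobn;
};

/* Counts how many tables have been freed, so the example can be checked. */
static int tables_freed;

static struct toy_table *toy_get(struct toy_table *t)
{
	t->refs++;
	return t;
}

static void toy_put(struct toy_table *t)
{
	if (--t->refs == 0) {
		free(t);
		tables_freed++;
	}
}

/* Safe ordering: finish every access to the table, then drop the reference. */
static int attach_group(struct toy_table *t)
{
	int liobn;

	toy_get(t);		/* plays the role of fdget() */
	liobn = t->liobn;	/* last use of the table */
	toy_put(t);		/* plays the role of fdput(), done last */
	return liobn;
}
```

Moving the toy_put() above the read of t->liobn would reproduce the
pattern reported here: the object may already be gone by the time it is
dereferenced.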

cheers


==
BUG: KASAN: slab-use-after-free in kvm_spapr_tce_attach_iommu_group+0x298/0x720 [kvm]
Read of size 4 at addr c000200027552c30 by task kvm-vfio/2505

CPU: 54 PID: 2505 Comm: kvm-vfio Not tainted 6.10.0-rc3-next-20240612-dirty #1
Hardware name: 8335-GTH POWER9 0x4e1202 opal:skiboot-v6.5.3-35-g1851b2a06 
PowerNV
Call Trace:
[c00020008c2a7860] [c27d4d50] dump_stack_lvl+0xb4/0x108 (unreliable)
[c00020008c2a78a0] [c072dfa8] print_report+0x2b4/0x6ec
[c00020008c2a7990] [c072d898] kasan_report+0x118/0x2b0
[c00020008c2a7aa0] [c072ff38] __asan_load4+0xb8/0xd0
[c00020008c2a7ac0] [c0081b343140] 
kvm_spapr_tce_attach_iommu_group+0x298/0x720 [kvm]
[c00020008c2a7b90] [c0081b31d61c] kvm_vfio_set_attr+0x524/0xac0 [kvm]
[c00020008c2a7c60] [c0081b3083ec] kvm_device_ioctl+0x144/0x240 [kvm]
[c00020008c2a7cd0] [c07e052c] sys_ioctl+0x62c/0x1810
[c00020008c2a7df0] [c0038d90] system_call_exception+0x190/0x440
[c00020008c2a7e50] [c000d15c] system_call_vectored_common+0x15c/0x2ec
--- interrupt: 3000 at 0x7fff8af5bedc
NIP:  7fff8af5bedc LR: 7fff8af5bedc CTR: 
REGS: c00020008c2a7e80 TRAP: 3000   Not tainted  
(6.10.0-rc3-next-20240612-dirty)
MSR:  9280f033   CR: 44002482  
XER: 
IRQMASK: 0 
GPR00: 0036 7fffda53b1f0 7fff8b066d00 0006 
GPR04: 8018aee1 7fffda53b270 0008 7fff8ac0e9e0 
GPR08: 0006    
GPR12:  7fff8b2ca540   
GPR16:     
GPR20:    100101c0 
GPR24: 7fff8b2bf840 7fff8b2c 7fffda53b728 0001 
GPR28: 7fffda53b838 0006 0001 0005 
NIP [7fff8af5bedc] 0x7fff8af5bedc
LR [7fff8af5bedc] 0x7fff8af5bedc
--- interrupt: 3000

Allocated by task 2505:
 kasan_save_stack+0x48/0x80
 kasan_save_track+0x2c/0x50
 kasan_save_alloc_info+0x44/0x60
 __kasan_kmalloc+0xd0/0x120
 __kmalloc_noprof+0x214/0x670
 kvm_vm_ioctl_create_spapr_tce+0x10c/0x420 [kvm]
 kvm_arch_vm_ioctl+0x5fc/0x890 [kvm]
 kvm_vm_ioctl+0xa54/0x13d0 [kvm]
 sys_ioctl+0x62c/0x1810
 system_call_exception+0x190/0x440
 system_call_vectored_common+0x15c/0x2ec

Freed by task 0:
 kasan_save_stack+0x48/0x80
 kasan_save_track+0x2c/0x50
 kasan_save_free_info+0xac/0xd0
 __kasan_slab_free+0x120/0x210
 kfree+0xec/0x3e0
 release_spapr_tce_table+0xd4/0x11c [kvm]
 rcu_core+0x568/0x16a0
 handle_softirqs+0x23c/0x920
 do_softirq_own_stack+0x6c/0x90
 do_softirq_own_stack+0x58/0x90
 __irq_exit_rcu+0x218/0x2d0
 irq_exit+0x30/0x80
 arch_local_irq_restore+0x128/0x230
 arch_local_irq_enable+0x1c/0x30
 cpuidle_enter_state+0x134/0x5cc
 cpuidle_enter+0x6c/0xb0
 call_cpuidle+0x7c/0x100
 do_idle+0x394/0x410
 cpu_startup_entry+0x60/0x70
 start_secondary+0x3fc/0x410
 start_secondary_prolog+0x10/0x14

Last potentially related work creation:
 kasan_save_stack+0x48/0x80
 __kasan_record_aux_stack+0xcc/0x130
 __call_rcu_common.constprop.0+0x8c/0x8e0
 kvm_spapr_tce_release+0x29c/0xbc10 [kvm]
 __fput+0x22c/0x630
 sys_close+0x70/0xe0
 system_call_exception+0x190/0x440
 system_call_vectored_common+0x15c/0x2ec

The buggy address belongs to the object at c000200027552c00
 which belongs to the cache kmalloc-256 of size 256
The buggy address is located 48 bytes inside of
 freed 256-byte region [c000200027552c00, c000200027552d00)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping: index:0xc000200027551800 
pfn:0x20002755
flags: 0x838(node=8|zone=0|lastcpupid=0x7)
page_type: 0xfdff(slab)
raw: 0838 c7010d80 5deadbeef122 
raw: c000200027551800 80800078 0001fdff 
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 c000200027552b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 c000200027552b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>c000200027552c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ^
 c000200027552c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 

Re: [PATCH] powerpc/eeh: avoid possible crash when edev->pdev changes

2024-06-10 Thread Michael Ellerman
Hi Ganesh,

Ganesh Goudar  writes:
> If a PCI device is removed during eeh_pe_report_edev(), edev->pdev
> will change and can cause a crash, hold the PCI rescan/remove lock
> while taking a copy of edev->pdev.
>
> Signed-off-by: Ganesh Goudar 
> ---
>  arch/powerpc/kernel/eeh_pe.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
> index d1030bc52564..49f968733912 100644
> --- a/arch/powerpc/kernel/eeh_pe.c
> +++ b/arch/powerpc/kernel/eeh_pe.c
> @@ -859,7 +859,9 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
>  
>   /* Retrieve the parent PCI bus of first (top) PCI device */
>   edev = list_first_entry_or_null(&pe->edevs, struct eeh_dev, entry);
> + pci_lock_rescan_remove();
>   pdev = eeh_dev_to_pci_dev(edev);
> + pci_unlock_rescan_remove();
>   if (pdev)
>   return pdev->bus;

What prevents pdev being freed/reused immediately after you drop the
rescan/remove lock?

AFAICS eeh_dev_to_pci_dev() doesn't take an additional reference to the
pdev or anything.

cheers


Re: [PATCH] powerpc: vdso: fix building with wrong-endian toolchain

2024-06-07 Thread Michael Ellerman
Arnd Bergmann  writes:
> From: Arnd Bergmann 
>
> Building powerpc64le kernels with the kernel.org crosstool toolchains
> no longer works as the linker attempts to build a big-endian vdso:
>
> powerpc-linux/lib/gcc/powerpc-linux/12.3.0/../../../../powerpc-linux/bin/ld: 
> arch/powerpc/kernel/vdso/sigtramp32-32.o: compiled for a little endian system 
> and target is big endian
> powerpc-linux/lib/gcc/powerpc-linux/12.3.0/../../../../powerpc-linux/bin/ld: 
> failed to merge target specific data of file 
> arch/powerpc/kernel/vdso/sigtramp32-32.o
>
> Apparently creating the vdso.lds files from the lds.S files fails to
> pass the -mlittle-endian argument here, so the output format gets set
> wrong. Changing the conditional to check for CONFIG_CPU_LITTLE_ENDIAN
> instead still works, as the kernel configuration definitions are visible.
>
> Signed-off-by: Arnd Bergmann 
> ---
> I'm fairly sure this worked in the past, but I did not try to bisect the
> issue.

It still works for me.

I use the korg toolchains every day, and kisskb uses them too.

What commit / defconfig are you seeing the errors with?

Is it just the 12.3.0 toolchain or all of them? I just tested 12.3.0
here and it built OK.

I guess you're building on x86 or arm64? I build on ppc64le, I wonder if
that makes a difference.

The patch is probably OK regardless, but I'd rather understand what the
actual problem is.

cheers

> diff --git a/arch/powerpc/kernel/vdso/vdso32.lds.S 
> b/arch/powerpc/kernel/vdso/vdso32.lds.S
> index 426e1ccc6971..5845ea2d1cba 100644
> --- a/arch/powerpc/kernel/vdso/vdso32.lds.S
> +++ b/arch/powerpc/kernel/vdso/vdso32.lds.S
> @@ -7,7 +7,7 @@
>  #include 
>  #include 
>  
> -#ifdef __LITTLE_ENDIAN__
> +#ifdef CONFIG_CPU_LITTLE_ENDIAN
>  OUTPUT_FORMAT("elf32-powerpcle", "elf32-powerpcle", "elf32-powerpcle")
>  #else
>  OUTPUT_FORMAT("elf32-powerpc", "elf32-powerpc", "elf32-powerpc")
> diff --git a/arch/powerpc/kernel/vdso/vdso64.lds.S 
> b/arch/powerpc/kernel/vdso/vdso64.lds.S
> index bda6c8cdd459..82c418b18cce 100644
> --- a/arch/powerpc/kernel/vdso/vdso64.lds.S
> +++ b/arch/powerpc/kernel/vdso/vdso64.lds.S
> @@ -7,7 +7,7 @@
>  #include 
>  #include 
>  
> -#ifdef __LITTLE_ENDIAN__
> +#ifdef CONFIG_CPU_LITTLE_ENDIAN
>  OUTPUT_FORMAT("elf64-powerpcle", "elf64-powerpcle", "elf64-powerpcle")
>  #else
>  OUTPUT_FORMAT("elf64-powerpc", "elf64-powerpc", "elf64-powerpc")
> -- 
> 2.39.2


Re: [PATCH] ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K

2024-06-06 Thread Michael Ellerman
Niklas Cassel  writes:
> On Thu, Jun 06, 2024 at 09:14:45PM +1000, Michael Ellerman wrote:
>> The pata_macio driver advertises a max_segment_size of 0xff00, because
>> the hardware doesn't cope with requests >= 64K.
>> 
>> However the SCSI core requires max_segment_size to be at least
>> PAGE_SIZE, which is a problem for pata_macio when the kernel is built
>> with 64K pages.
>> 
>> In older kernels the SCSI core would just increase the segment size to
>> be equal to PAGE_SIZE, however since the commit tagged below it causes a
>> warning and the device fails to probe:
>> 
>>   WARNING: CPU: 0 PID: 26 at block/blk-settings.c:202 .blk_validate_limits+0x2f8/0x35c
>>   CPU: 0 PID: 26 Comm: kworker/u4:1 Not tainted 6.10.0-rc1 #1
>>   Hardware name: PowerMac7,2 PPC970 0x390202 PowerMac
>>   ...
>>   NIP .blk_validate_limits+0x2f8/0x35c
>>   LR  .blk_alloc_queue+0xc0/0x2f8
>>   Call Trace:
>> .blk_alloc_queue+0xc0/0x2f8
>> .blk_mq_alloc_queue+0x60/0xf8
>> .scsi_alloc_sdev+0x208/0x3c0
>> .scsi_probe_and_add_lun+0x314/0x52c
>> .__scsi_add_device+0x170/0x1a4
>> .ata_scsi_scan_host+0x2bc/0x3e4
>> .async_port_probe+0x6c/0xa0
>> .async_run_entry_fn+0x60/0x1bc
>> .process_one_work+0x228/0x510
>> .worker_thread+0x360/0x530
>> .kthread+0x134/0x13c
>> .start_kernel_thread+0x10/0x14
>>   ...
>>   scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI 
>> devices might not be configured
>> 
>> Although the hardware can't cope with a 64K segment, the driver
>> already deals with that internally by splitting large requests in
>> pata_macio_qc_prep(). That is how the driver has managed to function
>> until now on 64K kernels.
>> 
>> So fix the driver to advertise a max_segment_size of 64K, which avoids
>> the warning and keeps the SCSI core happy.
>> 
>> Fixes: afd53a3d8528 ("scsi: core: Initialize scsi midlayer limits before allocating the queue")
>> Reported-by: Guenter Roeck 
>> Closes: https://lore.kernel.org/all/ce2bf6af-4382-4fe1-b392-cc6829f5c...@roeck-us.net/
>> Reported-by: Doru Iorgulescu 
>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218858
>> Signed-off-by: Michael Ellerman 
>> Reviewed-by: Christoph Hellwig 
>> ---
>
> Applied to libata/for-6.10-fixes:
> https://git.kernel.org/pub/scm/linux/kernel/git/libata/linux.git/log/?h=for-6.10-fixes
>
> With John's Reviewed-by from the other thread:
> https://lore.kernel.org/linux-ide/171362345502.571343.9746199181827642774.b4...@oracle.com/T/#t

Thanks.

cheers


Re: [PATCH v2 0/2] Fix doorbell emulation for v2 API on PPC

2024-06-06 Thread Michael Ellerman
Gautam Menghani  writes:
> On Wed, Jun 05, 2024 at 05:09:08PM GMT, Gautam Menghani wrote:
>> Doorbell emulation for KVM on PAPR guests is broken as support for DPDES
>> was not added in initial patch series [1].
>> Add DPDES support and doorbell handling support for V2 API. 
>> 
>> [1] lore.kernel.org/linuxppc-dev/20230914030600.16993-1-jniet...@gmail.com
>> 
>> Changes in v2:
>> 1. Split DPDES support into its own patch
>> 
>> Gautam Menghani (2):
>>   arch/powerpc/kvm: Add DPDES support in helper library for Guest state
>> buffer
>>   arch/powerpc/kvm: Fix doorbell emulation for v2 API
>> 
>>  Documentation/arch/powerpc/kvm-nested.rst | 4 +++-
>>  arch/powerpc/include/asm/guest-state-buffer.h | 3 ++-
>>  arch/powerpc/include/asm/kvm_book3s.h | 1 +
>>  arch/powerpc/kvm/book3s_hv.c  | 5 +
>>  arch/powerpc/kvm/book3s_hv_nestedv2.c | 7 +++
>>  arch/powerpc/kvm/test-guest-state-buffer.c| 2 +-
>>  6 files changed, 19 insertions(+), 3 deletions(-)
>> 
>> -- 
>> 2.45.1
>> 
>
>
> Hi Michael,
>
> This patch series is to be backported for all kernels >= 6.7. So the tag
> should be 
> Cc: sta...@vger.kernel.org # v6.7+
>
> and not
> Cc: sta...@vger.kernel.org # v6.7
>
> Should I send a new version of this series or can you please make this 
> change when pulling in your tree?

I can make the change when applying.

cheers


Re: [PATCH 04/23] scsi: initialize scsi midlayer limits before allocating the queue

2024-06-06 Thread Michael Ellerman
John Garry  writes:
>> diff --git a/drivers/ata/pata_macio.c b/drivers/ata/pata_macio.c
>> index 817838e2f70e..3cb455a32d92 100644
>> --- a/drivers/ata/pata_macio.c
>> +++ b/drivers/ata/pata_macio.c
>> @@ -915,10 +915,13 @@ static const struct scsi_host_template pata_macio_sht 
>> = {
>>  .sg_tablesize   = MAX_DCMDS,
>>  /* We may not need that strict one */
>>  .dma_boundary   = ATA_DMA_BOUNDARY,
>> -/* Not sure what the real max is but we know it's less than 64K, let's
>> - * use 64K minus 256
>> +/*
>> + * The SCSI core requires the segment size to cover at least a page, so
>> + * for 64K page size kernels this must be at least 64K. However the
>> + * hardware can't handle 64K, so pata_macio_qc_prep() will split large
>> + * requests.
>>   */
>> -.max_segment_size   = MAX_DBDMA_SEG,
>> +.max_segment_size   = SZ_64K,
>>  .device_configure   = pata_macio_device_configure,
>>  .sdev_groups= ata_common_sdev_groups,
>>  .can_queue  = ATA_DEF_QUEUE,
>
> Feel free to add:
> Reviewed-by: John Garry 

Thanks.

Sorry I missed adding this when sending the proper patch, maybe whoever
applies it can add it then.

cheers


Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

2024-06-06 Thread Michael Ellerman
Yu Zhao  writes:
> On Wed, Jun 5, 2024 at 9:12 PM Michael Ellerman  wrote:
>>
>> David Hildenbrand  writes:
>> > On 01.06.24 08:01, Yu Zhao wrote:
>> >> On Wed, May 15, 2024 at 4:06 PM Yu Zhao  wrote:
>> ...
>> >>
>> >> Your system has 2GB memory and it uses zswap with zsmalloc (which is
>> >> good since it can allocate from the highmem zone) and zstd/lzo (which
>> >> doesn't matter much). Somehow -- I couldn't figure out why -- it
>> >> splits the 2GB into a 0.25GB DMA zone and a 1.75GB highmem zone:
>> >>
>> >> [0.00] Zone ranges:
>> >> [0.00]   DMA  [mem 0x-0x2fff]
>> >> [0.00]   Normal   empty
>> >> [0.00]   HighMem  [mem 0x3000-0x7fff]
>> >
>> > That's really odd. But we are messing with "PowerMac3,6", so I don't
>> > really know what's right or wrong ...
>>
>> The DMA zone exists because 9739ab7eda45 ("powerpc: enable a 30-bit
>> ZONE_DMA for 32-bit pmac") selects it.
>>
>> It's 768MB (not 0.25GB) because it's clamped at max_low_pfn:
>
> Right. (I meant 0.75GB.)
>
>> #ifdef CONFIG_ZONE_DMA
>> max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
>>   1UL << (zone_dma_bits - PAGE_SHIFT));
>> #endif
>>
>> Which comes eventually from CONFIG_LOWMEM_SIZE, which defaults to 768MB.
>
> I see. I grep'ed VMSPLIT which is used on x86 and arm but apparently
> not on powerpc.

Those VMSPLIT configs are nice, on powerpc it's all done manually :}

>> I think it's 768MB because the user:kernel split is 3G:1G, and then the
>> kernel needs some of that 1G virtual space for vmalloc/ioremap/highmem,
>> so it splits it 768M:256M.
>>
>> Then ZONE_NORMAL is empty because it is also limited to max_low_pfn:
>>
>> max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
>>
>> The rest of RAM is highmem.
>>
>> So I think that's all behaving as expected, but I don't know 32-bit /
>> highmem stuff that well so I could be wrong.
>
> Yes, the three zones work as intended.
>
> Erhard,
>
> Since your system only has 2GB memory, I'd try the 2G:2G split, which
> would in theory allow both the kernel and userspace to access all memory.
>
> CONFIG_LOWMEM_SIZE_BOOL=y
> CONFIG_LOWMEM_SIZE=0x700
>
> (Michael, please correct me if the above wouldn't work.)

It's a bit more complicated, in order to increase LOWMEM_SIZE you need
to adjust all the other variables to make space.

To get 2G of user virtual space I think you need:

CONFIG_ADVANCED_OPTIONS=y
CONFIG_LOWMEM_SIZE_BOOL=y
CONFIG_LOWMEM_SIZE=0x6000
CONFIG_PAGE_OFFSET_BOOL=y
CONFIG_PAGE_OFFSET=0x9000
CONFIG_KERNEL_START_BOOL=y
CONFIG_KERNEL_START=0x9000
CONFIG_PHYSICAL_START=0x
CONFIG_TASK_SIZE_BOOL=y
CONFIG_TASK_SIZE=0x8000

Which results in 1.5GB of lowmem.

Or if you want to map all 2G of RAM directly in the kernel without
highmem, but limit user virtual space to 1.5G:

CONFIG_ADVANCED_OPTIONS=y
CONFIG_LOWMEM_SIZE_BOOL=y
CONFIG_LOWMEM_SIZE=0x8000
CONFIG_PAGE_OFFSET_BOOL=y
CONFIG_PAGE_OFFSET=0x7000
CONFIG_KERNEL_START_BOOL=y
CONFIG_KERNEL_START=0x7000
CONFIG_PHYSICAL_START=0x
CONFIG_TASK_SIZE_BOOL=y
CONFIG_TASK_SIZE=0x6000
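
The arithmetic behind both layouts can be checked with a small sketch.
Note the assumptions: the helper names are invented, and the full-width
hex values used with them (for example 0x60000000 for 1.5GB of lowmem
and 0x80000000 for 2G of user space) are inferred from the sizes stated
in the text rather than copied from the config lines above. It also
assumes that user space has to end at or below PAGE_OFFSET and that the
linear mapping has to fit below 4GB.

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented for this example: one 32-bit virtual memory layout. */
struct layout {
	uint64_t lowmem_size;	/* bytes of RAM mapped linearly */
	uint64_t page_offset;	/* start of the linear mapping */
	uint64_t task_size;	/* top of user virtual space */
};

/* Assumed rules: user space ends at or below PAGE_OFFSET, lowmem ends below 4GB. */
static bool layout_fits(const struct layout *l)
{
	const uint64_t four_gb = 1ULL << 32;

	return l->task_size <= l->page_offset &&
	       l->page_offset + l->lowmem_size <= four_gb;
}

/* Virtual space left above the linear mapping (vmalloc, ioremap, highmem). */
static uint64_t space_above_lowmem(const struct layout *l)
{
	return (1ULL << 32) - (l->page_offset + l->lowmem_size);
}
```

Under those assumptions both layouts fit and leave 256MB above the
linear mapping, the same amount the default 768M:256M split leaves.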

You can also reclaim another 256MB of virtual space if you disable
CONFIG_MODULES.

Those configs do boot on qemu. But I don't have easy access to my 32-bit
machine to test if they boot on actual hardware.

cheers


[PATCH] ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K

2024-06-06 Thread Michael Ellerman
The pata_macio driver advertises a max_segment_size of 0xff00, because
the hardware doesn't cope with requests >= 64K.

However the SCSI core requires max_segment_size to be at least
PAGE_SIZE, which is a problem for pata_macio when the kernel is built
with 64K pages.

In older kernels the SCSI core would just increase the segment size to
be equal to PAGE_SIZE, however since the commit tagged below it causes a
warning and the device fails to probe:

  WARNING: CPU: 0 PID: 26 at block/blk-settings.c:202 .blk_validate_limits+0x2f8/0x35c
  CPU: 0 PID: 26 Comm: kworker/u4:1 Not tainted 6.10.0-rc1 #1
  Hardware name: PowerMac7,2 PPC970 0x390202 PowerMac
  ...
  NIP .blk_validate_limits+0x2f8/0x35c
  LR  .blk_alloc_queue+0xc0/0x2f8
  Call Trace:
.blk_alloc_queue+0xc0/0x2f8
.blk_mq_alloc_queue+0x60/0xf8
.scsi_alloc_sdev+0x208/0x3c0
.scsi_probe_and_add_lun+0x314/0x52c
.__scsi_add_device+0x170/0x1a4
.ata_scsi_scan_host+0x2bc/0x3e4
.async_port_probe+0x6c/0xa0
.async_run_entry_fn+0x60/0x1bc
.process_one_work+0x228/0x510
.worker_thread+0x360/0x530
.kthread+0x134/0x13c
.start_kernel_thread+0x10/0x14
  ...
  scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured

Although the hardware can't cope with a 64K segment, the driver
already deals with that internally by splitting large requests in
pata_macio_qc_prep(). That is how the driver has managed to function
until now on 64K kernels.

So fix the driver to advertise a max_segment_size of 64K, which avoids
the warning and keeps the SCSI core happy.

Fixes: afd53a3d8528 ("scsi: core: Initialize scsi midlayer limits before allocating the queue")
Reported-by: Guenter Roeck 
Closes: https://lore.kernel.org/all/ce2bf6af-4382-4fe1-b392-cc6829f5c...@roeck-us.net/
Reported-by: Doru Iorgulescu 
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218858
Signed-off-by: Michael Ellerman 
Reviewed-by: Christoph Hellwig 
---
 drivers/ata/pata_macio.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/ata/pata_macio.c b/drivers/ata/pata_macio.c
index 817838e2f70e..3cb455a32d92 100644
--- a/drivers/ata/pata_macio.c
+++ b/drivers/ata/pata_macio.c
@@ -915,10 +915,13 @@ static const struct scsi_host_template pata_macio_sht = {
.sg_tablesize   = MAX_DCMDS,
/* We may not need that strict one */
.dma_boundary   = ATA_DMA_BOUNDARY,
-   /* Not sure what the real max is but we know it's less than 64K, let's
-* use 64K minus 256
+   /*
+* The SCSI core requires the segment size to cover at least a page, so
+* for 64K page size kernels this must be at least 64K. However the
+* hardware can't handle 64K, so pata_macio_qc_prep() will split large
+* requests.
 */
-   .max_segment_size   = MAX_DBDMA_SEG,
+   .max_segment_size   = SZ_64K,
.device_configure   = pata_macio_device_configure,
.sdev_groups= ata_common_sdev_groups,
.can_queue  = ATA_DEF_QUEUE,
-- 
2.45.1



Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

2024-06-05 Thread Michael Ellerman
David Hildenbrand  writes:
> On 01.06.24 08:01, Yu Zhao wrote:
>> On Wed, May 15, 2024 at 4:06 PM Yu Zhao  wrote:
...
>> 
>> Your system has 2GB memory and it uses zswap with zsmalloc (which is
>> good since it can allocate from the highmem zone) and zstd/lzo (which
>> doesn't matter much). Somehow -- I couldn't figure out why -- it
>> splits the 2GB into a 0.25GB DMA zone and a 1.75GB highmem zone:
>> 
>> [0.00] Zone ranges:
>> [0.00]   DMA  [mem 0x-0x2fff]
>> [0.00]   Normal   empty
>> [0.00]   HighMem  [mem 0x3000-0x7fff]
>
> That's really odd. But we are messing with "PowerMac3,6", so I don't 
> really know what's right or wrong ...

The DMA zone exists because 9739ab7eda45 ("powerpc: enable a 30-bit
ZONE_DMA for 32-bit pmac") selects it.

It's 768MB (not 0.25GB) because it's clamped at max_low_pfn:

#ifdef CONFIG_ZONE_DMA
max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
  1UL << (zone_dma_bits - PAGE_SHIFT));
#endif

Which comes eventually from CONFIG_LOWMEM_SIZE, which defaults to 768MB.

I think it's 768MB because the user:kernel split is 3G:1G, and then the
kernel needs some of that 1G virtual space for vmalloc/ioremap/highmem,
so it splits it 768M:256M.

Then ZONE_NORMAL is empty because it is also limited to max_low_pfn:

max_zone_pfns[ZONE_NORMAL] = max_low_pfn;

The rest of RAM is highmem.

So I think that's all behaving as expected, but I don't know 32-bit /
highmem stuff that well so I could be wrong.
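
The arithmetic can be checked with a small userspace sketch. It assumes
4K pages (EX_PAGE_SHIFT of 12) and a 30-bit DMA limit as in the commit
title above; zone_dma_max_pfn() is a name invented here that mirrors the
min() expression.

```c
/* Assumption for this example: 4K pages. */
#define EX_PAGE_SHIFT	12

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Mirrors the min() above: clamp the DMA limit at max_low_pfn. */
static unsigned long zone_dma_max_pfn(unsigned long max_low_pfn,
				      unsigned int zone_dma_bits)
{
	return min_ul(max_low_pfn, 1UL << (zone_dma_bits - EX_PAGE_SHIFT));
}
```

With 768MB of lowmem, max_low_pfn is 0x30000, which is below the 30-bit
limit of 0x40000 pages, so the DMA zone ends at 768MB.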

cheers


Re: [PATCH 04/23] scsi: initialize scsi midlayer limits before allocating the queue

2024-06-05 Thread Michael Ellerman
Michael Ellerman  writes:
> Christoph Hellwig  writes:
>> On Fri, May 31, 2024 at 12:28:21AM +1000, Michael Ellerman wrote:
>>> No that's wrong. The actual hardware page size is 4K, but
>>> CONFIG_PAGE_SIZE and PAGE_SHIFT etc. is 64K.
>>> 
>>> So at least for this user the driver used to work with 64K pages, and
>>> now doesn't.
>>
>> Which suggested that the communicated max_hw_sectors is wrong, and
>> previously we were saved by the block layer increasing it to
>> PAGE_SIZE after a warning.  Should we just increment it to 64k?
>
> It looks like that user actually only has the CDROM hanging off
> pata_macio, so it's possible it has been broken previously and they
> didn't notice. I'll see if they can confirm the CDROM has been working
> up until now.
>
> I can test the CDROM on my G5 next week.

I can confirm that the driver does work with 64K pages prior to the
recent changes. I'm able to boot and read CDs with no errors.

However AFAICS that's because the driver splits large requests in
pata_macio_qc_prep():

static enum ata_completion_errors pata_macio_qc_prep(struct ata_queued_cmd *qc)
{
        ...
        for_each_sg(qc->sg, sg, qc->n_elem, si) {
                u32 addr, sg_len, len;
                ...
                addr = (u32) sg_dma_address(sg);
                sg_len = sg_dma_len(sg);

                while (sg_len) {
                        ...
                        len = (sg_len < MAX_DBDMA_SEG) ? sg_len : MAX_DBDMA_SEG;
                        table->command = cpu_to_le16(write ? OUTPUT_MORE: INPUT_MORE);
                        table->req_count = cpu_to_le16(len);
                        ...
                        addr += len;
                        sg_len -= len;
                        ++table;
                }
        }

 
If I increase MAX_DBDMA_SEG from 0xff00 to 64K I see IO errors at boot:

  [   24.989755] sr 4:0:0:0: [sr0] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=6s
  [   25.007310] sr 4:0:0:0: [sr0] tag#0 Sense Key : Medium Error [current]
  [   25.020502] sr 4:0:0:0: [sr0] tag#0 ASC=0x10 <>ASCQ=0x90
  [   25.032655] sr 4:0:0:0: [sr0] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 20 00
  [   25.047232] I/O error, dev sr0, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0


On the other hand increasing max_segment_size to 64K while leaving MAX_DBDMA_SEG
at 0xff00 seems to work fine. And that's effectively what's been happening on
existing kernels until now.
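
To make the splitting concrete, here is a minimal userspace sketch of what
the while (sg_len) loop does to a single segment. It is not driver code:
the helper names are hypothetical, only the 0xff00 limit is taken from the
driver, and the DBDMA command setup is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DBDMA_SEG	0xff00u	/* per-descriptor limit, 64K minus 256 */

static_assert(MAX_DBDMA_SEG < 0x10000u,
	      "a 64K segment must need more than one descriptor");

/* How many descriptors one scatterlist segment of sg_len bytes needs. */
static unsigned int dbdma_desc_count(uint32_t sg_len)
{
	unsigned int n = 0;

	while (sg_len) {
		uint32_t len = (sg_len < MAX_DBDMA_SEG) ? sg_len : MAX_DBDMA_SEG;

		sg_len -= len;
		n++;
	}
	return n;
}

/* Length of the final descriptor produced for that segment. */
static uint32_t dbdma_last_len(uint32_t sg_len)
{
	uint32_t len = 0;

	while (sg_len) {
		len = (sg_len < MAX_DBDMA_SEG) ? sg_len : MAX_DBDMA_SEG;
		sg_len -= len;
	}
	return len;
}
```

With these helpers a 64K segment comes out as two descriptors, one of
0xff00 bytes followed by one of 0x100 bytes, so no single descriptor ever
exceeds the limit.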

The only question is whether that violates some assumption elsewhere in the
SCSI layer?

Anyway, the patch below works for me on v6.10-rc2.

cheers


diff --git a/drivers/ata/pata_macio.c b/drivers/ata/pata_macio.c
index 817838e2f70e..3cb455a32d92 100644
--- a/drivers/ata/pata_macio.c
+++ b/drivers/ata/pata_macio.c
@@ -915,10 +915,13 @@ static const struct scsi_host_template pata_macio_sht = {
         .sg_tablesize           = MAX_DCMDS,
         /* We may not need that strict one */
         .dma_boundary           = ATA_DMA_BOUNDARY,
-        /* Not sure what the real max is but we know it's less than 64K, let's
-         * use 64K minus 256
+        /*
+         * The SCSI core requires the segment size to cover at least a page, so
+         * for 64K page size kernels this must be at least 64K. However the
+         * hardware can't handle 64K, so pata_macio_qc_prep() will split large
+         * requests.
          */
-        .max_segment_size       = MAX_DBDMA_SEG,
+        .max_segment_size       = SZ_64K,
         .device_configure       = pata_macio_device_configure,
         .sdev_groups            = ata_common_sdev_groups,
         .can_queue              = ATA_DEF_QUEUE,


[RFC PATCH 2/2] dt-bindings: memory: fsl: replace maintainer

2024-06-04 Thread Michael Walle
Li Yang's mail address is bouncing, replace it with Shawn Guo's one.

Signed-off-by: Michael Walle 
---
This is marked as an RFC because it is more of a question for Shawn if
he is willing to take over the maintainership.
---
 .../devicetree/bindings/memory-controllers/fsl/fsl,ifc.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/memory-controllers/fsl/fsl,ifc.yaml b/Documentation/devicetree/bindings/memory-controllers/fsl/fsl,ifc.yaml
index 3be1db30bf41..d1c3421bee10 100644
--- a/Documentation/devicetree/bindings/memory-controllers/fsl/fsl,ifc.yaml
+++ b/Documentation/devicetree/bindings/memory-controllers/fsl/fsl,ifc.yaml
@@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
 title: FSL/NXP Integrated Flash Controller
 
 maintainers:
-  - Li Yang 
+  - Shawn Guo 
 
 description: |
   NXP's integrated flash controller (IFC) is an advanced version of the
-- 
2.39.2



[PATCH 1/2] dt-bindings: Drop Li Yang as maintainer for all bindings

2024-06-04 Thread Michael Walle
Remove Li Yang from all device tree bindings because mails to this
address are bouncing.

Commit fbdd90334a62 ("MAINTAINERS: Drop Li Yang as their email address
stopped working") already removed the entry from the MAINTAINERS but
didn't address all the in-file entries of the device tree bindings.

Signed-off-by: Michael Walle 
---
 Documentation/devicetree/bindings/arm/fsl.yaml   | 1 -
 .../devicetree/bindings/interrupt-controller/fsl,ls-extirq.yaml  | 1 -
 .../devicetree/bindings/soc/fsl/fsl,layerscape-dcfg.yaml | 1 -
 .../devicetree/bindings/soc/fsl/fsl,layerscape-scfg.yaml | 1 -
 4 files changed, 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/arm/fsl.yaml b/Documentation/devicetree/bindings/arm/fsl.yaml
index f731fb5b5e2a..8a1f8e387a61 100644
--- a/Documentation/devicetree/bindings/arm/fsl.yaml
+++ b/Documentation/devicetree/bindings/arm/fsl.yaml
@@ -8,7 +8,6 @@ title: Freescale i.MX Platforms
 
 maintainers:
   - Shawn Guo 
-  - Li Yang 
 
 properties:
   $nodename:
diff --git a/Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.yaml b/Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.yaml
index 887e565b9573..199b34fdbefc 100644
--- a/Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.yaml
+++ b/Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.yaml
@@ -8,7 +8,6 @@ title: Freescale Layerscape External Interrupt Controller
 
 maintainers:
   - Shawn Guo 
-  - Li Yang 
 
 description: |
   Some Layerscape SOCs (LS1021A, LS1043A, LS1046A LS1088A, LS208xA,
diff --git a/Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-dcfg.yaml b/Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-dcfg.yaml
index ce1a6505eb51..3fb0534ea597 100644
--- a/Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-dcfg.yaml
+++ b/Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-dcfg.yaml
@@ -8,7 +8,6 @@ title: Freescale Layerscape Device Configuration Unit
 
 maintainers:
   - Shawn Guo 
-  - Li Yang 
 
 description: |
   DCFG is the device configuration unit, that provides general purpose
diff --git a/Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-scfg.yaml b/Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-scfg.yaml
index a6a511b00a12..2a456c8af992 100644
--- a/Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-scfg.yaml
+++ b/Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-scfg.yaml
@@ -8,7 +8,6 @@ title: Freescale Layerscape Supplemental Configuration Unit
 
 maintainers:
   - Shawn Guo 
-  - Li Yang 
 
 description: |
   SCFG is the supplemental configuration unit, that provides SoC specific
-- 
2.39.2



Re: [PATCH 4/6] KVM: PPC: Book3S HV: Add one-reg interface for DEXCR register

2024-06-04 Thread Michael Ellerman
"Nicholas Piggin"  writes:
> On Mon Jun 3, 2024 at 9:14 PM AEST, Shivaprasad G Bhat wrote:
>> The patch adds a one-reg register identifier which can be used to
>> read and set the DEXCR for the guest during enter/exit with
>> KVM_REG_PPC_DEXCR. The specific SPR KVM API documentation
>> is updated too.
>
> I wonder if the uapi and documentation parts should go in their
> own patch in a ppc kvm uapi topic branch?

I'll put the whole series in the topic/ppc-kvm branch, which I think is
probably sufficient.

cheers


Re: [PATCH 4/6] KVM: PPC: Book3S HV: Add one-reg interface for DEXCR register

2024-06-04 Thread Michael Ellerman
Shivaprasad G Bhat  writes:
> The patch adds a one-reg register identifier which can be used to
> read and set the DEXCR for the guest during enter/exit with
> KVM_REG_PPC_DEXCR. The specific SPR KVM API documentation
> is updated too.
>
> Signed-off-by: Shivaprasad G Bhat 
> ---
>  Documentation/virt/kvm/api.rst|1 +
>  arch/powerpc/include/uapi/asm/kvm.h   |1 +
>  arch/powerpc/kvm/book3s_hv.c  |6 ++
>  tools/arch/powerpc/include/uapi/asm/kvm.h |1 +
 
Headers under tools/ are not supposed to be updated directly, they're
synced later by the perf developers.

See: https://lore.kernel.org/all/ZlYxAdHjyAkvGtMW@x1/

cheers


Re: [PATCH] powerpc: Limit ARCH_HAS_KERNEL_FPU_SUPPORT to PPC64

2024-06-02 Thread Michael Ellerman
On Wed, 29 May 2024 09:28:50 -0700, Samuel Holland wrote:
> When building a 32-bit kernel, some toolchains do not allow mixing soft
> float and hard float object files:
> 
> LD  vmlinux.o
>   powerpc64le-unknown-linux-musl-ld: lib/test_fpu_impl.o uses hard float, arch/powerpc/kernel/udbg.o uses soft float
>   powerpc64le-unknown-linux-musl-ld: failed to merge target specific data of file lib/test_fpu_impl.o
>   make[2]: *** [scripts/Makefile.vmlinux_o:62: vmlinux.o] Error 1
>   make[1]: *** [Makefile:1152: vmlinux_o] Error 2
>   make: *** [Makefile:240: __sub-make] Error 2
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc: Limit ARCH_HAS_KERNEL_FPU_SUPPORT to PPC64
  https://git.kernel.org/powerpc/c/be2fc65d66e0406cc9d39d40becaecdf4ee765f3

cheers


Re: [PATCH bpf v3] powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH

2024-06-02 Thread Michael Ellerman
On Mon, 13 May 2024 10:02:48 +, Puranjay Mohan wrote:
> The Linux Kernel Memory Model [1][2] requires RMW operations that have a
> return value to be fully ordered.
> 
> BPF atomic operations with BPF_FETCH (including BPF_XCHG and
> BPF_CMPXCHG) return a value back so they need to be JITed to fully
> ordered operations. POWERPC currently emits relaxed operations for
> these.
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH
  https://git.kernel.org/powerpc/c/b1e7cee96127468c2483cf10c2899c9b5cf79bf8

cheers


Re: [PATCH v2] powerpc/pseries/lparcfg: drop error message from guest name lookup

2024-06-02 Thread Michael Ellerman
On Fri, 24 May 2024 14:29:54 -0500, Nathan Lynch wrote:
> It's not an error or exceptional situation when the hosting
> environment does not expose a name for the LP/guest via RTAS or the
> device tree. This happens with qemu when run without the '-name'
> option. The message also lacks a newline. Remove it.
> 
> 

Applied to powerpc/fixes.

[1/1] powerpc/pseries/lparcfg: drop error message from guest name lookup
  https://git.kernel.org/powerpc/c/12870ae3818e39ea65bf710f645972277b634f72

cheers


Re: [PATCH v2 1/2] powerpc/uaccess: Fix build errors seen with GCC 13/14

2024-06-02 Thread Michael Ellerman
On Wed, 29 May 2024 22:30:28 +1000, Michael Ellerman wrote:
> Building ppc64le_defconfig with GCC 14 fails with assembler errors:
> 
> CC  fs/readdir.o
>   /tmp/ccdQn0mD.s: Assembler messages:
>   /tmp/ccdQn0mD.s:212: Error: operand out of domain (18 is not a multiple of 4)
>   /tmp/ccdQn0mD.s:226: Error: operand out of domain (18 is not a multiple of 4)
>   ... [6 lines]
>   /tmp/ccdQn0mD.s:1699: Error: operand out of domain (18 is not a multiple of 4)
> 
> [...]

Applied to powerpc/fixes.

[1/2] powerpc/uaccess: Fix build errors seen with GCC 13/14
  https://git.kernel.org/powerpc/c/2d43cc701b96f910f50915ac4c2a0cae5deb734c
[2/2] powerpc/uaccess: Use YZ asm constraint for ld
  https://git.kernel.org/powerpc/c/50934945d54238d2d6d8db4b7c1d4c90d2696c57

cheers


[GIT PULL] Please pull powerpc/linux.git powerpc-6.10-2 tag

2024-06-01 Thread Michael Ellerman
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Hi Linus,

Please pull powerpc fixes for 6.10:

The following changes since commit 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0:

  Linux 6.10-rc1 (2024-05-26 15:20:12 -0700)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-6.10-2

for you to fetch changes up to be2fc65d66e0406cc9d39d40becaecdf4ee765f3:

  powerpc: Limit ARCH_HAS_KERNEL_FPU_SUPPORT to PPC64 (2024-05-30 22:57:27 +1000)

- --
powerpc fixes for 6.10 #2

 - Enforce full ordering for ATOMIC operations with BPF_FETCH.

 - Fix uaccess build errors seen with GCC 13/14.

 - Fix build errors on ppc32 due to ARCH_HAS_KERNEL_FPU_SUPPORT.

 - Drop error message from lparcfg guest name lookup.

Thanks to: Christophe Leroy, Guenter Roeck, Nathan Lynch, Naveen N Rao, Puranjay
Mohan, Samuel Holland.

- --
Michael Ellerman (2):
  powerpc/uaccess: Fix build errors seen with GCC 13/14
  powerpc/uaccess: Use YZ asm constraint for ld

Nathan Lynch (1):
  powerpc/pseries/lparcfg: drop error message from guest name lookup

Puranjay Mohan (1):
  powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH

Samuel Holland (1):
  powerpc: Limit ARCH_HAS_KERNEL_FPU_SUPPORT to PPC64


 arch/powerpc/Kconfig |  2 +-
 arch/powerpc/include/asm/uaccess.h   | 27 
 arch/powerpc/net/bpf_jit_comp32.c| 12 +
 arch/powerpc/net/bpf_jit_comp64.c| 12 +
 arch/powerpc/platforms/pseries/lparcfg.c |  4 +--
 5 files changed, 54 insertions(+), 3 deletions(-)
-BEGIN PGP SIGNATURE-

iQIzBAEBCAAdFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmZbtSIACgkQUevqPMjh
pYALTQ//YkCAb17EW6iQfmcuaq7Amhz5QUDUU3TFhfcmmuDd3Fw3bh9sppF+so0S
UsZpBRgY9C6xFkOqpyrqj9KOSXNsWE5m46Hp0Cl7BlkdeM2c68T77BxN5pEcnI4i
so64UHaLDI0miirQE25ihA4BdmtzAfw6PL6vubcBoLrlSWktXDQXBZb0EkOeYNiR
QI+4EnwfkUiw+55eXEHoWIwWuyW7oMd2px8wXEnb9daOxu7NqDhKINVYN8g0If8u
m90egWk56gq1A/ei43zeqPQAKi8hvQe93+tkmCI7NkJCx+YrYmUStqIC/4/iZlJd
XMaUre8mckU6eWRfUL5G28BDETtNm3t2TlflJ+GK1XvLd0LOj2SEk0f5i2bXA6ey
o1ISDVE3dRbS3CzfiZw8S7QvJiCeqBU1d3gMjNge+c1iSG0rgd20tMA17+SEKaHn
W2tSZev6P9vbpF+9R0kxvyCRQ3EmPeOReSk0XIXU+X0V2NFIEIzQJMS/2NL2Mro8
O5tj3elpRbqSa/rEUUQymUpQ3qEkTfoFrIAiCkFvu+OcAtq26OL18olxWo0RF9Fg
8ElHjsGLMDNPyTrBIIIegcgsX+/fvGbwg5NpQlXOD564Y0cMfYi8kYwgU6Z9ism1
YfKFgvrj1akmTMrZobqZ1N0tCjXZVbRP+ykyT6uUtG7ut4v4d8U=
=+3xo
-END PGP SIGNATURE-


Re: Please add powerpc topic/kdump-hotplug branch to linux-next

2024-06-01 Thread Michael Ellerman
Stephen Rothwell  writes:
> Hi Michael,
>
> On Tue, 23 Apr 2024 23:56:42 +1000 Michael Ellerman wrote:
>>
>> Can you please add the topic/kdump-hotplug branch of the powerpc tree to
>> linux-next. It contains a series that touches generic kexec code as well
>> as x86 and powerpc code.
>> 
>> The hope is to get it merged for v6.10, so it should go along
>> with the powerpc next branch in the merge order.
>> 
>> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/log/?h=topic/kdump-hotplug
>
> Added from today.

This branch has now been merged and can be dropped from linux-next,
thanks.

cheers


Re: Xorg doesn't start and some other issues with the RC1 of kernel 6.10

2024-05-31 Thread Michael Ellerman
Michael Ellerman  writes:
> Christian Zigotzky  writes:
>> On 28.05.24 22:00, Christian Zigotzky wrote:
>>> Hi All,
>>>
>>> Xorg doesn't start anymore since the RC1 of kernel 6.10. We tested it 
>>> with the VirtIO GPU and with some Radeon cards.
>>>
>>> Another error message: Failed to start Setup Virtual Console.
>>>
>>> Maybe this is the issue: + CONFIG_ARCH_HAS_KERNEL_FPU_SUPPORT=y
>>>
>>> Tested with FSL P5040, FSL P5020, and PASEMI boards.
>>>
>>> Could you please test Xorg on your PowerPC machines?
>>>
>>> Thanks,
>>> Christian
>> I tested the RC1 in a virtual e5500 QEMU PowerPC machine with Bochs VGA 
>> (-device VGA,vgamem_mb=256) and Xorg doesn't start either.
>>
>> Error message: xf86OpenConsole: KDSETMODE KD_GRAPHICS failed 
>> Inappropriate ioctl for device.
>
> That is presumably because of this:
>   https://lore.kernel.org/all/0da9785e-ba44-4718-9d08-4e96c1ba7...@kernel.org/

Attempting to regzbot this.

#regzbot introduced: 8c467f330059
#regzbot monitor: https://lore.kernel.org/all/0da9785e-ba44-4718-9d08-4e96c1ba7...@kernel.org/

cheers


Re: [PATCH 04/23] scsi: initialize scsi midlayer limits before allocating the queue

2024-05-31 Thread Michael Ellerman
Christoph Hellwig  writes:
> On Fri, May 31, 2024 at 12:28:21AM +1000, Michael Ellerman wrote:
>> No that's wrong. The actual hardware page size is 4K, but
>> CONFIG_PAGE_SIZE and PAGE_SHIFT etc. is 64K.
>> 
>> So at least for this user the driver used to work with 64K pages, and
>> now doesn't.
>
> Which suggested that the communicated max_hw_sectors is wrong, and
> previously we were saved by the block layer increasing it to
> PAGE_SIZE after a warning.  Should we just increment it to 64k?

It looks like that user actually only has the CDROM hanging off
pata_macio, so it's possible it has been broken previously and they
didn't notice. I'll see if they can confirm the CDROM has been working
up until now.

I can test the CDROM on my G5 next week.

cheers


Re: [PATCH 04/23] scsi: initialize scsi midlayer limits before allocating the queue

2024-05-30 Thread Michael Ellerman
Michael Ellerman  writes:
> "Linux regression tracking (Thorsten Leemhuis)" writes:
>> [CCing the regression list, as it should be in the loop for regressions:
>> https://docs.kernel.org/admin-guide/reporting-regressions.html]
>>
>> On 20.05.24 17:15, Christoph Hellwig wrote:
>>> Adding ben and the linuxppc list.
>>
>> Hmm, no reply and no other progress to get this resolved afaics. So lets
>> bring Michael into the mix, he might be able to help out.
>
> Sorry I didn't see the original forward for some reason.
>
> I haven't seen this on my G5, but its hard drive is on SATA. I think
> the CDROM is pata_macio, but there isn't a disk in the drive to test
> with.
>
>> BTW TWIMC: a PowerMac G5 user reported similar symptoms here
>> recently: https://bugzilla.kernel.org/show_bug.cgi?id=218858
>
> AFAICS that report is from a 4K page size kernel (Page orders: ...
> virtual = 12), so there must be something else going on?

No that's wrong. The actual hardware page size is 4K, but
CONFIG_PAGE_SIZE and PAGE_SHIFT etc. is 64K.

So at least for this user the driver used to work with 64K pages, and
now doesn't.

cheers


Re: Xorg doesn't start and some other issues with the RC1 of kernel 6.10

2024-05-30 Thread Michael Ellerman
Christian Zigotzky  writes:
> On 28.05.24 22:00, Christian Zigotzky wrote:
>> Hi All,
>>
>> Xorg doesn't start anymore since the RC1 of kernel 6.10. We tested it 
>> with the VirtIO GPU and with some Radeon cards.
>>
>> Another error message: Failed to start Setup Virtual Console.
>>
>> Maybe this is the issue: + CONFIG_ARCH_HAS_KERNEL_FPU_SUPPORT=y
>>
>> Tested with FSL P5040, FSL P5020, and PASEMI boards.
>>
>> Could you please test Xorg on your PowerPC machines?
>>
>> Thanks,
>> Christian
> I tested the RC1 in a virtual e5500 QEMU PowerPC machine with Bochs VGA 
> (-device VGA,vgamem_mb=256) and Xorg doesn't start either.
>
> Error message: xf86OpenConsole: KDSETMODE KD_GRAPHICS failed 
> Inappropriate ioctl for device.

That is presumably because of this:
  https://lore.kernel.org/all/0da9785e-ba44-4718-9d08-4e96c1ba7...@kernel.org/

cheers


Re: [PATCH 04/23] scsi: initialize scsi midlayer limits before allocating the queue

2024-05-30 Thread Michael Ellerman
"Linux regression tracking (Thorsten Leemhuis)" writes:
> [CCing the regression list, as it should be in the loop for regressions:
> https://docs.kernel.org/admin-guide/reporting-regressions.html]
>
> On 20.05.24 17:15, Christoph Hellwig wrote:
>> Adding ben and the linuxppc list.
>
> Hmm, no reply and no other progress to get this resolved afaics. So lets
> bring Michael into the mix, he might be able to help out.

Sorry I didn't see the original forward for some reason.

I haven't seen this on my G5, but its hard drive is on SATA. I think
the CDROM is pata_macio, but there isn't a disk in the drive to test
with.

> BTW TWIMC: a PowerMac G5 user reported similar symptoms here
> recently: https://bugzilla.kernel.org/show_bug.cgi?id=218858

AFAICS that report is from a 4K page size kernel (Page orders: ...
virtual = 12), so there must be something else going on?

I've asked the reporter to confirm the page size.

cheers

>> Context: pata_macio initialization now fails as we enforce that the
>> segment size is set properly.
>> 
>> On Wed, May 15, 2024 at 04:52:29PM -0700, Guenter Roeck wrote:
>>> pata_macio_common_init() Calling ata_host_activate() with limit 65280
>>> ...
>>> max_segment_size is 65280; PAGE_SIZE is 65536; BLK_MAX_SEGMENT_SIZE is 65536
>>> WARNING: CPU: 0 PID: 12 at block/blk-settings.c:202 blk_validate_limits+0x2d4/0x364
>>> ...
>>>
>>> This is with PPC_BOOK3S_64 which selects a default page size of 64k.
>> 
>> Yeah.  Did you actually manage to use pata macio previously?  Or is
>> it just used because it's part of the pmac default config?
>> 
>>> Looking at the old code, I think it did what you suggested above,
>> 
>>> but assuming that the driver requested a lower limit on purpose that
>>> may not be the best solution.
>> 
>>> Never mind, though - I updated my test configuration to explicitly
>>> configure the page size to 4k to work around the problem. With that,
>>> please consider this report a note in case someone hits the problem
>>> on a real system (and sorry for the noise).
>> 
>> Yes, the idea behind this change was to catch such errors.  So far
>> most errors have been drivers setting lower limits than what the
>> hardware can actually handle, but I'd love to track this down.
>> 
>> If the hardware can't actually handle the lower limit we should
>> probably just fail the probe gracefully with a well-commented if
>> statement instead.


Re: [PATCH] selftests/overlayfs: Fix build error on ppc64

2024-05-29 Thread Michael Ellerman
Shuah Khan  writes:
> On 5/20/24 20:26, Michael Ellerman wrote:
>> Fix build error on ppc64:
>>dev_in_maps.c: In function ‘get_file_dev_and_inode’:
>>dev_in_maps.c:60:59: error: format ‘%llu’ expects argument of type
>>‘long long unsigned int *’, but argument 7 has type ‘__u64 *’ {aka ‘long
>>unsigned int *’} [-Werror=format=]
>> 
>> By switching to unsigned long long for u64 for ppc64 builds.
>> 
>> Signed-off-by: Michael Ellerman 
>> ---
>>   tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c | 1 +
>>   1 file changed, 1 insertion(+)
>> 
>> diff --git a/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c b/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
>> index 759f86e7d263..2862aae58b79 100644
>> --- a/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
>> +++ b/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
>> @@ -1,5 +1,6 @@
>>   // SPDX-License-Identifier: GPL-2.0
>>   #define _GNU_SOURCE
>> +#define __SANE_USERSPACE_TYPES__ // Use ll64
>>   
>>   #include 
>>   #include 
>
> Applied to linux-kselftest fixes for the next rc.

Thanks.

> Michael, If you want to take this through, let me know, I can drop this.

I'm happy for you to take this one and the others.

cheers


[PATCH v2 2/2] powerpc/uaccess: Use YZ asm constraint for ld

2024-05-29 Thread Michael Ellerman
The 'ld' instruction requires a 4-byte aligned displacement because it
is a DS-form instruction. But the "m" asm constraint doesn't enforce
that.

Add a special case of __get_user_asm2_goto() so that the "YZ" constraint
can be used for "ld".

The "Z" constraint is documented in the GCC manual PowerPC machine
constraints, and specifies a "memory operand accessed with indexed or
indirect addressing". "Y" is not documented in the manual but specifies
a "memory operand for a DS-form instruction". Using both allows the
compiler to generate a DS-form "ld" or X-form "ldx" as appropriate.

The change has to be conditional on CONFIG_PPC_KERNEL_PREFIXED because
the "Y" constraint does not guarantee 4-byte alignment when prefixed
instructions are enabled.

No build errors have been reported due to this, but the possibility is
there depending on compiler code generation decisions.

Fixes: c20beffeec3c ("powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/uaccess.h | 11 +++
 1 file changed, 11 insertions(+)

v2: Unchanged.

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 4cba724c8899..fd594bf6c6a9 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -181,8 +181,19 @@ do {                                               \
 #endif
 
 #ifdef __powerpc64__
+#ifdef CONFIG_PPC_KERNEL_PREFIXED
 #define __get_user_asm2_goto(x, addr, label)   \
__get_user_asm_goto(x, addr, label, "ld")
+#else
+#define __get_user_asm2_goto(x, addr, label)   \
+   asm_goto_output(\
+   "1: ld%U1%X1 %0, %1 # get_user\n"   \
+   EX_TABLE(1b, %l2)   \
+   : "=r" (x)  \
+   : DS_FORM_CONSTRAINT (*addr)\
+   :   \
+   : label)
+#endif // CONFIG_PPC_KERNEL_PREFIXED
 #else /* __powerpc64__ */
 #define __get_user_asm2_goto(x, addr, label)   \
asm_goto_output(\
-- 
2.45.1



[PATCH v2 1/2] powerpc/uaccess: Fix build errors seen with GCC 13/14

2024-05-29 Thread Michael Ellerman
Building ppc64le_defconfig with GCC 14 fails with assembler errors:

CC  fs/readdir.o
  /tmp/ccdQn0mD.s: Assembler messages:
  /tmp/ccdQn0mD.s:212: Error: operand out of domain (18 is not a multiple of 4)
  /tmp/ccdQn0mD.s:226: Error: operand out of domain (18 is not a multiple of 4)
  ... [6 lines]
  /tmp/ccdQn0mD.s:1699: Error: operand out of domain (18 is not a multiple of 4)

A snippet of the asm shows:

  # ../fs/readdir.c:210: unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end);
         ld 9,0(29)       # MEM[(u64 *)name_38(D) + _88 * 1], MEM[(u64 *)name_38(D) + _88 * 1]
  # 210 "../fs/readdir.c" 1
         1:      std 9,18(8)     # put_user       # *__pus_addr_52, MEM[(u64 *)name_38(D) + _88 * 1]

The 'std' instruction requires a 4-byte aligned displacement because
it is a DS-form instruction, and as the assembler says, 18 is not a
multiple of 4.
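
As a rough illustration of the rule the assembler is enforcing, the check
below models a DS-form displacement: the instruction encodes a 14-bit field
that is extended with two zero bits and sign-extended, so only multiples of
4 in the signed 16-bit range are representable. The function name is made
up for this sketch, it is not an assembler or kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The offset from the failing 'std 9,18(8)' is not 4-byte aligned. */
static_assert((18 & 3) != 0, "18 is not a multiple of 4");

/*
 * A DS-form displacement is a 14-bit field with two implied low zero
 * bits, so it must be a multiple of 4 and lie within -32768..32764.
 */
static bool ds_form_disp_ok(int32_t disp)
{
	return disp >= -32768 && disp <= 32764 && (disp & 3) == 0;
}
```

Under that model ds_form_disp_ok(18) is false while ds_form_disp_ok(16) is
true, which is why the compiler needs a constraint that lets it fall back
to an X-form instruction when the offset is not suitable.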

A similar error is seen with GCC 13 and CONFIG_UBSAN_SIGNED_WRAP=y.

The fix is to change the constraint on the memory operand to put_user(),
from "m" which is a general memory reference to "YZ".

The "Z" constraint is documented in the GCC manual PowerPC machine
constraints, and specifies a "memory operand accessed with indexed or
indirect addressing". "Y" is not documented in the manual but specifies
a "memory operand for a DS-form instruction". Using both allows the
compiler to generate a DS-form "std" or X-form "stdx" as appropriate.

The change has to be conditional on CONFIG_PPC_KERNEL_PREFIXED because
the "Y" constraint does not guarantee 4-byte alignment when prefixed
instructions are enabled.

Unfortunately clang doesn't support the "Y" constraint so that has to be
behind an ifdef.

Although the build error is only seen with GCC 13/14, that appears
to just be luck. The constraint has been incorrect since it was first
added.

Fixes: c20beffeec3c ("powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()")
Cc: sta...@vger.kernel.org # v5.10+
Suggested-by: Kewen Lin 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/uaccess.h | 16 
 1 file changed, 16 insertions(+)

v2: Update changelog to mention breakage with GCC 13 and UBSAN.

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index de10437fd206..4cba724c8899 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -92,9 +92,25 @@ __pu_failed:                                         \
: label)
 #endif
 
+#ifdef CONFIG_CC_IS_CLANG
+#define DS_FORM_CONSTRAINT "Z<>"
+#else
+#define DS_FORM_CONSTRAINT "YZ<>"
+#endif
+
 #ifdef __powerpc64__
+#ifdef CONFIG_PPC_KERNEL_PREFIXED
 #define __put_user_asm2_goto(x, ptr, label)\
__put_user_asm_goto(x, ptr, label, "std")
+#else
+#define __put_user_asm2_goto(x, addr, label)   \
+   asm goto ("1: std%U1%X1 %0,%1   # put_user\n"   \
+   EX_TABLE(1b, %l2)   \
+   :   \
+   : "r" (x), DS_FORM_CONSTRAINT (*addr)   \
+   :   \
+   : label)
+#endif // CONFIG_PPC_KERNEL_PREFIXED
 #else /* __powerpc64__ */
 #define __put_user_asm2_goto(x, addr, label)   \
asm goto(   \
-- 
2.45.1



Re: [PATCH] selftests/openat2: Fix build warnings on ppc64

2024-05-29 Thread Michael Ellerman
Muhammad Usama Anjum  writes:
> I was looking at if we can add this flag for ppc64 for all selftests
> somewhere. But there isn't any suitable place other than in KHDR_INCLUDES.
> But there is a series already trying to add _GNU_SOURCE to it.

IMHO adding other flags to KHDR_INCLUDES is not the right solution, it
conflates unrelated things. Some tests may want the kernel headers but
not _GNU_SOURCE, or vice versa.

Adding a separate define for "standard kselftest flags" would be
preferable, and then something like __SANE_USERSPACE_TYPES__ would make
sense being added to it.
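
For anyone following along, the underlying problem is that on ppc64 the
userspace __u64 is "unsigned long" unless __SANE_USERSPACE_TYPES__ is
defined before the kernel headers are included, so "%llu"/"%llX" formats
trip -Wformat. The sketch below shows a different way of sidestepping the
same warning, an explicit cast, using uint64_t so that it builds without
kernel headers. The helper name is hypothetical, and it is not what the
patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit value as hex. The cast makes the argument match %llX
 * whether the platform's 64-bit typedef is 'unsigned long' or
 * 'unsigned long long'.
 */
static int format_u64_hex(char *buf, size_t len, uint64_t val)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%llX", (unsigned long long)val);
}
```

The define has the advantage of fixing every format string in a test at
once, which is presumably why it is the preferred fix for the selftests.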

> Reviewed-by: Muhammad Usama Anjum 

Thanks.

cheers

> On 5/20/24 8:03 PM, Michael Ellerman wrote:
>> Fix warnings like:
>> 
>>   openat2_test.c: In function ‘test_openat2_flags’:
>>   openat2_test.c:303:73: warning: format ‘%llX’ expects argument of type
>>   ‘long long unsigned int’, but argument 5 has type ‘__u64’ {aka ‘long
>>   unsigned int’} [-Wformat=]
>> 
>> By switching to unsigned long long for u64 for ppc64 builds.
>> 
>> Signed-off-by: Michael Ellerman 
>> ---
>>  tools/testing/selftests/openat2/openat2_test.c | 1 +
>>  1 file changed, 1 insertion(+)
>> 
>> diff --git a/tools/testing/selftests/openat2/openat2_test.c b/tools/testing/selftests/openat2/openat2_test.c
>> index 9024754530b2..5790ab446527 100644
>> --- a/tools/testing/selftests/openat2/openat2_test.c
>> +++ b/tools/testing/selftests/openat2/openat2_test.c
>> @@ -5,6 +5,7 @@
>>   */
>>  
>>  #define _GNU_SOURCE
>> +#define __SANE_USERSPACE_TYPES__ // Use ll64
>>  #include 
>>  #include 
>>  #include 
>
> -- 
> BR,
> Muhammad Usama Anjum


Re: [PATCH v2 2/2] powerpc/configs: Update defconfig with now user-visible CONFIG_FSL_IFC

2024-05-29 Thread Michael Ellerman
Esben Haabendal  writes:
> Krzysztof Kozlowski  writes:
>
>> On 28/05/2024 14:28, Esben Haabendal wrote:
>>> With CONFIG_FSL_IFC now being user-visible, and thus changed from a select
>>> to depends in CONFIG_MTD_NAND_FSL_IFC, the dependencies needs to be
>>> selected in config snippets.
>>> 
>>> Signed-off-by: Esben Haabendal 
>>> ---
>>>  arch/powerpc/configs/85xx-hw.config | 2 ++
>>>  1 file changed, 2 insertions(+)
>>> 
>>> diff --git a/arch/powerpc/configs/85xx-hw.config b/arch/powerpc/configs/85xx-hw.config
>>> index 524db76f47b7..8aff83217397 100644
>>> --- a/arch/powerpc/configs/85xx-hw.config
>>> +++ b/arch/powerpc/configs/85xx-hw.config
>>> @@ -24,6 +24,7 @@ CONFIG_FS_ENET=y
>>>  CONFIG_FSL_CORENET_CF=y
>>>  CONFIG_FSL_DMA=y
>>>  CONFIG_FSL_HV_MANAGER=y
>>> +CONFIG_FSL_IFC=y
>>
>> Does not look like placed according to config order.
>
> Correct.
>
>> This is not alphabetically sorted, but as Kconfig creates it (make
>> savedefconfig).
>
> Are you sure about this?
>
> It looks very much alphabetically sorted, with only two "errors"
>
> $ diff -u 85xx-hw.config 85xx-hw.config.sorted 
> --- 85xx-hw.config  2024-05-28 15:05:44.665354428 +0200
> +++ 85xx-hw.config.sorted   2024-05-28 15:05:56.102019081 +0200
> @@ -15,8 +15,8 @@
>  CONFIG_DMADEVICES=y
>  CONFIG_E1000E=y
>  CONFIG_E1000=y
> -CONFIG_EDAC=y
>  CONFIG_EDAC_MPC85XX=y
> +CONFIG_EDAC=y
>  CONFIG_EEPROM_AT24=y
>  CONFIG_EEPROM_LEGACY=y
>  CONFIG_FB_FSL_DIU=y
> @@ -71,10 +71,10 @@
>  CONFIG_MTD_CMDLINE_PARTS=y
>  CONFIG_MTD_NAND_FSL_ELBC=y
>  CONFIG_MTD_NAND_FSL_IFC=y
> -CONFIG_MTD_RAW_NAND=y
>  CONFIG_MTD_PHYSMAP_OF=y
>  CONFIG_MTD_PHYSMAP=y
>  CONFIG_MTD_PLATRAM=y
> +CONFIG_MTD_RAW_NAND=y
>  CONFIG_MTD_SPI_NOR=y
>  CONFIG_NETDEVICES=y
>  CONFIG_NVRAM=y
>
> I don't think that this file has ever been Kconfig sorted since it was
> created back in ancient times.
>
> And as it is merged with other config snippets using merge_into_defconfig
> function. I have no idea how to use savedefconfig to maintain such a snippet.
> It would require doing the reverse of the merge_into_defconfig.

Right. This is a config fragment, not a full config, so it's not managed
with savedefconfig.

Alphabetical order is preferable when adding new symbols.

cheers


[PATCH 3/6] powerpc/64e: Drop E500 ifdefs in 64-bit code

2024-05-24 Thread Michael Ellerman
All 64-bit Book3E have E500=y, so drop the unneeded ifdefs.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/nohash/tlb_64e.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/arch/powerpc/mm/nohash/tlb_64e.c b/arch/powerpc/mm/nohash/tlb_64e.c
index 7d5506d23eab..9db85ee9ba5b 100644
--- a/arch/powerpc/mm/nohash/tlb_64e.c
+++ b/arch/powerpc/mm/nohash/tlb_64e.c
@@ -85,7 +85,6 @@ static void __init setup_page_sizes(void)
unsigned int eptcfg;
int psize;
 
-#ifdef CONFIG_PPC_E500
unsigned int mmucfg = mfspr(SPRN_MMUCFG);
int fsl_mmu = mmu_has_feature(MMU_FTR_TYPE_FSL_E);
 
@@ -151,7 +150,6 @@ static void __init setup_page_sizes(void)
 
goto out;
}
-#endif
 out:
/* Cleanup array and print summary */
pr_info("MMU: Supported page sizes\n");
@@ -180,13 +178,11 @@ static void __init setup_mmu_htw(void)
 */
 
switch (book3e_htw_mode) {
-#ifdef CONFIG_PPC_E500
case PPC_HTW_E6500:
extlb_level_exc = EX_TLB_SIZE;
patch_exception(0x1c0, exc_data_tlb_miss_e6500_book3e);
patch_exception(0x1e0, exc_instruction_tlb_miss_e6500_book3e);
break;
-#endif
}
pr_info("MMU: Book3E HW tablewalk %s\n",
book3e_htw_mode != PPC_HTW_NONE ? "enabled" : "not supported");
@@ -217,7 +213,6 @@ static void early_init_this_mmu(void)
}
mtspr(SPRN_MAS4, mas4);
 
-#ifdef CONFIG_PPC_E500
if (mmu_has_feature(MMU_FTR_TYPE_FSL_E)) {
unsigned int num_cams;
bool map = true;
@@ -238,7 +233,6 @@ static void early_init_this_mmu(void)
linear_map_top = map_mem_in_cams(linear_map_top,
 num_cams, false, true);
}
-#endif
 
/* A sync won't hurt us after mucking around with
 * the MMU configuration
@@ -270,7 +264,6 @@ static void __init early_init_mmu_global(void)
/* Look for HW tablewalk support */
setup_mmu_htw();
 
-#ifdef CONFIG_PPC_E500
if (mmu_has_feature(MMU_FTR_TYPE_FSL_E)) {
if (book3e_htw_mode == PPC_HTW_NONE) {
extlb_level_exc = EX_TLB_SIZE;
@@ -279,7 +272,6 @@ static void __init early_init_mmu_global(void)
exc_instruction_tlb_miss_bolted_book3e);
}
}
-#endif
 
/* Set the global containing the top of the linear mapping
 * for use by the TLB miss code
@@ -291,7 +283,6 @@ static void __init early_init_mmu_global(void)
 
 static void __init early_mmu_set_memory_limit(void)
 {
-#ifdef CONFIG_PPC_E500
if (mmu_has_feature(MMU_FTR_TYPE_FSL_E)) {
/*
 * Limit memory so we dont have linear faults.
@@ -302,7 +293,6 @@ static void __init early_mmu_set_memory_limit(void)
 */
memblock_enforce_memory_limit(linear_map_top);
}
-#endif
 
memblock_set_current_limit(linear_map_top);
 }
@@ -340,7 +330,6 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
 * We crop it to the size of the first MEMBLOCK to
 * avoid going over total available memory just in case...
 */
-#ifdef CONFIG_PPC_E500
if (early_mmu_has_feature(MMU_FTR_TYPE_FSL_E)) {
unsigned long linear_sz;
unsigned int num_cams;
@@ -353,7 +342,6 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
 
ppc64_rma_size = min_t(u64, linear_sz, 0x4000);
} else
-#endif
ppc64_rma_size = min_t(u64, first_memblock_size, 0x4000);
 
/* Finally limit subsequent allocations */
-- 
2.45.1



[PATCH 2/6] powerpc/64e: Split out nohash Book3E 64-bit code

2024-05-24 Thread Michael Ellerman
A reasonable chunk of nohash/tlb.c is 64-bit only code, split it out
into a separate file.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/nohash/Makefile  |   2 +-
 arch/powerpc/mm/nohash/tlb.c | 343 +
 arch/powerpc/mm/nohash/tlb_64e.c | 361 +++
 3 files changed, 363 insertions(+), 343 deletions(-)
 create mode 100644 arch/powerpc/mm/nohash/tlb_64e.c

diff --git a/arch/powerpc/mm/nohash/Makefile b/arch/powerpc/mm/nohash/Makefile
index f3894e79d5f7..24b445a5fcac 100644
--- a/arch/powerpc/mm/nohash/Makefile
+++ b/arch/powerpc/mm/nohash/Makefile
@@ -3,7 +3,7 @@
 ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
 
 obj-y  += mmu_context.o tlb.o tlb_low.o kup.o
-obj-$(CONFIG_PPC_BOOK3E_64)+= tlb_low_64e.o book3e_pgtable.o
+obj-$(CONFIG_PPC_BOOK3E_64)+= tlb_64e.o tlb_low_64e.o book3e_pgtable.o
 obj-$(CONFIG_40x)  += 40x.o
 obj-$(CONFIG_44x)  += 44x.o
 obj-$(CONFIG_PPC_8xx)  += 8xx.o
diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index a5bb87ec8578..f57dc721d063 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -110,28 +110,6 @@ struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
 };
 #endif
 
-/* The variables below are currently only used on 64-bit Book3E
- * though this will probably be made common with other nohash
- * implementations at some point
- */
-#ifdef CONFIG_PPC64
-
-int mmu_pte_psize; /* Page size used for PTE pages */
-int mmu_vmemmap_psize; /* Page size used for the virtual mem map */
-int book3e_htw_mode;   /* HW tablewalk?  Value is PPC_HTW_* */
-unsigned long linear_map_top;  /* Top of linear mapping */
-
-
-/*
- * Number of bytes to add to SPRN_SPRG_TLB_EXFRAME on crit/mcheck/debug
- * exceptions.  This is used for bolted and e6500 TLB miss handlers which
- * do not modify this SPRG in the TLB miss code; for other TLB miss handlers,
- * this is set to zero.
- */
-int extlb_level_exc;
-
-#endif /* CONFIG_PPC64 */
-
 #ifdef CONFIG_PPC_E500
 /* next_tlbcam_idx is used to round-robin tlbcam entry assignment */
 DEFINE_PER_CPU(int, next_tlbcam_idx);
@@ -358,326 +336,7 @@ void tlb_flush(struct mmu_gather *tlb)
flush_tlb_mm(tlb->mm);
 }
 
-/*
- * Below are functions specific to the 64-bit variant of Book3E though that
- * may change in the future
- */
-
-#ifdef CONFIG_PPC64
-
-/*
- * Handling of virtual linear page tables or indirect TLB entries
- * flushing when PTE pages are freed
- */
-void tlb_flush_pgtable(struct mmu_gather *tlb, unsigned long address)
-{
-   int tsize = mmu_psize_defs[mmu_pte_psize].enc;
-
-   if (book3e_htw_mode != PPC_HTW_NONE) {
-   unsigned long start = address & PMD_MASK;
-   unsigned long end = address + PMD_SIZE;
-   unsigned long size = 1UL << mmu_psize_defs[mmu_pte_psize].shift;
-
-   /* This isn't the most optimal, ideally we would factor out the
-* while preempt & CPU mask mucking around, or even the IPI but
-* it will do for now
-*/
-   while (start < end) {
-   __flush_tlb_page(tlb->mm, start, tsize, 1);
-   start += size;
-   }
-   } else {
-   unsigned long rmask = 0xf000ul;
-   unsigned long rid = (address & rmask) | 0x1000ul;
-   unsigned long vpte = address & ~rmask;
-
-   vpte = (vpte >> (PAGE_SHIFT - 3)) & ~0xffful;
-   vpte |= rid;
-   __flush_tlb_page(tlb->mm, vpte, tsize, 0);
-   }
-}
-
-static void __init setup_page_sizes(void)
-{
-   unsigned int tlb0cfg;
-   unsigned int eptcfg;
-   int psize;
-
-#ifdef CONFIG_PPC_E500
-   unsigned int mmucfg = mfspr(SPRN_MMUCFG);
-   int fsl_mmu = mmu_has_feature(MMU_FTR_TYPE_FSL_E);
-
-   if (fsl_mmu && (mmucfg & MMUCFG_MAVN) == MMUCFG_MAVN_V1) {
-   unsigned int tlb1cfg = mfspr(SPRN_TLB1CFG);
-   unsigned int min_pg, max_pg;
-
-   min_pg = (tlb1cfg & TLBnCFG_MINSIZE) >> TLBnCFG_MINSIZE_SHIFT;
-   max_pg = (tlb1cfg & TLBnCFG_MAXSIZE) >> TLBnCFG_MAXSIZE_SHIFT;
-
-   for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
-   struct mmu_psize_def *def;
-   unsigned int shift;
-
-   def = &mmu_psize_defs[psize];
-   shift = def->shift;
-
-   if (shift == 0 || shift & 1)
-   continue;
-
-   /* adjust to be in terms of 4^shift Kb */
-   shift = (shift - 10) >> 1;
-
-   if ((shift >= min_pg) && (shift <= max_pg))
-   def->flags |= M

[PATCH 6/6] powerpc/64e: Drop unused TLB miss handlers

2024-05-24 Thread Michael Ellerman
There are two possibilities for book3e_htw_mode, PPC_HTW_E6500 or
PPC_HTW_NONE.

The TLB miss handlers are patched to use, respectively:
  - exc_[data|instruction]_tlb_miss_e6500_book3e
  - exc_[data|instruction]_tlb_miss_bolted_book3e

Which means the default handlers are never used. Remove those, and use
the bolted handlers (PPC_HTW_NONE) by default.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/exceptions-64e.S |   4 +-
 arch/powerpc/mm/nohash/tlb_64e.c |   4 -
 arch/powerpc/mm/nohash/tlb_low_64e.S | 226 ---
 3 files changed, 2 insertions(+), 232 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index dcf0591ad3c2..63f6b9f513a4 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -485,8 +485,8 @@ interrupt_base_book3e:  /* fake trap */
EXCEPTION_STUB(0x160, decrementer)  /* 0x0900 */
EXCEPTION_STUB(0x180, fixed_interval)   /* 0x0980 */
EXCEPTION_STUB(0x1a0, watchdog) /* 0x09f0 */
-   EXCEPTION_STUB(0x1c0, data_tlb_miss)
-   EXCEPTION_STUB(0x1e0, instruction_tlb_miss)
+   EXCEPTION_STUB(0x1c0, data_tlb_miss_bolted)
+   EXCEPTION_STUB(0x1e0, instruction_tlb_miss_bolted)
EXCEPTION_STUB(0x200, altivec_unavailable)
EXCEPTION_STUB(0x220, altivec_assist)
EXCEPTION_STUB(0x260, perfmon)
diff --git a/arch/powerpc/mm/nohash/tlb_64e.c b/arch/powerpc/mm/nohash/tlb_64e.c
index d83ecf466929..053128a5636c 100644
--- a/arch/powerpc/mm/nohash/tlb_64e.c
+++ b/arch/powerpc/mm/nohash/tlb_64e.c
@@ -244,10 +244,6 @@ static void __init early_init_mmu_global(void)
patch_exception(0x1c0, exc_data_tlb_miss_e6500_book3e);
patch_exception(0x1e0, exc_instruction_tlb_miss_e6500_book3e);
break;
-   case PPC_HTW_NONE:
-   patch_exception(0x1c0, exc_data_tlb_miss_bolted_book3e);
-   patch_exception(0x1e0, exc_instruction_tlb_miss_bolted_book3e);
-   break;
}
 
pr_info("MMU: Book3E HW tablewalk %s\n",
diff --git a/arch/powerpc/mm/nohash/tlb_low_64e.S b/arch/powerpc/mm/nohash/tlb_low_64e.S
index b0eb3f7eaed1..a54e7d6c3d0b 100644
--- a/arch/powerpc/mm/nohash/tlb_low_64e.S
+++ b/arch/powerpc/mm/nohash/tlb_low_64e.S
@@ -511,232 +511,6 @@ itlb_miss_fault_e6500:
tlb_epilog_bolted
b   exc_instruction_storage_book3e
 
-/**
- **
- * TLB miss handling for Book3E with TLB reservation and HES support  *
- **
- **/
-
-
-/* Data TLB miss */
-   START_EXCEPTION(data_tlb_miss)
-   TLB_MISS_PROLOG
-
-   /* Now we handle the fault proper. We only save DEAR in normal
-* fault case since that's the only interesting values here.
-* We could probably also optimize by not saving SRR0/1 in the
-* linear mapping case but I'll leave that for later
-*/
-   mfspr   r14,SPRN_ESR
-   mfspr   r16,SPRN_DEAR   /* get faulting address */
-   srdir15,r16,44  /* get region */
-   xoris   r15,r15,0xc
-   cmpldi  cr0,r15,0   /* linear mapping ? */
-   beq tlb_load_linear /* yes -> go to linear map load */
-   cmpldi  cr1,r15,1   /* vmalloc mapping ? */
-
-   /* The page tables are mapped virtually linear. At this point, though,
-* we don't know whether we are trying to fault in a first level
-* virtual address or a virtual page table address. We can get that
-* from bit 0x1 of the region ID which we have set for a page table
-*/
-   andis.  r10,r15,0x1
-   bne-virt_page_table_tlb_miss
-
-   std r14,EX_TLB_ESR(r12);/* save ESR */
-   std r16,EX_TLB_DEAR(r12);   /* save DEAR */
-
-/* We need _PAGE_PRESENT and  _PAGE_ACCESSED set */
-   li  r11,_PAGE_PRESENT
-   orisr11,r11,_PAGE_ACCESSED@h
-
-   /* We do the user/kernel test for the PID here along with the RW test
-*/
-   srdi.   r15,r16,60  /* Check for user region */
-
-   /* We pre-test some combination of permissions to avoid double
-* faults:
-*
-* We move the ESR:ST bit into the position of _PAGE_BAP_SW in the PTE
-* ESR_ST   is 0x0080
-* _PAGE_BAP_SW is 0x0010
-* So the shift is >> 19. This tests for supervisor writeability.
-* If the page happens to be supervisor writeable and not user
-* writeable, we will take a new fault later, but that should be
-* a rare enough case.
-*
-* We also move ES

[PATCH 1/6] powerpc/64e: Remove unused IBM HTW code

2024-05-24 Thread Michael Ellerman
The nohash HTW_IBM (Hardware Table Walk) code is unused since support
for A2 was removed in commit fb5a515704d7 ("powerpc: Remove platforms/
wsp and associated pieces") (2014).

The remaining supported CPUs use either no HTW (data_tlb_miss_bolted),
or the e6500 HTW (data_tlb_miss_e6500).

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/nohash/mmu-e500.h |   3 +-
 arch/powerpc/mm/nohash/tlb.c   |  57 +-
 arch/powerpc/mm/nohash/tlb_low_64e.S   | 195 -
 3 files changed, 2 insertions(+), 253 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/mmu-e500.h b/arch/powerpc/include/asm/nohash/mmu-e500.h
index 6ddced0415cb..7dc24b8632d7 100644
--- a/arch/powerpc/include/asm/nohash/mmu-e500.h
+++ b/arch/powerpc/include/asm/nohash/mmu-e500.h
@@ -303,8 +303,7 @@ extern unsigned long linear_map_top;
 extern int book3e_htw_mode;
 
 #define PPC_HTW_NONE   0
-#define PPC_HTW_IBM1
-#define PPC_HTW_E6500  2
+#define PPC_HTW_E6500  1
 
 /*
  * 64-bit booke platforms don't load the tlb in the tlb miss handler code.
diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index 5ffa0af4328a..a5bb87ec8578 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -400,9 +400,8 @@ void tlb_flush_pgtable(struct mmu_gather *tlb, unsigned long address)
 static void __init setup_page_sizes(void)
 {
unsigned int tlb0cfg;
-   unsigned int tlb0ps;
unsigned int eptcfg;
-   int i, psize;
+   int psize;
 
 #ifdef CONFIG_PPC_E500
unsigned int mmucfg = mfspr(SPRN_MMUCFG);
@@ -471,50 +470,6 @@ static void __init setup_page_sizes(void)
goto out;
}
 #endif
-
-   tlb0cfg = mfspr(SPRN_TLB0CFG);
-   tlb0ps = mfspr(SPRN_TLB0PS);
-   eptcfg = mfspr(SPRN_EPTCFG);
-
-   /* Look for supported direct sizes */
-   for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
-   struct mmu_psize_def *def = &mmu_psize_defs[psize];
-
-   if (tlb0ps & (1U << (def->shift - 10)))
-   def->flags |= MMU_PAGE_SIZE_DIRECT;
-   }
-
-   /* Indirect page sizes supported ? */
-   if ((tlb0cfg & TLBnCFG_IND) == 0 ||
-   (tlb0cfg & TLBnCFG_PT) == 0)
-   goto out;
-
-   book3e_htw_mode = PPC_HTW_IBM;
-
-   /* Now, we only deal with one IND page size for each
-* direct size. Hopefully all implementations today are
-* unambiguous, but we might want to be careful in the
-* future.
-*/
-   for (i = 0; i < 3; i++) {
-   unsigned int ps, sps;
-
-   sps = eptcfg & 0x1f;
-   eptcfg >>= 5;
-   ps = eptcfg & 0x1f;
-   eptcfg >>= 5;
-   if (!ps || !sps)
-   continue;
-   for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
-   struct mmu_psize_def *def = &mmu_psize_defs[psize];
-
-   if (ps == (def->shift - 10))
-   def->flags |= MMU_PAGE_SIZE_INDIRECT;
-   if (sps == (def->shift - 10))
-   def->ind = ps + 10;
-   }
-   }
-
 out:
/* Cleanup array and print summary */
pr_info("MMU: Supported page sizes\n");
@@ -543,10 +498,6 @@ static void __init setup_mmu_htw(void)
 */
 
switch (book3e_htw_mode) {
-   case PPC_HTW_IBM:
-   patch_exception(0x1c0, exc_data_tlb_miss_htw_book3e);
-   patch_exception(0x1e0, exc_instruction_tlb_miss_htw_book3e);
-   break;
 #ifdef CONFIG_PPC_E500
case PPC_HTW_E6500:
extlb_level_exc = EX_TLB_SIZE;
@@ -577,12 +528,6 @@ static void early_init_this_mmu(void)
mmu_pte_psize = MMU_PAGE_2M;
break;
 
-   case PPC_HTW_IBM:
-   mas4 |= MAS4_INDD;
-   mas4 |= BOOK3E_PAGESZ_1M << MAS4_TSIZED_SHIFT;
-   mmu_pte_psize = MMU_PAGE_1M;
-   break;
-
case PPC_HTW_NONE:
mas4 |= BOOK3E_PAGESZ_4K << MAS4_TSIZED_SHIFT;
mmu_pte_psize = mmu_virtual_psize;
diff --git a/arch/powerpc/mm/nohash/tlb_low_64e.S b/arch/powerpc/mm/nohash/tlb_low_64e.S
index 7e0b8fe1c279..b0eb3f7eaed1 100644
--- a/arch/powerpc/mm/nohash/tlb_low_64e.S
+++ b/arch/powerpc/mm/nohash/tlb_low_64e.S
@@ -893,201 +893,6 @@ virt_page_table_tlb_miss_whacko_fault:
TLB_MISS_EPILOG_ERROR
b   exc_data_storage_book3e
 
-
-/**
- **
- * TLB miss handling for Book3E with hw page table support*
- **
- **/
-
-
-/* Data TLB miss */
-   START

[PATCH 5/6] powerpc/64e: Consolidate TLB miss handler patching

2024-05-24 Thread Michael Ellerman
The 64e TLB miss handler patching is done in setup_mmu_htw(), and then
again immediately afterward in early_init_mmu_global(). Consolidate it
into a single location.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/nohash/tlb_64e.c | 38 +---
 1 file changed, 15 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/mm/nohash/tlb_64e.c b/arch/powerpc/mm/nohash/tlb_64e.c
index 21c4b2442fcf..d83ecf466929 100644
--- a/arch/powerpc/mm/nohash/tlb_64e.c
+++ b/arch/powerpc/mm/nohash/tlb_64e.c
@@ -169,24 +169,6 @@ static void __init setup_page_sizes(void)
}
 }
 
-static void __init setup_mmu_htw(void)
-{
-   /*
-* If we want to use HW tablewalk, enable it by patching the TLB miss
-* handlers to branch to the one dedicated to it.
-*/
-
-   switch (book3e_htw_mode) {
-   case PPC_HTW_E6500:
-   extlb_level_exc = EX_TLB_SIZE;
-   patch_exception(0x1c0, exc_data_tlb_miss_e6500_book3e);
-   patch_exception(0x1e0, exc_instruction_tlb_miss_e6500_book3e);
-   break;
-   }
-   pr_info("MMU: Book3E HW tablewalk %s\n",
-   book3e_htw_mode != PPC_HTW_NONE ? "enabled" : "not supported");
-}
-
 /*
  * Early initialization of the MMU TLB code
  */
@@ -252,15 +234,25 @@ static void __init early_init_mmu_global(void)
/* Look for supported page sizes */
setup_page_sizes();
 
-   /* Look for HW tablewalk support */
-   setup_mmu_htw();
-
-   if (book3e_htw_mode == PPC_HTW_NONE) {
-   extlb_level_exc = EX_TLB_SIZE;
+   /*
+* If we want to use HW tablewalk, enable it by patching the TLB miss
+* handlers to branch to the one dedicated to it.
+*/
+   extlb_level_exc = EX_TLB_SIZE;
+   switch (book3e_htw_mode) {
+   case PPC_HTW_E6500:
+   patch_exception(0x1c0, exc_data_tlb_miss_e6500_book3e);
+   patch_exception(0x1e0, exc_instruction_tlb_miss_e6500_book3e);
+   break;
+   case PPC_HTW_NONE:
patch_exception(0x1c0, exc_data_tlb_miss_bolted_book3e);
patch_exception(0x1e0, exc_instruction_tlb_miss_bolted_book3e);
+   break;
}
 
+   pr_info("MMU: Book3E HW tablewalk %s\n",
+   book3e_htw_mode != PPC_HTW_NONE ? "enabled" : "not supported");
+
/* Set the global containing the top of the linear mapping
 * for use by the TLB miss code
 */
-- 
2.45.1



[PATCH 4/6] powerpc/64e: Drop MMU_FTR_TYPE_FSL_E checks in 64-bit code

2024-05-24 Thread Michael Ellerman
All 64-bit Book3E have MMU_FTR_TYPE_FSL_E, since A2 was removed, so
remove checks for it in 64-bit only code.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/setup_64.c   |  6 +-
 arch/powerpc/mm/nohash/tlb_64e.c | 97 
 2 files changed, 38 insertions(+), 65 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 2f19d5e94485..12ca0bf27045 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -696,11 +696,7 @@ __init u64 ppc64_bolted_size(void)
 {
 #ifdef CONFIG_PPC_BOOK3E_64
/* Freescale BookE bolts the entire linear mapping */
-   /* XXX: BookE ppc64_rma_limit setup seems to disagree? */
-   if (early_mmu_has_feature(MMU_FTR_TYPE_FSL_E))
-   return linear_map_top;
-   /* Other BookE, we assume the first GB is bolted */
-   return 1ul << 30;
+   return linear_map_top;
 #else
/* BookS radix, does not take faults on linear mapping */
if (early_radix_enabled())
diff --git a/arch/powerpc/mm/nohash/tlb_64e.c b/arch/powerpc/mm/nohash/tlb_64e.c
index 9db85ee9ba5b..21c4b2442fcf 100644
--- a/arch/powerpc/mm/nohash/tlb_64e.c
+++ b/arch/powerpc/mm/nohash/tlb_64e.c
@@ -86,9 +86,8 @@ static void __init setup_page_sizes(void)
int psize;
 
unsigned int mmucfg = mfspr(SPRN_MMUCFG);
-   int fsl_mmu = mmu_has_feature(MMU_FTR_TYPE_FSL_E);
 
-   if (fsl_mmu && (mmucfg & MMUCFG_MAVN) == MMUCFG_MAVN_V1) {
+   if ((mmucfg & MMUCFG_MAVN) == MMUCFG_MAVN_V1) {
unsigned int tlb1cfg = mfspr(SPRN_TLB1CFG);
unsigned int min_pg, max_pg;
 
@@ -115,7 +114,7 @@ static void __init setup_page_sizes(void)
goto out;
}
 
-   if (fsl_mmu && (mmucfg & MMUCFG_MAVN) == MMUCFG_MAVN_V2) {
+   if ((mmucfg & MMUCFG_MAVN) == MMUCFG_MAVN_V2) {
u32 tlb1cfg, tlb1ps;
 
tlb0cfg = mfspr(SPRN_TLB0CFG);
@@ -213,26 +212,24 @@ static void early_init_this_mmu(void)
}
mtspr(SPRN_MAS4, mas4);
 
-   if (mmu_has_feature(MMU_FTR_TYPE_FSL_E)) {
-   unsigned int num_cams;
-   bool map = true;
+   unsigned int num_cams;
+   bool map = true;
 
-   /* use a quarter of the TLBCAM for bolted linear map */
-   num_cams = (mfspr(SPRN_TLB1CFG) & TLBnCFG_N_ENTRY) / 4;
+   /* use a quarter of the TLBCAM for bolted linear map */
+   num_cams = (mfspr(SPRN_TLB1CFG) & TLBnCFG_N_ENTRY) / 4;
 
-   /*
-* Only do the mapping once per core, or else the
-* transient mapping would cause problems.
-*/
+   /*
+* Only do the mapping once per core, or else the
+* transient mapping would cause problems.
+*/
 #ifdef CONFIG_SMP
-   if (hweight32(get_tensr()) > 1)
-   map = false;
+   if (hweight32(get_tensr()) > 1)
+   map = false;
 #endif
 
-   if (map)
-   linear_map_top = map_mem_in_cams(linear_map_top,
-num_cams, false, true);
-   }
+   if (map)
+   linear_map_top = map_mem_in_cams(linear_map_top,
+num_cams, false, true);
 
/* A sync won't hurt us after mucking around with
 * the MMU configuration
@@ -242,16 +239,10 @@ static void early_init_this_mmu(void)
 
 static void __init early_init_mmu_global(void)
 {
-   /* XXX This should be decided at runtime based on supported
-* page sizes in the TLB, but for now let's assume 16M is
-* always there and a good fit (which it probably is)
-*
+   /*
 * Freescale booke only supports 4K pages in TLB0, so use that.
 */
-   if (mmu_has_feature(MMU_FTR_TYPE_FSL_E))
-   mmu_vmemmap_psize = MMU_PAGE_4K;
-   else
-   mmu_vmemmap_psize = MMU_PAGE_16M;
+   mmu_vmemmap_psize = MMU_PAGE_4K;
 
/* XXX This code only checks for TLB 0 capabilities and doesn't
 * check what page size combos are supported by the HW. It
@@ -264,13 +255,10 @@ static void __init early_init_mmu_global(void)
/* Look for HW tablewalk support */
setup_mmu_htw();
 
-   if (mmu_has_feature(MMU_FTR_TYPE_FSL_E)) {
-   if (book3e_htw_mode == PPC_HTW_NONE) {
-   extlb_level_exc = EX_TLB_SIZE;
-   patch_exception(0x1c0, exc_data_tlb_miss_bolted_book3e);
-   patch_exception(0x1e0,
-   exc_instruction_tlb_miss_bolted_book3e);
-   }
+   if (book3e_htw_mode == PPC_HTW_NONE) {
+   extlb_level_exc = EX_TLB_SIZE;
+   patch_exception(0x1c0, exc_data_tlb_miss_bolted_book3e);
+   patch_exc

Re: [RFC PATCH v2 12/20] powerpc/64e: Remove unneeded #ifdef CONFIG_PPC_E500

2024-05-24 Thread Michael Ellerman
Christophe Leroy  writes:
> When it is a nohash/64 it can't be anything else than
> CONFIG_PPC_E500 so remove the #ifdef as they are always true.

I have a series doing some similar cleanups, I'll post it. We can decide
whether to merge it before your series or combine them or whatever.

cheers


Re: [PATCH v2] powerpc/perf: Set cpumode flags using sample address

2024-05-24 Thread Michael Ellerman
Hi Anjali,

Anjali K  writes:
> Currently in some cases, when the sampled instruction address register
> latches to a specific address during sampling, there is an inconsistency
> in the privilege bits captured in the sampled event register.
 
I don't really like "inconsistency", it's vague.

The sampled address is correct, and the privilege bits are incorrect.

If someone is offended by that wording you can direct them to me :)

> For example, a snippet from the perf report on a power10 system is:
> Overhead  Address         Command       Shared Object  Symbol
> ........  ..............  ............  .............  ..................
> ...
>  2.41%    0x7fff9f94a02c  null_syscall  [unknown]      [k] 0x7fff9f94a02c
>  2.20%    0x7fff9f94a02c  null_syscall  libc.so.6      [.] syscall
>
> perf_get_misc_flags() function looks at the privilege bits to return
> the corresponding flags to be used for the address symbol and these
> privilege bit details are read from the sampled event register. In the
> above snippet, address "0x7fff9f94a02c" is shown as "k" (kernel) due
> to the inconsistent privilege bits captured in the sampled event register.
 
"incorrect privilege bits"

> To address this case, the proposed fix is to additionally check whether the
 
"To address this case check whether the"

> sampled address is in the kernel area. Since this is specific to the latest
> platform, a new pmu flag is added called "PPMU_P10" and is used to
> contain the proposed fix.

You should explain why this fix replaces the existing P10_DD1 logic.

> Signed-off-by: Anjali K 
> ---
> Changelog:
> V1->V2:
> Fixed the build warning reported by the kernel test bot
> Added a new flag PPMU_P10 and used it instead of PPMU_ARCH_31 to restrict
> the changes to the current platform (Power10)
>
>  arch/powerpc/include/asm/perf_event_server.h |  1 +
>  arch/powerpc/perf/core-book3s.c  | 43 
>  arch/powerpc/perf/power10-pmu.c  |  3 +-
>  3 files changed, 20 insertions(+), 27 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
> index e2221d29fdf9..12f7bfb4cab1 100644
> --- a/arch/powerpc/include/asm/perf_event_server.h
> +++ b/arch/powerpc/include/asm/perf_event_server.h
> @@ -90,6 +90,7 @@ struct power_pmu {
>  #define PPMU_ARCH_31 0x0200 /* Has MMCR3, SIER2 and SIER3 */
>  #define PPMU_P10_DD1 0x0400 /* Is power10 DD1 processor version */
>  #define PPMU_HAS_ATTR_CONFIG1 0x0800 /* Using config1 attribute */
> +#define PPMU_P10 0x1000 /* For power10 pmu */
  
Can you put PPMU_P10 immediately after PPMU_P10_DD1. It's OK to renumber
PPMU_HAS_ATTR_CONFIG1.
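
As a rough sketch of that ordering: the values for PPMU_ARCH_31 and
PPMU_P10_DD1 are copied from the hunk as quoted above, while the values shown
for PPMU_P10 and the renumbered PPMU_HAS_ATTR_CONFIG1 are assumptions, and
ppmu_flags_are_distinct() is a hypothetical helper that exists only to check
that no bit is reused.

```c
/* Sketch only: existing values as quoted in the hunk, new ones assumed. */
#define PPMU_ARCH_31		0x0200 /* Has MMCR3, SIER2 and SIER3 */
#define PPMU_P10_DD1		0x0400 /* Is power10 DD1 processor version */
#define PPMU_P10		0x0800 /* For power10 pmu (assumed value) */
#define PPMU_HAS_ATTR_CONFIG1	0x1000 /* Using config1 attribute (assumed renumbering) */

/* Hypothetical helper: returns 1 if no two of the flags above share a bit. */
static inline int ppmu_flags_are_distinct(void)
{
	unsigned int ored = PPMU_ARCH_31 | PPMU_P10_DD1 | PPMU_P10 |
			    PPMU_HAS_ATTR_CONFIG1;
	unsigned int summed = PPMU_ARCH_31 + PPMU_P10_DD1 + PPMU_P10 +
			      PPMU_HAS_ATTR_CONFIG1;

	/* OR and sum only agree when the bit sets are disjoint. */
	return ored == summed;
}
```
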

> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index 6b5f8a94e7d8..8a2677463a73 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -266,31 +266,12 @@ static inline u32 perf_flags_from_msr(struct pt_regs *regs)
>  static inline u32 perf_get_misc_flags(struct pt_regs *regs)
>  {
>   bool use_siar = regs_use_siar(regs);
> - unsigned long mmcra = regs->dsisr;
> - int marked = mmcra & MMCRA_SAMPLE_ENABLE;
> + unsigned long siar = mfspr(SPRN_SIAR);
 
We shouldn't read SPRN_SIAR until we know it will be used.
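
A minimal sketch of the shape that implies, not the actual kernel code: the
register read is moved below the early return so it only happens on paths
that use it. Everything with a _stub or _sketch suffix is a hypothetical
stand-in (for mfspr(SPRN_SIAR), is_kernel_addr() and perf_get_misc_flags()
respectively), and the kernel base address is an assumption made so the
example builds on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the kernel definitions, so the sketch builds on its own. */
#define PERF_RECORD_MISC_KERNEL	1
#define PERF_RECORD_MISC_USER	2
#define KERNEL_START_STUB	0xc000000000000000ULL /* assumed kernel base */

static uint64_t fake_siar;	/* value the stubbed SPR read returns */
static int siar_reads;		/* counts how often the SPR was read */

/* Stub for mfspr(SPRN_SIAR). */
static uint64_t read_siar_stub(void)
{
	siar_reads++;
	return fake_siar;
}

/* Stub for is_kernel_addr(). */
static bool is_kernel_addr_stub(uint64_t addr)
{
	return addr >= KERNEL_START_STUB;
}

/* Simplified shape of perf_get_misc_flags(): SIAR is read only when used. */
static uint32_t misc_flags_sketch(bool use_siar, uint32_t flags_from_msr)
{
	uint64_t siar;

	if (!use_siar)
		return flags_from_msr;	/* early exit, SIAR left untouched */

	siar = read_siar_stub();
	if (is_kernel_addr_stub(siar))
		return PERF_RECORD_MISC_KERNEL;
	return PERF_RECORD_MISC_USER;
}
```

With use_siar false the stubbed read counter stays at zero, which is the
property being asked for.
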

> + unsigned long addr;
>  
>   if (!use_siar)
>   return perf_flags_from_msr(regs);
>  
> - /*
> -  * Check the address in SIAR to identify the
> -  * privilege levels since the SIER[MSR_HV, MSR_PR]
> -  * bits are not set for marked events in power10
> -  * DD1.
> -  */
> - if (marked && (ppmu->flags & PPMU_P10_DD1)) {
> - unsigned long siar = mfspr(SPRN_SIAR);
> - if (siar) {
> - if (is_kernel_addr(siar))
> - return PERF_RECORD_MISC_KERNEL;
> - return PERF_RECORD_MISC_USER;
> - } else {
> - if (is_kernel_addr(regs->nip))
> - return PERF_RECORD_MISC_KERNEL;
> - return PERF_RECORD_MISC_USER;
> - }
> - }
> -
>   /*
>* If we don't have flags in MMCRA, rather than using
>* the MSR, we intuit the flags from the address in
> @@ -298,19 +279,29 @@ static inline u32 perf_get_misc_flags(struct pt_regs *regs)
>* results
>*/
>   if (ppmu->flags & PPMU_NO_SIPR) {
> - unsigned long siar = mfspr(SPRN_SIAR);
>   if (is_kernel_addr(siar))
>   return PERF_RECORD_MISC_KERNEL;
>   return PERF_RECORD_MISC_USER;
>   }
>  
>   /* PR has priority over HV, so order below is important */
> - if (regs_sipr(regs))
> - return PERF_RECORD_MISC_USER;
> -
> - if (regs_sihv(regs) && (freeze_events_kernel != MMCR0_FCHV))
> + if 

Re: [PATCH 1/2] powerpc/uaccess: Fix build errors seen with GCC 14

2024-05-24 Thread Michael Ellerman
Nick Desaulniers  writes:
> On Tue, May 21, 2024 at 5:39 AM Michael Ellerman  wrote:
>>
>> Building ppc64le_defconfig with GCC 14 fails with assembler errors:
>>
>> CC  fs/readdir.o
>>   /tmp/ccdQn0mD.s: Assembler messages:
>>   /tmp/ccdQn0mD.s:212: Error: operand out of domain (18 is not a multiple of 4)
>>   /tmp/ccdQn0mD.s:226: Error: operand out of domain (18 is not a multiple of 4)
>>   ... [6 lines]
>>   /tmp/ccdQn0mD.s:1699: Error: operand out of domain (18 is not a multiple of 4)
>>
>> A snippet of the asm shows:
>>
>>   # ../fs/readdir.c:210: unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end);
>>  ld 9,0(29)   # MEM[(u64 *)name_38(D) + _88 * 1], MEM[(u64 *)name_38(D) + _88 * 1]
>>   # 210 "../fs/readdir.c" 1
>>  1:  std 9,18(8) # put_user   # *__pus_addr_52, MEM[(u64 *)name_38(D) + _88 * 1]
>>
>> The 'std' instruction requires a 4-byte aligned displacement because
>> it is a DS-form instruction, and as the assembler says, 18 is not a
>> multiple of 4.
>>
>> The fix is to change the constraint on the memory operand to put_user(),
>> from "m" which is a general memory reference to "YZ".
>>
>> The "Z" constraint is documented in the GCC manual PowerPC machine
>> constraints, and specifies a "memory operand accessed with indexed or
>> indirect addressing". "Y" is not documented in the manual but specifies
>> a "memory operand for a DS-form instruction". Using both allows the
>> compiler to generate a DS-form "std" or X-form "stdx" as appropriate.
>>
>> The change has to be conditional on CONFIG_PPC_KERNEL_PREFIXED because
>> the "Y" constraint does not guarantee 4-byte alignment when prefixed
>> instructions are enabled.
>>
>> Unfortunately clang doesn't support the "Y" constraint so that has to be
>> behind an ifdef.
>
> Filed: https://github.com/llvm/llvm-project/issues/92939

Thanks. I will file one to have the GCC constraint documented.
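
For reference, the alignment rule behind the assembler error quoted above can
be sketched in plain C. This is only an illustration of the DS-form encoding
rule (a 14-bit field with two implied zero bits, sign-extended), not code
from the patch, and ds_form_disp_ok() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A DS-form instruction such as 'std' encodes its displacement as a 14-bit
 * field with two zero bits appended, then sign-extended. So a displacement
 * is encodable only if it fits in a signed 16-bit range and its low two
 * bits are clear.
 */
static bool ds_form_disp_ok(int32_t disp)
{
	if (disp < -32768 || disp > 32764)
		return false;		/* outside the signed 16-bit range */

	return (disp & 3) == 0;		/* must be a multiple of 4 */
}
```

The displacement 18 from the failing 'std 9,18(8)' is rejected by this check,
while 16 or 20 would be accepted.
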

cheers


Re: [RFC PATCH v2 00/20] Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)

2024-05-23 Thread Michael Ellerman
Hi Peter,

Peter Xu  writes:
> On Fri, May 17, 2024 at 08:59:54PM +0200, Christophe Leroy wrote:
>> This is the continuation of the RFC v1 series "Reimplement huge pages
>> without hugepd on powerpc 8xx". It now gets rid of hugepd completely,
>> after also handling e500 and book3s/64.
>> 
>> Unlike most architectures, powerpc 8xx HW requires a two-level
>> pagetable topology for all page sizes. So a leaf PMD-contig approach
>> is not feasible as such.

>
> Great to see this series, thanks again Christophe.
>
> I requested help at the lsfmm hugetlb unification session, but
> unfortunately I don't think there were Power people around. I'd like to
> request help from Power developers again here on the list: it would be very
> much appreciated if you could have a look at this series.

Christophe is a powerpc developer :)

I'll help where I can, but I don't know the hugepd code that well, I've
never really worked on it before. Nick will hopefully also be able to
help, he at least knows mm better than me, but he also has other work.

Hopefully we can make this series work, and replace hugepd. But if we
can't make that work then there is the possibility of just dropping
support for 16M/16G pages with HPT/4K pages.

> It's a direct dependent work to the hugetlb refactoring that we'll be
> working on, while it looks like the hugetlb refactoring is something the
> community as a whole would like to see in the near future.
>
> We don't want to add more Power-only CONFIG_ARCH_HAS_HUGEPD checks for
> hugetlb in any new code.

Yes I understand.

cheers


bnx2x: UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c

2024-05-23 Thread Michael Ellerman
Hi folks,

I'm seeing a UBSAN warning when loading the bnx2x module on my Power8 machine:

  [ cut here ]
  UBSAN: array-index-out-of-bounds in ../drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1529:11
  index 20 is out of range for type 'stats_query_entry [19]'
  CPU: 1 PID: 3870 Comm: NetworkManager Not tainted 6.9.0-dirty #17
  Hardware name: IBM,8408-E8E POWER8E (raw) 0x4b0201 0xf04 of:IBM,FW860.B3 (SV860_245) hv:phyp pSeries
  Call Trace:
dump_stack_lvl+0x80/0xe8 (unreliable)
__ubsan_handle_out_of_bounds+0xc4/0x110
bnx2x_stats_init+0x6f0/0x724 [bnx2x]
bnx2x_post_irq_nic_init+0x2bc/0x51c [bnx2x]
bnx2x_nic_load+0xd74/0x2de0 [bnx2x]
bnx2x_open+0x194/0x310 [bnx2x]
__dev_open+0x16c/0x22c
__dev_change_flags+0x258/0x2f4
dev_change_flags+0x3c/0x9c
do_setlink+0x35c/0x13b4
__rtnl_newlink+0x9b8/0xd88
rtnl_newlink+0x70/0xac
rtnetlink_rcv_msg+0x380/0x578
netlink_rcv_skb+0x80/0x190
rtnetlink_rcv+0x28/0x3c
netlink_unicast+0x2bc/0x3d4
netlink_sendmsg+0x21c/0x54c
sys_sendmsg+0x28c/0x3c0
___sys_sendmsg+0xcc/0x128
__sys_sendmsg+0x94/0xf4
system_call_exception+0x174/0x320
system_call_common+0x160/0x2c4

It seems there's some confusion about how many queues there should be. Earlier
the driver prints:
 
  [  480.692141] bnx2x 0010:01:00.0: set number of queues to 21

But the stats array only has space for 19?

Loading the driver with num_queues=18 avoids the warning.
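
The condition UBSAN is reporting can be sketched as a simple index check.
QUERY_ENTRIES is taken from the 'stats_query_entry [19]' type in the report
above; query_idx_in_range() is a hypothetical helper for illustration and is
not part of the driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Array size as reported by UBSAN: 'stats_query_entry [19]'. */
#define QUERY_ENTRIES 19

/* Hypothetical helper: true if idx is a valid index into the query array. */
static bool query_idx_in_range(size_t idx)
{
	return idx < QUERY_ENTRIES;
}
```

Index 20 from the report fails this check; the last valid index is 18.
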

This naive patch does fix it, but is probably just papering over the issue:

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index e2a4e1088b7f..7fe3562fa8a9 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1263,7 +1263,7 @@ enum {
 struct bnx2x_fw_stats_req {
struct stats_query_header hdr;
struct stats_query_entry query[FP_SB_MAX_E1x+
-   BNX2X_FIRST_QUEUE_QUERY_IDX];
+   BNX2X_FIRST_QUEUE_QUERY_IDX + 2];
 };
 
 struct bnx2x_fw_stats_data {


Full dmesg leading up to the UBSAN report below.

cheers


$ modprobe bnx2x
[  480.575366] bnx2x 0010:01:00.0: msix capability found
[  480.594616] bnx2x 0010:01:00.0: me reg PF num: 0
[  480.594747] bnx2x 0010:01:00.0: This is a physical function
[  480.594754] bnx2x 0010:01:00.0: Cnic support is on
[  480.594760] bnx2x 0010:01:00.0: Max num of status blocks 31
[  480.594766] bnx2x 0010:01:00.0: Allocated netdev with 91 tx and 31 rx queues
[  480.594781] bnx2x 0010:01:00.0: chip is in 2_PORT_MODE
[  480.594787] bnx2x 0010:01:00.0: pf_id: 0
[  480.594792] bnx2x 0010:01:00.0: chip ID is 0x168e1000
[  480.594802] bnx2x 0010:01:00.0: flash_size 0x20 (2097152)
[  480.594815] bnx2x 0010:01:00.0: shmem offset 0x3c6c80  shmem2 offset 0x3c575c
[  480.594824] bnx2x 0010:01:00.0: hw_config 0x000f0001
[  480.594831] bnx2x 0010:01:00.0: bc_ver 70A04
[  480.594939] bnx2x 0010:01:00.0: not WoL capable
[  480.594948] bnx2x 0010:01:00.0: part number 0-0-0-0
[  480.594963] bnx2x 0010:01:00.0: IGU Normal Mode
[  480.595077] bnx2x 0010:01:00.0: igu_dsb_id 0  igu_base_sb 1  igu_sb_cnt 31
   base_fw_ndsb 1
[  480.595086] bnx2x 0010:01:00.0: shmem2base 0x3c575c, size 412, mfcfg offset 
16
[  480.595096] bnx2x 0010:01:00.0: single function mode
[  480.595107] bnx2x 0010:01:00.0: lane_config 0x  speed_cap_mask0 
0x005c  link_config0 0x
[  480.595118] bnx2x: [bnx2x_phy_probe:12595(eth%d)]Begin phy probe
[  480.595124] bnx2x: [bnx2x_phy_probe:12608(eth%d)]phy_config_swapped 0, 
phy_index 0, actual_phy_idx 0
[  480.595137] bnx2x: [bnx2x_populate_int_phy:12217(eth%d)]:chip_id = 0x168e1000
[  480.595147] bnx2x: [bnx2x_populate_int_phy:12335(eth%d)]Internal phy port=0, 
addr=0x1, mdio_ctl=0x8000
[  480.595162] bnx2x: [bnx2x_phy_def_cfg:12505(eth%d)]Default config phy idx 0 
cfg 0x0 speed_cap_mask 0x5c
[  480.595171] bnx2x: [bnx2x_phy_probe:12608(eth%d)]phy_config_swapped 0, 
phy_index 1, actual_phy_idx 1
[  480.595184] bnx2x: [bnx2x_populate_ext_phy:12463(eth%d)]phy_type 0xd00 port 
0 found in index 1
[  480.595192] bnx2x: [bnx2x_populate_ext_phy:12465(eth%d)] 
addr=0x10, mdio_ctl=0x8000
[  480.595202] bnx2x: [bnx2x_phy_def_cfg:12505(eth%d)]Default config phy idx 1 
cfg 0x0 speed_cap_mask 0x5c
[  480.595210] bnx2x: [bnx2x_phy_probe:12608(eth%d)]phy_config_swapped 0, 
phy_index 2, actual_phy_idx 2
[  480.595218] bnx2x: [bnx2x_phy_probe:12658(eth%d)]End phy probe. #phys found 2
[  480.595226] bnx2x 0010:01:00.0: phy_addr 0x1
[  480.595231] bnx2x 0010:01:00.0: supported 0x70ec 0x0
[  480.595237] bnx2x 0010:01:00.0: req_line_speed 0  req_duplex 1 req_flow_ctrl 
0x0 advertising 0x70ec
[  480.595255] bnx2x 0010:01:00.0: max_iscsi_conn 0x0
[  480.595265] bnx2x 0010:01:00.0: max_fcoe_conn 0x0
[  480.595273] bnx2x 0010:01:00.0: msix_table_size 32
[  480.595278] bnx2x 0010:01:00.0: fp_array_size 31
[  480.595307] bnx2x 

Re: [PATCH next] arch: powerpc: platforms: Remove unnecessary call to of_node_get

2024-05-23 Thread Michael Ellerman
Prabhav Kumar Vaish  writes:
> `dev->of_node` has a pointer to device node, of_node_get call seems
> unnecessary.

Sorry but it is necessary.

> Signed-off-by: Prabhav Kumar Vaish 
> ---
>  arch/powerpc/platforms/cell/iommu.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/platforms/cell/iommu.c 
> b/arch/powerpc/platforms/cell/iommu.c
> index 4cd9c0de22c2..5b794ce08689 100644
> --- a/arch/powerpc/platforms/cell/iommu.c
> +++ b/arch/powerpc/platforms/cell/iommu.c
> @@ -780,14 +780,13 @@ static int __init cell_iommu_init_disabled(void)
>  static u64 cell_iommu_get_fixed_address(struct device *dev)
>  {
>   u64 cpu_addr, size, best_size, dev_addr = OF_BAD_ADDR;
> - struct device_node *np;
> + struct device_node *np = dev->of_node;
>   const u32 *ranges = NULL;
>   int i, len, best, naddr, nsize, pna, range_size;
>  
>   /* We can be called for platform devices that have no of_node */
> - np = of_node_get(dev->of_node);
>   if (!np)
> - goto out;
> + return dev_addr;
>  
>   while (1) {
>   naddr = of_n_addr_cells(np);

nsize = of_n_size_cells(np);
np = of_get_next_parent(np);
if (!np)
break;

of_get_next_parent() drops the reference of the node passed to it (np).

So if you actually tested your patch you should see a refcount underflow.
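
To illustrate, here is a minimal userspace sketch of the reference counting
involved. It is a toy model, not kernel code: struct toy_node and the toy_*
helpers are hypothetical stand-ins for struct device_node, of_node_get(),
of_node_put() and of_get_next_parent(), and the walk is simplified to always
run to the root.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct device_node: only a parent link and a refcount. */
struct toy_node {
	struct toy_node *parent;
	int refcount;
};

/* Models of_node_get(): take a reference, tolerate NULL. */
static struct toy_node *toy_node_get(struct toy_node *np)
{
	if (np)
		np->refcount++;
	return np;
}

/* Models of_node_put(): drop a reference, tolerate NULL. */
static void toy_node_put(struct toy_node *np)
{
	if (np)
		np->refcount--;
}

/*
 * Models of_get_next_parent(): return the parent with a reference held,
 * and drop the reference on the node that was passed in.
 */
static struct toy_node *toy_get_next_parent(struct toy_node *np)
{
	struct toy_node *parent = np ? toy_node_get(np->parent) : NULL;

	toy_node_put(np);
	return parent;
}

/*
 * Simplified walk to the root. 'take_ref' selects whether the walk starts
 * with toy_node_get() (existing code) or a bare pointer (proposed patch).
 */
static void toy_walk_to_root(struct toy_node *start, int take_ref)
{
	struct toy_node *np = take_ref ? toy_node_get(start) : start;

	while (np)
		np = toy_get_next_parent(np);
}

static void toy_refcount_demo(void)
{
	struct toy_node root = { NULL, 1 };
	struct toy_node dev = { &root, 1 }; /* one reference owned by the device */

	toy_walk_to_root(&dev, 1);
	assert(dev.refcount == 1 && root.refcount == 1); /* balanced */

	toy_walk_to_root(&dev, 0);
	assert(dev.refcount == 0 && root.refcount == 1); /* device's ref was stolen */
}
```

In the second walk the device's own reference is dropped by the first
toy_get_next_parent() call, which is the underflow the real code would hit.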

cheers


Re: [PATCH v2] mm/mm_init: use node's number of cpus in deferred_page_init_max_threads

2024-05-23 Thread Michael Ellerman
Eric Chanudet  writes:
> x86_64 is already using the node's cpu as maximum threads. Make that the
> default for all archs setting DEFERRED_STRUCT_PAGE_INIT.
>
> This returns to the behavior prior to making the function arch-specific
> with commit ecd096506922 ("mm: make deferred init's max threads
> arch-specific").
>
> Signed-off-by: Eric Chanudet 
>
> ---
> Setting DEFERRED_STRUCT_PAGE_INIT and testing on a few arm64 platforms
> shows faster deferred_init_memmap completions:
>
> |         | x13s        | SA8775p-ride | Ampere R137-P31 | Ampere HR330 |
> |         | Metal, 32GB | VM, 36GB     | VM, 58GB        | Metal, 128GB |
> |         | 8cpus       | 8cpus        | 8cpus           | 32cpus       |
> |---------|-------------|--------------|-----------------|--------------|
> | threads |  ms     (%) | ms       (%) |  ms         (%) |  ms      (%) |
> |---------|-------------|--------------|-----------------|--------------|
> | 1       | 108    (0%) | 72      (0%) | 224        (0%) | 324     (0%) |
> | cpus    |  24  (-77%) | 36    (-50%) |  40      (-82%) |  56   (-82%) |
>
> - v1: 
> https://lore.kernel.org/linux-arm-kernel/20240520231555.395979-5-echan...@redhat.com
> - Changes since v1:
>  - Make the generic function return the number of cpus of the node as
>    max threads limit instead of overriding it for arm64.
> - Drop Baoquan He's R-b on v1 since the logic changed.
> - Add CCs according to patch changes (ppc and s390 set
>   DEFERRED_STRUCT_PAGE_INIT by default).
>
>  arch/x86/mm/init_64.c | 12 
>  mm/mm_init.c  |  2 +-
>  2 files changed, 1 insertion(+), 13 deletions(-)

On a machine here (1TB, 40 cores, 4KB pages) the existing code gives:

  [0.500124] node 2 deferred pages initialised in 210ms
  [0.515790] node 3 deferred pages initialised in 230ms
  [0.516061] node 0 deferred pages initialised in 230ms
  [0.516522] node 7 deferred pages initialised in 230ms
  [0.516672] node 4 deferred pages initialised in 230ms
  [0.516798] node 6 deferred pages initialised in 230ms
  [0.517051] node 5 deferred pages initialised in 230ms
  [0.523887] node 1 deferred pages initialised in 240ms

vs with the patch:

  [0.379613] node 0 deferred pages initialised in 90ms
  [0.380388] node 1 deferred pages initialised in 90ms
  [0.380540] node 4 deferred pages initialised in 100ms
  [0.390239] node 6 deferred pages initialised in 100ms
  [0.390249] node 2 deferred pages initialised in 100ms
  [0.390786] node 3 deferred pages initialised in 110ms
  [0.396721] node 5 deferred pages initialised in 110ms
  [0.397095] node 7 deferred pages initialised in 110ms

Which is a nice speedup.
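
For reference, a minimal userspace sketch of the policy the patch describes:
one init thread per CPU on the node, with a floor of one. The toy_* names and
the plain bitmask standing in for a cpumask are assumptions made for the
example, not the kernel's types or API.

```c
#include <assert.h>

/* Count the CPUs set in a toy node mask (one bit per CPU). */
static unsigned int toy_cpumask_weight(unsigned long node_cpumask)
{
	unsigned int weight = 0;

	while (node_cpumask) {
		node_cpumask &= node_cpumask - 1; /* clear the lowest set bit */
		weight++;
	}
	return weight;
}

/* Use one thread per CPU on the node, but never fewer than one. */
static unsigned int toy_deferred_init_max_threads(unsigned long node_cpumask)
{
	unsigned int cpus = toy_cpumask_weight(node_cpumask);

	return cpus > 0 ? cpus : 1;
}

static void toy_deferred_init_demo(void)
{
	assert(toy_deferred_init_max_threads(0x0UL) == 1);  /* node with no CPUs */
	assert(toy_deferred_init_max_threads(0x1fUL) == 5); /* node with five CPUs */
}
```

The floor of one matters for nodes that have memory but no CPUs, which
still need a thread to initialise their pages.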

Tested-by: Michael Ellerman  (powerpc)

cheers


Re: [PATCH v5 44/68] selftests/powerpc: Drop define _GNU_SOURCE

2024-05-22 Thread Michael Ellerman
Edward Liaw  writes:
> _GNU_SOURCE is provided by lib.mk, so it should be dropped to prevent
> redefinition warnings.

Most of these tests build with -Werror, so the duplicate define is
actually a hard error. Can you put this patch earlier in the series at
least?

cheers

> Signed-off-by: Edward Liaw 
> ---
>  tools/testing/selftests/powerpc/benchmarks/context_switch.c| 2 --
>  tools/testing/selftests/powerpc/benchmarks/exec_target.c   | 2 --
>  tools/testing/selftests/powerpc/benchmarks/fork.c  | 2 --
>  tools/testing/selftests/powerpc/benchmarks/futex_bench.c   | 3 ---
>  tools/testing/selftests/powerpc/dexcr/hashchk_test.c   | 3 ---
>  tools/testing/selftests/powerpc/dscr/dscr_default_test.c   | 3 ---
>  tools/testing/selftests/powerpc/dscr/dscr_explicit_test.c  | 3 ---
>  tools/testing/selftests/powerpc/dscr/dscr_sysfs_thread_test.c  | 1 -
>  tools/testing/selftests/powerpc/mm/exec_prot.c | 2 --
>  tools/testing/selftests/powerpc/mm/pkey_exec_prot.c| 2 --
>  tools/testing/selftests/powerpc/mm/pkey_siginfo.c  | 2 --
>  tools/testing/selftests/powerpc/mm/tlbie_test.c| 2 --
>  tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c| 1 -
>  tools/testing/selftests/powerpc/pmu/count_instructions.c   | 3 ---
>  tools/testing/selftests/powerpc/pmu/count_stcx_fail.c  | 3 ---
>  tools/testing/selftests/powerpc/pmu/ebb/ebb.c  | 3 ---
>  .../testing/selftests/powerpc/pmu/ebb/instruction_count_test.c | 3 ---
>  tools/testing/selftests/powerpc/pmu/event.c| 2 --
>  tools/testing/selftests/powerpc/pmu/lib.c  | 3 ---
>  tools/testing/selftests/powerpc/pmu/per_event_excludes.c   | 3 ---
>  tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c  | 3 ---
>  tools/testing/selftests/powerpc/ptrace/ptrace-syscall.c| 2 --
>  tools/testing/selftests/powerpc/signal/sig_sc_double_restart.c | 1 -
>  tools/testing/selftests/powerpc/signal/sigreturn_kernel.c  | 3 ---
>  tools/testing/selftests/powerpc/signal/sigreturn_vdso.c| 3 ---
>  tools/testing/selftests/powerpc/syscalls/ipc_unmuxed.c | 2 --
>  tools/testing/selftests/powerpc/tm/tm-exec.c   | 2 --
>  tools/testing/selftests/powerpc/tm/tm-poison.c | 2 --
>  .../testing/selftests/powerpc/tm/tm-signal-context-force-tm.c  | 2 --
>  tools/testing/selftests/powerpc/tm/tm-signal-sigreturn-nt.c| 2 --
>  tools/testing/selftests/powerpc/tm/tm-tmspr.c  | 2 --
>  tools/testing/selftests/powerpc/tm/tm-trap.c   | 2 --
>  tools/testing/selftests/powerpc/tm/tm-unavailable.c| 2 --
>  tools/testing/selftests/powerpc/utils.c| 3 ---
>  34 files changed, 79 deletions(-)
>
> diff --git a/tools/testing/selftests/powerpc/benchmarks/context_switch.c 
> b/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> index 96554e2794d1..0b245572bd45 100644
> --- a/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> +++ b/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> @@ -4,8 +4,6 @@
>   *
>   * Copyright (C) 2015 Anton Blanchard , IBM
>   */
> -
> -#define _GNU_SOURCE
>  #include 
>  #include 
>  #include 
> diff --git a/tools/testing/selftests/powerpc/benchmarks/exec_target.c 
> b/tools/testing/selftests/powerpc/benchmarks/exec_target.c
> index c14b0fc1edde..8646540037d8 100644
> --- a/tools/testing/selftests/powerpc/benchmarks/exec_target.c
> +++ b/tools/testing/selftests/powerpc/benchmarks/exec_target.c
> @@ -5,8 +5,6 @@
>   *
>   * Copyright 2018, Anton Blanchard, IBM Corp.
>   */
> -
> -#define _GNU_SOURCE
>  #include 
>  #include 
>  
> diff --git a/tools/testing/selftests/powerpc/benchmarks/fork.c 
> b/tools/testing/selftests/powerpc/benchmarks/fork.c
> index d312e638cb37..327231646a2a 100644
> --- a/tools/testing/selftests/powerpc/benchmarks/fork.c
> +++ b/tools/testing/selftests/powerpc/benchmarks/fork.c
> @@ -5,8 +5,6 @@
>   *
>   * Copyright 2018, Anton Blanchard, IBM Corp.
>   */
> -
> -#define _GNU_SOURCE
>  #include 
>  #include 
>  #include 
> diff --git a/tools/testing/selftests/powerpc/benchmarks/futex_bench.c 
> b/tools/testing/selftests/powerpc/benchmarks/futex_bench.c
> index 017057090490..0483a13c88f9 100644
> --- a/tools/testing/selftests/powerpc/benchmarks/futex_bench.c
> +++ b/tools/testing/selftests/powerpc/benchmarks/futex_bench.c
> @@ -2,9 +2,6 @@
>  /*
>   * Copyright 2016, Anton Blanchard, Michael Ellerman, IBM Corp.
>   */
> -
> -#define _GNU_SOURCE
> -
>  #include 
>  #include 
>  #include 
> diff --git a/tools/testing/selftests/powerpc/dexcr

[PATCH 1/2] powerpc/uaccess: Fix build errors seen with GCC 14

2024-05-21 Thread Michael Ellerman
Building ppc64le_defconfig with GCC 14 fails with assembler errors:

CC  fs/readdir.o
  /tmp/ccdQn0mD.s: Assembler messages:
  /tmp/ccdQn0mD.s:212: Error: operand out of domain (18 is not a multiple of 4)
  /tmp/ccdQn0mD.s:226: Error: operand out of domain (18 is not a multiple of 4)
  ... [6 lines]
  /tmp/ccdQn0mD.s:1699: Error: operand out of domain (18 is not a multiple of 4)

A snippet of the asm shows:

  # ../fs/readdir.c:210: unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end);
     ld 9,0(29)   # MEM[(u64 *)name_38(D) + _88 * 1], MEM[(u64 *)name_38(D) + _88 * 1]
  # 210 "../fs/readdir.c" 1
     1:  std 9,18(8) # put_user   # *__pus_addr_52, MEM[(u64 *)name_38(D) + _88 * 1]

The 'std' instruction requires a 4-byte aligned displacement because
it is a DS-form instruction, and as the assembler says, 18 is not a
multiple of 4.

The fix is to change the constraint on the memory operand to put_user(),
from "m" which is a general memory reference to "YZ".

The "Z" constraint is documented in the GCC manual PowerPC machine
constraints, and specifies a "memory operand accessed with indexed or
indirect addressing". "Y" is not documented in the manual but specifies
a "memory operand for a DS-form instruction". Using both allows the
compiler to generate a DS-form "std" or X-form "stdx" as appropriate.

The change has to be conditional on CONFIG_PPC_KERNEL_PREFIXED because
the "Y" constraint does not guarantee 4-byte alignment when prefixed
instructions are enabled.

Unfortunately clang doesn't support the "Y" constraint so that has to be
behind an ifdef.

Although the build error is only seen with GCC 14, that appears to just
be luck. The constraint has been incorrect since it was first added.

Fixes: c20beffeec3c ("powerpc/uaccess: Use flexible addressing with 
__put_user()/__get_user()")
Cc: sta...@vger.kernel.org # v5.10+
Suggested-by: Kewen Lin 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/uaccess.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index de10437fd206..4cba724c8899 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -92,9 +92,25 @@ __pu_failed: 
\
: label)
 #endif
 
+#ifdef CONFIG_CC_IS_CLANG
+#define DS_FORM_CONSTRAINT "Z<>"
+#else
+#define DS_FORM_CONSTRAINT "YZ<>"
+#endif
+
 #ifdef __powerpc64__
+#ifdef CONFIG_PPC_KERNEL_PREFIXED
 #define __put_user_asm2_goto(x, ptr, label)\
__put_user_asm_goto(x, ptr, label, "std")
+#else
+#define __put_user_asm2_goto(x, addr, label)   \
+   asm goto ("1: std%U1%X1 %0,%1   # put_user\n"   \
+   EX_TABLE(1b, %l2)   \
+   :   \
+   : "r" (x), DS_FORM_CONSTRAINT (*addr)   \
+   :   \
+   : label)
+#endif // CONFIG_PPC_KERNEL_PREFIXED
 #else /* __powerpc64__ */
 #define __put_user_asm2_goto(x, addr, label)   \
asm goto(   \
-- 
2.45.1
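
As an aside, the alignment rule from the changelog above can be written as a
small standalone check. This is only a sketch of the encoding rule (a DS-form
displacement is a signed 16-bit value whose low two bits must be zero); the
helper name ds_form_disp_ok() is made up for the example and nothing like it
exists in the kernel or in GCC.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * DS-form instructions such as 'ld' and 'std' store the displacement without
 * its low two bits, so the byte offset must be a multiple of 4 and must fit
 * in the signed 16-bit range.
 */
static bool ds_form_disp_ok(long disp)
{
	return disp >= -32768 && disp <= 32764 && (disp & 3) == 0;
}

static void ds_form_demo(void)
{
	assert(!ds_form_disp_ok(18)); /* the offset the assembler rejected */
	assert(ds_form_disp_ok(16));
	assert(ds_form_disp_ok(20));
	assert(ds_form_disp_ok(0));
	assert(!ds_form_disp_ok(32768)); /* out of range even though aligned */
}
```

When the check fails the compiler has to materialise the address in a
register and use the X-form "stdx" instead, which is what allowing "Z"
alongside "Y" permits.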



[PATCH 2/2] powerpc/uaccess: Use YZ asm constraint for ld

2024-05-21 Thread Michael Ellerman
The 'ld' instruction requires a 4-byte aligned displacement because it
is a DS-form instruction. But the "m" asm constraint doesn't enforce
that.

Add a special case of __get_user_asm2_goto() so that the "YZ" constraint
can be used for "ld".

The "Z" constraint is documented in the GCC manual PowerPC machine
constraints, and specifies a "memory operand accessed with indexed or
indirect addressing". "Y" is not documented in the manual but specifies
a "memory operand for a DS-form instruction". Using both allows the
compiler to generate a DS-form "ld" or X-form "ldx" as appropriate.

The change has to be conditional on CONFIG_PPC_KERNEL_PREFIXED because
the "Y" constraint does not guarantee 4-byte alignment when prefixed
instructions are enabled.

No build errors have been reported due to this, but the possibility is
there depending on compiler code generation decisions.

Fixes: c20beffeec3c ("powerpc/uaccess: Use flexible addressing with 
__put_user()/__get_user()")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/uaccess.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index 4cba724c8899..fd594bf6c6a9 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -181,8 +181,19 @@ do {   
\
 #endif
 
 #ifdef __powerpc64__
+#ifdef CONFIG_PPC_KERNEL_PREFIXED
 #define __get_user_asm2_goto(x, addr, label)   \
__get_user_asm_goto(x, addr, label, "ld")
+#else
+#define __get_user_asm2_goto(x, addr, label)   \
+   asm_goto_output(\
+   "1: ld%U1%X1 %0, %1 # get_user\n"   \
+   EX_TABLE(1b, %l2)   \
+   : "=r" (x)  \
+   : DS_FORM_CONSTRAINT (*addr)\
+   :   \
+   : label)
+#endif // CONFIG_PPC_KERNEL_PREFIXED
 #else /* __powerpc64__ */
 #define __get_user_asm2_goto(x, addr, label)   \
asm_goto_output(\
-- 
2.45.1



[PATCH] selftests/mm: Fix build warnings on ppc64

2024-05-20 Thread Michael Ellerman
Fix warnings like:

  In file included from uffd-unit-tests.c:8:
  uffd-unit-tests.c: In function ‘uffd_poison_handle_fault’:
  uffd-common.h:45:33: warning: format ‘%llu’ expects argument of type
  ‘long long unsigned int’, but argument 3 has type ‘__u64’ {aka ‘long
  unsigned int’} [-Wformat=]

By switching to unsigned long long for u64 for ppc64 builds.

Signed-off-by: Michael Ellerman 
---
 tools/testing/selftests/mm/gup_test.c| 1 +
 tools/testing/selftests/mm/uffd-common.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/tools/testing/selftests/mm/gup_test.c 
b/tools/testing/selftests/mm/gup_test.c
index bd335cf9bc0e..bdeaac67ff9a 100644
--- a/tools/testing/selftests/mm/gup_test.c
+++ b/tools/testing/selftests/mm/gup_test.c
@@ -1,3 +1,4 @@
+#define __SANE_USERSPACE_TYPES__ // Use ll64
 #include 
 #include 
 #include 
diff --git a/tools/testing/selftests/mm/uffd-common.h 
b/tools/testing/selftests/mm/uffd-common.h
index cc5629c3d2aa..a70ae10b5f62 100644
--- a/tools/testing/selftests/mm/uffd-common.h
+++ b/tools/testing/selftests/mm/uffd-common.h
@@ -8,6 +8,7 @@
 #define __UFFD_COMMON_H__
 
 #define _GNU_SOURCE
+#define __SANE_USERSPACE_TYPES__ // Use ll64
 #include 
 #include 
 #include 
-- 
2.45.1
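
For comparison, a sketch of the other common way to silence this class of
warning: cast at the call site instead of changing the underlying type. The
example uses uint64_t from <stdint.h> as a stand-in for __u64 so that it
builds anywhere; format_u64() is a hypothetical helper, not something in the
selftests.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a 64-bit value with %llu. The explicit cast keeps -Wformat quiet
 * whether the 64-bit type is 'unsigned long' or 'unsigned long long'.
 */
static int format_u64(char *buf, size_t len, uint64_t val)
{
	return snprintf(buf, len, "%llu", (unsigned long long)val);
}

static void format_u64_demo(void)
{
	char buf[32];

	format_u64(buf, sizeof(buf), UINT64_MAX);
	assert(strcmp(buf, "18446744073709551615") == 0);
}
```

Casting works for values but needs touching every call site, which is why
defining __SANE_USERSPACE_TYPES__ once per file is the smaller change here.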



[PATCH] selftests: cachestat: Fix build warnings on ppc64

2024-05-20 Thread Michael Ellerman
Fix warnings like:
  test_cachestat.c: In function ‘print_cachestat’:
  test_cachestat.c:30:38: warning: format ‘%llu’ expects argument of
  type ‘long long unsigned int’, but argument 2 has type ‘__u64’ {aka
  ‘long unsigned int’} [-Wformat=]

By switching to unsigned long long for u64 for ppc64 builds.

Signed-off-by: Michael Ellerman 
---
 tools/testing/selftests/cachestat/test_cachestat.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/cachestat/test_cachestat.c 
b/tools/testing/selftests/cachestat/test_cachestat.c
index b171fd53b004..632ab44737ec 100644
--- a/tools/testing/selftests/cachestat/test_cachestat.c
+++ b/tools/testing/selftests/cachestat/test_cachestat.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #define _GNU_SOURCE
+#define __SANE_USERSPACE_TYPES__ // Use ll64
 
 #include 
 #include 
-- 
2.45.1



[PATCH] selftests/openat2: Fix build warnings on ppc64

2024-05-20 Thread Michael Ellerman
Fix warnings like:

  openat2_test.c: In function ‘test_openat2_flags’:
  openat2_test.c:303:73: warning: format ‘%llX’ expects argument of type
  ‘long long unsigned int’, but argument 5 has type ‘__u64’ {aka ‘long
  unsigned int’} [-Wformat=]

By switching to unsigned long long for u64 for ppc64 builds.

Signed-off-by: Michael Ellerman 
---
 tools/testing/selftests/openat2/openat2_test.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/openat2/openat2_test.c 
b/tools/testing/selftests/openat2/openat2_test.c
index 9024754530b2..5790ab446527 100644
--- a/tools/testing/selftests/openat2/openat2_test.c
+++ b/tools/testing/selftests/openat2/openat2_test.c
@@ -5,6 +5,7 @@
  */
 
 #define _GNU_SOURCE
+#define __SANE_USERSPACE_TYPES__ // Use ll64
 #include 
 #include 
 #include 
-- 
2.45.1



[PATCH] selftests/overlayfs: Fix build error on ppc64

2024-05-20 Thread Michael Ellerman
Fix build error on ppc64:
  dev_in_maps.c: In function ‘get_file_dev_and_inode’:
  dev_in_maps.c:60:59: error: format ‘%llu’ expects argument of type
  ‘long long unsigned int *’, but argument 7 has type ‘__u64 *’ {aka ‘long
  unsigned int *’} [-Werror=format=]

By switching to unsigned long long for u64 for ppc64 builds.

Signed-off-by: Michael Ellerman 
---
 tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c 
b/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
index 759f86e7d263..2862aae58b79 100644
--- a/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
+++ b/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #define _GNU_SOURCE
+#define __SANE_USERSPACE_TYPES__ // Use ll64
 
 #include 
 #include 
-- 
2.45.1
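
Note that a cast cannot fix the scanf-style case, because the argument is a
pointer and the pointee type has to match the format. A sketch of the usual
workaround without the define, again using uint64_t as a stand-in for __u64
and a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a decimal inode number. %llu needs an 'unsigned long long *', so
 * scan into a temporary of exactly that type and then assign.
 */
static int parse_inode(const char *s, uint64_t *ino)
{
	unsigned long long tmp;

	if (sscanf(s, "%llu", &tmp) != 1)
		return -1;
	*ino = tmp;
	return 0;
}

static void parse_inode_demo(void)
{
	uint64_t ino = 0;

	assert(parse_inode("12345", &ino) == 0 && ino == 12345);
	assert(parse_inode("abc", &ino) == -1);
}
```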



Re: [PATCH] powerpc/kernel: Fix potential spectre v1 in syscall

2024-05-20 Thread Michael Ellerman
Nathan Lynch  writes:
> Michael Ellerman  writes:
>> Breno Leitao  writes:
>>> On Tue, Mar 12, 2024 at 08:17:42AM +, Christophe Leroy wrote:
>>>> +Nathan as this is RTAS related.
>
> Thanks!
>
>>>> Le 21/08/2018 à 20:42, Breno Leitao a écrit :
>>>> > The rtas syscall reads a value from a user-provided structure and uses it
>>>> > to index an array, being a possible area for a potential spectre v1 
>>>> > attack.
>>>> > This is the code that exposes this problem.
>>>> > 
>>>> >  args.rets = &args.args[nargs];
>>>> > 
>>>> > The nargs is a user-provided value, and the code below is an example where
>>>> > the 'nargs' value would be set to XX.
>>>> > 
>>>> >  struct rtas_args ra;
>>>> >  ra.nargs = htobe32(XX);
>>>> >  syscall(__NR_rtas, &ra);
>>>> 
>>>> 
>>>> This patch has been hanging around in patchwork since 2018 and doesn't 
>>>> apply anymore. Is it still relevant? If so, can you rebase and resubmit?
>>>
>>> This seems to be important, since nargs is a user-provided value. I can
>>> submit it if the maintainers are willing to accept. I do not want to
>>> spend my time if no one is willing to review it.
>>
>> My memory is that I didn't think it was actually a problem, because all
>> we do is memset args.rets to zero.
>
> This is also my initial reaction to this. I suppose if the memset()
> implementation performs some validation of the destination buffer
> contents (comparing to a known poison value or something) that could
> load the CPU cache then there is a more plausible issue?

Yeah I guess that's possible.

In the past my approach to these was to analyse the exploitability of
each case and only patch those where there was a feasible case to be
made.

But I think that was wrong: doing that analysis is too time consuming at
scale, is easy to get wrong, and is also fragile in the face of the code
changing.

Especially in cases like this where the performance cost of the checks
is going to be dwarfed by other factors like syscall/firmware overhead.

>> Anyway we should probably just fix it to be safe and keep the static
>> checkers happy.
>
> Here is the relevant passage in its current state:
>
> if (copy_from_user(&args, uargs, 3 * sizeof(u32)) != 0)
> return -EFAULT;
>
> nargs = be32_to_cpu(args.nargs);
> nret  = be32_to_cpu(args.nret);
> token = be32_to_cpu(args.token);
>
> if (nargs >= ARRAY_SIZE(args.args)
> || nret > ARRAY_SIZE(args.args)
> || nargs + nret > ARRAY_SIZE(args.args))
> return -EINVAL;
>
> /* Copy in args. */
> if (copy_from_user(args.args, uargs->args,
>nargs * sizeof(rtas_arg_t)) != 0)
> return -EFAULT;
>
> /*
>  * If this token doesn't correspond to a function the kernel
>  * understands, you're not allowed to call it.
>  */
> func = rtas_token_to_function_untrusted(token);
> if (!func)
> return -EINVAL;
>
> args.rets = &args.args[nargs];
> memset(args.rets, 0, nret * sizeof(rtas_arg_t));
>
> Some questions:
>
> 1. The patch sanitizes 'nargs' immediately before the call to memset(),
>but shouldn't that happen before 'nargs' is used as an input to
>copy_from_user()?

I think the reasoning is that there's no way to exploit an out of bounds
value using copy_from_user(). But it's much easier to reason about if we
just do the sanitisation up front.

> 2. If 'nargs' needs this treatment, then why wouldn't the user-supplied
>'nret' and 'token' need them as well? 'nret' is used to index the
>same array as 'nargs'. And at least conceptually, 'token' is used to
>index a data structure (xarray) with array-like semantics (to be
>fair, this is a relatively recent development and was not the case
>when this change was submitted).

I don't know exactly what smatch looks for when trying to detect these,
but I suspect it's a plain array access. Not sure why it doesn't
complain about nret, but I think it would be good to sanitise it as well.

token is different, at least in the above code, because it's not bounds
checked, so there's no bounds check to bypass. Though maybe there is one
inside the rtas lookup code that should be masked.
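
For anyone following along, here is a userspace sketch of the kind of
branchless clamp that "sanitising" means here. It is modelled on my
understanding of the generic array_index_mask_nospec() fallback; the toy_*
names are made up, the array size of 16 is an assumption for the example, and
the code relies on an arithmetic right shift of negative values, which GCC
and clang provide.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * Return ~0UL when index < size and 0 otherwise, without a branch.
 * Assumes index and size are both <= LONG_MAX.
 */
static unsigned long toy_index_mask_nospec(unsigned long index,
					   unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (TOY_BITS_PER_LONG - 1);
}

/* Clamp an index to 0 when it is out of bounds, again without a branch. */
static unsigned long toy_index_nospec(unsigned long index, unsigned long size)
{
	return index & toy_index_mask_nospec(index, size);
}

static void toy_nospec_demo(void)
{
	assert(toy_index_nospec(3, 16) == 3);   /* in bounds: unchanged */
	assert(toy_index_nospec(15, 16) == 15); /* last valid slot */
	assert(toy_index_nospec(16, 16) == 0);  /* out of bounds: forced to 0 */
	assert(toy_index_nospec(20, 16) == 0);
}
```

Because the mask is computed with arithmetic rather than a compare and
branch, there is no branch for the CPU to mispredict, so a speculated
out-of-bounds value never reaches the array access.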

cheers


Re: [RFC PATCH v2 06/20] powerpc/8xx: Fix size given to set_huge_pte_at()

2024-05-20 Thread Michael Ellerman
Christophe Leroy  writes:
> Hi Oscar, hi Michael,
>
> Le 20/05/2024 à 11:14, Oscar Salvador a écrit :
>> On Fri, May 17, 2024 at 09:00:00PM +0200, Christophe Leroy wrote:
>>> set_huge_pte_at() expects the real page size, not the psize which is
>> 
>> "expects the size of the huge page" sounds better?
>
> Parameter 'psize' already provides the size of the hugepage, but not in 
> the way set_huge_pte_at() expects it.
>
> psize has one of the values defined by MMU_PAGE_XXX macros defined in 
> arch/powerpc/include/asm/mmu.h while set_huge_pte_at() expects the size 
> as a value.
>
>> 
>>> the index of the page definition in table mmu_psize_defs[]
>>>
>>> Fixes: 935d4f0c6dc8 ("mm: hugetlb: add huge page size param to 
>>> set_huge_pte_at()")
>>> Signed-off-by: Christophe Leroy 
>> 
>> Reviewed-by: Oscar Salvador 
>> 
>> AFAICS, this fixup is not related to the series, right? (yes, you will
>> use the parameter later)
>> I would have it at the very beginning of the series.
>
> You are right, I should have submitted it separately.
>
> Michael can you take it as a fix for 6.10 ?

Yeah I can. Does it actually cause a bug at runtime (I assume so)?
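
For readers not familiar with the distinction Christophe describes above, a
small sketch of "index into a page size table" versus "size in bytes". The
enum, the table and its shift values are invented for the example; they are
not the kernel's MMU_PAGE_XXX values or mmu_psize_defs[] contents.

```c
#include <assert.h>

/* Hypothetical page size indices, in the spirit of the MMU_PAGE_XXX macros. */
enum toy_psize {
	TOY_PAGE_4K,
	TOY_PAGE_16K,
	TOY_PAGE_512K,
	TOY_PAGE_8M,
	TOY_PAGE_COUNT
};

/* Hypothetical per-index definitions: log2 of the page size in bytes. */
static const unsigned int toy_psize_shift[TOY_PAGE_COUNT] = {
	[TOY_PAGE_4K]   = 12,
	[TOY_PAGE_16K]  = 14,
	[TOY_PAGE_512K] = 19,
	[TOY_PAGE_8M]   = 23,
};

/* Convert a table index into the size in bytes that a caller expects. */
static unsigned long toy_psize_to_bytes(enum toy_psize psize)
{
	return 1UL << toy_psize_shift[psize];
}

static void toy_psize_demo(void)
{
	/* Passing the index itself would claim a 3-byte page. */
	assert((unsigned long)TOY_PAGE_8M == 3);
	assert(toy_psize_to_bytes(TOY_PAGE_8M) == 8UL * 1024 * 1024);
}
```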

cheers


Re: CVE-2023-52665: powerpc/ps3_defconfig: Disable PPC64_BIG_ENDIAN_ELF_ABI_V2

2024-05-20 Thread Michael Ellerman
Greg Kroah-Hartman  writes:
> On Mon, May 20, 2024 at 05:35:32PM +0900, Geoff Levand wrote:
>> On 5/20/24 16:04, Michael Ellerman wrote:
>> > Greg Kroah-Hartman  writes:
>> >> Description
>> >> ===
>> >>
>> >> In the Linux kernel, the following vulnerability has been resolved:
>> >>
>> >> powerpc/ps3_defconfig: Disable PPC64_BIG_ENDIAN_ELF_ABI_V2
>> >>
>> >> Commit 8c5fa3b5c4df ("powerpc/64: Make ELFv2 the default for big-endian
>> >> builds"), merged in Linux-6.5-rc1 changes the calling ABI in a way
>> >> that is incompatible with the current code for the PS3's LV1 hypervisor
>> >> calls.
>> >>
>> >> This change just adds the line '# CONFIG_PPC64_BIG_ENDIAN_ELF_ABI_V2 is 
>> >> not set'
>> >> to the ps3_defconfig file so that the PPC64_ELF_ABI_V1 is used.
>> >>
>> >> Fixes run time errors like these:
>> >>
>> >>   BUG: Kernel NULL pointer dereference at 0x
>> >>   Faulting instruction address: 0xc0047cf0
>> >>   Oops: Kernel access of bad area, sig: 11 [#1]
>> >>   Call Trace:
>> >>   [c23039e0] [c100ebfc] ps3_create_spu+0xc4/0x2b0 
>> >> (unreliable)
>> >>   [c2303ab0] [c100d4c4] create_spu+0xcc/0x3c4
>> >>   [c2303b40] [c100eae4] ps3_enumerate_spus+0xa4/0xf8
>> >>
>> >> The Linux kernel CVE team has assigned CVE-2023-52665 to this issue.
>> > 
>> > IMHO this doesn't warrant a CVE. The crash mentioned above happens at
>> > boot, so the system is not vulnerable it's just broken :)
>> 
>> As Greg says, with PPC64_BIG_ENDIAN_ELF_ABI_V2 enabled the system won't
>> boot, so there is no chance of a vulnerability.
>
> The definition of "vulnerability" from CVE.org is:
>   An instance of one or more weaknesses in a Product that can be
>   exploited, causing a negative impact to confidentiality, integrity, or
>   availability; a set of conditions or behaviors that allows the
>   violation of an explicit or implicit security policy.
>
> Having a system that does not boot is a "negative impact to
> availability", which is why this was selected for a CVE.  I.e. if a new
> kernel update has this problem in it, it would not allow the system to
> boot correctly.

I think the key word above is "exploited", implying some sort of
unauthorised action.

This bug can cause the system to not boot, but only by someone who
builds a new kernel and installs it - and if they have permission to do
that they can just replace the kernel with anything, they don't need a
bug.

> But, if the maintainer of the subsystem thinks this should not be
> assigned a CVE because of this fix, we'll be glad to revoke it.
>
> Michael, still want this revoked?

Yes please.

cheers


Re: CVE-2023-52665: powerpc/ps3_defconfig: Disable PPC64_BIG_ENDIAN_ELF_ABI_V2

2024-05-20 Thread Michael Ellerman
Greg Kroah-Hartman  writes:
> Description
> ===
>
> In the Linux kernel, the following vulnerability has been resolved:
>
> powerpc/ps3_defconfig: Disable PPC64_BIG_ENDIAN_ELF_ABI_V2
>
> Commit 8c5fa3b5c4df ("powerpc/64: Make ELFv2 the default for big-endian
> builds"), merged in Linux-6.5-rc1 changes the calling ABI in a way
> that is incompatible with the current code for the PS3's LV1 hypervisor
> calls.
>
> This change just adds the line '# CONFIG_PPC64_BIG_ENDIAN_ELF_ABI_V2 is not 
> set'
> to the ps3_defconfig file so that the PPC64_ELF_ABI_V1 is used.
>
> Fixes run time errors like these:
>
>   BUG: Kernel NULL pointer dereference at 0x
>   Faulting instruction address: 0xc0047cf0
>   Oops: Kernel access of bad area, sig: 11 [#1]
>   Call Trace:
>   [c23039e0] [c100ebfc] ps3_create_spu+0xc4/0x2b0 (unreliable)
>   [c2303ab0] [c100d4c4] create_spu+0xcc/0x3c4
>   [c2303b40] [c100eae4] ps3_enumerate_spus+0xa4/0xf8
>
> The Linux kernel CVE team has assigned CVE-2023-52665 to this issue.

IMHO this doesn't warrant a CVE. The crash mentioned above happens at
boot, so the system is not vulnerable it's just broken :)

cheers


[PATCH] selftests/sigaltstack: Fix ppc64 GCC build

2024-05-20 Thread Michael Ellerman
Building the sigaltstack test with GCC on 64-bit powerpc errors with:

  gcc -Wall sas.c  -o /home/michael/linux/.build/kselftest/sigaltstack/sas
  In file included from sas.c:23:
  current_stack_pointer.h:22:2: error: #error "implement current_stack_pointer equivalent"
     22 | #error "implement current_stack_pointer equivalent"
        |  ^
  sas.c: In function ‘my_usr1’:
  sas.c:50:13: error: ‘sp’ undeclared (first use in this function); did you mean ‘p’?
     50 | if (sp < (unsigned long)sstack ||
        | ^~

This happens because GCC doesn't define __ppc__ for 64-bit builds, only
32-bit builds. Instead use __powerpc__ to detect powerpc builds, which
is defined by clang and GCC for 64-bit and 32-bit builds.

Fixes: 05107edc9101 ("selftests: sigaltstack: fix -Wuninitialized")
Cc: sta...@vger.kernel.org # v6.3+
Signed-off-by: Michael Ellerman 
---
 tools/testing/selftests/sigaltstack/current_stack_pointer.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

I'll plan to merge this via the powerpc tree unless anyone objects.

diff --git a/tools/testing/selftests/sigaltstack/current_stack_pointer.h 
b/tools/testing/selftests/sigaltstack/current_stack_pointer.h
index ea9bdf3a90b1..09da8f1011ce 100644
--- a/tools/testing/selftests/sigaltstack/current_stack_pointer.h
+++ b/tools/testing/selftests/sigaltstack/current_stack_pointer.h
@@ -8,7 +8,7 @@ register unsigned long sp asm("sp");
 register unsigned long sp asm("esp");
 #elif __loongarch64
 register unsigned long sp asm("$sp");
-#elif __ppc__
+#elif __powerpc__
 register unsigned long sp asm("r1");
 #elif __s390x__
 register unsigned long sp asm("%15");
-- 
2.45.1



[GIT PULL] Please pull powerpc/linux.git powerpc-6.10-1 tag

2024-05-17 Thread Michael Ellerman
ved device

Geoff Levand (1):
  powerpc: Fix PS3 allmodconfig warning

Ghanshyam Agrawal (1):
  powerpc/eeh: Fix spelling of the word "auxillary" and update comment

Greg Kurz (1):
  powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#"

Hari Bathini (7):
  powerpc/64/bpf: fix tail calls for PCREL addressing
  powerpc/bpf: enable kfunc call
  powerpc/pseries/fadump: add support for multiple boot memory regions
  powerpc/fadump: setup additional parameters for dump capture kernel
  powerpc/fadump: pass additional parameters when fadump is active
  powerpc/fadump: update documentation about bootargs_append
  powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP

Joel Stanley (1):
  KVM: PPC: Fix documentation for ppc mmu caps

Kunwu Chan (4):
  powerpc/iommu: Code cleanup for cell/iommu.c
  powerpc/cell: Code cleanup for spufs_mfc_flush
  powerpc/pseries/pci: Code cleanup
  KVM: PPC: code cleanup for kvmppc_book3s_irqprio_deliver

Li Yang (2):
  powerpc: dts: mpc85xx: remove "simple-bus" compatible from ifc node
  powerpc: dts: fsl: rename ifc node name to be memory-controller

Lidong Zhong (1):
  powerpc/pseries/vio: Don't return ENODEV if node or compatible missing

Madhavan Srinivasan (3):
  selftests/powerpc: Re-order *FLAGS to follow lib.mk
  selftests/powerpc: Add flags.mk to support pmu buildable
  selftests/powerpc: make sub-folders buildable on their own

Mahesh Salgaonkar (1):
  powerpc: Avoid nmi_enter/nmi_exit in real mode interrupt.

Masahiro Yamada (1):
  powerpc: remove unused *_syscall_64.o variables in Makefile

Matthias Schiffer (1):
  powerpc: rename SPRN_HID2 define to SPRN_HID2_750FX

Michael Ellerman (12):
  powerpc/dart: Drop unnecessary call to kmemleak_no_scan()
  selftests/powerpc: Convert pmu Makefile to for loop style
  selftests/powerpc: Install tests in sub-directories
  powerpc: Mark memory_limit as initdata
  MAINTAINERS: powerpc: Remove Aneesh
  MAINTAINERS: MMU GATHER: Update Aneesh's address
  powerpc/io: Avoid clang null pointer arithmetic warnings
  powerpc/64: Set _IO_BASE to POISON_POINTER_DELTA not 0 for CONFIG_PCI=n
  macintosh/ams: Fix unused variable warning
  Merge branch 'topic/ppc-kvm' into next
  Merge branch 'topic/kdump-hotplug' into next
  powerpc/fadump: Fix section mismatch warning

Nathan Chancellor (1):
  powerpc: Fix fatal warnings flag for LLVM's integrated assembler

Nathan Lynch (1):
  powerpc/pseries: Enforce hcall result buffer validity and size

Naveen N Rao (1):
  powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large

Nicholas Miehlbradt (1):
  powerpc: Add static_key_feature_checks_initialized flag

Ran Wang (1):
  powerpc: dts: add power management nodes to FSL chips

Ritesh Harjani (IBM) (1):
  powerpc/ptdump: Fix walk_vmemmap() to also print first vmemmap entry

Shrikanth Hegde (2):
  powerpc/pseries: Add pool idle time at LPAR boot
  powerpc/pseries: Add failure related checks for h_get_mpp and h_get_ppp

Sourabh Jain (10):
  crash: forward memory_notify arg to arch crash hotplug handler
  crash: add a new kexec flag for hotplug support
  powerpc/kexec: move *_memory_ranges functions to ranges.c
  powerpc/kexec: make the update_cpus_node() function public
  powerpc/crash: add crash CPU hotplug support
  powerpc/crash: add crash memory hotplug support
  powerpc: make fadump resilient with memory add/remove events
  powerpc/fadump: add hotplug_ready sysfs interface
  Documentation/powerpc: update fadump implementation details
  powerpc/crash: remove unnecessary NULL check before kvfree()

Stephen Rothwell (1):
  Documentation: Fix the address of the linuxppc-dev mailing list

Thorsten Blum (1):
  powerpc: Use str_plural() in cpu_init_thread_core_maps()

Vaibhav Jain (1):
  KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception

Xiaowei Bao (1):
  powerpc: dts: p1010rdb: fix INTx interrupt issue on P1010RDB-PB

Yang Li (3):
  powerpc: boot: Fix kernel-doc param for partial_decompress
  powerpc: Fix kernel-doc comments in fsl_gtm.c
  powerpc/rtas: Add kernel-doc comments to smp_startup_cpu()

sundar (1):
  macintosh/macio-adb: replace of_node_put() with __free


 Documentation/ABI/testing/sysfs-devices-system-cpu  |  14 +-
 Documentation/ABI/testing/sysfs-firmware-opal-powercap  |   4 +-
 Documentation/ABI/testing/sysfs-firmware-opal-psr   |   4 +-
 Documentation/ABI/testing/sysfs-firmware-opal-sensor-groups |   4 +-
 Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info |  10 +-
 Documentation/ABI/testing/sysfs-kernel-fadump   |  18 +
 Documentation/arch/powerpc/dexcr.rst| 141 -
 Documentation/arch/powerpc/fi

[PATCH] powerpc/fadump: Fix section mismatch warning

2024-05-16 Thread Michael Ellerman
With some compilers/configs fadump_setup_param_area() isn't inlined into
its caller (which is __init), leading to a section mismatch warning:

  WARNING: modpost: vmlinux: section mismatch in reference:
  fadump_setup_param_area+0x200 (section: .text.fadump_setup_param_area)
  -> memblock_phys_alloc_range (section: .init.text)

Fix it by adding an __init annotation.
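
As a rough userspace illustration of the idea (not kernel code), the sketch below places a caller and its callee in the same named section, which is the property the `__init` annotation restores here. The macro `demo_init`, the section name `.init.text.demo`, and all function names are hypothetical; the kernel's real `__init` carries more attributes, and nothing in userspace discards the section or checks for mismatches.

```c
/* Hypothetical userspace stand-in for the kernel's __init marker. */
#if defined(__ELF__) && (defined(__GNUC__) || defined(__clang__))
#define demo_init __attribute__((section(".init.text.demo"), noinline))
#else
#define demo_init
#endif

/* Callee that lives in the init section, like memblock_phys_alloc_range(). */
static demo_init int init_only_alloc(void)
{
	return 42;
}

/*
 * Caller with the same annotation. Without it, a non-inlined copy would
 * sit in regular text and reference the init section, which is the
 * pattern modpost warns about.
 */
static demo_init int setup_param_area(void)
{
	return init_only_alloc();
}

/* Entry point, also annotated so the whole call chain stays in one section. */
demo_init int run_init_phase(void)
{
	return setup_param_area();
}
```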

Fixes: 683eab94da75 ("powerpc/fadump: setup additional parameters for dump capture kernel")
Reported-by: Stephen Rothwell 
Closes: https://lore.kernel.org/all/20240515163708.3380c...@canb.auug.org.au/
Reported-by: kernel test robot 
Closes: https://lore.kernel.org/all/202405140922.ouclox4y-...@intel.com/
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/fadump.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 2276bacc4170..60f974775fc8 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1740,7 +1740,7 @@ static void __init fadump_process(void)
  * Reserve memory to store additional parameters to be passed
  * for fadump/capture kernel.
  */
-static void fadump_setup_param_area(void)
+static void __init fadump_setup_param_area(void)
 {
 	phys_addr_t range_start, range_end;
 
-- 
2.45.0



Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.

2024-05-16 Thread Michael Ellerman
Andy Polyakov  writes:
> Hi,
>
>>> +.abiversion2
>>
>> I'd prefer that was left to the compiler flags.
>
> Problem is that it's the compiler that is responsible for providing this
> directive in the intermediate .s prior to invoking the assembler. And there
> is no assembler flag to pass through -Wa.

Hmm, right. But none of our existing .S files include .abiversion
directives.

We build .S files with gcc, passing -mabi=elfv2, but it seems to have no
effect.

So all the intermediate .o's generated from .S files are not ELFv2:

  $ find .build/ -name '*.o' | xargs file | grep Unspecified
  .build/arch/powerpc/kernel/vdso/note-64.o: ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped
  .build/arch/powerpc/kernel/vdso/sigtramp64-64.o: ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped
  .build/arch/powerpc/kernel/vdso/getcpu-64.o: ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped
  .build/arch/powerpc/kernel/vdso/gettimeofday-64.o: ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped
  .build/arch/powerpc/kernel/vdso/datapage-64.o: ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped
  ...

But the actual code follows ELFv2, because we wrote it that way, and I
guess the linker doesn't look at the actual ABI version of the .o ?

So it currently works. But it's kind of gross that those .o files are
not ELFv2 for an ELFv2 build.

> If concern is ABI neutrality,
> then solution would rather be #if (_CALL_ELF-0) == 2/#endif. One can
> also make a case for
>
> #ifdef _CALL_ELF
> .abiversion _CALL_ELF
> #endif

Is .abiversion documented anywhere? I can't see it in the manual.

We used to use _CALL_ELF, but the kernel config is supposed to be the
source of truth, so we'd use:

  #ifdef CONFIG_PPC64_ELF_ABI_V2
  .abiversion 2
  #endif

And probably put it in a macro like:

  #ifdef CONFIG_PPC64_ELF_ABI_V2
  #define ASM_ABI_VERSION .abiversion 2
  #else
  #define ASM_ABI_VERSION
  #endif

Or something like that. But it's annoying that we need to go and
sprinkle that in every .S file.

Anyway, my comment can be ignored as far as this series is concerned,
seems we have to clean this up everywhere.

cheers


Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.

2024-05-15 Thread Michael Ellerman
Hi Danny,

Danny Tsen  writes:
> Use the perl output of x25519-ppc64.pl from CRYPTOGAMs and added three
> supporting functions, x25519_fe51_sqr_times, x25519_fe51_frombytes
> and x25519_fe51_tobytes.

For other algorithms we have checked in the perl script and generated
the code at build time. Is there a reason you've done it differently this time?

> Signed-off-by: Danny Tsen 
> ---
>  arch/powerpc/crypto/curve25519-ppc64le_asm.S | 648 +++
>  1 file changed, 648 insertions(+)
>  create mode 100644 arch/powerpc/crypto/curve25519-ppc64le_asm.S
>
> diff --git a/arch/powerpc/crypto/curve25519-ppc64le_asm.S b/arch/powerpc/crypto/curve25519-ppc64le_asm.S
> new file mode 100644
> index ..8a018104838a
> --- /dev/null
> +++ b/arch/powerpc/crypto/curve25519-ppc64le_asm.S
> @@ -0,0 +1,648 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#
> +# Copyright 2024- IBM Corp.  All Rights Reserved.
 
I'm not a lawyer, but AFAIK "All Rights Reserved" is not required and
can be confusing - because we are not reserving all rights, we are
granting some rights under the GPL.

I also think the IBM copyright should be down below where your
modifications are described.

> +# This code is taken from CRYPTOGAMs[1] and is included here using the option
> +# in the license to distribute the code under the GPL. Therefore this program
> +# is free software; you can redistribute it and/or modify it under the terms of
> +# the GNU General Public License version 2 as published by the Free Software
> +# Foundation.
> +#
> +# [1] https://www.openssl.org/~appro/cryptogams/
> +
> +# Copyright (c) 2006-2017, CRYPTOGAMS by 
> +# All rights reserved.
> +#
> +# Redistribution and use in source and binary forms, with or without
> +# modification, are permitted provided that the following conditions
> +# are met:
> +#
> +#   * Redistributions of source code must retain copyright notices,
> +# this list of conditions and the following disclaimer.
> +#
> +#   * Redistributions in binary form must reproduce the above
> +# copyright notice, this list of conditions and the following
> +# disclaimer in the documentation and/or other materials
> +# provided with the distribution.
> +#
> +#   * Neither the name of the CRYPTOGAMS nor the names of its
> +# copyright holder and contributors may be used to endorse or
> +# promote products derived from this software without specific
> +# prior written permission.
> +#
> +# ALTERNATIVELY, provided that this notice is retained in full, this
> +# product may be distributed under the terms of the GNU General Public
> +# License (GPL), in which case the provisions of the GPL apply INSTEAD OF
> +# those given above.
> +#
> +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS
> +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +# 
> +# Written by Andy Polyakov  for the OpenSSL
> +# project. The module is, however, dual licensed under OpenSSL and
> +# CRYPTOGAMS licenses depending on where you obtain it. For further
> +# details see https://www.openssl.org/~appro/cryptogams/.
> +# 
> +
> +#
> +# 
> +# Written and Modified by Danny Tsen 
> +# - Added x25519_fe51_sqr_times, x25519_fe51_frombytes, x25519_fe51_tobytes

ie. here.

> +# X25519 lower-level primitives for PPC64.
> +#
> +
> +#include 
> +
> +.machine "any"
 
Please don't add new .machine directives unless they are required.

> +.abiversion  2

I'd prefer that was left to the compiler flags.

cheers


Re: linux-next: build warning after merge of the powerpc tree

2024-05-15 Thread Michael Ellerman
Stephen Rothwell  writes:
> Hi all,
>
> After merging the powerpc tree, today's (it may have been yesterday's)
> linux-next build (powerpc allyesconfig) produced this warning:
>
> WARNING: modpost: vmlinux: section mismatch in reference: 
> fadump_setup_param_area+0x200 (section: .text.fadump_setup_param_area) -> 
> memblock_phys_alloc_range (section: .init.text)

I don't see the warning, but clearly it is possible if the compiler
decides not to inline fadump_setup_param_area().

What compiler version are you using?

cheers


Re: [PATCH bpf v3] powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH

2024-05-13 Thread Michael Ellerman
Puranjay Mohan  writes:
> Naveen N Rao  writes:
>> On Mon, May 13, 2024 at 10:02:48AM GMT, Puranjay Mohan wrote:
>>> The Linux Kernel Memory Model [1][2] requires RMW operations that have a
>>> return value to be fully ordered.
>>> 
>>> BPF atomic operations with BPF_FETCH (including BPF_XCHG and
>>> BPF_CMPXCHG) return a value back so they need to be JITed to fully
>>> ordered operations. POWERPC currently emits relaxed operations for
>>> these.
>>> 
>>> We can show this by running the following litmus-test:
>>> 
>>> PPC SB+atomic_add+fetch
>>> 
>>> {
>>> 0:r0=x;  (* dst reg assuming offset is 0 *)
>>> 0:r1=2;  (* src reg *)
>>> 0:r2=1;
>>> 0:r4=y;  (* P0 writes to this, P1 reads this *)
>>> 0:r5=z;  (* P1 writes to this, P0 reads this *)
>>> 0:r6=0;
>>> 
>>> 1:r2=1;
>>> 1:r4=y;
>>> 1:r5=z;
>>> }
>>> 
>>> P0  | P1;
>>> stw r2, 0(r4)   | stw  r2,0(r5) ;
>>> |   ;
>>> loop:lwarx  r3, r6, r0  |   ;
>>> mr  r8, r3  |   ;
>>> add r3, r3, r1  | sync  ;
>>> stwcx.  r3, r6, r0  |   ;
>>> bne loop|   ;
>>> mr  r1, r8  |   ;
>>> |   ;
>>> lwa r7, 0(r5)   | lwa  r7,0(r4) ;
>>> 
>>> ~exists(0:r7=0 /\ 1:r7=0)
>>> 
>>> Witnesses
>>> Positive: 9 Negative: 3
>>> Condition ~exists (0:r7=0 /\ 1:r7=0)
>>> Observation SB+atomic_add+fetch Sometimes 3 9
>>> 
>>> This test shows that the older store in P0 is reordered with a newer
>>> load to a different address, even though there is a RMW operation with
>>> fetch between them. Adding a sync before and after RMW fixes the issue:
>>> 
>>> Witnesses
>>> Positive: 9 Negative: 0
>>> Condition ~exists (0:r7=0 /\ 1:r7=0)
>>> Observation SB+atomic_add+fetch Never 0 9
>>> 
>>> [1] https://www.kernel.org/doc/Documentation/memory-barriers.txt
>>> [2] https://www.kernel.org/doc/Documentation/atomic_t.txt
>>> 
>>> Fixes: 65112709115f ("powerpc/bpf/64: add support for BPF_ATOMIC bitwise operations")
>>
>> As I noted in v2, I think that is the wrong commit. This fixes the below 
>
> Sorry for missing this. Would this need another version or your message
> below will make it work with the stable process?

No need for another version. b4 should pick up those tags, or if not
I'll add them by hand.

cheers


Re: [PATCH v2 0/3] powerpc/fadump: pass additional args to dump capture kernel

2024-05-13 Thread Michael Ellerman
On Thu, 09 May 2024 17:27:52 +0530, Hari Bathini wrote:
> While fadump is a more reliable alternative to kdump dump capturing
> method, it doesn't support passing additional parameters. Having
> such support is desirable for two major reasons:
> 
>   1. It helps minimize the memory consumption of fadump dump capture
>      kernel by disabling features that consume considerable amount of
>      memory but have little significance for dump capture environment
>      (eg. numa, cma, cgroup, etc.)
>   2. It helps disable such features/components in dump capture kernel
>      that are unstable and/or are being debugged.
> 
> [...]

Applied to powerpc/next.

[1/3] powerpc/pseries/fadump: add support for multiple boot memory regions
  https://git.kernel.org/powerpc/c/78d5cc15fb7d1b2683f0baf418a9a870c02319fb
[2/3] powerpc/fadump: setup additional parameters for dump capture kernel
  https://git.kernel.org/powerpc/c/683eab94da75bcf55a9c65e0c31d0529edebe86d
[3/3] powerpc/fadump: pass additional parameters when fadump is active
  https://git.kernel.org/powerpc/c/3416c9daa6b13c0e2a656d4e2dee8de95f9a38cf

cheers


Re: [PATCH] powerpc/fadump: update documentation about bootargs_append

2024-05-13 Thread Michael Ellerman
On Fri, 10 May 2024 13:51:14 +0530, Hari Bathini wrote:
> Update ABI documentation about the introduction of the new sysfs
> entry bootargs_append. This sysfs entry will be used to setup the
> additional parameters to be passed to dump capture kernel.
> 
> 

Applied to powerpc/next.

[1/1] powerpc/fadump: update documentation about bootargs_append
  https://git.kernel.org/powerpc/c/9dc140785961e53b1d45d186961a3b0d374bfc6a

cheers


Re: [PATCH] powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP

2024-05-13 Thread Michael Ellerman
On Fri, 10 May 2024 13:37:57 +0530, Hari Bathini wrote:
> Since commit 5c4233cc0920 ("powerpc/kdump: Split KEXEC_CORE and
> CRASH_DUMP dependency"), crashing_cpu is not available without
> CONFIG_CRASH_DUMP. Fix compile error on 64-BIT 85xx owing to this
> change.
> 
> 

Applied to powerpc/next.

[1/1] powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP
  https://git.kernel.org/powerpc/c/7b090b6ff51b9a9f002139660672f662b95f0630

cheers


Re: [PATCH v2 1/2] powerpc/io: Avoid clang null pointer arithmetic warnings

2024-05-10 Thread Michael Ellerman
Naresh Kamboju  writes:
> On Fri, 3 May 2024 at 13:26, Michael Ellerman  wrote:
>>
>> With -Wextra clang warns about pointer arithmetic using a null pointer.
>> When building with CONFIG_PCI=n, that triggers a warning in the IO
>> accessors, eg:
>>
>>   In file included from linux/arch/powerpc/include/asm/io.h:672:
>>   linux/arch/powerpc/include/asm/io-defs.h:23:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
>>  23 | DEF_PCI_AC_RET(inb, u8, (unsigned long port), (port), pio, port)
>> | ^~~~
>>   ...
>>   linux/arch/powerpc/include/asm/io.h:591:53: note: expanded from macro '__do_inb'
>> 591 | #define __do_inb(port)  readb((PCI_IO_ADDR)_IO_BASE + port);
>> |   ~ ^
>>
>> That is because when CONFIG_PCI=n, _IO_BASE is defined as 0.
>>
>> Although _IO_BASE is defined as plain 0, the cast (PCI_IO_ADDR) converts
>> it to void * before the addition with port happens.
>>
>> Instead the addition can be done first, and then the cast. The resulting
>> value will be the same, but avoids the warning, and also avoids void
>> pointer arithmetic which is apparently non-standard.
>>
>> Reported-by: Naresh Kamboju 
>> Closes: https://lore.kernel.org/all/CA+G9fYtEh8zmq8k8wE-8RZwW-Qr927RLTn+KqGnq1F=ptaa...@mail.gmail.com
>> Signed-off-by: Michael Ellerman 
>
> Tested-by: Linux Kernel Functional Testing 

Thanks.
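
For readers skimming the thread, the reordering described in the quoted commit message can be sketched as follows. This is an illustration under assumptions, not the patched kernel macro: `io_addr_t` and `io_addr()` are hypothetical names, and `uintptr_t` stands in for the kernel's `unsigned long`.

```c
#include <stdint.h>

/* Hypothetical stand-in for the pointer type that PCI_IO_ADDR casts to. */
typedef volatile void *io_addr_t;

/*
 * Add base and port as integers first, then cast once. When io_base is 0
 * no arithmetic is ever performed on a null pointer, and no arithmetic is
 * performed on a void pointer either.
 */
static io_addr_t io_addr(uintptr_t io_base, unsigned long port)
{
	return (io_addr_t)(io_base + port);
}
```

With a base of 0 the result is simply the port number converted to a pointer, which matches the commit message's statement that the resulting value is unchanged.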

cheers


Re: [PATCH 3/3] powerpc: Check only single values are passed to CPU/MMU feature checks

2024-05-10 Thread Michael Ellerman
Segher Boessenkool  writes:
> On Thu, May 09, 2024 at 10:12:48PM +1000, Michael Ellerman wrote:
>> cpu_has_feature()/mmu_has_feature() are only able to check a single
>> feature at a time, but there is no enforcement of that.
>> 
>> In fact, as fixed in the previous commit, there was code that was
>> passing multiple values to cpu_has_feature().
>> 
>> So add a check that only a single feature is passed using popcount.
>> 
>> Note that the test allows 0 or 1 bits to be set, because some code
>> relies on cpu_has_feature(0) being false, the check with
>> CPU_FTRS_POSSIBLE ensures that. See for example CPU_FTR_PPC_LE.
>
> This btw is exactly
>
>   BUILD_BUG_ON(feature & (feature - 1));
>
> but the popcount is more readable :-)

Yeah for those of us who don't see bits cascading in our sleep I think
the popcount is easier to understand ;)
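
A small sketch of the equivalence being discussed, written as ordinary runtime functions rather than BUILD_BUG_ON() checks. The function names are hypothetical; `__builtin_popcountl` is the GCC/clang builtin used in the patch.

```c
#include <stdbool.h>

/* True when zero or one bit is set, using the popcount form from the patch. */
static bool at_most_one_bit_popcount(unsigned long feature)
{
	return __builtin_popcountl(feature) <= 1;
}

/*
 * Same test using the bit trick: subtracting 1 clears the lowest set bit
 * and sets the bits below it, so the AND is zero only when no other bit
 * was set. A value of 0 also passes.
 */
static bool at_most_one_bit_mask(unsigned long feature)
{
	return (feature & (feature - 1)) == 0;
}
```

Both forms accept 0 and any single-bit value and reject anything with two or more bits set, which is why the two BUILD_BUG_ON() conditions are interchangeable.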

cheers


[PATCH 1/3] powerpc: Drop clang workaround for builtin constant checks

2024-05-09 Thread Michael Ellerman
The CPU/MMU feature code has build-time checks that the feature value is
a builtin constant.

Back when the code was added clang wasn't able to compile the
checks, so an ifdef was added to avoid the checks for clang builds.
See b5fa0f7f88ed ("powerpc: Fix build failure with clang due to
BUILD_BUG_ON()")

These days clang 13 and later are able to build the checks successfully,
so drop the workaround.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/cpu_has_feature.h | 2 --
 arch/powerpc/include/asm/mmu.h | 2 --
 2 files changed, 4 deletions(-)

diff --git a/arch/powerpc/include/asm/cpu_has_feature.h b/arch/powerpc/include/asm/cpu_has_feature.h
index 0efabccd820c..92e24e979954 100644
--- a/arch/powerpc/include/asm/cpu_has_feature.h
+++ b/arch/powerpc/include/asm/cpu_has_feature.h
@@ -24,9 +24,7 @@ static __always_inline bool cpu_has_feature(unsigned long feature)
 {
 	int i;
 
-#ifndef __clang__ /* clang can't cope with this */
 	BUILD_BUG_ON(!__builtin_constant_p(feature));
-#endif
 
 #ifdef CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG
 	if (!static_key_feature_checks_initialized) {
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 24f830cf9bb4..4ab9a630d943 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -246,9 +246,7 @@ static __always_inline bool mmu_has_feature(unsigned long feature)
 {
 	int i;
 
-#ifndef __clang__ /* clang can't cope with this */
 	BUILD_BUG_ON(!__builtin_constant_p(feature));
-#endif
 
 #ifdef CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG
 	if (!static_key_feature_checks_initialized) {
-- 
2.45.0



[PATCH 3/3] powerpc: Check only single values are passed to CPU/MMU feature checks

2024-05-09 Thread Michael Ellerman
cpu_has_feature()/mmu_has_feature() are only able to check a single
feature at a time, but there is no enforcement of that.

In fact, as fixed in the previous commit, there was code that was
passing multiple values to cpu_has_feature().

So add a check that only a single feature is passed using popcount.

Note that the test allows 0 or 1 bits to be set, because some code
relies on cpu_has_feature(0) being false, the check with
CPU_FTRS_POSSIBLE ensures that. See for example CPU_FTR_PPC_LE.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/cpu_has_feature.h | 1 +
 arch/powerpc/include/asm/mmu.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/include/asm/cpu_has_feature.h b/arch/powerpc/include/asm/cpu_has_feature.h
index 92e24e979954..bf8a228229fa 100644
--- a/arch/powerpc/include/asm/cpu_has_feature.h
+++ b/arch/powerpc/include/asm/cpu_has_feature.h
@@ -25,6 +25,7 @@ static __always_inline bool cpu_has_feature(unsigned long feature)
 	int i;
 
 	BUILD_BUG_ON(!__builtin_constant_p(feature));
+	BUILD_BUG_ON(__builtin_popcountl(feature) > 1);
 
 #ifdef CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG
 	if (!static_key_feature_checks_initialized) {
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 4ab9a630d943..eb3065692055 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -247,6 +247,7 @@ static __always_inline bool mmu_has_feature(unsigned long feature)
 	int i;
 
 	BUILD_BUG_ON(!__builtin_constant_p(feature));
+	BUILD_BUG_ON(__builtin_popcountl(feature) > 1);
 
 #ifdef CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG
 	if (!static_key_feature_checks_initialized) {
-- 
2.45.0



[PATCH 2/3] powerpc/xmon: Fix disassembly CPU feature checks

2024-05-09 Thread Michael Ellerman
In the xmon disassembly code there are several CPU feature checks to
determine what dialects should be passed to the disassembler. The
dialect controls which instructions the disassembler will recognise.

Unfortunately the checks are incorrect, because instead of passing a
single CPU feature they are passing a mask of feature bits.

For example the code:

  if (cpu_has_feature(CPU_FTRS_POWER5))
  dialect |= PPC_OPCODE_POWER5;

Is trying to check if the system is running on a Power5 CPU. But
CPU_FTRS_POWER5 is a mask of *all* the feature bits that are enabled on
a Power5.

In practice the test will always return true for any 64-bit CPU, because
at least one bit in the mask will be present in the CPU_FTRS_ALWAYS
mask.

Similarly for all the other checks against CPU_FTRS_xx masks.
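
To make the failure mode concrete, here is a simplified model of a feature test that ORs an "always present" mask with the runtime feature word. It is a sketch under assumptions: the `DEMO_*` bit values and `demo_has_feature()` are hypothetical and only mirror the shape of the real check as described above, not its implementation.

```c
#include <stdbool.h>

/* Hypothetical feature bits and masks, chosen only for illustration. */
#define DEMO_FTR_BASE     0x1UL	/* present on every CPU in this model */
#define DEMO_FTR_ALTIVEC  0x2UL
#define DEMO_FTR_VSX      0x4UL

#define DEMO_FTRS_ALWAYS  (DEMO_FTR_BASE)
#define DEMO_FTRS_NEWCPU  (DEMO_FTR_BASE | DEMO_FTR_ALTIVEC | DEMO_FTR_VSX)

/* Simplified model of a feature test: true if ANY requested bit matches. */
static bool demo_has_feature(unsigned long cpu_features, unsigned long feature)
{
	return (DEMO_FTRS_ALWAYS & feature) || (cpu_features & feature);
}
```

In this model an old CPU that only has `DEMO_FTR_BASE` still passes `demo_has_feature(cpu, DEMO_FTRS_NEWCPU)`, because the mask shares a bit with the always-present set, whereas a single-bit query such as `DEMO_FTR_VSX` gives the expected answer.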

Rather than trying to match the disassembly behaviour exactly to the
current CPU, just differentiate between 32-bit and 64-bit, and Altivec,
VSX and HTM.

That will cause some instructions to be shown in disassembly even
on a CPU that doesn't support them, but that's OK, objdump -d output
has the same behaviour, and if anything it's less confusing than some
instructions not being disassembled.

Fixes: 897f112bb42e ("[POWERPC] Import updated version of ppc disassembly code for xmon")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/xmon/ppc-dis.c | 33 +++--
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/xmon/ppc-dis.c b/arch/powerpc/xmon/ppc-dis.c
index 75fa98221d48..af105e1bc3fc 100644
--- a/arch/powerpc/xmon/ppc-dis.c
+++ b/arch/powerpc/xmon/ppc-dis.c
@@ -122,32 +122,21 @@ int print_insn_powerpc (unsigned long insn, unsigned long memaddr)
   bool insn_is_short;
   ppc_cpu_t dialect;
 
-  dialect = PPC_OPCODE_PPC | PPC_OPCODE_COMMON
-| PPC_OPCODE_64 | PPC_OPCODE_POWER4 | PPC_OPCODE_ALTIVEC;
+  dialect = PPC_OPCODE_PPC | PPC_OPCODE_COMMON;
 
-  if (cpu_has_feature(CPU_FTRS_POWER5))
-dialect |= PPC_OPCODE_POWER5;
+  if (IS_ENABLED(CONFIG_PPC64))
+dialect |= PPC_OPCODE_64 | PPC_OPCODE_POWER4 | PPC_OPCODE_CELL |
+   PPC_OPCODE_POWER5 | PPC_OPCODE_POWER6 | PPC_OPCODE_POWER7 | PPC_OPCODE_POWER8 |
+   PPC_OPCODE_POWER9;
 
-  if (cpu_has_feature(CPU_FTRS_CELL))
-dialect |= (PPC_OPCODE_CELL | PPC_OPCODE_ALTIVEC);
+  if (cpu_has_feature(CPU_FTR_TM))
+dialect |= PPC_OPCODE_HTM;
 
-  if (cpu_has_feature(CPU_FTRS_POWER6))
-dialect |= (PPC_OPCODE_POWER5 | PPC_OPCODE_POWER6 | PPC_OPCODE_ALTIVEC);
+  if (cpu_has_feature(CPU_FTR_ALTIVEC))
+dialect |= PPC_OPCODE_ALTIVEC | PPC_OPCODE_ALTIVEC2;
 
-  if (cpu_has_feature(CPU_FTRS_POWER7))
-dialect |= (PPC_OPCODE_POWER5 | PPC_OPCODE_POWER6 | PPC_OPCODE_POWER7
-| PPC_OPCODE_ALTIVEC | PPC_OPCODE_VSX);
-
-  if (cpu_has_feature(CPU_FTRS_POWER8))
-dialect |= (PPC_OPCODE_POWER5 | PPC_OPCODE_POWER6 | PPC_OPCODE_POWER7
-   | PPC_OPCODE_POWER8 | PPC_OPCODE_HTM
-   | PPC_OPCODE_ALTIVEC | PPC_OPCODE_ALTIVEC2 | PPC_OPCODE_VSX);
-
-  if (cpu_has_feature(CPU_FTRS_POWER9))
-dialect |= (PPC_OPCODE_POWER5 | PPC_OPCODE_POWER6 | PPC_OPCODE_POWER7
-   | PPC_OPCODE_POWER8 | PPC_OPCODE_POWER9 | PPC_OPCODE_HTM
-   | PPC_OPCODE_ALTIVEC | PPC_OPCODE_ALTIVEC2
-   | PPC_OPCODE_VSX | PPC_OPCODE_VSX3);
+  if (cpu_has_feature(CPU_FTR_VSX))
+dialect |= PPC_OPCODE_VSX | PPC_OPCODE_VSX3;
 
   /* Get the major opcode of the insn.  */
   opcode = NULL;
-- 
2.45.0



Re: [PATCH 7/8] powerpc: Fix typos

2024-05-08 Thread Michael Ellerman
Bjorn Helgaas  writes:
> From: Bjorn Helgaas 
>
> Fix typos, most reported by "codespell arch/powerpc".  Only touches
> comments, no code changes.
>
> Signed-off-by: Bjorn Helgaas 
> Cc: Nicholas Piggin 
> Cc: Christophe Leroy 
> Cc: linuxppc-dev@lists.ozlabs.org

Applied to powerpc/next.

[1/1] powerpc: Fix typos
  https://git.kernel.org/powerpc/c/0ddbbb8960eaf91c7b432ec80566dfa60a8d79e4

cheers


Re: [PATCH v2] KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception

2024-05-08 Thread Michael Ellerman
On Mon, 15 Apr 2024 09:27:29 +0530, Vaibhav Jain wrote:
> This reverts commit 180c6b072bf3 ("KVM: PPC: Book3S HV nestedv2: Do not
> cancel pending decrementer exception") [1] which prevented canceling a
> pending HDEC exception for nestedv2 KVM guests. It was done to avoid
> overhead of a H_GUEST_GET_STATE hcall to read the 'DEC expiry TB' register
> which was higher compared to handling extra decrementer exceptions.
> 
> However recent benchmarks indicate that overhead of not handling 'DECR'
> expiry for Nested KVM Guest(L2) is higher and results in much larger exits
> to Pseries Host(L1) as indicated by the Unixbench-arithoh bench[2]
> 
> [...]

Applied to powerpc/topic/ppc-kvm.

[1/1] KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception
  https://git.kernel.org/powerpc/c/7be6ce7043b4cf293c8826a48fd9f56931cef2cf

cheers


Re: [PATCH] KVM: PPC: Fix documentation for ppc mmu caps

2024-05-08 Thread Michael Ellerman
On Tue, 11 Apr 2023 15:44:46 +0930, Joel Stanley wrote:
> The documentation mentions KVM_CAP_PPC_RADIX_MMU, but the defines in the
> kvm headers spell it KVM_CAP_PPC_MMU_RADIX. Similarly with
> KVM_CAP_PPC_MMU_HASH_V3.
> 
> 

Applied to powerpc/topic/ppc-kvm.

[1/1] KVM: PPC: Fix documentation for ppc mmu caps
  https://git.kernel.org/powerpc/c/651d61bc8b7d8bb622cfc24be2ee92eebb4ed3cc

cheers


Re: [PATCH] KVM: PPC: Book3S HV nestedv2: Fix an error handling path in gs_msg_ops_kvmhv_nestedv2_config_fill_info()

2024-05-08 Thread Michael Ellerman
On Sun, 28 Jan 2024 12:34:25 +0100, Christophe JAILLET wrote:
> The return value of kvmppc_gse_put_buff_info() is not assigned to 'rc' and
> 'rc' is uninitialized at this point.
> So the error handling can not work.
> 
> Assign the expected value to 'rc' to fix the issue.
> 
> 
> [...]

Applied to powerpc/topic/ppc-kvm.

[1/1] KVM: PPC: Book3S HV nestedv2: Fix an error handling path in gs_msg_ops_kvmhv_nestedv2_config_fill_info()
  https://git.kernel.org/powerpc/c/b52e8cd3f835869370f8540f1bc804a47a47f02b

cheers


Re: [PATCH v2] KVM: PPC: code cleanup for kvmppc_book3s_irqprio_deliver

2024-05-08 Thread Michael Ellerman
On Thu, 25 Jan 2024 16:33:48 +0800, Kunwu Chan wrote:
> This part has been commented out since commit 2f4cf5e42d13 ("Add book3s.c"),
> about 14 years ago.
> If there are no plans to enable this code in the future,
> we can remove this dead code.
> 
> 

Applied to powerpc/topic/ppc-kvm.

[1/1] KVM: PPC: code cleanup for kvmppc_book3s_irqprio_deliver
  https://git.kernel.org/powerpc/c/a9c08bcd3179a59998d6339505d0010b82cbcb93

cheers

