[powerpc:next-test] BUILD SUCCESS 335e1a91042764629fbbcd8c7e40379fa3762d35

2022-09-28 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
branch HEAD: 335e1a91042764629fbbcd8c7e40379fa3762d35  powerpc: Ignore DSI 
error caused by the copy/paste instruction

elapsed time: 869m

configs tested: 58
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
x86_64   rhel-8.3-kvm
x86_64   rhel-8.3-syz
x86_64 rhel-8.3-kunit
alphaallyesconfig
um i386_defconfig
um   x86_64_defconfig
arc  allyesconfig
arc defconfig
alpha   defconfig
m68k allyesconfig
s390 allmodconfig
m68k allmodconfig
s390defconfig
x86_64  defconfig
x86_64  rhel-8.3-func
x86_64rhel-8.3-kselftests
i386defconfig
x86_64   rhel-8.3
x86_64randconfig-a013
x86_64randconfig-a011
s390 allyesconfig
arc  randconfig-r043-20220928
x86_64randconfig-a015
x86_64   allyesconfig
x86_64   randconfig-a002-20220926
x86_64   randconfig-a001-20220926
powerpc   allnoconfig
i386  randconfig-a001
x86_64   randconfig-a003-20220926
arm defconfig
powerpc  allmodconfig
i386  randconfig-a003
mips allyesconfig
x86_64   randconfig-a004-20220926
i386 allyesconfig
x86_64   randconfig-a006-20220926
sh   allmodconfig
x86_64   randconfig-a005-20220926
i386  randconfig-a005
arm64allyesconfig
arm  allyesconfig
ia64 allmodconfig

clang tested configs:
x86_64randconfig-a012
hexagon  randconfig-r045-20220928
hexagon  randconfig-r041-20220928
x86_64randconfig-a014
x86_64randconfig-a016
riscvrandconfig-r042-20220928
s390 randconfig-r044-20220928
i386 randconfig-a011-20220926
i386 randconfig-a013-20220926
i386 randconfig-a012-20220926
i386  randconfig-a002
i386 randconfig-a016-20220926
i386  randconfig-a006
i386 randconfig-a015-20220926
i386  randconfig-a004
i386 randconfig-a014-20220926

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


Re: [PATCH V2 2/3] tools/perf/tests: Fix branch stack sampling test to include sanity check for branch filter

2022-09-28 Thread Michael Ellerman
Arnaldo Carvalho de Melo  writes:
> Em Wed, Sep 21, 2022 at 08:22:54PM +0530, Athira Rajeev escreveu:
>> commit b55878c90ab9 ("perf test: Add test for branch stack sampling")
>> added test for branch stack sampling. There is a sanity check in the
>> beginning to skip the test if the hardware doesn't support branch stack
>> sampling.
>> 
>> Snippet
>> <<>>
>> skip the test if the hardware doesn't support branch stack sampling
>> perf record -b -o- -B true > /dev/null 2>&1 || exit 2
>> <<>>
>> 
>> But the testcase also uses branch sample types: save_type, any. if any
>> platform doesn't support the branch filters used in the test, the testcase
>> will fail. In powerpc, currently mutliple branch filters are not supported
>> and hence this test fails in powerpc. Fix the sanity check to look at
>> the support for branch filters used in this test before proceeding with
>>> the test.
>
> Applied the tools/perf/ part

Thanks, I have the other two queued.

cheers


[powerpc:next] BUILD SUCCESS 6bd7ff497b4af13ea3d53781ffca7dc744dbb4da

2022-09-28 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next
branch HEAD: 6bd7ff497b4af13ea3d53781ffca7dc744dbb4da  powerpc/udbg: Remove 
extern function prototypes

elapsed time: 853m

configs tested: 58
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
um i386_defconfig
um   x86_64_defconfig
arc defconfig
alpha   defconfig
powerpc   allnoconfig
mips allyesconfig
s390 allmodconfig
s390defconfig
powerpc  allmodconfig
sh   allmodconfig
x86_64  defconfig
i386defconfig
s390 allyesconfig
x86_64   rhel-8.3-syz
x86_64 rhel-8.3-kunit
x86_64   rhel-8.3-kvm
x86_64   rhel-8.3
x86_64randconfig-a013
x86_64randconfig-a011
x86_64  rhel-8.3-func
arc  randconfig-r043-20220927
x86_64randconfig-a015
arm defconfig
x86_64   allyesconfig
x86_64rhel-8.3-kselftests
riscvrandconfig-r042-20220927
s390 randconfig-r044-20220927
i386 allyesconfig
m68k allmodconfig
i386  randconfig-a001
arc  allyesconfig
i386  randconfig-a003
alphaallyesconfig
m68k allyesconfig
i386  randconfig-a005
arm64allyesconfig
arm  allyesconfig
x86_64randconfig-a004
x86_64randconfig-a002
x86_64randconfig-a006
ia64 allmodconfig

clang tested configs:
i386 randconfig-a011-20220926
i386 randconfig-a013-20220926
i386 randconfig-a012-20220926
x86_64randconfig-a012
i386 randconfig-a016-20220926
i386 randconfig-a015-20220926
i386 randconfig-a014-20220926
hexagon  randconfig-r045-20220927
x86_64randconfig-a014
hexagon  randconfig-r041-20220927
x86_64randconfig-a016
i386  randconfig-a004
i386  randconfig-a002
i386  randconfig-a006
x86_64randconfig-a001
x86_64randconfig-a003
x86_64randconfig-a005

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


[PATCH] powerpc/configs: Enable PPC_UV in powernv_defconfig

2022-09-28 Thread Michael Ellerman
Make sure the ultravisor code at least gets some build testing by
enabling it in powernv_defconfig.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/configs/powernv_defconfig | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/configs/powernv_defconfig 
b/arch/powerpc/configs/powernv_defconfig
index 49f49c263935..76fb6ff8769f 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -50,6 +50,7 @@ CONFIG_CPU_IDLE=y
 CONFIG_HZ_100=y
 CONFIG_BINFMT_MISC=m
 CONFIG_PPC_TRANSACTIONAL_MEM=y
+CONFIG_PPC_UV=y
 CONFIG_HOTPLUG_CPU=y
 CONFIG_KEXEC=y
 CONFIG_KEXEC_FILE=y
@@ -65,6 +66,8 @@ CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
 CONFIG_SCHED_SMT=y
 CONFIG_PM=y
 CONFIG_HOTPLUG_PCI=y
+CONFIG_ZONE_DEVICE=y
+CONFIG_DEVICE_PRIVATE=y
 CONFIG_NET=y
 CONFIG_PACKET=y
 CONFIG_UNIX=y
-- 
2.37.3



Re: [PATCH 1/7] mm/memory.c: Fix race when faulting a device private page

2022-09-28 Thread Michael Ellerman
Alistair Popple  writes:
> Michael Ellerman  writes:
>> Alistair Popple  writes:
>>> When the CPU tries to access a device private page the migrate_to_ram()
>>> callback associated with the pgmap for the page is called. However no
>>> reference is taken on the faulting page. Therefore a concurrent
>>> migration of the device private page can free the page and possibly the
>>> underlying pgmap. This results in a race which can crash the kernel due
>>> to the migrate_to_ram() function pointer becoming invalid. It also means
>>> drivers can't reliably read the zone_device_data field because the page
>>> may have been freed with memunmap_pages().
>>>
>>> Close the race by getting a reference on the page while holding the ptl
>>> to ensure it has not been freed. Unfortunately the elevated reference
>>> count will cause the migration required to handle the fault to fail. To
>>> avoid this failure pass the faulting page into the migrate_vma functions
>>> so that if an elevated reference count is found it can be checked to see
>>> if it's expected or not.
>>>
>>> Signed-off-by: Alistair Popple 
>>> ---
>>>  arch/powerpc/kvm/book3s_hv_uvmem.c   | 15 ++-
>>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 +++--
>>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |  2 +-
>>>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 11 +---
>>>  include/linux/migrate.h  |  8 ++-
>>>  lib/test_hmm.c   |  7 ++---
>>>  mm/memory.c  | 16 +++-
>>>  mm/migrate.c | 34 ++---
>>>  mm/migrate_device.c  | 18 +
>>>  9 files changed, 87 insertions(+), 41 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
>>> b/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> index 5980063..d4eacf4 100644
>>> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> @@ -508,10 +508,10 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
...
>>> @@ -994,7 +997,7 @@ static vm_fault_t kvmppc_uvmem_migrate_to_ram(struct 
>>> vm_fault *vmf)
>>>
>>> if (kvmppc_svm_page_out(vmf->vma, vmf->address,
>>> vmf->address + PAGE_SIZE, PAGE_SHIFT,
>>> -   pvt->kvm, pvt->gpa))
>>> +   pvt->kvm, pvt->gpa, vmf->page))
>>> return VM_FAULT_SIGBUS;
>>> else
>>> return 0;
>>
>> I don't have a UV test system, but as-is it doesn't even compile :)
>
> Ugh, thanks. I did get as far as installing a PPC cross-compiler and
> building a kernel. Apparently I did not get as far as enabling
> CONFIG_PPC_UV :)

No worries, that's really on us. If we're going to keep the code in the
tree then it should really be enabled in at least one of our defconfigs.

cheers


Bug in the VirtIO GPU driver since the RC7 of kernel 6.0

2022-09-28 Thread Christian Zigotzky
Hi All,

I have found the issue. I cross compiled this kernel with GCC 11.2.0 on Ubuntu 
22.04.1.

I cross compiled the same kernel with GCC 9.4.0 again. This time on Ubuntu 
20.04.5.

KVM with the VirtIO GPU works with the GCC 9.4.0 compiled kernel.

— Christian

I wrote:

Hello,

Xorg doesn't start anymore in a virtual e5500 QEMU KVM HV machine with 
the VirtIO GPU [1] since the RC7 of kernel 6.0. [2]

Please find attached the kernel config.

Thanks,
Christian

[1] qemu-system-ppc64 -M ppce500 -cpu e5500 -m 1024 -kernel uImage-6.0 
-drive format=raw,file=void-live-powerpc-20220129.img,index=0,if=virtio 
-netdev user,id=mynet0 -device virtio-net,netdev=mynet0 -append "rw 
root=/dev/vda2" -device virtio-gpu -device virtio-mouse-pci -device 
virtio-keyboard-pci -device pci-ohci,id=newusb -audiodev 
id=sndbe,driver=pa,server=/run/user/1000/pulse/native -device 
usb-audio,bus=newusb.0 -enable-kvm -smp 4 -fsdev 
local,security_model=passthrough,id=fsdev0,path=/home/amigaone/Music 
-device virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=hostshare

[2] Error messages in a virtual Void PPC machine:
[drm] pci: virtio-gpu-pci detected at :00:02.0
[drm] features: -virgl +edid -resource_blob -host_visible
[drm] features: -context_init
[drm] number of scanouts: 1
[drm] number of cap sets: 0
[drm] Initialized virtio_gpu 0.1.0 0 for virtio1 on minor 0
BUG: Kernel NULL pointer dereference on read at 0x
Faulting instruction address: 0xc00c9934
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K SMP NR_CPUS=4 QEMU e500
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.0.0-rc7_A-EON_X5000 #1
NIP:  c00c9934 LR: c00c9f58 CTR: 
REGS: c208ab20 TRAP: 0300   Not tainted (6.0.0-rc7_A-EON_X5000)
MSR:  90029002   CR: 84008242  XER: 
DEAR:  ESR:  IRQMASK: 0
GPR00: c06f0060 c208adc0 c1ac3500 c25f0010
GPR04:    c19908b0
GPR08: 0105   0180
GPR12: 24008242 c0003fff9500 c0001384 
GPR16:    
GPR20:   c169021f c208b088
GPR24:  c2336800  
GPR28: c2a48000 c2336800  c25f0010
NIP [c00c9934] .dma_map_direct+0x8/0x10
LR [c00c9f58] .dma_max_mapping_size+0x24/0x78
Call Trace:
[c208adc0] [c208ae80] 0xc208ae80 (unreliable)
[c208ae40] [c06f0060] .drm_prime_pages_to_sg+0xa0/0xb8
[c208aed0] [c070f96c] .drm_gem_shmem_get_sg_table+0x28/0x3c
[c208af40] [c0808c8c] .virtio_gpu_object_create+0x134/0x3a8
[c208b010] [c0804c34] 
.virtio_gpu_mode_dumb_create+0xe4/0x15c
[c208b110] [c06ff7f4] .drm_mode_create_dumb+0xcc/0xec
[c208b180] [c0707748] 
.drm_client_framebuffer_create+0x98/0x1f0
[c208b260] [c071fb6c] 
.drm_fb_helper_generic_probe+0x78/0x1a0
[c208b320] [c071ef08] 
.__drm_fb_helper_initial_config_and_unlock+0x428/0x54c
[c208b410] [c071f9dc] .drm_fbdev_client_hotplug+0xec/0x128
[c208b4a0] [c071fdec] .drm_fbdev_generic_setup+0x158/0x198
[c208b530] [c0803dc4] .virtio_gpu_probe+0x1ac/0x1e0
[c208b5f0] [c069e11c] .virtio_dev_probe+0x2d0/0x3d4
[c208b690] [c0815f34] .really_probe+0x1a0/0x344
[c208b720] [c08161c8] .__driver_probe_device+0xf0/0x100
[c208b7b0] [c081620c] .driver_probe_device+0x34/0xac
[c208b840] [c0816774] .__driver_attach+0x124/0x134
[c208b8d0] [c0813974] .bus_for_each_dev+0x8c/0xd0
[c208b980] [c08154a4] .driver_attach+0x24/0x38
[c208b9f0] [c0814dd4] .bus_add_driver+0xd8/0x210
[c208baa0] [c0816fd4] .driver_register+0xe0/0x134
[c208bb20] [c069d8a8] .register_virtio_driver+0x40/0x54
hrtimer: interrupt took 4631040 ns
[c208bb90] [c195] .virtio_gpu_driver_init+0x18/0x2c
[c208bc00] [c0001044] .do_one_initcall+0x7c/0x1c0
[c208bce0] [c1925710] .kernel_init_freeable+0x23c/0x240
[c208bd90] [c00013ac] .kernel_init+0x28/0x14c
[c208be10] [c5a0] .ret_from_kernel_thread+0x58/0x60
Instruction dump:
3921 7c23f840 38210080 7d20485e 792307e0 48d551d8 7c9f2378 4bdc
792307e0 4e800020 e92301f8 7c852378  4b7c 7c0802a6 28060003
---[ end trace  ]---

Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
Rebooting in 180 seconds..


Re: PASEMI: Wrong lscpu info since the RC1 of kernel 6.0

2022-09-28 Thread Christian Zigotzky


Just for info:

The values have been fine again since the RC7 of kernel 6.0.

— Christian

> On 7. Sep 2022, at 06:25, Christian Zigotzky  wrote:
> 
> Hi All,
> 
> I use the Nemo board with a PASemi PA6T CPU and some values of lscpu are 
> wrong since the RC1 of kernel 6.0.
> 
> ┌──(mintppc㉿mintppc)-[~]
> └─$ lscpu
> Architecture:ppc64
> CPU op-mode(s):  32-bit, 64-bit
> Byte Order:  Big Endian
> CPU(s):  2
> On-line CPU(s) list: 0,1
> Thread(s) per core:  2
> Core(s) per socket:  1
> Socket(s):   1
> Model:   1.2 (pvr 0090 0102)
> Model name:  PA6T, altivec supported
> L1d cache:   64 KiB
> L1i cache:   64 KiB
> Vulnerability Itlb multihit: Not affected
> Vulnerability L1tf:  Vulnerable
> Vulnerability Mds:   Not affected
> Vulnerability Meltdown:  Vulnerable
> Vulnerability Mmio stale data:   Not affected
> Vulnerability Retbleed:  Not affected
> Vulnerability Spec store bypass: Vulnerable
> Vulnerability Spectre v1:Mitigation; __user pointer sanitization
> Vulnerability Spectre v2:Vulnerable
> Vulnerability Srbds: Not affected
> Vulnerability Tsx async abort:   Not affected
> 
> —-
> 
> One core with 2 threads is wrong. Two cores are correct. Each core has one 
> thread.
> 
> Have you modified the detection of the CPU?
> 
> Thanks,
> Christian



[PATCH RESEND 3/4] kexec: replace crash_mem_range with range

2022-09-28 Thread Baoquan He
From: Li Chen 

We already have struct range, so just use it.

Signed-off-by: Li Chen 
Acked-by: Baoquan He 
Signed-off-by: Baoquan He 
Cc: linuxppc-dev 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
---
Baoquan:
  - Rebased to the latest linus's master branch, fix the conflict
in include/linux/kexec.h.
  - Test passed on ibm-p9wr ppc64le system.

 arch/powerpc/kexec/file_load_64.c | 2 +-
 arch/powerpc/kexec/ranges.c   | 8 
 include/linux/kexec.h | 7 ++-
 kernel/kexec_file.c   | 2 +-
 4 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kexec/file_load_64.c 
b/arch/powerpc/kexec/file_load_64.c
index 349a781cea0b..60e12b716d3c 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -35,7 +35,7 @@ struct umem_info {
 
/* usable memory ranges to look up */
unsigned int nr_ranges;
-   const struct crash_mem_range *ranges;
+   const struct range *ranges;
 };
 
 const struct kexec_file_ops * const kexec_file_loaders[] = {
diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c
index 563e9989a5bf..5fc53a5fcfdf 100644
--- a/arch/powerpc/kexec/ranges.c
+++ b/arch/powerpc/kexec/ranges.c
@@ -33,7 +33,7 @@
 static inline unsigned int get_max_nr_ranges(size_t size)
 {
return ((size - sizeof(struct crash_mem)) /
-   sizeof(struct crash_mem_range));
+   sizeof(struct range));
 }
 
 /**
@@ -51,7 +51,7 @@ static inline size_t get_mem_rngs_size(struct crash_mem 
*mem_rngs)
return 0;
 
size = (sizeof(struct crash_mem) +
-   (mem_rngs->max_nr_ranges * sizeof(struct crash_mem_range)));
+   (mem_rngs->max_nr_ranges * sizeof(struct range)));
 
/*
 * Memory is allocated in size multiple of MEM_RANGE_CHUNK_SZ.
@@ -98,7 +98,7 @@ static int __add_mem_range(struct crash_mem **mem_ranges, u64 
base, u64 size)
  */
 static void __merge_memory_ranges(struct crash_mem *mem_rngs)
 {
-   struct crash_mem_range *ranges;
+   struct range *ranges;
int i, idx;
 
if (!mem_rngs)
@@ -123,7 +123,7 @@ static void __merge_memory_ranges(struct crash_mem 
*mem_rngs)
 /* cmp_func_t callback to sort ranges with sort() */
 static int rngcmp(const void *_x, const void *_y)
 {
-   const struct crash_mem_range *x = _x, *y = _y;
+   const struct range *x = _x, *y = _y;
 
if (x->start > y->start)
return 1;
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 13e6c4b58f07..b900311b4f87 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -17,6 +17,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -240,14 +241,10 @@ static inline int arch_kexec_locate_mem_hole(struct 
kexec_buf *kbuf)
 /* Alignment required for elf header segment */
 #define ELF_CORE_HEADER_ALIGN   4096
 
-struct crash_mem_range {
-   u64 start, end;
-};
-
 struct crash_mem {
unsigned int max_nr_ranges;
unsigned int nr_ranges;
-   struct crash_mem_range ranges[];
+   struct range ranges[];
 };
 
 extern int crash_exclude_mem_range(struct crash_mem *mem,
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 1d546dc97c50..22df37ca5143 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -1141,7 +1141,7 @@ int crash_exclude_mem_range(struct crash_mem *mem,
 {
int i, j;
unsigned long long start, end, p_start, p_end;
-   struct crash_mem_range temp_range = {0, 0};
+   struct range temp_range = {0, 0};
 
for (i = 0; i < mem->nr_ranges; i++) {
start = mem->ranges[i].start;
-- 
2.34.1



[PATCH] powerpc: remove orphan systbl_chk.sh

2022-09-28 Thread Nicholas Piggin
arch/powerpc/kernel/systbl_chk.sh has not been referenced since commit
ab66dcc76d6a ("powerpc: generate uapi header and system call table
files"). Remove it.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/systbl_chk.sh | 30 --
 1 file changed, 30 deletions(-)
 delete mode 100644 arch/powerpc/kernel/systbl_chk.sh

diff --git a/arch/powerpc/kernel/systbl_chk.sh 
b/arch/powerpc/kernel/systbl_chk.sh
deleted file mode 100644
index c7ac3ed657c4..
--- a/arch/powerpc/kernel/systbl_chk.sh
+++ /dev/null
@@ -1,30 +0,0 @@
-#!/bin/sh
-# SPDX-License-Identifier: GPL-2.0-or-later
-#
-# Just process the CPP output from systbl_chk.c and complain
-# if anything is out of order.
-#
-# Copyright © 2008 IBM Corporation
-#
-
-awk'BEGIN { num = -1; }# Ignore the beginning of the file
-   /^#/ { next; }
-   /^[ \t]*$/ { next; }
-   /^START_TABLE/ { num = 0; next; }
-   /^END_TABLE/ {
-   if (num != $2) {
-   printf "Error: NR_syscalls (%s) is not one more than 
the last syscall (%s)\n",
-   $2, num - 1;
-   exit(1);
-   }
-   num = -1;   # Ignore the rest of the file
-   }
-   {
-   if (num == -1) next;
-   if (($1 != -1) && ($1 != num)) {
-   printf "Error: Syscall %s out of order (expected %s)\n",
-   $1, num;
-   exit(1);
-   };
-   num++;
-   }' "$1"
-- 
2.37.2



[PATCH] powerpc/microwatt: Add litesd

2022-09-28 Thread Joel Stanley
This is the register layout of the litesd peripheral for the fusesoc
based Microwatt SoC.

Signed-off-by: Joel Stanley 
---
 arch/powerpc/boot/dts/microwatt.dts | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/powerpc/boot/dts/microwatt.dts 
b/arch/powerpc/boot/dts/microwatt.dts
index b69db1d275cd..0a2e82ca1c79 100644
--- a/arch/powerpc/boot/dts/microwatt.dts
+++ b/arch/powerpc/boot/dts/microwatt.dts
@@ -141,6 +141,21 @@ ethernet@802 {
litex,slot-size = <0x800>;
interrupts = <0x11 0x1>;
};
+
+   mmc@804 {
+   compatible = "litex,mmc";
+   reg = <0x8042800 0x800
+   0x8041000 0x800
+   0x8040800 0x800
+   0x8042000 0x800
+   0x8041800 0x800>;
+   reg-names = "phy", "core", "reader", "writer", "irq";
+   bus-width = <4>;
+   interrupts = <0x13 1>;
+   cap-sd-highspeed;
+   non-removeable;
+   status = "disabled";
+   };
};
 
chosen {
-- 
2.35.1



[PATCH] powerpc/pseries/vas: Pass hw_cpu_id to node associativity HCALL

2022-09-28 Thread Haren Myneni


Generally the hypervisor decides to allocate a window on different
VAS instances. But if the user space wishes to allocate on the
current VAS instance where the process is executing, the kernel has
to pass associativity domain IDs to allocate VAS window HCALL. To
determine the associativity domain IDs for the current CPU, passing
smp_processor_id() to node associativity HCALL which may return
H_P2 (-55) error during DLPAR CPU event.

This patch fixes this issue by passing hard_smp_processor_id() with
VPHN_FLAG_VCPU flag (PAPR 14.11.6.1 H_HOME_NODE_ASSOCIATIVITY).

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/pseries/vas.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/vas.c 
b/arch/powerpc/platforms/pseries/vas.c
index fe33bdb620d5..533026fd1f40 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -348,7 +348,7 @@ static struct vas_window *vas_allocate_window(int vas_id, 
u64 flags,
 * So no unpacking needs to be done.
 */
rc = plpar_hcall9(H_HOME_NODE_ASSOCIATIVITY, domain,
- VPHN_FLAG_VCPU, smp_processor_id());
+ VPHN_FLAG_VCPU, hard_smp_processor_id());
if (rc != H_SUCCESS) {
pr_err("H_HOME_NODE_ASSOCIATIVITY error: %d\n", rc);
goto out;
-- 
2.26.3




Re: [PATCH linux-next][RFC] powerpc: avoid lockdep when we are offline

2022-09-28 Thread Zhouyi Zhou
On Wed, Sep 28, 2022 at 10:51 AM Nicholas Piggin  wrote:
>
> On Wed Sep 28, 2022 at 11:48 AM AEST, Zhouyi Zhou wrote:
> > Thank Nick for reviewing my patch
> >
> > On Tue, Sep 27, 2022 at 12:25 PM Nicholas Piggin  wrote:
> > >
> > > On Tue Sep 27, 2022 at 11:48 AM AEST, Zhouyi Zhou wrote:
> > > > This is second version of my fix to PPC's  "WARNING: suspicious RCU 
> > > > usage",
> > > > I improved my fix under Paul E. McKenney's guidance:
> > > > Link: 
> > > > https://lore.kernel.org/lkml/20220914021528.15946-1-zhouzho...@gmail.com/T/
> > > >
> > > > During the cpu offlining, the sub functions of xive_teardown_cpu will
> > > > call __lock_acquire when CONFIG_LOCKDEP=y. The latter function will
> > > > travel RCU protected list, so "WARNING: suspicious RCU usage" will be
> > > > triggered.
> > > >
> > > > Avoid lockdep when we are offline.
> > >
> > > I don't see how this is safe. If RCU is no longer watching the CPU then
> > > the memory it is accessing here could be concurrently freed. I think the
> > > warning is valid.
> > Agree
> > >
> > > powerpc's problem is that cpuhp_report_idle_dead() is called before
> > > arch_cpu_idle_dead(), so it must not rely on any RCU protection there.
> > > I would say xive cleanup just needs to be done earlier. I wonder why it
> > > is not done in __cpu_disable or thereabouts, that's where the interrupt
> > > controller is supposed to be stopped.
> > Yes, I learn flowing events sequence from kgdb debugging
> > __cpu_disable -> pseries_cpu_disable -> set_cpu_online(cpu, false)  =
> > leads to =>  do_idle: if (cpu_is_offline(cpu) -> arch_cpu_idle_dead
> > so xive cleanup should be done in pseries_cpu_disable.
>
> It's a good catch and a reasonable approach to the problem.
Thank Nick for your encouragement ;-)
>
> > But as a beginner, I afraid that I am incompetent to do above
> > sophisticated work without error although I am very like to,
> > Could any expert do this for us?
>
> This will be difficult for anybody, it's tricky code. I'm not an
> expert at it.
>
> It looks like the interrupt controller disable split has been there
> since long before xive. I would try just move them together than see
> if that works.
Yes, I use "git blame" (I learned "git blame" from Paul E. McKenny ;-)
) to see the same.
and anticipate your great works!
>
> Documentation/core-api/cpu_hotplug.rst says that __cpu_disable should
> shut down the interrupt handler. So if there is a complication it
> would probably be from powerpc-specific CPU hotplug  or interrupt
> code.
Thank Nick for your guidance! I studied
Documentation/core-api/cpu_hotplug.rst this morning.
I also found X86 shut down the interrupt handler in __cpu_disable
according to above document.

Many Thanks
Zhouyi
>
> Thanks,
> Nick
>


Re: [PATCH 1/7] mm/memory.c: Fix race when faulting a device private page

2022-09-28 Thread Alistair Popple


Michael Ellerman  writes:

> Alistair Popple  writes:
>> When the CPU tries to access a device private page the migrate_to_ram()
>> callback associated with the pgmap for the page is called. However no
>> reference is taken on the faulting page. Therefore a concurrent
>> migration of the device private page can free the page and possibly the
>> underlying pgmap. This results in a race which can crash the kernel due
>> to the migrate_to_ram() function pointer becoming invalid. It also means
>> drivers can't reliably read the zone_device_data field because the page
>> may have been freed with memunmap_pages().
>>
>> Close the race by getting a reference on the page while holding the ptl
>> to ensure it has not been freed. Unfortunately the elevated reference
>> count will cause the migration required to handle the fault to fail. To
>> avoid this failure pass the faulting page into the migrate_vma functions
>> so that if an elevated reference count is found it can be checked to see
>> if it's expected or not.
>>
>> Signed-off-by: Alistair Popple 
>> ---
>>  arch/powerpc/kvm/book3s_hv_uvmem.c   | 15 ++-
>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 +++--
>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |  2 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 11 +---
>>  include/linux/migrate.h  |  8 ++-
>>  lib/test_hmm.c   |  7 ++---
>>  mm/memory.c  | 16 +++-
>>  mm/migrate.c | 34 ++---
>>  mm/migrate_device.c  | 18 +
>>  9 files changed, 87 insertions(+), 41 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
>> b/arch/powerpc/kvm/book3s_hv_uvmem.c
>> index 5980063..d4eacf4 100644
>> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
>> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
>> @@ -508,10 +508,10 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>>  static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
>>  unsigned long start,
>>  unsigned long end, unsigned long page_shift,
>> -struct kvm *kvm, unsigned long gpa)
>> +struct kvm *kvm, unsigned long gpa, struct page *fault_page)
>>  {
>>  unsigned long src_pfn, dst_pfn = 0;
>> -struct migrate_vma mig;
>> +struct migrate_vma mig = { 0 };
>>  struct page *dpage, *spage;
>>  struct kvmppc_uvmem_page_pvt *pvt;
>>  unsigned long pfn;
>> @@ -525,6 +525,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct 
>> *vma,
>>  mig.dst = _pfn;
>>  mig.pgmap_owner = _uvmem_pgmap;
>>  mig.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
>> +mig.fault_page = fault_page;
>>
>>  /* The requested page is already paged-out, nothing to do */
>>  if (!kvmppc_gfn_is_uvmem_pfn(gpa >> page_shift, kvm, NULL))
>> @@ -580,12 +581,14 @@ static int __kvmppc_svm_page_out(struct vm_area_struct 
>> *vma,
>>  static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
>>unsigned long start, unsigned long end,
>>unsigned long page_shift,
>> -  struct kvm *kvm, unsigned long gpa)
>> +  struct kvm *kvm, unsigned long gpa,
>> +  struct page *fault_page)
>>  {
>>  int ret;
>>
>>  mutex_lock(>arch.uvmem_lock);
>> -ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa);
>> +ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa,
>> +fault_page);
>>  mutex_unlock(>arch.uvmem_lock);
>>
>>  return ret;
>> @@ -736,7 +739,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
>>  bool pagein)
>>  {
>>  unsigned long src_pfn, dst_pfn = 0;
>> -struct migrate_vma mig;
>> +struct migrate_vma mig = { 0 };
>>  struct page *spage;
>>  unsigned long pfn;
>>  struct page *dpage;
>> @@ -994,7 +997,7 @@ static vm_fault_t kvmppc_uvmem_migrate_to_ram(struct 
>> vm_fault *vmf)
>>
>>  if (kvmppc_svm_page_out(vmf->vma, vmf->address,
>>  vmf->address + PAGE_SIZE, PAGE_SHIFT,
>> -pvt->kvm, pvt->gpa))
>> +pvt->kvm, pvt->gpa, vmf->page))
>>  return VM_FAULT_SIGBUS;
>>  else
>>  return 0;
>
> I don't have a UV test system, but as-is it doesn't even compile :)

Ugh, thanks. I did get as far as installing a PPC cross-compiler and
building a kernel. Apparently I did not get as far as enabling
CONFIG_PPC_UV :)

> kvmppc_svm_page_out() is called via some paths other than the
> migrate_to_ram callback.
>
> I think it's correct to just pass fault_page = NULL when it's not called
> from the migrate_to_ram callback?
>
> Incremental diff below.
>
> cheers
>
>
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> 

Re: [PATCH v6 3/8] dt-bindings: clock: Add ids for Lynx 10g PLLs

2022-09-28 Thread Stephen Boyd
Quoting Sean Anderson (2022-09-20 13:23:51)
> This adds ids for the Lynx 10g SerDes's internal PLLs. These may be used
> with assigned-clock* to specify a particular frequency to use. For
> example, to set the second PLL (at offset 0x20)'s frequency, use
> LYNX10G_PLLa(1). These are for use only in the device tree, and are not
> otherwise used by the driver.
> 
> Signed-off-by: Sean Anderson 
> Acked-by: Rob Herring 
> ---

Acked-by: Stephen Boyd 


Re: [PATCH 1/7] mm/memory.c: Fix race when faulting a device private page

2022-09-28 Thread Michael Ellerman
Alistair Popple  writes:
> When the CPU tries to access a device private page the migrate_to_ram()
> callback associated with the pgmap for the page is called. However no
> reference is taken on the faulting page. Therefore a concurrent
> migration of the device private page can free the page and possibly the
> underlying pgmap. This results in a race which can crash the kernel due
> to the migrate_to_ram() function pointer becoming invalid. It also means
> drivers can't reliably read the zone_device_data field because the page
> may have been freed with memunmap_pages().
>
> Close the race by getting a reference on the page while holding the ptl
> to ensure it has not been freed. Unfortunately the elevated reference
> count will cause the migration required to handle the fault to fail. To
> avoid this failure pass the faulting page into the migrate_vma functions
> so that if an elevated reference count is found it can be checked to see
> if it's expected or not.
>
> Signed-off-by: Alistair Popple 
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c   | 15 ++-
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 +++--
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |  2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 11 +---
>  include/linux/migrate.h  |  8 ++-
>  lib/test_hmm.c   |  7 ++---
>  mm/memory.c  | 16 +++-
>  mm/migrate.c | 34 ++---
>  mm/migrate_device.c  | 18 +
>  9 files changed, 87 insertions(+), 41 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 5980063..d4eacf4 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -508,10 +508,10 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>  static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
>   unsigned long start,
>   unsigned long end, unsigned long page_shift,
> - struct kvm *kvm, unsigned long gpa)
> + struct kvm *kvm, unsigned long gpa, struct page *fault_page)
>  {
>   unsigned long src_pfn, dst_pfn = 0;
> - struct migrate_vma mig;
> + struct migrate_vma mig = { 0 };
>   struct page *dpage, *spage;
>   struct kvmppc_uvmem_page_pvt *pvt;
>   unsigned long pfn;
> @@ -525,6 +525,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct 
> *vma,
>   mig.dst = _pfn;
>   mig.pgmap_owner = _uvmem_pgmap;
>   mig.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
> + mig.fault_page = fault_page;
>  
>   /* The requested page is already paged-out, nothing to do */
>   if (!kvmppc_gfn_is_uvmem_pfn(gpa >> page_shift, kvm, NULL))
> @@ -580,12 +581,14 @@ static int __kvmppc_svm_page_out(struct vm_area_struct 
> *vma,
>  static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
> unsigned long start, unsigned long end,
> unsigned long page_shift,
> -   struct kvm *kvm, unsigned long gpa)
> +   struct kvm *kvm, unsigned long gpa,
> +   struct page *fault_page)
>  {
>   int ret;
>  
>   mutex_lock(>arch.uvmem_lock);
> - ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa);
> + ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa,
> + fault_page);
>   mutex_unlock(>arch.uvmem_lock);
>  
>   return ret;
> @@ -736,7 +739,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
>   bool pagein)
>  {
>   unsigned long src_pfn, dst_pfn = 0;
> - struct migrate_vma mig;
> + struct migrate_vma mig = { 0 };
>   struct page *spage;
>   unsigned long pfn;
>   struct page *dpage;
> @@ -994,7 +997,7 @@ static vm_fault_t kvmppc_uvmem_migrate_to_ram(struct 
> vm_fault *vmf)
>  
>   if (kvmppc_svm_page_out(vmf->vma, vmf->address,
>   vmf->address + PAGE_SIZE, PAGE_SHIFT,
> - pvt->kvm, pvt->gpa))
> + pvt->kvm, pvt->gpa, vmf->page))
>   return VM_FAULT_SIGBUS;
>   else
>   return 0;

I don't have a UV test system, but as-is it doesn't even compile :)

kvmppc_svm_page_out() is called via some paths other than the
migrate_to_ram callback.

I think it's correct to just pass fault_page = NULL when it's not called
from the migrate_to_ram callback?

Incremental diff below.

cheers


diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index d4eacf410956..965c9e9e500b 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -637,7 +637,7 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot 
*slot,

Re: [PATCH 2/3] PCI/AER: Disable AER service on suspend when IRQ is shared with PME

2022-09-28 Thread Bjorn Helgaas
On Wed, Jul 27, 2022 at 09:32:51AM +0800, Kai-Heng Feng wrote:
> PCIe service that shares IRQ with PME may cause spurious wakeup on
> system suspend.
> 
> PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states
> that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready
> (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose
> much here to disable AER during system suspend.
> 
> This is very similar to previous attempts to suspend AER and DPC [1],
> but with a different reason.
> 
> [1] 
> https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216295
> 
> Signed-off-by: Kai-Heng Feng 
> ---
>  drivers/pci/pcie/aer.c | 23 ++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 7952e5efd6cf3..60cc373754af2 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1372,6 +1372,26 @@ static int aer_probe(struct pcie_device *dev)
>   return 0;
>  }
>  
> +static int aer_suspend(struct pcie_device *dev)
> +{
> + struct aer_rpc *rpc = get_service_data(dev);
> +
> + if (dev->shared_pme_irq)
> + aer_disable_rootport(rpc);

aer_disable_rootport() seems like it might be overkill.  IIUC, what
we want to do here is disable AER interrupts, which should only
require clearing ROOT_PORT_INTR_ON_MESG_MASK in PCI_ERR_ROOT_COMMAND.

In addition to clearing ROOT_PORT_INTR_ON_MESG_MASK,
aer_disable_rootport() traverses the whole hierarchy, clearing
PCI_EXP_AER_FLAGS (CERE | NFERE | FERE | URRE) in PCI_EXP_DEVCTL.
I don't think these DEVCTL bits control interrupt generation, so I
don't know why we need to touch them.

aer_disable_rootport() also clears PCI_ERR_ROOT_STATUS, which I think
we should not do during suspend either.  We might want to clear it
on resume (which we already do in pci_restore_state()), but I think
generally we should preserve error information as long as it doesn't
cause trouble.

Your thoughts please :)

> +
> + return 0;
> +}
> +
> +static int aer_resume(struct pcie_device *dev)
> +{
> + struct aer_rpc *rpc = get_service_data(dev);
> +
> + if (dev->shared_pme_irq)
> + aer_enable_rootport(rpc);
> +
> + return 0;
> +}
> +
>  /**
>   * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
>   * @dev: pointer to Root Port, RCEC, or RCiEP
> @@ -1441,8 +1461,9 @@ static struct pcie_port_service_driver aerdriver = {
>   .name   = "aer",
>   .port_type  = PCIE_ANY_PORT,
>   .service= PCIE_PORT_SERVICE_AER,
> -
>   .probe  = aer_probe,
> + .suspend= aer_suspend,
> + .resume = aer_resume,
>   .remove = aer_remove,
>  };
>  
> -- 
> 2.36.1
> 


Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release

2022-09-28 Thread Lyude Paul
Re comments about infinite retry: gotcha, makes sense to me.

On Tue, 2022-09-27 at 09:45 +1000, Alistair Popple wrote:
> John Hubbard  writes:
> 
> > On 9/26/22 14:35, Lyude Paul wrote:
> > > > +   for (i = 0; i < npages; i++) {
> > > > +   if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
> > > > +   struct page *dpage;
> > > > +
> > > > +   /*
> > > > +* _GFP_NOFAIL because the GPU is going away 
> > > > and there
> > > > +* is nothing sensible we can do if we can't 
> > > > copy the
> > > > +* data back.
> > > > +*/
> > > 
> > > You'll have to excuse me for a moment since this area of nouveau isn't 
> > > one of
> > > my strongpoints, but are we sure about this? IIRC __GFP_NOFAIL means 
> > > infinite
> > > retry, in the case of a GPU hotplug event I would assume we would rather 
> > > just
> > > stop trying to migrate things to the GPU and just drop the data instead of
> > > hanging on infinite retries.
> > > 
> 
> No problem, thanks for taking a look!
> 
> > Hi Lyude!
> > 
> > Actually, I really think it's better in this case to keep trying
> > (presumably not necessarily infinitely, but only until memory becomes
> > available), rather than failing out and corrupting data.
> > 
> > That's because I'm not sure it's completely clear that this memory is
> > discardable. And at some point, we're going to make this all work with
> > file-backed memory, which will *definitely* not be discardable--I
> > realize that we're not there yet, of course.
> > 
> > But here, it's reasonable to commit to just retrying indefinitely,
> > really. Memory should eventually show up. And if it doesn't, then
> > restarting the machine is better than corrupting data, generally.
> 
> The memory is definitely not discardable here if the migration failed
> because that implies it is still mapped into some userspace process.
> 
> We could avoid restarting the machine by doing something similar to what
> happens during memory failure and killing every process that maps the
> page(s). But overall I think it's better to retry until memory is
> available, because that allows things like reclaim to work and in the
> worst case allows the OOM killer to select an appropriate task to kill.
> It also won't cause data corruption if/when we have file-backed memory.
> 
> > thanks,
> 

-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat



Re: [PATCH 3/3] PCI/DPC: Disable DPC service on suspend when IRQ is shared with PME

2022-09-28 Thread Bjorn Helgaas
On Wed, Jul 27, 2022 at 09:32:52AM +0800, Kai-Heng Feng wrote:
> PCIe service that shares IRQ with PME may cause spurious wakeup on
> system suspend.
> 
> Since AER is conditionally disabled in previous patch, also apply the
> same condition to disable DPC which depends on AER to work.
> 
> PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states
> that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready
> (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose
> much here to disable DPC during system suspend.
> 
> This is very similar to previous attempts to suspend AER and DPC [1],
> but with a different reason.
> 
> [1] 
> https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216295
> 
> Signed-off-by: Kai-Heng Feng 
> ---
>  drivers/pci/pcie/dpc.c | 52 +-
>  1 file changed, 41 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index 3e9afee02e8d1..542f282c43f75 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -343,13 +343,33 @@ void pci_dpc_init(struct pci_dev *pdev)
>   }
>  }
>  
> +static void dpc_enable(struct pcie_device *dev)
> +{
> + struct pci_dev *pdev = dev->port;
> + u16 ctl;
> +
> + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, );
> + ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | 
> PCI_EXP_DPC_CTL_INT_EN;
> + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
> +}

I guess the reason we need this is because we disable interupts in
pci_pm_suspend() first, then we call pci_save_dpc_state() from
pci_pm_suspend_noirq(), so we save the *disabled* control register.
Then when we resume, we restore that disabled control register so we
need to enable DPC again.  Right?

I think we should save a "dpc_enabled" bit in the pci_dev and
conditionally set PCI_EXP_DPC_CTL_INT_EN here.  If we unconditionally
set it here, we depend on portdrv *not* calling dpc_resume() if we
didn't enable DPC at enumeration-time for some reason.

And I would leave PCI_EXP_DPC_CTL_EN_FATAL alone; see below.

> +static void dpc_disable(struct pcie_device *dev)
> +{
> + struct pci_dev *pdev = dev->port;
> + u16 ctl;
> +
> + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, );
> + ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN);
> + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);

  #define  PCI_EXP_DPC_CTL_EN_FATAL   0x0001
  #define  PCI_EXP_DPC_CTL_INT_EN 0x0008

Clearing PCI_EXP_DPC_CTL_INT_EN makes sense to me, but I don't
understand the PCI_EXP_DPC_CTL_EN_FATAL part.

PCI_EXP_DPC_CTL_EN_FATAL is one of the four values of the two-bit DPC
Trigger Enable, so clearing that bit leaves the field as either 00b
(DPC is disabled) or 10b (DPC enabled and triggered when the port
detects an uncorrectable error or receives an ERR_NONFATAL or
ERR_FATAL message).

I think we should only clear PCI_EXP_DPC_CTL_INT_EN.

> +}
> +
>  #define FLAG(x, y) (((x) & (y)) ? '+' : '-')
>  static int dpc_probe(struct pcie_device *dev)
>  {
>   struct pci_dev *pdev = dev->port;
>   struct device *device = >device;
>   int status;
> - u16 ctl, cap;
> + u16 cap;
>  
>   if (!pcie_aer_is_native(pdev) && !pcie_ports_dpc_native)
>   return -ENOTSUPP;
> @@ -364,10 +384,7 @@ static int dpc_probe(struct pcie_device *dev)
>   }
>  
>   pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, );
> - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, );
> -
> - ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | 
> PCI_EXP_DPC_CTL_INT_EN;
> - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);

I think we should keep the PCI_EXP_DPC_CTL_EN_FATAL part here. That
just sets the desired trigger mode but AFAICT, has nothing to do with
generating interrupts.

> + dpc_enable(dev);

Then dpc_enable() could be called something like dpc_enable_irq(), and
it would *only* control interupt generation.

>   pci_info(pdev, "enabled with IRQ %d\n", dev->irq);
>  
>   pci_info(pdev, "error containment capabilities: Int Msg #%d, RPExt%c 
> PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n",
> @@ -380,14 +397,25 @@ static int dpc_probe(struct pcie_device *dev)
>   return status;
>  }
>  
> -static void dpc_remove(struct pcie_device *dev)
> +static int dpc_suspend(struct pcie_device *dev)
>  {
> - struct pci_dev *pdev = dev->port;
> - u16 ctl;
> + if (dev->shared_pme_irq)
> + dpc_disable(dev);
>  
> - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, );
> - ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN);
> - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
> + return 0;
> +}
> +
> +static int dpc_resume(struct pcie_device *dev)
> +{
> 

Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release

2022-09-28 Thread Lyude Paul
On Tue, 2022-09-27 at 11:39 +1000, Alistair Popple wrote:
> Felix Kuehling  writes:
> 
> > On 2022-09-26 17:35, Lyude Paul wrote:
> > > On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote:
> > > > When the module is unloaded or a GPU is unbound from the module it is
> > > > possible for device private pages to be left mapped in currently running
> > > > processes. This leads to a kernel crash when the pages are either freed
> > > > or accessed from the CPU because the GPU and associated data structures
> > > > and callbacks have all been freed.
> > > > 
> > > > Fix this by migrating any mappings back to normal CPU memory prior to
> > > > freeing the GPU memory chunks and associated device private pages.
> > > > 
> > > > Signed-off-by: Alistair Popple 
> > > > 
> > > > ---
> > > > 
> > > > I assume the AMD driver might have a similar issue. However I can't see
> > > > where device private (or coherent) pages actually get unmapped/freed
> > > > during teardown as I couldn't find any relevant calls to
> > > > devm_memunmap(), memunmap(), devm_release_mem_region() or
> > > > release_mem_region(). So it appears that ZONE_DEVICE pages are not being
> > > > properly freed during module unload, unless I'm missing something?
> > > I've got no idea, will poke Ben to see if they know the answer to this
> > 
> > I guess we're relying on devm to release the region. Isn't the whole point 
> > of
> > using devm_request_free_mem_region that we don't have to remember to 
> > explicitly
> > release it when the device gets destroyed? I believe we had an explicit free
> > call at some point by mistake, and that caused a double-free during module
> > unload. See this commit for reference:
> 
> Argh, thanks for that pointer. I was not so familiar with
> devm_request_free_mem_region()/devm_memremap_pages() as currently
> Nouveau explicitly manages that itself.

Mhm, TBH I feel like this was going to happen eventually anyway but there's
another reason for nouveau to start using the managed versions of these
functions at some point

> 
> > commit 22f4f4faf337d5fb2d2750aff13215726814273e
> > Author: Philip Yang 
> > Date:   Mon Sep 20 17:25:52 2021 -0400
> > 
> > drm/amdkfd: fix svm_migrate_fini warning
> >  Device manager releases device-specific resources when a driver
> > disconnects from a device, devm_memunmap_pages and
> > devm_release_mem_region calls in svm_migrate_fini are redundant.
> >  It causes below warning trace after patch "drm/amdgpu: Split
> > amdgpu_device_fini into early and late", so remove function
> > svm_migrate_fini.
> >  BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718
> >  WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795
> > devm_release_action+0x51/0x60
> > Call Trace:
> > ? memunmap_pages+0x360/0x360
> > svm_migrate_fini+0x2d/0x60 [amdgpu]
> > kgd2kfd_device_exit+0x23/0xa0 [amdgpu]
> > amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu]
> > amdgpu_device_fini_sw+0x45/0x290 [amdgpu]
> > amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
> > drm_dev_release+0x20/0x40 [drm]
> > release_nodes+0x196/0x1e0
> > device_release_driver_internal+0x104/0x1d0
> > driver_detach+0x47/0x90
> > bus_remove_driver+0x7a/0xd0
> > pci_unregister_driver+0x3d/0x90
> > amdgpu_exit+0x11/0x20 [amdgpu]
> >  Signed-off-by: Philip Yang 
> > Reviewed-by: Felix Kuehling 
> > Signed-off-by: Alex Deucher 
> > 
> > Furthermore, I guess we are assuming that nobody is using the GPU when the
> > module is unloaded. As long as any processes have /dev/kfd open, you won't 
> > be
> > able to unload the module (except by force-unload). I suppose with 
> > ZONE_DEVICE
> > memory, we can have references to device memory pages even when user mode 
> > has
> > closed /dev/kfd. We do have a cleanup handler that runs in an 
> > MMU-free-notifier.
> > In theory that should run after all the pages in the mm_struct have been 
> > freed.
> > It releases all sorts of other device resources and needs the driver to 
> > still be
> > there. I'm not sure if there is anything preventing a module unload before 
> > the
> > free-notifier runs. I'll look into that.
> 
> Right - module unload (or device unbind) is one of the other ways we can
> hit this issue in Nouveau at least. You can end up with ZONE_DEVICE
> pages mapped in a running process after the module has unloaded.
> Although now you mention it that seems a bit wrong - the pgmap refcount
> should provide some protection against that. Will have to look into
> that too.
> 
> > Regards,
> >   Felix
> > 
> > 
> > > 
> > > > ---
> > > >   drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 
> > > > +++-
> > > >   1 file changed, 48 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
> > > > b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> > > > index 66ebbd4..3b247b8 100644
> > 

Re: [objtool] ca5e2b42c0: kernel_BUG_at_arch/x86/kernel/jump_label.c

2022-09-28 Thread Nathan Chancellor
On Wed, Sep 28, 2022 at 12:13:53PM -0700, Josh Poimboeuf wrote:
> On Wed, Sep 28, 2022 at 08:44:27AM -0700, Nathan Chancellor wrote:
> > This crash appears to just be a symptom of objtool erroring throughout
> > the entire build, which means things like the jump label hacks do not
> > get applied. I see a flood of
> > 
> >   error: objtool: --mnop requires --mcount
> > 
> > throughout the build because the configuration has
> > CONFIG_HAVE_NOP_MCOUNT=y because CONFIG_HAVE_OBJTOOL_MCOUNT is
> > unconditionally enabled for x86_64 due to CONFIG_HAVE_OBJTOOL but
> > '--mcount' is only actually used when CONFIG_FTRACE_MCOUNT_USE_OBJTOOL
> > is enabled so '--mnop' gets passed in without '--mcount'. This should
> > obviously be fixed somehow, perhaps by moving the '--mnop' addition into
> > the '--mcount' if, even if that makes the line really long.
> > 
> > A secondary issue is that it seems like if objtool encounters a fatal
> > error like this, it should completely fail the build to make it obvious
> > that something is wrong, rather than allowing it to continue and
> > generate a broken kernel, especially since x86_64 requires objtool to
> > build a working kernel at this point.
> 
> Grrr... I really dislike that objtool is capable of bricking the kernel
> like this.  We just saw something similar in RHEL.
> 
> IMO, we should just get rid of this "short JMP" feature in the jump
> label code, those saved three bytes aren't worth the pain.
> 
> But yes, we do need to fix that config issue.

Right, I actually see that the report I was CC'd on was a part of a
larger thread, where Naveen already suggested the fix for this problem,
which is not clang specific it seems:

https://lore.kernel.org/1663223588.wppdx3129x.nav...@linux.ibm.com/

> And yes, maybe fatal objtool warnings should cause a build failure.  We
> used to do that, but it brought a different sort of pain.  But if
> objtool is going to be in the kernel's critical boot path then I guess
> we have to do that.

Right, that was

  644592d32837 ("objtool: Fail the kernel build on fatal errors")

which was reverted in

  655cf86548a3 ("objtool: Don't fail the kernel build on fatal errors")

objtool should not error on warnings but it seems like it should error
for invalid option combinations and other misconfiguration problems? Did
this regress with commit b51277eb9775 ("objtool: Ditch subcommands")? I
can see that the return code of the subcommands would be passed back via
exit() (?) so objtool could fail the build if there was a true problem
but after that change, objtool_run() does not have its return code
checked so any errors that happen don't get passed back up. Perhaps just
the following diff would resolve this? I assume we would need to look at
all the different return values to know if this is safe though.

Cheers,
Nathan

diff --git a/tools/objtool/objtool.c b/tools/objtool/objtool.c
index a7ecc32e3512..cda649644e32 100644
--- a/tools/objtool/objtool.c
+++ b/tools/objtool/objtool.c
@@ -146,7 +146,5 @@ int main(int argc, const char **argv)
exec_cmd_init("objtool", UNUSED, UNUSED, UNUSED);
pager_init(UNUSED);
 
-   objtool_run(argc, argv);
-
-   return 0;
+   return objtool_run(argc, argv);
 }


Re: [PATCH 1/6] powerpc: Add ppc_md.name to dump stack arch description

2022-09-28 Thread Nathan Lynch
Michael Ellerman  writes:
> @@ -588,6 +590,15 @@ static __init int add_pcspkr(void)
>  device_initcall(add_pcspkr);
>  #endif   /* CONFIG_PCSPKR_PLATFORM */
>  
> +static char ppc_hw_desc_buf[128] __initdata;
> +
> +struct seq_buf ppc_hw_desc __initdata = {
> + .buffer = ppc_hw_desc_buf,
> + .size = sizeof(ppc_hw_desc_buf),
> + .len = 0,
> + .readpos = 0,
> +};
> +
>  static __init void probe_machine(void)
>  {
>   extern struct machdep_calls __machine_desc_start;
> @@ -628,6 +639,9 @@ static __init void probe_machine(void)
>   for (;;);
>   }
>  
> + seq_buf_printf(_hw_desc,"machine:%s ", ppc_md.name);
> + dump_stack_set_arch_desc(ppc_hw_desc.buffer);

At first I was confused by the seemingly unnecessary use of the seq_buf,
but after reading the rest of the series I see that this is the final
addition to a temporary buffer before setting the arch description
string. Looks OK to me.


Re: [PATCH 6/6] powerpc/pseries: Add firmware details to dump stack arch description

2022-09-28 Thread Nathan Lynch
Michael Ellerman  writes:

> Add firmware version details to the dump stack arch description, which
> is printed in case of an oops.
>
> Currently /hypervisor only exists on KVM, so if we don't find that
> look for something that suggests we're on phyp and if so that's
> probably a good guess. The actual content of the ibm,fw-net-version
> seems to be a full path so is too long to add to the description.

My only reservation is that ibm,fw-net-version seems to be unspecified
and could disappear in future firmware versions.

/ibm,powervm-partition would be the best PAPR-specified property for
this purpose, but I don't see it on a P8/860 partition I checked,
unfortunately. I do see it on a P9. Presumably it's present in device
trees on PowerVM P9 systems and later, but it's probably too new to use
for this.

/ibm,lpar-capable "indicates that the platform is capable of supporting
logical partitioning and is only present on such systems." This one is
present on the P8.

So consider this a weak suggestion to replace the ibm,fw-net-version
check with ibm,lpar-capable. But I don't want to bikeshed it, overall
the series looks good.

>
> eg: Hardware name: ... of:'IBM,FW860.42 (SV860_138)' hv:phyp

Will this info get printed during boot as well? There are many times it
would have been useful to me when looking at logs from non-oopsed
kernels.


Re: [objtool] ca5e2b42c0: kernel_BUG_at_arch/x86/kernel/jump_label.c

2022-09-28 Thread Josh Poimboeuf
On Wed, Sep 28, 2022 at 08:44:27AM -0700, Nathan Chancellor wrote:
> This crash appears to just be a symptom of objtool erroring throughout
> the entire build, which means things like the jump label hacks do not
> get applied. I see a flood of
> 
>   error: objtool: --mnop requires --mcount
> 
> throughout the build because the configuration has
> CONFIG_HAVE_NOP_MCOUNT=y because CONFIG_HAVE_OBJTOOL_MCOUNT is
> unconditionally enabled for x86_64 due to CONFIG_HAVE_OBJTOOL but
> '--mcount' is only actually used when CONFIG_FTRACE_MCOUNT_USE_OBJTOOL
> is enabled so '--mnop' gets passed in without '--mcount'. This should
> obviously be fixed somehow, perhaps by moving the '--mnop' addition into
> the '--mcount' if, even if that makes the line really long.
> 
> A secondary issue is that it seems like if objtool encounters a fatal
> error like this, it should completely fail the build to make it obvious
> that something is wrong, rather than allowing it to continue and
> generate a broken kernel, especially since x86_64 requires objtool to
> build a working kernel at this point.

Grrr... I really dislike that objtool is capable of bricking the kernel
like this.  We just saw something similar in RHEL.

IMO, we should just get rid of this "short JMP" feature in the jump
label code, those saved three bytes aren't worth the pain.

But yes, we do need to fix that config issue.

And yes, maybe fatal objtool warnings should cause a build failure.  We
used to do that, but it brought a different sort of pain.  But if
objtool is going to be in the kernel's critical boot path then I guess
we have to do that.

-- 
Josh


Re: [PATCH v2 2/2] powerpc/rtas: block error injection when locked down

2022-09-28 Thread Nathan Lynch
Andrew Donnellan  writes:

> On Mon, 2022-09-26 at 08:16 -0500, Nathan Lynch wrote:
>> The error injection facility on pseries VMs allows corruption of
>> arbitrary guest memory, potentially enabling a sufficiently
>> privileged
>> user to disable lockdown or perform other modifications of the
>> running
>> kernel via the rtas syscall.
>> 
>> Block the PAPR error injection facility from being opened or called
>> when locked down.
>> 
>> Signed-off-by: Nathan Lynch 
>
> Is there any circumstance (short of arbitrary code execution etc) where
> the rtas_call() check will actually trigger rather than the sys_rtas()
> check? (Not that it matters, defence in depth is good.)

Fair question! There are no in-kernel users of rtas_call() that pass the
error injection tokens as far as I could tell. Nor am I aware of any
out-of-tree users, for that matter. But rtas_call() is the likely most
appropriate place to have the lockdown gate should that situation change
(as it might, see https://github.com/ibm-power-utilities/librtas/issues/29).


Re: [External] Re: [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api

2022-09-28 Thread Zhuo Chen




On 9/28/22 7:06 PM, Serge Semin wrote:

On Wed, Sep 28, 2022 at 06:59:37PM +0800, Zhuo Chen wrote:

Hello.

Here comes patch v3, which contains some fixes and optimizations of
aer api usage. The v1 and v2 can be found on the mailing list.

v3:
- Modifications to comments proposed by Sathyanarayanan.



Remove
   pci_aer_clear_nonfatal_status() call in NTB and improve commit log.


Failed to see who has requested that...

-Sergey


Hi, Sergey

Currently other vendor drivers do not clear error status in their own 
init code, I don't exactly know what is special reason for clearing 
error status during init code in ntb driver.


An evidence is in pci_aer_init(), PCI core driver has do 
pci_aer_clear_status() and pci_enable_pcie_error_reporting() in common 
process. So vendor drivers don't need to do again.


But I don't know the reason why many vendor drivers reserve 
pci_enable_pcie_error_reporting() after commit f26e58bf6f54 ("PCI/AER: 
Enable error reporting when AER is native"). Do they need to be removed?

Could Bjorn and Sathyanarayanan help look into it, thanks a lot.

Thanks.


v2:
- Modifications to comments proposed by Bjorn. Split patch into more
   obvious parts.

Zhuo Chen (9):
   PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core
   PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear
 uncorrectable error status
   NTB: Remove pci_aer_clear_nonfatal_status() call
   scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()
   PCI/AER: Unexport pci_aer_clear_nonfatal_status()
   PCI/AER: Move check inside pcie_clear_device_status().
   PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER
   PCI/ERR: Clear fatal error status when pci_channel_io_frozen
   PCI/AER: Refine status clearing process with api

  drivers/ntb/hw/idt/ntb_hw_idt.c |  2 --
  drivers/pci/pci.c   |  7 +++--
  drivers/pci/pci.h   |  2 ++
  drivers/pci/pcie/aer.c  | 45 +++--
  drivers/pci/pcie/dpc.c  |  3 +--
  drivers/pci/pcie/err.c  | 15 ---
  drivers/pci/pcie/portdrv_core.c |  3 +--
  drivers/scsi/lpfc/lpfc_attr.c   |  4 +--
  include/linux/aer.h |  4 +--
  9 files changed, 44 insertions(+), 41 deletions(-)

--
2.30.1 (Apple Git-130)



--
Zhuo Chen


Re: [PATCH] tools/perf: Fix aggr_printout to display cpu field irrespective of core value

2022-09-28 Thread James Clark



On 13/09/2022 12:57, Athira Rajeev wrote:
> perf stat includes option to specify aggr_mode to display
> per-socket, per-core, per-die, per-node counter details.
> Also there is option -A ( AGGR_NONE, -no-aggr ), where the
> counter values are displayed for each cpu along with "CPU"
> value in one field of the output.
> 
> Each of the aggregate mode uses the information fetched
> from "/sys/devices/system/cpu/cpuX/topology" like core_id,

I thought that this wouldn't apply to the cpu field because cpu is
basically interchangeable as an index in cpumap, rather than anything
being read from the topology file.

> physical_package_id. Utility functions in "cpumap.c" fetches
> this information and populates the socket id, core id, cpu etc.
> If the platform does not expose the topology information,
> these values will be set to -1. Example, in case of powerpc,
> details like physical_package_id is restricted to be exposed
> in pSeries platform. So id.socket, id.core, id.cpu all will
> be set as -1.
> 
> In case of displaying socket or die value, there is no check
> done in the "aggr_printout" function to see if it points to
> valid socket id or die. But for displaying "cpu" value, there
> is a check for "if (id.core > -1)". In case of powerpc pSeries
> where detail like physical_package_id is restricted to be
> exposed, id.core will be set to -1. Hence the column or field
> itself for CPU won't be displayed in the output.
> 
> Result for per-socket:
> 
> <<>>
> perf stat -e branches --per-socket -a true
> 
>  Performance counter stats for 'system wide':
> 
> S-1  32416,851  branches
> <<>>
> 
> Here S has -1 in above result. But with -A option which also
> expects CPU in one column in the result, below is observed.
> 
> <<>>
>  /bin/perf stat -e instructions -A -a true
> 
>  Performance counter stats for 'system wide':
> 
> 47,146  instructions
> 45,226  instructions
> 43,354  instructions
> 45,184  instructions
> <<>>
> 
> If the cpu id value is pointing to -1 also, it makes sense
> to display the column in the output to replicate the behaviour
> or to be in precedence with other aggr options(like per-socket,
> per-core). Remove the check "id.core" so that CPU field gets
> displayed in the output.

Why would you want to print -1 out? Seems like the if statement was a
good one to me, otherwise the output looks a bit broken to users. Are
the other aggregation modes even working if -1 is set for socket and
die? Maybe we need to not print -1 in those cases or exit earlier with a
failure.

The -1 value has a specific internal meaning which is "to not
aggregate". It doesn't mean "not set".

> 
> After the fix:
> 
> <<>>
> perf stat -e instructions -A -a true
> 
>  Performance counter stats for 'system wide':
> 
> CPU-1  64,034  instructions
> CPU-1  68,941  instructions
> CPU-1  59,418  instructions
> CPU-1  70,478  instructions
> CPU-1  65,201  instructions
> CPU-1  63,704  instructions
> <<>>
> 
> This is caught while running "perf test" for
> "stat+json_output.sh" and "stat+csv_output.sh".

Is it possible to fix the issue by making the tests cope with the lack
of the CPU id?

> 
> Reported-by: Disha Goel 
> Signed-off-by: Athira Rajeev 
> ---
>  tools/perf/util/stat-display.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index b82844cb0ce7..1b751a730271 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -168,10 +168,9 @@ static void aggr_printout(struct perf_stat_config 
> *config,
>   id.socket,
>   id.die,
>   id.core);
> - } else if (id.core > -1) {
> + } else

This should have been "id.cpu.cpu > -1". Looks like it was changed by
some kind of bad merge or rebase in df936cadfb because there is no
obvious justification for the change to .core in that commit.

>   fprintf(config->output, "\"cpu\" : \"%d\", ",
>   id.cpu.cpu);
> - }
>   } else {
>   if (evsel->percore && !config->percore_show_thread) {
>   fprintf(config->output, "S%d-D%d-C%*d%s",
> @@ -179,11 +178,10 @@ static void aggr_printout(struct perf_stat_config 
> *config,
>   id.die,
>   config->csv_output ? 0 : -3,
>   id.core, config->csv_sep);
> - } else if (id.core > -1) {
> + } else
>   fprintf(config->output, "CPU%*d%s",
> 

Re: [objtool] ca5e2b42c0: kernel_BUG_at_arch/x86/kernel/jump_label.c

2022-09-28 Thread Nathan Chancellor
Hi all,

On Wed, Sep 28, 2022 at 08:48:53AM +0800, kernel test robot wrote:
> Greeting,
> 
> FYI, we noticed the following commit (built with clang-14):
> 
> commit: ca5e2b42c0d4438ba93623579b6860b98f3598f3 ("[PATCH v3 11/16] objtool: 
> Add --mnop as an option to --mcount")
> url: 
> https://github.com/intel-lab-lkp/linux/commits/Sathvika-Vasireddy/objtool-Enable-and-implement-mcount-option-on-powerpc/20220912-163023
> base: https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git 
> topic/ppc-kvm
> patch link: 
> https://lore.kernel.org/linuxppc-dev/20220912082020.226755-12...@linux.ibm.com
> 
> in testcase: boot
> 
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> 
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):
> 
> 
> [  152.068363][T0] jump_label: Fatal kernel bug, unexpected op at 
> trace_initcall_start+0xc/0x180 [810016ec] (e9 c9 00 00 00 != 0f 1f 44 
> 00 00)) size:5 type:1
> [  152.070368][T0] [ cut here ]
> [  152.071050][T0] kernel BUG at arch/x86/kernel/jump_label.c:73!
> [  152.071825][T0] invalid opcode:  [#1] SMP KASAN PTI
> [  152.072427][T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
> 6.0.0-rc2-00011-gca5e2b42c0d4 #1 96a19ca45386d518c4bccc5b3bc53f548a2dc122
> [  152.073837][T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> [  152.075461][T0] RIP: 0010:__jump_label_patch+0x340/0x350
> [  152.076162][T0] Code: 00 48 89 da e9 51 fe ff ff 48 c7 c7 00 d1 80 83 
> 4c 89 fe 4c 89 fa 4c 89 f9 49 89 d8 45 89 e9 41 54 e8 f2 91 34 02 48 83 c4 08 
> <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 84 00 00 00 00 00 48 c7 c7 00 09 69
> [  152.078374][T0] RSP: :84607cb8 EFLAGS: 00010086
> [  152.079159][T0] RAX: 0092 RBX: 8380f62a RCX: 
> 84634d80
> [  152.080100][T0] RDX:  RSI: ffea RDI: 
> fffe
> [  152.081020][T0] RBP: 855d9f60 R08: 8124f17c R09: 
> fbfff08c0f53
> [  152.081936][T0] R10: d7fff08c0f54 R11: 108c0f52 R12: 
> 0001
> [  152.082832][T0] R13: 0005 R14: 8380f62a R15: 
> 810016ec
> [  152.083744][T0] FS:  () GS:8883aee0() 
> knlGS:
> [  152.084763][T0] CS:  0010 DS:  ES:  CR0: 80050033
> [  152.085567][T0] CR2: 88843000 CR3: 04628000 CR4: 
> 000406b0
> [  152.086472][T0] DR0:  DR1:  DR2: 
> 
> [  152.087407][T0] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [  152.088326][T0] Call Trace:
> [  152.088702][T0]  
> [  152.089042][T0]  ? trace_initcall_start+0xc/0x180
> [  152.089660][T0]  ? trace_initcall_start+0x1b/0x180
> [  152.090281][T0]  ? trace_initcall_start+0x11/0x180
> [  152.091237][T0]  ? jump_label_transform+0x25/0xd0
> [  152.091923][T0]  ? arch_jump_label_transform_queue+0x87/0xd0
> [  152.092651][T0]  ? __jump_label_update+0x192/0x3b0
> [  152.093320][T0]  ? static_key_enable_cpuslocked+0x129/0x250
> [  152.094020][T0]  ? rcu_lock_release+0x20/0x20
> [  152.094573][T0]  ? static_key_enable+0x16/0x20
> [  152.095167][T0]  ? tracepoint_add_func+0x87e/0x9d0
> [  152.095822][T0]  ? rcu_lock_release+0x20/0x20
> [  152.096394][T0]  ? tracepoint_probe_register+0x99/0xd0
> [  152.097055][T0]  ? rcu_lock_release+0x20/0x20
> [  152.097606][T0]  ? initcall_debug_enable+0x21/0x6b
> [  152.098305][T0]  ? start_kernel+0x24b/0x4e6
> [  152.098861][T0]  ? secondary_startup_64_no_verify+0xce/0xdb
> [  152.099556][T0]  
> [  152.099891][T0] Modules linked in:
> [  152.100352][T0] ---[ end trace  ]---
> [  152.100980][T0] RIP: 0010:__jump_label_patch+0x340/0x350
> [  152.101652][T0] Code: 00 48 89 da e9 51 fe ff ff 48 c7 c7 00 d1 80 83 
> 4c 89 fe 4c 89 fa 4c 89 f9 49 89 d8 45 89 e9 41 54 e8 f2 91 34 02 48 83 c4 08 
> <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 84 00 00 00 00 00 48 c7 c7 00 09 69
> [  152.103892][T0] RSP: :84607cb8 EFLAGS: 00010086
> [  152.104544][T0] RAX: 0092 RBX: 8380f62a RCX: 
> 84634d80
> [  152.105421][T0] RDX:  RSI: ffea RDI: 
> fffe
> [  152.106280][T0] RBP: 855d9f60 R08: 8124f17c R09: 
> fbfff08c0f53
> [  152.107182][T0] R10: d7fff08c0f54 R11: 108c0f52 R12: 
> 0001
> [  152.108110][T0] R13: 0005 R14: 8380f62a R15: 
> 810016ec
> [  152.109002][T0] FS:  () GS:8883aee0() 
> knlGS:
> [  152.109986][T0] CS:  0010 DS:  ES:  CR0: 80050033
> [  152.110796][T0] CR2: 88843000 CR3: 04628000 

Re: [PATCH] tools/perf: Fix aggr_printout to display cpu field irrespective of core value

2022-09-28 Thread Athira Rajeev



> On 16-Sep-2022, at 5:01 PM, Disha Goel  wrote:
> 
> This Message Is From an External Sender
> This message came from outside your organization.
> 
> 
> On 9/13/22 5:27 PM, Athira Rajeev wrote:
>> perf stat includes option to specify aggr_mode to display
>> per-socket, per-core, per-die, per-node counter details.
>> Also there is option -A ( AGGR_NONE, -no-aggr ), where the
>> counter values are displayed for each cpu along with "CPU"
>> value in one field of the output.
>> 
>> Each of the aggregate mode uses the information fetched
>> from "/sys/devices/system/cpu/cpuX/topology" like core_id,
>> physical_package_id. Utility functions in "cpumap.c" fetches
>> this information and populates the socket id, core id, cpu etc.
>> If the platform does not expose the topology information,
>> these values will be set to -1. Example, in case of powerpc,
>> details like physical_package_id is restricted to be exposed
>> in pSeries platform. So id.socket, id.core, id.cpu all will
>> be set as -1.
>> 
>> In case of displaying socket or die value, there is no check
>> done in the "aggr_printout" function to see if it points to
>> valid socket id or die. But for displaying "cpu" value, there
>> is a check for "if (id.core > -1)". In case of powerpc pSeries
>> where detail like physical_package_id is restricted to be
>> exposed, id.core will be set to -1. Hence the column or field
>> itself for CPU won't be displayed in the output.
>> 
>> Result for per-socket:
>> 
>> <<>>
>> perf stat -e branches --per-socket -a true
>> 
>>  Performance counter stats for 'system wide':
>> 
>> S-1  32416,851  branches
>> <<>>
>> 
>> Here S has -1 in above result. But with -A option which also
>> expects CPU in one column in the result, below is observed.
>> 
>> <<>>
>>  /bin/perf stat -e instructions -A -a true
>> 
>>  Performance counter stats for 'system wide':
>> 
>> 47,146  instructions
>> 45,226  instructions
>> 43,354  instructions
>> 45,184  instructions
>> <<>>
>> 
>> If the cpu id value is pointing to -1 also, it makes sense
>> to display the column in the output to replicate the behaviour
>> or to be in precedence with other aggr options(like per-socket,
>> per-core). Remove the check "id.core" so that CPU field gets
>> displayed in the output.
>> 
>> After the fix:
>> 
>> <<>>
>> perf stat -e instructions -A -a true
>> 
>>  Performance counter stats for 'system wide':
>> 
>> CPU-1  64,034  instructions
>> CPU-1  68,941  instructions
>> CPU-1  59,418  instructions
>> CPU-1  70,478  instructions
>> CPU-1  65,201  instructions
>> CPU-1  63,704  instructions
>> <<>>
>> 
>> This is caught while running "perf test" for
>> "stat+json_output.sh" and "stat+csv_output.sh".
>> 
>> Reported-by: Disha Goel 
>> 
>> 
>> Signed-off-by: Athira Rajeev 
>> 
> Tested-by: Disha Goel 

Hi,

Looking for review comments for this change.

Thanks
Athira
>> ---
>>  tools/perf/util/stat-display.c | 6 ++
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>> 
>> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
>> index b82844cb0ce7..1b751a730271 100644
>> --- a/tools/perf/util/stat-display.c
>> +++ b/tools/perf/util/stat-display.c
>> @@ -168,10 +168,9 @@ static void aggr_printout(struct perf_stat_config 
>> *config,
>>  id.socket,
>>  id.die,
>>  id.core);
>> -} else if (id.core > -1) {
>> +} else
>>  fprintf(config->output, "\"cpu\" : \"%d\", ",
>>  id.cpu.cpu);
>> -}
>>  } else {
>>  if (evsel->percore && !config->percore_show_thread) {
>>  fprintf(config->output, "S%d-D%d-C%*d%s",
>> @@ -179,11 +178,10 @@ static void aggr_printout(struct perf_stat_config 
>> *config,
>>  id.die,
>>  config->csv_output ? 0 : -3,
>>  id.core, config->csv_sep);
>> -} else if (id.core > -1) {
>> +} else
>>  fprintf(config->output, "CPU%*d%s",
>>  config->csv_output ? 0 : -7,
>>  id.cpu.cpu, config->csv_sep);
>> -}
>>  }
>>  break;
>>  case AGGR_THREAD:
>> 



Re: [PATCH V2 2/3] tools/perf/tests: Fix branch stack sampling test to include sanity check for branch filter

2022-09-28 Thread Arnaldo Carvalho de Melo
Em Wed, Sep 21, 2022 at 08:22:54PM +0530, Athira Rajeev escreveu:
> commit b55878c90ab9 ("perf test: Add test for branch stack sampling")
> added test for branch stack sampling. There is a sanity check in the
> beginning to skip the test if the hardware doesn't support branch stack
> sampling.
> 
> Snippet
> <<>>
> skip the test if the hardware doesn't support branch stack sampling
> perf record -b -o- -B true > /dev/null 2>&1 || exit 2
> <<>>
> 
> But the testcase also uses branch sample types: save_type, any. if any
> platform doesn't support the branch filters used in the test, the testcase
> will fail. In powerpc, currently mutliple branch filters are not supported
> and hence this test fails in powerpc. Fix the sanity check to look at
> the support for branch filters used in this test before proceeding with
>> the test.

Applied the tools/perf/ part

- Arnaldo
 
> Fixes: b55878c90ab9 ("perf test: Add test for branch stack sampling")
> Reported-by: Disha Goel 
> Signed-off-by: Athira Rajeev 
> Reviewed-by: Kajol Jain 
> ---
>  tools/perf/tests/shell/test_brstack.sh | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/tests/shell/test_brstack.sh 
> b/tools/perf/tests/shell/test_brstack.sh
> index c644f94a6500..ec801cffae6b 100755
> --- a/tools/perf/tests/shell/test_brstack.sh
> +++ b/tools/perf/tests/shell/test_brstack.sh
> @@ -12,7 +12,8 @@ if ! [ -x "$(command -v cc)" ]; then
>  fi
>  
>  # skip the test if the hardware doesn't support branch stack sampling
> -perf record -b -o- -B true > /dev/null 2>&1 || exit 2
> +# and if the architecture doesn't support filter types: any,save_type,u
> +perf record -b -o- -B --branch-filter any,save_type,u true > /dev/null 2>&1 
> || exit 2
>  
>  TMPDIR=$(mktemp -d /tmp/__perf_test.program.X)
>  
> -- 
> 2.31.1

-- 

- Arnaldo


Re: [PATCH V2 2/3] tools/perf/tests: Fix branch stack sampling test to include sanity check for branch filter

2022-09-28 Thread Athira Rajeev



> On 21-Sep-2022, at 8:22 PM, Athira Rajeev  wrote:
> 
> commit b55878c90ab9 ("perf test: Add test for branch stack sampling")
> added test for branch stack sampling. There is a sanity check in the
> beginning to skip the test if the hardware doesn't support branch stack
> sampling.
> 
> Snippet
> <<>>
> skip the test if the hardware doesn't support branch stack sampling
> perf record -b -o- -B true > /dev/null 2>&1 || exit 2
> <<>>
> 
> But the testcase also uses branch sample types: save_type, any. if any
> platform doesn't support the branch filters used in the test, the testcase
> will fail. In powerpc, currently mutliple branch filters are not supported
> and hence this test fails in powerpc. Fix the sanity check to look at
> the support for branch filters used in this test before proceeding with
> the test.
> 
> Fixes: b55878c90ab9 ("perf test: Add test for branch stack sampling")
> Reported-by: Disha Goel 
> Signed-off-by: Athira Rajeev 
> Reviewed-by: Kajol Jain 
> ---
> tools/perf/tests/shell/test_brstack.sh | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/tests/shell/test_brstack.sh 
> b/tools/perf/tests/shell/test_brstack.sh
> index c644f94a6500..ec801cffae6b 100755
> --- a/tools/perf/tests/shell/test_brstack.sh
> +++ b/tools/perf/tests/shell/test_brstack.sh
> @@ -12,7 +12,8 @@ if ! [ -x "$(command -v cc)" ]; then
> fi
> 
> # skip the test if the hardware doesn't support branch stack sampling
> -perf record -b -o- -B true > /dev/null 2>&1 || exit 2
> +# and if the architecture doesn't support filter types: any,save_type,u
> +perf record -b -o- -B --branch-filter any,save_type,u true > /dev/null 2>&1 
> || exit 2

Hi,

Looking for review comments for this change.

Thanks
Athira
> 
> TMPDIR=$(mktemp -d /tmp/__perf_test.program.X)
> 
> -- 
> 2.31.1
> 



Re: [PATCH] tools/perf/tests: Update is_ignored_symbol function in vmlinux-kallsyms test

2022-09-28 Thread Arnaldo Carvalho de Melo
Em Wed, Sep 28, 2022 at 10:22:18AM +0530, Athira Rajeev escreveu:
> The testcase “vmlinux-kallsyms.c” fails in powerpc.
> 
>   vmlinux symtab matches kallsyms: FAILED!

Thanks, applied.

- Arnaldo

 
> This test look at the symbols in the vmlinux DSO
> and check if we find all of them in the kallsyms dso.
> But from the powerpc logs , observed that the failure
> happens for:
>   ERR : 0xc00fe9c8: .Lmfspr_table not on kallsyms
>   ERR : 0xc01009c8: .Lmtspr_table not on kallsyms
> 
> These are labels ( with .L) in the source code and
> has to be ignored. Reference code with .Lmtspr_table:
> arch/powerpc/xmon/spr_access.S
> 
> The testcases invokes is_ignored_symbol() function to
> ignore hidden symbols in the dso like local symbols. This
> function is adapted from is_ignored_symbol() kernel
> function in code: scripts/kallsyms.c . The kernel
> function got some updates which is not reflected in
> the testcase function and the new updates also handles
> ignoring "labels".
> 
> Below is the changes that went in the kernel function.
> 
>/* Symbol names that begin with the following are ignored.*/
>static const char * const ignored_prefixes[] = {
>   "$",/* local symbols for ARM, MIPS, 
> etc. */
>   -   ".LASANPC", /* s390 kasan local symbols */
>   +   ".L",   /* local labels, 
> .LBB,.Ltmpxxx,.L__unnamed_xx,.LASANPC, etc. */
>   "__crc_",   /* modversions */
>   "__efistub_",   /* arm64 EFI stub namespace */
>   -   "__kvm_nvhe_",  /* arm64 non-VHE KVM namespace 
> */
>   +   "__kvm_nvhe_$", /* arm64 local symbols in 
> non-VHE KVM namespace */
>   +   "__kvm_nvhe_.L",/* arm64 local symbols in 
> non-VHE KVM namespace */
>   "__AArch64ADRPThunk_",  /* arm64 lld */
>   "__ARMV5PILongThunk_",  /* arm lld */
>   "__ARMV7PILongThunk_",
> 
> This change is part of below commits and will handle the
> symbols with “.L”
> 
> commit d4c858643263 ("kallsyms: ignore all local labels prefixed
> by '.L'")
> commit 6ccf9cb557bd ("KVM: arm64: Symbolize the nVHE HYP addresses")
> 
> Update the testcase function to include the new
> changes.
> 
> Reported-by: Disha Goel 
> Signed-off-by: Athira Rajeev 
> ---
>  tools/perf/tests/vmlinux-kallsyms.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/tests/vmlinux-kallsyms.c 
> b/tools/perf/tests/vmlinux-kallsyms.c
> index 4fd8d703ff19..8ab035b55875 100644
> --- a/tools/perf/tests/vmlinux-kallsyms.c
> +++ b/tools/perf/tests/vmlinux-kallsyms.c
> @@ -43,10 +43,11 @@ static bool is_ignored_symbol(const char *name, char type)
>   /* Symbol names that begin with the following are ignored.*/
>   static const char * const ignored_prefixes[] = {
>   "$",/* local symbols for ARM, MIPS, etc. */
> - ".LASANPC", /* s390 kasan local symbols */
> + ".L",   /* local labels, 
> .LBB,.Ltmpxxx,.L__unnamed_xx,.LASANPC, etc. */
>   "__crc_",   /* modversions */
>   "__efistub_",   /* arm64 EFI stub namespace */
> - "__kvm_nvhe_",  /* arm64 non-VHE KVM namespace */
> + "__kvm_nvhe_$", /* arm64 local symbols in non-VHE KVM 
> namespace */
> + "__kvm_nvhe_.L",/* arm64 local symbols in non-VHE KVM 
> namespace */
>   "__AArch64ADRPThunk_",  /* arm64 lld */
>   "__ARMV5PILongThunk_",  /* arm lld */
>   "__ARMV7PILongThunk_",
> -- 
> 2.31.1

-- 

- Arnaldo


Re: [PATCH v2 2/2] powerpc/rtas: block error injection when locked down

2022-09-28 Thread Andrew Donnellan
On Mon, 2022-09-26 at 08:16 -0500, Nathan Lynch wrote:
> The error injection facility on pseries VMs allows corruption of
> arbitrary guest memory, potentially enabling a sufficiently
> privileged
> user to disable lockdown or perform other modifications of the
> running
> kernel via the rtas syscall.
> 
> Block the PAPR error injection facility from being opened or called
> when locked down.
> 
> Signed-off-by: Nathan Lynch 

Is there any circumstance (short of arbitrary code execution etc) where
the rtas_call() check will actually trigger rather than the sys_rtas()
check? (Not that it matters, defence in depth is good.)

Reviewed-by: Andrew Donnellan 

> ---
>  arch/powerpc/kernel/rtas.c | 25 -
>  include/linux/security.h   |  1 +
>  security/security.c    |  1 +
>  3 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 693133972294..c2540d393f1c 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -23,6 +23,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -464,6 +465,9 @@ void rtas_call_unlocked(struct rtas_args *args,
> int token, int nargs, int nret,
> va_end(list);
>  }
>  
> +static int ibm_open_errinjct_token;
> +static int ibm_errinjct_token;
> +
>  int rtas_call(int token, int nargs, int nret, int *outputs, ...)
>  {
> va_list list;
> @@ -476,6 +480,16 @@ int rtas_call(int token, int nargs, int nret,
> int *outputs, ...)
> if (!rtas.entry || token == RTAS_UNKNOWN_SERVICE)
> return -1;
>  
> +   if (token == ibm_open_errinjct_token || token ==
> ibm_errinjct_token) {
> +   /*
> +    * It would be nicer to not discard the error value
> +    * from security_locked_down(), but callers expect an
> +    * RTAS status, not an errno.
> +    */
> +   if
> (security_locked_down(LOCKDOWN_RTAS_ERROR_INJECTION))
> +   return -1;
> +   }
> +
> if ((mfmsr() & (MSR_IR|MSR_DR)) != (MSR_IR|MSR_DR)) {
> WARN_ON_ONCE(1);
> return -1;
> @@ -1227,6 +1241,14 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user
> *, uargs)
> if (block_rtas_call(token, nargs, ))
> return -EINVAL;
>  
> +   if (token == ibm_open_errinjct_token || token ==
> ibm_errinjct_token) {
> +   int err;
> +
> +   err =
> security_locked_down(LOCKDOWN_RTAS_ERROR_INJECTION);
> +   if (err)
> +   return err;
> +   }
> +
> /* Need to handle ibm,suspend_me call specially */
> if (token == rtas_token("ibm,suspend-me")) {
>  
> @@ -1325,7 +1347,8 @@ void __init rtas_initialize(void)
>  #ifdef CONFIG_RTAS_ERROR_LOGGING
> rtas_last_error_token = rtas_token("rtas-last-error");
>  #endif
> -
> +   ibm_open_errinjct_token = rtas_token("ibm,open-errinjct");
> +   ibm_errinjct_token = rtas_token("ibm,errinjct");
> rtas_syscall_filter_init();
>  }
>  
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 39e7c0e403d9..70f89dc3a712 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -123,6 +123,7 @@ enum lockdown_reason {
> LOCKDOWN_XMON_WR,
> LOCKDOWN_BPF_WRITE_USER,
> LOCKDOWN_DBG_WRITE_KERNEL,
> +   LOCKDOWN_RTAS_ERROR_INJECTION,
> LOCKDOWN_INTEGRITY_MAX,
> LOCKDOWN_KCORE,
> LOCKDOWN_KPROBES,
> diff --git a/security/security.c b/security/security.c
> index 51bf66d4f472..eabe3ce7e74e 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -61,6 +61,7 @@ const char *const
> lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
> [LOCKDOWN_XMON_WR] = "xmon write access",
> [LOCKDOWN_BPF_WRITE_USER] = "use of bpf to write user RAM",
> [LOCKDOWN_DBG_WRITE_KERNEL] = "use of kgdb/kdb to write
> kernel RAM",
> +   [LOCKDOWN_RTAS_ERROR_INJECTION] = "RTAS error injection",
> [LOCKDOWN_INTEGRITY_MAX] = "integrity",
> [LOCKDOWN_KCORE] = "/proc/kcore access",
> [LOCKDOWN_KPROBES] = "use of kprobes",

-- 
Andrew DonnellanOzLabs, ADL Canberra
a...@linux.ibm.com   IBM Australia Limited



[PATCH 6/6] powerpc/pseries: Add firmware details to dump stack arch description

2022-09-28 Thread Michael Ellerman
Add firmware version details to the dump stack arch description, which
is printed in case of an oops.

Currently /hypervisor only exists on KVM, so if we don't find that
look for something that suggests we're on phyp and if so that's
probably a good guess. The actual content of the ibm,fw-net-version
seems to be a full path so is too long to add to the description.

eg: Hardware name: ... of:'IBM,FW860.42 (SV860_138)' hv:phyp

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/pseries/setup.c | 36 ++
 1 file changed, 36 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 5e44c65a032c..f0ce8256ebb8 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1011,6 +1012,39 @@ static void __init pSeries_cmo_feature_init(void)
pr_debug(" <- fw_cmo_feature_init()\n");
 }
 
+static void __init pseries_add_dump_stack_arch_desc(void)
+{
+   struct device_node *dn;
+   const char *prop;
+
+   dn = of_find_node_by_path("/openprom");
+   if (dn) {
+   prop = of_get_property(dn, "model", NULL);
+   if (prop)
+   seq_buf_printf(_hw_desc, "of:'%s' ", prop);
+
+   of_node_put(dn);
+   }
+
+   dn = of_find_node_by_path("/hypervisor");
+   if (dn) {
+   prop = of_get_property(dn, "compatible", NULL);
+   if (prop)
+   seq_buf_printf(_hw_desc, "hv:%s ", prop);
+
+   of_node_put(dn);
+   } else {
+   dn = of_find_node_by_path("/");
+   if (dn) {
+   prop = of_get_property(dn, "ibm,fw-net-version", NULL);
+   if (prop)
+   seq_buf_printf(_hw_desc, "hv:phyp ");
+
+   of_node_put(dn);
+   }
+   }
+}
+
 /*
  * Early initialization.  Relocation is on but do not reference unbolted pages
  */
@@ -1018,6 +1052,8 @@ static void __init pseries_init(void)
 {
pr_debug(" -> pseries_init()\n");
 
+   pseries_add_dump_stack_arch_desc();
+
 #ifdef CONFIG_HVC_CONSOLE
if (firmware_has_feature(FW_FEATURE_LPAR))
hvc_vio_init_early();
-- 
2.37.3



[PATCH 2/6] powerpc: Add PVR & CPU name to dump stack arch description

2022-09-28 Thread Michael Ellerman
Add the PVR and CPU name to the dump stack arch description, which is
printed in case of an oops.

eg: Hardware name: ... POWER8E (raw) pvr:0x4b0201

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/prom.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 2e7a04dab2f7..7987d69f1785 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -819,6 +820,10 @@ void __init early_init_devtree(void *params)
 
dt_cpu_ftrs_scan();
 
+   /* We can now set the CPU name & PVR for the oops output */
+   seq_buf_printf(_hw_desc, "%s pvr:0x%04lx ", cur_cpu_spec->cpu_name,
+  mfspr(SPRN_PVR));
+
/* Retrieve CPU related informations from the flat tree
 * (altivec support, boot CPU ID, ...)
 */
-- 
2.37.3



[PATCH 5/6] powerpc/powernv: Add opal details to dump stack arch description

2022-09-28 Thread Michael Ellerman
Add OPAL version details to the dump stack arch description, which is
printed in case of an oops.

eg: Hardware name: ... opal:v6.2

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/powernv/setup.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index dac545aa0308..fddc61ec1d5b 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -207,8 +208,33 @@ static void __init pnv_setup_arch(void)
pnv_rng_init();
 }
 
+static void __init pnv_add_dump_stack_arch_desc(void)
+{
+   struct device_node *dn;
+   const char *version;
+
+   dn = of_find_node_by_path("/ibm,opal/firmware");
+   if (!dn)
+   return;
+
+   version = of_get_property(dn, "version", NULL);
+   if (!version)
+   version = of_get_property(dn, "git-id", NULL);
+
+   if (version)
+   seq_buf_printf(_hw_desc, "opal:%s", version);
+
+   version = of_get_property(dn, "mi-version", NULL);
+   if (version)
+   seq_buf_printf(_hw_desc, "mi:%s", version);
+
+   of_node_put(dn);
+}
+
 static void __init pnv_init(void)
 {
+   pnv_add_dump_stack_arch_desc();
+
/*
 * Initialize the LPC bus now so that legacy serial
 * ports can be found on it
-- 
2.37.3



[PATCH 1/6] powerpc: Add ppc_md.name to dump stack arch description

2022-09-28 Thread Michael Ellerman
As soon as we know the name of the machine description we're using,
add it to the dump stack arch description, which is printed in case of
an oops.

eg: Hardware name: ... machine:pSeries

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/setup.h   |  2 ++
 arch/powerpc/kernel/setup-common.c | 14 ++
 2 files changed, 16 insertions(+)

diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index 85143849a586..e29e83f8a89c 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -88,6 +88,8 @@ unsigned long __init prom_init(unsigned long r3, unsigned 
long r4,
   unsigned long pp, unsigned long r6,
   unsigned long r7, unsigned long kbase);
 
+extern struct seq_buf ppc_hw_desc;
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_SETUP_H */
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index dd98f43bd685..42279c3f1d31 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -25,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -588,6 +590,15 @@ static __init int add_pcspkr(void)
 device_initcall(add_pcspkr);
 #endif /* CONFIG_PCSPKR_PLATFORM */
 
+static char ppc_hw_desc_buf[128] __initdata;
+
+struct seq_buf ppc_hw_desc __initdata = {
+   .buffer = ppc_hw_desc_buf,
+   .size = sizeof(ppc_hw_desc_buf),
+   .len = 0,
+   .readpos = 0,
+};
+
 static __init void probe_machine(void)
 {
extern struct machdep_calls __machine_desc_start;
@@ -628,6 +639,9 @@ static __init void probe_machine(void)
for (;;);
}
 
+   seq_buf_printf(_hw_desc,"machine:%s ", ppc_md.name);
+   dump_stack_set_arch_desc(ppc_hw_desc.buffer);
+
printk(KERN_INFO "Using %s machine description\n", ppc_md.name);
 }
 
-- 
2.37.3



[PATCH 4/6] powerpc: Add device-tree model to dump stack arch description

2022-09-28 Thread Michael Ellerman
Add the model of the machine we're on to the dump stack arch
description, which is printed in case of an oops.

eg: Hardware name: model:'IBM,8247-22L'

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/prom.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index dca06ef7fcde..605e4ad8a4c2 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -715,6 +715,23 @@ static void __init tm_init(void)
 static void tm_init(void) { }
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
+static int __init
+early_init_dt_scan_model(unsigned long node, const char *uname,
+int depth, void *data)
+{
+   const char *prop;
+
+   if (depth != 0)
+   return 0;
+
+   prop = of_get_flat_dt_prop(node, "model", NULL);
+   if (prop)
+   seq_buf_printf(_hw_desc, "model:'%s' ", prop);
+
+   /* break now */
+   return 1;
+}
+
 #ifdef CONFIG_PPC64
 static void __init save_fscr_to_task(void)
 {
@@ -743,6 +760,8 @@ void __init early_init_devtree(void *params)
if (!early_init_dt_verify(params))
panic("BUG: Failed verifying flat device tree, bad version?");
 
+   of_scan_flat_dt(early_init_dt_scan_model, NULL);
+
 #ifdef CONFIG_PPC_RTAS
/* Some machines might need RTAS info for debugging, grab it now. */
of_scan_flat_dt(early_init_dt_scan_rtas, NULL);
-- 
2.37.3



[PATCH 3/6] powerpc/64: Add logical PVR to the dump stack arch description

2022-09-28 Thread Michael Ellerman
If we detect a logical PVR add that to the dump stack arch
description, which is printed in case of an oops.

eg: Hardware name: ... lpvr:0xf04

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/prom.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 7987d69f1785..dca06ef7fcde 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -390,8 +390,10 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
 */
if (!dt_cpu_ftrs_in_use()) {
prop = of_get_flat_dt_prop(node, "cpu-version", NULL);
-   if (prop && (be32_to_cpup(prop) & 0xff00) == 0x0f00)
+   if (prop && (be32_to_cpup(prop) & 0xff00) == 0x0f00) {
identify_cpu(0, be32_to_cpup(prop));
+   seq_buf_printf(_hw_desc, "lpvr:0x%04x ", 
be32_to_cpup(prop));
+   }
 
check_cpu_feature_properties(node);
check_cpu_features(node, "ibm,pa-features", ibm_pa_features,
-- 
2.37.3



[PATCH] powerpc: Drops STABS_DEBUG from linker scripts

2022-09-28 Thread Michael Ellerman
No toolchain we support should be generating stabs debug information
anymore. Drop the sections entirely from our linker scripts.

We removed all the manual stabs annotations in commit
12318163737c ("powerpc/32: Remove remaining .stabs annotations").

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/vdso/vdso32.lds.S | 1 -
 arch/powerpc/kernel/vdso/vdso64.lds.S | 1 -
 arch/powerpc/kernel/vmlinux.lds.S | 1 -
 3 files changed, 3 deletions(-)

diff --git a/arch/powerpc/kernel/vdso/vdso32.lds.S 
b/arch/powerpc/kernel/vdso/vdso32.lds.S
index e0d19d74455f..bc0be274a9ac 100644
--- a/arch/powerpc/kernel/vdso/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso/vdso32.lds.S
@@ -78,7 +78,6 @@ SECTIONS
__end = .;
PROVIDE(end = .);
 
-   STABS_DEBUG
DWARF_DEBUG
ELF_DETAILS
 
diff --git a/arch/powerpc/kernel/vdso/vdso64.lds.S 
b/arch/powerpc/kernel/vdso/vdso64.lds.S
index 1a4a7bc4c815..744ae5363e6c 100644
--- a/arch/powerpc/kernel/vdso/vdso64.lds.S
+++ b/arch/powerpc/kernel/vdso/vdso64.lds.S
@@ -76,7 +76,6 @@ SECTIONS
_end = .;
PROVIDE(end = .);
 
-   STABS_DEBUG
DWARF_DEBUG
ELF_DETAILS
 
diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index fe22d940412f..8a1c47b59b5a 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -382,7 +382,6 @@ SECTIONS
_end = . ;
PROVIDE32 (end = .);
 
-   STABS_DEBUG
DWARF_DEBUG
ELF_DETAILS
 
-- 
2.37.3



[PATCH] powerpc/64s: Remove lost/old comment

2022-09-28 Thread Michael Ellerman
The bulk of this was moved/reworded in:
  57f266497d81 ("powerpc: Use gas sections for arranging exception vectors")

And now appears around line 700 in arch/powerpc/kernel/exceptions-64s.S.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/exceptions-64s.S | 10 --
 1 file changed, 10 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 8088134ccb2a..dee434a3e145 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -3034,16 +3034,6 @@ EXPORT_SYMBOL(do_uaccess_flush)
 MASKED_INTERRUPT
 MASKED_INTERRUPT hsrr=1
 
-   /*
-* Relocation-on interrupts: A subset of the interrupts can be delivered
-* with IR=1/DR=1, if AIL==2 and MSR.HV won't be changed by delivering
-* it.  Addresses are the same as the original interrupt addresses, but
-* offset by 0xc0004000.
-* It's impossible to receive interrupts below 0x300 via this mechanism.
-* KVM: None of these traps are from the guest ; anything that escalated
-* to HV=1 from HV=0 is delivered via real mode handlers.
-*/
-
 USE_FIXED_SECTION(virt_trampolines)
/*
 * All code below __end_soft_masked is treated as soft-masked. If
-- 
2.37.3



[PATCH] powerpc/64s: Remove old STAB comment

2022-09-28 Thread Michael Ellerman
This used to be about the 0x4300 handler, but that was moved in commit
da2bc4644c75 ("powerpc/64s: Add new exception vector macros").

Note that "STAB" here refers to "Segment Table" not the debug format.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/exceptions-64s.S | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 3d0dc133a9ae..8088134ccb2a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -3044,12 +3044,6 @@ MASKED_INTERRUPT hsrr=1
 * to HV=1 from HV=0 is delivered via real mode handlers.
 */
 
-   /*
-* This uses the standard macro, since the original 0x300 vector
-* only has extra guff for STAB-based processors -- which never
-* come here.
-*/
-
 USE_FIXED_SECTION(virt_trampolines)
/*
 * All code below __end_soft_masked is treated as soft-masked. If
-- 
2.37.3



Re: [PATCH v6 13/25] powerpc: Remove direct call to mmap2 syscall handlers

2022-09-28 Thread Arnd Bergmann
On Wed, Sep 28, 2022, at 2:15 PM, Michael Ellerman wrote:

> But I think it makes more sense to do the same as mmap2() and pass the
> 4K offset through, and pass shift = PAGE_SHIFT - 12. I also borrowed the
> "off_4k" name from arm64. End result:
>
> #ifdef CONFIG_COMPAT
> COMPAT_SYSCALL_DEFINE6(mmap2,
>  unsigned long, addr, size_t, len,
>  unsigned long, prot, unsigned long, flags,
>  unsigned long, fd, unsigned long, off_4k)
> {
>   return do_mmap2(addr, len, prot, flags, fd, off_4k, PAGE_SHIFT-12);
> }
> #endif
>
> With that my G5 boots again :)

Any chance we can instead add a working compat_sys_mmap2/sys_mmap2
in mm/mmap.c alongside the sys_mmap_pgoff implementation?

While sys_mmap_pgoff() was meant to replace the various sys_mmap2()
implementations, I think it was ultimately a mistake, and we later
converged on the sys_mmap2() calling conventions with 12 bits
offset for almost all 32-bit architectures.

   Arnd


Re: [PATCH v6 13/25] powerpc: Remove direct call to mmap2 syscall handlers

2022-09-28 Thread Michael Ellerman
Rohan McLure  writes:
> Syscall handlers should not be invoked internally by their symbol names,
> as these symbols defined by the architecture-defined SYSCALL_DEFINE
> macro. Move the compatibility syscall definition for mmap2 to
> syscalls.c, so that all mmap implementations can share a helper function.
>
> Remove 'inline' on static mmap helper.
>
> Signed-off-by: Rohan McLure 
> Reviewed-by: Nicholas Piggin 
> ---
> V2: Move mmap2 compat implementation to asm/kernel/syscalls.c.
> V4: Move to be applied before syscall wrapper introduced.
> V5: Remove 'inline' in helper.
> ---
>  arch/powerpc/kernel/sys_ppc32.c |  9 -
>  arch/powerpc/kernel/syscalls.c  | 17 ++---
>  2 files changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
> index d961634976d8..776ae7565fc5 100644
> --- a/arch/powerpc/kernel/sys_ppc32.c
> +++ b/arch/powerpc/kernel/sys_ppc32.c
> @@ -25,7 +25,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -48,14 +47,6 @@
>  #include 
>  #include 
>  
> -unsigned long compat_sys_mmap2(unsigned long addr, size_t len,
> -   unsigned long prot, unsigned long flags,
> -   unsigned long fd, unsigned long pgoff)
> -{
> - /* This should remain 12 even if PAGE_SIZE changes */
> - return sys_mmap(addr, len, prot, flags, fd, pgoff << 12);
> -}
> -
>  compat_ssize_t compat_sys_pread64(unsigned int fd, char __user *ubuf, 
> compat_size_t count,
>u32 reg6, u32 pos1, u32 pos2)
>  {
> diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
> index a04c97faa21a..9830957498b0 100644
> --- a/arch/powerpc/kernel/syscalls.c
> +++ b/arch/powerpc/kernel/syscalls.c
> @@ -36,9 +36,9 @@
>  #include 
>  #include 
>  
> -static inline long do_mmap2(unsigned long addr, size_t len,
> - unsigned long prot, unsigned long flags,
> - unsigned long fd, unsigned long off, int shift)
> +static long do_mmap2(unsigned long addr, size_t len,
> +  unsigned long prot, unsigned long flags,
> +  unsigned long fd, unsigned long off, int shift)
>  {
>   if (!arch_validate_prot(prot, addr))
>   return -EINVAL;
> @@ -56,6 +56,17 @@ SYSCALL_DEFINE6(mmap2, unsigned long, addr, size_t, len,
>   return do_mmap2(addr, len, prot, flags, fd, pgoff, PAGE_SHIFT-12);
>  }
>  
> +#ifdef CONFIG_COMPAT
> +COMPAT_SYSCALL_DEFINE6(mmap2,
> +unsigned long, addr, size_t, len,
> +unsigned long, prot, unsigned long, flags,
> +unsigned long, fd, unsigned long, pgoff)
> +{
> + /* This should remain 12 even if PAGE_SIZE changes */
> + return do_mmap2(addr, len, prot, flags, fd, pgoff << 12, PAGE_SHIFT-12);

This isn't quite right.

The comment about it remaining 12 is kind of misleading, it was true
when compat_sys_mmap2() called sys_mmap(), but it's wrong now that we're
calling do_mmap2().

The incoming "pgoff" here is in units of 4K.

do_mmap2() takes "off" in whatever units, but also takes "shift", which
has to tell us how to shift "off" into PAGE_SIZE units.

If we pass off = pgoff << 12, that's in bytes, so we need to page
shift = PAGE_SHIFT.

But I think it makes more sense to do the same as mmap2() and pass the
4K offset through, and pass shift = PAGE_SHIFT - 12. I also borrowed the
"off_4k" name from arm64. End result:

#ifdef CONFIG_COMPAT
COMPAT_SYSCALL_DEFINE6(mmap2,
   unsigned long, addr, size_t, len,
   unsigned long, prot, unsigned long, flags,
   unsigned long, fd, unsigned long, off_4k)
{
return do_mmap2(addr, len, prot, flags, fd, off_4k, PAGE_SHIFT-12);
}
#endif

With that my G5 boots again :)

cheers


Re: [PATCH v6 19/25] powerpc: Remove high-order word clearing on compat syscall entry

2022-09-28 Thread Michael Ellerman
Rohan McLure  writes:
> Remove explicit clearing of the high order-word of user parameters when
> handling compatibility syscalls in system_call_exception. The
> COMPAT_SYSCALL_DEFINEx macros handle this clearing through an
> explicit cast to the signature type of the target handler.

Unfortunately this doesn't work.

We don't have compat handlers for everything, so we end up with 64-bit
values getting passsed in and things break.

Our hugetlb_vs_thp selftest hits it, as seen in ftrace:

  12848 mmap(0xa000, 16777216, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf6fb
  12848 munmap(0xa000, 16777216) = -1 EINVAL (Invalid argument)

But in the source the only mmap()/munmap() is of 0xa000.

Looking at x86 they send all 32-bit syscalls via a wrapper that does the
truncation (SC_IA32_REGS_TO_ARGS). So I think we could do something
similar eventually and get rid of this explicit clearing.

But I don't want to predicate this whole series on that, so I've dropped
this patch for now, and reworked some of the following patches to keep
the register clearing.

cheers

> diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
> index 9875486f6168..15af0ed019a7 100644
> --- a/arch/powerpc/kernel/syscall.c
> +++ b/arch/powerpc/kernel/syscall.c
> @@ -157,14 +157,6 @@ notrace long system_call_exception(long r3, long r4, 
> long r5,
>  
>   if (unlikely(is_compat_task())) {
>   f = (void *)compat_sys_call_table[r0];
> -
> - r3 &= 0xULL;
> - r4 &= 0xULL;
> - r5 &= 0xULL;
> - r6 &= 0xULL;
> - r7 &= 0xULL;
> - r8 &= 0xULL;
> -
>   } else {
>   f = (void *)sys_call_table[r0];
>   }
> -- 
> 2.34.1


Re: [PATCH 5/7] nouveau/dmem: Refactor nouveau_dmem_fault_copy_one()

2022-09-28 Thread Alistair Popple


Lyude Paul  writes:

> On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote:
>> nouveau_dmem_fault_copy_one() is used during handling of CPU faults via
>> the migrate_to_ram() callback and is used to copy data from GPU to CPU
>> memory. It is currently specific to fault handling, however a future
>> patch implementing eviction of data during teardown needs similar
>> functionality.
>>
>> Refactor out the core functionality so that it is not specific to fault
>> handling.
>>
>> Signed-off-by: Alistair Popple 
>> ---
>>  drivers/gpu/drm/nouveau/nouveau_dmem.c | 59 +--
>>  1 file changed, 29 insertions(+), 30 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
>> b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> index f9234ed..66ebbd4 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> @@ -139,44 +139,25 @@ static void nouveau_dmem_fence_done(struct 
>> nouveau_fence **fence)
>>  }
>>  }
>>
>> -static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm,
>> -struct vm_fault *vmf, struct migrate_vma *args,
>> -dma_addr_t *dma_addr)
>> +static int nouveau_dmem_copy_one(struct nouveau_drm *drm, struct page 
>> *spage,
>> +struct page *dpage, dma_addr_t *dma_addr)
>>  {
>>  struct device *dev = drm->dev->dev;
>> -struct page *dpage, *spage;
>> -struct nouveau_svmm *svmm;
>> -
>> -spage = migrate_pfn_to_page(args->src[0]);
>> -if (!spage || !(args->src[0] & MIGRATE_PFN_MIGRATE))
>> -return 0;
>>
>> -dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address);
>> -if (!dpage)
>> -return VM_FAULT_SIGBUS;
>>  lock_page(dpage);
>>
>>  *dma_addr = dma_map_page(dev, dpage, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
>>  if (dma_mapping_error(dev, *dma_addr))
>> -goto error_free_page;
>> +return -EIO;
>>
>> -svmm = spage->zone_device_data;
>> -mutex_lock(>mutex);
>> -nouveau_svmm_invalidate(svmm, args->start, args->end);
>>  if (drm->dmem->migrate.copy_func(drm, 1, NOUVEAU_APER_HOST, *dma_addr,
>> -NOUVEAU_APER_VRAM, nouveau_dmem_page_addr(spage)))
>> -goto error_dma_unmap;
>> -mutex_unlock(>mutex);
>> + NOUVEAU_APER_VRAM,
>> + nouveau_dmem_page_addr(spage))) {
>> +dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
>> +return -EIO;
>> +}
>
> Feel free to just align this with the starting (, as long as it doesn't go
> above 100 characters it doesn't really matter imho and would look nicer that
> way.
>
> Otherwise:
>
> Reviewed-by: Lyude Paul 

Thanks! I'm not sure I precisely understood your alignment comment above
but feel free to let me know if I got it wrong in v2.

> Will look at the other patch in a moment
>
>>
>> -args->dst[0] = migrate_pfn(page_to_pfn(dpage));
>>  return 0;
>> -
>> -error_dma_unmap:
>> -mutex_unlock(>mutex);
>> -dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
>> -error_free_page:
>> -__free_page(dpage);
>> -return VM_FAULT_SIGBUS;
>>  }
>>
>>  static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
>> @@ -184,9 +165,11 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct 
>> vm_fault *vmf)
>>  struct nouveau_drm *drm = page_to_drm(vmf->page);
>>  struct nouveau_dmem *dmem = drm->dmem;
>>  struct nouveau_fence *fence;
>> +struct nouveau_svmm *svmm;
>> +struct page *spage, *dpage;
>>  unsigned long src = 0, dst = 0;
>>  dma_addr_t dma_addr = 0;
>> -vm_fault_t ret;
>> +vm_fault_t ret = 0;
>>  struct migrate_vma args = {
>>  .vma= vmf->vma,
>>  .start  = vmf->address,
>> @@ -207,9 +190,25 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct 
>> vm_fault *vmf)
>>  if (!args.cpages)
>>  return 0;
>>
>> -ret = nouveau_dmem_fault_copy_one(drm, vmf, , _addr);
>> -if (ret || dst == 0)
>> +spage = migrate_pfn_to_page(src);
>> +if (!spage || !(src & MIGRATE_PFN_MIGRATE))
>> +goto done;
>> +
>> +dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address);
>> +if (!dpage)
>> +goto done;
>> +
>> +dst = migrate_pfn(page_to_pfn(dpage));
>> +
>> +svmm = spage->zone_device_data;
>> +mutex_lock(>mutex);
>> +nouveau_svmm_invalidate(svmm, args.start, args.end);
>> +ret = nouveau_dmem_copy_one(drm, spage, dpage, _addr);
>> +mutex_unlock(>mutex);
>> +if (ret) {
>> +ret = VM_FAULT_SIGBUS;
>>  goto done;
>> +}
>>
>>  nouveau_fence_new(dmem->migrate.chan, false, );
>>  migrate_vma_pages();


Re: [PATCH v4 5/5] drm/ofdrm: Support big-endian scanout buffers

2022-09-28 Thread Thomas Zimmermann

Hi

Am 28.09.22 um 13:12 schrieb Michal Suchánek:

Hello,

On Wed, Sep 28, 2022 at 12:50:10PM +0200, Thomas Zimmermann wrote:

All DRM formats assume little-endian byte order. On big-endian systems,
it is likely that the scanout buffer is in big endian as well. Update
the format accordingly and add endianess conversion to the format-helper
library. Also opt-in to allocated buffers in host format by default.


This sounds backwards to me.


In which way?



Skimming through the code it sounds like the buffer is in fact in the
same format all the time but when the CPU is switched to BE it sees the
data loaded from it differently.

Or am I missing something?


Which buffer do you mean? The scanout buffer coming from the firmware, 
or the GEM BOs that are allocated by renderers?


The scanout buffer is either in BE or LE format. According to the code 
in offb, it's signaled by a node in the device tree. I took that code 
into ofdrm and set the scanout format accordingly.


The GEM BO can be in any format. If necessary, ofdrm converts internally 
while copying it to the scanout buffer. The quirk we opt in, makes DRM 
prefer whatever default byteorder the host prefers (BE or LE). According 
to the docs, it's the right thing to do. But that only affects the GEM 
code, not the scanout buffer.


Best regards
Thomas



Thanks

Michal



Suggested-by: Geert Uytterhoeven 
Signed-off-by: Thomas Zimmermann 
---
  drivers/gpu/drm/drm_format_helper.c | 10 ++
  drivers/gpu/drm/tiny/ofdrm.c| 55 +++--
  2 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_format_helper.c 
b/drivers/gpu/drm/drm_format_helper.c
index 4afc4ac27342..fca7936db083 100644
--- a/drivers/gpu/drm/drm_format_helper.c
+++ b/drivers/gpu/drm/drm_format_helper.c
@@ -659,6 +659,11 @@ int drm_fb_blit(struct iosys_map *dst, const unsigned int 
*dst_pitch, uint32_t d
drm_fb_xrgb_to_rgb565(dst, dst_pitch, src, fb, 
clip, false);
return 0;
}
+   } else if (dst_format == (DRM_FORMAT_RGB565 | DRM_FORMAT_BIG_ENDIAN)) {
+   if (fb_format == DRM_FORMAT_RGB565) {
+   drm_fb_swab(dst, dst_pitch, src, fb, clip, false);
+   return 0;
+   }
} else if (dst_format == DRM_FORMAT_RGB888) {
if (fb_format == DRM_FORMAT_XRGB) {
drm_fb_xrgb_to_rgb888(dst, dst_pitch, src, fb, 
clip);
@@ -677,6 +682,11 @@ int drm_fb_blit(struct iosys_map *dst, const unsigned int 
*dst_pitch, uint32_t d
drm_fb_xrgb_to_xrgb2101010(dst, dst_pitch, src, fb, 
clip);
return 0;
}
+   } else if (dst_format == DRM_FORMAT_BGRX) {
+   if (fb_format == DRM_FORMAT_XRGB) {
+   drm_fb_swab(dst, dst_pitch, src, fb, clip, false);
+   return 0;
+   }
}
  
  	drm_warn_once(fb->dev, "No conversion helper from %p4cc to %p4cc found.\n",

diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
index 0bf5eebf6678..6e100a7f5db7 100644
--- a/drivers/gpu/drm/tiny/ofdrm.c
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -94,7 +94,7 @@ static int display_get_validated_int0(struct drm_device *dev, 
const char *name,
  }
  
  static const struct drm_format_info *display_get_validated_format(struct drm_device *dev,

- u32 depth)
+ u32 depth, 
bool big_endian)
  {
const struct drm_format_info *info;
u32 format;
@@ -115,6 +115,29 @@ static const struct drm_format_info 
*display_get_validated_format(struct drm_dev
return ERR_PTR(-EINVAL);
}
  
+	/*

+* DRM formats assume little-endian byte order. Update the format
+* if the scanout buffer uses big-endian ordering.
+*/
+   if (big_endian) {
+   switch (format) {
+   case DRM_FORMAT_XRGB:
+   format = DRM_FORMAT_BGRX;
+   break;
+   case DRM_FORMAT_ARGB:
+   format = DRM_FORMAT_BGRA;
+   break;
+   case DRM_FORMAT_RGB565:
+   format = DRM_FORMAT_RGB565 | DRM_FORMAT_BIG_ENDIAN;
+   break;
+   case DRM_FORMAT_XRGB1555:
+   format = DRM_FORMAT_XRGB1555 | DRM_FORMAT_BIG_ENDIAN;
+   break;
+   default:
+   break;
+   }
+   }
+
info = drm_format_info(format);
if (!info) {
drm_err(dev, "cannot find framebuffer format for depth %u\n", 
depth);
@@ -134,6 +157,23 @@ static int display_read_u32_of(struct drm_device *dev, 
struct device_node *of_no
return ret;
  }
  
+static 

Re: [PATCH v4 5/5] drm/ofdrm: Support big-endian scanout buffers

2022-09-28 Thread Michal Suchánek
Hello,

On Wed, Sep 28, 2022 at 12:50:10PM +0200, Thomas Zimmermann wrote:
> All DRM formats assume little-endian byte order. On big-endian systems,
> it is likely that the scanout buffer is in big endian as well. Update
> the format accordingly and add endianess conversion to the format-helper
> library. Also opt-in to allocated buffers in host format by default.

This sounds backwards to me.

Skimming through the code it sounds like the buffer is in fact in the
same format all the time but when the CPU is switched to BE it sees the
data loaded from it differently.

Or am I missing something?

Thanks

Michal

> 
> Suggested-by: Geert Uytterhoeven 
> Signed-off-by: Thomas Zimmermann 
> ---
>  drivers/gpu/drm/drm_format_helper.c | 10 ++
>  drivers/gpu/drm/tiny/ofdrm.c| 55 +++--
>  2 files changed, 63 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_format_helper.c 
> b/drivers/gpu/drm/drm_format_helper.c
> index 4afc4ac27342..fca7936db083 100644
> --- a/drivers/gpu/drm/drm_format_helper.c
> +++ b/drivers/gpu/drm/drm_format_helper.c
> @@ -659,6 +659,11 @@ int drm_fb_blit(struct iosys_map *dst, const unsigned 
> int *dst_pitch, uint32_t d
>   drm_fb_xrgb_to_rgb565(dst, dst_pitch, src, fb, 
> clip, false);
>   return 0;
>   }
> + } else if (dst_format == (DRM_FORMAT_RGB565 | DRM_FORMAT_BIG_ENDIAN)) {
> + if (fb_format == DRM_FORMAT_RGB565) {
> + drm_fb_swab(dst, dst_pitch, src, fb, clip, false);
> + return 0;
> + }
>   } else if (dst_format == DRM_FORMAT_RGB888) {
>   if (fb_format == DRM_FORMAT_XRGB) {
>   drm_fb_xrgb_to_rgb888(dst, dst_pitch, src, fb, 
> clip);
> @@ -677,6 +682,11 @@ int drm_fb_blit(struct iosys_map *dst, const unsigned 
> int *dst_pitch, uint32_t d
>   drm_fb_xrgb_to_xrgb2101010(dst, dst_pitch, src, fb, 
> clip);
>   return 0;
>   }
> + } else if (dst_format == DRM_FORMAT_BGRX) {
> + if (fb_format == DRM_FORMAT_XRGB) {
> + drm_fb_swab(dst, dst_pitch, src, fb, clip, false);
> + return 0;
> + }
>   }
>  
>   drm_warn_once(fb->dev, "No conversion helper from %p4cc to %p4cc 
> found.\n",
> diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
> index 0bf5eebf6678..6e100a7f5db7 100644
> --- a/drivers/gpu/drm/tiny/ofdrm.c
> +++ b/drivers/gpu/drm/tiny/ofdrm.c
> @@ -94,7 +94,7 @@ static int display_get_validated_int0(struct drm_device 
> *dev, const char *name,
>  }
>  
>  static const struct drm_format_info *display_get_validated_format(struct 
> drm_device *dev,
> -   u32 depth)
> +   u32 depth, 
> bool big_endian)
>  {
>   const struct drm_format_info *info;
>   u32 format;
> @@ -115,6 +115,29 @@ static const struct drm_format_info 
> *display_get_validated_format(struct drm_dev
>   return ERR_PTR(-EINVAL);
>   }
>  
> + /*
> +  * DRM formats assume little-endian byte order. Update the format
> +  * if the scanout buffer uses big-endian ordering.
> +  */
> + if (big_endian) {
> + switch (format) {
> + case DRM_FORMAT_XRGB:
> + format = DRM_FORMAT_BGRX;
> + break;
> + case DRM_FORMAT_ARGB:
> + format = DRM_FORMAT_BGRA;
> + break;
> + case DRM_FORMAT_RGB565:
> + format = DRM_FORMAT_RGB565 | DRM_FORMAT_BIG_ENDIAN;
> + break;
> + case DRM_FORMAT_XRGB1555:
> + format = DRM_FORMAT_XRGB1555 | DRM_FORMAT_BIG_ENDIAN;
> + break;
> + default:
> + break;
> + }
> + }
> +
>   info = drm_format_info(format);
>   if (!info) {
>   drm_err(dev, "cannot find framebuffer format for depth %u\n", 
> depth);
> @@ -134,6 +157,23 @@ static int display_read_u32_of(struct drm_device *dev, 
> struct device_node *of_no
>   return ret;
>  }
>  
> +static bool display_get_big_endian_of(struct drm_device *dev, struct 
> device_node *of_node)
> +{
> + bool big_endian;
> +
> +#ifdef __BIG_ENDIAN
> + big_endian = true;
> + if (of_get_property(of_node, "little-endian", NULL))
> + big_endian = false;
> +#else
> + big_endian = false;
> + if (of_get_property(of_node, "big-endian", NULL))
> + big_endian = true;
> +#endif
> +
> + return big_endian;
> +}
> +
>  static int display_get_width_of(struct drm_device *dev, struct device_node 
> *of_node)
>  {
>   u32 width;
> @@ -613,6 +653,7 @@ static void ofdrm_device_set_gamma_linear(struct 
> 

Re: [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api

2022-09-28 Thread Serge Semin
On Wed, Sep 28, 2022 at 06:59:37PM +0800, Zhuo Chen wrote:
> Hello.
> 
> Here comes patch v3, which contains some fixes and optimizations of
> aer api usage. The v1 and v2 can be found on the mailing list.
> 
> v3:
> - Modifications to comments proposed by Sathyanarayanan.

> Remove
>   pci_aer_clear_nonfatal_status() call in NTB and improve commit log. 

Failed to see who has requested that...

-Sergey

> 
> v2:
> - Modifications to comments proposed by Bjorn. Split patch into more
>   obvious parts.
> 
> Zhuo Chen (9):
>   PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core
>   PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear
> uncorrectable error status
>   NTB: Remove pci_aer_clear_nonfatal_status() call
>   scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()
>   PCI/AER: Unexport pci_aer_clear_nonfatal_status()
>   PCI/AER: Move check inside pcie_clear_device_status().
>   PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER
>   PCI/ERR: Clear fatal error status when pci_channel_io_frozen
>   PCI/AER: Refine status clearing process with api
> 
>  drivers/ntb/hw/idt/ntb_hw_idt.c |  2 --
>  drivers/pci/pci.c   |  7 +++--
>  drivers/pci/pci.h   |  2 ++
>  drivers/pci/pcie/aer.c  | 45 +++--
>  drivers/pci/pcie/dpc.c  |  3 +--
>  drivers/pci/pcie/err.c  | 15 ---
>  drivers/pci/pcie/portdrv_core.c |  3 +--
>  drivers/scsi/lpfc/lpfc_attr.c   |  4 +--
>  include/linux/aer.h |  4 +--
>  9 files changed, 44 insertions(+), 41 deletions(-)
> 
> -- 
> 2.30.1 (Apple Git-130)
> 


Re: [PATCH v3 3/9] NTB: Remove pci_aer_clear_nonfatal_status() call

2022-09-28 Thread Serge Semin
On Wed, Sep 28, 2022 at 06:59:40PM +0800, Zhuo Chen wrote:
> There is no need to clear error status during init code, so remove it.

Why do you think there isn't? Justify in more details.

-Sergey

> 
> Signed-off-by: Zhuo Chen 
> ---
>  drivers/ntb/hw/idt/ntb_hw_idt.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> index 0ed6f809ff2e..fed03217289d 100644
> --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> @@ -2657,8 +2657,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
>   ret = pci_enable_pcie_error_reporting(pdev);
>   if (ret != 0)
>   dev_warn(>dev, "PCIe AER capability disabled\n");
> - else /* Cleanup nonfatal error status before getting to init */
> - pci_aer_clear_nonfatal_status(pdev);
>  
>   /* First enable the PCI device */
>   ret = pcim_enable_device(pdev);
> -- 
> 2.30.1 (Apple Git-130)
> 


[PATCH v3 9/9] PCI/AER: Refine status clearing process with api

2022-09-28 Thread Zhuo Chen
Statements clearing status in aer_enable_rootport() is functionally
equivalent with pcie_clear_device_status() and pci_aer_clear_status().
So replace them, which has no functional changes.

After commit 20e15e673b05 ("PCI/AER: Add pci_aer_raw_clear_status()
to unconditionally clear Error Status"), pci_aer_raw_clear_status()
is only used by the EDR path, so we add note in function comment.

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/aer.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a6d29269ccf2..bd5ecfa4860f 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -306,6 +306,8 @@ EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
  * Clearing AER error status registers unconditionally, regardless of
  * whether they're owned by firmware or the OS.
  *
+ * Used only by the EDR path. Other paths should use pci_aer_clear_status().
+ *
  * Returns 0 on success, or negative on failure.
  */
 int pci_aer_raw_clear_status(struct pci_dev *dev)
@@ -1277,24 +1279,17 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
 {
struct pci_dev *pdev = rpc->rpd;
int aer = pdev->aer_cap;
-   u16 reg16;
u32 reg32;
 
/* Clear PCIe Capability's Device Status */
-   pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, );
-   pcie_capability_write_word(pdev, PCI_EXP_DEVSTA, reg16);
+   pcie_clear_device_status(pdev);
 
/* Disable system error generation in response to error messages */
pcie_capability_clear_word(pdev, PCI_EXP_RTCTL,
   SYSTEM_ERROR_INTR_ON_MESG_MASK);
 
/* Clear error status */
-   pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, );
-   pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, reg32);
-   pci_read_config_dword(pdev, aer + PCI_ERR_COR_STATUS, );
-   pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS, reg32);
-   pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, );
-   pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32);
+   pci_aer_clear_status(pdev);
 
/*
 * Enable error reporting for the root port device and downstream port
-- 
2.30.1 (Apple Git-130)



[PATCH v3 8/9] PCI/ERR: Clear fatal error status when pci_channel_io_frozen

2022-09-28 Thread Zhuo Chen
When state is pci_channel_io_frozen in pcie_do_recovery(), the
severity is fatal and fatal error status should be cleared.
So add pci_aer_clear_fatal_status().

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/err.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index f80b21244ef1..b46f1d36c090 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -241,7 +241,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_walk_bridge(bridge, report_resume, );
 
pcie_clear_device_status(dev);
-   pci_aer_clear_nonfatal_status(dev);
+   if (state == pci_channel_io_frozen)
+   pci_aer_clear_fatal_status(dev);
+   else
+   pci_aer_clear_nonfatal_status(dev);
 
pci_info(bridge, "device recovery successful\n");
return status;
-- 
2.30.1 (Apple Git-130)



[PATCH v3 7/9] PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER

2022-09-28 Thread Zhuo Chen
Use pcie_aer_is_native() in place of "host->native_aer ||
pcie_ports_native" to judge whether OS owns AER in aer_root_reset().

Replace "dev->aer_cap && (pcie_ports_native || host->native_aer)" in
get_port_device_capability() with pcie_aer_is_native(), which has no
functional changes.

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/aer.c  | 5 ++---
 drivers/pci/pcie/portdrv_core.c | 3 +--
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e2320ab27a31..a6d29269ccf2 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1403,7 +1403,6 @@ static pci_ers_result_t aer_root_reset(struct pci_dev 
*dev)
int type = pci_pcie_type(dev);
struct pci_dev *root;
int aer;
-   struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
u32 reg32;
int rc;
 
@@ -1424,7 +1423,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev 
*dev)
 */
aer = root ? root->aer_cap : 0;
 
-   if ((host->native_aer || pcie_ports_native) && aer) {
+   if (aer && pcie_aer_is_native(root)) {
/* Disable Root's interrupt in response to error messages */
pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, );
reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
@@ -1443,7 +1442,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev 
*dev)
pci_is_root_bus(dev->bus) ? "Root" : "Downstream", rc);
}
 
-   if ((host->native_aer || pcie_ports_native) && aer) {
+   if (aer && pcie_aer_is_native(root)) {
/* Clear Root Error Status */
pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, );
pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 1ac7fec47d6f..844297c0c85e 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -221,8 +221,7 @@ static int get_port_device_capability(struct pci_dev *dev)
}
 
 #ifdef CONFIG_PCIEAER
-   if (dev->aer_cap && pci_aer_available() &&
-   (pcie_ports_native || host->native_aer))
+   if (pcie_aer_is_native(dev) && pci_aer_available())
services |= PCIE_PORT_SERVICE_AER;
 #endif
 
-- 
2.30.1 (Apple Git-130)



[PATCH v3 6/9] PCI/AER: Move check inside pcie_clear_device_status().

2022-09-28 Thread Zhuo Chen
pcie_clear_device_status() doesn't check for pcie_aer_is_native()
internally, but after commit 068c29a248b6 ("PCI/ERR: Clear PCIe Device
Status errors only if OS owns AER") and commit aa344bc8b727 ("PCI/ERR:
Clear AER status only when we control AER"), both callers check before
calling it. So move the check inside pcie_clear_device_status().

pcie_clear_device_status() and pci_aer_clear_nonfatal_status() both
have check internally, so remove check when callers calling them.

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pci.c  |  7 +--
 drivers/pci/pcie/aer.c |  4 ++--
 drivers/pci/pcie/err.c | 14 +++---
 3 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 95bc329e74c0..8caf4a5529a1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2282,9 +2282,12 @@ EXPORT_SYMBOL_GPL(pci_set_pcie_reset_state);
 void pcie_clear_device_status(struct pci_dev *dev)
 {
u16 sta;
+   struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
 
-   pcie_capability_read_word(dev, PCI_EXP_DEVSTA, );
-   pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
+   if (host->native_aer || pcie_ports_native) {
+   pcie_capability_read_word(dev, PCI_EXP_DEVSTA, );
+   pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
+   }
 }
 #endif
 
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e2ebd108339d..e2320ab27a31 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -971,11 +971,11 @@ static void handle_error_source(struct pci_dev *dev, 
struct aer_err_info *info)
 * Correctable error does not need software intervention.
 * No need to go through error recovery process.
 */
-   if (aer)
+   if (aer) {
pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS,
info->status);
-   if (pcie_aer_is_native(dev))
pcie_clear_device_status(dev);
+   }
} else if (info->severity == AER_NONFATAL)
pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset);
else if (info->severity == AER_FATAL)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 59c90d04a609..f80b21244ef1 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -188,7 +188,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
int type = pci_pcie_type(dev);
struct pci_dev *bridge;
pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
-   struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
 
/*
 * If the error was detected by a Root Port, Downstream Port, RCEC,
@@ -241,16 +240,9 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_dbg(bridge, "broadcast resume message\n");
pci_walk_bridge(bridge, report_resume, );
 
-   /*
-* If we have native control of AER, clear error status in the device
-* that detected the error.  If the platform retained control of AER,
-* it is responsible for clearing this status.  In that case, the
-* signaling device may not even be visible to the OS.
-*/
-   if (host->native_aer || pcie_ports_native) {
-   pcie_clear_device_status(dev);
-   pci_aer_clear_nonfatal_status(dev);
-   }
+   pcie_clear_device_status(dev);
+   pci_aer_clear_nonfatal_status(dev);
+
pci_info(bridge, "device recovery successful\n");
return status;
 
-- 
2.30.1 (Apple Git-130)



Re: [PATCH v2 3/9] NTB: Change to use pci_aer_clear_uncorrect_error_status()

2022-09-28 Thread Serge Semin
On Tue, Sep 27, 2022 at 11:35:18PM +0800, Zhuo Chen wrote:
> Status bits for ERR_NONFATAL errors only are cleared in
> pci_aer_clear_nonfatal_status(), but we want clear uncorrectable
> error status in idt_init_pci(), so we change to use
> pci_aer_clear_uncorrect_error_status().

Have the modification changed since:
https://lore.kernel.org/linux-pci/20220901181634.99591-2-chenzhu...@bytedance.com/
?
AFAICS it hasn't. Then my ab-tag should have been preserved on v2.
One more time:
Acked-by: Serge Semin 

Don't forget to add it should you have some more patchset re-spins.

-Sergey

> 
> Signed-off-by: Zhuo Chen 
> ---
>  drivers/ntb/hw/idt/ntb_hw_idt.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> index 0ed6f809ff2e..d5f0aa87f817 100644
> --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> @@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
>   ret = pci_enable_pcie_error_reporting(pdev);
>   if (ret != 0)
>   dev_warn(>dev, "PCIe AER capability disabled\n");
> - else /* Cleanup nonfatal error status before getting to init */
> - pci_aer_clear_nonfatal_status(pdev);
> + else /* Cleanup uncorrectable error status before getting to init */
> + pci_aer_clear_uncorrect_error_status(pdev);
>  
>   /* First enable the PCI device */
>   ret = pcim_enable_device(pdev);
> -- 
> 2.30.1 (Apple Git-130)
> 


[PATCH v3 5/9] PCI/AER: Unexport pci_aer_clear_nonfatal_status()

2022-09-28 Thread Zhuo Chen
Since pci_aer_clear_nonfatal_status() is used only internally, move
its declaration to the PCI internal header file. Also, no one cares
about return value of pci_aer_clear_nonfatal_status(), so make it void.

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pci.h  | 2 ++
 drivers/pci/pcie/aer.c | 7 ++-
 include/linux/aer.h| 5 -
 3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 785f31086313..a114175d08e4 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -684,6 +684,7 @@ void pci_aer_init(struct pci_dev *dev);
 void pci_aer_exit(struct pci_dev *dev);
 extern const struct attribute_group aer_stats_attr_group;
 void pci_aer_clear_fatal_status(struct pci_dev *dev);
+void pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 int pci_aer_clear_status(struct pci_dev *dev);
 int pci_aer_raw_clear_status(struct pci_dev *dev);
 #else
@@ -691,6 +692,7 @@ static inline void pci_no_aer(void) { }
 static inline void pci_aer_init(struct pci_dev *d) { }
 static inline void pci_aer_exit(struct pci_dev *d) { }
 static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
+static inline void pci_aer_clear_nonfatal_status(struct pci_dev *dev) { }
 static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
 static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return 
-EINVAL; }
 #endif
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 4e637121be23..e2ebd108339d 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -251,13 +251,13 @@ int pci_disable_pcie_error_reporting(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
 
-int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
+void pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
int aer = dev->aer_cap;
u32 status, sev;
 
if (!pcie_aer_is_native(dev))
-   return -EIO;
+   return;
 
/* Clear status bits for ERR_NONFATAL errors only */
pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, );
@@ -265,10 +265,7 @@ int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
status &= ~sev;
if (status)
pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
-
-   return 0;
 }
-EXPORT_SYMBOL_GPL(pci_aer_clear_nonfatal_status);
 
 void pci_aer_clear_fatal_status(struct pci_dev *dev)
 {
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 154690c278cb..f638ad955deb 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -44,7 +44,6 @@ struct aer_capability_regs {
 /* PCIe port driver needs this function to enable AER */
 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
-int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
 void pci_save_aer_state(struct pci_dev *dev);
 void pci_restore_aer_state(struct pci_dev *dev);
@@ -57,10 +56,6 @@ static inline int pci_disable_pcie_error_reporting(struct 
pci_dev *dev)
 {
return -EINVAL;
 }
-static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
-{
-   return -EINVAL;
-}
 static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
 {
return -EINVAL;
-- 
2.30.1 (Apple Git-130)



[PATCH v3 4/9] scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()

2022-09-28 Thread Zhuo Chen
lpfc_aer_cleanup_state() requires clearing both fatal and non-fatal
uncorrectable error status. But using pci_aer_clear_nonfatal_status()
will only clear non-fatal error status. To clear both fatal and
non-fatal error status, use pci_aer_clear_uncorrect_error_status().

Signed-off-by: Zhuo Chen 
---
 drivers/scsi/lpfc/lpfc_attr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 09cf2cd0ae60..d835cc0ba153 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -4689,7 +4689,7 @@ static DEVICE_ATTR_RW(lpfc_aer_support);
  * Description:
  * If the @buf contains 1 and the device currently has the AER support
  * enabled, then invokes the kernel AER helper routine
- * pci_aer_clear_nonfatal_status() to clean up the uncorrectable
+ * pci_aer_clear_uncorrect_error_status() to clean up the uncorrectable
  * error status register.
  *
  * Notes:
@@ -4715,7 +4715,7 @@ lpfc_aer_cleanup_state(struct device *dev, struct 
device_attribute *attr,
return -EINVAL;
 
if (phba->hba_flag & HBA_AER_ENABLED)
-   rc = pci_aer_clear_nonfatal_status(phba->pcidev);
+   rc = pci_aer_clear_uncorrect_error_status(phba->pcidev);
 
if (rc == 0)
return strlen(buf);
-- 
2.30.1 (Apple Git-130)



[PATCH v3 3/9] NTB: Remove pci_aer_clear_nonfatal_status() call

2022-09-28 Thread Zhuo Chen
There is no need to clear error status during init code, so remove it.

Signed-off-by: Zhuo Chen 
---
 drivers/ntb/hw/idt/ntb_hw_idt.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 0ed6f809ff2e..fed03217289d 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -2657,8 +2657,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
ret = pci_enable_pcie_error_reporting(pdev);
if (ret != 0)
dev_warn(>dev, "PCIe AER capability disabled\n");
-   else /* Cleanup nonfatal error status before getting to init */
-   pci_aer_clear_nonfatal_status(pdev);
 
/* First enable the PCI device */
ret = pcim_enable_device(pdev);
-- 
2.30.1 (Apple Git-130)



[PATCH v3 2/9] PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status

2022-09-28 Thread Zhuo Chen
pci_aer_clear_uncorrect_error_status() clears both fatal and non-fatal
errors. So use it in place of pci_aer_clear_nonfatal_status()
and pci_aer_clear_fatal_status().

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/dpc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 3e9afee02e8d..7942073fbb34 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -288,8 +288,7 @@ void dpc_process_error(struct pci_dev *pdev)
 dpc_get_aer_uncorrect_severity(pdev, ) &&
 aer_get_device_error_info(pdev, )) {
aer_print_error(pdev, );
-   pci_aer_clear_nonfatal_status(pdev);
-   pci_aer_clear_fatal_status(pdev);
+   pci_aer_clear_uncorrect_error_status(pdev);
}
 }
 
-- 
2.30.1 (Apple Git-130)



[PATCH v3 1/9] PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core

2022-09-28 Thread Zhuo Chen
In lpfc_aer_cleanup_state(), uncorrectable error status needs to be
cleared, which can be done by calling pci_aer_clear_nonfatal_status()
and pci_aer_clear_fatal_status(). Meanwhile they can be combined in
one function (the same in dpc_process_error). So add
pci_aer_clear_uncorrect_error_status() function to PCI core and
export symbol to other modules which wants to use it.

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/aer.c | 16 
 include/linux/aer.h|  5 +
 2 files changed, 21 insertions(+)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e2d8a74f83c3..4e637121be23 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -286,6 +286,22 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
 }
 
+int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
+{
+   int aer = dev->aer_cap;
+   u32 status;
+
+   if (!pcie_aer_is_native(dev))
+   return -EIO;
+
+   pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, );
+   if (status)
+   pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
+
 /**
  * pci_aer_raw_clear_status - Clear AER error registers.
  * @dev: the PCI device
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 97f64ba1b34a..154690c278cb 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -45,6 +45,7 @@ struct aer_capability_regs {
 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
 int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
+int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
 void pci_save_aer_state(struct pci_dev *dev);
 void pci_restore_aer_state(struct pci_dev *dev);
 #else
@@ -60,6 +61,10 @@ static inline int pci_aer_clear_nonfatal_status(struct 
pci_dev *dev)
 {
return -EINVAL;
 }
+static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
+{
+   return -EINVAL;
+}
 static inline void pci_save_aer_state(struct pci_dev *dev) {}
 static inline void pci_restore_aer_state(struct pci_dev *dev) {}
 #endif
-- 
2.30.1 (Apple Git-130)



[PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api

2022-09-28 Thread Zhuo Chen
Hello.

Here comes patch v3, which contains some fixes and optimizations of
aer api usage. The v1 and v2 can be found on the mailing list.

v3:
- Modifications to comments proposed by Sathyanarayanan. Remove
  pci_aer_clear_nonfatal_status() call in NTB and improve commit log. 

v2:
- Modifications to comments proposed by Bjorn. Split patch into more
  obvious parts.

Zhuo Chen (9):
  PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core
  PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear
uncorrectable error status
  NTB: Remove pci_aer_clear_nonfatal_status() call
  scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()
  PCI/AER: Unexport pci_aer_clear_nonfatal_status()
  PCI/AER: Move check inside pcie_clear_device_status().
  PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER
  PCI/ERR: Clear fatal error status when pci_channel_io_frozen
  PCI/AER: Refine status clearing process with api

 drivers/ntb/hw/idt/ntb_hw_idt.c |  2 --
 drivers/pci/pci.c   |  7 +++--
 drivers/pci/pci.h   |  2 ++
 drivers/pci/pcie/aer.c  | 45 +++--
 drivers/pci/pcie/dpc.c  |  3 +--
 drivers/pci/pcie/err.c  | 15 ---
 drivers/pci/pcie/portdrv_core.c |  3 +--
 drivers/scsi/lpfc/lpfc_attr.c   |  4 +--
 include/linux/aer.h |  4 +--
 9 files changed, 44 insertions(+), 41 deletions(-)

-- 
2.30.1 (Apple Git-130)



Re: [PATCH v2 1/6] powerpc/code-patching: Implement generic text patching function

2022-09-28 Thread Christophe Leroy


Le 28/09/2022 à 03:30, Benjamin Gray a écrit :
> On Tue, 2022-09-27 at 06:40 +, Christophe Leroy wrote:
>>> +   /* Flush on the EA that may be executed in case of a non-
>>> coherent icache */
>>> +   icbi(prog_addr);
>>
>> prog_addr is a misleading name ? Is that the address at which you
>> program it ? Is that the address the programs runs at ?
>>
>> exec_addr was a lot more explicit as it clearly defines the address
>> at
>> which the code is executed.
> 
> I'm not sure what it could be confused for other than "the address the
> program uses" (be it uses for executing, or uses as data). I just
> called it that because it's not necessarily executed, so 'exec_addr' is
> misleading (to the extent it matters in the first place...).

For me, at first look it means 'programming address', that is the 
address that is used to perform the programming ie the patching. Hence 
the confusion.

Whereas exec_addr is explicitely for me the address at which the code is 
run.


> 
>>> +   if (IS_ENABLED(CONFIG_PPC64) && L1_CACHE_BYTES < 64)
>>> +   icbi(prog_addr + size - 1);
>>
>> This doesn't exist today.
>>
>> I'd rather have:
>>
>>  BUILD_BUG_ON(IS_ENABLED(CONFIG_PPC64) && L1_CACHE_BYTES <
>> 64);
> 
> Sure, I can adjust the style.

That's not just style ...

> 
>>> +static int __always_inline __do_patch_memory(void *dest, unsigned
>>> long src, size_t size)
>>>    {
>>>  int err;
>>>  u32 *patch_addr;
>>> -   unsigned long text_poke_addr;
>>>  pte_t *pte;
>>> -   unsigned long pfn = get_patch_pfn(addr);
>>> -
>>> -   text_poke_addr = (unsigned
>>> long)__this_cpu_read(text_poke_area)->addr & PAGE_MASK;
>>> -   patch_addr = (u32 *)(text_poke_addr +
>>> offset_in_page(addr));
>>> +   unsigned long text_poke_addr = (unsigned
>>> long)__this_cpu_read(text_poke_area)->addr & PAGE_MASK;
>>> +   unsigned long pfn = get_patch_pfn(dest);
>>>
>>> +   patch_addr = (u32 *)(text_poke_addr +
>>> offset_in_page(dest));
>>
>> Can we avoid this churn ?
>> Ok, you want to change 'addr' to 'dest', can we leave everything else
>> as
>> is ?
> 
> 'addr' was only renamed because the v1 used a pointer to the data, so
> 'addr' was ambiguous. I'll restore it to 'addr' for v3.
> 
> I'll also restore the formatting.

Then maybe the 'src' should be renamed 'val' or something else.

Christophe

[PATCH v4 5/5] drm/ofdrm: Support big-endian scanout buffers

2022-09-28 Thread Thomas Zimmermann
All DRM formats assume little-endian byte order. On big-endian systems,
it is likely that the scanout buffer is in big endian as well. Update
the format accordingly and add endianess conversion to the format-helper
library. Also opt-in to allocated buffers in host format by default.

Suggested-by: Geert Uytterhoeven 
Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/drm_format_helper.c | 10 ++
 drivers/gpu/drm/tiny/ofdrm.c| 55 +++--
 2 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_format_helper.c 
b/drivers/gpu/drm/drm_format_helper.c
index 4afc4ac27342..fca7936db083 100644
--- a/drivers/gpu/drm/drm_format_helper.c
+++ b/drivers/gpu/drm/drm_format_helper.c
@@ -659,6 +659,11 @@ int drm_fb_blit(struct iosys_map *dst, const unsigned int 
*dst_pitch, uint32_t d
drm_fb_xrgb_to_rgb565(dst, dst_pitch, src, fb, 
clip, false);
return 0;
}
+   } else if (dst_format == (DRM_FORMAT_RGB565 | DRM_FORMAT_BIG_ENDIAN)) {
+   if (fb_format == DRM_FORMAT_RGB565) {
+   drm_fb_swab(dst, dst_pitch, src, fb, clip, false);
+   return 0;
+   }
} else if (dst_format == DRM_FORMAT_RGB888) {
if (fb_format == DRM_FORMAT_XRGB) {
drm_fb_xrgb_to_rgb888(dst, dst_pitch, src, fb, 
clip);
@@ -677,6 +682,11 @@ int drm_fb_blit(struct iosys_map *dst, const unsigned int 
*dst_pitch, uint32_t d
drm_fb_xrgb_to_xrgb2101010(dst, dst_pitch, src, fb, 
clip);
return 0;
}
+   } else if (dst_format == DRM_FORMAT_BGRX) {
+   if (fb_format == DRM_FORMAT_XRGB) {
+   drm_fb_swab(dst, dst_pitch, src, fb, clip, false);
+   return 0;
+   }
}
 
drm_warn_once(fb->dev, "No conversion helper from %p4cc to %p4cc 
found.\n",
diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
index 0bf5eebf6678..6e100a7f5db7 100644
--- a/drivers/gpu/drm/tiny/ofdrm.c
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -94,7 +94,7 @@ static int display_get_validated_int0(struct drm_device *dev, 
const char *name,
 }
 
 static const struct drm_format_info *display_get_validated_format(struct 
drm_device *dev,
- u32 depth)
+ u32 depth, 
bool big_endian)
 {
const struct drm_format_info *info;
u32 format;
@@ -115,6 +115,29 @@ static const struct drm_format_info 
*display_get_validated_format(struct drm_dev
return ERR_PTR(-EINVAL);
}
 
+   /*
+* DRM formats assume little-endian byte order. Update the format
+* if the scanout buffer uses big-endian ordering.
+*/
+   if (big_endian) {
+   switch (format) {
+   case DRM_FORMAT_XRGB:
+   format = DRM_FORMAT_BGRX;
+   break;
+   case DRM_FORMAT_ARGB:
+   format = DRM_FORMAT_BGRA;
+   break;
+   case DRM_FORMAT_RGB565:
+   format = DRM_FORMAT_RGB565 | DRM_FORMAT_BIG_ENDIAN;
+   break;
+   case DRM_FORMAT_XRGB1555:
+   format = DRM_FORMAT_XRGB1555 | DRM_FORMAT_BIG_ENDIAN;
+   break;
+   default:
+   break;
+   }
+   }
+
info = drm_format_info(format);
if (!info) {
drm_err(dev, "cannot find framebuffer format for depth %u\n", 
depth);
@@ -134,6 +157,23 @@ static int display_read_u32_of(struct drm_device *dev, 
struct device_node *of_no
return ret;
 }
 
+static bool display_get_big_endian_of(struct drm_device *dev, struct 
device_node *of_node)
+{
+   bool big_endian;
+
+#ifdef __BIG_ENDIAN
+   big_endian = true;
+   if (of_get_property(of_node, "little-endian", NULL))
+   big_endian = false;
+#else
+   big_endian = false;
+   if (of_get_property(of_node, "big-endian", NULL))
+   big_endian = true;
+#endif
+
+   return big_endian;
+}
+
 static int display_get_width_of(struct drm_device *dev, struct device_node 
*of_node)
 {
u32 width;
@@ -613,6 +653,7 @@ static void ofdrm_device_set_gamma_linear(struct 
ofdrm_device *odev,
 
switch (format->format) {
case DRM_FORMAT_RGB565:
+   case DRM_FORMAT_RGB565 | DRM_FORMAT_BIG_ENDIAN:
/* Use better interpolation, to take 32 values from 0 to 255 */
for (i = 0; i < OFDRM_GAMMA_LUT_SIZE / 8; i++) {
unsigned char r = i * 8 + i / 4;
@@ -631,6 +672,7 @@ static void ofdrm_device_set_gamma_linear(struct 
ofdrm_device *odev,
}

[PATCH v4 4/5] drm/ofdrm: Support color management

2022-09-28 Thread Thomas Zimmermann
Support the CRTC's color-management property and implement each model's
palette support.

The OF hardware has different methods of setting the palette. The
respective code has been taken from fbdev's offb and refactored into
per-model device functions. The device functions integrate this
functionality into the overall modesetting.

As palette handling is a CRTC property that depends on the primary
plane's color format, the plane's atomic_check helper now updates the
format field in ofdrm's custom CRTC state. The CRTC's atomic_flush
helper updates the palette for the format as needed.

v4:
* use cpu_to_be32() (Geert)
v3:
* lookup CRTC state with drm_atomic_get_new_crtc_state()
* access HW palette with writeb(), writel(), and readl() (Ben)
* declare register values as u32

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 drivers/gpu/drm/tiny/ofdrm.c | 442 ++-
 1 file changed, 437 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
index 5abed3ec0a35..0bf5eebf6678 100644
--- a/drivers/gpu/drm/tiny/ofdrm.c
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -31,6 +32,33 @@
 #define PCI_VENDOR_ID_ATI_R520 0x7100
 #define PCI_VENDOR_ID_ATI_R600 0x9400
 
+#define OFDRM_GAMMA_LUT_SIZE   256
+
+/* Definitions used by the Avivo palette  */
+#define AVIVO_DC_LUT_RW_SELECT  0x6480
+#define AVIVO_DC_LUT_RW_MODE0x6484
+#define AVIVO_DC_LUT_RW_INDEX   0x6488
+#define AVIVO_DC_LUT_SEQ_COLOR  0x648c
+#define AVIVO_DC_LUT_PWL_DATA   0x6490
+#define AVIVO_DC_LUT_30_COLOR   0x6494
+#define AVIVO_DC_LUT_READ_PIPE_SELECT   0x6498
+#define AVIVO_DC_LUT_WRITE_EN_MASK  0x649c
+#define AVIVO_DC_LUT_AUTOFILL   0x64a0
+#define AVIVO_DC_LUTA_CONTROL   0x64c0
+#define AVIVO_DC_LUTA_BLACK_OFFSET_BLUE 0x64c4
+#define AVIVO_DC_LUTA_BLACK_OFFSET_GREEN0x64c8
+#define AVIVO_DC_LUTA_BLACK_OFFSET_RED  0x64cc
+#define AVIVO_DC_LUTA_WHITE_OFFSET_BLUE 0x64d0
+#define AVIVO_DC_LUTA_WHITE_OFFSET_GREEN0x64d4
+#define AVIVO_DC_LUTA_WHITE_OFFSET_RED  0x64d8
+#define AVIVO_DC_LUTB_CONTROL   0x6cc0
+#define AVIVO_DC_LUTB_BLACK_OFFSET_BLUE 0x6cc4
+#define AVIVO_DC_LUTB_BLACK_OFFSET_GREEN0x6cc8
+#define AVIVO_DC_LUTB_BLACK_OFFSET_RED  0x6ccc
+#define AVIVO_DC_LUTB_WHITE_OFFSET_BLUE 0x6cd0
+#define AVIVO_DC_LUTB_WHITE_OFFSET_GREEN0x6cd4
+#define AVIVO_DC_LUTB_WHITE_OFFSET_RED  0x6cd8
+
 enum ofdrm_model {
OFDRM_MODEL_UNKNOWN,
OFDRM_MODEL_MACH64, /* ATI Mach64 */
@@ -211,7 +239,14 @@ static enum ofdrm_model display_get_model_of(struct 
drm_device *dev, struct devi
  * Open Firmware display device
  */
 
+struct ofdrm_device;
+
 struct ofdrm_device_funcs {
+   void __iomem *(*cmap_ioremap)(struct ofdrm_device *odev,
+ struct device_node *of_node,
+ u64 fb_bas);
+   void (*cmap_write)(struct ofdrm_device *odev, unsigned char index,
+  unsigned char r, unsigned char g, unsigned char b);
 };
 
 struct ofdrm_device {
@@ -226,6 +261,9 @@ struct ofdrm_device {
const struct drm_format_info *format;
unsigned int pitch;
 
+   /* colormap */
+   void __iomem *cmap_base;
+
/* modesetting */
uint32_t formats[8];
struct drm_plane primary_plane;
@@ -338,12 +376,322 @@ static struct resource *ofdrm_find_fb_resource(struct 
ofdrm_device *odev,
return max_res;
 }
 
+/*
+ * Colormap / Palette
+ */
+
+static void __iomem *get_cmap_address_of(struct ofdrm_device *odev, struct 
device_node *of_node,
+int bar_no, unsigned long offset, 
unsigned long size)
+{
+   struct drm_device *dev = >dev;
+   const __be32 *addr_p;
+   u64 max_size, address;
+   unsigned int flags;
+   void __iomem *mem;
+
+   addr_p = of_get_pci_address(of_node, bar_no, _size, );
+   if (!addr_p)
+   addr_p = of_get_address(of_node, bar_no, _size, );
+   if (!addr_p)
+   return ERR_PTR(-ENODEV);
+
+   if ((flags & (IORESOURCE_IO | IORESOURCE_MEM)) == 0)
+   return ERR_PTR(-ENODEV);
+
+   if ((offset + size) >= max_size)
+   return ERR_PTR(-ENODEV);
+
+   address = of_translate_address(of_node, addr_p);
+   if (address == OF_BAD_ADDR)
+   return ERR_PTR(-ENODEV);
+
+   mem = devm_ioremap(dev->dev, address + offset, size);
+   if (!mem)
+   return ERR_PTR(-ENOMEM);
+
+   return mem;
+}
+
+static void __iomem *ofdrm_mach64_cmap_ioremap(struct ofdrm_device *odev,
+   

[PATCH v4 2/5] drm/ofdrm: Add CRTC state

2022-09-28 Thread Thomas Zimmermann
Add a dedicated CRTC state to ofdrm to later store information for
palette updates.

v3:
* rework CRTC state helpers (Javier)

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 drivers/gpu/drm/tiny/ofdrm.c | 59 ++--
 1 file changed, 56 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
index 98bd99ab7e46..d53ca70934b5 100644
--- a/drivers/gpu/drm/tiny/ofdrm.c
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -278,6 +278,21 @@ static struct resource *ofdrm_find_fb_resource(struct 
ofdrm_device *odev,
  * Modesetting
  */
 
+struct ofdrm_crtc_state {
+   struct drm_crtc_state base;
+};
+
+static struct ofdrm_crtc_state *to_ofdrm_crtc_state(struct drm_crtc_state 
*base)
+{
+   return container_of(base, struct ofdrm_crtc_state, base);
+}
+
+static void ofdrm_crtc_state_destroy(struct ofdrm_crtc_state *ofdrm_crtc_state)
+{
+   __drm_atomic_helper_crtc_destroy_state(_crtc_state->base);
+   kfree(ofdrm_crtc_state);
+}
+
 /*
  * Support all formats of OF display and maybe more; in order
  * of preference. The display's update function will do any
@@ -428,13 +443,51 @@ static const struct drm_crtc_helper_funcs 
ofdrm_crtc_helper_funcs = {
.atomic_check = ofdrm_crtc_helper_atomic_check,
 };
 
+static void ofdrm_crtc_reset(struct drm_crtc *crtc)
+{
+   struct ofdrm_crtc_state *ofdrm_crtc_state =
+   kzalloc(sizeof(*ofdrm_crtc_state), GFP_KERNEL);
+
+   if (crtc->state)
+   ofdrm_crtc_state_destroy(to_ofdrm_crtc_state(crtc->state));
+
+   if (ofdrm_crtc_state)
+   __drm_atomic_helper_crtc_reset(crtc, _crtc_state->base);
+   else
+   __drm_atomic_helper_crtc_reset(crtc, NULL);
+}
+
+static struct drm_crtc_state *ofdrm_crtc_atomic_duplicate_state(struct 
drm_crtc *crtc)
+{
+   struct drm_device *dev = crtc->dev;
+   struct drm_crtc_state *crtc_state = crtc->state;
+   struct ofdrm_crtc_state *new_ofdrm_crtc_state;
+
+   if (drm_WARN_ON(dev, !crtc_state))
+   return NULL;
+
+   new_ofdrm_crtc_state = kzalloc(sizeof(*new_ofdrm_crtc_state), 
GFP_KERNEL);
+   if (!new_ofdrm_crtc_state)
+   return NULL;
+
+   __drm_atomic_helper_crtc_duplicate_state(crtc, 
_ofdrm_crtc_state->base);
+
+   return _ofdrm_crtc_state->base;
+}
+
+static void ofdrm_crtc_atomic_destroy_state(struct drm_crtc *crtc,
+   struct drm_crtc_state *crtc_state)
+{
+   ofdrm_crtc_state_destroy(to_ofdrm_crtc_state(crtc_state));
+}
+
 static const struct drm_crtc_funcs ofdrm_crtc_funcs = {
-   .reset = drm_atomic_helper_crtc_reset,
+   .reset = ofdrm_crtc_reset,
.destroy = drm_crtc_cleanup,
.set_config = drm_atomic_helper_set_config,
.page_flip = drm_atomic_helper_page_flip,
-   .atomic_duplicate_state = drm_atomic_helper_crtc_duplicate_state,
-   .atomic_destroy_state = drm_atomic_helper_crtc_destroy_state,
+   .atomic_duplicate_state = ofdrm_crtc_atomic_duplicate_state,
+   .atomic_destroy_state = ofdrm_crtc_atomic_destroy_state,
 };
 
 static int ofdrm_connector_helper_get_modes(struct drm_connector *connector)
-- 
2.37.3



[PATCH v4 3/5] drm/ofdrm: Add per-model device function

2022-09-28 Thread Thomas Zimmermann
Add a per-model device-function structure in preparation of adding
color-management support. Detection of the individual models has been
taken from fbdev's offb.

v3:
* define constants for PCI ids (Javier)

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 drivers/gpu/drm/tiny/ofdrm.c | 125 +++
 1 file changed, 125 insertions(+)

diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
index d53ca70934b5..5abed3ec0a35 100644
--- a/drivers/gpu/drm/tiny/ofdrm.c
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -28,6 +28,21 @@
 #define DRIVER_MAJOR   1
 #define DRIVER_MINOR   0
 
+#define PCI_VENDOR_ID_ATI_R520 0x7100
+#define PCI_VENDOR_ID_ATI_R600 0x9400
+
+enum ofdrm_model {
+   OFDRM_MODEL_UNKNOWN,
+   OFDRM_MODEL_MACH64, /* ATI Mach64 */
+   OFDRM_MODEL_RAGE128, /* ATI Rage128 */
+   OFDRM_MODEL_RAGE_M3A, /* ATI Rage Mobility M3 Head A */
+   OFDRM_MODEL_RAGE_M3B, /* ATI Rage Mobility M3 Head B */
+   OFDRM_MODEL_RADEON, /* ATI Radeon */
+   OFDRM_MODEL_GXT2000, /* IBM GXT2000 */
+   OFDRM_MODEL_AVIVO, /* ATI R5xx */
+   OFDRM_MODEL_QEMU, /* QEMU VGA */
+};
+
 /*
  * Helpers for display nodes
  */
@@ -148,14 +163,63 @@ static u64 display_get_address_of(struct drm_device *dev, 
struct device_node *of
return address;
 }
 
+static bool is_avivo(__be32 vendor, __be32 device)
+{
+   /* This will match most R5xx */
+   return (vendor == PCI_VENDOR_ID_ATI) &&
+  ((device >= PCI_VENDOR_ID_ATI_R520 && device < 0x7800) ||
+   (PCI_VENDOR_ID_ATI_R600 >= 0x9400));
+}
+
+static enum ofdrm_model display_get_model_of(struct drm_device *dev, struct 
device_node *of_node)
+{
+   enum ofdrm_model model = OFDRM_MODEL_UNKNOWN;
+
+   if (of_node_name_prefix(of_node, "ATY,Rage128")) {
+   model = OFDRM_MODEL_RAGE128;
+   } else if (of_node_name_prefix(of_node, "ATY,RageM3pA") ||
+  of_node_name_prefix(of_node, "ATY,RageM3p12A")) {
+   model = OFDRM_MODEL_RAGE_M3A;
+   } else if (of_node_name_prefix(of_node, "ATY,RageM3pB")) {
+   model = OFDRM_MODEL_RAGE_M3B;
+   } else if (of_node_name_prefix(of_node, "ATY,Rage6")) {
+   model = OFDRM_MODEL_RADEON;
+   } else if (of_node_name_prefix(of_node, "ATY,")) {
+   return OFDRM_MODEL_MACH64;
+   } else if (of_device_is_compatible(of_node, "pci1014,b7") ||
+  of_device_is_compatible(of_node, "pci1014,21c")) {
+   model = OFDRM_MODEL_GXT2000;
+   } else if (of_node_name_prefix(of_node, "vga,Display-")) {
+   struct device_node *of_parent;
+   const __be32 *vendor_p, *device_p;
+
+   /* Look for AVIVO initialized by SLOF */
+   of_parent = of_get_parent(of_node);
+   vendor_p = of_get_property(of_parent, "vendor-id", NULL);
+   device_p = of_get_property(of_parent, "device-id", NULL);
+   if (vendor_p && device_p && is_avivo(*vendor_p, *device_p))
+   model = OFDRM_MODEL_AVIVO;
+   of_node_put(of_parent);
+   } else if (of_device_is_compatible(of_node, "qemu,std-vga")) {
+   model = OFDRM_MODEL_QEMU;
+   }
+
+   return model;
+}
+
 /*
  * Open Firmware display device
  */
 
+struct ofdrm_device_funcs {
+};
+
 struct ofdrm_device {
struct drm_device dev;
struct platform_device *pdev;
 
+   const struct ofdrm_device_funcs *funcs;
+
/* firmware-buffer settings */
struct iosys_map screen_base;
struct drm_display_mode mode;
@@ -519,6 +583,33 @@ static const struct drm_mode_config_funcs 
ofdrm_mode_config_funcs = {
  * Init / Cleanup
  */
 
+static const struct ofdrm_device_funcs ofdrm_unknown_device_funcs = {
+};
+
+static const struct ofdrm_device_funcs ofdrm_mach64_device_funcs = {
+};
+
+static const struct ofdrm_device_funcs ofdrm_rage128_device_funcs = {
+};
+
+static const struct ofdrm_device_funcs ofdrm_rage_m3a_device_funcs = {
+};
+
+static const struct ofdrm_device_funcs ofdrm_rage_m3b_device_funcs = {
+};
+
+static const struct ofdrm_device_funcs ofdrm_radeon_device_funcs = {
+};
+
+static const struct ofdrm_device_funcs ofdrm_gxt2000_device_funcs = {
+};
+
+static const struct ofdrm_device_funcs ofdrm_avivo_device_funcs = {
+};
+
+static const struct ofdrm_device_funcs ofdrm_qemu_device_funcs = {
+};
+
 static struct drm_display_mode ofdrm_mode(unsigned int width, unsigned int 
height)
 {
/*
@@ -540,6 +631,7 @@ static struct ofdrm_device *ofdrm_device_create(struct 
drm_driver *drv,
struct device_node *of_node = pdev->dev.of_node;
struct ofdrm_device *odev;
struct drm_device *dev;
+   enum ofdrm_model model;
int width, height, depth, linebytes;
const struct drm_format_info *format;
u64 address;
@@ -568,6 +660,39 @@ static struct ofdrm_device 

[PATCH v4 0/5] drm: Add driver for PowerPC OF displays

2022-09-28 Thread Thomas Zimmermann
PowerPC's Open Firmware offers a simple display buffer for graphics
output. Add ofdrm, a DRM driver for the device. As with the existing
simpledrm driver, the graphics hardware is pre-initialized by the
firmware. The driver only provides blitting, no actual DRM modesetting
is possible.

Patch 1 adds ofdrm, which has again been significantly reworked.
The FWFB library has been removed infavor of various functions in
existing DRM helper libraries. Ofdrm now supports damage iterators
and synchronization for imported GEM BOs.

Patches 2 to 4 add support for color management. The code has been
taken from fbdev's offb. I have no hardware available for testing the
functionality. Qemu's stdvga apparently does not support gamma tables
in RGB modes. I verified that the color management code is executed
by running Gnome's night-mode settings, but the display's color tone
does not change.

Patch 5, which is new in version 4 of this patchset, adds support for
big-endian scanout buffers. It works at least with qemu's ppc64
emulation. Fbdev emulation and pixman rendering works. GL rendering
produces incorrect colors.

Tested by running fbdev emulation, Wayland Gnome, and Weston on qemu's
ppc64le and ppc64 emulation. 

Thomas Zimmermann (5):
  drm/ofdrm: Add ofdrm for Open Firmware framebuffers
  drm/ofdrm: Add CRTC state
  drm/ofdrm: Add per-model device function
  drm/ofdrm: Support color management
  drm/ofdrm: Support big-endian scanout buffers

 MAINTAINERS |1 +
 drivers/gpu/drm/drm_format_helper.c |   10 +
 drivers/gpu/drm/tiny/Kconfig|   13 +
 drivers/gpu/drm/tiny/Makefile   |1 +
 drivers/gpu/drm/tiny/ofdrm.c| 1421 +++
 drivers/video/fbdev/Kconfig |1 +
 6 files changed, 1447 insertions(+)
 create mode 100644 drivers/gpu/drm/tiny/ofdrm.c


base-commit: eee1f4330f388247943e97b93008ef11ababfda0
-- 
2.37.3



[PATCH v4 1/5] drm/ofdrm: Add ofdrm for Open Firmware framebuffers

2022-09-28 Thread Thomas Zimmermann
Open Firmware provides basic display output via the 'display' node.
DT platform code already provides a device that represents the node's
framebuffer. Add a DRM driver for the device. The display mode and
color format is pre-initialized by the system's firmware. Runtime
modesetting via DRM is not possible. The display is useful during
early boot stages or as error fallback.

Similar functionality is already provided by fbdev's offb driver,
which is insufficient for modern userspace. The old driver includes
support for BootX device tree, which can be found on old 32-bit
PowerPC Macintosh systems. If these are still in use, the
functionality can be added to ofdrm or implemented in a new
driver. As with simpledrm, the fbdev driver cannot be selected if
ofdrm is already enabled.

Two notable points about the driver:

 * Reading the framebuffer aperture from the device tree is not
reliable on all systems. Ofdrm takes the heuristics and a comment
from offb to pick the correct range.

 * No resource management may be tied to the underlying PCI device.
Otherwise the handover to the native driver will fail with a resource
conflict. PCI management is therefore done as part of the platform
device's cleanup.

The driver has been tested on qemu's ppc64le emulation. The device
hand-over has been tested with bochs.

v4:
* set preferred depth to the correct value
* set bpp value for console emulation
* output scanout-buffer parameters with drm_dbg()
v3:
* reintegrate FWFB helpers into ofdrm
* use damage iterator
* sync GEM BOs with drm_gem_fb_{begin,end}_cpu_access()
* fix various atomic_check helpers
* remove CRTC atomic_{enable,disable} (Javier)
* compute stride with drm_format_info_min_pitch() (Daniel)
v2:
* removed simple-pipe helpers
* built driver on top of FWFB helpers
* merged all init code into single function
* make PCI support optional (Michal)
* support COMPILE_TEST (Javier)

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 MAINTAINERS   |   1 +
 drivers/gpu/drm/tiny/Kconfig  |  13 +
 drivers/gpu/drm/tiny/Makefile |   1 +
 drivers/gpu/drm/tiny/ofdrm.c  | 760 ++
 drivers/video/fbdev/Kconfig   |   1 +
 5 files changed, 776 insertions(+)
 create mode 100644 drivers/gpu/drm/tiny/ofdrm.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7c7db809473e..dcb443f2496b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6656,6 +6656,7 @@ L:dri-de...@lists.freedesktop.org
 S: Maintained
 T: git git://anongit.freedesktop.org/drm/drm-misc
 F: drivers/gpu/drm/drm_aperture.c
+F: drivers/gpu/drm/tiny/ofdrm.c
 F: drivers/gpu/drm/tiny/simpledrm.c
 F: drivers/video/aperture.c
 F: include/drm/drm_aperture.h
diff --git a/drivers/gpu/drm/tiny/Kconfig b/drivers/gpu/drm/tiny/Kconfig
index 565957264875..a300b03a3c7a 100644
--- a/drivers/gpu/drm/tiny/Kconfig
+++ b/drivers/gpu/drm/tiny/Kconfig
@@ -51,6 +51,19 @@ config DRM_GM12U320
 This is a KMS driver for projectors which use the GM12U320 chipset
 for video transfer over USB2/3, such as the Acer C120 mini projector.
 
+config DRM_OFDRM
+   tristate "Open Firmware display driver"
+   depends on DRM && OF && (PPC || COMPILE_TEST)
+   select APERTURE_HELPERS
+   select DRM_GEM_SHMEM_HELPER
+   select DRM_KMS_HELPER
+   help
+ DRM driver for Open Firmware framebuffers.
+
+ This driver assumes that the display hardware has been initialized
+ by the Open Firmware before the kernel boots. Scanout buffer, size,
+ and display format must be provided via device tree.
+
 config DRM_PANEL_MIPI_DBI
tristate "DRM support for MIPI DBI compatible panels"
depends on DRM && SPI
diff --git a/drivers/gpu/drm/tiny/Makefile b/drivers/gpu/drm/tiny/Makefile
index 1d9d6227e7ab..76dde89a044b 100644
--- a/drivers/gpu/drm/tiny/Makefile
+++ b/drivers/gpu/drm/tiny/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_DRM_ARCPGU)+= arcpgu.o
 obj-$(CONFIG_DRM_BOCHS)+= bochs.o
 obj-$(CONFIG_DRM_CIRRUS_QEMU)  += cirrus.o
 obj-$(CONFIG_DRM_GM12U320) += gm12u320.o
+obj-$(CONFIG_DRM_OFDRM)+= ofdrm.o
 obj-$(CONFIG_DRM_PANEL_MIPI_DBI)   += panel-mipi-dbi.o
 obj-$(CONFIG_DRM_SIMPLEDRM)+= simpledrm.o
 obj-$(CONFIG_TINYDRM_HX8357D)  += hx8357d.o
diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
new file mode 100644
index ..98bd99ab7e46
--- /dev/null
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -0,0 +1,760 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRIVER_NAME"ofdrm"

Re: [PATCH v2 1/2] powerpc/pseries: block untrusted device tree changes when locked down

2022-09-28 Thread Andrew Donnellan
On Mon, 2022-09-26 at 08:16 -0500, Nathan Lynch wrote:
> The /proc/powerpc/ofdt interface allows the root user to freely alter
> the in-kernel device tree, enabling arbitrary physical address writes
> via drivers that could bind to malicious device nodes, thus making it
> possible to disable lockdown.
> 
> Historically this interface has been used on the pseries platform to
> facilitate the runtime addition and removal of processor, memory, and
> device resources (aka Dynamic Logical Partitioning or DLPAR). Years
> ago, the processor and memory use cases were migrated to designs that
> happen to be lockdown-friendly: device tree updates are communicated
> directly to the kernel from firmware without passing through
> untrusted
> user space. I/O device DLPAR via the "drmgr" command in powerpc-utils
> remains the sole legitimate user of /proc/powerpc/ofdt, but it is
> already broken in lockdown since it uses /dev/mem to allocate
> argument
> buffers for the rtas syscall. So only illegitimate uses of the
> interface should see a behavior change when running on a locked down
> kernel.
> 
> Signed-off-by: Nathan Lynch 

Seems sensible to me.

Reviewed-by: Andrew Donnellan 

> ---
>  arch/powerpc/platforms/pseries/reconfig.c | 5 +
>  include/linux/security.h  | 1 +
>  security/security.c   | 1 +
>  3 files changed, 7 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/reconfig.c
> b/arch/powerpc/platforms/pseries/reconfig.c
> index cad7a0c93117..599bd2c78514 100644
> --- a/arch/powerpc/platforms/pseries/reconfig.c
> +++ b/arch/powerpc/platforms/pseries/reconfig.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -361,6 +362,10 @@ static ssize_t ofdt_write(struct file *file,
> const char __user *buf, size_t coun
> char *kbuf;
> char *tmp;
>  
> +   rv = security_locked_down(LOCKDOWN_DEVICE_TREE);
> +   if (rv)
> +   return rv;
> +
> kbuf = memdup_user_nul(buf, count);
> if (IS_ERR(kbuf))
> return PTR_ERR(kbuf);
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 7bd0c490703d..39e7c0e403d9 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -114,6 +114,7 @@ enum lockdown_reason {
> LOCKDOWN_IOPORT,
> LOCKDOWN_MSR,
> LOCKDOWN_ACPI_TABLES,
> +   LOCKDOWN_DEVICE_TREE,
> LOCKDOWN_PCMCIA_CIS,
> LOCKDOWN_TIOCSSERIAL,
> LOCKDOWN_MODULE_PARAMETERS,
> diff --git a/security/security.c b/security/security.c
> index 4b95de24bc8d..51bf66d4f472 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -52,6 +52,7 @@ const char *const
> lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
> [LOCKDOWN_IOPORT] = "raw io port access",
> [LOCKDOWN_MSR] = "raw MSR access",
> [LOCKDOWN_ACPI_TABLES] = "modifying ACPI tables",
> +   [LOCKDOWN_DEVICE_TREE] = "modifying device tree contents",
> [LOCKDOWN_PCMCIA_CIS] = "direct PCMCIA CIS storage",
> [LOCKDOWN_TIOCSSERIAL] = "reconfiguration of serial port IO",
> [LOCKDOWN_MODULE_PARAMETERS] = "unsafe module parameters",

-- 
Andrew DonnellanOzLabs, ADL Canberra
a...@linux.ibm.com   IBM Australia Limited



Re: [PATCH v2 1/9] PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core

2022-09-28 Thread Zhuo Chen




On 9/28/22 3:31 AM, Sathyanarayanan Kuppuswamy wrote:

Hi,

On 9/27/22 8:35 AM, Zhuo Chen wrote:

Sometimes we need to clear aer uncorrectable error status, so we add


Adding n actual use case will help.


pci_aer_clear_uncorrect_error_status() to PCI core.


If possible, try to avoid "we" usage in commit log. Just say "so add
pci_aer_clear_uncorrect_error_status() function"



Signed-off-by: Zhuo Chen 
---
  drivers/pci/pcie/aer.c | 16 
  include/linux/aer.h|  5 +
  2 files changed, 21 insertions(+)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e2d8a74f83c3..4e637121be23 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -286,6 +286,22 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
  }
  
+int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)

+{
+   int aer = dev->aer_cap;
+   u32 status;
+
+   if (!pcie_aer_is_native(dev))
+   return -EIO;
+
+   pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, );
+   if (status)
+   pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);


Why not just write all '1' and clear it? Why read and write?


Hi Sathyanarayanan,

The current implementation and the previous implementation are all first 
read and then write to clear the status. Currently just be consistent 
with them:

 - aer_enable_rootport()
 - pci_aer_raw_clear_status()
 - pcibios_plat_dev_init() in arch/mips/pci/pci-octeon.c
 - commit fd3362cb73de ("PCI/AER: Squash aerdrv_core.c into aerdrv.c")
   pci_cleanup_aer_uncorrect_error_status



+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);


Add details about why you want to export in commit log.


Thanks,

I will add details and improve commit log in next version.



+
  /**
   * pci_aer_raw_clear_status - Clear AER error registers.
   * @dev: the PCI device
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 97f64ba1b34a..154690c278cb 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -45,6 +45,7 @@ struct aer_capability_regs {
  int pci_enable_pcie_error_reporting(struct pci_dev *dev);
  int pci_disable_pcie_error_reporting(struct pci_dev *dev);
  int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
+int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
  void pci_save_aer_state(struct pci_dev *dev);
  void pci_restore_aer_state(struct pci_dev *dev);
  #else
@@ -60,6 +61,10 @@ static inline int pci_aer_clear_nonfatal_status(struct 
pci_dev *dev)
  {
return -EINVAL;
  }
+static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
+{
+   return -EINVAL;
+}
  static inline void pci_save_aer_state(struct pci_dev *dev) {}
  static inline void pci_restore_aer_state(struct pci_dev *dev) {}
  #endif




--
Thanks,
Zhuo Chen


[PATCH] powerpc/8xx: Reverse order entries are written by __set_pte_at()

2022-09-28 Thread Christophe Leroy
At the time being, with 16k pages __set_pte_at() writes table entries
in reverse order:

 294:   91 49 00 0c stw r10,12(r9)
 298:   91 49 00 08 stw r10,8(r9)
 29c:   91 49 00 04 stw r10,4(r9)
 2a0:   91 49 00 00 stw r10,0(r9)

Allthough there should be no impact at all as it stays in a single
cacheline, reverse the writing in a more natural order.

 288:   91 49 00 0c stw r10,0(r9)
 28c:   91 49 00 08 stw r10,4(r9)
 290:   91 49 00 04 stw r10,8(r9)
 294:   91 49 00 00 stw r10,12(r9)

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/pgtable.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 4fd73c7412d0..d2e39823af09 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -183,7 +183,7 @@ static inline void __set_pte_at(struct mm_struct *mm, 
unsigned long addr,
 * cases, and 32-bit non-hash with 32-bit PTEs.
 */
 #if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
-   ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte);
+   ptep->pte3 = ptep->pte2 = ptep->pte1 = ptep->pte = pte_val(pte);
 #else
*ptep = pte;
 #endif
-- 
2.37.1



[PATCH] powerpc/8xx: Simplify pte_update() with 16k pages

2022-09-28 Thread Christophe Leroy
While looking at code generated for code patching, I saw that
pte_clear generated:

 2d8:   38 a0 00 00 li  r5,0
 2dc:   38 e0 10 00 li  r7,4096
 2e0:   39 00 20 00 li  r8,8192
 2e4:   39 40 30 00 li  r10,12288
 2e8:   90 a9 00 00 stw r5,0(r9)
 2ec:   90 e9 00 04 stw r7,4(r9)
 2f0:   91 09 00 08 stw r8,8(r9)
 2f4:   91 49 00 0c stw r10,12(r9)

With 16k pages, only the first entry is used by the kernel, so no need
to adapt the address of other entries. Only duplicate the first entry
for hardware.

Now it is:

 2cc:   39 40 00 00 li  r10,0
 2d0:   91 49 00 00 stw r10,0(r9)
 2d4:   91 49 00 04 stw r10,4(r9)
 2d8:   91 49 00 08 stw r10,8(r9)
 2dc:   91 49 00 0c stw r10,12(r9)

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/pgtable.h | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 9091e4904a6b..981e414bdeef 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -256,8 +256,14 @@ static inline pte_basic_t pte_update(struct mm_struct *mm, 
unsigned long addr, p
 
num = number_of_cells_per_pte(pmd, new, huge);
 
-   for (i = 0; i < num; i++, entry++, new += SZ_4K)
-   *entry = new;
+   for (i = 0; i < num; i += PAGE_SIZE / SZ_4K, new += PAGE_SIZE) {
+   *entry++ = new;
+   if (IS_ENABLED(CONFIG_PPC_16K_PAGES) && num != 1) {
+   *entry++ = new;
+   *entry++ = new;
+   *entry++ = new;
+   }
+   }
 
return old;
 }
-- 
2.37.1