Re: [PATCH 0/5] s390/pci: automatic error recovery
On Mon, Sep 6, 2021 at 4:49 AM Niklas Schnelle wrote: > I believe we might be the first > implementation of PCI device recovery in a virtualized setting requiring > us to > coordinate the device reset with the hypervisor platform by issuing a > disable > and re-enable to the platform as well as starting the recovery following > a platform event. > I recall none of the details, but SRIOV is a standardized system for sharing a PCI device across multiple virtual machines. It has detailed info on what the hypervisor must do, and what the local OS instance must do to accomplish this. It's part of the PCI standard, and its more than a decade old now, maybe two. Being a part of the PCI standard, it was interoperable with error recovery, to the best of my recollection. At the time it was introduced, it got pushed very aggressively. The x86 hypervisor vendors were aiming at the heart of zseries, and were militant about it. -- Linas -- Patrick: Are they laughing at us? Sponge Bob: No, Patrick, they are laughing next to us.
Re: [PATCH 0/5] s390/pci: automatic error recovery
On Mon, Sep 6, 2021 at 7:49 PM Niklas Schnelle wrote: > > Patch 3 I already sent separately resulting in the discussion below but > without > a final conclusion. > > https://lore.kernel.org/lkml/20210720150145.640727-1-schne...@linux.ibm.com/ > > I believe even though there were some doubts about the use of > pci_dev_is_added() by arch code the existing uses as well as the use in the > final patch of this series warrant this export. The use of pci_dev_is_added() in arch/powerpc was because in the past pci_bus_add_device() could be called before pci_device_add(). That was fixed a while ago so It should be safe to remove those calls now. > Patch 4 "PCI: Export pci_dev_lock()" is basically an extension to commit > e3a9b1212b9d ("PCI: Export pci_dev_trylock() and pci_dev_unlock()") which > already exported pci_dev_trylock(). In the final patch we make use of > pci_dev_lock() to wait for any other exclusive uses of the pdev to be finished > before starting recovery. Hmm, I noticed the EEH (arch/powerpc/kernel/eeh_driver.c:eeh_pe_report_edev()) and the generic PCIe error recovery code (see drivers/pci/pcie/err.c:report_error_detected()) only call device_lock() before entering the driver's error handling callbacks. I wonder if they should be using pci_dev_lock() instead. The only real difference is that pci_dev_lock() will also block user space from accessing the device's config space while error recovery is in progress which seems sensible enough.
Re: [PATCH 3/5] PCI: Move pci_dev_is/assign_added() to pci.h
Hi Niklas, I love your patch! Yet something to improve: [auto build test ERROR on s390/features] [also build test ERROR on next-20210906] [cannot apply to pci/next powerpc/next v5.14] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Niklas-Schnelle/s390-pci-automatic-error-recovery/20210906-175309 base: https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features config: i386-allyesconfig (attached as .config) compiler: gcc-9 (Debian 9.3.0-22) 9.3.0 reproduce (this is a W=1 build): # https://github.com/0day-ci/linux/commit/404ed8c00a612e7ae31c50557c80c6726c464863 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Niklas-Schnelle/s390-pci-automatic-error-recovery/20210906-175309 git checkout 404ed8c00a612e7ae31c50557c80c6726c464863 # save the attached .config to linux build tree make W=1 ARCH=i386 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All errors (new ones prefixed by >>): drivers/pci/hotplug/acpiphp_glue.c: In function 'acpiphp_add_context': >> drivers/pci/hotplug/acpiphp_glue.c:330:6: error: implicit declaration of >> function 'pci_bus_read_dev_vendor_id' [-Werror=implicit-function-declaration] 330 | if (pci_bus_read_dev_vendor_id(pbus, PCI_DEVFN(device, function), | ^~ drivers/pci/hotplug/acpiphp_glue.c: In function 'enable_slot': >> drivers/pci/hotplug/acpiphp_glue.c:505:6: error: implicit declaration of >> function '__pci_bus_size_bridges'; did you mean 'pci_bus_size_bridges'? >> [-Werror=implicit-function-declaration] 505 | __pci_bus_size_bridges(dev->subordinate, | ^~ | pci_bus_size_bridges >> drivers/pci/hotplug/acpiphp_glue.c:510:3: error: implicit declaration of >> function '__pci_bus_assign_resources'; did you mean >> 'pci_bus_assign_resources'? [-Werror=implicit-function-declaration] 510 | __pci_bus_assign_resources(bus, _list, NULL); | ^~ | pci_bus_assign_resources drivers/pci/hotplug/acpiphp_glue.c: In function 'trim_stale_devices': >> drivers/pci/hotplug/acpiphp_glue.c:660:3: error: implicit declaration of >> function 'pci_dev_set_disconnected' [-Werror=implicit-function-declaration] 660 | pci_dev_set_disconnected(dev, NULL); | ^~~~ >> drivers/pci/hotplug/acpiphp_glue.c:661:7: error: implicit declaration of >> function 'pci_has_subordinate' [-Werror=implicit-function-declaration] 661 | if (pci_has_subordinate(dev)) | ^~~ cc1: some warnings being treated as errors vim +/pci_bus_read_dev_vendor_id +330 drivers/pci/hotplug/acpiphp_glue.c 4e8662bbd680c5 Kristen Accardi 2006-06-28 217 3799c5a032aefb Rafael J. Wysocki 2014-02-16 218 /** 3799c5a032aefb Rafael J. Wysocki 2014-02-16 219 * acpiphp_add_context - Add ACPIPHP context to an ACPI device object. 3799c5a032aefb Rafael J. Wysocki 2014-02-16 220 * @handle: ACPI handle of the object to add a context to. 3799c5a032aefb Rafael J. Wysocki 2014-02-16 221 * @lvl: Not used. 3799c5a032aefb Rafael J. Wysocki 2014-02-16 222 * @data: The object's parent ACPIPHP bridge. 3799c5a032aefb Rafael J. Wysocki 2014-02-16 223 * @rv: Not used. 3799c5a032aefb Rafael J. Wysocki 2014-02-16 224 */ 3799c5a032aefb Rafael J. Wysocki 2014-02-16 225 static acpi_status acpiphp_add_context(acpi_handle handle, u32 lvl, void *data, cb7b8cedf6c88b Rafael J. Wysocki 2013-07-13 226 void **rv) ^1da177e4c3f41 Linus Torvalds2005-04-16 227 { cb7b8cedf6c88b Rafael J. Wysocki 2013-07-13 228struct acpiphp_bridge *bridge = data; cb7b8cedf6c88b Rafael J. Wysocki 2013-07-13 229struct acpiphp_context *context; bbcbfc0eed6220 Rafael J. Wysocki 2014-02-04 230struct acpi_device *adev; ^1da177e4c3f41 Linus Torvalds2005-04-16 231struct acpiphp_slot *slot; ^1da177e4c3f41 Linus Torvalds2005-04-16 232struct acpiphp_func *newfunc; ^1da177e4c3f41 Linus Torvalds2005-04-16 233acpi_status status = AE_OK; bbd34fcdd1b201 Rafael J. Wysocki 2013-07-13 234unsigned long long adr; bbd34fcdd1b201 Rafael J. Wysocki 2013-07-13 235int device, function; e8c331e963c58b Kenji Kaneshige 2008-12-17 236struct pci_bus *pbus = bridge->pci_bus; bbd34fcdd1b201 Rafael J. Wysocki 2013-07-13 237struct pci_dev *pdev = bridge->pci_dev; 3b63aaa70e1ccc Jiang Liu 2013-04-12 238u32 val; ^1da177e4c3f41 Linus Torvalds2005-04-16 239 dfb117b
Re: [PATCH 3/5] PCI: Move pci_dev_is/assign_added() to pci.h
Hi Niklas, I love your patch! Yet something to improve: [auto build test ERROR on s390/features] [also build test ERROR on next-20210906] [cannot apply to pci/next powerpc/next v5.14] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Niklas-Schnelle/s390-pci-automatic-error-recovery/20210906-175309 base: https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features config: i386-randconfig-a016-20210906 (attached as .config) compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 6fe2beba7d2a41964af658c8c59dd172683ef739) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/404ed8c00a612e7ae31c50557c80c6726c464863 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Niklas-Schnelle/s390-pci-automatic-error-recovery/20210906-175309 git checkout 404ed8c00a612e7ae31c50557c80c6726c464863 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=i386 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All errors (new ones prefixed by >>): >> drivers/pci/hotplug/acpiphp_glue.c:330:6: error: implicit declaration of >> function 'pci_bus_read_dev_vendor_id' >> [-Werror,-Wimplicit-function-declaration] if (pci_bus_read_dev_vendor_id(pbus, PCI_DEVFN(device, function), ^ >> drivers/pci/hotplug/acpiphp_glue.c:505:6: error: implicit declaration of >> function '__pci_bus_size_bridges' [-Werror,-Wimplicit-function-declaration] __pci_bus_size_bridges(dev->subordinate, ^ drivers/pci/hotplug/acpiphp_glue.c:505:6: note: did you mean 'pci_bus_size_bridges'? include/linux/pci.h:1336:6: note: 'pci_bus_size_bridges' declared here void pci_bus_size_bridges(struct pci_bus *bus); ^ >> drivers/pci/hotplug/acpiphp_glue.c:510:3: error: implicit declaration of >> function '__pci_bus_assign_resources' >> [-Werror,-Wimplicit-function-declaration] __pci_bus_assign_resources(bus, _list, NULL); ^ drivers/pci/hotplug/acpiphp_glue.c:510:3: note: did you mean 'pci_bus_assign_resources'? include/linux/pci.h:1334:6: note: 'pci_bus_assign_resources' declared here void pci_bus_assign_resources(const struct pci_bus *bus); ^ drivers/pci/hotplug/acpiphp_glue.c:604:8: error: implicit declaration of function 'pci_bus_read_dev_vendor_id' [-Werror,-Wimplicit-function-declaration] if (pci_bus_read_dev_vendor_id(slot->bus, ^ drivers/pci/hotplug/acpiphp_glue.c:619:7: error: implicit declaration of function 'pci_bus_read_dev_vendor_id' [-Werror,-Wimplicit-function-declaration] if (pci_bus_read_dev_vendor_id(slot->bus, ^ >> drivers/pci/hotplug/acpiphp_glue.c:660:3: error: implicit declaration of >> function 'pci_dev_set_disconnected' [-Werror,-Wimplicit-function-declaration] pci_dev_set_disconnected(dev, NULL); ^ >> drivers/pci/hotplug/acpiphp_glue.c:661:7: error: implicit declaration of >> function 'pci_has_subordinate' [-Werror,-Wimplicit-function-declaration] if (pci_has_subordinate(dev)) ^ 7 errors generated. vim +/pci_bus_read_dev_vendor_id +330 drivers/pci/hotplug/acpiphp_glue.c 4e8662bbd680c5 Kristen Accardi 2006-06-28 217 3799c5a032aefb Rafael J. Wysocki 2014-02-16 218 /** 3799c5a032aefb Rafael J. Wysocki 2014-02-16 219 * acpiphp_add_context - Add ACPIPHP context to an ACPI device object. 3799c5a032aefb Rafael J. Wysocki 2014-02-16 220 * @handle: ACPI handle of the object to add a context to. 3799c5a032aefb Rafael J. Wysocki 2014-02-16 221 * @lvl: Not used. 3799c5a032aefb Rafael J. Wysocki 2014-02-16 222 * @data: The object's parent ACPIPHP bridge. 3799c5a032aefb Rafael J. Wysocki 2014-02-16 223 * @rv: Not used. 3799c5a032aefb Rafael J. Wysocki 2014-02-16 224 */ 3799c5a032aefb Rafael J. Wysocki 2014-02-16 225 static acpi_status acpiphp_add_context(acpi_handle handle, u32 lvl, void *data, cb7b8cedf6c88b Rafael J. Wysocki 2013-07-13 226 void **rv) ^1da177e4c3f41 Linus Torvalds2005-04-16 227 { cb7b8cedf6c88b Rafael J. Wysocki 2013-07-13 228struct acpiphp_bridge *bridge = data; cb7b8cedf6c88b Rafael J. Wysocki 2013-07-13 229struct ac
Re: [PATCH v2] ftrace: Cleanup ftrace_dyn_arch_init()
On 9/6/21 1:16 PM, Weizhao Ouyang wrote: Most of ARCHs use empty ftrace_dyn_arch_init(), introduce a weak common ftrace_dyn_arch_init() to cleanup them. Signed-off-by: Weizhao Ouyang Acked-by: Heiko Carstens (s390) Acked-by: Helge Deller # parisc Thanks, Helge --- Changes in v2: -- correct CONFIG_DYNAMIC_FTRACE on PowerPC -- add Acked-by tag --- arch/arm/kernel/ftrace.c | 5 - arch/arm64/kernel/ftrace.c| 5 - arch/csky/kernel/ftrace.c | 5 - arch/ia64/kernel/ftrace.c | 6 -- arch/microblaze/kernel/ftrace.c | 5 - arch/mips/include/asm/ftrace.h| 2 ++ arch/nds32/kernel/ftrace.c| 5 - arch/parisc/kernel/ftrace.c | 5 - arch/powerpc/include/asm/ftrace.h | 4 arch/riscv/kernel/ftrace.c| 5 - arch/s390/kernel/ftrace.c | 5 - arch/sh/kernel/ftrace.c | 5 - arch/sparc/kernel/ftrace.c| 5 - arch/x86/kernel/ftrace.c | 5 - include/linux/ftrace.h| 1 - kernel/trace/ftrace.c | 5 + 16 files changed, 11 insertions(+), 62 deletions(-) diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c index 3c83b5d29697..a006585e1c09 100644 --- a/arch/arm/kernel/ftrace.c +++ b/arch/arm/kernel/ftrace.c @@ -193,11 +193,6 @@ int ftrace_make_nop(struct module *mod, return ret; } - -int __init ftrace_dyn_arch_init(void) -{ - return 0; -} #endif /* CONFIG_DYNAMIC_FTRACE */ #ifdef CONFIG_FUNCTION_GRAPH_TRACER diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c index 7f467bd9db7a..fc62dfe73f93 100644 --- a/arch/arm64/kernel/ftrace.c +++ b/arch/arm64/kernel/ftrace.c @@ -236,11 +236,6 @@ void arch_ftrace_update_code(int command) command |= FTRACE_MAY_SLEEP; ftrace_modify_all_code(command); } - -int __init ftrace_dyn_arch_init(void) -{ - return 0; -} #endif /* CONFIG_DYNAMIC_FTRACE */ #ifdef CONFIG_FUNCTION_GRAPH_TRACER diff --git a/arch/csky/kernel/ftrace.c b/arch/csky/kernel/ftrace.c index b4a7ec1517ff..50bfcf129078 100644 --- a/arch/csky/kernel/ftrace.c +++ b/arch/csky/kernel/ftrace.c @@ -133,11 +133,6 @@ int ftrace_update_ftrace_func(ftrace_func_t func) (unsigned long)func, true, true); return ret; } - -int __init ftrace_dyn_arch_init(void) -{ - return 0; -} #endif /* CONFIG_DYNAMIC_FTRACE */ #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS diff --git a/arch/ia64/kernel/ftrace.c b/arch/ia64/kernel/ftrace.c index b2ab2d58fb30..d6360fd404ab 100644 --- a/arch/ia64/kernel/ftrace.c +++ b/arch/ia64/kernel/ftrace.c @@ -194,9 +194,3 @@ int ftrace_update_ftrace_func(ftrace_func_t func) flush_icache_range(addr, addr + 16); return 0; } - -/* run from kstop_machine */ -int __init ftrace_dyn_arch_init(void) -{ - return 0; -} diff --git a/arch/microblaze/kernel/ftrace.c b/arch/microblaze/kernel/ftrace.c index 224eea40e1ee..188749d62709 100644 --- a/arch/microblaze/kernel/ftrace.c +++ b/arch/microblaze/kernel/ftrace.c @@ -163,11 +163,6 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr) return ret; } -int __init ftrace_dyn_arch_init(void) -{ - return 0; -} - int ftrace_update_ftrace_func(ftrace_func_t func) { unsigned long ip = (unsigned long)(_call); diff --git a/arch/mips/include/asm/ftrace.h b/arch/mips/include/asm/ftrace.h index b463f2aa5a61..ed013e767390 100644 --- a/arch/mips/include/asm/ftrace.h +++ b/arch/mips/include/asm/ftrace.h @@ -76,6 +76,8 @@ do { \ #ifdef CONFIG_DYNAMIC_FTRACE +int __init ftrace_dyn_arch_init(void); + static inline unsigned long ftrace_call_adjust(unsigned long addr) { return addr; diff --git a/arch/nds32/kernel/ftrace.c b/arch/nds32/kernel/ftrace.c index 0e23e3a8df6b..f0ef4842d191 100644 --- a/arch/nds32/kernel/ftrace.c +++ b/arch/nds32/kernel/ftrace.c @@ -84,11 +84,6 @@ void _ftrace_caller(unsigned long parent_ip) /* restore all state needed by the compiler epilogue */ } -int __init ftrace_dyn_arch_init(void) -{ - return 0; -} - static unsigned long gen_sethi_insn(unsigned long addr) { unsigned long opcode = 0x4600; diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c index 0a1e75af5382..01581f715737 100644 --- a/arch/parisc/kernel/ftrace.c +++ b/arch/parisc/kernel/ftrace.c @@ -94,11 +94,6 @@ int ftrace_disable_ftrace_graph_caller(void) #endif #ifdef CONFIG_DYNAMIC_FTRACE - -int __init ftrace_dyn_arch_init(void) -{ - return 0; -} int ftrace_update_ftrace_func(ftrace_func_t func) { return 0; diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h index debe8c4f7062..d59f67c0225f 100644 --- a/arch/powerpc/include/asm/ftrace.h +++ b/arch/powerpc/include/asm/ftrace.h @@ -61,6 +61,10 @@ struct dyn_arch_ftrace { }; #endif /* __ASSEMBLY__ */ +#ifdef CONFIG_DYNAMIC_FTRACE
Re: [PATCH v2] ftrace: Cleanup ftrace_dyn_arch_init()
On 2021/9/6 22:22, Michael Ellerman wrote: > Weizhao Ouyang writes: >> Most of ARCHs use empty ftrace_dyn_arch_init(), introduce a weak common >> ftrace_dyn_arch_init() to cleanup them. >> >> Signed-off-by: Weizhao Ouyang >> Acked-by: Heiko Carstens (s390) >> >> --- >> >> Changes in v2: >> -- correct CONFIG_DYNAMIC_FTRACE on PowerPC >> -- add Acked-by tag >> diff --git a/arch/powerpc/include/asm/ftrace.h >> b/arch/powerpc/include/asm/ftrace.h >> index debe8c4f7062..d59f67c0225f 100644 >> --- a/arch/powerpc/include/asm/ftrace.h >> +++ b/arch/powerpc/include/asm/ftrace.h >> @@ -61,6 +61,10 @@ struct dyn_arch_ftrace { >> }; >> #endif /* __ASSEMBLY__ */ >> >> +#ifdef CONFIG_DYNAMIC_FTRACE >> +int __init ftrace_dyn_arch_init(void); >> +#endif /* CONFIG_DYNAMIC_FTRACE */ >> + >> #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS >> #define ARCH_SUPPORTS_FTRACE_OPS 1 >> #endif > That breaks the build for powerpc: > > /linux/arch/powerpc/include/asm/ftrace.h: Assembler messages: > /linux/arch/powerpc/include/asm/ftrace.h:65: Error: unrecognized opcode: > `int' > make[4]: *** [/linux/scripts/Makefile.build:352: > arch/powerpc/kernel/trace/ftrace_64.o] Error 1 > make[3]: *** [/linux/scripts/Makefile.build:514: arch/powerpc/kernel/trace] > Error 2 > make[2]: *** [/linux/scripts/Makefile.build:514: arch/powerpc/kernel] Error > 2 > make[1]: *** [/linux/Makefile:1861: arch/powerpc] Error 2 > make[1]: *** Waiting for unfinished jobs > > It needs to be inside an #ifndef __ASSEMBLY__ section. > > cheers Thanks for reply, I'll fix it up.
Re: [PATCH v2] ftrace: Cleanup ftrace_dyn_arch_init()
Weizhao Ouyang writes: > Most of ARCHs use empty ftrace_dyn_arch_init(), introduce a weak common > ftrace_dyn_arch_init() to cleanup them. > > Signed-off-by: Weizhao Ouyang > Acked-by: Heiko Carstens (s390) > > --- > > Changes in v2: > -- correct CONFIG_DYNAMIC_FTRACE on PowerPC > -- add Acked-by tag > diff --git a/arch/powerpc/include/asm/ftrace.h > b/arch/powerpc/include/asm/ftrace.h > index debe8c4f7062..d59f67c0225f 100644 > --- a/arch/powerpc/include/asm/ftrace.h > +++ b/arch/powerpc/include/asm/ftrace.h > @@ -61,6 +61,10 @@ struct dyn_arch_ftrace { > }; > #endif /* __ASSEMBLY__ */ > > +#ifdef CONFIG_DYNAMIC_FTRACE > +int __init ftrace_dyn_arch_init(void); > +#endif /* CONFIG_DYNAMIC_FTRACE */ > + > #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS > #define ARCH_SUPPORTS_FTRACE_OPS 1 > #endif That breaks the build for powerpc: /linux/arch/powerpc/include/asm/ftrace.h: Assembler messages: /linux/arch/powerpc/include/asm/ftrace.h:65: Error: unrecognized opcode: `int' make[4]: *** [/linux/scripts/Makefile.build:352: arch/powerpc/kernel/trace/ftrace_64.o] Error 1 make[3]: *** [/linux/scripts/Makefile.build:514: arch/powerpc/kernel/trace] Error 2 make[2]: *** [/linux/scripts/Makefile.build:514: arch/powerpc/kernel] Error 2 make[1]: *** [/linux/Makefile:1861: arch/powerpc] Error 2 make[1]: *** Waiting for unfinished jobs It needs to be inside an #ifndef __ASSEMBLY__ section. cheers
Re: [PATCH] powerpc/mce: Fix access error in mce handler
Ganesh Goudar writes: > We queue an irq work for deferred processing of mce event > in realmode mce handler, where translation is disabled. > Queuing of the work may result in accessing memory outside > RMO region, such access needs the translation to be enabled > for an LPAR running with hash mmu else the kernel crashes. > > So enable the translation before queuing the work. > > Without this change following trace is seen on injecting machine > check error in an LPAR running with hash mmu. What type of error are you injecting? > Oops: Kernel access of bad area, sig: 11 [#1] > LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries > CPU: 5 PID: 1883 Comm: insmod Tainted: GOE 5.14.0-mce+ #137 > NIP: c0735d60 LR: c0318640 CTR: > REGS: c0001ebff9a0 TRAP: 0300 Tainted: G OE (5.14.0-mce+) > MSR: 80001003 CR: 28008228 XER: 0001 > CFAR: c031863c DAR: c0027fa8fe08 DSISR: 4000 IRQMASK: 0 > GPR00: c03186d0 c0001ebffc40 c1b0df00 c16337e8 > GPR04: c16337e8 c0027fa8fe08 0023 c16337f0 > GPR08: 0023 c12ffe08 c00801460240 > GPR12: c0001ec9a900 c0002ac4bd00 > GPR16: 05a0 c008006b c008006b05a0 c0ff3068 > GPR20: c0002ac4bbc0 0001 c0002ac4bbc0 c00801490298 > GPR24: c00801490108 c1636198 c00801470090 c00801470058 > GPR28: 0510 c0080100 c0080819 0019 > NIP [c0735d60] llist_add_batch+0x0/0x40 > LR [c0318640] __irq_work_queue_local+0x70/0xc0 > Call Trace: > [c0001ebffc40] [c0001ebffc0c] 0xc0001ebffc0c (unreliable) > [c0001ebffc60] [c03186d0] irq_work_queue+0x40/0x70 > [c0001ebffc80] [c004425c] machine_check_queue_event+0xbc/0xd0 > [c0001ebffcf0] [c000838c] machine_check_early_common+0x16c/0x1f4 > > Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from > handler") Please explain in more detail why that commit caused this breakage. > diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c > index 47a683cd00d2..9d1e39d42e3e 100644 > --- a/arch/powerpc/kernel/mce.c > +++ b/arch/powerpc/kernel/mce.c > @@ -249,6 +249,7 @@ void machine_check_queue_event(void) > { > int index; > struct machine_check_event evt; > + unsigned long msr; > > if (!get_mce_event(, MCE_EVENT_RELEASE)) > return; > @@ -262,8 +263,19 @@ void machine_check_queue_event(void) > memcpy(_paca->mce_info->mce_event_queue[index], > , sizeof(evt)); > > - /* Queue irq work to process this event later. */ > - irq_work_queue(_event_process_work); > + /* Queue irq work to process this event later. Before > + * queuing the work enable translation for non radix LPAR, > + * as irq_work_queue may try to access memory outside RMO > + * region. > + */ > + if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) { > + msr = mfmsr(); > + mtmsr(msr | MSR_IR | MSR_DR); > + irq_work_queue(_event_process_work); > + mtmsr(msr); > + } else { > + irq_work_queue(_event_process_work); > + } > } We already went to virtual mode and queued (different) irq work in arch/powerpc/platforms/pseries/ras.c:mce_handle_error() We also called save_mce_event() which also might have queued irq work, via machine_check_ue_event(). So it really feels like something about the design is wrong if we have to go to virtual mode again and queue more irq work here. I guess we can probably merge this as a backportable fix, doing anything else would be a bigger change. Looking at ras.c there's the comment: * Enable translation as we will be accessing per-cpu variables * in save_mce_event() which may fall outside RMO region, also But AFAICS it's only irq_work_queue() that touches anything percpu? So maybe we should just not be using irq_work_queue(). It's a pretty thin wrapper around set_dec(1), perhaps we just need to hand-roll some real-mode friendly way of doing that. cheers
[PATCH v2] ftrace: Cleanup ftrace_dyn_arch_init()
Most of ARCHs use empty ftrace_dyn_arch_init(), introduce a weak common ftrace_dyn_arch_init() to cleanup them. Signed-off-by: Weizhao Ouyang Acked-by: Heiko Carstens (s390) --- Changes in v2: -- correct CONFIG_DYNAMIC_FTRACE on PowerPC -- add Acked-by tag --- arch/arm/kernel/ftrace.c | 5 - arch/arm64/kernel/ftrace.c| 5 - arch/csky/kernel/ftrace.c | 5 - arch/ia64/kernel/ftrace.c | 6 -- arch/microblaze/kernel/ftrace.c | 5 - arch/mips/include/asm/ftrace.h| 2 ++ arch/nds32/kernel/ftrace.c| 5 - arch/parisc/kernel/ftrace.c | 5 - arch/powerpc/include/asm/ftrace.h | 4 arch/riscv/kernel/ftrace.c| 5 - arch/s390/kernel/ftrace.c | 5 - arch/sh/kernel/ftrace.c | 5 - arch/sparc/kernel/ftrace.c| 5 - arch/x86/kernel/ftrace.c | 5 - include/linux/ftrace.h| 1 - kernel/trace/ftrace.c | 5 + 16 files changed, 11 insertions(+), 62 deletions(-) diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c index 3c83b5d29697..a006585e1c09 100644 --- a/arch/arm/kernel/ftrace.c +++ b/arch/arm/kernel/ftrace.c @@ -193,11 +193,6 @@ int ftrace_make_nop(struct module *mod, return ret; } - -int __init ftrace_dyn_arch_init(void) -{ - return 0; -} #endif /* CONFIG_DYNAMIC_FTRACE */ #ifdef CONFIG_FUNCTION_GRAPH_TRACER diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c index 7f467bd9db7a..fc62dfe73f93 100644 --- a/arch/arm64/kernel/ftrace.c +++ b/arch/arm64/kernel/ftrace.c @@ -236,11 +236,6 @@ void arch_ftrace_update_code(int command) command |= FTRACE_MAY_SLEEP; ftrace_modify_all_code(command); } - -int __init ftrace_dyn_arch_init(void) -{ - return 0; -} #endif /* CONFIG_DYNAMIC_FTRACE */ #ifdef CONFIG_FUNCTION_GRAPH_TRACER diff --git a/arch/csky/kernel/ftrace.c b/arch/csky/kernel/ftrace.c index b4a7ec1517ff..50bfcf129078 100644 --- a/arch/csky/kernel/ftrace.c +++ b/arch/csky/kernel/ftrace.c @@ -133,11 +133,6 @@ int ftrace_update_ftrace_func(ftrace_func_t func) (unsigned long)func, true, true); return ret; } - -int __init ftrace_dyn_arch_init(void) -{ - return 0; -} #endif /* CONFIG_DYNAMIC_FTRACE */ #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS diff --git a/arch/ia64/kernel/ftrace.c b/arch/ia64/kernel/ftrace.c index b2ab2d58fb30..d6360fd404ab 100644 --- a/arch/ia64/kernel/ftrace.c +++ b/arch/ia64/kernel/ftrace.c @@ -194,9 +194,3 @@ int ftrace_update_ftrace_func(ftrace_func_t func) flush_icache_range(addr, addr + 16); return 0; } - -/* run from kstop_machine */ -int __init ftrace_dyn_arch_init(void) -{ - return 0; -} diff --git a/arch/microblaze/kernel/ftrace.c b/arch/microblaze/kernel/ftrace.c index 224eea40e1ee..188749d62709 100644 --- a/arch/microblaze/kernel/ftrace.c +++ b/arch/microblaze/kernel/ftrace.c @@ -163,11 +163,6 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr) return ret; } -int __init ftrace_dyn_arch_init(void) -{ - return 0; -} - int ftrace_update_ftrace_func(ftrace_func_t func) { unsigned long ip = (unsigned long)(_call); diff --git a/arch/mips/include/asm/ftrace.h b/arch/mips/include/asm/ftrace.h index b463f2aa5a61..ed013e767390 100644 --- a/arch/mips/include/asm/ftrace.h +++ b/arch/mips/include/asm/ftrace.h @@ -76,6 +76,8 @@ do { \ #ifdef CONFIG_DYNAMIC_FTRACE +int __init ftrace_dyn_arch_init(void); + static inline unsigned long ftrace_call_adjust(unsigned long addr) { return addr; diff --git a/arch/nds32/kernel/ftrace.c b/arch/nds32/kernel/ftrace.c index 0e23e3a8df6b..f0ef4842d191 100644 --- a/arch/nds32/kernel/ftrace.c +++ b/arch/nds32/kernel/ftrace.c @@ -84,11 +84,6 @@ void _ftrace_caller(unsigned long parent_ip) /* restore all state needed by the compiler epilogue */ } -int __init ftrace_dyn_arch_init(void) -{ - return 0; -} - static unsigned long gen_sethi_insn(unsigned long addr) { unsigned long opcode = 0x4600; diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c index 0a1e75af5382..01581f715737 100644 --- a/arch/parisc/kernel/ftrace.c +++ b/arch/parisc/kernel/ftrace.c @@ -94,11 +94,6 @@ int ftrace_disable_ftrace_graph_caller(void) #endif #ifdef CONFIG_DYNAMIC_FTRACE - -int __init ftrace_dyn_arch_init(void) -{ - return 0; -} int ftrace_update_ftrace_func(ftrace_func_t func) { return 0; diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h index debe8c4f7062..d59f67c0225f 100644 --- a/arch/powerpc/include/asm/ftrace.h +++ b/arch/powerpc/include/asm/ftrace.h @@ -61,6 +61,10 @@ struct dyn_arch_ftrace { }; #endif /* __ASSEMBLY__ */ +#ifdef CONFIG_DYNAMIC_FTRACE +int __init ftrace_dyn_arch_init(void); +#endif /* CONFIG_DYNAMIC_FTRACE */ + #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS #define
Re: [PATCH 1/5] KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest
On 20/08/21 20:51, Mathieu Desnoyers wrote: Ah, or is it the case that rseq_cs is non-NULL if and only if userspace is in an rseq critical section, and because syscalls in critical sections are illegal, by definition clearing rseq_cs is a nop unless userspace is misbehaving. Not quite, as I described above. But we want it to stay set so the CONFIG_DEBUG_RSEQ code executed when returning from ioctl to userspace will be able to validate that it is not nested within a rseq critical section. If that's true, what about explicitly checking that at NOTIFY_RESUME? Or is it not worth the extra code to detect an error that will likely be caught anyways? The error will indeed already be caught on return from ioctl to userspace, so I don't see any added value in duplicating this check. Sean, can you send a v2 (even for this patch only would be okay)? Thanks, Paolo
[PATCH 4/5] PCI: Export pci_dev_lock()
Commit e3a9b1212b9d ("PCI: Export pci_dev_trylock() and pci_dev_unlock()") already exported pci_dev_trylock()/pci_dev_unlock() however in some circumstances such as during error recovery it makes sense to block waiting to get full access to the device so also export pci_dev_lock(). Signed-off-by: Niklas Schnelle --- drivers/pci/pci.c | 3 ++- include/linux/pci.h | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index aacf575c15cf..3f416c6d3b0b 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5040,12 +5040,13 @@ static int pci_reset_bus_function(struct pci_dev *dev, int probe) return pci_parent_bus_reset(dev, probe); } -static void pci_dev_lock(struct pci_dev *dev) +void pci_dev_lock(struct pci_dev *dev) { pci_cfg_access_lock(dev); /* block PM suspend, driver probe, etc. */ device_lock(>dev); } +EXPORT_SYMBOL_GPL(pci_dev_lock); /* Return 1 on successful lock, 0 on contention */ int pci_dev_trylock(struct pci_dev *dev) diff --git a/include/linux/pci.h b/include/linux/pci.h index ea0e23dbc8ec..efc78b8d4696 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1639,6 +1639,7 @@ void pci_cfg_access_lock(struct pci_dev *dev); bool pci_cfg_access_trylock(struct pci_dev *dev); void pci_cfg_access_unlock(struct pci_dev *dev); +void pci_dev_lock(struct pci_dev *dev); int pci_dev_trylock(struct pci_dev *dev); void pci_dev_unlock(struct pci_dev *dev); -- 2.25.1
[PATCH 5/5] s390/pci: implement minimal PCI error recovery
When the platform detects an error on a PCI function or a service action has been performed it is put in the error state and an error event notification is provided to the OS. Currently we treat all error event notifications the same and simply set pdev->error_state = pci_channel_io_perm_failure requiring user intervention such as use of the recover attribute to get the device usable again. Despite requiring a manual step this also has the disadvantage that the device is completely torn down and recreated resulting in higher level devices such as a block or network device being recreated. In case of a block device this also means that it may need to be removed and added to a software raid even if that could otherwise survive with a temporary degradation. This is of course not ideal more so since an error notification with PEC 0x3A indicates that the platform already performed error recovery successfully or that the error state was caused by a service action that is now finished. At least in this case we can assume that the error state can be reset and the function made usable again. So as not to have the disadvantage of a full tear down and recreation we need to coordinate this recovery with the driver. Thankfully there is already a well defined recovery flow for this described in Documentation/PCI/pci-error-recovery.rst. The implementation of this is somewhat straight forward and simplified by the fact that our recovery flow is defined per PCI function. As a reset we use the newly introduced zpci_hot_reset_device() which also takes the PCI function out of the error state. Signed-off-by: Niklas Schnelle --- arch/s390/include/asm/pci.h | 4 +- arch/s390/pci/pci.c | 49 ++ arch/s390/pci/pci_event.c | 190 +++- 3 files changed, 241 insertions(+), 2 deletions(-) diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h index 2a2ed165a270..558877aff2e5 100644 --- a/arch/s390/include/asm/pci.h +++ b/arch/s390/include/asm/pci.h @@ -294,8 +294,10 @@ void zpci_debug_exit(void); void zpci_debug_init_device(struct zpci_dev *, const char *); void zpci_debug_exit_device(struct zpci_dev *); -/* Error reporting */ +/* Error handling */ int zpci_report_error(struct pci_dev *, struct zpci_report_error_header *); +int zpci_clear_error_state(struct zpci_dev *zdev); +int zpci_reset_load_store_blocked(struct zpci_dev *zdev); #ifdef CONFIG_NUMA diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index a6322f45b5bd..77a3e85d43fb 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -954,6 +954,55 @@ int zpci_report_error(struct pci_dev *pdev, } EXPORT_SYMBOL(zpci_report_error); +/** + * zpci_clear_error_state() - Clears the zPCI error state of the device + * @zdev: The zdev for which the zPCI error state should be reset + * + * Clear the zPCI error state of the device. If clearing the zPCI error state + * fails the device is left in the error state. In this case it may make sense + * to call zpci_io_perm_failure() on the associated pdev if it exists. + * + * Returns: 0 on success, -EIO otherwise + */ +int zpci_clear_error_state(struct zpci_dev *zdev) +{ + u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_RESET_ERROR); + struct zpci_fib fib = {0}; + u8 status; + int rc; + + rc = zpci_mod_fc(req, , ); + if (rc) + return -EIO; + + return 0; +} + +/** + * zpci_reset_load_store_blocked() - Re-enables L/S from error state + * @zdev: The zdev for which to unblock load/store access + * + * R-eenables load/store access for a PCI function in the error state while + * keeping DMA blocked. In this state drivers can poke MMIO space to determine + * if error recovery is possible while catching any rogue DMA access from the + * device. + * + * Returns: 0 on success, -EIO otherwise + */ +int zpci_reset_load_store_blocked(struct zpci_dev *zdev) +{ + u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_RESET_BLOCK); + struct zpci_fib fib = {0}; + u8 status; + int rc; + + rc = zpci_mod_fc(req, , ); + if (rc) + return -EIO; + + return 0; +} + static int zpci_mem_init(void) { BUILD_BUG_ON(!is_power_of_2(__alignof__(struct zpci_fmb)) || diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c index e868d996ec5b..ac9ed1572d39 100644 --- a/arch/s390/pci/pci_event.c +++ b/arch/s390/pci/pci_event.c @@ -47,15 +47,190 @@ struct zpci_ccdf_avail { u16 pec;/* PCI event code */ } __packed; +static inline bool ers_result_indicates_abort(pci_ers_result_t ers_res) +{ + switch (ers_res) { + case PCI_ERS_RESULT_CAN_RECOVER: + case PCI_ERS_RESULT_RECOVERED: + case PCI_ERS_RESULT_NEED_RESET: + return false; + default: + return true; + } +} + +static pci_ers_result_t zpci_event_notify_error_detected(struct pci_dev *pdev) +{ +
[PATCH 3/5] PCI: Move pci_dev_is/assign_added() to pci.h
The helper function pci_dev_is_added() from drivers/pci/pci.h is used in PCI arch code of both s390 and powerpc leading to awkward relative includes. Move it to the global include/linux/pci.h and get rid of these includes just for that one function. Signed-off-by: Niklas Schnelle --- arch/powerpc/platforms/powernv/pci-sriov.c | 3 --- arch/powerpc/platforms/pseries/setup.c | 1 - arch/s390/pci/pci_sysfs.c | 2 -- drivers/pci/hotplug/acpiphp_glue.c | 1 - drivers/pci/pci.h | 15 --- include/linux/pci.h| 15 +++ 6 files changed, 15 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c index 28aac933a439..2e0ca5451e85 100644 --- a/arch/powerpc/platforms/powernv/pci-sriov.c +++ b/arch/powerpc/platforms/powernv/pci-sriov.c @@ -9,9 +9,6 @@ #include "pci.h" -/* for pci_dev_is_added() */ -#include "../../../../drivers/pci/pci.h" - /* * The majority of the complexity in supporting SR-IOV on PowerNV comes from * the need to put the MMIO space for each VF into a separate PE. Internally diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 0dfaa6ab44cc..08e846ae1853 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -74,7 +74,6 @@ #include #include "pseries.h" -#include "../../../../drivers/pci/pci.h" DEFINE_STATIC_KEY_FALSE(shared_processor); EXPORT_SYMBOL(shared_processor); diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c index 335c281811c7..40733b93a086 100644 --- a/arch/s390/pci/pci_sysfs.c +++ b/arch/s390/pci/pci_sysfs.c @@ -13,8 +13,6 @@ #include #include -#include "../../../drivers/pci/pci.h" - #include #define zpci_attr(name, fmt, member) \ diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c index f031302ad401..4cb963f88183 100644 --- a/drivers/pci/hotplug/acpiphp_glue.c +++ b/drivers/pci/hotplug/acpiphp_glue.c @@ -38,7 +38,6 @@ #include #include -#include "../pci.h" #include "acpiphp.h" static LIST_HEAD(bridge_list); diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 93dcdd431072..a159cd0f6f05 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev) return dev->error_state == pci_channel_io_perm_failure; } -/* pci_dev priv_flags */ -#define PCI_DEV_ADDED 0 -#define PCI_DPC_RECOVERED 1 -#define PCI_DPC_RECOVERING 2 - -static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) -{ - assign_bit(PCI_DEV_ADDED, >priv_flags, added); -} - -static inline bool pci_dev_is_added(const struct pci_dev *dev) -{ - return test_bit(PCI_DEV_ADDED, >priv_flags); -} - #ifdef CONFIG_PCIEAER #include diff --git a/include/linux/pci.h b/include/linux/pci.h index 540b377ca8f6..ea0e23dbc8ec 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -507,6 +507,21 @@ struct pci_dev { unsigned long priv_flags; /* Private flags for the PCI driver */ }; +/* pci_dev priv_flags */ +#define PCI_DEV_ADDED 0 +#define PCI_DPC_RECOVERED 1 +#define PCI_DPC_RECOVERING 2 + +static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) +{ + assign_bit(PCI_DEV_ADDED, >priv_flags, added); +} + +static inline bool pci_dev_is_added(const struct pci_dev *dev) +{ + return test_bit(PCI_DEV_ADDED, >priv_flags); +} + static inline struct pci_dev *pci_physfn(struct pci_dev *dev) { #ifdef CONFIG_PCI_IOV -- 2.25.1
[PATCH 2/5] s390/pci: implement reset_slot for hotplug slot
This is done by adding a zpci_hot_reset_device() call which does a low level reset of the PCI function without changing its higher level function state. This way it can be used while the zPCI function is bound to a driver and with DMA tables being controlled either through the IOMMU or DMA APIs which is prohibited when using zpci_disable_device() as that drop existing DMA translations. As this reset, unlike a normal FLR, also calls zpci_clear_irq() we need to implement arch_restore_msi_irqs() and make sure we re-enable IRQs for the PCI function if they were previously disabled. Signed-off-by: Niklas Schnelle --- arch/s390/include/asm/pci.h| 1 + arch/s390/pci/pci.c| 58 ++ arch/s390/pci/pci_irq.c| 9 + drivers/pci/hotplug/s390_pci_hpc.c | 24 + 4 files changed, 92 insertions(+) diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h index 5e6cba22a801..2a2ed165a270 100644 --- a/arch/s390/include/asm/pci.h +++ b/arch/s390/include/asm/pci.h @@ -208,6 +208,7 @@ int zpci_disable_device(struct zpci_dev *); int zpci_scan_configured_device(struct zpci_dev *zdev, u32 fh); int zpci_deconfigure_device(struct zpci_dev *zdev); +int zpci_hot_reset_device(struct zpci_dev *zdev); int zpci_register_ioat(struct zpci_dev *, u8, u64, u64, u64); int zpci_unregister_ioat(struct zpci_dev *, u8); void zpci_remove_reserved_devices(void); diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index af22778551c1..a6322f45b5bd 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -723,6 +723,64 @@ int zpci_disable_device(struct zpci_dev *zdev) return rc; } +/** + * zpci_hot_reset_device - perform a reset of the given zPCI function + * @zdev: the slot which should be reset + * + * Performs a low level reset of the zPCI function. The reset is low level in + * the sense that the zPCI function can be reset without detaching it from the + * common PCI subsystem. The reset may be performed while under control of + * either DMA or IOMMU APIs in which case the existing DMA/IOMMU translation + * table is reinstated at the end of the reset. + * + * After the reset the functions internal state is reset to an initial state + * equivalent to its state during boot when first probing a driver. + * Consequently after reset the PCI function requires re-initialization via the + * common PCI code including re-enabling IRQs via pci_alloc_irq_vectors() + * and enabling the function via e.g.pci_enablde_device_flags().The caller + * must guard against concurrent reset attempts. + * + * In most cases this function should not be called directly but through + * pci_reset_function() or pci_reset_bus() which handle the save/restore and + * locking. + * + * Return: 0 on success and an error value otherwise + */ +int zpci_hot_reset_device(struct zpci_dev *zdev) +{ + int rc; + + zpci_dbg(3, "reset fid:%x\n", zdev->fid); + if (zdev_enabled(zdev)) { + /* Disables device access, DMAs and IRQs (reset state) */ + rc = zpci_disable_device(zdev); + /* +* Due to a z/VM vs LPAR inconsistency in the error state the +* FH may indicate an enabled device but disable says the +* device is already disabled don't treat it as an error here. +*/ + if (rc == -EINVAL) + rc = 0; + if (rc) + return rc; + } + + rc = zpci_enable_device(zdev); + if (rc) + return rc; + + if (zdev->dma_table) { + rc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma, + (u64)zdev->dma_table); + if (rc) + return rc; + } else { + zpci_dma_init_device(zdev); + } + + return 0; +} + /** * zpci_create_device() - Create a new zpci_dev and add it to the zbus * @fid: Function ID of the device to be created diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c index 9c7de9089939..ab98e7f5b79b 100644 --- a/arch/s390/pci/pci_irq.c +++ b/arch/s390/pci/pci_irq.c @@ -391,6 +391,15 @@ void arch_teardown_msi_irqs(struct pci_dev *pdev) airq_iv_free(zpci_ibv[0], zdev->msi_first_bit, zdev->msi_nr_irqs); } +void arch_restore_msi_irqs(struct pci_dev *pdev) +{ + struct zpci_dev *zdev = to_zpci(pdev); + + if (!zdev->irqs_registered) + zpci_set_irq(zdev); + default_restore_msi_irqs(pdev); +} + static struct airq_struct zpci_airq = { .handler = zpci_floating_irq_handler, .isc = PCI_ISC, diff --git a/drivers/pci/hotplug/s390_pci_hpc.c b/drivers/pci/hotplug/s390_pci_hpc.c index 014868752cd4..07f28db0eed5 100644 --- a/drivers/pci/hotplug/s390_pci_hpc.c +++ b/drivers/pci/hotplug/s390_pci_hpc.c @@ -57,6 +57,29 @@ static int disable_slot(struct hotplug_slot *hotplug_slot)
[PATCH 0/5] s390/pci: automatic error recovery
Hello, This series implements automatic error recovery for PCI devices on s390 following the scheme outlined at Documentation/PCI/pci-error-recovery.rst it applies on top of currenct master. The patches have are almost completely s390 specific except for two patches exporting existing functionality for use by arch/s390/pci/ code. Nevertheless I would also appreciate any feedback, especially on the last patch, concerning the implementation of the error recovery flow. I believe we might be the first implementation of PCI device recovery in a virtualized setting requiring us to coordinate the device reset with the hypervisor platform by issuing a disable and re-enable to the platform as well as starting the recovery following a platform event. The outline of the patches is as follows: Patch 1 and 2 add s390 specific code implementing a reset mechanism that takes the PCI function out of the platform specific error state. Patches 3 and 4 export existing common code functions for use by the s390 specific recovery code. Patch 3 I already sent separately resulting in the discussion below but without a final conclusion. https://lore.kernel.org/lkml/20210720150145.640727-1-schne...@linux.ibm.com/ I believe even though there were some doubts about the use of pci_dev_is_added() by arch code the existing uses as well as the use in the final patch of this series warrant this export. Patch 4 "PCI: Export pci_dev_lock()" is basically an extension to commit e3a9b1212b9d ("PCI: Export pci_dev_trylock() and pci_dev_unlock()") which already exported pci_dev_trylock(). In the final patch we make use of pci_dev_lock() to wait for any other exclusive uses of the pdev to be finished before starting recovery. Finally Patch 5 implements the recovery flow as part of the existing s390 specific PCI availability and error event mechanism. Where previously the error case only set an error indication requiring manual intervention to make the device usable again. Now we handle the case where firmware has already reset a PCI function after an error was encountered informing the OS that it should be ready to be used again. Note that the same event is also issued by the hypervisor if the function was manually taken into a service mode for example for firmware upgrade via the hypervisor and is now ready to be used again. Thanks, Niklas Schnelle Niklas Schnelle (5): s390/pci: refresh function handle in iomap s390/pci: implement reset_slot for hotplug slot PCI: Move pci_dev_is/assign_added() to pci.h PCI: Export pci_dev_lock() s390/pci: implement minimal PCI error recovery arch/powerpc/platforms/powernv/pci-sriov.c | 3 - arch/powerpc/platforms/pseries/setup.c | 1 - arch/s390/include/asm/pci.h| 6 +- arch/s390/pci/pci.c| 143 ++- arch/s390/pci/pci_event.c | 196 - arch/s390/pci/pci_insn.c | 4 +- arch/s390/pci/pci_irq.c| 9 + arch/s390/pci/pci_sysfs.c | 2 - drivers/pci/hotplug/acpiphp_glue.c | 1 - drivers/pci/hotplug/s390_pci_hpc.c | 24 +++ drivers/pci/pci.c | 3 +- drivers/pci/pci.h | 15 -- include/linux/pci.h| 16 ++ 13 files changed, 389 insertions(+), 34 deletions(-) -- 2.25.1
[PATCH 1/5] s390/pci: refresh function handle in iomap
The function handle of a PCI function is updated when disabling or enabling it as well as when the function's availability changes or it enters the error state. Until now this only occurred either while there is no struct pci_dev associated with the function yet or the function became unavailable. This meant that leaving a stale function handle in the iomap either didn't happen because there was no iomap yet or it lead to errors on PCI access but so would the correct disabled function handle. In the future a CLP Set PCI Function Disable/Enable cycle during PCI device recovery may be done while the device is bound to a driver. In this case we must update the iomap associated with the now-stale function handle to ensure that the resulting zPCI instruction references an accurate function handle. Since the function handle is accessed by the PCI accessor helpers without locking use READ_ONCE()/WRITE_ONCE() to mark this access and prevent compiler optimizations that would move the load/store. With that infrastructure in place let's also properly update the function handle in the existing cases. This makes sure that in the future debugging of a zPCI function access through the handle will show an up to date handle reducing the chance of confusion. Also it makes sure we have one single place where a zPCI function handle is updated after initialization. Signed-off-by: Niklas Schnelle --- arch/s390/include/asm/pci.h | 1 + arch/s390/pci/pci.c | 36 arch/s390/pci/pci_event.c | 6 +++--- arch/s390/pci/pci_insn.c| 4 ++-- 4 files changed, 38 insertions(+), 9 deletions(-) diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h index e4803ec51110..5e6cba22a801 100644 --- a/arch/s390/include/asm/pci.h +++ b/arch/s390/include/asm/pci.h @@ -211,6 +211,7 @@ int zpci_deconfigure_device(struct zpci_dev *zdev); int zpci_register_ioat(struct zpci_dev *, u8, u64, u64, u64); int zpci_unregister_ioat(struct zpci_dev *, u8); void zpci_remove_reserved_devices(void); +void zpci_update_fh(struct zpci_dev *zdev, u32 fh); /* CLP */ int clp_setup_writeback_mio(void); diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index e7e6788d75a8..af22778551c1 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -481,6 +481,34 @@ static void zpci_free_iomap(struct zpci_dev *zdev, int entry) spin_unlock(_iomap_lock); } +static void zpci_do_update_iomap_fh(struct zpci_dev *zdev, u32 fh) +{ + int bar, idx; + + spin_lock(_iomap_lock); + for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) { + if (!zdev->bars[bar].size) + continue; + idx = zdev->bars[bar].map_idx; + if (!zpci_iomap_start[idx].count) + continue; + WRITE_ONCE(zpci_iomap_start[idx].fh, zdev->fh); + } + spin_unlock(_iomap_lock); +} + +void zpci_update_fh(struct zpci_dev *zdev, u32 fh) +{ + if (!fh || zdev->fh == fh) + return; + + zdev->fh = fh; + if (zpci_use_mio(zdev)) + return; + if (zdev->has_resources && zdev_enabled(zdev)) + zpci_do_update_iomap_fh(zdev, fh); +} + static struct resource *__alloc_res(struct zpci_dev *zdev, unsigned long start, unsigned long size, unsigned long flags) { @@ -668,7 +696,7 @@ int zpci_enable_device(struct zpci_dev *zdev) if (clp_enable_fh(zdev, , ZPCI_NR_DMA_SPACES)) rc = -EIO; else - zdev->fh = fh; + zpci_update_fh(zdev, fh); return rc; } @@ -679,14 +707,14 @@ int zpci_disable_device(struct zpci_dev *zdev) cc = clp_disable_fh(zdev, ); if (!cc) { - zdev->fh = fh; + zpci_update_fh(zdev, fh); } else if (cc == CLP_RC_SETPCIFN_ALRDY) { pr_info("Disabling PCI function %08x had no effect as it was already disabled\n", zdev->fid); /* Function is already disabled - update handle */ rc = clp_refresh_fh(zdev->fid, ); if (!rc) { - zdev->fh = fh; + zpci_update_fh(zdev, fh); rc = -EINVAL; } } else { @@ -768,7 +796,7 @@ int zpci_scan_configured_device(struct zpci_dev *zdev, u32 fh) { int rc; - zdev->fh = fh; + zpci_update_fh(zdev, fh); /* the PCI function will be scanned once function 0 appears */ if (!zdev->zbus->bus) return 0; diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c index c856f80cb21b..e868d996ec5b 100644 --- a/arch/s390/pci/pci_event.c +++ b/arch/s390/pci/pci_event.c @@ -76,7 +76,7 @@ void zpci_event_error(void *data) static void zpci_event_hard_deconfigured(struct zpci_dev *zdev, u32 fh) { - zdev->fh = fh; + zpci_update_fh(zdev, fh);
[RFC PATCH v2] powerpc/papr_scm: Move duplicate definitions to common header files
papr_scm and ndtest share common PDSM payload structs like nd_papr_pdsm_health. Presently these structs are duplicated across papr_pdsm.h and ndtest.h header files. Since 'ndtest' is essentially arch independent and can run on platforms other than PPC64, a way needs to be deviced to avoid redundancy and duplication of PDSM structs in future. So the patch proposes moving the PDSM header from arch/powerpc/include/uapi/ to the generic include/uapi/linux directory. Also, there are some #defines common between papr_scm and ndtest which are not exported to the user space. So, move them to a header file which can be shared across ndtest and papr_scm via newly introduced include/linux/papr_scm.h. Signed-off-by: Shivaprasad G Bhat Signed-off-by: Vaibhav Jain Suggested-by: "Aneesh Kumar K.V" --- Changelog: Since v1: Link: https://patchwork.kernel.org/project/linux-nvdimm/patch/162505488483.72147.12741153746322191381.stgit@56e104a48989/ * Removed dependency on this patch for the other patches MAINTAINERS |2 arch/powerpc/include/uapi/asm/papr_pdsm.h | 165 - arch/powerpc/platforms/pseries/papr_scm.c | 43 include/linux/papr_scm.h | 48 include/uapi/linux/papr_pdsm.h| 165 + tools/testing/nvdimm/test/ndtest.c|2 tools/testing/nvdimm/test/ndtest.h| 120 - 7 files changed, 219 insertions(+), 326 deletions(-) delete mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h create mode 100644 include/linux/papr_scm.h create mode 100644 include/uapi/linux/papr_pdsm.h diff --git a/MAINTAINERS b/MAINTAINERS index 6c8be735cc91..03fe0c77cefa 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -10661,6 +10661,8 @@ F: drivers/rtc/rtc-opal.c F: drivers/scsi/ibmvscsi/ F: drivers/tty/hvc/hvc_opal.c F: drivers/watchdog/wdrtas.c +F: include/linux/papr_scm.h +F: include/uapi/linux/papr_pdsm.h F: tools/testing/selftests/powerpc N: /pmac N: powermac diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h deleted file mode 100644 index 17439925045c.. --- a/arch/powerpc/include/uapi/asm/papr_pdsm.h +++ /dev/null @@ -1,165 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ -/* - * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl - * - * (C) Copyright IBM 2020 - * - * Author: Vaibhav Jain - */ - -#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_ -#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_ - -#include -#include - -/* - * PDSM Envelope: - * - * The ioctl ND_CMD_CALL exchange data between user-space and kernel via - * envelope which consists of 2 headers sections and payload sections as - * illustrated below: - * +-+---+---+ - * | 64-Bytes | 8-Bytes | Max 184-Bytes | - * +-+---+---+ - * | ND-HEADER | PDSM-HEADER | PDSM-PAYLOAD | - * +-+---+---+ - * | nd_family | | | - * | nd_size_out | cmd_status| | - * | nd_size_in | reserved | nd_pdsm_payload | - * | nd_command | payload --> | | - * | nd_fw_size | | | - * | nd_payload ---> | | | - * +---+-+---+ - * - * ND Header: - * This is the generic libnvdimm header described as 'struct nd_cmd_pkg' - * which is interpreted by libnvdimm before passed on to papr_scm. Important - * member fields used are: - * 'nd_family' : (In) NVDIMM_FAMILY_PAPR_SCM - * 'nd_size_in': (In) PDSM-HEADER + PDSM-IN-PAYLOAD (usually 0) - * 'nd_size_out': (In) PDSM-HEADER + PDSM-RETURN-PAYLOAD - * 'nd_command' : (In) One of PAPR_PDSM_XXX - * 'nd_fw_size' : (Out) PDSM-HEADER + size of actual payload returned - * - * PDSM Header: - * This is papr-scm specific header that precedes the payload. This is defined - * as nd_cmd_pdsm_pkg. Following fields aare available in this header: - * - * 'cmd_status': (Out) Errors if any encountered while servicing PDSM. - * 'reserved' : Not used, reserved for future and should be set to 0. - * 'payload': A union of all the possible payload structs - * - * PDSM Payload: - * - * The layout of the PDSM Payload is defined by various structs shared between - * papr_scm and libndctl so that contents of payload can be interpreted. As such - * its defined as a union of all possible payload structs as - * 'union nd_pdsm_payload'. Based on the value of 'nd_cmd_pkg.nd_command' - * appropriate member of the union is accessed. - */ - -/* Max
[PATCH v2] tests/nvdimm/ndtest: Simulate nvdimm health, DSC and smart-inject
The 'papr_scm' module and 'papr' implementation in libndctl supports PDSMs for reporting PAPR NVDIMM health, dirty-shutdown-count and injecting smart-errors. This patch adds support for those PDSMs in ndtest module so that PDSM specific paths in libndctl can be exercised. Signed-off-by: Shivaprasad G Bhat Signed-off-by: Vaibhav Jain --- Changelog: Since v1: Link: https://patchwork.kernel.org/project/linux-nvdimm/list/?series=521767 * Removed the dependency on a header movement patch tools/testing/nvdimm/test/ndtest.c | 148 tools/testing/nvdimm/test/ndtest.h | 96 +++ 2 files changed, 244 insertions(+) diff --git a/tools/testing/nvdimm/test/ndtest.c b/tools/testing/nvdimm/test/ndtest.c index 6862915f1fb0..45d42cd25e82 100644 --- a/tools/testing/nvdimm/test/ndtest.c +++ b/tools/testing/nvdimm/test/ndtest.c @@ -48,6 +48,10 @@ static struct ndtest_dimm dimm_group1[] = { .uuid_str = "1e5c75d2-b618-11ea-9aa3-507b9ddc0f72", .physical_id = 0, .num_formats = 2, + .flags = PAPR_PMEM_HEALTH_NON_CRITICAL, + .extension_flags = PDSM_DIMM_DSC_VALID | PDSM_DIMM_HEALTH_RUN_GAUGE_VALID, + .dimm_fuel_gauge = 95, + .dimm_dsc = 42, }, { .size = DIMM_SIZE, @@ -55,6 +59,10 @@ static struct ndtest_dimm dimm_group1[] = { .uuid_str = "1c4d43ac-b618-11ea-be80-507b9ddc0f72", .physical_id = 1, .num_formats = 2, + .flags = PAPR_PMEM_HEALTH_NON_CRITICAL, + .extension_flags = PDSM_DIMM_DSC_VALID | PDSM_DIMM_HEALTH_RUN_GAUGE_VALID, + .dimm_fuel_gauge = 95, + .dimm_dsc = 42, }, { .size = DIMM_SIZE, @@ -62,6 +70,10 @@ static struct ndtest_dimm dimm_group1[] = { .uuid_str = "a9f17ffc-b618-11ea-b36d-507b9ddc0f72", .physical_id = 2, .num_formats = 2, + .flags = PAPR_PMEM_HEALTH_NON_CRITICAL, + .extension_flags = PDSM_DIMM_DSC_VALID | PDSM_DIMM_HEALTH_RUN_GAUGE_VALID, + .dimm_fuel_gauge = 95, + .dimm_dsc = 42, }, { .size = DIMM_SIZE, @@ -69,6 +81,10 @@ static struct ndtest_dimm dimm_group1[] = { .uuid_str = "b6b83b22-b618-11ea-8aae-507b9ddc0f72", .physical_id = 3, .num_formats = 2, + .flags = PAPR_PMEM_HEALTH_NON_CRITICAL, + .extension_flags = PDSM_DIMM_DSC_VALID | PDSM_DIMM_HEALTH_RUN_GAUGE_VALID, + .dimm_fuel_gauge = 95, + .dimm_dsc = 42, }, { .size = DIMM_SIZE, @@ -296,6 +312,103 @@ static int ndtest_get_config_size(struct ndtest_dimm *dimm, unsigned int buf_len return 0; } +static int ndtest_pdsm_health(struct ndtest_dimm *dimm, + union nd_pdsm_payload *payload, + unsigned int buf_len) +{ + struct nd_papr_pdsm_health *health = >health; + + if (buf_len < sizeof(health)) + return -EINVAL; + + health->extension_flags = 0; + health->dimm_unarmed = !!(dimm->flags & PAPR_PMEM_UNARMED_MASK); + health->dimm_bad_shutdown = !!(dimm->flags & PAPR_PMEM_BAD_SHUTDOWN_MASK); + health->dimm_bad_restore = !!(dimm->flags & PAPR_PMEM_BAD_RESTORE_MASK); + health->dimm_health = PAPR_PDSM_DIMM_HEALTHY; + + if (dimm->flags & PAPR_PMEM_HEALTH_FATAL) + health->dimm_health = PAPR_PDSM_DIMM_FATAL; + else if (dimm->flags & PAPR_PMEM_HEALTH_CRITICAL) + health->dimm_health = PAPR_PDSM_DIMM_CRITICAL; + else if (dimm->flags & PAPR_PMEM_HEALTH_UNHEALTHY || +dimm->flags & PAPR_PMEM_HEALTH_NON_CRITICAL) + health->dimm_health = PAPR_PDSM_DIMM_UNHEALTHY; + + health->extension_flags = 0; + if (dimm->extension_flags & PDSM_DIMM_HEALTH_RUN_GAUGE_VALID) { + health->dimm_fuel_gauge = dimm->dimm_fuel_gauge; + health->extension_flags |= PDSM_DIMM_HEALTH_RUN_GAUGE_VALID; + } + if (dimm->extension_flags & PDSM_DIMM_DSC_VALID) { + health->dimm_dsc = dimm->dimm_dsc; + health->extension_flags |= PDSM_DIMM_DSC_VALID; + } + + return 0; +} + +static void smart_notify(struct ndtest_dimm *dimm) +{ + struct device *bus = dimm->dev->parent; + + if (!(dimm->flags & PAPR_PMEM_HEALTH_NON_CRITICAL) || + (dimm->flags & PAPR_PMEM_BAD_SHUTDOWN_MASK)) { + device_lock(bus); + /* send smart notification */ + if (dimm->notify_handle) + sysfs_notify_dirent(dimm->notify_handle); + device_unlock(bus); + } +} + +static int ndtest_pdsm_smart_inject(struct ndtest_dimm *dimm, + union
[PATCH v2] powerpc/papr_scm: Implement initial support for injecting smart errors
From: Vaibhav Jain Presently PAPR doesn't support injecting smart errors on an NVDIMM. This makes testing the NVDIMM health reporting functionality difficult as simulating NVDIMM health related events need a hacked up qemu version. To solve this problem this patch proposes simulating certain set of NVDIMM health related events in papr_scm. Specifically 'fatal' health state and 'dirty' shutdown state. These error can be injected via the user-space 'ndctl-inject-smart(1)' command. With the proposed patch and corresponding ndctl patches following command flow is expected: $ sudo ndctl list -DH -d nmem0 ... "health_state":"ok", "shutdown_state":"clean", ... # inject unsafe shutdown and fatal health error $ sudo ndctl inject-smart nmem0 -Uf ... "health_state":"fatal", "shutdown_state":"dirty", ... # uninject all errors $ sudo ndctl inject-smart nmem0 -N ... "health_state":"ok", "shutdown_state":"clean", ... The patch adds two members 'health_bitmap_mask' and 'health_bitmap_override' inside struct papr_scm_priv which are then bit blt'ed to the health bitmaps fetched from the hypervisor. In case we are not able to fetch health information from the hypervisor we service the health bitmap from these two members. These members are accessible from sysfs at nmemX/papr/health_bitmap_override A new PDSM named 'SMART_INJECT' is proposed that accepts newly introduced 'struct nd_papr_pdsm_smart_inject' as payload thats exchanged between libndctl and papr_scm to indicate the requested smart-error states. When the processing the PDSM 'SMART_INJECT', papr_pdsm_smart_inject() constructs a pair or 'mask' and 'override' bitmaps from the payload and bit-blt it to the 'health_bitmap_{mask, override}' members. This ensures the after being fetched from the hypervisor, the health_bitmap reflects requested smart-error states. Signed-off-by: Vaibhav Jain Signed-off-by: Shivaprasad G Bhat --- Changelog: Since v1: Link: https://patchwork.kernel.org/project/linux-nvdimm/list/?series=513881 * Updated the patch description. * Removed dependency of a header movement patch. * Removed '__packed' attribute for 'struct nd_papr_pdsm_smart_inject' [Aneesh] arch/powerpc/include/uapi/asm/papr_pdsm.h | 18 ++ arch/powerpc/platforms/pseries/papr_scm.c | 94 - 2 files changed, 109 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h index 82488b1e7276..17439925045c 100644 --- a/arch/powerpc/include/uapi/asm/papr_pdsm.h +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h @@ -116,6 +116,22 @@ struct nd_papr_pdsm_health { }; }; +/* Flags for injecting specific smart errors */ +#define PDSM_SMART_INJECT_HEALTH_FATAL (1 << 0) +#define PDSM_SMART_INJECT_BAD_SHUTDOWN (1 << 1) + +struct nd_papr_pdsm_smart_inject { + union { + struct { + /* One or more of PDSM_SMART_INJECT_ */ + __u32 flags; + __u8 fatal_enable; + __u8 unsafe_shutdown_enable; + }; + __u8 buf[ND_PDSM_PAYLOAD_MAX_SIZE]; + }; +}; + /* * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel * via 'nd_cmd_pkg.nd_command' member of the ioctl struct @@ -123,12 +139,14 @@ struct nd_papr_pdsm_health { enum papr_pdsm { PAPR_PDSM_MIN = 0x0, PAPR_PDSM_HEALTH, + PAPR_PDSM_SMART_INJECT, PAPR_PDSM_MAX, }; /* Maximal union that can hold all possible payload types */ union nd_pdsm_payload { struct nd_papr_pdsm_health health; + struct nd_papr_pdsm_smart_inject smart_inject; __u8 buf[ND_PDSM_PAYLOAD_MAX_SIZE]; } __packed; diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index f48e87ac89c9..de4cf329cfb3 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -68,6 +68,10 @@ #define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS) #define PAPR_SCM_PERF_STATS_VERSION 0x1 +/* Use bitblt method to override specific bits in the '_bitmap_' */ +#define BITBLT_BITMAP(_bitmap_, _mask_, _override_)\ + (((_bitmap_) & ~(_mask_)) | ((_mask_) & (_override_))) + /* Struct holding a single performance metric */ struct papr_scm_perf_stat { u8 stat_id[8]; @@ -120,6 +124,12 @@ struct papr_scm_priv { /* length of the stat buffer as expected by phyp */ size_t stat_buffer_len; + + /* The bits which needs to be overridden */ + u64 health_bitmap_mask; + + /* The overridden values for the bits having the masks set */ + u64 health_bitmap_override; }; static int papr_scm_pmem_flush(struct nd_region *nd_region, @@ -347,19 +357,28 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv *p, static int __drc_pmem_query_health(struct papr_scm_priv
[PATCH v3 3/3] powerpc/mce: Modify the real address error logging messages
To avoid ambiguity, modify the strings in real address error logging messages to "foreign/control memory" from "foreign", Since the error discriptions in P9 user manual and P10 user manual are different for same type of errors. P9 User Manual for MCE: DSISR:59 Host real address to foreign space during translation. DSISR:60 Host real address to foreign space on a load or store access. P10 User Manual for MCE: DSISR:59 D-side tablewalk used a host real address in the control memory address range. DSISR:60 D-side operand access to control memory address space. Signed-off-by: Ganesh Goudar --- v3: No changes. v2: No changes. --- arch/powerpc/kernel/mce.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index 9d1e39d42e3e..5baf69503349 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -400,14 +400,14 @@ void machine_check_print_event_info(struct machine_check_event *evt, static const char *mc_ra_types[] = { "Indeterminate", "Instruction fetch (bad)", - "Instruction fetch (foreign)", + "Instruction fetch (foreign/control memory)", "Page table walk ifetch (bad)", - "Page table walk ifetch (foreign)", + "Page table walk ifetch (foreign/control memory)", "Load (bad)", "Store (bad)", "Page table walk Load/Store (bad)", - "Page table walk Load/Store (foreign)", - "Load/Store (foreign)", + "Page table walk Load/Store (foreign/control memory)", + "Load/Store (foreign/control memory)", }; static const char *mc_link_types[] = { "Indeterminate", -- 2.31.1
[PATCH v3 2/3] selftests/powerpc: Add test for real address error handling
Add test for real address or control memory address access error handling, using NX-GZIP engine. The error is injected by accessing the control memory address using illegal instruction, on successful handling the process attempting to access control memory address using illegal instruction receives SIGBUS. Signed-off-by: Ganesh Goudar --- v3: Avoid using shell script to inject error. v2: Fix build error. --- tools/testing/selftests/powerpc/Makefile | 3 +- tools/testing/selftests/powerpc/mce/Makefile | 7 ++ .../selftests/powerpc/mce/inject-ra-err.c | 65 +++ tools/testing/selftests/powerpc/mce/vas-api.h | 1 + 4 files changed, 75 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/mce/Makefile create mode 100644 tools/testing/selftests/powerpc/mce/inject-ra-err.c create mode 12 tools/testing/selftests/powerpc/mce/vas-api.h diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile index 0830e63818c1..4830372d7416 100644 --- a/tools/testing/selftests/powerpc/Makefile +++ b/tools/testing/selftests/powerpc/Makefile @@ -31,7 +31,8 @@ SUB_DIRS = alignment \ vphn \ math \ ptrace \ - security + security \ + mce endif diff --git a/tools/testing/selftests/powerpc/mce/Makefile b/tools/testing/selftests/powerpc/mce/Makefile new file mode 100644 index ..2424513982d9 --- /dev/null +++ b/tools/testing/selftests/powerpc/mce/Makefile @@ -0,0 +1,7 @@ +#SPDX-License-Identifier: GPL-2.0-or-later + +TEST_GEN_PROGS := inject-ra-err + +include ../../lib.mk + +$(TEST_GEN_PROGS): ../harness.c diff --git a/tools/testing/selftests/powerpc/mce/inject-ra-err.c b/tools/testing/selftests/powerpc/mce/inject-ra-err.c new file mode 100644 index ..94323c34d9a6 --- /dev/null +++ b/tools/testing/selftests/powerpc/mce/inject-ra-err.c @@ -0,0 +1,65 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "vas-api.h" +#include "utils.h" + +static bool faulted; + +static void sigbus_handler(int n, siginfo_t *info, void *ctxt_v) +{ + ucontext_t *ctxt = (ucontext_t *)ctxt_v; + struct pt_regs *regs = ctxt->uc_mcontext.regs; + + faulted = true; + regs->nip += 4; +} + +static int test_ra_error(void) +{ + struct vas_tx_win_open_attr attr; + int fd, *paste_addr; + char *devname = "/dev/crypto/nx-gzip"; + struct sigaction act = { + .sa_sigaction = sigbus_handler, + .sa_flags = SA_SIGINFO, + }; + + memset(, 0, sizeof(attr)); + attr.version = 1; + attr.vas_id = 0; + + SKIP_IF(access(devname, F_OK)); + + fd = open(devname, O_RDWR); + FAIL_IF(fd < 0); + FAIL_IF(ioctl(fd, VAS_TX_WIN_OPEN, ) < 0); + FAIL_IF(sigaction(SIGBUS, , NULL) != 0); + + paste_addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0ULL); + + /* The following assignment triggers exception */ + mb(); + *paste_addr = 1; + mb(); + + FAIL_IF(!faulted); + + return 0; +} + +int main(void) +{ + return test_harness(test_ra_error, "inject-ra-err"); +} + diff --git a/tools/testing/selftests/powerpc/mce/vas-api.h b/tools/testing/selftests/powerpc/mce/vas-api.h new file mode 12 index ..1455c1bcd351 --- /dev/null +++ b/tools/testing/selftests/powerpc/mce/vas-api.h @@ -0,0 +1 @@ +../../../../../arch/powerpc/include/uapi/asm/vas-api.h \ No newline at end of file -- 2.31.1
[PATCH v3 1/3] powerpc/pseries: Parse control memory access error
Add support to parse and log control memory access error for pseries. These changes are made according to PAPR v2.11 10.3.2.2.12. Signed-off-by: Ganesh Goudar --- v3: Modify the commit log to mention the document according to which changes are made. Define and use a macro to check if the effective address is provided. v2: No changes. --- arch/powerpc/platforms/pseries/ras.c | 36 1 file changed, 32 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c index 56092dccfdb8..e62a0ca2611a 100644 --- a/arch/powerpc/platforms/pseries/ras.c +++ b/arch/powerpc/platforms/pseries/ras.c @@ -60,11 +60,17 @@ struct pseries_mc_errorlog { * XX 2: Reserved. *XXX 3: Type of UE error. * -* For error_type != MC_ERROR_TYPE_UE +* For error_type == MC_ERROR_TYPE_SLB/ERAT/TLB * * X 1: Effective address provided. *X 5: Reserved. * XX 2: Type of SLB/ERAT/TLB error. +* +* For error_type == MC_ERROR_TYPE_CTRL_MEM_ACCESS +* +* X 1: Error causing address provided. +*XXX 3: Type of error. +* 4: Reserved. */ u8 sub_err_type; u8 reserved_1[6]; @@ -80,6 +86,7 @@ struct pseries_mc_errorlog { #define MC_ERROR_TYPE_TLB 0x04 #define MC_ERROR_TYPE_D_CACHE 0x05 #define MC_ERROR_TYPE_I_CACHE 0x07 +#define MC_ERROR_TYPE_CTRL_MEM_ACCESS 0x08 /* RTAS pseries MCE error sub types */ #define MC_ERROR_UE_INDETERMINATE 0 @@ -90,6 +97,7 @@ struct pseries_mc_errorlog { #define UE_EFFECTIVE_ADDR_PROVIDED 0x40 #define UE_LOGICAL_ADDR_PROVIDED 0x20 +#define MC_EFFECTIVE_ADDR_PROVIDED 0x80 #define MC_ERROR_SLB_PARITY0 #define MC_ERROR_SLB_MULTIHIT 1 @@ -103,6 +111,9 @@ struct pseries_mc_errorlog { #define MC_ERROR_TLB_MULTIHIT 2 #define MC_ERROR_TLB_INDETERMINATE 3 +#define MC_ERROR_CTRL_MEM_ACCESS_PTABLE_WALK 0 +#define MC_ERROR_CTRL_MEM_ACCESS_OP_ACCESS 1 + static inline u8 rtas_mc_error_sub_type(const struct pseries_mc_errorlog *mlog) { switch (mlog->error_type) { @@ -112,6 +123,8 @@ static inline u8 rtas_mc_error_sub_type(const struct pseries_mc_errorlog *mlog) caseMC_ERROR_TYPE_ERAT: caseMC_ERROR_TYPE_TLB: return (mlog->sub_err_type & 0x03); + caseMC_ERROR_TYPE_CTRL_MEM_ACCESS: + return (mlog->sub_err_type & 0x70) >> 4; default: return 0; } @@ -656,7 +669,7 @@ static int mce_handle_err_virtmode(struct pt_regs *regs, mce_err.u.slb_error_type = MCE_SLB_ERROR_INDETERMINATE; break; } - if (mce_log->sub_err_type & 0x80) + if (mce_log->sub_err_type & MC_EFFECTIVE_ADDR_PROVIDED) eaddr = be64_to_cpu(mce_log->effective_address); break; case MC_ERROR_TYPE_ERAT: @@ -673,7 +686,7 @@ static int mce_handle_err_virtmode(struct pt_regs *regs, mce_err.u.erat_error_type = MCE_ERAT_ERROR_INDETERMINATE; break; } - if (mce_log->sub_err_type & 0x80) + if (mce_log->sub_err_type & MC_EFFECTIVE_ADDR_PROVIDED) eaddr = be64_to_cpu(mce_log->effective_address); break; case MC_ERROR_TYPE_TLB: @@ -690,7 +703,7 @@ static int mce_handle_err_virtmode(struct pt_regs *regs, mce_err.u.tlb_error_type = MCE_TLB_ERROR_INDETERMINATE; break; } - if (mce_log->sub_err_type & 0x80) + if (mce_log->sub_err_type & MC_EFFECTIVE_ADDR_PROVIDED) eaddr = be64_to_cpu(mce_log->effective_address); break; case MC_ERROR_TYPE_D_CACHE: @@ -699,6 +712,21 @@ static int mce_handle_err_virtmode(struct pt_regs *regs, case MC_ERROR_TYPE_I_CACHE: mce_err.error_type = MCE_ERROR_TYPE_ICACHE; break; + case MC_ERROR_TYPE_CTRL_MEM_ACCESS: + mce_err.error_type = MCE_ERROR_TYPE_RA; + switch (err_sub_type) { + case MC_ERROR_CTRL_MEM_ACCESS_PTABLE_WALK: + mce_err.u.ra_error_type = + MCE_RA_ERROR_PAGE_TABLE_WALK_LOAD_STORE_FOREIGN; + break; + case MC_ERROR_CTRL_MEM_ACCESS_OP_ACCESS: + mce_err.u.ra_error_type = + MCE_RA_ERROR_LOAD_STORE_FOREIGN; + break; + } + if (mce_log->sub_err_type & MC_EFFECTIVE_ADDR_PROVIDED) +
[PATCH] powerpc/mce: Fix access error in mce handler
We queue an irq work for deferred processing of mce event in realmode mce handler, where translation is disabled. Queuing of the work may result in accessing memory outside RMO region, such access needs the translation to be enabled for an LPAR running with hash mmu else the kernel crashes. So enable the translation before queuing the work. Without this change following trace is seen on injecting machine check error in an LPAR running with hash mmu. Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries CPU: 5 PID: 1883 Comm: insmod Tainted: GOE 5.14.0-mce+ #137 NIP: c0735d60 LR: c0318640 CTR: REGS: c0001ebff9a0 TRAP: 0300 Tainted: G OE (5.14.0-mce+) MSR: 80001003 CR: 28008228 XER: 0001 CFAR: c031863c DAR: c0027fa8fe08 DSISR: 4000 IRQMASK: 0 GPR00: c03186d0 c0001ebffc40 c1b0df00 c16337e8 GPR04: c16337e8 c0027fa8fe08 0023 c16337f0 GPR08: 0023 c12ffe08 c00801460240 GPR12: c0001ec9a900 c0002ac4bd00 GPR16: 05a0 c008006b c008006b05a0 c0ff3068 GPR20: c0002ac4bbc0 0001 c0002ac4bbc0 c00801490298 GPR24: c00801490108 c1636198 c00801470090 c00801470058 GPR28: 0510 c0080100 c0080819 0019 NIP [c0735d60] llist_add_batch+0x0/0x40 LR [c0318640] __irq_work_queue_local+0x70/0xc0 Call Trace: [c0001ebffc40] [c0001ebffc0c] 0xc0001ebffc0c (unreliable) [c0001ebffc60] [c03186d0] irq_work_queue+0x40/0x70 [c0001ebffc80] [c004425c] machine_check_queue_event+0xbc/0xd0 [c0001ebffcf0] [c000838c] machine_check_early_common+0x16c/0x1f4 Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from handler") Signed-off-by: Ganesh Goudar --- arch/powerpc/kernel/mce.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index 47a683cd00d2..9d1e39d42e3e 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -249,6 +249,7 @@ void machine_check_queue_event(void) { int index; struct machine_check_event evt; + unsigned long msr; if (!get_mce_event(, MCE_EVENT_RELEASE)) return; @@ -262,8 +263,19 @@ void machine_check_queue_event(void) memcpy(_paca->mce_info->mce_event_queue[index], , sizeof(evt)); - /* Queue irq work to process this event later. */ - irq_work_queue(_event_process_work); + /* Queue irq work to process this event later. Before +* queuing the work enable translation for non radix LPAR, +* as irq_work_queue may try to access memory outside RMO +* region. +*/ + if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) { + msr = mfmsr(); + mtmsr(msr | MSR_IR | MSR_DR); + irq_work_queue(_event_process_work); + mtmsr(msr); + } else { + irq_work_queue(_event_process_work); + } } void mce_common_process_ue(struct pt_regs *regs, -- 2.31.1