Re: [PATCH] powerpc: ensure that swiotlb buffer is allocated from low memory
On Mon, Dec 09, 2019 at 04:43:17PM +1100, Michael Ellerman wrote:
> Mike Rapoport writes:
> > From: Mike Rapoport
> >
> > Some powerpc platforms (e.g. 85xx) limit DMA-able memory way below 4G. If a
> > system has more physical memory than this limit, the swiotlb buffer is not
> > addressable because it is allocated from memblock using top-down mode.
> >
> > Force memblock to bottom-up mode before calling swiotlb_init() to ensure
> > that the swiotlb buffer is DMA-able.
> >
> > Link:
> > https://lkml.kernel.org/r/f1ebb706-73df-430e-9020-c214ec8ed...@xenosoft.de
>
> This wasn't bisected, but I thought it was a regression. Do we know what
> commit caused it?
>
> Was it 25078dc1f74b ("powerpc: use mm zones more sensibly") ?

swiotlb buffer is initialized before zones are actually used, so probably
not :)

> Or was that a red herring?
>
> cheers
>
> > Reported-by: Christian Zigotzky
> > Signed-off-by: Mike Rapoport
> > Cc: Benjamin Herrenschmidt
> > Cc: Christoph Hellwig
> > Cc: Darren Stevens
> > Cc: mad skateman
> > Cc: Michael Ellerman
> > Cc: Nicolas Saenz Julienne
> > Cc: Paul Mackerras
> > Cc: Robin Murphy
> > Cc: Rob Herring
> > ---
> >  arch/powerpc/mm/mem.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> > index be941d382c8d..14c2c53e3f9e 100644
> > --- a/arch/powerpc/mm/mem.c
> > +++ b/arch/powerpc/mm/mem.c
> > @@ -260,6 +260,14 @@ void __init mem_init(void)
> >  	BUILD_BUG_ON(MMU_PAGE_COUNT > 16);
> >
> >  #ifdef CONFIG_SWIOTLB
> > +	/*
> > +	 * Some platforms (e.g. 85xx) limit DMA-able memory way below
> > +	 * 4G. We force memblock to bottom-up mode to ensure that the
> > +	 * memory allocated in swiotlb_init() is DMA-able.
> > +	 * As it's the last memblock allocation, no need to reset it
> > +	 * back to top-down.
> > +	 */
> > +	memblock_set_bottom_up(true);
> >  	swiotlb_init(0);
> >  #endif
> >
> > --
> > 2.24.0

--
Sincerely yours,
Mike.
[PATCH] powerpc/irq: fix stack overflow verification
Before commit 0366a1c70b89 ("powerpc/irq: Run softirqs off the top of
the irq stack"), check_stack_overflow() was called by do_IRQ(), before
switching to the irq stack.

In that commit, do_IRQ() was renamed __do_irq(), and is now executing
on the irq stack, so check_stack_overflow() has become almost useless.

Move the check_stack_overflow() call into do_IRQ() to do the check while
still on the current stack.

Fixes: 0366a1c70b89 ("powerpc/irq: Run softirqs off the top of the irq stack")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/irq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 0aebd7843c73..e2bce937d51f 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -667,8 +667,6 @@ void __do_irq(struct pt_regs *regs)

 	trace_irq_entry(regs);

-	check_stack_overflow();
-
 	/*
 	 * Query the platform PIC for the interrupt & ack it.
 	 *
@@ -701,6 +699,8 @@ void do_IRQ(struct pt_regs *regs)
 	irqsp = hardirq_ctx[raw_smp_processor_id()];
 	sirqsp = softirq_ctx[raw_smp_processor_id()];

+	check_stack_overflow();
+
 	/* Already there ? */
 	if (unlikely(cursp == irqsp || cursp == sirqsp)) {
 		__do_irq(regs);
--
2.13.3
[PATCH 2/2] powerpc/irq: use IS_ENABLED() in check_stack_overflow()
Instead of #ifdef, use IS_ENABLED(CONFIG_DEBUG_STACKOVERFLOW).
This enables GCC to check for code validity even when the option
is not selected.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/irq.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 4d468d835558..0aebd7843c73 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -598,16 +598,17 @@ u64 arch_irq_stat_cpu(unsigned int cpu)

 static inline void check_stack_overflow(void)
 {
-#ifdef CONFIG_DEBUG_STACKOVERFLOW
 	register unsigned long r1 asm("r1");
 	long sp = r1 & (THREAD_SIZE - 1);

+	if (!IS_ENABLED(CONFIG_DEBUG_STACKOVERFLOW))
+		return;
+
 	/* check for stack overflow: is there less than 2KB free? */
 	if (unlikely(sp < 2048)) {
 		pr_err("do_IRQ: stack overflow: %ld\n", sp);
 		dump_stack();
 	}
-#endif
 }

 #ifdef CONFIG_PPC32
--
2.13.3
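For readers unfamiliar with the idiom, here is a rough user-space sketch of why an IS_ENABLED()-style test keeps the guarded code visible to the compiler. It is not the kernel's implementation: the DEMO_* macros, CONFIG_DEMO_CHECK, CONFIG_DEMO_OTHER and both functions are made-up names, and module (=m) options are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified stand-in for IS_ENABLED(): evaluates to 1 when the option
 * macro is defined to 1, and to 0 when it is not defined at all.
 */
#define DEMO_PLACEHOLDER_1 0,
#define DEMO_SECOND_ARG(ignored, val, ...) val
#define DEMO_IS_DEFINED(arg1_or_junk) DEMO_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define DEMO_PASTE(val) DEMO_IS_DEFINED(DEMO_PLACEHOLDER_##val)
#define DEMO_IS_ENABLED(option) DEMO_PASTE(option)

#define CONFIG_DEMO_CHECK 1	/* hypothetical option, switched on */
/* CONFIG_DEMO_OTHER is deliberately left undefined (switched off). */

static bool demo_check(long sp)
{
	/* The body below is parsed and type-checked in both configurations. */
	if (!DEMO_IS_ENABLED(CONFIG_DEMO_CHECK))
		return false;
	return sp < 2048;
}

static bool demo_other_check(long sp)
{
	/* Option is off: the test is a constant, so the rest is dead code. */
	if (!DEMO_IS_ENABLED(CONFIG_DEMO_OTHER))
		return false;
	return sp < 2048;
}
```

With #ifdef the preprocessor would remove the disabled body before the compiler ever sees it; with the constant test above, a typo in the body is reported in either configuration and the optimizer can still drop the unreachable part.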
[PATCH 1/2] powerpc/irq: don't use current_stack_pointer() in check_stack_overflow()
current_stack_pointer() doesn't return the stack pointer, but the
caller's stack frame. See commit bfe9a2cfe91a ("powerpc: Reimplement
__get_SP() as a function not a define") and commit acf620ecf56c
("powerpc: Rename __get_SP() to current_stack_pointer()") for details.

The purpose of check_stack_overflow() is to verify that the stack has
not overflowed.

To really know whether the stack pointer is still within boundaries,
the check must be done directly on the value of r1.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/irq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index bb34005ff9d2..4d468d835558 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -599,9 +599,8 @@ u64 arch_irq_stat_cpu(unsigned int cpu)
 static inline void check_stack_overflow(void)
 {
 #ifdef CONFIG_DEBUG_STACKOVERFLOW
-	long sp;
-
-	sp = current_stack_pointer() & (THREAD_SIZE-1);
+	register unsigned long r1 asm("r1");
+	long sp = r1 & (THREAD_SIZE - 1);

 	/* check for stack overflow: is there less than 2KB free? */
 	if (unlikely(sp < 2048)) {
--
2.13.3
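The arithmetic behind the check can be tried out in user space. The sketch below is only an illustration under stated assumptions: the stack is THREAD_SIZE-aligned with a power-of-two size, the 16 KiB value is an example rather than something taken from the patch, and stack_offset() and stack_nearly_full() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed example value; the real THREAD_SIZE depends on the kernel config. */
#define EXAMPLE_THREAD_SIZE 16384UL

/*
 * Offset of the stack pointer inside its THREAD_SIZE-aligned stack.
 * The stack grows down, so this offset is also the number of bytes left.
 */
static unsigned long stack_offset(unsigned long sp, unsigned long thread_size)
{
	/* The mask trick only works for a power-of-two stack size. */
	assert(thread_size != 0 && (thread_size & (thread_size - 1)) == 0);
	return sp & (thread_size - 1);
}

/* Mirrors the "less than 2KB free" test from check_stack_overflow(). */
static bool stack_nearly_full(unsigned long sp, unsigned long thread_size)
{
	return stack_offset(sp, thread_size) < 2048;
}
```

The result is only meaningful if sp really is the live stack pointer, which is the point of reading r1 directly instead of a value derived from the caller's frame.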
Re: [PATCH] powerpc/archrandom: fix arch_get_random_seed_int()
On Wed, 2019-12-04 at 11:50:15 UTC, Ard Biesheuvel wrote:
> Commit 01c9348c7620ec65
>
>   powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_*
>
> updated arch_get_random_[int|long]() to be NOPs, and moved the hardware
> RNG backing to arch_get_random_seed_[int|long]() instead. However, it
> failed to take into account that arch_get_random_int() was implemented
> in terms of arch_get_random_long(), and so we ended up with a version
> of the former that is essentially a NOP as well.
>
> Fix this by calling arch_get_random_seed_long() from
> arch_get_random_seed_int() instead.
>
> Fixes: 01c9348c7620ec65 ("powerpc: Use hardware RNG for
> arch_get_random_seed_* not arch_get_random_*")
> Signed-off-by: Ard Biesheuvel

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/b6afd1234cf93aa0d71b4be4788c47534905f0be

cheers
Re: [PATCH] powerpc/pmem: Convert to EXPORT_SYMBOL_GPL
On Mon, 2019-12-02 at 06:40:18 UTC, "Aneesh Kumar K.V" wrote:
> All other architecture export this as GPL symbol
>
> Signed-off-by: Aneesh Kumar K.V

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/551003fff7235ce935bc1fefb72d12b63a408bd0

cheers
Re: [PATCH v3] platforms/powernv: Avoid re-registration of imc debugfs directory
On Wed, 2019-11-27 at 07:20:35 UTC, Anju T Sudhakar wrote:
> export_imc_mode_and_cmd() function which creates the debugfs interface for
> imc-mode and imc-command, is invoked when each nest pmu units is
> registered.
> When the first nest pmu unit is registered, export_imc_mode_and_cmd()
> creates 'imc' directory under `/debug/powerpc/`. In the subsequent
> invocations debugfs_create_dir() function returns, since the directory
> already exists.
>
> The recent commit (debugfs: make error message a bit more
> verbose), throws a warning if we try to invoke `debugfs_create_dir()`
> with an already existing directory name.
>
> Address this warning by making the debugfs directory registration
> in the opal_imc_counters_probe() function, i.e invoke
> export_imc_mode_and_cmd() function from the probe function.
>
> Signed-off-by: Anju T Sudhakar

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/48e626ac85b43cc589dd1b3b8004f7f85f03544d

cheers
Re: [PATCH v2] powerpc/perf: Disable trace_imc pmu
On Mon, 2019-11-18 at 03:44:52 UTC, Madhavan Srinivasan wrote:
> When a root user or a user with CAP_SYS_ADMIN
> privilege use trace_imc performance monitoring
> unit events, to monitor application or KVM threads,
> may result in a checkstop (System crash). Reason
> being frequent switch of the "trace/accumulation"
> mode of In-Memory Collection hardware.
> This patch disables trace_imc pmu unit, but will
> be re-enabled at a later stage with a fix patchset.
>
> Fixes: 012ae244845f1 ('powerpc/perf: Trace imc PMU functions')
> Signed-off-by: Madhavan Srinivasan

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/249fad734a25889a4f23ed014d43634af6798063

cheers
Re: [PATCH] powerpc: ensure that swiotlb buffer is allocated from low memory
Mike Rapoport writes:
> From: Mike Rapoport
>
> Some powerpc platforms (e.g. 85xx) limit DMA-able memory way below 4G. If a
> system has more physical memory than this limit, the swiotlb buffer is not
> addressable because it is allocated from memblock using top-down mode.
>
> Force memblock to bottom-up mode before calling swiotlb_init() to ensure
> that the swiotlb buffer is DMA-able.
>
> Link:
> https://lkml.kernel.org/r/f1ebb706-73df-430e-9020-c214ec8ed...@xenosoft.de

This wasn't bisected, but I thought it was a regression. Do we know what
commit caused it?

Was it 25078dc1f74b ("powerpc: use mm zones more sensibly") ?

Or was that a red herring?

cheers

> Reported-by: Christian Zigotzky
> Signed-off-by: Mike Rapoport
> Cc: Benjamin Herrenschmidt
> Cc: Christoph Hellwig
> Cc: Darren Stevens
> Cc: mad skateman
> Cc: Michael Ellerman
> Cc: Nicolas Saenz Julienne
> Cc: Paul Mackerras
> Cc: Robin Murphy
> Cc: Rob Herring
> ---
>  arch/powerpc/mm/mem.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index be941d382c8d..14c2c53e3f9e 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -260,6 +260,14 @@ void __init mem_init(void)
>  	BUILD_BUG_ON(MMU_PAGE_COUNT > 16);
>
>  #ifdef CONFIG_SWIOTLB
> +	/*
> +	 * Some platforms (e.g. 85xx) limit DMA-able memory way below
> +	 * 4G. We force memblock to bottom-up mode to ensure that the
> +	 * memory allocated in swiotlb_init() is DMA-able.
> +	 * As it's the last memblock allocation, no need to reset it
> +	 * back to top-down.
> +	 */
> +	memblock_set_bottom_up(true);
>  	swiotlb_init(0);
>  #endif
>
> --
> 2.24.0
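To make the top-down versus bottom-up argument concrete, here is a toy model in plain C. It is not the memblock API: toy_alloc() and toy_dma_able() are invented helpers, there is only one free range, and the 8 GiB of RAM, 64 MiB buffer and 2 GiB DMA limit used in the note below are assumed example numbers.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy allocator over a single free range [start, end): returns the base
 * of a size-byte block taken from the low end (bottom-up) or from the
 * high end (top-down). Real memblock tracks many regions; this does not.
 */
static unsigned long long toy_alloc(unsigned long long start,
				    unsigned long long end,
				    unsigned long long size,
				    bool bottom_up)
{
	assert(end > start && size != 0 && size <= end - start);
	return bottom_up ? start : end - size;
}

/* True if the whole buffer [base, base + size) lies below the DMA limit. */
static bool toy_dma_able(unsigned long long base, unsigned long long size,
			 unsigned long long dma_limit)
{
	return base + size <= dma_limit;
}
```

With free memory from 16 MiB up to 8 GiB, a 64 MiB buffer allocated top-down lands just under 8 GiB and fails a 2 GiB DMA limit, while the same request made bottom-up starts at 16 MiB and passes.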
Re: [PATCH V2 00/13] powerpc/vas: Page fault handling for user space NX requests
Hi,

What do you mean by NX ?
Up to now, NX has been standing for No-eXecute. That's a bit in segment
registers on book3s/32 to forbid executing code.

Therefore, some of your text is really misleading. If NX means something
else for you, your text must be unambiguous.

Christophe

On 09/12/2019 at 04:18, Haren Myneni wrote:
> Applications will send compression / decompression requests to NX with
> COPY/PASTE instructions. When NX is processing these requests, can hit
> fault on the request buffer (not in memory). It issues an interrupt and
> pastes fault CRB in fault FIFO. Expects kernel to handle this fault and
> return credits for both send and fault windows after processing.
>
> This patch series adds IRQ and fault window setup, and NX fault handling:
> - Read IRQ# from "interrupts" property and configure IRQ per VAS instance.
> - Set port# for each window to generate an interrupt when noticed fault.
> - Set fault window and FIFO on which NX paste fault CRB.
> - Setup IRQ thread fault handler per VAS instance.
> - When receiving an interrupt, Read CRBs from fault FIFO and update
>   coprocessor_status_block (CSB) in the corresponding CRB with translation
>   failure (CSB_CC_TRANSLATION). After issuing NX requests, process polls
>   on CSB address. When it sees translation error, can touch the request
>   buffer to bring the page in to memory and reissue NX request.
> - If copy_to_user fails on user space CSB address, OS sends SEGV signal.
>
> Tested these patches with NX-GZIP support and will be posting this series
> soon.
>
> Patch 2: Define nx_fault_stamp on which NX writes fault status for the fault
>          CRB
> Patch 3: Read interrupts and port properties per VAS instance
> Patch 4: Setup fault window per each VAS instance. This window is used for
>          NX to paste fault CRB in FIFO.
> Patches 5 & 6: Setup threaded IRQ per VAS and register NX with fault window
>          ID and port number for each send window so that NX paste fault CRB
>          in this window.
> Patch 7: Reference to pid and mm so that pid is not used until window closed.
>          Needed for multi thread application where child can open a window
>          and can be used by parent later.
> Patches 8 and 9: Process CRBs from fault FIFO and notify tasks by updating
>          CSB or through signals.
> Patches 10 and 11: Return credits for send and fault windows after handling
>          faults.
> Patch 13: Fix closing send window after all credits are returned. This issue
>          happens only for user space requests. No page faults on kernel
>          request buffer.
>
> Changelog:
> V2:
>   - Use threaded IRQ instead of own kernel thread handler
>   - Use pswid instead of user space CSB address to find valid CRB
>   - Removed unused macros and other changes as suggested by Christoph Hellwig
>
> Haren Myneni (13):
>   powerpc/vas: Describe vas-port and interrupts properties
>   powerpc/vas: Define nx_fault_stamp in coprocessor_request_block
>   powerpc/vas: Read interrupts and vas-port device tree properties
>   powerpc/vas: Setup fault window per VAS instance
>   powerpc/vas: Setup thread IRQ handler per VAS instance
>   powerpc/vas: Register NX with fault window ID and IRQ port value
>   powerpc/vas: Take reference to PID and mm for user space windows
>   powerpc/vas: Update CSB and notify process for fault CRBs
>   powerpc/vas: Print CRB and FIFO values
>   powerpc/vas: Do not use default credits for receive window
>   powerpc/VAS: Return credits after handling fault
>   powerpc/vas: Display process stuck message
>   powerpc/vas: Free send window in VAS instance after credits returned
>
>  .../devicetree/bindings/powerpc/ibm,vas.txt |   5 +
>  arch/powerpc/include/asm/icswx.h            |  18 +-
>  arch/powerpc/platforms/powernv/Makefile     |   2 +-
>  arch/powerpc/platforms/powernv/vas-debug.c  |   2 +-
>  arch/powerpc/platforms/powernv/vas-fault.c  | 337 +
>  arch/powerpc/platforms/powernv/vas-window.c | 173 ++-
>  arch/powerpc/platforms/powernv/vas.c        |  77 -
>  arch/powerpc/platforms/powernv/vas.h        |  38 ++-
>  8 files changed, 627 insertions(+), 25 deletions(-)
>  create mode 100644 arch/powerpc/platforms/powernv/vas-fault.c
[PATCH v5 6/6] powerpc/fadump: sysfs for fadump memory reservation
Add a sysfs interface to allow querying the memory reserved by
FADump for saving the crash dump.

Also add Documentation/ABI for the new sysfs file.

Signed-off-by: Sourabh Jain
---
 Documentation/ABI/testing/sysfs-kernel-fadump    | 7 +++
 Documentation/powerpc/firmware-assisted-dump.rst | 5 +
 arch/powerpc/kernel/fadump.c                     | 9 +
 3 files changed, 21 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump b/Documentation/ABI/testing/sysfs-kernel-fadump
index 5d988b919e81..8f7a64a81783 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump
+++ b/Documentation/ABI/testing/sysfs-kernel-fadump
@@ -31,3 +31,10 @@ Description:	write only
 		the system is booted to capture the vmcore using FADump.
 		It is used to release the memory reserved by FADump to
 		save the crash dump.
+
+What:		/sys/kernel/fadump/mem_reserved
+Date:		Dec 2019
+Contact:	linuxppc-dev@lists.ozlabs.org
+Description:	read only
+		Provide information about the amount of memory reserved by
+		FADump to save the crash dump in bytes.
diff --git a/Documentation/powerpc/firmware-assisted-dump.rst b/Documentation/powerpc/firmware-assisted-dump.rst
index 365c10209ef3..04993eaf3113 100644
--- a/Documentation/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/powerpc/firmware-assisted-dump.rst
@@ -268,6 +268,11 @@ Here is the list of files under kernel sysfs:
     be handled and vmcore will not be captured. This interface can be
     easily integrated with kdump service start/stop.

+ /sys/kernel/fadump/mem_reserved
+
+   This is used to display the memory reserved by FADump for saving the
+   crash dump.
+
  /sys/kernel/fadump_release_mem
     This file is available only when FADump is active during
     second kernel. This is used to release the reserved memory
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 35ecb51edc50..6f367e5b7970 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1364,6 +1364,13 @@ static ssize_t enabled_show(struct kobject *kobj,
 	return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
 }

+static ssize_t mem_reserved_show(struct kobject *kobj,
+				 struct kobj_attribute *attr,
+				 char *buf)
+{
+	return sprintf(buf, "%ld\n", fw_dump.reserve_dump_area_size);
+}
+
 static ssize_t registered_show(struct kobject *kobj,
 			       struct kobj_attribute *attr,
 			       char *buf)
@@ -1431,10 +1438,12 @@ EXPORT_SYMBOL_GPL(fadump_kobj);
 static struct kobj_attribute release_attr = __ATTR_WO(release_mem);
 static struct kobj_attribute enable_attr = __ATTR_RO(enabled);
 static struct kobj_attribute register_attr = __ATTR_RW(registered);
+static struct kobj_attribute mem_reserved_attr = __ATTR_RO(mem_reserved);

 static struct attribute *fadump_attrs[] = {
 	&enable_attr.attr,
 	&register_attr.attr,
+	&mem_reserved_attr.attr,
 	NULL,
 };

--
2.17.2
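A consumer of the new file only has to read one decimal number followed by a newline. Here is a small user-space sketch of the parsing step; parse_mem_reserved() is a hypothetical helper, not part of the patch, and it assumes the text has already been read from /sys/kernel/fadump/mem_reserved into a NUL-terminated buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse the byte count as printed by the sysfs file ("<decimal>\n").
 * Returns the value, or -1 if the text is empty, negative, out of range
 * or followed by anything other than a newline.
 */
static long long parse_mem_reserved(const char *text)
{
	char *end;
	long long value;

	assert(text != NULL);
	errno = 0;
	value = strtoll(text, &end, 10);
	if (errno != 0 || end == text || value < 0)
		return -1;
	if (*end != '\n' && *end != '\0')
		return -1;
	return value;
}
```

For example, the text "268435456\n" parses to 268435456 (256 MiB), while "abc" or "12 MB\n" is rejected with -1.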
[PATCH v5 5/6] Documentation/ABI: mark /sys/kernel/fadump_* sysfs files deprecated
Add a deprecation note in FADump sysfs ABI documentation files and move
them from ABI/testing to ABI/obsolete directory.

Signed-off-by: Sourabh Jain
---
 .../ABI/{testing => obsolete}/sysfs-kernel-fadump_enabled | 2 ++
 .../{testing => obsolete}/sysfs-kernel-fadump_registered  | 2 ++
 .../{testing => obsolete}/sysfs-kernel-fadump_release_mem | 2 ++
 Documentation/powerpc/firmware-assisted-dump.rst          | 8
 4 files changed, 14 insertions(+)
 rename Documentation/ABI/{testing => obsolete}/sysfs-kernel-fadump_enabled (73%)
 rename Documentation/ABI/{testing => obsolete}/sysfs-kernel-fadump_registered (77%)
 rename Documentation/ABI/{testing => obsolete}/sysfs-kernel-fadump_release_mem (78%)

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_enabled b/Documentation/ABI/obsolete/sysfs-kernel-fadump_enabled
similarity index 73%
rename from Documentation/ABI/testing/sysfs-kernel-fadump_enabled
rename to Documentation/ABI/obsolete/sysfs-kernel-fadump_enabled
index f73632b1c006..e9c2de8b3688 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump_enabled
+++ b/Documentation/ABI/obsolete/sysfs-kernel-fadump_enabled
@@ -1,3 +1,5 @@
+This ABI is renamed and moved to a new location /sys/kernel/fadump/enabled.
+
 What:		/sys/kernel/fadump_enabled
 Date:		Feb 2012
 Contact:	linuxppc-dev@lists.ozlabs.org
diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_registered b/Documentation/ABI/obsolete/sysfs-kernel-fadump_registered
similarity index 77%
rename from Documentation/ABI/testing/sysfs-kernel-fadump_registered
rename to Documentation/ABI/obsolete/sysfs-kernel-fadump_registered
index dcf925e53f0f..0360be39c98e 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump_registered
+++ b/Documentation/ABI/obsolete/sysfs-kernel-fadump_registered
@@ -1,3 +1,5 @@
+This ABI is renamed and moved to a new location /sys/kernel/fadump/registered.
+
 What:		/sys/kernel/fadump_registered
 Date:		Feb 2012
 Contact:	linuxppc-dev@lists.ozlabs.org
diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_release_mem b/Documentation/ABI/obsolete/sysfs-kernel-fadump_release_mem
similarity index 78%
rename from Documentation/ABI/testing/sysfs-kernel-fadump_release_mem
rename to Documentation/ABI/obsolete/sysfs-kernel-fadump_release_mem
index 9c20d64ab48d..6ce0b129ab12 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump_release_mem
+++ b/Documentation/ABI/obsolete/sysfs-kernel-fadump_release_mem
@@ -1,3 +1,5 @@
+This ABI is renamed and moved to a new location /sys/kernel/fadump/release_mem.
+
 What:		/sys/kernel/fadump_release_mem
 Date:		Feb 2012
 Contact:	linuxppc-dev@lists.ozlabs.org
diff --git a/Documentation/powerpc/firmware-assisted-dump.rst b/Documentation/powerpc/firmware-assisted-dump.rst
index 345a3405206e..365c10209ef3 100644
--- a/Documentation/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/powerpc/firmware-assisted-dump.rst
@@ -295,6 +295,14 @@ Note: /sys/kernel/fadump_release_opalcore sysfs has moved to

     echo 1 > /sys/firmware/opal/mpipl/release_core

+Note: The following FADump sysfs files are deprecated.
+
+Deprecated                       Alternative
+
+/sys/kernel/fadump_enabled       /sys/kernel/fadump/enabled
+/sys/kernel/fadump_registered    /sys/kernel/fadump/registered
+/sys/kernel/fadump_release_mem   /sys/kernel/fadump/release_mem
+
 Here is the list of files under powerpc debugfs:
 (Assuming debugfs is mounted on /sys/kernel/debug directory.)

--
2.17.2
[PATCH v5 4/6] powerpc/powernv: move core and fadump_release_opalcore under new kobject
The /sys/firmware/opal/core and /sys/kernel/fadump_release_opalcore sysfs
files are used to export and release the OPAL memory on PowerNV platform.
Let's organize them into a new kobject under /sys/firmware/opal/mpipl/
directory.

A symlink is added to maintain the backward compatibility for
/sys/firmware/opal/core sysfs file.

Signed-off-by: Sourabh Jain
---
 .../sysfs-kernel-fadump_release_opalcore   |  2 +
 .../powerpc/firmware-assisted-dump.rst     | 15 +++--
 arch/powerpc/platforms/powernv/opal-core.c | 55 ++-
 3 files changed, 51 insertions(+), 21 deletions(-)
 rename Documentation/ABI/{testing => removed}/sysfs-kernel-fadump_release_opalcore (82%)

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore b/Documentation/ABI/removed/sysfs-kernel-fadump_release_opalcore
similarity index 82%
rename from Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore
rename to Documentation/ABI/removed/sysfs-kernel-fadump_release_opalcore
index 53313c1d4e7a..a8d46cd0f4e6 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore
+++ b/Documentation/ABI/removed/sysfs-kernel-fadump_release_opalcore
@@ -1,3 +1,5 @@
+This ABI is moved to /sys/firmware/opal/mpipl/release_core.
+
 What:		/sys/kernel/fadump_release_opalcore
 Date:		Sep 2019
 Contact:	linuxppc-dev@lists.ozlabs.org
diff --git a/Documentation/powerpc/firmware-assisted-dump.rst b/Documentation/powerpc/firmware-assisted-dump.rst
index 0455a78486d5..345a3405206e 100644
--- a/Documentation/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/powerpc/firmware-assisted-dump.rst
@@ -112,13 +112,13 @@ to ensure that crash data is preserved to process later.

 -- On OPAL based machines (PowerNV), if the kernel is build with
    CONFIG_OPAL_CORE=y, OPAL memory at the time of crash is also
-   exported as /sys/firmware/opal/core file. This procfs file is
+   exported as /sys/firmware/opal/mpipl/core file. This procfs file is
    helpful in debugging OPAL crashes with GDB. The kernel memory
    used for exporting this procfs file can be released by echo'ing
-   '1' to /sys/kernel/fadump_release_opalcore node.
+   '1' to /sys/firmware/opal/mpipl/release_core node.

    e.g.
-     # echo 1 > /sys/kernel/fadump_release_opalcore
+     # echo 1 > /sys/firmware/opal/mpipl/release_core

 Implementation details:
 -----------------------
@@ -283,14 +283,17 @@ Here is the list of files under kernel sysfs:
     enhanced to use this interface to release the memory reserved for
     dump and continue without 2nd reboot.

- /sys/kernel/fadump_release_opalcore
+Note: /sys/kernel/fadump_release_opalcore sysfs has moved to
+      /sys/firmware/opal/mpipl/release_core
+
+ /sys/firmware/opal/mpipl/release_core

     This file is available only on OPAL based machines when FADump is
     active during capture kernel. This is used to release the memory
-    used by the kernel to export /sys/firmware/opal/core file. To
+    used by the kernel to export /sys/firmware/opal/mpipl/core file. To
     release this memory, echo '1' to it:

-    echo 1 > /sys/kernel/fadump_release_opalcore
+    echo 1 > /sys/firmware/opal/mpipl/release_core

 Here is the list of files under powerpc debugfs:
 (Assuming debugfs is mounted on /sys/kernel/debug directory.)
diff --git a/arch/powerpc/platforms/powernv/opal-core.c b/arch/powerpc/platforms/powernv/opal-core.c
index ed895d82c048..6dba3b62269f 100644
--- a/arch/powerpc/platforms/powernv/opal-core.c
+++ b/arch/powerpc/platforms/powernv/opal-core.c
@@ -71,6 +71,7 @@ static LIST_HEAD(opalcore_list);
 static struct opalcore_config *oc_conf;
 static const struct opal_mpipl_fadump *opalc_metadata;
 static const struct opal_mpipl_fadump *opalc_cpu_metadata;
+struct kobject *mpipl_kobj;

 /*
  * Set crashing CPU's signal to SIGUSR1. if the kernel is triggered
@@ -428,7 +429,7 @@ static void opalcore_cleanup(void)
 		return;

 	/* Remove OPAL core sysfs file */
-	sysfs_remove_bin_file(opal_kobj, &opal_core_attr);
+	sysfs_remove_bin_file(mpipl_kobj, &opal_core_attr);
 	oc_conf->ptload_phdr = NULL;
 	oc_conf->ptload_cnt = 0;

@@ -563,9 +564,9 @@ static void __init opalcore_config_init(void)
 	of_node_put(np);
 }

-static ssize_t fadump_release_opalcore_store(struct kobject *kobj,
-					     struct kobj_attribute *attr,
-					     const char *buf, size_t count)
+static ssize_t release_core_store(struct kobject *kobj,
+				  struct kobj_attribute *attr,
+				  const char *buf, size_t count)
 {
 	int input = -1;

@@ -589,9 +590,23 @@ static ssize_t fadump_release_opalcore_store(struct kobject *kobj,
 	return count;
 }

-static struct kobj_attribute opalcore_rel_attr = __ATTR(fadump_release_opalcore,
-							0200, NULL,
-
[PATCH v5 3/6] powerpc/fadump: reorganize /sys/kernel/fadump_* sysfs files
As the number of FADump sysfs files increases it is hard to manage all of
them inside /sys/kernel directory. It's better to have all the FADump
related sysfs files in a dedicated directory /sys/kernel/fadump. But in
order to maintain backward compatibility a symlink has been added for every
sysfs that has moved to new location.

As the FADump sysfs files are now part of a dedicated directory there is no
need to prefix their name with fadump_, hence sysfs file names are also
updated. For example fadump_enabled sysfs file is now referred as enabled.

Also consolidate ABI documentation for all the FADump sysfs files in a
single file Documentation/ABI/testing/sysfs-kernel-fadump.

Signed-off-by: Sourabh Jain
---
 Documentation/ABI/testing/sysfs-kernel-fadump | 33 +++
 arch/powerpc/kernel/fadump.c                  | 95 ---
 2 files changed, 94 insertions(+), 34 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump b/Documentation/ABI/testing/sysfs-kernel-fadump
new file mode 100644
index 000000000000..5d988b919e81
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-fadump
@@ -0,0 +1,33 @@
+What:		/sys/kernel/fadump/*
+Date:		Dec 2019
+Contact:	linuxppc-dev@lists.ozlabs.org
+Description:
+		The /sys/kernel/fadump/* is a collection of FADump sysfs
+		file provide information about the configuration status
+		of Firmware Assisted Dump (FADump).
+
+What:		/sys/kernel/fadump/enabled
+Date:		Dec 2019
+Contact:	linuxppc-dev@lists.ozlabs.org
+Description:	read only
+		Primarily used to identify whether the FADump is enabled in
+		the kernel or not.
+User:		Kdump service
+
+What:		/sys/kernel/fadump/registered
+Date:		Dec 2019
+Contact:	linuxppc-dev@lists.ozlabs.org
+Description:	read/write
+		Helps to control the dump collect feature from userspace.
+		Setting 1 to this file enables the system to collect the
+		dump and 0 to disable it.
+User:		Kdump service
+
+What:		/sys/kernel/fadump/release_mem
+Date:		Dec 2019
+Contact:	linuxppc-dev@lists.ozlabs.org
+Description:	write only
+		This is a special sysfs file and only available when
+		the system is booted to capture the vmcore using FADump.
+		It is used to release the memory reserved by FADump to
+		save the crash dump.
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index ed59855430b9..35ecb51edc50 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -44,6 +44,13 @@ struct fadump_mrange_info reserved_mrange_info = { "reserved", NULL, 0, 0, 0 };
 #ifdef CONFIG_CMA
 static struct cma *fadump_cma;

+#define CREATE_SYMLINK(target, symlink_name) do {\
+	rc = compat_only_sysfs_link_entry_to_kobj(kernel_kobj, fadump_kobj, \
+						  target, symlink_name); \
+	if (rc) \
+		pr_err("unable to create %s symlink (%d)", symlink_name, rc); \
+} while (0)
+
 /*
  * fadump_cma_init() - Initialize CMA area from a fadump reserved memory
  *
@@ -1323,9 +1330,9 @@ static void fadump_invalidate_release_mem(void)
 	fw_dump.ops->fadump_init_mem_struct(&fw_dump);
 }

-static ssize_t fadump_release_memory_store(struct kobject *kobj,
-					struct kobj_attribute *attr,
-					const char *buf, size_t count)
+static ssize_t release_mem_store(struct kobject *kobj,
+				 struct kobj_attribute *attr,
+				 const char *buf, size_t count)
 {
 	int input = -1;

@@ -1350,23 +1357,23 @@ static ssize_t fadump_release_memory_store(struct kobject *kobj,
 	return count;
 }

-static ssize_t fadump_enabled_show(struct kobject *kobj,
-					struct kobj_attribute *attr,
-					char *buf)
+static ssize_t enabled_show(struct kobject *kobj,
+			    struct kobj_attribute *attr,
+			    char *buf)
 {
 	return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
 }

-static ssize_t fadump_register_show(struct kobject *kobj,
-					struct kobj_attribute *attr,
-					char *buf)
+static ssize_t registered_show(struct kobject *kobj,
+			       struct kobj_attribute *attr,
+			       char *buf)
 {
 	return sprintf(buf, "%d\n", fw_dump.dump_registered);
 }

-static ssize_t fadump_register_store(struct kobject *kobj,
-					struct kobj_attribute *attr,
-					const char *buf, size_t count)
+static ssize_t
[PATCH v5 2/6] sysfs: wrap __compat_only_sysfs_link_entry_to_kobj function to change the symlink name
The __compat_only_sysfs_link_entry_to_kobj function creates a symlink to a
kobject but doesn't provide an option to change the symlink file name.

This patch adds a wrapper function compat_only_sysfs_link_entry_to_kobj
that extends the __compat_only_sysfs_link_entry_to_kobj functionality by
allowing the caller to customize the symlink name.

Signed-off-by: Sourabh Jain
---
 fs/sysfs/group.c      | 28 +++++++++++++++++++++++++---
 include/linux/sysfs.h | 12 ++++++++++++
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index d41c21fef138..0993645f0b59 100644
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -424,6 +424,25 @@ EXPORT_SYMBOL_GPL(sysfs_remove_link_from_group);
 int __compat_only_sysfs_link_entry_to_kobj(struct kobject *kobj,
 				      struct kobject *target_kobj,
 				      const char *target_name)
+{
+	return compat_only_sysfs_link_entry_to_kobj(kobj, target_kobj,
+						target_name, NULL);
+}
+EXPORT_SYMBOL_GPL(__compat_only_sysfs_link_entry_to_kobj);
+
+/**
+ * compat_only_sysfs_link_entry_to_kobj - add a symlink to a kobject pointing
+ * to a group or an attribute
+ * @kobj:		The kobject containing the group.
+ * @target_kobj:	The target kobject.
+ * @target_name:	The name of the target group or attribute.
+ * @symlink_name:	The name of the symlink file (target_name will be
+ *			considered if symlink_name is NULL).
+ */
+int compat_only_sysfs_link_entry_to_kobj(struct kobject *kobj,
+					 struct kobject *target_kobj,
+					 const char *target_name,
+					 const char *symlink_name)
 {
 	struct kernfs_node *target;
 	struct kernfs_node *entry;
@@ -448,12 +467,15 @@ int __compat_only_sysfs_link_entry_to_kobj(struct kobject *kobj,
 		return -ENOENT;
 	}

-	link = kernfs_create_link(kobj->sd, target_name, entry);
+	if (!symlink_name)
+		symlink_name = target_name;
+
+	link = kernfs_create_link(kobj->sd, symlink_name, entry);
 	if (IS_ERR(link) && PTR_ERR(link) == -EEXIST)
-		sysfs_warn_dup(kobj->sd, target_name);
+		sysfs_warn_dup(kobj->sd, symlink_name);

 	kernfs_put(entry);
 	kernfs_put(target);
 	return PTR_ERR_OR_ZERO(link);
 }
-EXPORT_SYMBOL_GPL(__compat_only_sysfs_link_entry_to_kobj);
+EXPORT_SYMBOL_GPL(compat_only_sysfs_link_entry_to_kobj);
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 5420817ed317..15b195a4529d 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -300,6 +300,10 @@ void sysfs_remove_link_from_group(struct kobject *kobj, const char *group_name,
 int __compat_only_sysfs_link_entry_to_kobj(struct kobject *kobj,
 				      struct kobject *target_kobj,
 				      const char *target_name);
+int compat_only_sysfs_link_entry_to_kobj(struct kobject *kobj,
+					 struct kobject *target_kobj,
+					 const char *target_name,
+					 const char *symlink_name);

 void sysfs_notify(struct kobject *kobj, const char *dir, const char *attr);

@@ -508,6 +512,14 @@ static inline int __compat_only_sysfs_link_entry_to_kobj(
 	return 0;
 }

+static inline int compat_only_sysfs_link_entry_to_kobj(struct kobject *kobj,
+						struct kobject *target_kobj,
+						const char *target_name,
+						const char *symlink_name)
+{
+	return 0;
+}
+
 static inline void sysfs_notify(struct kobject *kobj, const char *dir,
 				const char *attr)
 {
--
2.17.2
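The naming rule the wrapper introduces (use symlink_name if given, otherwise fall back to target_name) can be shown in isolation. This is a user-space sketch only; effective_link_name() is a hypothetical helper and nothing here touches sysfs or kernfs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Name the symlink would get: the caller-supplied symlink_name, or the
 * target's own name when symlink_name is NULL (the old behaviour).
 */
static const char *effective_link_name(const char *target_name,
				       const char *symlink_name)
{
	assert(target_name != NULL && strlen(target_name) > 0);
	return symlink_name ? symlink_name : target_name;
}
```

Passing NULL keeps existing callers of the double-underscore function unchanged, while a caller that wants a legacy name such as fadump_enabled to point at the attribute enabled passes that legacy name explicitly.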
[PATCH v5 0/6] reorganize and add FADump sysfs files
Currently, the FADump sysfs files live directly in the /sys/kernel directory. As the number of FADump sysfs files grows, it is not a good idea to keep adding them to /sys/kernel; a separate directory for all FADump sysfs files is preferable. This patch series reorganizes the FADump sysfs files and makes all existing FADump sysfs files from /sys/kernel available in a new directory, /sys/kernel/fadump. Backward compatibility is maintained by adding a symlink for every sysfs file that has moved to the new location. A new FADump sysfs interface is also added to report the amount of memory reserved by FADump for saving the crash dump. Changelog: v1 -> v2: - Move fadump_release_opalcore sysfs to FADump Kobject instead of replicating. - Changed the patch order 1,2,3,4 -> 2,1,3,4 (First add the ABI doc for existing sysfs files, then replicate them under FADump kobject). v2 -> v3: - Remove the fadump_ prefix from replicated FADump sysfs file names. v3 -> v4: - New patch that adds a wrapper function to create symlink with custom symlink file name. - Add symlink instead of replicating the FADump sysfs files. - Move the OPAL core rel v4 -> v5: - Changed the wrapper function name in 2/6. - Defined FADump kobject attributes using __ATTR_* macros. - Replace individual FADump sysfs file creation with group. - Added a macro to create symlink.
Sourabh Jain (6): Documentation/ABI: add ABI documentation for /sys/kernel/fadump_* sysfs: wrap __compat_only_sysfs_link_entry_to_kobj function to change the symlink name powerpc/fadump: reorganize /sys/kernel/fadump_* sysfs files powerpc/powernv: move core and fadump_release_opalcore under new kobject Documentation/ABI: mark /sys/kernel/fadump_* sysfs files deprecated powerpc/fadump: sysfs for fadump memory reservation .../ABI/obsolete/sysfs-kernel-fadump_enabled | 9 ++ .../obsolete/sysfs-kernel-fadump_registered | 10 ++ .../obsolete/sysfs-kernel-fadump_release_mem | 10 ++ .../sysfs-kernel-fadump_release_opalcore | 9 ++ Documentation/ABI/testing/sysfs-kernel-fadump | 40 +++ .../powerpc/firmware-assisted-dump.rst| 28 - arch/powerpc/kernel/fadump.c | 104 -- arch/powerpc/platforms/powernv/opal-core.c| 55 ++--- fs/sysfs/group.c | 28 - include/linux/sysfs.h | 12 ++ 10 files changed, 247 insertions(+), 58 deletions(-) create mode 100644 Documentation/ABI/obsolete/sysfs-kernel-fadump_enabled create mode 100644 Documentation/ABI/obsolete/sysfs-kernel-fadump_registered create mode 100644 Documentation/ABI/obsolete/sysfs-kernel-fadump_release_mem create mode 100644 Documentation/ABI/removed/sysfs-kernel-fadump_release_opalcore create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump -- 2.17.2
[PATCH v5 1/6] Documentation/ABI: add ABI documentation for /sys/kernel/fadump_*
Add missing ABI documentation for existing FADump sysfs files. Signed-off-by: Sourabh Jain --- Documentation/ABI/testing/sysfs-kernel-fadump_enabled | 7 +++ Documentation/ABI/testing/sysfs-kernel-fadump_registered | 8 Documentation/ABI/testing/sysfs-kernel-fadump_release_mem | 8 .../ABI/testing/sysfs-kernel-fadump_release_opalcore | 7 +++ 4 files changed, 30 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump_enabled create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump_registered create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump_release_mem create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_enabled b/Documentation/ABI/testing/sysfs-kernel-fadump_enabled new file mode 100644 index ..f73632b1c006 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-fadump_enabled @@ -0,0 +1,7 @@ +What: /sys/kernel/fadump_enabled +Date: Feb 2012 +Contact: linuxppc-dev@lists.ozlabs.org +Description: read only + Primarily used to identify whether the FADump is enabled in + the kernel or not. +User: Kdump service diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_registered b/Documentation/ABI/testing/sysfs-kernel-fadump_registered new file mode 100644 index ..dcf925e53f0f --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-fadump_registered @@ -0,0 +1,8 @@ +What: /sys/kernel/fadump_registered +Date: Feb 2012 +Contact: linuxppc-dev@lists.ozlabs.org +Description: read/write + Helps to control the dump collect feature from userspace. + Setting 1 to this file enables the system to collect the + dump and 0 to disable it. 
+User: Kdump service diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_release_mem b/Documentation/ABI/testing/sysfs-kernel-fadump_release_mem new file mode 100644 index ..9c20d64ab48d --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-fadump_release_mem @@ -0,0 +1,8 @@ +What: /sys/kernel/fadump_release_mem +Date: Feb 2012 +Contact: linuxppc-dev@lists.ozlabs.org +Description: write only + This is a special sysfs file and only available when + the system is booted to capture the vmcore using FADump. + It is used to release the memory reserved by FADump to + save the crash dump. diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore b/Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore new file mode 100644 index ..53313c1d4e7a --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore @@ -0,0 +1,7 @@ +What: /sys/kernel/fadump_release_opalcore +Date: Sep 2019 +Contact: linuxppc-dev@lists.ozlabs.org +Description: write only + The sysfs file is available when the system is booted to + collect the dump on OPAL based machine. It used to release + the memory used to collect the opalcore. -- 2.17.2
[PATCH V2 13/13] powerpc/vas: Free send window in VAS instance after credits returned
NX may still be processing requests while the window is being closed. Wait until all credits are returned, and only then free the send window from the VAS instance. Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/vas-window.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c index 578f144..5322d1c 100644 --- a/arch/powerpc/platforms/powernv/vas-window.c +++ b/arch/powerpc/platforms/powernv/vas-window.c @@ -1309,14 +1309,14 @@ int vas_win_close(struct vas_window *window) unmap_paste_region(window); - clear_vinst_win(window); - poll_window_busy_state(window); unpin_close_window(window); poll_window_credits(window); + clear_vinst_win(window); + poll_window_castout(window); /* if send window, drop reference to matching receive window */ -- 1.8.3.1
[PATCH V2 12/13] powerpc/vas: Display process stuck message
A process cannot close a send window until all of its requests are processed, which means waiting until the window is no longer busy and the send credits are returned. Display a debug message if closing the window is taking longer than expected. Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/vas-window.c | 26 ++ 1 file changed, 26 insertions(+) diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c index 27848d3..578f144 100644 --- a/arch/powerpc/platforms/powernv/vas-window.c +++ b/arch/powerpc/platforms/powernv/vas-window.c @@ -1176,6 +1176,7 @@ static void poll_window_credits(struct vas_window *window) { u64 val; int creds, mode; + int count = 0; val = read_hvwc_reg(window, VREG(WINCTL)); if (window->tx_win) @@ -1194,10 +1195,25 @@ static void poll_window_credits(struct vas_window *window) creds = GET_FIELD(VAS_LRX_WCRED, val); } + /* +* Takes around few microseconds to complete all pending requests +* and return credits. +* TODO: Issue CRB Kill to stop all pending requests. Need only +* if there is a bug in NX or fault handling in kernel. +*/ if (creds < window->wcreds_max) { val = 0; set_current_state(TASK_UNINTERRUPTIBLE); schedule_timeout(msecs_to_jiffies(10)); + count++; + /* +* Process can not close send window until all credits are +* returned. +*/ + if (!(count % 1)) + pr_debug("%s() pid %d stuck? retries %d\n", __func__, + vas_window_pid(window), count); + goto retry; } } @@ -1211,6 +1227,7 @@ static void poll_window_busy_state(struct vas_window *window) { int busy; u64 val; + int count = 0; retry: val = read_hvwc_reg(window, VREG(WIN_STATUS)); @@ -1219,6 +1236,15 @@ static void poll_window_busy_state(struct vas_window *window) val = 0; set_current_state(TASK_UNINTERRUPTIBLE); schedule_timeout(msecs_to_jiffies(5)); + count++; + /* +* Takes around 5 microseconds to process all pending +* requests. +*/ + if (!(count % 1)) + pr_debug("%s() pid %d stuck?
retries %d\n", __func__, + vas_window_pid(window), count); + goto retry; } } -- 1.8.3.1
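The retry-and-report pattern in poll_window_credits() can be sketched in user space roughly as below. This is an illustration under stated assumptions, not the kernel code: struct fake_window, REPORT_EVERY and the credit increment that stands in for the sleep are all hypothetical, and the reporting interval is a placeholder rather than a reading of the modulus in the diff above.

```c
#include <stdbool.h>
#include <stdio.h>

#define REPORT_EVERY 1000	/* hypothetical reporting interval */

/* Hypothetical stand-in for the credit state of a VAS window. */
struct fake_window {
	int creds;		/* credits currently returned */
	int wcreds_max;		/* credits the window was opened with */
};

static bool credits_returned(const struct fake_window *w)
{
	return w->creds >= w->wcreds_max;
}

/* Retry until all credits are back; returns the number of retries. */
static int poll_credits(struct fake_window *w)
{
	int count = 0;

	while (!credits_returned(w)) {
		/* Stand-in for sleeping while NX returns one credit. */
		w->creds++;
		count++;
		if (!(count % REPORT_EVERY))
			fprintf(stderr, "%s() stuck? retries %d\n",
				__func__, count);
	}
	return count;
}
```

The counter costs nothing on the fast path and only produces output when the close is genuinely slow, which is presumably why the patch reports periodically instead of on every retry.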
[PATCH V2 11/13] powerpc/vas: Return credits after handling fault
NX expects the OS to return a credit for the send window after processing each fault. A credit also has to be returned for the fault window itself. Signed-off-by: Sukadev Bhattiprolu Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/vas-fault.c | 10 ++ arch/powerpc/platforms/powernv/vas-window.c | 17 + arch/powerpc/platforms/powernv/vas.h| 1 + 3 files changed, 28 insertions(+) diff --git a/arch/powerpc/platforms/powernv/vas-fault.c b/arch/powerpc/platforms/powernv/vas-fault.c index cf41b65..926fdf3 100644 --- a/arch/powerpc/platforms/powernv/vas-fault.c +++ b/arch/powerpc/platforms/powernv/vas-fault.c @@ -247,6 +247,11 @@ irqreturn_t vas_fault_handler(int irq, void *data) memset(fifo, 0, CRB_SIZE); mutex_unlock(&vinst->mutex); + /* +* Return credit for the fault window. +*/ + vas_return_credit(vinst->fault_win, 0); + pr_devel("VAS[%d] fault_fifo %p, fifo %p, fault_crbs %d\n", vinst->vas_id, vinst->fault_fifo, fifo, vinst->fault_crbs); @@ -273,6 +278,11 @@ irqreturn_t vas_fault_handler(int irq, void *data) } update_csb(window, crb); + /* +* Return credit for send window after processing +* fault CRB. +*/ + vas_return_credit(window, 1); } while (true); return IRQ_HANDLED; diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c index 941582b..27848d3 100644 --- a/arch/powerpc/platforms/powernv/vas-window.c +++ b/arch/powerpc/platforms/powernv/vas-window.c @@ -1312,6 +1312,23 @@ int vas_win_close(struct vas_window *window) } EXPORT_SYMBOL_GPL(vas_win_close); +/* + * Return credit for the given window.
+ */ +void vas_return_credit(struct vas_window *window, bool tx) +{ + uint64_t val; + + val = 0ULL; + if (tx) { /* send window */ + val = SET_FIELD(VAS_TX_WCRED, val, 1); + write_hvwc_reg(window, VREG(TX_WCRED_ADDER), val); + } else { + val = SET_FIELD(VAS_LRX_WCRED, val, 1); + write_hvwc_reg(window, VREG(LRX_WCRED_ADDER), val); + } +} + struct vas_window *vas_pswid_to_window(struct vas_instance *vinst, uint32_t pswid) { diff --git a/arch/powerpc/platforms/powernv/vas.h b/arch/powerpc/platforms/powernv/vas.h index d7398b7..6332683 100644 --- a/arch/powerpc/platforms/powernv/vas.h +++ b/arch/powerpc/platforms/powernv/vas.h @@ -415,6 +415,7 @@ struct vas_winctx { extern void vas_window_free_dbgdir(struct vas_window *win); extern int vas_setup_fault_window(struct vas_instance *vinst); extern irqreturn_t vas_fault_handler(int irq, void *data); +extern void vas_return_credit(struct vas_window *window, bool tx); extern struct vas_window *vas_pswid_to_window(struct vas_instance *vinst, uint32_t pswid); -- 1.8.3.1
[PATCH V2 10/13] powerpc/vas: Do not use default credits for receive window
The system checkstops if the RxFIFO overruns with more requests than the maximum number of CRBs allowed in the FIFO at any time. So the max credits value (rxattr.wcreds_max) is set by the driver and passed to vas_rx_win_open(). Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/vas-window.c | 4 ++-- arch/powerpc/platforms/powernv/vas.h| 2 -- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c index 344db11..941582b 100644 --- a/arch/powerpc/platforms/powernv/vas-window.c +++ b/arch/powerpc/platforms/powernv/vas-window.c @@ -772,7 +772,7 @@ static bool rx_win_args_valid(enum vas_cop_type cop, if (attr->rx_fifo_size > VAS_RX_FIFO_SIZE_MAX) return false; - if (attr->wcreds_max > VAS_RX_WCREDS_MAX) + if (!attr->wcreds_max) return false; if (attr->nx_win) { @@ -878,7 +878,7 @@ struct vas_window *vas_rx_win_open(int vasid, enum vas_cop_type cop, rxwin->nx_win = rxattr->nx_win; rxwin->user_win = rxattr->user_win; rxwin->cop = cop; - rxwin->wcreds_max = rxattr->wcreds_max ?: VAS_WCREDS_DEFAULT; + rxwin->wcreds_max = rxattr->wcreds_max; init_winctx_for_rxwin(rxwin, rxattr, &winctx); init_winctx_regs(rxwin, &winctx); diff --git a/arch/powerpc/platforms/powernv/vas.h b/arch/powerpc/platforms/powernv/vas.h index cd609ce..d7398b7 100644 --- a/arch/powerpc/platforms/powernv/vas.h +++ b/arch/powerpc/platforms/powernv/vas.h @@ -101,11 +101,9 @@ /* * Initial per-process credits. * Max send window credits:4K-1 (12-bits in VAS_TX_WCRED) - * Max receive window credits: 64K-1 (16 bits in VAS_LRX_WCRED) * * TODO: Needs tuning for per-process credits */ -#define VAS_RX_WCREDS_MAX ((64 << 10) - 1) #define VAS_TX_WCREDS_MAX ((4 << 10) - 1) #define VAS_WCREDS_DEFAULT (1 << 10) -- 1.8.3.1
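As a worked example of how a driver might pick a non-zero wcreds_max, the sketch below derives it from the FIFO size the same way the fault window setup in this series does (fifo size divided by CRB_SIZE). The value CRB_SIZE = 128 bytes and the clamp to the 16-bit credit field are assumptions made for this illustration, and fifo_max_credits() is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

#define CRB_SIZE		128u	/* assumption: bytes per CRB in the FIFO */
#define RX_WCRED_FIELD_MAX	0xFFFFu	/* 16-bit receive credit field (64K-1) */

/*
 * Hypothetical helper: number of credits for a receive FIFO of the
 * given size, limited to what fits in the 16-bit credit field.
 */
static uint32_t fifo_max_credits(uint32_t fifo_size)
{
	uint32_t crbs = fifo_size / CRB_SIZE;

	return crbs > RX_WCRED_FIELD_MAX ? RX_WCRED_FIELD_MAX : crbs;
}
```

Under these assumptions a 4MB FIFO holds 32768 CRBs, while an 8MB FIFO would hold 65536 and is therefore limited by the 16-bit field.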
[PATCH V2 09/13] powerpc/vas: Print CRB and FIFO values
Dump the FIFO entry values if the send window cannot be found, and print the CRB for debugging. Signed-off-by: Sukadev Bhattiprolu Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/vas-fault.c | 41 ++ 1 file changed, 41 insertions(+) diff --git a/arch/powerpc/platforms/powernv/vas-fault.c b/arch/powerpc/platforms/powernv/vas-fault.c index 88a211b..cf41b65 100644 --- a/arch/powerpc/platforms/powernv/vas-fault.c +++ b/arch/powerpc/platforms/powernv/vas-fault.c @@ -26,6 +26,28 @@ */ #define VAS_FAULT_WIN_FIFO_SIZE (4 << 20) +static void dump_crb(struct coprocessor_request_block *crb) +{ + struct data_descriptor_entry *dde; + struct nx_fault_stamp *nx; + + dde = &crb->source; + pr_devel("SrcDDE: addr 0x%llx, len %d, count %d, idx %d, flags %d\n", + be64_to_cpu(dde->address), be32_to_cpu(dde->length), + dde->count, dde->index, dde->flags); + + dde = &crb->target; + pr_devel("TgtDDE: addr 0x%llx, len %d, count %d, idx %d, flags %d\n", + be64_to_cpu(dde->address), be32_to_cpu(dde->length), + dde->count, dde->index, dde->flags); + + nx = &crb->stamp.nx; + pr_devel("NX Stamp: PSWID 0x%x, FSA 0x%llx, flags 0x%x, FS 0x%x\n", + be32_to_cpu(nx->pswid), + be64_to_cpu(crb->stamp.nx.fault_storage_addr), + nx->flags, be32_to_cpu(nx->fault_status)); +} + static void notify_process(pid_t pid, u64 fault_addr) { int rc; @@ -154,6 +176,23 @@ static void update_csb(struct vas_window *window, } } +static void dump_fifo(struct vas_instance *vinst, void *entry) +{ + int i; + unsigned long *fifo = entry; + + pr_err("Fault fifo size %d, max crbs %d, crb size %lu\n", + vinst->fault_fifo_size, + vinst->fault_fifo_size / CRB_SIZE, + sizeof(struct coprocessor_request_block)); + + pr_err("Fault FIFO Entry Dump:\n"); + for (i = 0; i < CRB_SIZE; i += 4, fifo += 4) { + pr_err("[%.3d, %p]: 0x%.16lx 0x%.16lx 0x%.16lx 0x%.16lx\n", + i, fifo, *fifo, *(fifo+1), *(fifo+2), *(fifo+3)); + } +} + /* * Process CRBs that we receive on the fault window.
*/ @@ -212,6 +251,7 @@ irqreturn_t vas_fault_handler(int irq, void *data) vinst->vas_id, vinst->fault_fifo, fifo, vinst->fault_crbs); + dump_crb(crb); window = vas_pswid_to_window(vinst, be32_to_cpu(crb->stamp.nx.pswid)); @@ -222,6 +262,7 @@ irqreturn_t vas_fault_handler(int irq, void *data) * even clean it up (return credit). * But we should not get here. */ + dump_fifo(vinst, (void *)crb); pr_err("VAS[%d] fault_fifo %p, fifo %p, pswid 0x%x, fault_crbs %d bad CRB?\n", vinst->vas_id, vinst->fault_fifo, fifo, be32_to_cpu(crb->stamp.nx.pswid), -- 1.8.3.1
[PATCH V2 08/13] powerpc/vas: Update CSB and notify process for fault CRBs
For each fault CRB, update the fault address in the CRB (fault_storage_addr) and the translation error status in the CSB so that user space can touch the fault address and resend the request. If user space passed an invalid CSB address, send SIGSEGV to the process. Signed-off-by: Sukadev Bhattiprolu Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/vas-fault.c | 130 + 1 file changed, 130 insertions(+) diff --git a/arch/powerpc/platforms/powernv/vas-fault.c b/arch/powerpc/platforms/powernv/vas-fault.c index e1e34c6..88a211b 100644 --- a/arch/powerpc/platforms/powernv/vas-fault.c +++ b/arch/powerpc/platforms/powernv/vas-fault.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include @@ -25,6 +26,134 @@ */ #define VAS_FAULT_WIN_FIFO_SIZE (4 << 20) +static void notify_process(pid_t pid, u64 fault_addr) +{ + int rc; + struct kernel_siginfo info; + + memset(&info, 0, sizeof(info)); + + info.si_signo = SIGSEGV; + info.si_errno = EFAULT; + info.si_code = SEGV_MAPERR; + info.si_addr = (void *)fault_addr; + /* +* process will be polling on csb.flags after request is sent to +* NX. So generally CSB update should not fail except when an +* application does not follow the process properly. So an error +* message will be displayed and leave it to user space whether +* to ignore or handle this signal. +*/ + rcu_read_lock(); + rc = kill_pid_info(SIGSEGV, &info, find_vpid(pid)); + rcu_read_unlock(); + + pr_devel("%s(): pid %d kill_proc_info() rc %d\n", __func__, pid, rc); +} + +/* + * Update the CSB to indicate a translation error. + * + * If the fault is in the CSB address itself or if we are unable to + * update the CSB, send a signal to the process, because we have no + * other way of notifying the user process. + * + * Remaining settings in the CSB are based on wait_for_csb() of + * NX-GZIP.
+ */ +static void update_csb(struct vas_window *window, + struct coprocessor_request_block *crb) +{ + int rc; + pid_t pid; + int task_exit = 0; + void __user *csb_addr; + struct task_struct *tsk; + struct coprocessor_status_block csb; + + /* +* NX user space windows can not be opened for task->mm=NULL +* and faults will not be generated for kernel requests. +*/ + if (!window->mm || !window->user_win) + return; + + csb_addr = (void *)be64_to_cpu(crb->csb_addr); + + csb.cc = CSB_CC_TRANSLATION; + csb.ce = CSB_CE_TERMINATION; + csb.cs = 0; + csb.count = 0; + + /* +* Returns the fault address in CPU format since it is passed with +* signal. But if the user space expects BE format, need changes. +* i.e either kernel (here) or user should convert to CPU format. +* Not both! +*/ + csb.address = be64_to_cpu(crb->stamp.nx.fault_storage_addr); + csb.flags = 0; + + use_mm(window->mm); + rc = copy_to_user(csb_addr, , sizeof(csb)); + /* +* User space polls on csb.flags (first byte). So add barrier +* then copy first byte with csb flags update. +*/ + smp_mb(); + if (!rc) { + csb.flags = CSB_V; + rc = copy_to_user(csb_addr, , sizeof(u8)); + } + unuse_mm(window->mm); + + /* Success */ + if (!rc) + return; + + /* +* User space passed invalid CSB address, Notify process with +* SEGV signal. +*/ + tsk = get_pid_task(window->pid, PIDTYPE_PID); + /* +* Send window will be closed after processing all NX requests +* and process exits after closing all windows. In multi-thread +* applications, thread may not exists, but does not close FD +* (means send window) upon exit. Parent thread (tgid) can use +* and close the window later. +* pid and mm references are taken when window is opened by +* process (pid). So tgid is used only when child thread is not +* available in multithread tasks. 
+* +*/ + if (tsk) { + if (tsk->flags & PF_EXITING) + task_exit = 1; + put_task_struct(tsk); + pid = vas_window_pid(window); + } else { + pid = window->tgid; + + rcu_read_lock(); + tsk = find_task_by_vpid(pid); + if (!tsk) { + rcu_read_unlock(); + return; + } + if (tsk->flags & PF_EXITING) + task_exit = 1; + rcu_read_unlock(); + } + + /* Do not notify if the task is exiting. */ + if (!task_exit) { + pr_err("Invalid CSB address 0x%p signalling pid(%d)\n", +
[PATCH V2 07/13] powerpc/vas: Take reference to PID and mm for user space windows
A process closes its windows after its requests are completed. In multi-threaded applications, a child thread can open a window, but the FD release will not be called when that thread exits; the parent thread closes the window later, upon its own exit. The parent can also send NX requests with this window, and NX can generate page faults. After the kernel handles the page fault, it sends a signal to the process by PID if the CSB address is invalid. The parent thread would not receive this signal, since its PID differs from the one saved in vas_window. So use the tgid if the task for the PID saved in the window is not running, and send the signal to its parent. To prevent the PID from being reused until the window is closed, take a reference to the PID and to the task's mm. Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/vas-debug.c | 2 +- arch/powerpc/platforms/powernv/vas-window.c | 44 ++--- arch/powerpc/platforms/powernv/vas.h| 9 +- 3 files changed, 49 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/platforms/powernv/vas-debug.c b/arch/powerpc/platforms/powernv/vas-debug.c index 09e63df..ef9a717 100644 --- a/arch/powerpc/platforms/powernv/vas-debug.c +++ b/arch/powerpc/platforms/powernv/vas-debug.c @@ -38,7 +38,7 @@ static int info_show(struct seq_file *s, void *private) seq_printf(s, "Type: %s, %s\n", cop_to_str(window->cop), window->tx_win ?
"Send" : "Receive"); - seq_printf(s, "Pid : %d\n", window->pid); + seq_printf(s, "Pid : %d\n", vas_window_pid(window)); unlock: mutex_unlock(_mutex); diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c index e36c5d2..344db11 100644 --- a/arch/powerpc/platforms/powernv/vas-window.c +++ b/arch/powerpc/platforms/powernv/vas-window.c @@ -12,6 +12,8 @@ #include #include #include +#include +#include #include #include #include "vas.h" @@ -877,8 +879,6 @@ struct vas_window *vas_rx_win_open(int vasid, enum vas_cop_type cop, rxwin->user_win = rxattr->user_win; rxwin->cop = cop; rxwin->wcreds_max = rxattr->wcreds_max ?: VAS_WCREDS_DEFAULT; - if (rxattr->user_win) - rxwin->pid = task_pid_vnr(current); init_winctx_for_rxwin(rxwin, rxattr, ); init_winctx_regs(rxwin, ); @@ -1028,7 +1028,6 @@ struct vas_window *vas_tx_win_open(int vasid, enum vas_cop_type cop, txwin->tx_win = 1; txwin->rxwin = rxwin; txwin->nx_win = txwin->rxwin->nx_win; - txwin->pid = attr->pid; txwin->user_win = attr->user_win; txwin->wcreds_max = attr->wcreds_max ?: VAS_WCREDS_DEFAULT; @@ -1069,6 +1068,34 @@ struct vas_window *vas_tx_win_open(int vasid, enum vas_cop_type cop, goto free_window; } + if (txwin->user_win) { + /* +* Window opened by child thread may not be closed when +* it exits. So take reference to its pid and release it +* when the window is free by parent thread. +* Acquire a reference to the task's pid to make sure +* pid will not be re-used. +*/ + txwin->pid = get_task_pid(current, PIDTYPE_PID); + /* +* Acquire a reference to the task's mm. 
+*/ + txwin->mm = get_task_mm(current); + + if (txwin->mm) { + mmput(txwin->mm); + mmgrab(txwin->mm); + mm_context_add_copro(txwin->mm); + } else { + put_pid(txwin->pid); + pr_err("VAS: pid(%d): mm_struct is not found\n", + current->pid); + rc = -EPERM; + goto free_window; + } + txwin->tgid = task_tgid_vnr(current); + } + set_vinst_win(vinst, txwin); return txwin; @@ -1267,8 +1294,17 @@ int vas_win_close(struct vas_window *window) poll_window_castout(window); /* if send window, drop reference to matching receive window */ - if (window->tx_win) + if (window->tx_win) { + if (window->user_win) { + /* Drop references to pid and mm */ + put_pid(window->pid); + if (window->mm) { + mmdrop(window->mm); + mm_context_remove_copro(window->mm); + } + } put_rx_win(window->rxwin); + } vas_window_free(window); diff --git a/arch/powerpc/platforms/powernv/vas.h b/arch/powerpc/platforms/powernv/vas.h index 2621df1..cd609ce 100644 --- a/arch/powerpc/platforms/powernv/vas.h +++ b/arch/powerpc/platforms/powernv/vas.h @@ -340,7 +340,9 @@ struct vas_window { bool user_win; /* True if user space window */
[PATCH V2 06/13] powerpc/vas: Register NX with fault window ID and IRQ port value
For each user space send window, register NX with the fault window ID and port value so that NX pastes CRBs into this fault FIFO when it sees a fault on the request buffer. Signed-off-by: Sukadev Bhattiprolu Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/vas-window.c | 15 +-- arch/powerpc/platforms/powernv/vas.h| 15 +++ 2 files changed, 28 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c index cec1b41..e36c5d2 100644 --- a/arch/powerpc/platforms/powernv/vas-window.c +++ b/arch/powerpc/platforms/powernv/vas-window.c @@ -373,7 +373,7 @@ int init_winctx_regs(struct vas_window *window, struct vas_winctx *winctx) init_xlate_regs(window, winctx->user_win); val = 0ULL; - val = SET_FIELD(VAS_FAULT_TX_WIN, val, 0); + val = SET_FIELD(VAS_FAULT_TX_WIN, val, winctx->fault_win_id); write_hvwc_reg(window, VREG(FAULT_TX_WIN), val); /* In PowerNV, interrupts go to HV. */ @@ -748,6 +748,8 @@ static void init_winctx_for_rxwin(struct vas_window *rxwin, winctx->min_scope = VAS_SCOPE_LOCAL; winctx->max_scope = VAS_SCOPE_VECTORED_GROUP; + if (rxwin->vinst->virq) + winctx->irq_port = rxwin->vinst->irq_port; } static bool rx_win_args_valid(enum vas_cop_type cop, @@ -945,13 +947,22 @@ static void init_winctx_for_txwin(struct vas_window *txwin, winctx->lpid = txattr->lpid; winctx->pidr = txattr->pidr; winctx->rx_win_id = txwin->rxwin->winid; + /* +* IRQ and fault window setup is successful. Set fault window +* for the send window so that ready to handle faults. +*/ + if (txwin->vinst->virq) + winctx->fault_win_id = txwin->vinst->fault_win->winid; winctx->dma_type = VAS_DMA_TYPE_INJECT; winctx->tc_mode = txattr->tc_mode; winctx->min_scope = VAS_SCOPE_LOCAL; winctx->max_scope = VAS_SCOPE_VECTORED_GROUP; + if (txwin->vinst->virq) + winctx->irq_port = txwin->vinst->irq_port; - winctx->pswid = 0; + winctx->pswid = txattr->pswid ?
txattr->pswid : + encode_pswid(txwin->vinst->vas_id, txwin->winid); } static bool tx_win_args_valid(enum vas_cop_type cop, diff --git a/arch/powerpc/platforms/powernv/vas.h b/arch/powerpc/platforms/powernv/vas.h index 879f5b4..2621df1 100644 --- a/arch/powerpc/platforms/powernv/vas.h +++ b/arch/powerpc/platforms/powernv/vas.h @@ -455,6 +455,21 @@ static inline u64 read_hvwc_reg(struct vas_window *win, return in_be64(win->hvwc_map+reg); } +/* + * Encode/decode the Partition Send Window ID (PSWID) for a window in + * a way that we can uniquely identify any window in the system. i.e. + * we should be able to locate the 'struct vas_window' given the PSWID. + * + * BitsUsage + * 0:7 VAS id (8 bits) + * 8:15Unused, 0 (3 bits) + * 16:31 Window id (16 bits) + */ +static inline u32 encode_pswid(int vasid, int winid) +{ + return ((u32)winid | (vasid << (31 - 7))); +} + static inline void decode_pswid(u32 pswid, int *vasid, int *winid) { if (vasid) -- 1.8.3.1
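The PSWID layout described in the comment above can be checked with a small user-space round trip. This is a sketch under assumptions: the kernel u32 type is replaced by uint32_t, the body of decode_pswid() is a guess at the inverse of encode_pswid() (the diff shows only its first lines), and unlike the patch the vasid is cast to unsigned before shifting.

```c
#include <stdint.h>

/* VAS id in the top 8 bits (IBM bits 0:7), window id in the low 16. */
static inline uint32_t encode_pswid(int vasid, int winid)
{
	return (uint32_t)winid | ((uint32_t)vasid << (31 - 7));
}

/* Assumed inverse of encode_pswid(); either output pointer may be NULL. */
static inline void decode_pswid(uint32_t pswid, int *vasid, int *winid)
{
	if (vasid)
		*vasid = (pswid >> (31 - 7)) & 0xFF;
	if (winid)
		*winid = pswid & 0xFFFF;
}
```

For example, VAS id 3 and window id 0x1234 would encode to 0x03001234, and decoding that value gives back both fields, which is the property the fault handler relies on to find the struct vas_window for a faulting CRB.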
[PATCH V2 05/13] powerpc/vas: Setup thread IRQ handler per VAS instance
Setup thread IRQ handler per each VAS instance. When NX sees a fault on CRB, kernel gets an interrupt and vas_fault_handler will be executed to process fault CRBs. Read all valid CRBs from fault FIFO, determine the corresponding send window from CRB and process fault requests. Signed-off-by: Sukadev Bhattiprolu Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/vas-fault.c | 83 + arch/powerpc/platforms/powernv/vas-window.c | 60 + arch/powerpc/platforms/powernv/vas.c| 15 +- arch/powerpc/platforms/powernv/vas.h| 4 ++ 4 files changed, 161 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/vas-fault.c b/arch/powerpc/platforms/powernv/vas-fault.c index b0258ed..e1e34c6 100644 --- a/arch/powerpc/platforms/powernv/vas-fault.c +++ b/arch/powerpc/platforms/powernv/vas-fault.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include "vas.h" @@ -25,6 +26,88 @@ #define VAS_FAULT_WIN_FIFO_SIZE(4 << 20) /* + * Process CRBs that we receive on the fault window. + */ +irqreturn_t vas_fault_handler(int irq, void *data) +{ + struct vas_instance *vinst = (struct vas_instance *)data; + struct coprocessor_request_block buf, *crb; + struct vas_window *window; + void *fifo; + + /* +* VAS can interrupt with multiple page faults. So process all +* valid CRBs within fault FIFO until reaches invalid CRB. +* NX updates nx_fault_stamp in CRB and pastes in fault FIFO. +* kernel retrives send window from parition send window ID +* (pswid) in nx_fault_stamp. So pswid should be non-zero and +* use this to check whether CRB is valid. +* After reading CRB entry, it is reset with 0's in fault FIFO. +* +* In case kernel receives another interrupt with different page +* fault and CRBs are processed by the previous handling, will be +* returned from this function when it sees invalid CRB (means 0's). +*/ + do { + mutex_lock(>mutex); + + /* +* Advance the fault fifo pointer to next CRB. 
+* Use CRB_SIZE rather than sizeof(*crb) since the latter is +* aligned to CRB_ALIGN (256) but the CRB written to by VAS is +* only CRB_SIZE in len. +*/ + fifo = vinst->fault_fifo + (vinst->fault_crbs * CRB_SIZE); + crb = (struct coprocessor_request_block *)fifo; + + /* +* pswid returned from NX will be in _be32, but just +* checking non-zero value to make sure the CRB is valid. +* Return if reached invalid CRB. +*/ + if (!crb->stamp.nx.pswid) { + mutex_unlock(>mutex); + return IRQ_HANDLED; + } + + vinst->fault_crbs++; + if (vinst->fault_crbs == vinst->fault_fifo_size/CRB_SIZE) + vinst->fault_crbs = 0; + + crb = + memcpy(crb, fifo, CRB_SIZE); + memset(fifo, 0, CRB_SIZE); + mutex_unlock(>mutex); + + pr_devel("VAS[%d] fault_fifo %p, fifo %p, fault_crbs %d\n", + vinst->vas_id, vinst->fault_fifo, fifo, + vinst->fault_crbs); + + window = vas_pswid_to_window(vinst, + be32_to_cpu(crb->stamp.nx.pswid)); + + if (IS_ERR(window)) { + /* +* We got an interrupt about a specific send +* window but we can't find that window and we can't +* even clean it up (return credit). +* But we should not get here. +*/ + pr_err("VAS[%d] fault_fifo %p, fifo %p, pswid 0x%x, fault_crbs %d bad CRB?\n", + vinst->vas_id, vinst->fault_fifo, fifo, + be32_to_cpu(crb->stamp.nx.pswid), + vinst->fault_crbs); + + WARN_ON_ONCE(1); + return IRQ_HANDLED; + } + + } while (true); + + return IRQ_HANDLED; +} + +/* * Fault window is opened per VAS instance. NX pastes fault CRB in fault * FIFO upon page faults. */ diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c index f07f49a..cec1b41 100644 --- a/arch/powerpc/platforms/powernv/vas-window.c +++ b/arch/powerpc/platforms/powernv/vas-window.c @@ -1041,6 +1041,15 @@ struct vas_window *vas_tx_win_open(int vasid, enum vas_cop_type cop, } } else { /* +* Interrupt hanlder or fault window setup failed. Means +* NX can not
[PATCH V2 04/13] powerpc/vas: Setup fault window per VAS instance
Set up a fault window for each VAS instance. When NX gets a fault on a request buffer, it writes the fault CRBs into the corresponding fault FIFO and then sends an interrupt to the OS. Signed-off-by: Sukadev Bhattiprolu Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/Makefile | 2 +- arch/powerpc/platforms/powernv/vas-fault.c | 73 + arch/powerpc/platforms/powernv/vas-window.c | 3 +- arch/powerpc/platforms/powernv/vas.c| 24 ++ arch/powerpc/platforms/powernv/vas.h| 5 ++ 5 files changed, 105 insertions(+), 2 deletions(-) create mode 100644 arch/powerpc/platforms/powernv/vas-fault.c diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index a3ac964..74c2246 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -17,6 +17,6 @@ obj-$(CONFIG_MEMORY_FAILURE) += opal-memory-errors.o obj-$(CONFIG_OPAL_PRD) += opal-prd.o obj-$(CONFIG_PERF_EVENTS) += opal-imc.o obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o -obj-$(CONFIG_PPC_VAS) += vas.o vas-window.o vas-debug.o +obj-$(CONFIG_PPC_VAS) += vas.o vas-window.o vas-debug.o vas-fault.o obj-$(CONFIG_OCXL_BASE)+= ocxl.o obj-$(CONFIG_SCOM_DEBUGFS) += opal-xscom.o diff --git a/arch/powerpc/platforms/powernv/vas-fault.c b/arch/powerpc/platforms/powernv/vas-fault.c new file mode 100644 index 000..b0258ed --- /dev/null +++ b/arch/powerpc/platforms/powernv/vas-fault.c @@ -0,0 +1,73 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * VAS Fault handling. + * Copyright 2019, IBM Corporation + */ + +#define pr_fmt(fmt) "vas: " fmt + +#include +#include +#include +#include +#include +#include + +#include "vas.h" + +/* + * The maximum FIFO size for fault window can be 8MB + * (VAS_RX_FIFO_SIZE_MAX). Using 4MB FIFO since each VAS + * instance will be having fault window. + * 8MB FIFO can be used if expects more faults for each VAS + * instance. + */ +#define VAS_FAULT_WIN_FIFO_SIZE (4 << 20) + +/* + * Fault window is opened per VAS instance.
NX pastes fault CRB in fault + * FIFO upon page faults. + */ +int vas_setup_fault_window(struct vas_instance *vinst) +{ + struct vas_rx_win_attr attr; + + vinst->fault_fifo_size = VAS_FAULT_WIN_FIFO_SIZE; + vinst->fault_fifo = kzalloc(vinst->fault_fifo_size, GFP_KERNEL); + if (!vinst->fault_fifo) { + pr_err("Unable to alloc %d bytes for fault_fifo\n", + vinst->fault_fifo_size); + return -ENOMEM; + } + + vas_init_rx_win_attr(, VAS_COP_TYPE_FAULT); + + attr.rx_fifo_size = vinst->fault_fifo_size; + attr.rx_fifo = vinst->fault_fifo; + + /* +* Max creds is based on number of CRBs can fit in the FIFO. +* (fault_fifo_size/CRB_SIZE). If 8MB FIFO is used, max creds +* will be 0x since the receive creds field is 16bits wide. +*/ + attr.wcreds_max = vinst->fault_fifo_size / CRB_SIZE; + attr.lnotify_lpid = 0; + attr.lnotify_pid = mfspr(SPRN_PID); + attr.lnotify_tid = mfspr(SPRN_PID); + + vinst->fault_win = vas_rx_win_open(vinst->vas_id, VAS_COP_TYPE_FAULT, + ); + + if (IS_ERR(vinst->fault_win)) { + pr_err("VAS: Error %ld opening FaultWin\n", + PTR_ERR(vinst->fault_win)); + kfree(vinst->fault_fifo); + return PTR_ERR(vinst->fault_win); + } + + pr_devel("VAS: Created FaultWin %d, LPID/PID/TID [%d/%d/%d]\n", + vinst->fault_win->winid, attr.lnotify_lpid, + attr.lnotify_pid, attr.lnotify_tid); + + return 0; +} diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c index 0c0d27d..f07f49a 100644 --- a/arch/powerpc/platforms/powernv/vas-window.c +++ b/arch/powerpc/platforms/powernv/vas-window.c @@ -827,9 +827,10 @@ void vas_init_rx_win_attr(struct vas_rx_win_attr *rxattr, enum vas_cop_type cop) rxattr->fault_win = true; rxattr->notify_disable = true; rxattr->rx_wcred_mode = true; - rxattr->tx_wcred_mode = true; rxattr->rx_win_ord_mode = true; rxattr->tx_win_ord_mode = true; + rxattr->rej_no_credit = true; + rxattr->tc_mode = VAS_THRESH_DISABLED; } else if (cop == VAS_COP_TYPE_FTW) { rxattr->user_win = true; rxattr->intr_disable = true; 
diff --git a/arch/powerpc/platforms/powernv/vas.c b/arch/powerpc/platforms/powernv/vas.c index 40d8213..ec34c06 100644 --- a/arch/powerpc/platforms/powernv/vas.c +++ b/arch/powerpc/platforms/powernv/vas.c @@ -23,6 +23,15 @@ static DEFINE_PER_CPU(int, cpu_vas_id); +static int vas_irq_fault_window_setup(struct vas_instance *vinst) +{ + int rc = 0; + + rc = vas_setup_fault_window(vinst); +
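The credit arithmetic described in the vas-fault.c comment can be checked with a small user-space sketch. It assumes a CRB size of 128 bytes and a 16-bit receive-credit field; both constants and the helper name `fault_win_max_creds` are illustrative stand-ins, not taken from the patch. The clamp is also an addition for illustration, since the patch itself only divides the FIFO size by CRB_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one CRB occupies 128 bytes; the kernel header is authoritative. */
#define SKETCH_CRB_SIZE 0x80u
/* Assumption: the receive-credit field is 16 bits wide, per the patch comment. */
#define SKETCH_RX_WCRED_MAX 0xffffu

/* A 4MB FIFO must fit within the credit field without clamping. */
static_assert((4u << 20) / SKETCH_CRB_SIZE <= SKETCH_RX_WCRED_MAX,
	      "4MB FIFO credits fit in 16 bits");

/*
 * Number of CRBs that fit in a fault FIFO of fifo_size bytes, clamped
 * to the largest value the credit field can hold.
 */
static uint32_t fault_win_max_creds(uint32_t fifo_size)
{
	uint32_t creds = fifo_size / SKETCH_CRB_SIZE;

	return creds > SKETCH_RX_WCRED_MAX ? SKETCH_RX_WCRED_MAX : creds;
}
```

Under these assumptions a 4MB FIFO yields 32768 credits, while an 8MB FIFO would compute 65536 and has to be limited to what 16 bits can represent.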
[PATCH V2 03/13] powerpc/vas: Read interrupts and vas-port device tree properties
Read the interrupts and vas-port device tree properties for each VAS
instance. NX generates an interrupt when it sees a page fault on the request
buffer. The interrupts property is used to set up the IRQ for handling the
fault, and the vas-port property provides the port value that is set for each
user space send window.

Signed-off-by: Haren Myneni
---
 arch/powerpc/platforms/powernv/vas.c | 40
 arch/powerpc/platforms/powernv/vas.h |  2 ++
 2 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas.c b/arch/powerpc/platforms/powernv/vas.c
index ed9cc6d..40d8213 100644
--- a/arch/powerpc/platforms/powernv/vas.c
+++ b/arch/powerpc/platforms/powernv/vas.c
@@ -25,10 +25,11 @@
 
 static int init_vas_instance(struct platform_device *pdev)
 {
-	int rc, cpu, vasid;
-	struct resource *res;
-	struct vas_instance *vinst;
 	struct device_node *dn = pdev->dev.of_node;
+	int rc, cpu, vasid, nresources = 5;
+	struct vas_instance *vinst;
+	struct resource *res;
+	uint64_t port;
 
 	rc = of_property_read_u32(dn, "ibm,vas-id", );
 	if (rc) {
@@ -36,7 +37,18 @@ static int init_vas_instance(struct platform_device *pdev)
 		return -ENODEV;
 	}
 
-	if (pdev->num_resources != 4) {
+	rc = of_property_read_u64(dn, "ibm,vas-port", );
+	if (rc) {
+		pr_err("No ibm,vas-port property for %s?\n", pdev->name);
+		/* No interrupts property */
+		nresources = 4;
+	}
+
+	/*
+	 * interrupts property is available with 'ibm,vas-port' property.
+	 * 4 Resources and 1 IRQ if interrupts property is available.
+	 */
+	if (pdev->num_resources != nresources) {
 		pr_err("Unexpected DT configuration for [%s, %d]\n",
 				pdev->name, vasid);
 		return -ENODEV;
@@ -51,6 +63,7 @@ static int init_vas_instance(struct platform_device *pdev)
 	mutex_init(>mutex);
 	vinst->vas_id = vasid;
 	vinst->pdev = pdev;
+	vinst->irq_port = port;
 
 	res = >resource[0];
 	vinst->hvwc_bar_start = res->start;
@@ -66,12 +79,23 @@ static int init_vas_instance(struct platform_device *pdev)
 		pr_err("Bad 'paste_win_id_shift' in DT, %llx\n", res->end);
 		goto free_vinst;
 	}
-
 	vinst->paste_win_id_shift = 63 - res->end;
 
-	pr_devel("Initialized instance [%s, %d], paste_base 0x%llx, "
-			"paste_win_id_shift 0x%llx\n", pdev->name, vasid,
-			vinst->paste_base_addr, vinst->paste_win_id_shift);
+	/* interrupts property */
+	if (pdev->num_resources == 5) {
+		res = >resource[4];
+		vinst->virq = res->start;
+		if (vinst->virq <= 0) {
+			pr_err("IRQ resource is not available for [%s, %d]\n",
+				pdev->name, vasid);
+			vinst->virq = 0;
+		}
+	}
+
+	pr_devel("Initialized instance [%s, %d] paste_base 0x%llx paste_win_id_shift 0x%llx IRQ %d Port 0x%llx\n",
+			pdev->name, vasid, vinst->paste_base_addr,
+			vinst->paste_win_id_shift, vinst->virq,
+			vinst->irq_port);
 
 	for_each_possible_cpu(cpu) {
 		if (cpu_to_chip_id(cpu) == of_get_ibm_chip_id(dn))
diff --git a/arch/powerpc/platforms/powernv/vas.h b/arch/powerpc/platforms/powernv/vas.h
index 5574aec..598608b 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -313,6 +313,8 @@ struct vas_instance {
 	u64 paste_base_addr;
 	u64 paste_win_id_shift;
 
+	u64 irq_port;
+	int virq;
 	struct mutex mutex;
 	struct vas_window *rxwin[VAS_COP_TYPE_MAX];
 	struct vas_window *windows[VAS_WINDOWS_PER_CHIP];
--
1.8.3.1
[PATCH V2 02/13] powerpc/vas: Define nx_fault_stamp in coprocessor_request_block
The kernel sets the fault address and status in the CRB for an NX page fault
on a user space address after processing the page fault. User space gets the
signal and handles the fault described in the CRB by bringing the page into
memory and sending the NX request again.

Signed-off-by: Sukadev Bhattiprolu
Signed-off-by: Haren Myneni
---
 arch/powerpc/include/asm/icswx.h | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/icswx.h b/arch/powerpc/include/asm/icswx.h
index 9872f85..b233d1e 100644
--- a/arch/powerpc/include/asm/icswx.h
+++ b/arch/powerpc/include/asm/icswx.h
@@ -108,6 +108,17 @@ struct data_descriptor_entry {
 	__be64 address;
 } __packed __aligned(DDE_ALIGN);
 
+/* 4.3.2 NX-stamped Fault CRB */
+
+#define NX_STAMP_ALIGN	(0x10)
+
+struct nx_fault_stamp {
+	__be64 fault_storage_addr;
+	__be16 reserved;
+	__u8   flags;
+	__u8   fault_status;
+	__be32 pswid;
+} __packed __aligned(NX_STAMP_ALIGN);
 
 /* Chapter 6.5.2 Coprocessor-Request Block (CRB) */
 
@@ -135,7 +146,12 @@ struct coprocessor_request_block {
 
 	struct coprocessor_completion_block ccb;
 
-	u8 reserved[48];
+	union {
+		struct nx_fault_stamp nx;
+		u8 reserved[16];
+	} stamp;
+
+	u8 reserved[32];
 
 	struct coprocessor_status_block csb;
 } __packed __aligned(CRB_ALIGN);
--
1.8.3.1
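The layout change in this patch keeps the CRB size unchanged: the 16-byte stamp union plus the remaining 32 reserved bytes replace the old 48-byte reserved area. A minimal user-space sketch of that arithmetic follows. The `_sketch` type names and the fixed-width stand-ins for `__be64`, `__be16` and `__be32` are assumptions for illustration, and the attribute syntax assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NX_STAMP_ALIGN_SKETCH 0x10

/* User-space mirror of the fields listed in the patch (endianness ignored). */
struct nx_fault_stamp_sketch {
	uint64_t fault_storage_addr;	/* faulting address */
	uint16_t reserved;
	uint8_t  flags;
	uint8_t  fault_status;
	uint32_t pswid;			/* send window ID */
} __attribute__((packed, aligned(NX_STAMP_ALIGN_SKETCH)));

/* Mirror of the union that replaces the first 16 reserved bytes of the CRB. */
union crb_stamp_sketch {
	struct nx_fault_stamp_sketch nx;
	uint8_t reserved[16];
};

static_assert(sizeof(struct nx_fault_stamp_sketch) == 16,
	      "stamp is 8 + 2 + 1 + 1 + 4 bytes");
static_assert(offsetof(struct nx_fault_stamp_sketch, pswid) == 12,
	      "pswid occupies the last four bytes");

/* Bytes covered by the stamp union plus the trailing reserved[32] array. */
static size_t crb_stamp_region_bytes(void)
{
	return sizeof(union crb_stamp_sketch) + 32;
}
```

If either static assertion failed to compile, the stamp would no longer overlay the reserved area exactly and the offset of the CSB within the CRB would shift.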
[PATCH V2 01/13] powerpc/vas: Describe vas-port and interrupts properties
Signed-off-by: Haren Myneni
---
 Documentation/devicetree/bindings/powerpc/ibm,vas.txt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/powerpc/ibm,vas.txt b/Documentation/devicetree/bindings/powerpc/ibm,vas.txt
index bf11d2f..12de08b 100644
--- a/Documentation/devicetree/bindings/powerpc/ibm,vas.txt
+++ b/Documentation/devicetree/bindings/powerpc/ibm,vas.txt
@@ -11,6 +11,8 @@ Required properties:
   window context start and length, OS/User window context start and length,
   "Paste address" start and length, "Paste window id" start bit and number
   of bits)
+- ibm,vas-port : Port address for the interrupt.
+- interrupts: IRQ value for each VAS instance and level.
 
 Example:
 
@@ -18,5 +20,8 @@ Example:
 		compatible = "ibm,vas", "ibm,power9-vas";
 		reg = <0x60191 0x200 0x60190 0x1 0x8 0x1 0x20 0x10>;
 		name = "vas";
+		interrupts = <0x1f 0>;
+		interrupt-parent = <>;
 		ibm,vas-id = <0x1>;
+		ibm,vas-port = <0x601000100>;
 	};
--
1.8.3.1
[PATCH V2 00/13] powerpc/vas: Page fault handling for user space NX requests
Applications send compression / decompression requests to NX with COPY/PASTE
instructions. While NX is processing these requests, it can hit a fault on
the request buffer (page not in memory). It then issues an interrupt and
pastes a fault CRB into the fault FIFO. NX expects the kernel to handle this
fault and to return credits for both the send and fault windows after
processing.

This patch series adds IRQ and fault window setup, and NX fault handling:
- Read the IRQ# from the "interrupts" property and configure the IRQ per VAS
  instance.
- Set the port# for each window so that an interrupt is generated when a
  fault is noticed.
- Set up the fault window and the FIFO into which NX pastes fault CRBs.
- Set up a threaded IRQ fault handler per VAS instance.
- When an interrupt is received, read CRBs from the fault FIFO and update
  the coprocessor_status_block (CSB) in the corresponding CRB with
  translation failure (CSB_CC_TRANSLATION). After issuing NX requests, the
  process polls on the CSB address. When it sees a translation error, it can
  touch the request buffer to bring the page into memory and reissue the NX
  request.
- If copy_to_user fails on the user space CSB address, the OS sends a SEGV
  signal.

These patches were tested with NX-GZIP support; that series will be posted
soon.

Patch 2: Define nx_fault_stamp, in which NX writes the fault status for the
         fault CRB.
Patch 3: Read interrupts and port properties per VAS instance.
Patch 4: Set up a fault window per VAS instance. This window is used by NX
         to paste fault CRBs into the FIFO.
Patches 5 & 6: Set up a threaded IRQ per VAS instance and register NX with
         the fault window ID and port number for each send window, so that
         NX pastes fault CRBs into this window.
Patch 7: Take a reference to the pid and mm so that the pid is not reused
         until the window is closed. Needed for multi-threaded applications
         where a child can open a window that is later used by the parent.
Patches 8 and 9: Process CRBs from the fault FIFO and notify tasks by
         updating the CSB or through signals.
Patches 10 and 11: Return credits for send and fault windows after handling
         faults.
Patch 13: Fix closing the send window after all credits are returned. This
         issue happens only for user space requests. There are no page
         faults on kernel request buffers.

Changelog:
V2:
  - Use threaded IRQ instead of own kernel thread handler
  - Use pswid instead of user space CSB address to find valid CRB
  - Removed unused macros and other changes as suggested by Christoph Hellwig

Haren Myneni (13):
  powerpc/vas: Describe vas-port and interrupts properties
  powerpc/vas: Define nx_fault_stamp in coprocessor_request_block
  powerpc/vas: Read interrupts and vas-port device tree properties
  powerpc/vas: Setup fault window per VAS instance
  powerpc/vas: Setup thread IRQ handler per VAS instance
  powerpc/vas: Register NX with fault window ID and IRQ port value
  powerpc/vas: Take reference to PID and mm for user space windows
  powerpc/vas: Update CSB and notify process for fault CRBs
  powerpc/vas: Print CRB and FIFO values
  powerpc/vas: Do not use default credits for receive window
  powerpc/VAS: Return credits after handling fault
  powerpc/vas: Display process stuck message
  powerpc/vas: Free send window in VAS instance after credits returned

 .../devicetree/bindings/powerpc/ibm,vas.txt |   5 +
 arch/powerpc/include/asm/icswx.h            |  18 +-
 arch/powerpc/platforms/powernv/Makefile     |   2 +-
 arch/powerpc/platforms/powernv/vas-debug.c  |   2 +-
 arch/powerpc/platforms/powernv/vas-fault.c  | 337 +
 arch/powerpc/platforms/powernv/vas-window.c | 173 ++-
 arch/powerpc/platforms/powernv/vas.c        |  77 -
 arch/powerpc/platforms/powernv/vas.h        |  38 ++-
 8 files changed, 627 insertions(+), 25 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/vas-fault.c

--
1.8.3.1
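The user space side of the flow described in this cover letter (poll the CSB, touch the request buffer on a translation error, reissue the request) could look roughly like the sketch below. Everything in it is hypothetical: the `SKETCH_CC_*` codes, the `nx_submit_fn` callback type and the fake submit function stand in for the real paste and CSB polling sequence, which this series does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical completion codes; the real CSB values live in the kernel headers. */
enum { SKETCH_CC_OK = 0, SKETCH_CC_TRANSLATION = 1 };

/* Hypothetical callback: issue one request and return its completion code. */
typedef int (*nx_submit_fn)(void *ctx, unsigned char *buf, size_t len);

/* Read one byte per page (and the last byte) so the pages get faulted in. */
static void touch_pages(const unsigned char *buf, size_t len, size_t page_size)
{
	const volatile unsigned char *p = buf;

	if (page_size == 0)
		return;
	for (size_t off = 0; off < len; off += page_size)
		(void)p[off];
	if (len > 0)
		(void)p[len - 1];
}

/* Reissue the request after a translation error, up to max_tries attempts. */
static int submit_with_retry(nx_submit_fn submit, void *ctx,
			     unsigned char *buf, size_t len,
			     size_t page_size, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		int cc = submit(ctx, buf, len);

		if (cc != SKETCH_CC_TRANSLATION)
			return cc;
		touch_pages(buf, len, page_size);
	}
	return SKETCH_CC_TRANSLATION;
}

/* Fake engine for demonstration: reports a fault while *ctx is positive. */
static int fake_nx_submit(void *ctx, unsigned char *buf, size_t len)
{
	int *faults_left = ctx;

	(void)buf;
	(void)len;
	if (*faults_left > 0) {
		(*faults_left)--;
		return SKETCH_CC_TRANSLATION;
	}
	return SKETCH_CC_OK;
}
```

Bounding the number of retries is a design choice of this sketch, so that a buffer which can never be made resident does not loop forever.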
Re: [bug] userspace hitting sporadic SIGBUS on xfs (Power9, ppc64le), v4.19 and later
On 12/6/19 6:09 PM, dftxbs3e wrote:
> Hello!
>
> I am very happy that someone has found this issue.
>
> I have been suffering from rather random SIGBUS errors in similar
> conditions described by the author.
>
> I don't have much troubleshooting information to provide, however, I hit
> the issue regularly so I could investigate during that.
>
> How do you debug such an issue? I tried a debugger etc. but besides
> crashing with SIGBUS, I couldnt get any other meaningful information.

You may want to test the patch Christoph sent on the original thread for
this issue.

-Eric
[tip: sched/urgent] sched/rt, powerpc: Use CONFIG_PREEMPTION
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     fdc5569eaba997852e0bfb57d11af496e4c1fa9a
Gitweb:        https://git.kernel.org/tip/fdc5569eaba997852e0bfb57d11af496e4c1fa9a
Author:        Thomas Gleixner
AuthorDate:    Thu, 24 Oct 2019 18:04:58 +02:00
Committer:     Ingo Molnar
CommitterDate: Sun, 08 Dec 2019 14:37:32 +01:00

sched/rt, powerpc: Use CONFIG_PREEMPTION

CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.

Switch the entry code over to use CONFIG_PREEMPTION.

[bigeasy: +Kconfig]

Signed-off-by: Thomas Gleixner
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Acked-by: Michael Ellerman
Cc: Christophe Leroy
Cc: Linus Torvalds
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20191024160458.vlnf3wlcyjl2i...@linutronix.de
Signed-off-by: Ingo Molnar
---
 arch/powerpc/Kconfig           | 2 +-
 arch/powerpc/kernel/entry_32.S | 4 ++--
 arch/powerpc/kernel/entry_64.S | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e446bb5..c781170 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -106,7 +106,7 @@ config LOCKDEP_SUPPORT
 config GENERIC_LOCKBREAK
 	bool
 	default y
-	depends on SMP && PREEMPT
+	depends on SMP && PREEMPTION
 
 config GENERIC_HWEIGHT
 	bool
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index d60908e..e1a4c39 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -897,7 +897,7 @@ resume_kernel:
 	bne-	0b
 1:
 
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
 	/* check current_thread_info->preempt_count */
 	lwz	r0,TI_PREEMPT(r2)
 	cmpwi	0,r0,0		/* if non-zero, just restore regs and return */
@@ -921,7 +921,7 @@ resume_kernel:
 	 */
 	bl	trace_hardirqs_on
 #endif
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
 
 restore_kuap:
 	kuap_restore r1, r2, r9, r10, r0
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 3fd3ef3..a9a1d3c 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -846,7 +846,7 @@ resume_kernel:
 	bne-	0b
 1:
 
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
 	/* Check if we need to preempt */
 	andi.	r0,r4,_TIF_NEED_RESCHED
 	beq+	restore
@@ -877,7 +877,7 @@ resume_kernel:
 	li	r10,MSR_RI
 	mtmsrd	r10,1		  /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
 
 	.globl	fast_exc_return_irq
 fast_exc_return_irq:
[PATCH] powerpc/irq: don't use current_stack_pointer() in do_IRQ()
Before commit 7306e83ccf5c ("powerpc: Don't use CURRENT_THREAD_INFO to find
the stack"), the current stack base address was obtained by calling
current_thread_info(). That inline function simply masked the value of r1.

In that commit, it was changed to use current_stack_pointer(), which is a
heavier function: it is an out-of-line assembly function which cannot be
inlined and which reads the content of the stack at 0(r1).

Revert to just getting r1 and masking its value to obtain the base address
of the stack, as before.

Signed-off-by: Christophe Leroy
Fixes: 7306e83ccf5c ("powerpc: Don't use CURRENT_THREAD_INFO to find the stack")
---
 arch/powerpc/kernel/irq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 240eca12c71d..bb34005ff9d2 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -693,10 +693,11 @@ void __do_irq(struct pt_regs *regs)
 void do_IRQ(struct pt_regs *regs)
 {
 	struct pt_regs *old_regs = set_irq_regs(regs);
+	register unsigned long r1 asm("r1");
 	void *cursp, *irqsp, *sirqsp;
 
 	/* Switch to the irq stack to handle this */
-	cursp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
+	cursp = (void *)(r1 & ~(THREAD_SIZE - 1));
 	irqsp = hardirq_ctx[raw_smp_processor_id()];
 	sirqsp = softirq_ctx[raw_smp_processor_id()];
 
--
2.13.3
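The masking step that the patch restores can be illustrated in plain C. This is a sketch under stated assumptions: `SKETCH_THREAD_SIZE` is set to 16KB purely for the example (the real THREAD_SIZE depends on the kernel configuration), and `stack_base` is a hypothetical helper, not a kernel function. It relies on stacks being THREAD_SIZE-aligned, so that clearing the low bits of any address inside a stack yields its base.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for the example only: 16KB kernel stacks. */
#define SKETCH_THREAD_SIZE (16u * 1024u)

/* The mask trick only works when the stack size is a power of two. */
static_assert((SKETCH_THREAD_SIZE & (SKETCH_THREAD_SIZE - 1)) == 0,
	      "stack size must be a power of two");

/* Clear the low bits of a stack pointer to get the base of its stack. */
static uintptr_t stack_base(uintptr_t sp)
{
	return sp & ~((uintptr_t)SKETCH_THREAD_SIZE - 1);
}
```

Comparing the masked value with the hardirq and softirq stack bases is then a pair of pointer comparisons, which is why reading r1 directly is enough here.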