Re: [PATCH] powerpc: ensure that swiotlb buffer is allocated from low memory

2019-12-08 Thread Mike Rapoport
On Mon, Dec 09, 2019 at 04:43:17PM +1100, Michael Ellerman wrote:
> Mike Rapoport  writes:
> > From: Mike Rapoport 
> >
> > Some powerpc platforms (e.g. 85xx) limit DMA-able memory way below 4G. If a
> > system has more physical memory than this limit, the swiotlb buffer is not
> > addressable because it is allocated from memblock using top-down mode.
> >
> > Force memblock to bottom-up mode before calling swiotlb_init() to ensure
> > that the swiotlb buffer is DMA-able.
> >
> > Link: 
> > https://lkml.kernel.org/r/f1ebb706-73df-430e-9020-c214ec8ed...@xenosoft.de
> 
> This wasn't bisected, but I thought it was a regression. Do we know what
> commit caused it?
> 
> Was it 25078dc1f74b ("powerpc: use mm zones more sensibly") ?

swiotlb buffer is initialized before zones are actually used, so probably
not :)
 
> Or was that a red herring?
> 
> cheers
> 
> > Reported-by: Christian Zigotzky 
> > Signed-off-by: Mike Rapoport 
> > Cc: Benjamin Herrenschmidt 
> > Cc: Christoph Hellwig 
> > Cc: Darren Stevens 
> > Cc: mad skateman 
> > Cc: Michael Ellerman 
> > Cc: Nicolas Saenz Julienne 
> > Cc: Paul Mackerras 
> > Cc: Robin Murphy 
> > Cc: Rob Herring 
> > ---
> >  arch/powerpc/mm/mem.c | 8 
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> > index be941d382c8d..14c2c53e3f9e 100644
> > --- a/arch/powerpc/mm/mem.c
> > +++ b/arch/powerpc/mm/mem.c
> > @@ -260,6 +260,14 @@ void __init mem_init(void)
> > BUILD_BUG_ON(MMU_PAGE_COUNT > 16);
> >  
> >  #ifdef CONFIG_SWIOTLB
> > +   /*
> > +* Some platforms (e.g. 85xx) limit DMA-able memory way below
> > +* 4G. We force memblock to bottom-up mode to ensure that the
> > +* memory allocated in swiotlb_init() is DMA-able.
> > +* As it's the last memblock allocation, no need to reset it
> > +* back to top-down.
> > +*/
> > +   memblock_set_bottom_up(true);
> > swiotlb_init(0);
> >  #endif
> >  
> > -- 
> > 2.24.0

-- 
Sincerely yours,
Mike.
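
To illustrate why the allocation direction matters here, below is a small userspace sketch; it is not kernel code. It models one free physical range and hands out memory either from the top or from the bottom. `struct toy_range`, `toy_alloc()`, `toy_dma_able()` and the sizes used in the usage note are hypothetical names and numbers chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one free physical memory range [start, end). */
struct toy_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Carve `size` bytes out of the range. With bottom_up == false the block
 * comes from the highest addresses (the top-down behaviour described in
 * the patch); with bottom_up == true it comes from the lowest ones.
 * Returns the base address of the block.
 */
static uint64_t toy_alloc(struct toy_range *r, uint64_t size, bool bottom_up)
{
	uint64_t base;

	assert(size <= r->end - r->start);
	if (bottom_up) {
		base = r->start;
		r->start += size;
	} else {
		r->end -= size;
		base = r->end;
	}
	return base;
}

/* True if the whole block [base, base + size) lies below the DMA limit. */
static bool toy_dma_able(uint64_t base, uint64_t size, uint64_t dma_limit)
{
	return base + size <= dma_limit;
}
```

With 8G of memory and an assumed 2G DMA limit, a 64M buffer taken top-down lands just below 8G and is not DMA-able, while the same buffer taken bottom-up starts at 0 and is.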


[PATCH] powerpc/irq: fix stack overflow verification

2019-12-08 Thread Christophe Leroy
Before commit 0366a1c70b89 ("powerpc/irq: Run softirqs off the top of
the irq stack"), check_stack_overflow() was called by do_IRQ(), before
switching to the irq stack.
In that commit, do_IRQ() was renamed __do_irq(), and is now executing
on the irq stack, so check_stack_overflow() has just become almost
useless.

Move the check_stack_overflow() call into do_IRQ() to do the check while
still on the current stack.

Fixes: 0366a1c70b89 ("powerpc/irq: Run softirqs off the top of the irq stack")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/irq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 0aebd7843c73..e2bce937d51f 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -667,8 +667,6 @@ void __do_irq(struct pt_regs *regs)
 
trace_irq_entry(regs);
 
-   check_stack_overflow();
-
/*
 * Query the platform PIC for the interrupt & ack it.
 *
@@ -701,6 +699,8 @@ void do_IRQ(struct pt_regs *regs)
irqsp = hardirq_ctx[raw_smp_processor_id()];
sirqsp = softirq_ctx[raw_smp_processor_id()];
 
+   check_stack_overflow();
+
/* Already there ? */
if (unlikely(cursp == irqsp || cursp == sirqsp)) {
__do_irq(regs);
-- 
2.13.3
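
The ordering problem fixed by this patch can be shown with a userspace model; this is only a sketch under stated assumptions, not the kernel implementation. `struct toy_cpu`, the `toy_*` helpers and the 16KB stack size are hypothetical; only the 2KB threshold is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_STACK_SIZE 16384u	/* hypothetical stack size */
#define TOY_MIN_FREE   2048u	/* same 2KB threshold as the patch */

/* Hypothetical CPU state: current stack pointer and base of its stack. */
struct toy_cpu {
	uintptr_t sp;
	uintptr_t stack_base;
};

/* Stacks grow down, so the free space is the distance from sp to the base. */
static bool toy_stack_low(const struct toy_cpu *cpu)
{
	assert(cpu->sp >= cpu->stack_base);
	return cpu->sp - cpu->stack_base < TOY_MIN_FREE;
}

/* Switch to an empty IRQ stack: sp starts at the top of the new stack. */
static void toy_switch_to_irq_stack(struct toy_cpu *cpu, uintptr_t irq_base)
{
	cpu->stack_base = irq_base;
	cpu->sp = irq_base + TOY_STACK_SIZE;
}

/* Old ordering: switch first, then check. The check only sees the fresh stack. */
static bool toy_check_after_switch(struct toy_cpu cpu, uintptr_t irq_base)
{
	toy_switch_to_irq_stack(&cpu, irq_base);
	return toy_stack_low(&cpu);
}

/* New ordering: check the interrupted stack, then switch. */
static bool toy_check_before_switch(struct toy_cpu cpu, uintptr_t irq_base)
{
	bool low = toy_stack_low(&cpu);

	toy_switch_to_irq_stack(&cpu, irq_base);
	return low;
}
```

For an interrupted task with only 1KB of stack left, the first ordering reports nothing while the second one flags the problem.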



[PATCH 2/2] powerpc/irq: use IS_ENABLED() in check_stack_overflow()

2019-12-08 Thread Christophe Leroy
Instead of #ifdef, use IS_ENABLED(CONFIG_DEBUG_STACKOVERFLOW).
This enables GCC to check for code validity even when the option
is not selected.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/irq.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 4d468d835558..0aebd7843c73 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -598,16 +598,17 @@ u64 arch_irq_stat_cpu(unsigned int cpu)
 
 static inline void check_stack_overflow(void)
 {
-#ifdef CONFIG_DEBUG_STACKOVERFLOW
register unsigned long r1 asm("r1");
long sp = r1 & (THREAD_SIZE - 1);
 
+   if (!IS_ENABLED(CONFIG_DEBUG_STACKOVERFLOW))
+   return;
+
/* check for stack overflow: is there less than 2KB free? */
if (unlikely(sp < 2048)) {
pr_err("do_IRQ: stack overflow: %ld\n", sp);
dump_stack();
}
-#endif
 }
 
 #ifdef CONFIG_PPC32
-- 
2.13.3
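
For readers unfamiliar with the pattern, here is a simplified userspace rendition of the idea behind IS_ENABLED(). It is a sketch, not the kernel's header: `TOY_IS_ENABLED`, the `toy_*` helper macros and the `CONFIG_TOY_*` option names are all hypothetical. The point is that the guarded code is always parsed and type-checked, and the compiler can drop it when the condition is a constant 0.

```c
#include <assert.h>

/*
 * Expand to 1 when the option macro is defined to 1, else to 0.
 * A defined option turns TOY_PLACEHOLDER_1 into "0," which shifts the
 * literal 1 into the second argument position; an undefined option
 * leaves junk in the first position and the second argument is 0.
 */
#define TOY_PLACEHOLDER_1 0,
#define toy_second_arg(ignored, val, ...) val
#define toy_is_defined(x)  toy_is_defined2(x)
#define toy_is_defined2(val) toy_is_defined3(TOY_PLACEHOLDER_##val)
#define toy_is_defined3(arg1_or_junk) toy_second_arg(arg1_or_junk 1, 0, 0)
#define TOY_IS_ENABLED(option) toy_is_defined(option)

/* Hypothetical option; remove this line to "disable" the check. */
#define CONFIG_TOY_DEBUG_STACKOVERFLOW 1

/* Returns 1 if less than 2KB is free and the option is enabled, else 0. */
static int toy_check(long free_bytes)
{
	/* Unlike #ifdef, the code below is compiled in both configurations. */
	if (!TOY_IS_ENABLED(CONFIG_TOY_DEBUG_STACKOVERFLOW))
		return 0;

	return free_bytes < 2048;
}
```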



[PATCH 1/2] powerpc/irq: don't use current_stack_pointer() in check_stack_overflow()

2019-12-08 Thread Christophe Leroy
current_stack_pointer() doesn't return the stack pointer, but the
caller's stack frame. See commit bfe9a2cfe91a ("powerpc: Reimplement
__get_SP() as a function not a define") and commit acf620ecf56c
("powerpc: Rename __get_SP() to current_stack_pointer()") for details.

The purpose of check_stack_overflow() is to verify that the stack has
not overflowed.

To really know whether the stack pointer is still within boundaries,
the check must be done directly on the value of r1.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/irq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index bb34005ff9d2..4d468d835558 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -599,9 +599,8 @@ u64 arch_irq_stat_cpu(unsigned int cpu)
 static inline void check_stack_overflow(void)
 {
 #ifdef CONFIG_DEBUG_STACKOVERFLOW
-   long sp;
-
-   sp = current_stack_pointer() & (THREAD_SIZE-1);
+   register unsigned long r1 asm("r1");
+   long sp = r1 & (THREAD_SIZE - 1);
 
/* check for stack overflow: is there less than 2KB free? */
if (unlikely(sp < 2048)) {
-- 
2.13.3
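
The arithmetic behind the check can be sketched in plain C. This assumes, as the masking in the patch implies, that stacks are aligned to their size and grow downwards; `TOY_THREAD_SIZE` and the `toy_*` function names are hypothetical, and the real code reads r1 directly rather than taking the stack pointer as an argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_THREAD_SIZE 16384ul	/* hypothetical; must be a power of two */

/*
 * With a TOY_THREAD_SIZE-aligned stack that grows down, the low bits of
 * the stack pointer are its offset from the bottom of the stack, i.e.
 * the number of bytes still free.
 */
static unsigned long toy_stack_free(uintptr_t sp)
{
	return (unsigned long)(sp & (TOY_THREAD_SIZE - 1));
}

/* Same threshold as the patch: complain when less than 2KB is free. */
static bool toy_stack_overflowing(uintptr_t sp)
{
	return toy_stack_free(sp) < 2048;
}
```

A value taken from the caller's frame instead of the real stack pointer sits higher in the stack, so it overstates the free space; that is why the check has to use the stack pointer itself.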



Re: [PATCH] powerpc/archrandom: fix arch_get_random_seed_int()

2019-12-08 Thread Michael Ellerman
On Wed, 2019-12-04 at 11:50:15 UTC, Ard Biesheuvel wrote:
> Commit 01c9348c7620ec65
> 
>   powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_*
> 
> updated arch_get_random_[int|long]() to be NOPs, and moved the hardware
> RNG backing to arch_get_random_seed_[int|long]() instead. However, it
> failed to take into account that arch_get_random_int() was implemented
> in terms of arch_get_random_long(), and so we ended up with a version
> of the former that is essentially a NOP as well.
> 
> Fix this by calling arch_get_random_seed_long() from
> arch_get_random_seed_int() instead.
> 
> Fixes: 01c9348c7620ec65 ("powerpc: Use hardware RNG for 
> arch_get_random_seed_* not arch_get_random_*")
> Signed-off-by: Ard Biesheuvel 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/b6afd1234cf93aa0d71b4be4788c47534905f0be

cheers
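
A minimal sketch of the shape of this fix, with the int variant delegating to the seed_long variant. The `toy_*` names are hypothetical, and the hardware RNG is replaced by a fixed value so the example is deterministic; it is not the code that was applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the hardware-backed long variant. */
static bool toy_get_random_seed_long(unsigned long *v)
{
	assert(v != NULL);
	*v = 0x12345678ul;	/* fixed value instead of real entropy */
	return true;
}

/* The int variant delegates to the seed_long variant and truncates. */
static bool toy_get_random_seed_int(unsigned int *v)
{
	unsigned long val;
	bool ok = toy_get_random_seed_long(&val);

	if (ok)
		*v = (unsigned int)val;
	return ok;
}
```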


Re: [PATCH] powerpc/pmem: Convert to EXPORT_SYMBOL_GPL

2019-12-08 Thread Michael Ellerman
On Mon, 2019-12-02 at 06:40:18 UTC, "Aneesh Kumar K.V" wrote:
> All other architectures export this as a GPL symbol
> 
> Signed-off-by: Aneesh Kumar K.V 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/551003fff7235ce935bc1fefb72d12b63a408bd0

cheers


Re: [PATCH v3] platforms/powernv: Avoid re-registration of imc debugfs directory

2019-12-08 Thread Michael Ellerman
On Wed, 2019-11-27 at 07:20:35 UTC, Anju T Sudhakar wrote:
> The export_imc_mode_and_cmd() function, which creates the debugfs interface
> for imc-mode and imc-command, is invoked when each nest pmu unit is
> registered.
> When the first nest pmu unit is registered, export_imc_mode_and_cmd()
> creates the 'imc' directory under `/debug/powerpc/`. In the subsequent
> invocations the debugfs_create_dir() function returns, since the directory
> already exists.
> 
> The recent commit  (debugfs: make error message a bit more
> verbose), throws a warning if we try to invoke `debugfs_create_dir()`
> with an already existing directory name.
> 
> Address this warning by doing the debugfs directory registration
> in the opal_imc_counters_probe() function, i.e. invoke the
> export_imc_mode_and_cmd() function from the probe function.
> 
> Signed-off-by: Anju T Sudhakar 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/48e626ac85b43cc589dd1b3b8004f7f85f03544d

cheers
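
The effect of moving the directory creation can be modelled with a few lines of userspace C. This is a toy model only: `toy_create_dir()`, `toy_register_pmu()` and `toy_probe()` are hypothetical stand-ins, and the "warning" is just a counter.

```c
#include <assert.h>
#include <stdbool.h>

static bool toy_dir_exists;	/* stand-in for the 'imc' directory */
static int toy_dup_warnings;	/* creations that hit an existing dir */

static void toy_create_dir(void)
{
	if (toy_dir_exists)
		toy_dup_warnings++;
	else
		toy_dir_exists = true;
}

/* Registering a unit creates the directory only in the old scheme. */
static void toy_register_pmu(bool create_dir_here)
{
	if (create_dir_here)
		toy_create_dir();
}

/* Returns the number of duplicate-creation warnings for nr_pmus units. */
static int toy_probe(int nr_pmus, bool create_in_probe)
{
	int i;

	assert(nr_pmus >= 0);
	toy_dir_exists = false;
	toy_dup_warnings = 0;

	if (create_in_probe)
		toy_create_dir();
	for (i = 0; i < nr_pmus; i++)
		toy_register_pmu(!create_in_probe);

	return toy_dup_warnings;
}
```

With three units, creating the directory per unit yields two duplicate attempts, while creating it once from the probe path yields none.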


Re: [PATCH v2] powerpc/perf: Disable trace_imc pmu

2019-12-08 Thread Michael Ellerman
On Mon, 2019-11-18 at 03:44:52 UTC, Madhavan Srinivasan wrote:
> When a root user or a user with the CAP_SYS_ADMIN
> privilege uses trace_imc performance monitoring
> unit events to monitor application or KVM threads,
> it may result in a checkstop (system crash). The reason
> is the frequent switching of the "trace/accumulation"
> mode of the In-Memory Collection hardware.
> This patch disables the trace_imc pmu unit; it will
> be re-enabled at a later stage with a fix patchset.
> 
> Fixes: 012ae244845f1 ('powerpc/perf: Trace imc PMU functions') 
> Signed-off-by: Madhavan Srinivasan 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/249fad734a25889a4f23ed014d43634af6798063

cheers


Re: [PATCH] powerpc: ensure that swiotlb buffer is allocated from low memory

2019-12-08 Thread Michael Ellerman
Mike Rapoport  writes:
> From: Mike Rapoport 
>
> Some powerpc platforms (e.g. 85xx) limit DMA-able memory way below 4G. If a
> system has more physical memory than this limit, the swiotlb buffer is not
> addressable because it is allocated from memblock using top-down mode.
>
> Force memblock to bottom-up mode before calling swiotlb_init() to ensure
> that the swiotlb buffer is DMA-able.
>
> Link: 
> https://lkml.kernel.org/r/f1ebb706-73df-430e-9020-c214ec8ed...@xenosoft.de

This wasn't bisected, but I thought it was a regression. Do we know what
commit caused it?

Was it 25078dc1f74b ("powerpc: use mm zones more sensibly") ?

Or was that a red herring?

cheers

> Reported-by: Christian Zigotzky 
> Signed-off-by: Mike Rapoport 
> Cc: Benjamin Herrenschmidt 
> Cc: Christoph Hellwig 
> Cc: Darren Stevens 
> Cc: mad skateman 
> Cc: Michael Ellerman 
> Cc: Nicolas Saenz Julienne 
> Cc: Paul Mackerras 
> Cc: Robin Murphy 
> Cc: Rob Herring 
> ---
>  arch/powerpc/mm/mem.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index be941d382c8d..14c2c53e3f9e 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -260,6 +260,14 @@ void __init mem_init(void)
>   BUILD_BUG_ON(MMU_PAGE_COUNT > 16);
>  
>  #ifdef CONFIG_SWIOTLB
> + /*
> +  * Some platforms (e.g. 85xx) limit DMA-able memory way below
> +  * 4G. We force memblock to bottom-up mode to ensure that the
> +  * memory allocated in swiotlb_init() is DMA-able.
> +  * As it's the last memblock allocation, no need to reset it
> +  * back to top-down.
> +  */
> + memblock_set_bottom_up(true);
>   swiotlb_init(0);
>  #endif
>  
> -- 
> 2.24.0


Re: [PATCH V2 00/13] powerpc/vas: Page fault handling for user space NX requests

2019-12-08 Thread Christophe Leroy

Hi,

What do you mean by NX ?
Up to now, NX has been standing for No-eXecute. That's a bit in segment 
registers on book3s/32 to forbid executing code.


Therefore, some of your text is really misleading. If NX means something 
else for you, your text must be unambiguous.


Christophe

Le 09/12/2019 à 04:18, Haren Myneni a écrit :


Applications send compression / decompression requests to NX with
COPY/PASTE instructions. While NX is processing these requests, it can hit
a fault on the request buffer (not in memory). It then issues an interrupt and
pastes a fault CRB into the fault FIFO. It expects the kernel to handle this
fault and return credits for both send and fault windows after processing.

This patch series adds IRQ and fault window setup, and NX fault handling:
- Read IRQ# from the "interrupts" property and configure an IRQ per VAS instance.
- Set port# for each window to generate an interrupt when a fault is noticed.
- Set up the fault window and the FIFO into which NX pastes fault CRBs.
- Set up an IRQ thread fault handler per VAS instance.
- When receiving an interrupt, read CRBs from the fault FIFO and update the
   coprocessor_status_block (CSB) in the corresponding CRB with translation
   failure (CSB_CC_TRANSLATION). After issuing NX requests, the process polls
   on the CSB address. When it sees a translation error, it can touch the request
   buffer to bring the page into memory and reissue the NX request.
- If copy_to_user fails on the user space CSB address, the OS sends a SEGV signal.

Tested these patches with NX-GZIP support and will be posting this series
soon.

Patch 2: Define nx_fault_stamp on which NX writes fault status for the fault
  CRB
Patch 3: Read interrupts and port properties per VAS instance
Patch 4: Setup fault window per each VAS instance. This window is used for
  NX to paste fault CRB in FIFO.
Patches 5 & 6: Setup threaded IRQ per VAS and register NX with fault window
 ID and port number for each send window so that NX paste fault CRB
 in this window.
Patch 7: Reference to pid and mm so that pid is not used until window closed.
 Needed for multi thread application where child can open a window
 and can be used by parent later.
Patches 8 and 9: Process CRBs from fault FIFO and notify tasks by
  updating CSB or through signals.
Patches 10 and 11: Return credits for send and fault windows after handling
 faults.
Patch 13: Fix closing send window after all credits are returned. This issue
  happens only for user space requests. No page faults on kernel
  request buffer.

Changelog:
V2:
   - Use threaded IRQ instead of own kernel thread handler
   - Use pswid instead of user space CSB address to find valid CRB
   - Removed unused macros and other changes as suggested by Christoph Hellwig

Haren Myneni (13):
   powerpc/vas: Describe vas-port and interrupts properties
   powerpc/vas: Define nx_fault_stamp in coprocessor_request_block
   powerpc/vas: Read interrupts and vas-port device tree properties
   powerpc/vas: Setup fault window per VAS instance
   powerpc/vas: Setup thread IRQ handler per VAS instance
   powerpc/vas: Register NX with fault window ID and IRQ port value
   powerpc/vas: Take reference to PID and mm for user space windows
   powerpc/vas: Update CSB and notify process for fault CRBs
   powerpc/vas: Print CRB and FIFO values
   powerpc/vas: Do not use default credits for receive window
   powerpc/VAS: Return credits after handling fault
   powerpc/vas: Display process stuck message
   powerpc/vas: Free send window in VAS instance after credits returned

  .../devicetree/bindings/powerpc/ibm,vas.txt|   5 +
  arch/powerpc/include/asm/icswx.h   |  18 +-
  arch/powerpc/platforms/powernv/Makefile|   2 +-
  arch/powerpc/platforms/powernv/vas-debug.c |   2 +-
  arch/powerpc/platforms/powernv/vas-fault.c | 337 +
  arch/powerpc/platforms/powernv/vas-window.c| 173 ++-
  arch/powerpc/platforms/powernv/vas.c   |  77 -
  arch/powerpc/platforms/powernv/vas.h   |  38 ++-
  8 files changed, 627 insertions(+), 25 deletions(-)
  create mode 100644 arch/powerpc/platforms/powernv/vas-fault.c



[PATCH v5 6/6] powerpc/fadump: sysfs for fadump memory reservation

2019-12-08 Thread Sourabh Jain
Add a sysfs interface to allow querying the memory reserved by FADump for
saving the crash dump.

Also added Documentation/ABI for the new sysfs file.

Signed-off-by: Sourabh Jain 
---
 Documentation/ABI/testing/sysfs-kernel-fadump| 7 +++
 Documentation/powerpc/firmware-assisted-dump.rst | 5 +
 arch/powerpc/kernel/fadump.c | 9 +
 3 files changed, 21 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump 
b/Documentation/ABI/testing/sysfs-kernel-fadump
index 5d988b919e81..8f7a64a81783 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump
+++ b/Documentation/ABI/testing/sysfs-kernel-fadump
@@ -31,3 +31,10 @@ Description: write only
the system is booted to capture the vmcore using FADump.
It is used to release the memory reserved by FADump to
save the crash dump.
+
+What:  /sys/kernel/fadump/mem_reserved
+Date:  Dec 2019
+Contact:   linuxppc-dev@lists.ozlabs.org
+Description:   read only
+   Provide information about the amount of memory reserved by
+   FADump to save the crash dump in bytes.
diff --git a/Documentation/powerpc/firmware-assisted-dump.rst 
b/Documentation/powerpc/firmware-assisted-dump.rst
index 365c10209ef3..04993eaf3113 100644
--- a/Documentation/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/powerpc/firmware-assisted-dump.rst
@@ -268,6 +268,11 @@ Here is the list of files under kernel sysfs:
 be handled and vmcore will not be captured. This interface can be
 easily integrated with kdump service start/stop.
 
+ /sys/kernel/fadump/mem_reserved
+
+   This is used to display the memory reserved by FADump for saving the
+   crash dump.
+
  /sys/kernel/fadump_release_mem
 This file is available only when FADump is active during
 second kernel. This is used to release the reserved memory
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 35ecb51edc50..6f367e5b7970 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1364,6 +1364,13 @@ static ssize_t enabled_show(struct kobject *kobj,
return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
 }
 
+static ssize_t mem_reserved_show(struct kobject *kobj,
+struct kobj_attribute *attr,
+char *buf)
+{
+   return sprintf(buf, "%ld\n", fw_dump.reserve_dump_area_size);
+}
+
 static ssize_t registered_show(struct kobject *kobj,
   struct kobj_attribute *attr,
   char *buf)
@@ -1431,10 +1438,12 @@ EXPORT_SYMBOL_GPL(fadump_kobj);
 static struct kobj_attribute release_attr = __ATTR_WO(release_mem);
 static struct kobj_attribute enable_attr = __ATTR_RO(enabled);
 static struct kobj_attribute register_attr = __ATTR_RW(registered);
+static struct kobj_attribute mem_reserved_attr = __ATTR_RO(mem_reserved);
 
 static struct attribute *fadump_attrs[] = {
&enable_attr.attr,
&register_attr.attr,
+   &mem_reserved_attr.attr,
NULL,
 };
 
-- 
2.17.2
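
As a usage illustration, a userspace tool could read the new file roughly as follows. This is a sketch: `read_ulong_file()` is a hypothetical helper, and it only assumes what the patch states, namely that the file contains a single decimal byte count followed by a newline.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a single unsigned decimal value from a sysfs-style text file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_ulong_file(const char *path, unsigned long *out)
{
	FILE *f;
	int ok;

	assert(path != NULL && out != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;

	ok = (fscanf(f, "%lu", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

On a kernel carrying this patch, the helper would be called with "/sys/kernel/fadump/mem_reserved" as the path; on other systems it simply returns -1 because the file is absent.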



[PATCH v5 5/6] Documentation/ABI: mark /sys/kernel/fadump_* sysfs files deprecated

2019-12-08 Thread Sourabh Jain
Add a deprecation note in FADump sysfs ABI documentation files and move
them from ABI/testing to ABI/obsolete directory.

Signed-off-by: Sourabh Jain 
---
 .../ABI/{testing => obsolete}/sysfs-kernel-fadump_enabled | 2 ++
 .../{testing => obsolete}/sysfs-kernel-fadump_registered  | 2 ++
 .../{testing => obsolete}/sysfs-kernel-fadump_release_mem | 2 ++
 Documentation/powerpc/firmware-assisted-dump.rst  | 8 
 4 files changed, 14 insertions(+)
 rename Documentation/ABI/{testing => obsolete}/sysfs-kernel-fadump_enabled 
(73%)
 rename Documentation/ABI/{testing => obsolete}/sysfs-kernel-fadump_registered 
(77%)
 rename Documentation/ABI/{testing => obsolete}/sysfs-kernel-fadump_release_mem 
(78%)

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_enabled 
b/Documentation/ABI/obsolete/sysfs-kernel-fadump_enabled
similarity index 73%
rename from Documentation/ABI/testing/sysfs-kernel-fadump_enabled
rename to Documentation/ABI/obsolete/sysfs-kernel-fadump_enabled
index f73632b1c006..e9c2de8b3688 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump_enabled
+++ b/Documentation/ABI/obsolete/sysfs-kernel-fadump_enabled
@@ -1,3 +1,5 @@
+This ABI is renamed and moved to a new location /sys/kernel/fadump/enabled.
+
 What:  /sys/kernel/fadump_enabled
 Date:  Feb 2012
 Contact:   linuxppc-dev@lists.ozlabs.org
diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_registered 
b/Documentation/ABI/obsolete/sysfs-kernel-fadump_registered
similarity index 77%
rename from Documentation/ABI/testing/sysfs-kernel-fadump_registered
rename to Documentation/ABI/obsolete/sysfs-kernel-fadump_registered
index dcf925e53f0f..0360be39c98e 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump_registered
+++ b/Documentation/ABI/obsolete/sysfs-kernel-fadump_registered
@@ -1,3 +1,5 @@
+This ABI is renamed and moved to a new location /sys/kernel/fadump/registered.
+
 What:  /sys/kernel/fadump_registered
 Date:  Feb 2012
 Contact:   linuxppc-dev@lists.ozlabs.org
diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_release_mem 
b/Documentation/ABI/obsolete/sysfs-kernel-fadump_release_mem
similarity index 78%
rename from Documentation/ABI/testing/sysfs-kernel-fadump_release_mem
rename to Documentation/ABI/obsolete/sysfs-kernel-fadump_release_mem
index 9c20d64ab48d..6ce0b129ab12 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump_release_mem
+++ b/Documentation/ABI/obsolete/sysfs-kernel-fadump_release_mem
@@ -1,3 +1,5 @@
+This ABI is renamed and moved to a new location /sys/kernel/fadump/release_mem.
+
 What:  /sys/kernel/fadump_release_mem
 Date:  Feb 2012
 Contact:   linuxppc-dev@lists.ozlabs.org
diff --git a/Documentation/powerpc/firmware-assisted-dump.rst 
b/Documentation/powerpc/firmware-assisted-dump.rst
index 345a3405206e..365c10209ef3 100644
--- a/Documentation/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/powerpc/firmware-assisted-dump.rst
@@ -295,6 +295,14 @@ Note: /sys/kernel/fadump_release_opalcore sysfs has moved 
to
 
 echo 1  > /sys/firmware/opal/mpipl/release_core
 
+Note: The following FADump sysfs files are deprecated.
+
+Deprecated                       Alternative
+
+/sys/kernel/fadump_enabled       /sys/kernel/fadump/enabled
+/sys/kernel/fadump_registered    /sys/kernel/fadump/registered
+/sys/kernel/fadump_release_mem   /sys/kernel/fadump/release_mem
+
 Here is the list of files under powerpc debugfs:
 (Assuming debugfs is mounted on /sys/kernel/debug directory.)
 
-- 
2.17.2



[PATCH v5 4/6] powerpc/powernv: move core and fadump_release_opalcore under new kobject

2019-12-08 Thread Sourabh Jain
The /sys/firmware/opal/core and /sys/kernel/fadump_release_opalcore sysfs
files are used to export and release the OPAL memory on PowerNV platform.
Let's organize them into a new kobject under the /sys/firmware/opal/mpipl/
directory.

A symlink is added to maintain the backward compatibility for
/sys/firmware/opal/core sysfs file.

Signed-off-by: Sourabh Jain 
---
 .../sysfs-kernel-fadump_release_opalcore  |  2 +
 .../powerpc/firmware-assisted-dump.rst| 15 +++--
 arch/powerpc/platforms/powernv/opal-core.c| 55 ++-
 3 files changed, 51 insertions(+), 21 deletions(-)
 rename Documentation/ABI/{testing => 
removed}/sysfs-kernel-fadump_release_opalcore (82%)

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore 
b/Documentation/ABI/removed/sysfs-kernel-fadump_release_opalcore
similarity index 82%
rename from Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore
rename to Documentation/ABI/removed/sysfs-kernel-fadump_release_opalcore
index 53313c1d4e7a..a8d46cd0f4e6 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore
+++ b/Documentation/ABI/removed/sysfs-kernel-fadump_release_opalcore
@@ -1,3 +1,5 @@
+This ABI is moved to /sys/firmware/opal/mpipl/release_core.
+
 What:  /sys/kernel/fadump_release_opalcore
 Date:  Sep 2019
 Contact:   linuxppc-dev@lists.ozlabs.org
diff --git a/Documentation/powerpc/firmware-assisted-dump.rst 
b/Documentation/powerpc/firmware-assisted-dump.rst
index 0455a78486d5..345a3405206e 100644
--- a/Documentation/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/powerpc/firmware-assisted-dump.rst
@@ -112,13 +112,13 @@ to ensure that crash data is preserved to process later.
 
 -- On OPAL based machines (PowerNV), if the kernel is build with
CONFIG_OPAL_CORE=y, OPAL memory at the time of crash is also
-   exported as /sys/firmware/opal/core file. This procfs file is
+   exported as /sys/firmware/opal/mpipl/core file. This procfs file is
helpful in debugging OPAL crashes with GDB. The kernel memory
used for exporting this procfs file can be released by echo'ing
-   '1' to /sys/kernel/fadump_release_opalcore node.
+   '1' to /sys/firmware/opal/mpipl/release_core node.
 
e.g.
- # echo 1 > /sys/kernel/fadump_release_opalcore
+ # echo 1 > /sys/firmware/opal/mpipl/release_core
 
 Implementation details:
 ---
@@ -283,14 +283,17 @@ Here is the list of files under kernel sysfs:
 enhanced to use this interface to release the memory reserved for
 dump and continue without 2nd reboot.
 
- /sys/kernel/fadump_release_opalcore
+Note: /sys/kernel/fadump_release_opalcore sysfs has moved to
+  /sys/firmware/opal/mpipl/release_core
+
+ /sys/firmware/opal/mpipl/release_core
 
 This file is available only on OPAL based machines when FADump is
 active during capture kernel. This is used to release the memory
-used by the kernel to export /sys/firmware/opal/core file. To
+used by the kernel to export /sys/firmware/opal/mpipl/core file. To
 release this memory, echo '1' to it:
 
-echo 1  > /sys/kernel/fadump_release_opalcore
+echo 1  > /sys/firmware/opal/mpipl/release_core
 
 Here is the list of files under powerpc debugfs:
 (Assuming debugfs is mounted on /sys/kernel/debug directory.)
diff --git a/arch/powerpc/platforms/powernv/opal-core.c 
b/arch/powerpc/platforms/powernv/opal-core.c
index ed895d82c048..6dba3b62269f 100644
--- a/arch/powerpc/platforms/powernv/opal-core.c
+++ b/arch/powerpc/platforms/powernv/opal-core.c
@@ -71,6 +71,7 @@ static LIST_HEAD(opalcore_list);
 static struct opalcore_config *oc_conf;
 static const struct opal_mpipl_fadump *opalc_metadata;
 static const struct opal_mpipl_fadump *opalc_cpu_metadata;
+struct kobject *mpipl_kobj;
 
 /*
  * Set crashing CPU's signal to SIGUSR1. if the kernel is triggered
@@ -428,7 +429,7 @@ static void opalcore_cleanup(void)
return;
 
/* Remove OPAL core sysfs file */
-   sysfs_remove_bin_file(opal_kobj, &opal_core_attr);
+   sysfs_remove_bin_file(mpipl_kobj, &opal_core_attr);
oc_conf->ptload_phdr = NULL;
oc_conf->ptload_cnt = 0;
 
@@ -563,9 +564,9 @@ static void __init opalcore_config_init(void)
of_node_put(np);
 }
 
-static ssize_t fadump_release_opalcore_store(struct kobject *kobj,
-struct kobj_attribute *attr,
-const char *buf, size_t count)
+static ssize_t release_core_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
 {
int input = -1;
 
@@ -589,9 +590,23 @@ static ssize_t fadump_release_opalcore_store(struct 
kobject *kobj,
return count;
 }
 
-static struct kobj_attribute opalcore_rel_attr = 
__ATTR(fadump_release_opalcore,
-   0200, NULL,
- 

[PATCH v5 3/6] powerpc/fadump: reorganize /sys/kernel/fadump_* sysfs files

2019-12-08 Thread Sourabh Jain
As the number of FADump sysfs files increases, it becomes hard to manage all
of them inside the /sys/kernel directory. It's better to have all the FADump
related sysfs files in a dedicated directory, /sys/kernel/fadump. But in
order to maintain backward compatibility, a symlink has been added for every
sysfs file that has moved to the new location.

As the FADump sysfs files are now part of a dedicated directory, there is no
need to prefix their names with fadump_, hence the sysfs file names are also
updated. For example, the fadump_enabled sysfs file is now referred to as enabled.

Also consolidate ABI documentation for all the FADump sysfs files in a
single file Documentation/ABI/testing/sysfs-kernel-fadump.

Signed-off-by: Sourabh Jain 
---
 Documentation/ABI/testing/sysfs-kernel-fadump | 33 +++
 arch/powerpc/kernel/fadump.c  | 95 ---
 2 files changed, 94 insertions(+), 34 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump 
b/Documentation/ABI/testing/sysfs-kernel-fadump
new file mode 100644
index 000000000000..5d988b919e81
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-fadump
@@ -0,0 +1,33 @@
+What:  /sys/kernel/fadump/*
+Date:  Dec 2019
+Contact:   linuxppc-dev@lists.ozlabs.org
+Description:
+   The /sys/kernel/fadump/* is a collection of FADump sysfs
+   file provide information about the configuration status
+   of Firmware Assisted Dump (FADump).
+
+What:  /sys/kernel/fadump/enabled
+Date:  Dec 2019
+Contact:   linuxppc-dev@lists.ozlabs.org
+Description:   read only
+   Primarily used to identify whether the FADump is enabled in
+   the kernel or not.
+User:  Kdump service
+
+What:  /sys/kernel/fadump/registered
+Date:  Dec 2019
+Contact:   linuxppc-dev@lists.ozlabs.org
+Description:   read/write
+   Helps to control the dump collect feature from userspace.
+   Setting 1 to this file enables the system to collect the
+   dump and 0 to disable it.
+User:  Kdump service
+
+What:  /sys/kernel/fadump/release_mem
+Date:  Dec 2019
+Contact:   linuxppc-dev@lists.ozlabs.org
+Description:   write only
+   This is a special sysfs file and only available when
+   the system is booted to capture the vmcore using FADump.
+   It is used to release the memory reserved by FADump to
+   save the crash dump.
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index ed59855430b9..35ecb51edc50 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -44,6 +44,13 @@ struct fadump_mrange_info reserved_mrange_info = { 
"reserved", NULL, 0, 0, 0 };
 #ifdef CONFIG_CMA
 static struct cma *fadump_cma;
 
+#define CREATE_SYMLINK(target, symlink_name) do {\
+   rc = compat_only_sysfs_link_entry_to_kobj(kernel_kobj, fadump_kobj, \
+ target, symlink_name); \
+   if (rc) \
+   pr_err("unable to create %s symlink (%d)", symlink_name, rc); \
+} while (0)
+
 /*
  * fadump_cma_init() - Initialize CMA area from a fadump reserved memory
  *
@@ -1323,9 +1330,9 @@ static void fadump_invalidate_release_mem(void)
fw_dump.ops->fadump_init_mem_struct(&fw_dump);
 }
 
-static ssize_t fadump_release_memory_store(struct kobject *kobj,
-   struct kobj_attribute *attr,
-   const char *buf, size_t count)
+static ssize_t release_mem_store(struct kobject *kobj,
+struct kobj_attribute *attr,
+const char *buf, size_t count)
 {
int input = -1;
 
@@ -1350,23 +1357,23 @@ static ssize_t fadump_release_memory_store(struct 
kobject *kobj,
return count;
 }
 
-static ssize_t fadump_enabled_show(struct kobject *kobj,
-   struct kobj_attribute *attr,
-   char *buf)
+static ssize_t enabled_show(struct kobject *kobj,
+   struct kobj_attribute *attr,
+   char *buf)
 {
return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
 }
 
-static ssize_t fadump_register_show(struct kobject *kobj,
-   struct kobj_attribute *attr,
-   char *buf)
+static ssize_t registered_show(struct kobject *kobj,
+  struct kobj_attribute *attr,
+  char *buf)
 {
return sprintf(buf, "%d\n", fw_dump.dump_registered);
 }
 
-static ssize_t fadump_register_store(struct kobject *kobj,
-   struct kobj_attribute *attr,
-   const char *buf, size_t count)
+static ssize_t 

[PATCH v5 2/6] sysfs: wrap __compat_only_sysfs_link_entry_to_kobj function to change the symlink name

2019-12-08 Thread Sourabh Jain
The __compat_only_sysfs_link_entry_to_kobj function creates a symlink to a
kobject but doesn't provide an option to change the symlink file name.

This patch adds a wrapper function compat_only_sysfs_link_entry_to_kobj
that extends the __compat_only_sysfs_link_entry_to_kobj functionality
which allows function caller to customize the symlink name.

Signed-off-by: Sourabh Jain 
---
 fs/sysfs/group.c  | 28 +---
 include/linux/sysfs.h | 12 
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index d41c21fef138..0993645f0b59 100644
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -424,6 +424,25 @@ EXPORT_SYMBOL_GPL(sysfs_remove_link_from_group);
 int __compat_only_sysfs_link_entry_to_kobj(struct kobject *kobj,
  struct kobject *target_kobj,
  const char *target_name)
+{
+   return compat_only_sysfs_link_entry_to_kobj(kobj, target_kobj,
+   target_name, NULL);
+}
+EXPORT_SYMBOL_GPL(__compat_only_sysfs_link_entry_to_kobj);
+
+/**
+ * compat_only_sysfs_link_entry_to_kobj - add a symlink to a kobject pointing
+ * to a group or an attribute
+ * @kobj:  The kobject containing the group.
+ * @target_kobj:   The target kobject.
+ * @target_name:   The name of the target group or attribute.
+ * @symlink_name:  The name of the symlink file (target_name will be
+ * considered if symlink_name is NULL).
+ */
+int compat_only_sysfs_link_entry_to_kobj(struct kobject *kobj,
+struct kobject *target_kobj,
+const char *target_name,
+const char *symlink_name)
 {
struct kernfs_node *target;
struct kernfs_node *entry;
@@ -448,12 +467,15 @@ int __compat_only_sysfs_link_entry_to_kobj(struct kobject 
*kobj,
return -ENOENT;
}
 
-   link = kernfs_create_link(kobj->sd, target_name, entry);
+   if (!symlink_name)
+   symlink_name = target_name;
+
+   link = kernfs_create_link(kobj->sd, symlink_name, entry);
if (IS_ERR(link) && PTR_ERR(link) == -EEXIST)
-   sysfs_warn_dup(kobj->sd, target_name);
+   sysfs_warn_dup(kobj->sd, symlink_name);
 
kernfs_put(entry);
kernfs_put(target);
return PTR_ERR_OR_ZERO(link);
 }
-EXPORT_SYMBOL_GPL(__compat_only_sysfs_link_entry_to_kobj);
+EXPORT_SYMBOL_GPL(compat_only_sysfs_link_entry_to_kobj);
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 5420817ed317..15b195a4529d 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -300,6 +300,10 @@ void sysfs_remove_link_from_group(struct kobject *kobj, 
const char *group_name,
 int __compat_only_sysfs_link_entry_to_kobj(struct kobject *kobj,
  struct kobject *target_kobj,
  const char *target_name);
+int compat_only_sysfs_link_entry_to_kobj(struct kobject *kobj,
+struct kobject *target_kobj,
+const char *target_name,
+const char *symlink_name);
 
 void sysfs_notify(struct kobject *kobj, const char *dir, const char *attr);
 
@@ -508,6 +512,14 @@ static inline int __compat_only_sysfs_link_entry_to_kobj(
return 0;
 }
 
+static inline int compat_only_sysfs_link_entry_to_kobj(struct kobject *kobj,
+   struct kobject *target_kobj,
+   const char *target_name,
+   const char *symlink_name)
+{
+   return 0;
+}
+
 static inline void sysfs_notify(struct kobject *kobj, const char *dir,
const char *attr)
 {
-- 
2.17.2
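
The NULL fallback that this patch introduces can be sketched outside the
kernel as follows. This is only an illustration: effective_symlink_name()
is a hypothetical helper, not a sysfs function, and the names in the usage
note are examples.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of sysfs): returns the name the symlink
 * would be created with, mirroring the fallback added by the patch.
 * A NULL symlink_name means "reuse target_name".
 */
static const char *effective_symlink_name(const char *target_name,
                                          const char *symlink_name)
{
    if (!symlink_name)
        symlink_name = target_name;
    return symlink_name;
}
```

With this rule, a caller that passes ("enabled", "fadump_enabled") gets a
link named fadump_enabled, while a caller that passes ("enabled", NULL)
keeps the old behaviour and gets a link named enabled.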



[PATCH v5 0/6] reorganize and add FADump sysfs files

2019-12-08 Thread Sourabh Jain
Currently, the FADump sysfs files live directly in the /sys/kernel
directory. As the number of FADump sysfs files grows, keeping all of them
in /sys/kernel is not a good idea. It is better to have a separate
directory for the FADump sysfs files.

This patch series reorganizes the FADump sysfs files by making all the
existing files under /sys/kernel available in a new directory,
/sys/kernel/fadump. Backward compatibility is maintained by adding a
symlink for every sysfs file that has moved to the new location. A new
FADump sysfs interface is also added to report the amount of memory
reserved by FADump for saving the crash dump.

Changelog:
v1 -> v2:
 - Move fadump_release_opalcore sysfs to FADump Kobject instead of
   replicating.
 - Changed the patch order 1,2,3,4 -> 2,1,3,4 (first add the ABI doc for
   existing sysfs files, then replicate them under the FADump kobject).

v2 -> v3:
 - Remove the fadump_ prefix from replicated FADump sysfs file names.

v3 -> v4:
 - New patch that adds a wrapper function to create symlink with
   custom symlink file name.
 - Add symlink instead of replicating the FADump sysfs files.
 - Move the OPAL core rel

v4 -> v5:
 - Changed the wrapper function name in 2/6.
 - Defined FADump kobject attributes using __ATTR_* macros.
 - Replace individual FADump sysfs file creation with group.
 - Added a macro to create symlink.

Sourabh Jain (6):
  Documentation/ABI: add ABI documentation for /sys/kernel/fadump_*
  sysfs: wrap __compat_only_sysfs_link_entry_to_kobj function to change
the symlink name
  powerpc/fadump: reorganize /sys/kernel/fadump_* sysfs files
  powerpc/powernv: move core and fadump_release_opalcore under new
kobject
  Documentation/ABI: mark /sys/kernel/fadump_* sysfs files deprecated
  powerpc/fadump: sysfs for fadump memory reservation

 .../ABI/obsolete/sysfs-kernel-fadump_enabled  |   9 ++
 .../obsolete/sysfs-kernel-fadump_registered   |  10 ++
 .../obsolete/sysfs-kernel-fadump_release_mem  |  10 ++
 .../sysfs-kernel-fadump_release_opalcore  |   9 ++
 Documentation/ABI/testing/sysfs-kernel-fadump |  40 +++
 .../powerpc/firmware-assisted-dump.rst|  28 -
 arch/powerpc/kernel/fadump.c  | 104 --
 arch/powerpc/platforms/powernv/opal-core.c|  55 ++---
 fs/sysfs/group.c  |  28 -
 include/linux/sysfs.h |  12 ++
 10 files changed, 247 insertions(+), 58 deletions(-)
 create mode 100644 Documentation/ABI/obsolete/sysfs-kernel-fadump_enabled
 create mode 100644 Documentation/ABI/obsolete/sysfs-kernel-fadump_registered
 create mode 100644 Documentation/ABI/obsolete/sysfs-kernel-fadump_release_mem
 create mode 100644 
Documentation/ABI/removed/sysfs-kernel-fadump_release_opalcore
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump

-- 
2.17.2



[PATCH v5 1/6] Documentation/ABI: add ABI documentation for /sys/kernel/fadump_*

2019-12-08 Thread Sourabh Jain
Add missing ABI documentation for existing FADump sysfs files.

Signed-off-by: Sourabh Jain 
---
 Documentation/ABI/testing/sysfs-kernel-fadump_enabled | 7 +++
 Documentation/ABI/testing/sysfs-kernel-fadump_registered  | 8 
 Documentation/ABI/testing/sysfs-kernel-fadump_release_mem | 8 
 .../ABI/testing/sysfs-kernel-fadump_release_opalcore  | 7 +++
 4 files changed, 30 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump_enabled
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump_registered
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump_release_mem
 create mode 100644 
Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_enabled 
b/Documentation/ABI/testing/sysfs-kernel-fadump_enabled
new file mode 100644
index ..f73632b1c006
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-fadump_enabled
@@ -0,0 +1,7 @@
+What:  /sys/kernel/fadump_enabled
+Date:  Feb 2012
+Contact:   linuxppc-dev@lists.ozlabs.org
+Description:   read only
+   Primarily used to identify whether the FADump is enabled in
+   the kernel or not.
+User:  Kdump service
diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_registered 
b/Documentation/ABI/testing/sysfs-kernel-fadump_registered
new file mode 100644
index ..dcf925e53f0f
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-fadump_registered
@@ -0,0 +1,8 @@
+What:  /sys/kernel/fadump_registered
+Date:  Feb 2012
+Contact:   linuxppc-dev@lists.ozlabs.org
+Description:   read/write
+   Helps to control the dump collect feature from userspace.
+   Setting 1 to this file enables the system to collect the
+   dump and 0 to disable it.
+User:  Kdump service
diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_release_mem 
b/Documentation/ABI/testing/sysfs-kernel-fadump_release_mem
new file mode 100644
index ..9c20d64ab48d
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-fadump_release_mem
@@ -0,0 +1,8 @@
+What:  /sys/kernel/fadump_release_mem
+Date:  Feb 2012
+Contact:   linuxppc-dev@lists.ozlabs.org
+Description:   write only
+   This is a special sysfs file and only available when
+   the system is booted to capture the vmcore using FADump.
+   It is used to release the memory reserved by FADump to
+   save the crash dump.
diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore 
b/Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore
new file mode 100644
index ..53313c1d4e7a
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore
@@ -0,0 +1,7 @@
+What:  /sys/kernel/fadump_release_opalcore
+Date:  Sep 2019
+Contact:   linuxppc-dev@lists.ozlabs.org
+Description:   write only
+   The sysfs file is available when the system is booted to
+   collect the dump on an OPAL-based machine. It is used to
+   release the memory used to collect the opalcore.
-- 
2.17.2



[PATCH V2 13/13] powerpc/vas: Free send window in VAS instance after credits returned

2019-12-08 Thread Haren Myneni


NX may still be processing requests while the window is being closed.
Wait until all credits are returned, then free the send window from the
VAS instance.

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-window.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c 
b/arch/powerpc/platforms/powernv/vas-window.c
index 578f144..5322d1c 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -1309,14 +1309,14 @@ int vas_win_close(struct vas_window *window)
 
unmap_paste_region(window);
 
-   clear_vinst_win(window);
-
poll_window_busy_state(window);
 
unpin_close_window(window);
 
poll_window_credits(window);
 
+   clear_vinst_win(window);
+
poll_window_castout(window);
 
/* if send window, drop reference to matching receive window */
-- 
1.8.3.1





[PATCH V2 12/13] powerpc/vas: Display process stuck message

2019-12-08 Thread Haren Myneni


A process cannot close a send window until all of its requests have been
processed, which means waiting until the window state is no longer busy
and all send credits are returned. Display a debug message if closing the
window takes longer than expected.

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-window.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c 
b/arch/powerpc/platforms/powernv/vas-window.c
index 27848d3..578f144 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -1176,6 +1176,7 @@ static void poll_window_credits(struct vas_window *window)
 {
u64 val;
int creds, mode;
+   int count = 0;
 
val = read_hvwc_reg(window, VREG(WINCTL));
if (window->tx_win)
@@ -1194,10 +1195,25 @@ static void poll_window_credits(struct vas_window 
*window)
creds = GET_FIELD(VAS_LRX_WCRED, val);
}
 
+   /*
+* Takes around few microseconds to complete all pending requests
+* and return credits.
+* TODO: Issue CRB Kill to stop all pending requests. Need only
+*   if there is a bug in NX or fault handling in kernel.
+*/
if (creds < window->wcreds_max) {
val = 0;
set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(msecs_to_jiffies(10));
+   count++;
+   /*
+* Process can not close send window until all credits are
+* returned.
+*/
+   if (!(count % 1))
+   pr_debug("%s() pid %d stuck? retries %d\n", __func__,
+   vas_window_pid(window), count);
+
goto retry;
}
 }
@@ -1211,6 +1227,7 @@ static void poll_window_busy_state(struct vas_window 
*window)
 {
int busy;
u64 val;
+   int count = 0;
 
 retry:
val = read_hvwc_reg(window, VREG(WIN_STATUS));
@@ -1219,6 +1236,15 @@ static void poll_window_busy_state(struct vas_window 
*window)
val = 0;
set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(msecs_to_jiffies(5));
+   count++;
+   /*
+* Takes around 5 microseconds to process all pending
+* requests.
+*/
+   if (!(count % 1))
+   pr_debug("%s() pid %d stuck? retries %d\n", __func__,
+   vas_window_pid(window), count);
+
goto retry;
}
 }
-- 
1.8.3.1
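
A note on the reporting condition in this patch: count % 1 is always 0,
so !(count % 1) is true on every retry and the debug message is printed
each time through the loop. If periodic reporting is wanted, the modulus
has to be larger than one. The sketch below shows that pattern in plain C.
It is not the kernel code: REPORT_INTERVAL, poll_until_idle() and
idle_after() are hypothetical names, and the interval value is an
assumption, since the patch does not state the intended one.

```c
#include <stdbool.h>

#define REPORT_INTERVAL 100 /* assumed value, not taken from the patch */

/* Hypothetical predicate type: returns true once the resource is idle. */
typedef bool (*idle_fn)(void *ctx);

/*
 * Polls is_idle() until it returns true or max_retries is reached.
 * Returns how many "stuck?" reports would have been emitted, one for
 * every REPORT_INTERVAL-th retry.
 */
static int poll_until_idle(idle_fn is_idle, void *ctx, int max_retries)
{
    int count = 0;
    int reports = 0;

    while (!is_idle(ctx) && count < max_retries) {
        count++;
        if (!(count % REPORT_INTERVAL))
            reports++; /* a real caller would log here */
    }
    return reports;
}

/* Example predicate: reports busy for *ctx calls, then idle. */
static bool idle_after(void *ctx)
{
    int *remaining = ctx;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```

The max_retries bound is also an addition of this sketch; the loops in the
patch retry without an upper limit.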





[PATCH V2 11/13] powerpc/vas: Return credits after handling fault

2019-12-08 Thread Haren Myneni


NX expects the OS to return a credit for the send window after processing
each fault. A credit must also be returned for the fault window.

Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-fault.c  | 10 ++
 arch/powerpc/platforms/powernv/vas-window.c | 17 +
 arch/powerpc/platforms/powernv/vas.h|  1 +
 3 files changed, 28 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/vas-fault.c 
b/arch/powerpc/platforms/powernv/vas-fault.c
index cf41b65..926fdf3 100644
--- a/arch/powerpc/platforms/powernv/vas-fault.c
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -247,6 +247,11 @@ irqreturn_t vas_fault_handler(int irq, void *data)
memset(fifo, 0, CRB_SIZE);
mutex_unlock(&vinst->mutex);
 
+   /*
+* Return credit for the fault window.
+*/
+   vas_return_credit(vinst->fault_win, 0);
+
pr_devel("VAS[%d] fault_fifo %p, fifo %p, fault_crbs %d\n",
vinst->vas_id, vinst->fault_fifo, fifo,
vinst->fault_crbs);
@@ -273,6 +278,11 @@ irqreturn_t vas_fault_handler(int irq, void *data)
}
 
update_csb(window, crb);
+   /*
+* Return credit for send window after processing
+* fault CRB.
+*/
+   vas_return_credit(window, 1);
} while (true);
 
return IRQ_HANDLED;
diff --git a/arch/powerpc/platforms/powernv/vas-window.c 
b/arch/powerpc/platforms/powernv/vas-window.c
index 941582b..27848d3 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -1312,6 +1312,23 @@ int vas_win_close(struct vas_window *window)
 }
 EXPORT_SYMBOL_GPL(vas_win_close);
 
+/*
+ * Return credit for the given window.
+ */
+void vas_return_credit(struct vas_window *window, bool tx)
+{
+   uint64_t val;
+
+   val = 0ULL;
+   if (tx) { /* send window */
+   val = SET_FIELD(VAS_TX_WCRED, val, 1);
+   write_hvwc_reg(window, VREG(TX_WCRED_ADDER), val);
+   } else {
+   val = SET_FIELD(VAS_LRX_WCRED, val, 1);
+   write_hvwc_reg(window, VREG(LRX_WCRED_ADDER), val);
+   }
+}
+
 struct vas_window *vas_pswid_to_window(struct vas_instance *vinst,
uint32_t pswid)
 {
diff --git a/arch/powerpc/platforms/powernv/vas.h 
b/arch/powerpc/platforms/powernv/vas.h
index d7398b7..6332683 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -415,6 +415,7 @@ struct vas_winctx {
 extern void vas_window_free_dbgdir(struct vas_window *win);
 extern int vas_setup_fault_window(struct vas_instance *vinst);
 extern irqreturn_t vas_fault_handler(int irq, void *data);
+extern void vas_return_credit(struct vas_window *window, bool tx);
 extern struct vas_window *vas_pswid_to_window(struct vas_instance *vinst,
uint32_t pswid);
 
-- 
1.8.3.1





[PATCH V2 10/13] powerpc/vas: Do not use default credits for receive window

2019-12-08 Thread Haren Myneni


The system checkstops if the RxFIFO overruns with more requests than the
maximum number of CRBs allowed in the FIFO at any time. So the max
credits value (rxattr.wcreds_max) is set by the driver and passed to
vas_rx_win_open().

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-window.c | 4 ++--
 arch/powerpc/platforms/powernv/vas.h| 2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c 
b/arch/powerpc/platforms/powernv/vas-window.c
index 344db11..941582b 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -772,7 +772,7 @@ static bool rx_win_args_valid(enum vas_cop_type cop,
if (attr->rx_fifo_size > VAS_RX_FIFO_SIZE_MAX)
return false;
 
-   if (attr->wcreds_max > VAS_RX_WCREDS_MAX)
+   if (!attr->wcreds_max)
return false;
 
if (attr->nx_win) {
@@ -878,7 +878,7 @@ struct vas_window *vas_rx_win_open(int vasid, enum 
vas_cop_type cop,
rxwin->nx_win = rxattr->nx_win;
rxwin->user_win = rxattr->user_win;
rxwin->cop = cop;
-   rxwin->wcreds_max = rxattr->wcreds_max ?: VAS_WCREDS_DEFAULT;
+   rxwin->wcreds_max = rxattr->wcreds_max;
 
init_winctx_for_rxwin(rxwin, rxattr, &winctx);
init_winctx_regs(rxwin, &winctx);
diff --git a/arch/powerpc/platforms/powernv/vas.h 
b/arch/powerpc/platforms/powernv/vas.h
index cd609ce..d7398b7 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -101,11 +101,9 @@
 /*
  * Initial per-process credits.
  * Max send window credits:4K-1 (12-bits in VAS_TX_WCRED)
- * Max receive window credits: 64K-1 (16 bits in VAS_LRX_WCRED)
  *
  * TODO: Needs tuning for per-process credits
  */
-#define VAS_RX_WCREDS_MAX  ((64 << 10) - 1)
 #define VAS_TX_WCREDS_MAX  ((4 << 10) - 1)
 #define VAS_WCREDS_DEFAULT (1 << 10)
 
-- 
1.8.3.1





[PATCH V2 09/13] powerpc/vas: Print CRB and FIFO values

2019-12-08 Thread Haren Myneni


Print the CRB for debugging, and dump the FIFO entry values if the send
window cannot be found.

Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-fault.c | 41 ++
 1 file changed, 41 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/vas-fault.c 
b/arch/powerpc/platforms/powernv/vas-fault.c
index 88a211b..cf41b65 100644
--- a/arch/powerpc/platforms/powernv/vas-fault.c
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -26,6 +26,28 @@
  */
 #define VAS_FAULT_WIN_FIFO_SIZE(4 << 20)
 
+static void dump_crb(struct coprocessor_request_block *crb)
+{
+   struct data_descriptor_entry *dde;
+   struct nx_fault_stamp *nx;
+
+   dde = &crb->source;
+   pr_devel("SrcDDE: addr 0x%llx, len %d, count %d, idx %d, flags %d\n",
+   be64_to_cpu(dde->address), be32_to_cpu(dde->length),
+   dde->count, dde->index, dde->flags);
+
+   dde = &crb->target;
+   pr_devel("TgtDDE: addr 0x%llx, len %d, count %d, idx %d, flags %d\n",
+   be64_to_cpu(dde->address), be32_to_cpu(dde->length),
+   dde->count, dde->index, dde->flags);
+
+   nx = &crb->stamp.nx;
+   pr_devel("NX Stamp: PSWID 0x%x, FSA 0x%llx, flags 0x%x, FS 0x%x\n",
+   be32_to_cpu(nx->pswid),
+   be64_to_cpu(crb->stamp.nx.fault_storage_addr),
+   nx->flags, be32_to_cpu(nx->fault_status));
+}
+
 static void notify_process(pid_t pid, u64 fault_addr)
 {
int rc;
@@ -154,6 +176,23 @@ static void update_csb(struct vas_window *window,
}
 }
 
+static void dump_fifo(struct vas_instance *vinst, void *entry)
+{
+   int i;
+   unsigned long *fifo = entry;
+
+   pr_err("Fault fifo size %d, max crbs %d, crb size %lu\n",
+   vinst->fault_fifo_size,
+   vinst->fault_fifo_size / CRB_SIZE,
+   sizeof(struct coprocessor_request_block));
+
+   pr_err("Fault FIFO Entry Dump:\n");
+   for (i = 0; i < CRB_SIZE; i += 4, fifo += 4) {
+   pr_err("[%.3d, %p]: 0x%.16lx 0x%.16lx 0x%.16lx 0x%.16lx\n",
+   i, fifo, *fifo, *(fifo+1), *(fifo+2), *(fifo+3));
+   }
+}
+
 /*
  * Process CRBs that we receive on the fault window.
  */
@@ -212,6 +251,7 @@ irqreturn_t vas_fault_handler(int irq, void *data)
vinst->vas_id, vinst->fault_fifo, fifo,
vinst->fault_crbs);
 
+   dump_crb(crb);
window = vas_pswid_to_window(vinst,
be32_to_cpu(crb->stamp.nx.pswid));
 
@@ -222,6 +262,7 @@ irqreturn_t vas_fault_handler(int irq, void *data)
 * even clean it up (return credit).
 * But we should not get here.
 */
+   dump_fifo(vinst, (void *)crb);
pr_err("VAS[%d] fault_fifo %p, fifo %p, pswid 0x%x, fault_crbs %d bad CRB?\n",
vinst->vas_id, vinst->fault_fifo, fifo,
be32_to_cpu(crb->stamp.nx.pswid),
-- 
1.8.3.1





[PATCH V2 08/13] powerpc/vas: Update CSB and notify process for fault CRBs

2019-12-08 Thread Haren Myneni


For each fault CRB, update the CSB with the fault address from the CRB
(fault_storage_addr) and a translation error status, so that user space
can touch the fault address and resend the request. If user space passed
an invalid CSB address, send SIGSEGV to the process.
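
The CSB update in the diff below writes the whole status block first and
sets the valid flag that user space polls only afterwards, with a barrier
in between. The same publish-then-flag ordering can be sketched in portable
C11. This is a user-space analogy rather than the kernel code: it uses a
release store and an acquire load instead of smp_mb() and copy_to_user(),
and struct status_block, STATUS_VALID, publish_status() and read_status()
are all hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define STATUS_VALID 0x80 /* hypothetical "block is valid" flag bit */

/* Hypothetical status block; the real CSB layout is not reproduced here. */
struct status_block {
    _Atomic uint8_t flags; /* polled by the consumer */
    uint8_t cc;            /* completion code */
    uint64_t address;      /* fault address */
};

/* Producer: fill in the payload, then publish it by setting the flag. */
static void publish_status(struct status_block *sb, uint8_t cc,
                           uint64_t address)
{
    sb->cc = cc;
    sb->address = address;
    /* Release store: the payload writes above become visible first. */
    atomic_store_explicit(&sb->flags, STATUS_VALID, memory_order_release);
}

/* Consumer: returns 1 and copies the payload once the flag is set. */
static int read_status(struct status_block *sb, uint8_t *cc,
                       uint64_t *address)
{
    uint8_t flags = atomic_load_explicit(&sb->flags, memory_order_acquire);

    if (!(flags & STATUS_VALID))
        return 0;
    *cc = sb->cc;
    *address = sb->address;
    return 1;
}
```

A consumer that sees the flag set is then guaranteed to read the completion
code and address written before it, which is the property the barrier in
update_csb() is there to provide.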

Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-fault.c | 130 +
 1 file changed, 130 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/vas-fault.c 
b/arch/powerpc/platforms/powernv/vas-fault.c
index e1e34c6..88a211b 100644
--- a/arch/powerpc/platforms/powernv/vas-fault.c
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -25,6 +26,134 @@
  */
 #define VAS_FAULT_WIN_FIFO_SIZE(4 << 20)
 
+static void notify_process(pid_t pid, u64 fault_addr)
+{
+   int rc;
+   struct kernel_siginfo info;
+
+   memset(&info, 0, sizeof(info));
+
+   info.si_signo = SIGSEGV;
+   info.si_errno = EFAULT;
+   info.si_code = SEGV_MAPERR;
+   info.si_addr = (void *)fault_addr;
+   /*
+* process will be polling on csb.flags after request is sent to
+* NX. So generally CSB update should not fail except when an
+* application does not follow the process properly. So an error
+* message will be displayed and leave it to user space whether
+* to ignore or handle this signal.
+*/
+   rcu_read_lock();
+   rc = kill_pid_info(SIGSEGV, &info, find_vpid(pid));
+   rcu_read_unlock();
+
+   pr_devel("%s(): pid %d kill_proc_info() rc %d\n", __func__, pid, rc);
+}
+
+/*
+ * Update the CSB to indicate a translation error.
+ *
+ * If the fault is in the CSB address itself or if we are unable to
+ * update the CSB, send a signal to the process, because we have no
+ * other way of notifying the user process.
+ *
+ * Remaining settings in the CSB are based on wait_for_csb() of
+ * NX-GZIP.
+ */
+static void update_csb(struct vas_window *window,
+   struct coprocessor_request_block *crb)
+{
+   int rc;
+   pid_t pid;
+   int task_exit = 0;
+   void __user *csb_addr;
+   struct task_struct *tsk;
+   struct coprocessor_status_block csb;
+
+   /*
+* NX user space windows can not be opened for task->mm=NULL
+* and faults will not be generated for kernel requests.
+*/
+   if (!window->mm || !window->user_win)
+   return;
+
+   csb_addr = (void *)be64_to_cpu(crb->csb_addr);
+
+   csb.cc = CSB_CC_TRANSLATION;
+   csb.ce = CSB_CE_TERMINATION;
+   csb.cs = 0;
+   csb.count = 0;
+
+   /*
+* Returns the fault address in CPU format since it is passed with
+* signal. But if the user space expects BE format, need changes.
+* i.e either kernel (here) or user should convert to CPU format.
+* Not both!
+*/
+   csb.address = be64_to_cpu(crb->stamp.nx.fault_storage_addr);
+   csb.flags = 0;
+
+   use_mm(window->mm);
+   rc = copy_to_user(csb_addr, &csb, sizeof(csb));
+   /*
+* User space polls on csb.flags (first byte). So add barrier
+* then copy first byte with csb flags update.
+*/
+   smp_mb();
+   if (!rc) {
+   csb.flags = CSB_V;
+   rc = copy_to_user(csb_addr, &csb, sizeof(u8));
+   }
+   unuse_mm(window->mm);
+
+   /* Success */
+   if (!rc)
+   return;
+
+   /*
+* User space passed invalid CSB address, Notify process with
+* SEGV signal.
+*/
+   tsk = get_pid_task(window->pid, PIDTYPE_PID);
+   /*
+* Send window will be closed after processing all NX requests
+* and process exits after closing all windows. In multi-thread
+* applications, thread may not exists, but does not close FD
+* (means send window) upon exit. Parent thread (tgid) can use
+* and close the window later.
+* pid and mm references are taken when window is opened by
+* process (pid). So tgid is used only when child thread is not
+* available in multithread tasks.
+*
+*/
+   if (tsk) {
+   if (tsk->flags & PF_EXITING)
+   task_exit = 1;
+   put_task_struct(tsk);
+   pid = vas_window_pid(window);
+   } else {
+   pid = window->tgid;
+
+   rcu_read_lock();
+   tsk = find_task_by_vpid(pid);
+   if (!tsk) {
+   rcu_read_unlock();
+   return;
+   }
+   if (tsk->flags & PF_EXITING)
+   task_exit = 1;
+   rcu_read_unlock();
+   }
+
+   /* Do not notify if the task is exiting. */
+   if (!task_exit) {
+   pr_err("Invalid CSB address 0x%p signalling pid(%d)\n",
+   

[PATCH V2 07/13] powerpc/vas: Take reference to PID and mm for user space windows

2019-12-08 Thread Haren Myneni


A process closes its windows after its requests are completed. In
multi-threaded applications, a child thread can open a window, but the FD
release will not be called when that thread exits. The parent thread
closes the window later, upon its own exit.

The parent can also send NX requests with this window, and NX can
generate page faults. After the kernel handles the page fault, it sends a
signal to the process by PID if the CSB address is invalid. The parent
thread would not receive the signal, since its PID differs from the one
saved in vas_window. So if the task for the PID saved in the window is no
longer running, use the tgid and send the signal to the parent.

To prevent the PID from being reused before the window is closed, take a
reference to the pid and the task's mm.

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-debug.c  |  2 +-
 arch/powerpc/platforms/powernv/vas-window.c | 44 ++---
 arch/powerpc/platforms/powernv/vas.h|  9 +-
 3 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-debug.c 
b/arch/powerpc/platforms/powernv/vas-debug.c
index 09e63df..ef9a717 100644
--- a/arch/powerpc/platforms/powernv/vas-debug.c
+++ b/arch/powerpc/platforms/powernv/vas-debug.c
@@ -38,7 +38,7 @@ static int info_show(struct seq_file *s, void *private)
 
seq_printf(s, "Type: %s, %s\n", cop_to_str(window->cop),
window->tx_win ? "Send" : "Receive");
-   seq_printf(s, "Pid : %d\n", window->pid);
+   seq_printf(s, "Pid : %d\n", vas_window_pid(window));
 
 unlock:
mutex_unlock(&vas_mutex);
diff --git a/arch/powerpc/platforms/powernv/vas-window.c 
b/arch/powerpc/platforms/powernv/vas-window.c
index e36c5d2..344db11 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -12,6 +12,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include "vas.h"
@@ -877,8 +879,6 @@ struct vas_window *vas_rx_win_open(int vasid, enum 
vas_cop_type cop,
rxwin->user_win = rxattr->user_win;
rxwin->cop = cop;
rxwin->wcreds_max = rxattr->wcreds_max ?: VAS_WCREDS_DEFAULT;
-   if (rxattr->user_win)
-   rxwin->pid = task_pid_vnr(current);
 
init_winctx_for_rxwin(rxwin, rxattr, &winctx);
init_winctx_regs(rxwin, &winctx);
@@ -1028,7 +1028,6 @@ struct vas_window *vas_tx_win_open(int vasid, enum 
vas_cop_type cop,
txwin->tx_win = 1;
txwin->rxwin = rxwin;
txwin->nx_win = txwin->rxwin->nx_win;
-   txwin->pid = attr->pid;
txwin->user_win = attr->user_win;
txwin->wcreds_max = attr->wcreds_max ?: VAS_WCREDS_DEFAULT;
 
@@ -1069,6 +1068,34 @@ struct vas_window *vas_tx_win_open(int vasid, enum 
vas_cop_type cop,
goto free_window;
}
 
+   if (txwin->user_win) {
+   /*
+* Window opened by child thread may not be closed when
+* it exits. So take reference to its pid and release it
+* when the window is free by parent thread.
+* Acquire a reference to the task's pid to make sure
+* pid will not be re-used.
+*/
+   txwin->pid = get_task_pid(current, PIDTYPE_PID);
+   /*
+* Acquire a reference to the task's mm.
+*/
+   txwin->mm = get_task_mm(current);
+
+   if (txwin->mm) {
+   mmput(txwin->mm);
+   mmgrab(txwin->mm);
+   mm_context_add_copro(txwin->mm);
+   } else {
+   put_pid(txwin->pid);
+   pr_err("VAS: pid(%d): mm_struct is not found\n",
+   current->pid);
+   rc = -EPERM;
+   goto free_window;
+   }
+   txwin->tgid = task_tgid_vnr(current);
+   }
+
set_vinst_win(vinst, txwin);
 
return txwin;
@@ -1267,8 +1294,17 @@ int vas_win_close(struct vas_window *window)
poll_window_castout(window);
 
/* if send window, drop reference to matching receive window */
-   if (window->tx_win)
+   if (window->tx_win) {
+   if (window->user_win) {
+   /* Drop references to pid and mm */
+   put_pid(window->pid);
+   if (window->mm) {
+   mm_context_remove_copro(window->mm);
+   mmdrop(window->mm);
+   }
+   }
put_rx_win(window->rxwin);
+   }
 
vas_window_free(window);
 
diff --git a/arch/powerpc/platforms/powernv/vas.h 
b/arch/powerpc/platforms/powernv/vas.h
index 2621df1..cd609ce 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -340,7 +340,9 @@ struct vas_window {
bool user_win;  /* True if user space window */
  

[PATCH V2 06/13] powerpc/vas: Register NX with fault window ID and IRQ port value

2019-12-08 Thread Haren Myneni


For each user space send window, register NX with the fault window ID
and port value so that NX pastes CRBs into this fault FIFO when it
sees a fault on the request buffer.

Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-window.c | 15 +--
 arch/powerpc/platforms/powernv/vas.h| 15 +++
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c 
b/arch/powerpc/platforms/powernv/vas-window.c
index cec1b41..e36c5d2 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -373,7 +373,7 @@ int init_winctx_regs(struct vas_window *window, struct 
vas_winctx *winctx)
init_xlate_regs(window, winctx->user_win);
 
val = 0ULL;
-   val = SET_FIELD(VAS_FAULT_TX_WIN, val, 0);
+   val = SET_FIELD(VAS_FAULT_TX_WIN, val, winctx->fault_win_id);
write_hvwc_reg(window, VREG(FAULT_TX_WIN), val);
 
/* In PowerNV, interrupts go to HV. */
@@ -748,6 +748,8 @@ static void init_winctx_for_rxwin(struct vas_window *rxwin,
 
winctx->min_scope = VAS_SCOPE_LOCAL;
winctx->max_scope = VAS_SCOPE_VECTORED_GROUP;
+   if (rxwin->vinst->virq)
+   winctx->irq_port = rxwin->vinst->irq_port;
 }
 
 static bool rx_win_args_valid(enum vas_cop_type cop,
@@ -945,13 +947,22 @@ static void init_winctx_for_txwin(struct vas_window 
*txwin,
winctx->lpid = txattr->lpid;
winctx->pidr = txattr->pidr;
winctx->rx_win_id = txwin->rxwin->winid;
+   /*
+* IRQ and fault window setup is successful. Set fault window
+* for the send window so that ready to handle faults.
+*/
+   if (txwin->vinst->virq)
+   winctx->fault_win_id = txwin->vinst->fault_win->winid;
 
winctx->dma_type = VAS_DMA_TYPE_INJECT;
winctx->tc_mode = txattr->tc_mode;
winctx->min_scope = VAS_SCOPE_LOCAL;
winctx->max_scope = VAS_SCOPE_VECTORED_GROUP;
+   if (txwin->vinst->virq)
+   winctx->irq_port = txwin->vinst->irq_port;
 
-   winctx->pswid = 0;
+   winctx->pswid = txattr->pswid ? txattr->pswid :
+   encode_pswid(txwin->vinst->vas_id, txwin->winid);
 }
 
 static bool tx_win_args_valid(enum vas_cop_type cop,
diff --git a/arch/powerpc/platforms/powernv/vas.h 
b/arch/powerpc/platforms/powernv/vas.h
index 879f5b4..2621df1 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -455,6 +455,21 @@ static inline u64 read_hvwc_reg(struct vas_window *win,
return in_be64(win->hvwc_map+reg);
 }
 
+/*
+ * Encode/decode the Partition Send Window ID (PSWID) for a window in
+ * a way that we can uniquely identify any window in the system. i.e.
+ * we should be able to locate the 'struct vas_window' given the PSWID.
+ *
+ * Bits    Usage
+ * 0:7     VAS id (8 bits)
+ * 8:15    Unused, 0 (8 bits)
+ * 16:31   Window id (16 bits)
+ */
+static inline u32 encode_pswid(int vasid, int winid)
+{
+   return ((u32)winid | (vasid << (31 - 7)));
+}
+
 static inline void decode_pswid(u32 pswid, int *vasid, int *winid)
 {
if (vasid)
-- 
1.8.3.1
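
The PSWID layout above can be checked with a small stand-alone round trip.
encode_pswid() follows the expression in the patch, except that the VAS id
is cast to an unsigned type before shifting so that ids of 128 and above do
not shift into the sign bit. The body of decode_pswid() is not shown in this
diff, so the version here is an assumption derived from the documented bit
layout (IBM bit numbering, bit 0 being the most significant).

```c
#include <stdint.h>

/* VAS id goes in the top 8 bits, window id in the low 16 bits. */
static inline uint32_t encode_pswid(int vasid, int winid)
{
    return (uint32_t)winid | ((uint32_t)vasid << (31 - 7));
}

/* Assumed inverse of encode_pswid(); either output may be NULL. */
static inline void decode_pswid(uint32_t pswid, int *vasid, int *winid)
{
    if (vasid)
        *vasid = (pswid >> (31 - 7)) & 0xFF;
    if (winid)
        *winid = pswid & 0xFFFF;
}
```

For example, VAS id 3 and window id 0x1234 encode to 0x03001234, and
decoding that value gives back the same pair.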





[PATCH V2 05/13] powerpc/vas: Setup thread IRQ handler per VAS instance

2019-12-08 Thread Haren Myneni


Set up a threaded IRQ handler for each VAS instance. When NX sees a fault
on a CRB, the kernel gets an interrupt and vas_fault_handler is
executed to process the fault CRBs. It reads all valid CRBs from the
fault FIFO, determines the corresponding send window from each CRB, and
processes the fault requests.
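
The FIFO walk that the handler performs (read the next slot, stop at an
all-zero entry, advance with wrap-around, copy the entry out and clear the
slot) can be sketched in plain C. This is a simplified model, not the
kernel code: struct fault_ring and ring_pop() are hypothetical, ENTRY_SIZE
is a stand-in for CRB_SIZE with an assumed value, locking is omitted, and
validity is judged from the first four bytes of the slot rather than from
the pswid field inside the CRB stamp.

```c
#include <stdint.h>
#include <string.h>

#define ENTRY_SIZE 128 /* stand-in for CRB_SIZE; assumed value */

/* Hypothetical model of the per-instance fault FIFO state. */
struct fault_ring {
    uint8_t *base; /* start of the FIFO memory */
    size_t size;   /* FIFO size in bytes, a multiple of ENTRY_SIZE */
    size_t next;   /* index of the next slot to read */
};

/*
 * Copies the next valid entry into out and clears its slot.
 * Returns 1 if an entry was consumed, 0 if the next slot is empty.
 */
static int ring_pop(struct fault_ring *r, uint8_t out[ENTRY_SIZE])
{
    uint8_t *slot = r->base + r->next * ENTRY_SIZE;
    uint32_t id;

    /* An all-zero id marks an empty (already processed) slot. */
    memcpy(&id, slot, sizeof(id));
    if (!id)
        return 0;

    /* Advance the read index, wrapping at the end of the FIFO. */
    r->next++;
    if (r->next == r->size / ENTRY_SIZE)
        r->next = 0;

    memcpy(out, slot, ENTRY_SIZE);
    memset(slot, 0, ENTRY_SIZE);
    return 1;
}
```

Clearing the slot after copying is what lets a later interrupt stop early:
if an earlier pass already consumed the entries, the next slot reads as
zero and the function returns without doing any work.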

Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-fault.c  | 83 +
 arch/powerpc/platforms/powernv/vas-window.c | 60 +
 arch/powerpc/platforms/powernv/vas.c| 15 +-
 arch/powerpc/platforms/powernv/vas.h|  4 ++
 4 files changed, 161 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/vas-fault.c 
b/arch/powerpc/platforms/powernv/vas-fault.c
index b0258ed..e1e34c6 100644
--- a/arch/powerpc/platforms/powernv/vas-fault.c
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "vas.h"
@@ -25,6 +26,88 @@
 #define VAS_FAULT_WIN_FIFO_SIZE(4 << 20)
 
 /*
+ * Process CRBs that we receive on the fault window.
+ */
+irqreturn_t vas_fault_handler(int irq, void *data)
+{
+   struct vas_instance *vinst = (struct vas_instance *)data;
+   struct coprocessor_request_block buf, *crb;
+   struct vas_window *window;
+   void *fifo;
+
+   /*
+* VAS can interrupt with multiple page faults. So process all
+* valid CRBs within fault FIFO until reaches invalid CRB.
+* NX updates nx_fault_stamp in CRB and pastes in fault FIFO.
+* kernel retrieves send window from partition send window ID
+* (pswid) in nx_fault_stamp. So pswid should be non-zero and
+* use this to check whether CRB is valid.
+* After reading CRB entry, it is reset with 0's in fault FIFO.
+*
+* In case kernel receives another interrupt with different page
+* fault and CRBs are processed by the previous handling, will be
+* returned from this function when it sees invalid CRB (means 0's).
+*/
+   do {
+   mutex_lock(&vinst->mutex);
+
+   /*
+* Advance the fault fifo pointer to next CRB.
+* Use CRB_SIZE rather than sizeof(*crb) since the latter is
+* aligned to CRB_ALIGN (256) but the CRB written to by VAS is
+* only CRB_SIZE in len.
+*/
+   fifo = vinst->fault_fifo + (vinst->fault_crbs * CRB_SIZE);
+   crb = (struct coprocessor_request_block *)fifo;
+
+   /*
+* pswid returned from NX will be in _be32, but just
+* checking non-zero value to make sure the CRB is valid.
+* Return if reached invalid CRB.
+*/
+   if (!crb->stamp.nx.pswid) {
+   mutex_unlock(&vinst->mutex);
+   return IRQ_HANDLED;
+   }
+
+   vinst->fault_crbs++;
+   if (vinst->fault_crbs == vinst->fault_fifo_size/CRB_SIZE)
+   vinst->fault_crbs = 0;
+
+   crb = &buf;
+   memcpy(crb, fifo, CRB_SIZE);
+   memset(fifo, 0, CRB_SIZE);
+   mutex_unlock(&vinst->mutex);
+
+   pr_devel("VAS[%d] fault_fifo %p, fifo %p, fault_crbs %d\n",
+   vinst->vas_id, vinst->fault_fifo, fifo,
+   vinst->fault_crbs);
+
+   window = vas_pswid_to_window(vinst,
+   be32_to_cpu(crb->stamp.nx.pswid));
+
+   if (IS_ERR(window)) {
+   /*
+* We got an interrupt about a specific send
+* window but we can't find that window and we can't
+* even clean it up (return credit).
+* But we should not get here.
+*/
+   pr_err("VAS[%d] fault_fifo %p, fifo %p, pswid 0x%x, fault_crbs %d bad CRB?\n",
+   vinst->vas_id, vinst->fault_fifo, fifo,
+   be32_to_cpu(crb->stamp.nx.pswid),
+   vinst->fault_crbs);
+
+   WARN_ON_ONCE(1);
+   return IRQ_HANDLED;
+   }
+
+   } while (true);
+
+   return IRQ_HANDLED;
+}
+
+/*
  * Fault window is opened per VAS instance. NX pastes fault CRB in fault
  * FIFO upon page faults.
  */
diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index f07f49a..cec1b41 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -1041,6 +1041,15 @@ struct vas_window *vas_tx_win_open(int vasid, enum vas_cop_type cop,
}
} else {
/*
+* Interrupt handler or fault window setup failed. Means
+* NX can not 

[PATCH V2 04/13] powerpc/vas: Setup fault window per VAS instance

2019-12-08 Thread Haren Myneni


Set up a fault window for each VAS instance. When NX gets a fault on a
request buffer, it writes the fault CRB to the corresponding fault FIFO
and then sends an interrupt to the OS.
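
As a rough illustration of the sizing used for the fault window, the sketch
below computes how many CRBs fit in the fault FIFO, which is the value this
patch assigns to wcreds_max. CRB_SIZE is assumed to be 128 bytes (its value
is not shown in this patch), and fault_fifo_max_creds() is a hypothetical
helper name, not a function from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a CRB is 128 bytes; the patch itself does not show CRB_SIZE. */
#define SKETCH_CRB_SIZE		128u
/* From the patch: the fault FIFO is 4MB per VAS instance. */
#define SKETCH_FAULT_FIFO_SIZE	(4u << 20)

/*
 * Hypothetical helper: number of CRBs, and therefore receive credits,
 * that a FIFO of the given size can hold.
 */
static uint32_t fault_fifo_max_creds(uint32_t fifo_size)
{
	return fifo_size / SKETCH_CRB_SIZE;
}
```

Under that assumption a 4MB FIFO holds 32768 CRBs, which fits in a 16-bit
credit field, while an 8MB FIFO would hold 65536 and would not.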

Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/Makefile     |  2 +-
 arch/powerpc/platforms/powernv/vas-fault.c  | 73 +
 arch/powerpc/platforms/powernv/vas-window.c |  3 +-
 arch/powerpc/platforms/powernv/vas.c        | 24 ++
 arch/powerpc/platforms/powernv/vas.h        |  5 ++
 5 files changed, 105 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/vas-fault.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index a3ac964..74c2246 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -17,6 +17,6 @@ obj-$(CONFIG_MEMORY_FAILURE)  += opal-memory-errors.o
 obj-$(CONFIG_OPAL_PRD) += opal-prd.o
 obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
 obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o
-obj-$(CONFIG_PPC_VAS)  += vas.o vas-window.o vas-debug.o
+obj-$(CONFIG_PPC_VAS)  += vas.o vas-window.o vas-debug.o vas-fault.o
 obj-$(CONFIG_OCXL_BASE) += ocxl.o
 obj-$(CONFIG_SCOM_DEBUGFS) += opal-xscom.o
diff --git a/arch/powerpc/platforms/powernv/vas-fault.c b/arch/powerpc/platforms/powernv/vas-fault.c
new file mode 100644
index 000..b0258ed
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * VAS Fault handling.
+ * Copyright 2019, IBM Corporation
+ */
+
+#define pr_fmt(fmt) "vas: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vas.h"
+
+/*
+ * The maximum FIFO size for a fault window is 8MB
+ * (VAS_RX_FIFO_SIZE_MAX). Use a 4MB FIFO since each VAS
+ * instance has its own fault window.
+ * An 8MB FIFO can be used if more faults are expected for
+ * each VAS instance.
+ */
+#define VAS_FAULT_WIN_FIFO_SIZE (4 << 20)
+
+/*
+ * Fault window is opened per VAS instance. NX pastes fault CRB in fault
+ * FIFO upon page faults.
+ */
+int vas_setup_fault_window(struct vas_instance *vinst)
+{
+   struct vas_rx_win_attr attr;
+
+   vinst->fault_fifo_size = VAS_FAULT_WIN_FIFO_SIZE;
+   vinst->fault_fifo = kzalloc(vinst->fault_fifo_size, GFP_KERNEL);
+   if (!vinst->fault_fifo) {
+   pr_err("Unable to alloc %d bytes for fault_fifo\n",
+   vinst->fault_fifo_size);
+   return -ENOMEM;
+   }
+
+   vas_init_rx_win_attr(&attr, VAS_COP_TYPE_FAULT);
+
+   attr.rx_fifo_size = vinst->fault_fifo_size;
+   attr.rx_fifo = vinst->fault_fifo;
+
+   /*
+* Max creds is based on number of CRBs can fit in the FIFO.
+* (fault_fifo_size/CRB_SIZE). If 8MB FIFO is used, max creds
+* will be 0x since the receive creds field is 16bits wide.
+*/
+   attr.wcreds_max = vinst->fault_fifo_size / CRB_SIZE;
+   attr.lnotify_lpid = 0;
+   attr.lnotify_pid = mfspr(SPRN_PID);
+   attr.lnotify_tid = mfspr(SPRN_PID);
+
+   vinst->fault_win = vas_rx_win_open(vinst->vas_id, VAS_COP_TYPE_FAULT,
+   &attr);
+
+   if (IS_ERR(vinst->fault_win)) {
+   pr_err("VAS: Error %ld opening FaultWin\n",
+   PTR_ERR(vinst->fault_win));
+   kfree(vinst->fault_fifo);
+   return PTR_ERR(vinst->fault_win);
+   }
+
+   pr_devel("VAS: Created FaultWin %d, LPID/PID/TID [%d/%d/%d]\n",
+   vinst->fault_win->winid, attr.lnotify_lpid,
+   attr.lnotify_pid, attr.lnotify_tid);
+
+   return 0;
+}
diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index 0c0d27d..f07f49a 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -827,9 +827,10 @@ void vas_init_rx_win_attr(struct vas_rx_win_attr *rxattr, enum vas_cop_type cop)
rxattr->fault_win = true;
rxattr->notify_disable = true;
rxattr->rx_wcred_mode = true;
-   rxattr->tx_wcred_mode = true;
rxattr->rx_win_ord_mode = true;
rxattr->tx_win_ord_mode = true;
+   rxattr->rej_no_credit = true;
+   rxattr->tc_mode = VAS_THRESH_DISABLED;
} else if (cop == VAS_COP_TYPE_FTW) {
rxattr->user_win = true;
rxattr->intr_disable = true;
diff --git a/arch/powerpc/platforms/powernv/vas.c b/arch/powerpc/platforms/powernv/vas.c
index 40d8213..ec34c06 100644
--- a/arch/powerpc/platforms/powernv/vas.c
+++ b/arch/powerpc/platforms/powernv/vas.c
@@ -23,6 +23,15 @@
 
 static DEFINE_PER_CPU(int, cpu_vas_id);
 
+static int vas_irq_fault_window_setup(struct vas_instance *vinst)
+{
+   int rc = 0;
+
+   rc = vas_setup_fault_window(vinst);
+

[PATCH V2 03/13] powerpc/vas: Read interrupts and vas-port device tree properties

2019-12-08 Thread Haren Myneni


Read the interrupts and vas-port device tree properties for each VAS
instance. NX generates an interrupt when it sees a page fault on the
request buffer. The interrupts property is used to set up the IRQ for
handling the fault, and the port value is set for each user space send
window.

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas.c | 40 
 arch/powerpc/platforms/powernv/vas.h |  2 ++
 2 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas.c b/arch/powerpc/platforms/powernv/vas.c
index ed9cc6d..40d8213 100644
--- a/arch/powerpc/platforms/powernv/vas.c
+++ b/arch/powerpc/platforms/powernv/vas.c
@@ -25,10 +25,11 @@
 
 static int init_vas_instance(struct platform_device *pdev)
 {
-   int rc, cpu, vasid;
-   struct resource *res;
-   struct vas_instance *vinst;
struct device_node *dn = pdev->dev.of_node;
+   int rc, cpu, vasid, nresources = 5;
+   struct vas_instance *vinst;
+   struct resource *res;
+   uint64_t port;
 
 rc = of_property_read_u32(dn, "ibm,vas-id", &vasid);
if (rc) {
@@ -36,7 +37,18 @@ static int init_vas_instance(struct platform_device *pdev)
return -ENODEV;
}
 
-   if (pdev->num_resources != 4) {
+   rc = of_property_read_u64(dn, "ibm,vas-port", &port);
+   if (rc) {
+   pr_err("No ibm,vas-port property for %s?\n", pdev->name);
+   /* No interrupts property */
+   nresources = 4;
+   }
+
+   /*
+* interrupts property is available with 'ibm,vas-port' property.
+* 4 Resources and 1 IRQ if interrupts property is available.
+*/
+   if (pdev->num_resources != nresources) {
pr_err("Unexpected DT configuration for [%s, %d]\n",
pdev->name, vasid);
return -ENODEV;
@@ -51,6 +63,7 @@ static int init_vas_instance(struct platform_device *pdev)
 mutex_init(&vinst->mutex);
vinst->vas_id = vasid;
vinst->pdev = pdev;
+   vinst->irq_port = port;
 
 res = &pdev->resource[0];
vinst->hvwc_bar_start = res->start;
@@ -66,12 +79,23 @@ static int init_vas_instance(struct platform_device *pdev)
pr_err("Bad 'paste_win_id_shift' in DT, %llx\n", res->end);
goto free_vinst;
}
-
vinst->paste_win_id_shift = 63 - res->end;
 
-   pr_devel("Initialized instance [%s, %d], paste_base 0x%llx, "
-   "paste_win_id_shift 0x%llx\n", pdev->name, vasid,
-   vinst->paste_base_addr, vinst->paste_win_id_shift);
+   /* interrupts property */
+   if (pdev->num_resources == 5) {
+   res = &pdev->resource[4];
+   vinst->virq = res->start;
+   if (vinst->virq <= 0) {
+   pr_err("IRQ resource is not available for [%s, %d]\n",
+   pdev->name, vasid);
+   vinst->virq = 0;
+   }
+   }
+
+   pr_devel("Initialized instance [%s, %d] paste_base 0x%llx paste_win_id_shift 0x%llx IRQ %d Port 0x%llx\n",
+   pdev->name, vasid, vinst->paste_base_addr,
+   vinst->paste_win_id_shift, vinst->virq,
+   vinst->irq_port);
 
for_each_possible_cpu(cpu) {
if (cpu_to_chip_id(cpu) == of_get_ibm_chip_id(dn))
diff --git a/arch/powerpc/platforms/powernv/vas.h b/arch/powerpc/platforms/powernv/vas.h
index 5574aec..598608b 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -313,6 +313,8 @@ struct vas_instance {
u64 paste_base_addr;
u64 paste_win_id_shift;
 
+   u64 irq_port;
+   int virq;
struct mutex mutex;
struct vas_window *rxwin[VAS_COP_TYPE_MAX];
struct vas_window *windows[VAS_WINDOWS_PER_CHIP];
-- 
1.8.3.1





[PATCH V2 02/13] powerpc/vas: Define nx_fault_stamp in coprocessor_request_block

2019-12-08 Thread Haren Myneni


After processing an NX page fault on a user space address, the kernel
sets the fault address and status in the CRB. User space gets the signal
and handles the fault reported in the CRB by bringing the page in to
memory and sending the NX request again.
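
For readers who want to check the layout, here is a user-space sketch that
mirrors the nx_fault_stamp fields added by this patch and verifies that they
occupy 16 bytes. Fixed-width integers stand in for the kernel's __be64,
__be16 and __be32 types, the _sketch suffix marks the type as illustrative,
and the attribute syntax assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative mirror of struct nx_fault_stamp. Byte order is ignored
 * here; only field sizes and offsets are modelled.
 */
struct nx_fault_stamp_sketch {
	uint64_t fault_storage_addr;	/* bytes 0..7 */
	uint16_t reserved;		/* bytes 8..9 */
	uint8_t  flags;			/* byte 10 */
	uint8_t  fault_status;		/* byte 11 */
	uint32_t pswid;			/* bytes 12..15 */
} __attribute__((packed, aligned(0x10)));

/* The stamp must fit the 16-byte union slot carved out of the CRB. */
_Static_assert(sizeof(struct nx_fault_stamp_sketch) == 16,
	       "fault stamp must be 16 bytes");
_Static_assert(offsetof(struct nx_fault_stamp_sketch, pswid) == 12,
	       "pswid must be the last 4 bytes");
```

The 16-byte stamp plus the remaining 32 reserved bytes add up to the 48
reserved bytes the CRB had before, so the CRB size itself is unchanged.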

Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/asm/icswx.h | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/icswx.h b/arch/powerpc/include/asm/icswx.h
index 9872f85..b233d1e 100644
--- a/arch/powerpc/include/asm/icswx.h
+++ b/arch/powerpc/include/asm/icswx.h
@@ -108,6 +108,17 @@ struct data_descriptor_entry {
__be64 address;
 } __packed __aligned(DDE_ALIGN);
 
+/* 4.3.2 NX-stamped Fault CRB */
+
+#define NX_STAMP_ALIGN  (0x10)
+
+struct nx_fault_stamp {
+   __be64 fault_storage_addr;
+   __be16 reserved;
+   __u8   flags;
+   __u8   fault_status;
+   __be32 pswid;
+} __packed __aligned(NX_STAMP_ALIGN);
 
 /* Chapter 6.5.2 Coprocessor-Request Block (CRB) */
 
@@ -135,7 +146,12 @@ struct coprocessor_request_block {
 
struct coprocessor_completion_block ccb;
 
-   u8 reserved[48];
+   union {
+   struct nx_fault_stamp nx;
+   u8 reserved[16];
+   } stamp;
+
+   u8 reserved[32];
 
struct coprocessor_status_block csb;
 } __packed __aligned(CRB_ALIGN);
-- 
1.8.3.1





[PATCH V2 01/13] powerpc/vas: Describe vas-port and interrupts properties

2019-12-08 Thread Haren Myneni


Signed-off-by: Haren Myneni 
---
 Documentation/devicetree/bindings/powerpc/ibm,vas.txt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/powerpc/ibm,vas.txt b/Documentation/devicetree/bindings/powerpc/ibm,vas.txt
index bf11d2f..12de08b 100644
--- a/Documentation/devicetree/bindings/powerpc/ibm,vas.txt
+++ b/Documentation/devicetree/bindings/powerpc/ibm,vas.txt
@@ -11,6 +11,8 @@ Required properties:
   window context start and length, OS/User window context start and length,
   "Paste address" start and length, "Paste window id" start bit and number
   of bits)
+- ibm,vas-port : Port address for the interrupt.
+- interrupts: IRQ value for each VAS instance and level.
 
 Example:
 
@@ -18,5 +20,8 @@ Example:
compatible = "ibm,vas", "ibm,power9-vas";
reg = <0x60191 0x200 0x60190 0x1 
0x8 0x1 0x20 0x10>;
name = "vas";
+   interrupts = <0x1f 0>;
+   interrupt-parent = <>;
ibm,vas-id = <0x1>;
+   ibm,vas-port = <0x601000100>;
};
-- 
1.8.3.1





[PATCH V2 00/13] powerpc/vas: Page fault handling for user space NX requests

2019-12-08 Thread Haren Myneni


Applications send compression / decompression requests to NX with
COPY/PASTE instructions. While processing these requests, NX can hit a
fault on the request buffer (page not in memory). It then issues an
interrupt and pastes a fault CRB in the fault FIFO, and expects the kernel
to handle the fault and return credits for both the send and fault windows
after processing.

This patch series adds IRQ and fault window setup, and NX fault handling:
- Read IRQ# from the "interrupts" property and configure an IRQ per VAS
  instance.
- Set port# for each window so that an interrupt is generated when a fault
  is noticed.
- Set up the fault window and FIFO in which NX pastes fault CRBs.
- Set up a threaded IRQ fault handler per VAS instance.
- On receiving an interrupt, read CRBs from the fault FIFO and update the
  coprocessor_status_block (CSB) in the corresponding CRB with translation
  failure (CSB_CC_TRANSLATION). After issuing NX requests, the process polls
  on the CSB address. When it sees a translation error, it can touch the
  request buffer to bring the page in to memory and reissue the NX request.
- If copy_to_user fails on the user space CSB address, the OS sends a SEGV
  signal.
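
The FIFO draining step in the list above can be sketched in user space as
follows. This is only a simplified model under stated assumptions: a
128-byte slot whose first field is the pswid, no locking, no endian
conversion and no window lookup. All sketch_* names are hypothetical and
are not part of the series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: one FIFO slot is 128 bytes and pswid == 0 marks it invalid. */
struct sketch_crb {
	uint32_t pswid;
	uint8_t  payload[124];
};

struct sketch_fifo {
	struct sketch_crb *slots;
	unsigned int nslots;
	unsigned int next;	/* index of the next slot to examine */
};

/*
 * Drain valid entries starting at f->next and return how many were handled.
 * Each slot is copied out and zeroed, so the loop stops at the first empty
 * slot, at the latest after one full pass around the ring.
 */
static unsigned int sketch_drain_fifo(struct sketch_fifo *f,
				      void (*handle)(const struct sketch_crb *))
{
	unsigned int handled = 0;

	for (;;) {
		struct sketch_crb *slot = &f->slots[f->next];
		struct sketch_crb copy;

		if (!slot->pswid)
			break;			/* invalid CRB: nothing left */

		copy = *slot;			/* work on a private copy */
		memset(slot, 0, sizeof(*slot));	/* mark the slot as consumed */
		f->next = (f->next + 1) % f->nslots;

		if (handle)
			handle(&copy);
		handled++;
	}

	return handled;
}
```

A second call that finds no new entries returns 0 immediately, which
corresponds to a later interrupt whose CRBs were already consumed.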

Tested these patches with NX-GZIP support and will be posting this series
soon.

Patch 2: Define nx_fault_stamp, in which NX writes the fault status for the
         fault CRB.
Patch 3: Read interrupts and port properties per VAS instance.
Patch 4: Set up a fault window per VAS instance. NX uses this window to
         paste fault CRBs in the FIFO.
Patches 5 and 6: Set up a threaded IRQ per VAS instance and register NX with
         the fault window ID and port number for each send window so that NX
         pastes fault CRBs in this window.
Patch 7: Take a reference to pid and mm so that the pid is not reused until
         the window is closed. Needed for multi-threaded applications where
         a child can open a window that is later used by the parent.
Patches 8 and 9: Process CRBs from the fault FIFO and notify tasks by
         updating the CSB or through signals.
Patches 10 and 11: Return credits for send and fault windows after handling
         faults.
Patch 13: Fix closing the send window after all credits are returned. This
         issue happens only for user space requests. There are no page
         faults on kernel request buffers.

Changelog:
V2:
  - Use threaded IRQ instead of own kernel thread handler
  - Use pswid instead of user space CSB address to find valid CRB
  - Removed unused macros and other changes as suggested by Christoph Hellwig

Haren Myneni (13):
  powerpc/vas: Describe vas-port and interrupts properties
  powerpc/vas: Define nx_fault_stamp in coprocessor_request_block
  powerpc/vas: Read interrupts and vas-port device tree properties
  powerpc/vas: Setup fault window per VAS instance
  powerpc/vas: Setup thread IRQ handler per VAS instance
  powerpc/vas: Register NX with fault window ID and IRQ port value
  powerpc/vas: Take reference to PID and mm for user space windows
  powerpc/vas: Update CSB and notify process for fault CRBs
  powerpc/vas: Print CRB and FIFO values
  powerpc/vas: Do not use default credits for receive window
  powerpc/VAS: Return credits after handling fault
  powerpc/vas: Display process stuck message
  powerpc/vas: Free send window in VAS instance after credits returned

 .../devicetree/bindings/powerpc/ibm,vas.txt |   5 +
 arch/powerpc/include/asm/icswx.h            |  18 +-
 arch/powerpc/platforms/powernv/Makefile     |   2 +-
 arch/powerpc/platforms/powernv/vas-debug.c  |   2 +-
 arch/powerpc/platforms/powernv/vas-fault.c  | 337 +
 arch/powerpc/platforms/powernv/vas-window.c | 173 ++-
 arch/powerpc/platforms/powernv/vas.c        |  77 -
 arch/powerpc/platforms/powernv/vas.h        |  38 ++-
 8 files changed, 627 insertions(+), 25 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/vas-fault.c

-- 
1.8.3.1





Re: [bug] userspace hitting sporadic SIGBUS on xfs (Power9, ppc64le), v4.19 and later

2019-12-08 Thread Eric Sandeen



On 12/6/19 6:09 PM, dftxbs3e wrote:
> Hello!
> 
> I am very happy that someone has found this issue.
> 
> I have been suffering from rather random SIGBUS errors in similar
> conditions described by the author.
> 
> I don't have much troubleshooting information to provide, however, I hit
> the issue regularly so I could investigate during that.
> 
> How do you debug such an issue? I tried a debugger etc. but besides
> crashing with SIGBUS, I couldn't get any other meaningful information.

You may want to test the patch Christoph sent on the original thread for
this issue.

-Eric


[tip: sched/urgent] sched/rt, powerpc: Use CONFIG_PREEMPTION

2019-12-08 Thread tip-bot2 for Thomas Gleixner
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     fdc5569eaba997852e0bfb57d11af496e4c1fa9a
Gitweb:        https://git.kernel.org/tip/fdc5569eaba997852e0bfb57d11af496e4c1fa9a
Author:        Thomas Gleixner
AuthorDate:    Thu, 24 Oct 2019 18:04:58 +02:00
Committer:     Ingo Molnar
CommitterDate: Sun, 08 Dec 2019 14:37:32 +01:00

sched/rt, powerpc: Use CONFIG_PREEMPTION

CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.

Switch the entry code over to use CONFIG_PREEMPTION.

[bigeasy: +Kconfig]

Signed-off-by: Thomas Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Acked-by: Michael Ellerman 
Cc: Christophe Leroy 
Cc: Linus Torvalds 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20191024160458.vlnf3wlcyjl2i...@linutronix.de
Signed-off-by: Ingo Molnar 
---
 arch/powerpc/Kconfig   | 2 +-
 arch/powerpc/kernel/entry_32.S | 4 ++--
 arch/powerpc/kernel/entry_64.S | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e446bb5..c781170 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -106,7 +106,7 @@ config LOCKDEP_SUPPORT
 config GENERIC_LOCKBREAK
bool
default y
-   depends on SMP && PREEMPT
+   depends on SMP && PREEMPTION
 
 config GENERIC_HWEIGHT
bool
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index d60908e..e1a4c39 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -897,7 +897,7 @@ resume_kernel:
 bne-    0b
 1:
 
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
/* check current_thread_info->preempt_count */
lwz r0,TI_PREEMPT(r2)
cmpwi   0,r0,0  /* if non-zero, just restore regs and return */
@@ -921,7 +921,7 @@ resume_kernel:
 */
bl  trace_hardirqs_on
 #endif
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
 restore_kuap:
kuap_restore r1, r2, r9, r10, r0
 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 3fd3ef3..a9a1d3c 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -846,7 +846,7 @@ resume_kernel:
 bne-    0b
 1:
 
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
/* Check if we need to preempt */
andi.   r0,r4,_TIF_NEED_RESCHED
 beq+    restore
@@ -877,7 +877,7 @@ resume_kernel:
li  r10,MSR_RI
mtmsrd  r10,1 /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
 
.globl  fast_exc_return_irq
 fast_exc_return_irq:


[PATCH] powerpc/irq: don't use current_stack_pointer() in do_IRQ()

2019-12-08 Thread Christophe Leroy
Before commit 7306e83ccf5c ("powerpc: Don't use CURRENT_THREAD_INFO to
find the stack"), the current stack base address was obtained by
calling current_thread_info(). That inline function simply masked
out the low bits of r1.

In that commit, it was changed to use current_stack_pointer(), which
is a heavier function: it is an out-of-line assembly function which
cannot be inlined and which reads the content of the stack at 0(r1).

Revert to just getting r1 and masking out its low bits to obtain the
base address of the stack as before.
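
The masking described above can be tried out in user space with the sketch
below. It is a model only: sketch_stack_base() is a hypothetical helper, the
stack pointer is passed in as a plain integer instead of being read from r1,
and the 16KB stack size used in the note is an assumption, since THREAD_SIZE
depends on the kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a stack pointer down to the base of its stack.
 * thread_size must be a power of two and stacks must be aligned to it,
 * which is what makes the mask equivalent to a round-down.
 */
static uint64_t sketch_stack_base(uint64_t sp, uint64_t thread_size)
{
	return sp & ~(thread_size - 1);
}
```

With a 16KB stack, a stack pointer of 0xc000000001234567 maps to the base
0xc000000001234000, and a pointer that is already aligned is returned
unchanged.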

Signed-off-by: Christophe Leroy 
Fixes: 7306e83ccf5c ("powerpc: Don't use CURRENT_THREAD_INFO to find the stack")
---
 arch/powerpc/kernel/irq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 240eca12c71d..bb34005ff9d2 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -693,10 +693,11 @@ void __do_irq(struct pt_regs *regs)
 void do_IRQ(struct pt_regs *regs)
 {
struct pt_regs *old_regs = set_irq_regs(regs);
+   register unsigned long r1 asm("r1");
void *cursp, *irqsp, *sirqsp;
 
/* Switch to the irq stack to handle this */
-   cursp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
+   cursp = (void *)(r1 & ~(THREAD_SIZE - 1));
irqsp = hardirq_ctx[raw_smp_processor_id()];
sirqsp = softirq_ctx[raw_smp_processor_id()];
 
-- 
2.13.3