Re: [PATCH 0/5] s390/pci: automatic error recovery

2021-09-06 Thread Linas Vepstas
On Mon, Sep 6, 2021 at 4:49 AM Niklas Schnelle 
wrote:

>  I believe we might be the first
> implementation of PCI device recovery in a virtualized setting requiring
> us to
> coordinate the device reset with the hypervisor platform by issuing a
> disable
> and re-enable to the platform as well as starting the recovery following
> a platform event.
>

I recall none of the details, but SRIOV is a standardized system for
sharing a PCI device across multiple virtual machines. It has detailed info
on what the hypervisor must do, and what the local OS instance must do to
accomplish this.  It's part of the PCI standard, and its more than a decade
old now, maybe two. Being a part of the PCI standard, it was interoperable
with error recovery, to the best of my recollection. At the time it was
introduced, it got pushed very aggressively.  The x86 hypervisor vendors
were aiming at the heart of zseries, and were militant about it.

-- Linas

-- 
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.


Re: [PATCH 0/5] s390/pci: automatic error recovery

2021-09-06 Thread Oliver O'Halloran
On Mon, Sep 6, 2021 at 7:49 PM Niklas Schnelle  wrote:
>
> Patch 3 I already sent separately resulting in the discussion below but 
> without
> a final conclusion.
>
> https://lore.kernel.org/lkml/20210720150145.640727-1-schne...@linux.ibm.com/
>
> I believe even though there were some doubts about the use of
> pci_dev_is_added() by arch code the existing uses as well as the use in the
> final patch of this series warrant this export.

The use of pci_dev_is_added() in arch/powerpc was because in the past
pci_bus_add_device() could be called before pci_device_add(). That was
fixed a while ago so It should be safe to remove those calls now.

> Patch 4 "PCI: Export pci_dev_lock()" is basically an extension to commit
> e3a9b1212b9d ("PCI: Export pci_dev_trylock() and pci_dev_unlock()") which
> already exported pci_dev_trylock(). In the final patch we make use of
> pci_dev_lock() to wait for any other exclusive uses of the pdev to be finished
> before starting recovery.

Hmm, I noticed the EEH
(arch/powerpc/kernel/eeh_driver.c:eeh_pe_report_edev())  and the
generic PCIe error recovery code (see
drivers/pci/pcie/err.c:report_error_detected()) only call
device_lock() before entering the driver's error handling callbacks. I
wonder if they should be using pci_dev_lock() instead. The only real
difference is that pci_dev_lock() will also block user space from
accessing the device's config space while error recovery is in
progress which seems sensible enough.


Re: [PATCH 3/5] PCI: Move pci_dev_is/assign_added() to pci.h

2021-09-06 Thread kernel test robot
Hi Niklas,

I love your patch! Yet something to improve:

[auto build test ERROR on s390/features]
[also build test ERROR on next-20210906]
[cannot apply to pci/next powerpc/next v5.14]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Niklas-Schnelle/s390-pci-automatic-error-recovery/20210906-175309
base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
config: i386-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
# 
https://github.com/0day-ci/linux/commit/404ed8c00a612e7ae31c50557c80c6726c464863
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Niklas-Schnelle/s390-pci-automatic-error-recovery/20210906-175309
git checkout 404ed8c00a612e7ae31c50557c80c6726c464863
# save the attached .config to linux build tree
make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   drivers/pci/hotplug/acpiphp_glue.c: In function 'acpiphp_add_context':
>> drivers/pci/hotplug/acpiphp_glue.c:330:6: error: implicit declaration of 
>> function 'pci_bus_read_dev_vendor_id' [-Werror=implicit-function-declaration]
 330 |  if (pci_bus_read_dev_vendor_id(pbus, PCI_DEVFN(device, function),
 |  ^~
   drivers/pci/hotplug/acpiphp_glue.c: In function 'enable_slot':
>> drivers/pci/hotplug/acpiphp_glue.c:505:6: error: implicit declaration of 
>> function '__pci_bus_size_bridges'; did you mean 'pci_bus_size_bridges'? 
>> [-Werror=implicit-function-declaration]
 505 |  __pci_bus_size_bridges(dev->subordinate,
 |  ^~
 |  pci_bus_size_bridges
>> drivers/pci/hotplug/acpiphp_glue.c:510:3: error: implicit declaration of 
>> function '__pci_bus_assign_resources'; did you mean 
>> 'pci_bus_assign_resources'? [-Werror=implicit-function-declaration]
 510 |   __pci_bus_assign_resources(bus, _list, NULL);
 |   ^~
 |   pci_bus_assign_resources
   drivers/pci/hotplug/acpiphp_glue.c: In function 'trim_stale_devices':
>> drivers/pci/hotplug/acpiphp_glue.c:660:3: error: implicit declaration of 
>> function 'pci_dev_set_disconnected' [-Werror=implicit-function-declaration]
 660 |   pci_dev_set_disconnected(dev, NULL);
 |   ^~~~
>> drivers/pci/hotplug/acpiphp_glue.c:661:7: error: implicit declaration of 
>> function 'pci_has_subordinate' [-Werror=implicit-function-declaration]
 661 |   if (pci_has_subordinate(dev))
 |   ^~~
   cc1: some warnings being treated as errors


vim +/pci_bus_read_dev_vendor_id +330 drivers/pci/hotplug/acpiphp_glue.c

4e8662bbd680c5 Kristen Accardi   2006-06-28  217  
3799c5a032aefb Rafael J. Wysocki 2014-02-16  218  /**
3799c5a032aefb Rafael J. Wysocki 2014-02-16  219   * acpiphp_add_context - Add 
ACPIPHP context to an ACPI device object.
3799c5a032aefb Rafael J. Wysocki 2014-02-16  220   * @handle: ACPI handle of 
the object to add a context to.
3799c5a032aefb Rafael J. Wysocki 2014-02-16  221   * @lvl: Not used.
3799c5a032aefb Rafael J. Wysocki 2014-02-16  222   * @data: The object's parent 
ACPIPHP bridge.
3799c5a032aefb Rafael J. Wysocki 2014-02-16  223   * @rv: Not used.
3799c5a032aefb Rafael J. Wysocki 2014-02-16  224   */
3799c5a032aefb Rafael J. Wysocki 2014-02-16  225  static acpi_status 
acpiphp_add_context(acpi_handle handle, u32 lvl, void *data,
cb7b8cedf6c88b Rafael J. Wysocki 2013-07-13  226
   void **rv)
^1da177e4c3f41 Linus Torvalds2005-04-16  227  {
cb7b8cedf6c88b Rafael J. Wysocki 2013-07-13  228struct acpiphp_bridge 
*bridge = data;
cb7b8cedf6c88b Rafael J. Wysocki 2013-07-13  229struct acpiphp_context 
*context;
bbcbfc0eed6220 Rafael J. Wysocki 2014-02-04  230struct acpi_device 
*adev;
^1da177e4c3f41 Linus Torvalds2005-04-16  231struct acpiphp_slot 
*slot;
^1da177e4c3f41 Linus Torvalds2005-04-16  232struct acpiphp_func 
*newfunc;
^1da177e4c3f41 Linus Torvalds2005-04-16  233acpi_status status = 
AE_OK;
bbd34fcdd1b201 Rafael J. Wysocki 2013-07-13  234unsigned long long adr;
bbd34fcdd1b201 Rafael J. Wysocki 2013-07-13  235int device, function;
e8c331e963c58b Kenji Kaneshige   2008-12-17  236struct pci_bus *pbus = 
bridge->pci_bus;
bbd34fcdd1b201 Rafael J. Wysocki 2013-07-13  237struct pci_dev *pdev = 
bridge->pci_dev;
3b63aaa70e1ccc Jiang Liu 2013-04-12  238u32 val;
^1da177e4c3f41 Linus Torvalds2005-04-16  239  
dfb117b

Re: [PATCH 3/5] PCI: Move pci_dev_is/assign_added() to pci.h

2021-09-06 Thread kernel test robot
Hi Niklas,

I love your patch! Yet something to improve:

[auto build test ERROR on s390/features]
[also build test ERROR on next-20210906]
[cannot apply to pci/next powerpc/next v5.14]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Niklas-Schnelle/s390-pci-automatic-error-recovery/20210906-175309
base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
config: i386-randconfig-a016-20210906 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 
6fe2beba7d2a41964af658c8c59dd172683ef739)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/404ed8c00a612e7ae31c50557c80c6726c464863
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Niklas-Schnelle/s390-pci-automatic-error-recovery/20210906-175309
git checkout 404ed8c00a612e7ae31c50557c80c6726c464863
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> drivers/pci/hotplug/acpiphp_glue.c:330:6: error: implicit declaration of 
>> function 'pci_bus_read_dev_vendor_id' 
>> [-Werror,-Wimplicit-function-declaration]
   if (pci_bus_read_dev_vendor_id(pbus, PCI_DEVFN(device, function),
   ^
>> drivers/pci/hotplug/acpiphp_glue.c:505:6: error: implicit declaration of 
>> function '__pci_bus_size_bridges' [-Werror,-Wimplicit-function-declaration]
   
__pci_bus_size_bridges(dev->subordinate,
   ^
   drivers/pci/hotplug/acpiphp_glue.c:505:6: note: did you mean 
'pci_bus_size_bridges'?
   include/linux/pci.h:1336:6: note: 'pci_bus_size_bridges' declared here
   void pci_bus_size_bridges(struct pci_bus *bus);
^
>> drivers/pci/hotplug/acpiphp_glue.c:510:3: error: implicit declaration of 
>> function '__pci_bus_assign_resources' 
>> [-Werror,-Wimplicit-function-declaration]
   __pci_bus_assign_resources(bus, _list, NULL);
   ^
   drivers/pci/hotplug/acpiphp_glue.c:510:3: note: did you mean 
'pci_bus_assign_resources'?
   include/linux/pci.h:1334:6: note: 'pci_bus_assign_resources' declared here
   void pci_bus_assign_resources(const struct pci_bus *bus);
^
   drivers/pci/hotplug/acpiphp_glue.c:604:8: error: implicit declaration of 
function 'pci_bus_read_dev_vendor_id' [-Werror,-Wimplicit-function-declaration]
   if (pci_bus_read_dev_vendor_id(slot->bus,
   ^
   drivers/pci/hotplug/acpiphp_glue.c:619:7: error: implicit declaration of 
function 'pci_bus_read_dev_vendor_id' [-Werror,-Wimplicit-function-declaration]
   if (pci_bus_read_dev_vendor_id(slot->bus,
   ^
>> drivers/pci/hotplug/acpiphp_glue.c:660:3: error: implicit declaration of 
>> function 'pci_dev_set_disconnected' [-Werror,-Wimplicit-function-declaration]
   pci_dev_set_disconnected(dev, NULL);
   ^
>> drivers/pci/hotplug/acpiphp_glue.c:661:7: error: implicit declaration of 
>> function 'pci_has_subordinate' [-Werror,-Wimplicit-function-declaration]
   if (pci_has_subordinate(dev))
   ^
   7 errors generated.


vim +/pci_bus_read_dev_vendor_id +330 drivers/pci/hotplug/acpiphp_glue.c

4e8662bbd680c5 Kristen Accardi   2006-06-28  217  
3799c5a032aefb Rafael J. Wysocki 2014-02-16  218  /**
3799c5a032aefb Rafael J. Wysocki 2014-02-16  219   * acpiphp_add_context - Add 
ACPIPHP context to an ACPI device object.
3799c5a032aefb Rafael J. Wysocki 2014-02-16  220   * @handle: ACPI handle of 
the object to add a context to.
3799c5a032aefb Rafael J. Wysocki 2014-02-16  221   * @lvl: Not used.
3799c5a032aefb Rafael J. Wysocki 2014-02-16  222   * @data: The object's parent 
ACPIPHP bridge.
3799c5a032aefb Rafael J. Wysocki 2014-02-16  223   * @rv: Not used.
3799c5a032aefb Rafael J. Wysocki 2014-02-16  224   */
3799c5a032aefb Rafael J. Wysocki 2014-02-16  225  static acpi_status 
acpiphp_add_context(acpi_handle handle, u32 lvl, void *data,
cb7b8cedf6c88b Rafael J. Wysocki 2013-07-13  226
   void **rv)
^1da177e4c3f41 Linus Torvalds2005-04-16  227  {
cb7b8cedf6c88b Rafael J. Wysocki 2013-07-13  228struct acpiphp_bridge 
*bridge = data;
cb7b8cedf6c88b Rafael J. Wysocki 2013-07-13  229struct ac

Re: [PATCH v2] ftrace: Cleanup ftrace_dyn_arch_init()

2021-09-06 Thread Helge Deller

On 9/6/21 1:16 PM, Weizhao Ouyang wrote:

Most of ARCHs use empty ftrace_dyn_arch_init(), introduce a weak common
ftrace_dyn_arch_init() to cleanup them.

Signed-off-by: Weizhao Ouyang 
Acked-by: Heiko Carstens  (s390)


Acked-by: Helge Deller  # parisc

Thanks,
Helge



---

Changes in v2:
-- correct CONFIG_DYNAMIC_FTRACE on PowerPC
-- add Acked-by tag

---
  arch/arm/kernel/ftrace.c  | 5 -
  arch/arm64/kernel/ftrace.c| 5 -
  arch/csky/kernel/ftrace.c | 5 -
  arch/ia64/kernel/ftrace.c | 6 --
  arch/microblaze/kernel/ftrace.c   | 5 -
  arch/mips/include/asm/ftrace.h| 2 ++
  arch/nds32/kernel/ftrace.c| 5 -
  arch/parisc/kernel/ftrace.c   | 5 -
  arch/powerpc/include/asm/ftrace.h | 4 
  arch/riscv/kernel/ftrace.c| 5 -
  arch/s390/kernel/ftrace.c | 5 -
  arch/sh/kernel/ftrace.c   | 5 -
  arch/sparc/kernel/ftrace.c| 5 -
  arch/x86/kernel/ftrace.c  | 5 -
  include/linux/ftrace.h| 1 -
  kernel/trace/ftrace.c | 5 +
  16 files changed, 11 insertions(+), 62 deletions(-)

diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
index 3c83b5d29697..a006585e1c09 100644
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -193,11 +193,6 @@ int ftrace_make_nop(struct module *mod,

return ret;
  }
-
-int __init ftrace_dyn_arch_init(void)
-{
-   return 0;
-}
  #endif /* CONFIG_DYNAMIC_FTRACE */

  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index 7f467bd9db7a..fc62dfe73f93 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -236,11 +236,6 @@ void arch_ftrace_update_code(int command)
command |= FTRACE_MAY_SLEEP;
ftrace_modify_all_code(command);
  }
-
-int __init ftrace_dyn_arch_init(void)
-{
-   return 0;
-}
  #endif /* CONFIG_DYNAMIC_FTRACE */

  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
diff --git a/arch/csky/kernel/ftrace.c b/arch/csky/kernel/ftrace.c
index b4a7ec1517ff..50bfcf129078 100644
--- a/arch/csky/kernel/ftrace.c
+++ b/arch/csky/kernel/ftrace.c
@@ -133,11 +133,6 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
(unsigned long)func, true, true);
return ret;
  }
-
-int __init ftrace_dyn_arch_init(void)
-{
-   return 0;
-}
  #endif /* CONFIG_DYNAMIC_FTRACE */

  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
diff --git a/arch/ia64/kernel/ftrace.c b/arch/ia64/kernel/ftrace.c
index b2ab2d58fb30..d6360fd404ab 100644
--- a/arch/ia64/kernel/ftrace.c
+++ b/arch/ia64/kernel/ftrace.c
@@ -194,9 +194,3 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
flush_icache_range(addr, addr + 16);
return 0;
  }
-
-/* run from kstop_machine */
-int __init ftrace_dyn_arch_init(void)
-{
-   return 0;
-}
diff --git a/arch/microblaze/kernel/ftrace.c b/arch/microblaze/kernel/ftrace.c
index 224eea40e1ee..188749d62709 100644
--- a/arch/microblaze/kernel/ftrace.c
+++ b/arch/microblaze/kernel/ftrace.c
@@ -163,11 +163,6 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long 
addr)
return ret;
  }

-int __init ftrace_dyn_arch_init(void)
-{
-   return 0;
-}
-
  int ftrace_update_ftrace_func(ftrace_func_t func)
  {
unsigned long ip = (unsigned long)(_call);
diff --git a/arch/mips/include/asm/ftrace.h b/arch/mips/include/asm/ftrace.h
index b463f2aa5a61..ed013e767390 100644
--- a/arch/mips/include/asm/ftrace.h
+++ b/arch/mips/include/asm/ftrace.h
@@ -76,6 +76,8 @@ do {  \


  #ifdef CONFIG_DYNAMIC_FTRACE
+int __init ftrace_dyn_arch_init(void);
+
  static inline unsigned long ftrace_call_adjust(unsigned long addr)
  {
return addr;
diff --git a/arch/nds32/kernel/ftrace.c b/arch/nds32/kernel/ftrace.c
index 0e23e3a8df6b..f0ef4842d191 100644
--- a/arch/nds32/kernel/ftrace.c
+++ b/arch/nds32/kernel/ftrace.c
@@ -84,11 +84,6 @@ void _ftrace_caller(unsigned long parent_ip)
/* restore all state needed by the compiler epilogue */
  }

-int __init ftrace_dyn_arch_init(void)
-{
-   return 0;
-}
-
  static unsigned long gen_sethi_insn(unsigned long addr)
  {
unsigned long opcode = 0x4600;
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 0a1e75af5382..01581f715737 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -94,11 +94,6 @@ int ftrace_disable_ftrace_graph_caller(void)
  #endif

  #ifdef CONFIG_DYNAMIC_FTRACE
-
-int __init ftrace_dyn_arch_init(void)
-{
-   return 0;
-}
  int ftrace_update_ftrace_func(ftrace_func_t func)
  {
return 0;
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index debe8c4f7062..d59f67c0225f 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -61,6 +61,10 @@ struct dyn_arch_ftrace {
  };
  #endif /* __ASSEMBLY__ */

+#ifdef CONFIG_DYNAMIC_FTRACE

Re: [PATCH v2] ftrace: Cleanup ftrace_dyn_arch_init()

2021-09-06 Thread Weizhao Ouyang


On 2021/9/6 22:22, Michael Ellerman wrote:
> Weizhao Ouyang  writes:
>> Most of ARCHs use empty ftrace_dyn_arch_init(), introduce a weak common
>> ftrace_dyn_arch_init() to cleanup them.
>>
>> Signed-off-by: Weizhao Ouyang 
>> Acked-by: Heiko Carstens  (s390)
>>
>> ---
>>
>> Changes in v2:
>> -- correct CONFIG_DYNAMIC_FTRACE on PowerPC
>> -- add Acked-by tag
>> diff --git a/arch/powerpc/include/asm/ftrace.h 
>> b/arch/powerpc/include/asm/ftrace.h
>> index debe8c4f7062..d59f67c0225f 100644
>> --- a/arch/powerpc/include/asm/ftrace.h
>> +++ b/arch/powerpc/include/asm/ftrace.h
>> @@ -61,6 +61,10 @@ struct dyn_arch_ftrace {
>>  };
>>  #endif /* __ASSEMBLY__ */
>>  
>> +#ifdef CONFIG_DYNAMIC_FTRACE
>> +int __init ftrace_dyn_arch_init(void);
>> +#endif /* CONFIG_DYNAMIC_FTRACE */
>> +
>>  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>>  #define ARCH_SUPPORTS_FTRACE_OPS 1
>>  #endif
> That breaks the build for powerpc:
>
>   /linux/arch/powerpc/include/asm/ftrace.h: Assembler messages:
>   /linux/arch/powerpc/include/asm/ftrace.h:65: Error: unrecognized opcode: 
> `int'
>   make[4]: *** [/linux/scripts/Makefile.build:352: 
> arch/powerpc/kernel/trace/ftrace_64.o] Error 1
>   make[3]: *** [/linux/scripts/Makefile.build:514: arch/powerpc/kernel/trace] 
> Error 2
>   make[2]: *** [/linux/scripts/Makefile.build:514: arch/powerpc/kernel] Error 
> 2
>   make[1]: *** [/linux/Makefile:1861: arch/powerpc] Error 2
>   make[1]: *** Waiting for unfinished jobs
>
> It needs to be inside an #ifndef __ASSEMBLY__ section.
>
> cheers

Thanks for reply, I'll fix it up.


Re: [PATCH v2] ftrace: Cleanup ftrace_dyn_arch_init()

2021-09-06 Thread Michael Ellerman
Weizhao Ouyang  writes:
> Most of ARCHs use empty ftrace_dyn_arch_init(), introduce a weak common
> ftrace_dyn_arch_init() to cleanup them.
>
> Signed-off-by: Weizhao Ouyang 
> Acked-by: Heiko Carstens  (s390)
>
> ---
>
> Changes in v2:
> -- correct CONFIG_DYNAMIC_FTRACE on PowerPC
> -- add Acked-by tag

> diff --git a/arch/powerpc/include/asm/ftrace.h 
> b/arch/powerpc/include/asm/ftrace.h
> index debe8c4f7062..d59f67c0225f 100644
> --- a/arch/powerpc/include/asm/ftrace.h
> +++ b/arch/powerpc/include/asm/ftrace.h
> @@ -61,6 +61,10 @@ struct dyn_arch_ftrace {
>  };
>  #endif /* __ASSEMBLY__ */
>  
> +#ifdef CONFIG_DYNAMIC_FTRACE
> +int __init ftrace_dyn_arch_init(void);
> +#endif /* CONFIG_DYNAMIC_FTRACE */
> +
>  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>  #define ARCH_SUPPORTS_FTRACE_OPS 1
>  #endif

That breaks the build for powerpc:

  /linux/arch/powerpc/include/asm/ftrace.h: Assembler messages:
  /linux/arch/powerpc/include/asm/ftrace.h:65: Error: unrecognized opcode: `int'
  make[4]: *** [/linux/scripts/Makefile.build:352: 
arch/powerpc/kernel/trace/ftrace_64.o] Error 1
  make[3]: *** [/linux/scripts/Makefile.build:514: arch/powerpc/kernel/trace] 
Error 2
  make[2]: *** [/linux/scripts/Makefile.build:514: arch/powerpc/kernel] Error 2
  make[1]: *** [/linux/Makefile:1861: arch/powerpc] Error 2
  make[1]: *** Waiting for unfinished jobs

It needs to be inside an #ifndef __ASSEMBLY__ section.

cheers


Re: [PATCH] powerpc/mce: Fix access error in mce handler

2021-09-06 Thread Michael Ellerman
Ganesh Goudar  writes:
> We queue an irq work for deferred processing of mce event
> in realmode mce handler, where translation is disabled.
> Queuing of the work may result in accessing memory outside
> RMO region, such access needs the translation to be enabled
> for an LPAR running with hash mmu else the kernel crashes.
>
> So enable the translation before queuing the work.
>
> Without this change following trace is seen on injecting machine
> check error in an LPAR running with hash mmu.

What type of error are you injecting?

> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> CPU: 5 PID: 1883 Comm: insmod Tainted: GOE 5.14.0-mce+ #137
> NIP:  c0735d60 LR: c0318640 CTR: 
> REGS: c0001ebff9a0 TRAP: 0300   Tainted: G   OE  (5.14.0-mce+)
> MSR:  80001003   CR: 28008228  XER: 0001
> CFAR: c031863c DAR: c0027fa8fe08 DSISR: 4000 IRQMASK: 0
> GPR00: c03186d0 c0001ebffc40 c1b0df00 c16337e8
> GPR04: c16337e8 c0027fa8fe08 0023 c16337f0
> GPR08: 0023 c12ffe08  c00801460240
> GPR12:  c0001ec9a900 c0002ac4bd00 
> GPR16: 05a0 c008006b c008006b05a0 c0ff3068
> GPR20: c0002ac4bbc0 0001 c0002ac4bbc0 c00801490298
> GPR24: c00801490108 c1636198 c00801470090 c00801470058
> GPR28: 0510 c0080100 c0080819 0019
> NIP [c0735d60] llist_add_batch+0x0/0x40
> LR [c0318640] __irq_work_queue_local+0x70/0xc0
> Call Trace:
> [c0001ebffc40] [c0001ebffc0c] 0xc0001ebffc0c (unreliable)
> [c0001ebffc60] [c03186d0] irq_work_queue+0x40/0x70
> [c0001ebffc80] [c004425c] machine_check_queue_event+0xbc/0xd0
> [c0001ebffcf0] [c000838c] machine_check_early_common+0x16c/0x1f4
>
> Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from 
> handler")

Please explain in more detail why that commit caused this breakage.

> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index 47a683cd00d2..9d1e39d42e3e 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -249,6 +249,7 @@ void machine_check_queue_event(void)
>  {
>   int index;
>   struct machine_check_event evt;
> + unsigned long msr;
>  
>   if (!get_mce_event(, MCE_EVENT_RELEASE))
>   return;
> @@ -262,8 +263,19 @@ void machine_check_queue_event(void)
>   memcpy(_paca->mce_info->mce_event_queue[index],
>  , sizeof(evt));
>  
> - /* Queue irq work to process this event later. */
> - irq_work_queue(_event_process_work);
> + /* Queue irq work to process this event later. Before
> +  * queuing the work enable translation for non radix LPAR,
> +  * as irq_work_queue may try to access memory outside RMO
> +  * region.
> +  */
> + if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) {
> + msr = mfmsr();
> + mtmsr(msr | MSR_IR | MSR_DR);
> + irq_work_queue(_event_process_work);
> + mtmsr(msr);
> + } else {
> + irq_work_queue(_event_process_work);
> + }
>  }

We already went to virtual mode and queued (different) irq work in
arch/powerpc/platforms/pseries/ras.c:mce_handle_error()

We also called save_mce_event() which also might have queued irq work,
via machine_check_ue_event().

So it really feels like something about the design is wrong if we have
to go to virtual mode again and queue more irq work here.

I guess we can probably merge this as a backportable fix, doing anything
else would be a bigger change.

Looking at ras.c there's the comment:

 * Enable translation as we will be accessing per-cpu variables
 * in save_mce_event() which may fall outside RMO region, also

But AFAICS it's only irq_work_queue() that touches anything percpu?

So maybe we should just not be using irq_work_queue(). It's a pretty
thin wrapper around set_dec(1), perhaps we just need to hand-roll some
real-mode friendly way of doing that.

cheers


[PATCH v2] ftrace: Cleanup ftrace_dyn_arch_init()

2021-09-06 Thread Weizhao Ouyang
Most of ARCHs use empty ftrace_dyn_arch_init(), introduce a weak common
ftrace_dyn_arch_init() to cleanup them.

Signed-off-by: Weizhao Ouyang 
Acked-by: Heiko Carstens  (s390)

---

Changes in v2:
-- correct CONFIG_DYNAMIC_FTRACE on PowerPC
-- add Acked-by tag

---
 arch/arm/kernel/ftrace.c  | 5 -
 arch/arm64/kernel/ftrace.c| 5 -
 arch/csky/kernel/ftrace.c | 5 -
 arch/ia64/kernel/ftrace.c | 6 --
 arch/microblaze/kernel/ftrace.c   | 5 -
 arch/mips/include/asm/ftrace.h| 2 ++
 arch/nds32/kernel/ftrace.c| 5 -
 arch/parisc/kernel/ftrace.c   | 5 -
 arch/powerpc/include/asm/ftrace.h | 4 
 arch/riscv/kernel/ftrace.c| 5 -
 arch/s390/kernel/ftrace.c | 5 -
 arch/sh/kernel/ftrace.c   | 5 -
 arch/sparc/kernel/ftrace.c| 5 -
 arch/x86/kernel/ftrace.c  | 5 -
 include/linux/ftrace.h| 1 -
 kernel/trace/ftrace.c | 5 +
 16 files changed, 11 insertions(+), 62 deletions(-)

diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
index 3c83b5d29697..a006585e1c09 100644
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -193,11 +193,6 @@ int ftrace_make_nop(struct module *mod,
 
return ret;
 }
-
-int __init ftrace_dyn_arch_init(void)
-{
-   return 0;
-}
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index 7f467bd9db7a..fc62dfe73f93 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -236,11 +236,6 @@ void arch_ftrace_update_code(int command)
command |= FTRACE_MAY_SLEEP;
ftrace_modify_all_code(command);
 }
-
-int __init ftrace_dyn_arch_init(void)
-{
-   return 0;
-}
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
diff --git a/arch/csky/kernel/ftrace.c b/arch/csky/kernel/ftrace.c
index b4a7ec1517ff..50bfcf129078 100644
--- a/arch/csky/kernel/ftrace.c
+++ b/arch/csky/kernel/ftrace.c
@@ -133,11 +133,6 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
(unsigned long)func, true, true);
return ret;
 }
-
-int __init ftrace_dyn_arch_init(void)
-{
-   return 0;
-}
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
diff --git a/arch/ia64/kernel/ftrace.c b/arch/ia64/kernel/ftrace.c
index b2ab2d58fb30..d6360fd404ab 100644
--- a/arch/ia64/kernel/ftrace.c
+++ b/arch/ia64/kernel/ftrace.c
@@ -194,9 +194,3 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
flush_icache_range(addr, addr + 16);
return 0;
 }
-
-/* run from kstop_machine */
-int __init ftrace_dyn_arch_init(void)
-{
-   return 0;
-}
diff --git a/arch/microblaze/kernel/ftrace.c b/arch/microblaze/kernel/ftrace.c
index 224eea40e1ee..188749d62709 100644
--- a/arch/microblaze/kernel/ftrace.c
+++ b/arch/microblaze/kernel/ftrace.c
@@ -163,11 +163,6 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long 
addr)
return ret;
 }
 
-int __init ftrace_dyn_arch_init(void)
-{
-   return 0;
-}
-
 int ftrace_update_ftrace_func(ftrace_func_t func)
 {
unsigned long ip = (unsigned long)(_call);
diff --git a/arch/mips/include/asm/ftrace.h b/arch/mips/include/asm/ftrace.h
index b463f2aa5a61..ed013e767390 100644
--- a/arch/mips/include/asm/ftrace.h
+++ b/arch/mips/include/asm/ftrace.h
@@ -76,6 +76,8 @@ do {  \
 
 
 #ifdef CONFIG_DYNAMIC_FTRACE
+int __init ftrace_dyn_arch_init(void);
+
 static inline unsigned long ftrace_call_adjust(unsigned long addr)
 {
return addr;
diff --git a/arch/nds32/kernel/ftrace.c b/arch/nds32/kernel/ftrace.c
index 0e23e3a8df6b..f0ef4842d191 100644
--- a/arch/nds32/kernel/ftrace.c
+++ b/arch/nds32/kernel/ftrace.c
@@ -84,11 +84,6 @@ void _ftrace_caller(unsigned long parent_ip)
/* restore all state needed by the compiler epilogue */
 }
 
-int __init ftrace_dyn_arch_init(void)
-{
-   return 0;
-}
-
 static unsigned long gen_sethi_insn(unsigned long addr)
 {
unsigned long opcode = 0x4600;
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 0a1e75af5382..01581f715737 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -94,11 +94,6 @@ int ftrace_disable_ftrace_graph_caller(void)
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE
-
-int __init ftrace_dyn_arch_init(void)
-{
-   return 0;
-}
 int ftrace_update_ftrace_func(ftrace_func_t func)
 {
return 0;
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index debe8c4f7062..d59f67c0225f 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -61,6 +61,10 @@ struct dyn_arch_ftrace {
 };
 #endif /* __ASSEMBLY__ */
 
+#ifdef CONFIG_DYNAMIC_FTRACE
+int __init ftrace_dyn_arch_init(void);
+#endif /* CONFIG_DYNAMIC_FTRACE */
+
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 #define 

Re: [PATCH 1/5] KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest

2021-09-06 Thread Paolo Bonzini

On 20/08/21 20:51, Mathieu Desnoyers wrote:

Ah, or is it the case that rseq_cs is non-NULL if and only if userspace is in an
rseq critical section, and because syscalls in critical sections are illegal, by
definition clearing rseq_cs is a nop unless userspace is misbehaving.

Not quite, as I described above. But we want it to stay set so the 
CONFIG_DEBUG_RSEQ
code executed when returning from ioctl to userspace will be able to validate 
that
it is not nested within a rseq critical section.


If that's true, what about explicitly checking that at NOTIFY_RESUME?  Or is it
not worth the extra code to detect an error that will likely be caught anyways?

The error will indeed already be caught on return from ioctl to userspace, so I
don't see any added value in duplicating this check.


Sean, can you send a v2 (even for this patch only would be okay)?

Thanks,

Paolo



[PATCH 4/5] PCI: Export pci_dev_lock()

2021-09-06 Thread Niklas Schnelle
Commit e3a9b1212b9d ("PCI: Export pci_dev_trylock() and pci_dev_unlock()")
already exported pci_dev_trylock()/pci_dev_unlock() however in some
circumstances such as during error recovery it makes sense to block
waiting to get full access to the device so also export pci_dev_lock().

Signed-off-by: Niklas Schnelle 
---
 drivers/pci/pci.c   | 3 ++-
 include/linux/pci.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index aacf575c15cf..3f416c6d3b0b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5040,12 +5040,13 @@ static int pci_reset_bus_function(struct pci_dev *dev, 
int probe)
return pci_parent_bus_reset(dev, probe);
 }
 
-static void pci_dev_lock(struct pci_dev *dev)
+void pci_dev_lock(struct pci_dev *dev)
 {
pci_cfg_access_lock(dev);
/* block PM suspend, driver probe, etc. */
device_lock(>dev);
 }
+EXPORT_SYMBOL_GPL(pci_dev_lock);
 
 /* Return 1 on successful lock, 0 on contention */
 int pci_dev_trylock(struct pci_dev *dev)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index ea0e23dbc8ec..efc78b8d4696 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1639,6 +1639,7 @@ void pci_cfg_access_lock(struct pci_dev *dev);
 bool pci_cfg_access_trylock(struct pci_dev *dev);
 void pci_cfg_access_unlock(struct pci_dev *dev);
 
+void pci_dev_lock(struct pci_dev *dev);
 int pci_dev_trylock(struct pci_dev *dev);
 void pci_dev_unlock(struct pci_dev *dev);
 
-- 
2.25.1



[PATCH 5/5] s390/pci: implement minimal PCI error recovery

2021-09-06 Thread Niklas Schnelle
When the platform detects an error on a PCI function or a service action
has been performed it is put in the error state and an error event
notification is provided to the OS.

Currently we treat all error event notifications the same and simply set
pdev->error_state = pci_channel_io_perm_failure requiring user
intervention such as use of the recover attribute to get the device
usable again. Despite requiring a manual step this also has the
disadvantage that the device is completely torn down and recreated
resulting in higher level devices such as a block or network device
being recreated. In case of a block device this also means that it may
need to be removed and added to a software raid even if that could
otherwise survive with a temporary degradation.

This is of course not ideal more so since an error notification with PEC
0x3A indicates that the platform already performed error recovery
successfully or that the error state was caused by a service action that
is now finished.

At least in this case we can assume that the error state can be reset
and the function made usable again. So as not to have the disadvantage
of a full tear down and recreation we need to coordinate this recovery
with the driver. Thankfully there is already a well defined recovery
flow for this described in Documentation/PCI/pci-error-recovery.rst.

The implementation of this is somewhat straight forward and simplified
by the fact that our recovery flow is defined per PCI function. As
a reset we use the newly introduced zpci_hot_reset_device() which also
takes the PCI function out of the error state.

Signed-off-by: Niklas Schnelle 
---
 arch/s390/include/asm/pci.h |   4 +-
 arch/s390/pci/pci.c |  49 ++
 arch/s390/pci/pci_event.c   | 190 +++-
 3 files changed, 241 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 2a2ed165a270..558877aff2e5 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -294,8 +294,10 @@ void zpci_debug_exit(void);
 void zpci_debug_init_device(struct zpci_dev *, const char *);
 void zpci_debug_exit_device(struct zpci_dev *);
 
-/* Error reporting */
+/* Error handling */
 int zpci_report_error(struct pci_dev *, struct zpci_report_error_header *);
+int zpci_clear_error_state(struct zpci_dev *zdev);
+int zpci_reset_load_store_blocked(struct zpci_dev *zdev);
 
 #ifdef CONFIG_NUMA
 
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index a6322f45b5bd..77a3e85d43fb 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -954,6 +954,55 @@ int zpci_report_error(struct pci_dev *pdev,
 }
 EXPORT_SYMBOL(zpci_report_error);
 
+/**
+ * zpci_clear_error_state() - Clears the zPCI error state of the device
+ * @zdev: The zdev for which the zPCI error state should be reset
+ *
+ * Clear the zPCI error state of the device. If clearing the zPCI error state
+ * fails the device is left in the error state. In this case it may make sense
+ * to call zpci_io_perm_failure() on the associated pdev if it exists.
+ *
+ * Returns: 0 on success, -EIO otherwise
+ */
+int zpci_clear_error_state(struct zpci_dev *zdev)
+{
+   u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_RESET_ERROR);
+   struct zpci_fib fib = {0};
+   u8 status;
+   int rc;
+
+   rc = zpci_mod_fc(req, , );
+   if (rc)
+   return -EIO;
+
+   return 0;
+}
+
+/**
+ * zpci_reset_load_store_blocked() - Re-enables L/S from error state
+ * @zdev: The zdev for which to unblock load/store access
+ *
+ * R-eenables load/store access for a PCI function in the error state while
+ * keeping DMA blocked. In this state drivers can poke MMIO space to determine
+ * if error recovery is possible while catching any rogue DMA access from the
+ * device.
+ *
+ * Returns: 0 on success, -EIO otherwise
+ */
+int zpci_reset_load_store_blocked(struct zpci_dev *zdev)
+{
+   u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_RESET_BLOCK);
+   struct zpci_fib fib = {0};
+   u8 status;
+   int rc;
+
+   rc = zpci_mod_fc(req, , );
+   if (rc)
+   return -EIO;
+
+   return 0;
+}
+
 static int zpci_mem_init(void)
 {
BUILD_BUG_ON(!is_power_of_2(__alignof__(struct zpci_fmb)) ||
diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
index e868d996ec5b..ac9ed1572d39 100644
--- a/arch/s390/pci/pci_event.c
+++ b/arch/s390/pci/pci_event.c
@@ -47,15 +47,190 @@ struct zpci_ccdf_avail {
u16 pec;/* PCI event code */
 } __packed;
 
+static inline bool ers_result_indicates_abort(pci_ers_result_t ers_res)
+{
+   switch (ers_res) {
+   case PCI_ERS_RESULT_CAN_RECOVER:
+   case PCI_ERS_RESULT_RECOVERED:
+   case PCI_ERS_RESULT_NEED_RESET:
+   return false;
+   default:
+   return true;
+   }
+}
+
+static pci_ers_result_t zpci_event_notify_error_detected(struct pci_dev *pdev)
+{
+   

[PATCH 3/5] PCI: Move pci_dev_is/assign_added() to pci.h

2021-09-06 Thread Niklas Schnelle
The helper function pci_dev_is_added() from drivers/pci/pci.h is used in
PCI arch code of both s390 and powerpc leading to awkward relative
includes. Move it to the global include/linux/pci.h and get rid of these
includes just for that one function.

Signed-off-by: Niklas Schnelle 
---
 arch/powerpc/platforms/powernv/pci-sriov.c |  3 ---
 arch/powerpc/platforms/pseries/setup.c |  1 -
 arch/s390/pci/pci_sysfs.c  |  2 --
 drivers/pci/hotplug/acpiphp_glue.c |  1 -
 drivers/pci/pci.h  | 15 ---
 include/linux/pci.h| 15 +++
 6 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c 
b/arch/powerpc/platforms/powernv/pci-sriov.c
index 28aac933a439..2e0ca5451e85 100644
--- a/arch/powerpc/platforms/powernv/pci-sriov.c
+++ b/arch/powerpc/platforms/powernv/pci-sriov.c
@@ -9,9 +9,6 @@
 
 #include "pci.h"
 
-/* for pci_dev_is_added() */
-#include "../../../../drivers/pci/pci.h"
-
 /*
  * The majority of the complexity in supporting SR-IOV on PowerNV comes from
  * the need to put the MMIO space for each VF into a separate PE. Internally
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 0dfaa6ab44cc..08e846ae1853 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -74,7 +74,6 @@
 #include 
 
 #include "pseries.h"
-#include "../../../../drivers/pci/pci.h"
 
 DEFINE_STATIC_KEY_FALSE(shared_processor);
 EXPORT_SYMBOL(shared_processor);
diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
index 335c281811c7..40733b93a086 100644
--- a/arch/s390/pci/pci_sysfs.c
+++ b/arch/s390/pci/pci_sysfs.c
@@ -13,8 +13,6 @@
 #include 
 #include 
 
-#include "../../../drivers/pci/pci.h"
-
 #include 
 
 #define zpci_attr(name, fmt, member)   \
diff --git a/drivers/pci/hotplug/acpiphp_glue.c 
b/drivers/pci/hotplug/acpiphp_glue.c
index f031302ad401..4cb963f88183 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -38,7 +38,6 @@
 #include 
 #include 
 
-#include "../pci.h"
 #include "acpiphp.h"
 
 static LIST_HEAD(bridge_list);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 93dcdd431072..a159cd0f6f05 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const struct 
pci_dev *dev)
return dev->error_state == pci_channel_io_perm_failure;
 }
 
-/* pci_dev priv_flags */
-#define PCI_DEV_ADDED 0
-#define PCI_DPC_RECOVERED 1
-#define PCI_DPC_RECOVERING 2
-
-static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
-{
-   assign_bit(PCI_DEV_ADDED, >priv_flags, added);
-}
-
-static inline bool pci_dev_is_added(const struct pci_dev *dev)
-{
-   return test_bit(PCI_DEV_ADDED, >priv_flags);
-}
-
 #ifdef CONFIG_PCIEAER
 #include 
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 540b377ca8f6..ea0e23dbc8ec 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -507,6 +507,21 @@ struct pci_dev {
unsigned long   priv_flags; /* Private flags for the PCI driver */
 };
 
+/* pci_dev priv_flags */
+#define PCI_DEV_ADDED 0
+#define PCI_DPC_RECOVERED 1
+#define PCI_DPC_RECOVERING 2
+
+static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
+{
+   assign_bit(PCI_DEV_ADDED, >priv_flags, added);
+}
+
+static inline bool pci_dev_is_added(const struct pci_dev *dev)
+{
+   return test_bit(PCI_DEV_ADDED, >priv_flags);
+}
+
 static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
 {
 #ifdef CONFIG_PCI_IOV
-- 
2.25.1



[PATCH 2/5] s390/pci: implement reset_slot for hotplug slot

2021-09-06 Thread Niklas Schnelle
This is done by adding a zpci_hot_reset_device() call which does a low
level reset of the PCI function without changing its higher level
function state. This way it can be used while the zPCI function is bound
to a driver and with DMA tables being controlled either through the
IOMMU or DMA APIs which is prohibited when using zpci_disable_device()
as that drop existing DMA translations.

As this reset, unlike a normal FLR, also calls zpci_clear_irq() we need
to implement arch_restore_msi_irqs() and make sure we re-enable IRQs for
the PCI function if they were previously disabled.

Signed-off-by: Niklas Schnelle 
---
 arch/s390/include/asm/pci.h|  1 +
 arch/s390/pci/pci.c| 58 ++
 arch/s390/pci/pci_irq.c|  9 +
 drivers/pci/hotplug/s390_pci_hpc.c | 24 +
 4 files changed, 92 insertions(+)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 5e6cba22a801..2a2ed165a270 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -208,6 +208,7 @@ int zpci_disable_device(struct zpci_dev *);
 int zpci_scan_configured_device(struct zpci_dev *zdev, u32 fh);
 int zpci_deconfigure_device(struct zpci_dev *zdev);
 
+int zpci_hot_reset_device(struct zpci_dev *zdev);
 int zpci_register_ioat(struct zpci_dev *, u8, u64, u64, u64);
 int zpci_unregister_ioat(struct zpci_dev *, u8);
 void zpci_remove_reserved_devices(void);
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index af22778551c1..a6322f45b5bd 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -723,6 +723,64 @@ int zpci_disable_device(struct zpci_dev *zdev)
return rc;
 }
 
+/**
+ * zpci_hot_reset_device - perform a reset of the given zPCI function
+ * @zdev: the slot which should be reset
+ *
+ * Performs a low level reset of the zPCI function. The reset is low level in
+ * the sense that the zPCI function can be reset without detaching it from the
+ * common PCI subsystem. The reset may be performed while under control of
+ * either DMA or IOMMU APIs in which case the existing DMA/IOMMU translation
+ * table is reinstated at the end of the reset.
+ *
+ * After the reset the functions internal state is reset to an initial state
+ * equivalent to its state during boot when first probing a driver.
+ * Consequently after reset the PCI function requires re-initialization via the
+ * common PCI code including re-enabling IRQs via pci_alloc_irq_vectors()
+ * and enabling the function via e.g.pci_enablde_device_flags().The caller
+ * must guard against concurrent reset attempts.
+ *
+ * In most cases this function should not be called directly but through
+ * pci_reset_function() or pci_reset_bus() which handle the save/restore and
+ * locking.
+ *
+ * Return: 0 on success and an error value otherwise
+ */
+int zpci_hot_reset_device(struct zpci_dev *zdev)
+{
+   int rc;
+
+   zpci_dbg(3, "reset fid:%x\n", zdev->fid);
+   if (zdev_enabled(zdev)) {
+   /* Disables device access, DMAs and IRQs (reset state) */
+   rc = zpci_disable_device(zdev);
+   /*
+* Due to a z/VM vs LPAR inconsistency in the error state the
+* FH may indicate an enabled device but disable says the
+* device is already disabled don't treat it as an error here.
+*/
+   if (rc == -EINVAL)
+   rc = 0;
+   if (rc)
+   return rc;
+   }
+
+   rc = zpci_enable_device(zdev);
+   if (rc)
+   return rc;
+
+   if (zdev->dma_table) {
+   rc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
+   (u64)zdev->dma_table);
+   if (rc)
+   return rc;
+   } else {
+   zpci_dma_init_device(zdev);
+   }
+
+   return 0;
+}
+
 /**
  * zpci_create_device() - Create a new zpci_dev and add it to the zbus
  * @fid: Function ID of the device to be created
diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index 9c7de9089939..ab98e7f5b79b 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -391,6 +391,15 @@ void arch_teardown_msi_irqs(struct pci_dev *pdev)
airq_iv_free(zpci_ibv[0], zdev->msi_first_bit, 
zdev->msi_nr_irqs);
 }
 
+void arch_restore_msi_irqs(struct pci_dev *pdev)
+{
+   struct zpci_dev *zdev = to_zpci(pdev);
+
+   if (!zdev->irqs_registered)
+   zpci_set_irq(zdev);
+   default_restore_msi_irqs(pdev);
+}
+
 static struct airq_struct zpci_airq = {
.handler = zpci_floating_irq_handler,
.isc = PCI_ISC,
diff --git a/drivers/pci/hotplug/s390_pci_hpc.c 
b/drivers/pci/hotplug/s390_pci_hpc.c
index 014868752cd4..07f28db0eed5 100644
--- a/drivers/pci/hotplug/s390_pci_hpc.c
+++ b/drivers/pci/hotplug/s390_pci_hpc.c
@@ -57,6 +57,29 @@ static int disable_slot(struct hotplug_slot *hotplug_slot)
 

[PATCH 0/5] s390/pci: automatic error recovery

2021-09-06 Thread Niklas Schnelle
Hello,

This series implements automatic error recovery for PCI devices on s390
following the scheme outlined at Documentation/PCI/pci-error-recovery.rst
it applies on top of currenct master.

The patches have are almost completely s390 specific except for two patches
exporting existing functionality for use by arch/s390/pci/ code. Nevertheless
I would also appreciate any feedback, especially on the last patch, concerning
the implementation of the error recovery flow. I believe we might be the first
implementation of PCI device recovery in a virtualized setting requiring us to
coordinate the device reset with the hypervisor platform by issuing a disable
and re-enable to the platform as well as starting the recovery following
a platform event.

The outline of the patches is as follows:

Patch 1 and 2 add s390 specific code implementing a reset mechanism that
takes the PCI function out of the platform specific error state.

Patches 3 and 4 export existing common code functions for use by the s390
specific recovery code.

Patch 3 I already sent separately resulting in the discussion below but without
a final conclusion.

https://lore.kernel.org/lkml/20210720150145.640727-1-schne...@linux.ibm.com/

I believe even though there were some doubts about the use of
pci_dev_is_added() by arch code the existing uses as well as the use in the
final patch of this series warrant this export.

Patch 4 "PCI: Export pci_dev_lock()" is basically an extension to commit
e3a9b1212b9d ("PCI: Export pci_dev_trylock() and pci_dev_unlock()") which
already exported pci_dev_trylock(). In the final patch we make use of
pci_dev_lock() to wait for any other exclusive uses of the pdev to be finished
before starting recovery.

Finally Patch 5 implements the recovery flow as part of the existing s390
specific PCI availability and error event mechanism. Where previously the error
case only set an error indication requiring manual intervention to make the
device usable again. Now we handle the case where firmware has already reset
a PCI function after an error was encountered informing the OS that it should
be ready to be used again. Note that the same event is also issued by the
hypervisor if the function was manually taken into a service mode for example
for firmware upgrade via the hypervisor and is now ready to be used again.

Thanks,
Niklas Schnelle

Niklas Schnelle (5):
  s390/pci: refresh function handle in iomap
  s390/pci: implement reset_slot for hotplug slot
  PCI: Move pci_dev_is/assign_added() to pci.h
  PCI: Export pci_dev_lock()
  s390/pci: implement minimal PCI error recovery

 arch/powerpc/platforms/powernv/pci-sriov.c |   3 -
 arch/powerpc/platforms/pseries/setup.c |   1 -
 arch/s390/include/asm/pci.h|   6 +-
 arch/s390/pci/pci.c| 143 ++-
 arch/s390/pci/pci_event.c  | 196 -
 arch/s390/pci/pci_insn.c   |   4 +-
 arch/s390/pci/pci_irq.c|   9 +
 arch/s390/pci/pci_sysfs.c  |   2 -
 drivers/pci/hotplug/acpiphp_glue.c |   1 -
 drivers/pci/hotplug/s390_pci_hpc.c |  24 +++
 drivers/pci/pci.c  |   3 +-
 drivers/pci/pci.h  |  15 --
 include/linux/pci.h|  16 ++
 13 files changed, 389 insertions(+), 34 deletions(-)

-- 
2.25.1



[PATCH 1/5] s390/pci: refresh function handle in iomap

2021-09-06 Thread Niklas Schnelle
The function handle of a PCI function is updated when disabling or
enabling it as well as when the function's availability changes or it
enters the error state.

Until now this only occurred either while there is no struct pci_dev
associated with the function yet or the function became unavailable.
This meant that leaving a stale function handle in the iomap either
didn't happen because there was no iomap yet or it lead to errors on PCI
access but so would the correct disabled function handle.

In the future a CLP Set PCI Function Disable/Enable cycle during PCI
device recovery may be done while the device is bound to a driver.  In
this case we must update the iomap associated with the now-stale
function handle to ensure that the resulting zPCI instruction references
an accurate function handle.

Since the function handle is accessed by the PCI accessor helpers
without locking use READ_ONCE()/WRITE_ONCE() to mark this access and
prevent compiler optimizations that would move the load/store.

With that infrastructure in place let's also properly update the
function handle in the existing cases. This makes sure that in the
future debugging of a zPCI function access through the handle will
show an up to date handle reducing the chance of confusion. Also it
makes sure we have one single place where a zPCI function handle is
updated after initialization.

Signed-off-by: Niklas Schnelle 
---
 arch/s390/include/asm/pci.h |  1 +
 arch/s390/pci/pci.c | 36 
 arch/s390/pci/pci_event.c   |  6 +++---
 arch/s390/pci/pci_insn.c|  4 ++--
 4 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index e4803ec51110..5e6cba22a801 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -211,6 +211,7 @@ int zpci_deconfigure_device(struct zpci_dev *zdev);
 int zpci_register_ioat(struct zpci_dev *, u8, u64, u64, u64);
 int zpci_unregister_ioat(struct zpci_dev *, u8);
 void zpci_remove_reserved_devices(void);
+void zpci_update_fh(struct zpci_dev *zdev, u32 fh);
 
 /* CLP */
 int clp_setup_writeback_mio(void);
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index e7e6788d75a8..af22778551c1 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -481,6 +481,34 @@ static void zpci_free_iomap(struct zpci_dev *zdev, int 
entry)
spin_unlock(_iomap_lock);
 }
 
+static void zpci_do_update_iomap_fh(struct zpci_dev *zdev, u32 fh)
+{
+   int bar, idx;
+
+   spin_lock(_iomap_lock);
+   for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) {
+   if (!zdev->bars[bar].size)
+   continue;
+   idx = zdev->bars[bar].map_idx;
+   if (!zpci_iomap_start[idx].count)
+   continue;
+   WRITE_ONCE(zpci_iomap_start[idx].fh, zdev->fh);
+   }
+   spin_unlock(_iomap_lock);
+}
+
+void zpci_update_fh(struct zpci_dev *zdev, u32 fh)
+{
+   if (!fh || zdev->fh == fh)
+   return;
+
+   zdev->fh = fh;
+   if (zpci_use_mio(zdev))
+   return;
+   if (zdev->has_resources && zdev_enabled(zdev))
+   zpci_do_update_iomap_fh(zdev, fh);
+}
+
 static struct resource *__alloc_res(struct zpci_dev *zdev, unsigned long start,
unsigned long size, unsigned long flags)
 {
@@ -668,7 +696,7 @@ int zpci_enable_device(struct zpci_dev *zdev)
if (clp_enable_fh(zdev, , ZPCI_NR_DMA_SPACES))
rc = -EIO;
else
-   zdev->fh = fh;
+   zpci_update_fh(zdev, fh);
return rc;
 }
 
@@ -679,14 +707,14 @@ int zpci_disable_device(struct zpci_dev *zdev)
 
cc = clp_disable_fh(zdev, );
if (!cc) {
-   zdev->fh = fh;
+   zpci_update_fh(zdev, fh);
} else if (cc == CLP_RC_SETPCIFN_ALRDY) {
pr_info("Disabling PCI function %08x had no effect as it was 
already disabled\n",
zdev->fid);
/* Function is already disabled - update handle */
rc = clp_refresh_fh(zdev->fid, );
if (!rc) {
-   zdev->fh = fh;
+   zpci_update_fh(zdev, fh);
rc = -EINVAL;
}
} else {
@@ -768,7 +796,7 @@ int zpci_scan_configured_device(struct zpci_dev *zdev, u32 
fh)
 {
int rc;
 
-   zdev->fh = fh;
+   zpci_update_fh(zdev, fh);
/* the PCI function will be scanned once function 0 appears */
if (!zdev->zbus->bus)
return 0;
diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
index c856f80cb21b..e868d996ec5b 100644
--- a/arch/s390/pci/pci_event.c
+++ b/arch/s390/pci/pci_event.c
@@ -76,7 +76,7 @@ void zpci_event_error(void *data)
 
 static void zpci_event_hard_deconfigured(struct zpci_dev *zdev, u32 fh)
 {
-   zdev->fh = fh;
+   zpci_update_fh(zdev, fh);
   

[RFC PATCH v2] powerpc/papr_scm: Move duplicate definitions to common header files

2021-09-06 Thread Shivaprasad G Bhat
papr_scm and ndtest share common PDSM payload structs like
nd_papr_pdsm_health. Presently these structs are duplicated across papr_pdsm.h
and ndtest.h header files. Since 'ndtest' is essentially arch independent and 
can
run on platforms other than PPC64, a way needs to be deviced to avoid redundancy
and duplication of PDSM structs in future.

So the patch proposes moving the PDSM header from arch/powerpc/include/uapi/ to
the generic include/uapi/linux directory. Also, there are some #defines common
between papr_scm and ndtest which are not exported to the user space. So, move
them to a header file which can be shared across ndtest and papr_scm via newly
introduced include/linux/papr_scm.h.

Signed-off-by: Shivaprasad G Bhat 
Signed-off-by: Vaibhav Jain 
Suggested-by: "Aneesh Kumar K.V" 
---
Changelog:

Since v1:
Link: 
https://patchwork.kernel.org/project/linux-nvdimm/patch/162505488483.72147.12741153746322191381.stgit@56e104a48989/
* Removed dependency on this patch for the other patches

 MAINTAINERS   |2 
 arch/powerpc/include/uapi/asm/papr_pdsm.h |  165 -
 arch/powerpc/platforms/pseries/papr_scm.c |   43 
 include/linux/papr_scm.h  |   48 
 include/uapi/linux/papr_pdsm.h|  165 +
 tools/testing/nvdimm/test/ndtest.c|2 
 tools/testing/nvdimm/test/ndtest.h|  120 -
 7 files changed, 219 insertions(+), 326 deletions(-)
 delete mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
 create mode 100644 include/linux/papr_scm.h
 create mode 100644 include/uapi/linux/papr_pdsm.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 6c8be735cc91..03fe0c77cefa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10661,6 +10661,8 @@ F:  drivers/rtc/rtc-opal.c
 F: drivers/scsi/ibmvscsi/
 F: drivers/tty/hvc/hvc_opal.c
 F: drivers/watchdog/wdrtas.c
+F: include/linux/papr_scm.h
+F: include/uapi/linux/papr_pdsm.h
 F: tools/testing/selftests/powerpc
 N: /pmac
 N: powermac
diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h 
b/arch/powerpc/include/uapi/asm/papr_pdsm.h
deleted file mode 100644
index 17439925045c..
--- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
+++ /dev/null
@@ -1,165 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
-/*
- * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
- *
- * (C) Copyright IBM 2020
- *
- * Author: Vaibhav Jain 
- */
-
-#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
-#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
-
-#include 
-#include 
-
-/*
- * PDSM Envelope:
- *
- * The ioctl ND_CMD_CALL exchange data between user-space and kernel via
- * envelope which consists of 2 headers sections and payload sections as
- * illustrated below:
- *  +-+---+---+
- *  |   64-Bytes  |   8-Bytes |   Max 184-Bytes   |
- *  +-+---+---+
- *  | ND-HEADER   |  PDSM-HEADER  |  PDSM-PAYLOAD |
- *  +-+---+---+
- *  | nd_family   |   |   |
- *  | nd_size_out | cmd_status|   |
- *  | nd_size_in  | reserved  | nd_pdsm_payload   |
- *  | nd_command  | payload   --> |   |
- *  | nd_fw_size  |   |   |
- *  | nd_payload ---> |   |   |
- *  +---+-+---+
- *
- * ND Header:
- * This is the generic libnvdimm header described as 'struct nd_cmd_pkg'
- * which is interpreted by libnvdimm before passed on to papr_scm. Important
- * member fields used are:
- * 'nd_family' : (In) NVDIMM_FAMILY_PAPR_SCM
- * 'nd_size_in': (In) PDSM-HEADER + PDSM-IN-PAYLOAD (usually 0)
- * 'nd_size_out': (In) PDSM-HEADER + PDSM-RETURN-PAYLOAD
- * 'nd_command' : (In) One of PAPR_PDSM_XXX
- * 'nd_fw_size' : (Out) PDSM-HEADER + size of actual payload returned
- *
- * PDSM Header:
- * This is papr-scm specific header that precedes the payload. This is defined
- * as nd_cmd_pdsm_pkg.  Following fields aare available in this header:
- *
- * 'cmd_status': (Out) Errors if any encountered while 
servicing PDSM.
- * 'reserved'  : Not used, reserved for future and should be set to 0.
- * 'payload': A union of all the possible payload structs
- *
- * PDSM Payload:
- *
- * The layout of the PDSM Payload is defined by various structs shared between
- * papr_scm and libndctl so that contents of payload can be interpreted. As 
such
- * its defined as a union of all possible payload structs as
- * 'union nd_pdsm_payload'. Based on the value of 'nd_cmd_pkg.nd_command'
- * appropriate member of the union is accessed.
- */
-
-/* Max 

[PATCH v2] tests/nvdimm/ndtest: Simulate nvdimm health, DSC and smart-inject

2021-09-06 Thread Shivaprasad G Bhat
The 'papr_scm' module and 'papr' implementation in libndctl supports
PDSMs for reporting PAPR NVDIMM health, dirty-shutdown-count and
injecting smart-errors. This patch adds support for those PDSMs in
ndtest module so that PDSM specific paths in libndctl can be exercised.

Signed-off-by: Shivaprasad G Bhat 
Signed-off-by: Vaibhav Jain 
---
Changelog:

Since v1:
Link: https://patchwork.kernel.org/project/linux-nvdimm/list/?series=521767
* Removed the dependency on a header movement patch

 tools/testing/nvdimm/test/ndtest.c |  148 
 tools/testing/nvdimm/test/ndtest.h |   96 +++
 2 files changed, 244 insertions(+)

diff --git a/tools/testing/nvdimm/test/ndtest.c 
b/tools/testing/nvdimm/test/ndtest.c
index 6862915f1fb0..45d42cd25e82 100644
--- a/tools/testing/nvdimm/test/ndtest.c
+++ b/tools/testing/nvdimm/test/ndtest.c
@@ -48,6 +48,10 @@ static struct ndtest_dimm dimm_group1[] = {
.uuid_str = "1e5c75d2-b618-11ea-9aa3-507b9ddc0f72",
.physical_id = 0,
.num_formats = 2,
+   .flags = PAPR_PMEM_HEALTH_NON_CRITICAL,
+   .extension_flags = PDSM_DIMM_DSC_VALID | 
PDSM_DIMM_HEALTH_RUN_GAUGE_VALID,
+   .dimm_fuel_gauge = 95,
+   .dimm_dsc = 42,
},
{
.size = DIMM_SIZE,
@@ -55,6 +59,10 @@ static struct ndtest_dimm dimm_group1[] = {
.uuid_str = "1c4d43ac-b618-11ea-be80-507b9ddc0f72",
.physical_id = 1,
.num_formats = 2,
+   .flags = PAPR_PMEM_HEALTH_NON_CRITICAL,
+   .extension_flags = PDSM_DIMM_DSC_VALID | 
PDSM_DIMM_HEALTH_RUN_GAUGE_VALID,
+   .dimm_fuel_gauge = 95,
+   .dimm_dsc = 42,
},
{
.size = DIMM_SIZE,
@@ -62,6 +70,10 @@ static struct ndtest_dimm dimm_group1[] = {
.uuid_str = "a9f17ffc-b618-11ea-b36d-507b9ddc0f72",
.physical_id = 2,
.num_formats = 2,
+   .flags = PAPR_PMEM_HEALTH_NON_CRITICAL,
+   .extension_flags = PDSM_DIMM_DSC_VALID | 
PDSM_DIMM_HEALTH_RUN_GAUGE_VALID,
+   .dimm_fuel_gauge = 95,
+   .dimm_dsc = 42,
},
{
.size = DIMM_SIZE,
@@ -69,6 +81,10 @@ static struct ndtest_dimm dimm_group1[] = {
.uuid_str = "b6b83b22-b618-11ea-8aae-507b9ddc0f72",
.physical_id = 3,
.num_formats = 2,
+   .flags = PAPR_PMEM_HEALTH_NON_CRITICAL,
+   .extension_flags = PDSM_DIMM_DSC_VALID | 
PDSM_DIMM_HEALTH_RUN_GAUGE_VALID,
+   .dimm_fuel_gauge = 95,
+   .dimm_dsc = 42,
},
{
.size = DIMM_SIZE,
@@ -296,6 +312,103 @@ static int ndtest_get_config_size(struct ndtest_dimm 
*dimm, unsigned int buf_len
return 0;
 }
 
+static int ndtest_pdsm_health(struct ndtest_dimm *dimm,
+   union nd_pdsm_payload *payload,
+   unsigned int buf_len)
+{
+   struct nd_papr_pdsm_health *health = >health;
+
+   if (buf_len < sizeof(health))
+   return -EINVAL;
+
+   health->extension_flags = 0;
+   health->dimm_unarmed = !!(dimm->flags & PAPR_PMEM_UNARMED_MASK);
+   health->dimm_bad_shutdown = !!(dimm->flags & 
PAPR_PMEM_BAD_SHUTDOWN_MASK);
+   health->dimm_bad_restore = !!(dimm->flags & PAPR_PMEM_BAD_RESTORE_MASK);
+   health->dimm_health = PAPR_PDSM_DIMM_HEALTHY;
+
+   if (dimm->flags & PAPR_PMEM_HEALTH_FATAL)
+   health->dimm_health = PAPR_PDSM_DIMM_FATAL;
+   else if (dimm->flags & PAPR_PMEM_HEALTH_CRITICAL)
+   health->dimm_health = PAPR_PDSM_DIMM_CRITICAL;
+   else if (dimm->flags & PAPR_PMEM_HEALTH_UNHEALTHY ||
+dimm->flags & PAPR_PMEM_HEALTH_NON_CRITICAL)
+   health->dimm_health = PAPR_PDSM_DIMM_UNHEALTHY;
+
+   health->extension_flags = 0;
+   if (dimm->extension_flags & PDSM_DIMM_HEALTH_RUN_GAUGE_VALID) {
+   health->dimm_fuel_gauge = dimm->dimm_fuel_gauge;
+   health->extension_flags |= PDSM_DIMM_HEALTH_RUN_GAUGE_VALID;
+   }
+   if (dimm->extension_flags & PDSM_DIMM_DSC_VALID) {
+   health->dimm_dsc = dimm->dimm_dsc;
+   health->extension_flags |= PDSM_DIMM_DSC_VALID;
+   }
+
+   return 0;
+}
+
+static void smart_notify(struct ndtest_dimm *dimm)
+{
+   struct device *bus = dimm->dev->parent;
+
+   if (!(dimm->flags & PAPR_PMEM_HEALTH_NON_CRITICAL) ||
+   (dimm->flags & PAPR_PMEM_BAD_SHUTDOWN_MASK)) {
+   device_lock(bus);
+   /* send smart notification */
+   if (dimm->notify_handle)
+   sysfs_notify_dirent(dimm->notify_handle);
+   device_unlock(bus);
+   }
+}
+
+static int ndtest_pdsm_smart_inject(struct ndtest_dimm *dimm,
+   union 

[PATCH v2] powerpc/papr_scm: Implement initial support for injecting smart errors

2021-09-06 Thread Shivaprasad G Bhat
From: Vaibhav Jain 

Presently PAPR doesn't support injecting smart errors on an
NVDIMM. This makes testing the NVDIMM health reporting functionality
difficult as simulating NVDIMM health related events need a hacked up
qemu version.

To solve this problem this patch proposes simulating certain set of
NVDIMM health related events in papr_scm. Specifically 'fatal' health
state and 'dirty' shutdown state. These error can be injected via the
user-space 'ndctl-inject-smart(1)' command. With the proposed patch and
corresponding ndctl patches following command flow is expected:

$ sudo ndctl list -DH -d nmem0
...
  "health_state":"ok",
  "shutdown_state":"clean",
...
 # inject unsafe shutdown and fatal health error
$ sudo ndctl inject-smart nmem0 -Uf
...
  "health_state":"fatal",
  "shutdown_state":"dirty",
...
 # uninject all errors
$ sudo ndctl inject-smart nmem0 -N
...
  "health_state":"ok",
  "shutdown_state":"clean",
...

The patch adds two members 'health_bitmap_mask' and
'health_bitmap_override' inside struct papr_scm_priv which are then
bit blt'ed to the health bitmaps fetched from the hypervisor. In case
we are not able to fetch health information from the hypervisor we
service the health bitmap from these two members. These members are
accessible from sysfs at nmemX/papr/health_bitmap_override

A new PDSM named 'SMART_INJECT' is proposed that accepts newly
introduced 'struct nd_papr_pdsm_smart_inject' as payload thats
exchanged between libndctl and papr_scm to indicate the requested
smart-error states.

When the processing the PDSM 'SMART_INJECT', papr_pdsm_smart_inject()
constructs a pair or 'mask' and 'override' bitmaps from the payload
and bit-blt it to the 'health_bitmap_{mask, override}' members. This
ensures the after being fetched from the hypervisor, the health_bitmap
reflects requested smart-error states.

Signed-off-by: Vaibhav Jain 
Signed-off-by: Shivaprasad G Bhat 
---
Changelog:

Since v1:
Link: https://patchwork.kernel.org/project/linux-nvdimm/list/?series=513881
* Updated the patch description.
* Removed dependency of a header movement patch.
* Removed '__packed' attribute for 'struct nd_papr_pdsm_smart_inject' [Aneesh]

 arch/powerpc/include/uapi/asm/papr_pdsm.h |   18 ++
 arch/powerpc/platforms/pseries/papr_scm.c |   94 -
 2 files changed, 109 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h 
b/arch/powerpc/include/uapi/asm/papr_pdsm.h
index 82488b1e7276..17439925045c 100644
--- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -116,6 +116,22 @@ struct nd_papr_pdsm_health {
};
 };
 
+/* Flags for injecting specific smart errors */
+#define PDSM_SMART_INJECT_HEALTH_FATAL (1 << 0)
+#define PDSM_SMART_INJECT_BAD_SHUTDOWN (1 << 1)
+
+struct nd_papr_pdsm_smart_inject {
+   union {
+   struct {
+   /* One or more of PDSM_SMART_INJECT_ */
+   __u32 flags;
+   __u8 fatal_enable;
+   __u8 unsafe_shutdown_enable;
+   };
+   __u8 buf[ND_PDSM_PAYLOAD_MAX_SIZE];
+   };
+};
+
 /*
  * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
  * via 'nd_cmd_pkg.nd_command' member of the ioctl struct
@@ -123,12 +139,14 @@ struct nd_papr_pdsm_health {
 enum papr_pdsm {
PAPR_PDSM_MIN = 0x0,
PAPR_PDSM_HEALTH,
+   PAPR_PDSM_SMART_INJECT,
PAPR_PDSM_MAX,
 };
 
 /* Maximal union that can hold all possible payload types */
 union nd_pdsm_payload {
struct nd_papr_pdsm_health health;
+   struct nd_papr_pdsm_smart_inject smart_inject;
__u8 buf[ND_PDSM_PAYLOAD_MAX_SIZE];
 } __packed;
 
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index f48e87ac89c9..de4cf329cfb3 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -68,6 +68,10 @@
 #define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
 #define PAPR_SCM_PERF_STATS_VERSION 0x1
 
+/* Use bitblt method to override specific bits in the '_bitmap_' */
+#define BITBLT_BITMAP(_bitmap_, _mask_, _override_)\
+   (((_bitmap_) & ~(_mask_)) | ((_mask_) & (_override_)))
+
 /* Struct holding a single performance metric */
 struct papr_scm_perf_stat {
u8 stat_id[8];
@@ -120,6 +124,12 @@ struct papr_scm_priv {
 
/* length of the stat buffer as expected by phyp */
size_t stat_buffer_len;
+
+   /* The bits which needs to be overridden */
+   u64 health_bitmap_mask;
+
+   /* The overridden values for the bits having the masks set */
+   u64 health_bitmap_override;
 };
 
 static int papr_scm_pmem_flush(struct nd_region *nd_region,
@@ -347,19 +357,28 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv 
*p,
 static int __drc_pmem_query_health(struct papr_scm_priv 

[PATCH v3 3/3] powerpc/mce: Modify the real address error logging messages

2021-09-06 Thread Ganesh Goudar
To avoid ambiguity, modify the strings in real address error
logging messages to "foreign/control memory" from "foreign",
Since the error discriptions in P9 user manual and P10 user
manual are different for same type of errors.

P9 User Manual for MCE:
DSISR:59 Host real address to foreign space during translation.
DSISR:60 Host real address to foreign space on a load or store
 access.

P10 User Manual for MCE:
DSISR:59 D-side tablewalk used a host real address in the
 control memory address range.
DSISR:60 D-side operand access to control memory address space.

Signed-off-by: Ganesh Goudar 
---
v3: No changes.

v2: No changes.
---
 arch/powerpc/kernel/mce.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 9d1e39d42e3e..5baf69503349 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -400,14 +400,14 @@ void machine_check_print_event_info(struct 
machine_check_event *evt,
static const char *mc_ra_types[] = {
"Indeterminate",
"Instruction fetch (bad)",
-   "Instruction fetch (foreign)",
+   "Instruction fetch (foreign/control memory)",
"Page table walk ifetch (bad)",
-   "Page table walk ifetch (foreign)",
+   "Page table walk ifetch (foreign/control memory)",
"Load (bad)",
"Store (bad)",
"Page table walk Load/Store (bad)",
-   "Page table walk Load/Store (foreign)",
-   "Load/Store (foreign)",
+   "Page table walk Load/Store (foreign/control memory)",
+   "Load/Store (foreign/control memory)",
};
static const char *mc_link_types[] = {
"Indeterminate",
-- 
2.31.1



[PATCH v3 2/3] selftests/powerpc: Add test for real address error handling

2021-09-06 Thread Ganesh Goudar
Add test for real address or control memory address access
error handling, using NX-GZIP engine.

The error is injected by accessing the control memory address
using illegal instruction, on successful handling the process
attempting to access control memory address using illegal
instruction receives SIGBUS.

Signed-off-by: Ganesh Goudar 
---
v3: Avoid using shell script to inject error.

v2: Fix build error.
---
 tools/testing/selftests/powerpc/Makefile  |  3 +-
 tools/testing/selftests/powerpc/mce/Makefile  |  7 ++
 .../selftests/powerpc/mce/inject-ra-err.c | 65 +++
 tools/testing/selftests/powerpc/mce/vas-api.h |  1 +
 4 files changed, 75 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/mce/Makefile
 create mode 100644 tools/testing/selftests/powerpc/mce/inject-ra-err.c
 create mode 12 tools/testing/selftests/powerpc/mce/vas-api.h

diff --git a/tools/testing/selftests/powerpc/Makefile 
b/tools/testing/selftests/powerpc/Makefile
index 0830e63818c1..4830372d7416 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -31,7 +31,8 @@ SUB_DIRS = alignment  \
   vphn \
   math \
   ptrace   \
-  security
+  security \
+  mce
 
 endif
 
diff --git a/tools/testing/selftests/powerpc/mce/Makefile 
b/tools/testing/selftests/powerpc/mce/Makefile
new file mode 100644
index ..2424513982d9
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mce/Makefile
@@ -0,0 +1,7 @@
+#SPDX-License-Identifier: GPL-2.0-or-later
+
+TEST_GEN_PROGS := inject-ra-err
+
+include ../../lib.mk
+
+$(TEST_GEN_PROGS): ../harness.c
diff --git a/tools/testing/selftests/powerpc/mce/inject-ra-err.c 
b/tools/testing/selftests/powerpc/mce/inject-ra-err.c
new file mode 100644
index ..94323c34d9a6
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mce/inject-ra-err.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vas-api.h"
+#include "utils.h"
+
+static bool faulted;
+
+static void sigbus_handler(int n, siginfo_t *info, void *ctxt_v)
+{
+   ucontext_t *ctxt = (ucontext_t *)ctxt_v;
+   struct pt_regs *regs = ctxt->uc_mcontext.regs;
+
+   faulted = true;
+   regs->nip += 4;
+}
+
+static int test_ra_error(void)
+{
+   struct vas_tx_win_open_attr attr;
+   int fd, *paste_addr;
+   char *devname = "/dev/crypto/nx-gzip";
+   struct sigaction act = {
+   .sa_sigaction = sigbus_handler,
+   .sa_flags = SA_SIGINFO,
+   };
+
+   memset(, 0, sizeof(attr));
+   attr.version = 1;
+   attr.vas_id = 0;
+
+   SKIP_IF(access(devname, F_OK));
+
+   fd = open(devname, O_RDWR);
+   FAIL_IF(fd < 0);
+   FAIL_IF(ioctl(fd, VAS_TX_WIN_OPEN, ) < 0);
+   FAIL_IF(sigaction(SIGBUS, , NULL) != 0);
+
+   paste_addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 
0ULL);
+
+   /* The following assignment triggers exception */
+   mb();
+   *paste_addr = 1;
+   mb();
+
+   FAIL_IF(!faulted);
+
+   return 0;
+}
+
+int main(void)
+{
+   return test_harness(test_ra_error, "inject-ra-err");
+}
+
diff --git a/tools/testing/selftests/powerpc/mce/vas-api.h 
b/tools/testing/selftests/powerpc/mce/vas-api.h
new file mode 12
index ..1455c1bcd351
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mce/vas-api.h
@@ -0,0 +1 @@
+../../../../../arch/powerpc/include/uapi/asm/vas-api.h
\ No newline at end of file
-- 
2.31.1



[PATCH v3 1/3] powerpc/pseries: Parse control memory access error

2021-09-06 Thread Ganesh Goudar
Add support to parse and log control memory access
error for pseries. These changes are made according to
PAPR v2.11 10.3.2.2.12.

Signed-off-by: Ganesh Goudar 
---
v3: Modify the commit log to mention the document according
to which changes are made.
Define and use a macro to check if the effective address
is provided.

v2: No changes.
---
 arch/powerpc/platforms/pseries/ras.c | 36 
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/ras.c 
b/arch/powerpc/platforms/pseries/ras.c
index 56092dccfdb8..e62a0ca2611a 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -60,11 +60,17 @@ struct pseries_mc_errorlog {
 *  XX  2: Reserved.
 *XXX   3: Type of UE error.
 *
-* For error_type != MC_ERROR_TYPE_UE
+* For error_type == MC_ERROR_TYPE_SLB/ERAT/TLB
 *   
 *   X  1: Effective address provided.
 *X 5: Reserved.
 * XX   2: Type of SLB/ERAT/TLB error.
+*
+* For error_type == MC_ERROR_TYPE_CTRL_MEM_ACCESS
+*   
+*   X  1: Error causing address provided.
+*XXX   3: Type of error.
+*      4: Reserved.
 */
u8  sub_err_type;
u8  reserved_1[6];
@@ -80,6 +86,7 @@ struct pseries_mc_errorlog {
 #define MC_ERROR_TYPE_TLB  0x04
 #define MC_ERROR_TYPE_D_CACHE  0x05
 #define MC_ERROR_TYPE_I_CACHE  0x07
+#define MC_ERROR_TYPE_CTRL_MEM_ACCESS  0x08
 
 /* RTAS pseries MCE error sub types */
 #define MC_ERROR_UE_INDETERMINATE  0
@@ -90,6 +97,7 @@ struct pseries_mc_errorlog {
 
 #define UE_EFFECTIVE_ADDR_PROVIDED 0x40
 #define UE_LOGICAL_ADDR_PROVIDED   0x20
+#define MC_EFFECTIVE_ADDR_PROVIDED 0x80
 
 #define MC_ERROR_SLB_PARITY0
 #define MC_ERROR_SLB_MULTIHIT  1
@@ -103,6 +111,9 @@ struct pseries_mc_errorlog {
 #define MC_ERROR_TLB_MULTIHIT  2
 #define MC_ERROR_TLB_INDETERMINATE 3
 
+#define MC_ERROR_CTRL_MEM_ACCESS_PTABLE_WALK   0
+#define MC_ERROR_CTRL_MEM_ACCESS_OP_ACCESS 1
+
 static inline u8 rtas_mc_error_sub_type(const struct pseries_mc_errorlog *mlog)
 {
switch (mlog->error_type) {
@@ -112,6 +123,8 @@ static inline u8 rtas_mc_error_sub_type(const struct 
pseries_mc_errorlog *mlog)
caseMC_ERROR_TYPE_ERAT:
caseMC_ERROR_TYPE_TLB:
return (mlog->sub_err_type & 0x03);
+   caseMC_ERROR_TYPE_CTRL_MEM_ACCESS:
+   return (mlog->sub_err_type & 0x70) >> 4;
default:
return 0;
}
@@ -656,7 +669,7 @@ static int mce_handle_err_virtmode(struct pt_regs *regs,
mce_err.u.slb_error_type = MCE_SLB_ERROR_INDETERMINATE;
break;
}
-   if (mce_log->sub_err_type & 0x80)
+   if (mce_log->sub_err_type & MC_EFFECTIVE_ADDR_PROVIDED)
eaddr = be64_to_cpu(mce_log->effective_address);
break;
case MC_ERROR_TYPE_ERAT:
@@ -673,7 +686,7 @@ static int mce_handle_err_virtmode(struct pt_regs *regs,
mce_err.u.erat_error_type = 
MCE_ERAT_ERROR_INDETERMINATE;
break;
}
-   if (mce_log->sub_err_type & 0x80)
+   if (mce_log->sub_err_type & MC_EFFECTIVE_ADDR_PROVIDED)
eaddr = be64_to_cpu(mce_log->effective_address);
break;
case MC_ERROR_TYPE_TLB:
@@ -690,7 +703,7 @@ static int mce_handle_err_virtmode(struct pt_regs *regs,
mce_err.u.tlb_error_type = MCE_TLB_ERROR_INDETERMINATE;
break;
}
-   if (mce_log->sub_err_type & 0x80)
+   if (mce_log->sub_err_type & MC_EFFECTIVE_ADDR_PROVIDED)
eaddr = be64_to_cpu(mce_log->effective_address);
break;
case MC_ERROR_TYPE_D_CACHE:
@@ -699,6 +712,21 @@ static int mce_handle_err_virtmode(struct pt_regs *regs,
case MC_ERROR_TYPE_I_CACHE:
mce_err.error_type = MCE_ERROR_TYPE_ICACHE;
break;
+   case MC_ERROR_TYPE_CTRL_MEM_ACCESS:
+   mce_err.error_type = MCE_ERROR_TYPE_RA;
+   switch (err_sub_type) {
+   case MC_ERROR_CTRL_MEM_ACCESS_PTABLE_WALK:
+   mce_err.u.ra_error_type =
+   MCE_RA_ERROR_PAGE_TABLE_WALK_LOAD_STORE_FOREIGN;
+   break;
+   case MC_ERROR_CTRL_MEM_ACCESS_OP_ACCESS:
+   mce_err.u.ra_error_type =
+   MCE_RA_ERROR_LOAD_STORE_FOREIGN;
+   break;
+   }
+   if (mce_log->sub_err_type & MC_EFFECTIVE_ADDR_PROVIDED)
+  

[PATCH] powerpc/mce: Fix access error in mce handler

2021-09-06 Thread Ganesh Goudar
We queue an irq work for deferred processing of mce event
in realmode mce handler, where translation is disabled.
Queuing of the work may result in accessing memory outside
RMO region, such access needs the translation to be enabled
for an LPAR running with hash mmu else the kernel crashes.

So enable the translation before queuing the work.

Without this change following trace is seen on injecting machine
check error in an LPAR running with hash mmu.

Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 5 PID: 1883 Comm: insmod Tainted: GOE 5.14.0-mce+ #137
NIP:  c0735d60 LR: c0318640 CTR: 
REGS: c0001ebff9a0 TRAP: 0300   Tainted: G   OE  (5.14.0-mce+)
MSR:  80001003   CR: 28008228  XER: 0001
CFAR: c031863c DAR: c0027fa8fe08 DSISR: 4000 IRQMASK: 0
GPR00: c03186d0 c0001ebffc40 c1b0df00 c16337e8
GPR04: c16337e8 c0027fa8fe08 0023 c16337f0
GPR08: 0023 c12ffe08  c00801460240
GPR12:  c0001ec9a900 c0002ac4bd00 
GPR16: 05a0 c008006b c008006b05a0 c0ff3068
GPR20: c0002ac4bbc0 0001 c0002ac4bbc0 c00801490298
GPR24: c00801490108 c1636198 c00801470090 c00801470058
GPR28: 0510 c0080100 c0080819 0019
NIP [c0735d60] llist_add_batch+0x0/0x40
LR [c0318640] __irq_work_queue_local+0x70/0xc0
Call Trace:
[c0001ebffc40] [c0001ebffc0c] 0xc0001ebffc0c (unreliable)
[c0001ebffc60] [c03186d0] irq_work_queue+0x40/0x70
[c0001ebffc80] [c004425c] machine_check_queue_event+0xbc/0xd0
[c0001ebffcf0] [c000838c] machine_check_early_common+0x16c/0x1f4

Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from 
handler")
Signed-off-by: Ganesh Goudar 
---
 arch/powerpc/kernel/mce.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 47a683cd00d2..9d1e39d42e3e 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -249,6 +249,7 @@ void machine_check_queue_event(void)
 {
int index;
struct machine_check_event evt;
+   unsigned long msr;
 
if (!get_mce_event(, MCE_EVENT_RELEASE))
return;
@@ -262,8 +263,19 @@ void machine_check_queue_event(void)
memcpy(_paca->mce_info->mce_event_queue[index],
   , sizeof(evt));
 
-   /* Queue irq work to process this event later. */
-   irq_work_queue(_event_process_work);
+   /* Queue irq work to process this event later. Before
+* queuing the work enable translation for non radix LPAR,
+* as irq_work_queue may try to access memory outside RMO
+* region.
+*/
+   if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) {
+   msr = mfmsr();
+   mtmsr(msr | MSR_IR | MSR_DR);
+   irq_work_queue(_event_process_work);
+   mtmsr(msr);
+   } else {
+   irq_work_queue(_event_process_work);
+   }
 }
 
 void mce_common_process_ue(struct pt_regs *regs,
-- 
2.31.1