Re: [PATCH] compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING

2019-03-19 Thread Masahiro Yamada
On Wed, Mar 20, 2019 at 3:21 PM Masahiro Yamada wrote:
>
> Commit 60a3cdd06394 ("x86: add optimized inlining") introduced
> CONFIG_OPTIMIZE_INLINING, but it has been available only for x86.
>
> The idea is obviously arch-agnostic, although some code needs fixing up.
> This commit moves the config entry from arch/x86/Kconfig.debug to
> lib/Kconfig.debug so that all architectures (except MIPS for now) can
> benefit from it.
>
> For now, I added "depends on !MIPS" because fixing the 0day bot reports
> for MIPS was too complex for me.

BTW, I got the following error when I enabled CONFIG_OPTIMIZE_INLINING for MIPS.

It is unclear to me how to fix it.
That's why I ended up with "depends on !MIPS".


  MODPOST vmlinux.o
arch/mips/mm/sc-mips.o: In function `mips_sc_prefetch_enable.part.2':
sc-mips.c:(.text+0x98): undefined reference to `mips_gcr_base'
sc-mips.c:(.text+0x9c): undefined reference to `mips_gcr_base'
sc-mips.c:(.text+0xbc): undefined reference to `mips_gcr_base'
sc-mips.c:(.text+0xc8): undefined reference to `mips_gcr_base'
sc-mips.c:(.text+0xdc): undefined reference to `mips_gcr_base'
arch/mips/mm/sc-mips.o:sc-mips.c:(.text.unlikely+0x44): more undefined references to `mips_gcr_base'


Perhaps the MIPS folks know how to fix it.





Re: [PATCH 5/8] powerpc/eeh: Add eeh_show_enabled()

2019-03-19 Thread Oliver
On Wed, Mar 20, 2019 at 5:06 PM Alexey Kardashevskiy  wrote:
>
>
>
> On 20/03/2019 13:58, Sam Bobroff wrote:
> > Move the EEH enabled message into its own function so that future
> > work can call it from multiple places.
> >
> > Signed-off-by: Sam Bobroff 
> > ---
> >  arch/powerpc/include/asm/eeh.h |  3 +++
> >  arch/powerpc/kernel/eeh.c  | 16 +++-
> >  2 files changed, 14 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> > index fe4cf7208890..e217ccda55d0 100644
> > --- a/arch/powerpc/include/asm/eeh.h
> > +++ b/arch/powerpc/include/asm/eeh.h
> > @@ -289,6 +289,7 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
> >
> >  struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
> >  void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
> > +void eeh_show_enabled(void);
> >  void eeh_probe_devices(void);
> >  int __init eeh_ops_register(struct eeh_ops *ops);
> >  int __exit eeh_ops_unregister(const char *name);
> > @@ -338,6 +339,8 @@ static inline bool eeh_enabled(void)
> >  return false;
> >  }
> >
> > +static inline void eeh_show_enabled(void) { }
> > +
> >  static inline bool eeh_phb_enabled(void)
> >  {
> >   return false;
> > diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> > index b14d89547895..3dcff29cb9b3 100644
> > --- a/arch/powerpc/kernel/eeh.c
> > +++ b/arch/powerpc/kernel/eeh.c
> > @@ -163,6 +163,16 @@ static int __init eeh_setup(char *str)
> >  }
> >  __setup("eeh=", eeh_setup);
> >
> > +void eeh_show_enabled(void)
> > +{
> > + if (eeh_has_flag(EEH_FORCE_DISABLED))
> > + pr_info("EEH: PCI Enhanced I/O Error Handling DISABLED (by eeh=off)\n");
> > + else if (eeh_enabled())
>
>
> I'd make it eeh_has_flag(EEH_ENABLED) for clarity.
>
>
> > + pr_info("EEH: PCI Enhanced I/O Error Handling ENABLED (capable adapter found)\n");
> > + else
> > + pr_info("EEH: PCI Enhanced I/O Error Handling DISABLED (no capable adapter found)\n");
> > +}
> > +
> >  /*
> >   * This routine captures assorted PCI configuration space data
> >   * for the indicated PCI device, and puts them into a buffer
> > @@ -1166,11 +1176,7 @@ void eeh_probe_devices(void)
> >   pdn = hose->pci_data;
> >   traverse_pci_dn(pdn, eeh_ops->probe, NULL);
> >   }
> > - if (eeh_enabled())
> > - pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n");
> > - else
> > - pr_info("EEH: No capable adapters found\n");
> > -
> > + eeh_show_enabled();
>
>
> This line moves later in the series so I'd just merge this patch into
> 8/8 to reduce the number of lines moving within the patchset.
>
> In general, the whole point of the EEH_ENABLED flag is fading away. It
> now means "EEH is enabled for at least one device somewhere in the box",
> which does not seem extremely useful, as we have a pci_dev or PE pretty
> much everywhere we call eeh_enabled(), and pdev->dev.archdata.edev can
> tell whether EEH is enabled for a given device.
> Although I am pretty sure this is in your list already :)

The flag's other purpose is to disable attempts to detect EEH errors when
we get 0xFFs from an MMIO load, but I don't think anyone ever disables
it.

> >  }
> >
> >  /**
> >
>
> --
> Alexey


[PATCH] compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING

2019-03-19 Thread Masahiro Yamada
Commit 60a3cdd06394 ("x86: add optimized inlining") introduced
CONFIG_OPTIMIZE_INLINING, but it has been available only for x86.

The idea is obviously arch-agnostic, although some code needs fixing up.
This commit moves the config entry from arch/x86/Kconfig.debug to
lib/Kconfig.debug so that all architectures (except MIPS for now) can
benefit from it.

For now, I added "depends on !MIPS" because fixing the 0day bot reports
for MIPS was too complex for me.

I tested this patch on my arm/arm64 boards.

This can make a huge difference in kernel image size, especially when
CONFIG_CC_OPTIMIZE_FOR_SIZE is enabled.

For example, I got a 3.5% smaller arm64 kernel image on v5.1-rc1.

  dec   file
  18983424  arch/arm64/boot/Image.before
  18321920  arch/arm64/boot/Image.after

This also slightly improves the "Kernel hacking" Kconfig menu.
Commit e61aca5158a8 ("Merge branch 'kconfig-diet' from Dave Hansen")
mentioned that this config option would be a good fit in the "compiler
option" menu, so I moved it there.

I fixed up some files to avoid build warnings/errors.

[1] arch/arm64/include/asm/cpufeature.h

In file included from ././include/linux/compiler_types.h:68,
 from <command-line>:
./arch/arm64/include/asm/jump_label.h: In function 'cpus_have_const_cap':
./include/linux/compiler-gcc.h:120:38: warning: asm operand 0 probably doesn't match constraints
 #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)
  ^~~
./arch/arm64/include/asm/jump_label.h:32:2: note: in expansion of macro 'asm_volatile_goto'
  asm_volatile_goto(
  ^
./include/linux/compiler-gcc.h:120:38: error: impossible constraint in 'asm'
 #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)
  ^~~
./arch/arm64/include/asm/jump_label.h:32:2: note: in expansion of macro 'asm_volatile_goto'
  asm_volatile_goto(
  ^

[2] arch/mips/kernel/cpu-bugs64.c

arch/mips/kernel/cpu-bugs64.c: In function 'mult_sh_align_mod.constprop':
arch/mips/kernel/cpu-bugs64.c:33:2: error: asm operand 1 probably doesn't match constraints [-Werror]
  asm volatile(
  ^~~
arch/mips/kernel/cpu-bugs64.c:33:2: error: asm operand 1 probably doesn't match constraints [-Werror]
  asm volatile(
  ^~~
arch/mips/kernel/cpu-bugs64.c:33:2: error: impossible constraint in 'asm'
  asm volatile(
  ^~~
arch/mips/kernel/cpu-bugs64.c:33:2: error: impossible constraint in 'asm'
  asm volatile(
  ^~~

[3] arch/powerpc/mm/tlb-radix.c

arch/powerpc/mm/tlb-radix.c: In function '__radix__flush_tlb_range_psize':
arch/powerpc/mm/tlb-radix.c:104:2: error: asm operand 3 probably doesn't match constraints [-Werror]
  asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
  ^~~
arch/powerpc/mm/tlb-radix.c:104:2: error: impossible constraint in 'asm'
  CC  arch/powerpc/perf/hv-gpci.o

[4] arch/s390/include/asm/cpacf.h

In file included from arch/s390/crypto/des_s390.c:19:
./arch/s390/include/asm/cpacf.h: In function 'cpacf_query':
./arch/s390/include/asm/cpacf.h:170:2: warning: asm operand 3 probably doesn't match constraints
  asm volatile(
  ^~~
./arch/s390/include/asm/cpacf.h:170:2: error: impossible constraint in 'asm'

[5] arch/powerpc/kernel/prom_init.c

WARNING: vmlinux.o(.text.unlikely+0x20): Section mismatch in reference from the function .prom_getprop() to the function .init.text:.call_prom()
The function .prom_getprop() references
the function __init .call_prom().
This is often because .prom_getprop lacks a __init
annotation or the annotation of .call_prom is wrong.

WARNING: vmlinux.o(.text.unlikely+0x3c): Section mismatch in reference from the function .prom_getproplen() to the function .init.text:.call_prom()
The function .prom_getproplen() references
the function __init .call_prom().
This is often because .prom_getproplen lacks a __init
annotation or the annotation of .call_prom is wrong.

[6] drivers/mtd/nand/raw/vf610_nfc.c

drivers/mtd/nand/raw/vf610_nfc.c: In function ‘vf610_nfc_cmd’:
drivers/mtd/nand/raw/vf610_nfc.c:455:3: warning: ‘offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   vf610_nfc_rd_from_sram(instr->ctx.data.buf.in + offset,
   ^~~
nfc->regs + NFC_MAIN_AREA(0) + offset,
~~
trfr_sz, !nfc->data_access);
~~~

[7] arch/arm/kernel/smp.c

arch/arm/kernel/smp.c: In function ‘raise_nmi’:
arch/arm/kernel/smp.c:522:2: warning: array subscript is above array bounds [-Warray-bounds]
  trace_ipi_raise_rcuidle(target, ipi_types[ipinr]);
  ^

The fixup for [7] is not included in this patch. It is available on the mailing list:

http://lists.infradead.org/pipermail/linux-arm-kernel/2016-February/409393.html

Signed-off-by: Masahiro Yamada 
---

 arch/arm64/include/asm/cpufeature.h |  4 ++--
 arch/mips/kernel/cpu-bugs64.c   |  4 ++--
 arch/powerpc/kernel

Re: [PATCH 8/8] powerpc/eeh: Remove eeh_probe_devices() and eeh_addr_cache_build()

2019-03-19 Thread Alexey Kardashevskiy



On 20/03/2019 13:58, Sam Bobroff wrote:
> Now that EEH support for all devices (on PowerNV and pSeries) is
> provided by the pcibios bus add device hooks, eeh_probe_devices() and
> eeh_addr_cache_build() are redundant and can be removed.
> 
> Note that previously on pSeries, useless EEH sysfs files were created
> for some devices that did not have EEH support and this change
> prevents them from being created.
> 
> Signed-off-by: Sam Bobroff 
> ---
>  arch/powerpc/include/asm/eeh.h   |  6 
>  arch/powerpc/kernel/eeh.c| 13 
>  arch/powerpc/kernel/eeh_cache.c  | 32 
>  arch/powerpc/platforms/powernv/eeh-powernv.c |  5 ++-
>  arch/powerpc/platforms/pseries/pci.c |  3 +-
>  5 files changed, 3 insertions(+), 56 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 791b9e6fcc45..f1eca1757cbc 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -290,13 +290,11 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
>  struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
>  void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
>  void eeh_show_enabled(void);
> -void eeh_probe_devices(void);
>  int __init eeh_ops_register(struct eeh_ops *ops);
>  int __exit eeh_ops_unregister(const char *name);
>  int eeh_check_failure(const volatile void __iomem *token);
>  int eeh_dev_check_failure(struct eeh_dev *edev);
>  void eeh_addr_cache_init(void);
> -void eeh_addr_cache_build(void);
>  void eeh_add_device_early(struct pci_dn *);
>  void eeh_add_device_tree_early(struct pci_dn *);
>  void eeh_add_device_late(struct pci_dev *);
> @@ -347,8 +345,6 @@ static inline bool eeh_phb_enabled(void)
>   return false;
>  }
>  
> -static inline void eeh_probe_devices(void) { }
> -
>  static inline void *eeh_dev_init(struct pci_dn *pdn, void *data)
>  {
>   return NULL;
> @@ -365,8 +361,6 @@ static inline int eeh_check_failure(const volatile void __iomem *token)
>  
>  static inline void eeh_addr_cache_init(void) { }
>  
> -static inline void eeh_addr_cache_build(void) { }
> -
>  static inline void eeh_add_device_early(struct pci_dn *pdn) { }
>  
>  static inline void eeh_add_device_tree_early(struct pci_dn *pdn) { }
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 217e14bb1fb6..cd2abbe41497 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1166,19 +1166,6 @@ static struct notifier_block eeh_reboot_nb = {
>   .notifier_call = eeh_reboot_notifier,
>  };
>  
> -void eeh_probe_devices(void)
> -{
> - struct pci_controller *hose, *tmp;
> - struct pci_dn *pdn;
> -
> - /* Enable EEH for all adapters */
> - list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
> - pdn = hose->pci_data;
> - traverse_pci_dn(pdn, eeh_ops->probe, NULL);
> - }
> - eeh_show_enabled();
> -}
> -
>  /**
>   * eeh_init - EEH initialization
>   *
> diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
> index f93dd5cf6a39..c40078d036af 100644
> --- a/arch/powerpc/kernel/eeh_cache.c
> +++ b/arch/powerpc/kernel/eeh_cache.c
> @@ -278,38 +278,6 @@ void eeh_addr_cache_init(void)
>   spin_lock_init(&pci_io_addr_cache_root.piar_lock);
>  }
>  
> -/**
> - * eeh_addr_cache_build - Build a cache of I/O addresses
> - *
> - * Build a cache of pci i/o addresses.  This cache will be used to
> - * find the pci device that corresponds to a given address.
> - * This routine scans all pci busses to build the cache.
> - * Must be run late in boot process, after the pci controllers
> - * have been scanned for devices (after all device resources are known).
> - */
> -void eeh_addr_cache_build(void)
> -{
> - struct pci_dn *pdn;
> - struct eeh_dev *edev;
> - struct pci_dev *dev = NULL;
> -
> - for_each_pci_dev(dev) {
> - pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
> - if (!pdn)
> - continue;
> -
> - edev = pdn_to_eeh_dev(pdn);
> - if (!edev)
> - continue;
> -
> - dev->dev.archdata.edev = edev;
> - edev->pdev = dev;
> -
> - eeh_addr_cache_insert_dev(dev);
> - eeh_sysfs_add_device(dev);
> - }
> -}
> -
>  static int eeh_addr_cache_show(struct seq_file *s, void *v)
>  {
>   struct pci_io_addr_range *piar;
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index 81b0923cc55f..6a08f4fab255 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -240,9 +240,7 @@ int pnv_eeh_post_init(void)
>   struct pnv_phb *phb;
>   int ret = 0;
>  
> - /* Probe devices & build address cache */
> - eeh_probe_devices();
> - eeh_addr_cache_build();
> + eeh_show_enabled();
>  
>   /*

Re: [PATCH kernel RFC 2/2] vfio-pci-nvlink2: Implement interconnect isolation

2019-03-19 Thread David Gibson
On Tue, Mar 19, 2019 at 10:36:19AM -0600, Alex Williamson wrote:
> On Fri, 15 Mar 2019 19:18:35 +1100
> Alexey Kardashevskiy  wrote:
> 
> > The NVIDIA V100 SXM2 GPUs are connected to the CPU via PCIe links and
> > (on POWER9) NVLinks. In addition to that, GPUs themselves have direct
> > peer to peer NVLinks in groups of 2 to 4 GPUs. At the moment the POWERNV
> > platform puts all interconnected GPUs to the same IOMMU group.
> > 
> > However the user may want to pass individual GPUs to the userspace so
> > in order to do so we need to put them into separate IOMMU groups and
> > cut off the interconnects.
> > 
> > Thankfully, V100 GPUs implement an interface to do this by programming a
> > link-disabling mask into BAR0 of a GPU. Once a link is disabled in a GPU
> > using this interface, it cannot be re-enabled until a secondary bus reset
> > is issued to the GPU.
> > 
> > This defines a reset_done() handler for V100 NVLink2 devices which
> > determines which links need to be disabled. This relies on the presence
> > of the new "ibm,nvlink-peers" device tree property of a GPU telling which
> > PCI peers it is connected to (which includes NVLink bridges or peer GPUs).
> > 
> > This does not change the existing behaviour and instead adds
> > a new "isolate_nvlink" kernel parameter to allow such isolation.
> > 
> > The alternative approaches would be:
> > 
> > 1. do this in the system firmware (skiboot) but for that we would need
> > to tell skiboot via an additional OPAL call whether or not we want this
> > isolation - skiboot is unaware of IOMMU groups.
> > 
> > 2. do this in the secondary bus reset handler in the POWERNV platform -
> > the problem with that is at that point the device is not enabled, i.e.
> > config space is not restored so we need to enable the device (i.e. MMIO
> > bit in CMD register + program valid address to BAR0) in order to disable
> > links and then perhaps undo all this initialization to bring the device
> > back to the state where pci_try_reset_function() expects it to be.
> 
> The trouble seems to be that this approach only maintains the isolation
> exposed by the IOMMU group when vfio-pci is the active driver for the
> device.  IOMMU groups can be used by any driver and the IOMMU core is
> incorporating groups in various ways.

I don't think that reasoning is quite right.  An IOMMU group doesn't
necessarily represent devices which *are* isolated, just devices which
*can be* isolated.  There are plenty of instances when we don't need
to isolate devices in different IOMMU groups: passing both groups to
the same guest or userspace VFIO driver for example, or indeed when
both groups are owned by regular host kernel drivers.

In at least some of those cases we also don't want to isolate the
devices when we don't have to, usually for performance reasons.

> So, if there's a device specific
> way to configure the isolation reported in the group, which requires
> some sort of active management against things like secondary bus
> resets, then I think we need to manage it above the attached endpoint
> driver.

The problem is that above the endpoint driver, we don't actually have
enough information about what should be isolated.  For VFIO we want to
isolate things if they're in different containers, for most regular
host kernel drivers we don't need to isolate at all (although we might
as well when it doesn't have a cost).  The host side nVidia GPGPU
drivers also won't want to isolate the (host owned) NVLink devices
from each other, since they'll want to use the fast interconnects.

> Ideally I'd see this as a set of PCI quirks so that we might
> leverage it beyond POWER platforms.  I'm not sure how we get past the
> reliance on device tree properties that we won't have on other
> platforms though, if only NVIDIA could at least open a spec addressing
> the discovery and configuration of NVLink registers on their
> devices :-\  Thanks,

Yeah, that'd be nice :/.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 1/8] powerpc/64: Adjust order in pcibios_init()

2019-03-19 Thread Alexey Kardashevskiy



On 20/03/2019 13:58, Sam Bobroff wrote:
> The pcibios_init() function for 64 bit PowerPC currently calls
> pci_bus_add_devices() before pcibios_resource_survey(), which seems
> incorrect because it adds devices and attempts to bind their drivers
> before allocating their resources (although no problems seem to be
> apparent).
> 
> So move the call to pci_bus_add_devices() to after
> pcibios_resource_survey(), while extracting the call to the
> pcibios_fixup() hook so that it remains in the same location.
> 
> This will also allow the ppc_md.pcibios_bus_add_device() hooks to
> perform actions that depend on PCI resources, both during rescanning
> (where this is already the case) and at boot time, to support future
> work.
> 
> Signed-off-by: Sam Bobroff 


Reviewed-by: Alexey Kardashevskiy 



> ---
>  arch/powerpc/kernel/pci-common.c |  4 
>  arch/powerpc/kernel/pci_32.c |  4 
>  arch/powerpc/kernel/pci_64.c | 12 +---
>  3 files changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index ff4b7539cbdf..3146eb73e3b3 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -1383,10 +1383,6 @@ void __init pcibios_resource_survey(void)
>   pr_debug("PCI: Assigning unassigned resources...\n");
>   pci_assign_unassigned_resources();
>   }
> -
> - /* Call machine dependent fixup */
> - if (ppc_md.pcibios_fixup)
> - ppc_md.pcibios_fixup();
>  }
>  
>  /* This is used by the PCI hotplug driver to allocate resource
> diff --git a/arch/powerpc/kernel/pci_32.c b/arch/powerpc/kernel/pci_32.c
> index d3f04f2d8249..40aaa1a6e193 100644
> --- a/arch/powerpc/kernel/pci_32.c
> +++ b/arch/powerpc/kernel/pci_32.c
> @@ -259,6 +259,10 @@ static int __init pcibios_init(void)
>   /* Call common code to handle resource allocation */
>   pcibios_resource_survey();
>  
> + /* Call machine dependent fixup */
> + if (ppc_md.pcibios_fixup)
> + ppc_md.pcibios_fixup();
> +
>   /* Call machine dependent post-init code */
>   if (ppc_md.pcibios_after_init)
>   ppc_md.pcibios_after_init();
> diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
> index 9d8c10d55407..6f16f30031d7 100644
> --- a/arch/powerpc/kernel/pci_64.c
> +++ b/arch/powerpc/kernel/pci_64.c
> @@ -58,14 +58,20 @@ static int __init pcibios_init(void)
>   pci_add_flags(PCI_ENABLE_PROC_DOMAINS | PCI_COMPAT_DOMAIN_0);
>  
>   /* Scan all of the recorded PCI controllers.  */
> - list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
> + list_for_each_entry_safe(hose, tmp, &hose_list, list_node)
>   pcibios_scan_phb(hose);
> - pci_bus_add_devices(hose->bus);
> - }
>  
>   /* Call common code to handle resource allocation */
>   pcibios_resource_survey();
>  
> + /* Add devices. */
> + list_for_each_entry_safe(hose, tmp, &hose_list, list_node)
> + pci_bus_add_devices(hose->bus);
> +
> + /* Call machine dependent fixup */
> + if (ppc_md.pcibios_fixup)
> + ppc_md.pcibios_fixup();
> +
>   printk(KERN_DEBUG "PCI: Probing PCI hardware done\n");
>  
>   return 0;
> 

-- 
Alexey


Re: [PATCH 2/8] powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag

2019-03-19 Thread Alexey Kardashevskiy



On 20/03/2019 13:58, Sam Bobroff wrote:
> The EEH_DEV_NO_HANDLER flag is used by the EEH system to prevent the
> use of driver callbacks in drivers that have been bound part way
> through the recovery process. This is necessary to prevent later stage
> handlers from being called when the earlier stage handlers haven't,
> which can be confusing for drivers.

The flag is used from eeh_pe_report()->eeh_pe_report_edev(), which is
called many times from eeh_handle_normal_event() (where you now clear the
flag unconditionally) and once from eeh_handle_special_event(), so the
latter is actually the only case where the flag matters now. Is my
understanding correct? Also, is not clearing the flag correct in that case?
I do not quite understand the eeh_handle_normal_event() vs.
eeh_handle_special_event() business, though.


> 
> However, the flag is set for all devices that are added after boot
> time and only cleared at the end of the EEH recovery process. This
> results in hot plugged devices erroneously having the flag set during
> the first recovery after they are added (causing their driver's
> handlers to be incorrectly ignored).
> 
> To remedy this, clear the flag at the beginning of recovery
> processing. The flag is still cleared at the end of recovery
> processing, although it is no longer really necessary.

Then maybe remove that redundant clearing?

> 
> Signed-off-by: Sam Bobroff 
> ---
>  arch/powerpc/kernel/eeh_driver.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> index 6f3ee30565dd..4c34b9901f15 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -819,6 +819,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
>   result = PCI_ERS_RESULT_DISCONNECT;
>   }
>  
> + eeh_for_each_pe(pe, tmp_pe)
> + eeh_pe_for_each_dev(tmp_pe, edev, tmp)
> + edev->mode &= ~EEH_DEV_NO_HANDLER;
> +
>   /* Walk the various device drivers attached to this slot through
>* a reset sequence, giving each an opportunity to do what it needs
>* to accomplish the reset.  Each child gets a report of the
> 

-- 
Alexey


Re: [PATCH 3/8] powerpc/eeh: Convert PNV_PHB_FLAG_EEH to global flag

2019-03-19 Thread Alexey Kardashevskiy



On 20/03/2019 13:58, Sam Bobroff wrote:
> The PHB flag, PNV_PHB_FLAG_EEH, is set (on PowerNV) individually on
> each PHB once the EEH subsystem is ready. It is the only use of the
> flags member of the phb struct.


Then why keep pnv_phb::flags?

> However there is no need to store this separately on each PHB, so
> convert it to a global flag. For symmetry, the flag is now also set
> for pSeries; although it is currently unused it may be useful in the
> future.

Just using eeh_enabled() instead of (phb->flags & PNV_PHB_FLAG_EEH)
seems easier and cleaner; also pseries does not use it so there is no
point defining it there either.


> 
> Signed-off-by: Sam Bobroff 
> ---
>  arch/powerpc/include/asm/eeh.h   | 11 +++
>  arch/powerpc/platforms/powernv/eeh-powernv.c | 14 +++---
>  arch/powerpc/platforms/powernv/pci.c |  7 +++
>  arch/powerpc/platforms/powernv/pci.h |  2 --
>  arch/powerpc/platforms/pseries/pci.c |  4 
>  5 files changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 3613a56281f2..fe4cf7208890 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -43,6 +43,7 @@ struct pci_dn;
>  #define EEH_VALID_PE_ZERO      0x10  /* PE#0 is valid          */
>  #define EEH_ENABLE_IO_FOR_LOG  0x20  /* Enable IO for log      */
>  #define EEH_EARLY_DUMP_LOG     0x40  /* Dump log immediately   */
> +#define EEH_PHB_ENABLED        0x80  /* PHB recovery uses EEH  */
>  
>  /*
>   * Delay for PE reset, all in ms
> @@ -245,6 +246,11 @@ static inline bool eeh_enabled(void)
>   return eeh_has_flag(EEH_ENABLED) && !eeh_has_flag(EEH_FORCE_DISABLED);
>  }
>  
> +static inline bool eeh_phb_enabled(void)
> +{
> + return eeh_has_flag(EEH_PHB_ENABLED);
> +}
> +
>  static inline void eeh_serialize_lock(unsigned long *flags)
>  {
>   raw_spin_lock_irqsave(&confirm_error_lock, *flags);
> @@ -332,6 +338,11 @@ static inline bool eeh_enabled(void)
>  return false;
>  }
>  
> +static inline bool eeh_phb_enabled(void)
> +{
> + return false;
> +}
> +
>  static inline void eeh_probe_devices(void) { }
>  
>  static inline void *eeh_dev_init(struct pci_dn *pdn, void *data)
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index 6fc1a463b796..f0a95f663810 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -264,22 +264,14 @@ int pnv_eeh_post_init(void)
>   return ret;
>   }
>  
> - if (!eeh_enabled())
> + if (eeh_enabled())
> + eeh_add_flag(EEH_PHB_ENABLED);
> + else
>   disable_irq(eeh_event_irq);
>  
>   list_for_each_entry(hose, &hose_list, list_node) {
>   phb = hose->private_data;
>  
> - /*
> -  * If EEH is enabled, we're going to rely on that.
> -  * Otherwise, we restore to conventional mechanism
> -  * to clear frozen PE during PCI config access.
> -  */
> - if (eeh_enabled())
> - phb->flags |= PNV_PHB_FLAG_EEH;
> - else
> - phb->flags &= ~PNV_PHB_FLAG_EEH;
> -
>   /* Create debugfs entries */
>  #ifdef CONFIG_DEBUG_FS
>   if (phb->has_dbgfs || !phb->dbgfs)
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index 307181fd8a17..d2b50f3bf6b1 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -717,10 +717,9 @@ int pnv_pci_cfg_write(struct pci_dn *pdn,
>  static bool pnv_pci_cfg_check(struct pci_dn *pdn)
>  {
>   struct eeh_dev *edev = NULL;
> - struct pnv_phb *phb = pdn->phb->private_data;
>  
>   /* EEH not enabled ? */
> - if (!(phb->flags & PNV_PHB_FLAG_EEH))
> + if (!eeh_phb_enabled())
>   return true;
>  
>   /* PE reset or device removed ? */
> @@ -761,7 +760,7 @@ static int pnv_pci_read_config(struct pci_bus *bus,
>  
>   ret = pnv_pci_cfg_read(pdn, where, size, val);
>   phb = pdn->phb->private_data;
> - if (phb->flags & PNV_PHB_FLAG_EEH && pdn->edev) {
> + if (eeh_phb_enabled() && pdn->edev) {
>   if (*val == EEH_IO_ERROR_VALUE(size) &&
>   eeh_dev_check_failure(pdn->edev))
>  return PCIBIOS_DEVICE_NOT_FOUND;
> @@ -789,7 +788,7 @@ static int pnv_pci_write_config(struct pci_bus *bus,
>  
>   ret = pnv_pci_cfg_write(pdn, where, size, val);
>   phb = pdn->phb->private_data;
> - if (!(phb->flags & PNV_PHB_FLAG_EEH))
> + if (!eeh_phb_enabled())
>   pnv_pci_config_check_eeh(pdn);
>  
>   return ret;
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 8

Re: [PATCH 4/8] powerpc/eeh: Improve debug messages around device addition

2019-03-19 Thread Alexey Kardashevskiy



On 20/03/2019 13:58, Sam Bobroff wrote:
> Also remove useless comment.

Reviewed-by: Alexey Kardashevskiy 

> 
> Signed-off-by: Sam Bobroff 
> ---
>  arch/powerpc/kernel/eeh.c|  2 +-
>  arch/powerpc/platforms/powernv/eeh-powernv.c | 14 
>  arch/powerpc/platforms/pseries/eeh_pseries.c | 23 +++-
>  3 files changed, 28 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 8d3c36a1f194..b14d89547895 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1291,7 +1291,7 @@ void eeh_add_device_late(struct pci_dev *dev)
>   pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
>   edev = pdn_to_eeh_dev(pdn);
>   if (edev->pdev == dev) {
> - pr_debug("EEH: Already referenced !\n");
> + pr_debug("EEH: Device %s already referenced!\n", pci_name(dev));
>   return;
>   }
>  
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index f0a95f663810..51c5b6bb9b0e 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -50,10 +50,7 @@ void pnv_pcibios_bus_add_device(struct pci_dev *pdev)
>   if (!pdev->is_virtfn)
>   return;
>  
> - /*
> -  * The following operations will fail if VF's sysfs files
> -  * aren't created or its resources aren't finalized.
> -  */
> + pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
>   eeh_add_device_early(pdn);
>   eeh_add_device_late(pdev);
>   eeh_sysfs_add_device(pdev);
> @@ -389,6 +386,10 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
>   int ret;
>   int config_addr = (pdn->busno << 8) | (pdn->devfn);
>  
> + pr_debug("%s: probing %04x:%02x:%02x.%01x\n",
> + __func__, hose->global_number, pdn->busno,
> + PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
> +
>   /*
>* When probing the root bridge, which doesn't have any
>* subordinate PCI devices. We don't have OF node for
> @@ -483,6 +484,11 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
>   /* Save memory bars */
>   eeh_save_bars(edev);
>  
> + pr_debug("%s: EEH enabled on %02x:%02x.%01x PHB#%x-PE#%x\n",
> + __func__, pdn->busno, PCI_SLOT(pdn->devfn),
> + PCI_FUNC(pdn->devfn), edev->pe->phb->global_number,
> + edev->pe->addr);
> +
>   return NULL;
>  }
>  
> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
> index 7aa50258dd42..ae06878fbdea 100644
> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> @@ -65,6 +65,8 @@ void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
>   if (!pdev->is_virtfn)
>   return;
>  
> + pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
> +
>   pdn->device_id  =  pdev->device;
>   pdn->vendor_id  =  pdev->vendor;
>   pdn->class_code =  pdev->class;
> @@ -251,6 +253,10 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
>   int enable = 0;
>   int ret;
>  
> + pr_debug("%s: probing %04x:%02x:%02x.%01x\n",
> + __func__, pdn->phb->global_number, pdn->busno,
> + PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
> +
>   /* Retrieve OF node and eeh device */
>   edev = pdn_to_eeh_dev(pdn);
>   if (!edev || edev->pe)
> @@ -294,7 +300,12 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
>  
>   /* Enable EEH on the device */
>   ret = eeh_ops->set_option(&pe, EEH_OPT_ENABLE);
> - if (!ret) {
> + if (ret) {
> + pr_debug("%s: EEH failed to enable on %02x:%02x.%01x PHB#%x-PE#%x (code %d)\n",
> + __func__, pdn->busno, PCI_SLOT(pdn->devfn),
> + PCI_FUNC(pdn->devfn), pe.phb->global_number,
> + pe.addr, ret);
> + } else {
>   /* Retrieve PE address */
>   edev->pe_config_addr = eeh_ops->get_pe_addr(&pe);
>   pe.addr = edev->pe_config_addr;
> @@ -310,11 +321,6 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
>   if (enable) {
>   eeh_add_flag(EEH_ENABLED);
>   eeh_add_to_parent_pe(edev);
> -
> - pr_debug("%s: EEH enabled on %02x:%02x.%01x PHB#%x-PE#%x\n",
> - __func__, pdn->busno, PCI_SLOT(pdn->devfn),
> - PCI_FUNC(pdn->devfn), pe.phb->global_number,
> - pe.addr);
>   } else if (pdn->parent && pdn_to_eeh_dev(pdn->parent) &&
>  (pdn_to_eeh_dev(pdn->parent))->pe) {
>   /* This device doesn't support EEH, but it may ha

Re: [PATCH 5/8] powerpc/eeh: Add eeh_show_enabled()

2019-03-19 Thread Alexey Kardashevskiy



On 20/03/2019 13:58, Sam Bobroff wrote:
> Move the EEH enabled message into its own function so that future
> work can call it from multiple places.
> 
> Signed-off-by: Sam Bobroff 
> ---
>  arch/powerpc/include/asm/eeh.h |  3 +++
>  arch/powerpc/kernel/eeh.c  | 16 +++-
>  2 files changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index fe4cf7208890..e217ccda55d0 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -289,6 +289,7 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
>  
>  struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
>  void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
> +void eeh_show_enabled(void);
>  void eeh_probe_devices(void);
>  int __init eeh_ops_register(struct eeh_ops *ops);
>  int __exit eeh_ops_unregister(const char *name);
> @@ -338,6 +339,8 @@ static inline bool eeh_enabled(void)
>  return false;
>  }
>  
> +static inline void eeh_show_enabled(void) { }
> +
>  static inline bool eeh_phb_enabled(void)
>  {
>   return false;
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index b14d89547895..3dcff29cb9b3 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -163,6 +163,16 @@ static int __init eeh_setup(char *str)
>  }
>  __setup("eeh=", eeh_setup);
>  
> +void eeh_show_enabled(void)
> +{
> + if (eeh_has_flag(EEH_FORCE_DISABLED))
> + pr_info("EEH: PCI Enhanced I/O Error Handling DISABLED (by eeh=off)\n");
> + else if (eeh_enabled())


I'd make it eeh_has_flag(EEH_ENABLED) for clarity.


> + pr_info("EEH: PCI Enhanced I/O Error Handling ENABLED (capable adapter found)\n");
> + else
> + pr_info("EEH: PCI Enhanced I/O Error Handling DISABLED (no capable adapter found)\n");
> +}
> +
>  /*
>   * This routine captures assorted PCI configuration space data
>   * for the indicated PCI device, and puts them into a buffer
> @@ -1166,11 +1176,7 @@ void eeh_probe_devices(void)
>   pdn = hose->pci_data;
>   traverse_pci_dn(pdn, eeh_ops->probe, NULL);
>   }
> - if (eeh_enabled())
> - pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n");
> - else
> - pr_info("EEH: No capable adapters found\n");
> -
> + eeh_show_enabled();


This line is moved again later in the series, so I'd just merge this patch
into 8/8 to reduce the number of lines moving within the patchset.

In general, the point of the EEH_ENABLED flag is fading away. Its
meaning now is "at least one device somewhere in the box has EEH
enabled", which does not seem very useful: we have a pci_dev or pe
pretty much everywhere we call eeh_enabled(), and
pdev->dev.archdata.edev can tell whether EEH is enabled for that device.
Although I am pretty sure this is in your list already :)
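A rough userspace model of the distinction drawn above: one global flag that turns on as soon as any single device gets EEH, versus asking the device itself. All struct and function names are hypothetical stand-ins (model_pdev.edev plays the role of pdev->dev.archdata.edev); this is a sketch of the idea, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the kernel's struct eeh_dev / struct pci_dev. */
struct model_edev {
	bool eeh_on;                /* EEH enabled for this particular device */
};

struct model_pdev {
	struct model_edev *edev;    /* plays the role of pdev->dev.archdata.edev */
};

/* Global flag: becomes true once any single device gets EEH. */
static bool model_eeh_enabled;

static void model_probe(struct model_pdev *pdev, bool capable)
{
	assert(pdev->edev != NULL);
	pdev->edev->eeh_on = capable;
	if (capable)
		model_eeh_enabled = true;   /* says nothing about other devices */
}

/* Per-device question, answerable wherever a device is at hand. */
static bool model_dev_eeh_enabled(const struct model_pdev *pdev)
{
	return pdev->edev != NULL && pdev->edev->eeh_on;
}
```

With one capable and one incapable device probed, the global flag is true for both, and only the per-device check tells them apart.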


>  }
>  
>  /**
> 

-- 
Alexey


Re: [PATCH 6/8] powerpc/eeh: Initialize EEH address cache earlier

2019-03-19 Thread Alexey Kardashevskiy



On 20/03/2019 13:58, Sam Bobroff wrote:
> The EEH address cache is currently initialized and populated by a
> single function: eeh_addr_cache_build().  While the initial population
> of the cache can only be done once resources are allocated,
> initialization (just setting up a spinlock) could be done much
> earlier.
> 
> So move the initialization step into a separate function and call it
> from a core_initcall (rather than a subsys initcall).
> 
> This will allow future work to make use of the cache during boot time
> PCI scanning.
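A userspace sketch of why the split helps. Once the lock is set up early, code that runs before the bulk build (for example, boot-time scanning) can already insert entries safely. Everything here is hypothetical: the model_cache_* names, a fixed-size array in place of the real cache structure, and a C11 atomic_flag spinlock standing in for the kernel's spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_ENTRIES 16

/* Hypothetical model of the cache root; not the kernel's eeh_cache.c. */
struct model_addr_cache {
	atomic_flag lock;                 /* stands in for piar_lock */
	bool lock_ready;                  /* records that init has run */
	unsigned long addrs[MODEL_MAX_ENTRIES];
	size_t n;
};

static struct model_addr_cache cache_root;

/* Cheap step: only sets up the lock, so it can run very early. */
static void model_cache_init(void)
{
	atomic_flag_clear(&cache_root.lock);
	cache_root.lock_ready = true;
}

/* Usable by anything running after init, before or after the build. */
static void model_cache_insert(unsigned long addr)
{
	assert(cache_root.lock_ready);    /* init must already have happened */
	while (atomic_flag_test_and_set(&cache_root.lock))
		;                         /* spin until the lock is free */
	if (cache_root.n < MODEL_MAX_ENTRIES)
		cache_root.addrs[cache_root.n++] = addr;
	atomic_flag_clear(&cache_root.lock);
}

/* Later step: bulk population once resources are known. */
static void model_cache_build(const unsigned long *addrs, size_t count)
{
	for (size_t i = 0; i < count; i++)
		model_cache_insert(addrs[i]);
}
```

Here an early insert and the later bulk build share one lock, which is only safe because initialization no longer lives inside the build step.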
> 
> Signed-off-by: Sam Bobroff 

Reviewed-by: Alexey Kardashevskiy 

> ---
>  arch/powerpc/include/asm/eeh.h  |  3 +++
>  arch/powerpc/kernel/eeh.c   |  2 ++
>  arch/powerpc/kernel/eeh_cache.c | 13 +++--
>  3 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index e217ccda55d0..791b9e6fcc45 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -295,6 +295,7 @@ int __init eeh_ops_register(struct eeh_ops *ops);
>  int __exit eeh_ops_unregister(const char *name);
>  int eeh_check_failure(const volatile void __iomem *token);
>  int eeh_dev_check_failure(struct eeh_dev *edev);
> +void eeh_addr_cache_init(void);
>  void eeh_addr_cache_build(void);
>  void eeh_add_device_early(struct pci_dn *);
>  void eeh_add_device_tree_early(struct pci_dn *);
> @@ -362,6 +363,8 @@ static inline int eeh_check_failure(const volatile void __iomem *token)
>  
>  #define eeh_dev_check_failure(x) (0)
>  
> +static inline void eeh_addr_cache_init(void) { }
> +
>  static inline void eeh_addr_cache_build(void) { }
>  
>  static inline void eeh_add_device_early(struct pci_dn *pdn) { }
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 3dcff29cb9b3..7a406d58d2c0 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1219,6 +1219,8 @@ static int eeh_init(void)
>   list_for_each_entry_safe(hose, tmp, &hose_list, list_node)
>   eeh_dev_phb_init_dynamic(hose);
>  
> + eeh_addr_cache_init();
> +
>   /* Initialize EEH event */
>   return eeh_event_init();
>  }
> diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
> index 9c68f0837385..f93dd5cf6a39 100644
> --- a/arch/powerpc/kernel/eeh_cache.c
> +++ b/arch/powerpc/kernel/eeh_cache.c
> @@ -267,6 +267,17 @@ void eeh_addr_cache_rmv_dev(struct pci_dev *dev)
>   spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
>  }
>  
> +/**
> + * eeh_addr_cache_init - Initialize a cache of I/O addresses
> + *
> + * Initialize a cache of pci i/o addresses.  This cache will be used to
> + * find the pci device that corresponds to a given address.
> + */
> +void eeh_addr_cache_init(void)
> +{
> + spin_lock_init(&pci_io_addr_cache_root.piar_lock);
> +}
> +
>  /**
>   * eeh_addr_cache_build - Build a cache of I/O addresses
>   *
> @@ -282,8 +293,6 @@ void eeh_addr_cache_build(void)
>   struct eeh_dev *edev;
>   struct pci_dev *dev = NULL;
>  
> - spin_lock_init(&pci_io_addr_cache_root.piar_lock);
> -
>   for_each_pci_dev(dev) {
>   pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
>   if (!pdn)
> 

-- 
Alexey


Re: [PATCH 7/8] powerpc/eeh: EEH for pSeries hot plug

2019-03-19 Thread Alexey Kardashevskiy



On 20/03/2019 13:58, Sam Bobroff wrote:
> On PowerNV and pSeries, devices currently acquire EEH support from
> several different places: boot-time devices from eeh_probe_devices()
> and eeh_addr_cache_build(), Virtual Function devices from the pcibios
> bus add device hooks, and hot plugged devices from pci_hp_add_devices()
> (other platforms use other methods as well).  Unfortunately,
> pSeries machines currently discover hot plugged devices using
> pci_rescan_bus(), not pci_hp_add_devices(), and so those devices do
> not receive EEH support.
> 
> Rather than adding another case for pci_rescan_bus(), this change
> widens the scope of the pcibios bus add device hooks so that they can
> handle all devices. As a side effect this also supports devices
> discovered after manually rescanning via /sys/bus/pci/rescan.
> 
> Note that on PowerNV, this change allows the EEH subsystem to become
> enabled after boot as long as it has not been forced off, which was
> not previously possible (it was already possible on pSeries).
> 
> Signed-off-by: Sam Bobroff 
> ---
>  arch/powerpc/kernel/eeh.c|  2 +-
>  arch/powerpc/kernel/of_platform.c|  3 +-
>  arch/powerpc/platforms/powernv/eeh-powernv.c |  8 ++-
>  arch/powerpc/platforms/pseries/eeh_pseries.c | 54 ++--
>  4 files changed, 35 insertions(+), 32 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 7a406d58d2c0..217e14bb1fb6 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1291,7 +1291,7 @@ void eeh_add_device_late(struct pci_dev *dev)
>   struct pci_dn *pdn;
>   struct eeh_dev *edev;
>  
> - if (!dev || !eeh_enabled())
> + if (!dev)
>   return;
>  
>   pr_debug("EEH: Adding device %s\n", pci_name(dev));
> diff --git a/arch/powerpc/kernel/of_platform.c b/arch/powerpc/kernel/of_platform.c
> index becaec990140..d5818e9c4069 100644
> --- a/arch/powerpc/kernel/of_platform.c
> +++ b/arch/powerpc/kernel/of_platform.c
> @@ -86,7 +86,8 @@ static int of_pci_phb_probe(struct platform_device *dev)
>   pcibios_claim_one_bus(phb->bus);
>  
>   /* Finish EEH setup */
> - eeh_add_device_tree_late(phb->bus);
> + if (!eeh_has_flag(EEH_FORCE_DISABLED))
> + eeh_add_device_tree_late(phb->bus);
>  
>   /* Add probed PCI devices to the device model */
>   pci_bus_add_devices(phb->bus);
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index 51c5b6bb9b0e..81b0923cc55f 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -47,7 +47,7 @@ void pnv_pcibios_bus_add_device(struct pci_dev *pdev)
>  {
>   struct pci_dn *pdn = pci_get_pdn(pdev);
>  
> - if (!pdev->is_virtfn)
> + if (eeh_has_flag(EEH_FORCE_DISABLED))
>   return;
>  
>   pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
> @@ -479,7 +479,11 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
>* Enable EEH explicitly so that we will do EEH check
>* while accessing I/O stuff
>*/
> - eeh_add_flag(EEH_ENABLED);
> + if (!eeh_has_flag(EEH_ENABLED)) {
> + enable_irq(eeh_event_irq);
> + eeh_add_flag(EEH_PHB_ENABLED);


Except that I do not think we need EEH_PHB_ENABLED (commented elsewhere),

Reviewed-by: Alexey Kardashevskiy 




> + eeh_add_flag(EEH_ENABLED);
> + }
>  
>   /* Save memory bars */
>   eeh_save_bars(edev);
> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
> index ae06878fbdea..e68c79164974 100644
> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> @@ -55,44 +55,44 @@ static int ibm_get_config_addr_info;
>  static int ibm_get_config_addr_info2;
>  static int ibm_configure_pe;
>  
> -#ifdef CONFIG_PCI_IOV
>  void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
>  {
>   struct pci_dn *pdn = pci_get_pdn(pdev);
> - struct pci_dn *physfn_pdn;
> - struct eeh_dev *edev;
>  
> - if (!pdev->is_virtfn)
> + if (eeh_has_flag(EEH_FORCE_DISABLED))
>   return;
>  
>   pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
> +#ifdef CONFIG_PCI_IOV
> + if (pdev->is_virtfn) {
> + struct pci_dn *physfn_pdn;
>  
> - pdn->device_id  =  pdev->device;
> - pdn->vendor_id  =  pdev->vendor;
> - pdn->class_code =  pdev->class;
> - /*
> -  * Last allow unfreeze return code used for retrieval
> -  * by user space in eeh-sysfs to show the last command
> -  * completion from platform.
> -  */
> - pdn->last_allow_rc =  0;
> - physfn_pdn  =  pci_get_pdn(pdev->physfn);
> - pdn->pe_number  =  physfn_pdn->pe_num_map[pdn->vf_index];
> - edev = pdn_to_eeh_dev(pdn);
> -

[PATCH v3 4/5] ocxl: Remove superfluous 'extern' from headers

2019-03-19 Thread Alastair D'Silva
From: Alastair D'Silva 

The 'extern' keyword adds no value on function declarations, which have
external linkage by default.
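A small illustration of why this is safe for function prototypes, and why the ocxl_internal.h hunk below still leaves `extern struct pci_driver ocxl_pci_driver;` alone. The names model_add and model_count are hypothetical.

```c
#include <assert.h>

/* Two declarations of the same function: 'extern' is implied for
 * functions, so these are equivalent and may both appear. */
extern int model_add(int a, int b);
int model_add(int a, int b);

int model_add(int a, int b)
{
	return a + b;
}

/* For objects the keyword is meaningful. In a header, a plain
 * 'int model_count;' would be a tentative definition in every file
 * that includes it (a duplicate-symbol link error under -fno-common),
 * while the 'extern' form below only declares the object. */
extern int model_count;     /* declaration, header-style */
int model_count = 3;        /* the single definition, in one .c file */
```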

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/ocxl_internal.h | 54 +++
 include/misc/ocxl.h   | 36 ++---
 2 files changed, 44 insertions(+), 46 deletions(-)

diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index a32f2151029f..321b29e77f45 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -16,7 +16,6 @@
 
 extern struct pci_driver ocxl_pci_driver;
 
-
 struct ocxl_fn {
struct device dev;
int bar_used[3];
@@ -92,41 +91,40 @@ struct ocxl_process_element {
__be32 software_state;
 };
 
+struct ocxl_afu *ocxl_afu_get(struct ocxl_afu *afu);
+void ocxl_afu_put(struct ocxl_afu *afu);
 
-extern struct ocxl_afu *ocxl_afu_get(struct ocxl_afu *afu);
-extern void ocxl_afu_put(struct ocxl_afu *afu);
-
-extern int ocxl_create_cdev(struct ocxl_afu *afu);
-extern void ocxl_destroy_cdev(struct ocxl_afu *afu);
-extern int ocxl_register_afu(struct ocxl_afu *afu);
-extern void ocxl_unregister_afu(struct ocxl_afu *afu);
+int ocxl_create_cdev(struct ocxl_afu *afu);
+void ocxl_destroy_cdev(struct ocxl_afu *afu);
+int ocxl_register_afu(struct ocxl_afu *afu);
+void ocxl_unregister_afu(struct ocxl_afu *afu);
 
-extern int ocxl_file_init(void);
-extern void ocxl_file_exit(void);
+int ocxl_file_init(void);
+void ocxl_file_exit(void);
 
-extern int ocxl_pasid_afu_alloc(struct ocxl_fn *fn, u32 size);
-extern void ocxl_pasid_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
-extern int ocxl_actag_afu_alloc(struct ocxl_fn *fn, u32 size);
-extern void ocxl_actag_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
+int ocxl_pasid_afu_alloc(struct ocxl_fn *fn, u32 size);
+void ocxl_pasid_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
+int ocxl_actag_afu_alloc(struct ocxl_fn *fn, u32 size);
+void ocxl_actag_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
 
-extern struct ocxl_context *ocxl_context_alloc(void);
-extern int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
+struct ocxl_context *ocxl_context_alloc(void);
+int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
struct address_space *mapping);
-extern int ocxl_context_attach(struct ocxl_context *ctx, u64 amr);
-extern int ocxl_context_mmap(struct ocxl_context *ctx,
+int ocxl_context_attach(struct ocxl_context *ctx, u64 amr);
+int ocxl_context_mmap(struct ocxl_context *ctx,
struct vm_area_struct *vma);
-extern int ocxl_context_detach(struct ocxl_context *ctx);
-extern void ocxl_context_detach_all(struct ocxl_afu *afu);
-extern void ocxl_context_free(struct ocxl_context *ctx);
+int ocxl_context_detach(struct ocxl_context *ctx);
+void ocxl_context_detach_all(struct ocxl_afu *afu);
+void ocxl_context_free(struct ocxl_context *ctx);
 
-extern int ocxl_sysfs_add_afu(struct ocxl_afu *afu);
-extern void ocxl_sysfs_remove_afu(struct ocxl_afu *afu);
+int ocxl_sysfs_add_afu(struct ocxl_afu *afu);
+void ocxl_sysfs_remove_afu(struct ocxl_afu *afu);
 
-extern int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset);
-extern int ocxl_afu_irq_free(struct ocxl_context *ctx, u64 irq_offset);
-extern void ocxl_afu_irq_free_all(struct ocxl_context *ctx);
-extern int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, u64 irq_offset,
+int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset);
+int ocxl_afu_irq_free(struct ocxl_context *ctx, u64 irq_offset);
+void ocxl_afu_irq_free_all(struct ocxl_context *ctx);
+int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, u64 irq_offset,
int eventfd);
-extern u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, u64 irq_offset);
+u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, u64 irq_offset);
 
 #endif /* _OCXL_INTERNAL_H_ */
diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index 9ff6ddc28e22..4544573cc93c 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -53,7 +53,7 @@ struct ocxl_fn_config {
  * Read the configuration space of a function and fill in a
  * ocxl_fn_config structure with all the function details
  */
-extern int ocxl_config_read_function(struct pci_dev *dev,
+int ocxl_config_read_function(struct pci_dev *dev,
struct ocxl_fn_config *fn);
 
 /*
@@ -62,14 +62,14 @@ extern int ocxl_config_read_function(struct pci_dev *dev,
  * AFU indexes can be sparse, so a driver should check all indexes up
  * to the maximum found in the function description
  */
-extern int ocxl_config_check_afu_index(struct pci_dev *dev,
+int ocxl_config_check_afu_index(struct pci_dev *dev,
struct ocxl_fn_config *fn, int afu_idx);
 
 /*
  * Read the configuration space of a function for the AFU specified by
  * the index 'afu_idx'. Fills in a ocxl_afu_config structure
  */
-extern int ocxl_config_read_afu(struct pci_dev *dev,
+int ocxl_config_read_af

[PATCH v3 0/5] ocxl: OpenCAPI Cleanup

2019-03-19 Thread Alastair D'Silva
From: Alastair D'Silva 

Some minor cleanups for the OpenCAPI driver as a prerequisite
for an ocxl driver refactoring to allow the driver core to
be utilised by external drivers.

Changelog:
V3:
  - Add missed header in 'ocxl: Remove some unused exported symbols'.
This addresses the sparse warnings that patch introduced
V2:
  - remove intermediate assignment of 'link' var in
'Rename struct link to ocxl_link'
  - Don't shift definition of ocxl_context_attach in
'Remove some unused exported symbols'

Alastair D'Silva (5):
  ocxl: Rename struct link to ocxl_link
  ocxl: Clean up printf formats
  ocxl: read_pasid never returns an error, so make it void
  ocxl: Remove superfluous 'extern' from headers
  ocxl: Remove some unused exported symbols

 drivers/misc/ocxl/config.c| 17 ++-
 drivers/misc/ocxl/context.c   |  2 +-
 drivers/misc/ocxl/file.c  |  5 +-
 drivers/misc/ocxl/link.c  | 36 ++---
 drivers/misc/ocxl/ocxl_internal.h | 85 +++
 drivers/misc/ocxl/trace.h | 10 ++--
 include/misc/ocxl.h   | 53 ++-
 7 files changed, 99 insertions(+), 109 deletions(-)

-- 
2.20.1



[PATCH v3 5/5] ocxl: Remove some unused exported symbols

2019-03-19 Thread Alastair D'Silva
From: Alastair D'Silva 

Remove some unused exported symbols.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/config.c|  2 --
 drivers/misc/ocxl/ocxl_internal.h | 23 +++
 include/misc/ocxl.h   | 23 ---
 3 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index 026ac2ac4f9c..c90c2e4875bf 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -299,7 +299,6 @@ int ocxl_config_check_afu_index(struct pci_dev *dev,
}
return 1;
 }
-EXPORT_SYMBOL_GPL(ocxl_config_check_afu_index);
 
 static int read_afu_name(struct pci_dev *dev, struct ocxl_fn_config *fn,
struct ocxl_afu_config *afu)
@@ -535,7 +534,6 @@ int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count)
 {
return pnv_ocxl_get_pasid_count(dev, count);
 }
-EXPORT_SYMBOL_GPL(ocxl_config_get_pasid_info);
 
 void ocxl_config_set_afu_pasid(struct pci_dev *dev, int pos, int pasid_base,
u32 pasid_count_log)
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index 321b29e77f45..06fd98c989c8 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -107,6 +107,29 @@ void ocxl_pasid_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
 int ocxl_actag_afu_alloc(struct ocxl_fn *fn, u32 size);
 void ocxl_actag_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
 
+/*
+ * Get the max PASID value that can be used by the function
+ */
+int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count);
+
+/*
+ * Check if an AFU index is valid for the given function.
+ *
+ * AFU indexes can be sparse, so a driver should check all indexes up
+ * to the maximum found in the function description
+ */
+int ocxl_config_check_afu_index(struct pci_dev *dev,
+   struct ocxl_fn_config *fn, int afu_idx);
+
+/**
+ * Update values within a Process Element
+ *
+ * link_handle: the link handle associated with the process element
+ * pasid: the PASID for the AFU context
+ * tid: the new thread id for the process element
+ */
+int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
+
 struct ocxl_context *ocxl_context_alloc(void);
 int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
struct address_space *mapping);
diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index 4544573cc93c..9530d3be1b30 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -56,15 +56,6 @@ struct ocxl_fn_config {
 int ocxl_config_read_function(struct pci_dev *dev,
struct ocxl_fn_config *fn);
 
-/*
- * Check if an AFU index is valid for the given function.
- *
- * AFU indexes can be sparse, so a driver should check all indexes up
- * to the maximum found in the function description
- */
-int ocxl_config_check_afu_index(struct pci_dev *dev,
-   struct ocxl_fn_config *fn, int afu_idx);
-
 /*
  * Read the configuration space of a function for the AFU specified by
  * the index 'afu_idx'. Fills in a ocxl_afu_config structure
@@ -74,11 +65,6 @@ int ocxl_config_read_afu(struct pci_dev *dev,
struct ocxl_afu_config *afu,
u8 afu_idx);
 
-/*
- * Get the max PASID value that can be used by the function
- */
-int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count);
-
 /*
  * Tell an AFU, by writing in the configuration space, the PASIDs that
  * it can use. Range starts at 'pasid_base' and its size is a multiple
@@ -188,15 +174,6 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
void *xsl_err_data);
 
-/**
- * Update values within a Process Element
- *
- * link_handle: the link handle associated with the process element
- * pasid: the PASID for the AFU context
- * tid: the new thread id for the process element
- */
-int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
-
 /*
  * Remove a Process Element from the Shared Process Area for a link
  */
-- 
2.20.1



[PATCH v3 3/5] ocxl: read_pasid never returns an error, so make it void

2019-03-19 Thread Alastair D'Silva
From: Alastair D'Silva 

No need for a return value in read_pasid as it only returns 0.

Signed-off-by: Alastair D'Silva 
Reviewed-by: Greg Kurz 
---
 drivers/misc/ocxl/config.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index 0ee7856b033d..026ac2ac4f9c 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -68,7 +68,7 @@ static int find_dvsec_afu_ctrl(struct pci_dev *dev, u8 afu_idx)
return 0;
 }
 
-static int read_pasid(struct pci_dev *dev, struct ocxl_fn_config *fn)
+static void read_pasid(struct pci_dev *dev, struct ocxl_fn_config *fn)
 {
u16 val;
int pos;
@@ -89,7 +89,6 @@ static int read_pasid(struct pci_dev *dev, struct ocxl_fn_config *fn)
 out:
dev_dbg(&dev->dev, "PASID capability:\n");
dev_dbg(&dev->dev, "  Max PASID log = %d\n", fn->max_pasid_log);
-   return 0;
 }
 
 static int read_dvsec_tl(struct pci_dev *dev, struct ocxl_fn_config *fn)
@@ -205,11 +204,7 @@ int ocxl_config_read_function(struct pci_dev *dev, struct ocxl_fn_config *fn)
 {
int rc;
 
-   rc = read_pasid(dev, fn);
-   if (rc) {
-   dev_err(&dev->dev, "Invalid PASID configuration: %d\n", rc);
-   return -ENODEV;
-   }
+   read_pasid(dev, fn);
 
rc = read_dvsec_tl(dev, fn);
if (rc) {
-- 
2.20.1



[PATCH v3 2/5] ocxl: Clean up printf formats

2019-03-19 Thread Alastair D'Silva
From: Alastair D'Silva 

Use the %# format flag instead of a literal '0x' prefix.
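A small userspace check of the one case where the two spellings differ in ISO C: for zero, "%#x" prints "0" while "0x%x" prints "0x0". The helper name formats_agree() is hypothetical. The kernel's own vsnprintf() is a separate implementation, so its behaviour for zero should be confirmed in lib/vsprintf.c rather than assumed from this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true when "0x%x" and "%#x" print the same text for v (ISO C printf). */
static bool formats_agree(unsigned int v)
{
	char lit[16], alt[16];

	snprintf(lit, sizeof(lit), "0x%x", v);  /* prefix always printed */
	snprintf(alt, sizeof(alt), "%#x", v);   /* prefix only when v != 0 */
	return strcmp(lit, alt) == 0;
}
```

For nonzero values such as 0xff both forms print "0xff"; only a zero value tells them apart.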

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/config.c  |  6 +++---
 drivers/misc/ocxl/context.c |  2 +-
 drivers/misc/ocxl/trace.h   | 10 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index 8f2c5d8bd2ee..0ee7856b033d 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -178,9 +178,9 @@ static int read_dvsec_vendor(struct pci_dev *dev)
pci_read_config_dword(dev, pos + OCXL_DVSEC_VENDOR_DLX_VERS, &dlx);
 
dev_dbg(&dev->dev, "Vendor specific DVSEC:\n");
-   dev_dbg(&dev->dev, "  CFG version = 0x%x\n", cfg);
-   dev_dbg(&dev->dev, "  TLX version = 0x%x\n", tlx);
-   dev_dbg(&dev->dev, "  DLX version = 0x%x\n", dlx);
+   dev_dbg(&dev->dev, "  CFG version = %#x\n", cfg);
+   dev_dbg(&dev->dev, "  TLX version = %#x\n", tlx);
+   dev_dbg(&dev->dev, "  DLX version = %#x\n", dlx);
return 0;
 }
 
diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index c10a940e3b38..3498a0199bde 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -134,7 +134,7 @@ static vm_fault_t ocxl_mmap_fault(struct vm_fault *vmf)
vm_fault_t ret;
 
offset = vmf->pgoff << PAGE_SHIFT;
-   pr_debug("%s: pasid %d address 0x%lx offset 0x%llx\n", __func__,
+   pr_debug("%s: pasid %d address %#lx offset %#llx\n", __func__,
ctx->pasid, vmf->address, offset);
 
if (offset < ctx->afu->irq_base_offset)
diff --git a/drivers/misc/ocxl/trace.h b/drivers/misc/ocxl/trace.h
index bcb7ff330c1e..68bf2f173a1a 100644
--- a/drivers/misc/ocxl/trace.h
+++ b/drivers/misc/ocxl/trace.h
@@ -28,7 +28,7 @@ DECLARE_EVENT_CLASS(ocxl_context,
__entry->tidr = tidr;
),
 
-   TP_printk("linux pid=%d spa=0x%p pasid=0x%x pidr=0x%x tidr=0x%x",
+   TP_printk("linux pid=%d spa=%p pasid=%#x pidr=%#x tidr=%#x",
__entry->pid,
__entry->spa,
__entry->pasid,
@@ -61,7 +61,7 @@ TRACE_EVENT(ocxl_terminate_pasid,
__entry->rc = rc;
),
 
-   TP_printk("pasid=0x%x rc=%d",
+   TP_printk("pasid=%#x rc=%d",
__entry->pasid,
__entry->rc
)
@@ -87,7 +87,7 @@ DECLARE_EVENT_CLASS(ocxl_fault_handler,
__entry->tfc = tfc;
),
 
-   TP_printk("spa=%p pe=0x%llx dsisr=0x%llx dar=0x%llx tfc=0x%llx",
+   TP_printk("spa=%p pe=%#llx dsisr=%#llx dar=%#llx tfc=%#llx",
__entry->spa,
__entry->pe,
__entry->dsisr,
@@ -127,7 +127,7 @@ TRACE_EVENT(ocxl_afu_irq_alloc,
__entry->irq_offset = irq_offset;
),
 
-   TP_printk("pasid=0x%x irq_id=%d virq=%u hw_irq=%d irq_offset=0x%llx",
+   TP_printk("pasid=%#x irq_id=%d virq=%u hw_irq=%d irq_offset=%#llx",
__entry->pasid,
__entry->irq_id,
__entry->virq,
@@ -150,7 +150,7 @@ TRACE_EVENT(ocxl_afu_irq_free,
__entry->irq_id = irq_id;
),
 
-   TP_printk("pasid=0x%x irq_id=%d",
+   TP_printk("pasid=%#x irq_id=%d",
__entry->pasid,
__entry->irq_id
)
-- 
2.20.1



[PATCH v3 1/5] ocxl: Rename struct link to ocxl_link

2019-03-19 Thread Alastair D'Silva
From: Alastair D'Silva 

The term 'link' is ambiguous (especially when the struct is used for a
list), so rename it for clarity.

Signed-off-by: Alastair D'Silva 
Reviewed-by: Greg Kurz 
---
 drivers/misc/ocxl/file.c |  5 ++---
 drivers/misc/ocxl/link.c | 36 ++--
 2 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index e6a607488f8a..009e09b7ded5 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -151,10 +151,9 @@ static long afu_ioctl_enable_p9_wait(struct ocxl_context *ctx,
mutex_unlock(&ctx->status_mutex);
 
if (status == ATTACHED) {
-   int rc;
-   struct link *link = ctx->afu->fn->link;
+   int rc = ocxl_link_update_pe(ctx->afu->fn->link,
+   ctx->pasid, ctx->tidr);
 
-   rc = ocxl_link_update_pe(link, ctx->pasid, ctx->tidr);
if (rc)
return rc;
}
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index d50b861d7e57..8d2690a1a9de 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -76,7 +76,7 @@ struct spa {
  * limited number of opencapi slots on a system and lookup is only
  * done when the device is probed
  */
-struct link {
+struct ocxl_link {
struct list_head list;
struct kref ref;
int domain;
@@ -179,7 +179,7 @@ static void xsl_fault_handler_bh(struct work_struct *fault_work)
 
 static irqreturn_t xsl_fault_handler(int irq, void *data)
 {
-   struct link *link = (struct link *) data;
+   struct ocxl_link *link = (struct ocxl_link *) data;
struct spa *spa = link->spa;
u64 dsisr, dar, pe_handle;
struct pe_data *pe_data;
@@ -256,7 +256,7 @@ static int map_irq_registers(struct pci_dev *dev, struct spa *spa)
&spa->reg_tfc, &spa->reg_pe_handle);
 }
 
-static int setup_xsl_irq(struct pci_dev *dev, struct link *link)
+static int setup_xsl_irq(struct pci_dev *dev, struct ocxl_link *link)
 {
struct spa *spa = link->spa;
int rc;
@@ -311,7 +311,7 @@ static int setup_xsl_irq(struct pci_dev *dev, struct link *link)
return rc;
 }
 
-static void release_xsl_irq(struct link *link)
+static void release_xsl_irq(struct ocxl_link *link)
 {
struct spa *spa = link->spa;
 
@@ -323,7 +323,7 @@ static void release_xsl_irq(struct link *link)
unmap_irq_registers(spa);
 }
 
-static int alloc_spa(struct pci_dev *dev, struct link *link)
+static int alloc_spa(struct pci_dev *dev, struct ocxl_link *link)
 {
struct spa *spa;
 
@@ -350,7 +350,7 @@ static int alloc_spa(struct pci_dev *dev, struct link *link)
return 0;
 }
 
-static void free_spa(struct link *link)
+static void free_spa(struct ocxl_link *link)
 {
struct spa *spa = link->spa;
 
@@ -364,12 +364,12 @@ static void free_spa(struct link *link)
}
 }
 
-static int alloc_link(struct pci_dev *dev, int PE_mask, struct link **out_link)
+static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_link)
 {
-   struct link *link;
+   struct ocxl_link *link;
int rc;
 
-   link = kzalloc(sizeof(struct link), GFP_KERNEL);
+   link = kzalloc(sizeof(struct ocxl_link), GFP_KERNEL);
if (!link)
return -ENOMEM;
 
@@ -405,7 +405,7 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct link **out_link)
return rc;
 }
 
-static void free_link(struct link *link)
+static void free_link(struct ocxl_link *link)
 {
release_xsl_irq(link);
free_spa(link);
@@ -415,7 +415,7 @@ static void free_link(struct link *link)
 int ocxl_link_setup(struct pci_dev *dev, int PE_mask, void **link_handle)
 {
int rc = 0;
-   struct link *link;
+   struct ocxl_link *link;
 
mutex_lock(&links_list_lock);
list_for_each_entry(link, &links_list, list) {
@@ -442,7 +442,7 @@ EXPORT_SYMBOL_GPL(ocxl_link_setup);
 
 static void release_xsl(struct kref *ref)
 {
-   struct link *link = container_of(ref, struct link, ref);
+   struct ocxl_link *link = container_of(ref, struct ocxl_link, ref);
 
list_del(&link->list);
/* call platform code before releasing data */
@@ -452,7 +452,7 @@ static void release_xsl(struct kref *ref)
 
 void ocxl_link_release(struct pci_dev *dev, void *link_handle)
 {
-   struct link *link = (struct link *) link_handle;
+   struct ocxl_link *link = (struct ocxl_link *) link_handle;
 
mutex_lock(&links_list_lock);
kref_put(&link->ref, release_xsl);
@@ -488,7 +488,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
void *xsl_err_data)
 {
-   struct link *link = (struct link *) link_handle;
+   

[PATCH v2 7/7] ocxl: Provide global MMIO accessors for external drivers

2019-03-19 Thread Alastair D'Silva
From: Alastair D'Silva 

External drivers that communicate via OpenCAPI will need to make
MMIO calls to interact with the devices.

Signed-off-by: Alastair D'Silva 
Reviewed-by: Greg Kurz 
---
 drivers/misc/ocxl/Makefile |   2 +-
 drivers/misc/ocxl/mmio.c   | 234 +
 include/misc/ocxl.h| 110 +
 3 files changed, 345 insertions(+), 1 deletion(-)
 create mode 100644 drivers/misc/ocxl/mmio.c

diff --git a/drivers/misc/ocxl/Makefile b/drivers/misc/ocxl/Makefile
index bc4e39bfda7b..d07d1bb8e8d4 100644
--- a/drivers/misc/ocxl/Makefile
+++ b/drivers/misc/ocxl/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0+
 ccflags-$(CONFIG_PPC_WERROR)   += -Werror
 
-ocxl-y += main.o pci.o config.o file.o pasid.o
+ocxl-y += main.o pci.o config.o file.o pasid.o mmio.o
 ocxl-y += link.o context.o afu_irq.o sysfs.o trace.o
 ocxl-y += core.o
 obj-$(CONFIG_OCXL) += ocxl.o
diff --git a/drivers/misc/ocxl/mmio.c b/drivers/misc/ocxl/mmio.c
new file mode 100644
index ..7f6ebae1c6c7
--- /dev/null
+++ b/drivers/misc/ocxl/mmio.c
@@ -0,0 +1,234 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright 2017 IBM Corp.
+#include 
+#include "trace.h"
+#include "ocxl_internal.h"
+
+int ocxl_global_mmio_read32(struct ocxl_afu *afu, size_t offset,
+   enum ocxl_endian endian, u32 *val)
+{
+   if (offset > afu->config.global_mmio_size - 4)
+   return -EINVAL;
+
+#ifdef __BIG_ENDIAN__
+   if (endian == OCXL_HOST_ENDIAN)
+   endian = OCXL_BIG_ENDIAN;
+#endif
+
+   switch (endian) {
+   case OCXL_BIG_ENDIAN:
+   *val = readl_be((char *)afu->global_mmio_ptr + offset);
+   break;
+
+   default:
+   *val = readl((char *)afu->global_mmio_ptr + offset);
+   break;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(ocxl_global_mmio_read32);
+
+int ocxl_global_mmio_read64(struct ocxl_afu *afu, size_t offset,
+   enum ocxl_endian endian, u64 *val)
+{
+   if (offset > afu->config.global_mmio_size - 8)
+   return -EINVAL;
+
+#ifdef __BIG_ENDIAN__
+   if (endian == OCXL_HOST_ENDIAN)
+   endian = OCXL_BIG_ENDIAN;
+#endif
+
+   switch (endian) {
+   case OCXL_BIG_ENDIAN:
+   *val = readq_be((char *)afu->global_mmio_ptr + offset);
+   break;
+
+   default:
+   *val = readq((char *)afu->global_mmio_ptr + offset);
+   break;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(ocxl_global_mmio_read64);
+
+int ocxl_global_mmio_write32(struct ocxl_afu *afu, size_t offset,
+   enum ocxl_endian endian, u32 val)
+{
+   if (offset > afu->config.global_mmio_size - 4)
+   return -EINVAL;
+
+#ifdef __BIG_ENDIAN__
+   if (endian == OCXL_HOST_ENDIAN)
+   endian = OCXL_BIG_ENDIAN;
+#endif
+
+   switch (endian) {
+   case OCXL_BIG_ENDIAN:
+   writel_be(val, (char *)afu->global_mmio_ptr + offset);
+   break;
+
+   default:
+   writel(val, (char *)afu->global_mmio_ptr + offset);
+   break;
+   }
+
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(ocxl_global_mmio_write32);
+
+int ocxl_global_mmio_write64(struct ocxl_afu *afu, size_t offset,
+   enum ocxl_endian endian, u64 val)
+{
+   if (offset > afu->config.global_mmio_size - 8)
+   return -EINVAL;
+
+#ifdef __BIG_ENDIAN__
+   if (endian == OCXL_HOST_ENDIAN)
+   endian = OCXL_BIG_ENDIAN;
+#endif
+
+   switch (endian) {
+   case OCXL_BIG_ENDIAN:
+   writeq_be(val, (char *)afu->global_mmio_ptr + offset);
+   break;
+
+   default:
+   writeq(val, (char *)afu->global_mmio_ptr + offset);
+   break;
+   }
+
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(ocxl_global_mmio_write64);
+
+int ocxl_global_mmio_set32(struct ocxl_afu *afu, size_t offset,
+   enum ocxl_endian endian, u32 mask)
+{
+   u32 tmp;
+
+   if (offset > afu->config.global_mmio_size - 4)
+   return -EINVAL;
+
+#ifdef __BIG_ENDIAN__
+   if (endian == OCXL_HOST_ENDIAN)
+   endian = OCXL_BIG_ENDIAN;
+#endif
+
+   switch (endian) {
+   case OCXL_BIG_ENDIAN:
+   tmp = readl_be((char *)afu->global_mmio_ptr + offset);
+   tmp |= mask;
+   writel_be(tmp, (char *)afu->global_mmio_ptr + offset);
+   break;
+
+   default:
+   tmp = readl((char *)afu->global_mmio_ptr + offset);
+   tmp |= mask;
+   writel(tmp, (char *)afu->global_mmio_ptr + offset);
+   break;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(ocxl_global_mmio_set32);
+
+int ocxl_global_mmio_set64(

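Each accessor above follows the same three steps: bounds-check the offset, resolve OCXL_HOST_ENDIAN to a concrete byte order, then perform a 32- or 64-bit access in that order (set32 adds a read-modify-write). The following is a userspace sketch of that pattern only, under stated assumptions: a plain byte array stands in for the ioremap()ed global MMIO area, and the hypothetical model_read32()/model_write32()/model_set32() helpers assemble bytes by hand instead of calling readl()/readl_be()/writel()/writel_be().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for enum ocxl_endian. */
enum model_endian {
	MODEL_HOST_ENDIAN,
	MODEL_BIG_ENDIAN,
	MODEL_LITTLE_ENDIAN,
};

/* Hypothetical model of the global MMIO window of one AFU. */
struct model_mmio {
	uint8_t *base;  /* stands in for afu->global_mmio_ptr */
	size_t size;    /* stands in for afu->config.global_mmio_size */
};

/* Map HOST_ENDIAN to the byte order of the machine running the sketch. */
static enum model_endian resolve_endian(enum model_endian endian)
{
	const uint16_t probe = 1;

	if (endian != MODEL_HOST_ENDIAN)
		return endian;
	return *(const uint8_t *)&probe ? MODEL_LITTLE_ENDIAN : MODEL_BIG_ENDIAN;
}

/* Bounds check that also rejects a window smaller than the access width. */
static int check_offset(const struct model_mmio *m, size_t offset, size_t width)
{
	if (m->size < width || offset > m->size - width)
		return -EINVAL;
	return 0;
}

int model_read32(const struct model_mmio *m, size_t offset,
		 enum model_endian endian, uint32_t *val)
{
	const uint8_t *p;

	if (check_offset(m, offset, 4))
		return -EINVAL;
	p = m->base + offset;
	if (resolve_endian(endian) == MODEL_BIG_ENDIAN)
		*val = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
		       (uint32_t)p[2] << 8 | (uint32_t)p[3];
	else
		*val = (uint32_t)p[3] << 24 | (uint32_t)p[2] << 16 |
		       (uint32_t)p[1] << 8 | (uint32_t)p[0];
	return 0;
}

int model_write32(struct model_mmio *m, size_t offset,
		  enum model_endian endian, uint32_t val)
{
	int big = resolve_endian(endian) == MODEL_BIG_ENDIAN;
	uint8_t *p;
	int i;

	if (check_offset(m, offset, 4))
		return -EINVAL;
	p = m->base + offset;
	for (i = 0; i < 4; i++)
		p[i] = (uint8_t)(val >> (big ? 24 - 8 * i : 8 * i));
	return 0;
}

/* Read-modify-write, like ocxl_global_mmio_set32(); not atomic. */
int model_set32(struct model_mmio *m, size_t offset,
		enum model_endian endian, uint32_t mask)
{
	uint32_t tmp;
	int rc = model_read32(m, offset, endian, &tmp);

	if (rc)
		return rc;
	return model_write32(m, offset, endian, tmp | mask);
}
```

The check `offset > size - 4` used in the patch relies on the configured global MMIO size being at least the access width; the extra `size < width` test in the sketch handles that case explicitly so the unsigned subtraction cannot wrap.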
[PATCH v2 6/7] ocxl: move event_fd handling to frontend

2019-03-19 Thread Alastair D'Silva
From: Alastair D'Silva 

The eventfd is only used in the driver frontend, so it does not
need to exist in the backend code. Relocate it to the frontend
and provide an opaque mechanism for consumers instead.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/afu_irq.c   | 69 +--
 drivers/misc/ocxl/file.c  | 22 +-
 drivers/misc/ocxl/ocxl_internal.h |  5 ---
 include/misc/ocxl.h   | 46 +
 4 files changed, 104 insertions(+), 38 deletions(-)

diff --git a/drivers/misc/ocxl/afu_irq.c b/drivers/misc/ocxl/afu_irq.c
index 2d410cd6f817..f23cd585e737 100644
--- a/drivers/misc/ocxl/afu_irq.c
+++ b/drivers/misc/ocxl/afu_irq.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0+
 // Copyright 2017 IBM Corp.
 #include 
-#include 
+#include 
 #include "ocxl_internal.h"
 #include "trace.h"
 
@@ -11,7 +11,9 @@ struct afu_irq {
unsigned int virq;
char *name;
u64 trigger_page;
-   struct eventfd_ctx *ev_ctx;
+   irqreturn_t (*handler)(void *private);
+   void (*free_private)(void *private);
+   void *private;
 };
 
 int ocxl_irq_offset_to_id(struct ocxl_context *ctx, u64 offset)
@@ -24,14 +26,42 @@ u64 ocxl_irq_id_to_offset(struct ocxl_context *ctx, int irq_id)
return ctx->afu->irq_base_offset + (irq_id << PAGE_SHIFT);
 }
 
+int ocxl_irq_set_handler(struct ocxl_context *ctx, int irq_id,
+   irqreturn_t (*handler)(void *private),
+   void (*free_private)(void *private),
+   void *private)
+{
+   struct afu_irq *irq;
+   int rc;
+
+   mutex_lock(&ctx->irq_lock);
+   irq = idr_find(&ctx->irq_idr, irq_id);
+   if (!irq) {
+   rc = -EINVAL;
+   goto unlock;
+   }
+
+   irq->handler = handler;
+   irq->private = private;
+   irq->free_private = free_private;
+
+   rc = 0;
+
+unlock:
+   mutex_unlock(&ctx->irq_lock);
+   return rc;
+}
+
 static irqreturn_t afu_irq_handler(int virq, void *data)
 {
struct afu_irq *irq = (struct afu_irq *) data;
 
trace_ocxl_afu_irq_receive(virq);
-   if (irq->ev_ctx)
-   eventfd_signal(irq->ev_ctx, 1);
-   return IRQ_HANDLED;
+
+   if (irq->handler)
+   return irq->handler(irq->private);
+
+   return IRQ_HANDLED; // Just drop it on the ground
 }
 
 static int setup_afu_irq(struct ocxl_context *ctx, struct afu_irq *irq)
@@ -126,8 +156,8 @@ static void afu_irq_free(struct afu_irq *irq, struct ocxl_context *ctx)
ocxl_irq_id_to_offset(ctx, irq->id),
1 << PAGE_SHIFT, 1);
release_afu_irq(irq);
-   if (irq->ev_ctx)
-   eventfd_ctx_put(irq->ev_ctx);
+   if (irq->free_private)
+   irq->free_private(irq->private);
ocxl_link_free_irq(ctx->afu->fn->link, irq->hw_irq);
kfree(irq);
 }
@@ -160,31 +190,6 @@ void ocxl_afu_irq_free_all(struct ocxl_context *ctx)
mutex_unlock(&ctx->irq_lock);
 }
 
-int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, int irq_id, int eventfd)
-{
-   struct afu_irq *irq;
-   struct eventfd_ctx *ev_ctx;
-   int rc = 0;
-
-   mutex_lock(&ctx->irq_lock);
-   irq = idr_find(&ctx->irq_idr, irq_id);
-   if (!irq) {
-   rc = -EINVAL;
-   goto unlock;
-   }
-
-   ev_ctx = eventfd_ctx_fdget(eventfd);
-   if (IS_ERR(ev_ctx)) {
-   rc = -EINVAL;
-   goto unlock;
-   }
-
-   irq->ev_ctx = ev_ctx;
-unlock:
-   mutex_unlock(&ctx->irq_lock);
-   return rc;
-}
-
 u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, int irq_id)
 {
struct afu_irq *irq;
diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index d13297336253..42214b0c956a 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -191,11 +192,27 @@ static long afu_ioctl_get_features(struct ocxl_context *ctx,
x == OCXL_IOCTL_GET_FEATURES ? "GET_FEATURES" : \
"UNKNOWN")
 
+static irqreturn_t irq_handler(void *private)
+{
+   struct eventfd_ctx *ev_ctx = private;
+
+   eventfd_signal(ev_ctx, 1);
+   return IRQ_HANDLED;
+}
+
+static void irq_free(void *private)
+{
+   struct eventfd_ctx *ev_ctx = private;
+
+   eventfd_ctx_put(ev_ctx);
+}
+
 static long afu_ioctl(struct file *file, unsigned int cmd,
unsigned long args)
 {
struct ocxl_context *ctx = file->private_data;
struct ocxl_ioctl_irq_fd irq_fd;
+   struct eventfd_ctx *ev_ctx;
int irq_id;
u64 irq_offset;
long rc;
@@ -247,7 +264,10 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
if (irq_fd.reserved)
return -EINVAL;
irq_id = ocxl_irq_offset_to_id(ctx, irq_fd.irq_offset);
-   

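After this patch the backend stores an opaque (handler, free_private, private) triple per AFU IRQ and knows nothing about eventfds; the frontend's irq_handler()/irq_free() pair is just one consumer. The sketch below models that callback contract in userspace under stated assumptions: `struct model_irq` and the model_* functions are hypothetical, a heap-allocated counter stands in for the eventfd context, and plain ints replace irqreturn_t. It records free_private together with the handler, because the free path can only release the private data if that callback was stored.

```c
#include <stddef.h>
#include <stdlib.h>

/* Plain ints stand in for the kernel's irqreturn_t values. */
enum { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/* Hypothetical reduction of struct afu_irq to its callback fields. */
struct model_irq {
	int (*handler)(void *private);
	void (*free_private)(void *private);
	void *private;
};

/* Backend: record the frontend's callbacks and its opaque data. */
void model_set_handler(struct model_irq *irq, int (*handler)(void *),
		       void (*free_private)(void *), void *private)
{
	irq->handler = handler;
	irq->free_private = free_private;
	irq->private = private;
}

/* Backend: dispatch an interrupt, or drop it if nobody registered. */
int model_fire(struct model_irq *irq)
{
	if (irq->handler)
		return irq->handler(irq->private);
	return MODEL_IRQ_HANDLED;
}

/* Backend: let the frontend release its data when the IRQ is freed. */
void model_free(struct model_irq *irq)
{
	if (irq->free_private)
		irq->free_private(irq->private);
	irq->handler = NULL;
	irq->free_private = NULL;
	irq->private = NULL;
}

/* Frontend: a counter plays the role of eventfd_signal(). */
int count_handler(void *private)
{
	(*(int *)private)++;
	return MODEL_IRQ_HANDLED;
}

static int frees;

/* Frontend: plays the role of eventfd_ctx_put(). */
void count_free(void *private)
{
	frees++;
	free(private);
}
```

The backend only ever sees `void *`, which is what lets an external driver plug in something other than an eventfd.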
[PATCH v2 5/7] ocxl: afu_irq only deals with IRQ IDs, not offsets

2019-03-19 Thread Alastair D'Silva
From: Alastair D'Silva 

The use of offsets is required only in the frontend, so alter
the IRQ API to only work with IRQ IDs in the backend.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/afu_irq.c   | 34 +++
 drivers/misc/ocxl/context.c   |  7 +--
 drivers/misc/ocxl/file.c  | 13 +++-
 drivers/misc/ocxl/ocxl_internal.h | 10 +
 drivers/misc/ocxl/trace.h | 12 ---
 5 files changed, 39 insertions(+), 37 deletions(-)

diff --git a/drivers/misc/ocxl/afu_irq.c b/drivers/misc/ocxl/afu_irq.c
index 11ab996657a2..2d410cd6f817 100644
--- a/drivers/misc/ocxl/afu_irq.c
+++ b/drivers/misc/ocxl/afu_irq.c
@@ -14,14 +14,14 @@ struct afu_irq {
struct eventfd_ctx *ev_ctx;
 };
 
-static int irq_offset_to_id(struct ocxl_context *ctx, u64 offset)
+int ocxl_irq_offset_to_id(struct ocxl_context *ctx, u64 offset)
 {
return (offset - ctx->afu->irq_base_offset) >> PAGE_SHIFT;
 }
 
-static u64 irq_id_to_offset(struct ocxl_context *ctx, int id)
+u64 ocxl_irq_id_to_offset(struct ocxl_context *ctx, int irq_id)
 {
-   return ctx->afu->irq_base_offset + (id << PAGE_SHIFT);
+   return ctx->afu->irq_base_offset + (irq_id << PAGE_SHIFT);
 }
 
 static irqreturn_t afu_irq_handler(int virq, void *data)
@@ -69,7 +69,7 @@ static void release_afu_irq(struct afu_irq *irq)
kfree(irq->name);
 }
 
-int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset)
+int ocxl_afu_irq_alloc(struct ocxl_context *ctx, int *irq_id)
 {
struct afu_irq *irq;
int rc;
@@ -101,11 +101,11 @@ int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset)
if (rc)
goto err_alloc;
 
-   *irq_offset = irq_id_to_offset(ctx, irq->id);
-
-   trace_ocxl_afu_irq_alloc(ctx->pasid, irq->id, irq->virq, irq->hw_irq,
-   *irq_offset);
+   trace_ocxl_afu_irq_alloc(ctx->pasid, irq->id, irq->virq, irq->hw_irq);
mutex_unlock(&ctx->irq_lock);
+
+   *irq_id = irq->id;
+
return 0;
 
 err_alloc:
@@ -123,7 +123,7 @@ static void afu_irq_free(struct afu_irq *irq, struct ocxl_context *ctx)
trace_ocxl_afu_irq_free(ctx->pasid, irq->id);
if (ctx->mapping)
unmap_mapping_range(ctx->mapping,
-   irq_id_to_offset(ctx, irq->id),
+   ocxl_irq_id_to_offset(ctx, irq->id),
1 << PAGE_SHIFT, 1);
release_afu_irq(irq);
if (irq->ev_ctx)
@@ -132,14 +132,13 @@ static void afu_irq_free(struct afu_irq *irq, struct ocxl_context *ctx)
kfree(irq);
 }
 
-int ocxl_afu_irq_free(struct ocxl_context *ctx, u64 irq_offset)
+int ocxl_afu_irq_free(struct ocxl_context *ctx, int irq_id)
 {
struct afu_irq *irq;
-   int id = irq_offset_to_id(ctx, irq_offset);
 
mutex_lock(&ctx->irq_lock);
 
-   irq = idr_find(&ctx->irq_idr, id);
+   irq = idr_find(&ctx->irq_idr, irq_id);
if (!irq) {
mutex_unlock(&ctx->irq_lock);
return -EINVAL;
@@ -161,14 +160,14 @@ void ocxl_afu_irq_free_all(struct ocxl_context *ctx)
mutex_unlock(&ctx->irq_lock);
 }
 
-int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, u64 irq_offset, int eventfd)
+int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, int irq_id, int eventfd)
 {
struct afu_irq *irq;
struct eventfd_ctx *ev_ctx;
-   int rc = 0, id = irq_offset_to_id(ctx, irq_offset);
+   int rc = 0;
 
mutex_lock(&ctx->irq_lock);
-   irq = idr_find(&ctx->irq_idr, id);
+   irq = idr_find(&ctx->irq_idr, irq_id);
if (!irq) {
rc = -EINVAL;
goto unlock;
@@ -186,14 +185,13 @@ int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, u64 irq_offset, int eventfd)
return rc;
 }
 
-u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, u64 irq_offset)
+u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, int irq_id)
 {
struct afu_irq *irq;
-   int id = irq_offset_to_id(ctx, irq_offset);
u64 addr = 0;
 
mutex_lock(&ctx->irq_lock);
-   irq = idr_find(&ctx->irq_idr, id);
+   irq = idr_find(&ctx->irq_idr, irq_id);
if (irq)
addr = irq->trigger_page;
mutex_unlock(&ctx->irq_lock);
diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index 9a37e9632cd9..c04887591837 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -93,8 +93,9 @@ static vm_fault_t map_afu_irq(struct vm_area_struct *vma, unsigned long address,
u64 offset, struct ocxl_context *ctx)
 {
u64 trigger_addr;
+   int irq_id = ocxl_irq_offset_to_id(ctx, offset);
 
-   trigger_addr = ocxl_afu_irq_get_addr(ctx, offset);
+   trigger_addr = ocxl_afu_irq_get_addr(ctx, irq_id);
if (!trigger_addr)
return VM_FAULT_SIGBUS;
 
@@ -154,12 +155,14 @@ static const struct vm_operations_struct ocxl_vmops = {

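With this patch the backend hands out plain IRQ IDs, and only the frontend (file.c, and map_afu_irq() in context.c) converts between an ID and the mmap offset that userspace sees. The conversion is a linear map with one trigger page per IRQ, starting at irq_base_offset. A userspace sketch of the round trip, assuming 64K pages (MODEL_PAGE_SHIFT = 16 is an assumption standing in for PAGE_SHIFT) and using hypothetical function names:

```c
#include <stdint.h>

/* Assumption: 64K pages, standing in for the kernel's PAGE_SHIFT. */
#define MODEL_PAGE_SHIFT 16

/* Mirrors ocxl_irq_offset_to_id(): which trigger page holds this offset. */
int model_offset_to_id(uint64_t irq_base_offset, uint64_t offset)
{
	return (int)((offset - irq_base_offset) >> MODEL_PAGE_SHIFT);
}

/* Mirrors ocxl_irq_id_to_offset(): start of the IRQ's trigger page. */
uint64_t model_id_to_offset(uint64_t irq_base_offset, int irq_id)
{
	/* Widen before shifting so large IDs cannot overflow an int. */
	return irq_base_offset + ((uint64_t)irq_id << MODEL_PAGE_SHIFT);
}
```

Any offset inside a trigger page maps back to the same ID, which is why the frontend can convert once at the ioctl/mmap boundary and pass only IDs into the backend.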
[PATCH v2 4/7] ocxl: Allow external drivers to use OpenCAPI contexts

2019-03-19 Thread Alastair D'Silva
From: Alastair D'Silva 

Most OpenCAPI operations require a valid context, so
exposing these functions to external drivers is necessary.

Signed-off-by: Alastair D'Silva 
Reviewed-by: Greg Kurz 
---
 drivers/misc/ocxl/context.c   |  9 +--
 drivers/misc/ocxl/file.c  |  2 +-
 drivers/misc/ocxl/ocxl_internal.h |  6 -
 include/misc/ocxl.h   | 45 +++
 4 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index 371ef17bba33..9a37e9632cd9 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -8,6 +8,7 @@ struct ocxl_context *ocxl_context_alloc(void)
 {
return kzalloc(sizeof(struct ocxl_context), GFP_KERNEL);
 }
+EXPORT_SYMBOL_GPL(ocxl_context_alloc);
 
 int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
struct address_space *mapping)
@@ -43,6 +44,7 @@ int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
ocxl_afu_get(afu);
return 0;
 }
+EXPORT_SYMBOL_GPL(ocxl_context_init);
 
 /*
  * Callback for when a translation fault triggers an error
@@ -63,7 +65,7 @@ static void xsl_fault_error(void *data, u64 addr, u64 dsisr)
wake_up_all(&ctx->events_wq);
 }
 
-int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
+int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct *mm)
 {
int rc;
 
@@ -75,7 +77,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
}
 
rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid,
-   current->mm->context.id, ctx->tidr, amr, current->mm,
+   mm->context.id, ctx->tidr, amr, mm,
xsl_fault_error, ctx);
if (rc)
goto out;
@@ -85,6 +87,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
mutex_unlock(&ctx->status_mutex);
return rc;
 }
+EXPORT_SYMBOL_GPL(ocxl_context_attach);
 
 static vm_fault_t map_afu_irq(struct vm_area_struct *vma, unsigned long address,
u64 offset, struct ocxl_context *ctx)
@@ -243,6 +246,7 @@ int ocxl_context_detach(struct ocxl_context *ctx)
}
return 0;
 }
+EXPORT_SYMBOL_GPL(ocxl_context_detach);
 
 void ocxl_context_detach_all(struct ocxl_afu *afu)
 {
@@ -280,3 +284,4 @@ void ocxl_context_free(struct ocxl_context *ctx)
ocxl_afu_put(ctx->afu);
kfree(ctx);
 }
+EXPORT_SYMBOL_GPL(ocxl_context_free);
diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index 1f17f8706e29..665422c6c8a0 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -94,7 +94,7 @@ static long afu_ioctl_attach(struct ocxl_context *ctx,
return -EINVAL;
 
amr = arg.amr & mfspr(SPRN_UAMOR);
-   rc = ocxl_context_attach(ctx, amr);
+   rc = ocxl_context_attach(ctx, amr, current->mm);
return rc;
 }
 
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index 05930a29f606..4fc7e9597ede 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -133,15 +133,9 @@ int ocxl_config_check_afu_index(struct pci_dev *dev,
  */
 int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
 
-struct ocxl_context *ocxl_context_alloc(void);
-int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
-   struct address_space *mapping);
-int ocxl_context_attach(struct ocxl_context *ctx, u64 amr);
 int ocxl_context_mmap(struct ocxl_context *ctx,
struct vm_area_struct *vma);
-int ocxl_context_detach(struct ocxl_context *ctx);
 void ocxl_context_detach_all(struct ocxl_afu *afu);
-void ocxl_context_free(struct ocxl_context *ctx);
 
 int ocxl_sysfs_register_afu(struct ocxl_afu *afu);
 void ocxl_sysfs_unregister_afu(struct ocxl_afu *afu);
diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index 7d611ea68a59..a5e94da6c8db 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -116,6 +116,51 @@ const struct ocxl_fn_config *ocxl_function_config(struct ocxl_fn *fn);
  */
 void ocxl_function_close(struct ocxl_fn *fn);
 
+// Context allocation
+
+/**
+ * Allocate space for a new OpenCAPI context
+ *
+ * Returns NULL on failure
+ */
+struct ocxl_context *ocxl_context_alloc(void);
+
+/**
+ * Initialize an OpenCAPI context
+ *
+ * @ctx: The OpenCAPI context to initialize
+ * @afu: The AFU the context belongs to
+ * @mapping: The mapping to unmap when the context is closed (may be NULL)
+ */
+int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
+   struct address_space *mapping);
+
+/**
+ * Free an OpenCAPI context
+ *
+ * @ctx: The OpenCAPI context to free
+ */
+void ocxl_context_free(struct ocxl_context *ctx);
+
+/**
+ * Grant access to an MM to an OpenCAPI context
+ * @ctx: The OpenCAPI context to attach
+ * @amr: The value of the AMR register to restrict ac

[PATCH v2 3/7] ocxl: Create a clear delineation between ocxl backend & frontend

2019-03-19 Thread Alastair D'Silva
From: Alastair D'Silva 

The OCXL driver contains both frontend code for interacting with userspace
and backend code for interacting with the hardware.

This patch separates the backend code from the frontend so that it can be
used by other device drivers that communicate via OpenCAPI.

Relocate dev, cdev & sysfs files to the frontend code to allow external
drivers to maintain their own devices.

Reference counting on the device in the backend is replaced with kref
counting.

Move file & sysfs layer initialisation from core.c (backend) to
pci.c (frontend).

Create an ocxl_function-oriented interface for initialising devices &
enumerating AFUs.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/context.c   |   2 +-
 drivers/misc/ocxl/core.c  | 205 +++---
 drivers/misc/ocxl/file.c  | 125 --
 drivers/misc/ocxl/ocxl_internal.h |  39 +++---
 drivers/misc/ocxl/pci.c   |  61 -
 drivers/misc/ocxl/sysfs.c |  58 +
 include/misc/ocxl.h   | 121 --
 7 files changed, 416 insertions(+), 195 deletions(-)

diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index 3498a0199bde..371ef17bba33 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -238,7 +238,7 @@ int ocxl_context_detach(struct ocxl_context *ctx)
}
rc = ocxl_link_remove_pe(ctx->afu->fn->link, ctx->pasid);
if (rc) {
-   dev_warn(&ctx->afu->dev,
+   dev_warn(&dev->dev,
"Couldn't remove PE entry cleanly: %d\n", rc);
}
return 0;
diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
index 2fd0c700e8a0..c632ec372342 100644
--- a/drivers/misc/ocxl/core.c
+++ b/drivers/misc/ocxl/core.c
@@ -13,16 +13,6 @@ static void ocxl_fn_put(struct ocxl_fn *fn)
put_device(&fn->dev);
 }
 
-struct ocxl_afu *ocxl_afu_get(struct ocxl_afu *afu)
-{
-   return (get_device(&afu->dev) == NULL) ? NULL : afu;
-}
-
-void ocxl_afu_put(struct ocxl_afu *afu)
-{
-   put_device(&afu->dev);
-}
-
 static struct ocxl_afu *alloc_afu(struct ocxl_fn *fn)
 {
struct ocxl_afu *afu;
@@ -31,6 +21,7 @@ static struct ocxl_afu *alloc_afu(struct ocxl_fn *fn)
if (!afu)
return NULL;
 
+   kref_init(&afu->kref);
mutex_init(&afu->contexts_lock);
mutex_init(&afu->afu_control_lock);
idr_init(&afu->contexts_idr);
@@ -39,32 +30,26 @@ static struct ocxl_afu *alloc_afu(struct ocxl_fn *fn)
return afu;
 }
 
-static void free_afu(struct ocxl_afu *afu)
+static void afu_release(struct kref *kref)
 {
+   struct ocxl_afu *afu = container_of(kref, struct ocxl_afu, kref);
+
idr_destroy(&afu->contexts_idr);
ocxl_fn_put(afu->fn);
kfree(afu);
 }
 
-static void free_afu_dev(struct device *dev)
+void ocxl_afu_get(struct ocxl_afu *afu)
 {
-   struct ocxl_afu *afu = to_ocxl_afu(dev);
-
-   ocxl_unregister_afu(afu);
-   free_afu(afu);
+   kref_get(&afu->kref);
 }
+EXPORT_SYMBOL_GPL(ocxl_afu_get);
 
-static int set_afu_device(struct ocxl_afu *afu, const char *location)
+void ocxl_afu_put(struct ocxl_afu *afu)
 {
-   struct ocxl_fn *fn = afu->fn;
-   int rc;
-
-   afu->dev.parent = &fn->dev;
-   afu->dev.release = free_afu_dev;
-   rc = dev_set_name(&afu->dev, "%s.%s.%hhu", afu->config.name, location,
-   afu->config.idx);
-   return rc;
+   kref_put(&afu->kref, afu_release);
 }
+EXPORT_SYMBOL_GPL(ocxl_afu_put);
 
 static int assign_afu_actag(struct ocxl_afu *afu)
 {
@@ -233,27 +218,25 @@ static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct pci_dev *dev)
if (rc)
return rc;
 
-   rc = set_afu_device(afu, dev_name(&dev->dev));
-   if (rc)
-   return rc;
-
rc = assign_afu_actag(afu);
if (rc)
return rc;
 
rc = assign_afu_pasid(afu);
-   if (rc) {
-   reclaim_afu_actag(afu);
-   return rc;
-   }
+   if (rc)
+   goto err_free_actag;
 
rc = map_mmio_areas(afu);
-   if (rc) {
-   reclaim_afu_pasid(afu);
-   reclaim_afu_actag(afu);
-   return rc;
-   }
+   if (rc)
+   goto err_free_pasid;
+
return 0;
+
+err_free_pasid:
+   reclaim_afu_pasid(afu);
+err_free_actag:
+   reclaim_afu_actag(afu);
+   return rc;
 }
 
 static void deconfigure_afu(struct ocxl_afu *afu)
@@ -265,16 +248,8 @@ static void deconfigure_afu(struct ocxl_afu *afu)
 
 static int activate_afu(struct pci_dev *dev, struct ocxl_afu *afu)
 {
-   int rc;
-
ocxl_config_set_afu_state(dev, afu->config.dvsec_afu_control_pos, 1);
-   /*
-* Char device creation is the last step, as processes can
-* call our driver immediately, so all our inits must be finished.
-*/
-   rc = ocxl_create_cdev(afu)

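As the commit message says, the backend's device-based reference counting is replaced with a kref: the AFU embeds a struct kref, ocxl_afu_get()/ocxl_afu_put() wrap kref_get()/kref_put(), and afu_release() uses container_of() to get from the kref back to the enclosing AFU. A single-threaded userspace sketch of that pattern follows; all model_* names are hypothetical, and a plain unsigned int replaces the atomic counter the real kref uses.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(). */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Not thread-safe: the real struct kref wraps an atomic refcount. */
struct model_kref {
	unsigned int refcount;
};

static void model_kref_init(struct model_kref *k) { k->refcount = 1; }
static void model_kref_get(struct model_kref *k) { k->refcount++; }

/* Returns 1 if this put dropped the last reference and ran release(). */
static int model_kref_put(struct model_kref *k,
			  void (*release)(struct model_kref *))
{
	if (--k->refcount == 0) {
		release(k);
		return 1;
	}
	return 0;
}

/* Hypothetical AFU with an embedded refcount, like struct ocxl_afu. */
struct model_afu {
	int idx;
	struct model_kref kref;
};

static int released;

/* Mirrors afu_release(): recover the AFU from its embedded kref. */
static void model_afu_release(struct model_kref *kref)
{
	struct model_afu *afu = model_container_of(kref, struct model_afu, kref);

	released = afu->idx;
	free(afu);
}
```

Unlike the get_device()/put_device() pair it replaces, this ties the AFU's lifetime to its own counter rather than to a struct device, which is what allows the device itself to move to the frontend.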
[PATCH v2 2/7] ocxl: Don't pass pci_dev around

2019-03-19 Thread Alastair D'Silva
From: Alastair D'Silva 

The pci_dev is already reachable from the AFU's function via
to_pci_dev(fn->dev.parent), so there is no need to pass it around.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/core.c | 38 +-
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
index b47cfda83e46..2fd0c700e8a0 100644
--- a/drivers/misc/ocxl/core.c
+++ b/drivers/misc/ocxl/core.c
@@ -66,10 +66,11 @@ static int set_afu_device(struct ocxl_afu *afu, const char *location)
return rc;
 }
 
-static int assign_afu_actag(struct ocxl_afu *afu, struct pci_dev *dev)
+static int assign_afu_actag(struct ocxl_afu *afu)
 {
struct ocxl_fn *fn = afu->fn;
int actag_count, actag_offset;
+   struct pci_dev *pci_dev = to_pci_dev(fn->dev.parent);
 
/*
 * if there were not enough actags for the function, each afu
@@ -79,16 +80,16 @@ static int assign_afu_actag(struct ocxl_afu *afu, struct pci_dev *dev)
fn->actag_enabled / fn->actag_supported;
actag_offset = ocxl_actag_afu_alloc(fn, actag_count);
if (actag_offset < 0) {
-   dev_err(&afu->dev, "Can't allocate %d actags for AFU: %d\n",
+   dev_err(&pci_dev->dev, "Can't allocate %d actags for AFU: %d\n",
actag_count, actag_offset);
return actag_offset;
}
afu->actag_base = fn->actag_base + actag_offset;
afu->actag_enabled = actag_count;
 
-   ocxl_config_set_afu_actag(dev, afu->config.dvsec_afu_control_pos,
+   ocxl_config_set_afu_actag(pci_dev, afu->config.dvsec_afu_control_pos,
afu->actag_base, afu->actag_enabled);
-   dev_dbg(&afu->dev, "actag base=%d enabled=%d\n",
+   dev_dbg(&pci_dev->dev, "actag base=%d enabled=%d\n",
afu->actag_base, afu->actag_enabled);
return 0;
 }
@@ -103,10 +104,11 @@ static void reclaim_afu_actag(struct ocxl_afu *afu)
ocxl_actag_afu_free(afu->fn, start_offset, size);
 }
 
-static int assign_afu_pasid(struct ocxl_afu *afu, struct pci_dev *dev)
+static int assign_afu_pasid(struct ocxl_afu *afu)
 {
struct ocxl_fn *fn = afu->fn;
int pasid_count, pasid_offset;
+   struct pci_dev *pci_dev = to_pci_dev(fn->dev.parent);
 
/*
 * We only support the case where the function configuration
@@ -115,7 +117,7 @@ static int assign_afu_pasid(struct ocxl_afu *afu, struct pci_dev *dev)
pasid_count = 1 << afu->config.pasid_supported_log;
pasid_offset = ocxl_pasid_afu_alloc(fn, pasid_count);
if (pasid_offset < 0) {
-   dev_err(&afu->dev, "Can't allocate %d PASIDs for AFU: %d\n",
+   dev_err(&pci_dev->dev, "Can't allocate %d PASIDs for AFU: %d\n",
pasid_count, pasid_offset);
return pasid_offset;
}
@@ -123,10 +125,10 @@ static int assign_afu_pasid(struct ocxl_afu *afu, struct pci_dev *dev)
afu->pasid_count = 0;
afu->pasid_max = pasid_count;
 
-   ocxl_config_set_afu_pasid(dev, afu->config.dvsec_afu_control_pos,
+   ocxl_config_set_afu_pasid(pci_dev, afu->config.dvsec_afu_control_pos,
afu->pasid_base,
afu->config.pasid_supported_log);
-   dev_dbg(&afu->dev, "PASID base=%d, enabled=%d\n",
+   dev_dbg(&pci_dev->dev, "PASID base=%d, enabled=%d\n",
afu->pasid_base, pasid_count);
return 0;
 }
@@ -172,9 +174,10 @@ static void release_fn_bar(struct ocxl_fn *fn, int bar)
WARN_ON(fn->bar_used[idx] < 0);
 }
 
-static int map_mmio_areas(struct ocxl_afu *afu, struct pci_dev *dev)
+static int map_mmio_areas(struct ocxl_afu *afu)
 {
int rc;
+   struct pci_dev *pci_dev = to_pci_dev(afu->fn->dev.parent);
 
rc = reserve_fn_bar(afu->fn, afu->config.global_mmio_bar);
if (rc)
@@ -187,10 +190,10 @@ static int map_mmio_areas(struct ocxl_afu *afu, struct pci_dev *dev)
}
 
afu->global_mmio_start =
-   pci_resource_start(dev, afu->config.global_mmio_bar) +
+   pci_resource_start(pci_dev, afu->config.global_mmio_bar) +
afu->config.global_mmio_offset;
afu->pp_mmio_start =
-   pci_resource_start(dev, afu->config.pp_mmio_bar) +
+   pci_resource_start(pci_dev, afu->config.pp_mmio_bar) +
afu->config.pp_mmio_offset;
 
afu->global_mmio_ptr = ioremap(afu->global_mmio_start,
@@ -198,7 +201,7 @@ static int map_mmio_areas(struct ocxl_afu *afu, struct pci_dev *dev)
if (!afu->global_mmio_ptr) {
release_fn_bar(afu->fn, afu->config.pp_mmio_bar);
release_fn_bar(afu->fn, afu->config.global_mmio_bar);
-   dev_err(&dev->dev, "Error mapping global mmio area\n");
+   dev_err(&pci_dev->dev, "Error mapping global mmio area\n");
return -ENOMEM;
}
 
@@ -234,17 +237,17 @@ static int configure_af

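This patch relies on the function's struct device being parented to the PCI device, so the pci_dev can always be recovered with to_pci_dev(fn->dev.parent) instead of being threaded through every call. to_pci_dev() is container_of() on the struct device embedded in struct pci_dev. A reduced userspace model of that lookup, where every model_* type is a hypothetical stand-in keeping only the fields involved:

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct model_device {
	struct model_device *parent;
};

struct model_pci_dev {
	int devfn;
	struct model_device dev;  /* embedded, as in struct pci_dev */
};

struct model_fn {
	struct model_device dev;  /* dev.parent points at the PCI device */
};

/* Mirrors to_pci_dev(): container_of() on the embedded struct device. */
static struct model_pci_dev *model_to_pci_dev(struct model_device *dev)
{
	return model_container_of(dev, struct model_pci_dev, dev);
}

/* What assign_afu_actag() and friends now derive instead of a parameter. */
static struct model_pci_dev *model_fn_pci_dev(struct model_fn *fn)
{
	return model_to_pci_dev(fn->dev.parent);
}
```

The lookup is only valid while fn->dev.parent really is the &pci_dev->dev of a PCI device, which holds for every ocxl_fn set up by the PCI probe path.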
[PATCH v2 1/7] ocxl: Split pci.c

2019-03-19 Thread Alastair D'Silva
From: Alastair D'Silva 

In preparation for making core code available to external drivers,
move the core code out of pci.c and into core.c.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/Makefile|   1 +
 drivers/misc/ocxl/core.c  | 517 ++
 drivers/misc/ocxl/ocxl_internal.h |   5 +
 drivers/misc/ocxl/pci.c   | 517 --
 4 files changed, 523 insertions(+), 517 deletions(-)
 create mode 100644 drivers/misc/ocxl/core.c

diff --git a/drivers/misc/ocxl/Makefile b/drivers/misc/ocxl/Makefile
index 5229dcda8297..bc4e39bfda7b 100644
--- a/drivers/misc/ocxl/Makefile
+++ b/drivers/misc/ocxl/Makefile
@@ -3,6 +3,7 @@ ccflags-$(CONFIG_PPC_WERROR)+= -Werror
 
 ocxl-y += main.o pci.o config.o file.o pasid.o
 ocxl-y += link.o context.o afu_irq.o sysfs.o trace.o
+ocxl-y += core.o
 obj-$(CONFIG_OCXL) += ocxl.o
 
 # For tracepoints to include our trace.h from tracepoint infrastructure:
diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
new file mode 100644
index ..b47cfda83e46
--- /dev/null
+++ b/drivers/misc/ocxl/core.c
@@ -0,0 +1,517 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright 2017 IBM Corp.
+#include 
+#include "ocxl_internal.h"
+
+static struct ocxl_fn *ocxl_fn_get(struct ocxl_fn *fn)
+{
+   return (get_device(&fn->dev) == NULL) ? NULL : fn;
+}
+
+static void ocxl_fn_put(struct ocxl_fn *fn)
+{
+   put_device(&fn->dev);
+}
+
+struct ocxl_afu *ocxl_afu_get(struct ocxl_afu *afu)
+{
+   return (get_device(&afu->dev) == NULL) ? NULL : afu;
+}
+
+void ocxl_afu_put(struct ocxl_afu *afu)
+{
+   put_device(&afu->dev);
+}
+
+static struct ocxl_afu *alloc_afu(struct ocxl_fn *fn)
+{
+   struct ocxl_afu *afu;
+
+   afu = kzalloc(sizeof(struct ocxl_afu), GFP_KERNEL);
+   if (!afu)
+   return NULL;
+
+   mutex_init(&afu->contexts_lock);
+   mutex_init(&afu->afu_control_lock);
+   idr_init(&afu->contexts_idr);
+   afu->fn = fn;
+   ocxl_fn_get(fn);
+   return afu;
+}
+
+static void free_afu(struct ocxl_afu *afu)
+{
+   idr_destroy(&afu->contexts_idr);
+   ocxl_fn_put(afu->fn);
+   kfree(afu);
+}
+
+static void free_afu_dev(struct device *dev)
+{
+   struct ocxl_afu *afu = to_ocxl_afu(dev);
+
+   ocxl_unregister_afu(afu);
+   free_afu(afu);
+}
+
+static int set_afu_device(struct ocxl_afu *afu, const char *location)
+{
+   struct ocxl_fn *fn = afu->fn;
+   int rc;
+
+   afu->dev.parent = &fn->dev;
+   afu->dev.release = free_afu_dev;
+   rc = dev_set_name(&afu->dev, "%s.%s.%hhu", afu->config.name, location,
+   afu->config.idx);
+   return rc;
+}
+
+static int assign_afu_actag(struct ocxl_afu *afu, struct pci_dev *dev)
+{
+   struct ocxl_fn *fn = afu->fn;
+   int actag_count, actag_offset;
+
+   /*
+* if there were not enough actags for the function, each afu
+* reduces its count as well
+*/
+   actag_count = afu->config.actag_supported *
+   fn->actag_enabled / fn->actag_supported;
+   actag_offset = ocxl_actag_afu_alloc(fn, actag_count);
+   if (actag_offset < 0) {
+   dev_err(&afu->dev, "Can't allocate %d actags for AFU: %d\n",
+   actag_count, actag_offset);
+   return actag_offset;
+   }
+   afu->actag_base = fn->actag_base + actag_offset;
+   afu->actag_enabled = actag_count;
+
+   ocxl_config_set_afu_actag(dev, afu->config.dvsec_afu_control_pos,
+   afu->actag_base, afu->actag_enabled);
+   dev_dbg(&afu->dev, "actag base=%d enabled=%d\n",
+   afu->actag_base, afu->actag_enabled);
+   return 0;
+}
+
+static void reclaim_afu_actag(struct ocxl_afu *afu)
+{
+   struct ocxl_fn *fn = afu->fn;
+   int start_offset, size;
+
+   start_offset = afu->actag_base - fn->actag_base;
+   size = afu->actag_enabled;
+   ocxl_actag_afu_free(afu->fn, start_offset, size);
+}
+
+static int assign_afu_pasid(struct ocxl_afu *afu, struct pci_dev *dev)
+{
+   struct ocxl_fn *fn = afu->fn;
+   int pasid_count, pasid_offset;
+
+   /*
+* We only support the case where the function configuration
+* requested enough PASIDs to cover all AFUs.
+*/
+   pasid_count = 1 << afu->config.pasid_supported_log;
+   pasid_offset = ocxl_pasid_afu_alloc(fn, pasid_count);
+   if (pasid_offset < 0) {
+   dev_err(&afu->dev, "Can't allocate %d PASIDs for AFU: %d\n",
+   pasid_count, pasid_offset);
+   return pasid_offset;
+   }
+   afu->pasid_base = fn->pasid_base + pasid_offset;
+   afu->pasid_count = 0;
+   afu->pasid_max = pasid_count;
+
+   ocxl_config_set_afu_pasid(dev, afu->config.dvsec_afu_control_pos,
+   afu->pasid_base
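The acTag split in assign_afu_actag() is a proportional scale-down. Each AFU receives its supported count multiplied by the fraction of the function's acTags that were actually enabled, computed with C integer division. assign_afu_pasid() then requests a power-of-two PASID range. The standalone sketch below isolates just that arithmetic. The helper names are hypothetical and are not part of the ocxl driver.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the expression in assign_afu_actag():
 * scale the AFU's supported acTag count by (enabled / supported) for
 * the whole function. Multiplying before dividing avoids the ratio
 * truncating to 0; the final result truncates toward zero, as in C.
 */
static int afu_actag_count(int afu_supported, int fn_enabled, int fn_supported)
{
	assert(fn_supported > 0);
	return afu_supported * fn_enabled / fn_supported;
}

/* Hypothetical helper mirroring assign_afu_pasid(): 2^log PASIDs. */
static int afu_pasid_count(int pasid_supported_log)
{
	assert(pasid_supported_log >= 0 && pasid_supported_log < 31);
	return 1 << pasid_supported_log;
}
```

Take an AFU that supports 32 acTags on a function where only 64 of 128 acTags were enabled. It is given 32 * 64 / 128 = 16 acTags. Odd counts round down, so 3 * 1 / 2 gives 1.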

[PATCH v2 0/7] Refactor OCXL driver to allow external drivers to use it

2019-03-19 Thread Alastair D'Silva
From: Alastair D'Silva 

This series reworks the OpenCAPI driver to split frontend
(driver interactions) from backend (hardware interactions).

This allows external drivers to utilise the core of the
generic OpenCAPI driver to communicate with specific
OpenCAPI hardware.

Changelog:
V2:
  - Reorder patches as some required structs that were only available later
  - Add dev.release implementation for ocxl_file_info to address warning on
driver unload (ocxl: Create a clear delineation...)
  - Set output var irq_id in ocxl_afu_irq_alloc (ocxl: afu_irq only deals...)
  - Bump copyright year (ocxl: Provide global MMIO accessors...,
ocxl: Split pci.c)

Alastair D'Silva (7):
  ocxl: Split pci.c
  ocxl: Don't pass pci_dev around
  ocxl: Create a clear delineation between ocxl backend & frontend
  ocxl: Allow external drivers to use OpenCAPI contexts
  ocxl: afu_irq only deals with IRQ IDs, not offsets
  ocxl: move event_fd handling to frontend
  ocxl: Provide global MMIO accessors for external drivers

 drivers/misc/ocxl/Makefile|   3 +-
 drivers/misc/ocxl/afu_irq.c   |  97 ++---
 drivers/misc/ocxl/context.c   |  18 +-
 drivers/misc/ocxl/core.c  | 578 ++
 drivers/misc/ocxl/file.c  | 160 ++---
 drivers/misc/ocxl/mmio.c  | 234 
 drivers/misc/ocxl/ocxl_internal.h |  49 +--
 drivers/misc/ocxl/pci.c   | 562 ++---
 drivers/misc/ocxl/sysfs.c |  58 +--
 drivers/misc/ocxl/trace.h |  12 +-
 include/misc/ocxl.h   | 322 -
 11 files changed, 1390 insertions(+), 703 deletions(-)
 create mode 100644 drivers/misc/ocxl/core.c
 create mode 100644 drivers/misc/ocxl/mmio.c

-- 
2.20.1



[PATCH v3 3/3] Documentation/vmcoreinfo: Add documentation for 'MAX_PHYSMEM_BITS'

2019-03-19 Thread Bhupesh Sharma
Add documentation for 'MAX_PHYSMEM_BITS' variable being added to
vmcoreinfo.

'MAX_PHYSMEM_BITS' defines the width, in bits, of the maximum supported
physical address space.

Cc: Boris Petkov 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: James Morse 
Cc: Will Deacon 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Dave Anderson 
Cc: Kazuhito Hagio 
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: ke...@lists.infradead.org
Signed-off-by: Bhupesh Sharma 
---
 Documentation/kdump/vmcoreinfo.txt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/kdump/vmcoreinfo.txt b/Documentation/kdump/vmcoreinfo.txt
index bb94a4bd597a..f5a11388dc49 100644
--- a/Documentation/kdump/vmcoreinfo.txt
+++ b/Documentation/kdump/vmcoreinfo.txt
@@ -95,6 +95,11 @@ It exists in the sparse memory mapping model, and it is also somewhat
 similar to the mem_map variable, both of them are used to translate an
 address.
 
+MAX_PHYSMEM_BITS
+
+
+Defines the maximum supported physical address space memory.
+
 page
 
 
-- 
2.7.4



[PATCH v3 2/3] crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo

2019-03-19 Thread Bhupesh Sharma
Right now user-space tools like 'makedumpfile' and 'crash' need to rely
on a best-guess method of determining the value of 'MAX_PHYSMEM_BITS'
supported by the underlying kernel.

This value is used in user-space code to calculate the bit-space
required to store a section for SPARSEMEM (similar to the existing
calculation method used in the kernel implementation):

  #define SECTIONS_SHIFT  (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
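To illustrate why a wrong guess matters, the sketch below evaluates the same formula for two candidate values. It assumes SECTION_SIZE_BITS = 30 and compares MAX_PHYSMEM_BITS = 52 against 48; treat these as illustrative assumptions rather than values read from any particular kernel. A tool that guesses 48 on a 52-bit kernel ends up with a section-number field four bits too narrow.

```c
#include <assert.h>

/* Same formula as the SECTIONS_SHIFT macro above, as a function. */
static int sections_shift(int max_physmem_bits, int section_size_bits)
{
	assert(max_physmem_bits > section_size_bits);
	return max_physmem_bits - section_size_bits;
}

/* Number of memory sections addressable with that many bits. */
static unsigned long nr_mem_sections(int max_physmem_bits, int section_size_bits)
{
	return 1UL << sections_shift(max_physmem_bits, section_size_bits);
}
```

With the assumed values, 52 - 30 gives a shift of 22 (4194304 sections), while 48 - 30 gives only 18 (262144 sections).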

Now, regressions have been reported in user-space utilities
like 'makedumpfile' and 'crash' on arm64, with the recently added
kernel support for 52-bit physical address space, as there is
no clear method of determining this value in user-space
(other than reading kernel CONFIG flags).

As per the suggestion from the makedumpfile maintainer (Kazu), it makes more
sense to append 'MAX_PHYSMEM_BITS' to vmcoreinfo in the core code itself
rather than in arch-specific code, so that the user-space code for other
archs can also benefit from this addition to the vmcoreinfo and use it
as a standard way of determining 'SECTIONS_SHIFT' value in user-land.

A reference 'makedumpfile' implementation which reads the
'MAX_PHYSMEM_BITS' value from vmcoreinfo in an arch-independent fashion
is available here:

[0]. 
https://github.com/bhupesh-sharma/makedumpfile/blob/remove-max-phys-mem-bit-v1/arch/ppc64.c#L471
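VMCOREINFO_NUMBER(name) appends a line of the form `NUMBER(name)=value` to the vmcoreinfo note. With this patch, the note should therefore carry a line such as `NUMBER(MAX_PHYSMEM_BITS)=52`. The following is a minimal sketch of how a user-space tool might extract that value from the note text. It is not makedumpfile's actual code, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser: scan vmcoreinfo text (one "KEY=value" entry per
 * line) for "NUMBER(<name>)=" and return the value, or -1 if absent.
 * Note that -1 is ambiguous if the kernel ever exports a negative number.
 */
static long vmcoreinfo_number(const char *info, const char *name)
{
	char key[128];
	const char *line = info;
	size_t klen;
	int n;

	assert(info != NULL && name != NULL);

	/* Build the "NUMBER(name)=" prefix that VMCOREINFO_NUMBER() emits. */
	n = snprintf(key, sizeof(key), "NUMBER(%s)=", name);
	if (n < 0 || (size_t)n >= sizeof(key))
		return -1;
	klen = (size_t)n;

	while (line != NULL && *line != '\0') {
		if (strncmp(line, key, klen) == 0)
			return strtol(line + klen, NULL, 10);
		line = strchr(line, '\n');
		if (line != NULL)
			line++;	/* advance to the start of the next line */
	}
	return -1;
}
```

Because the match includes the closing `)=`, a lookup for `MAX_PHYSMEM_BITS` cannot accidentally match a longer key that merely shares the prefix.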

Cc: Boris Petkov 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: James Morse 
Cc: Will Deacon 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Dave Anderson 
Cc: Kazuhito Hagio 
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: ke...@lists.infradead.org
Signed-off-by: Bhupesh Sharma 
---
 kernel/crash_core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 093c9f917ed0..495f09084696 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -415,6 +415,7 @@ static int __init crash_save_vmcoreinfo_init(void)
VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS);
VMCOREINFO_STRUCT_SIZE(mem_section);
VMCOREINFO_OFFSET(mem_section, section_mem_map);
+   VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS);
 #endif
VMCOREINFO_STRUCT_SIZE(page);
VMCOREINFO_STRUCT_SIZE(pglist_data);
-- 
2.7.4



[PATCH v3 0/3] Append new variables to vmcoreinfo (PTRS_PER_PGD for arm64 and MAX_PHYSMEM_BITS for all archs)

2019-03-19 Thread Bhupesh Sharma
Changes since v2:

- v2 can be seen here:
  http://lists.infradead.org/pipermail/kexec/2019-March/022531.html
- Protected 'MAX_PHYSMEM_BITS' vmcoreinfo variable under CONFIG_SPARSEMEM
  ifdef sections, as suggested by Kazu.
- Updated vmcoreinfo documentation to add description about
  'MAX_PHYSMEM_BITS' variable (via [PATCH 3/3]).

Changes since v1:

- v1 was sent out as a single patch which can be seen here:
  http://lists.infradead.org/pipermail/kexec/2019-February/022411.html

- v2 breaks the single patch into two independent patches:
  [PATCH 1/2] appends 'PTRS_PER_PGD' to vmcoreinfo for arm64 arch, whereas
  [PATCH 2/2] appends 'MAX_PHYSMEM_BITS' to vmcoreinfo in core kernel code (all archs)

This patchset primarily fixes the regressions reported in user-space
utilities like 'makedumpfile' and 'crash-utility' on the arm64 architecture
now that the underlying kernel supports a 52-bit address space. These
regressions have been reported both on CPUs which don't support the
ARMv8.2 extensions (i.e. LVA, LPA) while running newer kernels, and on
prototype platforms (like the ARMv8 FVP simulator model) which do support
the ARMv8.2 extensions and run newer kernels.

The reason for these regressions is that right now user-space tools
have no direct access to these values (since these are not exported
from the kernel) and hence need to rely on a best-guess method of
determining the values of 'PTRS_PER_PGD' and 'MAX_PHYSMEM_BITS' supported
by the underlying kernel.

Exporting these values via vmcoreinfo will help user-land in such cases.
In addition, as per the suggestion from the makedumpfile maintainer (Kazu),
it makes more sense to append 'MAX_PHYSMEM_BITS' to
vmcoreinfo in the core code itself rather than in arm64 arch-specific
code, so that the user-space code for other archs can also benefit from
this addition to the vmcoreinfo and use it as a standard way of
determining 'SECTIONS_SHIFT' value in user-land.

Cc: Mark Rutland 
Cc: James Morse 
Cc: Will Deacon 
Cc: Boris Petkov 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Dave Anderson 
Cc: Kazuhito Hagio 
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: ke...@lists.infradead.org

Bhupesh Sharma (3):
  arm64, vmcoreinfo : Append 'PTRS_PER_PGD' to vmcoreinfo
  crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo
  Documentation/vmcoreinfo: Add documentation for 'MAX_PHYSMEM_BITS'

 Documentation/kdump/vmcoreinfo.txt | 5 +
 arch/arm64/kernel/crash_core.c | 1 +
 kernel/crash_core.c| 1 +
 3 files changed, 7 insertions(+)

-- 
2.7.4



Re: [PATCH v3 06/17] KVM: PPC: Book3S HV: XIVE: add controls for the EQ configuration

2019-03-19 Thread David Gibson
On Tue, Mar 19, 2019 at 04:47:20PM +0100, Cédric Le Goater wrote:
> On 3/19/19 5:54 AM, David Gibson wrote:
> > On Mon, Mar 18, 2019 at 03:12:10PM +0100, Cédric Le Goater wrote:
> >> On 3/18/19 4:23 AM, David Gibson wrote:
> >>> On Fri, Mar 15, 2019 at 01:05:58PM +0100, Cédric Le Goater wrote:
>  These controls will be used by the H_INT_SET_QUEUE_CONFIG and
>  H_INT_GET_QUEUE_CONFIG hcalls from QEMU to configure the underlying
>  Event Queue in the XIVE IC. They will also be used to restore the
>  configuration of the XIVE EQs and to capture the internal run-time
>  state of the EQs. Both 'get' and 'set' rely on an OPAL call to access
>  the EQ toggle bit and EQ index which are updated by the XIVE IC when
>  event notifications are enqueued in the EQ.
> 
>  The guest physical address of the event queue is saved in the
>  internal XIVE xive_q structure for later use, when migration needs
>  to mark the EQ pages dirty to capture a consistent memory state of
>  the VM.
> 
>  Note that H_INT_SET_QUEUE_CONFIG does not require the extra
>  OPAL call setting the EQ toggle bit and EQ index to configure the EQ,
>  but restoring the EQ state will.
> 
>  Signed-off-by: Cédric Le Goater 
>  ---
> 
>   Changes since v2 :
>   
>   - fixed comments on the KVM device attribute definitions
>   - fixed check on supported EQ size to restrict to 64K pages
>   - checked kvm_eq.flags that need to be zero
>   - removed the OPAL call when EQ qtoggle bit and index are zero. 
> 
>   arch/powerpc/include/asm/xive.h|   2 +
>   arch/powerpc/include/uapi/asm/kvm.h|  21 ++
>   arch/powerpc/kvm/book3s_xive.h |   2 +
>   arch/powerpc/kvm/book3s_xive.c |  15 +-
>   arch/powerpc/kvm/book3s_xive_native.c  | 232 +
>   Documentation/virtual/kvm/devices/xive.txt |  31 +++
>   6 files changed, 297 insertions(+), 6 deletions(-)
> 
>  diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
>  index b579a943407b..46891f321606 100644
>  --- a/arch/powerpc/include/asm/xive.h
>  +++ b/arch/powerpc/include/asm/xive.h
>  @@ -73,6 +73,8 @@ struct xive_q {
>   u32 esc_irq;
>   atomic_t count;
>   atomic_t pending_count;
>  +u64 guest_qpage;
>  +u32 guest_qsize;
>   };
>   
>   /* Global enable flags for the XIVE support */
>  diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
>  index 12bb01baf0ae..1cd728c87d7c 100644
>  --- a/arch/powerpc/include/uapi/asm/kvm.h
>  +++ b/arch/powerpc/include/uapi/asm/kvm.h
>  @@ -679,6 +679,7 @@ struct kvm_ppc_cpu_char {
>   #define KVM_DEV_XIVE_GRP_CTRL   1
>   #define KVM_DEV_XIVE_GRP_SOURCE 2   /* 64-bit source identifier */
>   #define KVM_DEV_XIVE_GRP_SOURCE_CONFIG  3   /* 64-bit source identifier */
>  +#define KVM_DEV_XIVE_GRP_EQ_CONFIG  4   /* 64-bit EQ identifier */
>   
>   /* Layout of 64-bit XIVE source attribute values */
>   #define KVM_XIVE_LEVEL_SENSITIVE  (1ULL << 0)
>  @@ -694,4 +695,24 @@ struct kvm_ppc_cpu_char {
>   #define KVM_XIVE_SOURCE_EISN_SHIFT  33
>   #define KVM_XIVE_SOURCE_EISN_MASK   0xfffeULL
>   
>  +/* Layout of 64-bit EQ identifier */
>  +#define KVM_XIVE_EQ_PRIORITY_SHIFT  0
>  +#define KVM_XIVE_EQ_PRIORITY_MASK   0x7
>  +#define KVM_XIVE_EQ_SERVER_SHIFT    3
>  +#define KVM_XIVE_EQ_SERVER_MASK 0xfff8ULL
>  +
>  +/* Layout of EQ configuration values (64 bytes) */
>  +struct kvm_ppc_xive_eq {
>  +__u32 flags;
>  +__u32 qsize;
>  +__u64 qpage;
>  +__u32 qtoggle;
>  +__u32 qindex;
>  +__u8  pad[40];
>  +};
>  +
>  +#define KVM_XIVE_EQ_FLAG_ENABLED    0x0001
>  +#define KVM_XIVE_EQ_FLAG_ALWAYS_NOTIFY  0x0002
>  +#define KVM_XIVE_EQ_FLAG_ESCALATE   0x0004
>  +
>   #endif /* __LINUX_KVM_POWERPC_H */
>  diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
>  index ae26fe653d98..622f594d93e1 100644
>  --- a/arch/powerpc/kvm/book3s_xive.h
>  +++ b/arch/powerpc/kvm/book3s_xive.h
>  @@ -272,6 +272,8 @@ struct kvmppc_xive_src_block *kvmppc_xive_create_src_block(
>   struct kvmppc_xive *xive, int irq);
>   void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb);
>   int kvmppc_xive_select_target(struct kvm *kvm, u32 *server, u8 prio);
>  +int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
> 
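The 64-bit EQ identifier packs two fields. The priority sits in bits 0-2 (KVM_XIVE_EQ_PRIORITY_MASK is 0x7), and the server number starts at bit KVM_XIVE_EQ_SERVER_SHIFT (3). Below is a user-space sketch of building and splitting such an identifier. It also checks the struct layout against the "(64 bytes)" comment. The shift and priority-mask values are copied from the quoted header. The server field is decoded by shifting alone rather than with KVM_XIVE_EQ_SERVER_MASK, whose full value is not legible above. The local macro and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EQ_PRIORITY_SHIFT 0
#define EQ_PRIORITY_MASK  0x7ULL
#define EQ_SERVER_SHIFT   3

/* Local copy of the proposed kvm_ppc_xive_eq layout, using <stdint.h> types. */
struct xive_eq_layout {
	uint32_t flags;
	uint32_t qsize;
	uint64_t qpage;
	uint32_t qtoggle;
	uint32_t qindex;
	uint8_t  pad[40];
};

/* 4 + 4 + 8 + 4 + 4 + 40 = 64 bytes, with no implicit padding. */
_Static_assert(sizeof(struct xive_eq_layout) == 64, "EQ config must be 64 bytes");
_Static_assert(offsetof(struct xive_eq_layout, qpage) == 8, "qpage is 8-byte aligned");

/* Pack a server number and a priority (0-7) into an EQ identifier. */
static uint64_t eq_id_encode(uint32_t server, uint8_t prio)
{
	assert(prio <= EQ_PRIORITY_MASK);
	return ((uint64_t)server << EQ_SERVER_SHIFT) |
	       ((uint64_t)prio << EQ_PRIORITY_SHIFT);
}

static uint8_t eq_id_priority(uint64_t id)
{
	return (uint8_t)((id >> EQ_PRIORITY_SHIFT) & EQ_PRIORITY_MASK);
}

static uint32_t eq_id_server(uint64_t id)
{
	return (uint32_t)(id >> EQ_SERVER_SHIFT);
}
```

For example, server 5 at priority 6 encodes as (5 << 3) | 6 = 46.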

[PATCH 7/8] powerpc/eeh: EEH for pSeries hot plug

2019-03-19 Thread Sam Bobroff
On PowerNV and pSeries, devices currently acquire EEH support from
several different places: Boot-time devices from eeh_probe_devices()
and eeh_addr_cache_build(), Virtual Function devices from the pcibios
bus add device hooks and hot plugged devices from pci_hp_add_devices()
(with other platforms using other methods as well).  Unfortunately,
pSeries machines currently discover hot plugged devices using
pci_rescan_bus(), not pci_hp_add_devices(), and so those devices do
not receive EEH support.

Rather than adding another case for pci_rescan_bus(), this change
widens the scope of the pcibios bus add device hooks so that they can
handle all devices. As a side effect this also supports devices
discovered after manually rescanning via /sys/bus/pci/rescan.

Note that on PowerNV, this change allows the EEH subsystem to become
enabled after boot as long as it has not been forced off, which was
not previously possible (it was already possible on pSeries).

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/kernel/eeh.c|  2 +-
 arch/powerpc/kernel/of_platform.c|  3 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c |  8 ++-
 arch/powerpc/platforms/pseries/eeh_pseries.c | 54 ++--
 4 files changed, 35 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 7a406d58d2c0..217e14bb1fb6 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1291,7 +1291,7 @@ void eeh_add_device_late(struct pci_dev *dev)
struct pci_dn *pdn;
struct eeh_dev *edev;
 
-   if (!dev || !eeh_enabled())
+   if (!dev)
return;
 
pr_debug("EEH: Adding device %s\n", pci_name(dev));
diff --git a/arch/powerpc/kernel/of_platform.c b/arch/powerpc/kernel/of_platform.c
index becaec990140..d5818e9c4069 100644
--- a/arch/powerpc/kernel/of_platform.c
+++ b/arch/powerpc/kernel/of_platform.c
@@ -86,7 +86,8 @@ static int of_pci_phb_probe(struct platform_device *dev)
pcibios_claim_one_bus(phb->bus);
 
/* Finish EEH setup */
-   eeh_add_device_tree_late(phb->bus);
+   if (!eeh_has_flag(EEH_FORCE_DISABLED))
+   eeh_add_device_tree_late(phb->bus);
 
/* Add probed PCI devices to the device model */
pci_bus_add_devices(phb->bus);
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 51c5b6bb9b0e..81b0923cc55f 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -47,7 +47,7 @@ void pnv_pcibios_bus_add_device(struct pci_dev *pdev)
 {
struct pci_dn *pdn = pci_get_pdn(pdev);
 
-   if (!pdev->is_virtfn)
+   if (eeh_has_flag(EEH_FORCE_DISABLED))
return;
 
pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
@@ -479,7 +479,11 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
 * Enable EEH explicitly so that we will do EEH check
 * while accessing I/O stuff
 */
-   eeh_add_flag(EEH_ENABLED);
+   if (!eeh_has_flag(EEH_ENABLED)) {
+   enable_irq(eeh_event_irq);
+   eeh_add_flag(EEH_PHB_ENABLED);
+   eeh_add_flag(EEH_ENABLED);
+   }
 
/* Save memory bars */
eeh_save_bars(edev);
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index ae06878fbdea..e68c79164974 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -55,44 +55,44 @@ static int ibm_get_config_addr_info;
 static int ibm_get_config_addr_info2;
 static int ibm_configure_pe;
 
-#ifdef CONFIG_PCI_IOV
 void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
 {
struct pci_dn *pdn = pci_get_pdn(pdev);
-   struct pci_dn *physfn_pdn;
-   struct eeh_dev *edev;
 
-   if (!pdev->is_virtfn)
+   if (eeh_has_flag(EEH_FORCE_DISABLED))
return;
 
pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
+#ifdef CONFIG_PCI_IOV
+   if (pdev->is_virtfn) {
+   struct pci_dn *physfn_pdn;
 
-   pdn->device_id  =  pdev->device;
-   pdn->vendor_id  =  pdev->vendor;
-   pdn->class_code =  pdev->class;
-   /*
-* Last allow unfreeze return code used for retrieval
-* by user space in eeh-sysfs to show the last command
-* completion from platform.
-*/
-   pdn->last_allow_rc =  0;
-   physfn_pdn  =  pci_get_pdn(pdev->physfn);
-   pdn->pe_number  =  physfn_pdn->pe_num_map[pdn->vf_index];
-   edev = pdn_to_eeh_dev(pdn);
-
-   /*
-* The following operations will fail if VF's sysfs files
-* aren't created or its resources aren't finalized.
-*/
+   pdn->device_id  =  pdev->device;
+   pdn->vendor_id  =  pdev->vendor;
+   pdn->class_code =  pdev->class;
+  
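With this change, the bus-add hooks no longer return early for non-VF devices; their only early exit is EEH_FORCE_DISABLED. In addition, pnv_eeh_probe() turns EEH (together with the new EEH_PHB_ENABLED flag) on the first time a device is probed, instead of setting EEH_ENABLED unconditionally. Below is a simplified user-space model of that flag logic. EEH_PHB_ENABLED (0x80) comes from this series, but the values used for EEH_ENABLED and EEH_FORCE_DISABLED are assumptions made for the model. The IRQ enable is reduced to a counter, and the model_* function names are hypothetical stand-ins for the kernel paths.

```c
#include <assert.h>
#include <stdbool.h>

/* EEH_PHB_ENABLED is from the series; the other two values are assumed. */
#define EEH_ENABLED        0x01
#define EEH_FORCE_DISABLED 0x02
#define EEH_PHB_ENABLED    0x80

static int eeh_flags;
static int irq_enable_calls;	/* stands in for enable_irq(eeh_event_irq) */

static bool has_flag(int f)
{
	return (eeh_flags & f) != 0;
}

/* Same condition as eeh_enabled() in asm/eeh.h. */
static bool model_eeh_enabled(void)
{
	return has_flag(EEH_ENABLED) && !has_flag(EEH_FORCE_DISABLED);
}

/* Models the enable-once block added to pnv_eeh_probe(). */
static void model_probe_enable(void)
{
	if (!has_flag(EEH_ENABLED)) {
		irq_enable_calls++;
		eeh_flags |= EEH_PHB_ENABLED | EEH_ENABLED;
	}
}

/* Models the widened bus-add hook; returns true if EEH setup ran. */
static bool model_bus_add_device(void)
{
	if (has_flag(EEH_FORCE_DISABLED))
		return false;
	model_probe_enable();
	return true;
}
```

In this model, adding several devices after boot enables the event IRQ exactly once. Forcing EEH off, by contrast, keeps every later device from being set up.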

[PATCH 8/8] powerpc/eeh: Remove eeh_probe_devices() and eeh_addr_cache_build()

2019-03-19 Thread Sam Bobroff
Now that EEH support for all devices (on PowerNV and pSeries) is
provided by the pcibios bus add device hooks, eeh_probe_devices() and
eeh_addr_cache_build() are redundant and can be removed.

Note that previously on pSeries, useless EEH sysfs files were created
for some devices that did not have EEH support and this change
prevents them from being created.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/include/asm/eeh.h   |  6 
 arch/powerpc/kernel/eeh.c| 13 
 arch/powerpc/kernel/eeh_cache.c  | 32 
 arch/powerpc/platforms/powernv/eeh-powernv.c |  5 ++-
 arch/powerpc/platforms/pseries/pci.c |  3 +-
 5 files changed, 3 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 791b9e6fcc45..f1eca1757cbc 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -290,13 +290,11 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
 void eeh_show_enabled(void);
-void eeh_probe_devices(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
 int eeh_check_failure(const volatile void __iomem *token);
 int eeh_dev_check_failure(struct eeh_dev *edev);
 void eeh_addr_cache_init(void);
-void eeh_addr_cache_build(void);
 void eeh_add_device_early(struct pci_dn *);
 void eeh_add_device_tree_early(struct pci_dn *);
 void eeh_add_device_late(struct pci_dev *);
@@ -347,8 +345,6 @@ static inline bool eeh_phb_enabled(void)
return false;
 }
 
-static inline void eeh_probe_devices(void) { }
-
 static inline void *eeh_dev_init(struct pci_dn *pdn, void *data)
 {
return NULL;
@@ -365,8 +361,6 @@ static inline int eeh_check_failure(const volatile void __iomem *token)
 
 static inline void eeh_addr_cache_init(void) { }
 
-static inline void eeh_addr_cache_build(void) { }
-
 static inline void eeh_add_device_early(struct pci_dn *pdn) { }
 
 static inline void eeh_add_device_tree_early(struct pci_dn *pdn) { }
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 217e14bb1fb6..cd2abbe41497 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1166,19 +1166,6 @@ static struct notifier_block eeh_reboot_nb = {
.notifier_call = eeh_reboot_notifier,
 };
 
-void eeh_probe_devices(void)
-{
-   struct pci_controller *hose, *tmp;
-   struct pci_dn *pdn;
-
-   /* Enable EEH for all adapters */
-   list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-   pdn = hose->pci_data;
-   traverse_pci_dn(pdn, eeh_ops->probe, NULL);
-   }
-   eeh_show_enabled();
-}
-
 /**
  * eeh_init - EEH initialization
  *
diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index f93dd5cf6a39..c40078d036af 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -278,38 +278,6 @@ void eeh_addr_cache_init(void)
spin_lock_init(&pci_io_addr_cache_root.piar_lock);
 }
 
-/**
- * eeh_addr_cache_build - Build a cache of I/O addresses
- *
- * Build a cache of pci i/o addresses.  This cache will be used to
- * find the pci device that corresponds to a given address.
- * This routine scans all pci busses to build the cache.
- * Must be run late in boot process, after the pci controllers
- * have been scanned for devices (after all device resources are known).
- */
-void eeh_addr_cache_build(void)
-{
-   struct pci_dn *pdn;
-   struct eeh_dev *edev;
-   struct pci_dev *dev = NULL;
-
-   for_each_pci_dev(dev) {
-   pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
-   if (!pdn)
-   continue;
-
-   edev = pdn_to_eeh_dev(pdn);
-   if (!edev)
-   continue;
-
-   dev->dev.archdata.edev = edev;
-   edev->pdev = dev;
-
-   eeh_addr_cache_insert_dev(dev);
-   eeh_sysfs_add_device(dev);
-   }
-}
-
 static int eeh_addr_cache_show(struct seq_file *s, void *v)
 {
struct pci_io_addr_range *piar;
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 81b0923cc55f..6a08f4fab255 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -240,9 +240,7 @@ int pnv_eeh_post_init(void)
struct pnv_phb *phb;
int ret = 0;
 
-   /* Probe devices & build address cache */
-   eeh_probe_devices();
-   eeh_addr_cache_build();
+   eeh_show_enabled();
 
/* Register OPAL event notifier */
eeh_event_irq = opal_event_request(ilog2(OPAL_EVENT_PCI_ERROR));
@@ -360,6 +358,7 @@ static int pnv_eeh_find_ecap(struct pci_dn *pdn, int cap)
return 0;
 }
 
+
 /**
  * pnv_eeh_probe - Do probe on
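The removed eeh_addr_cache_build() populated the cache that maps a PCI I/O address back to the device that owns it. After this series, devices reach that cache through the per-device bus-add hooks instead. The kernel stores those ranges in a red-black tree (pci_io_addr_cache_root). The sketch below swaps that tree for a sorted array with binary search, purely to show the lookup idea. The types and names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one cached I/O address range. */
struct io_range {
	uint64_t start;		/* first address of the range */
	uint64_t end;		/* last address, inclusive */
	const char *dev;	/* owning device name */
};

/*
 * Binary search over non-overlapping ranges sorted by start address.
 * Returns the owning device, or NULL if addr falls in no range.
 */
static const char *io_range_lookup(const struct io_range *r, size_t n, uint64_t addr)
{
	size_t lo = 0, hi = n;

	assert(r != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < r[mid].start)
			hi = mid;
		else if (addr > r[mid].end)
			lo = mid + 1;
		else
			return r[mid].dev;
	}
	return NULL;
}
```

A tree keeps insertion cheap as devices are hot plugged one at a time. A sorted array only suits a static snapshot, which is why this is a sketch and not a replacement.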

[PATCH 2/8] powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag

2019-03-19 Thread Sam Bobroff
The EEH_DEV_NO_HANDLER flag is used by the EEH system to prevent the
use of driver callbacks in drivers that have been bound part way
through the recovery process. This is necessary to prevent later stage
handlers from being called when the earlier stage handlers haven't,
which can be confusing for drivers.

However, the flag is set for all devices that are added after boot
time and only cleared at the end of the EEH recovery process. This
results in hot plugged devices erroneously having the flag set during
the first recovery after they are added (causing their driver's
handlers to be incorrectly ignored).

To remedy this, clear the flag at the beginning of recovery
processing. The flag is still cleared at the end of recovery
processing, although it is no longer really necessary.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/kernel/eeh_driver.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 6f3ee30565dd..4c34b9901f15 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -819,6 +819,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
result = PCI_ERS_RESULT_DISCONNECT;
}
 
+   eeh_for_each_pe(pe, tmp_pe)
+   eeh_pe_for_each_dev(tmp_pe, edev, tmp)
+   edev->mode &= ~EEH_DEV_NO_HANDLER;
+
/* Walk the various device drivers attached to this slot through
 * a reset sequence, giving each an opportunity to do what it needs
 * to accomplish the reset.  Each child gets a report of the
-- 
2.19.0.2.gcad72f5712
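The fix walks every PE under the one being recovered and clears EEH_DEV_NO_HANDLER on each device before any driver callbacks run. The sketch below applies the same idea to a flat array of device mode words. The bit value and the flattened device list are hypothetical simplifications of the eeh_for_each_pe()/eeh_pe_for_each_dev() walk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit value; the real one is defined in asm/eeh.h. */
#define DEV_NO_HANDLER 0x20u

/* A driver's error handlers are skipped while the bit is set. */
static int handlers_allowed(unsigned int mode)
{
	return !(mode & DEV_NO_HANDLER);
}

/* Clear the stale bit on every device, leaving other mode bits intact. */
static void clear_no_handler(unsigned int *modes, size_t n)
{
	for (size_t i = 0; i < n; i++)
		modes[i] &= ~DEV_NO_HANDLER;
}
```

A hot-plugged device whose mode still carries the stale bit would have its handlers ignored. Once the bit is cleared at the start of recovery, the device is handled like any other.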



[PATCH 4/8] powerpc/eeh: Improve debug messages around device addition

2019-03-19 Thread Sam Bobroff
Also remove a useless comment.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/kernel/eeh.c|  2 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c | 14 
 arch/powerpc/platforms/pseries/eeh_pseries.c | 23 +++-
 3 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 8d3c36a1f194..b14d89547895 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1291,7 +1291,7 @@ void eeh_add_device_late(struct pci_dev *dev)
pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
edev = pdn_to_eeh_dev(pdn);
if (edev->pdev == dev) {
-   pr_debug("EEH: Already referenced !\n");
+   pr_debug("EEH: Device %s already referenced!\n", pci_name(dev));
return;
}
 
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index f0a95f663810..51c5b6bb9b0e 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -50,10 +50,7 @@ void pnv_pcibios_bus_add_device(struct pci_dev *pdev)
if (!pdev->is_virtfn)
return;
 
-   /*
-* The following operations will fail if VF's sysfs files
-* aren't created or its resources aren't finalized.
-*/
+   pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
eeh_add_device_early(pdn);
eeh_add_device_late(pdev);
eeh_sysfs_add_device(pdev);
@@ -389,6 +386,10 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
int ret;
int config_addr = (pdn->busno << 8) | (pdn->devfn);
 
+   pr_debug("%s: probing %04x:%02x:%02x.%01x\n",
+   __func__, hose->global_number, pdn->busno,
+   PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+
/*
 * When probing the root bridge, which doesn't have any
 * subordinate PCI devices. We don't have OF node for
@@ -483,6 +484,11 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
/* Save memory bars */
eeh_save_bars(edev);
 
+   pr_debug("%s: EEH enabled on %02x:%02x.%01x PHB#%x-PE#%x\n",
+   __func__, pdn->busno, PCI_SLOT(pdn->devfn),
+   PCI_FUNC(pdn->devfn), edev->pe->phb->global_number,
+   edev->pe->addr);
+
return NULL;
 }
 
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 7aa50258dd42..ae06878fbdea 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -65,6 +65,8 @@ void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
if (!pdev->is_virtfn)
return;
 
+   pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
+
pdn->device_id  =  pdev->device;
pdn->vendor_id  =  pdev->vendor;
pdn->class_code =  pdev->class;
@@ -251,6 +253,10 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
int enable = 0;
int ret;
 
+   pr_debug("%s: probing %04x:%02x:%02x.%01x\n",
+   __func__, pdn->phb->global_number, pdn->busno,
+   PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+
/* Retrieve OF node and eeh device */
edev = pdn_to_eeh_dev(pdn);
if (!edev || edev->pe)
@@ -294,7 +300,12 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
 
/* Enable EEH on the device */
ret = eeh_ops->set_option(&pe, EEH_OPT_ENABLE);
-   if (!ret) {
+   if (ret) {
+   pr_debug("%s: EEH failed to enable on %02x:%02x.%01x PHB#%x-PE#%x (code %d)\n",
+   __func__, pdn->busno, PCI_SLOT(pdn->devfn),
+   PCI_FUNC(pdn->devfn), pe.phb->global_number,
+   pe.addr, ret);
+   } else {
/* Retrieve PE address */
edev->pe_config_addr = eeh_ops->get_pe_addr(&pe);
pe.addr = edev->pe_config_addr;
@@ -310,11 +321,6 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
if (enable) {
eeh_add_flag(EEH_ENABLED);
eeh_add_to_parent_pe(edev);
-
-   pr_debug("%s: EEH enabled on %02x:%02x.%01x PHB#%x-PE#%x\n",
-   __func__, pdn->busno, PCI_SLOT(pdn->devfn),
-   PCI_FUNC(pdn->devfn), pe.phb->global_number,
-   pe.addr);
} else if (pdn->parent && pdn_to_eeh_dev(pdn->parent) &&
   (pdn_to_eeh_dev(pdn->parent))->pe) {
/* This device doesn't support EEH, but it may have an
@@ -323,6 +329,11 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
edev->pe_config_addr = 
pdn_to_eeh_dev(pdn->parent)->pe_config_addr;
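The new messages print a device as domain:bus:slot.function by splitting pdn->devfn with PCI_SLOT() and PCI_FUNC(). pnv_eeh_probe() also builds its config address as (busno << 8) | devfn. Below is a user-space sketch of the same formatting. The two macros are re-defined locally with the bit layout from the kernel's PCI headers (slot in bits 3-7, function in bits 0-2). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same bit layout as the kernel's PCI_SLOT()/PCI_FUNC(). */
#define MY_PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define MY_PCI_FUNC(devfn) ((devfn) & 0x07)

/* Format like the "probing %04x:%02x:%02x.%01x" debug message. */
static int format_pci_location(char *buf, size_t len, unsigned int domain,
			       unsigned int busno, unsigned int devfn)
{
	return snprintf(buf, len, "%04x:%02x:%02x.%01x", domain, busno,
			MY_PCI_SLOT(devfn), MY_PCI_FUNC(devfn));
}

/* Config address as computed at the top of pnv_eeh_probe(). */
static int config_addr(unsigned int busno, unsigned int devfn)
{
	return (int)((busno << 8) | devfn);
}
```

For example, devfn 0x11 on bus 1 of domain 0 is slot 2, function 1, which prints as 0000:01:02.1 with config address 0x111.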
  

[PATCH 0/8]

2019-03-19 Thread Sam Bobroff
Hi all,

This patch set adds support for EEH recovery of hot plugged devices on pSeries
machines. Specifically, devices discovered by PCI rescanning using
/sys/bus/pci/rescan, which includes devices hotplugged by QEMU's device_add
command. (pSeries doesn't currently use slot power control for hotplugging.)

As a side effect, this also provides EEH support, on all platforms, for devices
removed via /sys/bus/pci/devices/*/remove and rediscovered by writing to
/sys/bus/pci/rescan.

The approach I've taken is to use the fact that the existing
pcibios_bus_add_device() platform hooks (which are used to set up EEH on
Virtual Function devices (VFs)) are actually called for all devices, so I've
widened their scope and made other adjustments necessary to allow them to work
for hotplugged and boot-time devices as well.

Because some of the changes are in generic PowerPC code, it's
possible that I've disturbed something for another PowerPC platform. I've tried
to minimize this by leaving that code alone as much as possible and so there
are a few cases where eeh_add_device_{early,late}() or eeh_add_sysfs_files() is
called more than once. I think these can be looked at later, as duplicate calls
are not harmful.

The patch "Convert PNV_PHB_FLAG_EEH" isn't strictly necessary, and I'm not sure
whether it's better to keep it (because it simplifies the code) or drop it
(because we may need a separate flag per PHB later on). Thoughts, anyone?

The first patch is a rework of the pcibios_init reordering patch I posted
earlier, which I've included here because it's necessary for this set.

I have done some testing for PowerNV on Power9 using a modified pnv_php module
and some testing on pSeries with slot power control using a modified rpaphp
module, and the EEH-related parts seem to work.

Cheers,
Sam.

Sam Bobroff (8):
  powerpc/64: Adjust order in pcibios_init()
  powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
  powerpc/eeh: Convert PNV_PHB_FLAG_EEH to global flag
  powerpc/eeh: Improve debug messages around device addition
  powerpc/eeh: Add eeh_show_enabled()
  powerpc/eeh: Initialize EEH address cache earlier
  powerpc/eeh: EEH for pSeries hot plug
  powerpc/eeh: Remove eeh_probe_devices() and eeh_addr_cache_build()

 arch/powerpc/include/asm/eeh.h   | 19 +++--
 arch/powerpc/kernel/eeh.c| 33 -
 arch/powerpc/kernel/eeh_cache.c  | 29 +---
 arch/powerpc/kernel/eeh_driver.c |  4 ++
 arch/powerpc/kernel/of_platform.c|  3 +-
 arch/powerpc/kernel/pci-common.c |  4 --
 arch/powerpc/kernel/pci_32.c |  4 ++
 arch/powerpc/kernel/pci_64.c | 12 +++-
 arch/powerpc/platforms/powernv/eeh-powernv.c | 41 +--
 arch/powerpc/platforms/powernv/pci.c |  7 +-
 arch/powerpc/platforms/powernv/pci.h |  2 -
 arch/powerpc/platforms/pseries/eeh_pseries.c | 75 +++-
 arch/powerpc/platforms/pseries/pci.c |  7 +-
 13 files changed, 122 insertions(+), 118 deletions(-)

-- 
2.19.0.2.gcad72f5712



[PATCH 3/8] powerpc/eeh: Convert PNV_PHB_FLAG_EEH to global flag

2019-03-19 Thread Sam Bobroff
The PHB flag, PNV_PHB_FLAG_EEH, is set (on PowerNV) individually on
each PHB once the EEH subsystem is ready. It is the only use of the
flags member of the phb struct.

However, there is no need to store this separately on each PHB, so
convert it to a global flag. For symmetry, the flag is now also set
for pSeries; although it is currently unused, it may be useful in the
future.
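The idea can be pictured with a small userspace model: instead of every PHB carrying its own copy of the same bit, one subsystem-wide flag word holds it and a helper reads it without needing a PHB pointer. This is a sketch only; the `model_*` names are hypothetical stand-ins that loosely mirror `eeh_add_flag()`, `eeh_has_flag()` and `eeh_phb_enabled()` from the patch, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit; the patch itself uses EEH_PHB_ENABLED (0x80). */
#define MODEL_PHB_ENABLED 0x80

/* One subsystem-wide flag word instead of a flags member in each PHB. */
static int model_subsystem_flags;

static void model_add_flag(int flag)
{
	model_subsystem_flags |= flag;
}

static bool model_has_flag(int flag)
{
	return (model_subsystem_flags & flag) != 0;
}

/* Callers no longer need a PHB to ask "does PHB recovery use EEH?". */
bool model_phb_enabled(void)
{
	return model_has_flag(MODEL_PHB_ENABLED);
}

/* Set once after EEH initialization, only if EEH ended up enabled. */
void model_post_init(bool eeh_enabled)
{
	if (eeh_enabled)
		model_add_flag(MODEL_PHB_ENABLED);
}

void model_flag_demo(void)
{
	assert(!model_phb_enabled());	/* nothing set yet */
	model_post_init(true);
	assert(model_phb_enabled());
}
```

The trade-off raised in the cover letter shows up directly here: a single bit cannot express "EEH on for one PHB, off for another".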

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/include/asm/eeh.h   | 11 +++
 arch/powerpc/platforms/powernv/eeh-powernv.c | 14 +++---
 arch/powerpc/platforms/powernv/pci.c |  7 +++
 arch/powerpc/platforms/powernv/pci.h |  2 --
 arch/powerpc/platforms/pseries/pci.c |  4 
 5 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 3613a56281f2..fe4cf7208890 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -43,6 +43,7 @@ struct pci_dn;
 #define EEH_VALID_PE_ZERO      0x10 /* PE#0 is valid */
 #define EEH_ENABLE_IO_FOR_LOG  0x20 /* Enable IO for log */
 #define EEH_EARLY_DUMP_LOG     0x40 /* Dump log immediately */
+#define EEH_PHB_ENABLED        0x80 /* PHB recovery uses EEH */
 
 /*
  * Delay for PE reset, all in ms
@@ -245,6 +246,11 @@ static inline bool eeh_enabled(void)
return eeh_has_flag(EEH_ENABLED) && !eeh_has_flag(EEH_FORCE_DISABLED);
 }
 
+static inline bool eeh_phb_enabled(void)
+{
+   return eeh_has_flag(EEH_PHB_ENABLED);
+}
+
 static inline void eeh_serialize_lock(unsigned long *flags)
 {
raw_spin_lock_irqsave(&confirm_error_lock, *flags);
@@ -332,6 +338,11 @@ static inline bool eeh_enabled(void)
 return false;
 }
 
+static inline bool eeh_phb_enabled(void)
+{
+   return false;
+}
+
 static inline void eeh_probe_devices(void) { }
 
 static inline void *eeh_dev_init(struct pci_dn *pdn, void *data)
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 6fc1a463b796..f0a95f663810 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -264,22 +264,14 @@ int pnv_eeh_post_init(void)
return ret;
}
 
-   if (!eeh_enabled())
+   if (eeh_enabled())
+   eeh_add_flag(EEH_PHB_ENABLED);
+   else
disable_irq(eeh_event_irq);
 
list_for_each_entry(hose, &hose_list, list_node) {
phb = hose->private_data;
 
-   /*
-* If EEH is enabled, we're going to rely on that.
-* Otherwise, we restore to conventional mechanism
-* to clear frozen PE during PCI config access.
-*/
-   if (eeh_enabled())
-   phb->flags |= PNV_PHB_FLAG_EEH;
-   else
-   phb->flags &= ~PNV_PHB_FLAG_EEH;
-
/* Create debugfs entries */
 #ifdef CONFIG_DEBUG_FS
if (phb->has_dbgfs || !phb->dbgfs)
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 307181fd8a17..d2b50f3bf6b1 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -717,10 +717,9 @@ int pnv_pci_cfg_write(struct pci_dn *pdn,
 static bool pnv_pci_cfg_check(struct pci_dn *pdn)
 {
struct eeh_dev *edev = NULL;
-   struct pnv_phb *phb = pdn->phb->private_data;
 
/* EEH not enabled ? */
-   if (!(phb->flags & PNV_PHB_FLAG_EEH))
+   if (!eeh_phb_enabled())
return true;
 
/* PE reset or device removed ? */
@@ -761,7 +760,7 @@ static int pnv_pci_read_config(struct pci_bus *bus,
 
ret = pnv_pci_cfg_read(pdn, where, size, val);
phb = pdn->phb->private_data;
-   if (phb->flags & PNV_PHB_FLAG_EEH && pdn->edev) {
+   if (eeh_phb_enabled() && pdn->edev) {
if (*val == EEH_IO_ERROR_VALUE(size) &&
eeh_dev_check_failure(pdn->edev))
 return PCIBIOS_DEVICE_NOT_FOUND;
@@ -789,7 +788,7 @@ static int pnv_pci_write_config(struct pci_bus *bus,
 
ret = pnv_pci_cfg_write(pdn, where, size, val);
phb = pdn->phb->private_data;
-   if (!(phb->flags & PNV_PHB_FLAG_EEH))
+   if (!eeh_phb_enabled())
pnv_pci_config_check_eeh(pdn);
 
return ret;
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 8e36da379252..eb0add61397b 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -85,8 +85,6 @@ struct pnv_ioda_pe {
struct list_headlist;
 };
 
-#define PNV_PHB_FLAG_EEH   (1 << 0)
-
 struct pnv_phb {
struct pci_controller   *hose;
enum pnv_phb_type   type;
diff --git a/arch/powerpc/platforms/pseries/pci.c b/arch/powerpc/platform

[PATCH 5/8] powerpc/eeh: Add eeh_show_enabled()

2019-03-19 Thread Sam Bobroff
Move the EEH enabled message into its own function so that future
work can call it from multiple places.
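A minimal userspace model of the three-way decision in `eeh_show_enabled()` below, assuming the two inputs are plain booleans. The `model_eeh_status()` name is hypothetical, and it returns the message instead of calling `pr_info()`. Checking the forced-off case first means "eeh=off" wins even if the enabled flag were somehow set.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: pick the boot message from the two EEH states. */
const char *model_eeh_status(bool force_disabled, bool enabled)
{
	if (force_disabled)
		return "EEH: PCI Enhanced I/O Error Handling DISABLED (by eeh=off)";
	else if (enabled)
		return "EEH: PCI Enhanced I/O Error Handling ENABLED (capable adapter found)";
	else
		return "EEH: PCI Enhanced I/O Error Handling DISABLED (no capable adapter found)";
}

void model_status_demo(void)
{
	/* The forced-off message takes precedence over the enabled state. */
	assert(strstr(model_eeh_status(true, true), "eeh=off") != NULL);
}
```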

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/include/asm/eeh.h |  3 +++
 arch/powerpc/kernel/eeh.c  | 16 +++-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index fe4cf7208890..e217ccda55d0 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -289,6 +289,7 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 
 struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
+void eeh_show_enabled(void);
 void eeh_probe_devices(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
@@ -338,6 +339,8 @@ static inline bool eeh_enabled(void)
 return false;
 }
 
+static inline void eeh_show_enabled(void) { }
+
 static inline bool eeh_phb_enabled(void)
 {
return false;
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index b14d89547895..3dcff29cb9b3 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -163,6 +163,16 @@ static int __init eeh_setup(char *str)
 }
 __setup("eeh=", eeh_setup);
 
+void eeh_show_enabled(void)
+{
+   if (eeh_has_flag(EEH_FORCE_DISABLED))
+   pr_info("EEH: PCI Enhanced I/O Error Handling DISABLED (by eeh=off)\n");
+   else if (eeh_enabled())
+   pr_info("EEH: PCI Enhanced I/O Error Handling ENABLED (capable adapter found)\n");
+   else
+   pr_info("EEH: PCI Enhanced I/O Error Handling DISABLED (no capable adapter found)\n");
+}
+
 /*
  * This routine captures assorted PCI configuration space data
  * for the indicated PCI device, and puts them into a buffer
@@ -1166,11 +1176,7 @@ void eeh_probe_devices(void)
pdn = hose->pci_data;
traverse_pci_dn(pdn, eeh_ops->probe, NULL);
}
-   if (eeh_enabled())
-   pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n");
-   else
-   pr_info("EEH: No capable adapters found\n");
-
+   eeh_show_enabled();
 }
 
 /**
-- 
2.19.0.2.gcad72f5712



[PATCH 6/8] powerpc/eeh: Initialize EEH address cache earlier

2019-03-19 Thread Sam Bobroff
The EEH address cache is currently initialized and populated by a
single function: eeh_addr_cache_build().  While the initial population
of the cache can only be done once resources are allocated,
initialization (just setting up a spinlock) could be done much
earlier.

So move the initialization step into a separate function and call it
from a core_initcall (rather than a subsys initcall).

This will allow future work to make use of the cache during boot time
PCI scanning.
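A minimal sketch of the two-phase pattern, assuming a `bool` can stand in for the spinlock so the model has no dependencies. The `model_*` names are hypothetical and this is not the code in eeh_cache.c. The point is that the init step depends on nothing PCI-related and can run early, while the build step only populates entries once resources exist.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the address cache state. */
struct model_addr_cache {
	bool lock_ready;	/* set by the early init step */
	int nr_entries;		/* filled in once resources are assigned */
};

static struct model_addr_cache model_cache;

/* Early phase: only prepares the lock; safe before PCI scanning. */
void model_addr_cache_init(void)
{
	model_cache.lock_ready = true;
}

/* Late phase: populates entries and relies on init having run first. */
int model_addr_cache_build(int nr_devices)
{
	if (!model_cache.lock_ready)
		return -1;	/* would be a bug: lock used before init */
	model_cache.nr_entries = nr_devices;
	return 0;
}

void model_cache_demo(void)
{
	model_addr_cache_init();
	assert(model_addr_cache_build(4) == 0);
	assert(model_cache.nr_entries == 4);
}
```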

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/include/asm/eeh.h  |  3 +++
 arch/powerpc/kernel/eeh.c   |  2 ++
 arch/powerpc/kernel/eeh_cache.c | 13 +++--
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index e217ccda55d0..791b9e6fcc45 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -295,6 +295,7 @@ int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
 int eeh_check_failure(const volatile void __iomem *token);
 int eeh_dev_check_failure(struct eeh_dev *edev);
+void eeh_addr_cache_init(void);
 void eeh_addr_cache_build(void);
 void eeh_add_device_early(struct pci_dn *);
 void eeh_add_device_tree_early(struct pci_dn *);
@@ -362,6 +363,8 @@ static inline int eeh_check_failure(const volatile void __iomem *token)
 
 #define eeh_dev_check_failure(x) (0)
 
+static inline void eeh_addr_cache_init(void) { }
+
 static inline void eeh_addr_cache_build(void) { }
 
 static inline void eeh_add_device_early(struct pci_dn *pdn) { }
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 3dcff29cb9b3..7a406d58d2c0 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1219,6 +1219,8 @@ static int eeh_init(void)
list_for_each_entry_safe(hose, tmp, &hose_list, list_node)
eeh_dev_phb_init_dynamic(hose);
 
+   eeh_addr_cache_init();
+
/* Initialize EEH event */
return eeh_event_init();
 }
diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index 9c68f0837385..f93dd5cf6a39 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -267,6 +267,17 @@ void eeh_addr_cache_rmv_dev(struct pci_dev *dev)
spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
 }
 
+/**
+ * eeh_addr_cache_init - Initialize a cache of I/O addresses
+ *
+ * Initialize a cache of pci i/o addresses.  This cache will be used to
+ * find the pci device that corresponds to a given address.
+ */
+void eeh_addr_cache_init(void)
+{
+   spin_lock_init(&pci_io_addr_cache_root.piar_lock);
+}
+
 /**
  * eeh_addr_cache_build - Build a cache of I/O addresses
  *
@@ -282,8 +293,6 @@ void eeh_addr_cache_build(void)
struct eeh_dev *edev;
struct pci_dev *dev = NULL;
 
-   spin_lock_init(&pci_io_addr_cache_root.piar_lock);
-
for_each_pci_dev(dev) {
pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
if (!pdn)
-- 
2.19.0.2.gcad72f5712



[PATCH 1/8] powerpc/64: Adjust order in pcibios_init()

2019-03-19 Thread Sam Bobroff
The pcibios_init() function for 64-bit PowerPC currently calls
pci_bus_add_devices() before pcibios_resource_survey(), which seems
incorrect because it adds devices and attempts to bind their drivers
before allocating their resources (although no problems have been
observed).

So move the call to pci_bus_add_devices() to after
pcibios_resource_survey(), while extracting the call to the
pcibios_fixup() hook so that it remains in the same location.

This will also allow the ppc_md.pcibios_bus_add_device() hooks to
perform actions that depend on PCI resources, both during rescanning
(where this is already the case) and at boot time, to support future
work.
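The new ordering can be checked with a small trace model, assuming each boot step simply records a letter: S = pcibios_scan_phb(), R = pcibios_resource_survey(), A = pci_bus_add_devices(), F = the pcibios_fixup() hook. The `model_*` names are hypothetical; this only illustrates the sequence in the 64-bit diff below.

```c
#include <assert.h>
#include <string.h>

static char model_trace[32];

/* Append one step letter to the trace, ignoring overflow. */
static void step(char c)
{
	size_t n = strlen(model_trace);

	if (n + 1 < sizeof(model_trace)) {
		model_trace[n] = c;
		model_trace[n + 1] = '\0';
	}
}

/* New order: scan every PHB, assign resources, add devices, then fixup. */
const char *model_pcibios_init(int nr_phbs)
{
	int i;

	model_trace[0] = '\0';
	for (i = 0; i < nr_phbs; i++)
		step('S');
	step('R');
	for (i = 0; i < nr_phbs; i++)
		step('A');
	step('F');
	return model_trace;
}

void model_order_demo(void)
{
	assert(strcmp(model_pcibios_init(2), "SSRAAF") == 0);
}
```

For comparison, the pre-patch code scanned and added devices per PHB and ran the fixup at the end of the survey, which for two PHBs would trace as "SASARF": devices were added before any resources were assigned.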

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/kernel/pci-common.c |  4 
 arch/powerpc/kernel/pci_32.c |  4 
 arch/powerpc/kernel/pci_64.c | 12 +---
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index ff4b7539cbdf..3146eb73e3b3 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1383,10 +1383,6 @@ void __init pcibios_resource_survey(void)
pr_debug("PCI: Assigning unassigned resources...\n");
pci_assign_unassigned_resources();
}
-
-   /* Call machine dependent fixup */
-   if (ppc_md.pcibios_fixup)
-   ppc_md.pcibios_fixup();
 }
 
 /* This is used by the PCI hotplug driver to allocate resource
diff --git a/arch/powerpc/kernel/pci_32.c b/arch/powerpc/kernel/pci_32.c
index d3f04f2d8249..40aaa1a6e193 100644
--- a/arch/powerpc/kernel/pci_32.c
+++ b/arch/powerpc/kernel/pci_32.c
@@ -259,6 +259,10 @@ static int __init pcibios_init(void)
/* Call common code to handle resource allocation */
pcibios_resource_survey();
 
+   /* Call machine dependent fixup */
+   if (ppc_md.pcibios_fixup)
+   ppc_md.pcibios_fixup();
+
/* Call machine dependent post-init code */
if (ppc_md.pcibios_after_init)
ppc_md.pcibios_after_init();
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 9d8c10d55407..6f16f30031d7 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -58,14 +58,20 @@ static int __init pcibios_init(void)
pci_add_flags(PCI_ENABLE_PROC_DOMAINS | PCI_COMPAT_DOMAIN_0);
 
/* Scan all of the recorded PCI controllers.  */
-   list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
+   list_for_each_entry_safe(hose, tmp, &hose_list, list_node)
pcibios_scan_phb(hose);
-   pci_bus_add_devices(hose->bus);
-   }
 
/* Call common code to handle resource allocation */
pcibios_resource_survey();
 
+   /* Add devices. */
+   list_for_each_entry_safe(hose, tmp, &hose_list, list_node)
+   pci_bus_add_devices(hose->bus);
+
+   /* Call machine dependent fixup */
+   if (ppc_md.pcibios_fixup)
+   ppc_md.pcibios_fixup();
+
printk(KERN_DEBUG "PCI: Probing PCI hardware done\n");
 
return 0;
-- 
2.19.0.2.gcad72f5712



Re: [PATCH kernel RFC 2/2] vfio-pci-nvlink2: Implement interconnect isolation

2019-03-19 Thread Alexey Kardashevskiy



On 20/03/2019 03:36, Alex Williamson wrote:
> On Fri, 15 Mar 2019 19:18:35 +1100
> Alexey Kardashevskiy  wrote:
> 
>> The NVIDIA V100 SXM2 GPUs are connected to the CPU via PCIe links and
>> (on POWER9) NVLinks. In addition to that, GPUs themselves have direct
>> peer to peer NVLinks in groups of 2 to 4 GPUs. At the moment the POWERNV
>> platform puts all interconnected GPUs into the same IOMMU group.
>>
>> However, the user may want to pass individual GPUs to userspace, and
>> to do so we need to put them into separate IOMMU groups and cut off
>> the interconnects.
>>
>> Thankfully, V100 GPUs implement an interface to do this by programming
>> a link disabling mask into BAR0 of a GPU. Once a link is disabled in a
>> GPU using this interface, it cannot be re-enabled until a secondary bus
>> reset is issued to the GPU.
>>
>> This defines a reset_done() handler for V100 NVlink2 device which
>> determines what links need to be disabled. This relies on presence
>> of the new "ibm,nvlink-peers" device tree property of a GPU telling which
>> PCI peers it is connected to (which includes NVLink bridges or peer GPUs).
>>
>> This does not change the existing behaviour and instead adds
>> a new "isolate_nvlink" kernel parameter to allow such isolation.
>>
>> The alternative approaches would be:
>>
>> 1. do this in the system firmware (skiboot) but for that we would need
>> to tell skiboot via an additional OPAL call whether or not we want this
>> isolation - skiboot is unaware of IOMMU groups.
>>
>> 2. do this in the secondary bus reset handler in the POWERNV platform -
>> the problem with that is at that point the device is not enabled, i.e.
>> config space is not restored so we need to enable the device (i.e. MMIO
>> bit in CMD register + program valid address to BAR0) in order to disable
>> links and then perhaps undo all this initialization to bring the device
>> back to the state where pci_try_reset_function() expects it to be.
> 
> The trouble seems to be that this approach only maintains the isolation
> exposed by the IOMMU group when vfio-pci is the active driver for the
> device.  IOMMU groups can be used by any driver and the IOMMU core is
> incorporating groups in various ways.  So, if there's a device specific
> way to configure the isolation reported in the group, which requires
> some sort of active management against things like secondary bus
> resets, then I think we need to manage it above the attached endpoint
> driver.

Fair point. So for now I'll go for 2) then.

> Ideally I'd see this as a set of PCI quirks so that we might
> leverage it beyond POWER platforms.  I'm not sure how we get past the
> reliance on device tree properties that we won't have on other
> platforms though, if only NVIDIA could at least open a spec addressing
> the discovery and configuration of NVLink registers on their
> devices :-\  Thanks,

This would be nice, yes...


-- 
Alexey


Re: [PATCH 6/7] ocxl: afu_irq only deals with IRQ IDs, not offsets

2019-03-19 Thread Alastair D'Silva
On Fri, 2019-03-15 at 14:56 +0100, Greg Kurz wrote:
> On Wed, 13 Mar 2019 15:15:21 +1100
> "Alastair D'Silva"  wrote:
> 
> > From: Alastair D'Silva 
> > 
> > The use of offsets is required only in the frontend, so alter
> > the IRQ API to only work with IRQ IDs in the backend.
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >  drivers/misc/ocxl/afu_irq.c   | 31 +--
> >  drivers/misc/ocxl/context.c   |  7 +--
> >  drivers/misc/ocxl/file.c  | 13 -
> >  drivers/misc/ocxl/ocxl_internal.h | 10 ++
> >  drivers/misc/ocxl/trace.h | 12 
> >  5 files changed, 36 insertions(+), 37 deletions(-)
> > 
> > diff --git a/drivers/misc/ocxl/afu_irq.c b/drivers/misc/ocxl/afu_irq.c
> > index 11ab996657a2..1885c472df58 100644
> > --- a/drivers/misc/ocxl/afu_irq.c
> > +++ b/drivers/misc/ocxl/afu_irq.c
> > @@ -14,14 +14,14 @@ struct afu_irq {
> > struct eventfd_ctx *ev_ctx;
> >  };
> >  
> > -static int irq_offset_to_id(struct ocxl_context *ctx, u64 offset)
> > +int ocxl_irq_offset_to_id(struct ocxl_context *ctx, u64 offset)
> >  {
> > return (offset - ctx->afu->irq_base_offset) >> PAGE_SHIFT;
> >  }
> >  
> > -static u64 irq_id_to_offset(struct ocxl_context *ctx, int id)
> > +u64 ocxl_irq_id_to_offset(struct ocxl_context *ctx, int irq_id)
> >  {
> > -   return ctx->afu->irq_base_offset + (id << PAGE_SHIFT);
> > +   return ctx->afu->irq_base_offset + (irq_id << PAGE_SHIFT);
> >  }
> >  
> >  static irqreturn_t afu_irq_handler(int virq, void *data)
> > @@ -69,7 +69,7 @@ static void release_afu_irq(struct afu_irq *irq)
> > kfree(irq->name);
> >  }
> >  
> > -int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset)
> > +int ocxl_afu_irq_alloc(struct ocxl_context *ctx, int *irq_id)
> >  {
> > struct afu_irq *irq;
> > int rc;
> > @@ -101,10 +101,7 @@ int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset)
> > if (rc)
> > goto err_alloc;
> >  
> > -   *irq_offset = irq_id_to_offset(ctx, irq->id);
> 
> This should be replaced by:
> 
>   *irq_id = irq->id;
> 

Whoops, good catch, thanks :)

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819
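The offset/ID arithmetic behind this split can be shown with a worked sketch: the backend hands out plain IRQ IDs (`*irq_id = irq->id`), and only the frontend converts them to page-sized offsets above a base. The constants below are hypothetical (64 KiB pages and an arbitrary base); the real values come from `PAGE_SHIFT` and `ctx->afu->irq_base_offset`. In this sketch the ID is widened to 64 bits before shifting.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration only. */
#define MODEL_PAGE_SHIFT      16
#define MODEL_IRQ_BASE_OFFSET 0x100000000ULL

/* Mirrors the idea of ocxl_irq_offset_to_id(): one page per IRQ. */
int model_irq_offset_to_id(uint64_t offset)
{
	return (int)((offset - MODEL_IRQ_BASE_OFFSET) >> MODEL_PAGE_SHIFT);
}

/* Mirrors the idea of ocxl_irq_id_to_offset(). */
uint64_t model_irq_id_to_offset(int irq_id)
{
	return MODEL_IRQ_BASE_OFFSET + ((uint64_t)irq_id << MODEL_PAGE_SHIFT);
}

void model_irq_demo(void)
{
	/* The two conversions are inverses for valid IDs. */
	assert(model_irq_offset_to_id(model_irq_id_to_offset(5)) == 5);
}
```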



[PATCH 4/4] powerpc: convert config files to generic cmdline

2019-03-19 Thread Daniel Walker
This is a mass conversion of the config files to use the new
generic cmdline.

The command used for the conversion is as follows:

sed -i 's/CONFIG_CMDLINE=/CONFIG_CMDLINE_PREPEND=/g' 

Cc: xe-linux-exter...@cisco.com
Cc: Daniel Walker 
Signed-off-by: Daniel Walker 

Change-Id: Idf7cae45ef5f8afebcb1ee7e025aafcb6541ad35
Signed-off-by: Daniel Walker 
---
 arch/powerpc/configs/44x/fsp2_defconfig   | 2 +-
 arch/powerpc/configs/44x/iss476-smp_defconfig | 2 +-
 arch/powerpc/configs/44x/warp_defconfig   | 2 +-
 arch/powerpc/configs/holly_defconfig  | 2 +-
 arch/powerpc/configs/mvme5100_defconfig   | 2 +-
 arch/powerpc/configs/skiroot_defconfig| 2 +-
 arch/powerpc/configs/storcenter_defconfig | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/configs/44x/fsp2_defconfig b/arch/powerpc/configs/44x/fsp2_defconfig
index bae6b26bcfba..1e9b1e7e281f 100644
--- a/arch/powerpc/configs/44x/fsp2_defconfig
+++ b/arch/powerpc/configs/44x/fsp2_defconfig
@@ -29,7 +29,7 @@ CONFIG_SWIOTLB=y
 CONFIG_KEXEC=y
 CONFIG_CRASH_DUMP=y
 CONFIG_CMDLINE_BOOL=y
-CONFIG_CMDLINE="ip=on rw"
+CONFIG_CMDLINE_PREPEND="ip=on rw"
 # CONFIG_SUSPEND is not set
 # CONFIG_PCI is not set
 CONFIG_NET=y
diff --git a/arch/powerpc/configs/44x/iss476-smp_defconfig b/arch/powerpc/configs/44x/iss476-smp_defconfig
index d24bfa6ecd62..ed234c2b1956 100644
--- a/arch/powerpc/configs/44x/iss476-smp_defconfig
+++ b/arch/powerpc/configs/44x/iss476-smp_defconfig
@@ -18,7 +18,7 @@ CONFIG_HZ_100=y
 CONFIG_MATH_EMULATION=y
 CONFIG_IRQ_ALL_CPUS=y
 CONFIG_CMDLINE_BOOL=y
-CONFIG_CMDLINE="root=/dev/issblk0"
+CONFIG_CMDLINE_PREPEND="root=/dev/issblk0"
 # CONFIG_PCI is not set
 CONFIG_ADVANCED_OPTIONS=y
 CONFIG_DYNAMIC_MEMSTART=y
diff --git a/arch/powerpc/configs/44x/warp_defconfig b/arch/powerpc/configs/44x/warp_defconfig
index 6c02f53271cd..ddb395840bb9 100644
--- a/arch/powerpc/configs/44x/warp_defconfig
+++ b/arch/powerpc/configs/44x/warp_defconfig
@@ -15,7 +15,7 @@ CONFIG_WARP=y
 CONFIG_PPC4xx_GPIO=y
 CONFIG_HZ_1000=y
 CONFIG_CMDLINE_BOOL=y
-CONFIG_CMDLINE="ip=on"
+CONFIG_CMDLINE_PREPEND="ip=on"
 # CONFIG_PCI is not set
 CONFIG_NET=y
 CONFIG_PACKET=y
diff --git a/arch/powerpc/configs/holly_defconfig b/arch/powerpc/configs/holly_defconfig
index 71d8d2430b6c..14945562d193 100644
--- a/arch/powerpc/configs/holly_defconfig
+++ b/arch/powerpc/configs/holly_defconfig
@@ -14,7 +14,7 @@ CONFIG_PPC_HOLLY=y
 CONFIG_GEN_RTC=y
 CONFIG_BINFMT_MISC=y
 CONFIG_CMDLINE_BOOL=y
-CONFIG_CMDLINE="console=ttyS0,115200"
+CONFIG_CMDLINE_PREPEND="console=ttyS0,115200"
 # CONFIG_SECCOMP is not set
 CONFIG_NET=y
 CONFIG_PACKET=y
diff --git a/arch/powerpc/configs/mvme5100_defconfig b/arch/powerpc/configs/mvme5100_defconfig
index 63e38c7220f1..07a68ebb3713 100644
--- a/arch/powerpc/configs/mvme5100_defconfig
+++ b/arch/powerpc/configs/mvme5100_defconfig
@@ -23,7 +23,7 @@ CONFIG_HZ_100=y
 # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
 # CONFIG_COMPACTION is not set
 CONFIG_CMDLINE_BOOL=y
-CONFIG_CMDLINE="console=ttyS0,9600 ip=dhcp root=/dev/nfs"
+CONFIG_CMDLINE_PREPEND="console=ttyS0,9600 ip=dhcp root=/dev/nfs"
 CONFIG_NET=y
 CONFIG_PACKET=y
 CONFIG_UNIX=y
diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig
index cfdd08897a06..a79984208224 100644
--- a/arch/powerpc/configs/skiroot_defconfig
+++ b/arch/powerpc/configs/skiroot_defconfig
@@ -53,7 +53,7 @@ CONFIG_NUMA=y
 CONFIG_PPC_64K_PAGES=y
 CONFIG_SCHED_SMT=y
 CONFIG_CMDLINE_BOOL=y
-CONFIG_CMDLINE="console=tty0 console=hvc0 ipr.fast_reboot=1 quiet"
+CONFIG_CMDLINE_PREPEND="console=tty0 console=hvc0 ipr.fast_reboot=1 quiet"
 # CONFIG_SECCOMP is not set
 # CONFIG_PPC_MEM_KEYS is not set
 CONFIG_NET=y
diff --git a/arch/powerpc/configs/storcenter_defconfig b/arch/powerpc/configs/storcenter_defconfig
index 74bca2eccd0f..83b3b92176a0 100644
--- a/arch/powerpc/configs/storcenter_defconfig
+++ b/arch/powerpc/configs/storcenter_defconfig
@@ -13,7 +13,7 @@ CONFIG_STORCENTER=y
 CONFIG_HZ_100=y
 CONFIG_BINFMT_MISC=y
 CONFIG_CMDLINE_BOOL=y
-CONFIG_CMDLINE="console=ttyS0,115200"
+CONFIG_CMDLINE_PREPEND="console=ttyS0,115200"
 # CONFIG_SECCOMP is not set
 CONFIG_NET=y
 CONFIG_PACKET=m
-- 
2.19.1



[PATCH 1/4] add generic builtin command line

2019-03-19 Thread Daniel Walker
This code allows architectures to use a generic builtin command line.
The state of the builtin command line options across architectures is
diverse. On x86 and MIPS the code is nearly identical, and it prepends
the builtin command line onto the one provided by the boot loader. On
powerpc there is only a builtin override and nothing else.

The code in this commit unifies the MIPS and x86 code into a generic
header file under the CONFIG_GENERIC_CMDLINE option. When this
option is enabled, the architecture can call cmdline_add_builtin()
to add the builtin command line.
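A userspace model of the in-place path of `_cmdline_add_builtin()` (the `src == NULL` case used by the device tree code), assuming `dest` already holds the boot loader string. The `MODEL_*` macros are hypothetical stand-ins for `CONFIG_CMDLINE_PREPEND` and `CONFIG_CMDLINE_APPEND`, and `snprintf()` replaces `strlcpy()`/`strlcat()`, which older C libraries lack; like them, it truncates to `len - 1` characters. The result is prepend string, boot loader string, then append string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MODEL_CMDLINE_SIZE 256
#define MODEL_PREPEND "console=ttyS0,115200"
#define MODEL_APPEND  "quiet"

/* dest must be NUL-terminated within len bytes; len <= MODEL_CMDLINE_SIZE. */
void model_cmdline_add_builtin(char *dest, size_t len)
{
	char tmp[MODEL_CMDLINE_SIZE];

	assert(len <= MODEL_CMDLINE_SIZE);

	/* sizeof(literal) > 1 means the configured string is non-empty. */
	if (sizeof(MODEL_APPEND) > 1) {
		size_t n = strlen(dest);

		snprintf(dest + n, len - n, " %s", MODEL_APPEND);
	}

	/* Prepend through a temporary buffer, then copy back. */
	if (sizeof(MODEL_PREPEND) > 1) {
		snprintf(tmp, sizeof(tmp), "%s %s", MODEL_PREPEND, dest);
		snprintf(dest, len, "%s", tmp);
	}
}
```

For a boot loader string of "root=/dev/sda2", this model produces "console=ttyS0,115200 root=/dev/sda2 quiet".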

[maksym.kok...@globallogic.com: fix cmdline_add_builtin() macro]
Cc: Daniel Walker 
Cc: Daniel Walker 
Cc: xe-linux-exter...@cisco.com
Signed-off-by: Daniel Walker 
Signed-off-by: Maksym Kokhan 
---
 include/linux/cmdline.h | 69 +
 init/Kconfig| 69 +
 2 files changed, 138 insertions(+)
 create mode 100644 include/linux/cmdline.h

diff --git a/include/linux/cmdline.h b/include/linux/cmdline.h
new file mode 100644
index ..4a16ee134585
--- /dev/null
+++ b/include/linux/cmdline.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CMDLINE_H
+#define _LINUX_CMDLINE_H
+
+/*
+ *
+ * Copyright (C) 2015. Cisco Systems, Inc.
+ *
+ * Generic Append/Prepend cmdline support.
+ */
+
+#if defined(CONFIG_GENERIC_CMDLINE) && defined(CONFIG_CMDLINE_BOOL)
+
+#ifndef CONFIG_CMDLINE_OVERRIDE
+/*
+ * This function will append or prepend a builtin command line to the command
+ * line provided by the bootloader. Kconfig options can be used to alter
+ * the behavior of this builtin command line.
+ * @dest: The destination of the final appended/prepended string
+ * @src: The starting string or NULL if there isn't one.
+ * @tmp: temporary space used for prepending
+ * @length: the maximum length of the strings above.
+ */
+static inline void
+_cmdline_add_builtin(char *dest, char *src, char *tmp, unsigned long length)
+{
+   if (src != dest && src != NULL) {
+   strlcpy(dest, " ", length);
+   strlcat(dest, src, length);
+   }
+
+   if (sizeof(CONFIG_CMDLINE_APPEND) > 1)
+   strlcat(dest, " " CONFIG_CMDLINE_APPEND, length);
+
+   if (sizeof(CONFIG_CMDLINE_PREPEND) > 1) {
+   strlcpy(tmp, CONFIG_CMDLINE_PREPEND " ", length);
+   strlcat(tmp, dest, length);
+   strlcpy(dest, tmp, length);
+   }
+}
+
+#define cmdline_add_builtin_section(dest, src, length, section)\
+{  \
+   if (sizeof(CONFIG_CMDLINE_PREPEND) > 1) {   \
+   static char cmdline_tmp_space[length] section;  \
+   _cmdline_add_builtin(dest, src, cmdline_tmp_space, length); \
+   } else {\
+   _cmdline_add_builtin(dest, src, NULL, length);  \
+   }   \
+}
+#else
+#define cmdline_add_builtin_section(dest, src, length, section)   \
+{ \
+   strlcpy(dest, CONFIG_CMDLINE_PREPEND " " CONFIG_CMDLINE_APPEND,\
+   length);   \
+}
+#endif /* !CONFIG_CMDLINE_OVERRIDE */
+
+#else
+#define cmdline_add_builtin_section(dest, src, length, section) {  \
+   if (src != NULL)   \
+   strlcpy(dest, src, length);\
+}
+#endif /* CONFIG_GENERIC_CMDLINE */
+
+#define cmdline_add_builtin(dest, src, length) \
+   cmdline_add_builtin_section(dest, src, length, __initdata)
+
+#endif /* _LINUX_CMDLINE_H */
diff --git a/init/Kconfig b/init/Kconfig
index d47cb77a220e..b9b9e7702ea3 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1778,6 +1778,75 @@ config PROFILING
 config TRACEPOINTS
bool
 
+config GENERIC_CMDLINE
+   bool
+
+if GENERIC_CMDLINE
+
+config CMDLINE_BOOL
+   bool "Built-in kernel command line"
+   help
+ Allow for specifying boot arguments to the kernel at
+ build time.  On some systems (e.g. embedded ones), it is
+ necessary or convenient to provide some or all of the
+ kernel boot arguments with the kernel itself (that is,
+ to not rely on the boot loader to provide them.)
+
+ To compile command line arguments into the kernel,
+ set this option to 'Y', then fill in the
+ boot arguments in CONFIG_CMDLINE.
+
+ Systems with fully functional boot loaders (i.e. non-embedded)
+ should leave this option set to 'N'.
+
+config CMDLINE_APPEND
+   string "Built-in kernel command string append"
+   depends on CMDLINE_BOOL
+   default ""
+   help
+

[PATCH 2/4] drivers: of: generic command line support

2019-03-19 Thread Daniel Walker
This adds support for the generic command line implementation to the
device tree code. This allows some platforms to keep using the original
CONFIG_CMDLINE implementation, while powerpc platforms can use the newer
generic command line code. As platforms adopt the generic command line
code, they can simply add "select GENERIC_CMDLINE" and delete their
Kconfig options for the current CMDLINE.

Change-Id: Ief473a5ffac01a999b0aba7619f5b63bc4b36ac4
Cc: Andrew Morton 
Cc: Christophe Leroy 
Cc: Michael Ellerman 
Signed-off-by: Daniel Walker 
---
 drivers/of/fdt.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 7099c652c6a5..9dc5550697c2 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include   /* for COMMAND_LINE_SIZE */
 #include 
@@ -1102,7 +1103,7 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
 * managed to set the command line, unless CONFIG_CMDLINE_FORCE
 * is set in which case we override whatever was found earlier.
 */
-#ifdef CONFIG_CMDLINE
+#if defined(CONFIG_CMDLINE) && !defined(CONFIG_GENERIC_CMDLINE)
 #if defined(CONFIG_CMDLINE_EXTEND)
strlcat(data, " ", COMMAND_LINE_SIZE);
strlcat(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
@@ -1113,7 +1114,12 @@ int __init early_init_dt_scan_chosen(unsigned long node, 
const char *uname,
if (!((char *)data)[0])
strlcpy(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
 #endif
-#endif /* CONFIG_CMDLINE */
+#endif /* CONFIG_CMDLINE  && !CONFIG_GENERIC_CMDLINE */
+
+   /* append and prepend any arguments built into the kernel via
+* generic cmdline.
+*/
+   cmdline_add_builtin(data, NULL, COMMAND_LINE_SIZE);
 
pr_debug("Command line is: %s\n", (char*)data);
 
-- 
2.19.1



[PATCH 3/4] powerpc: convert to generic builtin command line

2019-03-19 Thread Daniel Walker
This updates the powerpc code to use the CONFIG_GENERIC_CMDLINE
option.

[maksym.kok...@globallogic.com: add strlcat to prom_init_check.sh
whitelist]
Cc: Daniel Walker 
Cc: Daniel Walker 
Cc: xe-linux-exter...@cisco.com
Signed-off-by: Daniel Walker 
Signed-off-by: Maksym Kokhan 
---
 arch/powerpc/Kconfig   | 23 +--
 arch/powerpc/kernel/prom_init.c|  8 
 arch/powerpc/kernel/prom_init_check.sh |  2 +-
 3 files changed, 6 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index def41a06377b..385120ff0236 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -173,6 +173,7 @@ config PPC
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
select GENERIC_TIME_VSYSCALL
+   select GENERIC_CMDLINE
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KGDB
@@ -780,28 +781,6 @@ config PPC_DENORMALISATION
  Add support for handling denormalisation of single precision
  values.  Useful for bare metal only.  If unsure say Y here.
 
-config CMDLINE_BOOL
-   bool "Default bootloader kernel arguments"
-
-config CMDLINE
-   string "Initial kernel command string"
-   depends on CMDLINE_BOOL
-   default "console=ttyS0,9600 console=tty0 root=/dev/sda2"
-   help
- On some platforms, there is currently no way for the boot loader to
- pass arguments to the kernel. For these platforms, you can supply
- some command-line options at build time by entering them here.  In
- most cases you will need to specify the root device here.
-
-config CMDLINE_FORCE
-   bool "Always use the default kernel command string"
-   depends on CMDLINE_BOOL
-   help
- Always use the default kernel command string, even if the boot
- loader passes other arguments to the kernel.
- This is useful if you cannot or don't want to change the
- command-line options your boot loader passes to the kernel.
-
 config EXTRA_TARGETS
string "Additional default image types"
help
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index f33ff4163a51..e8e9fca22470 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -637,11 +638,10 @@ static void __init early_cmdline_parse(void)
p = prom_cmd_line;
if ((long)prom.chosen > 0)
l = prom_getprop(prom.chosen, "bootargs", p, COMMAND_LINE_SIZE-1);
-#ifdef CONFIG_CMDLINE
+
if (l <= 0 || p[0] == '\0') /* dbl check */
-   strlcpy(prom_cmd_line,
-   CONFIG_CMDLINE, sizeof(prom_cmd_line));
-#endif /* CONFIG_CMDLINE */
+   cmdline_add_builtin_section(prom_cmd_line, NULL, sizeof(prom_cmd_line), __prombss);
+
prom_printf("command line: %s\n", prom_cmd_line);
 
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kernel/prom_init_check.sh b/arch/powerpc/kernel/prom_init_check.sh
index 667df97d2595..ab2acc8d8b5a 100644
--- a/arch/powerpc/kernel/prom_init_check.sh
+++ b/arch/powerpc/kernel/prom_init_check.sh
@@ -18,7 +18,7 @@
 
 WHITELIST="add_reloc_offset __bss_start __bss_stop copy_and_flush
 _end enter_prom memcpy memset reloc_offset __secondary_hold
-__secondary_hold_acknowledge __secondary_hold_spinloop __start
+__secondary_hold_acknowledge __secondary_hold_spinloop __start strlcat
 strcmp strcpy strlcpy strlen strncmp strstr kstrtobool logo_linux_clut224
 reloc_got2 kernstart_addr memstart_addr linux_banner _stext
 __prom_init_toc_start __prom_init_toc_end btext_setup_display TOC."
-- 
2.19.1



Generic command line -v2

2019-03-19 Thread Daniel Walker
Hi all,

New in -v2:
* Updated with some changes from Christophe Leroy to how spaces are added.
* Added OF support, based on comments from Rob on my 2015 submission,
  which he reviewed (https://lore.kernel.org/patchwork/patch/604997/).
  Christophe and Rob suggested having support for this inside the OF code.
* Powerpc was made to use the OF parts instead of having changes in its
  own code.
* Removed the trim effect from the powerpc config files; only sed was used
  for the conversion, as requested by Michael Ellerman.

That's basically it. Otherwise the same changes.

Daniel




Re: [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it

2019-03-19 Thread Andrew Morton
On Sun, 17 Mar 2019 11:34:31 -0700 ira.we...@intel.com wrote:

> Resending after rebasing to the latest mm tree.
> 
> HFI1, qib, and mthca use get_user_pages_fast() due to its performance
> advantages.  These pages can be held for a significant time.  But
> get_user_pages_fast() does not protect against mapping FS DAX pages.
> 
> Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
> retains the performance while also adding the FS DAX checks.  XDP has also
> shown interest in using this functionality.[1]
> 
> In addition we change get_user_pages() to use the new FOLL_LONGTERM flag and
> remove the specialized get_user_pages_longterm call.

It would be helpful to include your response to Christoph's question
(http://lkml.kernel.org/r/20190220180255.ga12...@iweiny-desk2.sc.intel.com)
in the changelog.  Because if one person was wondering about this,
others will likely do so.

We have no record of acks or reviewed-by's.  At least one was missed
(http://lkml.kernel.org/r/caog9msttcd-9bcsdfc0wryqfvrnb4twozl0c4+6qxi-n_y4...@mail.gmail.com),
but that is very very partial.

This patchset is fairly DAX-centered, but Dan wasn't cc'ed!

So ho hum.  I'll scoop them up and shall make the above changes to the
[1/n] changelog, but we still have some work to do.



Re: [PATCH net-next] ibmveth: Make array ibmveth_stats static

2019-03-19 Thread David Miller
From: Yue Haibing 
Date: Tue, 19 Mar 2019 22:42:37 +0800

> From: YueHaibing 
> 
> Fix sparse warning:
> drivers/net/ethernet/ibm/ibmveth.c:96:21:
>  warning: symbol 'ibmveth_stats' was not declared. Should it be static?
> 
> Signed-off-by: YueHaibing 

Applied.


Re: [PATCH 2/3] powerpc: convert to generic builtin command line

2019-03-19 Thread Daniel Walker
On Mon, Mar 04, 2019 at 03:26:59PM +0100, Christophe Leroy wrote:
> 
> 
> On 01/03/2019 at 20:44, Daniel Walker wrote:
> > This updates the powerpc code to use the CONFIG_GENERIC_CMDLINE
> > option.
> 
> Please explain in more detail how each powerpc option is replaced by one of
> the generic options.

CMDLINE is replaced by two options which allow static options to be either
appended or prepended to the boot loader arguments. If you wanted an equivalent
change you would only fill in CONFIG_CMDLINE_PREPEND. CONFIG_CMDLINE_OVERRIDE
does the same as CMDLINE_FORCE, except that the append and prepend arguments
are merged without the boot loader arguments.
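A minimal userspace sketch of the semantics described above, assuming the built-in strings are plain C string literals: PREPEND, then the boot loader arguments, then APPEND, with the boot loader arguments dropped when the override option is set. The macro values and the helpers append_arg() and build_cmdline() are hypothetical illustrations, not the code from the series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical values standing in for the Kconfig strings discussed above. */
#define CONFIG_CMDLINE_PREPEND "console=ttyS0"
#define CONFIG_CMDLINE_APPEND  "quiet"

/* Append one argument, space separated, truncating safely at size - 1 chars. */
static void append_arg(char *dst, size_t size, const char *arg)
{
	size_t len = strlen(dst);
	size_t sep = (len > 0) ? 1 : 0;	/* no leading space on an empty line */

	/* Skip empty arguments; stop if there is no room for sep + 1 char + NUL. */
	if (arg[0] == '\0' || len + sep + 1 >= size)
		return;
	if (sep)
		dst[len++] = ' ';
	dst[len] = '\0';
	strncat(dst, arg, size - len - 1);
}

/*
 * normal:   PREPEND + boot loader arguments + APPEND
 * override: PREPEND + APPEND (boot loader arguments are ignored)
 */
static void build_cmdline(char *dst, size_t size, const char *bootloader,
			  bool override)
{
	if (size == 0)
		return;
	dst[0] = '\0';
	append_arg(dst, size, CONFIG_CMDLINE_PREPEND);
	if (!override && bootloader)
		append_arg(dst, size, bootloader);
	append_arg(dst, size, CONFIG_CMDLINE_APPEND);
}
```

For example, with a 64-byte buffer and boot loader arguments "root=/dev/sda1", the normal case yields "console=ttyS0 root=/dev/sda1 quiet" and the override case yields "console=ttyS0 quiet". As with strlcpy()/strlcat(), truncation is silent.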

> > --- a/arch/powerpc/kernel/prom.c
> > +++ b/arch/powerpc/kernel/prom.c
> > @@ -34,6 +34,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> > @@ -716,6 +717,9 @@ void __init early_init_devtree(void *params)
> >  */
> > of_scan_flat_dt(early_init_dt_scan_chosen_ppc, boot_command_line);
> > +   /* append and prepend any arguments built into the kernel. */
> > +   cmdline_add_builtin(boot_command_line, NULL, COMMAND_LINE_SIZE);
> > +
> 
> I don't think it is worth an implementation as complex as in the previous
> patch just for the above line.
> You could easily define the temporary buffer directly in this file, then just
> locally do:
> 
> strlcpy(temp_buff, CONFIG_CMDLINE_PREPEND, COMMAND_LINE_SIZE);
> strlcat(temp_buff, boot_command_line, COMMAND_LINE_SIZE);
> strlcat(temp_buff, CONFIG_CMDLINE_APPEND, COMMAND_LINE_SIZE);
> strlcpy(boot_command_line, temp_buff, COMMAND_LINE_SIZE);
 
The point of the code is to have an implementation that other architectures can
use. If we open code it in powerpc we're no better off.

> 
> 
> > /* Scan memory nodes and rebuild MEMBLOCKs */
> > of_scan_flat_dt(early_init_dt_scan_root, NULL);
> > of_scan_flat_dt(early_init_dt_scan_memory_ppc, NULL);
> > diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> > index f33ff4163a51..e8e9fca22470 100644
> > --- a/arch/powerpc/kernel/prom_init.c
> > +++ b/arch/powerpc/kernel/prom_init.c
> > @@ -30,6 +30,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> >   #include 
> > @@ -637,11 +638,10 @@ static void __init early_cmdline_parse(void)
> > p = prom_cmd_line;
> > if ((long)prom.chosen > 0)
> > l = prom_getprop(prom.chosen, "bootargs", p, COMMAND_LINE_SIZE-1);
> > -#ifdef CONFIG_CMDLINE
> > +
> > if (l <= 0 || p[0] == '\0') /* dbl check */
> > -   strlcpy(prom_cmd_line,
> > -   CONFIG_CMDLINE, sizeof(prom_cmd_line));
> > -#endif /* CONFIG_CMDLINE */
> > +   cmdline_add_builtin_section(prom_cmd_line, NULL, sizeof(prom_cmd_line), __prombss);
> > +
> 
> You don't need something as complex as what your generic code does for that.
> It could be done with the following simple line:
> 
> strlcpy(prom_cmd_line, CONFIG_CMDLINE_PREPEND " " CONFIG_CMDLINE_APPEND,
> sizeof(prom_cmd_line));
> 
> > prom_printf("command line: %s\n", prom_cmd_line);
> >   #ifdef CONFIG_PPC64
> > diff --git a/arch/powerpc/kernel/prom_init_check.sh b/arch/powerpc/kernel/prom_init_check.sh
> > index 667df97d2595..ab2acc8d8b5a 100644
> > --- a/arch/powerpc/kernel/prom_init_check.sh
> > +++ b/arch/powerpc/kernel/prom_init_check.sh
> > @@ -18,7 +18,7 @@
> >   WHITELIST="add_reloc_offset __bss_start __bss_stop copy_and_flush
> >   _end enter_prom memcpy memset reloc_offset __secondary_hold
> > -__secondary_hold_acknowledge __secondary_hold_spinloop __start
> > +__secondary_hold_acknowledge __secondary_hold_spinloop __start strlcat
> 
> The above is a big issue. In the scope of KASAN implementation, we are
> getting rid of generic string functions from prom_init because they are
> KASAN instrumented and that's far too early for prom_init. See series
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=94949 and
> especially patch [v9,03/11] powerpc/prom_init: don't use string functions
> from lib/

You already re-implemented a number of string functions; it seems easy enough to
add another one.

What you're doing here is exactly what I'm trying to prevent in my implementation.
Say there is a small but horrific defect in one of the string functions. Some
other architecture fixes it in lib/string.c; whoops, you just missed it, and
now prom_init.c is stuck with it unless the powerpc maintainers are watching
closely to keep up with the fixes to the string functions.

You could move these functions into include/linux/string.h as static
inlines, then use them in lib/string.c and in prom_init.c. Then you have a
unified implementation. I assume you would regard that as ugly, though.
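The shared static-inline approach suggested above could look roughly like the following userspace sketch. The name shared_strlcat() and the idea of placing it in a common header are assumptions for illustration, not existing kernel API. It follows the BSD strlcat() contract (never writes more than size bytes, returns the length it tried to create), which differs slightly from the kernel's own strlcat() error handling.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical header-only helper with BSD strlcat() semantics: append src
 * to dest, never writing more than size bytes in total, and return the
 * length it tried to create (dlen + strlen(src)) so callers can detect
 * truncation. Being static inline, it is compiled in each including file
 * with that file's flags, so a file built without KASAN instrumentation
 * and lib/ could share one source.
 */
static inline size_t shared_strlcat(char *dest, const char *src, size_t size)
{
	size_t dlen = 0;
	size_t slen = strlen(src);
	size_t copy;

	/* Length of dest, but never look past size bytes. */
	while (dlen < size && dest[dlen] != '\0')
		dlen++;

	/* dest is not terminated within size: nothing can be appended. */
	if (dlen == size)
		return size + slen;

	copy = slen;
	if (copy > size - dlen - 1)
		copy = size - dlen - 1;	/* truncate, keeping room for the NUL */

	memcpy(dest + dlen, src, copy);
	dest[dlen + copy] = '\0';
	return dlen + slen;
}
```

A return value greater than or equal to size tells the caller the result was truncated, which is how the strlcpy()/strlcat() sequence quoted earlier could check for an over-long command line.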

Something else you would regard as ugly: you're not adding an #ifdef on KASAN in
prom_init.c for the string functions. If you had that, then any buggy string
functions which you may add (or forget to update) would only cause 

Re: [PULL REQUEST] powerpc generic command line

2019-03-19 Thread Daniel Walker
On Tue, Mar 19, 2019 at 06:42:35PM +0100, Christophe Leroy wrote:
> Well, that's what I started with, but in the end my main worry has been that
> you bring a non-exciting set of complicated macros and code to replace
> simple code, and you break something out of generic OF code into a brand
> new generic one, instead of updating the existing generic OF code.
 
Even if we update the generic OF code, it only changes the powerpc changes
slightly, because arch/powerpc/kernel/prom_init.c contains a second version of
the same thing which doesn't use OF.

We're not replacing simple macros in powerpc with in-kind replacements, we're
adding a feature which we want. So yes, our macros are more complicated, but in
the grand scheme of things they are very simple macros. If you think my stuff is
complicated, you haven't seen complicated.

I didn't see anywhere in your comments where you found code which would cause a
problem. Did you find breakage which I missed?

> I like the idea behind your series very much, but I don't much like the
> way it is proposed to be implemented. If you give me a week or two, I will
> come up with a lighter proposal that should achieve the same goal.

It's fine with us, we just want the feature set. We'll continue with our version,
though, unless you decide to submit something.

I will incorporate your comments now, but immediately prior to a pull request I
couldn't add them.

Daniel


Re: [PULL REQUEST] powerpc generic command line

2019-03-19 Thread Christophe Leroy

Hi Daniel,

On 19/03/2019 at 16:38, Daniel Walker wrote:
> On Tue, Mar 19, 2019 at 12:18:03PM +1100, Michael Ellerman wrote:
>> Hi Daniel,
>>
>> Daniel Walker  writes:
>>> Here are the generic command line changes for powerpc.
>>>
>>> These changes have been in linux-next for two cycles, with few problems reported.
>>> It's also been used at Cisco Systems, Inc. in production products for many many
>>> years with no problems.
>>>
>>> Please pull these changes.
>>
>> Sorry I didn't reply to this earlier, have been busy with merge window
>> bugs and so on.
>>
>> As I imagine you noticed, I didn't pull this. There are a few reasons.
>>
>> Firstly you sent it a bit late, about a day before the 5.0 release, and
>> at 6am Saturday my time :) In future if you want me to merge something
>> please send a pull at least the ~Wednesday before the release.
>
> Ok .. It was Friday morning my time.
>
>> Secondly I had no idea this code was even in linux-next. I'm not sure if
>> I was Cc'ed at some point when you added it, if so sorry I missed it,
>> but I get lots of email. If you're going to add changes to arch/powerpc
>> in your next tree I'd appreciate some notice, or preferably an explicit
>> ack.
>
> Can I have an ack now, since you're looking at it? Do you think this has no use?
> Certainly Cisco has use for it. It's still in linux-next as of now.
>
>> The main reason I didn't merge it is that it's adding a bunch of code
>> outside of arch/powerpc, into files which I'm not the maintainer for,
>> and the patches doing so have no acks or reviews from anyone.
>
> With the exception of the Kconfig, the header file is brand new, so I'm not sure
> who would ack that. From a maintainer perspective I think you could add new
> files without issues from other maintainers.
>
>> It's also adding a generic implementation with no indication that any
>> other arches are willing/able to use the generic implementation, which
>> begs the question whether it will actually be used.
>
> It would have been used by powerpc ;) I've gotten feedback in the past from
> Ralf Baechle who thought this was useful, however that was years ago when
> this was first submitted and the code around this area in mips has changed and
> it would require a fair amount of new work to function properly on mips.
>
> Also, no other platforms need to use this. Powerpc could be the only user of
> it. This isn't really a question of a new exciting implementation of
> something. This is really simple, it's just consolidation across architectures.
> The implementation is vanilla, non-exciting stuff.
>
>> I appreciate it's hard to get these sort of cross architecture changes
>> into mainline, but I don't think this is the way to do it.
>>
>> I'd suggest you post a patch series to linux-arch with the generic
>> changes and as many architecture conversions as you can manage, then get
>> some review/acks for the generic changes and chase arch maintainers for
>> some acks.
>
> I didn't post to linux-arch, but the code has been around for years, submitted
> multiple times with more architectures than powerpc. It was scaled down to just
> powerpc to simplify its submission.
>
> It's really a simple set of changes, I don't think it needs as much thought as
> other cross architecture changes.
>
>> I realise you have posted the series before, it may require some
>> persistence. There were also quite a few comments from Christophe, so
>> replying to those would be a good place to start.
>
> I've looked at his comments, but I think he was more worried about conflicts with
> his debugging enablement, not something to stop a pull request.

Well, that's what I started with, but in the end my main worry has been
that you bring a non-exciting set of complicated macros and code to
replace simple code, and you break something out of generic OF code into a
brand new generic one, instead of updating the existing generic OF code.

I like the idea behind your series very much, but I don't much like
the way it is proposed to be implemented. If you give me a week or
two, I will come up with a lighter proposal that should achieve the same goal.

Christophe

>>> The following changes since commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad:
>>>
>>>    Linux 4.20-rc2 (2018-11-11 17:12:31 -0600)
>>>
>>> are available in the git repository at:
>>>
>>>    https://github.com/daniel-walker/cisco-linux.git for-powerpc
>>>
>>> for you to fetch changes up to 5d4514a9c291ecf19b0626695161673d35e5d549:
>>>
>>>    powerpc: convert config files to generic cmdline (2018-11-16 07:32:26 -0800)
>>>
>>> Daniel Walker (3):
>>>    add generic builtin command line
>>>    powerpc: convert to generic builtin command line
>>>    powerpc: convert config files to generic cmdline
>>>
>>>   arch/powerpc/Kconfig  | 23 +
>>>   arch/powerpc/configs/44x/fsp2_defconfig   | 29 ++-
>>>   arch/powerpc/configs/44x/iss476-smp_defconfig | 24 -
>>>   arch/powerpc/configs/44x/warp_defconfig   | 12 ++---
>>>   arch/powerpc/configs/holly_defconfig  | 12 ++---
>>>   arch/powerpc/configs/mvme5100_defconfig   | 2

Re: [PATCH v2 4/6] powerpc: use common ptrace_syscall_enter hook to handle _TIF_SYSCALL_EMU

2019-03-19 Thread Oleg Nesterov
On 03/19, Oleg Nesterov wrote:
>
> Well, personally I see no point... Again, after the trivial simplification
> x86 does
>
>   if (work & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) {
>   ret = tracehook_report_syscall_entry(regs);
>   if (ret || (work & _TIF_SYSCALL_EMU))
>   return -1L;
>   }
>
> this looks simple enough for copy-and-paste.
>
> > If there's a better way to achieve the same
>
> I can only say that if we add a common helper, I think it should absorb
> tracehook_report_syscall_entry() and handle both TIFs just like the code
> above does. Not sure this makes any sense.

this won't work, looking at 6/6 I see that arm64 needs to distinguish
_TRACE and _EMU ... I don't understand this code, but it looks suspicious.
If tracehook_report_syscall_entry() returns nonzero the tracee was killed,
syscall_trace_enter() should just return.

To me this is another indication that consolidation makes no sense ;)

Oleg.



Re: [PATCH v2 4/6] powerpc: use common ptrace_syscall_enter hook to handle _TIF_SYSCALL_EMU

2019-03-19 Thread Oleg Nesterov
On 03/18, Sudeep Holla wrote:
>
> On Mon, Mar 18, 2019 at 06:33:41PM +0100, Oleg Nesterov wrote:
> > On 03/18, Sudeep Holla wrote:
> > >
> > > On Mon, Mar 18, 2019 at 06:20:24PM +0100, Oleg Nesterov wrote:
> > > >
> > > > Again, to me this patch just makes the code look worse. Honestly, I
> > > > don't think that the new (badly named) ptrace_syscall_enter() hook
> > > > makes any sense.
> > > >
> > >
> > > Worse because we end up reading current_thread_info->flags twice ?
> >
> > Mostly because in my opinion ptrace_syscall_enter() buys nothing but makes
> > the caller's code less readable/understandable.
> >
> > Sure, this is subjective.
> >
>
> Based on what we have in that function today, I tend to agree. Will and
> Richard were of the opinion that SYSEMU handling should be consolidated

Well, personally I see no point... Again, after the trivial simplification
x86 does

if (work & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) {
ret = tracehook_report_syscall_entry(regs);
if (ret || (work & _TIF_SYSCALL_EMU))
return -1L;
}

this looks simple enough for copy-and-paste.

> If there's a better way to achieve the same

I can only say that if we add a common helper, I think it should absorb
tracehook_report_syscall_entry() and handle both TIFs just like the code
above does. Not sure this makes any sense.
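To make the control flow of the quoted x86 snippet explicit, here is a self-contained userspace model of it. The flag values, mock_report_syscall_entry() and syscall_entry_check() are hypothetical stand-ins; the real _TIF_* bits are arch specific and tracehook_report_syscall_entry() is not reproduced here. A return of -1 means the system call is skipped.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real _TIF_* values differ per architecture. */
#define SKETCH_TIF_SYSCALL_TRACE (1UL << 0)
#define SKETCH_TIF_SYSCALL_EMU   (1UL << 1)

/*
 * Mock of tracehook_report_syscall_entry(): nonzero means the syscall
 * must be aborted (for example because the tracee was killed).
 */
static int mock_report_syscall_entry(bool tracer_aborts)
{
	return tracer_aborts ? 1 : 0;
}

/*
 * Mirrors the quoted x86 logic: report to the tracer if either flag is
 * set, then skip the syscall (-1) if the report asked for an abort, or
 * if SYSEMU is active so that the tracer emulates the call instead.
 */
static long syscall_entry_check(unsigned long work, bool tracer_aborts)
{
	if (work & (SKETCH_TIF_SYSCALL_EMU | SKETCH_TIF_SYSCALL_TRACE)) {
		int ret = mock_report_syscall_entry(tracer_aborts);

		if (ret || (work & SKETCH_TIF_SYSCALL_EMU))
			return -1L;
	}
	return 0;
}
```

Note that both flags lead to the same single report call, which is the point made above: a common helper would naturally absorb the report and handle both flags together, while an arch that needs to treat _TRACE and _EMU differently would not fit it.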

Oleg.



Re: [PATCH kernel RFC 2/2] vfio-pci-nvlink2: Implement interconnect isolation

2019-03-19 Thread Alex Williamson
On Fri, 15 Mar 2019 19:18:35 +1100
Alexey Kardashevskiy  wrote:

> The NVIDIA V100 SXM2 GPUs are connected to the CPU via PCIe links and
> (on POWER9) NVLinks. In addition to that, GPUs themselves have direct
> peer to peer NVLinks in groups of 2 to 4 GPUs. At the moment the POWERNV
> platform puts all interconnected GPUs to the same IOMMU group.
> 
> However the user may want to pass individual GPUs to the userspace so
> in order to do so we need to put them into separate IOMMU groups and
> cut off the interconnects.
> 
> Thankfully V100 GPUs implement an interface to do this by programming a link
> disabling mask into BAR0 of a GPU. Once a link is disabled in a GPU using
> this interface, it cannot be re-enabled until the secondary bus reset is
> issued to the GPU.
> 
> This defines a reset_done() handler for V100 NVlink2 device which
> determines what links need to be disabled. This relies on presence
> of the new "ibm,nvlink-peers" device tree property of a GPU telling which
> PCI peers it is connected to (which includes NVLink bridges or peer GPUs).
> 
> This does not change the existing behaviour and instead adds
> a new "isolate_nvlink" kernel parameter to allow such isolation.
> 
> The alternative approaches would be:
> 
> 1. do this in the system firmware (skiboot) but for that we would need
> to tell skiboot via an additional OPAL call whether or not we want this
> isolation - skiboot is unaware of IOMMU groups.
> 
> 2. do this in the secondary bus reset handler in the POWERNV platform -
> the problem with that is at that point the device is not enabled, i.e.
> config space is not restored so we need to enable the device (i.e. MMIO
> bit in CMD register + program valid address to BAR0) in order to disable
> links and then perhaps undo all this initialization to bring the device
> back to the state where pci_try_reset_function() expects it to be.

The trouble seems to be that this approach only maintains the isolation
exposed by the IOMMU group when vfio-pci is the active driver for the
device.  IOMMU groups can be used by any driver and the IOMMU core is
incorporating groups in various ways.  So, if there's a device specific
way to configure the isolation reported in the group, which requires
some sort of active management against things like secondary bus
resets, then I think we need to manage it above the attached endpoint
driver.  Ideally I'd see this as a set of PCI quirks so that we might
leverage it beyond POWER platforms.  I'm not sure how we get past the
reliance on device tree properties that we won't have on other
platforms though, if only NVIDIA could at least open a spec addressing
the discovery and configuration of NVLink registers on their
devices :-\  Thanks,

Alex


Re: [PATCH v3 06/17] KVM: PPC: Book3S HV: XIVE: add controls for the EQ configuration

2019-03-19 Thread Cédric Le Goater
On 3/19/19 5:54 AM, David Gibson wrote:
> On Mon, Mar 18, 2019 at 03:12:10PM +0100, Cédric Le Goater wrote:
>> On 3/18/19 4:23 AM, David Gibson wrote:
>>> On Fri, Mar 15, 2019 at 01:05:58PM +0100, Cédric Le Goater wrote:
 These controls will be used by the H_INT_SET_QUEUE_CONFIG and
 H_INT_GET_QUEUE_CONFIG hcalls from QEMU to configure the underlying
 Event Queue in the XIVE IC. They will also be used to restore the
 configuration of the XIVE EQs and to capture the internal run-time
 state of the EQs. Both 'get' and 'set' rely on an OPAL call to access
 the EQ toggle bit and EQ index which are updated by the XIVE IC when
 event notifications are enqueued in the EQ.

 The value of the guest physical address of the event queue is saved in
 the XIVE internal xive_q structure for later use. That is when
 migration needs to mark the EQ pages dirty to capture a consistent
 memory state of the VM.

 To be noted that H_INT_SET_QUEUE_CONFIG does not require the extra
 OPAL call setting the EQ toggle bit and EQ index to configure the EQ,
 but restoring the EQ state will.

 Signed-off-by: Cédric Le Goater 
 ---

  Changes since v2 :
  
  - fixed comments on the KVM device attribute definitions
  - fixed check on supported EQ size to restrict to 64K pages
  - checked kvm_eq.flags that need to be zero
  - removed the OPAL call when EQ qtoggle bit and index are zero. 

  arch/powerpc/include/asm/xive.h|   2 +
  arch/powerpc/include/uapi/asm/kvm.h|  21 ++
  arch/powerpc/kvm/book3s_xive.h |   2 +
  arch/powerpc/kvm/book3s_xive.c |  15 +-
  arch/powerpc/kvm/book3s_xive_native.c  | 232 +
  Documentation/virtual/kvm/devices/xive.txt |  31 +++
  6 files changed, 297 insertions(+), 6 deletions(-)

 diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
 index b579a943407b..46891f321606 100644
 --- a/arch/powerpc/include/asm/xive.h
 +++ b/arch/powerpc/include/asm/xive.h
 @@ -73,6 +73,8 @@ struct xive_q {
u32 esc_irq;
atomic_t count;
atomic_t pending_count;
 +  u64 guest_qpage;
 +  u32 guest_qsize;
  };
  
  /* Global enable flags for the XIVE support */
 diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
 index 12bb01baf0ae..1cd728c87d7c 100644
 --- a/arch/powerpc/include/uapi/asm/kvm.h
 +++ b/arch/powerpc/include/uapi/asm/kvm.h
 @@ -679,6 +679,7 @@ struct kvm_ppc_cpu_char {
  #define KVM_DEV_XIVE_GRP_CTRL 1
  #define KVM_DEV_XIVE_GRP_SOURCE   2   /* 64-bit source identifier */
  #define KVM_DEV_XIVE_GRP_SOURCE_CONFIG 3   /* 64-bit source identifier */
  +#define KVM_DEV_XIVE_GRP_EQ_CONFIG 4   /* 64-bit EQ identifier */
  
  /* Layout of 64-bit XIVE source attribute values */
  #define KVM_XIVE_LEVEL_SENSITIVE  (1ULL << 0)
 @@ -694,4 +695,24 @@ struct kvm_ppc_cpu_char {
  #define KVM_XIVE_SOURCE_EISN_SHIFT 33
  #define KVM_XIVE_SOURCE_EISN_MASK 0xfffeULL
  
 +/* Layout of 64-bit EQ identifier */
  +#define KVM_XIVE_EQ_PRIORITY_SHIFT 0
 +#define KVM_XIVE_EQ_PRIORITY_MASK 0x7
 +#define KVM_XIVE_EQ_SERVER_SHIFT  3
 +#define KVM_XIVE_EQ_SERVER_MASK   0xfff8ULL
 +
 +/* Layout of EQ configuration values (64 bytes) */
 +struct kvm_ppc_xive_eq {
 +  __u32 flags;
 +  __u32 qsize;
 +  __u64 qpage;
 +  __u32 qtoggle;
 +  __u32 qindex;
 +  __u8  pad[40];
 +};
 +
 +#define KVM_XIVE_EQ_FLAG_ENABLED  0x0001
  +#define KVM_XIVE_EQ_FLAG_ALWAYS_NOTIFY 0x0002
 +#define KVM_XIVE_EQ_FLAG_ESCALATE 0x0004
 +
  #endif /* __LINUX_KVM_POWERPC_H */
 diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
 index ae26fe653d98..622f594d93e1 100644
 --- a/arch/powerpc/kvm/book3s_xive.h
 +++ b/arch/powerpc/kvm/book3s_xive.h
  @@ -272,6 +272,8 @@ struct kvmppc_xive_src_block *kvmppc_xive_create_src_block(
struct kvmppc_xive *xive, int irq);
  void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb);
  int kvmppc_xive_select_target(struct kvm *kvm, u32 *server, u8 prio);
 +int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
 +bool single_escalation);
  
  #endif /* CONFIG_KVM_XICS */
  #endif /* _KVM_PPC_BOOK3S_XICS_H */
 diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
 index e09f3addffe5..c1b7aa7dbc28 100644
 --- a/arch/powerpc/kvm/book3s_xive.c
 +++ b/arch/powerpc/kvm/book3s_xive.c
 @@ -166,7
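The 64-bit EQ identifier layout and the 64-byte configuration structure from the quoted uapi header can be exercised with a small userspace sketch. The shift and priority-mask values are copied from the header; the helper names and struct xive_eq_sketch (a renamed mirror of struct kvm_ppc_xive_eq) are hypothetical. The server field is extracted with the shift alone, which assumes it occupies every bit above the 3-bit priority field. The static assertion checks the "64 bytes" comment: 4 + 4 + 8 + 4 + 4 + 40 = 64, with no hidden padding because qpage already sits at the 8-byte-aligned offset 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the quoted uapi header. */
#define KVM_XIVE_EQ_PRIORITY_SHIFT 0
#define KVM_XIVE_EQ_PRIORITY_MASK  0x7
#define KVM_XIVE_EQ_SERVER_SHIFT   3

/* Pack a server number and a priority into a 64-bit EQ identifier. */
static uint64_t xive_eq_id(uint32_t server, uint8_t prio)
{
	return ((uint64_t)server << KVM_XIVE_EQ_SERVER_SHIFT) |
	       (((uint64_t)prio & KVM_XIVE_EQ_PRIORITY_MASK)
		<< KVM_XIVE_EQ_PRIORITY_SHIFT);
}

static uint8_t xive_eq_prio(uint64_t id)
{
	return (id >> KVM_XIVE_EQ_PRIORITY_SHIFT) & KVM_XIVE_EQ_PRIORITY_MASK;
}

/* Assumes the server field covers all bits above the priority field. */
static uint32_t xive_eq_server(uint64_t id)
{
	return (uint32_t)(id >> KVM_XIVE_EQ_SERVER_SHIFT);
}

/* Userspace mirror of the EQ configuration values. */
struct xive_eq_sketch {
	uint32_t flags;
	uint32_t qsize;
	uint64_t qpage;
	uint32_t qtoggle;
	uint32_t qindex;
	uint8_t  pad[40];
};

_Static_assert(sizeof(struct xive_eq_sketch) == 64,
	       "EQ configuration values must be 64 bytes");
```

For example, server 5 with priority 3 encodes to (5 << 3) | 3 = 43.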

Re: [PULL REQUEST] powerpc generic command line

2019-03-19 Thread Daniel Walker
On Tue, Mar 19, 2019 at 12:18:03PM +1100, Michael Ellerman wrote:
> Hi Daniel,
> 
> Daniel Walker  writes:
> > Here are the generic command line changes for powerpc. 
> >
> > These changes have been in linux-next for two cycles, with few problems reported.
> > It's also been used at Cisco Systems, Inc. in production products for many many
> > years with no problems.
> >
> > Please pull these changes.
> 
> Sorry I didn't reply to this earlier, have been busy with merge window
> bugs and so on.
> 
> As I imagine you noticed, I didn't pull this. There are a few reasons.
> 
> Firstly you sent it a bit late, about a day before the 5.0 release, and
> at 6am Saturday my time :) In future if you want me to merge something
> please send a pull at least the ~Wednesday before the release.
  
Ok .. It was Friday morning my time.

> Secondly I had no idea this code was even in linux-next. I'm not sure if
> I was Cc'ed at some point when you added it, if so sorry I missed it,
> but I get lots of email. If you're going to add changes to arch/powerpc
> in your next tree I'd appreciate some notice, or preferably an explicit
> ack.
 
Can I have an ack now, since you're looking at it? Do you think this has no use?
Certainly Cisco has use for it. It's still in linux-next as of now.

> The main reason I didn't merge it is that it's adding a bunch of code
> outside of arch/powerpc, into files which I'm not the maintainer for,
> and the patches doing so have no acks or reviews from anyone.

With the exception of the Kconfig, the header file is brand new, so I'm not sure
who would ack that. From a maintainer perspective I think you could add new
files without issues from other maintainers.

> It's also adding a generic implementation with no indication that any
> other arches are willing/able to use the generic implementation, which
> begs the question whether it will actually be used.
 
It would have been used by powerpc ;) I've gotten feedback in the past from
Ralf Baechle who thought this was useful, however that was years ago when
this was first submitted and the code around this area in mips has changed and
it would require a fair amount of new work to function properly on mips.

Also, no other platforms need to use this. Powerpc could be the only user of
it. This isn't really a question of a new exciting implementation of
something. This is really simple, it's just consolidation across architectures.
The implementation is vanilla, non-exciting stuff.

> I appreciate it's hard to get these sort of cross architecture changes
> into mainline, but I don't think this is the way to do it.
> 
> I'd suggest you post a patch series to linux-arch with the generic
> changes and as many architecture conversions as you can manage, then get
> some review/acks for the generic changes and chase arch maintainers for
> some acks.
 
I didn't post to linux-arch, but the code has been around for years, submitted
multiple times with more architectures than powerpc. It was scaled down to just
powerpc to simplify its submission.

It's really a simple set of changes, I don't think it needs as much thought as
other cross architecture changes.

> I realise you have posted the series before, it may require some
> persistence. There were also quite a few comments from Christophe, so
> replying to those would be a good place to start.
 
I've looked at his comments, but I think he was more worried about conflicts with
his debugging enablement, not something to stop a pull request.

> > The following changes since commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad:
> >
> >   Linux 4.20-rc2 (2018-11-11 17:12:31 -0600)
> >
> > are available in the git repository at:
> >
> >   https://github.com/daniel-walker/cisco-linux.git for-powerpc
> >
> > for you to fetch changes up to 5d4514a9c291ecf19b0626695161673d35e5d549:
> >
> >   powerpc: convert config files to generic cmdline (2018-11-16 07:32:26 -0800)
> >
> > 
> > Daniel Walker (3):
> >   add generic builtin command line
> >   powerpc: convert to generic builtin command line
> >   powerpc: convert config files to generic cmdline
> >
> >  arch/powerpc/Kconfig  | 23 +
> >  arch/powerpc/configs/44x/fsp2_defconfig   | 29 ++-
> >  arch/powerpc/configs/44x/iss476-smp_defconfig | 24 -
> >  arch/powerpc/configs/44x/warp_defconfig   | 12 ++---
> >  arch/powerpc/configs/holly_defconfig  | 12 ++---
> >  arch/powerpc/configs/mvme5100_defconfig   | 25 +-
> >  arch/powerpc/configs/skiroot_defconfig| 48 +-
> >  arch/powerpc/configs/storcenter_defconfig | 15 +++---
> 
> Also if you're updating defconfigs please don't include any unrelated
> changes. Trimming the defconfigs can silently drop symbols and break
> people's setups so needs to be done carefully.
 
> It's safer to just sed the defconfig files directly, rather than running
> sa

Re: [PATCH 2/2] mm/dax: Don't enable huge dax mapping by default

2019-03-19 Thread Dan Williams
On Tue, Mar 19, 2019 at 1:45 AM Kirill A. Shutemov  wrote:
>
> On Wed, Mar 13, 2019 at 09:07:13AM -0700, Dan Williams wrote:
> > On Wed, Mar 6, 2019 at 4:46 AM Aneesh Kumar K.V
> >  wrote:
> > >
> > > On 3/6/19 5:14 PM, Michal Suchánek wrote:
> > > > On Wed, 06 Mar 2019 14:47:33 +0530
> > > > "Aneesh Kumar K.V"  wrote:
> > > >
> > > >> Dan Williams  writes:
> > > >>
> > > >>> On Thu, Feb 28, 2019 at 1:40 AM Oliver  wrote:
> > > 
> > >  On Thu, Feb 28, 2019 at 7:35 PM Aneesh Kumar K.V
> > >   wrote:
> > > >
> > > > >> Also, even if the user decided not to use THP, by
> > > > >> echo "never" > transparent_hugepage/enabled, we should continue to map
> > > > >> dax faults using huge pages on platforms that can support huge pages.
> > > >
> > > > Is this a good idea?
> > > >
> > > > This knob is there for a reason. In some situations having huge pages
> > > > can severely impact performance of the system (due to host-guest
> > > > interaction or whatever) and the ability to really turn off all THP
> > > > would be important in those cases, right?
> > > >
> > >
> > > My understanding was that is not true for dax pages? These are not
> > > regular memory that got allocated. They are allocated out of /dev/dax/
> > > or /dev/pmem*. Do we have a reason not to use hugepages for mapping
> > > pages in that case?
> >
> > The problem with the transparent_hugepage/enabled interface is that it
> > conflates performing compaction work to produce THP-pages with the
> > ability to map huge pages at all.
>
> That's not [entirely] true. transparent_hugepage/defrag gates heavy-duty
> compaction. We do only very limited compaction if it's not advised by
> transparent_hugepage/defrag.
>
> I believe DAX has to respect transparent_hugepage/enabled. Or not
> advertise its huge pages as THP. It's confusing for user.

What does "advertise its huge pages as THP" mean in practice? I think
it's confusing that DAX, a facility that bypasses System RAM, is
affected by a transparent_hugepage flag which is a feature for
combining System RAM pages into larger pages. The same reason that
transparent_hugepage does not gate / control hugetlb operation is the
reason that transparent_hugepage should not gate / control DAX. A
global setting to disable opportunistic large page mappings of
System RAM makes sense, but I don't see why that should apply to DAX.


[PATCH -next] rtc: opal: Make opal_tpo_alarm_irq_enable static

2019-03-19 Thread Yue Haibing
From: YueHaibing 

Fix sparse warning:

drivers/rtc/rtc-opal.c:227:5:
 warning: symbol 'opal_tpo_alarm_irq_enable' was not declared. Should it be static?

Signed-off-by: YueHaibing 
---
 drivers/rtc/rtc-opal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/rtc/rtc-opal.c b/drivers/rtc/rtc-opal.c
index 60f2250..3dd9d26 100644
--- a/drivers/rtc/rtc-opal.c
+++ b/drivers/rtc/rtc-opal.c
@@ -224,7 +224,7 @@ static int opal_set_tpo_time(struct device *dev, struct rtc_wkalrm *alarm)
return rc;
 }
 
-int opal_tpo_alarm_irq_enable(struct device *dev, unsigned int enabled)
+static int opal_tpo_alarm_irq_enable(struct device *dev, unsigned int enabled)
 {
struct rtc_wkalrm alarm = { .enabled = 0 };
 
-- 
2.7.4




[PATCH net-next] ibmveth: Make array ibmveth_stats static

2019-03-19 Thread Yue Haibing
From: YueHaibing 

Fix sparse warning:
drivers/net/ethernet/ibm/ibmveth.c:96:21:
 warning: symbol 'ibmveth_stats' was not declared. Should it be static?

Signed-off-by: YueHaibing 
---
 drivers/net/ethernet/ibm/ibmveth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index dd71d5d..d86b0e5 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -93,7 +93,7 @@ struct ibmveth_stat {
 #define IBMVETH_STAT_OFF(stat) offsetof(struct ibmveth_adapter, stat)
 #define IBMVETH_GET_STAT(a, off) *((u64 *)(((unsigned long)(a)) + off))
 
-struct ibmveth_stat ibmveth_stats[] = {
+static struct ibmveth_stat ibmveth_stats[] = {
{ "replenish_task_cycles", IBMVETH_STAT_OFF(replenish_task_cycles) },
{ "replenish_no_mem", IBMVETH_STAT_OFF(replenish_no_mem) },
{ "replenish_add_buff_failure",
-- 
2.7.4
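The table touched by this patch pairs each counter name with an offsetof() into the adapter structure, and IBMVETH_GET_STAT() reads a u64 at that offset. A minimal userspace sketch of the same pattern follows; struct adapter, struct stat_desc, read_stat() and the trimmed field list are hypothetical stand-ins, not the driver's code. Marking the table static gives it internal linkage, which is what sparse's "Should it be static?" check asks for when a symbol is presumably only referenced inside its own file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical, trimmed stand-in for struct ibmveth_adapter. */
struct adapter {
	int other_state;
	u64 replenish_task_cycles;
	u64 replenish_no_mem;
};

struct stat_desc {
	const char *name;
	size_t offset;
};

/* Same idea as IBMVETH_STAT_OFF()/IBMVETH_GET_STAT(); byte-pointer
 * arithmetic replaces the driver's unsigned long cast here. */
#define STAT_OFF(stat) offsetof(struct adapter, stat)
#define GET_STAT(a, off) \
	(*(const u64 *)((const unsigned char *)(a) + (off)))

/* static: internal linkage, so the symbol stays private to this file. */
static const struct stat_desc stats[] = {
	{ "replenish_task_cycles", STAT_OFF(replenish_task_cycles) },
	{ "replenish_no_mem", STAT_OFF(replenish_no_mem) },
};

#define NUM_STATS (sizeof(stats) / sizeof(stats[0]))

/* Return counter number i of adapter a. */
static u64 read_stat(const struct adapter *a, size_t i)
{
	assert(i < NUM_STATS);
	return GET_STAT(a, stats[i].offset);
}
```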




Re: [PATCH v2] kmemleak: skip scanning holes in the .bss section

2019-03-19 Thread Catalin Marinas
Hi Qian,

On Wed, Mar 13, 2019 at 10:57:17AM -0400, Qian Cai wrote:
> @@ -1531,7 +1547,14 @@ static void kmemleak_scan(void)
>  
>   /* data/bss scanning */
>   scan_large_block(_sdata, _edata);
> - scan_large_block(__bss_start, __bss_stop);
> +
> + if (bss_hole_start) {
> + scan_large_block(__bss_start, bss_hole_start);
> + scan_large_block(bss_hole_stop, __bss_stop);
> + } else {
> + scan_large_block(__bss_start, __bss_stop);
> + }
> +
>   scan_large_block(__start_ro_after_init, __end_ro_after_init);

I'm not a fan of this approach but I couldn't come up with anything
better. I was hoping we could check for PageReserved() in scan_block()
but on arm64 it ends up not scanning the .bss at all.

Until another user appears, I'm ok with this patch.

Acked-by: Catalin Marinas 
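The branch in the quoted hunk can be modelled in isolation: when a hole has been recorded, the .bss is scanned as two ranges on either side of it, otherwise as one. In this sketch scan_range() is a hypothetical stand-in for scan_large_block() that only counts bytes, and the hole bounds are plain parameters rather than the patch's bss_hole_start/bss_hole_stop globals.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes "scanned" so far; stands in for real kmemleak scanning work. */
static size_t scanned_bytes;

/* Hypothetical stand-in for scan_large_block(start, end). */
static void scan_range(const char *start, const char *end)
{
	assert(start <= end);
	scanned_bytes += (size_t)(end - start);
}

/* Scan [bss_start, bss_stop), skipping [hole_start, hole_stop) when a
 * hole was recorded (hole_start != NULL), mirroring the patch's branch. */
static void scan_bss(const char *bss_start, const char *bss_stop,
		     const char *hole_start, const char *hole_stop)
{
	if (hole_start) {
		scan_range(bss_start, hole_start);
		scan_range(hole_stop, bss_stop);
	} else {
		scan_range(bss_start, bss_stop);
	}
}
```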


Re: Shift overflow warnings in arch/powerpc/boot/addnote.c on 32-bit builds

2019-03-19 Thread Cédric Le Goater
On 3/19/19 9:45 AM, Christophe Leroy wrote:
> Hi,
> 
> On 19/03/2019 at 08:10, Mark Cave-Ayland wrote:
>> Hi all,
>>
>> Whilst building the latest git master on my G4 I noticed the following shift
>> overflow warnings in the build log for arch/powerpc/boot/addnote.c:
> 
> I guess the problem must have been there for some time. I get exactly the
> same warnings on 4.14.106.
> 
> When reverting 284b52c4c6e3 ("powerpc/boot: Add 64bit and little endian
> support to addnote"), the warnings disappear.


Ouh. This is from the little-endian days. I suppose we are missing a u64
cast somewhere? The L suffix also seems wrong.

C. 
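The missing-cast theory can be illustrated with a sketch of the byte-packing macros. Assuming the value passed to PUT_64() (ns in the warning) is a 32-bit long on a 32-bit host, `(v) >> 32L` shifts by the full width of its type, which is undefined behaviour and is what -Wshift-count-overflow reports. The L suffix on the count does not help, because the type of a shift result comes only from the promoted left operand. Widening the value to a 64-bit type first makes the shift well defined and yields zeros in the upper word. The PUT_16BE/PUT_64BE/PUT_64LE shapes follow the expansions quoted in the warnings; the PUT_32BE/PUT_32LE bodies, the global buf and the uint64_t casts are assumptions for illustration, not the upstream fix.

```c
#include <assert.h>
#include <stdint.h>

/* Output buffer; a hypothetical stand-in for addnote.c's buffer. */
static unsigned char buf[32];

/* Every shifted operand is widened to uint64_t first, so ">> 32" is
 * well defined even when v is a 32-bit long on a 32-bit host. */
#define PUT_16BE(off, v) (buf[off] = ((uint64_t)(v) >> 8) & 0xff, \
			  buf[(off) + 1] = (uint64_t)(v) & 0xff)
#define PUT_32BE(off, v) (PUT_16BE((off), (uint64_t)(v) >> 16), \
			  PUT_16BE((off) + 2, (v)))
#define PUT_64BE(off, v) (PUT_32BE((off), (uint64_t)(v) >> 32), \
			  PUT_32BE((off) + 4, (v)))

#define PUT_16LE(off, v) (buf[off] = (uint64_t)(v) & 0xff, \
			  buf[(off) + 1] = ((uint64_t)(v) >> 8) & 0xff)
#define PUT_32LE(off, v) (PUT_16LE((off), (v)), \
			  PUT_16LE((off) + 2, (uint64_t)(v) >> 16))
#define PUT_64LE(off, v) (PUT_32LE((off), (v)), \
			  PUT_32LE((off) + 4, (uint64_t)(v) >> 32))
```

With the casts in place, storing a 32-bit value through PUT_64BE() writes four zero bytes followed by the value, instead of relying on an undefined shift.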

> Christophe
> 
> 

Re: [RESEND PATCH v2] powerpc: mute unused-but-set-variable warnings

2019-03-19 Thread Qian Cai



On 3/19/19 5:21 AM, Christophe Leroy wrote:
> Is there a reason for resending? AFAICS, both are identical and still marked
> new in patchwork:
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?submitter=76055
> 

"RESEND" because of no maintainer response for more than one week.

> Indeed, the resend has an issue in the commit log and fails checkpatch (a
> double colon in Reviewed-by::)

Yes, my bad for a copy-and-paste error.


Re: [RESEND PATCH v2] powerpc: mute unused-but-set-variable warnings

2019-03-19 Thread Christophe Leroy
Is there a reason for resending? AFAICS, both are identical and still
marked new in patchwork:
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?submitter=76055


Indeed, the resend has an issue in the commit log and fails
checkpatch (a double colon in Reviewed-by::)


Christophe

On 17/03/2019 at 23:05, Qian Cai wrote:

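pte_unmap() is an empty macro on some powerpc platforms, so its argument
is never evaluated and the variables passed to it trigger the
-Wunused-but-set-variable warnings below. Silence them by making it a
static inline function.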

mm/memory.c: In function 'copy_pte_range':
mm/memory.c:820:24: warning: variable 'orig_dst_pte' set but not used
[-Wunused-but-set-variable]
mm/memory.c:820:9: warning: variable 'orig_src_pte' set but not used
[-Wunused-but-set-variable]
mm/madvise.c: In function 'madvise_free_pte_range':
mm/madvise.c:318:9: warning: variable 'orig_pte' set but not used
[-Wunused-but-set-variable]
mm/swap_state.c: In function 'swap_ra_info':
mm/swap_state.c:634:15: warning: variable 'orig_pte' set but not used
[-Wunused-but-set-variable]

Suggested-by: Christophe Leroy 
Reviewed-by:: Christophe Leroy 
Signed-off-by: Qian Cai 
---

v2: make it a static inline function.

  arch/powerpc/include/asm/book3s/64/pgtable.h | 3 ++-
  arch/powerpc/include/asm/nohash/64/pgtable.h | 3 ++-
  2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 868fcaf56f6b..d798e33a0c86 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1006,7 +1006,8 @@ extern struct page *pgd_page(pgd_t pgd);
(((pte_t *) pmd_page_vaddr(*(dir))) + pte_index(addr))
  
  #define pte_offset_map(dir,addr)	pte_offset_kernel((dir), (addr))

-#define pte_unmap(pte) do { } while(0)
+
+static inline void pte_unmap(pte_t *pte) { }
  
  /* to find an entry in a kernel page-table-directory */

  /* This now only contains the vmalloc pages */
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
index e77ed9761632..0384a3302fb6 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -205,7 +205,8 @@ static inline void pgd_set(pgd_t *pgdp, unsigned long val)
(((pte_t *) pmd_page_vaddr(*(dir))) + (((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)))
  
  #define pte_offset_map(dir,addr)	pte_offset_kernel((dir), (addr))

-#define pte_unmap(pte) do { } while(0)
+
+static inline void pte_unmap(pte_t *pte) { }
  
  /* to find an entry in a kernel page-table-directory */

  /* This now only contains the vmalloc pages */
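The difference between the two definitions comes down to argument evaluation: a function-like macro that expands to `do { } while (0)` discards its argument during preprocessing, so a variable whose only use is `pte_unmap(orig_pte)` looks set-but-unused to the compiler, while an empty static inline function still receives the argument as a parameter. The sketch below makes that observable with a side-effecting argument; pte_t, track(), pte_unmap_macro() and pte_unmap_inline() are simplified, hypothetical names, not the powerpc definitions.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's pte_t. */
typedef struct { unsigned long pte; } pte_t;

/* Counts how many times an argument expression was actually evaluated. */
static int evaluations;

/* Side-effecting argument: shows whether pte_unmap()'s argument runs. */
static pte_t *track(pte_t *p)
{
	evaluations++;
	return p;
}

/* Old style: the argument disappears during preprocessing. */
#define pte_unmap_macro(pte) do { } while (0)

/* New style: the argument is a real, if ignored, function parameter. */
static inline void pte_unmap_inline(pte_t *pte)
{
	(void)pte; /* avoids -Wunused-parameter in userspace builds */
}
```

A side benefit of the inline version is that its argument is type-checked, so passing something other than a pte_t * now draws a diagnostic instead of silently vanishing.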



Re: [PATCH v3] powerpc/mm: move warning from resize_hpt_for_hotplug()

2019-03-19 Thread Laurent Vivier
Hi Michael,

as it seems good now, could you pick up this patch for merging?

Thanks,
Laurent

On 13/03/2019 11:25, Laurent Vivier wrote:
> resize_hpt_for_hotplug() reports a warning when it cannot
> resize the hash page table ("Unable to resize hash page
> table to target order"), but in some cases that is not a problem
> and it can make the user think something has not worked properly.
> 
> This patch moves the warning to arch_remove_memory() to
> only report the problem when it is needed.
> 
> Reviewed-by: David Gibson 
> Signed-off-by: Laurent Vivier 
> ---
> 
> Notes:
> v3: move "||" to above line and remove parenthesis
> v2: add warning messages for H_PARAMETER and H_RESOURCE
> 
>  arch/powerpc/include/asm/sparsemem.h  |  4 ++--
>  arch/powerpc/mm/hash_utils_64.c   | 19 +++
>  arch/powerpc/mm/mem.c |  3 ++-
>  arch/powerpc/platforms/pseries/lpar.c |  3 ++-
>  4 files changed, 13 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
> index 68da49320592..3192d454a733 100644
> --- a/arch/powerpc/include/asm/sparsemem.h
> +++ b/arch/powerpc/include/asm/sparsemem.h
> @@ -17,9 +17,9 @@ extern int create_section_mapping(unsigned long start, unsigned long end, int ni
>  extern int remove_section_mapping(unsigned long start, unsigned long end);
>  
>  #ifdef CONFIG_PPC_BOOK3S_64
> -extern void resize_hpt_for_hotplug(unsigned long new_mem_size);
> +extern int resize_hpt_for_hotplug(unsigned long new_mem_size);
>  #else
> -static inline void resize_hpt_for_hotplug(unsigned long new_mem_size) { }
> +static inline int resize_hpt_for_hotplug(unsigned long new_mem_size) { return 0; }
>  #endif
>  
>  #ifdef CONFIG_NUMA
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 0cc7fbc3bd1c..5aa7594ee71b 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -755,12 +755,12 @@ static unsigned long __init htab_get_table_size(void)
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -void resize_hpt_for_hotplug(unsigned long new_mem_size)
> +int resize_hpt_for_hotplug(unsigned long new_mem_size)
>  {
>   unsigned target_hpt_shift;
>  
>   if (!mmu_hash_ops.resize_hpt)
> - return;
> + return 0;
>  
>   target_hpt_shift = htab_shift_for_mem_size(new_mem_size);
>  
> @@ -772,16 +772,11 @@ void resize_hpt_for_hotplug(unsigned long new_mem_size)
>* reduce unless the target shift is at least 2 below the
>* current shift
>*/
> - if ((target_hpt_shift > ppc64_pft_size)
> - || (target_hpt_shift < (ppc64_pft_size - 1))) {
> - int rc;
> -
> - rc = mmu_hash_ops.resize_hpt(target_hpt_shift);
> - if (rc && (rc != -ENODEV))
> - printk(KERN_WARNING
> -"Unable to resize hash page table to target order %d: %d\n",
> -target_hpt_shift, rc);
> - }
> + if (target_hpt_shift > ppc64_pft_size ||
> + target_hpt_shift < ppc64_pft_size - 1)
> + return mmu_hash_ops.resize_hpt(target_hpt_shift);
> +
> + return 0;
>  }
>  
>  int hash__create_section_mapping(unsigned long start, unsigned long end, int nid)
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 33cc6f676fa6..0d40d970cf4a 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -169,7 +169,8 @@ int __meminit arch_remove_memory(int nid, u64 start, u64 size,
>*/
>   vm_unmap_aliases();
>  
> - resize_hpt_for_hotplug(memblock_phys_mem_size());
> + if (resize_hpt_for_hotplug(memblock_phys_mem_size()) == -ENOSPC)
> + pr_warn("Hash collision while resizing HPT\n");
>  
>   return ret;
>  }
> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
> index f2a9f0adc2d3..1034ef1fe2b4 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -901,8 +901,10 @@ static int pseries_lpar_resize_hpt(unsigned long shift)
>   break;
>  
>   case H_PARAMETER:
> + pr_warn("Invalid argument from H_RESIZE_HPT_PREPARE\n");
>   return -EINVAL;
>   case H_RESOURCE:
> + pr_warn("Operation not permitted from H_RESIZE_HPT_PREPARE\n");
>   return -EPERM;
>   default:
>   pr_warn("Unexpected error %d from H_RESIZE_HPT_PREPARE\n", rc);
> @@ -918,7 +920,6 @@ static int pseries_lpar_resize_hpt(unsigned long shift)
>   if (rc != 0) {
>   switch (state.commit_rc) {
>   case H_PTEG_FULL:
> - pr_warn("Hash collision while resizing HPT\n");
>   return -ENOSPC;
>  
>   default:
> 
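The patch keeps the resize policy in resize_hpt_for_hotplug() but moves the decision about what to report to its caller. A simplified sketch of both halves follows: the hash table grows whenever the target shift is larger, shrinks only when the target is at least two below the current shift, and the caller warns only on -ENOSPC. fake_resize_hpt(), resize_for_hotplug() and remove_memory_like() are hypothetical stand-ins for mmu_hash_ops.resize_hpt(), resize_hpt_for_hotplug() and arch_remove_memory(), and both shifts are plain ints here for simplicity.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Result the fake resize call will return; set by the caller. */
static int fake_resize_rc;

/* Hypothetical stand-in for mmu_hash_ops.resize_hpt(). */
static int fake_resize_hpt(int shift)
{
	(void)shift;
	return fake_resize_rc;
}

/* Grow whenever target > cur; shrink only if target <= cur - 2. */
static int resize_for_hotplug(int target, int cur)
{
	if (target > cur || target < cur - 1)
		return fake_resize_hpt(target);
	return 0;
}

/* The caller decides what is worth reporting: only a hash collision. */
static int remove_memory_like(int target, int cur)
{
	int rc = resize_for_hotplug(target, cur);

	if (rc == -ENOSPC)
		fprintf(stderr, "Hash collision while resizing HPT\n");
	return rc;
}
```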



Re: Shift overflow warnings in arch/powerpc/boot/addnote.c on 32-bit builds

2019-03-19 Thread Christophe Leroy

Hi,

On 19/03/2019 at 08:10, Mark Cave-Ayland wrote:

Hi all,

Whilst building the latest git master on my G4 I noticed the following shift
overflow warnings in the build log for arch/powerpc/boot/addnote.c:


I guess the problem must have been there for some time. I get exactly the
same warnings on 4.14.106.


When reverting 284b52c4c6e3 ("powerpc/boot: Add 64bit and little endian 
support to addnote"), the warnings disappear.


Christophe






Re: [PATCH 2/2] mm/dax: Don't enable huge dax mapping by default

2019-03-19 Thread Kirill A. Shutemov
On Wed, Mar 13, 2019 at 09:07:13AM -0700, Dan Williams wrote:
> On Wed, Mar 6, 2019 at 4:46 AM Aneesh Kumar K.V
>  wrote:
> >
> > On 3/6/19 5:14 PM, Michal Suchánek wrote:
> > > On Wed, 06 Mar 2019 14:47:33 +0530
> > > "Aneesh Kumar K.V"  wrote:
> > >
> > >> Dan Williams  writes:
> > >>
> > >>> On Thu, Feb 28, 2019 at 1:40 AM Oliver  wrote:
> > 
> >  On Thu, Feb 28, 2019 at 7:35 PM Aneesh Kumar K.V
> >   wrote:
> > >
> > >> Also even if the user decided to not use THP, by
> > >> echo "never" > transparent_hugepage/enabled , we should continue to map
> > >> dax fault using huge page on platforms that can support huge pages.
> > >
> > > Is this a good idea?
> > >
> > > This knob is there for a reason. In some situations having huge pages
> > > can severely impact performance of the system (due to host-guest
> > > interaction or whatever) and the ability to really turn off all THP
> > > would be important in those cases, right?
> > >
> >
> > My understanding was that is not true for dax pages? These are not
> > regular memory that got allocated. They are allocated out of /dev/dax/
> > or /dev/pmem*. Do we have a reason not to use hugepages for mapping
> > pages in that case?
> 
> The problem with the transparent_hugepage/enabled interface is that it
> conflates performing compaction work to produce THP-pages with the
> ability to map huge pages at all.

That's not [entirely] true. transparent_hugepage/defrag gates heavy-duty
compaction. We do only very limited compaction if it's not advised by
transparent_hugepage/defrag.

I believe DAX has to respect transparent_hugepage/enabled. Or not
advertise its huge pages as THP. It's confusing for user.

-- 
 Kirill A. Shutemov
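Both knobs mentioned above live under /sys/kernel/mm/transparent_hugepage/, and reading either file lists every choice with the active one in square brackets (for example `always [madvise] never`). A small userspace helper for extracting the active mode is sketched below; thp_active_mode() is a hypothetical name, and the code only parses a string, so it says nothing about how DAX consults these settings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the bracketed (active) word of a string such as
 * "always [madvise] never" into out. Returns 0 on success, or -1 if no
 * bracketed word is present or it does not fit in outlen bytes. */
static int thp_active_mode(const char *s, char *out, size_t outlen)
{
	const char *open = strchr(s, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t len;

	if (!open || !close)
		return -1;
	len = (size_t)(close - open - 1);
	if (len + 1 > outlen)
		return -1;
	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}
```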


Shift overflow warnings in arch/powerpc/boot/addnote.c on 32-bit builds

2019-03-19 Thread Mark Cave-Ayland
Hi all,

Whilst building the latest git master on my G4 I noticed the following shift
overflow warnings in the build log for arch/powerpc/boot/addnote.c:


arch/powerpc/boot/addnote.c: In function ‘main’:
arch/powerpc/boot/addnote.c:75:47: warning: right shift count >= width of type [-Wshift-count-overflow]
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
   ^~
arch/powerpc/boot/addnote.c:72:39: note: in definition of macro ‘PUT_16BE’
 #define PUT_16BE(off, v)(buf[off] = ((v) >> 8) & 0xff, \
   ^
arch/powerpc/boot/addnote.c:75:27: note: in expansion of macro ‘PUT_32BE’
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
   ^~~~
arch/powerpc/boot/addnote.c:94:50: note: in expansion of macro ‘PUT_64BE’
 #define PUT_64(off, v)  (e_data == ELFDATA2MSB ? PUT_64BE(off, v) : \
  ^~~~
arch/powerpc/boot/addnote.c:183:3: note: in expansion of macro ‘PUT_64’
   PUT_64(ph + PH_OFFSET, ns);
   ^~
arch/powerpc/boot/addnote.c:75:47: warning: right shift count >= width of type [-Wshift-count-overflow]
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
   ^~
arch/powerpc/boot/addnote.c:73:23: note: in definition of macro ‘PUT_16BE’
 buf[(off) + 1] = (v) & 0xff)
   ^
arch/powerpc/boot/addnote.c:75:27: note: in expansion of macro ‘PUT_32BE’
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
   ^~~~
arch/powerpc/boot/addnote.c:94:50: note: in expansion of macro ‘PUT_64BE’
 #define PUT_64(off, v)  (e_data == ELFDATA2MSB ? PUT_64BE(off, v) : \
  ^~~~
arch/powerpc/boot/addnote.c:183:3: note: in expansion of macro ‘PUT_64’
   PUT_64(ph + PH_OFFSET, ns);
   ^~
arch/powerpc/boot/addnote.c:75:47: warning: right shift count >= width of type [-Wshift-count-overflow]
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
   ^~
arch/powerpc/boot/addnote.c:72:39: note: in definition of macro ‘PUT_16BE’
 #define PUT_16BE(off, v)(buf[off] = ((v) >> 8) & 0xff, \
   ^
arch/powerpc/boot/addnote.c:75:27: note: in expansion of macro ‘PUT_32BE’
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
   ^~~~
arch/powerpc/boot/addnote.c:94:50: note: in expansion of macro ‘PUT_64BE’
 #define PUT_64(off, v)  (e_data == ELFDATA2MSB ? PUT_64BE(off, v) : \
  ^~~~
arch/powerpc/boot/addnote.c:183:3: note: in expansion of macro ‘PUT_64’
   PUT_64(ph + PH_OFFSET, ns);
   ^~
arch/powerpc/boot/addnote.c:75:47: warning: right shift count >= width of type [-Wshift-count-overflow]
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
   ^~
arch/powerpc/boot/addnote.c:73:23: note: in definition of macro ‘PUT_16BE’
 buf[(off) + 1] = (v) & 0xff)
   ^
arch/powerpc/boot/addnote.c:75:27: note: in expansion of macro ‘PUT_32BE’
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
   ^~~~
arch/powerpc/boot/addnote.c:94:50: note: in expansion of macro ‘PUT_64BE’
 #define PUT_64(off, v)  (e_data == ELFDATA2MSB ? PUT_64BE(off, v) : \
  ^~~~
arch/powerpc/boot/addnote.c:183:3: note: in expansion of macro ‘PUT_64’
   PUT_64(ph + PH_OFFSET, ns);
   ^~
arch/powerpc/boot/addnote.c:85:73: warning: right shift count >= width of type [-Wshift-count-overflow]
 #define PUT_64LE(off, v) (PUT_32LE((off), (v)), PUT_32LE((off) + 4, (v) >> 32L))
 ^~
arch/powerpc/boot/addnote.c:82:39: note: in definition of macro ‘PUT_16LE’
 #define PUT_16LE(off, v) (buf[off] = (v) & 0xff, \
   ^
arch/powerpc/boot/addnote.c:85:49: note: in expansion of macro ‘PUT_32LE’
 #define PUT_64LE(off, v) (PUT_32LE((off), (v)), PUT_32LE((off) + 4, (v) >> 32L))
 ^~~~
arch/powerpc/boot/addnote.c:95:5: note: in expansion of macro ‘PUT_64LE’
 PUT_64LE(off, v))
 ^~~~
arch/powerpc/boot/addnote.c:183:3: note: in expansion of macro ‘PUT_64’
   PUT_64(ph + PH_OFFSET, ns);
   ^~
arch/powerpc/boot/addnote.c:85:73: warning: right shift count >= width of type [-Wshift-count-overflow]
 #define PUT_64LE(off, v) (PUT_32LE((off), (v)), PUT_32LE((off) + 4, (v) >> 32L))
 ^~
arch/powerpc/boot/addnote.c:83:25: note: in definition of macro ‘PUT_16LE’
  buf[(off) + 1] = ((v) >> 8) & 0xff)
 ^
arch/powerpc/boot/addnote.c:85:49: note: in expansion of macro ‘PUT_32LE’
 #define PUT_64LE(off, v) (PUT_32LE((of