[PATCH v2] powernv/elog: Fix the race while processing OPAL error log event.

2020-10-04 Thread Mahesh Salgaonkar
Every error log reported by OPAL is exported to userspace through a sysfs
interface and notified using kobject_uevent(). The userspace daemon
(opal_errd) then reads the error log and acknowledges it error log is saved
safely to disk. Once acknowledged the kernel removes the respective sysfs
file entry causing respective resources getting released including kobject.

However there are chances where user daemon may already be scanning elog
entries while new sysfs elog entry is being created by kernel. User daemon
may read this new entry and ack it even before kernel can notify userspace
about it through kobject_uevent() call. If that happens then we have a
potential race between elog_ack_store->kobject_put() and kobject_uevent
which can lead to use-after-free issue of a kernfs object resulting into a
kernel crash. This patch fixes this race by protecting a sysfs file
creation/notification by holding an additional reference count on kobject
until we safely send kobject_uevent().

Reported-by: Oliver O'Halloran 
Signed-off-by: Mahesh Salgaonkar 
Signed-off-by: Aneesh Kumar K.V 
---
Change in v2:
- Instead of mutex and use extra reference count on kobject to avoid the
  race.
---
 arch/powerpc/platforms/powernv/opal-elog.c |   15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal-elog.c 
b/arch/powerpc/platforms/powernv/opal-elog.c
index 62ef7ad995da..230f102e87c0 100644
--- a/arch/powerpc/platforms/powernv/opal-elog.c
+++ b/arch/powerpc/platforms/powernv/opal-elog.c
@@ -222,13 +222,28 @@ static struct elog_obj *create_elog_obj(uint64_t id, 
size_t size, uint64_t type)
return NULL;
}
 
+   /*
+* As soon as sysfs file for this elog is created/activated there is
+* chance opal_errd daemon might read and acknowledge this elog before
+* kobject_uevent() is called. If that happens then we have a potential
+* race between elog_ack_store->kobject_put() and kobject_uevent which
+* leads to use-after-free issue of a kernfs object resulting into
+* kernel crash. To avoid this race take an additional reference count
+* on kobject until we safely send kobject_uevent().
+*/
+
+   kobject_get(>kobj);  /* extra reference count */
rc = sysfs_create_bin_file(>kobj, >raw_attr);
if (rc) {
+   kobject_put(>kobj);
+   /* Drop the extra reference count  */
kobject_put(>kobj);
return NULL;
}
 
kobject_uevent(>kobj, KOBJ_ADD);
+   /* Drop the extra reference count  */
+   kobject_put(>kobj);
 
return elog;
 }




Re: [PATCH v4] PCI: Unify ECAM constants in native PCI Express drivers

2020-10-04 Thread Florian Fainelli




On 10/4/2020 5:38 PM, Krzysztof Wilczyński wrote:

Unify ECAM-related constants into a single set of standard constants
defining memory address shift values for the byte-level address that can
be used when accessing the PCI Express Configuration Space, and then
move native PCI Express controller drivers to use newly introduced
definitions retiring any driver-specific ones.

The ECAM ("Enhanced Configuration Access Mechanism") is defined by the
PCI Express specification (see PCI Express Base Specification, Revision
5.0, Version 1.0, Section 7.2.2, p. 676), thus most hardware should
implement it the same way.  Most of the native PCI Express controller
drivers define their ECAM-related constants, many of these could be
shared, or use open-coded values when setting the .bus_shift field of
the struct pci_ecam_ops.

All of the newly added constants should remove ambiguity and reduce the
number of open-coded values, and also correlate more strongly with the
descriptions in the aforementioned specification (see Table 7-1
"Enhanced Configuration Address Mapping", p. 677).

There is no change to functionality.

Suggested-by: Bjorn Helgaas 
Signed-off-by: Krzysztof Wilczyński 
---


[snip]

  
-/* Configuration space read/write support */

-static inline int brcm_pcie_cfg_index(int busnr, int devfn, int reg)
-{
-   return ((PCI_SLOT(devfn) & 0x1f) << PCIE_EXT_SLOT_SHIFT)
-   | ((PCI_FUNC(devfn) & 0x07) << PCIE_EXT_FUNC_SHIFT)
-   | (busnr << PCIE_EXT_BUSNUM_SHIFT)
-   | (reg & ~3);
-}
-
  static void __iomem *brcm_pcie_map_conf(struct pci_bus *bus, unsigned int 
devfn,
int where)
  {
@@ -590,7 +578,7 @@ static void __iomem *brcm_pcie_map_conf(struct pci_bus 
*bus, unsigned int devfn,
return PCI_SLOT(devfn) ? NULL : base + where;
  
  	/* For devices, write to the config space index register */

-   idx = brcm_pcie_cfg_index(bus->number, devfn, 0);
+   idx = PCIE_ECAM_BUS(bus->number) | PCIE_ECAM_DEVFN(devfn);


This appears to be correct, so:

Acked-by: Florian Fainelli 

however, I would have defined a couple of additional helper macros and do:

	idx = PCIE_ECAM_BUS(bus->number) | PCIE_ECAM_DEV(devfn) | 
PCIE_ECAM_FUN(devfn);


for clarity.

[snip]


+/*
+ * Memory address shift values for the byte-level address that
+ * can be used when accessing the PCI Express Configuration Space.
+ */
+
+/*
+ * Enhanced Configuration Access Mechanism (ECAM)
+ *
+ * See PCI Express Base Specification, Revision 5.0, Version 1.0,
+ * Section 7.2.2, Table 7-1, p. 677.
+ */
+#define PCIE_ECAM_BUS_SHIFT20 /* Bus Number */
+#define PCIE_ECAM_DEV_SHIFT15 /* Device Number */
+#define PCIE_ECAM_FUN_SHIFT12 /* Function Number */
+
+#define PCIE_ECAM_BUS(x)   (((x) & 0xff) << PCIE_ECAM_BUS_SHIFT)
+#define PCIE_ECAM_DEVFN(x) (((x) & 0xff) << PCIE_ECAM_FUN_SHIFT)


For instance, adding these two:

#define PCIE_ECAM_DEV(x)(((x) & 0x1f) << PCIE_ECAM_DEV_SHIFT)
#define PCIE_ECAM_FUN(x)(((x) & 0x7) << PCIE_ECAM_FUN_SHIFT)

may be clearer for use in drivers like pcie-brcmstb.c that used to treat 
the device function in terms of device and function (though it was 
called slot there).

--
Florian


[PATCH v4] PCI: Unify ECAM constants in native PCI Express drivers

2020-10-04 Thread Krzysztof Wilczyński
Unify ECAM-related constants into a single set of standard constants
defining memory address shift values for the byte-level address that can
be used when accessing the PCI Express Configuration Space, and then
move native PCI Express controller drivers to use newly introduced
definitions retiring any driver-specific ones.

The ECAM ("Enhanced Configuration Access Mechanism") is defined by the
PCI Express specification (see PCI Express Base Specification, Revision
5.0, Version 1.0, Section 7.2.2, p. 676), thus most hardware should
implement it the same way.  Most of the native PCI Express controller
drivers define their ECAM-related constants, many of these could be
shared, or use open-coded values when setting the .bus_shift field of
the struct pci_ecam_ops.

All of the newly added constants should remove ambiguity and reduce the
number of open-coded values, and also correlate more strongly with the
descriptions in the aforementioned specification (see Table 7-1
"Enhanced Configuration Address Mapping", p. 677).

There is no change to functionality.

Suggested-by: Bjorn Helgaas 
Signed-off-by: Krzysztof Wilczyński 
---
Changed in v4:
  Removed constants related to "CAM".
  Added more platforms and devices that can use new ECAM macros and
  constants.
  Removed unused ".bus_shift" initialisers from pci-xgene.c as
  xgene_pcie_map_bus() did not use these.

Changes in v3:
  Updated commit message wording.
  Updated regarding custom ECAM bus shift values and concerning PCI base
  configuration space access for Type 1 access.
  Refactored rockchip_pcie_rd_other_conf() and rockchip_pcie_wr_other_conf()
  and removed the "busdev" variable.
  Removed surplus "relbus" variable from nwl_pcie_map_bus() and
  xilinx_pcie_map_bus().
  Renamed the PCIE_ECAM_ADDR() macro to PCIE_ECAM_OFFSET().

Changes in v2:
  Use PCIE_ECAM_ADDR macro when computing ECAM address offset, but drop
  PCI_SLOT and PCI_FUNC macros from the PCIE_ECAM_ADDR macro in favour
  of using a single value for the device/function.

 arch/powerpc/platforms/4xx/pci.c|  7 +++--
 drivers/pci/controller/dwc/pcie-al.c|  8 +++---
 drivers/pci/controller/dwc/pcie-hisi.c  |  4 +--
 drivers/pci/controller/pci-aardvark.c   | 13 +++--
 drivers/pci/controller/pci-host-generic.c   |  2 +-
 drivers/pci/controller/pci-thunder-ecam.c   |  2 +-
 drivers/pci/controller/pci-thunder-pem.c| 13 +++--
 drivers/pci/controller/pci-xgene.c  |  2 --
 drivers/pci/controller/pcie-brcmstb.c   | 16 ++--
 drivers/pci/controller/pcie-iproc.c | 29 ++---
 drivers/pci/controller/pcie-rockchip-host.c | 27 +--
 drivers/pci/controller/pcie-rockchip.h  |  8 +-
 drivers/pci/controller/pcie-tango.c |  2 +-
 drivers/pci/controller/pcie-xilinx-nwl.c|  9 ++-
 drivers/pci/controller/pcie-xilinx.c| 11 ++--
 drivers/pci/controller/vmd.c|  5 ++--
 drivers/pci/ecam.c  |  4 +--
 include/linux/pci-ecam.h| 24 +
 18 files changed, 82 insertions(+), 104 deletions(-)

diff --git a/arch/powerpc/platforms/4xx/pci.c b/arch/powerpc/platforms/4xx/pci.c
index c13d64c3b019..cee40e0b061c 100644
--- a/arch/powerpc/platforms/4xx/pci.c
+++ b/arch/powerpc/platforms/4xx/pci.c
@@ -20,6 +20,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1585,17 +1586,15 @@ static void __iomem 
*ppc4xx_pciex_get_config_base(struct ppc4xx_pciex_port *port
  struct pci_bus *bus,
  unsigned int devfn)
 {
-   int relbus;
-
/* Remove the casts when we finally remove the stupid volatile
 * in struct pci_controller
 */
if (bus->number == port->hose->first_busno)
return (void __iomem *)port->hose->cfg_addr;
 
-   relbus = bus->number - (port->hose->first_busno + 1);
return (void __iomem *)port->hose->cfg_data +
-   ((relbus  << 20) | (devfn << 12));
+   PCIE_ECAM_BUS(bus->number - (port->hose->first_busno + 1)) |
+   PCIE_ECAM_DEVFN(devfn);
 }
 
 static int ppc4xx_pciex_read_config(struct pci_bus *bus, unsigned int devfn,
diff --git a/drivers/pci/controller/dwc/pcie-al.c 
b/drivers/pci/controller/dwc/pcie-al.c
index d57d4ee15848..7c2aa049113c 100644
--- a/drivers/pci/controller/dwc/pcie-al.c
+++ b/drivers/pci/controller/dwc/pcie-al.c
@@ -76,7 +76,7 @@ static int al_pcie_init(struct pci_config_window *cfg)
 }
 
 const struct pci_ecam_ops al_pcie_ops = {
-   .bus_shift= 20,
+   .bus_shift= PCIE_ECAM_BUS_SHIFT,
.init =  al_pcie_init,
.pci_ops  = {
.map_bus= al_pcie_map_bus,
@@ -138,8 +138,6 @@ struct al_pcie {
struct al_pcie_target_bus_cfg target_bus_cfg;
 };
 
-#define PCIE_ECAM_DEVFN(x) (((x) & 0xff) << 12)
-
 #define to_al_pcie(x) 

[PATCH] powernv/elog: Fix the race while processing OPAL error log event.

2020-10-04 Thread Mahesh Salgaonkar
Every error log reported by OPAL is exported to userspace through a sysfs
interface and notified using kobject_uevent(). The userspace daemon
(opal_errd) then reads the error log and acknowledges it error log is saved
safely to disk. Once acknowledged the kernel removes the respective sysfs
file entry causing respective resources getting released including kobject.

However there are chances where user daemon may already be scanning elog
entries while new sysfs elog entry is being created by kernel. User daemon
may read this new entry and ack it even before kernel can notify userspace
about it through kobject_uevent() call. If that happens then we have a
potential race between elog_ack_store->kobject_put() and kobject_uevent
which can lead to use-after-free issue of a kernfs object resulting into a
kernel crash. This patch fixes this race by protecting a sysfs file
creation/notification using mutex lock per elog record.

Reported-by: Oliver O'Halloran 
Signed-off-by: Mahesh Salgaonkar 
---
 arch/powerpc/platforms/powernv/opal-elog.c |   17 +
 1 file changed, 17 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal-elog.c 
b/arch/powerpc/platforms/powernv/opal-elog.c
index 62ef7ad995da..4bba8f968ced 100644
--- a/arch/powerpc/platforms/powernv/opal-elog.c
+++ b/arch/powerpc/platforms/powernv/opal-elog.c
@@ -24,6 +24,7 @@ struct elog_obj {
uint64_t type;
size_t size;
char *buffer;
+   struct mutex elog_mutex;
 };
 #define to_elog_obj(x) container_of(x, struct elog_obj, kobj)
 
@@ -73,7 +74,11 @@ static ssize_t elog_ack_store(struct elog_obj *elog_obj,
  size_t count)
 {
opal_send_ack_elog(elog_obj->id);
+
+   /* Wait until kobject_uevent() is called */
+   mutex_lock(_obj->elog_mutex);
sysfs_remove_file_self(_obj->kobj, >attr);
+   mutex_unlock(_obj->elog_mutex);
kobject_put(_obj->kobj);
return count;
 }
@@ -193,6 +198,7 @@ static struct elog_obj *create_elog_obj(uint64_t id, size_t 
size, uint64_t type)
kobject_init(>kobj, _ktype);
 
sysfs_bin_attr_init(>raw_attr);
+   mutex_init(>elog_mutex);
 
elog->raw_attr.attr.name = "raw";
elog->raw_attr.attr.mode = 0400;
@@ -222,13 +228,24 @@ static struct elog_obj *create_elog_obj(uint64_t id, 
size_t size, uint64_t type)
return NULL;
}
 
+   /*
+* As soon as sysfs file for this elog is created/activated there is
+* chance opal_errd daemon might read and acknowledge this elog before
+* kobject_uevent() is called. If that happens then we have a potential
+* race between elog_ack_store->kobject_put() and kobject_uevent which
+* leads to use-after-free issue of a kernfs object resulting into
+* kernel crash. Take a mutex lock to avoid this race.
+*/
+   mutex_lock(>elog_mutex);
rc = sysfs_create_bin_file(>kobj, >raw_attr);
if (rc) {
kobject_put(>kobj);
+   mutex_unlock(>elog_mutex);
return NULL;
}
 
kobject_uevent(>kobj, KOBJ_ADD);
+   mutex_unlock(>elog_mutex);
 
return elog;
 }