[PATCH v5 04/10] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

2020-06-09 Thread Aneesh Kumar K.V
Architectures like ppc64 provide persistent memory specific barriers
that ensure that all stores for which the modifications were written
to persistent storage by preceding dcbfps and dcbstps instructions
have updated persistent storage before any data access or data
transfer caused by subsequent instructions is initiated. This is in
addition to the ordering done by wmb().

Update the nvdimm core such that an architecture can use a barrier other
than wmb() to ensure all previous writes are architecturally visible for
the platform buffer flush.
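
As an illustration, an architecture opts in by providing its own definition
before the generic fallback is seen; a minimal sketch of the ppc64 hook that
a later patch in this series adds (just the shape of the override, without
the CPU feature check the real patch carries):

  /* arch/powerpc/include/asm/cacheflush.h -- sketch of the override */
  #define arch_pmem_flush_barrier arch_pmem_flush_barrier
  static inline void arch_pmem_flush_barrier(void)
  {
	/* persistent-memory-specific ordering instruction */
	asm volatile(PPC_PHWSYNC ::: "memory");
  }

Architectures that do not define the macro keep the generic
"#define arch_pmem_flush_barrier() wmb()" fallback added below in
include/linux/libnvdimm.h.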

Signed-off-by: Aneesh Kumar K.V 
---
 drivers/md/dm-writecache.c   | 2 +-
 drivers/nvdimm/region_devs.c | 8 
 include/linux/libnvdimm.h| 4 
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 613c171b1b6d..904fdbf2b089 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -540,7 +540,7 @@ static void ssd_commit_superblock(struct dm_writecache *wc)
 static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
 {
if (WC_MODE_PMEM(wc))
-   wmb();
+   arch_pmem_flush_barrier();
else
ssd_commit_flushed(wc, wait_for_ios);
 }
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index ccbb5b43b8b2..88ea34a9c7fd 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1216,13 +1216,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
 
/*
-* The first wmb() is needed to 'sfence' all previous writes
-* such that they are architecturally visible for the platform
-* buffer flush.  Note that we've already arranged for pmem
+* The first arch_pmem_flush_barrier() is needed to 'sfence' all
+* previous writes such that they are architecturally visible for
+* the platform buffer flush. Note that we've already arranged for pmem
 * writes to avoid the cache via memcpy_flushcache().  The final
 * wmb() ensures ordering for the NVDIMM flush write.
 */
-   wmb();
+   arch_pmem_flush_barrier();
for (i = 0; i < nd_region->ndr_mappings; i++)
if (ndrd_get_flush_wpq(ndrd, i, 0))
writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 18da4059be09..66f6c65bd789 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -286,4 +286,8 @@ static inline void arch_invalidate_pmem(void *addr, size_t 
size)
 }
 #endif
 
+#ifndef arch_pmem_flush_barrier
+#define arch_pmem_flush_barrier() wmb()
+#endif
+
 #endif /* __LIBNVDIMM_H__ */
-- 
2.26.2



[PATCH v5 01/10] powerpc/pmem: Restrict papr_scm to P8 and above.

2020-06-09 Thread Aneesh Kumar K.V
The PAPR based virtualized persistent memory devices are only supported on
POWER9 and above. In a follow-up patch, the kernel will switch the persistent
memory cache flush functions to use new `dcbf` variant instructions. The new
instructions, even though added in ISA 3.1, work even on P8 and P9 because
they are implemented as variants of the existing `dcbf` and `hwsync` and
behave as such on P8 and P9.

Considering these devices are only supported on P8 and above, update the
driver to prevent a P7-compat guest from using persistent memory devices.

We don't update the of_pmem driver with the same condition because, on bare
metal, the firmware enables pmem support only on P9 and above. There the
kernel depends on the OPAL firmware to restrict exposing persistent memory
related device tree entries on older hardware. of_pmem.ko is written without
any arch dependency and we don't want to add a ppc64-specific CPU feature
check to the of_pmem driver.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/platforms/pseries/pmem.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/pmem.c 
b/arch/powerpc/platforms/pseries/pmem.c
index f860a897a9e0..2347e1038f58 100644
--- a/arch/powerpc/platforms/pseries/pmem.c
+++ b/arch/powerpc/platforms/pseries/pmem.c
@@ -147,6 +147,12 @@ const struct of_device_id drc_pmem_match[] = {
 
 static int pseries_pmem_init(void)
 {
+   /*
+* Only supported on POWER8 and above.
+*/
+   if (!cpu_has_feature(CPU_FTR_ARCH_207S))
+   return 0;
+
pmem_node = of_find_node_by_type(NULL, "ibm,persistent-memory");
if (!pmem_node)
return 0;
-- 
2.26.2



[PATCH v5 10/10] powerpc/pmem: Initialize pmem device on newer hardware

2020-06-09 Thread Aneesh Kumar K.V
With the kernel now supporting the new pmem flush/sync instructions, we can
let it initialize these devices. On P10 these devices appear with a new
compatible string. For PAPR devices we have

compatible   "ibm,pmemory-v2"

and for OF pmem devices we have

compatible   "pmem-region-v2"

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 1 +
 drivers/nvdimm/of_pmem.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index b970d2dbe589..3efd827fe0ac 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -498,6 +498,7 @@ static int papr_scm_remove(struct platform_device *pdev)
 
 static const struct of_device_id papr_scm_match[] = {
{ .compatible = "ibm,pmemory" },
+   { .compatible = "ibm,pmemory-v2" },
{ },
 };
 
diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
index a6cc3488e552..1e1585ab07c7 100644
--- a/drivers/nvdimm/of_pmem.c
+++ b/drivers/nvdimm/of_pmem.c
@@ -97,6 +97,7 @@ static int of_pmem_region_remove(struct platform_device *pdev)
 
 static const struct of_device_id of_pmem_region_match[] = {
{ .compatible = "pmem-region" },
+   { .compatible = "pmem-region-v2" },
{ },
 };
 
-- 
2.26.2



[PATCH v5 09/10] powerpc/pmem: Disable synchronous fault by default

2020-06-09 Thread Aneesh Kumar K.V
This adds a kernel config option that controls whether MAP_SYNC is enabled by
default. With POWER10, the architecture is adding new pmem flush and sync
instructions. The kernel should prevent the usage of MAP_SYNC if applications
are not using the new instructions on newer hardware.

This config option allows the user to control whether MAP_SYNC is enabled by
default or not.
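
As a hedged illustration of what this means for applications: MAP_SYNC is
only valid together with MAP_SHARED_VALIDATE, and mmap() fails with
EOPNOTSUPP when synchronous faults are not supported, so userspace can probe
for it. The path handling and fallback flag values below are assumptions for
the sketch, not part of this patch:

  #include <fcntl.h>
  #include <sys/mman.h>
  #include <unistd.h>
  #include <errno.h>

  #ifndef MAP_SHARED_VALIDATE
  #define MAP_SHARED_VALIDATE	0x03
  #endif
  #ifndef MAP_SYNC
  #define MAP_SYNC		0x80000
  #endif

  /* Returns 0 if MAP_SYNC works on "path", -errno otherwise. */
  static int probe_map_sync(const char *path, size_t len)
  {
	int fd = open(path, O_RDWR);
	void *p;

	if (fd < 0)
		return -errno;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	if (p == MAP_FAILED) {
		int err = errno;	/* EOPNOTSUPP when sync faults are off */

		close(fd);
		return -err;
	}
	munmap(p, len);
	close(fd);
	return 0;
  }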

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/platforms/Kconfig.cputype|  9 +
 arch/powerpc/platforms/pseries/papr_scm.c | 17 -
 drivers/nvdimm/of_pmem.c  |  7 +++
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index d349603fb889..abcc163b8dc6 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -383,6 +383,15 @@ config PPC_KUEP
 
  If you're unsure, say Y.
 
+config ARCH_MAP_SYNC_DISABLE
+   bool "Disable synchronous fault support (MAP_SYNC)"
+   default y
+   help
+ Disable support for synchronous fault with nvdimm namespaces.
+
+ If you're unsure, say Y.
+
+
 config PPC_HAVE_KUAP
bool
 
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index ad506e7003c9..b970d2dbe589 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -30,6 +30,7 @@ struct papr_scm_priv {
uint64_t block_size;
int metadata_size;
bool is_volatile;
+   bool disable_map_sync;
 
uint64_t bound_addr;
 
@@ -353,11 +354,18 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
ndr_desc.num_mappings = 1;
ndr_desc.nd_set = &p->nd_set;
ndr_desc.flush = papr_scm_flush_sync;
+   set_bit(ND_REGION_SYNC_ENABLED, &ndr_desc.flags);
 
if (p->is_volatile)
p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
else {
set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
+   /*
+* for a persistent region, check if the platform needs to
+* force MAP_SYNC disable.
+*/
+   if (p->disable_map_sync)
+   clear_bit(ND_REGION_SYNC_ENABLED, &ndr_desc.flags);
p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc);
}
if (!p->region) {
@@ -378,7 +386,7 @@ err:nvdimm_bus_unregister(p->bus);
 
 static int papr_scm_probe(struct platform_device *pdev)
 {
-   struct device_node *dn = pdev->dev.of_node;
+   struct device_node *dn;
u32 drc_index, metadata_size;
u64 blocks, block_size;
struct papr_scm_priv *p;
@@ -386,6 +394,10 @@ static int papr_scm_probe(struct platform_device *pdev)
u64 uuid[2];
int rc;
 
+   dn = dev_of_node(&pdev->dev);
+   if (!dn)
+   return -ENXIO;
+
/* check we have all the required DT properties */
if (of_property_read_u32(dn, "ibm,my-drc-index", &drc_index)) {
dev_err(&pdev->dev, "%pOF: missing drc-index!\n", dn);
@@ -415,6 +427,9 @@ static int papr_scm_probe(struct platform_device *pdev)
/* optional DT properties */
of_property_read_u32(dn, "ibm,metadata-size", &metadata_size);
 
+   if (of_device_is_compatible(dn, "ibm,pmemory-v2"))
+   p->disable_map_sync = true;
+
p->dn = dn;
p->drc_index = drc_index;
p->block_size = block_size;
diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
index 6826a274a1f1..a6cc3488e552 100644
--- a/drivers/nvdimm/of_pmem.c
+++ b/drivers/nvdimm/of_pmem.c
@@ -59,12 +59,19 @@ static int of_pmem_region_probe(struct platform_device 
*pdev)
ndr_desc.res = &pdev->resource[i];
ndr_desc.of_node = np;
set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
+   set_bit(ND_REGION_SYNC_ENABLED, &ndr_desc.flags);
 
if (is_volatile)
region = nvdimm_volatile_region_create(bus, &ndr_desc);
else {
set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
+   /*
+* for a persistent region, check for newer device
+*/
+   if (of_device_is_compatible(np, "pmem-region-v2"))
+   clear_bit(ND_REGION_SYNC_ENABLED, &ndr_desc.flags);
region = nvdimm_pmem_region_create(bus, &ndr_desc);
+
}
 
if (!region)
-- 
2.26.2



[PATCH v5 08/10] libnvdimm/dax: Add a dax flag to control synchronous fault support

2020-06-09 Thread Aneesh Kumar K.V
With POWER10, the architecture is adding new pmem flush and sync instructions.
The kernel should prevent the usage of MAP_SYNC if applications are not using
the new instructions on newer hardware.

This patch adds a dax attribute
(/sys/bus/nd/devices/region0/pfn0.1/block/pmem0/dax/sync_fault)
which can be used to control this flag. If the device supports synchronous
flush then userspace can update this attribute to enable/disable synchronous
faults. The attribute is only visible if write cache is enabled on the device.

In a follow-up patch, ppc64 devices with the compat string "ibm,pmemory-v2"
will disable the sync fault feature.
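
As a usage sketch (the sysfs path is the example from above and is namespace
specific; this is an illustration, not part of the patch), a test program can
toggle the attribute like this:

  #include <stdio.h>

  /* Write 1/0 to .../dax/sync_fault to enable/disable synchronous faults. */
  static int set_sync_fault(const char *sync_fault_path, int enable)
  {
	FILE *f = fopen(sync_fault_path, "w");

	if (!f)
		return -1;
	fprintf(f, "%d\n", enable ? 1 : 0);
	return fclose(f);
  }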

Signed-off-by: Aneesh Kumar K.V 
---
 drivers/dax/bus.c|  2 +-
 drivers/dax/super.c  | 73 
 drivers/nvdimm/pmem.c|  4 ++
 drivers/nvdimm/region_devs.c | 16 
 include/linux/dax.h  | 16 
 include/linux/libnvdimm.h|  4 ++
 mm/Kconfig   |  3 ++
 7 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index df238c8b6ef2..8a825ecff49b 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -420,7 +420,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region 
*dax_region, int id,
 * No 'host' or dax_operations since there is no access to this
 * device outside of mmap of the resulting character device.
 */
-   dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
+   dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC | DAXDEV_F_SYNC_ENABLED);
if (IS_ERR(dax_dev)) {
rc = PTR_ERR(dax_dev);
goto err;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 8e32345be0f7..f93e6649d452 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -198,6 +198,12 @@ enum dax_device_flags {
DAXDEV_WRITE_CACHE,
/* flag to check if device supports synchronous flush */
DAXDEV_SYNC,
+   /*
+* flag to indicate whether synchronous flush is enabled.
+* Some platform may want to disable synchronous flush support
+* even though device supports the same.
+*/
+   DAXDEV_SYNC_ENABLED,
 };
 
 /**
@@ -254,6 +260,63 @@ static ssize_t write_cache_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(write_cache);
 
+bool __dax_synchronous_enabled(struct dax_device *dax_dev)
+{
+   return test_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(__dax_synchronous_enabled);
+
+static void set_dax_synchronous_enable(struct dax_device *dax_dev, bool enable)
+{
+   if (!test_bit(DAXDEV_SYNC, &dax_dev->flags))
+   return;
+
+   if (enable)
+   set_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
+   else
+   clear_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
+}
+
+
+static ssize_t sync_fault_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   int enabled;
+   struct dax_device *dax_dev = dax_get_by_host(dev_name(dev));
+   ssize_t rc;
+
+   WARN_ON_ONCE(!dax_dev);
+   if (!dax_dev)
+   return -ENXIO;
+
+   enabled = (dax_synchronous(dax_dev) && dax_synchronous_enabled(dax_dev));
+   rc = sprintf(buf, "%d\n", enabled);
+   put_dax(dax_dev);
+   return rc;
+}
+
+static ssize_t sync_fault_store(struct device *dev,
+   struct device_attribute *attr, const char *buf, size_t len)
+{
+   bool enable_sync;
+   int rc = strtobool(buf, &enable_sync);
+   struct dax_device *dax_dev = dax_get_by_host(dev_name(dev));
+
+   WARN_ON_ONCE(!dax_dev);
+   if (!dax_dev)
+   return -ENXIO;
+
+   if (rc)
+   len = rc;
+   else
+   set_dax_synchronous_enable(dax_dev, enable_sync);
+
+   put_dax(dax_dev);
+   return len;
+}
+
+static DEVICE_ATTR_RW(sync_fault);
+
 static umode_t dax_visible(struct kobject *kobj, struct attribute *a, int n)
 {
struct device *dev = container_of(kobj, typeof(*dev), kobj);
@@ -267,11 +330,18 @@ static umode_t dax_visible(struct kobject *kobj, struct 
attribute *a, int n)
if (a == &dev_attr_write_cache.attr)
return 0;
 #endif
+   if (a == &dev_attr_sync_fault.attr) {
+   if (dax_write_cache_enabled(dax_dev))
+   return a->mode;
+   return 0;
+   }
+
return a->mode;
 }
 
 static struct attribute *dax_attributes[] = {
&dev_attr_write_cache.attr,
+   &dev_attr_sync_fault.attr,
NULL,
 };
 
@@ -594,6 +664,9 @@ struct dax_device *alloc_dax(void *private, const char 
*__host,
if (flags & DAXDEV_F_SYNC)
set_dax_synchronous(dax_dev);
 
+   if (flags & DAXDEV_F_SYNC_ENABLED)
+   set_dax_synchronous_enable(dax_dev, true);
+
return dax_dev;
 
  err_dev:
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 97f948f8f4e6..a738b237a

[PATCH v5 07/10] powerpc/book3s/pmem: Add WARN_ONCE to catch the wrong usage of pmem flush functions.

2020-06-09 Thread Aneesh Kumar K.V
We only support persistent memory on P8 and above. This is enforced by the
firmware and further checked on virtualized platforms during platform init.
Add WARN_ONCE in the pmem flush routines to catch wrong usage of these.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/cacheflush.h | 2 ++
 arch/powerpc/lib/pmem.c   | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index bb56a49c9a66..6dad92bd4be3 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -126,6 +126,8 @@ static inline void  arch_pmem_flush_barrier(void)
 {
if (cpu_has_feature(CPU_FTR_ARCH_207S))
asm volatile(PPC_PHWSYNC ::: "memory");
+   else
+   WARN_ONCE(1, "Using pmem flush on older hardware.");
 }
 #endif /* __KERNEL__ */
 
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 21210fa676e5..f40bd908d28d 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -37,12 +37,14 @@ static inline void clean_pmem_range(unsigned long start, 
unsigned long stop)
 {
if (cpu_has_feature(CPU_FTR_ARCH_207S))
return __clean_pmem_range(start, stop);
+   WARN_ONCE(1, "Using pmem flush on older hardware.");
 }
 
 static inline void flush_pmem_range(unsigned long start, unsigned long stop)
 {
if (cpu_has_feature(CPU_FTR_ARCH_207S))
return __flush_pmem_range(start, stop);
+   WARN_ONCE(1, "Using pmem flush on older hardware.");
 }
 
 /*
-- 
2.26.2



[PATCH v5 06/10] powerpc/pmem: Avoid the barrier in flush routines

2020-06-09 Thread Aneesh Kumar K.V
nvdimm expects the flush routines to just mark the cache clean. The barrier
that makes the stores globally visible is done in nvdimm_flush().

Update the papr_scm driver to a simplified nvdimm_flush callback that does
only the required barrier.
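
For clarity, the resulting write path on papr_scm is roughly (a sketch of the
intended call flow, assuming the region flush callback is wired up as in the
hunk below):

  memcpy_flushcache(dst, src, len)     /* dcbstps per cache line (patch 03)  */
    -> nvdimm_flush(nd_region, bio)    /* invokes nd_region->flush           */
      -> papr_scm_flush_sync()         /* arch_pmem_flush_barrier(): phwsync */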

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/lib/pmem.c   |  6 --
 arch/powerpc/platforms/pseries/papr_scm.c | 13 +
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 5a61aaeb6930..21210fa676e5 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -19,9 +19,6 @@ static inline void __clean_pmem_range(unsigned long start, 
unsigned long stop)
 
for (i = 0; i < size >> shift; i++, addr += bytes)
asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
-
-
-   asm volatile(PPC_PHWSYNC ::: "memory");
 }
 
 static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
@@ -34,9 +31,6 @@ static inline void __flush_pmem_range(unsigned long start, 
unsigned long stop)
 
for (i = 0; i < size >> shift; i++, addr += bytes)
asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
-
-
-   asm volatile(PPC_PHWSYNC ::: "memory");
 }
 
 static inline void clean_pmem_range(unsigned long start, unsigned long stop)
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index f35592423380..ad506e7003c9 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -285,6 +285,18 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor 
*nd_desc,
 
return 0;
 }
+/*
+ * We have made sure the pmem writes are done such that before calling this
+ * all the caches are flushed/clean. We use dcbf/dcbfps to ensure this. Here
+ * we just need to add the necessary barrier to make sure the above flushes
+ * have updated persistent storage before any data access or data transfer
+ * caused by subsequent instructions is initiated.
+ */
+static int papr_scm_flush_sync(struct nd_region *nd_region, struct bio *bio)
+{
+   arch_pmem_flush_barrier();
+   return 0;
+}
 
 static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 {
@@ -340,6 +352,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
ndr_desc.mapping = &mapping;
ndr_desc.num_mappings = 1;
ndr_desc.nd_set = &p->nd_set;
+   ndr_desc.flush = papr_scm_flush_sync;
 
if (p->is_volatile)
p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
-- 
2.26.2



[PATCH v5 05/10] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction.

2020-06-09 Thread Aneesh Kumar K.V
of_pmem on POWER10 can now use phwsync instead of hwsync to ensure
all previous writes are architecturally visible for the platform
buffer flush.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/cacheflush.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index 81808d1b54ca..bb56a49c9a66 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -120,6 +120,13 @@ static inline void invalidate_dcache_range(unsigned long 
start,
 #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
memcpy(dst, src, len)
 
+
+#define arch_pmem_flush_barrier arch_pmem_flush_barrier
+static inline void  arch_pmem_flush_barrier(void)
+{
+   if (cpu_has_feature(CPU_FTR_ARCH_207S))
+   asm volatile(PPC_PHWSYNC ::: "memory");
+}
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_POWERPC_CACHEFLUSH_H */
-- 
2.26.2



[PATCH v5 02/10] powerpc/pmem: Add new instructions for persistent storage and sync

2020-06-09 Thread Aneesh Kumar K.V
POWER10 introduces two new variants of the dcbf instruction (dcbstps and
dcbfps) that can be used to write modified locations back to persistent
storage.

Additionally, POWER10 also introduces phwsync and plwsync, which can be used
to establish the order of these writes to persistent storage.

This patch exposes these instructions to the rest of the kernel. The existing
dcbf and hwsync instructions on P8 and P9 are adequate to enable appropriate
synchronization with OpenCAPI-hosted persistent storage. Hence the new
instructions are added as variants of the old ones that old hardware won't
differentiate.
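
For reference, the encodings below are consistent with treating the new
instructions as the existing ones with a non-zero L field (the (4 << 21) and
(6 << 21) terms in the macros); the arithmetic, assuming the dcbf and sync
base opcodes 0x7c0000ac and 0x7c0004ac used in this file:

  dcbfps  = dcbf | (4 << 21) = 0x7c0000ac | 0x00800000 = 0x7c8000ac
  dcbstps = dcbf | (6 << 21) = 0x7c0000ac | 0x00c00000 = 0x7cc000ac
  phwsync = sync | (4 << 21) = 0x7c0004ac | 0x00800000 = 0x7c8004ac  (PPC_INST_PHWSYNC)
  plwsync = sync | (5 << 21) = 0x7c0004ac | 0x00a00000 = 0x7ca004ac  (PPC_INST_PLWSYNC)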

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/ppc-opcode.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 2a39c716c343..1ad014e4633e 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -219,6 +219,8 @@
 #define PPC_INST_STWCX			0x7c00012d
 #define PPC_INST_LWSYNC		0x7c2004ac
 #define PPC_INST_SYNC			0x7c0004ac
+#define PPC_INST_PHWSYNC		0x7c8004ac
+#define PPC_INST_PLWSYNC		0x7ca004ac
 #define PPC_INST_SYNC_MASK		0xfc0007fe
 #define PPC_INST_ISYNC			0x4c00012c
 #define PPC_INST_LXVD2X		0x7c000698
@@ -284,6 +286,8 @@
 #define PPC_INST_TABORT		0x7c00071d
 #define PPC_INST_TSR			0x7c0005dd
 
+#define PPC_INST_DCBF			0x7c0000ac
+
 #define PPC_INST_NAP			0x4c000364
 #define PPC_INST_SLEEP			0x4c0003a4
 #define PPC_INST_WINKLE		0x4c0003e4
@@ -532,6 +536,14 @@
 #define STBCIX(s,a,b)  stringify_in_c(.long PPC_INST_STBCIX | \
   __PPC_RS(s) | __PPC_RA(a) | __PPC_RB(b))
 
+#define	PPC_DCBFPS(a, b)	stringify_in_c(.long PPC_INST_DCBF |	\
+					___PPC_RA(a) | ___PPC_RB(b) | (4 << 21))
+#define	PPC_DCBSTPS(a, b)	stringify_in_c(.long PPC_INST_DCBF |	\
+					___PPC_RA(a) | ___PPC_RB(b) | (6 << 21))
+
+#define	PPC_PHWSYNC		stringify_in_c(.long PPC_INST_PHWSYNC)
+#define	PPC_PLWSYNC		stringify_in_c(.long PPC_INST_PLWSYNC)
+
 /*
  * Define what the VSX XX1 form instructions will look like, then add
  * the 128 bit load store instructions based on that.
-- 
2.26.2



[PATCH v5 03/10] powerpc/pmem: Add flush routines using new pmem store and sync instruction

2020-06-09 Thread Aneesh Kumar K.V
Start using the dcbstps; phwsync; sequence for flushing persistent memory
ranges. The new instructions are implemented as variants of dcbf and hwsync,
and on P8 and P9 they will be executed as those instructions. We avoid using
them on older hardware. This helps to avoid difficult-to-debug bugs.
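
As a worked example of the cache-line loop in __clean_pmem_range() /
__flush_pmem_range() below, assume 128-byte cache lines and a request
covering [0x1005, 0x1105):

  addr       = 0x1005 & ~(128 - 1)         = 0x1000
  size       = 0x1105 - 0x1000 + (128 - 1) = 0x184
  iterations = size >> 7                   = 3   (lines 0x1000, 0x1080, 0x1100)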

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/cacheflush.h |  1 +
 arch/powerpc/lib/pmem.c   | 50 ---
 2 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index e92191b390f3..81808d1b54ca 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 
 /*
  * No cache flushing is required when address mappings are changed,
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 0666a8d29596..5a61aaeb6930 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -9,20 +9,62 @@
 
 #include 
 
+static inline void __clean_pmem_range(unsigned long start, unsigned long stop)
+{
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
+   void *addr = (void *)(start & ~(bytes - 1));
+   unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+   unsigned long i;
+
+   for (i = 0; i < size >> shift; i++, addr += bytes)
+   asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
+
+
+   asm volatile(PPC_PHWSYNC ::: "memory");
+}
+
+static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
+{
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
+   void *addr = (void *)(start & ~(bytes - 1));
+   unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+   unsigned long i;
+
+   for (i = 0; i < size >> shift; i++, addr += bytes)
+   asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
+
+
+   asm volatile(PPC_PHWSYNC ::: "memory");
+}
+
+static inline void clean_pmem_range(unsigned long start, unsigned long stop)
+{
+   if (cpu_has_feature(CPU_FTR_ARCH_207S))
+   return __clean_pmem_range(start, stop);
+}
+
+static inline void flush_pmem_range(unsigned long start, unsigned long stop)
+{
+   if (cpu_has_feature(CPU_FTR_ARCH_207S))
+   return __flush_pmem_range(start, stop);
+}
+
 /*
  * CONFIG_ARCH_HAS_PMEM_API symbols
  */
 void arch_wb_cache_pmem(void *addr, size_t size)
 {
unsigned long start = (unsigned long) addr;
-   flush_dcache_range(start, start + size);
+   clean_pmem_range(start, start + size);
 }
 EXPORT_SYMBOL_GPL(arch_wb_cache_pmem);
 
 void arch_invalidate_pmem(void *addr, size_t size)
 {
unsigned long start = (unsigned long) addr;
-   flush_dcache_range(start, start + size);
+   flush_pmem_range(start, start + size);
 }
 EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
 
@@ -35,7 +77,7 @@ long __copy_from_user_flushcache(void *dest, const void 
__user *src,
unsigned long copied, start = (unsigned long) dest;
 
copied = __copy_from_user(dest, src, size);
-   flush_dcache_range(start, start + size);
+   clean_pmem_range(start, start + size);
 
return copied;
 }
@@ -45,7 +87,7 @@ void *memcpy_flushcache(void *dest, const void *src, size_t 
size)
unsigned long start = (unsigned long) dest;
 
memcpy(dest, src, size);
-   flush_dcache_range(start, start + size);
+   clean_pmem_range(start, start + size);
 
return dest;
 }
-- 
2.26.2



[PATCH v5 00/10] Support new pmem flush and sync instructions for POWER

2020-06-09 Thread Aneesh Kumar K.V
This patch series enables the usage of the new pmem flush and sync
instructions on the POWER architecture. POWER10 introduces two new variants
of the dcbf instruction (dcbstps and dcbfps) that can be used to write
modified locations back to persistent storage. Additionally, POWER10 also
introduces phwsync and plwsync, which can be used to establish the order of
these writes to persistent storage.

This series exposes these instructions to the rest of the kernel. The
existing dcbf and hwsync instructions on P8 and P9 are adequate to enable
appropriate synchronization with OpenCAPI-hosted persistent storage. Hence
the new instructions are added as variants of the old ones that old hardware
won't differentiate.

On POWER10, pmem devices will be represented by different device tree compat
strings. This ensures that older kernels won't initialize pmem devices on
POWER10.

W.r.t. userspace, we want to make sure applications are enabled to use
MAP_SYNC only if they are using the new instructions. To avoid wrong usage
of MAP_SYNC on newer hardware, we disable MAP_SYNC by default there. The
namespace-specific attribute /sys/block/pmem0/dax/sync_fault can be used to
enable MAP_SYNC later.

With this:
1) vPMEM continues to work since it is a volatile region. That doesn't need
any flush instructions.

2) pmdk and other user applications get updated to use the new instructions
and updated packages are made available to all distributions.

3) On newer hardware, the device will appear with a new compat string. Hence
older distributions won't initialize pmem on newer hardware.

4) If we have a newer kernel with an older distro, we use the per-namespace
sysfs knob that prevents the usage of MAP_SYNC.

5) Sometime in the future, we mark CONFIG_ARCH_MAP_SYNC_DISABLE=n on ppc64
when we are confident that everybody is using the new flush instructions.

Changes from V4:
* Add namespace-specific synchronous fault control.

Changes from V3:
* Add new compat string to be used for the device.
* Use arch_pmem_flush_barrier() in dm-writecache.

Aneesh Kumar K.V (10):
  powerpc/pmem: Restrict papr_scm to P8 and above.
  powerpc/pmem: Add new instructions for persistent storage and sync
  powerpc/pmem: Add flush routines using new pmem store and sync
instruction
  libnvdimm/nvdimm/flush: Allow architecture to override the flush
barrier
  powerpc/pmem/of_pmem: Update of_pmem to use the new barrier
instruction.
  powerpc/pmem: Avoid the barrier in flush routines
  powerpc/book3s/pmem: Add WARN_ONCE to catch the wrong usage of pmem
flush functions.
  libnvdimm/dax: Add a dax flag to control synchronous fault support
  powerpc/pmem: Disable synchronous fault by default
  powerpc/pmem: Initialize pmem device on newer hardware

 arch/powerpc/include/asm/cacheflush.h | 10 
 arch/powerpc/include/asm/ppc-opcode.h | 12 
 arch/powerpc/lib/pmem.c   | 46 --
 arch/powerpc/platforms/Kconfig.cputype|  9 +++
 arch/powerpc/platforms/pseries/papr_scm.c | 31 +-
 arch/powerpc/platforms/pseries/pmem.c |  6 ++
 drivers/dax/bus.c |  2 +-
 drivers/dax/super.c   | 73 +++
 drivers/md/dm-writecache.c|  2 +-
 drivers/nvdimm/of_pmem.c  |  8 +++
 drivers/nvdimm/pmem.c |  4 ++
 drivers/nvdimm/region_devs.c  | 24 ++--
 include/linux/dax.h   | 16 +
 include/linux/libnvdimm.h |  8 +++
 mm/Kconfig|  3 +
 15 files changed, 243 insertions(+), 11 deletions(-)

-- 
2.26.2



Re: ipr crashes due to NULL dma_need_drain since cc97923a5bcc ("block: move dma drain handling to scsi")

2020-06-09 Thread Michael Ellerman
Christoph Hellwig  writes:
> Can you try this patch?
>
> ---
> From 1c9913360a0494375c5655b133899cb4323bceb4 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Tue, 9 Jun 2020 14:07:31 +0200
> Subject: scsi: wire up ata_scsi_dma_need_drain for SAS HBA drivers
>
> We need ata_scsi_dma_need_drain for all drivers wired up to drive ATAPI
> devices through libata.  That also includes the SAS HBA drivers in
> addition to native libata HBA drivers.
>
> Fixes: cc97923a5bcc ("block: move dma drain handling to scsi")
> Reported-by: Michael Ellerman 
> Signed-off-by: Christoph Hellwig 

Yep that works for me here with ipr.

Tested-by: Michael Ellerman 

cheers


Re: [PATCH v3 0/7] Base support for POWER10

2020-06-09 Thread Michael Ellerman
Murilo Opsfelder Araújo  writes:
> On Tue, Jun 09, 2020 at 03:28:31PM +1000, Michael Ellerman wrote:
>> On Thu, 21 May 2020 11:43:34 +1000, Alistair Popple wrote:
>> > This series brings together several previously posted patches required for
>> > POWER10 support and introduces a new patch enabling POWER10 architected
>> > mode to enable booting as a POWER10 pseries guest.
>> >
>> > It includes support for enabling facilities related to MMA and prefix
>> > instructions.
>> >
>> > [...]
>>
>> Patches 1-3 and 5-7 applied to powerpc/next.
>>
>> [1/7] powerpc: Add new HWCAP bits
>>   
>> https://git.kernel.org/powerpc/c/ee988c11acf6f9464b7b44e9a091bf6afb3b3a49
>> [2/7] powerpc: Add support for ISA v3.1
>>   
>> https://git.kernel.org/powerpc/c/3fd5836ee801ab9ac5b314c26550e209bafa5eaa
>> [3/7] powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected
>>   
>> https://git.kernel.org/powerpc/c/43d0d37acbe40a9a93d9891ca670638cd22116b1
>
> Just out of curiosity, why do we define ISA_V3_0B and ISA_V3_1 macros
> and don't use them anywhere else in the code?

Because we're sloppy :/

> Can't they be used in cpufeatures_setup_start() instead of 3000 and
> 3100 literals?

Yes please.

cheers


Re: [PATCH 2/6] powerpc/ppc-opcode: move ppc instruction encoding from test_emulate_step

2020-06-09 Thread Sandipan Das


On 26/05/20 1:45 pm, Balamuruhan S wrote:
> Few ppc instructions are encoded in test_emulate_step.c, consolidate
> them and use it from ppc-opcode.h
> 
> Signed-off-by: Balamuruhan S 
> Acked-by: Naveen N. Rao 
> Tested-by: Naveen N. Rao 
> ---
>  arch/powerpc/include/asm/ppc-opcode.h |  35 ++
>  arch/powerpc/lib/test_emulate_step.c  | 155 ++
>  2 files changed, 91 insertions(+), 99 deletions(-)
> [...]
> 

Acked-by: Sandipan Das 


Re: [PATCH 4/6] powerpc/ppc-opcode: consolidate powerpc instructions from bpf_jit.h

2020-06-09 Thread Sandipan Das


On 26/05/20 1:45 pm, Balamuruhan S wrote:
> move macro definitions of powerpc instructions from bpf_jit.h to ppc-opcode.h
> and adopt the users of the macros accordingly. `PPC_MR()` is defined twice in
> bpf_jit.h, remove the duplicate one.
> 
> Signed-off-by: Balamuruhan S 
> Acked-by: Naveen N. Rao 
> Tested-by: Naveen N. Rao 
> ---
>  arch/powerpc/include/asm/ppc-opcode.h | 139 +
>  arch/powerpc/net/bpf_jit.h| 166 ++-
>  arch/powerpc/net/bpf_jit32.h  |  24 +--
>  arch/powerpc/net/bpf_jit64.h  |  12 +-
>  arch/powerpc/net/bpf_jit_comp.c   | 132 ++--
>  arch/powerpc/net/bpf_jit_comp64.c | 278 +-
>  6 files changed, 378 insertions(+), 373 deletions(-)
> [...]
> 

Acked-by: Sandipan Das 


Re: [PATCH 3/6] powerpc/bpf_jit: reuse instruction macros from ppc-opcode.h

2020-06-09 Thread Sandipan Das


On 26/05/20 1:45 pm, Balamuruhan S wrote:
> remove duplicate macro definitions from bpf_jit.h and reuse the macros from
> ppc-opcode.h
> 
> Signed-off-by: Balamuruhan S 
> Acked-by: Naveen N. Rao 
> Tested-by: Naveen N. Rao 
> ---
>  arch/powerpc/net/bpf_jit.h| 18 +-
>  arch/powerpc/net/bpf_jit32.h  | 10 +-
>  arch/powerpc/net/bpf_jit64.h  |  4 ++--
>  arch/powerpc/net/bpf_jit_comp.c   |  2 +-
>  arch/powerpc/net/bpf_jit_comp64.c | 20 ++--
>  5 files changed, 19 insertions(+), 35 deletions(-)
> [...]
> 

Acked-by: Sandipan Das 


Re: [PATCH] powerpc/pseries/svm: Fixup align argument in alloc_shared_lppaca() function

2020-06-09 Thread Thiago Jung Bauermann


Satheesh Rajendran  writes:

> Argument "align" in alloc_shared_lppaca() function was unused inside the
> function. Let's fix it and update code comment.
>
> Cc: linux-ker...@vger.kernel.org
> Cc: Thiago Jung Bauermann 
> Cc: Ram Pai 
> Cc: Sukadev Bhattiprolu 
> Cc: Laurent Dufour 
> Signed-off-by: Satheesh Rajendran 
> ---
>  arch/powerpc/kernel/paca.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)

Nice. I agree it's a good code cleanup.

Reviewed-by: Thiago Jung Bauermann 

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH] powerpc/pseries/svm: Remove unwanted check for shared_lppaca_size

2020-06-09 Thread Thiago Jung Bauermann


Satheesh Rajendran  writes:

> Early secure guest boot hits the below crash while booting with
> vcpus numbers aligned with page boundary for PAGE size of 64k
> and LPPACA size of 1k i.e 64, 128 etc, due to the BUG_ON assert
> for shared_lppaca_total_size equal to shared_lppaca_size,
>
>  [0.00] Partition configured for 64 cpus.
>  [0.00] CPU maps initialized for 1 thread per core
>  [0.00] [ cut here ]
>  [0.00] kernel BUG at arch/powerpc/kernel/paca.c:89!
>  [0.00] Oops: Exception in kernel mode, sig: 5 [#1]
>  [0.00] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
>
> which is not necessary, let's remove it.
>
> Cc: linux-ker...@vger.kernel.org
> Cc: Thiago Jung Bauermann 
> Cc: Ram Pai 
> Cc: Sukadev Bhattiprolu 
> Cc: Laurent Dufour 
> Signed-off-by: Satheesh Rajendran 
> ---
>  arch/powerpc/kernel/paca.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks for fixing this bug! I would only add:

Fixes: bd104e6db6f0 ("powerpc/pseries/svm: Use shared memory for LPPACA structures")

In any case:

Reviewed-by: Thiago Jung Bauermann 

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH] ibmvscsi: don't send host info in adapter info MAD after LPM

2020-06-09 Thread Martin K. Petersen
On Wed, 3 Jun 2020 15:36:32 -0500, Tyrel Datwyler wrote:

> The adapter info MAD is used to send the client info and receive the
> host info as a response. A persistent buffer is used and as such the
> client info is overwritten after the response. During the course of
> a normal adapter reset the client info is refreshed in the buffer in
> preparation for sending the adapter info MAD.
> 
> However, in the special case of LPM where we reenable the CRQ instead
> of a full CRQ teardown and reset we fail to refresh the client info in
> the adapter info buffer. As a result after Live Partition Migration
> (LPM) we erroneously report the host's info as our own.

Applied to 5.8/scsi-queue, thanks!

[1/1] scsi: ibmvscsi: Don't send host info in adapter info MAD after LPM
  https://git.kernel.org/mkp/scsi/c/4919b33b63c8

-- 
Martin K. Petersen  Oracle Linux Engineering


PowerPC KVM-PR issue

2020-06-09 Thread Christian Zigotzky

Hello,

KVM-PR doesn't work anymore on my Nemo board [1]. I figured out that the 
Git kernels and the kernel 5.7 are affected.


Error message: Fienix kernel: kvmppc_exit_pr_progint: emulation at 700 
failed ()


I can boot virtual QEMU PowerPC machines with KVM-PR with the kernel 5.6 
without any problems on my Nemo board.


I tested it with QEMU 2.5.0 and QEMU 5.0.0 today.

Could you please check KVM-PR on your PowerPC machine?

Thanks,
Christian

[1] https://en.wikipedia.org/wiki/AmigaOne_X1000


Re: [PATCH v11 5/6] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods

2020-06-09 Thread Dan Williams
On Tue, Jun 9, 2020 at 10:54 AM Vaibhav Jain  wrote:
>
> Thanks Dan for the consideration and taking time to look into this.
>
> My responses below:
>
> Dan Williams  writes:
>
> > On Mon, Jun 8, 2020 at 5:16 PM kernel test robot  wrote:
> >>
> >> Hi Vaibhav,
> >>
> >> Thank you for the patch! Perhaps something to improve:
> >>
> >> [auto build test WARNING on powerpc/next]
> >> [also build test WARNING on linus/master v5.7 next-20200605]
> >> [cannot apply to linux-nvdimm/libnvdimm-for-next scottwood/next]
> >> [if your patch is applied to the wrong git tree, please drop us a note to 
> >> help
> >> improve the system. BTW, we also suggest to use '--base' option to specify 
> >> the
> >> base tree in git format-patch, please see 
> >> https://stackoverflow.com/a/37406982]
> >>
> >> url:
> >> https://github.com/0day-ci/linux/commits/Vaibhav-Jain/powerpc-papr_scm-Add-support-for-reporting-nvdimm-health/20200607-211653
> >> base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
> >> next
> >> config: powerpc-randconfig-r016-20200607 (attached as .config)
> >> compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 
> >> e429cffd4f228f70c1d9df0e5d77c08590dd9766)
> >> reproduce (this is a W=1 build):
> >> wget 
> >> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross 
> >> -O ~/bin/make.cross
> >> chmod +x ~/bin/make.cross
> >> # install powerpc cross compiling tool for clang build
> >> # apt-get install binutils-powerpc-linux-gnu
> >> # save the attached .config to linux build tree
> >> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross 
> >> ARCH=powerpc
> >>
> >> If you fix the issue, kindly add following tag as appropriate
> >> Reported-by: kernel test robot 
> >>
> >> All warnings (new ones prefixed by >>, old ones prefixed by <<):
> >>
> >> In file included from :1:
> >> >> ./usr/include/asm/papr_pdsm.h:69:20: warning: field 'hdr' with variable 
> >> >> sized type 'struct nd_cmd_pkg' not at the end of a struct or class is a 
> >> >> GNU extension [-Wgnu-variable-sized-type-not-at-end]
> >> struct nd_cmd_pkg hdr;  /* Package header containing sub-cmd */
> >
> > Hi Vaibhav,
> >
> [.]
> > This looks like it's going to need another round to get this fixed. I
> > don't think 'struct nd_pdsm_cmd_pkg' should embed a definition of
> > 'struct nd_cmd_pkg'. An instance of 'struct nd_cmd_pkg' carries a
> > payload that is the 'pdsm' specifics. As the code has it now it's
> > defined as a superset of 'struct nd_cmd_pkg' and the compiler warning
> > is pointing out a real 'struct' organization problem.
> >
> > Given the soak time needed in -next after the code is finalized this
> > there's no time to do another round of updates and still make the v5.8
> > merge window.
>
> Agreed that this looks bad, a solution will probably need some more
> review cycles resulting in this series missing the merge window.
>
> I am investigating into the possible solutions for this reported issue
> and made few observations:
>
> I see command pkg for Intel, Hpe, Msft and Hyperv families using a
> similar layout of embedding nd_cmd_pkg at the head of the
> command-pkg. struct nd_pdsm_cmd_pkg is following the same pattern.
>
> struct nd_pdsm_cmd_pkg {
> struct nd_cmd_pkg hdr;
> /* other members */
> };
>
> struct ndn_pkg_msft {
> struct nd_cmd_pkg gen;
> /* other members */
> };
> struct nd_pkg_intel {
> struct nd_cmd_pkg gen;
> /* other members */
> };
> struct ndn_pkg_hpe1 {
> struct nd_cmd_pkg gen;
> /* other members */

In those cases the other members are a union and there is no second
variable length array. Perhaps that is why those definitions are not
getting flagged? I'm not seeing anything in ndctl build options that
would explicitly disable this warning, but I'm not sure if the ndctl
build environment is missing this build warning by accident.

Those variable size payloads are also not being used in any code paths
that would look at the size of the command payload, like the kernel
ioctl() path. The payload validation code needs static sizes and the
payload parsing code wants to cast the payload to a known type. I
don't think you can use the same struct definition for both those
cases which is why the ndctl parsing code uses the union layout, but
the kernel command marshaling code does strict layering.

> };
>
> Even though other command families implement similar command-package
> layout they were not flagged (yet) as they are (I am guessing) serviced
> in vendor acpi drivers rather than in kernel like in case of papr-scm
> command family.

I sincerely hope there are no vendor acpi kernel drivers outside of
the upstream one.

>
> So, I think this issue is not just specific to papr-scm command family
> introduced in this patch series but rather across all other command
> families. Every other command family assumes 'struct nd_cmd_pkg_hdr' to
> be embeddable and puts it at the beginnin

Re: [PATCH v11 5/6] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods

2020-06-09 Thread Vaibhav Jain
Thanks Dan for the consideration and taking time to look into this.

My responses below:

Dan Williams  writes:

> On Mon, Jun 8, 2020 at 5:16 PM kernel test robot  wrote:
>>
>> Hi Vaibhav,
>>
>> Thank you for the patch! Perhaps something to improve:
>>
>> [auto build test WARNING on powerpc/next]
>> [also build test WARNING on linus/master v5.7 next-20200605]
>> [cannot apply to linux-nvdimm/libnvdimm-for-next scottwood/next]
>> [if your patch is applied to the wrong git tree, please drop us a note to 
>> help
>> improve the system. BTW, we also suggest to use '--base' option to specify 
>> the
>> base tree in git format-patch, please see 
>> https://stackoverflow.com/a/37406982]
>>
>> url:
>> https://github.com/0day-ci/linux/commits/Vaibhav-Jain/powerpc-papr_scm-Add-support-for-reporting-nvdimm-health/20200607-211653
>> base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
>> next
>> config: powerpc-randconfig-r016-20200607 (attached as .config)
>> compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 
>> e429cffd4f228f70c1d9df0e5d77c08590dd9766)
>> reproduce (this is a W=1 build):
>> wget 
>> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
>> ~/bin/make.cross
>> chmod +x ~/bin/make.cross
>> # install powerpc cross compiling tool for clang build
>> # apt-get install binutils-powerpc-linux-gnu
>> # save the attached .config to linux build tree
>> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross 
>> ARCH=powerpc
>>
>> If you fix the issue, kindly add following tag as appropriate
>> Reported-by: kernel test robot 
>>
>> All warnings (new ones prefixed by >>, old ones prefixed by <<):
>>
>> In file included from :1:
>> >> ./usr/include/asm/papr_pdsm.h:69:20: warning: field 'hdr' with variable 
>> >> sized type 'struct nd_cmd_pkg' not at the end of a struct or class is a 
>> >> GNU extension [-Wgnu-variable-sized-type-not-at-end]
>> struct nd_cmd_pkg hdr;  /* Package header containing sub-cmd */
>
> Hi Vaibhav,
>
[.]
> This looks like it's going to need another round to get this fixed. I
> don't think 'struct nd_pdsm_cmd_pkg' should embed a definition of
> 'struct nd_cmd_pkg'. An instance of 'struct nd_cmd_pkg' carries a
> payload that is the 'pdsm' specifics. As the code has it now it's
> defined as a superset of 'struct nd_cmd_pkg' and the compiler warning
> is pointing out a real 'struct' organization problem.
>
> Given the soak time needed in -next after the code is finalized this
> there's no time to do another round of updates and still make the v5.8
> merge window.

Agreed that this looks bad, a solution will probably need some more
review cycles resulting in this series missing the merge window.

I am investigating into the possible solutions for this reported issue
and made few observations:

I see command pkg for Intel, Hpe, Msft and Hyperv families using a
similar layout of embedding nd_cmd_pkg at the head of the
command-pkg. struct nd_pdsm_cmd_pkg is following the same pattern.

struct nd_pdsm_cmd_pkg {
struct nd_cmd_pkg hdr;
/* other members */
};

struct ndn_pkg_msft {
struct nd_cmd_pkg gen;
/* other members */
};
struct nd_pkg_intel {
struct nd_cmd_pkg gen;
/* other members */
};
struct ndn_pkg_hpe1 {
struct nd_cmd_pkg gen;
/* other members */
};

Even though other command families implement a similar command-package
layout, they were not flagged (yet) as they are (I am guessing) serviced
in vendor ACPI drivers rather than in the kernel, as in the case of the
papr-scm command family.

So, I think this issue is not just specific to the papr-scm command family
introduced in this patch series but rather applies across all other command
families. Every other command family assumes 'struct nd_cmd_pkg_hdr' to
be embeddable and puts it at the beginning of its corresponding
command package. And it's only a matter of time before someone tries
filtering/handling of vendor-specific commands in the nfit module and
hits a similar issue.

Possible Solutions:

* One way would be to redefine 'struct nd_cmd_pkg' to change the field
  'nd_payload[]' from a flexible array member to a zero-sized array,
  'nd_payload[0]'. This should make 'struct nd_cmd_pkg' embeddable and
  clang shouldn't report the 'gnu-variable-sized-type-not-at-end'
  warning. Also, I think this change shouldn't introduce any ABI change.
  
* Another way to solve this issue might be to redefine 'struct
  nd_pdsm_cmd_pkg' as below, removing the 'struct nd_cmd_pkg' member. This
  struct would immediately follow the 'struct nd_cmd_pkg' command package
  when sent to libnvdimm:

  struct nd_pdsm_cmd_pkg {
__s32 cmd_status;   /* Out: Sub-cmd status returned back */
__u16 reserved[2];  /* Ignored and to be used in future */
__u8 payload[];
};

  This should remove the flexible member nc_cmd_pkg.nd_payload from the
  struct with just one remaining at the end. However this would make
  accessing the 

Re: [PATCH v3 0/7] Base support for POWER10

2020-06-09 Thread Murilo Opsfelder Araújo
On Tue, Jun 09, 2020 at 03:28:31PM +1000, Michael Ellerman wrote:
> On Thu, 21 May 2020 11:43:34 +1000, Alistair Popple wrote:
> > This series brings together several previously posted patches required for
> > POWER10 support and introduces a new patch enabling POWER10 architected
> > mode to enable booting as a POWER10 pseries guest.
> >
> > It includes support for enabling facilities related to MMA and prefix
> > instructions.
> >
> > [...]
>
> Patches 1-3 and 5-7 applied to powerpc/next.
>
> [1/7] powerpc: Add new HWCAP bits
>   
> https://git.kernel.org/powerpc/c/ee988c11acf6f9464b7b44e9a091bf6afb3b3a49
> [2/7] powerpc: Add support for ISA v3.1
>   
> https://git.kernel.org/powerpc/c/3fd5836ee801ab9ac5b314c26550e209bafa5eaa
> [3/7] powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected
>   
> https://git.kernel.org/powerpc/c/43d0d37acbe40a9a93d9891ca670638cd22116b1

Just out of curiosity, why do we define ISA_V3_0B and ISA_V3_1 macros
and don't use them anywhere else in the code?

Can't they be used in cpufeatures_setup_start() instead of 3000 and
3100 literals?

> [5/7] powerpc/dt_cpu_ftrs: Enable Prefixed Instructions
>   
> https://git.kernel.org/powerpc/c/c63d688c3dabca973c5a7da73d17422ad13f3737
> [6/7] powerpc/dt_cpu_ftrs: Add MMA feature
>   
> https://git.kernel.org/powerpc/c/87939d50e5888bd78478d9aa9455f56b919df658
> [7/7] powerpc: Add POWER10 architected mode
>   
> https://git.kernel.org/powerpc/c/a3ea40d5c7365e7e5c7c85b6f30b15142b397571
>
> cheers

--
Murilo


Re: [PATCH v12 5/6] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods

2020-06-09 Thread kernel test robot
Hi Vaibhav,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on powerpc/next]
[also build test WARNING on linus/master v5.7 next-20200608]
[cannot apply to linux-nvdimm/libnvdimm-for-next scottwood/next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Vaibhav-Jain/powerpc-papr_scm-Add-support-for-reporting-nvdimm-health/20200609-051451
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-randconfig-r031-20200608 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 
bc2b70982be8f5250cd0082a7190f8b417bd4dfe)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc cross compiling tool for clang build
# apt-get install binutils-powerpc-linux-gnu
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>, old ones prefixed by <<):

In file included from :1:
>> ./usr/include/asm/papr_pdsm.h:67:20: warning: field 'hdr' with variable 
>> sized type 'struct nd_cmd_pkg' not at the end of a struct or class is a GNU 
>> extension [-Wgnu-variable-sized-type-not-at-end]
struct nd_cmd_pkg hdr;  /* Package header containing sub-cmd */
^
1 warning generated.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: [musl] ppc64le and 32-bit LE userland compatibility

2020-06-09 Thread Rich Felker
On Tue, Jun 09, 2020 at 10:29:57AM +, Will Springer wrote:
> On Saturday, May 30, 2020 3:56:47 PM PDT you wrote:
> > On Friday, May 29, 2020 12:24:27 PM PDT Rich Felker wrote:
> > > The argument passing for pread/pwrite is historically a mess and
> > > differs between archs. musl has a dedicated macro that archs can
> > > define to override it. But it looks like it should match regardless of
> > > BE vs LE, and musl already defines it for powerpc with the default
> > > definition, adding a zero arg to start on an even arg-slot index,
> > > which is an odd register (since ppc32 args start with an odd one, r3).
> > > 
> > > > [6]:
> > > > https://gist.github.com/Skirmisher/02891c1a8cafa0ff18b2460933ef4f3c
> > > 
> > > I don't think this is correct, but I'm confused about where it's
> > > getting messed up because it looks like it should already be right.
> > 
> > Hmm, interesting. Will have to go back to it I guess...
> > 
> > > > This was enough to fix up the `file` bug. I'm no seasoned kernel
> > > > hacker, though, and there is still concern over the right way to
> > > > approach this, whether it should live in the kernel or libc, etc.
> > > > Frankly, I don't know the ABI structure enough to understand why the
> > > > register padding has to be different in this case, or what
> > > > lower-level component is responsible for it.. For comparison, I had
> > > > a
> > > > look at the mips tree, since it's bi-endian and has a similar 32/64
> > > > situation. There is a macro conditional upon endianness that is
> > > > responsible for munging long longs; it uses __MIPSEB__ and
> > > > __MIPSEL__
> > > > instead of an if/else on the generic __LITTLE_ENDIAN__. Not sure
> > > > what
> > > > to make of that. (It also simply swaps registers for LE, unlike what
> > > > I did for ppc.)
> > > 
> > > Indeed the problem is probably that you need to swap registers for LE,
> > > not remove the padding slot. Did you check what happens if you pass a
> > > value larger than 32 bits?
> > > 
> > > If so, the right way to fix this on the kernel side would be to
> > > construct the value as a union rather than by bitwise ops so it's
> > > 
> > > endian-agnostic:
> > >   (union { u32 parts[2]; u64 val; }){{ arg1, arg2 }}.val
> > > 
> > > But the kernel folks might prefer endian ifdefs for some odd reason...
> > 
> > You are right, this does seem odd considering what the other archs do.
> > It's quite possible I made a silly mistake, of course...
> > 
> > I haven't tested with values outside the 32-bit range yet; again, this
> > is new territory for me, so I haven't exactly done exhaustive tests on
> > everything. I'll give it a closer look.
> 
> I took some cues from the mips linux32 syscall setup, and drafted a new 
> patch defining a macro to compose the hi/lo parts within the function, 
> instead of swapping the args at the function definition. `file /bin/bash` 
> and `truncate -s 5G test` both work correctly now. This appears to be the 
> correct solution, so I'm not sure what silly mistake I made before, but 
> apologies for the confusion. I've updated my gist with the new patch [1].
> [...]
> 
> [1]: https://gist.github.com/Skirmisher/02891c1a8cafa0ff18b2460933ef4f3c

This patch looks correct. I prefer the union approach with no #ifdef
but I'm fine with either.
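
To make the two options concrete, a sketch of kernel-side helpers for merging
the two 32-bit register halves of a 64-bit compat syscall argument (names are
illustrative, not the actual patch):

/* Option 1: endian-agnostic union.  The two registers carry the halves in
 * the same order a u64 is laid out in memory, so filling parts[] in register
 * order recovers the value on both BE and LE. */
#define MERGE_64(reg_a, reg_b) \
	(((union { u32 parts[2]; u64 val; }){ .parts = { (reg_a), (reg_b) } }).val)

/* Option 2: explicit endian #ifdef, mips-style. */
#ifdef __LITTLE_ENDIAN__
#define MERGE_64_ALT(reg_a, reg_b)	(((u64)(reg_b) << 32) | (u32)(reg_a))
#else
#define MERGE_64_ALT(reg_a, reg_b)	(((u64)(reg_a) << 32) | (u32)(reg_b))
#endif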

Rich


Re: [PATCH] mm: Move p?d_alloc_track to separate header file

2020-06-09 Thread Stephen Rothwell
Hi Christophe,

On Tue, 9 Jun 2020 17:24:14 +0200 Christophe Leroy 
 wrote:
>
> Le 09/06/2020 à 14:05, Joerg Roedel a écrit :
> > From: Joerg Roedel 
> > 
> > The functions are only used in two source files, so there is no need
> > for them to be in the global  header. Move them to the new
> >  header and include it only where needed.  
> 
> Do you mean we will now create a new header file for any new couple on 
> functions based on where they are used ?
> 
> Can you explain why this change is needed or is a plus ?

Well, at a minimum, it means 45 fewer lines to be parsed every time
linux/mm.h is included (at last count, in 1996 places, some of which are
include files included by other files).  And, as someone who does a lot
of builds every day, I am in favour of that :-)

-- 
Cheers,
Stephen Rothwell




Re: ipr crashes due to NULL dma_need_drain since cc97923a5bcc ("block: move dma drain handling to scsi")

2020-06-09 Thread Christoph Hellwig
Can you try this patch?

---
>From 1c9913360a0494375c5655b133899cb4323bceb4 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Tue, 9 Jun 2020 14:07:31 +0200
Subject: scsi: wire up ata_scsi_dma_need_drain for SAS HBA drivers

We need ata_scsi_dma_need_drain for all drivers wired up to drive ATAPI
devices through libata.  That also includes the SAS HBA drivers in
addition to native libata HBA drivers.

Fixes: cc97923a5bcc ("block: move dma drain handling to scsi")
Reported-by: Michael Ellerman 
Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/aic94xx/aic94xx_init.c| 1 +
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 1 +
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 1 +
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 1 +
 drivers/scsi/ipr.c | 1 +
 drivers/scsi/isci/init.c   | 1 +
 drivers/scsi/mvsas/mv_init.c   | 1 +
 drivers/scsi/pm8001/pm8001_init.c  | 1 +
 8 files changed, 8 insertions(+)

diff --git a/drivers/scsi/aic94xx/aic94xx_init.c 
b/drivers/scsi/aic94xx/aic94xx_init.c
index d022407e5645c7..bef47f38dd0dbc 100644
--- a/drivers/scsi/aic94xx/aic94xx_init.c
+++ b/drivers/scsi/aic94xx/aic94xx_init.c
@@ -40,6 +40,7 @@ static struct scsi_host_template aic94xx_sht = {
/* .name is initialized */
.name   = "aic94xx",
.queuecommand   = sas_queuecommand,
+   .dma_need_drain = ata_scsi_dma_need_drain,
.target_alloc   = sas_target_alloc,
.slave_configure= sas_slave_configure,
.scan_finished  = asd_scan_finished,
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
index 2e1718f9ade218..09a7669dad4c67 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
@@ -1756,6 +1756,7 @@ static struct scsi_host_template sht_v1_hw = {
.proc_name  = DRV_NAME,
.module = THIS_MODULE,
.queuecommand   = sas_queuecommand,
+   .dma_need_drain = ata_scsi_dma_need_drain,
.target_alloc   = sas_target_alloc,
.slave_configure= hisi_sas_slave_configure,
.scan_finished  = hisi_sas_scan_finished,
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index e7e7849a4c14e2..968d3870235359 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -3532,6 +3532,7 @@ static struct scsi_host_template sht_v2_hw = {
.proc_name  = DRV_NAME,
.module = THIS_MODULE,
.queuecommand   = sas_queuecommand,
+   .dma_need_drain = ata_scsi_dma_need_drain,
.target_alloc   = sas_target_alloc,
.slave_configure= hisi_sas_slave_configure,
.scan_finished  = hisi_sas_scan_finished,
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 3e6b78a1f993b9..55e2321a65bc5f 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -3075,6 +3075,7 @@ static struct scsi_host_template sht_v3_hw = {
.proc_name  = DRV_NAME,
.module = THIS_MODULE,
.queuecommand   = sas_queuecommand,
+   .dma_need_drain = ata_scsi_dma_need_drain,
.target_alloc   = sas_target_alloc,
.slave_configure= hisi_sas_slave_configure,
.scan_finished  = hisi_sas_scan_finished,
diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 7d77997d26d457..7d86f4ca266c86 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -6731,6 +6731,7 @@ static struct scsi_host_template driver_template = {
.compat_ioctl = ipr_ioctl,
 #endif
.queuecommand = ipr_queuecommand,
+   .dma_need_drain = ata_scsi_dma_need_drain,
.eh_abort_handler = ipr_eh_abort,
.eh_device_reset_handler = ipr_eh_dev_reset,
.eh_host_reset_handler = ipr_eh_host_reset,
diff --git a/drivers/scsi/isci/init.c b/drivers/scsi/isci/init.c
index 974c3b9116d5ba..085e285f427d93 100644
--- a/drivers/scsi/isci/init.c
+++ b/drivers/scsi/isci/init.c
@@ -153,6 +153,7 @@ static struct scsi_host_template isci_sht = {
.name   = DRV_NAME,
.proc_name  = DRV_NAME,
.queuecommand   = sas_queuecommand,
+   .dma_need_drain = ata_scsi_dma_need_drain,
.target_alloc   = sas_target_alloc,
.slave_configure= sas_slave_configure,
.scan_finished  = isci_host_scan_finished,
diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
index 5973eed9493820..b0de3bdb01db06 100644
--- a/drivers/scsi/mvsas/mv_init.c
+++ b/drivers/scsi/mvsas/mv_init.c
@@ -33,6 +33,7 @@ static struct scsi_host_template mvs_sht = {

Re: [PATCH] mm: Move p?d_alloc_track to separate header file

2020-06-09 Thread Christophe Leroy




On 09/06/2020 at 14:05, Joerg Roedel wrote:

From: Joerg Roedel 

The functions are only used in two source files, so there is no need
for them to be in the global  header. Move them to the new
 header and include it only where needed.


Do you mean we will now create a new header file for any new couple of
functions based on where they are used?


Can you explain why this change is needed or is a plus?

Christophe



Signed-off-by: Joerg Roedel 
---
  include/linux/mm.h| 45 ---
  include/linux/pgalloc-track.h | 51 +++
  lib/ioremap.c |  1 +
  mm/vmalloc.c  |  1 +
  4 files changed, 53 insertions(+), 45 deletions(-)
  create mode 100644 include/linux/pgalloc-track.h

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9d6042178ca7..22d8b2a2c9bc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2092,51 +2092,11 @@ static inline pud_t *pud_alloc(struct mm_struct *mm, 
p4d_t *p4d,
NULL : pud_offset(p4d, address);
  }
  
-static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd,

-unsigned long address,
-pgtbl_mod_mask *mod_mask)
-
-{
-   if (unlikely(pgd_none(*pgd))) {
-   if (__p4d_alloc(mm, pgd, address))
-   return NULL;
-   *mod_mask |= PGTBL_PGD_MODIFIED;
-   }
-
-   return p4d_offset(pgd, address);
-}
-
-static inline pud_t *pud_alloc_track(struct mm_struct *mm, p4d_t *p4d,
-unsigned long address,
-pgtbl_mod_mask *mod_mask)
-{
-   if (unlikely(p4d_none(*p4d))) {
-   if (__pud_alloc(mm, p4d, address))
-   return NULL;
-   *mod_mask |= PGTBL_P4D_MODIFIED;
-   }
-
-   return pud_offset(p4d, address);
-}
-
  static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned 
long address)
  {
return (unlikely(pud_none(*pud)) && __pmd_alloc(mm, pud, address))?
NULL: pmd_offset(pud, address);
  }
-
-static inline pmd_t *pmd_alloc_track(struct mm_struct *mm, pud_t *pud,
-unsigned long address,
-pgtbl_mod_mask *mod_mask)
-{
-   if (unlikely(pud_none(*pud))) {
-   if (__pmd_alloc(mm, pud, address))
-   return NULL;
-   *mod_mask |= PGTBL_PUD_MODIFIED;
-   }
-
-   return pmd_offset(pud, address);
-}
  #endif /* CONFIG_MMU */
  
  #if USE_SPLIT_PTE_PTLOCKS

@@ -2252,11 +2212,6 @@ static inline void pgtable_pte_page_dtor(struct page 
*page)
((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd))? \
NULL: pte_offset_kernel(pmd, address))
  
-#define pte_alloc_kernel_track(pmd, address, mask)			\

-   ((unlikely(pmd_none(*(pmd))) && \
- (__pte_alloc_kernel(pmd) || ({*(mask)|=PGTBL_PMD_MODIFIED;0;})))?\
-   NULL: pte_offset_kernel(pmd, address))
-
  #if USE_SPLIT_PMD_PTLOCKS
  
  static struct page *pmd_to_page(pmd_t *pmd)

diff --git a/include/linux/pgalloc-track.h b/include/linux/pgalloc-track.h
new file mode 100644
index ..1dcc865029a2
--- /dev/null
+++ b/include/linux/pgalloc-track.h
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PGALLLC_TRACK_H
+#define _LINUX_PGALLLC_TRACK_H
+
+#if defined(CONFIG_MMU)
+static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd,
+unsigned long address,
+pgtbl_mod_mask *mod_mask)
+{
+   if (unlikely(pgd_none(*pgd))) {
+   if (__p4d_alloc(mm, pgd, address))
+   return NULL;
+   *mod_mask |= PGTBL_PGD_MODIFIED;
+   }
+
+   return p4d_offset(pgd, address);
+}
+
+static inline pud_t *pud_alloc_track(struct mm_struct *mm, p4d_t *p4d,
+unsigned long address,
+pgtbl_mod_mask *mod_mask)
+{
+   if (unlikely(p4d_none(*p4d))) {
+   if (__pud_alloc(mm, p4d, address))
+   return NULL;
+   *mod_mask |= PGTBL_P4D_MODIFIED;
+   }
+
+   return pud_offset(p4d, address);
+}
+
+static inline pmd_t *pmd_alloc_track(struct mm_struct *mm, pud_t *pud,
+unsigned long address,
+pgtbl_mod_mask *mod_mask)
+{
+   if (unlikely(pud_none(*pud))) {
+   if (__pmd_alloc(mm, pud, address))
+   return NULL;
+   *mod_mask |= PGTBL_PUD_MODIFIED;
+   }
+
+   return pmd_offset(pud, address);
+}
+#endif /* CONFIG_MMU */
+
+#define pte_alloc_kernel_track(pmd, address, mask) \
+   ((unlikely(pmd_none(*(pmd))) && 

Re: [PATCH] mm: Move p?d_alloc_track to separate header file

2020-06-09 Thread Mike Rapoport
On Tue, Jun 09, 2020 at 02:05:33PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> The functions are only used in two source files, so there is no need
> for them to be in the global  header. Move them to the new
>  header and include it only where needed.
> 
> Signed-off-by: Joerg Roedel 

Acked-by: Mike Rapoport 

> ---
>  include/linux/mm.h| 45 ---
>  include/linux/pgalloc-track.h | 51 +++
>  lib/ioremap.c |  1 +
>  mm/vmalloc.c  |  1 +
>  4 files changed, 53 insertions(+), 45 deletions(-)
>  create mode 100644 include/linux/pgalloc-track.h
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 9d6042178ca7..22d8b2a2c9bc 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2092,51 +2092,11 @@ static inline pud_t *pud_alloc(struct mm_struct *mm, 
> p4d_t *p4d,
>   NULL : pud_offset(p4d, address);
>  }
>  
> -static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd,
> -  unsigned long address,
> -  pgtbl_mod_mask *mod_mask)
> -
> -{
> - if (unlikely(pgd_none(*pgd))) {
> - if (__p4d_alloc(mm, pgd, address))
> - return NULL;
> - *mod_mask |= PGTBL_PGD_MODIFIED;
> - }
> -
> - return p4d_offset(pgd, address);
> -}
> -
> -static inline pud_t *pud_alloc_track(struct mm_struct *mm, p4d_t *p4d,
> -  unsigned long address,
> -  pgtbl_mod_mask *mod_mask)
> -{
> - if (unlikely(p4d_none(*p4d))) {
> - if (__pud_alloc(mm, p4d, address))
> - return NULL;
> - *mod_mask |= PGTBL_P4D_MODIFIED;
> - }
> -
> - return pud_offset(p4d, address);
> -}
> -
>  static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned 
> long address)
>  {
>   return (unlikely(pud_none(*pud)) && __pmd_alloc(mm, pud, address))?
>   NULL: pmd_offset(pud, address);
>  }
> -
> -static inline pmd_t *pmd_alloc_track(struct mm_struct *mm, pud_t *pud,
> -  unsigned long address,
> -  pgtbl_mod_mask *mod_mask)
> -{
> - if (unlikely(pud_none(*pud))) {
> - if (__pmd_alloc(mm, pud, address))
> - return NULL;
> - *mod_mask |= PGTBL_PUD_MODIFIED;
> - }
> -
> - return pmd_offset(pud, address);
> -}
>  #endif /* CONFIG_MMU */
>  
>  #if USE_SPLIT_PTE_PTLOCKS
> @@ -2252,11 +2212,6 @@ static inline void pgtable_pte_page_dtor(struct page 
> *page)
>   ((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd))? \
>   NULL: pte_offset_kernel(pmd, address))
>  
> -#define pte_alloc_kernel_track(pmd, address, mask)   \
> - ((unlikely(pmd_none(*(pmd))) && \
> -   (__pte_alloc_kernel(pmd) || ({*(mask)|=PGTBL_PMD_MODIFIED;0;})))?\
> - NULL: pte_offset_kernel(pmd, address))
> -
>  #if USE_SPLIT_PMD_PTLOCKS
>  
>  static struct page *pmd_to_page(pmd_t *pmd)
> diff --git a/include/linux/pgalloc-track.h b/include/linux/pgalloc-track.h
> new file mode 100644
> index ..1dcc865029a2
> --- /dev/null
> +++ b/include/linux/pgalloc-track.h
> @@ -0,0 +1,51 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_PGALLLC_TRACK_H
> +#define _LINUX_PGALLLC_TRACK_H
> +
> +#if defined(CONFIG_MMU)
> +static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd,
> +  unsigned long address,
> +  pgtbl_mod_mask *mod_mask)
> +{
> + if (unlikely(pgd_none(*pgd))) {
> + if (__p4d_alloc(mm, pgd, address))
> + return NULL;
> + *mod_mask |= PGTBL_PGD_MODIFIED;
> + }
> +
> + return p4d_offset(pgd, address);
> +}
> +
> +static inline pud_t *pud_alloc_track(struct mm_struct *mm, p4d_t *p4d,
> +  unsigned long address,
> +  pgtbl_mod_mask *mod_mask)
> +{
> + if (unlikely(p4d_none(*p4d))) {
> + if (__pud_alloc(mm, p4d, address))
> + return NULL;
> + *mod_mask |= PGTBL_P4D_MODIFIED;
> + }
> +
> + return pud_offset(p4d, address);
> +}
> +
> +static inline pmd_t *pmd_alloc_track(struct mm_struct *mm, pud_t *pud,
> +  unsigned long address,
> +  pgtbl_mod_mask *mod_mask)
> +{
> + if (unlikely(pud_none(*pud))) {
> + if (__pmd_alloc(mm, pud, address))
> + return NULL;
> + *mod_mask |= PGTBL_PUD_MODIFIED;
> + }
> +
> + return pmd_offset(pud, address);
> +}
> +#endif /* CONFIG_MMU */
> +
> +#define pte_alloc_kernel_track(pmd, address, mask)   \
> + ((unlikely(pmd_none(*(pmd))) &&   

[PATCH 00/17] spelling.txt: /decriptors/descriptors/

2020-06-09 Thread Kieran Bingham
I wouldn't normally go through spelling fixes, but I caught sight of
this typo twice, and then foolishly grepped the tree for it, and saw how
pervasive it was.

So here I am ... fixing a typo globally... but with an addition in
scripts/spelling.txt so it shouldn't re-appear ;-)
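
For reference, entries in scripts/spelling.txt are one "misspelling||correction"
pair per line, so the addition in the final patch is presumably just the single
line below:

decriptors||descriptors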

Cc: linux-arm-ker...@lists.infradead.org (moderated list:TI DAVINCI MACHINE 
SUPPORT)
Cc: linux-ker...@vger.kernel.org (open list)
Cc: linux...@vger.kernel.org (open list:DEVICE FREQUENCY EVENT (DEVFREQ-EVENT))
Cc: linux-g...@vger.kernel.org (open list:GPIO SUBSYSTEM)
Cc: dri-de...@lists.freedesktop.org (open list:DRM DRIVERS)
Cc: linux-r...@vger.kernel.org (open list:HFI1 DRIVER)
Cc: linux-in...@vger.kernel.org (open list:INPUT (KEYBOARD, MOUSE, JOYSTICK, 
TOUCHSCREEN)...)
Cc: linux-...@lists.infradead.org (open list:NAND FLASH SUBSYSTEM)
Cc: net...@vger.kernel.org (open list:NETWORKING DRIVERS)
Cc: ath...@lists.infradead.org (open list:QUALCOMM ATHEROS ATH10K WIRELESS 
DRIVER)
Cc: linux-wirel...@vger.kernel.org (open list:NETWORKING DRIVERS (WIRELESS))
Cc: linux-s...@vger.kernel.org (open list:IBM Power Virtual FC Device Drivers)
Cc: linuxppc-dev@lists.ozlabs.org (open list:LINUX FOR POWERPC (32-BIT AND 
64-BIT))
Cc: linux-...@vger.kernel.org (open list:USB SUBSYSTEM)
Cc: virtualizat...@lists.linux-foundation.org (open list:VIRTIO CORE AND NET 
DRIVERS)
Cc: linux...@kvack.org (open list:MEMORY MANAGEMENT)


Kieran Bingham (17):
  arch: arm: mach-davinci: Fix trivial spelling
  drivers: infiniband: Fix trivial spelling
  drivers: gpio: Fix trivial spelling
  drivers: mtd: nand: raw: Fix trivial spelling
  drivers: net: Fix trivial spelling
  drivers: scsi: Fix trivial spelling
  drivers: usb: Fix trivial spelling
  drivers: gpu: drm: Fix trivial spelling
  drivers: regulator: Fix trivial spelling
  drivers: input: joystick: Fix trivial spelling
  drivers: infiniband: Fix trivial spelling
  drivers: devfreq: Fix trivial spelling
  include: dynamic_debug.h: Fix trivial spelling
  kernel: trace: Fix trivial spelling
  mm: Fix trivial spelling
  regulator: gpio: Fix trivial spelling
  scripts/spelling.txt: Add descriptors correction

 arch/arm/mach-davinci/board-da830-evm.c  | 2 +-
 drivers/devfreq/devfreq-event.c  | 4 ++--
 drivers/gpio/TODO| 2 +-
 drivers/gpu/drm/drm_dp_helper.c  | 2 +-
 drivers/infiniband/hw/hfi1/iowait.h  | 2 +-
 drivers/infiniband/hw/hfi1/ipoib_tx.c| 2 +-
 drivers/infiniband/hw/hfi1/verbs_txreq.h | 2 +-
 drivers/input/joystick/spaceball.c   | 2 +-
 drivers/mtd/nand/raw/mxc_nand.c  | 2 +-
 drivers/mtd/nand/raw/nand_bbt.c  | 2 +-
 drivers/net/wan/lmc/lmc_main.c   | 2 +-
 drivers/net/wireless/ath/ath10k/usb.c| 2 +-
 drivers/net/wireless/ath/ath6kl/usb.c| 2 +-
 drivers/net/wireless/cisco/airo.c| 2 +-
 drivers/regulator/fixed.c| 2 +-
 drivers/regulator/gpio-regulator.c   | 2 +-
 drivers/scsi/ibmvscsi/ibmvfc.c   | 2 +-
 drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
 drivers/scsi/qla2xxx/qla_inline.h| 2 +-
 drivers/scsi/qla2xxx/qla_iocb.c  | 6 +++---
 drivers/usb/core/of.c| 2 +-
 include/drm/drm_dp_helper.h  | 2 +-
 include/linux/dynamic_debug.h| 2 +-
 kernel/trace/trace_events.c  | 2 +-
 mm/balloon_compaction.c  | 4 ++--
 scripts/spelling.txt | 1 +
 26 files changed, 30 insertions(+), 29 deletions(-)

-- 
2.25.1



[PATCH 06/17] drivers: scsi: Fix trivial spelling

2020-06-09 Thread Kieran Bingham
The word 'descriptor' is misspelled throughout the tree.

Fix it up accordingly:
decriptors -> descriptors

Signed-off-by: Kieran Bingham 
---
 drivers/scsi/ibmvscsi/ibmvfc.c| 2 +-
 drivers/scsi/ibmvscsi/ibmvscsi.c  | 2 +-
 drivers/scsi/qla2xxx/qla_inline.h | 2 +-
 drivers/scsi/qla2xxx/qla_iocb.c   | 6 +++---
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 635f6f9cffc4..77f4d37d5bd6 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -1344,7 +1344,7 @@ static void ibmvfc_map_sg_list(struct scsi_cmnd *scmd, 
int nseg,
 }
 
 /**
- * ibmvfc_map_sg_data - Maps dma for a scatterlist and initializes decriptor 
fields
+ * ibmvfc_map_sg_data - Maps dma for a scatterlist and initializes descriptor 
fields
  * @scmd:  struct scsi_cmnd with the scatterlist
  * @evt:   ibmvfc event struct
  * @vfc_cmd:   vfc_cmd that contains the memory descriptor
diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 44e64aa21194..a92587624c72 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -667,7 +667,7 @@ static int map_sg_list(struct scsi_cmnd *cmd, int nseg,
 }
 
 /**
- * map_sg_data: - Maps dma for a scatterlist and initializes decriptor fields
+ * map_sg_data: - Maps dma for a scatterlist and initializes descriptor fields
  * @cmd:   struct scsi_cmnd with the scatterlist
  * @srp_cmd:   srp_cmd that contains the memory descriptor
  * @dev:   device for which to map dma memory
diff --git a/drivers/scsi/qla2xxx/qla_inline.h 
b/drivers/scsi/qla2xxx/qla_inline.h
index 1fb6ccac07cc..861dc522723c 100644
--- a/drivers/scsi/qla2xxx/qla_inline.h
+++ b/drivers/scsi/qla2xxx/qla_inline.h
@@ -11,7 +11,7 @@
  * Continuation Type 1 IOCBs to allocate.
  *
  * @vha: HA context
- * @dsds: number of data segment decriptors needed
+ * @dsds: number of data segment descriptors needed
  *
  * Returns the number of IOCB entries needed to store @dsds.
  */
diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
index 8865c35d3421..1d3c58c5f0e2 100644
--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -44,7 +44,7 @@ qla2x00_get_cmd_direction(srb_t *sp)
  * qla2x00_calc_iocbs_32() - Determine number of Command Type 2 and
  * Continuation Type 0 IOCBs to allocate.
  *
- * @dsds: number of data segment decriptors needed
+ * @dsds: number of data segment descriptors needed
  *
  * Returns the number of IOCB entries needed to store @dsds.
  */
@@ -66,7 +66,7 @@ qla2x00_calc_iocbs_32(uint16_t dsds)
  * qla2x00_calc_iocbs_64() - Determine number of Command Type 3 and
  * Continuation Type 1 IOCBs to allocate.
  *
- * @dsds: number of data segment decriptors needed
+ * @dsds: number of data segment descriptors needed
  *
  * Returns the number of IOCB entries needed to store @dsds.
  */
@@ -669,7 +669,7 @@ qla24xx_build_scsi_type_6_iocbs(srb_t *sp, struct 
cmd_type_6 *cmd_pkt,
  * qla24xx_calc_dsd_lists() - Determine number of DSD list required
  * for Command Type 6.
  *
- * @dsds: number of data segment decriptors needed
+ * @dsds: number of data segment descriptors needed
  *
  * Returns the number of dsd list needed to store @dsds.
  */
-- 
2.25.1



Re: Add a new fchmodat4() syscall, v2

2020-06-09 Thread Florian Weimer
* Palmer Dabbelt:

> This patch set adds fchmodat4(), a new syscall. The actual
> implementation is super simple: essentially it's just the same as
> fchmodat(), but LOOKUP_FOLLOW is conditionally set based on the flags.
> I've attempted to make this match "man 2 fchmodat" as closely as
> possible, which says EINVAL is returned for invalid flags (as opposed to
> ENOTSUPP, which is currently returned by glibc for AT_SYMLINK_NOFOLLOW).
> I have a sketch of a glibc patch that I haven't even compiled yet, but
> seems fairly straight-forward:

What's the status here?  We'd really like to see this system call in the
kernel because our emulation in glibc has its problems (especially if
/proc is not mounted).
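
(For context, the emulation being referred to is roughly the O_PATH plus
/proc/self/fd trick sketched below. This is a simplified illustration, not
glibc's actual code, but it shows why the fallback needs /proc mounted.)

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Rough sketch of the AT_SYMLINK_NOFOLLOW emulation: take an O_PATH
 * descriptor without following the symlink, then chmod() it through its
 * /proc/self/fd alias.  Without /proc, the chmod() fails, which is what a
 * real fchmodat4() with native flag support would avoid. */
static int fchmodat_nofollow_emulated(int dirfd, const char *path, mode_t mode)
{
	char proc[64];
	int fd, ret;

	fd = openat(dirfd, path, O_PATH | O_NOFOLLOW | O_CLOEXEC);
	if (fd < 0)
		return -1;
	snprintf(proc, sizeof(proc), "/proc/self/fd/%d", fd);
	ret = chmod(proc, mode);
	close(fd);
	return ret;
}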

Thanks,
Florian



Re: [RFC PATCH] ASoC: fsl_asrc_dma: Fix warning "Cannot create DMA dma:tx symlink"

2020-06-09 Thread Mark Brown
On Mon, Jun 08, 2020 at 03:07:00PM +0800, Shengjiu Wang wrote:
> The issue log is:
> 
> [   48.021506] CPU: 0 PID: 664 Comm: aplay Not tainted 
> 5.7.0-rc1-13120-g12b434cbbea0 #343
> [   48.031063] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [   48.037638] [] (unwind_backtrace) from [] 
> (show_stack+0x10/0x14)
> [   48.045413] [] (show_stack) from [] 
> (dump_stack+0xe4/0x118)

Please think hard before including complete backtraces in upstream
reports, they are very large and contain almost no useful information
relative to their size so often obscure the relevant content in your
message. If part of the backtrace is usefully illustrative (it often is
for search engines if nothing else) then it's usually better to pull out
the relevant sections.

> ---
>  include/sound/dmaengine_pcm.h | 11 ++
>  include/sound/soc.h   |  2 ++
>  sound/soc/fsl/fsl_asrc_common.h   |  2 ++
>  sound/soc/fsl/fsl_asrc_dma.c  | 49 +--
>  sound/soc/soc-core.c  |  3 +-
>  sound/soc/soc-generic-dmaengine-pcm.c | 12 ---
>  6 files changed, 55 insertions(+), 24 deletions(-)

Please split the core changes you are adding from the driver changes
that use them.

The change does look reasonable for the issue, it's not ideal but I'm
not sure it's avoidable with DPCM.




Re: [PATCH] powerpc/pseries/svm: Remove unwanted check for shared_lppaca_size

2020-06-09 Thread Laurent Dufour

On 09/06/2020 at 12:57, Satheesh Rajendran wrote:

Early secure guest boot hits the below crash while booting with
vcpus numbers aligned with page boundary for PAGE size of 64k
and LPPACA size of 1k i.e 64, 128 etc, due to the BUG_ON assert
for shared_lppaca_total_size equal to shared_lppaca_size,

  [0.00] Partition configured for 64 cpus.
  [0.00] CPU maps initialized for 1 thread per core
  [0.00] [ cut here ]
  [0.00] kernel BUG at arch/powerpc/kernel/paca.c:89!
  [0.00] Oops: Exception in kernel mode, sig: 5 [#1]
  [0.00] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries

which is not necessary, let's remove it.



Reviewed-by: Laurent Dufour 


Cc: linux-ker...@vger.kernel.org
Cc: Thiago Jung Bauermann 
Cc: Ram Pai 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Signed-off-by: Satheesh Rajendran 
---
  arch/powerpc/kernel/paca.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 949eceb25..10b7c54a7 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -86,7 +86,7 @@ static void *__init alloc_shared_lppaca(unsigned long size, 
unsigned long align,
 * This is very early in boot, so no harm done if the kernel crashes at
 * this point.
 */
-   BUG_ON(shared_lppaca_size >= shared_lppaca_total_size);
+   BUG_ON(shared_lppaca_size > shared_lppaca_total_size);
  
  	return ptr;

  }





[PATCH] mm: Move p?d_alloc_track to separate header file

2020-06-09 Thread Joerg Roedel
From: Joerg Roedel 

The functions are only used in two source files, so there is no need
for them to be in the global  header. Move them to the new
 header and include it only where needed.

Signed-off-by: Joerg Roedel 
---
 include/linux/mm.h| 45 ---
 include/linux/pgalloc-track.h | 51 +++
 lib/ioremap.c |  1 +
 mm/vmalloc.c  |  1 +
 4 files changed, 53 insertions(+), 45 deletions(-)
 create mode 100644 include/linux/pgalloc-track.h

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9d6042178ca7..22d8b2a2c9bc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2092,51 +2092,11 @@ static inline pud_t *pud_alloc(struct mm_struct *mm, 
p4d_t *p4d,
NULL : pud_offset(p4d, address);
 }
 
-static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd,
-unsigned long address,
-pgtbl_mod_mask *mod_mask)
-
-{
-   if (unlikely(pgd_none(*pgd))) {
-   if (__p4d_alloc(mm, pgd, address))
-   return NULL;
-   *mod_mask |= PGTBL_PGD_MODIFIED;
-   }
-
-   return p4d_offset(pgd, address);
-}
-
-static inline pud_t *pud_alloc_track(struct mm_struct *mm, p4d_t *p4d,
-unsigned long address,
-pgtbl_mod_mask *mod_mask)
-{
-   if (unlikely(p4d_none(*p4d))) {
-   if (__pud_alloc(mm, p4d, address))
-   return NULL;
-   *mod_mask |= PGTBL_P4D_MODIFIED;
-   }
-
-   return pud_offset(p4d, address);
-}
-
 static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long 
address)
 {
return (unlikely(pud_none(*pud)) && __pmd_alloc(mm, pud, address))?
NULL: pmd_offset(pud, address);
 }
-
-static inline pmd_t *pmd_alloc_track(struct mm_struct *mm, pud_t *pud,
-unsigned long address,
-pgtbl_mod_mask *mod_mask)
-{
-   if (unlikely(pud_none(*pud))) {
-   if (__pmd_alloc(mm, pud, address))
-   return NULL;
-   *mod_mask |= PGTBL_PUD_MODIFIED;
-   }
-
-   return pmd_offset(pud, address);
-}
 #endif /* CONFIG_MMU */
 
 #if USE_SPLIT_PTE_PTLOCKS
@@ -2252,11 +2212,6 @@ static inline void pgtable_pte_page_dtor(struct page 
*page)
((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd))? \
NULL: pte_offset_kernel(pmd, address))
 
-#define pte_alloc_kernel_track(pmd, address, mask) \
-   ((unlikely(pmd_none(*(pmd))) && \
- (__pte_alloc_kernel(pmd) || ({*(mask)|=PGTBL_PMD_MODIFIED;0;})))?\
-   NULL: pte_offset_kernel(pmd, address))
-
 #if USE_SPLIT_PMD_PTLOCKS
 
 static struct page *pmd_to_page(pmd_t *pmd)
diff --git a/include/linux/pgalloc-track.h b/include/linux/pgalloc-track.h
new file mode 100644
index ..1dcc865029a2
--- /dev/null
+++ b/include/linux/pgalloc-track.h
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PGALLLC_TRACK_H
+#define _LINUX_PGALLLC_TRACK_H
+
+#if defined(CONFIG_MMU)
+static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd,
+unsigned long address,
+pgtbl_mod_mask *mod_mask)
+{
+   if (unlikely(pgd_none(*pgd))) {
+   if (__p4d_alloc(mm, pgd, address))
+   return NULL;
+   *mod_mask |= PGTBL_PGD_MODIFIED;
+   }
+
+   return p4d_offset(pgd, address);
+}
+
+static inline pud_t *pud_alloc_track(struct mm_struct *mm, p4d_t *p4d,
+unsigned long address,
+pgtbl_mod_mask *mod_mask)
+{
+   if (unlikely(p4d_none(*p4d))) {
+   if (__pud_alloc(mm, p4d, address))
+   return NULL;
+   *mod_mask |= PGTBL_P4D_MODIFIED;
+   }
+
+   return pud_offset(p4d, address);
+}
+
+static inline pmd_t *pmd_alloc_track(struct mm_struct *mm, pud_t *pud,
+unsigned long address,
+pgtbl_mod_mask *mod_mask)
+{
+   if (unlikely(pud_none(*pud))) {
+   if (__pmd_alloc(mm, pud, address))
+   return NULL;
+   *mod_mask |= PGTBL_PUD_MODIFIED;
+   }
+
+   return pmd_offset(pud, address);
+}
+#endif /* CONFIG_MMU */
+
+#define pte_alloc_kernel_track(pmd, address, mask) \
+   ((unlikely(pmd_none(*(pmd))) && \
+ (__pte_alloc_kernel(pmd) || ({*(mask)|=PGTBL_PMD_MODIFIED;0;})))?\
+   NULL: pte_offset_kernel(pmd, address))
+
+#endif /* _LINUX_PGALLLC_TRACK_H */
diff --git a/lib/ioremap.c b/lib/ioremap.c
index ad485f081

Re: ipr crashes due to NULL dma_need_drain since cc97923a5bcc ("block: move dma drain handling to scsi")

2020-06-09 Thread Christoph Hellwig
On Tue, Jun 09, 2020 at 08:00:35PM +1000, Michael Ellerman wrote:
> Hi all,
> 
> I'm seeing crashes on powerpc with the ipr driver, which I'm fairly sure
> are due to dma_need_drain being NULL.

Ooops, my changes completely forgot about SAS attached ATAPI devices.
I'll cook up a fix in a bit.


[PATCH] powerpc/pseries/svm: Fixup align argument in alloc_shared_lppaca() function

2020-06-09 Thread Satheesh Rajendran
The "align" argument of the alloc_shared_lppaca() function was unused inside
the function. Let's fix that and update the code comment.

Cc: linux-ker...@vger.kernel.org
Cc: Thiago Jung Bauermann 
Cc: Ram Pai 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Signed-off-by: Satheesh Rajendran 
---
 arch/powerpc/kernel/paca.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 8d96169c597e..9088e107fb43 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -70,7 +70,7 @@ static void *__init alloc_shared_lppaca(unsigned long size, 
unsigned long align,
 
shared_lppaca =
memblock_alloc_try_nid(shared_lppaca_total_size,
-  PAGE_SIZE, MEMBLOCK_LOW_LIMIT,
+  align, MEMBLOCK_LOW_LIMIT,
   limit, NUMA_NO_NODE);
if (!shared_lppaca)
panic("cannot allocate shared data");
@@ -122,7 +122,14 @@ static struct lppaca * __init new_lppaca(int cpu, unsigned 
long limit)
return NULL;
 
if (is_secure_guest())
-   lp = alloc_shared_lppaca(LPPACA_SIZE, 0x400, limit, cpu);
+   /*
+* See Documentation/powerpc/ultravisor.rst for more details
+*
+* UV/HV data sharing is done at PAGE granularity. In order to minimize
+* the number of pages shared and maximize the use of a page,
+* let's use page alignment.
+*/
+   lp = alloc_shared_lppaca(LPPACA_SIZE, PAGE_SIZE, limit, cpu);
else
lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu);
 
-- 
2.26.2



[PATCH] powerpc/pseries/svm: Remove unwanted check for shared_lppaca_size

2020-06-09 Thread Satheesh Rajendran
Early secure guest boot hits the below crash when the number of vcpus
is aligned to the page boundary for a PAGE size of 64k and an LPPACA
size of 1k, i.e. 64, 128, etc., due to the BUG_ON assert that fires
when shared_lppaca_size becomes equal to shared_lppaca_total_size:

 [0.00] Partition configured for 64 cpus.
 [0.00] CPU maps initialized for 1 thread per core
 [0.00] [ cut here ]
 [0.00] kernel BUG at arch/powerpc/kernel/paca.c:89!
 [0.00] Oops: Exception in kernel mode, sig: 5 [#1]
 [0.00] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries

The equality case is legitimate: with 64 vcpus, for instance, 64 x 1k of
LPPACA data fills exactly one 64k page, so shared_lppaca_size ends up equal
to shared_lppaca_total_size without overflowing the allocation. Let's relax
the check accordingly.

Cc: linux-ker...@vger.kernel.org
Cc: Thiago Jung Bauermann 
Cc: Ram Pai 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Signed-off-by: Satheesh Rajendran 
---
 arch/powerpc/kernel/paca.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 949eceb25..10b7c54a7 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -86,7 +86,7 @@ static void *__init alloc_shared_lppaca(unsigned long size, 
unsigned long align,
 * This is very early in boot, so no harm done if the kernel crashes at
 * this point.
 */
-   BUG_ON(shared_lppaca_size >= shared_lppaca_total_size);
+   BUG_ON(shared_lppaca_size > shared_lppaca_total_size);
 
return ptr;
 }
-- 
2.26.2



Re: [musl] ppc64le and 32-bit LE userland compatibility

2020-06-09 Thread Will Springer
On Saturday, May 30, 2020 3:56:47 PM PDT you wrote:
> On Friday, May 29, 2020 12:24:27 PM PDT Rich Felker wrote:
> > The argument passing for pread/pwrite is historically a mess and
> > differs between archs. musl has a dedicated macro that archs can
> > define to override it. But it looks like it should match regardless of
> > BE vs LE, and musl already defines it for powerpc with the default
> > definition, adding a zero arg to start on an even arg-slot index,
> > which is an odd register (since ppc32 args start with an odd one, r3).
> > 
> > > [6]:
> > > https://gist.github.com/Skirmisher/02891c1a8cafa0ff18b2460933ef4f3c
> > 
> > I don't think this is correct, but I'm confused about where it's
> > getting messed up because it looks like it should already be right.
> 
> Hmm, interesting. Will have to go back to it I guess...
> 
> > > This was enough to fix up the `file` bug. I'm no seasoned kernel
> > > hacker, though, and there is still concern over the right way to
> > > approach this, whether it should live in the kernel or libc, etc.
> > > Frankly, I don't know the ABI structure enough to understand why the
> > > register padding has to be different in this case, or what
> > > lower-level component is responsible for it.. For comparison, I had
> > > a
> > > look at the mips tree, since it's bi-endian and has a similar 32/64
> > > situation. There is a macro conditional upon endianness that is
> > > responsible for munging long longs; it uses __MIPSEB__ and
> > > __MIPSEL__
> > > instead of an if/else on the generic __LITTLE_ENDIAN__. Not sure
> > > what
> > > to make of that. (It also simply swaps registers for LE, unlike what
> > > I did for ppc.)
> > 
> > Indeed the problem is probably that you need to swap registers for LE,
> > not remove the padding slot. Did you check what happens if you pass a
> > value larger than 32 bits?
> > 
> > If so, the right way to fix this on the kernel side would be to
> > construct the value as a union rather than by bitwise ops so it's
> > 
> > endian-agnostic:
> > (union { u32 parts[2]; u64 val; }){{ arg1, arg2 }}.val
> > 
> > But the kernel folks might prefer endian ifdefs for some odd reason...
> 
> You are right, this does seem odd considering what the other archs do.
> It's quite possible I made a silly mistake, of course...
> 
> I haven't tested with values outside the 32-bit range yet; again, this
> is new territory for me, so I haven't exactly done exhaustive tests on
> everything. I'll give it a closer look.

I took some cues from the mips linux32 syscall setup, and drafted a new 
patch defining a macro to compose the hi/lo parts within the function, 
instead of swapping the args at the function definition. `file /bin/bash` 
and `truncate -s 5G test` both work correctly now. This appears to be the 
correct solution, so I'm not sure what silly mistake I made before, but 
apologies for the confusion. I've updated my gist with the new patch [1].
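
For the archives, the two approaches discussed above look roughly like the
sketch below. The names are illustrative only (the kernel side would key off
CONFIG_CPU_LITTLE_ENDIAN rather than the compiler macro), not the actual
macros used in the patch:

#include <stdint.h>

/* Endian-agnostic variant suggested earlier in the thread: let a union lay
 * out the two 32-bit argument slots exactly as userspace split the 64-bit
 * value, whatever the byte order. */
#define MERGE_PAIR(arg1, arg2) \
	(((union { uint32_t parts[2]; uint64_t val; }){ { (arg1), (arg2) } }).val)

/* Explicit variant in the style of the mips compat code: name the halves
 * per endianness and shift them together. */
#ifdef __LITTLE_ENDIAN__
#define MERGE_HILO(lo, hi) (((uint64_t)(uint32_t)(hi) << 32) | (uint32_t)(lo))
#else
#define MERGE_HILO(hi, lo) (((uint64_t)(uint32_t)(hi) << 32) | (uint32_t)(lo))
#endif

/* Either way, a compat pread64-style handler would pass its two 32-bit
 * argument registers, first then second, e.g. MERGE_PAIR(pos1, pos2), to
 * obtain the 64-bit file offset. */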

> > > Also worth noting is the one other outstanding bug, where the
> > > time-related syscalls in the 32-bit vDSO seem to return garbage. It
> > > doesn't look like an endian bug to me, and it doesn't affect
> > > standard
> > > syscalls (which is why if you run `date` on musl it prints the
> > > correct time, unlike on glibc). The vDSO time functions are
> > > implemented in ppc asm (arch/powerpc/kernel/vdso32/ gettimeofday.S),
> > > and I've never touched the stuff, so if anyone has a clue I'm all
> > > ears.
> > 
> > Not sure about this. Worst-case, just leave it disabled until someone
> > finds a fix.
> 
> Apparently these asm implementations are being replaced by the generic C
> ones [1], so this may end up fixing itself.
> 
> Thanks,
> Will [she/her]
> 
> [1]:
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=173231

As I mentioned in Christophe's thread the other day, his patchset does 
solve the vdso32 issues, though it introduced problems in vdso64 in my 
testing. With that solved and the syscall situation established, I think 
the kernel state is stable enough to start looking at solidifying libc/
compiler stuff. I'll try to get a larger userland built in the near future 
to try to catch any remaining problems (before rebuilding it all when 
libc/ABI support becomes explicit).

Cheers,
Will [she/her]

[1]: https://gist.github.com/Skirmisher/02891c1a8cafa0ff18b2460933ef4f3c






Re: [PATCH v3] selftests: powerpc: Fix CPU affinity for child process

2020-06-09 Thread Satheesh Rajendran
On Tue, Jun 09, 2020 at 01:44:23PM +0530, Harish wrote:
> On systems with large number of cpus, test fails trying to set
> affinity by calling sched_setaffinity() with smaller size for
> affinity mask. This patch fixes it by making sure that the size of
> allocated affinity mask is dependent on the number of CPUs as
> reported by get_nprocs().
> 
> Fixes: 00b7ec5c9cf3 ("selftests/powerpc: Import Anton's context_switch2 
> benchmark")
> Reported-by: Shirisha Ganta 
> Signed-off-by: Sandipan Das 
> Signed-off-by: Harish 
> ---

Reviewed-by: Satheesh Rajendran 

> v2: 
> https://lore.kernel.org/linuxppc-dev/20200609034005.520137-1-har...@linux.ibm.com/
> 
> Changes from v2:
> - Interchanged size and ncpus as suggested by Satheesh
> - Revert the exit code as suggested by Satheesh
> - Added NULL check for the affinity mask as suggested by Kamalesh
> - Freed the affinity mask allocation after affinity is set
>   as suggested by Kamalesh
> - Changed "cpu set" to "affinity mask" in the commit message
> 
> ---
>  .../powerpc/benchmarks/context_switch.c   | 21 ++-
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/testing/selftests/powerpc/benchmarks/context_switch.c 
> b/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> index a2e8c9da7fa5..d50cc05df495 100644
> --- a/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> +++ b/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -104,8 +105,9 @@ static void start_thread_on(void *(*fn)(void *), void 
> *arg, unsigned long cpu)
> 
>  static void start_process_on(void *(*fn)(void *), void *arg, unsigned long 
> cpu)
>  {
> - int pid;
> - cpu_set_t cpuset;
> + int pid, ncpus;
> + cpu_set_t *cpuset;
> + size_t size;
> 
>   pid = fork();
>   if (pid == -1) {
> @@ -116,14 +118,23 @@ static void start_process_on(void *(*fn)(void *), void 
> *arg, unsigned long cpu)
>   if (pid)
>   return;
> 
> - CPU_ZERO(&cpuset);
> - CPU_SET(cpu, &cpuset);
> + ncpus = get_nprocs();
> + size = CPU_ALLOC_SIZE(ncpus);
> + cpuset = CPU_ALLOC(ncpus);
> + if (!cpuset) {
> + perror("malloc");
> + exit(1);
> + }
> + CPU_ZERO_S(size, cpuset);
> + CPU_SET_S(cpu, size, cpuset);
> 
> - if (sched_setaffinity(0, sizeof(cpuset), &cpuset)) {
> + if (sched_setaffinity(0, size, cpuset)) {
>   perror("sched_setaffinity");
> + CPU_FREE(cpuset);
>   exit(1);
>   }
> 
> + CPU_FREE(cpuset);
>   fn(arg);
> 
>   exit(0);
> -- 
> 2.24.1
> 


ipr crashes due to NULL dma_need_drain since cc97923a5bcc ("block: move dma drain handling to scsi")

2020-06-09 Thread Michael Ellerman
Hi all,

I'm seeing crashes on powerpc with the ipr driver, which I'm fairly sure
are due to dma_need_drain being NULL.

The backtrace is:

  scsi_init_io+0x1d8/0x350
  scsi_queue_rq+0x7a4/0xc30
  blk_mq_dispatch_rq_list+0x1b0/0x910
  blk_mq_sched_dispatch_requests+0x154/0x270
  __blk_mq_run_hw_queue+0xa0/0x160
  __blk_mq_delay_run_hw_queue+0x244/0x250
  blk_mq_sched_insert_request+0x13c/0x250
  blk_execute_rq_nowait+0x88/0xb0
  blk_execute_rq+0x5c/0xf0
  __scsi_execute+0x10c/0x270
  scsi_mode_sense+0x144/0x440
  sr_probe+0x2e8/0x810
  really_probe+0x12c/0x580
  driver_probe_device+0x88/0x170
  device_driver_attach+0x11c/0x130
  __driver_attach+0xac/0x190
  bus_for_each_dev+0xa8/0x130
  driver_attach+0x34/0x50
  bus_add_driver+0x170/0x2b0
  driver_register+0xb4/0x1c0
  scsi_register_driver+0x2c/0x40
  init_sr+0x4c/0x80
  do_one_initcall+0x60/0x2b0
  kernel_init_freeable+0x2e0/0x3a0
  kernel_init+0x2c/0x148
  ret_from_kernel_thread+0x5c/0x74

And looking at the disassembly I think it's coming from:

static inline bool scsi_cmd_needs_dma_drain(struct scsi_device *sdev,
struct request *rq)
{
return sdev->dma_drain_len && blk_rq_is_passthrough(rq) &&
   !op_is_write(req_op(rq)) &&
   sdev->host->hostt->dma_need_drain(rq);
  ^^
}

Bisect agrees:

# first bad commit: [cc97923a5bccc776851c242b61015faf288d5c22] block: move dma 
drain handling to scsi


And looking at ipr.c, it constructs its scsi_host_template manually,
without using any of the macros that end up calling __ATA_BASE_SHT,
which populates dma_need_drain.

The obvious fix below works (the system boots and seems to be operating
normally), but I don't know enough (anything) about SCSI to say whether it's
actually the correct fix.

cheers


diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 7d77997d26d4..7d86f4ca266c 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -6731,6 +6731,7 @@ static struct scsi_host_template driver_template = {
.compat_ioctl = ipr_ioctl,
 #endif
.queuecommand = ipr_queuecommand,
+   .dma_need_drain = ata_scsi_dma_need_drain,
.eh_abort_handler = ipr_eh_abort,
.eh_device_reset_handler = ipr_eh_dev_reset,
.eh_host_reset_handler = ipr_eh_host_reset,



Re: [PATCH v3] selftests: powerpc: Fix CPU affinity for child process

2020-06-09 Thread Kamalesh Babulal
On 6/9/20 1:44 PM, Harish wrote:
> On systems with large number of cpus, test fails trying to set
> affinity by calling sched_setaffinity() with smaller size for
> affinity mask. This patch fixes it by making sure that the size of
> allocated affinity mask is dependent on the number of CPUs as
> reported by get_nprocs().
> 
> Fixes: 00b7ec5c9cf3 ("selftests/powerpc: Import Anton's context_switch2 
> benchmark")
> Reported-by: Shirisha Ganta 
> Signed-off-by: Sandipan Das 
> Signed-off-by: Harish 

LGTM,

Reviewed-by: Kamalesh Babulal 

-- 
Kamalesh


[PATCH] ASoC: fsl_ssi: Fix bclk calculation for mono channel

2020-06-09 Thread Shengjiu Wang
For a mono channel, the SSI switches to normal mode. In normal
mode, the Word Length Control bits control the word length
divider in the clock generator; this differs from I2S master
mode, where the word length is fixed to 32 bits.

So refine the formula for the mono channel case, otherwise there
will be a sound issue for S24_LE.

Fixes: b0a7043d5c2c ("ASoC: fsl_ssi: Caculate bit clock rate using slot number 
and width")
Signed-off-by: Shengjiu Wang 
---
 sound/soc/fsl/fsl_ssi.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index bad89b0d129e..e347776590f7 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -695,6 +695,11 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream 
*substream,
/* Generate bit clock based on the slot number and slot width */
freq = slots * slot_width * params_rate(hw_params);
 
+   /* The slot_width is not fixed to 32 for normal mode */
+   if (params_channels(hw_params) == 1)
+   freq = (slots <= 1 ? 2 : slots) * params_width(hw_params) *
+  params_rate(hw_params);
+
/* Don't apply it to any non-baudclk circumstance */
if (IS_ERR(ssi->baudclk))
return -EINVAL;
-- 
2.21.0



[PATCH v3] selftests: powerpc: Fix CPU affinity for child process

2020-06-09 Thread Harish
On systems with a large number of CPUs, the test fails while trying to
set affinity by calling sched_setaffinity() with too small a size for
the affinity mask. This patch fixes it by making sure that the size of
the allocated affinity mask depends on the number of CPUs as reported
by get_nprocs().

Fixes: 00b7ec5c9cf3 ("selftests/powerpc: Import Anton's context_switch2 
benchmark")
Reported-by: Shirisha Ganta 
Signed-off-by: Sandipan Das 
Signed-off-by: Harish 
---
v2: 
https://lore.kernel.org/linuxppc-dev/20200609034005.520137-1-har...@linux.ibm.com/

Changes from v2:
- Interchanged size and ncpus as suggested by Satheesh
- Revert the exit code as suggested by Satheesh
- Added NULL check for the affinity mask as suggested by Kamalesh
- Freed the affinity mask allocation after affinity is set
  as suggested by Kamalesh
- Changed "cpu set" to "affinity mask" in the commit message

---
 .../powerpc/benchmarks/context_switch.c   | 21 ++-
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/powerpc/benchmarks/context_switch.c 
b/tools/testing/selftests/powerpc/benchmarks/context_switch.c
index a2e8c9da7fa5..d50cc05df495 100644
--- a/tools/testing/selftests/powerpc/benchmarks/context_switch.c
+++ b/tools/testing/selftests/powerpc/benchmarks/context_switch.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -104,8 +105,9 @@ static void start_thread_on(void *(*fn)(void *), void *arg, 
unsigned long cpu)
 
 static void start_process_on(void *(*fn)(void *), void *arg, unsigned long cpu)
 {
-   int pid;
-   cpu_set_t cpuset;
+   int pid, ncpus;
+   cpu_set_t *cpuset;
+   size_t size;
 
pid = fork();
if (pid == -1) {
@@ -116,14 +118,23 @@ static void start_process_on(void *(*fn)(void *), void 
*arg, unsigned long cpu)
if (pid)
return;
 
-   CPU_ZERO(&cpuset);
-   CPU_SET(cpu, &cpuset);
+   ncpus = get_nprocs();
+   size = CPU_ALLOC_SIZE(ncpus);
+   cpuset = CPU_ALLOC(ncpus);
+   if (!cpuset) {
+   perror("malloc");
+   exit(1);
+   }
+   CPU_ZERO_S(size, cpuset);
+   CPU_SET_S(cpu, size, cpuset);
 
-   if (sched_setaffinity(0, sizeof(cpuset), &cpuset)) {
+   if (sched_setaffinity(0, size, cpuset)) {
perror("sched_setaffinity");
+   CPU_FREE(cpuset);
exit(1);
}
 
+   CPU_FREE(cpuset);
fn(arg);
 
exit(0);
-- 
2.24.1



[PATCH v2] selftests: powerpc: Fix online CPU selection

2020-06-09 Thread Sandipan Das
The size of the CPU affinity mask must be large enough for
systems with a very large number of CPUs. Otherwise, tests
which try to determine the first online CPU by calling
sched_getaffinity() will fail. This makes sure that the size
of the allocated affinity mask is dependent on the number of
CPUs as reported by get_nprocs().

Fixes: 3752e453f6ba ("selftests/powerpc: Add tests of PMU EBBs")
Reported-by: Shirisha Ganta 
Signed-off-by: Sandipan Das 
Reviewed-by: Kamalesh Babulal 
---
Previous versions can be found at:
v1: 
https://lore.kernel.org/linuxppc-dev/20200608144212.985144-1-sandi...@linux.ibm.com/

Changes in v2:
- Added NULL check for the affinity mask as suggested by Kamalesh.
- Changed "cpu set" to "CPU affinity mask" in the commit message.

---
 tools/testing/selftests/powerpc/utils.c | 37 +
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/powerpc/utils.c 
b/tools/testing/selftests/powerpc/utils.c
index 933678f1ed0a..798fa8fdd5f4 100644
--- a/tools/testing/selftests/powerpc/utils.c
+++ b/tools/testing/selftests/powerpc/utils.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -88,28 +89,40 @@ void *get_auxv_entry(int type)
 
 int pick_online_cpu(void)
 {
-   cpu_set_t mask;
-   int cpu;
+   int ncpus, cpu = -1;
+   cpu_set_t *mask;
+   size_t size;
+
+   ncpus = get_nprocs();
+   size = CPU_ALLOC_SIZE(ncpus);
+   mask = CPU_ALLOC(ncpus);
+   if (!mask) {
+   perror("malloc");
+   return -1;
+   }
 
-   CPU_ZERO(&mask);
+   CPU_ZERO_S(size, mask);
 
-   if (sched_getaffinity(0, sizeof(mask), &mask)) {
+   if (sched_getaffinity(0, size, mask)) {
perror("sched_getaffinity");
-   return -1;
+   goto done;
}
 
/* We prefer a primary thread, but skip 0 */
-   for (cpu = 8; cpu < CPU_SETSIZE; cpu += 8)
-   if (CPU_ISSET(cpu, &mask))
-   return cpu;
+   for (cpu = 8; cpu < ncpus; cpu += 8)
+   if (CPU_ISSET_S(cpu, size, mask))
+   goto done;
 
/* Search for anything, but in reverse */
-   for (cpu = CPU_SETSIZE - 1; cpu >= 0; cpu--)
-   if (CPU_ISSET(cpu, &mask))
-   return cpu;
+   for (cpu = ncpus - 1; cpu >= 0; cpu--)
+   if (CPU_ISSET_S(cpu, size, mask))
+   goto done;
 
printf("No cpus in affinity mask?!\n");
-   return -1;
+
+done:
+   CPU_FREE(mask);
+   return cpu;
 }
 
 bool is_ppc64le(void)
-- 
2.25.1



Re: [PATCH] powerpc/pseries/svm: Remove unwanted check for shared_lppaca_size

2020-06-09 Thread Laurent Dufour

On 09/06/2020 at 07:38, sathn...@linux.vent.ibm.com wrote:

From: Satheesh Rajendran 

Early secure guest boot hits the below crash while booting with
vcpus numbers aligned with page boundary for PAGE size of 64k
and LPPACA size of 1k i.e 64, 128 etc, due to the BUG_ON assert
for shared_lppaca_total_size equal to shared_lppaca_size,

  [0.00] Partition configured for 64 cpus.
  [0.00] CPU maps initialized for 1 thread per core
  [0.00] [ cut here ]
  [0.00] kernel BUG at arch/powerpc/kernel/paca.c:89!
  [0.00] Oops: Exception in kernel mode, sig: 5 [#1]
  [0.00] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries

which is not necessary, let's remove it.


Reviewed-by: Laurent Dufour 


Cc: linuxppc-dev@lists.ozlabs.org
Cc: Thiago Jung Bauermann 
Cc: Ram Pai 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Signed-off-by: Satheesh Rajendran 
---
  arch/powerpc/kernel/paca.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 949eceb25..10b7c54a7 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -86,7 +86,7 @@ static void *__init alloc_shared_lppaca(unsigned long size, 
unsigned long align,
 * This is very early in boot, so no harm done if the kernel crashes at
 * this point.
 */
-   BUG_ON(shared_lppaca_size >= shared_lppaca_total_size);
+   BUG_ON(shared_lppaca_size > shared_lppaca_total_size);
  
  	return ptr;

  }





Re: [PATCH v2] selftests: powerpc: Fix CPU affinity for child process

2020-06-09 Thread Kamalesh Babulal
On 6/9/20 9:10 AM, Harish wrote:
> On systems with large number of cpus, test fails trying to set
> affinity for child process by calling sched_setaffinity() with 
> smaller size for cpuset. This patch fixes it by making sure that
> the size of allocated cpu set is dependent on the number of CPUs
> as reported by get_nprocs().
> 
> Fixes: 00b7ec5c9cf3 ("selftests/powerpc: Import Anton's context_switch2 
> benchmark")
> Reported-by: Shirisha Ganta 
> Signed-off-by: Harish 
> Signed-off-by: Sandipan Das 
> ---
>  .../powerpc/benchmarks/context_switch.c| 18 --
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/testing/selftests/powerpc/benchmarks/context_switch.c 
> b/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> index a2e8c9da7fa5..de6c49d6f88f 100644
> --- a/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> +++ b/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -104,8 +105,9 @@ static void start_thread_on(void *(*fn)(void *), void 
> *arg, unsigned long cpu)
> 
>  static void start_process_on(void *(*fn)(void *), void *arg, unsigned long 
> cpu)
>  {
> - int pid;
> - cpu_set_t cpuset;
> + int pid, ncpus;
> + cpu_set_t *cpuset;
> + size_t size;
> 
>   pid = fork();
>   if (pid == -1) {
> @@ -116,12 +118,16 @@ static void start_process_on(void *(*fn)(void *), void 
> *arg, unsigned long cpu)
>   if (pid)
>   return;
> 
> - CPU_ZERO(&cpuset);
> - CPU_SET(cpu, &cpuset);
> + size = CPU_ALLOC_SIZE(ncpus);
> + ncpus = get_nprocs();
> + cpuset = CPU_ALLOC(ncpus);

CPU_ALLOC() allocation failure needs to be checked, like malloc() allocations.

> + CPU_ZERO_S(size, cpuset);
> + CPU_SET_S(cpu, size, cpuset);
> 
> - if (sched_setaffinity(0, sizeof(cpuset), &cpuset)) {
> + if (sched_setaffinity(0, size, cpuset)) {
>   perror("sched_setaffinity");
> - exit(1);
> + CPU_FREE(cpuset);
> + exit(-1);
>   }

Once the CPU affinity is set, you probably want to free the cpuset mask.

> 
>   fn(arg);
> 
-- 
Kamalesh


[PATCH 7/7] powerpc/64s: advertise hardware link stack flush

2020-06-09 Thread Nicholas Piggin
For testing only at the moment; firmware does not define these bits.
---
 arch/powerpc/include/asm/hvcall.h | 1 +
 arch/powerpc/include/uapi/asm/kvm.h   | 1 +
 arch/powerpc/kvm/powerpc.c| 9 +++--
 arch/powerpc/platforms/powernv/setup.c| 3 +++
 arch/powerpc/platforms/pseries/setup.c| 3 +++
 tools/arch/powerpc/include/uapi/asm/kvm.h | 1 +
 6 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index e90c073e437e..a92a07c89b6f 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -373,6 +373,7 @@
 #define H_CPU_CHAR_THREAD_RECONFIG_CTRL(1ull << 57) // IBM bit 6
 #define H_CPU_CHAR_COUNT_CACHE_DISABLED(1ull << 56) // IBM bit 7
 #define H_CPU_CHAR_BCCTR_FLUSH_ASSIST  (1ull << 54) // IBM bit 9
+#define H_CPU_CHAR_BCCTR_LINK_FLUSH_ASSIST (1ull << 53) // IBM bit 10
 
 #define H_CPU_BEHAV_FAVOUR_SECURITY(1ull << 63) // IBM bit 0
 #define H_CPU_BEHAV_L1D_FLUSH_PR   (1ull << 62) // IBM bit 1
diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index 264e266a85bf..dd229d5f46ee 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -464,6 +464,7 @@ struct kvm_ppc_cpu_char {
 #define KVM_PPC_CPU_CHAR_MTTRIG_THR_RECONF (1ULL << 57)
 #define KVM_PPC_CPU_CHAR_COUNT_CACHE_DIS   (1ULL << 56)
 #define KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST(1ull << 54)
+#define KVM_PPC_CPU_CHAR_BCCTR_LINK_FLUSH_ASSIST   (1ull << 53)
 
 #define KVM_PPC_CPU_BEHAV_FAVOUR_SECURITY  (1ULL << 63)
 #define KVM_PPC_CPU_BEHAV_L1D_FLUSH_PR (1ULL << 62)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 27ccff612903..fa981ee09dec 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -2221,7 +2221,8 @@ static int pseries_get_cpu_char(struct kvm_ppc_cpu_char 
*cp)
KVM_PPC_CPU_CHAR_BR_HINT_HONOURED |
KVM_PPC_CPU_CHAR_MTTRIG_THR_RECONF |
KVM_PPC_CPU_CHAR_COUNT_CACHE_DIS |
-   KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST;
+   KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST |
+   KVM_PPC_CPU_CHAR_BCCTR_LINK_FLUSH_ASSIST;
cp->behaviour_mask = KVM_PPC_CPU_BEHAV_FAVOUR_SECURITY |
KVM_PPC_CPU_BEHAV_L1D_FLUSH_PR |
KVM_PPC_CPU_BEHAV_BNDS_CHK_SPEC_BAR |
@@ -2287,13 +2288,17 @@ static int kvmppc_get_cpu_char(struct kvm_ppc_cpu_char 
*cp)
if (have_fw_feat(fw_features, "enabled",
 "fw-count-cache-flush-bcctr2,0,0"))
cp->character |= KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST;
+   if (have_fw_feat(fw_features, "enabled",
+"fw-link-stack-flush-bcctr2,0,0"))
+   cp->character |= 
KVM_PPC_CPU_CHAR_BCCTR_LINK_FLUSH_ASSIST;
cp->character_mask = KVM_PPC_CPU_CHAR_SPEC_BAR_ORI31 |
KVM_PPC_CPU_CHAR_BCCTRL_SERIALISED |
KVM_PPC_CPU_CHAR_L1D_FLUSH_ORI30 |
KVM_PPC_CPU_CHAR_L1D_FLUSH_TRIG2 |
KVM_PPC_CPU_CHAR_L1D_THREAD_PRIV |
KVM_PPC_CPU_CHAR_COUNT_CACHE_DIS |
-   KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST;
+   KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST |
+   KVM_PPC_CPU_CHAR_BCCTR_LINK_FLUSH_ASSIST;
 
if (have_fw_feat(fw_features, "enabled",
 "speculation-policy-favor-security"))
diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index 3bc188da82ba..1a06d3b4c0a9 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -78,6 +78,9 @@ static void init_fw_feat_flags(struct device_node *np)
if (fw_feature_is("enabled", "fw-count-cache-flush-bcctr2,0,0", np))
security_ftr_set(SEC_FTR_BCCTR_FLUSH_ASSIST);
 
+   if (fw_feature_is("enabled", "fw-link-stack-flush-bcctr2,0,0", np))
+   security_ftr_set(SEC_FTR_BCCTR_LINK_FLUSH_ASSIST);
+
if (fw_feature_is("enabled", 
"needs-count-cache-flush-on-context-switch", np))
security_ftr_set(SEC_FTR_FLUSH_COUNT_CACHE);
 
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 64d18f4bf093..70c9264f23c5 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -517,6 +517,9 @@ static void init_cpu_char_feature_flags(struct 
h_cpu_char_result *result)
if (result->character & H_CPU_CHAR_BCCTR_FLUSH_ASSIST)
security_ftr_set(SEC_FTR_BCCTR_FLUSH_ASSIST);
 
+   if (result->character & H_CPU_CHAR_BCCTR_LINK_FLUSH_ASSIST)
+   security_ftr_set(S

[PATCH 6/7] powerpc/security: Allow for processors that flush the link stack using the special bcctr

2020-06-09 Thread Nicholas Piggin
If both count cache and link stack are to be flushed, and can be flushed
with the special bcctr, patch that in directly to the flush/branch nop
site.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/security_features.h |  2 ++
 arch/powerpc/kernel/security.c   | 27 ++--
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/security_features.h 
b/arch/powerpc/include/asm/security_features.h
index 7c05e95a5c44..fbb8fa32150f 100644
--- a/arch/powerpc/include/asm/security_features.h
+++ b/arch/powerpc/include/asm/security_features.h
@@ -63,6 +63,8 @@ static inline bool security_ftr_enabled(u64 feature)
 // bcctr 2,0,0 triggers a hardware assisted count cache flush
 #define SEC_FTR_BCCTR_FLUSH_ASSIST 0x0800ull
 
+// bcctr 2,0,0 triggers a hardware assisted link stack flush
+#define SEC_FTR_BCCTR_LINK_FLUSH_ASSIST0x2000ull
 
 // Features indicating need for Spectre/Meltdown mitigations
 
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 2a413af21124..6ad5c753d47c 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -219,24 +219,25 @@ ssize_t cpu_show_spectre_v2(struct device *dev, struct 
device_attribute *attr, c
if (ccd)
seq_buf_printf(&s, "Indirect branch cache disabled");
 
-   if (link_stack_flush_type == BRANCH_CACHE_FLUSH_SW)
-   seq_buf_printf(&s, ", Software link stack flush");
-
} else if (count_cache_flush_type != BRANCH_CACHE_FLUSH_NONE) {
seq_buf_printf(&s, "Mitigation: Software count cache flush");
 
if (count_cache_flush_type == BRANCH_CACHE_FLUSH_HW)
seq_buf_printf(&s, " (hardware accelerated)");
 
-   if (link_stack_flush_type == BRANCH_CACHE_FLUSH_SW)
-   seq_buf_printf(&s, ", Software link stack flush");
-
} else if (btb_flush_enabled) {
seq_buf_printf(&s, "Mitigation: Branch predictor state flush");
} else {
seq_buf_printf(&s, "Vulnerable");
}
 
+   if (bcs || ccd || count_cache_flush_type != BRANCH_CACHE_FLUSH_NONE) {
+   if (link_stack_flush_type != BRANCH_CACHE_FLUSH_NONE)
+   seq_buf_printf(&s, ", Software link stack flush");
+   if (link_stack_flush_type == BRANCH_CACHE_FLUSH_HW)
+   seq_buf_printf(&s, " (hardware accelerated)");
+   }
+
seq_buf_printf(&s, "\n");
 
return s.len;
@@ -435,6 +436,7 @@ static void update_branch_cache_flush(void)
patch_instruction_site(&patch__call_kvm_flush_link_stack,
   ppc_inst(PPC_INST_NOP));
} else {
+   // Could use HW flush, but that could also flush count cache
patch_branch_site(&patch__call_kvm_flush_link_stack,
  (u64)&kvm_flush_link_stack, BRANCH_SET_LINK);
}
@@ -445,6 +447,10 @@ static void update_branch_cache_flush(void)
link_stack_flush_type == BRANCH_CACHE_FLUSH_NONE) {
patch_instruction_site(&patch__call_flush_branch_caches,
   ppc_inst(PPC_INST_NOP));
+   } else if (count_cache_flush_type == BRANCH_CACHE_FLUSH_HW &&
+  link_stack_flush_type == BRANCH_CACHE_FLUSH_HW) {
+   patch_instruction_site(&patch__call_flush_branch_caches,
+  ppc_inst(PPC_INST_BCCTR_FLUSH));
} else {
patch_branch_site(&patch__call_flush_branch_caches,
  (u64)&flush_branch_caches, BRANCH_SET_LINK);
@@ -485,8 +491,13 @@ static void toggle_branch_cache_flush(bool enable)
pr_info("link-stack-flush: flush disabled.\n");
}
} else {
-   link_stack_flush_type = BRANCH_CACHE_FLUSH_SW;
-   pr_info("link-stack-flush: software flush enabled.\n");
+   if (security_ftr_enabled(SEC_FTR_BCCTR_LINK_FLUSH_ASSIST)) {
+   link_stack_flush_type = BRANCH_CACHE_FLUSH_HW;
+   pr_info("link-stack-flush: hardware flush enabled.\n");
+   } else {
+   link_stack_flush_type = BRANCH_CACHE_FLUSH_SW;
+   pr_info("link-stack-flush: software flush enabled.\n");
+   }
}
 
update_branch_cache_flush();
-- 
2.23.0



[PATCH 5/7] powerpc/64s: Move branch cache flushing bcctr variant to ppc-ops.h

2020-06-09 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ppc-opcode.h | 2 ++
 arch/powerpc/kernel/entry_64.S| 6 ++
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 2a39c716c343..79d511a38bbb 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -195,6 +195,7 @@
 #define OP_LQ56
 
 /* sorted alphabetically */
+#define PPC_INST_BCCTR_FLUSH   0x4c400420
 #define PPC_INST_BHRBE 0x7c00025c
 #define PPC_INST_CLRBHRB   0x7c00035c
 #define PPC_INST_COPY  0x7c20060c
@@ -432,6 +433,7 @@
 #endif
 
 /* Deal with instructions that older assemblers aren't aware of */
+#definePPC_BCCTR_FLUSH stringify_in_c(.long 
PPC_INST_BCCTR_FLUSH)
 #definePPC_CP_ABORTstringify_in_c(.long PPC_INST_CP_ABORT)
 #definePPC_COPY(a, b)  stringify_in_c(.long PPC_INST_COPY | \
___PPC_RA(a) | ___PPC_RB(b))
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 2ba25b3b701e..a115aeb2983a 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -261,8 +261,6 @@ _ASM_NOKPROBE_SYMBOL(save_nvgprs);
 1: nop;\
patch_site 1b, patch__call_flush_branch_caches
 
-#define BCCTR_FLUSH.long 0x4c400420
-
 .macro nops number
.rept \number
nop
@@ -293,7 +291,7 @@ flush_branch_caches:
li  r9,0x7fff
mtctr   r9
 
-   BCCTR_FLUSH
+   PPC_BCCTR_FLUSH
 
 2: nop
patch_site 2b patch__flush_count_cache_return
@@ -302,7 +300,7 @@ flush_branch_caches:
 
.rept 278
.balign 32
-   BCCTR_FLUSH
+   PPC_BCCTR_FLUSH
nops7
.endr
 
-- 
2.23.0
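
(Aside, not part of the patch: how the 0x4c400420 constant decomposes.
"bcctr 2,0,0" is an XL-form branch with primary opcode 19, BO=2, BI=0,
BH=0, extended opcode 528 and LK=0, which gives the encoding used above.
A minimal sketch, with a hypothetical macro name:)

/* Sketch: rebuild the bcctr 2,0,0 encoding from its XL-form fields. */
#define BCCTR_FLUSH_SKETCH \
	((19u << 26) | (2u << 21) | (0u << 16) | (528u << 1))
/* BCCTR_FLUSH_SKETCH == 0x4c400420 == PPC_INST_BCCTR_FLUSH */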



[PATCH 4/7] powerpc/security: split branch cache flush toggle from code patching

2020-06-09 Thread Nicholas Piggin
Branch cache flushing code patching has inter-dependencies on both the
link stack and the count cache flushing state.

To make the code clearer and to separate the link stack and count
cache handling, split the "toggle" (setting up variables and printing
enable/disable) from the code patching.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/security.c | 94 ++
 1 file changed, 51 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 659ef6a92bb9..2a413af21124 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -427,61 +427,69 @@ static __init int stf_barrier_debugfs_init(void)
 device_initcall(stf_barrier_debugfs_init);
 #endif /* CONFIG_DEBUG_FS */
 
-static void no_count_cache_flush(void)
+static void update_branch_cache_flush(void)
 {
-   count_cache_flush_type = BRANCH_CACHE_FLUSH_NONE;
-   pr_info("count-cache-flush: flush disabled.\n");
-}
-
-static void toggle_branch_cache_flush(bool enable)
-{
-   if (!security_ftr_enabled(SEC_FTR_FLUSH_COUNT_CACHE) &&
-   !security_ftr_enabled(SEC_FTR_FLUSH_LINK_STACK))
-   enable = false;
-
-   if (!enable) {
-   patch_instruction_site(&patch__call_flush_branch_caches,
-  ppc_inst(PPC_INST_NOP));
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+   // This controls the branch from guest_exit_cont to kvm_flush_link_stack
+   if (link_stack_flush_type == BRANCH_CACHE_FLUSH_NONE) {
patch_instruction_site(&patch__call_kvm_flush_link_stack,
   ppc_inst(PPC_INST_NOP));
-#endif
-   pr_info("link-stack-flush: flush disabled.\n");
-   link_stack_flush_type = BRANCH_CACHE_FLUSH_NONE;
-   no_count_cache_flush();
-   return;
+   } else {
+   patch_branch_site(&patch__call_kvm_flush_link_stack,
+ (u64)&kvm_flush_link_stack, BRANCH_SET_LINK);
}
-
-   // This enables the branch from _switch to flush_branch_caches
-   patch_branch_site(&patch__call_flush_branch_caches,
- (u64)&flush_branch_caches, BRANCH_SET_LINK);
-
-#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-   // This enables the branch from guest_exit_cont to kvm_flush_link_stack
-   patch_branch_site(&patch__call_kvm_flush_link_stack,
- (u64)&kvm_flush_link_stack, BRANCH_SET_LINK);
 #endif
 
-   pr_info("link-stack-flush: software flush enabled.\n");
-   link_stack_flush_type = BRANCH_CACHE_FLUSH_SW;
+   // This controls the branch from _switch to flush_branch_caches
+   if (count_cache_flush_type == BRANCH_CACHE_FLUSH_NONE &&
+   link_stack_flush_type == BRANCH_CACHE_FLUSH_NONE) {
+   patch_instruction_site(&patch__call_flush_branch_caches,
+  ppc_inst(PPC_INST_NOP));
+   } else {
+   patch_branch_site(&patch__call_flush_branch_caches,
+ (u64)&flush_branch_caches, BRANCH_SET_LINK);
+
+   // If we just need to flush the link stack, early return
+   if (count_cache_flush_type == BRANCH_CACHE_FLUSH_NONE) {
+   patch_instruction_site(&patch__flush_link_stack_return,
+  ppc_inst(PPC_INST_BLR));
+
+   // If we have flush instruction, early return
+   } else if (count_cache_flush_type == BRANCH_CACHE_FLUSH_HW) {
+   patch_instruction_site(&patch__flush_count_cache_return,
+  ppc_inst(PPC_INST_BLR));
+   }
+   }
+}
 
-   // If we just need to flush the link stack, patch an early return
-   if (!security_ftr_enabled(SEC_FTR_FLUSH_COUNT_CACHE)) {
-   patch_instruction_site(&patch__flush_link_stack_return,
-  ppc_inst(PPC_INST_BLR));
-   no_count_cache_flush();
-   return;
+static void toggle_branch_cache_flush(bool enable)
+{
+   if (!enable || !security_ftr_enabled(SEC_FTR_FLUSH_COUNT_CACHE)) {
+   if (count_cache_flush_type != BRANCH_CACHE_FLUSH_NONE) {
+   count_cache_flush_type = BRANCH_CACHE_FLUSH_NONE;
+   pr_info("count-cache-flush: flush disabled.\n");
+   }
+   } else {
+   if (security_ftr_enabled(SEC_FTR_BCCTR_FLUSH_ASSIST)) {
+   count_cache_flush_type = BRANCH_CACHE_FLUSH_HW;
+   pr_info("count-cache-flush: hardware flush enabled.\n");
+   } else {
+   count_cache_flush_type = BRANCH_CACHE_FLUSH_SW;
+   pr_info("count-cache-flush: software flush enabled.\n");
+   }
}
 
-   if (!security_ftr_enabled(SEC_FTR_BCCTR_FLUSH

[PATCH 3/7] powerpc/security: make display of branch cache flush more consistent

2020-06-09 Thread Nicholas Piggin
Make the count-cache and link-stack messages look the same

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/security.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 28f4cb062f69..659ef6a92bb9 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -430,7 +430,7 @@ device_initcall(stf_barrier_debugfs_init);
 static void no_count_cache_flush(void)
 {
count_cache_flush_type = BRANCH_CACHE_FLUSH_NONE;
-   pr_info("count-cache-flush: software flush disabled.\n");
+   pr_info("count-cache-flush: flush disabled.\n");
 }
 
 static void toggle_branch_cache_flush(bool enable)
@@ -446,7 +446,7 @@ static void toggle_branch_cache_flush(bool enable)
patch_instruction_site(&patch__call_kvm_flush_link_stack,
   ppc_inst(PPC_INST_NOP));
 #endif
-   pr_info("link-stack-flush: software flush disabled.\n");
+   pr_info("link-stack-flush: flush disabled.\n");
link_stack_flush_type = BRANCH_CACHE_FLUSH_NONE;
no_count_cache_flush();
return;
@@ -475,13 +475,13 @@ static void toggle_branch_cache_flush(bool enable)
 
if (!security_ftr_enabled(SEC_FTR_BCCTR_FLUSH_ASSIST)) {
count_cache_flush_type = BRANCH_CACHE_FLUSH_SW;
-   pr_info("count-cache-flush: full software flush sequence 
enabled.\n");
+   pr_info("count-cache-flush: software flush enabled.\n");
return;
}
 
patch_instruction_site(&patch__flush_count_cache_return, 
ppc_inst(PPC_INST_BLR));
count_cache_flush_type = BRANCH_CACHE_FLUSH_HW;
-   pr_info("count-cache-flush: hardware assisted flush sequence 
enabled\n");
+   pr_info("count-cache-flush: hardware flush enabled.\n");
 }
 
 void setup_count_cache_flush(void)
-- 
2.23.0



[PATCH 2/7] powerpc/security: change link stack flush state to the flush type enum

2020-06-09 Thread Nicholas Piggin
Prepare to allow for hardware link stack flushing by using the
none/sw/hw type, same as the count cache state.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/security.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index df2a3eff950b..28f4cb062f69 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -27,7 +27,7 @@ enum branch_cache_flush_type {
BRANCH_CACHE_FLUSH_HW   = 0x4,
 };
 static enum branch_cache_flush_type count_cache_flush_type = 
BRANCH_CACHE_FLUSH_NONE;
-static bool link_stack_flush_enabled;
+static enum branch_cache_flush_type link_stack_flush_type = 
BRANCH_CACHE_FLUSH_NONE;
 
 bool barrier_nospec_enabled;
 static bool no_nospec;
@@ -219,7 +219,7 @@ ssize_t cpu_show_spectre_v2(struct device *dev, struct 
device_attribute *attr, c
if (ccd)
seq_buf_printf(&s, "Indirect branch cache disabled");
 
-   if (link_stack_flush_enabled)
+   if (link_stack_flush_type == BRANCH_CACHE_FLUSH_SW)
seq_buf_printf(&s, ", Software link stack flush");
 
} else if (count_cache_flush_type != BRANCH_CACHE_FLUSH_NONE) {
@@ -228,7 +228,7 @@ ssize_t cpu_show_spectre_v2(struct device *dev, struct 
device_attribute *attr, c
if (count_cache_flush_type == BRANCH_CACHE_FLUSH_HW)
seq_buf_printf(&s, " (hardware accelerated)");
 
-   if (link_stack_flush_enabled)
+   if (link_stack_flush_type == BRANCH_CACHE_FLUSH_SW)
seq_buf_printf(&s, ", Software link stack flush");
 
} else if (btb_flush_enabled) {
@@ -447,7 +447,7 @@ static void toggle_branch_cache_flush(bool enable)
   ppc_inst(PPC_INST_NOP));
 #endif
pr_info("link-stack-flush: software flush disabled.\n");
-   link_stack_flush_enabled = false;
+   link_stack_flush_type = BRANCH_CACHE_FLUSH_NONE;
no_count_cache_flush();
return;
}
@@ -463,7 +463,7 @@ static void toggle_branch_cache_flush(bool enable)
 #endif
 
pr_info("link-stack-flush: software flush enabled.\n");
-   link_stack_flush_enabled = true;
+   link_stack_flush_type = BRANCH_CACHE_FLUSH_SW;
 
// If we just need to flush the link stack, patch an early return
if (!security_ftr_enabled(SEC_FTR_FLUSH_COUNT_CACHE)) {
-- 
2.23.0



[PATCH 1/7] powerpc/security: re-name count cache flush to branch cache flush

2020-06-09 Thread Nicholas Piggin
The count cache flush mostly refers to both count cache and link stack
flushing. As a first step to untangling these a bit, re-name the bits
that apply to both.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/asm-prototypes.h |  4 +--
 arch/powerpc/kernel/entry_64.S|  7 ++---
 arch/powerpc/kernel/security.c| 36 +++
 3 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 7d81e86a1e5d..fa9057360e88 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -144,13 +144,13 @@ void _kvmppc_restore_tm_pr(struct kvm_vcpu *vcpu, u64 
guest_msr);
 void _kvmppc_save_tm_pr(struct kvm_vcpu *vcpu, u64 guest_msr);
 
 /* Patch sites */
-extern s32 patch__call_flush_count_cache;
+extern s32 patch__call_flush_branch_caches;
 extern s32 patch__flush_count_cache_return;
 extern s32 patch__flush_link_stack_return;
 extern s32 patch__call_kvm_flush_link_stack;
 extern s32 patch__memset_nocache, patch__memcpy_nocache;
 
-extern long flush_count_cache;
+extern long flush_branch_caches;
 extern long kvm_flush_link_stack;
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 9d49338e0c85..2ba25b3b701e 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -259,8 +259,7 @@ _ASM_NOKPROBE_SYMBOL(save_nvgprs);
 
 #define FLUSH_COUNT_CACHE  \
 1: nop;\
-   patch_site 1b, patch__call_flush_count_cache
-
+   patch_site 1b, patch__call_flush_branch_caches
 
 #define BCCTR_FLUSH.long 0x4c400420
 
@@ -271,8 +270,8 @@ _ASM_NOKPROBE_SYMBOL(save_nvgprs);
 .endm
 
 .balign 32
-.global flush_count_cache
-flush_count_cache:
+.global flush_branch_caches
+flush_branch_caches:
/* Save LR into r9 */
mflrr9
 
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index d86701ce116b..df2a3eff950b 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -21,12 +21,12 @@
 
 u64 powerpc_security_features __read_mostly = SEC_FTR_DEFAULT;
 
-enum count_cache_flush_type {
-   COUNT_CACHE_FLUSH_NONE  = 0x1,
-   COUNT_CACHE_FLUSH_SW= 0x2,
-   COUNT_CACHE_FLUSH_HW= 0x4,
+enum branch_cache_flush_type {
+   BRANCH_CACHE_FLUSH_NONE = 0x1,
+   BRANCH_CACHE_FLUSH_SW   = 0x2,
+   BRANCH_CACHE_FLUSH_HW   = 0x4,
 };
-static enum count_cache_flush_type count_cache_flush_type = 
COUNT_CACHE_FLUSH_NONE;
+static enum branch_cache_flush_type count_cache_flush_type = 
BRANCH_CACHE_FLUSH_NONE;
 static bool link_stack_flush_enabled;
 
 bool barrier_nospec_enabled;
@@ -222,10 +222,10 @@ ssize_t cpu_show_spectre_v2(struct device *dev, struct 
device_attribute *attr, c
if (link_stack_flush_enabled)
seq_buf_printf(&s, ", Software link stack flush");
 
-   } else if (count_cache_flush_type != COUNT_CACHE_FLUSH_NONE) {
+   } else if (count_cache_flush_type != BRANCH_CACHE_FLUSH_NONE) {
seq_buf_printf(&s, "Mitigation: Software count cache flush");
 
-   if (count_cache_flush_type == COUNT_CACHE_FLUSH_HW)
+   if (count_cache_flush_type == BRANCH_CACHE_FLUSH_HW)
seq_buf_printf(&s, " (hardware accelerated)");
 
if (link_stack_flush_enabled)
@@ -429,18 +429,18 @@ device_initcall(stf_barrier_debugfs_init);
 
 static void no_count_cache_flush(void)
 {
-   count_cache_flush_type = COUNT_CACHE_FLUSH_NONE;
+   count_cache_flush_type = BRANCH_CACHE_FLUSH_NONE;
pr_info("count-cache-flush: software flush disabled.\n");
 }
 
-static void toggle_count_cache_flush(bool enable)
+static void toggle_branch_cache_flush(bool enable)
 {
if (!security_ftr_enabled(SEC_FTR_FLUSH_COUNT_CACHE) &&
!security_ftr_enabled(SEC_FTR_FLUSH_LINK_STACK))
enable = false;
 
if (!enable) {
-   patch_instruction_site(&patch__call_flush_count_cache,
+   patch_instruction_site(&patch__call_flush_branch_caches,
   ppc_inst(PPC_INST_NOP));
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
patch_instruction_site(&patch__call_kvm_flush_link_stack,
@@ -452,9 +452,9 @@ static void toggle_count_cache_flush(bool enable)
return;
}
 
-   // This enables the branch from _switch to flush_count_cache
-   patch_branch_site(&patch__call_flush_count_cache,
- (u64)&flush_count_cache, BRANCH_SET_LINK);
+   // This enables the branch from _switch to flush_branch_caches
+   patch_branch_site(&patch__call_flush_branch_caches,
+ (u64)&flush_branch_caches, BRANCH_SET_LINK);
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
// This enables the branch from guest_exit_cont to kvm

[PATCH 0/7] powerpc: branch cache flush changes

2020-06-09 Thread Nicholas Piggin
This series allows the link stack to be flushed with the special
bcctr 2,0,0 flush instruction that also flushes the count cache if
the processor supports it.

Firmware does not support this at the moment, but I've tested it in a
simulator with firmware patched to advertise support.

Thanks,
Nick

Nicholas Piggin (7):
  powerpc/security: re-name count cache flush to branch cache flush
  powerpc/security: change link stack flush state to the flush type enum
  powerpc/security: make display of branch cache flush more consistent
  powerpc/security: split branch cache flush toggle from code patching
  powerpc/64s: Move branch cache flushing bcctr variant to ppc-ops.h
  powerpc/security: Allow for processors that flush the link stack using
the special bcctr
  powerpc/64s: advertise hardware link stack flush

 arch/powerpc/include/asm/asm-prototypes.h|   4 +-
 arch/powerpc/include/asm/hvcall.h|   1 +
 arch/powerpc/include/asm/ppc-opcode.h|   2 +
 arch/powerpc/include/asm/security_features.h |   2 +
 arch/powerpc/include/uapi/asm/kvm.h  |   1 +
 arch/powerpc/kernel/entry_64.S   |  13 +-
 arch/powerpc/kernel/security.c   | 139 +++
 arch/powerpc/kvm/powerpc.c   |   9 +-
 arch/powerpc/platforms/powernv/setup.c   |   3 +
 arch/powerpc/platforms/pseries/setup.c   |   3 +
 tools/arch/powerpc/include/uapi/asm/kvm.h|   1 +
 11 files changed, 106 insertions(+), 72 deletions(-)

-- 
2.23.0



Re: [PATCH v2] selftests: powerpc: Fix CPU affinity for child process

2020-06-09 Thread Satheesh Rajendran
On Tue, Jun 09, 2020 at 09:10:05AM +0530, Harish wrote:
> On systems with a large number of cpus, the test fails trying to set
> affinity for the child process by calling sched_setaffinity() with
> a smaller size for the cpuset. This patch fixes it by making sure that
> the size of the allocated cpu set is dependent on the number of CPUs
> as reported by get_nprocs().
> 
> Fixes: 00b7ec5c9cf3 ("selftests/powerpc: Import Anton's context_switch2 
> benchmark")
> Reported-by: Shirisha Ganta 
> Signed-off-by: Harish 
> Signed-off-by: Sandipan Das 
> ---
>  .../powerpc/benchmarks/context_switch.c| 18 --
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/testing/selftests/powerpc/benchmarks/context_switch.c 
> b/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> index a2e8c9da7fa5..de6c49d6f88f 100644
> --- a/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> +++ b/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -104,8 +105,9 @@ static void start_thread_on(void *(*fn)(void *), void 
> *arg, unsigned long cpu)
> 
>  static void start_process_on(void *(*fn)(void *), void *arg, unsigned long 
> cpu)
>  {
> - int pid;
> - cpu_set_t cpuset;
> + int pid, ncpus;
> + cpu_set_t *cpuset;
> + size_t size;
> 
>   pid = fork();
>   if (pid == -1) {
> @@ -116,12 +118,16 @@ static void start_process_on(void *(*fn)(void *), void 
> *arg, unsigned long cpu)
>   if (pid)
>   return;
> 
> - CPU_ZERO(&cpuset);
> - CPU_SET(cpu, &cpuset);
> + size = CPU_ALLOC_SIZE(ncpus);
> + ncpus = get_nprocs();
The above two lines should be interchanged; ncpus is used to compute size
before it has been assigned.

> + cpuset = CPU_ALLOC(ncpus);
> + CPU_ZERO_S(size, cpuset);
> + CPU_SET_S(cpu, size, cpuset);
> 
> - if (sched_setaffinity(0, sizeof(cpuset), &cpuset)) {
> + if (sched_setaffinity(0, size, cpuset)) {
>   perror("sched_setaffinity");
> - exit(1);
> + CPU_FREE(cpuset);
> + exit(-1);
Do we need to change the return value here?
Other frameworks might rely on the previous value.

Regards,
-Satheesh.
>   }
> 
>   fn(arg);
> -- 

> 2.24.1
> 
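
For reference, a minimal sketch of the ordering being pointed out above
(illustrative only; the helper name and the error handling are mine, not
the patch): get_nprocs() has to run before CPU_ALLOC_SIZE() consumes
ncpus, and the set should be freed on every exit path.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/sysinfo.h>

/* Pin the calling process to 'cpu' using a dynamically sized cpu set. */
static void pin_to_cpu(unsigned long cpu)
{
	int ncpus = get_nprocs();		/* must come first ...           */
	size_t size = CPU_ALLOC_SIZE(ncpus);	/* ... since size depends on it  */
	cpu_set_t *cpuset = CPU_ALLOC(ncpus);

	if (!cpuset) {
		perror("CPU_ALLOC");
		exit(1);
	}

	CPU_ZERO_S(size, cpuset);
	CPU_SET_S(cpu, size, cpuset);

	if (sched_setaffinity(0, size, cpuset)) {
		perror("sched_setaffinity");
		CPU_FREE(cpuset);
		exit(1);
	}

	CPU_FREE(cpuset);	/* free on the success path too */
}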


Re: [PATCH] selftests: powerpc: Fix online CPU selection

2020-06-09 Thread Sandipan Das



On 08/06/20 8:12 pm, Sandipan Das wrote:
> The size of the cpu set must be large enough for systems
> with a very large number of CPUs. Otherwise, tests which
> try to determine the first online CPU by calling
> sched_getaffinity() will fail. This makes sure that the
> size of the allocated cpu set is dependent on the number
> of CPUs as reported by get_nprocs().
> 
> Fixes: 3752e453f6ba ("selftests/powerpc: Add tests of PMU EBBs")
> Reported-by: Shirisha Ganta 
> Signed-off-by: Sandipan Das 
> ---
>  tools/testing/selftests/powerpc/utils.c | 33 -
>  1 file changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/testing/selftests/powerpc/utils.c 
> b/tools/testing/selftests/powerpc/utils.c
> index 933678f1ed0a..bb8e402752c0 100644
> --- a/tools/testing/selftests/powerpc/utils.c
> +++ b/tools/testing/selftests/powerpc/utils.c
> @@ -16,6 +16,7 @@
> [...]
>  
>  int pick_online_cpu(void)
>  {
> - cpu_set_t mask;
> - int cpu;
> + int ncpus, cpu = -1;
> + cpu_set_t *mask;
> + size_t size;
>  
> - CPU_ZERO(&mask);
> + ncpus = get_nprocs();
> + size = CPU_ALLOC_SIZE(ncpus);
> + mask = CPU_ALLOC(ncpus);
>  
> - if (sched_getaffinity(0, sizeof(mask), &mask)) {
> + CPU_ZERO_S(size, mask);
> +
> + if (sched_getaffinity(0, size, mask)) {
>   perror("sched_getaffinity");
> - return -1;
> + goto done;
>   }
>  
>   /* We prefer a primary thread, but skip 0 */
> - for (cpu = 8; cpu < CPU_SETSIZE; cpu += 8)
> - if (CPU_ISSET(cpu, &mask))
> - return cpu;
> + for (cpu = 8; cpu < ncpus; cpu += 8)
> + if (CPU_ISSET_S(cpu, size, mask))
> + goto done;
>  
>   /* Search for anything, but in reverse */
> - for (cpu = CPU_SETSIZE - 1; cpu >= 0; cpu--)
> - if (CPU_ISSET(cpu, &mask))
> - return cpu;
> + for (cpu = ncpus - 1; cpu >= 0; cpu--)
> + if (CPU_ISSET_S(cpu, size, mask))
> + goto done;
>  
>   printf("No cpus in affinity mask?!\n");
> - return -1;

Missed the fact that the for loop before this would anyway make 'cpu'
count down to -1 if no online CPU is found. Please ignore the previous
message.

> +
> +done:
> + CPU_FREE(mask);
> + return cpu;
>  }
>  
>  bool is_ppc64le(void)
> 

- Sandipan
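
Putting the hunks together, a condensed sketch of the patched helper (my
reconstruction from the diff above, not the literal file): as noted, the
reverse-search loop already leaves cpu at -1 when nothing is found.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/sysinfo.h>

int pick_online_cpu(void)
{
	int ncpus = get_nprocs();
	size_t size = CPU_ALLOC_SIZE(ncpus);
	cpu_set_t *mask = CPU_ALLOC(ncpus);
	int cpu = -1;

	CPU_ZERO_S(size, mask);

	if (sched_getaffinity(0, size, mask)) {
		perror("sched_getaffinity");
		goto done;
	}

	/* We prefer a primary thread, but skip 0 */
	for (cpu = 8; cpu < ncpus; cpu += 8)
		if (CPU_ISSET_S(cpu, size, mask))
			goto done;

	/* Search for anything, but in reverse */
	for (cpu = ncpus - 1; cpu >= 0; cpu--)
		if (CPU_ISSET_S(cpu, size, mask))
			goto done;

	printf("No cpus in affinity mask?!\n");

done:
	CPU_FREE(mask);
	return cpu;
}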


[PATCH 7/7] powerpc/64s: advertise hardware link stack flush

2020-06-09 Thread Nicholas Piggin
For testing only at the moment, firmware does not define these bits.
---
 arch/powerpc/include/asm/hvcall.h | 1 +
 arch/powerpc/include/uapi/asm/kvm.h   | 1 +
 arch/powerpc/kvm/powerpc.c| 9 +++--
 arch/powerpc/platforms/powernv/setup.c| 3 +++
 arch/powerpc/platforms/pseries/setup.c| 3 +++
 tools/arch/powerpc/include/uapi/asm/kvm.h | 1 +
 6 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index e90c073e437e..a92a07c89b6f 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -373,6 +373,7 @@
 #define H_CPU_CHAR_THREAD_RECONFIG_CTRL(1ull << 57) // IBM bit 6
 #define H_CPU_CHAR_COUNT_CACHE_DISABLED(1ull << 56) // IBM bit 7
 #define H_CPU_CHAR_BCCTR_FLUSH_ASSIST  (1ull << 54) // IBM bit 9
+#define H_CPU_CHAR_BCCTR_LINK_FLUSH_ASSIST (1ull << 53) // IBM bit 10
 
 #define H_CPU_BEHAV_FAVOUR_SECURITY(1ull << 63) // IBM bit 0
 #define H_CPU_BEHAV_L1D_FLUSH_PR   (1ull << 62) // IBM bit 1
diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index 264e266a85bf..dd229d5f46ee 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -464,6 +464,7 @@ struct kvm_ppc_cpu_char {
 #define KVM_PPC_CPU_CHAR_MTTRIG_THR_RECONF (1ULL << 57)
 #define KVM_PPC_CPU_CHAR_COUNT_CACHE_DIS   (1ULL << 56)
 #define KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST(1ull << 54)
+#define KVM_PPC_CPU_CHAR_BCCTR_LINK_FLUSH_ASSIST   (1ull << 53)
 
 #define KVM_PPC_CPU_BEHAV_FAVOUR_SECURITY  (1ULL << 63)
 #define KVM_PPC_CPU_BEHAV_L1D_FLUSH_PR (1ULL << 62)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 27ccff612903..fa981ee09dec 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -2221,7 +2221,8 @@ static int pseries_get_cpu_char(struct kvm_ppc_cpu_char 
*cp)
KVM_PPC_CPU_CHAR_BR_HINT_HONOURED |
KVM_PPC_CPU_CHAR_MTTRIG_THR_RECONF |
KVM_PPC_CPU_CHAR_COUNT_CACHE_DIS |
-   KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST;
+   KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST |
+   KVM_PPC_CPU_CHAR_BCCTR_LINK_FLUSH_ASSIST;
cp->behaviour_mask = KVM_PPC_CPU_BEHAV_FAVOUR_SECURITY |
KVM_PPC_CPU_BEHAV_L1D_FLUSH_PR |
KVM_PPC_CPU_BEHAV_BNDS_CHK_SPEC_BAR |
@@ -2287,13 +2288,17 @@ static int kvmppc_get_cpu_char(struct kvm_ppc_cpu_char 
*cp)
if (have_fw_feat(fw_features, "enabled",
 "fw-count-cache-flush-bcctr2,0,0"))
cp->character |= KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST;
+   if (have_fw_feat(fw_features, "enabled",
+"fw-link-stack-flush-bcctr2,0,0"))
+   cp->character |= 
KVM_PPC_CPU_CHAR_BCCTR_LINK_FLUSH_ASSIST;
cp->character_mask = KVM_PPC_CPU_CHAR_SPEC_BAR_ORI31 |
KVM_PPC_CPU_CHAR_BCCTRL_SERIALISED |
KVM_PPC_CPU_CHAR_L1D_FLUSH_ORI30 |
KVM_PPC_CPU_CHAR_L1D_FLUSH_TRIG2 |
KVM_PPC_CPU_CHAR_L1D_THREAD_PRIV |
KVM_PPC_CPU_CHAR_COUNT_CACHE_DIS |
-   KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST;
+   KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST |
+   KVM_PPC_CPU_CHAR_BCCTR_LINK_FLUSH_ASSIST;
 
if (have_fw_feat(fw_features, "enabled",
 "speculation-policy-favor-security"))
diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index 3bc188da82ba..1a06d3b4c0a9 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -78,6 +78,9 @@ static void init_fw_feat_flags(struct device_node *np)
if (fw_feature_is("enabled", "fw-count-cache-flush-bcctr2,0,0", np))
security_ftr_set(SEC_FTR_BCCTR_FLUSH_ASSIST);
 
+   if (fw_feature_is("enabled", "fw-link-stack-flush-bcctr2,0,0", np))
+   security_ftr_set(SEC_FTR_BCCTR_LINK_FLUSH_ASSIST);
+
if (fw_feature_is("enabled", 
"needs-count-cache-flush-on-context-switch", np))
security_ftr_set(SEC_FTR_FLUSH_COUNT_CACHE);
 
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 64d18f4bf093..70c9264f23c5 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -517,6 +517,9 @@ static void init_cpu_char_feature_flags(struct 
h_cpu_char_result *result)
if (result->character & H_CPU_CHAR_BCCTR_FLUSH_ASSIST)
security_ftr_set(SEC_FTR_BCCTR_FLUSH_ASSIST);
 
+   if (result->character & H_CPU_CHAR_BCCTR_LINK_FLUSH_ASSIST)
+   security_ftr_set(S
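
(A brief aside on the numbering in the hvcall.h hunk above, my annotation
rather than the patch's: the "// IBM bit N" comments use big-endian bit
numbering, where bit 0 is the most significant bit of the doubleword, so
IBM bit n corresponds to 1ull << (63 - n). A sketch with a hypothetical
macro name:)

/* Sketch: IBM (MSB = bit 0) bit number to the usual shift form. */
#define IBM_BIT(n)	(1ull << (63 - (n)))
/* IBM_BIT(9)  == 1ull << 54, matching H_CPU_CHAR_BCCTR_FLUSH_ASSIST      */
/* IBM_BIT(10) == 1ull << 53, matching H_CPU_CHAR_BCCTR_LINK_FLUSH_ASSIST */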


Re: [PATCH] selftests: powerpc: Fix online CPU selection

2020-06-09 Thread Sandipan Das



On 08/06/20 8:12 pm, Sandipan Das wrote:
> The size of the cpu set must be large enough for systems
> with a very large number of CPUs. Otherwise, tests which
> try to determine the first online CPU by calling
> sched_getaffinity() will fail. This makes sure that the
> size of the allocated cpu set is dependent on the number
> of CPUs as reported by get_nprocs().
> 
> Fixes: 3752e453f6ba ("selftests/powerpc: Add tests of PMU EBBs")
> Reported-by: Shirisha Ganta 
> Signed-off-by: Sandipan Das 
> ---
>  tools/testing/selftests/powerpc/utils.c | 33 -
>  1 file changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/testing/selftests/powerpc/utils.c 
> b/tools/testing/selftests/powerpc/utils.c
> index 933678f1ed0a..bb8e402752c0 100644
> --- a/tools/testing/selftests/powerpc/utils.c
> +++ b/tools/testing/selftests/powerpc/utils.c
> @@ -16,6 +16,7 @@
> @@ -88,28 +89,36 @@ void *get_auxv_entry(int type)
> [...] 
>  int pick_online_cpu(void)
>  {
> - cpu_set_t mask;
> - int cpu;
> + int ncpus, cpu = -1;
> + cpu_set_t *mask;
> + size_t size;
>  
> - CPU_ZERO(&mask);
> + ncpus = get_nprocs();
> + size = CPU_ALLOC_SIZE(ncpus);
> + mask = CPU_ALLOC(ncpus);
>  
> - if (sched_getaffinity(0, sizeof(mask), &mask)) {
> + CPU_ZERO_S(size, mask);
> +
> + if (sched_getaffinity(0, size, mask)) {
>   perror("sched_getaffinity");
> - return -1;
> + goto done;
>   }
>  
>   /* We prefer a primary thread, but skip 0 */
> - for (cpu = 8; cpu < CPU_SETSIZE; cpu += 8)
> - if (CPU_ISSET(cpu, &mask))
> - return cpu;
> + for (cpu = 8; cpu < ncpus; cpu += 8)
> + if (CPU_ISSET_S(cpu, size, mask))
> + goto done;
>  
>   /* Search for anything, but in reverse */
> - for (cpu = CPU_SETSIZE - 1; cpu >= 0; cpu--)
> - if (CPU_ISSET(cpu, &mask))
> - return cpu;
> + for (cpu = ncpus - 1; cpu >= 0; cpu--)
> + if (CPU_ISSET_S(cpu, size, mask))
> + goto done;
>  
>   printf("No cpus in affinity mask?!\n");
> - return -1;

There's a bug here as cpu should have been set to -1.
Will send v2 with this fix.

> +
> +done:
> + CPU_FREE(mask);
> + return cpu;
>  }
>  
>  bool is_ppc64le(void)
> 

- Sandipan


Re: [PATCH] powerpc/powernv: Fix a warning message

2020-06-09 Thread Michael Ellerman
On Sat, 2020-05-02 at 11:59:49 UTC, Christophe JAILLET wrote:
> Fix a cut'n'paste error in a warning message. This should be
> 'cpu-idle-state-residency-ns' to match the property searched in the
> previous 'of_property_read_u32_array()'
> 
> Fixes: 9c7b185ab2fe ("powernv/cpuidle: Parse dt idle properties into global 
> structure")
> Signed-off-by: Christophe JAILLET 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/2f62870ca5bc9d305f3c212192320c29e9dbdc54

cheers


Re: [PATCH v3 2/5] powerpc: module_[32|64].c: replace swap function with built-in one

2020-06-09 Thread Michael Ellerman
On Tue, 2019-04-02 at 20:47:22 UTC, Andrey Abramov wrote:
> Replace relaswap with the built-in one, because relaswap
> does a simple byte-to-byte swap.
> 
> Since Spectre mitigations have made indirect function calls more
> expensive, and the default simple byte copies swap is implemented
> without them, an "optimized" custom swap function is now
> a waste of time as well as code.
> 
> Signed-off-by: Andrey Abramov 
> Reviewed by: George Spelvin 
> Acked-by: Michael Ellerman  (powerpc)

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/bac7ca7b985b72873bd4ac2553b13b5af5b1f08a

cheers


Re: [PATCH v3 7/9] powerpc/ps3: Add check for otheros image size

2020-06-09 Thread Michael Ellerman
On Sat, 2020-05-16 at 16:20:46 UTC, Geoff Levand wrote:
> The ps3's otheros flash loader has a size limit of 16 MiB for the
> uncompressed image.  If that limit is reached, output the
> flash image file as 'otheros-too-big.bld'.
> 
> Signed-off-by: Geoff Levand 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/aa3bc365ee73765af5059678bf55b0f3e4a3e6c4

cheers


Re: [PATCH 0/6] assorted kuap fixes (try again)

2020-06-09 Thread Michael Ellerman
On Wed, 29 Apr 2020 16:56:48 +1000, Nicholas Piggin wrote:
> Well the last series was a disaster, I'll try again sending the
> patches with proper subject and changelogs written.
> 
> Nicholas Piggin (6):
>   powerpc/64/kuap: move kuap checks out of MSR[RI]=0 regions of exit
> code
>   powerpc/64s/kuap: kuap_restore missing isync
>   powerpc/64/kuap: interrupt exit conditionally restore AMR
>   powerpc/64s/kuap: restore AMR in system reset exception
>   powerpc/64s/kuap: restore AMR in fast_interrupt_return
>   powerpc/64s/kuap: conditionally restore AMR in kuap_restore_amr asm
> 
> [...]

Patches 2, 3 and 6 applied to powerpc/next.

[2/6] powerpc/64s/kuap: Add missing isync to KUAP restore paths
  https://git.kernel.org/powerpc/c/cb2b53cbffe3c388cd676b63f34e54ceb2643ae2
[3/6] powerpc/64/kuap: Conditionally restore AMR in interrupt exit
  https://git.kernel.org/powerpc/c/579940bb451c2dd33396d2d56ce6ef5d92154b3b
[6/6] powerpc/64s/kuap: Conditionally restore AMR in kuap_restore_amr asm
  https://git.kernel.org/powerpc/c/d4539074b0e9c5fa6508e8c33aaf51abc8ff6e91

cheers