Re: [PATCH] nvdimm/of_pmem: Provide a unique name for bus provider

2019-08-06 Thread Vaibhav Jain
"Aneesh Kumar K.V"  writes:

> The ndctl utility requires each ndbus to have a unique name. If not,
> while enumerating buses in userspace it drops buses with duplicate
> names, so the devices beneath those buses are not listed.
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  drivers/nvdimm/of_pmem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
> index a0c8dcfa0bf9..97187d6c0bdb 100644
> --- a/drivers/nvdimm/of_pmem.c
> +++ b/drivers/nvdimm/of_pmem.c
> @@ -42,7 +42,7 @@ static int of_pmem_region_probe(struct platform_device *pdev)
>   return -ENOMEM;
>  
>   priv->bus_desc.attr_groups = bus_attr_groups;
> - priv->bus_desc.provider_name = "of_pmem";
> + priv->bus_desc.provider_name = kstrdup(pdev->name, GFP_KERNEL);
>   priv->bus_desc.module = THIS_MODULE;
>   priv->bus_desc.of_node = np;
>  
> -- 
> 2.21.0
>

Tested-by: Vaibhav Jain 
-- 
Vaibhav Jain 
Linux Technology Center, IBM India Pvt. Ltd.



Re: [PATCH] nvdimm/of_pmem: Provide a unique name for bus provider

2019-08-06 Thread Dan Williams
On Tue, Aug 6, 2019 at 9:17 PM Aneesh Kumar K.V
 wrote:
>
> On 8/7/19 9:43 AM, Dan Williams wrote:
> > On Tue, Aug 6, 2019 at 9:00 PM Aneesh Kumar K.V
> >  wrote:
> >>
> >> The ndctl utility requires each ndbus to have a unique name. If not,
> >> while enumerating buses in userspace it drops buses with duplicate
> >> names, so the devices beneath those buses are not listed.
> >
> > It does?
> >
> >>
> >> Signed-off-by: Aneesh Kumar K.V 
> >> ---
> >>   drivers/nvdimm/of_pmem.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
> >> index a0c8dcfa0bf9..97187d6c0bdb 100644
> >> --- a/drivers/nvdimm/of_pmem.c
> >> +++ b/drivers/nvdimm/of_pmem.c
> >> @@ -42,7 +42,7 @@ static int of_pmem_region_probe(struct platform_device *pdev)
> >>  return -ENOMEM;
> >>
> >>  priv->bus_desc.attr_groups = bus_attr_groups;
> >> -   priv->bus_desc.provider_name = "of_pmem";
> >> +   priv->bus_desc.provider_name = kstrdup(pdev->name, GFP_KERNEL);
> >
> > This looks ok to me to address support for older ndctl binaries, but
> > I'd like to also fix the ndctl bug that makes non-unique provider
> > names fail.
> >
>
> 0462269ab121d323a016874ebdd42217f2911ee7 (ndctl: provide a method to
> invalidate the bus list)
>
> This hunk does the filtering.
>
> @@ -928,6 +929,14 @@ static int add_bus(void *parent, int id, const char *ctl_base)
> goto err_read;
> bus->buf_len = strlen(bus->bus_path) + 50;
>
> +   ndctl_bus_foreach(ctx, bus_dup)
> +   if (strcmp(ndctl_bus_get_provider(bus_dup),
> +   ndctl_bus_get_provider(bus)) == 0) {
> +   free_bus(bus, NULL);
> +   free(path);
> +   return 1;
> +   }
> +

Yup, that's broken, does this incremental fix work?

diff --git a/ndctl/lib/libndctl.c b/ndctl/lib/libndctl.c
index 4d9cc7e29c6b..6596f94edef8 100644
--- a/ndctl/lib/libndctl.c
+++ b/ndctl/lib/libndctl.c
@@ -889,7 +889,9 @@ static void *add_bus(void *parent, int id, const char *ctl_base)

ndctl_bus_foreach(ctx, bus_dup)
if (strcmp(ndctl_bus_get_provider(bus_dup),
-   ndctl_bus_get_provider(bus)) == 0) {
+   ndctl_bus_get_provider(bus)) == 0
+   && strcmp(ndctl_bus_get_devname(bus_dup),
+   ndctl_bus_get_devname(bus)) == 0) {
free_bus(bus, NULL);
free(path);
return bus_dup;


Re: [PATCH] nvdimm/of_pmem: Provide a unique name for bus provider

2019-08-06 Thread Aneesh Kumar K.V

On 8/7/19 9:43 AM, Dan Williams wrote:

On Tue, Aug 6, 2019 at 9:00 PM Aneesh Kumar K.V
 wrote:


The ndctl utility requires each ndbus to have a unique name. If not,
while enumerating buses in userspace it drops buses with duplicate
names, so the devices beneath those buses are not listed.


It does?



Signed-off-by: Aneesh Kumar K.V 
---
  drivers/nvdimm/of_pmem.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
index a0c8dcfa0bf9..97187d6c0bdb 100644
--- a/drivers/nvdimm/of_pmem.c
+++ b/drivers/nvdimm/of_pmem.c
@@ -42,7 +42,7 @@ static int of_pmem_region_probe(struct platform_device *pdev)
 return -ENOMEM;

 priv->bus_desc.attr_groups = bus_attr_groups;
-   priv->bus_desc.provider_name = "of_pmem";
+   priv->bus_desc.provider_name = kstrdup(pdev->name, GFP_KERNEL);


This looks ok to me to address support for older ndctl binaries, but
I'd like to also fix the ndctl bug that makes non-unique provider
names fail.



0462269ab121d323a016874ebdd42217f2911ee7 (ndctl: provide a method to 
invalidate the bus list)


This hunk does the filtering.

@@ -928,6 +929,14 @@ static int add_bus(void *parent, int id, const char *ctl_base)

goto err_read;
bus->buf_len = strlen(bus->bus_path) + 50;

+   ndctl_bus_foreach(ctx, bus_dup)
+   if (strcmp(ndctl_bus_get_provider(bus_dup),
+   ndctl_bus_get_provider(bus)) == 0) {
+   free_bus(bus, NULL);
+   free(path);
+   return 1;
+   }
+
list_add(&ctx->busses, &bus->list);

rc = 0;



Re: [PATCH] nvdimm/of_pmem: Provide a unique name for bus provider

2019-08-06 Thread Dan Williams
On Tue, Aug 6, 2019 at 9:00 PM Aneesh Kumar K.V
 wrote:
>
> The ndctl utility requires each ndbus to have a unique name. If not,
> while enumerating buses in userspace it drops buses with duplicate
> names, so the devices beneath those buses are not listed.

It does?

>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  drivers/nvdimm/of_pmem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
> index a0c8dcfa0bf9..97187d6c0bdb 100644
> --- a/drivers/nvdimm/of_pmem.c
> +++ b/drivers/nvdimm/of_pmem.c
> @@ -42,7 +42,7 @@ static int of_pmem_region_probe(struct platform_device *pdev)
> return -ENOMEM;
>
> priv->bus_desc.attr_groups = bus_attr_groups;
> -   priv->bus_desc.provider_name = "of_pmem";
> +   priv->bus_desc.provider_name = kstrdup(pdev->name, GFP_KERNEL);

This looks ok to me to address support for older ndctl binaries, but
I'd like to also fix the ndctl bug that makes non-unique provider
names fail.


[PATCH] nvdimm/of_pmem: Provide a unique name for bus provider

2019-08-06 Thread Aneesh Kumar K.V
The ndctl utility requires each ndbus to have a unique name. If not,
while enumerating buses in userspace it drops buses with duplicate
names, so the devices beneath those buses are not listed.

Signed-off-by: Aneesh Kumar K.V 
---
 drivers/nvdimm/of_pmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
index a0c8dcfa0bf9..97187d6c0bdb 100644
--- a/drivers/nvdimm/of_pmem.c
+++ b/drivers/nvdimm/of_pmem.c
@@ -42,7 +42,7 @@ static int of_pmem_region_probe(struct platform_device *pdev)
return -ENOMEM;
 
priv->bus_desc.attr_groups = bus_attr_groups;
-   priv->bus_desc.provider_name = "of_pmem";
+   priv->bus_desc.provider_name = kstrdup(pdev->name, GFP_KERNEL);
priv->bus_desc.module = THIS_MODULE;
priv->bus_desc.of_node = np;
 
-- 
2.21.0



[PATCH v4 9/9] powerpc/eeh: Convert log messages to eeh_edev_* macros

2019-08-06 Thread Sam Bobroff
Convert existing messages, where appropriate, to use the eeh_edev_*
logging macros.

The only effect should be minor adjustments to the log messages, apart
from:

- A new message in pseries_eeh_probe() "Probing device" to match the
powernv case.
- The "Probing device" message in pnv_eeh_probe() is now generated
slightly later, which will mean that it is no longer emitted for
devices that aren't probed due to the initial checks.

Signed-off-by: Sam Bobroff 
---
v4 - Fixed compile warning when compiling without CONFIG_IOV.

 arch/powerpc/include/asm/ppc-pci.h   |  5 --
 arch/powerpc/kernel/eeh.c| 19 +++-
 arch/powerpc/kernel/eeh_cache.c  |  8 +--
 arch/powerpc/kernel/eeh_driver.c |  7 +--
 arch/powerpc/kernel/eeh_pe.c | 51 +---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 17 ++-
 arch/powerpc/platforms/pseries/eeh_pseries.c | 21 +++-
 7 files changed, 39 insertions(+), 89 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index 72860de205a0..7f4be5a05eb3 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -62,11 +62,6 @@ void eeh_pe_dev_mode_mark(struct eeh_pe *pe, int mode);
 void eeh_sysfs_add_device(struct pci_dev *pdev);
 void eeh_sysfs_remove_device(struct pci_dev *pdev);
 
-static inline const char *eeh_pci_name(struct pci_dev *pdev) 
-{ 
-   return pdev ? pci_name(pdev) : "";
-} 
-
 static inline const char *eeh_driver_name(struct pci_dev *pdev)
 {
return (pdev && pdev->driver) ? pdev->driver->name : "";
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index c0ec1b6b1e69..b6683f367f7f 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -461,8 +461,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
/* Access to IO BARs might get this far and still not want checking. */
if (!pe) {
eeh_stats.ignored_check++;
-   pr_debug("EEH: Ignored check for %s\n",
-   eeh_pci_name(dev));
+   eeh_edev_dbg(edev, "Ignored check\n");
return 0;
}
 
@@ -502,12 +501,11 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
if (dn)
location = of_get_property(dn, "ibm,loc-code",
NULL);
-   printk(KERN_ERR "EEH: %d reads ignored for recovering 
device at "
-   "location=%s driver=%s pci addr=%s\n",
+   eeh_edev_err(edev, "%d reads ignored for recovering 
device at location=%s driver=%s\n",
pe->check_count,
location ? location : "unknown",
-   eeh_driver_name(dev), eeh_pci_name(dev));
-   printk(KERN_ERR "EEH: Might be infinite loop in %s 
driver\n",
+   eeh_driver_name(dev));
+   eeh_edev_err(edev, "Might be infinite loop in %s 
driver\n",
eeh_driver_name(dev));
dump_stack();
}
@@ -1268,12 +1266,11 @@ void eeh_add_device_late(struct pci_dev *dev)
if (!dev)
return;
 
-   pr_debug("EEH: Adding device %s\n", pci_name(dev));
-
pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
edev = pdn_to_eeh_dev(pdn);
+   eeh_edev_dbg(edev, "Adding device\n");
if (edev->pdev == dev) {
-   pr_debug("EEH: Device %s already referenced!\n", pci_name(dev));
+   eeh_edev_dbg(edev, "Device already referenced!\n");
return;
}
 
@@ -1374,10 +1371,10 @@ void eeh_remove_device(struct pci_dev *dev)
edev = pci_dev_to_eeh_dev(dev);
 
/* Unregister the device with the EEH/PCI address search system */
-   pr_debug("EEH: Removing device %s\n", pci_name(dev));
+	dev_dbg(&dev->dev, "EEH: Removing device\n");
 
if (!edev || !edev->pdev || !edev->pe) {
-   pr_debug("EEH: Not referenced !\n");
+	dev_dbg(&dev->dev, "EEH: Device not referenced!\n");
return;
}
 
diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index 8c8649172e97..45360b9eab90 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -145,8 +145,8 @@ eeh_addr_cache_insert(struct pci_dev *dev, resource_size_t alo,
piar->pcidev = dev;
piar->flags = flags;
 
-   pr_debug("PIAR: insert range=[%pap:%pap] dev=%s\n",
-, , pci_name(dev));
+   eeh_edev_dbg(piar->edev, "PIAR: insert range=[%pap:%pap]\n",
+, );
 
	rb_link_node(&piar->rb_node, parent, p);
	rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);
@@ -226,8 +226,8 @@ static inline void __eeh_addr_cache_rmv_dev(struct pci_dev 

[PATCH v4 6/9] powerpc/eeh: Refactor around eeh_probe_devices()

2019-08-06 Thread Sam Bobroff
Now that EEH support for all devices (on PowerNV and pSeries) is
provided by the pcibios bus add device hooks, eeh_probe_devices() and
eeh_addr_cache_build() are redundant and can be removed.

Move the EEH enabled message into its own function so that it can be
called from multiple places.

Note that previously on pSeries, useless EEH sysfs files were created
for some devices that did not have EEH support; this change prevents
them from being created.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/include/asm/eeh.h   |  7 ++---
 arch/powerpc/kernel/eeh.c| 27 ++---
 arch/powerpc/kernel/eeh_cache.c  | 32 
 arch/powerpc/platforms/powernv/eeh-powernv.c |  4 +--
 arch/powerpc/platforms/pseries/pci.c |  3 +-
 5 files changed, 14 insertions(+), 59 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 20105964287a..7f9404a0c3bb 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -270,13 +270,12 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 
 struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
-void eeh_probe_devices(void);
+void eeh_show_enabled(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
 int eeh_check_failure(const volatile void __iomem *token);
 int eeh_dev_check_failure(struct eeh_dev *edev);
 void eeh_addr_cache_init(void);
-void eeh_addr_cache_build(void);
 void eeh_add_device_early(struct pci_dn *);
 void eeh_add_device_tree_early(struct pci_dn *);
 void eeh_add_device_late(struct pci_dev *);
@@ -320,7 +319,7 @@ static inline bool eeh_enabled(void)
 return false;
 }
 
-static inline void eeh_probe_devices(void) { }
+static inline void eeh_show_enabled(void) { }
 
 static inline void *eeh_dev_init(struct pci_dn *pdn, void *data)
 {
@@ -338,8 +337,6 @@ static inline int eeh_check_failure(const volatile void __iomem *token)
 
 static inline void eeh_addr_cache_init(void) { }
 
-static inline void eeh_addr_cache_build(void) { }
-
 static inline void eeh_add_device_early(struct pci_dn *pdn) { }
 
 static inline void eeh_add_device_tree_early(struct pci_dn *pdn) { }
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 87edac6f2fd9..c0ec1b6b1e69 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -150,6 +150,16 @@ static int __init eeh_setup(char *str)
 }
 __setup("eeh=", eeh_setup);
 
+void eeh_show_enabled(void)
+{
+   if (eeh_has_flag(EEH_FORCE_DISABLED))
+   pr_info("EEH: Recovery disabled by kernel parameter.\n");
+   else if (eeh_has_flag(EEH_ENABLED))
+   pr_info("EEH: Capable adapter found: recovery enabled.\n");
+   else
+   pr_info("EEH: No capable adapters found: recovery disabled.\n");
+}
+
 /*
  * This routine captures assorted PCI configuration space data
  * for the indicated PCI device, and puts them into a buffer
@@ -1143,23 +1153,6 @@ static struct notifier_block eeh_reboot_nb = {
.notifier_call = eeh_reboot_notifier,
 };
 
-void eeh_probe_devices(void)
-{
-   struct pci_controller *hose, *tmp;
-   struct pci_dn *pdn;
-
-   /* Enable EEH for all adapters */
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-   pdn = hose->pci_data;
-   traverse_pci_dn(pdn, eeh_ops->probe, NULL);
-   }
-   if (eeh_enabled())
-   pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n");
-   else
-   pr_info("EEH: No capable adapters found\n");
-
-}
-
 /**
  * eeh_init - EEH initialization
  *
diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index a790fa49c62d..8c8649172e97 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -265,38 +265,6 @@ void eeh_addr_cache_init(void)
	spin_lock_init(&pci_io_addr_cache_root.piar_lock);
 }
 
-/**
- * eeh_addr_cache_build - Build a cache of I/O addresses
- *
- * Build a cache of pci i/o addresses.  This cache will be used to
- * find the pci device that corresponds to a given address.
- * This routine scans all pci busses to build the cache.
- * Must be run late in boot process, after the pci controllers
- * have been scanned for devices (after all device resources are known).
- */
-void eeh_addr_cache_build(void)
-{
-   struct pci_dn *pdn;
-   struct eeh_dev *edev;
-   struct pci_dev *dev = NULL;
-
-   for_each_pci_dev(dev) {
-   pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
-   if (!pdn)
-   continue;
-
-   edev = pdn_to_eeh_dev(pdn);
-   if (!edev)
-   continue;
-
-   dev->dev.archdata.edev = edev;
-   edev->pdev = dev;
-
-   eeh_addr_cache_insert_dev(dev);
-   

[PATCH v4 7/9] powerpc/eeh: Add bdfn field to eeh_dev

2019-08-06 Thread Sam Bobroff
From: Oliver O'Halloran 

Preparation for removing pci_dn from the powernv EEH code. The only
thing we really use pci_dn for is to get the bdfn of the device for
config space accesses, so adding that information to eeh_dev reduces
the need to carry around the pci_dn.

Signed-off-by: Oliver O'Halloran 
[SB: Re-wrapped commit message, fixed whitespace damage.]
Signed-off-by: Sam Bobroff 
---
 arch/powerpc/include/asm/eeh.h | 2 ++
 arch/powerpc/include/asm/ppc-pci.h | 2 ++
 arch/powerpc/kernel/eeh_dev.c  | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 7f9404a0c3bb..bbe0798f6624 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -121,6 +121,8 @@ static inline bool eeh_pe_passed(struct eeh_pe *pe)
 struct eeh_dev {
int mode;   /* EEH mode */
int class_code; /* Class code of the device */
+   int bdfn;   /* bdfn of device (for cfg ops) */
+   struct pci_controller *controller;
int pe_config_addr; /* PE config address*/
u32 config_space[16];   /* Saved PCI config space   */
int pcix_cap;   /* Saved PCIx capability*/
diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index cec2d6409515..72860de205a0 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -74,6 +74,8 @@ static inline const char *eeh_driver_name(struct pci_dev *pdev)
 
 #endif /* CONFIG_EEH */
 
+#define PCI_BUSNO(bdfn) ((bdfn >> 8) & 0xff)
+
 #else /* CONFIG_PCI */
 static inline void init_pci_config_tokens(void) { }
 #endif /* !CONFIG_PCI */
diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
index c4317c452d98..7370185c7a05 100644
--- a/arch/powerpc/kernel/eeh_dev.c
+++ b/arch/powerpc/kernel/eeh_dev.c
@@ -47,6 +47,8 @@ struct eeh_dev *eeh_dev_init(struct pci_dn *pdn)
/* Associate EEH device with OF node */
pdn->edev = edev;
edev->pdn = pdn;
+   edev->bdfn = (pdn->busno << 8) | pdn->devfn;
+   edev->controller = pdn->phb;
 
return edev;
 }
-- 
2.22.0.216.g00a2a96fc9



[PATCH v4 2/9] powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag

2019-08-06 Thread Sam Bobroff
The EEH_DEV_NO_HANDLER flag is used by the EEH system to prevent the
use of driver callbacks in drivers that have been bound part way
through the recovery process. This is necessary to prevent later stage
handlers from being called when the earlier stage handlers haven't,
which can be confusing for drivers.

However, the flag is set for all devices that are added after boot
time and only cleared at the end of the EEH recovery process. This
results in hot plugged devices erroneously having the flag set during
the first recovery after they are added (causing their driver's
handlers to be incorrectly ignored).

To remedy this, clear the flag at the beginning of recovery
processing. The flag is still cleared at the end of recovery
processing, although it is no longer really necessary.

Also clear the flag during eeh_handle_special_event(), for the same
reasons.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/kernel/eeh_driver.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 6f3ee30565dd..d6f54840a3a9 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -819,6 +819,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
result = PCI_ERS_RESULT_DISCONNECT;
}
 
+   eeh_for_each_pe(pe, tmp_pe)
+   eeh_pe_for_each_dev(tmp_pe, edev, tmp)
+   edev->mode &= ~EEH_DEV_NO_HANDLER;
+
/* Walk the various device drivers attached to this slot through
 * a reset sequence, giving each an opportunity to do what it needs
 * to accomplish the reset.  Each child gets a report of the
@@ -1009,7 +1013,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
  */
 void eeh_handle_special_event(void)
 {
-   struct eeh_pe *pe, *phb_pe;
+   struct eeh_pe *pe, *phb_pe, *tmp_pe;
+   struct eeh_dev *edev, *tmp_edev;
struct pci_bus *bus;
struct pci_controller *hose;
unsigned long flags;
@@ -1078,6 +1083,10 @@ void eeh_handle_special_event(void)
(phb_pe->state & EEH_PE_RECOVERING))
continue;
 
+   eeh_for_each_pe(pe, tmp_pe)
+				eeh_pe_for_each_dev(tmp_pe, edev, tmp_edev)
+					edev->mode &= ~EEH_DEV_NO_HANDLER;
+
/* Notify all devices to be down */
eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true);
			eeh_set_channel_state(pe, pci_channel_io_perm_failure);
-- 
2.22.0.216.g00a2a96fc9



[PATCH v4 1/9] powerpc/64: Adjust order in pcibios_init()

2019-08-06 Thread Sam Bobroff
The pcibios_init() function for 64-bit PowerPC currently calls
pci_bus_add_devices() before pcibios_resource_survey(), which seems
incorrect because it adds devices and attempts to bind their drivers
before allocating their resources (although no problems are apparent).

So move the call to pci_bus_add_devices() to after
pcibios_resource_survey(), while extracting the call to the
pcibios_fixup() hook so that it remains in the same location.

This will also allow the ppc_md.pcibios_bus_add_device() hooks to
perform actions that depend on PCI resources, both during rescanning
(where this is already the case) and at boot time, to support future
work.

Signed-off-by: Sam Bobroff 
Reviewed-by: Alexey Kardashevskiy 
---
 arch/powerpc/kernel/pci-common.c |  4 
 arch/powerpc/kernel/pci_32.c |  4 
 arch/powerpc/kernel/pci_64.c | 12 +---
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index f627e15bb43c..1c448cf25506 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1379,10 +1379,6 @@ void __init pcibios_resource_survey(void)
pr_debug("PCI: Assigning unassigned resources...\n");
pci_assign_unassigned_resources();
}
-
-   /* Call machine dependent fixup */
-   if (ppc_md.pcibios_fixup)
-   ppc_md.pcibios_fixup();
 }
 
 /* This is used by the PCI hotplug driver to allocate resource
diff --git a/arch/powerpc/kernel/pci_32.c b/arch/powerpc/kernel/pci_32.c
index 50942a1d1a5f..b49e1060a3bf 100644
--- a/arch/powerpc/kernel/pci_32.c
+++ b/arch/powerpc/kernel/pci_32.c
@@ -263,6 +263,10 @@ static int __init pcibios_init(void)
/* Call common code to handle resource allocation */
pcibios_resource_survey();
 
+   /* Call machine dependent fixup */
+   if (ppc_md.pcibios_fixup)
+   ppc_md.pcibios_fixup();
+
/* Call machine dependent post-init code */
if (ppc_md.pcibios_after_init)
ppc_md.pcibios_after_init();
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index b7030b1189d0..f83d1f69b1dd 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -54,14 +54,20 @@ static int __init pcibios_init(void)
pci_add_flags(PCI_ENABLE_PROC_DOMAINS | PCI_COMPAT_DOMAIN_0);
 
/* Scan all of the recorded PCI controllers.  */
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
+	list_for_each_entry_safe(hose, tmp, &hose_list, list_node)
 		pcibios_scan_phb(hose);
-		pci_bus_add_devices(hose->bus);
-	}
-   }
 
/* Call common code to handle resource allocation */
pcibios_resource_survey();
 
+   /* Add devices. */
+	list_for_each_entry_safe(hose, tmp, &hose_list, list_node)
+   pci_bus_add_devices(hose->bus);
+
+   /* Call machine dependent fixup */
+   if (ppc_md.pcibios_fixup)
+   ppc_md.pcibios_fixup();
+
printk(KERN_DEBUG "PCI: Probing PCI hardware done\n");
 
return 0;
-- 
2.22.0.216.g00a2a96fc9



[PATCH v4 5/9] powerpc/eeh: EEH for pSeries hot plug

2019-08-06 Thread Sam Bobroff
On PowerNV and pSeries, devices currently acquire EEH support from
several different places: Boot-time devices from eeh_probe_devices()
and eeh_addr_cache_build(), Virtual Function devices from the pcibios
bus add device hooks and hot plugged devices from pci_hp_add_devices()
(with other platforms using other methods as well).  Unfortunately,
pSeries machines currently discover hot plugged devices using
pci_rescan_bus(), not pci_hp_add_devices(), and so those devices do
not receive EEH support.

Rather than adding another case for pci_rescan_bus(), this change
widens the scope of the pcibios bus add device hooks so that they can
handle all devices. As a side effect this also supports devices
discovered after manually rescanning via /sys/bus/pci/rescan.

Note that on PowerNV, this change allows the EEH subsystem to become
enabled after boot as long as it has not been forced off, which was
not previously possible (it was already possible on pSeries).

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/kernel/eeh.c|  2 +-
 arch/powerpc/kernel/of_platform.c|  3 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c | 39 +-
 arch/powerpc/platforms/pseries/eeh_pseries.c | 54 ++--
 4 files changed, 56 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index ca8b0c58a6a7..87edac6f2fd9 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1272,7 +1272,7 @@ void eeh_add_device_late(struct pci_dev *dev)
struct pci_dn *pdn;
struct eeh_dev *edev;
 
-   if (!dev || !eeh_enabled())
+   if (!dev)
return;
 
pr_debug("EEH: Adding device %s\n", pci_name(dev));
diff --git a/arch/powerpc/kernel/of_platform.c b/arch/powerpc/kernel/of_platform.c
index 427fc22f72b6..11c807468ab5 100644
--- a/arch/powerpc/kernel/of_platform.c
+++ b/arch/powerpc/kernel/of_platform.c
@@ -81,7 +81,8 @@ static int of_pci_phb_probe(struct platform_device *dev)
pcibios_claim_one_bus(phb->bus);
 
/* Finish EEH setup */
-   eeh_add_device_tree_late(phb->bus);
+   if (!eeh_has_flag(EEH_FORCE_DISABLED))
+   eeh_add_device_tree_late(phb->bus);
 
/* Add probed PCI devices to the device model */
pci_bus_add_devices(phb->bus);
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 629f9390d9af..77cc2f51c2ea 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -43,7 +43,7 @@ void pnv_pcibios_bus_add_device(struct pci_dev *pdev)
 {
struct pci_dn *pdn = pci_get_pdn(pdev);
 
-   if (!pdev->is_virtfn)
+   if (eeh_has_flag(EEH_FORCE_DISABLED))
return;
 
pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
@@ -222,6 +222,25 @@ static const struct file_operations 
eeh_tree_state_debugfs_ops = {
 
 #endif /* CONFIG_DEBUG_FS */
 
+void pnv_eeh_enable_phbs(void)
+{
+   struct pci_controller *hose;
+   struct pnv_phb *phb;
+
+	list_for_each_entry(hose, &hose_list, list_node) {
+   phb = hose->private_data;
+   /*
+* If EEH is enabled, we're going to rely on that.
+* Otherwise, we restore to conventional mechanism
+* to clear frozen PE during PCI config access.
+*/
+   if (eeh_enabled())
+   phb->flags |= PNV_PHB_FLAG_EEH;
+   else
+   phb->flags &= ~PNV_PHB_FLAG_EEH;
+   }
+}
+
 /**
  * pnv_eeh_post_init - EEH platform dependent post initialization
  *
@@ -260,19 +279,11 @@ int pnv_eeh_post_init(void)
if (!eeh_enabled())
disable_irq(eeh_event_irq);
 
+   pnv_eeh_enable_phbs();
+
	list_for_each_entry(hose, &hose_list, list_node) {
phb = hose->private_data;
 
-   /*
-* If EEH is enabled, we're going to rely on that.
-* Otherwise, we restore to conventional mechanism
-* to clear frozen PE during PCI config access.
-*/
-   if (eeh_enabled())
-   phb->flags |= PNV_PHB_FLAG_EEH;
-   else
-   phb->flags &= ~PNV_PHB_FLAG_EEH;
-
/* Create debugfs entries */
 #ifdef CONFIG_DEBUG_FS
if (phb->has_dbgfs || !phb->dbgfs)
@@ -483,7 +494,11 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
 * Enable EEH explicitly so that we will do EEH check
 * while accessing I/O stuff
 */
-   eeh_add_flag(EEH_ENABLED);
+   if (!eeh_has_flag(EEH_ENABLED)) {
+   enable_irq(eeh_event_irq);
+   pnv_eeh_enable_phbs();
+   eeh_add_flag(EEH_ENABLED);
+   }
 
/* Save memory bars */
eeh_save_bars(edev);
diff --git 

[PATCH v4 8/9] powerpc/eeh: Introduce EEH edev logging macros

2019-08-06 Thread Sam Bobroff
Now that struct eeh_dev includes the BDFN of its PCI device, make use
of it to replace eeh_edev_info() with a set of dev_dbg()-style macros
that only need a struct edev.

With the BDFN available without the struct pci_dev, eeh_pci_name() is
now unnecessary, so remove it.

While only the "info" level function is used here, the others will be
used in followup work.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/include/asm/eeh.h   | 11 +++
 arch/powerpc/kernel/eeh_driver.c | 17 -
 2 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index bbe0798f6624..e1023a556721 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -138,6 +138,17 @@ struct eeh_dev {
struct pci_dev *physfn; /* Associated SRIOV PF  */
 };
 
+/* "fmt" must be a simple literal string */
+#define EEH_EDEV_PRINT(level, edev, fmt, ...) \
+   pr_##level("PCI %04x:%02x:%02x.%x#%04x: EEH: " fmt, \
+   (edev)->controller->global_number, PCI_BUSNO((edev)->bdfn), \
+   PCI_SLOT((edev)->bdfn), PCI_FUNC((edev)->bdfn), \
+	((edev)->pe ? (edev)->pe_config_addr : 0xffff), ##__VA_ARGS__)
+#define eeh_edev_dbg(edev, fmt, ...) EEH_EDEV_PRINT(debug, (edev), fmt, ##__VA_ARGS__)
+#define eeh_edev_info(edev, fmt, ...) EEH_EDEV_PRINT(info, (edev), fmt, ##__VA_ARGS__)
+#define eeh_edev_warn(edev, fmt, ...) EEH_EDEV_PRINT(warn, (edev), fmt, ##__VA_ARGS__)
+#define eeh_edev_err(edev, fmt, ...) EEH_EDEV_PRINT(err, (edev), fmt, ##__VA_ARGS__)
+
 static inline struct pci_dn *eeh_dev_to_pdn(struct eeh_dev *edev)
 {
return edev ? edev->pdn : NULL;
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index d6f54840a3a9..29424d5e5fea 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -82,23 +82,6 @@ static const char *pci_ers_result_name(enum pci_ers_result result)
}
 };
 
-static __printf(2, 3) void eeh_edev_info(const struct eeh_dev *edev,
-const char *fmt, ...)
-{
-   struct va_format vaf;
-   va_list args;
-
-   va_start(args, fmt);
-
-   vaf.fmt = fmt;
-	vaf.va = &args;
-
-   printk(KERN_INFO "EEH: PE#%x (PCI %s): %pV\n", edev->pe_config_addr,
-	       edev->pdev ? dev_name(&edev->pdev->dev) : "none", &vaf);
-
-   va_end(args);
-}
-
 static enum pci_ers_result pci_ers_merge_result(enum pci_ers_result old,
enum pci_ers_result new)
 {
-- 
2.22.0.216.g00a2a96fc9



[PATCH v4 0/9]

2019-08-06 Thread Sam Bobroff
Hi all,

Here is v4, with only a fix for a warning when CONFIG_IOV isn't defined.

Cover letter:

This patch set adds support for EEH recovery of hot plugged devices on pSeries
machines. Specifically, devices discovered by PCI rescanning using
/sys/bus/pci/rescan, which includes devices hotplugged by QEMU's device_add
command. (Upstream Linux pSeries guests running under QEMU/KVM don't currently
use slot power control for hotplugging.)

As a side effect this also provides EEH support for devices removed by
/sys/bus/pci/devices/*/remove and re-discovered by writing to 
/sys/bus/pci/rescan,
on all platforms.

The approach I've taken is to use the fact that the existing
pcibios_bus_add_device() platform hooks (which are used to set up EEH on
Virtual Function devices (VFs)) are actually called for all devices, so I've
widened their scope and made other adjustments necessary to allow them to work
for hotplugged and boot-time devices as well.

Because some of the changes are in generic PowerPC code, it's
possible that I've disturbed something for another PowerPC platform. I've tried
to minimize this by leaving that code alone as much as possible and so there
are a few cases where eeh_add_device_{early,late}() or eeh_add_sysfs_files() is
called more than once. I think these can be looked at later, as duplicate calls
are not harmful.

The first patch is a rework of the pcibios_init reordering patch I posted
earlier, which I've included here because it's necessary for this set.

I have done some testing for PowerNV on Power9 using a modified pnv_php module
and some testing on pSeries with slot power control using a modified rpaphp
module, and the EEH-related parts seem to work.

Cheers,
Sam.

Patch set changelog follows:

Patch set v4: 
Patch 1/9: powerpc/64: Adjust order in pcibios_init()
Patch 2/9: powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
Patch 3/9: powerpc/eeh: Improve debug messages around device addition
Patch 4/9: powerpc/eeh: Initialize EEH address cache earlier
Patch 5/9: powerpc/eeh: EEH for pSeries hot plug
Patch 6/9: powerpc/eeh: Refactor around eeh_probe_devices()
Patch 7/9: powerpc/eeh: Add bdfn field to eeh_dev
Patch 8/9: powerpc/eeh: Introduce EEH edev logging macros
Patch 9/9: powerpc/eeh: Convert log messages to eeh_edev_* macros
- Fixed compile warning when compiling without CONFIG_IOV.

Patch set v3: 
Patch 1/9: powerpc/64: Adjust order in pcibios_init()
Patch 2/9: powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
Patch 3/9: powerpc/eeh: Improve debug messages around device addition
Patch 4/9: powerpc/eeh: Initialize EEH address cache earlier
Patch 5/9: powerpc/eeh: EEH for pSeries hot plug
Patch 6/9: powerpc/eeh: Refactor around eeh_probe_devices()
Patch 7/9 (new in this version): powerpc/eeh: Add bdfn field to eeh_dev
Patch 8/9 (new in this version): powerpc/eeh: Introduce EEH edev logging macros
Patch 9/9 (new in this version): powerpc/eeh: Convert log messages to 
eeh_edev_* macros

Patch set v2: 
Patch 1/6: powerpc/64: Adjust order in pcibios_init()
Patch 2/6: powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
* Also clear EEH_DEV_NO_HANDLER in eeh_handle_special_event().
Patch 3/6 (was 4/8): powerpc/eeh: Improve debug messages around device addition
Patch 4/6 (was 6/8): powerpc/eeh: Initialize EEH address cache earlier
Patch 5/6 (was 3/8 and 7/8): powerpc/eeh: EEH for pSeries hot plug
- Dropped changes to the PowerNV PHB EEH flag, instead refactor just enough to
  use the existing flag from multiple places.
- Merge the little remaining work from the above change into the patch where
  it's used.
Patch 6/6 (was 5/8 and 8/8): powerpc/eeh: Refactor around eeh_probe_devices()
- As it's so small, merged the enablement message patch into this one (where 
it's used).
- Reworked enablement messages.

Patch set v1:
Patch 1/8: powerpc/64: Adjust order in pcibios_init()
Patch 2/8: powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
Patch 3/8: powerpc/eeh: Convert PNV_PHB_FLAG_EEH to global flag
Patch 4/8: powerpc/eeh: Improve debug messages around device addition
Patch 5/8: powerpc/eeh: Add eeh_show_enabled()
Patch 6/8: powerpc/eeh: Initialize EEH address cache earlier
Patch 7/8: powerpc/eeh: EEH for pSeries hot plug
Patch 8/8: powerpc/eeh: Remove eeh_probe_devices() and eeh_addr_cache_build()

Oliver O'Halloran (1):
  powerpc/eeh: Add bdfn field to eeh_dev

Sam Bobroff (8):
  powerpc/64: Adjust order in pcibios_init()
  powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
  powerpc/eeh: Improve debug messages around device addition
  powerpc/eeh: Initialize EEH address cache earlier
  powerpc/eeh: EEH for pSeries hot plug
  powerpc/eeh: Refactor around eeh_probe_devices()
  powerpc/eeh: Introduce EEH edev logging macros
  powerpc/eeh: Convert log messages to eeh_edev_* macros

 arch/powerpc/include/asm/eeh.h   | 21 --
 arch/powerpc/include/asm/ppc-pci.h   |  7 +-
 arch/powerpc/kernel/eeh.c| 50 ++
 arch/powerpc/kernel/eeh_cache.c  | 37 

[PATCH v4 3/9] powerpc/eeh: Improve debug messages around device addition

2019-08-06 Thread Sam Bobroff
Also remove a useless comment.

Signed-off-by: Sam Bobroff 
Reviewed-by: Alexey Kardashevskiy 
---
 arch/powerpc/kernel/eeh.c|  2 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c | 14 
 arch/powerpc/platforms/pseries/eeh_pseries.c | 23 +++-
 3 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index d44533bba642..846cc697030c 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1278,7 +1278,7 @@ void eeh_add_device_late(struct pci_dev *dev)
pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
edev = pdn_to_eeh_dev(pdn);
if (edev->pdev == dev) {
-   pr_debug("EEH: Already referenced !\n");
+   pr_debug("EEH: Device %s already referenced!\n", pci_name(dev));
return;
}
 
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 1cd5ebd7299c..629f9390d9af 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -46,10 +46,7 @@ void pnv_pcibios_bus_add_device(struct pci_dev *pdev)
if (!pdev->is_virtfn)
return;
 
-   /*
-* The following operations will fail if VF's sysfs files
-* aren't created or its resources aren't finalized.
-*/
+   pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
eeh_add_device_early(pdn);
eeh_add_device_late(pdev);
eeh_sysfs_add_device(pdev);
@@ -393,6 +390,10 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
int ret;
int config_addr = (pdn->busno << 8) | (pdn->devfn);
 
+   pr_debug("%s: probing %04x:%02x:%02x.%01x\n",
+   __func__, hose->global_number, pdn->busno,
+   PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+
/*
 * When probing the root bridge, which doesn't have any
 * subordinate PCI devices. We don't have OF node for
@@ -487,6 +488,11 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
/* Save memory bars */
eeh_save_bars(edev);
 
+   pr_debug("%s: EEH enabled on %02x:%02x.%01x PHB#%x-PE#%x\n",
+   __func__, pdn->busno, PCI_SLOT(pdn->devfn),
+   PCI_FUNC(pdn->devfn), edev->pe->phb->global_number,
+   edev->pe->addr);
+
return NULL;
 }
 
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 07e3fc2667aa..31733f6d642c 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -52,6 +52,8 @@ void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
if (!pdev->is_virtfn)
return;
 
+   pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
+
pdn->device_id  =  pdev->device;
pdn->vendor_id  =  pdev->vendor;
pdn->class_code =  pdev->class;
@@ -238,6 +240,10 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void 
*data)
int enable = 0;
int ret;
 
+   pr_debug("%s: probing %04x:%02x:%02x.%01x\n",
+   __func__, pdn->phb->global_number, pdn->busno,
+   PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+
/* Retrieve OF node and eeh device */
edev = pdn_to_eeh_dev(pdn);
if (!edev || edev->pe)
@@ -281,7 +287,12 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void 
*data)
 
/* Enable EEH on the device */
ret = eeh_ops->set_option(, EEH_OPT_ENABLE);
-   if (!ret) {
+   if (ret) {
+   pr_debug("%s: EEH failed to enable on %02x:%02x.%01x 
PHB#%x-PE#%x (code %d)\n",
+   __func__, pdn->busno, PCI_SLOT(pdn->devfn),
+   PCI_FUNC(pdn->devfn), pe.phb->global_number,
+   pe.addr, ret);
+   } else {
/* Retrieve PE address */
edev->pe_config_addr = eeh_ops->get_pe_addr();
pe.addr = edev->pe_config_addr;
@@ -297,11 +308,6 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void 
*data)
if (enable) {
eeh_add_flag(EEH_ENABLED);
eeh_add_to_parent_pe(edev);
-
-   pr_debug("%s: EEH enabled on %02x:%02x.%01x 
PHB#%x-PE#%x\n",
-   __func__, pdn->busno, PCI_SLOT(pdn->devfn),
-   PCI_FUNC(pdn->devfn), pe.phb->global_number,
-   pe.addr);
} else if (pdn->parent && pdn_to_eeh_dev(pdn->parent) &&
   (pdn_to_eeh_dev(pdn->parent))->pe) {
/* This device doesn't support EEH, but it may have an
@@ -310,6 +316,11 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void 
*data)
edev->pe_config_addr = 

[PATCH v4 4/9] powerpc/eeh: Initialize EEH address cache earlier

2019-08-06 Thread Sam Bobroff
The EEH address cache is currently initialized and populated by a
single function: eeh_addr_cache_build().  While the initial population
of the cache can only be done once resources are allocated,
initialization (just setting up a spinlock) could be done much
earlier.

So move the initialization step into a separate function and call it
from a core_initcall (rather than a subsys initcall).

This will allow future work to make use of the cache during boot time
PCI scanning.

Signed-off-by: Sam Bobroff 
Reviewed-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/eeh.h  |  3 +++
 arch/powerpc/kernel/eeh.c   |  2 ++
 arch/powerpc/kernel/eeh_cache.c | 13 +++--
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 45c9b26e3cce..20105964287a 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -275,6 +275,7 @@ int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
 int eeh_check_failure(const volatile void __iomem *token);
 int eeh_dev_check_failure(struct eeh_dev *edev);
+void eeh_addr_cache_init(void);
 void eeh_addr_cache_build(void);
 void eeh_add_device_early(struct pci_dn *);
 void eeh_add_device_tree_early(struct pci_dn *);
@@ -335,6 +336,8 @@ static inline int eeh_check_failure(const volatile void 
__iomem *token)
 
 #define eeh_dev_check_failure(x) (0)
 
+static inline void eeh_addr_cache_init(void) { }
+
 static inline void eeh_addr_cache_build(void) { }
 
 static inline void eeh_add_device_early(struct pci_dn *pdn) { }
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 846cc697030c..ca8b0c58a6a7 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1200,6 +1200,8 @@ static int eeh_init(void)
list_for_each_entry_safe(hose, tmp, _list, list_node)
eeh_dev_phb_init_dynamic(hose);
 
+   eeh_addr_cache_init();
+
/* Initialize EEH event */
return eeh_event_init();
 }
diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index 320472373122..a790fa49c62d 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -254,6 +254,17 @@ void eeh_addr_cache_rmv_dev(struct pci_dev *dev)
spin_unlock_irqrestore(_io_addr_cache_root.piar_lock, flags);
 }
 
+/**
+ * eeh_addr_cache_init - Initialize a cache of I/O addresses
+ *
+ * Initialize a cache of pci i/o addresses.  This cache will be used to
+ * find the pci device that corresponds to a given address.
+ */
+void eeh_addr_cache_init(void)
+{
+   spin_lock_init(_io_addr_cache_root.piar_lock);
+}
+
 /**
  * eeh_addr_cache_build - Build a cache of I/O addresses
  *
@@ -269,8 +280,6 @@ void eeh_addr_cache_build(void)
struct eeh_dev *edev;
struct pci_dev *dev = NULL;
 
-   spin_lock_init(_io_addr_cache_root.piar_lock);
-
for_each_pci_dev(dev) {
pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
if (!pdn)
-- 
2.22.0.216.g00a2a96fc9



Re: [PATCH v4 09/10] powerpc/fsl_booke/kaslr: support nokaslr cmdline parameter

2019-08-06 Thread Jason Yan




On 2019/8/6 15:59, Christophe Leroy wrote:



Le 05/08/2019 à 08:43, Jason Yan a écrit :

One may want to disable kaslr at boot, so provide a cmdline parameter
'nokaslr' to support this.

Signed-off-by: Jason Yan 
Cc: Diana Craciun 
Cc: Michael Ellerman 
Cc: Christophe Leroy 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Nicholas Piggin 
Cc: Kees Cook 
Reviewed-by: Diana Craciun 
Tested-by: Diana Craciun 


Reviewed-by: Christophe Leroy 

Tiny comment below.


---
  arch/powerpc/kernel/kaslr_booke.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/kernel/kaslr_booke.c 
b/arch/powerpc/kernel/kaslr_booke.c

index 4b3f19a663fc..7c3cb41e7122 100644
--- a/arch/powerpc/kernel/kaslr_booke.c
+++ b/arch/powerpc/kernel/kaslr_booke.c
@@ -361,6 +361,18 @@ static unsigned long __init 
kaslr_choose_location(void *dt_ptr, phys_addr_t size

  return kaslr_offset;
  }
+static inline __init bool kaslr_disabled(void)
+{
+    char *str;
+
+    str = strstr(boot_command_line, "nokaslr");
+    if ((str == boot_command_line) ||
+    (str > boot_command_line && *(str - 1) == ' '))
+    return true;


I don't think additional () are needed for the left part 'str == 
boot_command_line'




Agree.


+
+    return false;
+}
+
  /*
   * To see if we need to relocate the kernel to a random offset
   * void *dt_ptr - address of the device tree
@@ -376,6 +388,8 @@ notrace void __init kaslr_early_init(void *dt_ptr, 
phys_addr_t size)

  kernel_sz = (unsigned long)_end - KERNELBASE;
  kaslr_get_cmdline(dt_ptr);
+    if (kaslr_disabled())
+    return;
  offset = kaslr_choose_location(dt_ptr, size, kernel_sz);



.





Re: [PATCH v4 07/10] powerpc/fsl_booke/32: randomize the kernel image offset

2019-08-06 Thread Jason Yan




On 2019/8/6 15:56, Christophe Leroy wrote:



Le 05/08/2019 à 08:43, Jason Yan a écrit :

After we have the basic support of relocate the kernel in some
appropriate place, we can start to randomize the offset now.

Entropy is derived from the banner and timer, which will change on every
build and boot. This is not entirely safe, so additionally the bootloader
may pass entropy via the /chosen/kaslr-seed node in the device tree.

We will use the first 512M of the low memory to randomize the kernel
image. The memory will be split into 64M zones. We will use the lower 8
bits of the entropy to decide the index of the 64M zone. Then we choose a
16K aligned offset inside the 64M zone to put the kernel in.

 KERNELBASE

     |-->   64M   <--|
     |               |
     +---------------+    +----------------+---------------+
     |               |....|    |kernel|    |               |
     +---------------+    +----------------+---------------+
     |                         |
     |----->   offset    <-----|

                           kimage_vaddr

We also check if we will overlap with some areas like the dtb area, the
initrd area or the crashkernel area. If we cannot find a proper area,
kaslr will be disabled and the kernel will boot from its original location.

Signed-off-by: Jason Yan 
Cc: Diana Craciun 
Cc: Michael Ellerman 
Cc: Christophe Leroy 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Nicholas Piggin 
Cc: Kees Cook 
Reviewed-by: Diana Craciun 
Tested-by: Diana Craciun 


Reviewed-by: Christophe Leroy 



Thanks for your help,


One small comment below


---
  arch/powerpc/kernel/kaslr_booke.c | 322 +-
  1 file changed, 320 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/kaslr_booke.c 
b/arch/powerpc/kernel/kaslr_booke.c

index 30f84c0321b2..97250cad71de 100644
--- a/arch/powerpc/kernel/kaslr_booke.c
+++ b/arch/powerpc/kernel/kaslr_booke.c
@@ -23,6 +23,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  #include 
  #include 
  #include 
@@ -34,15 +36,329 @@
  #include 
  #include 
  #include 
+#include 
  #include 
+#include 
+#include 
+
+#ifdef DEBUG
+#define DBG(fmt...) printk(KERN_ERR fmt)
+#else
+#define DBG(fmt...)
+#endif
+
+struct regions {
+    unsigned long pa_start;
+    unsigned long pa_end;
+    unsigned long kernel_size;
+    unsigned long dtb_start;
+    unsigned long dtb_end;
+    unsigned long initrd_start;
+    unsigned long initrd_end;
+    unsigned long crash_start;
+    unsigned long crash_end;
+    int reserved_mem;
+    int reserved_mem_addr_cells;
+    int reserved_mem_size_cells;
+};
  extern int is_second_reloc;
+/* Simplified build-specific string for starting entropy. */
+static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
+    LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
+
+static __init void kaslr_get_cmdline(void *fdt)
+{
+    int node = fdt_path_offset(fdt, "/chosen");
+
+    early_init_dt_scan_chosen(node, "chosen", 1, boot_command_line);
+}
+
+static unsigned long __init rotate_xor(unsigned long hash, const void 
*area,

+   size_t size)
+{
+    size_t i;
+    unsigned long *ptr = (unsigned long *)area;


As area is a void *, this cast shouldn't be necessary. Or maybe it is 
necessary because it discards the const ?




It's true the cast is not necessary. The ptr can be made const and the 
cast removed.



Christophe





Re: [PATCH] powerpc/fadump: sysfs for fadump memory reservation

2019-08-06 Thread Michael Ellerman
Sourabh Jain  writes:
> Add a sysfs interface to allow querying the memory reserved by fadump
> for saving the crash dump.
>
> Signed-off-by: Sourabh Jain 
> ---
>  Documentation/powerpc/firmware-assisted-dump.rst |  5 +
>  arch/powerpc/kernel/fadump.c | 14 ++
>  2 files changed, 19 insertions(+)
>
> diff --git a/Documentation/powerpc/firmware-assisted-dump.rst 
> b/Documentation/powerpc/firmware-assisted-dump.rst
> index 9ca12830a48e..4a7f6dc556f5 100644
> --- a/Documentation/powerpc/firmware-assisted-dump.rst
> +++ b/Documentation/powerpc/firmware-assisted-dump.rst
> @@ -222,6 +222,11 @@ Here is the list of files under kernel sysfs:
>  be handled and vmcore will not be captured. This interface can be
>  easily integrated with kdump service start/stop.
>  
> +/sys/kernel/fadump_mem_reserved
> +
> +   This is used to display the memory reserved by fadump for saving the
> +   crash dump.
> +
>   /sys/kernel/fadump_release_mem
>  This file is available only when fadump is active during
>  second kernel. This is used to release the reserved memory

Dumping these in /sys/kernel is pretty gross, but I guess that ship has
sailed.

But please add it to Documentation/ABI, and Cc the appropriate lists/people.

cheers

> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 4eab97292cc2..70d49013ebec 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -1514,6 +1514,13 @@ static ssize_t fadump_enabled_show(struct kobject 
> *kobj,
>   return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
>  }
>  
> +static ssize_t fadump_mem_reserved_show(struct kobject *kobj,
> + struct kobj_attribute *attr,
> + char *buf)
> +{
> + return sprintf(buf, "%ld\n", fw_dump.reserve_dump_area_size);
> +}
> +
>  static ssize_t fadump_register_show(struct kobject *kobj,
>   struct kobj_attribute *attr,
>   char *buf)
> @@ -1632,6 +1639,9 @@ static struct kobj_attribute fadump_attr = 
> __ATTR(fadump_enabled,
>  static struct kobj_attribute fadump_register_attr = __ATTR(fadump_registered,
>   0644, fadump_register_show,
>   fadump_register_store);
> +static struct kobj_attribute fadump_mem_reserved_attr =
> + __ATTR(fadump_mem_reserved, 0444,
> + fadump_mem_reserved_show, NULL);
>  
>  DEFINE_SHOW_ATTRIBUTE(fadump_region);
>  
> @@ -1663,6 +1673,10 @@ static void fadump_init_files(void)
>   printk(KERN_ERR "fadump: unable to create sysfs file"
>   " fadump_release_mem (%d)\n", rc);
>   }
> + rc = sysfs_create_file(kernel_kobj, _mem_reserved_attr.attr);
> + if (rc)
> + pr_err("unable to create sysfs file fadump_mem_reserved (%d)\n",
> + rc);
>   return;
>  }
>  
> -- 
> 2.17.2
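A Documentation/ABI entry of the kind Michael asks for might look like the following. The exact path, date and wording are hypothetical; only the What/Date/Contact/Description stanza format follows the existing convention under Documentation/ABI/testing.

```
What:		/sys/kernel/fadump_mem_reserved
Date:		August 2019
Contact:	linuxppc-dev@lists.ozlabs.org
Description:	read only
		Displays the amount of memory (in bytes) reserved by
		firmware-assisted dump for saving the crash dump.
```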


Re: [PATCH v3 13/16] powerpc/pseries/iommu: Don't use dma_iommu_ops on secure guests

2019-08-06 Thread Thiago Jung Bauermann


Hello Christoph,

Thanks for your review.

Christoph Hellwig  writes:

> On Tue, Aug 06, 2019 at 02:22:34AM -0300, Thiago Jung Bauermann wrote:
>> @@ -1318,7 +1319,10 @@ void iommu_init_early_pSeries(void)
>>  of_reconfig_notifier_register(_reconfig_nb);
>>  register_memory_notifier(_mem_nb);
>>  
>> -set_pci_dma_ops(_iommu_ops);
>> +if (is_secure_guest())
>> +set_pci_dma_ops(NULL);
>> +else
>> +set_pci_dma_ops(_iommu_ops);
>
> Shoudn't:
>
>   if (!is_secure_guest())
>   set_pci_dma_ops(_iommu_ops);
>
> be enough here, given that NULL is the default?

Indeed, it is enough.

> Also either way I think this conditional needs a comment explaining
> why it is there.

Good point. I added the commit message as a comment in the code.
New version of this patch below.


>From 5dc3914efa4765eef2869d554d4ea1c676bb1e75 Mon Sep 17 00:00:00 2001
From: Thiago Jung Bauermann 
Date: Thu, 24 Jan 2019 22:40:16 -0200
Subject: [PATCH] powerpc/pseries/iommu: Don't use dma_iommu_ops on secure
 guests

Secure guest memory is inaccessible to devices so regular DMA isn't
possible.

In that case set devices' dma_map_ops to NULL so that the generic
DMA code path will use SWIOTLB to bounce buffers for DMA.

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/platforms/pseries/iommu.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 889dc2e44b89..8d9c2b17ad54 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pseries.h"
 
@@ -1318,7 +1319,15 @@ void iommu_init_early_pSeries(void)
of_reconfig_notifier_register(_reconfig_nb);
register_memory_notifier(_mem_nb);
 
-   set_pci_dma_ops(_iommu_ops);
+   /*
+* Secure guest memory is inaccessible to devices so regular DMA isn't
+* possible.
+*
+* In that case keep devices' dma_map_ops as NULL so that the generic
+* DMA code path will use SWIOTLB to bounce buffers for DMA.
+*/
+   if (!is_secure_guest())
+   set_pci_dma_ops(_iommu_ops);
 }
 
 static int __init disable_multitce(char *str)


[PATCH v3 38/41] powerpc: convert put_page() to put_user_page*()

2019-08-06 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Note that this effectively changes the code's behavior in
mm_iommu_unpin(): it now ultimately calls set_page_dirty_lock(),
instead of set_page_dirty(). This is probably more accurate.

As Christoph Hellwig put it, "set_page_dirty() is only safe if we are
dealing with a file backed page where we have reference on the inode it
hangs off." [1]

[1] https://lore.kernel.org/r/20190723153640.gb...@lst.de

Cc: Benjamin Herrenschmidt 
Cc: Christoph Hellwig 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: John Hubbard 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c|  4 ++--
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 19 ++-
 arch/powerpc/kvm/e500_mmu.c|  3 +--
 arch/powerpc/mm/book3s64/iommu_api.c   | 11 +--
 4 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 9a75f0e1933b..18646b738ce1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -731,7 +731,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 * we have to drop the reference on the correct tail
 * page to match the get inside gup()
 */
-   put_page(pages[0]);
+   put_user_page(pages[0]);
}
return ret;
 
@@ -1206,7 +1206,7 @@ void kvmppc_unpin_guest_page(struct kvm *kvm, void *va, 
unsigned long gpa,
unsigned long gfn;
int srcu_idx;
 
-   put_page(page);
+   put_user_page(page);
 
if (!dirty)
return;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 2d415c36a61d..f53273fbfa2d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -821,8 +821,12 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
 */
if (!ptep) {
local_irq_enable();
-   if (page)
-   put_page(page);
+   if (page) {
+   if (upgrade_write)
+   put_user_page(page);
+   else
+   put_page(page);
+   }
return RESUME_GUEST;
}
pte = *ptep;
@@ -870,9 +874,14 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
*levelp = level;
 
if (page) {
-   if (!ret && (pte_val(pte) & _PAGE_WRITE))
-   set_page_dirty_lock(page);
-   put_page(page);
+   bool dirty = !ret && (pte_val(pte) & _PAGE_WRITE);
+   if (upgrade_write)
+   put_user_pages_dirty_lock(, 1, dirty);
+   else {
+   if (dirty)
+   set_page_dirty_lock(page);
+   put_page(page);
+   }
}
 
/* Increment number of large pages if we (successfully) inserted one */
diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
index 2d910b87e441..67bb8d59d4b1 100644
--- a/arch/powerpc/kvm/e500_mmu.c
+++ b/arch/powerpc/kvm/e500_mmu.c
@@ -850,8 +850,7 @@ int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu *vcpu,
  free_privs_first:
kfree(privs[0]);
  put_pages:
-   for (i = 0; i < num_pages; i++)
-   put_page(pages[i]);
+   put_user_pages(pages, num_pages);
  free_pages:
kfree(pages);
return ret;
diff --git a/arch/powerpc/mm/book3s64/iommu_api.c 
b/arch/powerpc/mm/book3s64/iommu_api.c
index b056cae3388b..e126193ba295 100644
--- a/arch/powerpc/mm/book3s64/iommu_api.c
+++ b/arch/powerpc/mm/book3s64/iommu_api.c
@@ -170,9 +170,8 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, 
unsigned long ua,
return 0;
 
 free_exit:
-   /* free the reference taken */
-   for (i = 0; i < pinned; i++)
-   put_page(mem->hpages[i]);
+   /* free the references taken */
+   put_user_pages(mem->hpages, pinned);
 
vfree(mem->hpas);
kfree(mem);
@@ -203,6 +202,7 @@ static void mm_iommu_unpin(struct 
mm_iommu_table_group_mem_t *mem)
 {
long i;
struct page *page = NULL;
+   bool dirty = false;
 
if (!mem->hpas)
return;
@@ -215,10 +215,9 @@ static void mm_iommu_unpin(struct 
mm_iommu_table_group_mem_t *mem)
if (!page)
continue;
 
-   if (mem->hpas[i] & MM_IOMMU_TABLE_GROUP_PAGE_DIRTY)
-   SetPageDirty(page);
+   dirty = mem->hpas[i] & MM_IOMMU_TABLE_GROUP_PAGE_DIRTY;
 

Re: [PATCH v2 0/3] arm/arm64: Add support for function error injection

2019-08-06 Thread Leo Yan
On Wed, Aug 07, 2019 at 09:08:11AM +0900, Masami Hiramatsu wrote:
> On Tue,  6 Aug 2019 18:00:12 +0800
> Leo Yan  wrote:
> 
> > This small patch set is to add support for function error injection;
> > this can be used to enable more advanced debugging features, e.g.
> > CONFIG_BPF_KPROBE_OVERRIDE.
> > 
> > Patch 01/03 consolidates the function definition which can be
> > shared across architectures; patches 02,03/03 are used for enabling
> > function error injection on arm64 and arm architecture respectively.
> > 
> > I tested on the arm64 platform Juno-r2 and one of my x86 laptops
> > with the steps below; I didn't test on the Arm architecture, so it
> > is only compile-tested.
> > 
> > - Enable kernel configuration:
> >   CONFIG_BPF_KPROBE_OVERRIDE
> >   CONFIG_BTRFS_FS
> >   CONFIG_BPF_EVENTS=y
> >   CONFIG_KPROBES=y
> >   CONFIG_KPROBE_EVENTS=y
> >   CONFIG_BPF_KPROBE_OVERRIDE=y
> > 
> > - Build samples/bpf on with Debian rootFS:
> >   # cd $kernel
> >   # make headers_install
> >   # make samples/bpf/ LLC=llc-7 CLANG=clang-7
> > 
> > - Run the sample tracex7:
> >   # dd if=/dev/zero of=testfile.img bs=1M seek=1000 count=1
> >   # DEVICE=$(losetup --show -f testfile.img)
> >   # mkfs.btrfs -f $DEVICE
> >   # ./tracex7 testfile.img
> >   [ 1975.211781] BTRFS error (device (efault)): open_ctree failed
> >   mount: /mnt/linux-kernel/linux-cs-dev/samples/bpf/tmpmnt: mount(2) system 
> > call failed: Cannot allocate memory.
> > 
> > Changes from v1:
> > * Consolidated the function definition into asm-generic header (Will);
> > * Used APIs to access pt_regs elements (Will);
> > * Fixed typos in the comments (Will).
> 
> This looks good to me.
> 
> Reviewed-by: Masami Hiramatsu 
> 
> Thank you!

Thanks a lot for reviewing, Masami.

Leo.


Re: SMP lockup at boot on Freescale/NXP T2080 (powerpc 64)

2019-08-06 Thread Chris Packham
On Wed, 2019-08-07 at 11:13 +1000, Michael Ellerman wrote:
> Chris Packham  writes:
> > 
> > On Tue, 2019-08-06 at 21:32 +1000, Michael Ellerman wrote:
> > > 
> > > Chris Packham  writes:
> > > > 
> > > > On Mon, 2019-08-05 at 14:06 +1200, Chris Packham wrote:
> > > > > 
> > > > > 
> > > > > Hi All,
> > > > > 
> > > > > I have a custom board that uses the Freescale/NXP T2080 SoC.
> > > > > 
> > > > > The board boots fine using v4.19.60 but when I use v5.1.21 it
> > > > > locks
> > > > > up
> > > > > waiting for the other CPUs to come online (earlyprintk output
> > > > > below).
> > > > > If I set maxcpus=0 then the system boots all the way through
> > > > > to
> > > > > userland. The same thing happens with 5.3-rc2.
> > > > > 
> > > > > The defconfig I'm using is 
> > > > > https://gist.github.com/cpackham/f24d0b426f3
> > > > > de0eaaba17b82c3528a9d it was updated from the working
> > > > > v4.19.60
> > > > > defconfig using make olddefconfig.
> > > > > 
> > > > > Does this ring any bells for anyone?
> > > > > 
> > > > I haven't dug into the differences between the working and
> > > > non-working versions yet. I'll start looking now.
> > > > I've bisected this to the following commit
> > > Thanks that's super helpful.
> > > 
> > > > 
> > > > 
> > > > commit ed1cd6deb013a11959d17a94e35ce159197632da
> > > > Author: Christophe Leroy 
> > > > Date:   Thu Jan 31 10:08:58 2019 +
> > > > 
> > > > powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
> > > > 
> > > > This patch activates CONFIG_THREAD_INFO_IN_TASK which
> > > > moves the thread_info into task_struct.
> > > > 
> > > > I'll be the first to admit this is well beyond my area of
> > > > knowledge
> > > > so
> > > > I'm unsure what about this patch is problematic but I can be
> > > > fairly
> > > > sure that a build immediately before this patch works while a
> > > > build
> > > > with this patch hangs.
> > > It makes a pretty fundamental change to the way the kernel stores
> > > some
> > > information about each task, moving it off the stack and into the
> > > task
> > > struct.
> > > 
> > > It definitely has the potential to break things, but I thought we
> > > had
> > > reasonable test coverage of the Book3E platforms, I have a
> > > p5020ds
> > > (e5500) that I boot as part of my CI.
> > > 
> > > Aha. If I take your config and try to boot it on my p5020ds I get
> > > the
> > > same behaviour, stuck at SMP bringup. So it seems it's something
> > > in
> > > your
> > > config vs corenet64_smp_defconfig that is triggering the bug.
> > > 
> > > Can you try bisecting what in the config triggers it?
> > > 
> > > To do that you checkout ed1cd6deb013a11959d17a94e35ce159197632da,
> > > then
> > > you build/boot with corenet64_smp_defconfig to confirm it works.
> > > Then
> > > you use tools/testing/ktest/config-bisect.pl to bisect the
> > > changes in
> > > the .config.
> > > 
> > The difference between a working and non working defconfig is
> > CONFIG_PREEMPT specifically CONFIG_PREEMPT=y makes my system hang
> > at
> > boot.
> > 
> > Is that now intentionally prohibited on 64-bit powerpc?
> It's not prohibited, but it probably should be because no one really
> tests it properly. I have a handful of IBM machines where I boot a
> PREEMPT kernel but that's about it.
> 
> The corenet configs don't have PREEMPT enabled, which suggests it was
> never really supported on those machines.
> 
> But maybe someone from NXP can tell me otherwise.
> 

I think our workloads need CONFIG_PREEMPT=y because our systems have
switch ASIC drivers implemented in userland and we need to be able to
react quickly to network events in order to prevent loops. We have seen
instances of this not happening simply because some other process is in
the middle of a syscall.

One thing I am working on here is a setup with a few vendor boards and
some of our own kit that we can test the upstream kernels on. Hopefully
that'd make these kinds of reports more timely rather than just
whenever we decide to move to a new kernel version.




Re: SMP lockup at boot on Freescale/NXP T2080 (powerpc 64)

2019-08-06 Thread Michael Ellerman
Chris Packham  writes:
> On Tue, 2019-08-06 at 21:32 +1000, Michael Ellerman wrote:
>> Chris Packham  writes:
>> > On Mon, 2019-08-05 at 14:06 +1200, Chris Packham wrote:
>> > > 
>> > > Hi All,
>> > > 
>> > > I have a custom board that uses the Freescale/NXP T2080 SoC.
>> > > 
>> > > The board boots fine using v4.19.60 but when I use v5.1.21 it
>> > > locks
>> > > up
>> > > waiting for the other CPUs to come online (earlyprintk output
>> > > below).
>> > > If I set maxcpus=0 then the system boots all the way through to
>> > > userland. The same thing happens with 5.3-rc2.
>> > > 
>> > > The defconfig I'm using is 
>> > > https://gist.github.com/cpackham/f24d0b426f3
>> > > de0eaaba17b82c3528a9d it was updated from the working v4.19.60
>> > > defconfig using make olddefconfig.
>> > > 
>> > > Does this ring any bells for anyone?
>> > > 
>> > > I haven't dug into the differences between the working and
>> > > non-working versions yet. I'll start looking now.
>> > I've bisected this to the following commit
>> Thanks that's super helpful.
>> 
>> > 
>> > commit ed1cd6deb013a11959d17a94e35ce159197632da
>> > Author: Christophe Leroy 
>> > Date:   Thu Jan 31 10:08:58 2019 +
>> > 
>> > powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
>> > 
>> > This patch activates CONFIG_THREAD_INFO_IN_TASK which
>> > moves the thread_info into task_struct.
>> > 
>> > I'll be the first to admit this is well beyond my area of knowledge
>> > so
>> > I'm unsure what about this patch is problematic but I can be fairly
>> > sure that a build immediately before this patch works while a build
>> > with this patch hangs.
>> It makes a pretty fundamental change to the way the kernel stores
>> some
>> information about each task, moving it off the stack and into the
>> task
>> struct.
>> 
>> It definitely has the potential to break things, but I thought we had
>> reasonable test coverage of the Book3E platforms, I have a p5020ds
>> (e5500) that I boot as part of my CI.
>> 
>> Aha. If I take your config and try to boot it on my p5020ds I get the
>> same behaviour, stuck at SMP bringup. So it seems it's something in
>> your
>> config vs corenet64_smp_defconfig that is triggering the bug.
>> 
>> Can you try bisecting what in the config triggers it?
>> 
>> To do that you checkout ed1cd6deb013a11959d17a94e35ce159197632da,
>> then
>> you build/boot with corenet64_smp_defconfig to confirm it works. Then
>> you use tools/testing/ktest/config-bisect.pl to bisect the changes in
>> the .config.
>> 
>
> The difference between a working and non-working defconfig is
> CONFIG_PREEMPT; specifically, CONFIG_PREEMPT=y makes my system hang at
> boot.
>
> Is that now intentionally prohibited on 64-bit powerpc?

It's not prohibited, but it probably should be, because no one really
tests it properly. I have a handful of IBM machines where I boot a
PREEMPT kernel but that's about it.

The corenet configs don't have PREEMPT enabled, which suggests it was
never really supported on those machines.

But maybe someone from NXP can tell me otherwise.

cheers


Re: [PATCH v2 0/3] arm/arm64: Add support for function error injection

2019-08-06 Thread Masami Hiramatsu
On Tue,  6 Aug 2019 18:00:12 +0800
Leo Yan  wrote:

> This small patch set is to add support for function error injection;
> this can be used to enable more advanced debugging features, e.g.
> CONFIG_BPF_KPROBE_OVERRIDE.
> 
> The patch 01/03 is to consolidate the function definition so it can be
> shared across architectures; patches 02,03/03 are used for enabling
> function error injection on the arm64 and arm architectures respectively.
> 
> I tested on the arm64 platform Juno-r2 and one of my laptops with x86
> architecture using the steps below; I didn't test on the Arm
> architecture, so it is only compile-tested.
> 
> - Enable kernel configuration:
>   CONFIG_BPF_KPROBE_OVERRIDE
>   CONFIG_BTRFS_FS
>   CONFIG_BPF_EVENTS=y
>   CONFIG_KPROBES=y
>   CONFIG_KPROBE_EVENTS=y
>   CONFIG_BPF_KPROBE_OVERRIDE=y
> 
> - Build samples/bpf on with Debian rootFS:
>   # cd $kernel
>   # make headers_install
>   # make samples/bpf/ LLC=llc-7 CLANG=clang-7
> 
> - Run the sample tracex7:
>   # dd if=/dev/zero of=testfile.img bs=1M seek=1000 count=1
>   # DEVICE=$(losetup --show -f testfile.img)
>   # mkfs.btrfs -f $DEVICE
>   # ./tracex7 testfile.img
>   [ 1975.211781] BTRFS error (device (efault)): open_ctree failed
>   mount: /mnt/linux-kernel/linux-cs-dev/samples/bpf/tmpmnt: mount(2) system 
> call failed: Cannot allocate memory.
> 
> Changes from v1:
> * Consolidated the function definition into asm-generic header (Will);
> * Used APIs to access pt_regs elements (Will);
> * Fixed typos in the comments (Will).

This looks good to me.

Reviewed-by: Masami Hiramatsu 

Thank you!

> 
> 
> Leo Yan (3):
>   error-injection: Consolidate override function definition
>   arm64: Add support for function error injection
>   arm: Add support for function error injection
> 
>  arch/arm/Kconfig   |  1 +
>  arch/arm/include/asm/ptrace.h  |  5 +
>  arch/arm/lib/Makefile  |  2 ++
>  arch/arm/lib/error-inject.c| 19 +++
>  arch/arm64/Kconfig |  1 +
>  arch/arm64/include/asm/ptrace.h|  5 +
>  arch/arm64/lib/Makefile|  2 ++
>  arch/arm64/lib/error-inject.c  | 18 ++
>  arch/powerpc/include/asm/error-injection.h | 13 -
>  arch/x86/include/asm/error-injection.h | 13 -
>  include/asm-generic/error-injection.h  |  6 ++
>  include/linux/error-injection.h|  6 +++---
>  12 files changed, 62 insertions(+), 29 deletions(-)
>  create mode 100644 arch/arm/lib/error-inject.c
>  create mode 100644 arch/arm64/lib/error-inject.c
>  delete mode 100644 arch/powerpc/include/asm/error-injection.h
>  delete mode 100644 arch/x86/include/asm/error-injection.h
> 
> -- 
> 2.17.1
> 


-- 
Masami Hiramatsu 


Re: SMP lockup at boot on Freescale/NXP T2080 (powerpc 64)

2019-08-06 Thread Chris Packham
On Tue, 2019-08-06 at 21:32 +1000, Michael Ellerman wrote:
> Chris Packham  writes:
> > 
> > On Mon, 2019-08-05 at 14:06 +1200, Chris Packham wrote:
> > > 
> > > Hi All,
> > > 
> > > I have a custom board that uses the Freescale/NXP T2080 SoC.
> > > 
> > > The board boots fine using v4.19.60 but when I use v5.1.21 it
> > > locks
> > > up
> > > waiting for the other CPUs to come online (earlyprintk output
> > > below).
> > > If I set maxcpus=0 then the system boots all the way through to
> > > userland. The same thing happens with 5.3-rc2.
> > > 
> > > The defconfig I'm using is 
> > > https://gist.github.com/cpackham/f24d0b426f3
> > > de0eaaba17b82c3528a9d it was updated from the working v4.19.60
> > > defconfig using make olddefconfig.
> > > 
> > > Does this ring any bells for anyone?
> > > 
> > > I haven't dug into the differences between the working and
> > > non-working versions yet. I'll start looking now.
> > I've bisected this to the following commit
> Thanks that's super helpful.
> 
> > 
> > commit ed1cd6deb013a11959d17a94e35ce159197632da
> > Author: Christophe Leroy 
> > Date:   Thu Jan 31 10:08:58 2019 +
> > 
> > powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
> > 
> > This patch activates CONFIG_THREAD_INFO_IN_TASK which
> > moves the thread_info into task_struct.
> > 
> > I'll be the first to admit this is well beyond my area of knowledge
> > so
> > I'm unsure what about this patch is problematic but I can be fairly
> > sure that a build immediately before this patch works while a build
> > with this patch hangs.
> It makes a pretty fundamental change to the way the kernel stores
> some
> information about each task, moving it off the stack and into the
> task
> struct.
> 
> It definitely has the potential to break things, but I thought we had
> reasonable test coverage of the Book3E platforms, I have a p5020ds
> (e5500) that I boot as part of my CI.
> 
> Aha. If I take your config and try to boot it on my p5020ds I get the
> same behaviour, stuck at SMP bringup. So it seems it's something in
> your
> config vs corenet64_smp_defconfig that is triggering the bug.
> 
> Can you try bisecting what in the config triggers it?
> 
> To do that you checkout ed1cd6deb013a11959d17a94e35ce159197632da,
> then
> you build/boot with corenet64_smp_defconfig to confirm it works. Then
> you use tools/testing/ktest/config-bisect.pl to bisect the changes in
> the .config.
> 
> cheers
> 

The difference between a working and non-working defconfig is
CONFIG_PREEMPT; specifically, CONFIG_PREEMPT=y makes my system hang at
boot.

Is that now intentionally prohibited on 64-bit powerpc?

> > 
> > > 
> > > Booting...
> > > MMU: Supported page sizes
> > >  4 KB as direct
> > >   2048 KB as direct & indirect
> > >   4096 KB as direct
> > >  16384 KB as direct
> > >  65536 KB as direct
> > > 262144 KB as direct
> > >    1048576 KB as direct
> > > MMU: Book3E HW tablewalk enabled
> > > Linux version 5.1.21-at1+ (@chrisp-dl) (gcc version 4.9.3
> > > (crosstool-
> > > NG 
> > > crosstool-ng-1.22.0)) #24 SMP PREEMPT Mon Aug 5 01:42:00 UTC 2019
> > > Found initrd at 0xc0002f045000:0xc0003000
> > > Using CoreNet Generic machine description
> > > Found legacy serial port 0 for /soc@ffe00/serial@11c500
> > >   mem=ffe11c500, taddr=ffe11c500, irq=0, clk=3, speed=0
> > > Found legacy serial port 1 for /soc@ffe00/serial@11c600
> > >   mem=ffe11c600, taddr=ffe11c600, irq=0, clk=3, speed=0
> > > Found legacy serial port 2 for /soc@ffe00/serial@11d500
> > >   mem=ffe11d500, taddr=ffe11d500, irq=0, clk=3, speed=0
> > > Found legacy serial port 3 for /soc@ffe00/serial@11d600
> > >   mem=ffe11d600, taddr=ffe11d600, irq=0, clk=3, speed=0
> > > printk: bootconsole [udbg0] enabled
> > > CPU maps initialized for 2 threads per core
> > >  (thread shift is 1)
> > > Allocated 1856 bytes for 8 pacas
> > > -
> > > phys_mem_size = 0x1
> > > dcache_bsize  = 0x40
> > > icache_bsize  = 0x40
> > > cpu_features  = 0x0003009003b6
> > >   possible= 0x0003009003b6
> > >   always  = 0x0003008003b4
> > > cpu_user_features = 0xdc008000 0x0800
> > > mmu_features  = 0x000a0010
> > > firmware_features = 0x
> > > -
> > > CoreNet Generic board
> > > barrier-nospec: using isync; sync as speculation barrier
> > > barrier-nospec: patched 412 locations
> > > Top of RAM: 0x1, Total RAM: 0x1
> > > Memory hole size: 0MB
> > > Zone ranges:
> > >   DMA  [mem 0x-0x7fffefff]
> > >   Normal   [mem 0x7000-0x]
> > > Movable zone start for each node
> > > Early memory node ranges
> > >   node   0: [mem 0x-0x]
> > > Initmem setup node 0 [mem 0x-0x]
> > > On node 0 

[PATCH 4/4] powerpc: Book3S 64-bit "heavyweight" KASAN support

2019-08-06 Thread Daniel Axtens
KASAN support on powerpc64 is interesting:

 - We want to be able to support inline instrumentation so as to be
   able to catch global and stack issues.

 - We run a lot of code at boot in real mode. This includes stuff like
   printk(), so it's not feasible to just disable instrumentation
   around it.

   [For those not immersed in ppc64: in real mode, the top nibble or
   top 2 bits (depending on radix/hash MMU) of the address are ignored.
   To make things work, we put the linear mapping at
   0xc000. This means that a pointer to part of the linear
   mapping will work both in real mode, where it will be interpreted
   as a physical address of the form 0x000..., and out of real mode,
   where it will go via the linear mapping.]

 - Inline instrumentation requires a fixed offset.

 - Because of our running things in real mode, the offset has to
   point to valid memory both in and out of real mode.

This makes finding somewhere to put the KASAN shadow region a bit fun.

One approach is just to give up on inline instrumentation and override
the address->shadow calculation. This way we can delay all checking
until after we get everything set up to our satisfaction. However,
we'd really like to do better.

What we can do - if we know _at compile time_ how much contiguous
physical memory we have - is to set aside the top 1/8th of the memory
and use that. This is a big hammer (hence the "heavyweight" name) and
comes with 3 big consequences:

 - kernels will simply fail to boot on machines with less memory than
   specified when compiling.

 - kernels running on machines with more memory than specified when
   compiling will simply ignore the extra memory.

 - there's no nice way to handle physically discontiguous memory, so
   you are restricted to the first physical memory block.

If you can bear all this, you get pretty full support for KASAN.

Despite the limitations, it can still find bugs,
e.g. http://patchwork.ozlabs.org/patch/1103775/

The current implementation is Radix only. I am open to extending
it to hash at some point but I don't think it should hold up v1.

Massive thanks to mpe, who had the idea for the initial design.

Signed-off-by: Daniel Axtens 

---
Changes since the rfc:

 - Boots real and virtual hardware, kvm works.

 - disabled reporting when we're checking the stack for exception
   frames. The behaviour isn't wrong, just incompatible with KASAN.

 - Documentation!

 - Dropped old module stuff in favour of KASAN_VMALLOC.

The bugs with ftrace and kuap were due to kernel bloat pushing
prom_init calls to be done via the plt. Because we did not have
a relocatable kernel, and they are done very early, this caused
everything to explode. Compile with CONFIG_RELOCATABLE!

---
 Documentation/dev-tools/kasan.rst|   7 +-
 Documentation/powerpc/kasan.txt  | 111 +++
 arch/powerpc/Kconfig |   4 +
 arch/powerpc/Kconfig.debug   |  21 
 arch/powerpc/Makefile|   7 ++
 arch/powerpc/include/asm/book3s/64/radix.h   |   5 +
 arch/powerpc/include/asm/kasan.h |  35 +-
 arch/powerpc/kernel/process.c|   8 ++
 arch/powerpc/kernel/prom.c   |  57 +-
 arch/powerpc/mm/kasan/Makefile   |   1 +
 arch/powerpc/mm/kasan/kasan_init_book3s_64.c |  76 +
 11 files changed, 326 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/powerpc/kasan.txt
 create mode 100644 arch/powerpc/mm/kasan/kasan_init_book3s_64.c

diff --git a/Documentation/dev-tools/kasan.rst 
b/Documentation/dev-tools/kasan.rst
index 35fda484a672..48d3b669e577 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -22,7 +22,9 @@ global variables yet.
 Tag-based KASAN is only supported in Clang and requires version 7.0.0 or later.
 
 Currently generic KASAN is supported for the x86_64, arm64, xtensa and s390
-architectures, and tag-based KASAN is supported only for arm64.
+architectures. It is also supported on powerpc for 32-bit kernels, and for
+64-bit kernels running under the radix MMU. Tag-based KASAN is supported only
+for arm64.
 
 Usage
 -
@@ -252,7 +254,8 @@ CONFIG_KASAN_VMALLOC
 
 
 With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
-cost of greater memory usage. Currently this is only supported on x86.
+cost of greater memory usage. Currently this is optional on x86, and
+required on 64-bit powerpc.
 
 This works by hooking into vmalloc and vmap, and dynamically
 allocating real shadow memory to back the mappings.
diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
new file mode 100644
index ..a5592454353b
--- /dev/null
+++ b/Documentation/powerpc/kasan.txt
@@ -0,0 +1,111 @@
+KASAN is supported on powerpc on 32-bit and 64-bit Radix only.
+
+32 bit support
+==
+
+KASAN is supported on both hash and nohash MMUs on 

[PATCH 3/4] powerpc: support KASAN instrumentation of bitops

2019-08-06 Thread Daniel Axtens
In KASAN development I noticed that the powerpc-specific bitops
were not being picked up by the KASAN test suite.

Instrumentation is done via the bitops-instrumented.h header. It
requires that arch-specific versions of bitop functions are renamed
to arch_*. Do this renaming.

For clear_bit_unlock_is_negative_byte, the current implementation
uses the PG_waiters constant. This works because it's a preprocessor
macro - so it's only actually evaluated in contexts where PG_waiters
is defined. With instrumentation, however, it becomes a static inline
function, and all of a sudden we need the actual value of PG_waiters.
Because of the order of header includes, it's not available and we
fail to compile. Instead, manually specify that we care about bit 7.
This is still correct: bit 7 is the bit that would mark a negative
byte, but it does obscure the origin a little bit.

Cc: Nicholas Piggin  # clear_bit_unlock_negative_byte
Signed-off-by: Daniel Axtens 
---
 arch/powerpc/include/asm/bitops.h | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/bitops.h 
b/arch/powerpc/include/asm/bitops.h
index 603aed229af7..19dc16e62e6a 100644
--- a/arch/powerpc/include/asm/bitops.h
+++ b/arch/powerpc/include/asm/bitops.h
@@ -86,22 +86,22 @@ DEFINE_BITOP(clear_bits, andc, "")
 DEFINE_BITOP(clear_bits_unlock, andc, PPC_RELEASE_BARRIER)
 DEFINE_BITOP(change_bits, xor, "")
 
-static __inline__ void set_bit(int nr, volatile unsigned long *addr)
+static __inline__ void arch_set_bit(int nr, volatile unsigned long *addr)
 {
set_bits(BIT_MASK(nr), addr + BIT_WORD(nr));
 }
 
-static __inline__ void clear_bit(int nr, volatile unsigned long *addr)
+static __inline__ void arch_clear_bit(int nr, volatile unsigned long *addr)
 {
clear_bits(BIT_MASK(nr), addr + BIT_WORD(nr));
 }
 
-static __inline__ void clear_bit_unlock(int nr, volatile unsigned long *addr)
+static __inline__ void arch_clear_bit_unlock(int nr, volatile unsigned long 
*addr)
 {
clear_bits_unlock(BIT_MASK(nr), addr + BIT_WORD(nr));
 }
 
-static __inline__ void change_bit(int nr, volatile unsigned long *addr)
+static __inline__ void arch_change_bit(int nr, volatile unsigned long *addr)
 {
change_bits(BIT_MASK(nr), addr + BIT_WORD(nr));
 }
@@ -138,26 +138,26 @@ DEFINE_TESTOP(test_and_clear_bits, andc, 
PPC_ATOMIC_ENTRY_BARRIER,
 DEFINE_TESTOP(test_and_change_bits, xor, PPC_ATOMIC_ENTRY_BARRIER,
  PPC_ATOMIC_EXIT_BARRIER, 0)
 
-static __inline__ int test_and_set_bit(unsigned long nr,
+static __inline__ int arch_test_and_set_bit(unsigned long nr,
   volatile unsigned long *addr)
 {
return test_and_set_bits(BIT_MASK(nr), addr + BIT_WORD(nr)) != 0;
 }
 
-static __inline__ int test_and_set_bit_lock(unsigned long nr,
+static __inline__ int arch_test_and_set_bit_lock(unsigned long nr,
   volatile unsigned long *addr)
 {
return test_and_set_bits_lock(BIT_MASK(nr),
addr + BIT_WORD(nr)) != 0;
 }
 
-static __inline__ int test_and_clear_bit(unsigned long nr,
+static __inline__ int arch_test_and_clear_bit(unsigned long nr,
 volatile unsigned long *addr)
 {
return test_and_clear_bits(BIT_MASK(nr), addr + BIT_WORD(nr)) != 0;
 }
 
-static __inline__ int test_and_change_bit(unsigned long nr,
+static __inline__ int arch_test_and_change_bit(unsigned long nr,
  volatile unsigned long *addr)
 {
return test_and_change_bits(BIT_MASK(nr), addr + BIT_WORD(nr)) != 0;
@@ -186,14 +186,14 @@ static __inline__ unsigned long 
clear_bit_unlock_return_word(int nr,
 }
 
 /* This is a special function for mm/filemap.c */
-#define clear_bit_unlock_is_negative_byte(nr, addr)\
-   (clear_bit_unlock_return_word(nr, addr) & BIT_MASK(PG_waiters))
+#define arch_clear_bit_unlock_is_negative_byte(nr, addr)   \
+   (clear_bit_unlock_return_word(nr, addr) & BIT_MASK(7))
 
 #endif /* CONFIG_PPC64 */
 
 #include 
 
-static __inline__ void __clear_bit_unlock(int nr, volatile unsigned long *addr)
+static __inline__ void arch___clear_bit_unlock(int nr, volatile unsigned long 
*addr)
 {
__asm__ __volatile__(PPC_RELEASE_BARRIER "" ::: "memory");
__clear_bit(nr, addr);
@@ -239,6 +239,9 @@ unsigned long __arch_hweight64(__u64 w);
 
 #include 
 
+/* wrappers that deal with KASAN instrumentation */
+#include 
+
 /* Little-endian versions */
 #include 
 
-- 
2.20.1



[PATCH 2/4] kasan: support instrumented bitops with generic non-atomic bitops

2019-08-06 Thread Daniel Axtens
Currently bitops-instrumented.h assumes that the architecture provides
both the atomic and non-atomic versions of the bitops (e.g. both
set_bit and __set_bit). This is true on x86, but not everywhere:
there is a generic bitops/non-atomic.h header that provides generic
non-atomic versions. powerpc uses this generic version, so it does
not have its own e.g. __set_bit that could be renamed arch___set_bit.

Rearrange bitops-instrumented.h. As operations in bitops/non-atomic.h
will already be instrumented (they use regular memory accesses), put
the instrumenting wrappers for them behind an ifdef. Only include
these instrumentation wrappers if non-atomic.h has not been included.

Signed-off-by: Daniel Axtens 
---
 include/asm-generic/bitops-instrumented.h | 144 --
 1 file changed, 76 insertions(+), 68 deletions(-)

diff --git a/include/asm-generic/bitops-instrumented.h 
b/include/asm-generic/bitops-instrumented.h
index ddd1c6d9d8db..2fe8f7e12a11 100644
--- a/include/asm-generic/bitops-instrumented.h
+++ b/include/asm-generic/bitops-instrumented.h
@@ -29,21 +29,6 @@ static inline void set_bit(long nr, volatile unsigned long 
*addr)
arch_set_bit(nr, addr);
 }
 
-/**
- * __set_bit - Set a bit in memory
- * @nr: the bit to set
- * @addr: the address to start counting from
- *
- * Unlike set_bit(), this function is non-atomic. If it is called on the same
- * region of memory concurrently, the effect may be that only one operation
- * succeeds.
- */
-static inline void __set_bit(long nr, volatile unsigned long *addr)
-{
-   kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
-   arch___set_bit(nr, addr);
-}
-
 /**
  * clear_bit - Clears a bit in memory
  * @nr: Bit to clear
@@ -57,21 +42,6 @@ static inline void clear_bit(long nr, volatile unsigned long 
*addr)
arch_clear_bit(nr, addr);
 }
 
-/**
- * __clear_bit - Clears a bit in memory
- * @nr: the bit to clear
- * @addr: the address to start counting from
- *
- * Unlike clear_bit(), this function is non-atomic. If it is called on the same
- * region of memory concurrently, the effect may be that only one operation
- * succeeds.
- */
-static inline void __clear_bit(long nr, volatile unsigned long *addr)
-{
-   kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
-   arch___clear_bit(nr, addr);
-}
-
 /**
  * clear_bit_unlock - Clear a bit in memory, for unlock
  * @nr: the bit to set
@@ -116,21 +86,6 @@ static inline void change_bit(long nr, volatile unsigned 
long *addr)
arch_change_bit(nr, addr);
 }
 
-/**
- * __change_bit - Toggle a bit in memory
- * @nr: the bit to change
- * @addr: the address to start counting from
- *
- * Unlike change_bit(), this function is non-atomic. If it is called on the 
same
- * region of memory concurrently, the effect may be that only one operation
- * succeeds.
- */
-static inline void __change_bit(long nr, volatile unsigned long *addr)
-{
-   kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
-   arch___change_bit(nr, addr);
-}
-
 /**
  * test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
@@ -144,20 +99,6 @@ static inline bool test_and_set_bit(long nr, volatile 
unsigned long *addr)
return arch_test_and_set_bit(nr, addr);
 }
 
-/**
- * __test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is non-atomic. If two instances of this operation race, one
- * can appear to succeed but actually fail.
- */
-static inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
-{
-   kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
-   return arch___test_and_set_bit(nr, addr);
-}
-
 /**
  * test_and_set_bit_lock - Set a bit and return its old value, for lock
  * @nr: Bit to set
@@ -187,30 +128,96 @@ static inline bool test_and_clear_bit(long nr, volatile 
unsigned long *addr)
 }
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to clear
+ * test_and_change_bit - Change a bit and return its old value
+ * @nr: Bit to change
+ * @addr: Address to count from
+ *
+ * This is an atomic fully-ordered operation (implied full memory barrier).
+ */
+static inline bool test_and_change_bit(long nr, volatile unsigned long *addr)
+{
+   kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
+   return arch_test_and_change_bit(nr, addr);
+}
+
+/*
+ * If the arch is using the generic non-atomic bit ops, they are already
+ * instrumented, and we don't need to create wrappers. Only wrap if we
+ * haven't included that header.
+ */
+#ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
+
+/**
+ * __set_bit - Set a bit in memory
+ * @nr: the bit to set
+ * @addr: the address to start counting from
+ *
+ * Unlike set_bit(), this function is non-atomic. If it is called on the same
+ * region of memory concurrently, the effect may be that only one operation
+ * succeeds.
+ */
+static inline void __set_bit(long nr, volatile unsigned 

[PATCH 1/4] kasan: allow arches to provide their own early shadow setup

2019-08-06 Thread Daniel Axtens
powerpc supports several different MMUs. In particular, book3s
machines support both a hash-table based MMU and a radix MMU.
These MMUs support different numbers of entries per directory
level: the PTRS_PER_* defines evaluate to variables, not constants.
This leads to compiler errors, as global variables must have constant
sizes.

Allow architectures to manage their own early shadow variables so we
can work around this on powerpc.

Signed-off-by: Daniel Axtens 

---
Changes from RFC:

 - To make checkpatch happy, move ARCH_HAS_KASAN_EARLY_SHADOW from
   a random #define to a config option selected when building for
   ppc64 book3s
---
 include/linux/kasan.h |  2 ++
 lib/Kconfig.kasan |  3 +++
 mm/kasan/init.c   | 10 ++
 3 files changed, 15 insertions(+)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index ec81113fcee4..15933da52a3e 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -14,11 +14,13 @@ struct task_struct;
 #include 
 #include 
 
+#ifndef CONFIG_ARCH_HAS_KASAN_EARLY_SHADOW
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
 extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE];
 extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
 extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
 extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
+#endif
 
 int kasan_populate_early_shadow(const void *shadow_start,
const void *shadow_end);
diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index a320dc2e9317..0621a0129c04 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -9,6 +9,9 @@ config HAVE_ARCH_KASAN_SW_TAGS
 config HAVE_ARCH_KASAN_VMALLOC
bool
 
+config ARCH_HAS_KASAN_EARLY_SHADOW
+   bool
+
 config CC_HAS_KASAN_GENERIC
def_bool $(cc-option, -fsanitize=kernel-address)
 
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index ce45c491ebcd..7ef2b87a7988 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -31,10 +31,14 @@
  *   - Latter it reused it as zero shadow to cover large ranges of memory
  * that allowed to access, but not handled by kasan (vmalloc/vmemmap ...).
  */
+#ifndef CONFIG_ARCH_HAS_KASAN_EARLY_SHADOW
 unsigned char kasan_early_shadow_page[PAGE_SIZE] __page_aligned_bss;
+#endif
 
 #if CONFIG_PGTABLE_LEVELS > 4
+#ifndef CONFIG_ARCH_HAS_KASAN_EARLY_SHADOW
 p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D] __page_aligned_bss;
+#endif
 static inline bool kasan_p4d_table(pgd_t pgd)
 {
return pgd_page(pgd) == virt_to_page(lm_alias(kasan_early_shadow_p4d));
@@ -46,7 +50,9 @@ static inline bool kasan_p4d_table(pgd_t pgd)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 3
+#ifndef CONFIG_ARCH_HAS_KASAN_EARLY_SHADOW
 pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
+#endif
 static inline bool kasan_pud_table(p4d_t p4d)
 {
return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
@@ -58,7 +64,9 @@ static inline bool kasan_pud_table(p4d_t p4d)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 2
+#ifndef CONFIG_ARCH_HAS_KASAN_EARLY_SHADOW
 pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
+#endif
 static inline bool kasan_pmd_table(pud_t pud)
 {
return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
@@ -69,7 +77,9 @@ static inline bool kasan_pmd_table(pud_t pud)
return false;
 }
 #endif
+#ifndef CONFIG_ARCH_HAS_KASAN_EARLY_SHADOW
 pte_t kasan_early_shadow_pte[PTRS_PER_PTE] __page_aligned_bss;
+#endif
 
 static inline bool kasan_pte_table(pmd_t pmd)
 {
-- 
2.20.1



[PATCH 0/4] powerpc: KASAN for 64-bit Book3S on Radix

2019-08-06 Thread Daniel Axtens
Building on the work of Christophe, Aneesh and Balbir, I've ported
KASAN to 64-bit Book3S kernels running on the Radix MMU.

It builds on top of Christophe's work on 32-bit. It also builds on my
generic KASAN_VMALLOC series, available at:
https://patchwork.kernel.org/project/linux-mm/list/?series=153209

This provides full inline instrumentation on radix, but does require
that you be able to specify the amount of memory on the system at
compile time. More details in patch 4.

Notable changes from the RFC:

 - I've dropped Book3E 64-bit for now.

 - Now instead of hacking into the KASAN core to disable module
   allocations, we use KASAN_VMALLOC.

 - More testing, including on real hardware. This revealed that
   discontiguous memory is a bit of a headache; at the moment we
   must disable memory that is not contiguous from 0.
   
 - Update to deal with kasan bitops instrumentation that landed
   between RFC and now.

 - Documentation!

 - Various cleanups and tweaks.

I am getting occasional problems on boot of real hardware where it
seems vmalloc space mappings don't get installed in time. (We get a
BUG that memory is not accessible, but by the time we hit xmon the
memory then is accessible!) It happens once every few boots. I haven't
yet been able to figure out what is happening and why. I'm going to
look in to it, but I think the patches are in good enough shape to
review while I work on it.

Regards,
Daniel

Daniel Axtens (4):
  kasan: allow arches to provide their own early shadow setup
  kasan: support instrumented bitops with generic non-atomic bitops
  powerpc: support KASAN instrumentation of bitops
  powerpc: Book3S 64-bit "heavyweight" KASAN support

 Documentation/dev-tools/kasan.rst|   7 +-
 Documentation/powerpc/kasan.txt  | 111 ++
 arch/powerpc/Kconfig |   4 +
 arch/powerpc/Kconfig.debug   |  21 +++
 arch/powerpc/Makefile|   7 +
 arch/powerpc/include/asm/bitops.h|  25 ++--
 arch/powerpc/include/asm/book3s/64/radix.h   |   5 +
 arch/powerpc/include/asm/kasan.h |  35 -
 arch/powerpc/kernel/process.c|   8 ++
 arch/powerpc/kernel/prom.c   |  57 +++-
 arch/powerpc/mm/kasan/Makefile   |   1 +
 arch/powerpc/mm/kasan/kasan_init_book3s_64.c |  76 ++
 include/asm-generic/bitops-instrumented.h| 144 ++-
 include/linux/kasan.h|   2 +
 lib/Kconfig.kasan|   3 +
 mm/kasan/init.c  |  10 ++
 16 files changed, 431 insertions(+), 85 deletions(-)
 create mode 100644 Documentation/powerpc/kasan.txt
 create mode 100644 arch/powerpc/mm/kasan/kasan_init_book3s_64.c

-- 
2.20.1



Re: [PATCH net-next v2] ibmveth: Allow users to update reported speed and duplex

2019-08-06 Thread Jakub Kicinski
On Tue,  6 Aug 2019 11:23:08 -0500, Thomas Falcon wrote:
> Reported ethtool link settings for the ibmveth driver are currently
> hardcoded and no longer reflect the actual capabilities of supported
> hardware. There is no interface designed for retrieving this information
> from device firmware nor is there any way to update current settings
> to reflect observed or expected link speeds.
> 
> To avoid breaking existing configurations, retain current values as
> default settings but let users update them to match the expected
> capabilities of underlying hardware if needed. This update would
> allow the use of configurations that rely on certain link speed
> settings, such as LACP. This patch is based on the implementation
> in virtio_net.
> 
> Signed-off-by: Thomas Falcon 

Looks like this is the third copy of the same code virtio and
netvsc have :(  Is there a chance we could factor this out into
helpers in the core?


[PATCH AUTOSEL 5.2 21/59] powerpc/nvdimm: Pick nearby online node if the device node is not online

2019-08-06 Thread Sasha Levin
From: "Aneesh Kumar K.V" 

[ Upstream commit da1115fdbd6e86c62185cdd2b4bf7add39f2f82b ]

Currently, the nvdimm subsystem expects the device numa node for an SCM device
to be an online node. It also doesn't try to bring the device numa node online.
Hence, if we use a non-online numa node as the device node, we hit crashes like
the one below. This is because we try to access uninitialized NODE_DATA in
different code paths.

cpu 0x0: Vector: 300 (Data Access) at [c000fac53170]
pc: c04bbc50: ___slab_alloc+0x120/0xca0
lr: c04bc834: __slab_alloc+0x64/0xc0
sp: c000fac53400
   msr: 82009033
   dar: 73e8
 dsisr: 8
  current = 0xc000fabb6d80
  paca= 0xc387   irqmask: 0x03   irq_happened: 0x01
pid   = 7, comm = kworker/u16:0
Linux version 5.2.0-06234-g76bd729b2644 (kvaneesh@ltc-boston123) (gcc version 
7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #135 SMP Thu Jul 11 05:36:30 CDT 2019
enter ? for help
[link register   ] c04bc834 __slab_alloc+0x64/0xc0
[c000fac53400] c000fac53480 (unreliable)
[c000fac53500] c04bc818 __slab_alloc+0x48/0xc0
[c000fac53560] c04c30a0 __kmalloc_node_track_caller+0x3c0/0x6b0
[c000fac535d0] c0cfafe4 devm_kmalloc+0x74/0xc0
[c000fac53600] c0d69434 nd_region_activate+0x144/0x560
[c000fac536d0] c0d6b19c nd_region_probe+0x17c/0x370
[c000fac537b0] c0d6349c nvdimm_bus_probe+0x10c/0x230
[c000fac53840] c0cf3cc4 really_probe+0x254/0x4e0
[c000fac538d0] c0cf429c driver_probe_device+0x16c/0x1e0
[c000fac53950] c0cf0b44 bus_for_each_drv+0x94/0x130
[c000fac539b0] c0cf392c __device_attach+0xdc/0x200
[c000fac53a50] c0cf231c bus_probe_device+0x4c/0xf0
[c000fac53a90] c0ced268 device_add+0x528/0x810
[c000fac53b60] c0d62a58 nd_async_device_register+0x28/0xa0
[c000fac53bd0] c01ccb8c async_run_entry_fn+0xcc/0x1f0
[c000fac53c50] c01bcd9c process_one_work+0x46c/0x860
[c000fac53d20] c01bd4f4 worker_thread+0x364/0x5f0
[c000fac53db0] c01c7260 kthread+0x1b0/0x1c0
[c000fac53e20] c000b954 ret_from_kernel_thread+0x5c/0x68

The patch tries to fix this by picking the nearest online node as the SCM node.
This does have a problem of us losing the information that SCM node is
equidistant from two other online nodes. If applications need to understand these
fine-grained details we should express them like x86 does via
/sys/devices/system/node/nodeX/accessY/initiators/

With the patch we get

 # numactl -H
available: 2 nodes (0-1)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
25 26 27 28 29 30 31
node 1 size: 130865 MB
node 1 free: 129130 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
 # cat /sys/bus/nd/devices/region0/numa_node
0
 # dmesg | grep papr_scm
[   91.332305] papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Region registered with target node 2 and online node 0

Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20190729095128.23707-1-aneesh.ku...@linux.ibm.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 29 +--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 96c53b23e58f9..30af5019a68fe 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -194,12 +194,32 @@ static const struct attribute_group *papr_scm_dimm_groups[] = {
NULL,
 };
 
+static inline int papr_scm_node(int node)
+{
+   int min_dist = INT_MAX, dist;
+   int nid, min_node;
+
+   if ((node == NUMA_NO_NODE) || node_online(node))
+   return node;
+
+   min_node = first_online_node;
+   for_each_online_node(nid) {
+   dist = node_distance(node, nid);
+   if (dist < min_dist) {
+   min_dist = dist;
+   min_node = nid;
+   }
+   }
+   return min_node;
+}
+
 static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 {
	struct device *dev = &p->pdev->dev;
struct nd_mapping_desc mapping;
struct nd_region_desc ndr_desc;
unsigned long dimm_flags;
+   int target_nid, online_nid;
 
p->bus_desc.ndctl = papr_scm_ndctl;
p->bus_desc.module = THIS_MODULE;
@@ -238,8 +258,10 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 
	memset(&ndr_desc, 0, sizeof(ndr_desc));
ndr_desc.attr_groups = region_attr_groups;
-   ndr_desc.numa_node = dev_to_node(&p->pdev->dev);
-   ndr_desc.target_node = ndr_desc.numa_node;
+   target_nid = dev_to_node(>pdev->dev);
+   online_nid = papr_scm_node(target_nid);
+   ndr_desc.numa_node = online_nid;
+   

[Bug 204371] BUG kmalloc-4k (Tainted: G W ): Object padding overwritten

2019-08-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204371

--- Comment #8 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 284243
  --> https://bugzilla.kernel.org/attachment.cgi?id=284243&action=edit
kernel .config (PowerMac G5 11,2, kernel 5.3-rc3)

-- 
You are receiving this mail because:
You are on the CC list for the bug.

[Bug 204371] BUG kmalloc-4k (Tainted: G W ): Object padding overwritten

2019-08-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204371

--- Comment #7 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 284241
  --> https://bugzilla.kernel.org/attachment.cgi?id=284241&action=edit
dmesg (PowerMac G5 11,2, kernel 5.3-rc3)


[Bug 204371] BUG kmalloc-4k (Tainted: G W ): Object padding overwritten

2019-08-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204371

--- Comment #6 from Erhard F. (erhar...@mailbox.org) ---
On Wed, 31 Jul 2019 12:09:54 +
bugzilla-dae...@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=204371
> 
> --- Comment #4 from m...@ellerman.id.au ---
> bugzilla-dae...@bugzilla.kernel.org writes:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=204371
> >
> > --- Comment #2 from Andrew Morton (a...@linux-foundation.org) ---
> > (switched to email.  Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> >
> >
> > On Mon, 29 Jul 2019 22:35:48 + bugzilla-dae...@bugzilla.kernel.org
> wrote:
> >  
> >> https://bugzilla.kernel.org/show_bug.cgi?id=204371
> >> 
> >> Bug ID: 204371
> >>Summary: BUG kmalloc-4k (Tainted: GW): Object
> >> padding overwritten
> >>Product: Memory Management
> >>Version: 2.5
> >> Kernel Version: 5.3.0-rc2
> >>   Hardware: PPC-32
> >> OS: Linux
> >>   Tree: Mainline
> >> Status: NEW
> >>   Severity: normal
> >>   Priority: P1
> >>  Component: Slab Allocator
> >>   Assignee: a...@linux-foundation.org
> >>   Reporter: erhar...@mailbox.org
> >> Regression: No  
> >
> > cc'ing various people here.
> >
> > I suspect proc_cgroup_show() is innocent and that perhaps
> > bpf_prepare_filter() had a memory scribble.  iirc there has been at
> > least one recent pretty serious bpf fix applied recently.  Can others
> > please take a look?  
> 
> I haven't been able to reproduce this on a 64-bit or 32-bit powerpc
> machine here. But I don't run gentoo userspace, so I suspect I'm not
> tripping the same path at boot. I did run the seccomp selftest and that
> didn't trip it either.

Had the time to test this on my G5 11,2. It's kernel 5.3-rc3 now, also booting
from a zstd:1 compressed btrfs partition. Without SLUB_DEBUG_ON selected in the
kernel, the machine boots seemingly fine, with SLUB_DEBUG_ON I get this:

[...]
Aug 06 22:26:35 T800 kernel: BTRFS info (device sda7): use zstd compression,
level 1
Aug 06 22:26:35 T800 kernel: BTRFS info (device sda7): disk space caching is
enabled
Aug 06 22:26:38 T800 kernel:
=
Aug 06 22:26:38 T800 kernel: BUG kmalloc-4k (Tainted: GW):
Object padding overwritten
Aug 06 22:26:38 T800 kernel:
-
Aug 06 22:26:38 T800 kernel: INFO: 0x62cd4309-0x4edab9d1. First
byte 0x0 instead of 0x5a
Aug 06 22:26:38 T800 kernel: INFO: Slab 0x70aa589a objects=7 used=7
fp=0x16708aa5 flags=0x7fe0010200
Aug 06 22:26:38 T800 kernel: INFO: Object 0x7ed48057 @offset=17736
fp=0xb4be3601
Aug 06 22:26:38 T800 kernel: Redzone f5b164d9: bb bb bb bb bb bb bb bb 

Aug 06 22:26:38 T800 kernel: Object 7ed48057: 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b 6b 6b 6b 6b 6b  
[...]
Aug 06 22:26:38 T800 kernel: Redzone bd6d4c8f: bb bb bb bb bb bb bb bb 

Aug 06 22:26:38 T800 kernel: Padding 62cd4309: 00 00 00 00 00 00 00 00 

Aug 06 22:26:38 T800 kernel: CPU: 0 PID: 118 Comm: systemd-journal Tainted: G  
 B   W 5.3.0-rc3 #5
Aug 06 22:26:38 T800 kernel: Call Trace:
Aug 06 22:26:38 T800 kernel: [c0045baa72a0] [c09e1a74]
.dump_stack+0xe0/0x15c (unreliable)
Aug 06 22:26:38 T800 kernel: [c0045baa7340] [c02d4640]
.print_trailer+0x228/0x250
Aug 06 22:26:38 T800 kernel: [c0045baa73e0] [c02c81f8]
.check_bytes_and_report+0x118/0x140
Aug 06 22:26:38 T800 kernel: [c0045baa7490] [c02ca9fc]
.check_object+0xcc/0x3a0
Aug 06 22:26:38 T800 kernel: [c0045baa7540] [c02cc6b8]
.alloc_debug_processing+0x158/0x210
Aug 06 22:26:38 T800 kernel: [c0045baa75d0] [c02cce28]
.___slab_alloc+0x6b8/0x860
Aug 06 22:26:38 T800 kernel: [c0045baa7710] [c02cd024]
.__slab_alloc+0x54/0xc0
Aug 06 22:26:38 T800 kernel: [c0045baa7790] [c02cd854]
.kmem_cache_alloc_trace+0x3b4/0x410
Aug 06 22:26:38 T800 kernel: [c0045baa7840] [c04b9928]
.alloc_log_tree+0x38/0x140
Aug 06 22:26:38 T800 kernel: [c0045baa78d0] [c04b9ad0]
.btrfs_add_log_tree+0x30/0x130
Aug 06 22:26:38 T800 kernel: [c0045baa7960] [c0525624]
.btrfs_log_inode_parent+0x4a4/0xeb0
Aug 06 22:26:38 T800 kernel: [c0045baa7ae0] [c052737c]
.btrfs_log_dentry_safe+0x6c/0xb0
Aug 06 22:26:38 T800 kernel: [c0045baa7b80] [c04e1e3c]
.btrfs_sync_file+0x1ec/0x570
Aug 06 22:26:38 T800 kernel: [c0045baa7c90] [c0355ac4]
.vfs_fsync_range+0x64/0xe0
Aug 06 22:26:38 T800 kernel: [c0045baa7d20] [c0355ba8]
.do_fsync+0x48/0xc0
Aug 06 

Re: [PATCH 1/2] dma-mapping: fix page attributes for dma_mmap_*

2019-08-06 Thread Shawn Anastasio

On 8/5/19 10:01 AM, Christoph Hellwig wrote:

diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index 3813211a9aad..9ae5cee543c4 100644
--- a/include/linux/dma-noncoherent.h
+++ b/include/linux/dma-noncoherent.h
@@ -42,13 +42,8 @@ void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
dma_addr_t dma_addr, unsigned long attrs);
  long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr,
dma_addr_t dma_addr);
-
-#ifdef CONFIG_ARCH_HAS_DMA_MMAP_PGPROT
  pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
unsigned long attrs);
-#else
-# define arch_dma_mmap_pgprot(dev, prot, attrs)	pgprot_noncached(prot)
-#endif


Nit, but maybe the prototype should still be ifdef'd here? It at least
could prevent a reader from incorrectly thinking that the function is
always present.

Also, like Will mentioned earlier, the function name isn't entirely
accurate anymore. I second the suggestion of using something like
arch_dma_noncoherent_pgprot(). As for your idea of defining
pgprot_dmacoherent for all architectures as

#ifndef pgprot_dmacoherent
#define pgprot_dmacoherent pgprot_noncached
#endif

I think that the name here is kind of misleading too, since this
definition will only be used when there is no support for proper
DMA coherency.



[PATCH] KVM: PPC: Book3S HV: XIVE: Free escalation interrupts before disabling the VP

2019-08-06 Thread Cédric Le Goater
When a vCPU is brought down, the XIVE VP is first disabled and then
the event notification queues are freed. When freeing the queues, we
check for possible escalation interrupts and free them also.

But when a XIVE VP is disabled, the underlying XIVE ENDs also are
disabled in OPAL. When an END is disabled, its ESB pages (ESn and ESe)
are disabled and loads return all 1s. Which means that any access on
the ESB page of the escalation interrupt will return invalid values.

When an interrupt is freed, the shutdown handler computes a 'saved_p'
field from the value returned by a load in xive_do_source_set_mask().
This value is incorrect for escalation interrupts for the reason
described above.

This has no impact on Linux/KVM today because we don't make use of it,
but future changes will introduce a xive_get_irqchip_state() handler.
This handler will use the 'saved_p' field to return the state of an
interrupt, and with 'saved_p' being incorrect, a softlockup will occur.

Fix the vCPU cleanup sequence by first freeing the escalation
interrupts if any, then disabling the XIVE VP, and finally freeing the queues.

Signed-off-by: Cédric Le Goater 
---
 arch/powerpc/kvm/book3s_xive.c| 18 ++
 arch/powerpc/kvm/book3s_xive_native.c | 12 +++-
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index e3ba67095895..09f838aa3138 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1134,20 +1134,22 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
/* Mask the VP IPI */
	xive_vm_esb_load(&xc->vp_ipi_data, XIVE_ESB_SET_PQ_01);
 
-   /* Disable the VP */
-   xive_native_disable_vp(xc->vp_id);
-
-   /* Free the queues & associated interrupts */
+   /* Free escalations */
for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
-   struct xive_q *q = &xc->queues[i];
-
-   /* Free the escalation irq */
if (xc->esc_virq[i]) {
free_irq(xc->esc_virq[i], vcpu);
irq_dispose_mapping(xc->esc_virq[i]);
kfree(xc->esc_virq_names[i]);
}
-   /* Free the queue */
+   }
+
+   /* Disable the VP */
+   xive_native_disable_vp(xc->vp_id);
+
+   /* Free the queues */
+   for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
+   struct xive_q *q = &xc->queues[i];
+
xive_native_disable_queue(xc->vp_id, q, i);
if (q->qpage) {
free_pages((unsigned long)q->qpage,
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index a998823f68a3..368427fcad20 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -67,10 +67,7 @@ void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu)
xc->valid = false;
kvmppc_xive_disable_vcpu_interrupts(vcpu);
 
-   /* Disable the VP */
-   xive_native_disable_vp(xc->vp_id);
-
-   /* Free the queues & associated interrupts */
+   /* Free escalations */
for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
/* Free the escalation irq */
if (xc->esc_virq[i]) {
@@ -79,8 +76,13 @@ void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu)
kfree(xc->esc_virq_names[i]);
xc->esc_virq[i] = 0;
}
+   }
 
-   /* Free the queue */
+   /* Disable the VP */
+   xive_native_disable_vp(xc->vp_id);
+
+   /* Free the queues */
+   for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
kvmppc_xive_native_cleanup_queue(vcpu, i);
}
 
-- 
2.21.0



Re: [PATCH] dma-mapping: fix page attributes for dma_mmap_*

2019-08-06 Thread Russell King - ARM Linux admin
On Tue, Aug 06, 2019 at 05:08:54PM +0100, Will Deacon wrote:
> On Sat, Aug 03, 2019 at 08:48:12AM +0200, Christoph Hellwig wrote:
> > On Fri, Aug 02, 2019 at 11:38:03AM +0100, Will Deacon wrote:
> > > 
> > > So this boils down to a terminology mismatch. The Arm architecture 
> > > doesn't have
> > > anything called "write combine", so in Linux we instead provide what the 
> > > Arm
> > > architecture calls "Normal non-cacheable" memory for 
> > > pgprot_writecombine().
> > > Amongst other things, this memory type permits speculation, unaligned 
> > > accesses
> > > and merging of writes. I found something in the architecture spec about
> > > non-cachable memory, but it's written in Armglish[1].
> > > 
> > > pgprot_noncached(), on the other hand, provides what the architecture 
> > > calls
> > > Strongly Ordered or Device-nGnRnE memory. This is intended for mapping 
> > > MMIO
> > > (i.e. PCI config space) and therefore forbids speculation, preserves 
> > > access
> > > size, requires strict alignment and also forces write responses to come 
> > > from
> > > the endpoint.
> > > 
> > > I think the naming mismatch is historical, but on arm64 we wanted to use 
> > > the
> > > same names as arm32 so that any drivers using these things directly would 
> > > get
> > > the same behaviour.
> > 
> > That all makes sense, but it totally needs a comment.  I'll try to draft
> > one based on this.  I've also looked at the arm32 code a bit more, and
> > it seems arm always (?) supported Normal non-cacheable attribute, but
> > Linux only optionally uses it for arm v6+ because of fears of drivers
> > missing barriers.
> 
> I think it was also to do with aliasing, but I don't recall all of the
> details.

ARMv6+ is where the architecture significantly changed to introduce
the idea of [Normal, Device, Strongly Ordered] where Normal has the
cache attributes.

Before that, we had just "uncached/unbuffered, uncached/buffered,
cached/unbuffered, cached/buffered" modes.

The write buffer (enabled by buffered modes) has no architected
guarantees about how long writes will sit in it, and there is only
the "drain write buffer" instruction to push writes out.

Up to and including ARMv5, we took the easy approach of just using
the "uncached/unbuffered" mode since that is (a) the safest, and (b)
avoids write buffers that alias when there are multiple different
mappings.

We could have used a different approach, making all IO writes contain
a "drain write buffer" instruction, and map DMA memory as "buffered",
but as there were no Linux barriers defined to order memory accesses
to DMA memory (so, for example, ring buffers can be updated in the
correct order) back in those days, using the uncached/unbuffered mode
was the sanest and most reliable solution.

> 
> > The other really weird thing is that in arm32
> > pgprot_dmacoherent includes the L_PTE_XN bit, which from my understanding
> > is the no-execute bit, but pgprot_writecombine does not.  This seems
> > not very intentional.  So minus that the whole DMA_ATTR_WRITE_COMBINE
> > seems to be about flagging old arm specific drivers as having the proper
> > barriers in place and otherwise is a no-op.
> 
> I think it only matters for Armv7 CPUs, but yes, we should probably be
> setting L_PTE_XN for both of these memory types.

Conventionally, pgprot_writecombine() has only been used to change
the memory type and not the permissions.  Since writecombine memory
is still capable of being executed, I don't see any reason to set XN
for it.

If the user wishes to mmap() using PROT_READ|PROT_EXEC, then is there
really a reason for writecombine to set XN overriding the user?

That said, pgprot_writecombine() is mostly used for framebuffers, which
arguably shouldn't be executable anyway - but who'd want to mmap() the
framebuffer with PROT_EXEC?

> 
> > Here is my tentative plan:
> > 
> >  - respin this patch with a small fix to handle the
> >DMA_ATTR_NON_CONSISTENT (as in ignore it unless actually supported),
> >but keep the name as-is to avoid churn.  This should allow 5.3
> >inclusion and backports
> >  - remove DMA_ATTR_WRITE_COMBINE support from mips, probably also 5.3
> >material.
> >  - move all architectures but arm over to just define
> >pgprot_dmacoherent, including a comment with the above explanation
> >for arm64.
> 
> That would be great, thanks.
> 
> >  - make DMA_ATTR_WRITE_COMBINE a no-op and schedule it for removal,
> >thus removing the last instances of arch_dma_mmap_pgprot
> 
> All sounds good to me, although I suppose 32-bit Arm platforms without
> CONFIG_ARM_DMA_MEM_BUFFERABLE may run into issues if DMA_ATTR_WRITE_COMBINE
> disappears. Only one way to find out...

Looking at the results of grep, I think only OMAP2+ and Exynos may be
affected.

However, removing writecombine support from the DMA API is going to
have a huge impact for framebuffers on earlier ARMs - that's where we
do expect framebuffers to be mapped "uncached/buffered" for 

Re: [PATCH] dma-mapping: fix page attributes for dma_mmap_*

2019-08-06 Thread Russell King - ARM Linux admin
On Tue, Aug 06, 2019 at 05:45:03PM +0100, Russell King - ARM Linux admin wrote:
> On Tue, Aug 06, 2019 at 05:08:54PM +0100, Will Deacon wrote:
> > On Sat, Aug 03, 2019 at 08:48:12AM +0200, Christoph Hellwig wrote:
> > > On Fri, Aug 02, 2019 at 11:38:03AM +0100, Will Deacon wrote:
> > > > 
> > > > So this boils down to a terminology mismatch. The Arm architecture 
> > > > doesn't have
> > > > anything called "write combine", so in Linux we instead provide what 
> > > > the Arm
> > > > architecture calls "Normal non-cacheable" memory for 
> > > > pgprot_writecombine().
> > > > Amongst other things, this memory type permits speculation, unaligned 
> > > > accesses
> > > > and merging of writes. I found something in the architecture spec about
> > > > non-cachable memory, but it's written in Armglish[1].
> > > > 
> > > > pgprot_noncached(), on the other hand, provides what the architecture 
> > > > calls
> > > > Strongly Ordered or Device-nGnRnE memory. This is intended for mapping 
> > > > MMIO
> > > > (i.e. PCI config space) and therefore forbids speculation, preserves 
> > > > access
> > > > size, requires strict alignment and also forces write responses to come 
> > > > from
> > > > the endpoint.
> > > > 
> > > > I think the naming mismatch is historical, but on arm64 we wanted to 
> > > > use the
> > > > same names as arm32 so that any drivers using these things directly 
> > > > would get
> > > > the same behaviour.
> > > 
> > > That all makes sense, but it totally needs a comment.  I'll try to draft
> > > one based on this.  I've also looked at the arm32 code a bit more, and
> > > it seems arm always (?) supported Normal non-cacheable attribute, but
> > > Linux only optionally uses it for arm v6+ because of fears of drivers
> > > missing barriers.
> > 
> > I think it was also to do with aliasing, but I don't recall all of the
> > details.
> 
> ARMv6+ is where the architecture significantly changed to introduce
> the idea of [Normal, Device, Strongly Ordered] where Normal has the
> cache attributes.
> 
> Before that, we had just "uncached/unbuffered, uncached/buffered,
> cached/unbuffered, cached/buffered" modes.
> 
> The write buffer (enabled by buffered modes) has no architected
> guarantees about how long writes will sit in it, and there is only
> the "drain write buffer" instruction to push writes out.
> 
> Up to and including ARMv5, we took the easy approach of just using
> the "uncached/unbuffered" mode since that is (a) the safest, and (b)
> avoids write buffers that alias when there are multiple different
> mappings.
> 
> We could have used a different approach, making all IO writes contain
> a "drain write buffer" instruction, and map DMA memory as "buffered",
> but as there were no Linux barriers defined to order memory accesses
> to DMA memory (so, for example, ring buffers can be updated in the
> correct order) back in those days, using the uncached/unbuffered mode
> was the sanest and most reliable solution.
> 
> > 
> > > The other really weird thing is that in arm32
> > > pgprot_dmacoherent includes the L_PTE_XN bit, which from my understanding
> > > is the no-execute bit, but pgprot_writecombine does not.  This seems
> > > not very intentional.  So minus that the whole DMA_ATTR_WRITE_COMBINE
> > > seems to be about flagging old arm specific drivers as having the proper
> > > barriers in place and otherwise is a no-op.
> > 
> > I think it only matters for Armv7 CPUs, but yes, we should probably be
> > setting L_PTE_XN for both of these memory types.
> 
> Conventionally, pgprot_writecombine() has only been used to change
> the memory type and not the permissions.  Since writecombine memory
> is still capable of being executed, I don't see any reason to set XN
> for it.
> 
> If the user wishes to mmap() using PROT_READ|PROT_EXEC, then is there
> really a reason for writecombine to set XN overriding the user?
> 
> That said, pgprot_writecombine() is mostly used for framebuffers, which
> arguably shouldn't be executable anyway - but who'd want to mmap() the
> framebuffer with PROT_EXEC?
> 
> > 
> > > Here is my tentative plan:
> > > 
> > >  - respin this patch with a small fix to handle the
> > >DMA_ATTR_NON_CONSISTENT (as in ignore it unless actually supported),
> > >but keep the name as-is to avoid churn.  This should allow 5.3
> > >inclusion and backports
> > >  - remove DMA_ATTR_WRITE_COMBINE support from mips, probably also 5.3
> > >material.
> > >  - move all architectures but arm over to just define
> > >pgprot_dmacoherent, including a comment with the above explanation
> > >for arm64.
> > 
> > That would be great, thanks.
> > 
> > >  - make DMA_ATTR_WRITE_COMBINE a no-op and schedule it for removal,
> > >thus removing the last instances of arch_dma_mmap_pgprot
> > 
> > All sounds good to me, although I suppose 32-bit Arm platforms without
> > CONFIG_ARM_DMA_MEM_BUFFERABLE may run into issues if DMA_ATTR_WRITE_COMBINE
> > disappears. Only one way to 

Re: [PATCH v2 3/5] x86: Kconfig: Remove CONFIG_NODES_SPAN_OTHER_NODES

2019-08-06 Thread Hoan Tran OS
Hi Thomas,


On 7/15/19 11:43 AM, Thomas Gleixner wrote:
> On Thu, 11 Jul 2019, Hoan Tran OS wrote:
> 
>> Remove CONFIG_NODES_SPAN_OTHER_NODES as it's enabled
>> by default with NUMA.
> 
> As I told you before this does not mention that the option is now enabled
> even for x86(32bit) configurations which did not enable it before and does
> not longer depend on X86_64_ACPI_NUMA.

Agreed, let me add it into this patch description.

> 
> And there is still no rationale why this makes sense.
> 

The memmap_init_zone() function is used to initialize all pages. During
initialization, the early_pfn_in_nid() function makes sure each page
belongs to the expected node id. Without it, memmap_init_zone() only
checks page validity, which does not work when a node's memory spans
across other nodes.

The option CONFIG_NODES_SPAN_OTHER_NODES is only used to enable the
early_pfn_in_nid() function.

The check occurs during boot time and won't affect run-time performance,
and the majority of NUMA architectures already enable this option by
default with NUMA.

Thanks and Regards
Hoan


> Thanks,
> 
>   tglx
> 


[PATCH net-next v2] ibmveth: Allow users to update reported speed and duplex

2019-08-06 Thread Thomas Falcon
Reported ethtool link settings for the ibmveth driver are currently
hardcoded and no longer reflect the actual capabilities of supported
hardware. There is no interface designed for retrieving this information
from device firmware nor is there any way to update current settings
to reflect observed or expected link speeds.

To avoid breaking existing configurations, retain current values as
default settings but let users update them to match the expected
capabilities of underlying hardware if needed. This update would
allow the use of configurations that rely on certain link speed
settings, such as LACP. This patch is based on the implementation
in virtio_net.

Signed-off-by: Thomas Falcon 
---
v2: Updated default driver speed/duplex settings to avoid
breaking existing setups
---
 drivers/net/ethernet/ibm/ibmveth.c | 83 --
 drivers/net/ethernet/ibm/ibmveth.h |  3 ++
 2 files changed, 64 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index d654c23..5dc634f 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -712,31 +712,68 @@ static int ibmveth_close(struct net_device *netdev)
return 0;
 }
 
-static int netdev_get_link_ksettings(struct net_device *dev,
-struct ethtool_link_ksettings *cmd)
+static bool
+ibmveth_validate_ethtool_cmd(const struct ethtool_link_ksettings *cmd)
 {
-   u32 supported, advertising;
-
-   supported = (SUPPORTED_1000baseT_Full | SUPPORTED_Autoneg |
-   SUPPORTED_FIBRE);
-   advertising = (ADVERTISED_1000baseT_Full | ADVERTISED_Autoneg |
-   ADVERTISED_FIBRE);
-   cmd->base.speed = SPEED_1000;
-   cmd->base.duplex = DUPLEX_FULL;
-   cmd->base.port = PORT_FIBRE;
-   cmd->base.phy_address = 0;
-   cmd->base.autoneg = AUTONEG_ENABLE;
-
-   ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported,
-   supported);
-   ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.advertising,
-   advertising);
+   struct ethtool_link_ksettings diff1 = *cmd;
+   struct ethtool_link_ksettings diff2 = {};
+
+   diff2.base.port = PORT_OTHER;
+   diff1.base.speed = 0;
+   diff1.base.duplex = 0;
+   diff1.base.cmd = 0;
+   diff1.base.link_mode_masks_nwords = 0;
+   ethtool_link_ksettings_zero_link_mode(&diff1, advertising);
+
+   return !memcmp(&diff1.base, &diff2.base, sizeof(diff1.base)) &&
+   bitmap_empty(diff1.link_modes.supported,
+__ETHTOOL_LINK_MODE_MASK_NBITS) &&
+   bitmap_empty(diff1.link_modes.advertising,
+__ETHTOOL_LINK_MODE_MASK_NBITS) &&
+   bitmap_empty(diff1.link_modes.lp_advertising,
+__ETHTOOL_LINK_MODE_MASK_NBITS);
+}
+
+static int ibmveth_set_link_ksettings(struct net_device *dev,
+ const struct ethtool_link_ksettings *cmd)
+{
+   struct ibmveth_adapter *adapter = netdev_priv(dev);
+   u32 speed;
+   u8 duplex;
+
+   speed = cmd->base.speed;
+   duplex = cmd->base.duplex;
+   /* don't allow custom speed and duplex */
+   if (!ethtool_validate_speed(speed) ||
+   !ethtool_validate_duplex(duplex) ||
+   !ibmveth_validate_ethtool_cmd(cmd))
+   return -EINVAL;
+   adapter->speed = speed;
+   adapter->duplex = duplex;
 
return 0;
 }
 
-static void netdev_get_drvinfo(struct net_device *dev,
-  struct ethtool_drvinfo *info)
+static int ibmveth_get_link_ksettings(struct net_device *dev,
+ struct ethtool_link_ksettings *cmd)
+{
+   struct ibmveth_adapter *adapter = netdev_priv(dev);
+
+   cmd->base.speed = adapter->speed;
+   cmd->base.duplex = adapter->duplex;
+   cmd->base.port = PORT_OTHER;
+
+   return 0;
+}
+
+static void ibmveth_init_link_settings(struct ibmveth_adapter *adapter)
+{
+   adapter->duplex = DUPLEX_FULL;
+   adapter->speed = SPEED_1000;
+}
+
+static void ibmveth_get_drvinfo(struct net_device *dev,
+   struct ethtool_drvinfo *info)
 {
strlcpy(info->driver, ibmveth_driver_name, sizeof(info->driver));
strlcpy(info->version, ibmveth_driver_version, sizeof(info->version));
@@ -965,12 +1002,13 @@ static void ibmveth_get_ethtool_stats(struct net_device *dev,
 }
 
 static const struct ethtool_ops netdev_ethtool_ops = {
-   .get_drvinfo= netdev_get_drvinfo,
+   .get_drvinfo= ibmveth_get_drvinfo,
.get_link   = ethtool_op_get_link,
.get_strings= ibmveth_get_strings,
.get_sset_count = ibmveth_get_sset_count,
.get_ethtool_stats  = 

Re: [PATCH] dma-mapping: fix page attributes for dma_mmap_*

2019-08-06 Thread Will Deacon
On Sat, Aug 03, 2019 at 08:48:12AM +0200, Christoph Hellwig wrote:
> On Fri, Aug 02, 2019 at 11:38:03AM +0100, Will Deacon wrote:
> > 
> > So this boils down to a terminology mismatch. The Arm architecture doesn't 
> > have
> > anything called "write combine", so in Linux we instead provide what the Arm
> > architecture calls "Normal non-cacheable" memory for pgprot_writecombine().
> > Amongst other things, this memory type permits speculation, unaligned 
> > accesses
> > and merging of writes. I found something in the architecture spec about
> > non-cachable memory, but it's written in Armglish[1].
> > 
> > pgprot_noncached(), on the other hand, provides what the architecture calls
> > Strongly Ordered or Device-nGnRnE memory. This is intended for mapping MMIO
> > (i.e. PCI config space) and therefore forbids speculation, preserves access
> > size, requires strict alignment and also forces write responses to come from
> > the endpoint.
> > 
> > I think the naming mismatch is historical, but on arm64 we wanted to use the
> > same names as arm32 so that any drivers using these things directly would 
> > get
> > the same behaviour.
> 
> That all makes sense, but it totally needs a comment.  I'll try to draft
> one based on this.  I've also looked at the arm32 code a bit more, and
> it seems arm always (?) supported Normal non-cacheable attribute, but
> Linux only optionally uses it for arm v6+ because of fears of drivers
> missing barriers.

I think it was also to do with aliasing, but I don't recall all of the
details.

> The other really weird thing is that in arm32
> pgprot_dmacoherent includes the L_PTE_XN bit, which from my understanding
> is the no-execute bit, but pgprot_writecombine does not.  This seems
> not very intentional.  So minus that the whole DMA_ATTR_WRITE_COMBINE
> seems to be about flagging old arm specific drivers as having the proper
> barriers in place and otherwise is a no-op.

I think it only matters for Armv7 CPUs, but yes, we should probably be
setting L_PTE_XN for both of these memory types.

> Here is my tentative plan:
> 
>  - respin this patch with a small fix to handle the
>DMA_ATTR_NON_CONSISTENT (as in ignore it unless actually supported),
>but keep the name as-is to avoid churn.  This should allow 5.3
>inclusion and backports
>  - remove DMA_ATTR_WRITE_COMBINE support from mips, probably also 5.3
>material.
>  - move all architectures but arm over to just define
>pgprot_dmacoherent, including a comment with the above explanation
>for arm64.

That would be great, thanks.

>  - make DMA_ATTR_WRITE_COMBINE a no-op and schedule it for removal,
>thus removing the last instances of arch_dma_mmap_pgprot

All sounds good to me, although I suppose 32-bit Arm platforms without
CONFIG_ARM_DMA_MEM_BUFFERABLE may run into issues if DMA_ATTR_WRITE_COMBINE
disappears. Only one way to find out...

Will


Re: [PATCH v2 3/3] powerpc/spinlocks: Fix oops in shared-processor spinlocks

2019-08-06 Thread Segher Boessenkool
On Tue, Aug 06, 2019 at 10:14:27PM +1000, Michael Ellerman wrote:
> Christopher M Riedl  writes:
> > Yep, and that's no good. Hmm, executing the barrier() in the 
> > non-shared-processor
> > case probably hurts performance here?
> 
> It's only a "compiler barrier", so it shouldn't generate any code.
> 
> But it does have the effect of telling the compiler it can't optimise
> across that barrier, which can be important.

This is

#define barrier() __asm__ __volatile__("": : :"memory")

It doesn't tell the compiler "not to optimise" across the barrier.  It
tells the compiler that all memory accesses before the barrier should
stay before it, and all accesses after the barrier should stay after it,
because it says the "barrier" can access and/or change any memory.

This does not tell the hardware not to move those accesses around.  It
also doesn't say anything about things that are not in memory.  Not
everything you think is in memory, is.  What is and isn't in memory can
change during compilation.


[ This message brought to you by the "Stamp Out Optimisation Barrier"
  campaign. ]


Segher


[PATCH v2 3/3] arm: Add support for function error injection

2019-08-06 Thread Leo Yan
This patch implements the arm-specific functions regs_set_return_value() and
override_function_with_return() to support function error injection.

In the exception flow, it updates pt_regs::ARM_pc with pt_regs::ARM_lr
so it can override the probed function's return.

Signed-off-by: Leo Yan 
---
 arch/arm/Kconfig  |  1 +
 arch/arm/include/asm/ptrace.h |  5 +
 arch/arm/lib/Makefile |  2 ++
 arch/arm/lib/error-inject.c   | 19 +++
 4 files changed, 27 insertions(+)
 create mode 100644 arch/arm/lib/error-inject.c

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 33b00579beff..2d3d44a037f6 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -77,6 +77,7 @@ config ARM
select HAVE_EXIT_THREAD
select HAVE_FAST_GUP if ARM_LPAE
select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
+   select HAVE_FUNCTION_ERROR_INJECTION if !THUMB2_KERNEL
select HAVE_FUNCTION_GRAPH_TRACER if !THUMB2_KERNEL && !CC_IS_CLANG
select HAVE_FUNCTION_TRACER if !XIP_KERNEL
select HAVE_GCC_PLUGINS
diff --git a/arch/arm/include/asm/ptrace.h b/arch/arm/include/asm/ptrace.h
index 91d6b7856be4..3b41f37b361a 100644
--- a/arch/arm/include/asm/ptrace.h
+++ b/arch/arm/include/asm/ptrace.h
@@ -89,6 +89,11 @@ static inline long regs_return_value(struct pt_regs *regs)
return regs->ARM_r0;
 }
 
+static inline void regs_set_return_value(struct pt_regs *regs, unsigned long 
rc)
+{
+   regs->ARM_r0 = rc;
+}
+
 #define instruction_pointer(regs)  (regs)->ARM_pc
 
 #ifdef CONFIG_THUMB2_KERNEL
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index b25c54585048..8f56484a7156 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -42,3 +42,5 @@ ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
   CFLAGS_xor-neon.o+= $(NEON_FLAGS)
   obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
 endif
+
+obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
diff --git a/arch/arm/lib/error-inject.c b/arch/arm/lib/error-inject.c
new file mode 100644
index ..2d696dc94893
--- /dev/null
+++ b/arch/arm/lib/error-inject.c
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+
+void override_function_with_return(struct pt_regs *regs)
+{
+   /*
+* 'regs' represents the state on entry of a predefined function in
+* the kernel/module and which is captured on a kprobe.
+*
+* 'regs->ARM_lr' contains the link register for the probed
+* function, when kprobe returns back from exception it will override
+* the end of probed function and directly return to the predefined
+* function's caller.
+*/
+   instruction_pointer_set(regs, regs->ARM_lr);
+}
+NOKPROBE_SYMBOL(override_function_with_return);
-- 
2.17.1



[PATCH v2 2/3] arm64: Add support for function error injection

2019-08-06 Thread Leo Yan
Inspired by the commit 7cd01b08d35f ("powerpc: Add support for function
error injection"), this patch supports function error injection for
Arm64.

This patch mainly supports two functions: one is regs_set_return_value(),
which is used to overwrite the return value; the other is
override_function_with_return(), which overrides the probed function's
return and jumps to its caller.

Signed-off-by: Leo Yan 
---
 arch/arm64/Kconfig  |  1 +
 arch/arm64/include/asm/ptrace.h |  5 +
 arch/arm64/lib/Makefile |  2 ++
 arch/arm64/lib/error-inject.c   | 18 ++
 4 files changed, 26 insertions(+)
 create mode 100644 arch/arm64/lib/error-inject.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3adcec05b1f6..b15803afb2a0 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -148,6 +148,7 @@ config ARM64
select HAVE_FAST_GUP
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
+   select HAVE_FUNCTION_ERROR_INJECTION
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_GCC_PLUGINS
select HAVE_HW_BREAKPOINT if PERF_EVENTS
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index b1dd039023ef..891b9995cb4b 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -301,6 +301,11 @@ static inline unsigned long regs_return_value(struct 
pt_regs *regs)
return regs->regs[0];
 }
 
+static inline void regs_set_return_value(struct pt_regs *regs, unsigned long 
rc)
+{
+   regs->regs[0] = rc;
+}
+
 /**
  * regs_get_kernel_argument() - get Nth function argument in kernel
  * @regs:  pt_regs of that context
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 33c2a4abda04..f182ccb0438e 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -33,3 +33,5 @@ UBSAN_SANITIZE_atomic_ll_sc.o := n
 lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
 
 obj-$(CONFIG_CRC32) += crc32.o
+
+obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
diff --git a/arch/arm64/lib/error-inject.c b/arch/arm64/lib/error-inject.c
new file mode 100644
index ..ed15021da3ed
--- /dev/null
+++ b/arch/arm64/lib/error-inject.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+
+void override_function_with_return(struct pt_regs *regs)
+{
+   /*
+* 'regs' represents the state on entry of a predefined function in
+* the kernel/module and which is captured on a kprobe.
+*
+* When kprobe returns back from exception it will override the end
+* of probed function and directly return to the predefined
+* function's caller.
+*/
+   instruction_pointer_set(regs, procedure_link_pointer(regs));
+}
+NOKPROBE_SYMBOL(override_function_with_return);
-- 
2.17.1



[PATCH v2 1/3] error-injection: Consolidate override function definition

2019-08-06 Thread Leo Yan
The function override_function_with_return() is defined separately for
each architecture, and every architecture's definition is almost the
same.  E.g. x86 and powerpc both define the function in their own
asm/error-injection.h headers and override_function_with_return() has
the same definition; the only difference is that x86 defines an extra
function just_return_func(), but it is specific to x86 and is only used
by x86's override_function_with_return(), so it does not need to be
exported.

This patch consolidates the override_function_with_return() definition
into the asm-generic/error-injection.h header, so all architectures can
use the common definition.  As a result, the architecture-specific
headers are removed; the include/linux/error-injection.h header also
changes to include the asm-generic/error-injection.h header rather than
the architecture header, and furthermore it includes linux/compiler.h
for successful compilation.

Signed-off-by: Leo Yan 
---
 arch/powerpc/include/asm/error-injection.h | 13 -
 arch/x86/include/asm/error-injection.h | 13 -
 include/asm-generic/error-injection.h  |  6 ++
 include/linux/error-injection.h|  6 +++---
 4 files changed, 9 insertions(+), 29 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/error-injection.h
 delete mode 100644 arch/x86/include/asm/error-injection.h

diff --git a/arch/powerpc/include/asm/error-injection.h 
b/arch/powerpc/include/asm/error-injection.h
deleted file mode 100644
index 62fd24739852..
--- a/arch/powerpc/include/asm/error-injection.h
+++ /dev/null
@@ -1,13 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0+ */
-
-#ifndef _ASM_ERROR_INJECTION_H
-#define _ASM_ERROR_INJECTION_H
-
-#include 
-#include 
-#include 
-#include 
-
-void override_function_with_return(struct pt_regs *regs);
-
-#endif /* _ASM_ERROR_INJECTION_H */
diff --git a/arch/x86/include/asm/error-injection.h 
b/arch/x86/include/asm/error-injection.h
deleted file mode 100644
index 47b7a1296245..
--- a/arch/x86/include/asm/error-injection.h
+++ /dev/null
@@ -1,13 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_ERROR_INJECTION_H
-#define _ASM_ERROR_INJECTION_H
-
-#include 
-#include 
-#include 
-#include 
-
-asmlinkage void just_return_func(void);
-void override_function_with_return(struct pt_regs *regs);
-
-#endif /* _ASM_ERROR_INJECTION_H */
diff --git a/include/asm-generic/error-injection.h 
b/include/asm-generic/error-injection.h
index 95a159a4137f..80ca61058dd2 100644
--- a/include/asm-generic/error-injection.h
+++ b/include/asm-generic/error-injection.h
@@ -16,6 +16,8 @@ struct error_injection_entry {
int etype;
 };
 
+struct pt_regs;
+
 #ifdef CONFIG_FUNCTION_ERROR_INJECTION
 /*
  * Whitelist ganerating macro. Specify functions which can be
@@ -28,8 +30,12 @@ static struct error_injection_entry __used   
\
.addr = (unsigned long)fname,   \
.etype = EI_ETYPE_##_etype, \
};
+
+void override_function_with_return(struct pt_regs *regs);
 #else
 #define ALLOW_ERROR_INJECTION(fname, _etype)
+
+static inline void override_function_with_return(struct pt_regs *regs) { }
 #endif
 #endif
 
diff --git a/include/linux/error-injection.h b/include/linux/error-injection.h
index 280c61ecbf20..635a95caf29f 100644
--- a/include/linux/error-injection.h
+++ b/include/linux/error-injection.h
@@ -2,16 +2,16 @@
 #ifndef _LINUX_ERROR_INJECTION_H
 #define _LINUX_ERROR_INJECTION_H
 
-#ifdef CONFIG_FUNCTION_ERROR_INJECTION
+#include 
+#include 
 
-#include 
+#ifdef CONFIG_FUNCTION_ERROR_INJECTION
 
 extern bool within_error_injection_list(unsigned long addr);
 extern int get_injectable_error_type(unsigned long addr);
 
 #else /* !CONFIG_FUNCTION_ERROR_INJECTION */
 
-#include 
 static inline bool within_error_injection_list(unsigned long addr)
 {
return false;
-- 
2.17.1



[PATCH v2 0/3] arm/arm64: Add support for function error injection

2019-08-06 Thread Leo Yan
This small patch set adds support for function error injection; this
can be used to enable more advanced debugging features, e.g.
CONFIG_BPF_KPROBE_OVERRIDE.

Patch 01/03 consolidates the function definition so it can be shared
across architectures; patches 02,03/03 enable function error injection
on the arm64 and arm architectures respectively.

I tested on the arm64 platform Juno-r2 and on one of my laptops with
the x86 architecture using the steps below; I didn't test on the Arm
architecture, so it is only compile-tested.

- Enable kernel configuration:
  CONFIG_BPF_KPROBE_OVERRIDE
  CONFIG_BTRFS_FS
  CONFIG_BPF_EVENTS=y
  CONFIG_KPROBES=y
  CONFIG_KPROBE_EVENTS=y
  CONFIG_BPF_KPROBE_OVERRIDE=y

- Build samples/bpf on with Debian rootFS:
  # cd $kernel
  # make headers_install
  # make samples/bpf/ LLC=llc-7 CLANG=clang-7

- Run the sample tracex7:
  # dd if=/dev/zero of=testfile.img bs=1M seek=1000 count=1
  # DEVICE=$(losetup --show -f testfile.img)
  # mkfs.btrfs -f $DEVICE
  # ./tracex7 testfile.img
  [ 1975.211781] BTRFS error (device (efault)): open_ctree failed
  mount: /mnt/linux-kernel/linux-cs-dev/samples/bpf/tmpmnt: mount(2) system 
call failed: Cannot allocate memory.

Changes from v1:
* Consolidated the function definition into asm-generic header (Will);
* Used APIs to access pt_regs elements (Will);
* Fixed typos in the comments (Will).


Leo Yan (3):
  error-injection: Consolidate override function definition
  arm64: Add support for function error injection
  arm: Add support for function error injection

 arch/arm/Kconfig   |  1 +
 arch/arm/include/asm/ptrace.h  |  5 +
 arch/arm/lib/Makefile  |  2 ++
 arch/arm/lib/error-inject.c| 19 +++
 arch/arm64/Kconfig |  1 +
 arch/arm64/include/asm/ptrace.h|  5 +
 arch/arm64/lib/Makefile|  2 ++
 arch/arm64/lib/error-inject.c  | 18 ++
 arch/powerpc/include/asm/error-injection.h | 13 -
 arch/x86/include/asm/error-injection.h | 13 -
 include/asm-generic/error-injection.h  |  6 ++
 include/linux/error-injection.h|  6 +++---
 12 files changed, 62 insertions(+), 29 deletions(-)
 create mode 100644 arch/arm/lib/error-inject.c
 create mode 100644 arch/arm64/lib/error-inject.c
 delete mode 100644 arch/powerpc/include/asm/error-injection.h
 delete mode 100644 arch/x86/include/asm/error-injection.h

-- 
2.17.1



Re: [PATCH net-next] ibmveth: Allow users to update reported speed and duplex

2019-08-06 Thread Thomas Falcon



On 8/6/19 5:25 AM, Michael Ellerman wrote:

Thomas Falcon  writes:

Reported ethtool link settings for the ibmveth driver are currently
hardcoded and no longer reflect the actual capabilities of supported
hardware. There is no interface designed for retrieving this information
from device firmware nor is there any way to update current settings
to reflect observed or expected link speeds.

To avoid confusion, initially define speed and duplex as unknown and

Doesn't that risk breaking existing setups?



You're right, sorry for missing that.





allow the user to alter these settings to match the expected
capabilities of underlying hardware if needed. This update would allow
the use of configurations that rely on certain link speed settings,
such as LACP. This patch is based on the implementation in virtio_net.

Wouldn't it be safer to keep the current values as the default, and then
also allow them to be overridden by a motivated user.


That is a good compromise.  I will resend an updated version soon with 
that change.


Thanks!




cheers





Re: [PATCH v2 3/3] powerpc/spinlocks: Fix oops in shared-processor spinlocks

2019-08-06 Thread Christopher M Riedl


> On August 6, 2019 at 7:14 AM Michael Ellerman  wrote:
> 
> 
> Christopher M Riedl  writes:
> >> On August 2, 2019 at 6:38 AM Michael Ellerman  wrote:
> >> "Christopher M. Riedl"  writes:
> >> 
> >> This leaves us with a double test of is_shared_processor() doesn't it?
> >
> > Yep, and that's no good. Hmm, executing the barrier() in the 
> > non-shared-processor
> > case probably hurts performance here?
> 
> It's only a "compiler barrier", so it shouldn't generate any code.
> 
> But it does have the effect of telling the compiler it can't optimise
> across that barrier, which can be important.
> 
> In those spin loops all we're doing is checking lock->slock which is
> already marked volatile in the definition of arch_spinlock_t, so the
> extra barrier shouldn't really make any difference.
> 
> But still the current code doesn't have a barrier() there, so we should
> make sure we don't introduce one as part of this refactor.

Thank you for taking the time to explain this. I have some more reading to
do about compiler-barriers it seems :)

> 
> So I think you just want to change the call to spin_yield() above to
> splpar_spin_yield(), which avoids the double check, and also avoids the
> barrier() in the SPLPAR=n case.
> 
> And then arch_spin_relax() calls spin_yield() etc.

I submitted a v3 before your reply with this change already - figured this
is the best way to avoid the double check and maintain legacy behavior.

> 
> cheers


Re: [PATCH v2 3/3] powerpc/spinlocks: Fix oops in shared-processor spinlocks

2019-08-06 Thread Michael Ellerman
Christopher M Riedl  writes:
>> On August 2, 2019 at 6:38 AM Michael Ellerman  wrote:
>> "Christopher M. Riedl"  writes:
>> > diff --git a/arch/powerpc/include/asm/spinlock.h 
>> > b/arch/powerpc/include/asm/spinlock.h
>> > index 0a8270183770..6aed8a83b180 100644
>> > --- a/arch/powerpc/include/asm/spinlock.h
>> > +++ b/arch/powerpc/include/asm/spinlock.h
>> > @@ -124,6 +122,22 @@ static inline bool is_shared_processor(void)
>> >  #endif
>> >  }
>> >  
>> > +static inline void spin_yield(arch_spinlock_t *lock)
>> > +{
>> > +  if (is_shared_processor())
>> > +  splpar_spin_yield(lock);
>> > +  else
>> > +  barrier();
>> > +}
>> ...
>> >  static inline void arch_spin_lock(arch_spinlock_t *lock)
>> >  {
>> >while (1) {
>> > @@ -132,7 +146,7 @@ static inline void arch_spin_lock(arch_spinlock_t 
>> > *lock)
>> >do {
>> >HMT_low();
>> >if (is_shared_processor())
>> > -  __spin_yield(lock);
>> > +  spin_yield(lock);
>> 
>> This leaves us with a double test of is_shared_processor() doesn't it?
>
> Yep, and that's no good. Hmm, executing the barrier() in the 
> non-shared-processor
> case probably hurts performance here?

It's only a "compiler barrier", so it shouldn't generate any code.

But it does have the effect of telling the compiler it can't optimise
across that barrier, which can be important.

In those spin loops all we're doing is checking lock->slock which is
already marked volatile in the definition of arch_spinlock_t, so the
extra barrier shouldn't really make any difference.

But still the current code doesn't have a barrier() there, so we should
make sure we don't introduce one as part of this refactor.

So I think you just want to change the call to spin_yield() above to
splpar_spin_yield(), which avoids the double check, and also avoids the
barrier() in the SPLPAR=n case.

And then arch_spin_relax() calls spin_yield() etc.

cheers


Re: [RFC PATCH v3] powerpc/xmon: Restrict when kernel is locked down

2019-08-06 Thread Michael Ellerman
"Christopher M. Riedl"  writes:
> Xmon should be either fully or partially disabled depending on the
> kernel lockdown state.
>
> Put xmon into read-only mode for lockdown=integrity and completely
> disable xmon when lockdown=confidentiality. Xmon checks the lockdown
> state and takes appropriate action:
>
>  (1) during xmon_setup to prevent early xmon'ing
>
>  (2) when triggered via sysrq
>
>  (3) when toggled via debugfs
>
>  (4) when triggered via a previously enabled breakpoint
>
> The following lockdown state transitions are handled:
>
>  (1) lockdown=none -> lockdown=integrity
>  set xmon read-only mode
>
>  (2) lockdown=none -> lockdown=confidentiality
>  clear all breakpoints, set xmon read-only mode,
>  prevent re-entry into xmon
>
>  (3) lockdown=integrity -> lockdown=confidentiality
>  clear all breakpoints, set xmon read-only mode,
>  prevent re-entry into xmon
>
> Suggested-by: Andrew Donnellan 
> Signed-off-by: Christopher M. Riedl 
> ---
> Changes since v1:
>  - Rebased onto v36 of https://patchwork.kernel.org/cover/11049461/
>(based on: f632a8170a6b667ee4e3f552087588f0fe13c4bb)
>  - Do not clear existing breakpoints when transitioning from
>lockdown=none to lockdown=integrity
>  - Remove line continuation and dangling quote (confuses checkpatch.pl)
>from the xmon command help/usage string

This looks good to me.

So I guess we're just waiting on lockdown to go in somewhere.

cheers

> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> index d0620d762a5a..1a5e43d664ca 100644
> --- a/arch/powerpc/xmon/xmon.c
> +++ b/arch/powerpc/xmon/xmon.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -187,6 +188,9 @@ static void dump_tlb_44x(void);
>  static void dump_tlb_book3e(void);
>  #endif
>  
> +static void clear_all_bpt(void);
> +static void xmon_init(int);
> +
>  #ifdef CONFIG_PPC64
>  #define REG  "%.16lx"
>  #else
> @@ -283,10 +287,41 @@ Commands:\n\
>  "  U show uptime information\n"
>  "  ? help\n"
>  "  # n   limit output to n lines per page (for dp, dpa, dl)\n"
> -"  zrreboot\n\
> -  zh halt\n"
> +"  zrreboot\n"
> +"  zhhalt\n"
>  ;
>  
> +#ifdef CONFIG_SECURITY
> +static bool xmon_is_locked_down(void)
> +{
> + static bool lockdown;
> +
> + if (!lockdown) {
> + lockdown = !!security_locked_down(LOCKDOWN_XMON_RW);
> + if (lockdown) {
> + printf("xmon: Disabled due to kernel lockdown\n");
> + xmon_is_ro = true;
> + xmon_on = 0;
> + xmon_init(0);
> + clear_all_bpt();
> + }
> + }
> +
> + if (!xmon_is_ro) {
> + xmon_is_ro = !!security_locked_down(LOCKDOWN_XMON_WR);
> + if (xmon_is_ro)
> + printf("xmon: Read-only due to kernel lockdown\n");
> + }
> +
> + return lockdown;
> +}
> +#else /* CONFIG_SECURITY */
> +static inline bool xmon_is_locked_down(void)
> +{
> + return false;
> +}
> +#endif
> +
>  static struct pt_regs *xmon_regs;
>  
>  static inline void sync(void)
> @@ -704,6 +739,9 @@ static int xmon_bpt(struct pt_regs *regs)
>   struct bpt *bp;
>   unsigned long offset;
>  
> + if (xmon_is_locked_down())
> + return 0;
> +
>   if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT))
>   return 0;
>  
> @@ -735,6 +773,9 @@ static int xmon_sstep(struct pt_regs *regs)
>  
>  static int xmon_break_match(struct pt_regs *regs)
>  {
> + if (xmon_is_locked_down())
> + return 0;
> +
>   if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT))
>   return 0;
>   if (dabr.enabled == 0)
> @@ -745,6 +786,9 @@ static int xmon_break_match(struct pt_regs *regs)
>  
>  static int xmon_iabr_match(struct pt_regs *regs)
>  {
> + if (xmon_is_locked_down())
> + return 0;
> +
>   if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT))
>   return 0;
>   if (iabr == NULL)
> @@ -3741,6 +3785,9 @@ static void xmon_init(int enable)
>  #ifdef CONFIG_MAGIC_SYSRQ
>  static void sysrq_handle_xmon(int key)
>  {
> + if (xmon_is_locked_down())
> + return;
> +
>   /* ensure xmon is enabled */
>   xmon_init(1);
>   debugger(get_irq_regs());
> @@ -3762,7 +3809,6 @@ static int __init setup_xmon_sysrq(void)
>  device_initcall(setup_xmon_sysrq);
>  #endif /* CONFIG_MAGIC_SYSRQ */
>  
> -#ifdef CONFIG_DEBUG_FS
>  static void clear_all_bpt(void)
>  {
>   int i;
> @@ -3784,8 +3830,12 @@ static void clear_all_bpt(void)
>   printf("xmon: All breakpoints cleared\n");
>  }
>  
> +#ifdef CONFIG_DEBUG_FS
>  static int xmon_dbgfs_set(void *data, u64 val)
>  {
> + if (xmon_is_locked_down())
> + return 0;
> +
>   xmon_on = !!val;
>   xmon_init(xmon_on);
>  
> @@ -3844,6 +3894,9 @@ early_param("xmon", 

Re: SMP lockup at boot on Freescale/NXP T2080 (powerpc 64)

2019-08-06 Thread Michael Ellerman
Chris Packham  writes:
> On Mon, 2019-08-05 at 14:06 +1200, Chris Packham wrote:
>> Hi All,
>> 
>> I have a custom board that uses the Freescale/NXP T2080 SoC.
>> 
>> The board boots fine using v4.19.60 but when I use v5.1.21 it locks
>> up
>> waiting for the other CPUs to come online (earlyprintk output below).
>> If I set maxcpus=0 then the system boots all the way through to
>> userland. The same thing happens with 5.3-rc2.
>> 
>> The defconfig I'm using is 
>> https://gist.github.com/cpackham/f24d0b426f3
>> de0eaaba17b82c3528a9d it was updated from the working v4.19.60
>> defconfig using make olddefconfig.
>> 
>> Does this ring any bells for anyone?
>> 
>> I haven't dug into the differences between the working and non-working
>> versions yet. I'll start looking now.
>
> I've bisected this to the following commit

Thanks that's super helpful.

> commit ed1cd6deb013a11959d17a94e35ce159197632da
> Author: Christophe Leroy 
> Date:   Thu Jan 31 10:08:58 2019 +
>
> powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
> 
> This patch activates CONFIG_THREAD_INFO_IN_TASK which
> moves the thread_info into task_struct.
>
> I'll be the first to admit this is well beyond my area of knowledge so
> I'm unsure what about this patch is problematic but I can be fairly
> sure that a build immediately before this patch works while a build
> with this patch hangs.

It makes a pretty fundamental change to the way the kernel stores some
information about each task, moving it off the stack and into the task
struct.

It definitely has the potential to break things, but I thought we had
reasonable test coverage of the Book3E platforms, I have a p5020ds
(e5500) that I boot as part of my CI.

Aha. If I take your config and try to boot it on my p5020ds I get the
same behaviour, stuck at SMP bringup. So it seems it's something in your
config vs corenet64_smp_defconfig that is triggering the bug.

Can you try bisecting what in the config triggers it?

To do that you checkout ed1cd6deb013a11959d17a94e35ce159197632da, then
you build/boot with corenet64_smp_defconfig to confirm it works. Then
you use tools/testing/ktest/config-bisect.pl to bisect the changes in
the .config.

cheers


>> Booting...
>> MMU: Supported page sizes
>>  4 KB as direct
>>   2048 KB as direct & indirect
>>   4096 KB as direct
>>  16384 KB as direct
>>  65536 KB as direct
>> 262144 KB as direct
>>1048576 KB as direct
>> MMU: Book3E HW tablewalk enabled
>> Linux version 5.1.21-at1+ (@chrisp-dl) (gcc version 4.9.3 (crosstool-
>> NG 
>> crosstool-ng-1.22.0)) #24 SMP PREEMPT Mon Aug 5 01:42:00 UTC 2019
>> Found initrd at 0xc0002f045000:0xc0003000
>> Using CoreNet Generic machine description
>> Found legacy serial port 0 for /soc@ffe00/serial@11c500
>>   mem=ffe11c500, taddr=ffe11c500, irq=0, clk=3, speed=0
>> Found legacy serial port 1 for /soc@ffe00/serial@11c600
>>   mem=ffe11c600, taddr=ffe11c600, irq=0, clk=3, speed=0
>> Found legacy serial port 2 for /soc@ffe00/serial@11d500
>>   mem=ffe11d500, taddr=ffe11d500, irq=0, clk=3, speed=0
>> Found legacy serial port 3 for /soc@ffe00/serial@11d600
>>   mem=ffe11d600, taddr=ffe11d600, irq=0, clk=3, speed=0
>> printk: bootconsole [udbg0] enabled
>> CPU maps initialized for 2 threads per core
>>  (thread shift is 1)
>> Allocated 1856 bytes for 8 pacas
>> -
>> phys_mem_size = 0x1
>> dcache_bsize  = 0x40
>> icache_bsize  = 0x40
>> cpu_features  = 0x0003009003b6
>>   possible= 0x0003009003b6
>>   always  = 0x0003008003b4
>> cpu_user_features = 0xdc008000 0x0800
>> mmu_features  = 0x000a0010
>> firmware_features = 0x
>> -
>> CoreNet Generic board
>> barrier-nospec: using isync; sync as speculation barrier
>> barrier-nospec: patched 412 locations
>> Top of RAM: 0x1, Total RAM: 0x1
>> Memory hole size: 0MB
>> Zone ranges:
>>   DMA  [mem 0x-0x7fffefff]
>>   Normal   [mem 0x7000-0x]
>> Movable zone start for each node
>> Early memory node ranges
>>   node   0: [mem 0x-0x]
>> Initmem setup node 0 [mem 0x-0x]
>> On node 0 totalpages: 1048576
>>   DMA zone: 7168 pages used for memmap
>>   DMA zone: 0 pages reserved
>>   DMA zone: 524287 pages, LIFO batch:63
>>   Normal zone: 7169 pages used for memmap
>>   Normal zone: 524289 pages, LIFO batch:63
>> MMU: Allocated 2112 bytes of context maps for 255 contexts
>> percpu: Embedded 22 pages/cpu s49304 r0 d40808 u131072
>> pcpu-alloc: s49304 r0 d40808 u131072 alloc=1*1048576
>> pcpu-alloc: [0] 0 1 2 3 4 5 6 7 
>> Built 1 zonelists, mobility grouping on.  Total pages: 1034239
>> Kernel command line: console=ttyS0,115200 root=/dev/ram0
>> 

Re: [PATCH] powerpc/fadump: sysfs for fadump memory reservation

2019-08-06 Thread Mahesh Jagannath Salgaonkar
On 8/6/19 8:42 AM, Sourabh Jain wrote:
> Add a sys interface to allow querying the memory reserved by fadump
> for saving the crash dump.
> 
> Signed-off-by: Sourabh Jain 

Looks good to me.

Reviewed-by: Mahesh Salgaonkar 

Thanks,
-Mahesh.

> ---
>  Documentation/powerpc/firmware-assisted-dump.rst |  5 +
>  arch/powerpc/kernel/fadump.c | 14 ++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/Documentation/powerpc/firmware-assisted-dump.rst 
> b/Documentation/powerpc/firmware-assisted-dump.rst
> index 9ca12830a48e..4a7f6dc556f5 100644
> --- a/Documentation/powerpc/firmware-assisted-dump.rst
> +++ b/Documentation/powerpc/firmware-assisted-dump.rst
> @@ -222,6 +222,11 @@ Here is the list of files under kernel sysfs:
>  be handled and vmcore will not be captured. This interface can be
>  easily integrated with kdump service start/stop.
> 
> +/sys/kernel/fadump_mem_reserved
> +
> +   This is used to display the memory reserved by fadump for saving the
> +   crash dump.
> +
>   /sys/kernel/fadump_release_mem
>  This file is available only when fadump is active during
>  second kernel. This is used to release the reserved memory
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 4eab97292cc2..70d49013ebec 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -1514,6 +1514,13 @@ static ssize_t fadump_enabled_show(struct kobject 
> *kobj,
>   return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
>  }
> 
> +static ssize_t fadump_mem_reserved_show(struct kobject *kobj,
> + struct kobj_attribute *attr,
> + char *buf)
> +{
> + return sprintf(buf, "%ld\n", fw_dump.reserve_dump_area_size);
> +}
> +
>  static ssize_t fadump_register_show(struct kobject *kobj,
>   struct kobj_attribute *attr,
>   char *buf)
> @@ -1632,6 +1639,9 @@ static struct kobj_attribute fadump_attr = 
> __ATTR(fadump_enabled,
>  static struct kobj_attribute fadump_register_attr = __ATTR(fadump_registered,
>   0644, fadump_register_show,
>   fadump_register_store);
> +static struct kobj_attribute fadump_mem_reserved_attr =
> + __ATTR(fadump_mem_reserved, 0444,
> + fadump_mem_reserved_show, NULL);
> 
>  DEFINE_SHOW_ATTRIBUTE(fadump_region);
> 
> @@ -1663,6 +1673,10 @@ static void fadump_init_files(void)
>   printk(KERN_ERR "fadump: unable to create sysfs file"
>   " fadump_release_mem (%d)\n", rc);
>   }
> + rc = sysfs_create_file(kernel_kobj, &fadump_mem_reserved_attr.attr);
> + if (rc)
> + pr_err("unable to create sysfs file fadump_mem_reserved (%d)\n",
> + rc);
>   return;
>  }
> 



Re: [PATCH net-next] ibmveth: Allow users to update reported speed and duplex

2019-08-06 Thread Michael Ellerman
Thomas Falcon  writes:
> Reported ethtool link settings for the ibmveth driver are currently
> hardcoded and no longer reflect the actual capabilities of supported
> hardware. There is no interface designed for retrieving this information
> from device firmware nor is there any way to update current settings
> to reflect observed or expected link speeds.
>
> To avoid confusion, initially define speed and duplex as unknown and

Doesn't that risk breaking existing setups?

> allow the user to alter these settings to match the expected
> capabilities of underlying hardware if needed. This update would allow
> the use of configurations that rely on certain link speed settings,
> such as LACP. This patch is based on the implementation in virtio_net.

Wouldn't it be safer to keep the current values as the default, and then
also allow them to be overridden by a motivated user.

cheers


Re: [PATCH v4 09/10] powerpc/fsl_booke/kaslr: support nokaslr cmdline parameter

2019-08-06 Thread Christophe Leroy




Le 05/08/2019 à 08:43, Jason Yan a écrit :

One may want to disable kaslr at boot time, so provide a cmdline
parameter 'nokaslr' to support this.

Signed-off-by: Jason Yan 
Cc: Diana Craciun 
Cc: Michael Ellerman 
Cc: Christophe Leroy 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Nicholas Piggin 
Cc: Kees Cook 
Reviewed-by: Diana Craciun 
Tested-by: Diana Craciun 


Reviewed-by: Christophe Leroy 

Tiny comment below.


---
  arch/powerpc/kernel/kaslr_booke.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/kernel/kaslr_booke.c 
b/arch/powerpc/kernel/kaslr_booke.c
index 4b3f19a663fc..7c3cb41e7122 100644
--- a/arch/powerpc/kernel/kaslr_booke.c
+++ b/arch/powerpc/kernel/kaslr_booke.c
@@ -361,6 +361,18 @@ static unsigned long __init kaslr_choose_location(void 
*dt_ptr, phys_addr_t size
return kaslr_offset;
  }
  
+static inline __init bool kaslr_disabled(void)

+{
+   char *str;
+
+   str = strstr(boot_command_line, "nokaslr");
+   if ((str == boot_command_line) ||
+   (str > boot_command_line && *(str - 1) == ' '))
+   return true;


I don't think additional () are needed for the left part 'str == 
boot_command_line'



+
+   return false;
+}
+
  /*
   * To see if we need to relocate the kernel to a random offset
   * void *dt_ptr - address of the device tree
@@ -376,6 +388,8 @@ notrace void __init kaslr_early_init(void *dt_ptr, 
phys_addr_t size)
kernel_sz = (unsigned long)_end - KERNELBASE;
  
  	kaslr_get_cmdline(dt_ptr);

+   if (kaslr_disabled())
+   return;
  
  	offset = kaslr_choose_location(dt_ptr, size, kernel_sz);
  



Re: [PATCH v4 07/10] powerpc/fsl_booke/32: randomize the kernel image offset

2019-08-06 Thread Christophe Leroy




Le 05/08/2019 à 08:43, Jason Yan a écrit :

Now that we have basic support for relocating the kernel to an appropriate
place, we can start to randomize the offset.

Entropy is derived from the banner and the timer, which change on every
build and boot. This is not very safe, so additionally the bootloader may
pass entropy via the /chosen/kaslr-seed node in the device tree.

We will use the first 512M of low memory to randomize the kernel image.
The memory will be split into 64M zones. We will use the lower 8 bits of
the entropy to decide the index of the 64M zone, then choose a 16K-aligned
offset inside that zone to put the kernel at.

 KERNELBASE

 |-->   64M   <--|
 |   |
 +---+++---+
 |   |||kernel||   |
 +---+++---+
 | |
 |->   offset<-|

   kimage_vaddr

We also check whether we would overlap some areas, like the dtb area, the
initrd area or the crashkernel area. If we cannot find a proper area,
KASLR is disabled and we boot from the original kernel location.

Signed-off-by: Jason Yan 
Cc: Diana Craciun 
Cc: Michael Ellerman 
Cc: Christophe Leroy 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Nicholas Piggin 
Cc: Kees Cook 
Reviewed-by: Diana Craciun 
Tested-by: Diana Craciun 


Reviewed-by: Christophe Leroy 

One small comment below


---
  arch/powerpc/kernel/kaslr_booke.c | 322 +-
  1 file changed, 320 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/kaslr_booke.c 
b/arch/powerpc/kernel/kaslr_booke.c
index 30f84c0321b2..97250cad71de 100644
--- a/arch/powerpc/kernel/kaslr_booke.c
+++ b/arch/powerpc/kernel/kaslr_booke.c
@@ -23,6 +23,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  #include 
  #include 
  #include 
@@ -34,15 +36,329 @@
  #include 
  #include 
  #include 
+#include 
  #include 
+#include 
+#include 
+
+#ifdef DEBUG
+#define DBG(fmt...) printk(KERN_ERR fmt)
+#else
+#define DBG(fmt...)
+#endif
+
+struct regions {
+   unsigned long pa_start;
+   unsigned long pa_end;
+   unsigned long kernel_size;
+   unsigned long dtb_start;
+   unsigned long dtb_end;
+   unsigned long initrd_start;
+   unsigned long initrd_end;
+   unsigned long crash_start;
+   unsigned long crash_end;
+   int reserved_mem;
+   int reserved_mem_addr_cells;
+   int reserved_mem_size_cells;
+};
  
  extern int is_second_reloc;
  
+/* Simplified build-specific string for starting entropy. */

+static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
+   LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
+
+static __init void kaslr_get_cmdline(void *fdt)
+{
+   int node = fdt_path_offset(fdt, "/chosen");
+
+   early_init_dt_scan_chosen(node, "chosen", 1, boot_command_line);
+}
+
+static unsigned long __init rotate_xor(unsigned long hash, const void *area,
+  size_t size)
+{
+   size_t i;
+   unsigned long *ptr = (unsigned long *)area;


As area is a const void *, this cast shouldn't be necessary. Or maybe it is
necessary because it discards the const?


Christophe


+
+   for (i = 0; i < size / sizeof(hash); i++) {
+   /* Rotate by odd number of bits and XOR. */
+   hash = (hash << ((sizeof(hash) * 8) - 7)) | (hash >> 7);
+   hash ^= ptr[i];
+   }
+
+   return hash;
+}
+
+/* Attempt to create a simple but unpredictable starting entropy. */
+static unsigned long __init get_boot_seed(void *fdt)
+{
+   unsigned long hash = 0;
+
+   hash = rotate_xor(hash, build_str, sizeof(build_str));
+   hash = rotate_xor(hash, fdt, fdt_totalsize(fdt));
+
+   return hash;
+}
+
+static __init u64 get_kaslr_seed(void *fdt)
+{
+   int node, len;
+   fdt64_t *prop;
+   u64 ret;
+
+   node = fdt_path_offset(fdt, "/chosen");
+   if (node < 0)
+   return 0;
+
+   prop = fdt_getprop_w(fdt, node, "kaslr-seed", &len);
+   if (!prop || len != sizeof(u64))
+   return 0;
+
+   ret = fdt64_to_cpu(*prop);
+   *prop = 0;
+   return ret;
+}
+
+static __init bool regions_overlap(u32 s1, u32 e1, u32 s2, u32 e2)
+{
+   return e1 >= s2 && e2 >= s1;
+}
+
+static __init bool overlaps_reserved_region(const void *fdt, u32 start,
+   u32 end, struct regions *regions)
+{
+   int subnode, len, i;
+   u64 base, size;
+
+   /* check for overlap with /memreserve/ entries */
+   for (i = 0; i < fdt_num_mem_rsv(fdt); i++) {
+   if (fdt_get_mem_rsv(fdt, i, &base, &size) < 0)
+   continue;
+   if (regions_overlap(start, end, base, base + size))
+   return true;
+   }
+
+   if 

Re: [PATCH v4 06/10] powerpc/fsl_booke/32: implement KASLR infrastructure

2019-08-06 Thread Christophe Leroy




Le 05/08/2019 à 08:43, Jason Yan a écrit :

This patch adds support for booting the kernel from places other than
KERNELBASE. Since CONFIG_RELOCATABLE is already supported, all we need to
do is map or copy the kernel to a proper place and relocate. Freescale
Book-E parts expect lowmem to be mapped by fixed TLB entries (TLB1). The
TLB1 entries are not suitable for mapping the kernel directly in a
randomized region, so we choose to copy the kernel to a proper place and
restart to relocate.

The offset of the kernel is not randomized yet (a fixed 64M is set). We
will randomize it in the next patch.

Signed-off-by: Jason Yan 
Cc: Diana Craciun 
Cc: Michael Ellerman 
Cc: Christophe Leroy 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Nicholas Piggin 
Cc: Kees Cook 
Tested-by: Diana Craciun 


Reviewed-by: Christophe Leroy 


---
  arch/powerpc/Kconfig  | 11 +++
  arch/powerpc/kernel/Makefile  |  1 +
  arch/powerpc/kernel/early_32.c|  2 +-
  arch/powerpc/kernel/fsl_booke_entry_mapping.S | 17 ++--
  arch/powerpc/kernel/head_fsl_booke.S  | 13 ++-
  arch/powerpc/kernel/kaslr_booke.c | 84 +++
  arch/powerpc/mm/mmu_decl.h|  6 ++
  arch/powerpc/mm/nohash/fsl_booke.c|  7 +-
  8 files changed, 126 insertions(+), 15 deletions(-)
  create mode 100644 arch/powerpc/kernel/kaslr_booke.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 77f6ebf97113..755378887912 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -548,6 +548,17 @@ config RELOCATABLE
  setting can still be useful to bootwrappers that need to know the
  load address of the kernel (eg. u-boot/mkimage).
  
+config RANDOMIZE_BASE

+   bool "Randomize the address of the kernel image"
+   depends on (FSL_BOOKE && FLATMEM && PPC32)
+   select RELOCATABLE
+   help
+ Randomizes the virtual address at which the kernel image is
+ loaded, as a security feature that deters exploit attempts
+ relying on knowledge of the location of kernel internals.
+
+ If unsure, say N.
+
  config RELOCATABLE_TEST
bool "Test relocatable kernel"
depends on (PPC64 && RELOCATABLE)
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index ea0c69236789..32f6c5b99307 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -106,6 +106,7 @@ extra-$(CONFIG_PPC_8xx) := head_8xx.o
  extra-y   += vmlinux.lds
  
  obj-$(CONFIG_RELOCATABLE)	+= reloc_$(BITS).o

+obj-$(CONFIG_RANDOMIZE_BASE)   += kaslr_booke.o
  
  obj-$(CONFIG_PPC32)		+= entry_32.o setup_32.o early_32.o

  obj-$(CONFIG_PPC64)   += dma-iommu.o iommu.o
diff --git a/arch/powerpc/kernel/early_32.c b/arch/powerpc/kernel/early_32.c
index 3482118ffe76..fe8347cdc07d 100644
--- a/arch/powerpc/kernel/early_32.c
+++ b/arch/powerpc/kernel/early_32.c
@@ -32,5 +32,5 @@ notrace unsigned long __init early_init(unsigned long dt_ptr)
  
  	apply_feature_fixups();
  
-	return KERNELBASE + offset;

+   return kimage_vaddr + offset;
  }
diff --git a/arch/powerpc/kernel/fsl_booke_entry_mapping.S 
b/arch/powerpc/kernel/fsl_booke_entry_mapping.S
index de0980945510..de7ee682bb4a 100644
--- a/arch/powerpc/kernel/fsl_booke_entry_mapping.S
+++ b/arch/powerpc/kernel/fsl_booke_entry_mapping.S
@@ -155,23 +155,22 @@ skpinv:	addi	r6,r6,1		/* Increment */
  
  #if defined(ENTRY_MAPPING_BOOT_SETUP)
  
-/* 6. Setup KERNELBASE mapping in TLB1[0] */

+/* 6. Setup kimage_vaddr mapping in TLB1[0] */
lis r6,0x1000   /* Set MAS0(TLBSEL) = TLB1(1), ESEL = 0 
*/
mtspr   SPRN_MAS0,r6
lis r6,(MAS1_VALID|MAS1_IPROT)@h
ori r6,r6,(MAS1_TSIZE(BOOK3E_PAGESZ_64M))@l
mtspr   SPRN_MAS1,r6
-   lis r6,MAS2_VAL(PAGE_OFFSET, BOOK3E_PAGESZ_64M, M_IF_NEEDED)@h
-   ori r6,r6,MAS2_VAL(PAGE_OFFSET, BOOK3E_PAGESZ_64M, M_IF_NEEDED)@l
-   mtspr   SPRN_MAS2,r6
+   lis r6,MAS2_EPN_MASK(BOOK3E_PAGESZ_64M)@h
+   ori r6,r6,MAS2_EPN_MASK(BOOK3E_PAGESZ_64M)@l
+   and r6,r6,r20
+   ori r6,r6,M_IF_NEEDED@l
+   mtspr   SPRN_MAS2,r6
mtspr   SPRN_MAS3,r8
tlbwe
  
-/* 7. Jump to KERNELBASE mapping */

-   lis r6,(KERNELBASE & ~0xfff)@h
-   ori r6,r6,(KERNELBASE & ~0xfff)@l
-   rlwinm  r7,r25,0,0x03ff
-   add r6,r7,r6
+/* 7. Jump to kimage_vaddr mapping */
+   mr  r6,r20
  
  #elif defined(ENTRY_MAPPING_KEXEC_SETUP)

  /*
diff --git a/arch/powerpc/kernel/head_fsl_booke.S 
b/arch/powerpc/kernel/head_fsl_booke.S
index 2083382dd662..aa55832e7506 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -155,6 +155,8 @@ _ENTRY(_start);
   */
  
  _ENTRY(__early_start)

+   LOAD_REG_ADDR_PIC(r20, kimage_vaddr)
+   lwz r20,0(r20)
  
  #define 

[PATCHv3 3/3] PCI: layerscape: Add LS1028a support

2019-08-06 Thread Xiaowei Bao
Add support for the LS1028a PCIe controller.

Signed-off-by: Xiaowei Bao 
Signed-off-by: Hou Zhiqiang 
---
v2:
 - no change.
v3:
 - Reuse the ls2088 driver data structure.

 drivers/pci/controller/dwc/pci-layerscape.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
b/drivers/pci/controller/dwc/pci-layerscape.c
index 3a5fa26..f24f79a 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -263,6 +263,7 @@ static const struct ls_pcie_drvdata ls2088_drvdata = {
 static const struct of_device_id ls_pcie_of_match[] = {
{ .compatible = "fsl,ls1012a-pcie", .data = _drvdata },
{ .compatible = "fsl,ls1021a-pcie", .data = _drvdata },
+   { .compatible = "fsl,ls1028a-pcie", .data = &ls2088_drvdata },
{ .compatible = "fsl,ls1043a-pcie", .data = _drvdata },
{ .compatible = "fsl,ls1046a-pcie", .data = _drvdata },
{ .compatible = "fsl,ls2080a-pcie", .data = _drvdata },
-- 
2.9.5



[PATCHv3 2/3] arm64: dts: ls1028a: Add PCIe controller DT nodes

2019-08-06 Thread Xiaowei Bao
LS1028a implements 2 PCIe 3.0 controllers.

Signed-off-by: Xiaowei Bao 
Signed-off-by: Hou Zhiqiang 
---
v2:
 - Fix up the legacy INTx allocate failed issue.
v3:
 - no change.

 arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi | 52 ++
 1 file changed, 52 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
index aef5b06..0b542ed 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
@@ -503,6 +503,58 @@
status = "disabled";
};
 
+   pcie@340 {
+   compatible = "fsl,ls1028a-pcie";
+   reg = <0x00 0x0340 0x0 0x0010   /* controller 
registers */
+  0x80 0x 0x0 0x2000>; /* 
configuration space */
+   reg-names = "regs", "config";
+   interrupts = , /* PME 
interrupt */
+; /* aer 
interrupt */
+   interrupt-names = "pme", "aer";
+   #address-cells = <3>;
+   #size-cells = <2>;
+   device_type = "pci";
+   dma-coherent;
+   num-lanes = <4>;
+   bus-range = <0x0 0xff>;
+   ranges = <0x8100 0x0 0x 0x80 0x0001 0x0 
0x0001   /* downstream I/O */
+ 0x8200 0x0 0x4000 0x80 0x4000 0x0 
0x4000>; /* non-prefetchable memory */
+   msi-parent = <>;
+   #interrupt-cells = <1>;
+   interrupt-map-mask = <0 0 0 7>;
+   interrupt-map = < 0 0 1  0 0 GIC_SPI 109 
IRQ_TYPE_LEVEL_HIGH>,
+   < 0 0 2  0 0 GIC_SPI 110 
IRQ_TYPE_LEVEL_HIGH>,
+   < 0 0 3  0 0 GIC_SPI 111 
IRQ_TYPE_LEVEL_HIGH>,
+   < 0 0 4  0 0 GIC_SPI 112 
IRQ_TYPE_LEVEL_HIGH>;
+   status = "disabled";
+   };
+
+   pcie@350 {
+   compatible = "fsl,ls1028a-pcie";
+   reg = <0x00 0x0350 0x0 0x0010   /* controller 
registers */
+  0x88 0x 0x0 0x2000>; /* 
configuration space */
+   reg-names = "regs", "config";
+   interrupts = ,
+;
+   interrupt-names = "pme", "aer";
+   #address-cells = <3>;
+   #size-cells = <2>;
+   device_type = "pci";
+   dma-coherent;
+   num-lanes = <4>;
+   bus-range = <0x0 0xff>;
+   ranges = <0x8100 0x0 0x 0x88 0x0001 0x0 
0x0001   /* downstream I/O */
+ 0x8200 0x0 0x4000 0x88 0x4000 0x0 
0x4000>; /* non-prefetchable memory */
+   msi-parent = <>;
+   #interrupt-cells = <1>;
+   interrupt-map-mask = <0 0 0 7>;
+   interrupt-map = < 0 0 1  0 0 GIC_SPI 114 
IRQ_TYPE_LEVEL_HIGH>,
+   < 0 0 2  0 0 GIC_SPI 115 
IRQ_TYPE_LEVEL_HIGH>,
+   < 0 0 3  0 0 GIC_SPI 116 
IRQ_TYPE_LEVEL_HIGH>,
+   < 0 0 4  0 0 GIC_SPI 117 
IRQ_TYPE_LEVEL_HIGH>;
+   status = "disabled";
+   };
+
pcie@1f000 { /* Integrated Endpoint Root Complex */
compatible = "pci-host-ecam-generic";
reg = <0x01 0xf000 0x0 0x10>;
-- 
2.9.5



[PATCHv3 1/3] dt-bindings: pci: layerscape-pci: add compatible strings "fsl, ls1028a-pcie"

2019-08-06 Thread Xiaowei Bao
Add the PCIe compatible string for LS1028A.

Signed-off-by: Xiaowei Bao 
Signed-off-by: Hou Zhiqiang 
---
v2:
 - no change.
v3:
 - no change.

 Documentation/devicetree/bindings/pci/layerscape-pci.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/pci/layerscape-pci.txt 
b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
index e20ceaa..99a386e 100644
--- a/Documentation/devicetree/bindings/pci/layerscape-pci.txt
+++ b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
@@ -21,6 +21,7 @@ Required properties:
 "fsl,ls1046a-pcie"
 "fsl,ls1043a-pcie"
 "fsl,ls1012a-pcie"
+"fsl,ls1028a-pcie"
   EP mode:
"fsl,ls1046a-pcie-ep", "fsl,ls-pcie-ep"
 - reg: base addresses and lengths of the PCIe controller register blocks.
-- 
2.9.5



Re: [PATCH 4/7] ALSA: pcm: use dma_can_mmap() to check if a device supports dma_mmap_*

2019-08-06 Thread Takashi Iwai
On Tue, 06 Aug 2019 07:29:49 +0200,
Christoph Hellwig wrote:
> 
> On Mon, Aug 05, 2019 at 11:22:03AM +0200, Takashi Iwai wrote:
> > This won't work as expected, unfortunately.  It's a bit tricky check,
> > since the driver may have its own mmap implementation via
> > substream->ops->mmap, and the dma_buffer.dev.dev might point to
> > another object depending on the dma_buffer.dev.type.
> > 
> > So please replace with something like below:
> 
> From my gut feeling I'd reorder it a bit to make it more clear like
> this:
> 
> http://git.infradead.org/users/hch/misc.git/commitdiff/958fccf54d00c16740522f818d23f9350498e911
> 
> if this is fine it'll be in the next version, waiting for a little more
> feedback on the other patches.

Yes, the new one looks good.
Reviewed-by: Takashi Iwai 


Thanks!

Takashi


Re: [PATCH v3 13/16] powerpc/pseries/iommu: Don't use dma_iommu_ops on secure guests

2019-08-06 Thread Christoph Hellwig
On Tue, Aug 06, 2019 at 02:22:34AM -0300, Thiago Jung Bauermann wrote:
> @@ -1318,7 +1319,10 @@ void iommu_init_early_pSeries(void)
>   of_reconfig_notifier_register(_reconfig_nb);
>   register_memory_notifier(_mem_nb);
>  
> - set_pci_dma_ops(_iommu_ops);
> + if (is_secure_guest())
> + set_pci_dma_ops(NULL);
> + else
> + set_pci_dma_ops(_iommu_ops);

Shouldn't:

if (!is_secure_guest())
set_pci_dma_ops(_iommu_ops);

be enough here, given that NULL is the default?

Also either way I think this conditional needs a comment explaining
why it is there.


[PATCH v3 16/16] powerpc/configs: Enable secure guest support in pseries and ppc64 defconfigs

2019-08-06 Thread Thiago Jung Bauermann
From: Ryan Grimm 

Enables running as a secure guest on platforms with an Ultravisor.

Signed-off-by: Ryan Grimm 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/configs/ppc64_defconfig   | 1 +
 arch/powerpc/configs/pseries_defconfig | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/configs/ppc64_defconfig 
b/arch/powerpc/configs/ppc64_defconfig
index dc83fefa04f7..b250e6f5a7ca 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -29,6 +29,7 @@ CONFIG_DTL=y
 CONFIG_SCANLOG=m
 CONFIG_PPC_SMLPAR=y
 CONFIG_IBMEBUS=y
+CONFIG_PPC_SVM=y
 CONFIG_PPC_MAPLE=y
 CONFIG_PPC_PASEMI=y
 CONFIG_PPC_PASEMI_IOMMU=y
diff --git a/arch/powerpc/configs/pseries_defconfig 
b/arch/powerpc/configs/pseries_defconfig
index 38abc9c1770a..26126b4d4de3 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -42,6 +42,7 @@ CONFIG_DTL=y
 CONFIG_SCANLOG=m
 CONFIG_PPC_SMLPAR=y
 CONFIG_IBMEBUS=y
+CONFIG_PPC_SVM=y
 # CONFIG_PPC_PMAC is not set
 CONFIG_RTAS_FLASH=m
 CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y