from:"Robert Richter"

Re: [PATCH] cxl/mem: Force array size of mem_commands[] to CXL_MEM_COMMAND_ID_MAX

2021-03-25 Thread Robert Richter

On 24.03.21 12:08:20, Dan Williams wrote:
> On Wed, Mar 24, 2021 at 11:43 AM Ira Weiny  wrote:

> > Can't we use ARRAY_SIZE?
> 
> An ARRAY_SIZE() check in cxl_validate_cmd_from_user() would work too,
> but it wouldn't give the compiler protection that Robert mentions for
> going the other way where mem_commands tries to add an entry that is
> out of bounds relative to CXL_CMDS.

I was considering that too. Another reason apart from above was to
treat 'holes' in the array caused by #ifdefs the same regardless its
position in the array. Thus, all should show up as being zeroed
instead of cutting those at the end from the array.

Thanks for applying,

-Robert

[PATCH] cxl/mem: Force array size of mem_commands[] to CXL_MEM_COMMAND_ID_MAX

2021-03-24 Thread Robert Richter

Typically the mem_commands[] array is in sync with 'enum { CXL_CMDS }'.
Current code works well.

However, the array size of mem_commands[] may not strictly be the same
as CXL_MEM_COMMAND_ID_MAX. E.g. if a new CXL_CMD() is added that is
guarded by #ifdefs, the array could be shorter. This could lead then
further to an out-of-bounds array access in cxl_validate_cmd_from_user().

Fix this by forcing the array size to CXL_MEM_COMMAND_ID_MAX. This
also adds range checks for array items in mem_commands[] at compile
time.

Signed-off-by: Robert Richter 
---
 drivers/cxl/mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 244cb7d89678..ecfc9ccdba8d 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -169,7 +169,7 @@ struct cxl_mem_command {
  * table will be validated against the user's input. For example, if size_in is
  * 0, and the user passed in 1, it is an error.
  */
-static struct cxl_mem_command mem_commands[] = {
+static struct cxl_mem_command mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
CXL_CMD(IDENTIFY, 0, 0x43, CXL_CMD_FLAG_FORCE_ENABLE),
 #ifdef CONFIG_CXL_MEM_RAW_COMMANDS
CXL_CMD(RAW, ~0, ~0, 0),
-- 
2.29.2

Re: [PATCH v5 0/4] Introduce pcim_alloc_irq_vectors()

2021-02-26 Thread Robert Richter

On 26.02.21 23:50:52, Dejin Zheng wrote:
> Introduce pcim_alloc_irq_vectors(), a device-managed version of
> pci_alloc_irq_vectors(), In some i2c drivers, If pcim_enable_device()
> has been called before, then pci_alloc_irq_vectors() is actually a
> device-managed function. It is used as a device-managed function, So
> replace it with pcim_alloc_irq_vectors().

For the whole series:

Reviewed-by: Robert Richter 

Thanks.

Re: [PATCH v5 4/4] i2c: thunderx: Use pcim_alloc_irq_vectors() to allocate IRQ vectors

2021-02-26 Thread Robert Richter

On 26.02.21 23:50:56, Dejin Zheng wrote:
> The pcim_alloc_irq_vectors() function, an explicit device-managed version
> of pci_alloc_irq_vectors(). If pcim_enable_device() has been called
> before, then pci_alloc_irq_vectors() is actually a device-managed
> function. It is used here as a device-managed function, So replace it
> with pcim_alloc_irq_vectors().
> 
> Signed-off-by: Dejin Zheng 

Acked-by: Robert Richter

Re: [PATCH 2/2] PCI: controller: avoid building empty drivers

2021-02-26 Thread Robert Richter

On 25.02.21 15:37:10, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> There are harmless warnings when compile testing the kernel with
> CONFIG_TRIM_UNUSED_KSYMS:
> 
> drivers/pci/controller/dwc/pcie-al.o: no symbols
> drivers/pci/controller/pci-thunder-ecam.o: no symbols
> drivers/pci/controller/pci-thunder-pem.o: no symbols
> 
> The problem here is that the host drivers get built even when the
> configuration symbols are all disabled, as they pretend to not be drivers
> but are silently enabled because of the promise that ACPI based systems
> need no drivers.
> 
> Add back the normal symbols to have these drivers built, and change the
> logic to otherwise only build them when both CONFIG_PCI_QUIRKS and
> CONFIG_ACPI are enabled.
> 
> As a side-effect, this enables compile-testing the drivers on other
> architectures, which in turn needs the acpi_get_rc_resources()
> function to be defined.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/pci/controller/Makefile | 7 ++-
>  drivers/pci/controller/dwc/Makefile | 7 ++-
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/controller/Makefile b/drivers/pci/controller/Makefile
> index e4559f2182f2..6d24a163033f 100644
> --- a/drivers/pci/controller/Makefile
> +++ b/drivers/pci/controller/Makefile
> @@ -11,10 +11,13 @@ obj-$(CONFIG_PCIE_RCAR_HOST) += pcie-rcar.o 
> pcie-rcar-host.o
>  obj-$(CONFIG_PCIE_RCAR_EP) += pcie-rcar.o pcie-rcar-ep.o
>  obj-$(CONFIG_PCI_HOST_COMMON) += pci-host-common.o
>  obj-$(CONFIG_PCI_HOST_GENERIC) += pci-host-generic.o
> +obj-$(CONFIG_PCI_HOST_THUNDER_ECAM) += pci-thunder-ecam.o
> +obj-$(CONFIG_PCI_HOST_THUNDER_PEM) += pci-thunder-pem.o
>  obj-$(CONFIG_PCIE_XILINX) += pcie-xilinx.o
>  obj-$(CONFIG_PCIE_XILINX_NWL) += pcie-xilinx-nwl.o
>  obj-$(CONFIG_PCIE_XILINX_CPM) += pcie-xilinx-cpm.o
>  obj-$(CONFIG_PCI_V3_SEMI) += pci-v3-semi.o
> +obj-$(CONFIG_PCI_XGENE) += pci-xgene.o
>  obj-$(CONFIG_PCI_XGENE_MSI) += pci-xgene-msi.o
>  obj-$(CONFIG_PCI_VERSATILE) += pci-versatile.o
>  obj-$(CONFIG_PCIE_IPROC) += pcie-iproc.o
> @@ -47,8 +50,10 @@ obj-y  += mobiveil/
>  # ARM64 and use internal ifdefs to only build the pieces we need
>  # depending on whether ACPI, the DT driver, or both are enabled.
>  
> -ifdef CONFIG_PCI
> +ifdef CONFIG_ACPI
> +ifdef CONFIG_PCI_QUIRKS
>  obj-$(CONFIG_ARM64) += pci-thunder-ecam.o
>  obj-$(CONFIG_ARM64) += pci-thunder-pem.o
>  obj-$(CONFIG_ARM64) += pci-xgene.o
>  endif
> +endif

A possible double inclusion isn't really nice here, but it should work
that way.

Also, the menu entry for the driver is in fact only for the OF case,
as it is always included for ACPI even if the option is disabled (and
thus the choice is useless). But this is unrelated to this patch.

Anyway:

Reviewed-by: Robert Richter 

> diff --git a/drivers/pci/controller/dwc/Makefile 
> b/drivers/pci/controller/dwc/Makefile
> index a751553fa0db..ba7c42f6df6f 100644
> --- a/drivers/pci/controller/dwc/Makefile
> +++ b/drivers/pci/controller/dwc/Makefile
> @@ -31,7 +31,12 @@ obj-$(CONFIG_PCIE_UNIPHIER_EP) += pcie-uniphier-ep.o
>  # ARM64 and use internal ifdefs to only build the pieces we need
>  # depending on whether ACPI, the DT driver, or both are enabled.
>  
> -ifdef CONFIG_PCI
> +obj-$(CONFIG_PCIE_AL) += pcie-al.o
> +obj-$(CONFIG_PCI_HISI) += pcie-hisi.o
> +
> +ifdef CONFIG_ACPI
> +ifdef CONFIG_PCI_QUIRKS
>  obj-$(CONFIG_ARM64) += pcie-al.o
>  obj-$(CONFIG_ARM64) += pcie-hisi.o
>  endif
> +endif
> -- 
> 2.29.2
>

Re: [PATCH 1/2] PCI: controller: thunder: fix compile testing

2021-02-26 Thread Robert Richter

On 25.02.21 15:37:09, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> Compile-testing these drivers is currently broken. Enabling
> it causes a couple of build failures though:
> 
> drivers/pci/controller/pci-thunder-ecam.c:119:30: error: shift count >= width 
> of type [-Werror,-Wshift-count-overflow]
> drivers/pci/controller/pci-thunder-pem.c:54:2: error: implicit declaration of 
> function 'writeq' [-Werror,-Wimplicit-function-declaration]
> drivers/pci/controller/pci-thunder-pem.c:392:8: error: implicit declaration 
> of function 'acpi_get_rc_resources' [-Werror,-Wimplicit-function-declaration]
> 
> Fix them with the obvious one-line changes.
> 
> Signed-off-by: Arnd Bergmann 

Reviewed-by: Robert Richter 

> ---
>  drivers/pci/controller/pci-thunder-ecam.c |  2 +-
>  drivers/pci/controller/pci-thunder-pem.c  | 13 +++--
>  drivers/pci/pci.h |  6 ++
>  3 files changed, 14 insertions(+), 7 deletions(-)

Re: [PATCH] Documentation/submitting-patches: Extend commit message layout description

2021-02-25 Thread Robert Richter

On 15.02.21 15:19:49, Borislav Petkov wrote:
> From: Borislav Petkov 
> Subject: [PATCH] Documentation/submitting-patches: Extend commit message 
> layout description
> 
> Add more blurb about the level of detail that should be contained in a
> patch's commit message. Extend and make more explicit what text should
> be added under the --- line. Extend examples and split into more easily
> palatable paragraphs.
> 
> This has been partially carved out from a tip subsystem handbook
> patchset by Thomas Gleixner:
> 
>   https://lkml.kernel.org/r/20181107171010.421878...@linutronix.de
> 
> and incorporates follow-on comments.
> 
> Signed-off-by: Borislav Petkov 

Reviewed-by: Robert Richter

Re: [PATCH v4 1/4] PCI: Introduce pcim_alloc_irq_vectors()

2021-02-25 Thread Robert Richter

On 23.02.21 22:14:35, Dejin Zheng wrote:
> On Tue, Feb 23, 2021 at 09:02:54AM +0100, Robert Richter wrote:
> > On 22.02.21 23:14:15, Dejin Zheng wrote:
> > > On Mon, Feb 22, 2021 at 11:56:08AM +0100, Robert Richter wrote:
> > > > On 20.02.21 00:46:49, Dejin Zheng wrote:
> > > > > > On 18.02.21 23:04:55, Dejin Zheng wrote:
> > > > 
> > > > > > > + if (!dr || !dr->enabled)
> > > > > here checks whether the pci device is enabled.
> > > > 
> > > > What is the purpose of this? The device "is_managed" or not.
> > > >
> > > The device is managed or not by check whether "dr" is NULL. And
> > > check the "dr->enabled" is for the PCI device enable. I think it
> > > may not make sense to apply for irq vectors when PCI device is not
> > > enabled.
> > 
> > I don't see how a disabled device affects in any way the release of
> > the irq vectors during device removal. dr is always non-null in case
> > the device is managed, a check isn't needed for that.
> >
> Yes, the disabled device does not affect release irq vectors, But
> the disabled device affects apply for irq vectors, It is wrong to apply
> for the irq vectors when the device is not enabled.

What is the scenario you have in mind here? What does happen then?
The typical use case is to pcim_enable_device() it and then add the
irq vectors. It is always enabled then.

Even if the device could wrongly be disabled, it does not affect the
device's release.

Also, how is this related to pcim? There isn't a check in
pci_alloc_irq_vectors() either for that case. 

> Add this check can
> facilitate developers to find problems as soon as possible.

No, there are many ways to shoot yourself in the foot. We cannot add
checks here and there for this, esp. at runtime. If there is a valid
reason that the device must always be enabled and we cannot assume
this is the case, then we could add a WARN_ON(). But I doubt that.

-Robert

Re: [PATCH v4 1/4] PCI: Introduce pcim_alloc_irq_vectors()

2021-02-23 Thread Robert Richter

On 22.02.21 23:14:15, Dejin Zheng wrote:
> On Mon, Feb 22, 2021 at 11:56:08AM +0100, Robert Richter wrote:
> > On 20.02.21 00:46:49, Dejin Zheng wrote:
> > > > On 18.02.21 23:04:55, Dejin Zheng wrote:
> > 
> > > > > + if (!dr || !dr->enabled)
> > > here checks whether the pci device is enabled.
> > 
> > What is the purpose of this? The device "is_managed" or not.
> >
> The device is managed or not by check whether "dr" is NULL. And
> check the "dr->enabled" is for the PCI device enable. I think it
> may not make sense to apply for irq vectors when PCI device is not
> enabled.

I don't see how a disabled device affects in any way the release of
the irq vectors during device removal. dr is always non-null in case
the device is managed, a check isn't needed for that.

-Robert

Re: [PATCH v4 1/4] PCI: Introduce pcim_alloc_irq_vectors()

2021-02-22 Thread Robert Richter

On 19.02.21 17:15:50, Krzysztof Wilczyński wrote:
> Hi Robert,
> 
> [...]
> > Obiously this is meant here:
> > 
> > if (!pci_is_managed(dev))
> [...]
> 
> A question to improve my understanding for future reference.  Was the
> previous approach of checking for "enabled" flag from struct pci_devres
> was not a good choice here?

Initially this was meant to just show the idea.

After careful review I don't see this additional check is required as
once the pci dev is managed, it will be always released with
pcim_release().

-Robert

Re: [PATCH v4 1/4] PCI: Introduce pcim_alloc_irq_vectors()

2021-02-22 Thread Robert Richter

On 20.02.21 00:46:49, Dejin Zheng wrote:
> > On 18.02.21 23:04:55, Dejin Zheng wrote:

> > > + if (!dr || !dr->enabled)
> here checks whether the pci device is enabled.

What is the purpose of this? The device "is_managed" or not.

-Robert

Re: [PATCH v4 4/4] i2c: thunderx: Use the correct name of device-managed function

2021-02-19 Thread Robert Richter

On 18.02.21 23:04:58, Dejin Zheng wrote:
> Use the new function pcim_alloc_irq_vectors() to allocate IRQ vectors,
> the pcim_alloc_irq_vectors() function, an explicit device-managed version
> of pci_alloc_irq_vectors(). If pcim_enable_device() has been called
> before, then pci_alloc_irq_vectors() is actually a device-managed
> function. It is used here as a device-managed function, So replace it
> with pcim_alloc_irq_vectors().
> 
> Signed-off-by: Dejin Zheng 

The title of this patch suggests there was something incorrect before
and this is a fix, which it isn't. Please reword.

Thanks,

-Robert

Re: [PATCH v4 1/4] PCI: Introduce pcim_alloc_irq_vectors()

2021-02-19 Thread Robert Richter

On 19.02.21 16:48:57, Andy Shevchenko wrote:
> On Fri, Feb 19, 2021 at 03:40:05PM +0100, Robert Richter wrote:
> > On 18.02.21 23:04:55, Dejin Zheng wrote:
> > > Introduce pcim_alloc_irq_vectors(), a device-managed version of
> > > pci_alloc_irq_vectors(). Introducing this function can simplify
> > > the error handling path in many drivers.
> > > 
> > > And use pci_free_irq_vectors() to replace some code in pcim_release(),
> > > they are equivalent, and no functional change. It is more explicit
> > > that pcim_alloc_irq_vectors() is a device-managed function.
> 
> ...
> 
> > If it is just about having a pcim-* counterpart why not just an inline
> > function like the one below.
> 
> It's a good suggestion, thanks!
> 
> Still we need to amend pcim_release() to explicitly show that we call
> pci_free_irq_vectors().

Fair enough. Thanks,

-Robert

Re: [PATCH v4 1/4] PCI: Introduce pcim_alloc_irq_vectors()

2021-02-19 Thread Robert Richter

On 19.02.21 15:40:11, Robert Richter wrote:
> static inline int pcim_alloc_irq_vectors(struct pci_dev *dev,
>   unsigned int min_vecs, unsigned int max_vecs, unsigned int flags)
> {
>   if (!pci_is_managed(dev, min_vecs, max_vecs, flags))

Obiously this is meant here:

if (!pci_is_managed(dev))

>   return -EINVAL;
> 
>   return pci_alloc_irq_vectors(dev, min_vecs, max_vecs, flags);
> }

Re: [PATCH v4 1/4] PCI: Introduce pcim_alloc_irq_vectors()

2021-02-19 Thread Robert Richter

On 18.02.21 23:04:55, Dejin Zheng wrote:
> Introduce pcim_alloc_irq_vectors(), a device-managed version of
> pci_alloc_irq_vectors(). Introducing this function can simplify
> the error handling path in many drivers.
> 
> And use pci_free_irq_vectors() to replace some code in pcim_release(),
> they are equivalent, and no functional change. It is more explicit
> that pcim_alloc_irq_vectors() is a device-managed function.
> 
> Suggested-by: Andy Shevchenko 
> Signed-off-by: Dejin Zheng 
> ---
> v3 -> v4:
>   - No change
> v2 -> v3:
>   - Add some commit comments for replace some codes in
> pcim_release() by pci_free_irq_vectors().
> v1 -> v2:
>   - Use pci_free_irq_vectors() to replace some code in
> pcim_release().
>   - Modify some commit messages.
> 
>  drivers/pci/pci.c   | 33 +
>  include/linux/pci.h |  3 +++
>  2 files changed, 32 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b67c4327d307..db799d089c85 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1969,10 +1969,7 @@ static void pcim_release(struct device *gendev, void 
> *res)
>   struct pci_devres *this = res;
>   int i;
>  
> - if (dev->msi_enabled)
> - pci_disable_msi(dev);
> - if (dev->msix_enabled)
> - pci_disable_msix(dev);
> + pci_free_irq_vectors(dev);
>  
>   for (i = 0; i < DEVICE_COUNT_RESOURCE; i++)
>   if (this->region_mask & (1 << i))
> @@ -2054,6 +2051,34 @@ void pcim_pin_device(struct pci_dev *pdev)
>  }
>  EXPORT_SYMBOL(pcim_pin_device);
>  
> +/**
> + * pcim_alloc_irq_vectors - a device-managed pci_alloc_irq_vectors()
> + * @dev: PCI device to operate on
> + * @min_vecs:minimum number of vectors required (must be >= 
> 1)
> + * @max_vecs:maximum (desired) number of vectors
> + * @flags:   flags or quirks for the allocation
> + *
> + * Return the number of vectors allocated, (which might be smaller than
> + * @max_vecs) if successful, or a negative error code on error. If less
> + * than @min_vecs interrupt vectors are available for @dev the function
> + * will fail with -ENOSPC.
> + *
> + * It depends on calling pcim_enable_device() to make IRQ resources
> + * manageable.
> + */
> +int pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> + unsigned int max_vecs, unsigned int flags)
> +{
> + struct pci_devres *dr;
> +
> + dr = find_pci_dr(dev);
> + if (!dr || !dr->enabled)
> + return -EINVAL;
> +
> + return pci_alloc_irq_vectors(dev, min_vecs, max_vecs, flags);
> +}
> +EXPORT_SYMBOL(pcim_alloc_irq_vectors);

If it is just about having a pcim-* counterpart why not just an inline
function like the one below.

> +
>  /*
>   * pcibios_add_device - provide arch specific hooks when adding device dev
>   * @dev: the PCI device being added
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 86c799c97b77..d75ba85ddfc5 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1818,6 +1818,9 @@ pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int 
> min_vecs,
> NULL);
>  }
>  
> +int pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> + unsigned int max_vecs, unsigned int flags);
> +

static inline int pcim_alloc_irq_vectors(struct pci_dev *dev,
unsigned int min_vecs, unsigned int max_vecs, unsigned int flags)
{
if (!pci_is_managed(dev, min_vecs, max_vecs, flags))
return -EINVAL;

return pci_alloc_irq_vectors(dev, min_vecs, max_vecs, flags);
}

All those stub functions with EXPORT_SYMBOLS etc. could be dropped
then.

With some macro magic added a list of functions could easily being
created that are already managed but just need a pcim* counterpart.

-Robert

>  /* Include architecture-dependent settings and functions */
>  
>  #include 
> -- 
> 2.25.0
>

Re: [PATCH v3 0/4] Introduce pcim_alloc_irq_vectors()

2021-02-19 Thread Robert Richter

On 18.02.21 16:01:56, Andy Shevchenko wrote:
> The problem this series solves is an imbalanced API.

This (added) API is bloated and incomplete. It adds functions without
benefit, the only is to have a single pcim alloc function in addition
to the pairing of alloc/free functions. I agree, it is hard to detect
which parts are released if pcim_enable_device() is used.

Additional, you need to go through pcim_release() to add other
pcim_*() functions for everything else that is released there.
Otherwise that new API is still incomplete. But this adds another
bunch of useless functions.

> Christoph IIRC was clear that if we want to use PCI IRQ allocation API the
> caller must know what's going on. Hiding this behind the scenes is not good.
> And this series unhides that.

IMO, this is more a documentation issue. pcim_enable_device() must be
better documented and list all enable/alloc functions that are going
to be released out of the box later.

Even better, make sure everything is managed and thus all of a pci_dev
is released, no matter how it was setup (this could even already be
the case).

In addition you could implement a static code checker.

> Also, you may go and clean up all pci_free_irq_vectors() when
> pcim_enable_device() is called, but I guess you will get painful process and
> rejection in a pile of cases.

Why should something be rejected if it is not correctly freed?

Even if pci_free_irq_vectors() is called, pcim_release() will not
complain if it was already freed before. So using
pci_free_irq_vectors() is ok even in conjunction with
pcim_enable_device().

In the end, let's make sure everything is released in pci_dev if it is
managed and document this.

Thanks,

-Robert

Re: [PATCH] i2c: thunderx: Fix an issue about missing call to 'pci_free_irq_vectors()'

2021-02-18 Thread Robert Richter

On 14.02.21 22:37:34, Dejin Zheng wrote:
> Call to 'pci_free_irq_vectors()' are missing both in the error handling
> path of the probe function, and in the remove function. So add them.
> 
> Fixes: 4c21541d8da17fb ("i2c: thunderx: Replace pci_enable_msix()")
> Signed-off-by: Dejin Zheng 
> ---
>  drivers/i2c/busses/i2c-thunderx-pcidrv.c | 9 ++---

FTR, I reviewed the error paths of thunder_i2c_probe_pci() (v5.11-rc1)
and all look sane to me.

-Robert

Re: [PATCH v3 0/4] Introduce pcim_alloc_irq_vectors()

2021-02-18 Thread Robert Richter

On 17.02.21 00:02:45, Dejin Zheng wrote:
> Introduce pcim_alloc_irq_vectors(), a device-managed version of
> pci_alloc_irq_vectors(), In some i2c drivers, If pcim_enable_device()
> has been called before, then pci_alloc_irq_vectors() is actually a
> device-managed function. It is used as a device-managed function, So
> replace it with pcim_alloc_irq_vectors().
> 
> Changelog
> -
> v2 -> v3:
>   - Add some commit comments for replace some codes in
> pcim_release() by pci_free_irq_vectors().
>   - Simplify the error handling path in i2c designware
> driver.
> v1 -> v2:
>   - Use pci_free_irq_vectors() to replace some code in
> pcim_release().
>   - Modify some commit messages.
> 
> Dejin Zheng (4):
>   PCI: Introduce pcim_alloc_irq_vectors()
>   Documentation: devres: Add pcim_alloc_irq_vectors()

This is already taken care of, see pcim_release():

if (dev->msi_enabled)
pci_disable_msi(dev);
if (dev->msix_enabled)
pci_disable_msix(dev);

Activated when used with pcim_enable_device().

This series is not required.

-Robert

>   i2c: designware: Use the correct name of device-managed function
>   i2c: thunderx: Use the correct name of device-managed function
> 
>  .../driver-api/driver-model/devres.rst|  1 +
>  drivers/i2c/busses/i2c-designware-pcidrv.c| 15 +++--
>  drivers/i2c/busses/i2c-thunderx-pcidrv.c  |  2 +-
>  drivers/pci/pci.c | 33 ---
>  include/linux/pci.h   |  3 ++
>  5 files changed, 38 insertions(+), 16 deletions(-)
> 
> -- 
> 2.25.0
>

Re: [PATCH] i2c: thunderx: Fix an issue about missing call to 'pci_free_irq_vectors()'

2021-02-18 Thread Robert Richter

On 14.02.21 22:37:34, Dejin Zheng wrote:
> Call to 'pci_free_irq_vectors()' are missing both in the error handling
> path of the probe function, and in the remove function. So add them.
> 
> Fixes: 4c21541d8da17fb ("i2c: thunderx: Replace pci_enable_msix()")
> Signed-off-by: Dejin Zheng 

This is already taken care of in pcim_release() as the device is
managed (pcim_enable_device()). Your change is not required.

-Robert

Re: [PATCH 02/29] alpha: Avoid comma separated statements

2021-02-16 Thread Robert Richter

On 30.01.21 10:54:42, Joe Perches wrote:
> On Mon, 2020-08-24 at 21:55 -0700, Joe Perches wrote:
> > Use semicolons and braces.
> 
> ping?
> 
> > 
> > Signed-off-by: Joe Perches 
> > ---
> >  arch/alpha/kernel/pci_iommu.c  |  8 +---
> >  arch/alpha/oprofile/op_model_ev4.c | 22 ++
> >  arch/alpha/oprofile/op_model_ev5.c |  8 +---

This patch should be rebased as oprofile is going to be removed in
5.12. A branch is in linux-next.

-Robert

> >  3 files changed, 24 insertions(+), 14 deletions(-)

Re: [PATCH 06/13] staging: octeon: Switch from strlcpy to strscpy

2021-02-16 Thread Robert Richter

On 31.01.21 22:58:27, Kumar Kartikeya Dwivedi wrote:
> strlcpy is marked as deprecated in Documentation/process/deprecated.rst,
> and there is no functional difference when the caller expects truncation
> (when not checking the return value). strscpy is relatively better as it
> also avoids scanning the whole source string.
> 
> This silences the related checkpatch warnings from:
> 5dbdb2d87c29 ("checkpatch: prefer strscpy to strlcpy")
> 
> Signed-off-by: Kumar Kartikeya Dwivedi 
> ---
>  drivers/staging/octeon/ethernet-mdio.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Robert Richter

Re: [PATCH 0/8] cpupower: Updates and cleanup to support AMD Family 0x19

2021-01-22 Thread Robert Richter

On 22.01.21 11:38:28, Nathan Fontenot wrote:
> Updates to the cpupower command to add support for AMD family 0x19
> and cleanup the code to remove many of the family checks to hopefully
> make any future family updates easier.
> 
> The first couple of patches are simple updates to rename the structs
> in the msr_pstate union to better reflect current support and correcting
> the name of the CPUPOWER_CAP_AMD_CPB cpuid cap flag.
> 
> Patches 3, 5, and 8 update the family checks to either replace
> them with a new cpuid cap flag based off of cpuid checks or check for
> family >= 0x17 where removing the direct family check isn't possible.
> 
> The reamianing patches are cleanups to remove unneeded extra enabled bit
> checking, remove passing no longer used variables, and remove unused
> variables in decode_pstates().
> 
> ---
> 
> Nathan Fontenot (7):
>   cpupower: Update msr_pstate union struct naming
>   cpupower: Add CPUPOWER_CAP_AMD_HW_PSTATE cpuid caps flag
>   cpupower: Remove unused pscur variable.
>   cpupower: Update family checks when decoding HW pstates
>   cpupower: Condense pstate enabled bit checks in decode_pstates()
>   cpupower: Remove family arg to decode_pstates()
>   cpupower: Add cpuid cap flag for MSR_AMD_HWCR support
> 
> Robert Richter (1):
>   cpupower: Correct macro name for CPB caps flag

For the whole series:

Reviewed-by: Robert Richter

Re: [PATCH 00/18] drivers: Remove oprofile and dcookies

2021-01-14 Thread Robert Richter

On 14.01.21 17:04:24, Viresh Kumar wrote:
> Hello,
> 
> The "oprofile" user-space tools don't use the kernel OPROFILE support
> any more, and haven't in a long time. User-space has been converted to
> the perf interfaces.
> 
> Remove oprofile and dcookies (whose only user is oprofile) support from
> the kernel.
> 
> This was suggested here [1] earlier.
> 
> This is build/boot tested by kernel test robot (Intel) and Linaro's
> Tuxmake[2] for a lot of architectures and no failures were reported.
> 
> --
> Viresh
> 
> [1] 
> https://lore.kernel.org/lkml/CAHk-=whw9t3ztv8ia2sjwyqs1vojus14p_qhj3v5-9pcbmg...@mail.gmail.com/
> [2] https://lwn.net/Articles/841624/
> 
> Viresh Kumar (18):
>   arch: alpha: Remove CONFIG_OPROFILE support
>   arch: arm: Remove CONFIG_OPROFILE support
>   arch: arc: Remove CONFIG_OPROFILE support
>   arch: hexagon: Don't select HAVE_OPROFILE
>   arch: ia64: Remove CONFIG_OPROFILE support
>   arch: ia64: Remove rest of perfmon support
>   arch: microblaze: Remove CONFIG_OPROFILE support
>   arch: mips: Remove CONFIG_OPROFILE support
>   arch: parisc: Remove CONFIG_OPROFILE support
>   arch: powerpc: Stop building and using oprofile
>   arch: powerpc: Remove oprofile
>   arch: s390: Remove CONFIG_OPROFILE support
>   arch: sh: Remove CONFIG_OPROFILE support
>   arch: sparc: Remove CONFIG_OPROFILE support
>   arch: x86: Remove CONFIG_OPROFILE support
>   arch: xtensa: Remove CONFIG_OPROFILE support
>   drivers: Remove CONFIG_OPROFILE support
>   fs: Remove dcookies support

After oprofile userland moved to version 1.x, the kernel support for
it isn't needed anymore. The switch was back in 2014 when oprofile
started using the perf syscall:

 
https://sourceforge.net/p/oprofile/oprofile/ci/ba9edea2bdfe2c9475749fc83105632bd916b96c

Since then I haven't received any significant patches to implement new
features or add support for newer platforms in the kernel. There
haven't been bug reports sent or questions asked on the mailing list
for quite a while, which indicates there are no or less users. Users
(if any) should switch to oprofile 1.x or the perf tool. No need to
carry kernel support any longer with us.

So time to get rid of it. For the whole series:

Acked-by: Robert Richter

Re: [PATCH v2] edac: remove redundant error print in xgene_edac_probe

2021-01-12 Thread Robert Richter

On 12.01.21 02:35:40, menglong8.d...@gmail.com wrote:
> From: Menglong Dong 
> 
> Coccinelle reports a redundant error print in xgene_edac_probe.
> As 'platform_get_irq' already prints the error message, error
> print here is redundant.
> 
> Fix it by using 'platform_get_irq_optional' in place of
> 'platform_get_irq', as Robert suggested.
> 
> Signed-off-by: Menglong Dong 
> ---
> v2:
> - use 'platform_get_irq_optional' instead of 'platform_get_irq'
> ---

Reviewed-by: Robert Richter

Re: [PATCH] edac: remove redundant error print in xgene_edac_probe

2021-01-12 Thread Robert Richter

On 12.01.21 00:46:05, menglong8.d...@gmail.com wrote:
> From: Menglong Dong 
> 
> Coccinelle reports a redundant error print in xgene_edac_probe.
> As 'platform_get_irq' already prints the error message, error
> print here is redundant and can be removed.
> 
> Signed-off-by: Menglong Dong 
> ---
>  drivers/edac/xgene_edac.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/edac/xgene_edac.c b/drivers/edac/xgene_edac.c
> index 1d2c27a00a4a..1d583f0ea4dc 100644
> --- a/drivers/edac/xgene_edac.c
> +++ b/drivers/edac/xgene_edac.c
> @@ -1918,7 +1918,6 @@ static int xgene_edac_probe(struct platform_device 
> *pdev)
>   for (i = 0; i < 3; i++) {
>   irq = platform_get_irq(pdev, i);
>   if (irq < 0) {
> - dev_err(>dev, "No IRQ resource\n");

Better call platform_get_irq_optional() instead. That would keep the
same error message patterns for all error paths of the probe function.

-Robert

>   rc = -EINVAL;
>   goto out_err;
>   }
> -- 
> 2.17.1
>

Re: [PATCH 4/8] i2c: octeon: check correct size of maximum RECV_LEN packet

2021-01-11 Thread Robert Richter

On 09.01.21 13:43:08, Wolfram Sang wrote:
> I2C_SMBUS_BLOCK_MAX defines already the maximum number as defined in the
> SMBus 2.0 specs. No reason to add one to it.
> 
> Fixes: 886f6f8337dd ("i2c: octeon: Support I2C_M_RECV_LEN")
> Signed-off-by: Wolfram Sang 

Reviewed-by: Robert Richter

Re: [PATCH] edac: amd64_edac: remove unneeded break

2020-10-19 Thread Robert Richter

On 19.10.20 12:35:24, t...@redhat.com wrote:
> From: Tom Rix 
> 
> A break is not needed if it is preceded by a return
> 
> Signed-off-by: Tom Rix 

Reviewed-by: Robert Richter

Re: [PATCH 02/29] alpha: Avoid comma separated statements

2020-08-25 Thread Robert Richter

On 24.08.20 21:55:59, Joe Perches wrote:
> Use semicolons and braces.
> 
> Signed-off-by: Joe Perches 
> ---
>  arch/alpha/kernel/pci_iommu.c  |  8 +---
>  arch/alpha/oprofile/op_model_ev4.c | 22 ++
>  arch/alpha/oprofile/op_model_ev5.c |  8 +---
>  3 files changed, 24 insertions(+), 14 deletions(-)

For oprofile:

Acked-by: Robert Richter

[PATCH] MAINTAINERS, edac: Calxeda Highbank, handover maintenance to Andre Przywara

2020-08-24 Thread Robert Richter

I do not have hardware anymore, nor there is ongoing development. So
handover maintenance to Andre who already maintains the last
remainings of Calxeda.

Cc: Andre Przywara 
Signed-off-by: Robert Richter 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1b7b0c1a24c8..6ed56e1a7d28 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6148,7 +6148,7 @@ S:Supported
 F: drivers/edac/bluefield_edac.c
 
 EDAC-CALXEDA
-M: Robert Richter 
+M: Andre Przywara 
 L: linux-e...@vger.kernel.org
 S: Maintained
 F: drivers/edac/highbank*
-- 
2.20.1

[PATCH] MAINTAINERS: Update Cavium/Marvell entries

2020-08-24 Thread Robert Richter

I am leaving Marvell and already do not have access to my @marvell.com
email address. So switching over to my korg mail address or removing
my address there another maintainer is already listed. For the entries
there no other maintainer is listed I will keep looking into patches
for Cavium systems for a while until someone from Marvell takes it
over. Since I might have limited access to hardware and also limited
time I changed state to 'Odd Fixes' for those entries.

Cc: Ganapatrao Kulkarni 
Cc: Sunil Goutham 
Signed-off-by: Robert Richter 
---
 MAINTAINERS | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index a45fe1a6251e..1b7b0c1a24c8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1693,7 +1693,6 @@ F:arch/arm/mach-cns3xxx/
 
 ARM/CAVIUM THUNDER NETWORK DRIVER
 M: Sunil Goutham 
-M: Robert Richter 
 L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
 S: Supported
 F: drivers/net/ethernet/cavium/thunder/
@@ -3930,8 +3929,8 @@ W:
https://wireless.wiki.kernel.org/en/users/Drivers/carl9170
 F: drivers/net/wireless/ath/carl9170/
 
 CAVIUM I2C DRIVER
-M: Robert Richter 
-S: Supported
+M: Robert Richter 
+S: Odd Fixes
 W: http://www.marvell.com
 F: drivers/i2c/busses/i2c-octeon*
 F: drivers/i2c/busses/i2c-thunderx*
@@ -3946,8 +3945,8 @@ W:http://www.marvell.com
 F: drivers/net/ethernet/cavium/liquidio/
 
 CAVIUM MMC DRIVER
-M: Robert Richter 
-S: Supported
+M: Robert Richter 
+S: Odd Fixes
 W: http://www.marvell.com
 F: drivers/mmc/host/cavium*
 
@@ -3959,9 +3958,9 @@ W:http://www.marvell.com
 F: drivers/crypto/cavium/cpt/
 
 CAVIUM THUNDERX2 ARM64 SOC
-M: Robert Richter 
+M: Robert Richter 
 L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
-S: Maintained
+S: Odd Fixes
 F: Documentation/devicetree/bindings/arm/cavium-thunder2.txt
 F: arch/arm64/boot/dts/cavium/thunder2-99xx*
 
@@ -6156,16 +6155,15 @@ F:  drivers/edac/highbank*
 
 EDAC-CAVIUM OCTEON
 M: Ralf Baechle 
-M: Robert Richter 
 L: linux-e...@vger.kernel.org
 L: linux-m...@vger.kernel.org
 S: Supported
 F: drivers/edac/octeon_edac*
 
 EDAC-CAVIUM THUNDERX
-M: Robert Richter 
+M: Robert Richter 
 L: linux-e...@vger.kernel.org
-S: Supported
+S: Odd Fixes
 F: drivers/edac/thunderx_edac*
 
 EDAC-CORE
@@ -6173,7 +6171,7 @@ M:Borislav Petkov 
 M: Mauro Carvalho Chehab 
 M: Tony Luck 
 R: James Morse 
-R: Robert Richter 
+R: Robert Richter 
 L: linux-e...@vger.kernel.org
 S: Supported
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git 
edac-for-next
@@ -13394,10 +13392,10 @@ F:
Documentation/devicetree/bindings/pci/axis,artpec*
 F: drivers/pci/controller/dwc/*artpec*
 
 PCIE DRIVER FOR CAVIUM THUNDERX
-M: Robert Richter 
+M: Robert Richter 
 L: linux-...@vger.kernel.org
 L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
-S: Supported
+S: Odd Fixes
 F: drivers/pci/controller/pci-thunder-*
 
 PCIE DRIVER FOR HISILICON
@@ -17125,8 +17123,8 @@ S:  Maintained
 F: drivers/net/thunderbolt.c
 
 THUNDERX GPIO DRIVER
-M: Robert Richter 
-S: Maintained
+M: Robert Richter 
+S: Odd Fixes
 F: drivers/gpio/gpio-thunderx.c
 
 TI AM437X VPFE DRIVER
-- 
2.20.1

Re: [PATCH] EDAC, {skx, i10nm}: Advice mcelog that the error were handled

2020-06-10 Thread Robert Richter

On 10.06.20 14:58:01, Zhenzhong Duan wrote:
> If one MCE error has been processed in kernel, it's not necessory
> to pass it to user level mcelog.
> 
> Signed-off-by: Zhenzhong Duan 

Reviewed-by: Robert Richter 

> ---
>  drivers/edac/skx_common.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/edac/skx_common.c b/drivers/edac/skx_common.c
> index 46be1a7..8c0165b 100644
> --- a/drivers/edac/skx_common.c
> +++ b/drivers/edac/skx_common.c
> @@ -615,7 +615,8 @@ int skx_mce_check_error(struct notifier_block *nb, 
> unsigned long val,
>  
>   skx_mce_output_error(mci, mce, );
>  
> - return NOTIFY_DONE;
> + /* Advice mcelog that the error were handled */

... error was ...

And make a sentence out of it, so close with a dot.

> + return NOTIFY_STOP;

This change aligns with other implementation in:

 i7core_mce_check_error(),
 amd_decode_mce() and
 sbridge_mce_check_error().

-Robert

>  }
>  
>  void skx_remove(void)
> -- 
> 1.8.3.1
>

Re: [PATCH v4] EDAC/ghes: Setup DIMM label from DMI and use it in error reports

2020-06-03 Thread Robert Richter

On 02.06.20 17:48:43, Borislav Petkov wrote:
> On Thu, May 28, 2020 at 12:13:06PM +0200, Robert Richter wrote:

> > v4:
> > 
> >  * dimm->label: Only update dimm->label in if bank/device is found in
> >the SMBIOS table, this keeps current behavior for machines that do
> >not provide this information.
> > 
> >  * e->location: Keep current behavior how e->location is written.
> > 
> >  * e->label: Use dimm->label if a DIMM was found by its handle and
> >"unknown memory" otherwise. This aligns with the edac_mc
> >implementation.
> > 
> > Signed-off-by: Robert Richter 
> > ---
> >  drivers/edac/ghes_edac.c | 37 ++---
> >  1 file changed, 26 insertions(+), 11 deletions(-)
> 
> Yap, looks good. I'll queue it after the merge window.

Great, thanks.

-Robert

[PATCH v4] EDAC/ghes: Setup DIMM label from DMI and use it in error reports

2020-05-28 Thread Robert Richter

The ghes driver reports errors with 'unknown label' even if the actual
DIMM label is known, e.g.:

 EDAC MC0: 1 CE Single-bit ECC on unknown label (node:0 card:0
   module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0
   page:0x966a9b3 offset:0x0 grain:1 syndrome:0x0 - APEI location:
   node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM
   location:N0 DIMM_A0 status(0x0400): Storage error in
   DRAM memory)

Fix this by using struct dimm_info's label string in error reports:

 EDAC MC0: 1 CE Single-bit ECC on N0 DIMM_A0 (node:0 card:0 module:0
   rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0
   page:0x99223d8 offset:0x0 grain:1 syndrome:0x0 - APEI location:
   node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM
   location:N0 DIMM_A0 status(0x0400): Storage error in
   DRAM memory)

The labels are initialized by reading the bank and device strings from
DMI. Now, the label information can also read from sysfs. E.g. a
ThunderX2 system will show the following:

 /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0
 /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0
 /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0
 /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0
 /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0
 /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0
 /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0
 /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0
 /sys/devices/system/edac/mc/mc0/dimm8/dimm_label:N1 DIMM_I0
 /sys/devices/system/edac/mc/mc0/dimm9/dimm_label:N1 DIMM_J0
 /sys/devices/system/edac/mc/mc0/dimm10/dimm_label:N1 DIMM_K0
 /sys/devices/system/edac/mc/mc0/dimm11/dimm_label:N1 DIMM_L0
 /sys/devices/system/edac/mc/mc0/dimm12/dimm_label:N1 DIMM_M0
 /sys/devices/system/edac/mc/mc0/dimm13/dimm_label:N1 DIMM_N0
 /sys/devices/system/edac/mc/mc0/dimm14/dimm_label:N1 DIMM_O0
 /sys/devices/system/edac/mc/mc0/dimm15/dimm_label:N1 DIMM_P0

Since dimm_labels can be rewritten, that label will be used in a later
error report:

 # echo foobar >/sys/devices/system/edac/mc/mc0/dimm0/dimm_label
 # # some error injection here
 # dmesg | grep foobar
 [ 751.383533] EDAC MC0: 1 CE Single-bit ECC on foobar (node:0 card:0
 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0
 page:0x8c8dc74 offset:0x0 grain:1 syndrome:0x0 - APEI location:
 node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM
 location:N0 DIMM_A0 status(0x0400): Storage error in DRAM
 memory)

Signed-off-by: Robert Richter 
---
v4:

 * dimm->label: Only update dimm->label in if bank/device is found in
   the SMBIOS table, this keeps current behavior for machines that do
   not provide this information.

 * e->location: Keep current behavior how e->location is written.

 * e->label: Use dimm->label if a DIMM was found by its handle and
   "unknown memory" otherwise. This aligns with the edac_mc
   implementation.

Signed-off-by: Robert Richter 
---
 drivers/edac/ghes_edac.c | 37 ++---
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index cb3dab56a875..9a6a055ab624 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -87,16 +87,29 @@ static void ghes_edac_count_dimms(const struct dmi_header 
*dh, void *arg)
(*num_dimm)++;
 }
 
-static int get_dimm_smbios_index(struct mem_ctl_info *mci, u16 handle)
+static struct dimm_info *find_dimm_by_handle(struct mem_ctl_info *mci, u16 
handle)
 {
struct dimm_info *dimm;
 
mci_for_each_dimm(mci, dimm) {
if (dimm->smbios_handle == handle)
-   return dimm->idx;
+   return dimm;
}
 
-   return -1;
+   return NULL;
+}
+
+static void dimm_setup_label(struct dimm_info *dimm, u16 handle)
+{
+   const char *bank = NULL, *device = NULL;
+
+   dmi_memdev_name(handle, , );
+
+   /* both strings must be non-zero */
+   if (bank && *bank && device && *device) {
+   snprintf(dimm->label, sizeof(dimm->label),
+   "%s %s", bank, device);
+   }
 }
 
 static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
@@ -179,9 +192,7 @@ static void ghes_edac_dmidecode(const struct dmi_header 
*dh, void *arg)
dimm->dtype = DEV_UNKNOWN;
dimm->grain = 128;  /* Likely, worse case */
 
-   /*
-* FIXME: It shouldn't be hard to also fill the DIMM labels
-*/
+   dimm_setup_label(dimm, entry->handle);
 
if (dimm->nr_pages) {
edac_dbg(1, "DIMM%i: %s size = %d MB%s\n",
@@ -228,7 +239,6 @@ void ghes_edac_report_mem_error(int sev, struct 
cper_sec_me

Re: [PATCH v3] EDAC/ghes: Setup DIMM label from DMI and use it in error reports

2020-05-20 Thread Robert Richter

Thanks for testing.

On 19.05.20 22:25:35, Borislav Petkov wrote:

> -/sys/devices/system/edac/mc/mc0/csrow15/ch0_dimm_label:1:mc#0memory#15
> +/sys/devices/system/edac/mc/mc0/csrow15/ch0_dimm_label:1:unknown memory 
> (handle: 0x0030)

Looks like on that system device locator or bank locator (or both) are
empty. What shows this (esp. the Locator fields):?

 dmidecode -t 17

So maybe the check is too strict and we should allow one of both being
empty?

If the strings are missing, shouldn't we still provide the handle in
the label information?

The string 'mc#0memory#15' is also of no use as it just takes the
mc_num and dimm_num as a reference, which can be determined from sysfs
without that label.

Your add on patch to just ignore the write does not revert to we old
behavior as you will see now 'mc#0memory#15' in the error reports
where you have seen 'unknown label' before.

So the question is what to do if that information is missing?

Note: it should better show "unknown label ..." instead of "unknown
memory ...".

Thanks,

-Robert

[PATCH v3 4/5] EDAC/ghes: Have a separate code path for creating the fake MC

2020-05-19 Thread Robert Richter

The code in ghes_edac_register() switches back and forth between
standard and fake controller creation. Do one thing only and separate
the code path that creates the fake MC.

Note: For better review the code is not yet carved out in separate
functions.

Signed-off-by: Robert Richter 
---
 drivers/edac/ghes_edac.c | 73 +---
 1 file changed, 46 insertions(+), 27 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 8329af037fbe..47b99e0fea6d 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -536,7 +536,6 @@ int ghes_edac_register(struct ghes *ghes, struct device 
*dev)
struct dimm_fill dimm_fill;
int rc = 0, num_dimm = 0;
struct mem_ctl_info *mci;
-   bool fake = false;
int idx;
 
if (IS_ENABLED(CONFIG_X86)) {
@@ -560,24 +559,44 @@ int ghes_edac_register(struct ghes *ghes, struct device 
*dev)
/* Get the number of DIMMs */
dmi_walk(ghes_edac_count_dimms, _dimm);
 
-   /* Check if we've got a bogus BIOS */
-   if (num_dimm == 0) {
-   fake = true;
-   num_dimm = 1;
-   }
+   if (!num_dimm) {
+   /*
+* Bogus BIOS: Ignore DMI topology and use a single MC
+* with only one DIMM for the whole address range to
+* catch all errros.
+*/
+   struct dimm_info *dimm;
 
-   mci = ghes_mc_create(dev, 0, num_dimm);
-   if (!mci) {
-   pr_info("Can't allocate memory for EDAC data\n");
-   rc = -ENOMEM;
-   goto unlock;
-   }
+   mci = ghes_mc_create(dev, 0, 1);
+   if (!mci) {
+   rc = -ENOMEM;
+   goto unlock;
+   }
+
+   dimm = edac_get_dimm_by_index(mci, 0);
+
+   dimm->nr_pages = 1;
+   dimm->grain = 128;
+   dimm->mtype = MEM_UNKNOWN;
+   dimm->dtype = DEV_UNKNOWN;
+   dimm->edac_mode = EDAC_SECDED;
+
+   snprintf(dimm->label, sizeof(dimm->label), "unknown memory");
+
+   rc = ghes_mc_add(mci);
+   if (rc) {
+   ghes_mc_destroy(mci);
+   goto unlock;
+   }
 
-   if (fake) {
pr_info("This system has a very crappy BIOS: It doesn't even 
list the DIMMS.\n");
pr_info("Its SMBIOS info is wrong. It is doubtful that the 
error report would\n");
pr_info("work on such system. Use this driver with caution\n");
-   } else if (idx < 0) {
+
+   goto out;
+   }
+
+   if (idx < 0) {
pr_info("This EDAC driver relies on BIOS to enumerate memory 
and get error reports.\n");
pr_info("Unfortunately, not all BIOSes reflect the memory 
layout correctly.\n");
pr_info("So, the end result of using this driver varies from 
vendor to vendor.\n");
@@ -586,31 +605,31 @@ int ghes_edac_register(struct ghes *ghes, struct device 
*dev)
pr_info("This system has %d DIMM sockets.\n", num_dimm);
}
 
-   if (!fake) {
-   dimm_fill.index = 0;
-   dimm_fill.mci = mci;
-   dmi_walk(ghes_edac_dmidecode, _fill);
-   } else {
-   struct dimm_info *dimm = edac_get_dimm_by_index(mci, 0);
-
-   dimm->nr_pages = 1;
-   dimm->grain = 128;
-   dimm->mtype = MEM_UNKNOWN;
-   dimm->dtype = DEV_UNKNOWN;
-   dimm->edac_mode = EDAC_SECDED;
+   mci = ghes_mc_create(dev, 0, num_dimm);
+   if (!mci) {
+   rc = -ENOMEM;
+   goto unlock;
}
 
+   dimm_fill.index = 0;
+   dimm_fill.mci = mci;
+
+   dmi_walk(ghes_edac_dmidecode, _fill);
+
rc = ghes_mc_add(mci);
if (rc < 0) {
-   pr_info("Can't register at EDAC core\n");
ghes_mc_destroy(mci);
goto unlock;
}
 
+out:
/* only set on success */
refcount_set(_refcount, 1);
 
 unlock:
+   if (rc < 0)
+   pr_info("Can't register at EDAC core\n");
+
mutex_unlock(_reg_mutex);
 
return rc;
-- 
2.20.1

[PATCH v3 1/5] EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to ghes_pvt

2020-05-19 Thread Robert Richter

The struct members list and ghes of struct ghes_edac_pvt are unused,
remove them. On that occasion, rename it to the shorter name struct
ghes_pvt.

Signed-off-by: Robert Richter 
---
 drivers/edac/ghes_edac.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index c7d404629863..2ed48a5d48d6 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -15,9 +15,7 @@
 #include "edac_module.h"
 #include 
 
-struct ghes_edac_pvt {
-   struct list_head list;
-   struct ghes *ghes;
+struct ghes_pvt {
struct mem_ctl_info *mci;
 
/* Buffers for the error handling routine */
@@ -32,7 +30,7 @@ static refcount_t ghes_refcount = REFCOUNT_INIT(0);
  * also provides the necessary (implicit) memory barrier for the SMP
  * case to make the pointer visible on another CPU.
  */
-static struct ghes_edac_pvt *ghes_pvt;
+static struct ghes_pvt *ghes_pvt;
 
 /* GHES registration mutex */
 static DEFINE_MUTEX(ghes_reg_mutex);
@@ -216,7 +214,7 @@ void ghes_edac_report_mem_error(int sev, struct 
cper_sec_mem_err *mem_err)
 {
struct edac_raw_error_desc *e;
struct mem_ctl_info *mci;
-   struct ghes_edac_pvt *pvt;
+   struct ghes_pvt *pvt;
unsigned long flags;
char *p;
 
@@ -468,7 +466,7 @@ int ghes_edac_register(struct ghes *ghes, struct device 
*dev)
bool fake = false;
int rc = 0, num_dimm = 0;
struct mem_ctl_info *mci;
-   struct ghes_edac_pvt *pvt;
+   struct ghes_pvt *pvt;
struct edac_mc_layer layers[1];
struct ghes_edac_dimm_fill dimm_fill;
unsigned long flags;
@@ -505,7 +503,7 @@ int ghes_edac_register(struct ghes *ghes, struct device 
*dev)
layers[0].size = num_dimm;
layers[0].is_virt_csrow = true;
 
-   mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct 
ghes_edac_pvt));
+   mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct 
ghes_pvt));
if (!mci) {
pr_info("Can't allocate memory for EDAC data\n");
rc = -ENOMEM;
@@ -513,7 +511,6 @@ int ghes_edac_register(struct ghes *ghes, struct device 
*dev)
}
 
pvt = mci->pvt_info;
-   pvt->ghes   = ghes;
pvt->mci= mci;
 
mci->pdev = dev;
-- 
2.20.1

[PATCH v3 0/5] EDAC/ghes: Cleanup and reworks

2020-05-19 Thread Robert Richter

This series contains a general cleanup and rework of the edac ghes
driver:

 * Some small code improvements (patches #1, #2).

 * Code in functions ghes_edac_{register,unregister}() is move to new
   functions ghes_mc_{create,destroy}() and ghes_mc_{add,del}() (patch
   #3).

 * Separated 'fake' controller code path (patches #4, #5).

Tested on a Marvell/Cavium ThunderX2 Sabre (dual socket) system.

v3:
 * Rebased onto dc63e28efa19 ("Merge branch 'edac-i10nm' into
   edac-for-next") plus patch "EDAC/ghes: Setup DIMM label from DMI
   and use it in error reports" applied from
   https://lore.kernel.org/patchwork/patch/1243203/
 * Removed v2 patches 01/10 and 02/10 for edac_mc driver from this
   series, both are unrelated.
 * Dropped v2 patch 04/10 "EDAC/ghes: Make SMBIOS handle private data
   to ghes", there is no consent with the maintainer to the code
   introduced to get a private ghes_dimm data structure, nor there was
   any feasible alternative suggested.
 * Taken v2 patch 05/10 "EDAC/ghes: Setup DIMM label from DMI and use
   it in error reports" out of this series and submitted separately
   (see above patchwork link).
 * Dropped v2 patch 06/10, keep rdr_mask variable.
 * Fixed subject of v2 patch 07/10 to 'EDAC/ghes: Cleanup struct
   dimm_fill'.
 * Reworked function interface, there is now
   ghes_mc_{create,destroy}() and ghes_mc_{add,del}().
 * Aligned arguments on the opening brace (ghes_mc_*()).
 * Remove ghes_ prefix from ghes_dimm_* definitions.
 * Use sizeof(struct ghes_pvt) in edac_mc_alloc().
 * Rename struct ghes_mci to struct ghes_pvt.

v2:
 * https://lore.kernel.org/patchwork/cover/1229380/
 * reordered patches to have fixes and struct changes first, code
   refactoring patches last,
 * dropped v1 patches #9 to #11 (multiple conrollers) to handle them
   in a separate series,
 * rewrote patch to remove smbios_handle (based on v1 #9): EDAC/ghes:
   Move smbios_handle from struct dimm_info to ghes private data,
 * added lockdep_assert_held() checkers,
 * renamed struct ghes_dimm_fill to struct dimm_fill,
 * renamed local variable dimms to dimm_list to avoid conflict with
   the global variable,
 * removed dimm list for "fake" controller,
 * fixed return code check to use (rc < 0),
 * added: EDAC/mc: Fix usage of snprintf() and dimm location setup

v1:
 * https://lore.kernel.org/patchwork/cover/1205901/


Robert Richter (5):
  EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to
ghes_pvt
  EDAC/ghes: Cleanup struct dimm_fill
  EDAC/ghes: Carve out MC device handling into separate functions
  EDAC/ghes: Have a separate code path for creating the fake MC
  EDAC/ghes: Carve out code into ghes_edac_register_{one,fake}()

 drivers/edac/ghes_edac.c | 254 ---
 1 file changed, 159 insertions(+), 95 deletions(-)

-- 
2.20.1

[PATCH v3 2/5] EDAC/ghes: Cleanup struct dimm_fill

2020-05-19 Thread Robert Richter

The struct is used to store temporary data for the dmidecode callback.
Clean this up a bit:

 1) Rename member count to index since this is what it is used for.

 2) Move code close to ghes_edac_dmidecode() where it is used.

 3) While at it, use edac_get_dimm_by_index().

Signed-off-by: Robert Richter 
---
 drivers/edac/ghes_edac.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 2ed48a5d48d6..b72fe10b84d4 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -72,11 +72,6 @@ struct memdev_dmi_entry {
u16 conf_mem_clk_speed;
 } __attribute__((__packed__));
 
-struct ghes_edac_dimm_fill {
-   struct mem_ctl_info *mci;
-   unsigned int count;
-};
-
 static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
 {
int *num_dimm = arg;
@@ -112,19 +107,24 @@ static void dimm_setup_label(struct dimm_info *dimm, u16 
handle)
"unknown memory (handle: 0x%.4x)", handle);
 }
 
+struct dimm_fill {
+   struct mem_ctl_info *mci;
+   int index;
+};
+
 static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
 {
-   struct ghes_edac_dimm_fill *dimm_fill = arg;
+   struct dimm_fill *dimm_fill = arg;
struct mem_ctl_info *mci = dimm_fill->mci;
 
if (dh->type == DMI_ENTRY_MEM_DEVICE) {
struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
-   struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->count, 
0, 0);
+   struct dimm_info *dimm = edac_get_dimm_by_index(mci, 
dimm_fill->index);
u16 rdr_mask = BIT(7) | BIT(13);
 
if (entry->size == 0x) {
pr_info("Can't get DIMM%i size\n",
-   dimm_fill->count);
+   dimm_fill->index);
dimm->nr_pages = MiB_TO_PAGES(32);/* Unknown */
} else if (entry->size == 0x7fff) {
dimm->nr_pages = MiB_TO_PAGES(entry->extended_size);
@@ -196,7 +196,7 @@ static void ghes_edac_dmidecode(const struct dmi_header 
*dh, void *arg)
 
if (dimm->nr_pages) {
edac_dbg(1, "DIMM%i: %s size = %d MB%s\n",
-   dimm_fill->count, edac_mem_types[dimm->mtype],
+   dimm_fill->index, edac_mem_types[dimm->mtype],
PAGES_TO_MiB(dimm->nr_pages),
(dimm->edac_mode != EDAC_NONE) ? "(ECC)" : "");
edac_dbg(2, "\ttype %d, detail 0x%02x, width %d(total 
%d)\n",
@@ -206,7 +206,7 @@ static void ghes_edac_dmidecode(const struct dmi_header 
*dh, void *arg)
 
dimm->smbios_handle = entry->handle;
 
-   dimm_fill->count++;
+   dimm_fill->index++;
}
 }
 
@@ -468,7 +468,7 @@ int ghes_edac_register(struct ghes *ghes, struct device 
*dev)
struct mem_ctl_info *mci;
struct ghes_pvt *pvt;
struct edac_mc_layer layers[1];
-   struct ghes_edac_dimm_fill dimm_fill;
+   struct dimm_fill dimm_fill;
unsigned long flags;
int idx = -1;
 
@@ -535,11 +535,11 @@ int ghes_edac_register(struct ghes *ghes, struct device 
*dev)
}
 
if (!fake) {
-   dimm_fill.count = 0;
+   dimm_fill.index = 0;
dimm_fill.mci = mci;
dmi_walk(ghes_edac_dmidecode, _fill);
} else {
-   struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
+   struct dimm_info *dimm = edac_get_dimm_by_index(mci, 0);
 
dimm->nr_pages = 1;
dimm->grain = 128;
-- 
2.20.1

[PATCH v3 3/5] EDAC/ghes: Carve out MC device handling into separate functions

2020-05-19 Thread Robert Richter

The functions are too long, carve out code that handles MC devices
into the new functions ghes_mc_create(), ghes_mc_add_or_free() and
ghes_mc_free(). Apart from better code readability the functions can
be reused and the implementation of the error paths becomes easier.

Signed-off-by: Robert Richter 
---
 drivers/edac/ghes_edac.c | 128 ---
 1 file changed, 80 insertions(+), 48 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index b72fe10b84d4..8329af037fbe 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -461,16 +461,83 @@ static struct acpi_platform_list plat_list[] = {
{ } /* End */
 };
 
-int ghes_edac_register(struct ghes *ghes, struct device *dev)
+static struct mem_ctl_info *ghes_mc_create(struct device *dev, int mc_idx,
+  int num_dimm)
 {
-   bool fake = false;
-   int rc = 0, num_dimm = 0;
+   struct edac_mc_layer layers[1];
struct mem_ctl_info *mci;
struct ghes_pvt *pvt;
-   struct edac_mc_layer layers[1];
-   struct dimm_fill dimm_fill;
+
+   layers[0].type = EDAC_MC_LAYER_ALL_MEM;
+   layers[0].size = num_dimm;
+   layers[0].is_virt_csrow = true;
+
+   mci = edac_mc_alloc(mc_idx, ARRAY_SIZE(layers), layers, sizeof(struct 
ghes_pvt));
+   if (!mci)
+   return NULL;
+
+   pvt = mci->pvt_info;
+   pvt->mci= mci;
+
+   mci->pdev = dev;
+   mci->mtype_cap = MEM_FLAG_EMPTY;
+   mci->edac_ctl_cap = EDAC_FLAG_NONE;
+   mci->edac_cap = EDAC_FLAG_NONE;
+   mci->mod_name = "ghes_edac.c";
+   mci->ctl_name = "ghes_edac";
+   mci->dev_name = "ghes";
+
+   return mci;
+}
+
+static void ghes_mc_destroy(struct mem_ctl_info *mci)
+{
+   if (mci)
+   edac_mc_free(mci);
+}
+
+static int ghes_mc_add(struct mem_ctl_info *mci)
+{
unsigned long flags;
-   int idx = -1;
+   int rc;
+
+   rc = edac_mc_add_mc(mci);
+   if (rc < 0)
+   return rc;
+
+   spin_lock_irqsave(_lock, flags);
+   ghes_pvt = mci->pvt_info;
+   spin_unlock_irqrestore(_lock, flags);
+
+   return 0;
+}
+
+static struct mem_ctl_info *ghes_mc_del(void)
+{
+   struct mem_ctl_info *mci;
+   unsigned long flags;
+
+   /*
+* Wait for the irq handler being finished.
+*/
+   spin_lock_irqsave(_lock, flags);
+   mci = ghes_pvt ? ghes_pvt->mci : NULL;
+   ghes_pvt = NULL;
+   spin_unlock_irqrestore(_lock, flags);
+
+   if (mci)
+   mci = edac_mc_del_mc(mci->pdev);
+
+   return mci;
+}
+
+int ghes_edac_register(struct ghes *ghes, struct device *dev)
+{
+   struct dimm_fill dimm_fill;
+   int rc = 0, num_dimm = 0;
+   struct mem_ctl_info *mci;
+   bool fake = false;
+   int idx;
 
if (IS_ENABLED(CONFIG_X86)) {
/* Check if safe to enable on this system */
@@ -499,28 +566,13 @@ int ghes_edac_register(struct ghes *ghes, struct device 
*dev)
num_dimm = 1;
}
 
-   layers[0].type = EDAC_MC_LAYER_ALL_MEM;
-   layers[0].size = num_dimm;
-   layers[0].is_virt_csrow = true;
-
-   mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct 
ghes_pvt));
+   mci = ghes_mc_create(dev, 0, num_dimm);
if (!mci) {
pr_info("Can't allocate memory for EDAC data\n");
rc = -ENOMEM;
goto unlock;
}
 
-   pvt = mci->pvt_info;
-   pvt->mci= mci;
-
-   mci->pdev = dev;
-   mci->mtype_cap = MEM_FLAG_EMPTY;
-   mci->edac_ctl_cap = EDAC_FLAG_NONE;
-   mci->edac_cap = EDAC_FLAG_NONE;
-   mci->mod_name = "ghes_edac.c";
-   mci->ctl_name = "ghes_edac";
-   mci->dev_name = "ghes";
-
if (fake) {
pr_info("This system has a very crappy BIOS: It doesn't even 
list the DIMMS.\n");
pr_info("Its SMBIOS info is wrong. It is doubtful that the 
error report would\n");
@@ -548,18 +600,13 @@ int ghes_edac_register(struct ghes *ghes, struct device 
*dev)
dimm->edac_mode = EDAC_SECDED;
}
 
-   rc = edac_mc_add_mc(mci);
+   rc = ghes_mc_add(mci);
if (rc < 0) {
pr_info("Can't register at EDAC core\n");
-   edac_mc_free(mci);
-   rc = -ENODEV;
+   ghes_mc_destroy(mci);
goto unlock;
}
 
-   spin_lock_irqsave(_lock, flags);
-   ghes_pvt = pvt;
-   spin_unlock_irqrestore(_lock, flags);
-
/* only set on success */
refcount_set(_refcount, 1);
 
@@ -572,28 +619,13 @@ int ghes_edac_register(struct ghes *ghes, struct device 
*dev)
 void ghes_edac_unre

[PATCH v3 5/5] EDAC/ghes: Carve out code into ghes_edac_register_{one,fake}()

2020-05-19 Thread Robert Richter

Factor out code to register a memory controller including DIMMs. Do
this for standard and fake memory controller in the two functions
ghes_edac_register_one() and ghes_edac_register_fake().

Function ghes_edac_register_one() could be reused to register multiple
*mci structs.

Signed-off-by: Robert Richter 
---
 drivers/edac/ghes_edac.c | 94 +++-
 1 file changed, 55 insertions(+), 39 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 47b99e0fea6d..b68cd22e68bf 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -531,11 +531,60 @@ static struct mem_ctl_info *ghes_mc_del(void)
return mci;
 }
 
-int ghes_edac_register(struct ghes *ghes, struct device *dev)
+static int ghes_edac_register_fake(struct device *dev)
+{
+   struct mem_ctl_info *mci;
+   struct dimm_info *dimm;
+   int rc;
+
+   mci = ghes_mc_create(dev, 0, 1);
+   if (!mci)
+   return -ENOMEM;
+
+   dimm = edac_get_dimm_by_index(mci, 0);
+
+   dimm->nr_pages = 1;
+   dimm->grain = 128;
+   dimm->mtype = MEM_UNKNOWN;
+   dimm->dtype = DEV_UNKNOWN;
+   dimm->edac_mode = EDAC_SECDED;
+
+   snprintf(dimm->label, sizeof(dimm->label), "unknown memory");
+
+   rc = ghes_mc_add(mci);
+
+   if (rc < 0)
+   ghes_mc_destroy(mci);
+
+   return rc;
+}
+
+static int ghes_edac_register_one(struct device *dev, int mc_idx, int num_dimm)
 {
struct dimm_fill dimm_fill;
-   int rc = 0, num_dimm = 0;
struct mem_ctl_info *mci;
+   int rc;
+
+   mci = ghes_mc_create(dev, mc_idx, num_dimm);
+   if (!mci)
+   return -ENOMEM;
+
+   dimm_fill.index = 0;
+   dimm_fill.mci = mci;
+
+   dmi_walk(ghes_edac_dmidecode, _fill);
+
+   rc = ghes_mc_add(mci);
+
+   if (rc < 0)
+   ghes_mc_destroy(mci);
+
+   return rc;
+}
+
+int ghes_edac_register(struct ghes *ghes, struct device *dev)
+{
+   int rc = 0, num_dimm = 0;
int idx;
 
if (IS_ENABLED(CONFIG_X86)) {
@@ -565,29 +614,9 @@ int ghes_edac_register(struct ghes *ghes, struct device 
*dev)
 * with only one DIMM for the whole address range to
 * catch all errros.
 */
-   struct dimm_info *dimm;
-
-   mci = ghes_mc_create(dev, 0, 1);
-   if (!mci) {
-   rc = -ENOMEM;
+   rc = ghes_edac_register_fake(dev);
+   if (rc < 0)
goto unlock;
-   }
-
-   dimm = edac_get_dimm_by_index(mci, 0);
-
-   dimm->nr_pages = 1;
-   dimm->grain = 128;
-   dimm->mtype = MEM_UNKNOWN;
-   dimm->dtype = DEV_UNKNOWN;
-   dimm->edac_mode = EDAC_SECDED;
-
-   snprintf(dimm->label, sizeof(dimm->label), "unknown memory");
-
-   rc = ghes_mc_add(mci);
-   if (rc) {
-   ghes_mc_destroy(mci);
-   goto unlock;
-   }
 
pr_info("This system has a very crappy BIOS: It doesn't even 
list the DIMMS.\n");
pr_info("Its SMBIOS info is wrong. It is doubtful that the 
error report would\n");
@@ -605,22 +634,9 @@ int ghes_edac_register(struct ghes *ghes, struct device 
*dev)
pr_info("This system has %d DIMM sockets.\n", num_dimm);
}
 
-   mci = ghes_mc_create(dev, 0, num_dimm);
-   if (!mci) {
-   rc = -ENOMEM;
-   goto unlock;
-   }
-
-   dimm_fill.index = 0;
-   dimm_fill.mci = mci;
-
-   dmi_walk(ghes_edac_dmidecode, _fill);
-
-   rc = ghes_mc_add(mci);
-   if (rc < 0) {
-   ghes_mc_destroy(mci);
+   rc = ghes_edac_register_one(dev, 0, num_dimm);
+   if (rc < 0)
goto unlock;
-   }
 
 out:
/* only set on success */
-- 
2.20.1

Re: [PATCH v2 08/10] EDAC/ghes: Carve out MC device handling into separate functions

2020-05-19 Thread Robert Richter

On 11.05.20 15:32:03, Borislav Petkov wrote:
> see below. It probably doesn't work but this is what it should do -
> straightforward and simple.
> 
> And now that I've looked at this in more detail, this whole DIMM
> counting is still not doing what it should do.
> 
> So lemme try again:

As you have major concerns with my code that deals with ghes private
dimm data, let's just keep smbios_handle in struct dimm_info.  ATM I
do not see any alternative solution suggested how this could be
implemented. So I am going to drop patch '[PATCH v2 04/10] EDAC/ghes:
Make SMBIOS handle private data to ghes' from this series.

Thanks,

-Robert

Re: [PATCH v2 07/10] EDAC/ghes: Cleanup struct ghes_edac_dimm_fill, rename it to ghes_dimm_fill

2020-05-19 Thread Robert Richter

On 27.04.20 16:00:43, Borislav Petkov wrote:
> On Wed, Apr 22, 2020 at 01:58:11PM +0200, Robert Richter wrote:
> > The struct is used to store temporary data for the dmidecode callback.
> > Clean this up a bit:
> > 
> >  1) Rename member count to index since this is what it is used for.
> > 
> >  2) Move code close to ghes_edac_dmidecode() where it is used.
> > 
> >  3) While at it, use edac_get_dimm_by_index().
> > 
> > Signed-off-by: Robert Richter 
> > ---
> >  drivers/edac/ghes_edac.c | 24 
> >  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> Ok except the commit title is wrong. And yes, pls keep it "dimm_fill" -
> short and sweet and without yet another "ghes" in the name. :)

Fixed the title.

Thanks,

-Robert

Re: [PATCH v2 06/10] EDAC/ghes: Remove local variable rdr_mask in ghes_edac_dmidecode()

2020-05-19 Thread Robert Richter

On 27.04.20 19:34:02, Borislav Petkov wrote:
> On Mon, Apr 27, 2020 at 10:24:08AM -0700, Luck, Tony wrote:
> > That isn't the same. The previous version checked that BOTH bits
> > 7 and 13 were set. Your version checks for either bit.
> 
> Whoops, I'm confused again. ;-\
> 
> > Looks like the original with the local variable was checking for both
> > bits set.
> 
> Yeah, let's leave it as it is. I prefer the rdr_mask thing.

I am dropping this patch from the series.

Thanks,

-Robert

Re: [PATCH v2 02/10] EDAC/mc: Use int type for parameters of edac_mc_alloc()

2020-05-19 Thread Robert Richter

On 23.04.20 19:49:34, Borislav Petkov wrote:
> On Wed, Apr 22, 2020 at 01:58:06PM +0200, Robert Richter wrote:
> > Most iterators use int type as index. mci->mc_idx is also type int. So
> > use int type for parameters of edac_mc_alloc(). Extend the range check
> > to check for negative values. There is a type cast now when assigning
> > variable n_layers to mci->n_layer, it is safe due to the range check.
> > 
> > While at it, rename the users of edac_mc_alloc() to mc_idx as this
> > fits better here.
> > 
> > Signed-off-by: Robert Richter 
> > ---
> >  drivers/edac/edac_mc.c | 7 +++
> >  drivers/edac/edac_mc.h | 6 +++---
> >  2 files changed, 6 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> > index 107d7c4de933..57d1d356d69c 100644
> > --- a/drivers/edac/edac_mc.c
> > +++ b/drivers/edac/edac_mc.c
> > @@ -444,8 +444,7 @@ static int edac_mc_alloc_dimms(struct mem_ctl_info *mci)
> > return 0;
> >  }
> >  
> > -struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
> > -  unsigned int n_layers,
> > +struct mem_ctl_info *edac_mc_alloc(int mc_idx, int n_layers,
> >struct edac_mc_layer *layers,
> >unsigned int sz_pvt)
> >  {
> > @@ -456,7 +455,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
> > void *pvt, *ptr = NULL;
> > bool per_rank = false;
> >  
> > -   if (WARN_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0))
> > +   if (WARN_ON(mc_idx < 0 || n_layers < 1 || n_layers > EDAC_MAX_LAYERS))
> > return NULL;
> 
> Yeah, no, this doesn't make sense to me. The memory controller number
> and the number of layers can never ever be negative and thus signed.
> 
> And some drivers supply unsigned types and some signed. So if anything,
> this should be fixing all the callers to supply unsigned quantities.

mci->mc_idx is of type int and there is a cast here that should be
fixed. IMO that should be a signed int as some interfaces (esp. if you
search for an index) that require a negative value to report errors or
something could not be found.

So let's take this patch out of this series if you want have it
different.

Thanks,

-Robert

Re: [PATCH v2 01/10] EDAC/mc: Fix usage of snprintf() and dimm location setup

2020-05-19 Thread Robert Richter

On 22.04.20 22:52:43, Borislav Petkov wrote:
> On Wed, Apr 22, 2020 at 01:58:05PM +0200, Robert Richter wrote:
> > The setup of the dimm->location may be incomplete in case writing to
> > dimm->label fails due to small buffer size. Fix this by iterating
> > through all existing layers.
> > 
> > Also, the return value of snprintf() can be higher than the number of
> > bytes written to the buffer in case it is to small. Fix usage of
> > snprintf() by either porting it to scnprintf() or fixing the handling
> > of the return code.
> > 
> > It is very unlikely the buffer is too small in practice, but fixing it
> > anyway.
> > 
> > Signed-off-by: Robert Richter 
> > ---
> >  drivers/edac/edac_mc.c | 20 ++--
> >  1 file changed, 10 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> > index 75ede27bdf6a..107d7c4de933 100644
> > --- a/drivers/edac/edac_mc.c
> > +++ b/drivers/edac/edac_mc.c
> > @@ -130,11 +130,11 @@ unsigned int edac_dimm_info_location(struct dimm_info 
> > *dimm, char *buf,
> > n = snprintf(p, len, "%s %d ",
> >   edac_layer_name[mci->layers[i].type],
> >   dimm->location[i]);
> > +   if (len <= n)
> > +   return count + len - 1;
> > p += n;
> > len -= n;
> > count += n;
> > -   if (!len)
> > -   break;
> > }
> >  
> > return count;
> > @@ -397,19 +397,19 @@ static int edac_mc_alloc_dimms(struct mem_ctl_info 
> > *mci)
> >  */
> > len = sizeof(dimm->label);
> > p = dimm->label;
> > -   n = snprintf(p, len, "mc#%u", mci->mc_idx);
> > +   n = scnprintf(p, len, "mc#%u", mci->mc_idx);
> > p += n;
> > len -= n;
> > +
> > for (layer = 0; layer < mci->n_layers; layer++) {
> > -   n = snprintf(p, len, "%s#%u",
> > -edac_layer_name[mci->layers[layer].type],
> > -pos[layer]);
> 
> The edac_layer_name[]'s are single words of a couple of letters and the
> pos is a number. The buffer we pass in is at least 80 chars and in one
> place even a PAGE_SIZE.
> 
> But in general, this is just silly with the buffers on stack and
> printing into them.
> 
> It would be much better to opencode that loop in
> edac_dimm_info_location() and simply dump those layer names at the call
> sites. And then kill that silly edac_dimm_info_location() function. See
> below for example.
> 
> And then since two call sites do edac_dbg(), you can put that in a
> function edac_dbg_dump_dimm_location() or so and call it and not care
> about any buffer lengths and s*printf's and so on.
> 
> Right?

The aim of this patch is just to fix snprintf() users. Anything else
would involve a larger cleanup. It is not only about edac_dbg(), there
are other users of edac_layer_name[] which implement similar things
that need to be looked at.

So I am dropping this patch from the series.

Thanks,

-Robert

[PATCH v3] EDAC/ghes: Setup DIMM label from DMI and use it in error reports

2020-05-18 Thread Robert Richter

The ghes driver reports errors with 'unknown label' even if the actual
DIMM label is known, e.g.:

 EDAC MC0: 1 CE Single-bit ECC on unknown label (node:0 card:0
   module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0
   page:0x966a9b3 offset:0x0 grain:1 syndrome:0x0 - APEI location:
   node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM
   location:N0 DIMM_A0 status(0x0400): Storage error in
   DRAM memory)

Fix this by using struct dimm_info's label string in error reports:

 EDAC MC0: 1 CE Single-bit ECC on N0 DIMM_A0 (node:0 card:0 module:0
   rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0
   page:0x99223d8 offset:0x0 grain:1 syndrome:0x0 - APEI location:
   node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM
   location:N0 DIMM_A0 status(0x0400): Storage error in
   DRAM memory)

The labels are initialized by reading the bank and device strings from
DMI. Now, the label information can also read from sysfs. E.g. a
ThunderX2 system will show the following:

 /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0
 /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0
 /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0
 /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0
 /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0
 /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0
 /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0
 /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0
 /sys/devices/system/edac/mc/mc0/dimm8/dimm_label:N1 DIMM_I0
 /sys/devices/system/edac/mc/mc0/dimm9/dimm_label:N1 DIMM_J0
 /sys/devices/system/edac/mc/mc0/dimm10/dimm_label:N1 DIMM_K0
 /sys/devices/system/edac/mc/mc0/dimm11/dimm_label:N1 DIMM_L0
 /sys/devices/system/edac/mc/mc0/dimm12/dimm_label:N1 DIMM_M0
 /sys/devices/system/edac/mc/mc0/dimm13/dimm_label:N1 DIMM_N0
 /sys/devices/system/edac/mc/mc0/dimm14/dimm_label:N1 DIMM_O0
 /sys/devices/system/edac/mc/mc0/dimm15/dimm_label:N1 DIMM_P0

Since dimm_labels can be rewritten, that label will be used in a later
error report:

 # echo foobar >/sys/devices/system/edac/mc/mc0/dimm0/dimm_label
 # # some error injection here
 # dmesg | grep foobar
 [ 2119.784489] EDAC MC0: 1 CE Single-bit ECC on foobar (node:0 card:0
 module:0 rank:0 bank:769 col:1 bit_pos:16 DIMM location:foobar
 page:0x94d027 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0
 card:0 module:0 rank:0 bank:769 col:1 bit_pos:16 DIMM location:foobar
 status(0x0400): Storage error in DRAM memory)

Signed-off-by: Robert Richter 
---
This patch is a self-contained version of:

 [v2,05/10] EDAC/ghes: Setup DIMM label from DMI and use it in error reports
 https://lore.kernel.org/patchwork/patch/1229388/

 [02/11] EDAC/ghes: Setup DIMM label from DMI and use it in error reports
 https://lore.kernel.org/patchwork/patch/1205891/

It applies on ras:edac-for-next commit id af8a9a36af01 ("EDAC/ghes:
Setup DIMM label from DMI and use it in error reports").

v3 changes:

 * shortend function name to dimm_setup_label(),

 * let args stick out of find_dimm_by_handle() (line length 82 chars).
---
 drivers/edac/ghes_edac.c | 43 +---
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index cb3dab56a875..c7d404629863 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -87,16 +87,31 @@ static void ghes_edac_count_dimms(const struct dmi_header 
*dh, void *arg)
(*num_dimm)++;
 }
 
-static int get_dimm_smbios_index(struct mem_ctl_info *mci, u16 handle)
+static struct dimm_info *find_dimm_by_handle(struct mem_ctl_info *mci, u16 
handle)
 {
struct dimm_info *dimm;
 
mci_for_each_dimm(mci, dimm) {
if (dimm->smbios_handle == handle)
-   return dimm->idx;
+   return dimm;
}
 
-   return -1;
+   return NULL;
+}
+
+static void dimm_setup_label(struct dimm_info *dimm, u16 handle)
+{
+   const char *bank = NULL, *device = NULL;
+
+   dmi_memdev_name(handle, , );
+
+   /* both strings must be non-zero */
+   if (bank && *bank && device && *device)
+   snprintf(dimm->label, sizeof(dimm->label),
+   "%s %s", bank, device);
+   else
+   snprintf(dimm->label, sizeof(dimm->label),
+   "unknown memory (handle: 0x%.4x)", handle);
 }
 
 static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
@@ -179,9 +194,7 @@ static void ghes_edac_dmidecode(const struct dmi_header 
*dh, void *arg)
dimm->dtype = DEV_UNKNOWN;
dimm->grain = 128;  /* Likely, worse case */
 
-   /*
-* FIXME: It shouldn't be hard to also fill the DIMM labels
-*/
+

Re: [PATCH v2 00/10] EDAC/mc/ghes: Fixes, cleanup and reworks

2020-05-06 Thread Robert Richter

Boris,

On 22.04.20 13:58:04, Robert Richter wrote:
> This series contains edac fixes and a significant cleanup and rework
> of the ghes driver:
> 
>  * fixes and updates for edac_mc (patches #1, #2),
> 
>  * removed smbios_handle from struct dimm_info (patch #4),
> 
>  * fix of DIMM label in error reports (patch #5),
> 
>  * general ghes_edac cleanup and rework (patches #3, #6-#10).
> 
> First 2 patches for edac_mc are individual patches, the remaining
> patches do not depend on them.
> 
> Tested on a Marvell/Cavium ThunderX2 Sabre (dual socket) system.
> 
> v1:
>  * https://lore.kernel.org/patchwork/cover/1205901/
> 
> v2:
>  * reordered patches to have fixes and struct changes first, code
>refactoring patches last,
>  * dropped v1 patches #9 to #11 (multiple conrollers) to handle them
>in a separate series,
>  * rewrote patch to remove smbios_handle (based on v1 #9): EDAC/ghes:
>Move smbios_handle from struct dimm_info to ghes private data,
>  * added lockdep_assert_held() checkers,
>  * renamed struct ghes_dimm_fill to struct dimm_fill,
>  * renamed local variable dimms to dimm_list to avoid conflict with
>the global variable,
>  * removed dimm list for "fake" controller,
>  * fixed return code check to use (rc < 0),
>  * added: EDAC/mc: Fix usage of snprintf() and dimm location setup

thanks for review.

I have addressed all of your review comments if not otherwise noted.
Please take a look at my replies to you. I am a bit unsure on how to
proceed with 08/10. I have sent you a detailed explanation and hope we
can find a solution soon. I could send a v3 then.

Thanks,

-Robert

Re: [PATCH v2 08/10] EDAC/ghes: Carve out MC device handling into separate functions

2020-05-06 Thread Robert Richter

On 27.04.20 18:38:56, Borislav Petkov wrote:
> On Wed, Apr 22, 2020 at 01:58:12PM +0200, Robert Richter wrote:

> > +static int ghes_mc_add_or_free(struct mem_ctl_info *mci,
> > +   struct list_head *dimm_list)
> 
> No, I think we talked about this already. This function should be
> called:
> 
>   ghes_mc_add()
> 
> and should do one thing and one thing only in good old unix tradition:
> add the MC.
> 
> > +{
> > unsigned long flags;
> > -   int idx = -1;
> > +   int rc;
> > +
> > +   rc = edac_mc_add_mc(mci);
> > +   if (rc < 0) {
> 
> > +   ghes_dimm_release(dimm_list);
> > +   edac_mc_free(mci);
> > +   return rc;
> 
> Those last three lines should be called by the *caller* of
> ghes_mc_add(), when latter returns an error value.

These direct operations are nothing a caller should deal with.

The caller does now:

mci = ghes_mc_create(...);
... /* prepare dimms */
return ghes_mc_add_or_free(...);

To shut it down we just use:

ghes_mc_free();

Pretty simple.

Now, lets look at your suggestion to put it out of the function. A
caller always needs to free the mci and dimms, so we will get:

int rc;
mci = ghes_mc_create(...);
... /* prepare dimms */
rc = ghes_mc_add(...);
if (rc < 0) {
/* free mci */
/* free dimms */
...
}
return rc;

We loose the tail call and simplicity here. Note this duplicates code
as there are 2 users of ghes_mc_add().

Now, the caller does not know the implementation details, so we need
to provide another release function (let's call it *_release() here):

mci = ghes_mc_create(...);
... /* prepare dimms */
rc = ghes_mc_add(...);
if (rc < 0) {
ghes_mc_release(mci);
ghes_dimm_release(dimm_list);
}
return rc;

Ok, now there is another function needed to release everything.

This design also impacts ghes_mc_free(). So the shutdown
implementation would turn to:

struct mem_ctl_info *mci;
...
mci = ghes_mc_del();
ghes_mc_release(mci);
...

I don't see any benefit. See also below for the delta of an
implementation of the suggested changes.

> 
> > +   }
> > +
> > +   spin_lock_irqsave(_lock, flags);
> > +   ghes_pvt = mci->pvt_info;
> > +   list_splice_tail(dimm_list, _dimm_list);
> > +   spin_unlock_irqrestore(_lock, flags);
> > +
> > +   return 0;
> > +}
> > +
> > +static void ghes_mc_free(void)
> > +{
> > +   struct mem_ctl_info *mci;
> > +   unsigned long flags;
> > +   LIST_HEAD(dimm_list);
> > +
> > +   /*
> > +* Wait for the irq handler being finished.
> > +*/
> > +   spin_lock_irqsave(_lock, flags);
> > +   mci = ghes_pvt ? ghes_pvt->mci : NULL;
> > +   ghes_pvt = NULL;
> > +   list_splice_init(_dimm_list, _list);
> > +   spin_unlock_irqrestore(_lock, flags);
> > +
> > +   ghes_dimm_release(_list);
> > +
> > +   if (!mci)
> > +   return;
> > +
> > +   mci = edac_mc_del_mc(mci->pdev);
> > +   if (mci)
> > +   edac_mc_free(mci);
> > +}
> 
> This function needs to do only freeing of the mc. The list splicing and
> dimm releasing needs to be done by its caller, before calling it.

ghes_mc_free() is the counterpart to ghes_mc_add() and thus needs to
also handle the dimm_list here. This cannot be left to the caller.

Considering all the above, I don't see how your suggestions to the
interface could improve the code. Hmm...

Below an implementation that illustrates the changes.

Thanks,

-Robert

---
 drivers/edac/ghes_edac.c | 41 +++-
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 7f39346d895b..896d7b488fc2 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -576,18 +576,14 @@ static struct mem_ctl_info *ghes_mc_create(struct device 
*dev, int mc_idx,
return mci;
 }
 
-static int ghes_mc_add_or_free(struct mem_ctl_info *mci,
-  struct list_head *dimms)
+static int ghes_mc_add(struct mem_ctl_info *mci, struct list_head *dimms)
 {
unsigned long flags;
int rc;
 
rc = edac_mc_add_mc(mci);
-   if (rc < 0) {
-   dimm_release(dimms);
-   edac_mc_free(mci);
+   if (rc < 0)
return rc;
-   }
 
spin_lock_irqsave(_lock, flags);
ghes_pvt = mci->pvt_info;
@@ -597,7 +593,7 @@ static int ghes_mc_add_or_free(struct mem_ctl_info *mci,
return 0;
 }
 
-static void ghes

Re: [PATCH v2 04/10] EDAC/ghes: Make SMBIOS handle private data to ghes

2020-05-05 Thread Robert Richter

On 24.04.20 18:21:57, Borislav Petkov wrote:
> On Wed, Apr 22, 2020 at 01:58:08PM +0200, Robert Richter wrote:

> > @@ -562,6 +647,7 @@ void ghes_edac_unregister(struct ghes *ghes)
> >  {
> > struct mem_ctl_info *mci;
> > unsigned long flags;
> > +   LIST_HEAD(dimm_list);
> >  
> > mutex_lock(_reg_mutex);
> >  
> > @@ -574,14 +660,19 @@ void ghes_edac_unregister(struct ghes *ghes)
> > spin_lock_irqsave(_lock, flags);
> > mci = ghes_pvt ? ghes_pvt->mci : NULL;
> > ghes_pvt = NULL;
> > +   list_splice_init(_dimm_list, _list);
> 
> Why do you need to do this?
> 
> Can't you simply do:
> 
>ghes_dimm_release(_dimm_list);
> 
> here?

This decouples the locking. Otherwise ghes_dimm_release() would be
called with ghes_lock held which I want to avoid to keep the locking
simple.

-Robert

Re: [PATCH v2 03/10] EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to ghes_mci

2020-05-05 Thread Robert Richter

On 23.04.20 19:55:17, Borislav Petkov wrote:
> On Wed, Apr 22, 2020 at 01:58:07PM +0200, Robert Richter wrote:

> > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> > index cb3dab56a875..39efce0df881 100644
> > --- a/drivers/edac/ghes_edac.c
> > +++ b/drivers/edac/ghes_edac.c
> > @@ -15,9 +15,7 @@
> >  #include "edac_module.h"
> >  #include 
> >  
> > -struct ghes_edac_pvt {
> > -   struct list_head list;
> > -   struct ghes *ghes;
> > +struct ghes_mci {
> 
> No, that should be "ghes_pvt" because it *is* ghes_edac's private
> structure and there's also an mci pointer in it.

The ghes driver will use private data for both structs, mci and
dimm_info. Thus I named it ghes_mci and ghes_dimm (see next patch) as
they are counterparts. I could name it "ghes_pvt", but the meaning
would be less obvious. Same for your suggestion in the next patch,
struct dimm is too general and could cause namespace conflicts with
other code (think of cscope etc.).

-Robert

Re: [PATCH 00/19] EDAC: Rework edac_mc and ghes drivers

2019-10-14 Thread Robert Richter

On 10.10.19 20:25:01, Robert Richter wrote:
> This patch set is a rework of the ghes_edac and edac_mc driver. It
> addresses issues found during code review and while working with the
> code. The changes include:

Thanks Mauro for your review of the whole series. I hope I could all
your concerns address with my answers.

-Robert

Re: [PATCH 14/19] EDAC, mc: Create new function edac_inc_csrow()

2019-10-14 Thread Robert Richter

On 11.10.19 08:08:25, Mauro Carvalho Chehab wrote:
> Em Thu, 10 Oct 2019 20:25:33 +
> Robert Richter  escreveu:
> 
> > Have a separate function to count errors in csrow/channel. This better
> > separates code and reduces the indentation level. No functional
> > changes.
> 
> This one assumes patch 06/19, with I'm not sure if it is correct.

See also my answer there, the driver should still work as you expect
it even with patch #6 applied. This patch is important to isolate the
csrow handling that makes the code much better understandable and
readable. I hope there are no concerns to this patch.

Thank you for review.

-Robert

> 
> > 
> > Signed-off-by: Robert Richter 
> > ---
> >  drivers/edac/edac_mc.c | 40 +---
> >  1 file changed, 25 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> > index 9e8c5716a8c0..3779204c0e21 100644
> > --- a/drivers/edac/edac_mc.c
> > +++ b/drivers/edac/edac_mc.c
> > @@ -1045,6 +1045,26 @@ static struct mem_ctl_info *error_desc_to_mci(struct 
> > edac_raw_error_desc *e)
> > return container_of(e, struct mem_ctl_info, error_desc);
> >  }
> >  
> > +static void edac_inc_csrow(struct edac_raw_error_desc *e, int row, int 
> > chan)
> > +{
> > +   struct mem_ctl_info *mci = error_desc_to_mci(e);
> > +   u16 count = e->error_count;
> > +   enum hw_event_mc_err_type type = e->type;
> > +
> > +   if (row < 0)
> > +   return;
> > +
> > +   edac_dbg(4, "csrow/channel to increment: (%d,%d)\n", row, chan);
> > +
> > +   if (type == HW_EVENT_ERR_CORRECTED) {
> > +   mci->csrows[row]->ce_count += count;
> > +   if (chan >= 0)
> > +   mci->csrows[row]->channels[chan]->ce_count += count;
> > +   } else {
> > +   mci->csrows[row]->ue_count += count;
> > +   }
> > +}
> > +
> >  void edac_raw_mc_handle_error(struct edac_raw_error_desc *e,
> >   struct dimm_info *dimm)
> >  {
> > @@ -1201,22 +1221,12 @@ void edac_mc_handle_error(const enum 
> > hw_event_mc_err_type type,
> > chan = -2;
> > }
> >  
> > -   if (any_memory) {
> > +   if (any_memory)
> > strcpy(e->label, "any memory");
> > -   } else {
> > -   edac_dbg(4, "csrow/channel to increment: (%d,%d)\n", row, chan);
> > -   if (p == e->label)
> > -   strcpy(e->label, "unknown memory");
> > -   if (type == HW_EVENT_ERR_CORRECTED) {
> > -   if (row >= 0) {
> > -   mci->csrows[row]->ce_count += error_count;
> > -   if (chan >= 0)
> > -   
> > mci->csrows[row]->channels[chan]->ce_count += error_count;
> > -   }
> > -   } else
> > -   if (row >= 0)
> > -   mci->csrows[row]->ue_count += error_count;
> > -   }
> > +   else if (!*e->label)
> > +   strcpy(e->label, "unknown memory");
> > +
> > +   edac_inc_csrow(e, row, chan);
> >  
> > /* Fill the RAM location data */
> > p = e->location;
> 
> 
> 
> Thanks,
> Mauro

Re: [PATCH 12/19] EDAC: Store error type in struct edac_raw_error_desc

2019-10-14 Thread Robert Richter

On 11.10.19 07:54:19, Mauro Carvalho Chehab wrote:
> Em Thu, 10 Oct 2019 20:25:29 +
> Robert Richter  escreveu:
> 
> > Store the error type in struct edac_raw_error_desc. This makes the
> > type parameter of edac_raw_mc_handle_error() obsolete.
> 
> I don't see much gain on this change, but whatever works best for
> ghes.

The error type clearly describes the error. It makes sense to keep it
in struct edac_raw_error_desc as the function interface of
edac_raw_mc_handle_error() becomes easier. There is no reason to have
a function argument for the type while all other error data is in
edac_raw_error_desc.

This change might look trivial, but this series contains many small
changes like this and in the end there is a reasonable change of the
function that describes it much better:

void edac_raw_mc_handle_error(struct edac_raw_error_desc *e,
  struct dimm_info *dimm);

... that reads as handle error described in e that affects this dimm.

-Robert

Re: [PATCH 06/19] EDAC, mc: Remove per layer counters

2019-10-14 Thread Robert Richter

On 11.10.19 07:40:31, Mauro Carvalho Chehab wrote:
> Em Thu, 10 Oct 2019 20:25:16 +
> Robert Richter  escreveu:
> 
> > Looking at how mci->{ue,ce}_per_layer[EDAC_MAX_LAYERS] is used, it
> > turns out that only the leaves in the memory hierarchy are consumed
> > (in sysfs), but not the intermediate layers, e.g.:
> > 
> >  count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][dimm->idx];
> > 
> > These unused counters only add complexity, remove them. The error
> > counter values are directly stored in struct dimm_info now.
> 
> Hmm... not sure if this patch is correct. I remember that there are some
> border cases on some drivers (maybe the 3-layer drivers used together
> with RDIMM memory controllers?) where some errors are not associated
> to an specific dimm, but, instead, are related to a problem at the memory
> bus.
> 
> Also, depending on how the memory controllers are organized[1], the ECC
> logic groups memory on DIMM pairs. So, when an error occur, it may be
> either at DIMM1 or DIMM2.
> 
> [1] On Intel, this happens with pre-Nehalem memory controllers.
> 
> Due to that, storing errors at the dimm struct sounds wrong, as the
> error may affect multiple DIMMs or even the entire layer.

That was my first thought too, but the counter values are not used at
all. The only exception is to provide *per-dimm* counters here:

 {ce,ue}_per_layer[n_layers-1][dimm->idx]. 

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/edac/edac_mc_sysfs.c?id=4f5cafb5cb8471e54afdc9054d973535614f7675#n567

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/edac/edac_mc_sysfs.c?id=4f5cafb5cb8471e54afdc9054d973535614f7675#n584

The case you mentioned above is if the mc only sends parts of the
error location (with a top, mid or low layer missing). The dimm cannot
be identified then. In this case edac_mc_handle_error() tries to find
a unique row (+ channel infomation if available and lists all possible
dimm labels in e->label. See:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/edac/edac_mc.c?id=4f5cafb5cb8471e54afdc9054d973535614f7675#n1153

Thus, we see a counter increment for row (and also channel if it can
be identified), but this is counted in mci->csrows array only that is
not removed by this patch.

That said, the {ue,ce}_per_layer[] arrays can be removed by keeping
the same driver functionality, esp. the case you mentioned above.

-Robert

Re: [PATCH 05/19] EDAC, mc: Reduce indentation level in edac_mc_handle_error()

2019-10-11 Thread Robert Richter

On 11.10.19 03:50:23, Joe Perches wrote:
> On Fri, 2019-10-11 at 07:20 -0300, Mauro Carvalho Chehab wrote:
> > Em Thu, 10 Oct 2019 15:10:53 -0700
> > Joe Perches  escreveu:
> > 
> > > On Thu, 2019-10-10 at 20:25 +, Robert Richter wrote:
> > > > Reduce the indentation level in edac_mc_handle_error() a bit by using
> > > > continue. No functional changes.  
> > > 
> > > Seems fine, but trivially below:
> > > 
> > > > diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c  
> > > []
> > > > @@ -1171,37 +1171,38 @@ void edac_mc_handle_error(const enum 
> > > > hw_event_mc_err_type type,  
> > > []
> > > > +   strcpy(p, dimm->label);
> > > > +   p += strlen(p);
> > > > +   *p = '\0';  
> > > 
> > > This *p = '\0' is unnecessary as the strcpy already did that.
> > 
> > True, but better to put it on a separate patch, as it makes
> > easier to review if you don't mix code de-indent with changes.
> > 
> > Also, maybe there are other occurrences of this pattern.
> 
> Maybe 80 or so uses of paired
> 
>   strcpy(foo, bar);
>   strlen(foo)
> 
> where another function like stpcpy, which doesn't exist
> in the kernel, could have been used.

Under drivers/edac/ I found this one place only.

So I could create a separate patch as a oneliner with that (trivial)
change?

Thanks,

-Robert

Re: [PATCH 01/19] EDAC: Replace EDAC_DIMM_PTR() macro with edac_get_dimm() function

2019-10-11 Thread Robert Richter

On 11.10.19 06:58:57, Mauro Carvalho Chehab wrote:
> Anyway, if you have a good reason to add this idx, please place
> it on a separate patch, with a proper description about why
> it is needed.

Right, see next patch. Will change.

-Robert

Re: [PATCH 02/19] EDAC: Remove EDAC_DIMM_OFF() macro

2019-10-11 Thread Robert Richter

Thanks for review.

On 11.10.19 07:09:03, Mauro Carvalho Chehab wrote:
> Em Thu, 10 Oct 2019 20:25:07 +
> Robert Richter  escreveu:
> 
> > The EDAC_DIMM_OFF() macro takes 5 arguments to get the DIMM's index.
> > This can be much simplified now as the offset is already stored in
> > struct dimm_info. Use it directly and completely remove the
> > EDAC_DIMM_OFF() macro.
> 
> Ah, now I understand why you added the dimm idx field. One more reason
> why to split it on a separate patch (or perhaps merge it here).

Right, it needs to be part of the this patch.

-Robert

Re: [PATCH 05/19] EDAC, mc: Reduce indentation level in edac_mc_handle_error()

2019-10-11 Thread Robert Richter

On 10.10.19 15:10:53, Joe Perches wrote:
> On Thu, 2019-10-10 at 20:25 +0000, Robert Richter wrote:
> > Reduce the indentation level in edac_mc_handle_error() a bit by using
> > continue. No functional changes.
> 
> Seems fine, but trivially below:
> 
> > diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> []
> > @@ -1171,37 +1171,38 @@ void edac_mc_handle_error(const enum 
> > hw_event_mc_err_type type,
> []
> > +   strcpy(p, dimm->label);
> > +   p += strlen(p);
> > +   *p = '\0';
> 
> This *p = '\0' is unnecessary as the strcpy already did that.

Other changes than the indentation are not the aim of this patch.
However, on the occasion and if there won't be any objections I could
include this trivial change in the patch in my next version of the
series.

Thanks for review.

-Robert

Re: [PATCH 00/19] EDAC: Rework edac_mc and ghes drivers

2019-10-10 Thread Robert Richter

On 10.10.19 20:25:01, Robert Richter wrote:
> This patch set is a rework of the ghes_edac and edac_mc driver. It
> addresses issues found during code review and while working with the
> code. The changes include:

Sorry for the:

  Content-Transfer-Encoding: quoted-printable

I definitely sent it out 8bit, but thanks to office365 this was
converted (IMO this is according to RFC but still a pain).

I will switch to another account for sending patches in the future.

-Robert

[PATCH 07/19] EDAC, mc: Rename iterator variable to idx

2019-10-10 Thread Robert Richter

Rename iterator variable to idx. The name is more handy, esp. when
searching it in the code.

Signed-off-by: Robert Richter 
---
 drivers/edac/edac_mc.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index c1e142643006..a893f793be8a 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -319,7 +319,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
unsigned int idx, size, tot_dimms = 1;
unsigned int tot_csrows = 1, tot_channels = 1;
void *pvt, *p, *ptr = NULL;
-   int i, j, row, chn, n, len;
+   int j, row, chn, n, len;
bool per_rank = false;
 
if (WARN_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0))
@@ -329,14 +329,14 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
 * Calculate the total amount of dimms and csrows/cschannels while
 * in the old API emulation mode
 */
-   for (i = 0; i < n_layers; i++) {
-   tot_dimms *= layers[i].size;
-   if (layers[i].is_virt_csrow)
-   tot_csrows *= layers[i].size;
+   for (idx = 0; idx < n_layers; idx++) {
+   tot_dimms *= layers[idx].size;
+   if (layers[idx].is_virt_csrow)
+   tot_csrows *= layers[idx].size;
else
-   tot_channels *= layers[i].size;
+   tot_channels *= layers[idx].size;
 
-   if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
+   if (layers[idx].type == EDAC_MC_LAYER_CHIP_SELECT)
per_rank = true;
}
 
-- 
2.20.1

[PATCH 12/19] EDAC: Store error type in struct edac_raw_error_desc

2019-10-10 Thread Robert Richter

Store the error type in struct edac_raw_error_desc. This makes the
type parameter of edac_raw_mc_handle_error() obsolete.

Signed-off-by: Robert Richter 
---
 drivers/edac/edac_mc.c   |  8 
 drivers/edac/edac_mc.h   |  4 +---
 drivers/edac/ghes_edac.c | 13 ++---
 include/linux/edac.h |  1 +
 4 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index cdfb383f7a35..ca206854b8ee 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -1040,15 +1040,14 @@ static void edac_ue_error(struct mem_ctl_info *mci,
edac_inc_ue_error(mci, dimm, error_count);
 }
 
-void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
- struct mem_ctl_info *mci,
+void edac_raw_mc_handle_error(struct mem_ctl_info *mci,
  struct dimm_info *dimm,
  struct edac_raw_error_desc *e)
 {
char detail[80];
 
/* Memory type dependent details about the error */
-   if (type == HW_EVENT_ERR_CORRECTED) {
+   if (e->type == HW_EVENT_ERR_CORRECTED) {
snprintf(detail, sizeof(detail),
"page:0x%lx offset:0x%lx grain:%ld syndrome:0x%lx",
e->page_frame_number, e->offset_in_page,
@@ -1095,6 +1094,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type 
type,
/* Fills the error report buffer */
memset(e, 0, sizeof (*e));
e->error_count = error_count;
+   e->type = type;
e->top_layer = top_layer;
e->mid_layer = mid_layer;
e->low_layer = low_layer;
@@ -1243,6 +1243,6 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type 
type,
 
dimm = edac_get_dimm(mci, top_layer, mid_layer, low_layer);
 
-   edac_raw_mc_handle_error(type, mci, dimm, e);
+   edac_raw_mc_handle_error(mci, dimm, e);
 }
 EXPORT_SYMBOL_GPL(edac_mc_handle_error);
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index 2c3e2fbcedc4..a8f1b5b5e873 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -212,7 +212,6 @@ extern int edac_mc_find_csrow_by_page(struct mem_ctl_info 
*mci,
  * edac_raw_mc_handle_error() - Reports a memory event to userspace without
  * doing anything to discover the error location.
  *
- * @type:  severity of the error (CE/UE/Fatal)
  * @mci:   a struct mem_ctl_info pointer
  * @dimm:  a struct dimm_info pointer
  * @e: error description
@@ -221,8 +220,7 @@ extern int edac_mc_find_csrow_by_page(struct mem_ctl_info 
*mci,
  * only be called directly when the hardware error come directly from BIOS,
  * like in the case of APEI GHES driver.
  */
-void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
- struct mem_ctl_info *mci,
+void edac_raw_mc_handle_error(struct mem_ctl_info *mci,
  struct dimm_info *dimm,
  struct edac_raw_error_desc *e);
 
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 4f5721cf4380..1db1c012bed9 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -194,7 +194,6 @@ static void ghes_edac_dmidecode(const struct dmi_header 
*dh, void *arg)
 void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 {
struct dimm_info *dimm;
-   enum hw_event_mc_err_type type;
struct edac_raw_error_desc *e;
struct mem_ctl_info *mci;
struct ghes_edac_pvt *pvt = ghes_pvt;
@@ -232,17 +231,17 @@ void ghes_edac_report_mem_error(int sev, struct 
cper_sec_mem_err *mem_err)
 
switch (sev) {
case GHES_SEV_CORRECTED:
-   type = HW_EVENT_ERR_CORRECTED;
+   e->type = HW_EVENT_ERR_CORRECTED;
break;
case GHES_SEV_RECOVERABLE:
-   type = HW_EVENT_ERR_UNCORRECTED;
+   e->type = HW_EVENT_ERR_UNCORRECTED;
break;
case GHES_SEV_PANIC:
-   type = HW_EVENT_ERR_FATAL;
+   e->type = HW_EVENT_ERR_FATAL;
break;
default:
case GHES_SEV_NO:
-   type = HW_EVENT_ERR_INFO;
+   e->type = HW_EVENT_ERR_INFO;
}
 
edac_dbg(1, "error validation_bits: 0x%08llx\n",
@@ -433,14 +432,14 @@ void ghes_edac_report_mem_error(int sev, struct 
cper_sec_mem_err *mem_err)
grain_bits = fls_long(e->grain);
snprintf(pvt->detail_location, sizeof(pvt->detail_location),
 "APEI location: %s %s", e->location, e->other_detail);
-   trace_mc_event(type, e->msg, e->label, e->error_count,
+   trace_mc_event(e->type, e->msg, e->label, e->error_count,
   mci->mc_idx, e->top_layer, e->mid_layer, e->low_layer,

[PATCH 11/19] EDAC: Remove misleading comment in struct edac_raw_error_desc

2019-10-10 Thread Robert Richter

There never has been such function edac_raw_error_desc_clean() and in
function ghes_edac_report_mem_error() the whole struct is zero'ed
including the string arrays. Remove that comment.

Signed-off-by: Robert Richter 
---
 include/linux/edac.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/include/linux/edac.h b/include/linux/edac.h
index 8e7e50b0..4d9673954856 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -458,15 +458,10 @@ struct errcount_attribute_data {
  * @other_detail:  other driver-specific detail about the error
  */
 struct edac_raw_error_desc {
-   /*
-* NOTE: everything before grain won't be cleaned by
-* edac_raw_error_desc_clean()
-*/
char location[LOCATION_SIZE];
char label[(EDAC_MC_LABEL_LEN + 1 + sizeof(OTHER_LABEL)) * 
EDAC_MAX_LABELS];
long grain;
 
-   /* the vars below and grain will be cleaned on every new error report */
u16 error_count;
int top_layer;
int mid_layer;
-- 
2.20.1

[PATCH 17/19] EDAC, ghes: Remove intermediate buffer pvt->detail_location

2019-10-10 Thread Robert Richter

detail_location[] is used to collect two location strings so they can
be passed as one to trace_mc_event(). Instead of having an extra copy
step, assemble the location string in other_detail[] from the
beginning.

Using other_detail[] to call trace_mc_event() is now the same as in
edac_mc.c and code can be unified.

Reviewed-by: James Morse 
Signed-off-by: Robert Richter 
---
 drivers/edac/ghes_edac.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 97242cf18a88..8d9d3c4dbebb 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -21,8 +21,7 @@ struct ghes_edac_pvt {
struct mem_ctl_info *mci;
 
/* Buffers for the error handling routine */
-   char detail_location[240];
-   char other_detail[160];
+   char other_detail[400];
char msg[80];
 };
 
@@ -356,6 +355,8 @@ void ghes_edac_report_mem_error(int sev, struct 
cper_sec_mem_err *mem_err)
 
/* All other fields are mapped on e->other_detail */
p = pvt->other_detail;
+   p += snprintf(p, sizeof(pvt->other_detail),
+   "APEI location: %s ", e->location);
if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_STATUS) {
u64 status = mem_err->error_status;
 
@@ -436,12 +437,10 @@ void ghes_edac_report_mem_error(int sev, struct 
cper_sec_mem_err *mem_err)
grain_bits = fls_long(e->grain - 1);
 
/* Generate the trace event */
-   snprintf(pvt->detail_location, sizeof(pvt->detail_location),
-"APEI location: %s %s", e->location, e->other_detail);
trace_mc_event(e->type, e->msg, e->label, e->error_count,
   mci->mc_idx, e->top_layer, e->mid_layer, e->low_layer,
   (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
-  grain_bits, e->syndrome, pvt->detail_location);
+  grain_bits, e->syndrome, e->other_detail);
 
dimm = edac_get_dimm_by_index(mci, e->top_layer);
 
-- 
2.20.1

[PATCH 15/19] EDAC, ghes: Use standard kernel macros for page calculations

2019-10-10 Thread Robert Richter

Use standard macros for page calculations.

Reviewed-by: James Morse 
Signed-off-by: Robert Richter 
---
 drivers/edac/ghes_edac.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 8078d4ec9631..851aad92e42d 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -309,8 +309,8 @@ void ghes_edac_report_mem_error(int sev, struct 
cper_sec_mem_err *mem_err)
 
/* Error address */
if (mem_err->validation_bits & CPER_MEM_VALID_PA) {
-   e->page_frame_number = mem_err->physical_addr >> PAGE_SHIFT;
-   e->offset_in_page = mem_err->physical_addr & ~PAGE_MASK;
+   e->page_frame_number = PHYS_PFN(mem_err->physical_addr);
+   e->offset_in_page = offset_in_page(mem_err->physical_addr);
}
 
/* Error grain */
-- 
2.20.1

[PATCH 16/19] EDAC, ghes: Fix grain calculation

2019-10-10 Thread Robert Richter

The current code to convert a physical address mask to a grain
(defined as granularity in bytes) is:

e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);

This is broken in several ways:

1) It calculates to wrong grain values. E.g., a physical address mask
of ~0xfff should give a grain of 0x1000. Without considering
PAGE_MASK, there is an off-by-one. Things are worse when also
filtering it with ~PAGE_MASK. This will calculate to a grain with the
upper bits set. In the example it even calculates to ~0.

2) The grain does not depend on and is unrelated to the kernel's
page-size. The page-size only matters when unmapping memory in
memory_failure(). Smaller grains are wrongly rounded up to the
page-size, on architectures with a configurable page-size (e.g. arm64)
this could round up to the even bigger page-size of the hypervisor.

Fix this with:

e->grain = ~mem_err->physical_addr_mask + 1;

The grain_bits are defined as:

grain = 1 << grain_bits;

Change also the grain_bits calculation accordingly, it is the same
formula as in edac_mc.c now and the code can be unified.

The value in ->physical_addr_mask coming from firmware is assumed to
be contiguous, but this is not sanity-checked. However, in case the
mask is non-contiguous, a conversion to grain_bits effectively
converts the grain bit mask to a power of 2 by rounding up.

Suggested-by: James Morse 
Signed-off-by: Robert Richter 
---
 drivers/edac/ghes_edac.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 851aad92e42d..97242cf18a88 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -220,6 +220,7 @@ void ghes_edac_report_mem_error(int sev, struct 
cper_sec_mem_err *mem_err)
/* Cleans the error report buffer */
memset(e, 0, sizeof (*e));
e->error_count = 1;
+   e->grain = 1;
strcpy(e->label, "unknown label");
e->msg = pvt->msg;
e->other_detail = pvt->other_detail;
@@ -315,7 +316,7 @@ void ghes_edac_report_mem_error(int sev, struct 
cper_sec_mem_err *mem_err)
 
/* Error grain */
if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
-   e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
+   e->grain = ~mem_err->physical_addr_mask + 1;
 
/* Memory error location, mapped on e->location */
p = e->location;
@@ -428,8 +429,13 @@ void ghes_edac_report_mem_error(int sev, struct 
cper_sec_mem_err *mem_err)
if (p > pvt->other_detail)
*(p - 1) = '\0';
 
+   /* Sanity-check driver-supplied grain value. */
+   if (WARN_ON_ONCE(!e->grain))
+   e->grain = 1;
+
+   grain_bits = fls_long(e->grain - 1);
+
/* Generate the trace event */
-   grain_bits = fls_long(e->grain);
snprintf(pvt->detail_location, sizeof(pvt->detail_location),
 "APEI location: %s %s", e->location, e->other_detail);
trace_mc_event(e->type, e->msg, e->label, e->error_count,
-- 
2.20.1

[PATCH 06/19] EDAC, mc: Remove per layer counters

2019-10-10 Thread Robert Richter

Looking at how mci->{ue,ce}_per_layer[EDAC_MAX_LAYERS] is used, it
turns out that only the leaves in the memory hierarchy are consumed
(in sysfs), but not the intermediate layers, e.g.:

 count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][dimm->idx];

These unused counters only add complexity, remove them. The error
counter values are directly stored in struct dimm_info now.

Signed-off-by: Robert Richter 
---
 drivers/edac/edac_mc.c   | 104 ---
 drivers/edac/edac_mc_sysfs.c |  20 +++
 drivers/edac/ghes_edac.c |   5 +-
 include/linux/edac.h |   7 +--
 4 files changed, 46 insertions(+), 90 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 45b02bb31964..c1e142643006 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -315,10 +315,9 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
struct csrow_info *csr;
struct rank_info *chan;
struct dimm_info *dimm;
-   u32 *ce_per_layer[EDAC_MAX_LAYERS], *ue_per_layer[EDAC_MAX_LAYERS];
unsigned int pos[EDAC_MAX_LAYERS];
-   unsigned int idx, size, tot_dimms = 1, count = 1;
-   unsigned int tot_csrows = 1, tot_channels = 1, tot_errcount = 0;
+   unsigned int idx, size, tot_dimms = 1;
+   unsigned int tot_csrows = 1, tot_channels = 1;
void *pvt, *p, *ptr = NULL;
int i, j, row, chn, n, len;
bool per_rank = false;
@@ -346,19 +345,10 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
 * stringent as what the compiler would provide if we could simply
 * hardcode everything into a single struct.
 */
-   mci = edac_align_ptr(, sizeof(*mci), 1);
-   layer = edac_align_ptr(, sizeof(*layer), n_layers);
-   for (i = 0; i < n_layers; i++) {
-   count *= layers[i].size;
-   edac_dbg(4, "errcount layer %d size %d\n", i, count);
-   ce_per_layer[i] = edac_align_ptr(, sizeof(u32), count);
-   ue_per_layer[i] = edac_align_ptr(, sizeof(u32), count);
-   tot_errcount += 2 * count;
-   }
-
-   edac_dbg(4, "allocating %d error counters\n", tot_errcount);
-   pvt = edac_align_ptr(, sz_pvt, 1);
-   size = ((unsigned long)pvt) + sz_pvt;
+   mci = edac_align_ptr(, sizeof(*mci), 1);
+   layer   = edac_align_ptr(, sizeof(*layer), n_layers);
+   pvt = edac_align_ptr(, sz_pvt, 1);
+   size= ((unsigned long)pvt) + sz_pvt;
 
edac_dbg(1, "allocating %u bytes for mci data (%d %s, %d 
csrows/channels)\n",
 size,
@@ -374,10 +364,6 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
 * rather than an imaginary chunk of memory located at address 0.
 */
layer = (struct edac_mc_layer *)(((char *)mci) + ((unsigned 
long)layer));
-   for (i = 0; i < n_layers; i++) {
-   mci->ce_per_layer[i] = (u32 *)((char *)mci + ((unsigned 
long)ce_per_layer[i]));
-   mci->ue_per_layer[i] = (u32 *)((char *)mci + ((unsigned 
long)ue_per_layer[i]));
-   }
pvt = sz_pvt ? (((char *)mci) + ((unsigned long)pvt)) : NULL;
 
/* setup index and various internal pointers */
@@ -908,53 +894,31 @@ const char *edac_layer_name[] = {
 EXPORT_SYMBOL_GPL(edac_layer_name);
 
 static void edac_inc_ce_error(struct mem_ctl_info *mci,
- bool enable_per_layer_report,
  const int pos[EDAC_MAX_LAYERS],
  const u16 count)
 {
-   int i, index = 0;
+   struct dimm_info *dimm = edac_get_dimm(mci, pos[0], pos[1], pos[2]);
 
mci->ce_mc += count;
 
-   if (!enable_per_layer_report) {
+   if (dimm)
+   dimm->ce_count += count;
+   else
mci->ce_noinfo_count += count;
-   return;
-   }
-
-   for (i = 0; i < mci->n_layers; i++) {
-   if (pos[i] < 0)
-   break;
-   index += pos[i];
-   mci->ce_per_layer[i][index] += count;
-
-   if (i < mci->n_layers - 1)
-   index *= mci->layers[i + 1].size;
-   }
 }
 
 static void edac_inc_ue_error(struct mem_ctl_info *mci,
-   bool enable_per_layer_report,
const int pos[EDAC_MAX_LAYERS],
const u16 count)
 {
-   int i, index = 0;
+   struct dimm_info *dimm = edac_get_dimm(mci, pos[0], pos[1], pos[2]);
 
mci->ue_mc += count;
 
-   if (!enable_per_layer_report) {
+   if (dimm)
+   dimm->ue_count += count;
+   else
mci->ue_noinfo_count += count;
-   return;
-   }
-
-   for (i = 0; i < mci->n_layers; i++) {
-   if (pos[i] < 0)
-   break;
-

[PATCH 09/19] EDAC, mc: Reorder functions edac_mc_alloc*()

2019-10-10 Thread Robert Richter

Reorder the new created functions edac_mc_alloc_csrows() and
edac_mc_alloc_dimms() and move them before edac_mc_alloc(). No further
code changes.

Signed-off-by: Robert Richter 
---
 drivers/edac/edac_mc.c | 171 -
 1 file changed, 84 insertions(+), 87 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 0db504cb3419..6d880cf4d599 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -305,93 +305,6 @@ static void _edac_mc_free(struct mem_ctl_info *mci)
kfree(mci);
 }
 
-static int edac_mc_alloc_csrows(struct mem_ctl_info *mci);
-static int edac_mc_alloc_dimms(struct mem_ctl_info *mci);
-
-struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
-  unsigned int n_layers,
-  struct edac_mc_layer *layers,
-  unsigned int sz_pvt)
-{
-   struct mem_ctl_info *mci;
-   struct edac_mc_layer *layer;
-   unsigned int idx, size, tot_dimms = 1;
-   unsigned int tot_csrows = 1, tot_channels = 1;
-   void *pvt, *ptr = NULL;
-   bool per_rank = false;
-
-   if (WARN_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0))
-   return NULL;
-
-   /*
-* Calculate the total amount of dimms and csrows/cschannels while
-* in the old API emulation mode
-*/
-   for (idx = 0; idx < n_layers; idx++) {
-   tot_dimms *= layers[idx].size;
-   if (layers[idx].is_virt_csrow)
-   tot_csrows *= layers[idx].size;
-   else
-   tot_channels *= layers[idx].size;
-
-   if (layers[idx].type == EDAC_MC_LAYER_CHIP_SELECT)
-   per_rank = true;
-   }
-
-   /* Figure out the offsets of the various items from the start of an mc
-* structure.  We want the alignment of each item to be at least as
-* stringent as what the compiler would provide if we could simply
-* hardcode everything into a single struct.
-*/
-   mci = edac_align_ptr(, sizeof(*mci), 1);
-   layer   = edac_align_ptr(, sizeof(*layer), n_layers);
-   pvt = edac_align_ptr(, sz_pvt, 1);
-   size= ((unsigned long)pvt) + sz_pvt;
-
-   edac_dbg(1, "allocating %u bytes for mci data (%d %s, %d 
csrows/channels)\n",
-size,
-tot_dimms,
-per_rank ? "ranks" : "dimms",
-tot_csrows * tot_channels);
-
-   mci = kzalloc(size, GFP_KERNEL);
-   if (mci == NULL)
-   return NULL;
-
-   /* Adjust pointers so they point within the memory we just allocated
-* rather than an imaginary chunk of memory located at address 0.
-*/
-   layer = (struct edac_mc_layer *)(((char *)mci) + ((unsigned 
long)layer));
-   pvt = sz_pvt ? (((char *)mci) + ((unsigned long)pvt)) : NULL;
-
-   /* setup index and various internal pointers */
-   mci->mc_idx = mc_num;
-   mci->tot_dimms = tot_dimms;
-   mci->pvt_info = pvt;
-   mci->n_layers = n_layers;
-   mci->layers = layer;
-   memcpy(mci->layers, layers, sizeof(*layer) * n_layers);
-   mci->nr_csrows = tot_csrows;
-   mci->num_cschannel = tot_channels;
-   mci->csbased = per_rank;
-
-   if (edac_mc_alloc_csrows(mci))
-   goto error;
-
-   if (edac_mc_alloc_dimms(mci))
-   goto error;
-
-   mci->op_state = OP_ALLOC;
-
-   return mci;
-
-error:
-   _edac_mc_free(mci);
-
-   return NULL;
-}
-EXPORT_SYMBOL_GPL(edac_mc_alloc);
-
 static int edac_mc_alloc_csrows(struct mem_ctl_info *mci)
 {
unsigned int tot_csrows = mci->nr_csrows;
@@ -520,6 +433,90 @@ static int edac_mc_alloc_dimms(struct mem_ctl_info *mci)
return 0;
 }
 
+struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
+  unsigned int n_layers,
+  struct edac_mc_layer *layers,
+  unsigned int sz_pvt)
+{
+   struct mem_ctl_info *mci;
+   struct edac_mc_layer *layer;
+   unsigned int idx, size, tot_dimms = 1;
+   unsigned int tot_csrows = 1, tot_channels = 1;
+   void *pvt, *ptr = NULL;
+   bool per_rank = false;
+
+   if (WARN_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0))
+   return NULL;
+
+   /*
+* Calculate the total amount of dimms and csrows/cschannels while
+* in the old API emulation mode
+*/
+   for (idx = 0; idx < n_layers; idx++) {
+   tot_dimms *= layers[idx].size;
+   if (layers[idx].is_virt_csrow)
+   tot_csrows *= layers[idx].size;
+   else
+   tot_channels *= layers[idx].size;
+
+   if (layers[idx].type == EDAC_MC_LAYER_CHIP_SELECT)
+

[PATCH 18/19] EDAC, ghes: Unify trace_mc_event() code with edac_mc driver

2019-10-10 Thread Robert Richter

The code in ghes_edac.c and edac_mc.c for grain_bits calculation and
calling trace_mc_event() is now the same. Move it to a single location
in edac_raw_mc_handle_error().

The only difference is the missing IS_ENABLED(CONFIG_RAS) switch, but
this is needed for ghes too.

Signed-off-by: Robert Richter 
---
 drivers/edac/edac_mc.c   | 30 +++---
 drivers/edac/ghes_edac.c | 13 -
 2 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 3779204c0e21..e6c6e40dc760 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -1070,6 +1070,21 @@ void edac_raw_mc_handle_error(struct edac_raw_error_desc 
*e,
 {
struct mem_ctl_info *mci = error_desc_to_mci(e);
char detail[80];
+   u8 grain_bits;
+
+   /* Sanity-check driver-supplied grain value. */
+   if (WARN_ON_ONCE(!e->grain))
+   e->grain = 1;
+
+   grain_bits = fls_long(e->grain - 1);
+
+   /* Report the error via the trace interface */
+   if (IS_ENABLED(CONFIG_RAS))
+   trace_mc_event(e->type, e->msg, e->label, e->error_count,
+  mci->mc_idx, e->top_layer, e->mid_layer,
+  e->low_layer,
+  (e->page_frame_number << PAGE_SHIFT) | 
e->offset_in_page,
+  grain_bits, e->syndrome, e->other_detail);
 
/* Memory type dependent details about the error */
if (e->type == HW_EVENT_ERR_CORRECTED) {
@@ -1110,7 +1125,6 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type 
type,
int row = -1, chan = -1;
int pos[EDAC_MAX_LAYERS] = { top_layer, mid_layer, low_layer };
int i, n_labels = 0;
-   u8 grain_bits;
struct edac_raw_error_desc *e = >error_desc;
bool any_memory = true;
 
@@ -1242,20 +1256,6 @@ void edac_mc_handle_error(const enum 
hw_event_mc_err_type type,
if (p > e->location)
*(p - 1) = '\0';
 
-   /* Sanity-check driver-supplied grain value. */
-   if (WARN_ON_ONCE(!e->grain))
-   e->grain = 1;
-
-   grain_bits = fls_long(e->grain - 1);
-
-   /* Report the error via the trace interface */
-   if (IS_ENABLED(CONFIG_RAS))
-   trace_mc_event(type, e->msg, e->label, e->error_count,
-  mci->mc_idx, e->top_layer, e->mid_layer,
-  e->low_layer,
-  (e->page_frame_number << PAGE_SHIFT) | 
e->offset_in_page,
-  grain_bits, e->syndrome, e->other_detail);
-
dimm = edac_get_dimm(mci, top_layer, mid_layer, low_layer);
 
edac_raw_mc_handle_error(e, dimm);
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 8d9d3c4dbebb..17d5b22fe000 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -198,7 +198,6 @@ void ghes_edac_report_mem_error(int sev, struct 
cper_sec_mem_err *mem_err)
struct ghes_edac_pvt *pvt = ghes_pvt;
unsigned long flags;
char *p;
-   u8 grain_bits;
 
if (!pvt)
return;
@@ -430,18 +429,6 @@ void ghes_edac_report_mem_error(int sev, struct 
cper_sec_mem_err *mem_err)
if (p > pvt->other_detail)
*(p - 1) = '\0';
 
-   /* Sanity-check driver-supplied grain value. */
-   if (WARN_ON_ONCE(!e->grain))
-   e->grain = 1;
-
-   grain_bits = fls_long(e->grain - 1);
-
-   /* Generate the trace event */
-   trace_mc_event(e->type, e->msg, e->label, e->error_count,
-  mci->mc_idx, e->top_layer, e->mid_layer, e->low_layer,
-  (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
-  grain_bits, e->syndrome, e->other_detail);
-
dimm = edac_get_dimm_by_index(mci, e->top_layer);
 
edac_raw_mc_handle_error(e, dimm);
-- 
2.20.1

[PATCH 14/19] EDAC, mc: Create new function edac_inc_csrow()

2019-10-10 Thread Robert Richter

Have a separate function to count errors in csrow/channel. This better
separates code and reduces the indentation level. No functional
changes.

Signed-off-by: Robert Richter 
---
 drivers/edac/edac_mc.c | 40 +---
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 9e8c5716a8c0..3779204c0e21 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -1045,6 +1045,26 @@ static struct mem_ctl_info *error_desc_to_mci(struct 
edac_raw_error_desc *e)
return container_of(e, struct mem_ctl_info, error_desc);
 }
 
+static void edac_inc_csrow(struct edac_raw_error_desc *e, int row, int chan)
+{
+   struct mem_ctl_info *mci = error_desc_to_mci(e);
+   u16 count = e->error_count;
+   enum hw_event_mc_err_type type = e->type;
+
+   if (row < 0)
+   return;
+
+   edac_dbg(4, "csrow/channel to increment: (%d,%d)\n", row, chan);
+
+   if (type == HW_EVENT_ERR_CORRECTED) {
+   mci->csrows[row]->ce_count += count;
+   if (chan >= 0)
+   mci->csrows[row]->channels[chan]->ce_count += count;
+   } else {
+   mci->csrows[row]->ue_count += count;
+   }
+}
+
 void edac_raw_mc_handle_error(struct edac_raw_error_desc *e,
  struct dimm_info *dimm)
 {
@@ -1201,22 +1221,12 @@ void edac_mc_handle_error(const enum 
hw_event_mc_err_type type,
chan = -2;
}
 
-   if (any_memory) {
+   if (any_memory)
strcpy(e->label, "any memory");
-   } else {
-   edac_dbg(4, "csrow/channel to increment: (%d,%d)\n", row, chan);
-   if (p == e->label)
-   strcpy(e->label, "unknown memory");
-   if (type == HW_EVENT_ERR_CORRECTED) {
-   if (row >= 0) {
-   mci->csrows[row]->ce_count += error_count;
-   if (chan >= 0)
-   
mci->csrows[row]->channels[chan]->ce_count += error_count;
-   }
-   } else
-   if (row >= 0)
-   mci->csrows[row]->ue_count += error_count;
-   }
+   else if (!*e->label)
+   strcpy(e->label, "unknown memory");
+
+   edac_inc_csrow(e, row, chan);
 
/* Fill the RAM location data */
p = e->location;
-- 
2.20.1

[PATCH 19/19] EDAC, Documentation: Describe CPER module definition and DIMM ranks

2019-10-10 Thread Robert Richter

Update on CPER DIMM naming convention and DIMM ranks.

Signed-off-by: Robert Richter 
---
 Documentation/admin-guide/ras.rst | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/ras.rst 
b/Documentation/admin-guide/ras.rst
index 2b20f5f7380d..26e02a59f0f4 100644
--- a/Documentation/admin-guide/ras.rst
+++ b/Documentation/admin-guide/ras.rst
@@ -330,9 +330,12 @@ There can be multiple csrows and multiple channels.
 
 .. [#f4] Nowadays, the term DIMM (Dual In-line Memory Module) is widely
   used to refer to a memory module, although there are other memory
-  packaging alternatives, like SO-DIMM, SIMM, etc. Along this document,
-  and inside the EDAC system, the term "dimm" is used for all memory
-  modules, even when they use a different kind of packaging.
+  packaging alternatives, like SO-DIMM, SIMM, etc. The UEFI
+  specification (Version 2.7) defines a memory module in the Common
+  Platform Error Record (CPER) section to be an SMBIOS Memory Device
+  (Type 17). Along this document, and inside the EDAC system, the term
+  "dimm" is used for all memory modules, even when they use a
+  different kind of packaging.
 
 Memory controllers allow for several csrows, with 8 csrows being a
 typical value. Yet, the actual number of csrows depends on the layout of
@@ -349,12 +352,14 @@ controllers. The following example will assume 2 channels:
||  ``ch0``  |  ``ch1``  |
++===+===+
| ``csrow0`` |  DIMM_A0  |  DIMM_B0  |
-   ++   |   |
-   | ``csrow1`` |   |   |
+   ||   rank0   |   rank0   |
+   ++ - | - |
+   | ``csrow1`` |   rank1   |   rank1   |
++---+---+
| ``csrow2`` |  DIMM_A1  | DIMM_B1   |
-   ++   |   |
-   | ``csrow3`` |   |   |
+   ||   rank0   |   rank0   |
+   ++ - | - |
+   | ``csrow3`` |   rank1   |   rank1   |
++---+---+
 
 In the above example, there are 4 physical slots on the motherboard
@@ -374,11 +379,13 @@ which the memory DIMM is placed. Thus, when 1 DIMM is 
placed in each
 Channel, the csrows cross both DIMMs.
 
 Memory DIMMs come single or dual "ranked". A rank is a populated csrow.
-Thus, 2 single ranked DIMMs, placed in slots DIMM_A0 and DIMM_B0 above
-will have just one csrow (csrow0). csrow1 will be empty. On the other
-hand, when 2 dual ranked DIMMs are similarly placed, then both csrow0
-and csrow1 will be populated. The pattern repeats itself for csrow2 and
-csrow3.
+In the example above 2 dual ranked DIMMs are similarly placed. Thus,
+both csrow0 and csrow1 are populated. On the other hand, when 2 single
+ranked DIMMs are placed in slots DIMM_A0 and DIMM_B0, then they will
+have just one csrow (csrow0) and csrow1 will be empty. The pattern
+repeats itself for csrow2 and csrow3. Also note that some memory
+controller doesn't have any logic to identify the memory module, see
+``rankX`` directories below.
 
 The representation of the above is reflected in the directory
 tree in EDAC's sysfs interface. Starting in directory
-- 
2.20.1

[PATCH 13/19] EDAC, mc: Determine mci pointer from the error descriptor

2019-10-10 Thread Robert Richter

Each struct mci has its own error descriptor. Create a function
error_desc_to_mci() to determine the corresponding mci from an error
descriptor. This eases the parameter list of edac_raw_mc_handle_
error() as the mci pointer do not need to be passed any longer.

While at it, reorder parameters of edac_raw_mc_handle_error().

Signed-off-by: Robert Richter 
---
 drivers/edac/edac_mc.c   | 13 +
 drivers/edac/edac_mc.h   |  8 +++-
 drivers/edac/ghes_edac.c |  2 +-
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index ca206854b8ee..9e8c5716a8c0 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -1040,10 +1040,15 @@ static void edac_ue_error(struct mem_ctl_info *mci,
edac_inc_ue_error(mci, dimm, error_count);
 }
 
-void edac_raw_mc_handle_error(struct mem_ctl_info *mci,
- struct dimm_info *dimm,
- struct edac_raw_error_desc *e)
+static struct mem_ctl_info *error_desc_to_mci(struct edac_raw_error_desc *e)
+{
+   return container_of(e, struct mem_ctl_info, error_desc);
+}
+
+void edac_raw_mc_handle_error(struct edac_raw_error_desc *e,
+ struct dimm_info *dimm)
 {
+   struct mem_ctl_info *mci = error_desc_to_mci(e);
char detail[80];
 
/* Memory type dependent details about the error */
@@ -1243,6 +1248,6 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type 
type,
 
dimm = edac_get_dimm(mci, top_layer, mid_layer, low_layer);
 
-   edac_raw_mc_handle_error(mci, dimm, e);
+   edac_raw_mc_handle_error(e, dimm);
 }
 EXPORT_SYMBOL_GPL(edac_mc_handle_error);
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index a8f1b5b5e873..3b01d5d9d7bc 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -212,17 +212,15 @@ extern int edac_mc_find_csrow_by_page(struct mem_ctl_info 
*mci,
  * edac_raw_mc_handle_error() - Reports a memory event to userspace without
  * doing anything to discover the error location.
  *
- * @mci:   a struct mem_ctl_info pointer
- * @dimm:  a struct dimm_info pointer
  * @e: error description
+ * @dimm:  a struct dimm_info pointer
  *
  * This raw function is used internally by edac_mc_handle_error(). It should
  * only be called directly when the hardware error come directly from BIOS,
  * like in the case of APEI GHES driver.
  */
-void edac_raw_mc_handle_error(struct mem_ctl_info *mci,
- struct dimm_info *dimm,
- struct edac_raw_error_desc *e);
+void edac_raw_mc_handle_error(struct edac_raw_error_desc *e,
+ struct dimm_info *dimm);
 
 /**
  * edac_mc_handle_error() - Reports a memory event to userspace.
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 1db1c012bed9..8078d4ec9631 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -439,7 +439,7 @@ void ghes_edac_report_mem_error(int sev, struct 
cper_sec_mem_err *mem_err)
 
dimm = edac_get_dimm_by_index(mci, e->top_layer);
 
-   edac_raw_mc_handle_error(mci, dimm, e);
+   edac_raw_mc_handle_error(e, dimm);
 
spin_unlock_irqrestore(_lock, flags);
 }
-- 
2.20.1

[PATCH 02/19] EDAC: Remove EDAC_DIMM_OFF() macro

2019-10-10 Thread Robert Richter

The EDAC_DIMM_OFF() macro takes 5 arguments to get the DIMM's index.
This can be much simplified now as the offset is already stored in
struct dimm_info. Use it directly and completely remove the
EDAC_DIMM_OFF() macro.

Another advantage is that edac_mc_alloc() could be used even if the
exact size of the layers is unknown. Only the number of DIMMs would be
needed.

Signed-off-by: Robert Richter 
---
 drivers/edac/edac_mc.c   | 15 +
 drivers/edac/edac_mc_sysfs.c | 20 --
 include/linux/edac.h | 41 
 3 files changed, 9 insertions(+), 67 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 3d45adb5957f..72088d49b03b 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -314,10 +314,10 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
struct dimm_info *dimm;
u32 *ce_per_layer[EDAC_MAX_LAYERS], *ue_per_layer[EDAC_MAX_LAYERS];
unsigned int pos[EDAC_MAX_LAYERS];
-   unsigned int size, tot_dimms = 1, count = 1;
+   unsigned int idx, size, tot_dimms = 1, count = 1;
unsigned int tot_csrows = 1, tot_channels = 1, tot_errcount = 0;
void *pvt, *p, *ptr = NULL;
-   int i, j, row, chn, n, len, off;
+   int i, j, row, chn, n, len;
bool per_rank = false;
 
BUG_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0);
@@ -425,20 +425,15 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
memset(, 0, sizeof(pos));
row = 0;
chn = 0;
-   for (i = 0; i < tot_dimms; i++) {
+   for (idx = 0; idx < tot_dimms; idx++) {
chan = mci->csrows[row]->channels[chn];
-   off = EDAC_DIMM_OFF(layer, n_layers, pos[0], pos[1], pos[2]);
-   if (off < 0 || off >= tot_dimms) {
-   edac_mc_printk(mci, KERN_ERR, "EDAC core bug: 
EDAC_DIMM_OFF is trying to do an illegal data access\n");
-   goto error;
-   }
 
dimm = kzalloc(sizeof(**mci->dimms), GFP_KERNEL);
if (!dimm)
goto error;
-   mci->dimms[off] = dimm;
+   mci->dimms[idx] = dimm;
dimm->mci = mci;
-   dimm->idx = off;
+   dimm->idx = idx;
 
/*
 * Copy DIMM location and initialize it.
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 32d016f1ecd1..91e4c8f155af 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -557,14 +557,8 @@ static ssize_t dimmdev_ce_count_show(struct device *dev,
 {
struct dimm_info *dimm = to_dimm(dev);
u32 count;
-   int off;
-
-   off = EDAC_DIMM_OFF(dimm->mci->layers,
-   dimm->mci->n_layers,
-   dimm->location[0],
-   dimm->location[1],
-   dimm->location[2]);
-   count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][off];
+
+   count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][dimm->idx];
return sprintf(data, "%u\n", count);
 }
 
@@ -574,14 +568,8 @@ static ssize_t dimmdev_ue_count_show(struct device *dev,
 {
struct dimm_info *dimm = to_dimm(dev);
u32 count;
-   int off;
-
-   off = EDAC_DIMM_OFF(dimm->mci->layers,
-   dimm->mci->n_layers,
-   dimm->location[0],
-   dimm->location[1],
-   dimm->location[2]);
-   count = dimm->mci->ue_per_layer[dimm->mci->n_layers-1][off];
+
+   count = dimm->mci->ue_per_layer[dimm->mci->n_layers-1][dimm->idx];
return sprintf(data, "%u\n", count);
 }
 
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 7f9804438589..79c5564ee314 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -362,47 +362,6 @@ struct edac_mc_layer {
  */
 #define EDAC_MAX_LAYERS3
 
-/**
- * EDAC_DIMM_OFF - Macro responsible to get a pointer offset inside a pointer
- *array for the element given by [layer0,layer1,layer2]
- *position
- *
- * @layers:a struct edac_mc_layer array, describing how many elements
- * were allocated for each layer
- * @nlayers:   Number of layers at the @layers array
- * @layer0:layer0 position
- * @layer1:layer1 position. Unused if n_layers < 2
- * @layer2:layer2 position. Unused if n_layers < 3
- *
- * For 1 layer, this macro returns "var[layer0] - var";
- *
- * For 2 layers, this macro is similar to allocate a bi-dimensional array
- * and to return "var[layer0][layer1] - var";
- *
- * For 3 layers, this macro is similar to allocate a tri-dimensional array

[PATCH 08/19] EDAC, mc: Split edac_mc_alloc() into smaller functions

2019-10-10 Thread Robert Richter

edac_mc_alloc() is huge. Factor out code by moving it to the two new
functions edac_mc_alloc_csrows() and edac_mc_alloc_dimms(). Do not
move code yet for better review.

Signed-off-by: Robert Richter 
---
 drivers/edac/edac_mc.c | 104 +++--
 1 file changed, 69 insertions(+), 35 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index a893f793be8a..0db504cb3419 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -305,6 +305,9 @@ static void _edac_mc_free(struct mem_ctl_info *mci)
kfree(mci);
 }
 
+static int edac_mc_alloc_csrows(struct mem_ctl_info *mci);
+static int edac_mc_alloc_dimms(struct mem_ctl_info *mci);
+
 struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
   unsigned int n_layers,
   struct edac_mc_layer *layers,
@@ -312,14 +315,9 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
 {
struct mem_ctl_info *mci;
struct edac_mc_layer *layer;
-   struct csrow_info *csr;
-   struct rank_info *chan;
-   struct dimm_info *dimm;
-   unsigned int pos[EDAC_MAX_LAYERS];
unsigned int idx, size, tot_dimms = 1;
unsigned int tot_csrows = 1, tot_channels = 1;
-   void *pvt, *p, *ptr = NULL;
-   int j, row, chn, n, len;
+   void *pvt, *ptr = NULL;
bool per_rank = false;
 
if (WARN_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0))
@@ -377,16 +375,43 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
mci->num_cschannel = tot_channels;
mci->csbased = per_rank;
 
+   if (edac_mc_alloc_csrows(mci))
+   goto error;
+
+   if (edac_mc_alloc_dimms(mci))
+   goto error;
+
+   mci->op_state = OP_ALLOC;
+
+   return mci;
+
+error:
+   _edac_mc_free(mci);
+
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(edac_mc_alloc);
+
+static int edac_mc_alloc_csrows(struct mem_ctl_info *mci)
+{
+   unsigned int tot_csrows = mci->nr_csrows;
+   unsigned int tot_channels = mci->num_cschannel;
+   unsigned int row, chn;
+
/*
 * Alocate and fill the csrow/channels structs
 */
mci->csrows = kcalloc(tot_csrows, sizeof(*mci->csrows), GFP_KERNEL);
if (!mci->csrows)
-   goto error;
+   return -ENOMEM;
+
for (row = 0; row < tot_csrows; row++) {
+   struct csrow_info *csr;
+
csr = kzalloc(sizeof(**mci->csrows), GFP_KERNEL);
if (!csr)
-   goto error;
+   return -ENOMEM;
+
mci->csrows[row] = csr;
csr->csrow_idx = row;
csr->mci = mci;
@@ -394,34 +419,51 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
csr->channels = kcalloc(tot_channels, sizeof(*csr->channels),
GFP_KERNEL);
if (!csr->channels)
-   goto error;
+   return -ENOMEM;
 
for (chn = 0; chn < tot_channels; chn++) {
+   struct rank_info *chan;
+
chan = kzalloc(sizeof(**csr->channels), GFP_KERNEL);
if (!chan)
-   goto error;
+   return -ENOMEM;
+
csr->channels[chn] = chan;
chan->chan_idx = chn;
chan->csrow = csr;
}
}
 
+   return 0;
+}
+
+static int edac_mc_alloc_dimms(struct mem_ctl_info *mci)
+{
+   void *p;
+   unsigned int pos[EDAC_MAX_LAYERS];
+   unsigned int row, chn, idx;
+   int layer;
+
/*
 * Allocate and fill the dimm structs
 */
-   mci->dimms  = kcalloc(tot_dimms, sizeof(*mci->dimms), GFP_KERNEL);
+   mci->dimms  = kcalloc(mci->tot_dimms, sizeof(*mci->dimms), GFP_KERNEL);
if (!mci->dimms)
-   goto error;
+   return -ENOMEM;
 
memset(, 0, sizeof(pos));
row = 0;
chn = 0;
-   for (idx = 0; idx < tot_dimms; idx++) {
+   for (idx = 0; idx < mci->tot_dimms; idx++) {
+   struct dimm_info *dimm;
+   struct rank_info *chan;
+   int n, len;
+
chan = mci->csrows[row]->channels[chn];
 
dimm = kzalloc(sizeof(**mci->dimms), GFP_KERNEL);
if (!dimm)
-   goto error;
+   return -ENOMEM;
mci->dimms[idx] = dimm;
dimm->mci = mci;
dimm->idx = idx;
@@ -431,16 +473,16 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
 */
len = sizeof(dimm->label);
p = dimm->label;
-   n =

[PATCH 05/19] EDAC, mc: Reduce indentation level in edac_mc_handle_error()

2019-10-10 Thread Robert Richter

Reduce the indentation level in edac_mc_handle_error() a bit by using
continue. No functional changes.

Signed-off-by: Robert Richter 
---
 drivers/edac/edac_mc.c | 59 +-
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index f2cbca77bc50..45b02bb31964 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -1171,37 +1171,38 @@ void edac_mc_handle_error(const enum 
hw_event_mc_err_type type,
 * channel/memory controller/...  may be affected.
 * Also, don't show errors for empty DIMM slots.
 */
-   if (e->enable_per_layer_report && dimm->nr_pages) {
-   if (n_labels >= EDAC_MAX_LABELS) {
-   e->enable_per_layer_report = false;
-   break;
-   }
-   n_labels++;
-   if (p != e->label) {
-   strcpy(p, OTHER_LABEL);
-   p += strlen(OTHER_LABEL);
-   }
-   strcpy(p, dimm->label);
-   p += strlen(p);
-   *p = '\0';
+   if (!e->enable_per_layer_report || !dimm->nr_pages)
+   continue;
 
-   /*
-* get csrow/channel of the DIMM, in order to allow
-* incrementing the compat API counters
-*/
-   edac_dbg(4, "%s csrows map: (%d,%d)\n",
-mci->csbased ? "rank" : "dimm",
-dimm->csrow, dimm->cschannel);
-   if (row == -1)
-   row = dimm->csrow;
-   else if (row >= 0 && row != dimm->csrow)
-   row = -2;
-
-   if (chan == -1)
-   chan = dimm->cschannel;
-   else if (chan >= 0 && chan != dimm->cschannel)
-   chan = -2;
+   if (n_labels >= EDAC_MAX_LABELS) {
+   e->enable_per_layer_report = false;
+   break;
+   }
+   n_labels++;
+   if (p != e->label) {
+   strcpy(p, OTHER_LABEL);
+   p += strlen(OTHER_LABEL);
}
+   strcpy(p, dimm->label);
+   p += strlen(p);
+   *p = '\0';
+
+   /*
+* get csrow/channel of the DIMM, in order to allow
+* incrementing the compat API counters
+*/
+   edac_dbg(4, "%s csrows map: (%d,%d)\n",
+   mci->csbased ? "rank" : "dimm",
+   dimm->csrow, dimm->cschannel);
+   if (row == -1)
+   row = dimm->csrow;
+   else if (row >= 0 && row != dimm->csrow)
+   row = -2;
+
+   if (chan == -1)
+   chan = dimm->cschannel;
+   else if (chan >= 0 && chan != dimm->cschannel)
+   chan = -2;
}
 
if (!e->enable_per_layer_report) {
-- 
2.20.1

[PATCH 10/19] EDAC, mc: Rework edac_raw_mc_handle_error() to use struct dimm_info

2019-10-10 Thread Robert Richter

The error handling functions have the pos[] array argument for
determing the dimm handle. Rework those functions to use the dimm
handle directly.

Signed-off-by: Robert Richter 
---
 drivers/edac/edac_mc.c   | 28 +---
 drivers/edac/edac_mc.h   |  2 ++
 drivers/edac/ghes_edac.c |  6 +-
 3 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 6d880cf4d599..cdfb383f7a35 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -925,11 +925,9 @@ const char *edac_layer_name[] = {
 EXPORT_SYMBOL_GPL(edac_layer_name);
 
 static void edac_inc_ce_error(struct mem_ctl_info *mci,
- const int pos[EDAC_MAX_LAYERS],
+ struct dimm_info *dimm,
  const u16 count)
 {
-   struct dimm_info *dimm = edac_get_dimm(mci, pos[0], pos[1], pos[2]);
-
mci->ce_mc += count;
 
if (dimm)
@@ -939,11 +937,9 @@ static void edac_inc_ce_error(struct mem_ctl_info *mci,
 }
 
 static void edac_inc_ue_error(struct mem_ctl_info *mci,
-   const int pos[EDAC_MAX_LAYERS],
-   const u16 count)
+ struct dimm_info *dimm,
+ const u16 count)
 {
-   struct dimm_info *dimm = edac_get_dimm(mci, pos[0], pos[1], pos[2]);
-
mci->ue_mc += count;
 
if (dimm)
@@ -953,8 +949,8 @@ static void edac_inc_ue_error(struct mem_ctl_info *mci,
 }
 
 static void edac_ce_error(struct mem_ctl_info *mci,
+ struct dimm_info *dimm,
  const u16 error_count,
- const int pos[EDAC_MAX_LAYERS],
  const char *msg,
  const char *location,
  const char *label,
@@ -982,7 +978,7 @@ static void edac_ce_error(struct mem_ctl_info *mci,
   error_count, msg, msg_aux, label,
   location, detail);
}
-   edac_inc_ce_error(mci, pos, error_count);
+   edac_inc_ce_error(mci, dimm, error_count);
 
if (mci->scrub_mode == SCRUB_SW_SRC) {
/*
@@ -1006,8 +1002,8 @@ static void edac_ce_error(struct mem_ctl_info *mci,
 }
 
 static void edac_ue_error(struct mem_ctl_info *mci,
+ struct dimm_info *dimm,
  const u16 error_count,
- const int pos[EDAC_MAX_LAYERS],
  const char *msg,
  const char *location,
  const char *label,
@@ -1041,15 +1037,15 @@ static void edac_ue_error(struct mem_ctl_info *mci,
  msg, msg_aux, label, location, detail);
}
 
-   edac_inc_ue_error(mci, pos, error_count);
+   edac_inc_ue_error(mci, dimm, error_count);
 }
 
 void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
  struct mem_ctl_info *mci,
+ struct dimm_info *dimm,
  struct edac_raw_error_desc *e)
 {
char detail[80];
-   int pos[EDAC_MAX_LAYERS] = { e->top_layer, e->mid_layer, e->low_layer };
 
/* Memory type dependent details about the error */
if (type == HW_EVENT_ERR_CORRECTED) {
@@ -1057,7 +1053,7 @@ void edac_raw_mc_handle_error(const enum 
hw_event_mc_err_type type,
"page:0x%lx offset:0x%lx grain:%ld syndrome:0x%lx",
e->page_frame_number, e->offset_in_page,
e->grain, e->syndrome);
-   edac_ce_error(mci, e->error_count, pos, e->msg, e->location,
+   edac_ce_error(mci, dimm, e->error_count, e->msg, e->location,
  e->label, detail, e->other_detail,
  e->page_frame_number, e->offset_in_page, 
e->grain);
} else {
@@ -1065,7 +1061,7 @@ void edac_raw_mc_handle_error(const enum 
hw_event_mc_err_type type,
"page:0x%lx offset:0x%lx grain:%ld",
e->page_frame_number, e->offset_in_page, e->grain);
 
-   edac_ue_error(mci, e->error_count, pos, e->msg, e->location,
+   edac_ue_error(mci, dimm, e->error_count, e->msg, e->location,
  e->label, detail, e->other_detail);
}
 
@@ -1245,6 +1241,8 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type 
type,
   (e->page_frame_number << PAGE_SHIFT) | 
e->offset_in_page,
   grain_bits, e->syndrome, e->other_detail);
 
-   edac_raw_mc_handle_error(type, mci, e);
+   dimm = edac_get_dimm(mci, top_layer, mid_layer, low_layer);
+
+   ed

[PATCH 03/19] EDAC: Introduce mci_for_each_dimm() iterator

2019-10-10 Thread Robert Richter

Introduce the mci_for_each_dimm() iterator. It returns a pointer to a
struct dimm_info. This makes the declaration and use of an index
obsolete and avoids access to internal data of struct mci (direct
array access etc).

Signed-off-by: Robert Richter 
---
 drivers/edac/edac_mc.c   | 19 +++
 drivers/edac/edac_mc_sysfs.c | 29 -
 drivers/edac/ghes_edac.c |  8 
 drivers/edac/i5100_edac.c| 13 +
 include/linux/edac.h |  7 +++
 5 files changed, 39 insertions(+), 37 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 72088d49b03b..c5240bb4c6c0 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -145,15 +145,18 @@ static void edac_mc_dump_channel(struct rank_info *chan)
edac_dbg(4, "channel->dimm = %p\n", chan->dimm);
 }
 
-static void edac_mc_dump_dimm(struct dimm_info *dimm, int number)
+static void edac_mc_dump_dimm(struct dimm_info *dimm)
 {
char location[80];
 
+   if (!dimm->nr_pages)
+   return;
+
edac_dimm_info_location(dimm, location, sizeof(location));
 
edac_dbg(4, "%s%i: %smapped as virtual row %d, chan %d\n",
 dimm->mci->csbased ? "rank" : "dimm",
-number, location, dimm->csrow, dimm->cschannel);
+dimm->idx, location, dimm->csrow, dimm->cschannel);
edac_dbg(4, "  dimm = %p\n", dimm);
edac_dbg(4, "  dimm->label = '%s'\n", dimm->label);
edac_dbg(4, "  dimm->nr_pages = 0x%x\n", dimm->nr_pages);
@@ -702,6 +705,7 @@ EXPORT_SYMBOL_GPL(edac_get_owner);
 int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
   const struct attribute_group **groups)
 {
+   struct dimm_info *dimm;
int ret = -EINVAL;
edac_dbg(0, "\n");
 
@@ -726,9 +730,9 @@ int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
if (csrow->channels[j]->dimm->nr_pages)

edac_mc_dump_channel(csrow->channels[j]);
}
-   for (i = 0; i < mci->tot_dimms; i++)
-   if (mci->dimms[i]->nr_pages)
-   edac_mc_dump_dimm(mci->dimms[i], i);
+
+   mci_for_each_dimm(mci, dimm)
+   edac_mc_dump_dimm(dimm);
}
 #endif
mutex_lock(_ctls_mutex);
@@ -1086,6 +1090,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type 
type,
  const char *msg,
  const char *other_detail)
 {
+   struct dimm_info *dimm;
char *p;
int row = -1, chan = -1;
int pos[EDAC_MAX_LAYERS] = { top_layer, mid_layer, low_layer };
@@ -1146,9 +1151,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type 
type,
p = e->label;
*p = '\0';
 
-   for (i = 0; i < mci->tot_dimms; i++) {
-   struct dimm_info *dimm = mci->dimms[i];
-
+   mci_for_each_dimm(mci, dimm) {
if (top_layer >= 0 && top_layer != dimm->location[0])
continue;
if (mid_layer >= 0 && mid_layer != dimm->location[1])
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 91e4c8f155af..0367554e7437 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -621,8 +621,7 @@ static const struct device_type dimm_attr_type = {
 
 /* Create a DIMM object under specifed memory controller device */
 static int edac_create_dimm_object(struct mem_ctl_info *mci,
-  struct dimm_info *dimm,
-  int index)
+  struct dimm_info *dimm)
 {
int err;
dimm->mci = mci;
@@ -632,9 +631,9 @@ static int edac_create_dimm_object(struct mem_ctl_info *mci,
 
dimm->dev.parent = >dev;
if (mci->csbased)
-   dev_set_name(>dev, "rank%d", index);
+   dev_set_name(>dev, "rank%d", dimm->idx);
else
-   dev_set_name(>dev, "dimm%d", index);
+   dev_set_name(>dev, "dimm%d", dimm->idx);
dev_set_drvdata(>dev, dimm);
pm_runtime_forbid(>dev);
 
@@ -916,7 +915,8 @@ static const struct device_type mci_attr_type = {
 int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
 const struct attribute_group **groups)
 {
-   int i, err;
+   struct dimm_info *dimm;
+   int err;
 
/* get the /sys/devices/system/edac subsys reference */
mci->dev.type = _attr_type;
@@ -940,13 +940,12 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
/*
 *

[PATCH 04/19] EDAC, mc: Do not BUG_ON() in edac_mc_alloc()

2019-10-10 Thread Robert Richter

No need to crash the system in case edac_mc_alloc() is called with
invalid arguments, just warn and return. This would cause a checkpatch
warning when touching the code later, so just fix it.

Signed-off-by: Robert Richter 
---
 drivers/edac/edac_mc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index c5240bb4c6c0..f2cbca77bc50 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -323,7 +323,9 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
int i, j, row, chn, n, len;
bool per_rank = false;
 
-   BUG_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0);
+   if (WARN_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0))
+   return NULL;
+
/*
 * Calculate the total amount of dimms and csrows/cschannels while
 * in the old API emulation mode
-- 
2.20.1

[PATCH 01/19] EDAC: Replace EDAC_DIMM_PTR() macro with edac_get_dimm() function

2019-10-10 Thread Robert Richter

The EDAC_DIMM_PTR() macro takes 3 arguments from struct mem_ctl_info.
Clean up this interface to only pass the mci struct and replace this
macro with the new function edac_get_dimm().

Also introduce the edac_get_dimm_by_index() function for later use.
This allows it to get a dimm pointer only by a given index. This can
be useful if the dimm's position within the layers of the memory
controller or the exact size of the layers are unknown.

Small style changes made for some hunks after applying the semantic
patch.

Semantic patch used:

@@ expression mci, a, b,c; @@

-EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, a, b, c)
+edac_get_dimm(mci, a, b, c)

Signed-off-by: Robert Richter 
---
 drivers/edac/edac_mc.c  |  1 +
 drivers/edac/ghes_edac.c|  7 +--
 drivers/edac/i10nm_base.c   |  3 +-
 drivers/edac/i3200_edac.c   |  3 +-
 drivers/edac/i5000_edac.c   |  5 +-
 drivers/edac/i5100_edac.c   |  3 +-
 drivers/edac/i5400_edac.c   |  3 +-
 drivers/edac/i7300_edac.c   |  3 +-
 drivers/edac/i7core_edac.c  |  3 +-
 drivers/edac/ie31200_edac.c |  7 +--
 drivers/edac/pnd2_edac.c|  4 +-
 drivers/edac/sb_edac.c  |  2 +-
 drivers/edac/skx_base.c |  3 +-
 drivers/edac/ti_edac.c  |  2 +-
 include/linux/edac.h| 92 -
 15 files changed, 79 insertions(+), 62 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index e6fd079783bd..3d45adb5957f 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -438,6 +438,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
goto error;
mci->dimms[off] = dimm;
dimm->mci = mci;
+   dimm->idx = off;
 
/*
 * Copy DIMM location and initialize it.
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index d413a0bdc9ad..fb31e80dfe94 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -98,9 +98,7 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, 
void *arg)
 
if (dh->type == DMI_ENTRY_MEM_DEVICE) {
struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
-   struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
-  mci->n_layers,
-  dimm_fill->count, 0, 0);
+   struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->count, 
0, 0);
u16 rdr_mask = BIT(7) | BIT(13);
 
if (entry->size == 0x) {
@@ -527,8 +525,7 @@ int ghes_edac_register(struct ghes *ghes, struct device 
*dev)
dimm_fill.mci = mci;
dmi_walk(ghes_edac_dmidecode, _fill);
} else {
-   struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
-  mci->n_layers, 0, 0, 0);
+   struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
 
dimm->nr_pages = 1;
dimm->grain = 128;
diff --git a/drivers/edac/i10nm_base.c b/drivers/edac/i10nm_base.c
index c370d5457e6b..059eccf0582b 100644
--- a/drivers/edac/i10nm_base.c
+++ b/drivers/edac/i10nm_base.c
@@ -154,8 +154,7 @@ static int i10nm_get_dimm_config(struct mem_ctl_info *mci)
 
ndimms = 0;
for (j = 0; j < I10NM_NUM_DIMMS; j++) {
-   dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
-mci->n_layers, i, j, 0);
+   dimm = edac_get_dimm(mci, i, j, 0);
mtr = I10NM_GET_DIMMMTR(imc, i, j);
mcddrtcfg = I10NM_GET_MCDDRTCFG(imc, i, j);
edac_dbg(1, "dimmmtr 0x%x mcddrtcfg 0x%x (mc%d ch%d 
dimm%d)\n",
diff --git a/drivers/edac/i3200_edac.c b/drivers/edac/i3200_edac.c
index 299b441647cd..432b375a4075 100644
--- a/drivers/edac/i3200_edac.c
+++ b/drivers/edac/i3200_edac.c
@@ -392,8 +392,7 @@ static int i3200_probe1(struct pci_dev *pdev, int dev_idx)
unsigned long nr_pages;
 
for (j = 0; j < nr_channels; j++) {
-   struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, 
mci->dimms,
-  mci->n_layers, 
i, j, 0);
+   struct dimm_info *dimm = edac_get_dimm(mci, i, j, 0);
 
nr_pages = drb_to_nr_pages(drbs, stacked, j, i);
if (nr_pages == 0)
diff --git a/drivers/edac/i5000_edac.c b/drivers/edac/i5000_edac.c
index 078a7351bf05..1a6f69c859ab 100644
--- a/drivers/edac/i5000_edac.c
+++ b/drivers/edac/i5000_edac.c
@@ -1275,9 +1275,8 @@ static int i5000_init_csrows(struct mem_ctl_info *mci)
if (!MTR_DIMMS_PRESENT(mtr))
continue

[PATCH 00/19] EDAC: Rework edac_mc and ghes drivers

2019-10-10 Thread Robert Richter

This patch set is a rework of the ghes_edac and edac_mc driver. It
addresses issues found during code review and while working with the
code. The changes include:

 * improve function interfaces and data structures to decrease
   complexity such as number of function arguments, unused data, etc.
   (#1, #2, #6, #10, #12, #13),

 * add helper functions and factor out code (#3, #8, #9, #14)

 * fix style issues found by checkpatch (#4)

 * improve code readability (#5, #7, #11)

 * use of standard kernel macros (#15)

 * code unification (#16, #17, #18)

 * documentation updates (#19)

This series contains some patches that have been posted and reviewed
earlier here:

 https://lore.kernel.org/patchwork/cover/1093488/

Review comments have been addressed and the following changes have
been made to those patches:

 * rebased onto ras:edac-for-next (v5.4-rc1 based)

 * reworded patch descriptions

 * dropped hunk that only reorders code in "EDAC, ghes: Remove
   pvt->detail_location string"

 * updated description for edac_get_dimm_by_index()

 * updated description for edac_get_dimm()

 * small style changes in "EDAC: Replace EDAC_DIMM_PTR() macro with
   edac_get_dimm() function"

 * use unsigned type for dimm->idx to avoid need for negative range
   check

 * fixed mci_for_each_dimm() loop in i5100_init_csrows()

 * fixed off-by-one in mci_for_each_dimm()


Robert Richter (19):
  EDAC: Replace EDAC_DIMM_PTR() macro with edac_get_dimm() function
  EDAC: Remove EDAC_DIMM_OFF() macro
  EDAC: Introduce mci_for_each_dimm() iterator
  EDAC, mc: Do not BUG_ON() in edac_mc_alloc()
  EDAC, mc: Reduce indentation level in edac_mc_handle_error()
  EDAC, mc: Remove per layer counters
  EDAC, mc: Rename iterator variable to idx
  EDAC, mc: Split edac_mc_alloc() into smaller functions
  EDAC, mc: Reorder functions edac_mc_alloc*()
  EDAC, mc: Rework edac_raw_mc_handle_error() to use struct dimm_info
  EDAC: Remove misleading comment in struct edac_raw_error_desc
  EDAC: Store error type in struct edac_raw_error_desc
  EDAC, mc: Determine mci pointer from the error descriptor
  EDAC, mc: Create new function edac_inc_csrow()
  EDAC, ghes: Use standard kernel macros for page calculations
  EDAC, ghes: Fix grain calculation
  EDAC, ghes: Remove intermediate buffer pvt->detail_location
  EDAC, ghes: Unify trace_mc_event() code with edac_mc driver
  EDAC, Documentation: Describe CPER module definition and DIMM ranks

 Documentation/admin-guide/ras.rst |  31 +-
 drivers/edac/edac_mc.c| 478 +++---
 drivers/edac/edac_mc.h|   8 +-
 drivers/edac/edac_mc_sysfs.c  |  65 ++--
 drivers/edac/ghes_edac.c  |  55 ++--
 drivers/edac/i10nm_base.c |   3 +-
 drivers/edac/i3200_edac.c |   3 +-
 drivers/edac/i5000_edac.c |   5 +-
 drivers/edac/i5100_edac.c |  14 +-
 drivers/edac/i5400_edac.c |   3 +-
 drivers/edac/i7300_edac.c |   3 +-
 drivers/edac/i7core_edac.c|   3 +-
 drivers/edac/ie31200_edac.c   |   7 +-
 drivers/edac/pnd2_edac.c  |   4 +-
 drivers/edac/sb_edac.c|   2 +-
 drivers/edac/skx_base.c   |   3 +-
 drivers/edac/ti_edac.c|   2 +-
 include/linux/edac.h  | 153 +-
 18 files changed, 401 insertions(+), 441 deletions(-)

-- 
2.20.1

Re: [PATCH] PCI: Enhance the ACS quirk for Cavium devices

2019-10-08 Thread Robert Richter

On 04.10.19 14:48:13, Bjorn Helgaas wrote:
> commit 37b22fbfec2d
> Author: George Cherian 
> Date:   Thu Sep 19 02:43:34 2019 +
> 
> PCI: Apply Cavium ACS quirk to CN99xx and CN11xxx Root Ports
> 
> Add an array of Cavium Root Port device IDs and apply the quirk only to 
> the
> listed devices.
> 
> Instead of applying the quirk to all Root Ports where
> "(dev->device & 0xf800) == 0xa000", apply it only to CN88xx 0xa180 and
> 0xa170 Root Ports.

No, this can't be removed. It is a match all for all CN8xxx variants
(note the 3 'x', all TX1 cores). So all device ids from 0xa000 to
0xa7FF are affected here and need the quirk.

> 
> Also apply the quirk to CN99xx (0xaf84) and CN11xxx (0xb884) Root Ports.

I thought the quirk is CN8xxx specific, but I could be wrong here.

-Robert

> 
> Link: 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_r_20190919024319.GA8792-40dc5-2Deodlnx05.marvell.com=DwIBAg=nKjWec2b6R0mOyPaz7xtfQ=8vKOpC26NZGzQPAMiIlimxyEGCRSJiq-j8yyjPJ6VZ4=Vmml-rx3t63ZbbXZ0XaESAM9yAlexE29R-giTbcj4Qk=57jKIj8BAydbLpftLt5Ssva7vD6GuoCaIpjTi-sB5kU=
>  
> Fixes: f2ddaf8dfd4a ("PCI: Apply Cavium ThunderX ACS quirk to more Root 
> Ports")
> Fixes: b404bcfbf035 ("PCI: Add ACS quirk for all Cavium devices")
> Signed-off-by: George Cherian 
> Signed-off-by: Bjorn Helgaas 
> Cc: sta...@vger.kernel.org  # v4.12+
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 320255e5e8f8..4e5048cb5ec6 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4311,17 +4311,24 @@ static int pci_quirk_amd_sb_acs(struct pci_dev *dev, 
> u16 acs_flags)
>  #endif
>  }
>  
> +static const u16 pci_quirk_cavium_acs_ids[] = {
> + 0xa180, 0xa170, /* CN88xx family of devices */
> + 0xaf84, /* CN99xx family of devices */
> + 0xb884, /* CN11xxx family of devices */
> +};
> +
>  static bool pci_quirk_cavium_acs_match(struct pci_dev *dev)
>  {
> - /*
> -  * Effectively selects all downstream ports for whole ThunderX 1
> -  * family by 0xf800 mask (which represents 8 SoCs), while the lower
> -  * bits of device ID are used to indicate which subdevice is used
> -  * within the SoC.
> -  */
> - return (pci_is_pcie(dev) &&
> - (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) &&
> - ((dev->device & 0xf800) == 0xa000));
> + int i;
> +
> + if (!pci_is_pcie(dev) || pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT)
> + return false;
> +
> + for (i = 0; i < ARRAY_SIZE(pci_quirk_cavium_acs_ids); i++)
> + if (pci_quirk_cavium_acs_ids[i] == dev->device)
> + return true;
> +
> + return false;
>  }
>  
>  static int pci_quirk_cavium_acs(struct pci_dev *dev, u16 acs_flags)

Re: [PATCH v4 1/2] edac: Add an API for edac device to report for multiple errors

2019-10-01 Thread Robert Richter

On 01.10.19 12:25:39, Borislav Petkov wrote:
> On Tue, Oct 01, 2019 at 09:47:07AM +0000, Robert Richter wrote:
> > If you move to static inline for edac_device_handle_{ce,ue} the
> > symbols vanish and this breaks the abi. That's why the split in two
> > patches.
> 
> ABI issues do not concern upstream. And that coming from me working at a
> company who dance a lot to make ABI happy.
> 
> Also, I'm missing the reasoning why you use the ABI as an argument at
> all: do you know of a particular case where people are thinking of
> backporting this or this is all hypothetical.
> 
> > Your comment to not have a __ version as a third variant of the
> > interface makes sense to me. But to keep ABI your patch still needs to
> > be split.
> 
> Not really - normally, when you fix ABI issues with symbols
> disappearing, all of a sudden, you add dummy ones so that the ABI
> checker is happy.

Let's go with a single patch then and the function naming you
suggested before.

Thanks,

-Robert

Re: [PATCH v4 1/2] edac: Add an API for edac device to report for multiple errors

2019-10-01 Thread Robert Richter

On 01.10.19 10:32:42, Borislav Petkov wrote:

> --
> On Tue, Oct 01, 2019 at 06:56:58AM +, Robert Richter wrote:
> > It is *not* the counterpart. The __* version already has the...
> 
> Lemme cut to the chase:
> 
> "Make the main workhorse the "count" functions which can log a @count
> of errors. Have the current APIs edac_device_handle_{ce,ue}() call
> the _count() variants and this way keep the exported symbols number
> unchanged."
> 
> I want to see exactly *two* pairs of functions:
> 
> edac_device_handle_{ce,ue}_count  - logs a @count of errors
> edac_device_handle_{ce,ue}- logs a single error
> 
> Not three pairs. I.e., the "__" versions are not needed.
> 
> > The first patch only adds functionality but keeps the abi. Thus it
> > makes a backport easier.
> 
> Works just the same with my version - single backport.

If you move to static inline for edac_device_handle_{ce,ue} the
symbols vanish and this breaks the abi. That's why the split in two
patches.

Your comment to not have a __ version as a third variant of the
interface makes sense to me. But to keep ABI your patch still needs to
be split. The first patch still must contain symbols for
edac_device_handle_{ce,ue}. I am not sure if ABI compatability is
something we want to make easier here. I personally like the approach.

-Robert

Re: [PATCH v4 1/2] edac: Add an API for edac device to report for multiple errors

2019-10-01 Thread Robert Richter

On 30.09.19 16:50:46, Borislav Petkov wrote:
> --
> On Mon, Sep 23, 2019 at 08:17:40PM +0100, Hanna Hawa wrote:
> > +void edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
> > +   int inst_nr, int block_nr, const char *msg)
> > +{
> > +   __edac_device_handle_ce(edac_dev, 1, inst_nr, block_nr, msg);
> > +}
> > +EXPORT_SYMBOL_GPL(edac_device_handle_ce);
> 
> Eww, I don't like that: exporting the function *and* the __ counterpart.
> The user will get confused and that is unnecessary.

It is *not* the counterpart. The __* version already has the
additional count argument in. There are 2 patches:

1) introduce new function __edac_device_handle_ce/ue() including the
   *_count() inline functions, but keep the old symbols (note the
   count=1 parameter).

2) remove old symbol edac_device_handle_ce/ue() and replace it with an
   inline function to use the counterpart symbol too.

The first patch only adds functionality but keeps the abi. Thus it
makes a backport easier. The 2nd switches completely to the new
function.

-Robert

Re: [PATCH v4 0/2] Add an API for edac device, for mulriple errors

2019-09-24 Thread Robert Richter

On 23.09.19 20:17:39, Hanna Hawa wrote:
> Add an API for EDAC device to report for multiple errors, and move the
> old report function to use the new API.
> 
> Changes from v3:
> 
> - Move count check to inline function
> - Fix count variable describtion
>   (Reported-by: kbuild test robot )

Looks good to me. If another respin happens, please fix the 80 char
limitation for the static inline functions, you could line break after
the type definition.

Thanks,

-Robert

Re: [PATCH] EDAC: Armada XP: Use devm_platform_ioremap_resource() in two functions

2019-09-23 Thread Robert Richter

Hi Markus,

On 21.09.19 17:57:24, Markus Elfring wrote:
> From: Markus Elfring 
> Date: Sat, 21 Sep 2019 17:50:17 +0200
> 
> Simplify these function implementations by using a known function.
> 
> This issue was detected by using the Coccinelle software.

Which semantic patch did you use here?

The kernel's spatch for this pattern only found and fixed the
following:

 $ make coccicheck 
COCCI=scripts/coccinelle/api/devm_platform_ioremap_resource.cocci MODE=patch 
M=drivers/edac | patch -p1
 patching file drivers/edac/ti_edac.c
 patching file drivers/edac/xgene_edac.c
 patching file drivers/edac/synopsys_edac.c

There are probably more drivers to fix than the above and the one you
fixed:

 $ git grep -l platform_get_resource drivers/edac/
 drivers/edac/altera_edac.c
 drivers/edac/armada_xp_edac.c
 drivers/edac/aspeed_edac.c
 drivers/edac/bluefield_edac.c
 drivers/edac/cpc925_edac.c
 drivers/edac/highbank_l2_edac.c
 drivers/edac/highbank_mc_edac.c
 drivers/edac/mv64x60_edac.c
 drivers/edac/synopsys_edac.c
 drivers/edac/ti_edac.c
 drivers/edac/xgene_edac.c

So while at it, how about fixing the .cocci patch in scripts/ and run
it for drivers/edac? There should be one patch only for all edac
drivers.

Thanks,

-Robert

Re: [PATCH 07/32] x86: Use pr_warn instead of pr_warning

2019-09-20 Thread Robert Richter

On 20.09.19 14:25:19, Kefeng Wang wrote:
> As said in commit f2c2cbcc35d4 ("powerpc: Use pr_warn instead of
> pr_warning"), removing pr_warning so all logging messages use a
> consistent _warn style. Let's do it.
> 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: Robert Richter 
> Cc: Darren Hart 
> Cc: Andy Shevchenko 
> Signed-off-by: Kefeng Wang 
> ---
>  arch/x86/kernel/amd_gart_64.c  | 11 ---
>  arch/x86/kernel/apic/apic.c| 41 --
>  arch/x86/kernel/setup_percpu.c |  4 +--
>  arch/x86/kernel/tboot.c| 15 +-
>  arch/x86/kernel/tsc_sync.c |  8 ++---
>  arch/x86/kernel/umip.c |  6 ++--
>  arch/x86/mm/kmmio.c|  7 ++---
>  arch/x86/mm/mmio-mod.c |  6 ++--
>  arch/x86/mm/numa_emulation.c   |  4 +--
>  arch/x86/mm/testmmiotrace.c|  6 ++--
>  arch/x86/oprofile/op_x86_model.h   |  6 ++--

For oprofile:

Acked-by: Robert Richter 

But see below:

>  arch/x86/platform/olpc/olpc-xo15-sci.c |  2 +-
>  arch/x86/platform/sfi/sfi.c|  3 +-
>  arch/x86/xen/setup.c   |  2 +-
>  14 files changed, 57 insertions(+), 64 deletions(-)
> 
> diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
> index a585ea6f686a..53545c9c7cad 100644
> --- a/arch/x86/kernel/amd_gart_64.c
> +++ b/arch/x86/kernel/amd_gart_64.c

> @@ -665,7 +664,7 @@ static __init int init_amd_gatt(struct agp_kern_info 
> *info)
>  
>   nommu:
>   /* Should not happen anymore */
> - pr_warning("PCI-DMA: More than 4GB of RAM and no IOMMU\n"
> + pr_warn("PCI-DMA: More than 4GB of RAM and no IOMMU\n"
>  "falling back to iommu=soft.\n");

This indentation should be fixed too, while at it.

>   return -1;
>  }

-Robert

Re: [PATCH 17/32] oprofile: Use pr_warn instead of pr_warning

2019-09-20 Thread Robert Richter

On 20.09.19 14:25:29, Kefeng Wang wrote:
> As said in commit f2c2cbcc35d4 ("powerpc: Use pr_warn instead of
> pr_warning"), removing pr_warning so all logging messages use a
> consistent _warn style. Let's do it.
> 
> Cc: Robert Richter 
> Signed-off-by: Kefeng Wang 
> ---
>  drivers/oprofile/oprofile_perf.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)

Acked-by: Robert Richter

Re: [PATCH v3 2/2] edac: move edac_device_handle_*() API functions to header

2019-09-20 Thread Robert Richter

On 19.09.19 18:17:13, Hanna Hawa wrote:
> Move edac_device_handle_*() functions from source file to header file as
> inline funtcion that use the new API with single error.
> 
> Signed-off-by: Hanna Hawa 

With the changes below it looks good to me:

Acked-by: Robert Richter 

Thanks,

-Robert

> diff --git a/drivers/edac/edac_device.h b/drivers/edac/edac_device.h
> index 30dc5f5979c8..796ea134a691 100644
> --- a/drivers/edac/edac_device.h
> +++ b/drivers/edac/edac_device.h
> @@ -285,29 +285,6 @@ extern int edac_device_add_device(struct 
> edac_device_ctl_info *edac_dev);
>   */
>  extern struct edac_device_ctl_info *edac_device_del_device(struct device 
> *dev);
>  
> -/**
> - * edac_device_handle_ue():
> - *   perform a common output and handling of an 'edac_dev' UE event
> - *
> - * @edac_dev: pointer to struct _device_ctl_info
> - * @inst_nr: number of the instance where the UE error happened
> - * @block_nr: number of the block where the UE error happened
> - * @msg: message to be printed
> - */
> -extern void edac_device_handle_ue(struct edac_device_ctl_info *edac_dev,
> - int inst_nr, int block_nr, const char *msg);
> -/**
> - * edac_device_handle_ce():
> - *   perform a common output and handling of an 'edac_dev' CE event
> - *
> - * @edac_dev: pointer to struct _device_ctl_info
> - * @inst_nr: number of the instance where the CE error happened
> - * @block_nr: number of the block where the CE error happened
> - * @msg: message to be printed
> - */
> -extern void edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
> - int inst_nr, int block_nr, const char *msg);
> -

Just put in the inline replacement here.

>  /**
>   * edac_device_alloc_index: Allocate a unique device index number
>   *
> @@ -357,4 +334,18 @@ static inline void edac_device_handle_ue_count(struct 
> edac_device_ctl_info *edac
>  {
>   __edac_device_handle_ue(edac_dev, count, inst_nr, block_nr, msg);
>  }
> +
> +static inline void edac_device_handle_ce(struct edac_device_ctl_info 
> *edac_dev,
> +  int inst_nr, int block_nr,

No need for this linebreak.

> +  const char *msg)
> +{
> + __edac_device_handle_ce(edac_dev, 1, inst_nr, block_nr, msg);
> +}
> +
> +static inline void edac_device_handle_ue(struct edac_device_ctl_info 
> *edac_dev,
> +  int inst_nr, int block_nr,

Same here.

> +  const char *msg)
> +{
> + __edac_device_handle_ue(edac_dev, 1, inst_nr, block_nr, msg);
> +}
>  #endif
> -- 
> 2.17.1
>

Re: [PATCH v3 1/2] edac: Add an API for edac device to report for multiple errors

2019-09-20 Thread Robert Richter

On 19.09.19 18:17:12, Hanna Hawa wrote:
> Add an API for EDAC device to report multiple errors with same type.
> 
> Signed-off-by: Hanna Hawa 

With the change below it looks good to me:

Acked-by: Robert Richter 

Thanks,

-Robert

> ---
>  drivers/edac/edac_device.c | 62 ++
>  drivers/edac/edac_device.h | 40 
>  2 files changed, 82 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
> index 65cf2b9355c4..866934f2bcb0 100644
> --- a/drivers/edac/edac_device.c
> +++ b/drivers/edac/edac_device.c
> @@ -555,12 +555,16 @@ static inline int edac_device_get_panic_on_ue(struct 
> edac_device_ctl_info
>   return edac_dev->panic_on_ue;
>  }
>  
> -void edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
> - int inst_nr, int block_nr, const char *msg)
> +void __edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
> +  unsigned int count, int inst_nr, int block_nr,
> +  const char *msg)
>  {
>   struct edac_device_instance *instance;
>   struct edac_device_block *block = NULL;
>  
> + if (!count)
> + return;
> +

Those checks should be moved to the *_count() variants of both
functions.

[...]

> +static inline void edac_device_handle_ce_count(struct edac_device_ctl_info 
> *edac_dev,
> +unsigned int count, int inst_nr,
> +int block_nr, const char *msg)
> +{

if (count)
...

> + __edac_device_handle_ce(edac_dev, count, inst_nr, block_nr, msg);
> +}
> +
> +static inline void edac_device_handle_ue_count(struct edac_device_ctl_info 
> *edac_dev,
> +unsigned int count, int inst_nr,
> +int block_nr, const char *msg)
> +{

Here too.

> + __edac_device_handle_ue(edac_dev, count, inst_nr, block_nr, msg);
> +}

Re: [PATCH v2 1/2] edac: Add an API for edac device to report for multiple errors

2019-09-19 Thread Robert Richter

On 12.09.19 15:53:04, Hanna Hawa wrote:
> Add an API for EDAC device to report multiple errors with same type.
> 
> Signed-off-by: Hanna Hawa 
> ---
>  drivers/edac/edac_device.c | 91 ++
>  drivers/edac/edac_device.h | 40 +
>  2 files changed, 131 insertions(+)
> 
> diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
> index 65cf2b9355c4..78ac44103acc 100644
> --- a/drivers/edac/edac_device.c
> +++ b/drivers/edac/edac_device.c
> @@ -643,3 +643,94 @@ void edac_device_handle_ue(struct edac_device_ctl_info 
> *edac_dev,
>   block ? block->name : "N/A", msg);
>  }
>  EXPORT_SYMBOL_GPL(edac_device_handle_ue);
> +
> +void __edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
> +  unsigned int count, int inst_nr, int block_nr,
> +  const char *msg)
> +{

Please do not add a copy here, instead modify the existing function
and share the code with both, old and new functions.

Thanks,

-Robert

Re: [PATCH 1/1] edac: Add an API for edac device to report for multiple errors

2019-09-10 Thread Robert Richter

On 08.09.19 10:58:31, Hawa, Hanna wrote:
> On 9/5/2019 12:56 PM, Robert Richter wrote:

> > > diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
> > > index 65cf2b9355c4..bf6a4fd9831b 100644
> > > --- a/drivers/edac/edac_device.c
> > > +++ b/drivers/edac/edac_device.c
> > > @@ -555,12 +555,15 @@ static inline int 
> > > edac_device_get_panic_on_ue(struct edac_device_ctl_info
> > >   return edac_dev->panic_on_ue;
> > >   }
> > > -void edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
> > > - int inst_nr, int block_nr, const char *msg)
> > > +static void __edac_device_handle_ce(struct edac_device_ctl_info 
> > > *edac_dev,
> > > +u16 error_count, int inst_nr, int block_nr,
> > 
> > Just curious, why u16, some register mask size? Maybe just use unsigned int?
> 
> Wanted to be aligned with edac MC.
> I can change it to be u32.

There is no specific reason for u32 either. This code is generic for
many machines and compilers, so unsigned int seems to be the best
choice here to get an optimum.

> > I think the variable can be shortened to 'count', the meaning should
> > still be clear.
> 
> I think more clear to include 'error'.
> maybe shorter name 'err_count'?

IMO 'count' is clear here, 'error' isn't. I prefer short names for
local variables and arguments.

> > > +void edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
> > > +int inst_nr, int block_nr, const char *msg)
> > > +{
> > > + __edac_device_handle_ce(edac_dev, 1, inst_nr, block_nr, msg);
> > >   }
> > >   EXPORT_SYMBOL_GPL(edac_device_handle_ce);
> > 
> > We could just export the __*() version of those functions and make
> > everything else inline in the header file? Though, better do this with
> > two patches to avoid an ABI breakage in case someone wants to backport
> > it. Let's see what others say here.
> 
> Waiting for other reviewers.

If no one else complains I would prefer moving to
__edac_device_handle_* as exported sympbol. But make this change in 2
patches to make backports easy, first add __edac_device_handle_*()
(and introduce the *_count() inline functions), then remove
edac_device_handle_*() entirely.

Thanks,

-Robert

Re: [PATCH 1/1] edac: Add an API for edac device to report for multiple errors

2019-09-05 Thread Robert Richter

Hi Hanna,

thanks for the update. See below.

On 05.09.19 09:37:45, Hanna Hawa wrote:
> Add an API for edac device to report multiple errors with same type.
> 
> Signed-off-by: Hanna Hawa 
> ---
>  drivers/edac/edac_device.c | 66 +-
>  drivers/edac/edac_device.h | 31 --
>  2 files changed, 79 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
> index 65cf2b9355c4..bf6a4fd9831b 100644
> --- a/drivers/edac/edac_device.c
> +++ b/drivers/edac/edac_device.c
> @@ -555,12 +555,15 @@ static inline int edac_device_get_panic_on_ue(struct 
> edac_device_ctl_info
>   return edac_dev->panic_on_ue;
>  }
>  
> -void edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
> - int inst_nr, int block_nr, const char *msg)
> +static void __edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
> +u16 error_count, int inst_nr, int block_nr,

Just curious, why u16, some register mask size? Maybe just use unsigned int?

I think the variable can be shortened to 'count', the meaning should
still be clear.

> +const char *msg)
>  {
>   struct edac_device_instance *instance;
>   struct edac_device_block *block = NULL;
>  
> + WARN_ON(!error_count);

Should return in this case.

Better use WARN_ON_ONCE() to avoid flooding.

The check should be moved to edac_device_handle_ce_count().

> +
>   if ((inst_nr >= edac_dev->nr_instances) || (inst_nr < 0)) {
>   edac_device_printk(edac_dev, KERN_ERR,
>   "INTERNAL ERROR: 'instance' out of range "
> @@ -582,27 +585,44 @@ void edac_device_handle_ce(struct edac_device_ctl_info 
> *edac_dev,
>  
>   if (instance->nr_blocks > 0) {
>   block = instance->blocks + block_nr;
> - block->counters.ce_count++;
> + block->counters.ce_count += error_count;
>   }
>  
>   /* Propagate the count up the 'totals' tree */
> - instance->counters.ce_count++;
> - edac_dev->counters.ce_count++;
> + instance->counters.ce_count += error_count;
> + edac_dev->counters.ce_count += error_count;
>  
>   if (edac_device_get_log_ce(edac_dev))
>   edac_device_printk(edac_dev, KERN_WARNING,
> - "CE: %s instance: %s block: %s '%s'\n",
> + "CE: %s instance: %s block: %s count: %d 
> '%s'\n",
>   edac_dev->ctl_name, instance->name,
> - block ? block->name : "N/A", msg);
> + block ? block->name : "N/A", error_count, msg);
> +}
> +
> +void edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
> +int inst_nr, int block_nr, const char *msg)
> +{
> + __edac_device_handle_ce(edac_dev, 1, inst_nr, block_nr, msg);
>  }
>  EXPORT_SYMBOL_GPL(edac_device_handle_ce);

We could just export the __*() version of those functions and make
everything else inline in the header file? Though, better do this with
two patches to avoid an ABI breakage in case someone wants to backport
it. Let's see what others say here.

>  
> -void edac_device_handle_ue(struct edac_device_ctl_info *edac_dev,
> - int inst_nr, int block_nr, const char *msg)
> +void edac_device_handle_ce_count(struct edac_device_ctl_info *edac_dev,
> +  u16 error_count, int inst_nr, int block_nr,
> +  const char *msg)
> +{
> + __edac_device_handle_ce(edac_dev, error_count, inst_nr, block_nr, msg);
> +}
> +EXPORT_SYMBOL_GPL(edac_device_handle_ce_count);
> +
> +static void __edac_device_handle_ue(struct edac_device_ctl_info *edac_dev,
> +u16 error_count, int inst_nr, int block_nr,
> +const char *msg)

All the above applies for this function too.

>  {
>   struct edac_device_instance *instance;
>   struct edac_device_block *block = NULL;
>  
> + WARN_ON(!error_count);
> +
>   if ((inst_nr >= edac_dev->nr_instances) || (inst_nr < 0)) {
>   edac_device_printk(edac_dev, KERN_ERR,
>   "INTERNAL ERROR: 'instance' out of range "
> @@ -624,22 +644,36 @@ void edac_device_handle_ue(struct edac_device_ctl_info 
> *edac_dev,
>  
>   if (instance->nr_blocks > 0) {
>   block = instance->blocks + block_nr;
> - block->counters.ue_count++;
> + block->counters.ue_count += error_count;
>   }
>  
>   /* Propagate the count up the 'totals' tree */
> - instance->counters.ue_count++;
> - edac_dev->counters.ue_count++;
> + instance->counters.ue_count += error_count;
> + edac_dev->counters.ue_count += error_count;
>  
>   if (edac_device_get_log_ue(edac_dev))
>   edac_device_printk(edac_dev, KERN_EMERG,
> - "UE: %s instance: %s block: %s

Re: [PATCH v3 4/4] edac: Add support for Amazon's Annapurna Labs L2 EDAC

2019-09-03 Thread Robert Richter

On 03.09.19 10:58:16, Borislav Petkov wrote:
> On Tue, Sep 03, 2019 at 08:46:24AM +0000, Robert Richter wrote:
> > This is good, but recent practice is also to have all the drivers for
> > the same piece of hardware in a single file, see e.g. thunderx_edac.c.
> > I don't know how detailed this was discussed already, but I would
> > prefer to join the code.
> 
> This is no longer needed anymore. Check out this thread:
> 
> https://lkml.kernel.org/r/1559211329-13098-3-git-send-email-hhh...@amazon.com

Thanks for the pointer, looks good to me.

-Robert

Re: [PATCH v3 4/4] edac: Add support for Amazon's Annapurna Labs L2 EDAC

2019-09-03 Thread Robert Richter

On 03.09.19 11:28:41, Hawa, Hanna wrote:
> On 9/3/2019 10:27 AM, Robert Richter wrote:
> > On 15.07.19 16:24:09, Hanna Hawa wrote:
> > > Adds support for Amazon's Annapurna Labs L2 EDAC driver to detect and
> > > report L2 errors.
> > > 
> > > Signed-off-by: Hanna Hawa 
> > > ---
> > >   MAINTAINERS   |   6 ++
> > >   drivers/edac/Kconfig  |   8 ++
> > >   drivers/edac/Makefile |   1 +
> > >   drivers/edac/al_l2_edac.c | 187 
> > > ++
> > >   4 files changed, 202 insertions(+)
> > >   create mode 100644 drivers/edac/al_l2_edac.c
> > 
> >  From a brief look at it, it seems some of my comments from 2/4 apply
> > here too. Please look through it.
> 
> Thanks for your review, will look and fix on top of v5.

I have later seen your newer versions posted which haven't made it
into my inbox. But looking at the changelog my comments are still
valid. Thanks for addressing them.

>From the changelog:

"Changes since v1:
-
- Split into two drivers"

This is good, but recent practice is also to have all the drivers for
the same piece of hardware in a single file, see e.g. thunderx_edac.c.
I don't know how detailed this was discussed already, but I would
prefer to join the code.

-Robert

Re: [PATCH v3 4/4] edac: Add support for Amazon's Annapurna Labs L2 EDAC

2019-09-03 Thread Robert Richter

On 15.07.19 16:24:09, Hanna Hawa wrote:
> Adds support for Amazon's Annapurna Labs L2 EDAC driver to detect and
> report L2 errors.
> 
> Signed-off-by: Hanna Hawa 
> ---
>  MAINTAINERS   |   6 ++
>  drivers/edac/Kconfig  |   8 ++
>  drivers/edac/Makefile |   1 +
>  drivers/edac/al_l2_edac.c | 187 
> ++
>  4 files changed, 202 insertions(+)
>  create mode 100644 drivers/edac/al_l2_edac.c

>From a brief look at it, it seems some of my comments from 2/4 apply
here too. Please look through it.

-Robert

Re: [PATCH v3 2/4] edac: Add support for Amazon's Annapurna Labs L1 EDAC

2019-09-03 Thread Robert Richter

On 15.07.19 16:24:07, Hanna Hawa wrote:
> Adds support for Amazon's Annapurna Labs L1 EDAC driver to detect and
> report L1 errors.
> 
> Signed-off-by: Hanna Hawa 
> Reviewed-by: James Morse 
> ---
>  MAINTAINERS   |   6 ++
>  drivers/edac/Kconfig  |   8 +++
>  drivers/edac/Makefile |   1 +
>  drivers/edac/al_l1_edac.c | 156 
> ++
>  4 files changed, 171 insertions(+)
>  create mode 100644 drivers/edac/al_l1_edac.c

> diff --git a/drivers/edac/al_l1_edac.c b/drivers/edac/al_l1_edac.c
> new file mode 100644
> index 000..70510ea
> --- /dev/null
> +++ b/drivers/edac/al_l1_edac.c

[...]

> +static void al_l1_edac_cpumerrsr(void *arg)

Could this being named to something meaningful, such as
*_read_status() or so?

> +{
> + struct edac_device_ctl_info *edac_dev = arg;
> + int cpu, i;
> + u32 ramid, repeat, other, fatal;
> + u64 val = read_sysreg_s(ARM_CA57_CPUMERRSR_EL1);
> + char msg[AL_L1_EDAC_MSG_MAX];
> + int space, count;
> + char *p;
> +
> + if (!(FIELD_GET(ARM_CA57_CPUMERRSR_VALID, val)))
> + return;

[...]

> +static void al_l1_edac_check(struct edac_device_ctl_info *edac_dev)
> +{
> + on_each_cpu(al_l1_edac_cpumerrsr, edac_dev, 1);
> +}
> +
> +static int al_l1_edac_probe(struct platform_device *pdev)
> +{
> + struct edac_device_ctl_info *edac_dev;
> + struct device *dev = >dev;
> + int ret;
> +
> + edac_dev = edac_device_alloc_ctl_info(0, (char *)dev_name(dev), 1, "L",

This type cast looks broken. dev_name() is a constant string already.

Other drivers do not use the dynamically generated dev_name() string
here, instead a fix string such as mod_name or ctl_name could be used.
edac_device_alloc_ctl_info() later generates a unique instance name
derived from name + index.

Regarding the type, this seems to be an API issue of edac_device_
alloc_ctl_info() that should actually use const char* in its
interface. So if needed (from what I wrote above it is not) the type
in the argument list needs to be changed instead.

> +   1, 1, NULL, 0,
> +   edac_device_alloc_index());
> + if (IS_ERR(edac_dev))
> + return -ENOMEM;

Use the original error code instead.

> +
> + edac_dev->edac_check = al_l1_edac_check;
> + edac_dev->dev = dev;
> + edac_dev->mod_name = DRV_NAME;
> + edac_dev->dev_name = dev_name(dev);
> + edac_dev->ctl_name = "L1 cache";

Should not contain spaces and maybe a bit more specific.

> + platform_set_drvdata(pdev, edac_dev);
> +
> + ret = edac_device_add_device(edac_dev);
> + if (ret) {
> + dev_err(dev, "Failed to add L1 edac device\n");

Move this printk below to the error path and maybe print the error
code. You do not cover the -ENOMEM failure.

-Robert

> + goto err;
> + }
> +
> + return 0;
> +err:
> + edac_device_free_ctl_info(edac_dev);
> +
> + return ret;
> +}

[PATCH 5/5] MAINTAINERS: update EDAC's reviewer entry

2019-09-02 Thread Robert Richter

I did some significant work with code in edac_mc.c and ghes_edac.c
already, so I guess I can probably helping out a bit as code reviewer
here.

Signed-off-by: Robert Richter 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 97912d8333a9..79d2ae8519e3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5788,6 +5788,7 @@ M:Borislav Petkov 
 M: Mauro Carvalho Chehab 
 M: Tony Luck 
 R: James Morse 
+R: Robert Richter 
 L: linux-e...@vger.kernel.org
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git 
edac-for-next
 S: Supported
-- 
2.20.1

[PATCH 1/5] EDAC: Prefer 'unsigned int' to bare use of 'unsigned'

2019-09-02 Thread Robert Richter

Use of 'unsigned int' instead of bare use of 'unsigned'. Fix this for
edac_mc*, ghes and the i5100 driver.

Depending on the compiler's warning level it can throw messages like
this:

 WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
 #235: FILE: drivers/edac/i5100_edac.c:854:
 +   const unsigned chan = i5100_csrow_to_chan(mci, dimm->idx);

Signed-off-by: Robert Richter 
---
 drivers/edac/edac_mc.c   | 20 ++--
 drivers/edac/edac_mc.h   |  6 +++---
 drivers/edac/edac_mc_sysfs.c |  6 +++---
 drivers/edac/ghes_edac.c |  2 +-
 drivers/edac/i5100_edac.c| 16 +---
 include/linux/edac.h | 10 +-
 6 files changed, 31 insertions(+), 29 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index c68f62ab54b0..2f1e3ec8e1cc 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -114,8 +114,8 @@ static const struct kernel_param_ops edac_report_ops = {
 
 module_param_cb(edac_report, _report_ops, _report, 0644);
 
-unsigned edac_dimm_info_location(struct dimm_info *dimm, char *buf,
-unsigned len)
+unsigned int edac_dimm_info_location(struct dimm_info *dimm, char *buf,
+   unsigned int len)
 {
struct mem_ctl_info *mci = dimm->mci;
int i, n, count = 0;
@@ -236,9 +236,9 @@ EXPORT_SYMBOL_GPL(edac_mem_types);
  * At return, the pointer 'p' will be incremented to be used on a next call
  * to this function.
  */
-void *edac_align_ptr(void **p, unsigned size, int n_elems)
+void *edac_align_ptr(void **p, unsigned int size, int n_elems)
 {
-   unsigned align, r;
+   unsigned int align, r;
void *ptr = *p;
 
*p += size * n_elems;
@@ -302,10 +302,10 @@ static void _edac_mc_free(struct mem_ctl_info *mci)
kfree(mci);
 }
 
-struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
-  unsigned n_layers,
+struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
+  unsigned int n_layers,
   struct edac_mc_layer *layers,
-  unsigned sz_pvt)
+  unsigned int sz_pvt)
 {
struct mem_ctl_info *mci;
struct edac_mc_layer *layer;
@@ -313,9 +313,9 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
struct rank_info *chan;
struct dimm_info *dimm;
u32 *ce_per_layer[EDAC_MAX_LAYERS], *ue_per_layer[EDAC_MAX_LAYERS];
-   unsigned pos[EDAC_MAX_LAYERS];
-   unsigned size, tot_dimms = 1, count = 1;
-   unsigned tot_csrows = 1, tot_channels = 1, tot_errcount = 0;
+   unsigned int pos[EDAC_MAX_LAYERS];
+   unsigned int size, tot_dimms = 1, count = 1;
+   unsigned int tot_csrows = 1, tot_channels = 1, tot_errcount = 0;
void *pvt, *p, *ptr = NULL;
int i, j, row, chn, n, len, off;
bool per_rank = false;
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index 4165e15995ad..02aac5c61d00 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -122,10 +122,10 @@ do {  
\
  * On success, return a pointer to struct mem_ctl_info pointer;
  * %NULL otherwise
  */
-struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
-  unsigned n_layers,
+struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
+  unsigned int n_layers,
   struct edac_mc_layer *layers,
-  unsigned sz_pvt);
+  unsigned int sz_pvt);
 
 /**
  * edac_get_owner - Return the owner's mod_name of EDAC MC
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 4386ea4b9b5a..640b9419623e 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -200,7 +200,7 @@ static ssize_t channel_dimm_label_show(struct device *dev,
   char *data)
 {
struct csrow_info *csrow = to_csrow(dev);
-   unsigned chan = to_channel(mattr);
+   unsigned int chan = to_channel(mattr);
struct rank_info *rank = csrow->channels[chan];
 
/* if field has not been initialized, there is nothing to send */
@@ -216,7 +216,7 @@ static ssize_t channel_dimm_label_store(struct device *dev,
const char *data, size_t count)
 {
struct csrow_info *csrow = to_csrow(dev);
-   unsigned chan = to_channel(mattr);
+   unsigned int chan = to_channel(mattr);
struct rank_info *rank = csrow->channels[chan];
size_t copy_count = count;
 
@@ -240,7 +240,7 @@ static ssize_t channel_ce_count_show(struct device *dev,
 struct device_attribute *mattr, char *data)
 {
struct csrow_info *csrow = to_csrow(dev);
-   unsigned chan

[PATCH 4/5] EDAC, mc_sysfs: Make debug messages consistent

2019-09-02 Thread Robert Richter

Debug messages are inconsistently used in the error handlers. Some
lack an error message, some are called regardless the return status,
messages for the same error are at different locations in the code
depending on the error code. This happens esp. near put_device()
calls. Make those debug messages more consistent. Additionally, unify
the error messages to have the same terms for the same operations of
the device.

Signed-off-by: Robert Richter 
---
 drivers/edac/edac_mc_sysfs.c | 63 +---
 1 file changed, 29 insertions(+), 34 deletions(-)

diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 309fc24339b0..eaccde3fc1b8 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -278,7 +278,7 @@ static void csrow_attr_release(struct device *dev)
 {
struct csrow_info *csrow = container_of(dev, struct csrow_info, dev);
 
-   edac_dbg(1, "Releasing csrow device %s\n", dev_name(dev));
+   edac_dbg(1, "device %s released\n", dev_name(dev));
kfree(csrow);
 }
 
@@ -414,14 +414,16 @@ static int edac_create_csrow_object(struct mem_ctl_info 
*mci,
dev_set_name(>dev, "csrow%d", index);
dev_set_drvdata(>dev, csrow);
 
-   edac_dbg(0, "creating (virtual) csrow node %s\n",
-dev_name(>dev));
-
err = device_add(>dev);
-   if (err)
+   if (err) {
+   edac_dbg(1, "failure: create device %s\n", 
dev_name(>dev));
put_device(>dev);
+   return err;
+   }
 
-   return err;
+   edac_dbg(0, "device %s created\n", dev_name(>dev));
+
+   return 0;
 }
 
 /* Create a CSROW object under specifed edac_mc_device */
@@ -435,12 +437,8 @@ static int edac_create_csrow_objects(struct mem_ctl_info 
*mci)
if (!nr_pages_per_csrow(csrow))
continue;
err = edac_create_csrow_object(mci, mci->csrows[i], i);
-   if (err < 0) {
-   edac_dbg(1,
-"failure: create csrow objects for csrow %d\n",
-i);
+   if (err < 0)
goto error;
-   }
}
return 0;
 
@@ -624,7 +622,7 @@ static void dimm_attr_release(struct device *dev)
 {
struct dimm_info *dimm = container_of(dev, struct dimm_info, dev);
 
-   edac_dbg(1, "Releasing dimm device %s\n", dev_name(dev));
+   edac_dbg(1, "device %s released\n", dev_name(dev));
kfree(dimm);
 }
 
@@ -653,12 +651,20 @@ static int edac_create_dimm_object(struct mem_ctl_info 
*mci,
pm_runtime_forbid(>dev);
 
err = device_add(>dev);
-   if (err)
+   if (err) {
+   edac_dbg(1, "failure: create device %s\n", 
dev_name(>dev));
put_device(>dev);
+   return err;
+   }
 
-   edac_dbg(0, "created rank/dimm device %s\n", dev_name(>dev));
+   if (IS_ENABLED(CONFIG_EDAC_DEBUG)) {
+   char location[80];
+   edac_dimm_info_location(dimm, location, sizeof(location));
+   edac_dbg(0, "device %s created at location %s\n",
+   dev_name(>dev), location);
+   }
 
-   return err;
+   return 0;
 }
 
 /*
@@ -901,7 +907,7 @@ static void mci_attr_release(struct device *dev)
 {
struct mem_ctl_info *mci = container_of(dev, struct mem_ctl_info, dev);
 
-   edac_dbg(1, "Releasing csrow device %s\n", dev_name(dev));
+   edac_dbg(1, "device %s released\n", dev_name(dev));
kfree(mci);
 }
 
@@ -933,7 +939,6 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
dev_set_drvdata(>dev, mci);
pm_runtime_forbid(>dev);
 
-   edac_dbg(0, "creating device %s\n", dev_name(>dev));
err = device_add(>dev);
if (err < 0) {
edac_dbg(1, "failure: create device %s\n", dev_name(>dev));
@@ -941,6 +946,8 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
return err;
}
 
+   edac_dbg(0, "device %s created\n", dev_name(>dev));
+
/*
 * Create the dimm/rank devices
 */
@@ -950,22 +957,9 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
if (!dimm->nr_pages)
continue;
 
-#ifdef CONFIG_EDAC_DEBUG
-   edac_dbg(1, "creating dimm%d, located at ", i);
-   if (edac_debug_level >= 1) {
-   int lay;
-   for (lay = 0; lay < mci->n_layers; lay++)
-   printk(KERN_CONT "%s %d ",
-   edac_layer_name[mci->layers[lay].type],
-

[PATCH 0/5] EDAC: Small cleanups and fixes

2019-09-02 Thread Robert Richter

A bunch of cleanups and fixes for issues found while working with the
code. Changes are individual and independent from each other. They can
be applied separately (only #4 depends on #3).

Also updating the reviewer's entry as I will be able to do some
reviews for edac code.

Note that patch #3 is an updated version of a patch reviewed before:

 https://lore.kernel.org/patchwork/patch/1093466/

Some references to ml discussions that are related to this series:

 https://lore.kernel.org/patchwork/patch/1093480/#1312590
 https://lore.kernel.org/patchwork/patch/1093466/#1310572

Robert Richter (5):
  EDAC: Prefer 'unsigned int' to bare use of 'unsigned'
  EDAC, mc_sysfs: Change dev_ch_attribute->channel to unsigned int
  EDAC, mc_sysfs: Remove pointless gotos
  EDAC, mc_sysfs: Make debug messages consistent
  MAINTAINERS: update EDAC's reviewer entry

 MAINTAINERS  |  1 +
 drivers/edac/edac_mc.c   | 20 
 drivers/edac/edac_mc.h   |  6 +--
 drivers/edac/edac_mc_sysfs.c | 91 
 drivers/edac/ghes_edac.c |  2 +-
 drivers/edac/i5100_edac.c| 16 ---
 include/linux/edac.h | 10 ++--
 7 files changed, 69 insertions(+), 77 deletions(-)

-- 
2.20.1

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 1735 matches

Mail list logo