[PATCH] powerpc/mm: fix early initialization failure for MMUs with no hash table

2021-11-26 Thread Vladimir Oltean
The blamed patch attempted to do a trivial conversion of
map_mem_in_cams() by adding an extra "bool init" argument, but by
mistake, changed the way in which two call sites pass the other boolean
argument, "bool dry_run".

As a result, early_init_this_mmu() now calls map_mem_in_cams() with
dry_run=true, and setup_initial_memory_limit() calls with dry_run=false,
both of which are unintended changes.
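
For readers who have not looked at the call sites, here is a minimal
user-space sketch of the mix-up. It is not the kernel code: record_call(),
struct cam_call and the argument values are hypothetical stand-ins, and only
the order of the two trailing booleans (dry_run, then init) is taken from the
diff below.

```c
#include <stdbool.h>

/* Hypothetical record of what a map_mem_in_cams()-style call was given. */
struct cam_call {
	bool dry_run;
	bool init;
};

/* Stand-in with the same trailing argument order: ..., dry_run, init. */
static struct cam_call record_call(unsigned long ram, int max_cam_idx,
				   bool dry_run, bool init)
{
	struct cam_call c = { .dry_run = dry_run, .init = init };

	(void)ram;
	(void)max_cam_idx;
	return c;
}

/* Mirrors the fixed early_init_this_mmu() call: dry_run = false. */
static struct cam_call early_init_call(void)
{
	return record_call(0x1000000UL, 4, false, true);
}

/* Mirrors the fixed setup_initial_memory_limit() call: dry_run = true. */
static struct cam_call memory_limit_call(void)
{
	return record_call(0x1000000UL, 4, true, true);
}
```

Two adjacent bool parameters compile fine in either order, which is
presumably why the swap went unnoticed until boot.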

This makes the kernel boot process hang here:

[0.045211] e500 family performance monitor hardware support registered
[0.051891] rcu: Hierarchical SRCU implementation.
[0.057791] smp: Bringing up secondary CPUs ...

Issue noticed on a Freescale T1040.

Fixes: 52bda69ae8b5 ("powerpc/fsl_booke: Tell map_mem_in_cams() if init is done")
Cc: sta...@vger.kernel.org
Signed-off-by: Vladimir Oltean 
---
 arch/powerpc/mm/nohash/tlb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index 89353d4f5604..647bf454a0fa 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -645,7 +645,7 @@ static void early_init_this_mmu(void)
 
if (map)
linear_map_top = map_mem_in_cams(linear_map_top,
-num_cams, true, true);
+num_cams, false, true);
}
 #endif
 
@@ -766,7 +766,7 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
num_cams = (mfspr(SPRN_TLB1CFG) & TLBnCFG_N_ENTRY) / 4;
 
linear_sz = map_mem_in_cams(first_memblock_size, num_cams,
-   false, true);
+   true, true);
 
ppc64_rma_size = min_t(u64, linear_sz, 0x4000);
} else
-- 
2.25.1



Re: [PATCH] ASoC: imx-hdmi: add put_device() after of_find_device_by_node()

2021-11-26 Thread Mark Brown
On Wed, 10 Nov 2021 00:29:10 +, cgel@gmail.com wrote:
> From: Ye Guojin 
> 
> This was found by coccicheck:
> ./sound/soc/fsl/imx-hdmi.c,209,1-7,ERROR  missing put_device; call
> of_find_device_by_node on line 119, but without a corresponding object
> release within this function.
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: imx-hdmi: add put_device() after of_find_device_by_node()
  commit: f670b274f7f6f4b2722d7f08d0fddf606a727e92

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


[patch 22/22] PCI/MSI: Move descriptor counting on allocation fail to the legacy code

2021-11-26 Thread Thomas Gleixner
The irqdomain code already returns the information. Move the loop to the
legacy code.
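
As a rough illustration of what the moved helper computes, here is a
user-space sketch. struct fake_desc and the array-based loop are assumptions
standing in for struct msi_desc and for_each_pci_msi_entry(); the return
logic follows pci_msi_setup_check_result() in the diff below.

```c
#include <stddef.h>

#define CAP_ID_MSIX 0x11	/* same value as PCI_CAP_ID_MSIX */

/* Hypothetical stand-in for struct msi_desc. */
struct fake_desc {
	unsigned int irq;	/* 0 means "no interrupt allocated" */
};

static int setup_check_result(const struct fake_desc *descs, size_t n,
			      int type, int ret)
{
	size_t i;
	int avail = 0;

	/* Only a failed MSI-X setup gets the "how many worked" treatment. */
	if (type != CAP_ID_MSIX || ret >= 0)
		return ret;

	/* Count the descriptors which did get an interrupt. */
	for (i = 0; i < n; i++) {
		if (descs[i].irq != 0)
			avail++;
	}
	return avail ? avail : ret;
}
```

With two of three descriptors populated and ret == -28, the sketch returns 2,
which tells the caller how many vectors it could retry with.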

Signed-off-by: Thomas Gleixner 
---
 drivers/pci/msi/legacy.c |   20 +++-
 drivers/pci/msi/msi.c|   19 +--
 2 files changed, 20 insertions(+), 19 deletions(-)

--- a/drivers/pci/msi/legacy.c
+++ b/drivers/pci/msi/legacy.c
@@ -50,9 +50,27 @@ void __weak arch_teardown_msi_irqs(struc
}
 }
 
+static int pci_msi_setup_check_result(struct pci_dev *dev, int type, int ret)
+{
+   struct msi_desc *entry;
+   int avail = 0;
+
+   if (type != PCI_CAP_ID_MSIX || ret >= 0)
+   return ret;
+
+   /* Scan the MSI descriptors for successfully allocated ones. */
+   for_each_pci_msi_entry(entry, dev) {
+   if (entry->irq != 0)
+   avail++;
+   }
+   return avail ? avail : ret;
+}
+
 int pci_msi_legacy_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 {
-   return arch_setup_msi_irqs(dev, nvec, type);
+   int ret = arch_setup_msi_irqs(dev, nvec, type);
+
+   return pci_msi_setup_check_result(dev, type, ret);
 }
 
 void pci_msi_legacy_teardown_msi_irqs(struct pci_dev *dev)
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -609,7 +609,7 @@ static int msix_capability_init(struct p
 
ret = pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSIX);
if (ret)
-   goto out_avail;
+   goto out_free;
 
/* Check if all MSI entries honor device restrictions */
ret = msi_verify_entries(dev);
@@ -634,23 +634,6 @@ static int msix_capability_init(struct p
pcibios_free_irq(dev);
return 0;
 
-out_avail:
-   if (ret < 0) {
-   /*
-* If we had some success, report the number of IRQs
-* we succeeded in setting up.
-*/
-   struct msi_desc *entry;
-   int avail = 0;
-
-   for_each_pci_msi_entry(entry, dev) {
-   if (entry->irq != 0)
-   avail++;
-   }
-   if (avail != 0)
-   ret = avail;
-   }
-
 out_free:
free_msi_irqs(dev);
 



[patch 21/22] genirq/msi: Handle PCI/MSI allocation fail in core code

2021-11-26 Thread Thomas Gleixner
Get rid of yet another irqdomain callback and let the core code return the
already available information of how many descriptors could be allocated.
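
A sketch of the resulting decision table, written as plain user-space C. The
enum, the pci_msi_enabled flag (standing in for IS_ENABLED(CONFIG_PCI_MSI))
and the function name are assumptions for illustration; the branch order
follows msi_handle_pci_fail() in the diff below.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the irqdomain bus tokens. */
enum bus_token { BUS_OTHER, BUS_PCI_MSI, BUS_VMD_MSI };

static int handle_pci_fail(enum bus_token token, bool pci_msi_enabled,
			   unsigned int nvec_used, int allocated)
{
	bool is_pci = token == BUS_PCI_MSI || token == BUS_VMD_MSI;

	/* Non-PCI domains, or PCI/MSI compiled out: plain failure. */
	if (!is_pci || !pci_msi_enabled)
		return -ENOSPC;

	/* Let a failed PCI multi MSI allocation retry. */
	if (nvec_used > 1)
		return 1;

	/* Report partial success if there was any. */
	return allocated ? allocated : -ENOSPC;
}
```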

Signed-off-by: Thomas Gleixner 
---
 drivers/pci/msi/irqdomain.c |   13 -
 include/linux/msi.h |5 +
 kernel/irq/msi.c|   29 +
 3 files changed, 26 insertions(+), 21 deletions(-)

--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -95,16 +95,6 @@ static int pci_msi_domain_check_cap(stru
return 0;
 }
 
-static int pci_msi_domain_handle_error(struct irq_domain *domain,
-  struct msi_desc *desc, int error)
-{
-   /* Special handling to support __pci_enable_msi_range() */
-   if (pci_msi_desc_is_multi_msi(desc) && error == -ENOSPC)
-   return 1;
-
-   return error;
-}
-
 static void pci_msi_domain_set_desc(msi_alloc_info_t *arg,
struct msi_desc *desc)
 {
@@ -115,7 +105,6 @@ static void pci_msi_domain_set_desc(msi_
 static struct msi_domain_ops pci_msi_domain_ops_default = {
.set_desc   = pci_msi_domain_set_desc,
.msi_check  = pci_msi_domain_check_cap,
-   .handle_error   = pci_msi_domain_handle_error,
 };
 
 static void pci_msi_domain_update_dom_ops(struct msi_domain_info *info)
@@ -129,8 +118,6 @@ static void pci_msi_domain_update_dom_op
ops->set_desc = pci_msi_domain_set_desc;
if (ops->msi_check == NULL)
ops->msi_check = pci_msi_domain_check_cap;
-   if (ops->handle_error == NULL)
-   ops->handle_error = pci_msi_domain_handle_error;
}
 }
 
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -285,7 +285,6 @@ struct msi_domain_info;
  * @msi_check: Callback for verification of the domain/info/dev data
  * @msi_prepare:   Prepare the allocation of the interrupts in the domain
  * @set_desc:  Set the msi descriptor for an interrupt
- * @handle_error:  Optional error handler if the allocation fails
  * @domain_alloc_irqs: Optional function to override the default allocation
  * function.
  * @domain_free_irqs:  Optional function to override the default free
@@ -294,7 +293,7 @@ struct msi_domain_info;
  * @get_hwirq, @msi_init and @msi_free are callbacks used by the underlying
  * irqdomain.
  *
- * @msi_check, @msi_prepare, @handle_error and @set_desc are callbacks used by
+ * @msi_check, @msi_prepare and @set_desc are callbacks used by
  * msi_domain_alloc/free_irqs().
  *
  * @domain_alloc_irqs, @domain_free_irqs can be used to override the
@@ -331,8 +330,6 @@ struct msi_domain_ops {
   msi_alloc_info_t *arg);
void(*set_desc)(msi_alloc_info_t *arg,
struct msi_desc *desc);
-   int (*handle_error)(struct irq_domain *domain,
-   struct msi_desc *desc, int error);
int (*domain_alloc_irqs)(struct irq_domain *domain,
 struct device *dev, int nvec);
void(*domain_free_irqs)(struct irq_domain *domain,
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -538,6 +538,27 @@ static bool msi_check_reservation_mode(s
return desc->pci.msi_attrib.is_msix || desc->pci.msi_attrib.can_mask;
 }
 
+static int msi_handle_pci_fail(struct irq_domain *domain, struct msi_desc *desc,
+  int allocated)
+{
+   switch(domain->bus_token) {
+   case DOMAIN_BUS_PCI_MSI:
+   case DOMAIN_BUS_VMD_MSI:
+   if (IS_ENABLED(CONFIG_PCI_MSI))
+   break;
+   fallthrough;
+   default:
+   return -ENOSPC;
+   }
+
+   /* Let a failed PCI multi MSI allocation retry */
+   if (desc->nvec_used > 1)
+   return 1;
+
+   /* If there was a successful allocation let the caller know */
+   return allocated ? allocated : -ENOSPC;
+}
+
 int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
int nvec)
 {
@@ -546,6 +567,7 @@ int __msi_domain_alloc_irqs(struct irq_d
struct irq_data *irq_data;
struct msi_desc *desc;
msi_alloc_info_t arg = { };
+   int allocated = 0;
int i, ret, virq;
bool can_reserve;
 
@@ -560,16 +582,15 @@ int __msi_domain_alloc_irqs(struct irq_d
   dev_to_node(dev), &arg, false,
   desc->affinity);
if (virq < 0) {
-   ret = -ENOSPC;
-   if (ops->handle_error)
-   ret = ops->handle_error(domain, desc, ret);
-   return ret;
+   ret = msi_handle_pci_fail(domain, desc, allocated);
+

[patch 19/22] PCI/MSI: Sanitize MSIX table map handling

2021-11-26 Thread Thomas Gleixner
Unmapping the MSIX base mapping in the loops which allocate/free MSI
descriptors is daft and in the way of allowing runtime expansion of MSI-X
descriptors.

Store the mapping in struct pci_dev and free it after freeing the MSI-X
descriptors.
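
The ownership change can be sketched in user space as follows. This is only
an analogy: malloc()/free() stand in for ioremap()/iounmap(), and struct
fake_dev with its nr_descs counter is a hypothetical stand-in for struct
pci_dev and its descriptor list.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct pci_dev. */
struct fake_dev {
	void *msix_base;	/* stands in for the mapped MSI-X table */
	int nr_descs;		/* stands in for the descriptor list */
};

/* Map once and remember the mapping in the device itself. */
static int fake_map_table(struct fake_dev *dev, size_t size)
{
	dev->msix_base = malloc(size);
	return dev->msix_base ? 0 : -1;
}

static void fake_free_irqs(struct fake_dev *dev)
{
	/* Free the descriptors first ... */
	dev->nr_descs = 0;

	/* ... then drop the mapping exactly once. */
	if (dev->msix_base) {
		free(dev->msix_base);
		dev->msix_base = NULL;	/* a repeated call is harmless */
	}
}
```

Because the pointer lives in the device, no descriptor has to be "the last
one" for the unmap to happen, which is the property the patch relies on.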

Signed-off-by: Thomas Gleixner 
---
 drivers/pci/msi/msi.c |   18 --
 include/linux/pci.h   |1 +
 2 files changed, 9 insertions(+), 10 deletions(-)

--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -241,14 +241,14 @@ static void free_msi_irqs(struct pci_dev
pci_msi_teardown_msi_irqs(dev);
 
list_for_each_entry_safe(entry, tmp, msi_list, list) {
-   if (entry->pci.msi_attrib.is_msix) {
-   if (list_is_last(&entry->list, msi_list))
-   iounmap(entry->pci.mask_base);
-   }
-
list_del(&entry->list);
free_msi_entry(entry);
}
+
+   if (dev->msix_base) {
+   iounmap(dev->msix_base);
+   dev->msix_base = NULL;
+   }
 }
 
 static void pci_intx_for_msi(struct pci_dev *dev, int enable)
@@ -501,10 +501,6 @@ static int msix_setup_entries(struct pci
for (i = 0, curmsk = masks; i < nvec; i++) {
entry = alloc_msi_entry(&dev->dev, 1, curmsk);
if (!entry) {
-   if (!i)
-   iounmap(base);
-   else
-   free_msi_irqs(dev);
/* No enough memory. Don't try again */
ret = -ENOMEM;
goto out;
@@ -602,12 +598,14 @@ static int msix_capability_init(struct p
goto out_disable;
}
 
+   dev->msix_base = base;
+
/* Ensure that all table entries are masked. */
msix_mask_all(base, tsize);
 
ret = msix_setup_entries(dev, base, entries, nvec, affd);
if (ret)
-   goto out_disable;
+   goto out_free;
 
ret = pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSIX);
if (ret)
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -473,6 +473,7 @@ struct pci_dev {
u8  ptm_granularity;
 #endif
 #ifdef CONFIG_PCI_MSI
+   void __iomem*msix_base;
const struct attribute_group **msi_irq_groups;
 #endif
struct pci_vpd  vpd;



[patch 20/22] PCI/MSI: Make pci_msi_domain_check_cap() static

2021-11-26 Thread Thomas Gleixner
No users outside of that file.

Signed-off-by: Thomas Gleixner 
---
 drivers/pci/msi/irqdomain.c |5 +++--
 include/linux/msi.h |2 --
 2 files changed, 3 insertions(+), 4 deletions(-)

--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -79,8 +79,9 @@ static inline bool pci_msi_desc_is_multi
  *  1 if Multi MSI is requested, but the domain does not support it
  *  -ENOTSUPP otherwise
  */
-int pci_msi_domain_check_cap(struct irq_domain *domain,
-struct msi_domain_info *info, struct device *dev)
+static int pci_msi_domain_check_cap(struct irq_domain *domain,
+   struct msi_domain_info *info,
+   struct device *dev)
 {
struct msi_desc *desc = first_pci_msi_entry(to_pci_dev(dev));
 
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -438,8 +438,6 @@ void *platform_msi_get_host_data(struct
 struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
 struct msi_domain_info *info,
 struct irq_domain *parent);
-int pci_msi_domain_check_cap(struct irq_domain *domain,
-struct msi_domain_info *info, struct device *dev);
 u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev);
 struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev);
 bool pci_dev_has_special_msi_domain(struct pci_dev *pdev);



[patch 18/22] PCI/MSI: Split out irqdomain code

2021-11-26 Thread Thomas Gleixner
Move the irqdomain specific code into its own file.

Signed-off-by: Thomas Gleixner 
---
 drivers/pci/msi/Makefile|1 
 drivers/pci/msi/irqdomain.c |  279 ++
 drivers/pci/msi/legacy.c|   10 +
 drivers/pci/msi/msi.c   |  319 +---
 drivers/pci/msi/msi.h   |   39 +
 include/linux/msi.h |   11 -
 6 files changed, 339 insertions(+), 320 deletions(-)

--- a/drivers/pci/msi/Makefile
+++ b/drivers/pci/msi/Makefile
@@ -3,4 +3,5 @@
 # Makefile for the PCI/MSI
 obj-$(CONFIG_PCI)  += pcidev_msi.o
 obj-$(CONFIG_PCI_MSI)  += msi.o
+obj-$(CONFIG_PCI_MSI_IRQ_DOMAIN)   += irqdomain.o
 obj-$(CONFIG_PCI_MSI_ARCH_FALLBACKS)   += legacy.o
--- /dev/null
+++ b/drivers/pci/msi/irqdomain.c
@@ -0,0 +1,279 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PCI Message Signaled Interrupt (MSI) - irqdomain support
+ */
+#include 
+#include 
+#include 
+
+#include "msi.h"
+
+int pci_msi_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+{
+   struct irq_domain *domain;
+
+   domain = dev_get_msi_domain(&dev->dev);
+   if (domain && irq_domain_is_hierarchy(domain))
+   return msi_domain_alloc_irqs(domain, &dev->dev, nvec);
+
+   return pci_msi_legacy_setup_msi_irqs(dev, nvec, type);
+}
+
+void pci_msi_teardown_msi_irqs(struct pci_dev *dev)
+{
+   struct irq_domain *domain;
+
+   domain = dev_get_msi_domain(&dev->dev);
+   if (domain && irq_domain_is_hierarchy(domain))
+   msi_domain_free_irqs(domain, &dev->dev);
+   else
+   pci_msi_legacy_teardown_msi_irqs(dev);
+}
+
+/**
+ * pci_msi_domain_write_msg - Helper to write MSI message to PCI config space
+ * @irq_data:  Pointer to interrupt data of the MSI interrupt
+ * @msg:   Pointer to the message
+ */
+static void pci_msi_domain_write_msg(struct irq_data *irq_data, struct msi_msg *msg)
+{
+   struct msi_desc *desc = irq_data_get_msi_desc(irq_data);
+
+   /*
+* For MSI-X desc->irq is always equal to irq_data->irq. For
+* MSI only the first interrupt of MULTI MSI passes the test.
+*/
+   if (desc->irq == irq_data->irq)
+   __pci_write_msi_msg(desc, msg);
+}
+
+/**
+ * pci_msi_domain_calc_hwirq - Generate a unique ID for an MSI source
+ * @desc:  Pointer to the MSI descriptor
+ *
+ * The ID number is only used within the irqdomain.
+ */
+static irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc)
+{
+   struct pci_dev *dev = msi_desc_to_pci_dev(desc);
+
+   return (irq_hw_number_t)desc->pci.msi_attrib.entry_nr |
+   pci_dev_id(dev) << 11 |
+   (pci_domain_nr(dev->bus) & 0x) << 27;
+}
+
+static inline bool pci_msi_desc_is_multi_msi(struct msi_desc *desc)
+{
+   return !desc->pci.msi_attrib.is_msix && desc->nvec_used > 1;
+}
+
+/**
+ * pci_msi_domain_check_cap - Verify that @domain supports the capabilities
+ *   for @dev
+ * @domain:The interrupt domain to check
+ * @info:  The domain info for verification
+ * @dev:   The device to check
+ *
+ * Returns:
+ *  0 if the functionality is supported
+ *  1 if Multi MSI is requested, but the domain does not support it
+ *  -ENOTSUPP otherwise
+ */
+int pci_msi_domain_check_cap(struct irq_domain *domain,
+struct msi_domain_info *info, struct device *dev)
+{
+   struct msi_desc *desc = first_pci_msi_entry(to_pci_dev(dev));
+
+   /* Special handling to support __pci_enable_msi_range() */
+   if (pci_msi_desc_is_multi_msi(desc) &&
+   !(info->flags & MSI_FLAG_MULTI_PCI_MSI))
+   return 1;
+   else if (desc->pci.msi_attrib.is_msix && !(info->flags & MSI_FLAG_PCI_MSIX))
+   return -ENOTSUPP;
+
+   return 0;
+}
+
+static int pci_msi_domain_handle_error(struct irq_domain *domain,
+  struct msi_desc *desc, int error)
+{
+   /* Special handling to support __pci_enable_msi_range() */
+   if (pci_msi_desc_is_multi_msi(desc) && error == -ENOSPC)
+   return 1;
+
+   return error;
+}
+
+static void pci_msi_domain_set_desc(msi_alloc_info_t *arg,
+   struct msi_desc *desc)
+{
+   arg->desc = desc;
+   arg->hwirq = pci_msi_domain_calc_hwirq(desc);
+}
+
+static struct msi_domain_ops pci_msi_domain_ops_default = {
+   .set_desc   = pci_msi_domain_set_desc,
+   .msi_check  = pci_msi_domain_check_cap,
+   .handle_error   = pci_msi_domain_handle_error,
+};
+
+static void pci_msi_domain_update_dom_ops(struct msi_domain_info *info)
+{
+   struct msi_domain_ops *ops = info->ops;
+
+   if (ops == NULL) {
+   info->ops = &pci_msi_domain_ops_default;
+   } else {
+   if (ops->set_desc == NULL)
+   ops->set_desc = pci_msi_domain_set_desc;
+   if (ops->msi_check == NULL)
+   

[patch 17/22] PCI/MSI: Split out !IRQDOMAIN code

2021-11-26 Thread Thomas Gleixner
Split out the non irqdomain code into its own file.

Signed-off-by: Thomas Gleixner 
---
 drivers/pci/msi/Makefile |5 ++--
 drivers/pci/msi/legacy.c |   51 +++
 drivers/pci/msi/msi.c|   46 --
 3 files changed, 54 insertions(+), 48 deletions(-)

--- a/drivers/pci/msi/Makefile
+++ b/drivers/pci/msi/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 #
 # Makefile for the PCI/MSI
-obj-$(CONFIG_PCI)  += pcidev_msi.o
-obj-$(CONFIG_PCI_MSI)  += msi.o
+obj-$(CONFIG_PCI)  += pcidev_msi.o
+obj-$(CONFIG_PCI_MSI)  += msi.o
+obj-$(CONFIG_PCI_MSI_ARCH_FALLBACKS)   += legacy.o
--- /dev/null
+++ b/drivers/pci/msi/legacy.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PCI Message Signaled Interrupt (MSI).
+ *
+ * Legacy architecture specific setup and teardown mechanism.
+ */
+#include "msi.h"
+
+/* Arch hooks */
+int __weak arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
+{
+   return -EINVAL;
+}
+
+void __weak arch_teardown_msi_irq(unsigned int irq)
+{
+}
+
+int __weak arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+{
+   struct msi_desc *desc;
+   int ret;
+
+   /*
+* If an architecture wants to support multiple MSI, it needs to
+* override arch_setup_msi_irqs()
+*/
+   if (type == PCI_CAP_ID_MSI && nvec > 1)
+   return 1;
+
+   for_each_pci_msi_entry(desc, dev) {
+   ret = arch_setup_msi_irq(dev, desc);
+   if (ret)
+   return ret < 0 ? ret : -ENOSPC;
+   }
+
+   return 0;
+}
+
+void __weak arch_teardown_msi_irqs(struct pci_dev *dev)
+{
+   struct msi_desc *desc;
+   int i;
+
+   for_each_pci_msi_entry(desc, dev) {
+   if (desc->irq) {
+   for (i = 0; i < desc->nvec_used; i++)
+   arch_teardown_msi_irq(desc->irq + i);
+   }
+   }
+}
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -50,52 +50,6 @@ static void pci_msi_teardown_msi_irqs(st
 #define pci_msi_teardown_msi_irqs  arch_teardown_msi_irqs
 #endif
 
-#ifdef CONFIG_PCI_MSI_ARCH_FALLBACKS
-/* Arch hooks */
-int __weak arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
-{
-   return -EINVAL;
-}
-
-void __weak arch_teardown_msi_irq(unsigned int irq)
-{
-}
-
-int __weak arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
-{
-   struct msi_desc *entry;
-   int ret;
-
-   /*
-* If an architecture wants to support multiple MSI, it needs to
-* override arch_setup_msi_irqs()
-*/
-   if (type == PCI_CAP_ID_MSI && nvec > 1)
-   return 1;
-
-   for_each_pci_msi_entry(entry, dev) {
-   ret = arch_setup_msi_irq(dev, entry);
-   if (ret < 0)
-   return ret;
-   if (ret > 0)
-   return -ENOSPC;
-   }
-
-   return 0;
-}
-
-void __weak arch_teardown_msi_irqs(struct pci_dev *dev)
-{
-   int i;
-   struct msi_desc *entry;
-
-   for_each_pci_msi_entry(entry, dev)
-   if (entry->irq)
-   for (i = 0; i < entry->nvec_used; i++)
-   arch_teardown_msi_irq(entry->irq + i);
-}
-#endif /* CONFIG_PCI_MSI_ARCH_FALLBACKS */
-
 /*
  * PCI 2.3 does not specify mask bits for each MSI interrupt.  Attempting to
  * mask all MSI interrupts by clearing the MSI enable bit does not work



[patch 16/22] PCI/MSI: Split out CONFIG_PCI_MSI independent part

2021-11-26 Thread Thomas Gleixner
These functions are required even when CONFIG_PCI_MSI is not set. Move them
to their own file.
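
For reference, the bit handling these init functions perform can be sketched
outside the kernel like this. The two flag values match PCI_MSI_FLAGS_ENABLE
and PCI_MSI_FLAGS_64BIT; struct msi_init_result and fake_msi_init() are
hypothetical and replace the config space accessors with a plain value.

```c
#include <stdint.h>

#define MSI_FLAGS_ENABLE 0x0001	/* same value as PCI_MSI_FLAGS_ENABLE */
#define MSI_FLAGS_64BIT  0x0080	/* same value as PCI_MSI_FLAGS_64BIT */

/* Hypothetical result: the control word written back plus the 64-bit hint. */
struct msi_init_result {
	uint16_t ctrl;
	int no_64bit_msi;
};

static struct msi_init_result fake_msi_init(uint16_t ctrl)
{
	struct msi_init_result r = { .ctrl = ctrl, .no_64bit_msi = 0 };

	/* Clear the enable bit if firmware left MSI switched on. */
	if (ctrl & MSI_FLAGS_ENABLE)
		r.ctrl = (uint16_t)(ctrl & ~MSI_FLAGS_ENABLE);

	/* Remember devices which cannot take a 64-bit message address. */
	if (!(ctrl & MSI_FLAGS_64BIT))
		r.no_64bit_msi = 1;

	return r;
}
```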

Signed-off-by: Thomas Gleixner 
---
 drivers/pci/msi/Makefile |3 ++-
 drivers/pci/msi/msi.c|   39 ---
 drivers/pci/msi/pcidev_msi.c |   43 +++
 3 files changed, 45 insertions(+), 40 deletions(-)

--- a/drivers/pci/msi/Makefile
+++ b/drivers/pci/msi/Makefile
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 #
 # Makefile for the PCI/MSI
-obj-$(CONFIG_PCI)  += msi.o
+obj-$(CONFIG_PCI)  += pcidev_msi.o
+obj-$(CONFIG_PCI_MSI)  += msi.o
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -18,8 +18,6 @@
 
 #include "../pci.h"
 
-#ifdef CONFIG_PCI_MSI
-
 static int pci_msi_enable = 1;
 int pci_msi_ignore_mask;
 
@@ -1479,40 +1477,3 @@ bool pci_dev_has_special_msi_domain(stru
 }
 
 #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
-#endif /* CONFIG_PCI_MSI */
-
-void pci_msi_init(struct pci_dev *dev)
-{
-   u16 ctrl;
-
-   /*
-* Disable the MSI hardware to avoid screaming interrupts
-* during boot.  This is the power on reset default so
-* usually this should be a noop.
-*/
-   dev->msi_cap = pci_find_capability(dev, PCI_CAP_ID_MSI);
-   if (!dev->msi_cap)
-   return;
-
-   pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &ctrl);
-   if (ctrl & PCI_MSI_FLAGS_ENABLE)
-   pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS,
- ctrl & ~PCI_MSI_FLAGS_ENABLE);
-
-   if (!(ctrl & PCI_MSI_FLAGS_64BIT))
-   dev->no_64bit_msi = 1;
-}
-
-void pci_msix_init(struct pci_dev *dev)
-{
-   u16 ctrl;
-
-   dev->msix_cap = pci_find_capability(dev, PCI_CAP_ID_MSIX);
-   if (!dev->msix_cap)
-   return;
-
-   pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
-   if (ctrl & PCI_MSIX_FLAGS_ENABLE)
-   pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS,
- ctrl & ~PCI_MSIX_FLAGS_ENABLE);
-}
--- /dev/null
+++ b/drivers/pci/msi/pcidev_msi.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * MSI[X] related functions which are available unconditionally.
+ */
+#include "../pci.h"
+
+/*
+ * Disable the MSI[X] hardware to avoid screaming interrupts during boot.
+ * This is the power on reset default so usually this should be a noop.
+ */
+
+void pci_msi_init(struct pci_dev *dev)
+{
+   u16 ctrl;
+
+   dev->msi_cap = pci_find_capability(dev, PCI_CAP_ID_MSI);
+   if (!dev->msi_cap)
+   return;
+
+   pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &ctrl);
+   if (ctrl & PCI_MSI_FLAGS_ENABLE) {
+   pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS,
+ ctrl & ~PCI_MSI_FLAGS_ENABLE);
+   }
+
+   if (!(ctrl & PCI_MSI_FLAGS_64BIT))
+   dev->no_64bit_msi = 1;
+}
+
+void pci_msix_init(struct pci_dev *dev)
+{
+   u16 ctrl;
+
+   dev->msix_cap = pci_find_capability(dev, PCI_CAP_ID_MSIX);
+   if (!dev->msix_cap)
+   return;
+
+   pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
+   if (ctrl & PCI_MSIX_FLAGS_ENABLE) {
+   pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS,
+ ctrl & ~PCI_MSIX_FLAGS_ENABLE);
+   }
+}



[patch 15/22] PCI/MSI: Move code into a separate directory

2021-11-26 Thread Thomas Gleixner
msi.c is getting larger and really could do with a splitup. Move it into
its own directory to prepare for that.

Signed-off-by: Thomas Gleixner 
---
 Documentation/driver-api/pci/pci.rst |2 
 drivers/pci/Makefile |3 
 drivers/pci/msi.c| 1532 ---
 drivers/pci/msi/Makefile |4 
 drivers/pci/msi/msi.c| 1532 +++
 5 files changed, 1539 insertions(+), 1534 deletions(-)

--- a/Documentation/driver-api/pci/pci.rst
+++ b/Documentation/driver-api/pci/pci.rst
@@ -13,7 +13,7 @@ PCI Support Library
 .. kernel-doc:: drivers/pci/search.c
:export:
 
-.. kernel-doc:: drivers/pci/msi.c
+.. kernel-doc:: drivers/pci/msi/msi.c
:export:
 
 .. kernel-doc:: drivers/pci/bus.c
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -5,8 +5,9 @@
 obj-$(CONFIG_PCI)  += access.o bus.o probe.o host-bridge.o \
   remove.o pci.o pci-driver.o search.o \
   pci-sysfs.o rom.o setup-res.o irq.o vpd.o \
-  setup-bus.o vc.o mmap.o setup-irq.o msi.o
+  setup-bus.o vc.o mmap.o setup-irq.o
 
+obj-$(CONFIG_PCI)  += msi/
 obj-$(CONFIG_PCI)  += pcie/
 
 ifdef CONFIG_PCI
--- a/drivers/pci/msi.c
+++ /dev/null
@@ -1,1532 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * PCI Message Signaled Interrupt (MSI)
- *
- * Copyright (C) 2003-2004 Intel
- * Copyright (C) Tom Long Nguyen (tom.l.ngu...@intel.com)
- * Copyright (C) 2016 Christoph Hellwig.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "pci.h"
-
-#ifdef CONFIG_PCI_MSI
-
-static int pci_msi_enable = 1;
-int pci_msi_ignore_mask;
-
-#define msix_table_size(flags) ((flags & PCI_MSIX_FLAGS_QSIZE) + 1)
-
-#ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
-static int pci_msi_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
-{
-   struct irq_domain *domain;
-
-   domain = dev_get_msi_domain(&dev->dev);
-   if (domain && irq_domain_is_hierarchy(domain))
-   return msi_domain_alloc_irqs(domain, &dev->dev, nvec);
-
-   return arch_setup_msi_irqs(dev, nvec, type);
-}
-
-static void pci_msi_teardown_msi_irqs(struct pci_dev *dev)
-{
-   struct irq_domain *domain;
-
-   domain = dev_get_msi_domain(&dev->dev);
-   if (domain && irq_domain_is_hierarchy(domain))
-   msi_domain_free_irqs(domain, &dev->dev);
-   else
-   arch_teardown_msi_irqs(dev);
-}
-#else
-#define pci_msi_setup_msi_irqs arch_setup_msi_irqs
-#define pci_msi_teardown_msi_irqs  arch_teardown_msi_irqs
-#endif
-
-#ifdef CONFIG_PCI_MSI_ARCH_FALLBACKS
-/* Arch hooks */
-int __weak arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
-{
-   return -EINVAL;
-}
-
-void __weak arch_teardown_msi_irq(unsigned int irq)
-{
-}
-
-int __weak arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
-{
-   struct msi_desc *entry;
-   int ret;
-
-   /*
-* If an architecture wants to support multiple MSI, it needs to
-* override arch_setup_msi_irqs()
-*/
-   if (type == PCI_CAP_ID_MSI && nvec > 1)
-   return 1;
-
-   for_each_pci_msi_entry(entry, dev) {
-   ret = arch_setup_msi_irq(dev, entry);
-   if (ret < 0)
-   return ret;
-   if (ret > 0)
-   return -ENOSPC;
-   }
-
-   return 0;
-}
-
-void __weak arch_teardown_msi_irqs(struct pci_dev *dev)
-{
-   int i;
-   struct msi_desc *entry;
-
-   for_each_pci_msi_entry(entry, dev)
-   if (entry->irq)
-   for (i = 0; i < entry->nvec_used; i++)
-   arch_teardown_msi_irq(entry->irq + i);
-}
-#endif /* CONFIG_PCI_MSI_ARCH_FALLBACKS */
-
-/*
- * PCI 2.3 does not specify mask bits for each MSI interrupt.  Attempting to
- * mask all MSI interrupts by clearing the MSI enable bit does not work
- * reliably as devices without an INTx disable bit will then generate a
- * level IRQ which will never be cleared.
- */
-static inline __attribute_const__ u32 msi_multi_mask(struct msi_desc *desc)
-{
-   /* Don't shift by >= width of type */
-   if (desc->pci.msi_attrib.multi_cap >= 5)
-   return 0x;
-   return (1 << (1 << desc->pci.msi_attrib.multi_cap)) - 1;
-}
-
-static noinline void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 set)
-{
-   raw_spinlock_t *lock = &desc->dev->msi_lock;
-   unsigned long flags;
-
-   if (!desc->pci.msi_attrib.can_mask)
-   return;
-
-   raw_spin_lock_irqsave(lock, flags);
-   desc->pci.msi_mask &= ~clear;
-   desc->pci.msi_mask |= set;
-   pci_write_config_dword(msi_desc_to_pci_dev(desc), desc->pci.mask_pos,
-  desc->pci.msi_mask);
-   

[patch 14/22] PCI/MSI: Make msix_update_entries() smarter

2021-11-26 Thread Thomas Gleixner
No need to walk the descriptors and check for each one whether the entries
pointer function argument is NULL. Do it once.
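
The shape of the change, as a small user-space sketch. struct fake_entry and
the irqs array are assumptions standing in for struct msix_entry and the
descriptor walk; the point is only that the NULL test is evaluated once
instead of once per iteration.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct msix_entry. */
struct fake_entry {
	unsigned int vector;
};

static void update_entries(struct fake_entry *entries,
			   const unsigned int *irqs, size_t n)
{
	size_t i;

	/* Check the optional output array once, outside the loop. */
	if (entries) {
		for (i = 0; i < n; i++) {
			entries->vector = irqs[i];
			entries++;
		}
	}
}
```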

Signed-off-by: Thomas Gleixner 
---
 drivers/pci/msi.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -642,8 +642,8 @@ static void msix_update_entries(struct p
 {
struct msi_desc *entry;
 
-   for_each_pci_msi_entry(entry, dev) {
-   if (entries) {
+   if (entries) {
+   for_each_pci_msi_entry(entry, dev) {
entries->vector = entry->irq;
entries++;
}



[patch 13/22] PCI/MSI: Cleanup include zoo

2021-11-26 Thread Thomas Gleixner
Get rid of the pile of unneeded includes which accumulated over time.

Signed-off-by: Thomas Gleixner 
---
 drivers/pci/msi.c |   16 
 1 file changed, 4 insertions(+), 12 deletions(-)

--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -7,22 +7,14 @@
  * Copyright (C) 2016 Christoph Hellwig.
  */
 
+#include 
 #include 
-#include 
-#include 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
+#include 
 #include 
+#include 
 #include 
+#include 
 
 #include "pci.h"
 



[patch 12/22] PCI/MSI: Make arch_restore_msi_irqs() less horrible.

2021-11-26 Thread Thomas Gleixner
Make arch_restore_msi_irqs() return a boolean which indicates whether the
core code should restore the MSI message or not. Get rid of the indirection
in x86.
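
The new contract can be sketched in user space like this. struct restore_dev
and both functions are hypothetical; the is_xen_initdom flag stands in for
the xen_initial_domain() check, and the counter stands in for the core code
rewriting the MSI message.

```c
#include <stdbool.h>

/* Hypothetical device state for the sketch. */
struct restore_dev {
	bool is_xen_initdom;	/* the arch restores the MSIs by itself */
	int core_restores;	/* how often the core restore path ran */
};

/* Arch hook: true means "core code, please restore the message". */
static bool fake_arch_restore(struct restore_dev *dev)
{
	if (!dev->is_xen_initdom)
		return true;

	/* The arch specific restore would happen here. */
	return false;
}

/* Core code: act on the hook's answer instead of being called back. */
static void fake_core_restore(struct restore_dev *dev)
{
	if (fake_arch_restore(dev))
		dev->core_restores++;
}
```

Returning a bool lets the architecture opt out without the core code having
to export its default restore function for the arch to call back into.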

Signed-off-by: Thomas Gleixner 
Cc: Juergen Gross 
Cc: x...@kernel.org
Cc: xen-de...@lists.xenproject.org
Cc: Christian Borntraeger 
Cc: Heiko Carstens 
---
 arch/s390/pci/pci_irq.c   |4 +-
 arch/x86/include/asm/x86_init.h   |6 ---
 arch/x86/include/asm/xen/hypervisor.h |8 +
 arch/x86/kernel/apic/msi.c|6 +++
 arch/x86/kernel/x86_init.c|   12 ---
 arch/x86/pci/xen.c|   13 
 drivers/pci/msi.c |   54 +++---
 include/linux/msi.h   |7 +---
 8 files changed, 45 insertions(+), 65 deletions(-)

--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -387,13 +387,13 @@ void arch_teardown_msi_irqs(struct pci_d
airq_iv_free(zpci_ibv[0], zdev->msi_first_bit, zdev->msi_nr_irqs);
 }
 
-void arch_restore_msi_irqs(struct pci_dev *pdev)
+bool arch_restore_msi_irqs(struct pci_dev *pdev)
 {
struct zpci_dev *zdev = to_zpci(pdev);
 
if (!zdev->irqs_registered)
zpci_set_irq(zdev);
-   default_restore_msi_irqs(pdev);
+   return true;
 }
 
 static struct airq_struct zpci_airq = {
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -289,12 +289,6 @@ struct x86_platform_ops {
struct x86_hyper_runtime hyper;
 };
 
-struct pci_dev;
-
-struct x86_msi_ops {
-   void (*restore_msi_irqs)(struct pci_dev *dev);
-};
-
 struct x86_apic_ops {
unsigned int(*io_apic_read)   (unsigned int apic, unsigned int reg);
void(*restore)(void);
--- a/arch/x86/include/asm/xen/hypervisor.h
+++ b/arch/x86/include/asm/xen/hypervisor.h
@@ -57,6 +57,14 @@ static inline bool __init xen_x2apic_par
 }
 #endif
 
+struct pci_dev;
+
+#ifdef CONFIG_XEN_DOM0
+bool xen_initdom_restore_msi(struct pci_dev *dev);
+#else
+static inline bool xen_initdom_restore_msi(struct pci_dev *dev) { return true; }
+#endif
+
 #ifdef CONFIG_HOTPLUG_CPU
 void xen_arch_register_cpu(int num);
 void xen_arch_unregister_cpu(int num);
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -345,3 +346,8 @@ void dmar_free_hwirq(int irq)
irq_domain_free_irqs(irq, 1);
 }
 #endif
+
+bool arch_restore_msi_irqs(struct pci_dev *dev)
+{
+   return xen_initdom_restore_msi(dev);
+}
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -145,18 +145,6 @@ struct x86_platform_ops x86_platform __r
 
 EXPORT_SYMBOL_GPL(x86_platform);
 
-#if defined(CONFIG_PCI_MSI)
-struct x86_msi_ops x86_msi __ro_after_init = {
-   .restore_msi_irqs   = default_restore_msi_irqs,
-};
-
-/* MSI arch specific hooks */
-void arch_restore_msi_irqs(struct pci_dev *dev)
-{
-   x86_msi.restore_msi_irqs(dev);
-}
-#endif
-
 struct x86_apic_ops x86_apic_ops __ro_after_init = {
.io_apic_read   = native_io_apic_read,
.restore= native_restore_boot_irq_mode,
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -351,10 +351,13 @@ static int xen_initdom_setup_msi_irqs(st
return ret;
 }
 
-static void xen_initdom_restore_msi_irqs(struct pci_dev *dev)
+bool xen_initdom_restore_msi(struct pci_dev *dev)
 {
int ret = 0;
 
+   if (!xen_initial_domain())
+   return true;
+
if (pci_seg_supported) {
struct physdev_pci_device restore_ext;
 
@@ -375,10 +378,10 @@ static void xen_initdom_restore_msi_irqs
ret = HYPERVISOR_physdev_op(PHYSDEVOP_restore_msi, );
WARN(ret && ret != -ENOSYS, "restore_msi -> %d\n", ret);
}
+   return false;
 }
 #else /* CONFIG_XEN_PV_DOM0 */
 #define xen_initdom_setup_msi_irqs NULL
-#define xen_initdom_restore_msi_irqs   NULL
 #endif /* !CONFIG_XEN_PV_DOM0 */
 
 static void xen_teardown_msi_irqs(struct pci_dev *dev)
@@ -466,12 +469,10 @@ static __init struct irq_domain *xen_cre
 static __init void xen_setup_pci_msi(void)
 {
if (xen_pv_domain()) {
-   if (xen_initial_domain()) {
+   if (xen_initial_domain())
xen_msi_ops.setup_msi_irqs = xen_initdom_setup_msi_irqs;
-   x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
-   } else {
+   else
xen_msi_ops.setup_msi_irqs = xen_setup_msi_irqs;
-   }
xen_msi_ops.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
pci_msi_ignore_mask = 1;
} else if (xen_hvm_domain()) {
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -106,29 +106,6 @@ void __weak arch_teardown_msi_irqs(struc
 }
 #endif /* CONFIG_PCI_MSI_ARCH_FALLBACKS */
 
-static void default_restore_msi_irq(struct pci_dev *dev, int irq)
-{
-   struct 

[patch 10/22] genirq/msi, treewide: Use a named struct for PCI/MSI attributes

2021-11-26 Thread Thomas Gleixner
The unnamed struct sucks and is in the way of further cleanups. Stick the
PCI related MSI data into a real data structure and cleanup all users.

No functional change.

Signed-off-by: Thomas Gleixner 
Cc: Greg Kroah-Hartman 
Cc: sparcli...@vger.kernel.org
Cc: x...@kernel.org
Cc: xen-de...@lists.xenproject.org
Cc: ath...@lists.infradead.org
---
 arch/powerpc/platforms/cell/axon_msi.c|2 
 arch/powerpc/platforms/powernv/pci-ioda.c |4 -
 arch/powerpc/platforms/pseries/msi.c  |6 -
 arch/sparc/kernel/pci_msi.c   |4 -
 arch/x86/kernel/apic/msi.c|2 
 arch/x86/pci/xen.c|6 -
 drivers/net/wireless/ath/ath11k/pci.c |2 
 drivers/pci/msi.c |  116 +++---
 drivers/pci/xen-pcifront.c|2 
 include/linux/msi.h   |   84 ++---
 kernel/irq/msi.c  |4 -
 11 files changed, 115 insertions(+), 117 deletions(-)

--- a/arch/powerpc/platforms/cell/axon_msi.c
+++ b/arch/powerpc/platforms/cell/axon_msi.c
@@ -212,7 +212,7 @@ static int setup_msi_msg_address(struct
entry = first_pci_msi_entry(dev);
 
for (; dn; dn = of_get_next_parent(dn)) {
-   if (entry->msi_attrib.is_64) {
+   if (entry->pci.msi_attrib.is_64) {
prop = of_get_property(dn, "msi-address-64", );
if (prop)
break;
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2154,10 +2154,10 @@ static void pnv_msi_compose_msg(struct i
int rc;
 
rc = __pnv_pci_ioda_msi_setup(phb, pdev, d->hwirq,
- entry->msi_attrib.is_64, msg);
+ entry->pci.msi_attrib.is_64, msg);
if (rc)
dev_err(&pdev->dev, "Failed to setup %s-bit MSI #%ld : %d\n",
-   entry->msi_attrib.is_64 ? "64" : "32", d->hwirq, rc);
+   entry->pci.msi_attrib.is_64 ? "64" : "32", d->hwirq, rc);
 }
 
 /*
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -332,7 +332,7 @@ static int check_msix_entries(struct pci
 
expected = 0;
for_each_pci_msi_entry(entry, pdev) {
-   if (entry->msi_attrib.entry_nr != expected) {
+   if (entry->pci.msi_attrib.entry_nr != expected) {
pr_debug("rtas_msi: bad MSI-X entries.\n");
return -EINVAL;
}
@@ -449,7 +449,7 @@ static int pseries_msi_ops_prepare(struc
 {
struct pci_dev *pdev = to_pci_dev(dev);
struct msi_desc *desc = first_pci_msi_entry(pdev);
-   int type = desc->msi_attrib.is_msix ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI;
+   int type = desc->pci.msi_attrib.is_msix ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI;
 
return rtas_prepare_msi_irqs(pdev, nvec, type, arg);
 }
@@ -580,7 +580,7 @@ static int pseries_irq_domain_alloc(stru
int hwirq;
int i, ret;
 
-   hwirq = rtas_query_irq_number(pci_get_pdn(pdev), desc->msi_attrib.entry_nr);
+   hwirq = rtas_query_irq_number(pci_get_pdn(pdev), desc->pci.msi_attrib.entry_nr);
if (hwirq < 0) {
dev_err(&pdev->dev, "Failed to query HW IRQ: %d\n", hwirq);
return hwirq;
--- a/arch/sparc/kernel/pci_msi.c
+++ b/arch/sparc/kernel/pci_msi.c
@@ -146,13 +146,13 @@ static int sparc64_setup_msi_irq(unsigne
msiqid = pick_msiq(pbm);
 
err = ops->msi_setup(pbm, msiqid, msi,
-(entry->msi_attrib.is_64 ? 1 : 0));
+(entry->pci.msi_attrib.is_64 ? 1 : 0));
if (err)
goto out_msi_free;
 
pbm->msi_irq_table[msi - pbm->msi_first] = *irq_p;
 
-   if (entry->msi_attrib.is_64) {
+   if (entry->pci.msi_attrib.is_64) {
msg.address_hi = pbm->msi64_start >> 32;
msg.address_lo = pbm->msi64_start & 0xffffffff;
} else {
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -163,7 +163,7 @@ int pci_msi_prepare(struct irq_domain *d
struct msi_desc *desc = first_pci_msi_entry(pdev);
 
init_irq_alloc_info(arg, NULL);
-   if (desc->msi_attrib.is_msix) {
+   if (desc->pci.msi_attrib.is_msix) {
arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
} else {
arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -306,7 +306,7 @@ static int xen_initdom_setup_msi_irqs(st
return -EINVAL;
 
map_irq.table_base = pci_resource_start(dev, bir);
-   map_irq.entry_nr = msidesc->msi_attrib.entry_nr;
+   map_irq.entry_nr = msidesc->pci.msi_attrib.entry_nr;
}
 
ret = -EINVAL;
@@ -398,7 +398,7 

[patch 11/22] x86/hyperv: Refactor hv_msi_domain_free_irqs()

2021-11-26 Thread Thomas Gleixner
No point in looking up things over and over. Just look up the associated
irq data and work from there.

No functional change.

Signed-off-by: Thomas Gleixner 
Cc: Wei Liu 
Cc: x...@kernel.org
Cc: linux-hyp...@vger.kernel.org
---
 arch/x86/hyperv/irqdomain.c |   55 +---
 1 file changed, 17 insertions(+), 38 deletions(-)

--- a/arch/x86/hyperv/irqdomain.c
+++ b/arch/x86/hyperv/irqdomain.c
@@ -253,64 +253,43 @@ static int hv_unmap_msi_interrupt(struct
return hv_unmap_interrupt(hv_build_pci_dev_id(dev).as_uint64, old_entry);
 }
 
-static void hv_teardown_msi_irq_common(struct pci_dev *dev, struct msi_desc *msidesc, int irq)
+static void hv_teardown_msi_irq(struct pci_dev *dev, struct irq_data *irqd)
 {
-   u64 status;
struct hv_interrupt_entry old_entry;
-   struct irq_desc *desc;
-   struct irq_data *data;
struct msi_msg msg;
+   u64 status;
 
-   desc = irq_to_desc(irq);
-   if (!desc) {
-   pr_debug("%s: no irq desc\n", __func__);
-   return;
-   }
-
-   data = &desc->irq_data;
-   if (!data) {
-   pr_debug("%s: no irq data\n", __func__);
-   return;
-   }
-
-   if (!data->chip_data) {
+   if (!irqd->chip_data) {
pr_debug("%s: no chip data\n!", __func__);
return;
}
 
-   old_entry = *(struct hv_interrupt_entry *)data->chip_data;
+   old_entry = *(struct hv_interrupt_entry *)irqd->chip_data;
entry_to_msi_msg(&old_entry, &msg);
 
-   kfree(data->chip_data);
-   data->chip_data = NULL;
+   kfree(irqd->chip_data);
+   irqd->chip_data = NULL;
 
status = hv_unmap_msi_interrupt(dev, &old_entry);
 
-   if (status != HV_STATUS_SUCCESS) {
+   if (status != HV_STATUS_SUCCESS)
pr_err("%s: hypercall failed, status %lld\n", __func__, status);
-   return;
-   }
 }
 
-static void hv_msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
+static void hv_msi_free_irq(struct irq_domain *domain,
+   struct msi_domain_info *info, unsigned int virq)
 {
-   int i;
-   struct msi_desc *entry;
-   struct pci_dev *pdev;
+   struct irq_data *irqd = irq_get_irq_data(virq);
+   struct msi_desc *desc;
 
-   if (WARN_ON_ONCE(!dev_is_pci(dev)))
+   if (!irqd)
return;
 
-   pdev = to_pci_dev(dev);
+   desc = irq_data_get_msi_desc(irqd);
+   if (!desc || !desc->irq || WARN_ON_ONCE(!dev_is_pci(desc->dev)))
+   return;
 
-   for_each_pci_msi_entry(entry, pdev) {
-   if (entry->irq) {
-   for (i = 0; i < entry->nvec_used; i++) {
-   hv_teardown_msi_irq_common(pdev, entry, entry->irq + i);
-   irq_domain_free_irqs(entry->irq + i, 1);
-   }
-   }
-   }
+   hv_teardown_msi_irq(to_pci_dev(desc->dev), irqd);
 }
 
 /*
@@ -329,7 +308,7 @@ static struct irq_chip hv_pci_msi_contro
 };
 
 static struct msi_domain_ops pci_msi_domain_ops = {
-   .domain_free_irqs   = hv_msi_domain_free_irqs,
+   .msi_free   = hv_msi_free_irq,
.msi_prepare= pci_msi_prepare,
 };
 



[patch 08/22] PCI/sysfs: Use pci_irq_vector()

2021-11-26 Thread Thomas Gleixner
instead of fiddling with msi descriptors.

Signed-off-by: Thomas Gleixner 
---
 drivers/pci/pci-sysfs.c |7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -62,11 +62,8 @@ static ssize_t irq_show(struct device *d
 * For MSI, show the first MSI IRQ; for all other cases including
 * MSI-X, show the legacy INTx IRQ.
 */
-   if (pdev->msi_enabled) {
-   struct msi_desc *desc = first_pci_msi_entry(pdev);
-
-   return sysfs_emit(buf, "%u\n", desc->irq);
-   }
+   if (pdev->msi_enabled)
+   return sysfs_emit(buf, "%u\n", pci_irq_vector(pdev, 0));
 #endif
 
return sysfs_emit(buf, "%u\n", pdev->irq);



[patch 09/22] MIPS: Octeon: Use arch_setup_msi_irq()

2021-11-26 Thread Thomas Gleixner
The core code provides the same loop code except for the MSI-X reject. Move
that to arch_setup_msi_irq() and remove the duplicated code.

No functional change.

Signed-off-by: Thomas Gleixner 
Cc: Thomas Bogendoerfer 
Cc: linux-m...@vger.kernel.org
---
 arch/mips/pci/msi-octeon.c |   32 +++-
 1 file changed, 3 insertions(+), 29 deletions(-)

--- a/arch/mips/pci/msi-octeon.c
+++ b/arch/mips/pci/msi-octeon.c
@@ -68,6 +68,9 @@ int arch_setup_msi_irq(struct pci_dev *d
u64 search_mask;
int index;
 
+   if (desc->pci.msi_attrib.is_msix)
+   return -EINVAL;
+
/*
 * Read the MSI config to figure out how many IRQs this device
 * wants.  Most devices only want 1, which will give
@@ -182,35 +185,6 @@ int arch_setup_msi_irq(struct pci_dev *d
return 0;
 }
 
-int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
-{
-   struct msi_desc *entry;
-   int ret;
-
-   /*
-* MSI-X is not supported.
-*/
-   if (type == PCI_CAP_ID_MSIX)
-   return -EINVAL;
-
-   /*
-* If an architecture wants to support multiple MSI, it needs to
-* override arch_setup_msi_irqs()
-*/
-   if (type == PCI_CAP_ID_MSI && nvec > 1)
-   return 1;
-
-   for_each_pci_msi_entry(entry, dev) {
-   ret = arch_setup_msi_irq(dev, entry);
-   if (ret < 0)
-   return ret;
-   if (ret > 0)
-   return -ENOSPC;
-   }
-
-   return 0;
-}
-
 /**
  * Called when a device no longer needs its MSI interrupts. All
  * MSI interrupts for the device are freed.



[patch 07/22] PCI/MSI: Remove msi_desc_to_pci_sysdata()

2021-11-26 Thread Thomas Gleixner
Last user is gone long ago.

Signed-off-by: Thomas Gleixner 
---
 drivers/pci/msi.c   |8 
 include/linux/msi.h |5 -
 2 files changed, 13 deletions(-)

--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1253,14 +1253,6 @@ struct pci_dev *msi_desc_to_pci_dev(stru
 }
 EXPORT_SYMBOL(msi_desc_to_pci_dev);
 
-void *msi_desc_to_pci_sysdata(struct msi_desc *desc)
-{
-   struct pci_dev *dev = msi_desc_to_pci_dev(desc);
-
-   return dev->bus->sysdata;
-}
-EXPORT_SYMBOL_GPL(msi_desc_to_pci_sysdata);
-
 #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
 /**
  * pci_msi_domain_write_msg - Helper to write MSI message to PCI config space
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -217,13 +217,8 @@ static inline void msi_desc_set_iommu_co
for_each_msi_entry((desc), &(pdev)->dev)
 
 struct pci_dev *msi_desc_to_pci_dev(struct msi_desc *desc);
-void *msi_desc_to_pci_sysdata(struct msi_desc *desc);
 void pci_write_msi_msg(unsigned int irq, struct msi_msg *msg);
 #else /* CONFIG_PCI_MSI */
-static inline void *msi_desc_to_pci_sysdata(struct msi_desc *desc)
-{
-   return NULL;
-}
 static inline void pci_write_msi_msg(unsigned int irq, struct msi_msg *msg)
 {
 }



[patch 06/22] PCI/MSI: Make pci_msi_domain_write_msg() static

2021-11-26 Thread Thomas Gleixner
There is no point to have this function public as it is set by the PCI core
anyway when a PCI/MSI irqdomain is created.

Signed-off-by: Thomas Gleixner 
---
 drivers/irqchip/irq-gic-v2m.c|1 -
 drivers/irqchip/irq-gic-v3-its-pci-msi.c |1 -
 drivers/irqchip/irq-gic-v3-mbi.c |1 -
 drivers/pci/msi.c|2 +-
 include/linux/msi.h  |1 -
 5 files changed, 1 insertion(+), 5 deletions(-)

--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -88,7 +88,6 @@ static struct irq_chip gicv2m_msi_irq_ch
.irq_mask   = gicv2m_mask_msi_irq,
.irq_unmask = gicv2m_unmask_msi_irq,
.irq_eoi= irq_chip_eoi_parent,
-   .irq_write_msi_msg  = pci_msi_domain_write_msg,
 };
 
 static struct msi_domain_info gicv2m_msi_domain_info = {
--- a/drivers/irqchip/irq-gic-v3-its-pci-msi.c
+++ b/drivers/irqchip/irq-gic-v3-its-pci-msi.c
@@ -28,7 +28,6 @@ static struct irq_chip its_msi_irq_chip
.irq_unmask = its_unmask_msi_irq,
.irq_mask   = its_mask_msi_irq,
.irq_eoi= irq_chip_eoi_parent,
-   .irq_write_msi_msg  = pci_msi_domain_write_msg,
 };
 
 static int its_pci_msi_vec_count(struct pci_dev *pdev, void *data)
--- a/drivers/irqchip/irq-gic-v3-mbi.c
+++ b/drivers/irqchip/irq-gic-v3-mbi.c
@@ -171,7 +171,6 @@ static struct irq_chip mbi_msi_irq_chip
.irq_unmask = mbi_unmask_msi_irq,
.irq_eoi= irq_chip_eoi_parent,
.irq_compose_msi_msg= mbi_compose_msi_msg,
-   .irq_write_msi_msg  = pci_msi_domain_write_msg,
 };
 
 static struct msi_domain_info mbi_msi_domain_info = {
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1267,7 +1267,7 @@ EXPORT_SYMBOL_GPL(msi_desc_to_pci_sysdat
  * @irq_data:  Pointer to interrupt data of the MSI interrupt
  * @msg:   Pointer to the message
  */
-void pci_msi_domain_write_msg(struct irq_data *irq_data, struct msi_msg *msg)
+static void pci_msi_domain_write_msg(struct irq_data *irq_data, struct msi_msg *msg)
 {
struct msi_desc *desc = irq_data_get_msi_desc(irq_data);
 
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -454,7 +454,6 @@ void *platform_msi_get_host_data(struct
 #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
 
 #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
-void pci_msi_domain_write_msg(struct irq_data *irq_data, struct msi_msg *msg);
 struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
 struct msi_domain_info *info,
 struct irq_domain *parent);



[patch 04/22] genirq/msi: Remove unused domain callbacks

2021-11-26 Thread Thomas Gleixner
No users and there is no need to grow them.

Signed-off-by: Thomas Gleixner 
---
 include/linux/msi.h |   11 ---
 kernel/irq/msi.c|5 -
 2 files changed, 4 insertions(+), 12 deletions(-)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -304,7 +304,6 @@ struct msi_domain_info;
  * @msi_free:  Domain specific function to free a MSI interrupts
  * @msi_check: Callback for verification of the domain/info/dev data
  * @msi_prepare:   Prepare the allocation of the interrupts in the domain
- * @msi_finish:Optional callback to finalize the allocation
  * @set_desc:  Set the msi descriptor for an interrupt
  * @handle_error:  Optional error handler if the allocation fails
  * @domain_alloc_irqs: Optional function to override the default allocation
@@ -312,12 +311,11 @@ struct msi_domain_info;
  * @domain_free_irqs:  Optional function to override the default free
  * function.
  *
- * @get_hwirq, @msi_init and @msi_free are callbacks used by
- * msi_create_irq_domain() and related interfaces
+ * @get_hwirq, @msi_init and @msi_free are callbacks used by the underlying
+ * irqdomain.
  *
- * @msi_check, @msi_prepare, @msi_finish, @set_desc and @handle_error
- * are callbacks used by msi_domain_alloc_irqs() and related
- * interfaces which are based on msi_desc.
+ * @msi_check, @msi_prepare, @handle_error and @set_desc are callbacks used by
+ * msi_domain_alloc/free_irqs().
  *
  * @domain_alloc_irqs, @domain_free_irqs can be used to override the
  * default allocation/free functions (__msi_domain_alloc/free_irqs). This
@@ -351,7 +349,6 @@ struct msi_domain_ops {
int (*msi_prepare)(struct irq_domain *domain,
   struct device *dev, int nvec,
   msi_alloc_info_t *arg);
-   void(*msi_finish)(msi_alloc_info_t *arg, int retval);
void(*set_desc)(msi_alloc_info_t *arg,
struct msi_desc *desc);
int (*handle_error)(struct irq_domain *domain,
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -562,8 +562,6 @@ int __msi_domain_alloc_irqs(struct irq_d
ret = -ENOSPC;
if (ops->handle_error)
ret = ops->handle_error(domain, desc, ret);
-   if (ops->msi_finish)
-   ops->msi_finish(, ret);
return ret;
}
 
@@ -573,9 +571,6 @@ int __msi_domain_alloc_irqs(struct irq_d
}
}
 
-   if (ops->msi_finish)
-   ops->msi_finish(, 0);
-
can_reserve = msi_check_reservation_mode(domain, info, dev);
 
/*



[patch 05/22] genirq/msi: Fixup includes

2021-11-26 Thread Thomas Gleixner
Remove the kobject.h include from msi.h as it's not required and add a
sysfs.h include to the core code instead.

Signed-off-by: Thomas Gleixner 
---
 include/linux/msi.h |1 -
 kernel/irq/msi.c|1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -2,7 +2,6 @@
 #ifndef LINUX_MSI_H
 #define LINUX_MSI_H
 
-#include <linux/kobject.h>
 #include 
 #include 
 
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include <linux/sysfs.h>
 #include 
 
 #include "internals.h"



[patch 01/22] powerpc/4xx: Remove MSI support which never worked

2021-11-26 Thread Thomas Gleixner
This code is broken since day one. ppc4xx_setup_msi_irqs() has the
following gems:

 1) The handling of the result of msi_bitmap_alloc_hwirqs() is completely
broken:

When the result is greater than or equal to 0 (bitmap allocation
successful) then the loop terminates and the function returns 0
(success) despite not having installed an interrupt.

When the result is less than 0 (bitmap allocation fails), it prints an
error message and continues to "work" with that error code which would
eventually end up in the MSI message data.

 2) On every invocation the file global ppc4xx_msi::msi_virqs bitmap is
allocated thereby leaking the previous one.
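
For illustration only, the control flow described in 1) can be modeled in
plain userspace C. This is a sketch, not the removed driver code:
fake_bitmap, fake_alloc_hwirq(), broken_setup() and expected_setup() are
hypothetical names, and "installing" an interrupt is reduced to bumping a
counter.

```c
#include <errno.h>

/* Hypothetical allocator stub: hands out hwirq numbers 0..avail-1 and
 * fails with -ENOMEM once they are used up. */
struct fake_bitmap {
	int next;
	int avail;
};

int fake_alloc_hwirq(struct fake_bitmap *bmp)
{
	return bmp->next < bmp->avail ? bmp->next++ : -ENOMEM;
}

/* Mirrors the control flow of the removed loop: a successful allocation
 * breaks out of the loop before anything has been installed, while a
 * failed one falls through into the setup code. */
int broken_setup(struct fake_bitmap *bmp, int nvec, int *installed)
{
	*installed = 0;
	for (int i = 0; i < nvec; i++) {
		int hwirq = fake_alloc_hwirq(bmp);

		if (hwirq >= 0)
			break;		/* success path skips the setup below */
		/* failure path: carries on with the negative error code */
		(*installed)++;
	}
	return 0;			/* always reports success */
}

/* What a reader would expect instead: bail out on failure, install on
 * success. */
int expected_setup(struct fake_bitmap *bmp, int nvec, int *installed)
{
	*installed = 0;
	for (int i = 0; i < nvec; i++) {
		int hwirq = fake_alloc_hwirq(bmp);

		if (hwirq < 0)
			return hwirq;
		(*installed)++;
	}
	return 0;
}
```

With a non-empty bitmap broken_setup() returns 0 with nothing installed;
with an exhausted bitmap it still returns 0 after "installing" error codes.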

IOW, this has never worked and for more than 10 years nobody cared. Remove
the gunk.

Fixes: 3fb7933850fa ("powerpc/4xx: Adding PCIe MSI support")
Fixes: 247540b03bfc ("powerpc/44x: Fix PCI MSI support for Maui APM821xx SoC and Bluestone board")
Signed-off-by: Thomas Gleixner 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/platforms/4xx/Makefile |1 
 arch/powerpc/platforms/4xx/msi.c|  281 
 arch/powerpc/sysdev/Kconfig |6 
 3 files changed, 288 deletions(-)

--- a/arch/powerpc/platforms/4xx/Makefile
+++ b/arch/powerpc/platforms/4xx/Makefile
@@ -3,6 +3,5 @@ obj-y   += uic.o machine_check.o
 obj-$(CONFIG_4xx_SOC)  += soc.o
 obj-$(CONFIG_PCI)  += pci.o
 obj-$(CONFIG_PPC4xx_HSTA_MSI)  += hsta_msi.o
-obj-$(CONFIG_PPC4xx_MSI)   += msi.o
 obj-$(CONFIG_PPC4xx_CPM)   += cpm.o
 obj-$(CONFIG_PPC4xx_GPIO)  += gpio.o
--- a/arch/powerpc/platforms/4xx/msi.c
+++ /dev/null
@@ -1,281 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * Adding PCI-E MSI support for PPC4XX SoCs.
- *
- * Copyright (c) 2010, Applied Micro Circuits Corporation
- * Authors:Tirumala R Marri 
- * Feng Kan 
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#define PEIH_TERMADH   0x00
-#define PEIH_TERMADL   0x08
-#define PEIH_MSIED 0x10
-#define PEIH_MSIMK 0x18
-#define PEIH_MSIASS0x20
-#define PEIH_FLUSH00x30
-#define PEIH_FLUSH10x38
-#define PEIH_CNTRST0x48
-
-static int msi_irqs;
-
-struct ppc4xx_msi {
-   u32 msi_addr_lo;
-   u32 msi_addr_hi;
-   void __iomem *msi_regs;
-   int *msi_virqs;
-   struct msi_bitmap bitmap;
-   struct device_node *msi_dev;
-};
-
-static struct ppc4xx_msi ppc4xx_msi;
-
-static int ppc4xx_msi_init_allocator(struct platform_device *dev,
-   struct ppc4xx_msi *msi_data)
-{
-   int err;
-
-   err = msi_bitmap_alloc(&msi_data->bitmap, msi_irqs,
- dev->dev.of_node);
-   if (err)
-   return err;
-
-   err = msi_bitmap_reserve_dt_hwirqs(&msi_data->bitmap);
-   if (err < 0) {
-   msi_bitmap_free(&msi_data->bitmap);
-   return err;
-   }
-
-   return 0;
-}
-
-static int ppc4xx_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
-{
-   int int_no = -ENOMEM;
-   unsigned int virq;
-   struct msi_msg msg;
-   struct msi_desc *entry;
-   struct ppc4xx_msi *msi_data = &ppc4xx_msi;
-
-   dev_dbg(&dev->dev, "PCIE-MSI:%s called. vec %x type %d\n",
-   __func__, nvec, type);
-   if (type == PCI_CAP_ID_MSIX)
-   pr_debug("ppc4xx msi: MSI-X untested, trying anyway.\n");
-
-   msi_data->msi_virqs = kmalloc_array(msi_irqs, sizeof(int), GFP_KERNEL);
-   if (!msi_data->msi_virqs)
-   return -ENOMEM;
-
-   for_each_pci_msi_entry(entry, dev) {
-   int_no = msi_bitmap_alloc_hwirqs(&msi_data->bitmap, 1);
-   if (int_no >= 0)
-   break;
-   if (int_no < 0) {
-   pr_debug("%s: fail allocating msi interrupt\n",
-   __func__);
-   }
-   virq = irq_of_parse_and_map(msi_data->msi_dev, int_no);
-   if (!virq) {
-   dev_err(&dev->dev, "%s: fail mapping irq\n", __func__);
-   msi_bitmap_free_hwirqs(&msi_data->bitmap, int_no, 1);
-   return -ENOSPC;
-   }
-   dev_dbg(&dev->dev, "%s: virq = %d\n", __func__, virq);
-
-   /* Setup msi address space */
-   msg.address_hi = msi_data->msi_addr_hi;
-   msg.address_lo = msi_data->msi_addr_lo;
-
-   irq_set_msi_desc(virq, entry);
-   msg.data = int_no;
-   pci_write_msi_msg(virq, &msg);
-   }
-   return 0;
-}
-
-void ppc4xx_teardown_msi_irqs(struct pci_dev *dev)
-{
-   struct msi_desc *entry;
-   struct ppc4xx_msi *msi_data = &ppc4xx_msi;
-   irq_hw_number_t hwirq;
-
-   dev_dbg(&dev->dev, "PCIE-MSI: tearing down msi irqs\n");
-
-   

[patch 00/22] genirq/msi, PCI/MSI: Spring cleaning - Part 1

2021-11-26 Thread Thomas Gleixner
The [PCI] MSI code has gained quite some warts over time. A recent
discussion unearthed a shortcoming: the lack of support for expanding
PCI/MSI-X vectors after initialization of MSI-X.

PCI/MSI-X has no requirement to setup all vectors when MSI-X is enabled in
the device. The non-used vectors have just to be masked in the vector
table. For PCI/MSI this is not possible because the number of vectors
cannot be changed after initialization.
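
As a rough illustration of "masked in the vector table": each MSI-X table
entry consists of four 32-bit words, and bit 0 of the vector control word
is the per-vector mask bit. The sketch below models the table as ordinary
memory; real code accesses it through MMIO accessors, and
mask_unused_entries() as well as the struct and macro names are made up
for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* One MSI-X table entry: message address (low/high), message data and
 * vector control. Bit 0 of vector control masks the vector. */
struct msix_table_entry {
	uint32_t addr_lo;
	uint32_t addr_hi;
	uint32_t data;
	uint32_t vector_ctrl;
};

#define MSIX_ENTRY_CTRL_MASKBIT 0x1u

/* Hypothetical helper: leave entries [0, used) untouched and set the mask
 * bit on every remaining entry of the table. */
void mask_unused_entries(struct msix_table_entry *table, size_t total,
			 size_t used)
{
	for (size_t i = used; i < total; i++)
		table[i].vector_ctrl |= MSIX_ENTRY_CTRL_MASKBIT;
}
```

A device with, say, four table entries of which only the first is set up
would end up with entries 1 to 3 masked and entry 0 unchanged.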

The PCI/MSI code, but also the core MSI irq domain code are built around
the assumption that all required vectors are installed at initialization
time and freed when the device is shut down by the driver.

Supporting dynamic expansion at least for MSI-X is important for VFIO so
that the host side interrupts for passthrough devices can be installed on
demand.

This is the first part of a large (total 101 patches) series which
refactors the [PCI]MSI infrastructure to make runtime expansion of MSI-X
vectors possible. The last part (10 patches) provides this functionality.

The first part is mostly a cleanup which consolidates code, moves the PCI
MSI code into a separate directory and splits it up into several parts.

No functional change intended except for patch 2/N which changes the
behaviour of pci_irq_vector()/pci_irq_get_affinity() to get rid of the
assumption that the provided index is the "index" into the descriptor list
instead of using it as the actual MSI[X] index as seen by the hardware. This
would break users of sparse allocated MSI-X entries, but none of them use
these functions.

This series is based on 5.16-rc2 and also available via git:

 git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v1-part-1

For the curious who can't wait for the next part to arrive the full series
is available via:

 git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v1-part-4

Thanks,

tglx
---
 arch/powerpc/platforms/4xx/msi.c|  281 
 b/Documentation/driver-api/pci/pci.rst  |2 
 b/arch/mips/pci/msi-octeon.c|   32 -
 b/arch/powerpc/platforms/4xx/Makefile   |1 
 b/arch/powerpc/platforms/cell/axon_msi.c|2 
 b/arch/powerpc/platforms/powernv/pci-ioda.c |4 
 b/arch/powerpc/platforms/pseries/msi.c  |6 
 b/arch/powerpc/sysdev/Kconfig   |6 
 b/arch/s390/pci/pci_irq.c   |4 
 b/arch/sparc/kernel/pci_msi.c   |4 
 b/arch/x86/hyperv/irqdomain.c   |   55 --
 b/arch/x86/include/asm/x86_init.h   |6 
 b/arch/x86/include/asm/xen/hypervisor.h |8 
 b/arch/x86/kernel/apic/msi.c|8 
 b/arch/x86/kernel/x86_init.c|   12 
 b/arch/x86/pci/xen.c|   19 
 b/drivers/irqchip/irq-gic-v2m.c |1 
 b/drivers/irqchip/irq-gic-v3-its-pci-msi.c  |1 
 b/drivers/irqchip/irq-gic-v3-mbi.c  |1 
 b/drivers/net/wireless/ath/ath11k/pci.c |2 
 b/drivers/pci/Makefile  |3 
 b/drivers/pci/msi/Makefile  |7 
 b/drivers/pci/msi/irqdomain.c   |  267 +++
 b/drivers/pci/msi/legacy.c  |   79 +++
 b/drivers/pci/msi/msi.c |  645 
 b/drivers/pci/msi/msi.h |   39 +
 b/drivers/pci/msi/pcidev_msi.c  |   43 +
 b/drivers/pci/pci-sysfs.c   |7 
 b/drivers/pci/xen-pcifront.c|2 
 b/include/linux/msi.h   |  135 ++---
 b/include/linux/pci.h   |1 
 b/kernel/irq/msi.c  |   41 +
 32 files changed, 696 insertions(+), 1028 deletions(-)


[patch 03/22] genirq/msi: Guard sysfs code

2021-11-26 Thread Thomas Gleixner
No point in building unused code when CONFIG_SYSFS=n.

Signed-off-by: Thomas Gleixner 
---
 include/linux/msi.h |   10 ++
 kernel/irq/msi.c|2 ++
 2 files changed, 12 insertions(+)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -239,9 +239,19 @@ void __pci_write_msi_msg(struct msi_desc
 void pci_msi_mask_irq(struct irq_data *data);
 void pci_msi_unmask_irq(struct irq_data *data);
 
+#ifdef CONFIG_SYSFS
 const struct attribute_group **msi_populate_sysfs(struct device *dev);
 void msi_destroy_sysfs(struct device *dev,
   const struct attribute_group **msi_irq_groups);
+#else
+static inline const struct attribute_group **msi_populate_sysfs(struct device *dev)
+{
+   return NULL;
+}
+static inline void msi_destroy_sysfs(struct device *dev, const struct attribute_group **msi_irq_groups)
+{
+}
+#endif
 
 /*
  * The arch hooks to setup up msi irqs. Default functions are implemented
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -72,6 +72,7 @@ void get_cached_msi_msg(unsigned int irq
 }
 EXPORT_SYMBOL_GPL(get_cached_msi_msg);
 
+#ifdef CONFIG_SYSFS
 static ssize_t msi_mode_show(struct device *dev, struct device_attribute *attr,
 char *buf)
 {
@@ -204,6 +205,7 @@ void msi_destroy_sysfs(struct device *de
kfree(msi_irq_groups);
}
 }
+#endif
 
 #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
 static inline void irq_chip_write_msi_msg(struct irq_data *data,



[patch 02/22] PCI/MSI: Fix pci_irq_vector()/pci_irq_get_affinity()

2021-11-26 Thread Thomas Gleixner
pci_irq_vector() and pci_irq_get_affinity() use the list position to find the
MSI-X descriptor at a given index. That's correct for the normal case where
the entry number is the same as the list position.

But it's wrong for cases where MSI-X was allocated with an entries array
describing sparse entry numbers into the hardware message descriptor
table. That's inconsistent at best.

Make it always check the entry number because that's what the zero base
index really means. This change won't break existing users which use a
sparse entries array for allocation because these users retrieve the Linux
interrupt number from the entries array after allocation and none of them
uses pci_irq_vector() or pci_irq_get_affinity().
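
To make the difference concrete, here is a minimal userspace sketch (not
the kernel code). struct fake_msi_desc, find_by_position() and
find_by_entry_nr() are hypothetical names for this example, the descriptor
list is modeled as a plain array, and -1 stands in for the error return.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct msi_desc; only the two fields that
 * matter for the lookup are modeled. */
struct fake_msi_desc {
	unsigned int entry_nr;	/* index into the hardware MSI-X table */
	int irq;		/* Linux interrupt number */
};

/* Old behaviour: @nr is treated as the position in the descriptor list. */
int find_by_position(const struct fake_msi_desc *descs, size_t n,
		     unsigned int nr)
{
	return nr < n ? descs[nr].irq : -1;
}

/* New behaviour: @nr is matched against the hardware entry number. */
int find_by_entry_nr(const struct fake_msi_desc *descs, size_t n,
		     unsigned int nr)
{
	for (size_t i = 0; i < n; i++) {
		if (descs[i].entry_nr == nr)
			return descs[i].irq;
	}
	return -1;		/* the kernel returns -EINVAL here */
}

/* Sparse allocation: hardware entries 0, 2 and 5 were requested. */
const struct fake_msi_desc sparse[] = {
	{ .entry_nr = 0, .irq = 40 },
	{ .entry_nr = 2, .irq = 41 },
	{ .entry_nr = 5, .irq = 42 },
};
```

With this table, index 2 resolves to irq 42 by list position but to irq 41
by entry number, and index 1 is "found" by position although hardware
entry 1 was never allocated.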

Fixes: aff171641d18 ("PCI: Provide sensible IRQ vector alloc/free routines")
Signed-off-by: Thomas Gleixner 
---
 drivers/pci/msi.c |   26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1187,19 +1187,24 @@ EXPORT_SYMBOL(pci_free_irq_vectors);
 
 /**
  * pci_irq_vector - return Linux IRQ number of a device vector
- * @dev: PCI device to operate on
- * @nr: device-relative interrupt vector index (0-based).
+ * @dev:   PCI device to operate on
+ * @nr:Interrupt vector index (0-based)
+ *
+ * @nr has the following meanings depending on the interrupt mode:
+ *   MSI-X:The index in the MSI-X vector table
+ *   MSI:  The index of the enabled MSI vectors
+ *   INTx: Must be 0
+ *
+ * Return: The Linux interrupt number or -EINVAL if @nr is out of range.
  */
 int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
 {
if (dev->msix_enabled) {
struct msi_desc *entry;
-   int i = 0;
 
for_each_pci_msi_entry(entry, dev) {
-   if (i == nr)
+   if (entry->msi_attrib.entry_nr == nr)
return entry->irq;
-   i++;
}
WARN_ON_ONCE(1);
return -EINVAL;
@@ -1223,17 +1228,22 @@ EXPORT_SYMBOL(pci_irq_vector);
  * pci_irq_get_affinity - return the affinity of a particular MSI vector
  * @dev:   PCI device to operate on
  * @nr:device-relative interrupt vector index (0-based).
+ *
+ * @nr has the following meanings depending on the interrupt mode:
+ *   MSI-X:The index in the MSI-X vector table
+ *   MSI:  The index of the enabled MSI vectors
+ *   INTx: Must be 0
+ *
+ * Return: A cpumask pointer or NULL if @nr is out of range
  */
 const struct cpumask *pci_irq_get_affinity(struct pci_dev *dev, int nr)
 {
if (dev->msix_enabled) {
struct msi_desc *entry;
-   int i = 0;
 
for_each_pci_msi_entry(entry, dev) {
-   if (i == nr)
+   if (entry->msi_attrib.entry_nr == nr)
return &entry->affinity->mask;
-   i++;
}
WARN_ON_ONCE(1);
return NULL;



Re: [PATCH v2 1/2] powerpc: handle kdump appropriately with crash_kexec_post_notifiers option

2021-11-26 Thread kernel test robot
Hi Hari,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.16-rc2 next-20211126]
[cannot apply to mpe/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Hari-Bathini/powerpc-handle-kdump-appropriately-with-crash_kexec_post_notifiers-option/20211126-021120
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-iss476-smp_defconfig (https://download.01.org/0day-ci/archive/20211127/202111270740.t4qbma4l-...@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/112b5fcac650e78c2130b7f43ef66d965e69623e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Hari-Bathini/powerpc-handle-kdump-appropriately-with-crash_kexec_post_notifiers-option/20211126-021120
git checkout 112b5fcac650e78c2130b7f43ef66d965e69623e
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=powerpc SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/smp.c: In function 'crash_smp_send_stop':
>> arch/powerpc/kernel/smp.c:645:27: error: passing argument 1 of 'smp_call_function' from incompatible pointer type [-Werror=incompatible-pointer-types]
 645 | smp_call_function(crash_stop_this_cpu, NULL, 0);
 |   ^~~
 |   |
 |   void (*)(struct pt_regs *)
   In file included from include/linux/lockdep.h:14,
from include/linux/rcupdate.h:29,
from include/linux/rculist.h:11,
from include/linux/pid.h:5,
from include/linux/sched.h:14,
from include/linux/sched/mm.h:7,
from arch/powerpc/kernel/smp.c:18:
   include/linux/smp.h:149:40: note: expected 'smp_call_func_t' {aka 'void (*)(void *)'} but argument is of type 'void (*)(struct pt_regs *)'
 149 | void smp_call_function(smp_call_func_t func, void *info, int wait);
 |^~~~
   cc1: all warnings being treated as errors

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for HOTPLUG_CPU
   Depends on SMP && (PPC_PSERIES || PPC_PMAC || PPC_POWERNV || FSL_SOC_BOOKE
   Selected by
   - PM_SLEEP_SMP && SMP && (ARCH_SUSPEND_POSSIBLE || ARCH_HIBERNATION_POSSIBLE 
&& PM_SLEEP


vim +/smp_call_function +645 arch/powerpc/kernel/smp.c

   632  
   633  void crash_smp_send_stop(void)
   634  {
   635  static bool stopped = false;
   636  
   637  if (stopped)
   638  return;
   639  
   640  stopped = true;
   641  
   642  #ifdef CONFIG_NMI_IPI
   643  smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_stop_this_cpu, 100);
   644  #else
 > 645  smp_call_function(crash_stop_this_cpu, NULL, 0);
   646  #endif /* CONFIG_NMI_IPI */
   647  }
   648  
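
Editorial note: the error above comes from passing a handler of type
void (*)(struct pt_regs *) where smp_call_function() expects
void (*)(void *). One common way to resolve this kind of mismatch is a thin
adapter with the expected signature. The following is a userspace sketch
only; struct pt_regs is a stand-in, and stop_cpu(), stop_cpu_ipi() and
call_on_others() are hypothetical names, not kernel functions and not
necessarily the fix the patch author chose.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct pt_regs; contents are irrelevant here. */
struct pt_regs { unsigned long nip; };

/* Same shape as the kernel's smp_call_func_t: void (*)(void *). */
typedef void (*call_func_t)(void *info);

static int stop_calls;	/* counts invocations so the sketch is observable */

/* Handler written for the NMI IPI path: it takes register state. */
static void stop_cpu(struct pt_regs *regs)
{
	(void)regs;
	stop_calls++;
}

/* Adapter with the signature the generic cross-call API expects. */
static void stop_cpu_ipi(void *unused)
{
	(void)unused;
	stop_cpu(NULL);
}

/* Hypothetical stand-in for smp_call_function(): calls func once. */
static void call_on_others(call_func_t func, void *info)
{
	func(info);
}

/* Passing stop_cpu directly would be a pointer-type error; the adapter is not. */
static int run_example(void)
{
	call_on_others(stop_cpu_ipi, NULL);
	return stop_calls;
}
```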

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


[PATCH v4 25/25] reboot: Remove pm_power_off_prepare()

2021-11-26 Thread Dmitry Osipenko
All pm_power_off_prepare() users were converted to the sys-off handler API.
Remove the obsolete callback.

Signed-off-by: Dmitry Osipenko 
---
 include/linux/pm.h |  1 -
 kernel/reboot.c| 11 ---
 2 files changed, 12 deletions(-)

diff --git a/include/linux/pm.h b/include/linux/pm.h
index 1d8209c09686..d9bf1426f81e 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -20,7 +20,6 @@
  * Callbacks for platform drivers to implement.
  */
 extern void (*pm_power_off)(void);
-extern void (*pm_power_off_prepare)(void);
 
 struct device; /* we have a circular dep with device.h */
 #ifdef CONFIG_VT_CONSOLE_SLEEP
diff --git a/kernel/reboot.c b/kernel/reboot.c
index 4884204f9a31..a832bb660040 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -48,13 +48,6 @@ int reboot_cpu;
 enum reboot_type reboot_type = BOOT_ACPI;
 int reboot_force;
 
-/*
- * If set, this is used for preparing the system to power off.
- */
-
-void (*pm_power_off_prepare)(void);
-EXPORT_SYMBOL_GPL(pm_power_off_prepare);
-
 /**
  * emergency_restart - reboot the system
  *
@@ -807,10 +800,6 @@ void do_kernel_power_off(void)
 
 static void do_kernel_power_off_prepare(void)
 {
-   /* legacy pm_power_off_prepare() is unchained and has highest priority */
-   if (pm_power_off_prepare)
-   return pm_power_off_prepare();
-
blocking_notifier_call_chain(&power_off_handler_list, POWEROFF_PREPARE,
 NULL);
 }
-- 
2.33.1



[PATCH v4 24/25] regulator: pfuze100: Use devm_register_sys_off_handler()

2021-11-26 Thread Dmitry Osipenko
Use devm_register_sys_off_handler(), which replaces the global
pm_power_off_prepare variable and allows registering multiple
power-off handlers.

Acked-by: Mark Brown 
Signed-off-by: Dmitry Osipenko 
---
 drivers/regulator/pfuze100-regulator.c | 38 ++
 1 file changed, 14 insertions(+), 24 deletions(-)

diff --git a/drivers/regulator/pfuze100-regulator.c b/drivers/regulator/pfuze100-regulator.c
index d60d7d1b7fa2..2eca8d43a097 100644
--- a/drivers/regulator/pfuze100-regulator.c
+++ b/drivers/regulator/pfuze100-regulator.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -76,6 +77,7 @@ struct pfuze_chip {
struct pfuze_regulator regulator_descs[PFUZE100_MAX_REGULATOR];
struct regulator_dev *regulators[PFUZE100_MAX_REGULATOR];
struct pfuze_regulator *pfuze_regulators;
+   struct sys_off_handler sys_off;
 };
 
 static const int pfuze100_swbst[] = {
@@ -569,10 +571,10 @@ static inline struct device_node *match_of_node(int index)
return pfuze_matches[index].of_node;
 }
 
-static struct pfuze_chip *syspm_pfuze_chip;
-
-static void pfuze_power_off_prepare(void)
+static void pfuze_power_off_prepare(struct power_off_prep_data *data)
 {
+   struct pfuze_chip *syspm_pfuze_chip = data->cb_data;
+
dev_info(syspm_pfuze_chip->dev, "Configure standby mode for power off");
 
/* Switch from default mode: APS/APS to APS/Off */
@@ -611,24 +613,23 @@ static void pfuze_power_off_prepare(void)
 
 static int pfuze_power_off_prepare_init(struct pfuze_chip *pfuze_chip)
 {
+   int err;
+
if (pfuze_chip->chip_id != PFUZE100) {
dev_warn(pfuze_chip->dev, "Requested pm_power_off_prepare handler for not supported chip\n");
return -ENODEV;
}
 
-   if (pm_power_off_prepare) {
-   dev_warn(pfuze_chip->dev, "pm_power_off_prepare is already registered.\n");
-   return -EBUSY;
-   }
+   pfuze_chip->sys_off.power_off_prepare_cb = pfuze_power_off_prepare;
+   pfuze_chip->sys_off.cb_data = pfuze_chip;
 
-   if (syspm_pfuze_chip) {
-   dev_warn(pfuze_chip->dev, "syspm_pfuze_chip is already set.\n");
-   return -EBUSY;
+   err = devm_register_sys_off_handler(pfuze_chip->dev, &pfuze_chip->sys_off);
+   if (err) {
+   dev_err(pfuze_chip->dev,
+   "failed to register sys-off handler: %d\n", err);
+   return err;
}
 
-   syspm_pfuze_chip = pfuze_chip;
-   pm_power_off_prepare = pfuze_power_off_prepare;
-
return 0;
 }
 
@@ -837,23 +838,12 @@ static int pfuze100_regulator_probe(struct i2c_client *client,
return 0;
 }
 
-static int pfuze100_regulator_remove(struct i2c_client *client)
-{
-   if (syspm_pfuze_chip) {
-   syspm_pfuze_chip = NULL;
-   pm_power_off_prepare = NULL;
-   }
-
-   return 0;
-}
-
 static struct i2c_driver pfuze_driver = {
.driver = {
.name = "pfuze100-regulator",
.of_match_table = pfuze_dt_ids,
},
.probe = pfuze100_regulator_probe,
-   .remove = pfuze100_regulator_remove,
 };
 module_i2c_driver(pfuze_driver);
 
-- 
2.33.1
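
Editorial note: the conversion above drops the file-scope syspm_pfuze_chip
pointer because the callback now receives its device through cb_data. The
sketch below illustrates that pattern in plain userspace C. The struct and
function names (prep_data, handler, chip, invoke_prepare) are simplified,
hypothetical stand-ins for the types in the patch, and the dispatch logic is
an assumption about how a core would fill in the argument.

```c
#include <stddef.h>

/* Hypothetical, simplified mirror of the callback argument in the patch. */
struct prep_data { void *cb_data; };

struct handler {
	void (*prepare_cb)(struct prep_data *data);
	void *cb_data;
};

/* Stand-in for per-device driver state (struct pfuze_chip in the patch). */
struct chip { int id; int standby_configured; };

/* The callback recovers its device from cb_data instead of a global. */
static void chip_power_off_prepare(struct prep_data *data)
{
	struct chip *chip = data->cb_data;

	chip->standby_configured = 1;
}

/* What a core might do when invoking one registered handler. */
static void invoke_prepare(struct handler *h)
{
	struct prep_data data = { .cb_data = h->cb_data };

	if (h->prepare_cb)
		h->prepare_cb(&data);
}

/* Two devices, two handlers; only the invoked handler's device is touched. */
static int demo_two_chips(void)
{
	struct chip a = { .id = 1 }, b = { .id = 2 };
	struct handler ha = { chip_power_off_prepare, &a };
	struct handler hb = { chip_power_off_prepare, &b };

	invoke_prepare(&ha);
	(void)hb;
	return a.standby_configured * 10 + b.standby_configured;
}
```

Because each handler carries its own cb_data, nothing prevents two instances
from coexisting, which is what makes the "already registered" checks in the
old code unnecessary.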



[PATCH v4 23/25] ACPI: power: Switch to sys-off handler API

2021-11-26 Thread Dmitry Osipenko
Switch to the sys-off API, which replaces the legacy pm_power_off callbacks.

Signed-off-by: Dmitry Osipenko 
---
 drivers/acpi/sleep.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c
index eaa47753b758..2e613fddd614 100644
--- a/drivers/acpi/sleep.c
+++ b/drivers/acpi/sleep.c
@@ -47,19 +47,11 @@ static void acpi_sleep_tts_switch(u32 acpi_state)
}
 }
 
-static int tts_notify_reboot(struct notifier_block *this,
-   unsigned long code, void *x)
+static void tts_reboot_prepare(struct reboot_prep_data *data)
 {
acpi_sleep_tts_switch(ACPI_STATE_S5);
-   return NOTIFY_DONE;
 }
 
-static struct notifier_block tts_notifier = {
-   .notifier_call  = tts_notify_reboot,
-   .next   = NULL,
-   .priority   = 0,
-};
-
 static int acpi_sleep_prepare(u32 acpi_state)
 {
 #ifdef CONFIG_ACPI_SLEEP
@@ -1020,7 +1012,7 @@ static void acpi_sleep_hibernate_setup(void)
 static inline void acpi_sleep_hibernate_setup(void) {}
 #endif /* !CONFIG_HIBERNATION */
 
-static void acpi_power_off_prepare(void)
+static void acpi_power_off_prepare(struct power_off_prep_data *data)
 {
/* Prepare to power off the system */
acpi_sleep_prepare(ACPI_STATE_S5);
@@ -1028,7 +1020,7 @@ static void acpi_power_off_prepare(void)
acpi_os_wait_events_complete();
 }
 
-static void acpi_power_off(void)
+static void acpi_power_off(struct power_off_data *data)
 {
/* acpi_sleep_prepare(ACPI_STATE_S5) should have already been called */
pr_debug("%s called\n", __func__);
@@ -1036,6 +1028,11 @@ static void acpi_power_off(void)
acpi_enter_sleep_state(ACPI_STATE_S5);
 }
 
+static struct sys_off_handler acpi_sys_off_handler = {
+   .power_off_priority = POWEROFF_PRIO_FIRMWARE,
+   .reboot_prepare_cb = tts_reboot_prepare,
+};
+
 int __init acpi_sleep_init(void)
 {
char supported[ACPI_S_STATE_COUNT * 3 + 1];
@@ -1052,8 +1049,8 @@ int __init acpi_sleep_init(void)
 
if (acpi_sleep_state_supported(ACPI_STATE_S5)) {
sleep_states[ACPI_STATE_S5] = 1;
-   pm_power_off_prepare = acpi_power_off_prepare;
-   pm_power_off = acpi_power_off;
+   acpi_sys_off_handler.power_off_cb = acpi_power_off;
+   acpi_sys_off_handler.power_off_prepare_cb = acpi_power_off_prepare;
} else {
acpi_no_s5 = true;
}
@@ -1069,6 +1066,6 @@ int __init acpi_sleep_init(void)
 * Register the tts_notifier to reboot notifier list so that the _TTS
 * object can also be evaluated when the system enters S5.
 */
-   register_reboot_notifier(&tts_notifier);
+   register_sys_off_handler(&acpi_sys_off_handler);
return 0;
 }
-- 
2.33.1



[PATCH v4 22/25] memory: emif: Use kernel_can_power_off()

2021-11-26 Thread Dmitry Osipenko
Replace the legacy pm_power_off check with the kernel_can_power_off()
helper, which is aware of chained power-off handlers.

Signed-off-by: Dmitry Osipenko 
---
 drivers/memory/emif.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/memory/emif.c b/drivers/memory/emif.c
index 762d0c0f0716..cab10d5274a0 100644
--- a/drivers/memory/emif.c
+++ b/drivers/memory/emif.c
@@ -630,7 +630,7 @@ static irqreturn_t emif_threaded_isr(int irq, void *dev_id)
dev_emerg(emif->dev, "SDRAM temperature exceeds operating limit.. Needs shut down!!!\n");
 
/* If we have Power OFF ability, use it, else try restarting */
-   if (pm_power_off) {
+   if (kernel_can_power_off()) {
kernel_power_off();
} else {
WARN(1, "FIXME: NO pm_power_off!!! trying restart\n");
-- 
2.33.1
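
Editorial note: testing the pm_power_off pointer alone misses handlers that
were registered through the new chain. The userspace sketch below contrasts
the two checks. It is an assumption about the idea behind
kernel_can_power_off(), not its actual implementation; legacy_power_off,
chained_handlers and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Legacy single global callback, modelling pm_power_off. */
static void (*legacy_power_off)(void);

/* Hypothetical count of handlers registered on the new chain. */
static size_t chained_handlers;

/* Old-style check: only sees the legacy pointer. */
static bool legacy_can_power_off(void)
{
	return legacy_power_off != NULL;
}

/* Chain-aware check: true if either mechanism can power off. */
static bool can_power_off(void)
{
	return legacy_power_off != NULL || chained_handlers > 0;
}

/* A driver registers through the chain only; the legacy pointer stays NULL. */
static void register_chained_handler(void)
{
	chained_handlers++;
}
```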



[PATCH v4 21/25] nds32: Use do_kernel_power_off()

2021-11-26 Thread Dmitry Osipenko
The kernel now supports chained power-off handlers. Use
do_kernel_power_off(), which invokes the chained power-off handlers.
For now it also invokes the legacy pm_power_off(), which will be removed
once all drivers have been converted to the new power-off API.

Signed-off-by: Dmitry Osipenko 
---
 arch/nds32/kernel/process.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/nds32/kernel/process.c b/arch/nds32/kernel/process.c
index 49fab9e39cbf..0936dcd7db1b 100644
--- a/arch/nds32/kernel/process.c
+++ b/arch/nds32/kernel/process.c
@@ -54,8 +54,7 @@ EXPORT_SYMBOL(machine_halt);
 
 void machine_power_off(void)
 {
-   if (pm_power_off)
-   pm_power_off();
+   do_kernel_power_off();
 }
 
 EXPORT_SYMBOL(machine_power_off);
-- 
2.33.1
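
Editorial note: the arch conversions in this series all replace the
"if (pm_power_off) pm_power_off();" idiom with a single call. The userspace
sketch below shows one way such a call could dispatch: the legacy callback
first if it is set, otherwise the registered handlers in descending priority
order. The ordering is an assumption based on the "unchained and has highest
priority" comment visible in patch 25/25, and every name here is
hypothetical. On real hardware a successful handler does not return; in the
sketch all handlers return so the order can be observed.

```c
#include <stddef.h>

#define MAX_HANDLERS 8

struct off_handler {
	int priority;		/* higher runs first */
	void (*cb)(void);
};

static struct off_handler chain[MAX_HANDLERS];
static size_t nr_handlers;
static void (*legacy_power_off)(void);	/* models pm_power_off */

/* Insert while keeping the array sorted by descending priority. */
static int register_handler(int priority, void (*cb)(void))
{
	size_t i;

	if (nr_handlers == MAX_HANDLERS)
		return -1;
	for (i = nr_handlers; i > 0 && chain[i - 1].priority < priority; i--)
		chain[i] = chain[i - 1];
	chain[i].priority = priority;
	chain[i].cb = cb;
	nr_handlers++;
	return 0;
}

/* Assumed order: legacy callback first if set, otherwise walk the chain. */
static void power_off_sketch(void)
{
	size_t i;

	if (legacy_power_off) {
		legacy_power_off();
		return;
	}
	for (i = 0; i < nr_handlers; i++)
		chain[i].cb();
}

/* Trace helpers so the call order is observable. */
static char trace[8];
static size_t trace_len;
static void firmware_off(void) { trace[trace_len++] = 'F'; }
static void platform_off(void) { trace[trace_len++] = 'P'; }
```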



[PATCH v4 20/25] mips: Use do_kernel_power_off()

2021-11-26 Thread Dmitry Osipenko
The kernel now supports chained power-off handlers. Use
do_kernel_power_off(), which invokes the chained power-off handlers.
For now it also invokes the legacy pm_power_off(), which will be removed
once all drivers have been converted to the new power-off API.

Signed-off-by: Dmitry Osipenko 
---
 arch/mips/kernel/reset.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/mips/kernel/reset.c b/arch/mips/kernel/reset.c
index 6288780b779e..e7ce07b3e79b 100644
--- a/arch/mips/kernel/reset.c
+++ b/arch/mips/kernel/reset.c
@@ -114,8 +114,7 @@ void machine_halt(void)
 
 void machine_power_off(void)
 {
-   if (pm_power_off)
-   pm_power_off();
+   do_kernel_power_off();
 
 #ifdef CONFIG_SMP
preempt_disable();
-- 
2.33.1



[PATCH v4 19/25] ia64: Use do_kernel_power_off()

2021-11-26 Thread Dmitry Osipenko
The kernel now supports chained power-off handlers. Use
do_kernel_power_off(), which invokes the chained power-off handlers.
For now it also invokes the legacy pm_power_off(), which will be removed
once all drivers have been converted to the new power-off API.

Signed-off-by: Dmitry Osipenko 
---
 arch/ia64/kernel/process.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index 834df24a88f1..cee4d7db2143 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -599,8 +600,7 @@ machine_halt (void)
 void
 machine_power_off (void)
 {
-   if (pm_power_off)
-   pm_power_off();
+   do_kernel_power_off();
machine_halt();
 }
 
-- 
2.33.1



[PATCH v4 18/25] x86: Use do_kernel_power_off()

2021-11-26 Thread Dmitry Osipenko
The kernel now supports chained power-off handlers. Use
do_kernel_power_off(), which invokes the chained power-off handlers.
For now it also invokes the legacy pm_power_off(), which will be removed
once all drivers have been converted to the new power-off API.

Signed-off-by: Dmitry Osipenko 
---
 arch/x86/kernel/reboot.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 0a40df66a40d..cd7d9416d81a 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -747,10 +747,10 @@ static void native_machine_halt(void)
 
 static void native_machine_power_off(void)
 {
-   if (pm_power_off) {
+   if (kernel_can_power_off()) {
if (!reboot_force)
machine_shutdown();
-   pm_power_off();
+   do_kernel_power_off();
}
/* A fallback in case there is no PM info available */
tboot_shutdown(TB_SHUTDOWN_HALT);
-- 
2.33.1



[PATCH v4 17/25] sh: Use do_kernel_power_off()

2021-11-26 Thread Dmitry Osipenko
The kernel now supports chained power-off handlers. Use
do_kernel_power_off(), which invokes the chained power-off handlers.
For now it also invokes the legacy pm_power_off(), which will be removed
once all drivers have been converted to the new power-off API.

Signed-off-by: Dmitry Osipenko 
---
 arch/sh/kernel/reboot.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/sh/kernel/reboot.c b/arch/sh/kernel/reboot.c
index 5c33f036418b..e8eeedc9b182 100644
--- a/arch/sh/kernel/reboot.c
+++ b/arch/sh/kernel/reboot.c
@@ -46,8 +46,7 @@ static void native_machine_shutdown(void)
 
 static void native_machine_power_off(void)
 {
-   if (pm_power_off)
-   pm_power_off();
+   do_kernel_power_off();
 }
 
 static void native_machine_halt(void)
-- 
2.33.1



[PATCH v4 16/25] m68k: Switch to new sys-off handler API

2021-11-26 Thread Dmitry Osipenko
The kernel now supports chained power-off handlers. Use
register_power_off_handler(), which registers power-off handlers, and
do_kernel_power_off(), which invokes the chained power-off handlers. The
legacy pm_power_off() will be removed once all drivers have been converted
to the new power-off API.

Normally arch code should adopt only do_kernel_power_off() at first, but
m68k is a special case because it uses pm_power_off() "inside out":
pm_power_off() invokes machine_power_off() [in fact it does nothing],
whereas it is machine_power_off() that should invoke pm_power_off(). As a
result, the platforms can't be converted to the new API separately. Only
two platforms are changed here, so it's not a big deal.

Acked-by: Geert Uytterhoeven 
Signed-off-by: Dmitry Osipenko 
---
 arch/m68k/emu/natfeat.c | 3 ++-
 arch/m68k/include/asm/machdep.h | 1 -
 arch/m68k/kernel/process.c  | 5 ++---
 arch/m68k/kernel/setup_mm.c | 1 -
 arch/m68k/kernel/setup_no.c | 1 -
 arch/m68k/mac/config.c  | 4 +++-
 6 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/m68k/emu/natfeat.c b/arch/m68k/emu/natfeat.c
index 71b78ecee75c..b19dc00026d9 100644
--- a/arch/m68k/emu/natfeat.c
+++ b/arch/m68k/emu/natfeat.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -90,5 +91,5 @@ void __init nf_init(void)
pr_info("NatFeats found (%s, %lu.%lu)\n", buf, version >> 16,
version & 0xffff);
 
-   mach_power_off = nf_poweroff;
+   register_platform_power_off(nf_poweroff);
 }
diff --git a/arch/m68k/include/asm/machdep.h b/arch/m68k/include/asm/machdep.h
index 8fd80ef1b77e..8d8c3ee2069f 100644
--- a/arch/m68k/include/asm/machdep.h
+++ b/arch/m68k/include/asm/machdep.h
@@ -24,7 +24,6 @@ extern int (*mach_get_rtc_pll)(struct rtc_pll_info *);
 extern int (*mach_set_rtc_pll)(struct rtc_pll_info *);
 extern void (*mach_reset)( void );
 extern void (*mach_halt)( void );
-extern void (*mach_power_off)( void );
 extern unsigned long (*mach_hd_init) (unsigned long, unsigned long);
 extern void (*mach_hd_setup)(char *, int *);
 extern void (*mach_heartbeat) (int);
diff --git a/arch/m68k/kernel/process.c b/arch/m68k/kernel/process.c
index a6030dbaa089..e160a7c57bd3 100644
--- a/arch/m68k/kernel/process.c
+++ b/arch/m68k/kernel/process.c
@@ -67,12 +67,11 @@ void machine_halt(void)
 
 void machine_power_off(void)
 {
-   if (mach_power_off)
-   mach_power_off();
+   do_kernel_power_off();
for (;;);
 }
 
-void (*pm_power_off)(void) = machine_power_off;
+void (*pm_power_off)(void);
 EXPORT_SYMBOL(pm_power_off);
 
 void show_regs(struct pt_regs * regs)
diff --git a/arch/m68k/kernel/setup_mm.c b/arch/m68k/kernel/setup_mm.c
index 4b51bfd38e5f..50f4f120a4ff 100644
--- a/arch/m68k/kernel/setup_mm.c
+++ b/arch/m68k/kernel/setup_mm.c
@@ -98,7 +98,6 @@ EXPORT_SYMBOL(mach_get_rtc_pll);
 EXPORT_SYMBOL(mach_set_rtc_pll);
 void (*mach_reset)( void );
 void (*mach_halt)( void );
-void (*mach_power_off)( void );
 #ifdef CONFIG_HEARTBEAT
 void (*mach_heartbeat) (int);
 EXPORT_SYMBOL(mach_heartbeat);
diff --git a/arch/m68k/kernel/setup_no.c b/arch/m68k/kernel/setup_no.c
index 5e4104f07a44..00bf82258233 100644
--- a/arch/m68k/kernel/setup_no.c
+++ b/arch/m68k/kernel/setup_no.c
@@ -55,7 +55,6 @@ int (*mach_hwclk) (int, struct rtc_time*);
 /* machine dependent reboot functions */
 void (*mach_reset)(void);
 void (*mach_halt)(void);
-void (*mach_power_off)(void);
 
 #ifdef CONFIG_M68000
 #if defined(CONFIG_M68328)
diff --git a/arch/m68k/mac/config.c b/arch/m68k/mac/config.c
index 5d16f9b47aa9..727320dedf08 100644
--- a/arch/m68k/mac/config.c
+++ b/arch/m68k/mac/config.c
@@ -12,6 +12,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -139,7 +140,6 @@ void __init config_mac(void)
mach_hwclk = mac_hwclk;
mach_reset = mac_reset;
mach_halt = mac_poweroff;
-   mach_power_off = mac_poweroff;
 #if IS_ENABLED(CONFIG_INPUT_M68K_BEEP)
mach_beep = mac_mksound;
 #endif
@@ -159,6 +159,8 @@ void __init config_mac(void)
 
if (macintosh_config->ident == MAC_MODEL_IICI)
mach_l2_flush = via_l2_flush;
+
+   register_platform_power_off(mac_poweroff);
 }
 
 
-- 
2.33.1



[PATCH v4 15/25] powerpc: Use do_kernel_power_off()

2021-11-26 Thread Dmitry Osipenko
The kernel now supports chained power-off handlers. Use
do_kernel_power_off(), which invokes the chained power-off handlers.
For now it also invokes the legacy pm_power_off(), which will be removed
once all drivers have been converted to the new power-off API.

Acked-by: Michael Ellerman 
Signed-off-by: Dmitry Osipenko 
---
 arch/powerpc/kernel/setup-common.c | 4 +---
 arch/powerpc/xmon/xmon.c   | 3 +--
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 4f1322b65760..71c4ccd9bbb1 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -161,9 +161,7 @@ void machine_restart(char *cmd)
 void machine_power_off(void)
 {
machine_shutdown();
-   if (pm_power_off)
-   pm_power_off();
-
+   do_kernel_power_off();
smp_send_stop();
machine_hang();
 }
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 83100c6524cc..759e167704e6 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1243,8 +1243,7 @@ static void bootcmds(void)
} else if (cmd == 'h') {
ppc_md.halt();
} else if (cmd == 'p') {
-   if (pm_power_off)
-   pm_power_off();
+   do_kernel_power_off();
}
 }
 
-- 
2.33.1



[PATCH v4 14/25] xen/x86: Use do_kernel_power_off()

2021-11-26 Thread Dmitry Osipenko
The kernel now supports chained power-off handlers. Use
do_kernel_power_off(), which invokes the chained power-off handlers.
For now it also invokes the legacy pm_power_off(), which will be removed
once all drivers have been converted to the new power-off API.

Acked-by: Juergen Gross 
Signed-off-by: Dmitry Osipenko 
---
 arch/x86/xen/enlighten_pv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 5004feb16783..527fa545eb1f 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1068,8 +1069,7 @@ static void xen_machine_halt(void)
 
 static void xen_machine_power_off(void)
 {
-   if (pm_power_off)
-   pm_power_off();
+   do_kernel_power_off();
xen_reboot(SHUTDOWN_poweroff);
 }
 
-- 
2.33.1



[PATCH v4 13/25] parisc: Use do_kernel_power_off()

2021-11-26 Thread Dmitry Osipenko
The kernel now supports chained power-off handlers. Use
do_kernel_power_off(), which invokes the chained power-off handlers.
For now it also invokes the legacy pm_power_off(), which will be removed
once all drivers have been converted to the new power-off API.

Acked-by: Helge Deller  # parisc
Signed-off-by: Dmitry Osipenko 
---
 arch/parisc/kernel/process.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index ea3d83b6fb62..928201b1f58f 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -114,8 +115,7 @@ void machine_power_off(void)
pdc_chassis_send_status(PDC_CHASSIS_DIRECT_SHUTDOWN);
 
/* ipmi_poweroff may have been installed. */
-   if (pm_power_off)
-   pm_power_off();
+   do_kernel_power_off();

/* It seems we have no way to power the system off via
 * software. The user has to press the button himself. */
-- 
2.33.1



[PATCH v4 12/25] arm64: Use do_kernel_power_off()

2021-11-26 Thread Dmitry Osipenko
The kernel now supports chained power-off handlers. Use
do_kernel_power_off(), which invokes the chained power-off handlers.
For now it also invokes the legacy pm_power_off(), which will be removed
once all drivers have been converted to the new power-off API.

Acked-by: Catalin Marinas 
Signed-off-by: Dmitry Osipenko 
---
 arch/arm64/kernel/process.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index aacf2f5559a8..f8db031afa7d 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -110,8 +110,7 @@ void machine_power_off(void)
 {
local_irq_disable();
smp_send_stop();
-   if (pm_power_off)
-   pm_power_off();
+   do_kernel_power_off();
 }
 
 /*
-- 
2.33.1



[PATCH v4 11/25] riscv: Use do_kernel_power_off()

2021-11-26 Thread Dmitry Osipenko
The kernel now supports chained power-off handlers. Use
do_kernel_power_off(), which invokes the chained power-off handlers.
For now it also invokes the legacy pm_power_off(), which will be removed
once all drivers have been converted to the new power-off API.

Acked-by: Palmer Dabbelt 
Signed-off-by: Dmitry Osipenko 
---
 arch/riscv/kernel/reset.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/riscv/kernel/reset.c b/arch/riscv/kernel/reset.c
index 9c842c41684a..912288572226 100644
--- a/arch/riscv/kernel/reset.c
+++ b/arch/riscv/kernel/reset.c
@@ -23,16 +23,12 @@ void machine_restart(char *cmd)
 
 void machine_halt(void)
 {
-   if (pm_power_off != NULL)
-   pm_power_off();
-   else
-   default_power_off();
+   do_kernel_power_off();
+   default_power_off();
 }
 
 void machine_power_off(void)
 {
-   if (pm_power_off != NULL)
-   pm_power_off();
-   else
-   default_power_off();
+   do_kernel_power_off();
+   default_power_off();
 }
-- 
2.33.1



[PATCH v4 10/25] csky: Use do_kernel_power_off()

2021-11-26 Thread Dmitry Osipenko
The kernel now supports chained power-off handlers. Use
do_kernel_power_off(), which invokes the chained power-off handlers.
For now it also invokes the legacy pm_power_off(), which will be removed
once all drivers have been converted to the new power-off API.

Acked-by: Guo Ren 
Signed-off-by: Dmitry Osipenko 
---
 arch/csky/kernel/power.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/csky/kernel/power.c b/arch/csky/kernel/power.c
index 923ee4e381b8..86ee202906f8 100644
--- a/arch/csky/kernel/power.c
+++ b/arch/csky/kernel/power.c
@@ -9,16 +9,14 @@ EXPORT_SYMBOL(pm_power_off);
 void machine_power_off(void)
 {
local_irq_disable();
-   if (pm_power_off)
-   pm_power_off();
+   do_kernel_power_off();
asm volatile ("bkpt");
 }
 
 void machine_halt(void)
 {
local_irq_disable();
-   if (pm_power_off)
-   pm_power_off();
+   do_kernel_power_off();
asm volatile ("bkpt");
 }
 
-- 
2.33.1



[PATCH v4 09/25] ARM: Use do_kernel_power_off()

2021-11-26 Thread Dmitry Osipenko
The kernel now supports chained power-off handlers. Use
do_kernel_power_off(), which invokes the chained power-off handlers.
For now it also invokes the legacy pm_power_off(), which will be removed
once all drivers have been converted to the new power-off API.

Reviewed-by: Russell King (Oracle) 
Signed-off-by: Dmitry Osipenko 
---
 arch/arm/kernel/reboot.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm/kernel/reboot.c b/arch/arm/kernel/reboot.c
index 3044fcb8d073..2cb943422554 100644
--- a/arch/arm/kernel/reboot.c
+++ b/arch/arm/kernel/reboot.c
@@ -116,9 +116,7 @@ void machine_power_off(void)
 {
local_irq_disable();
smp_send_stop();
-
-   if (pm_power_off)
-   pm_power_off();
+   do_kernel_power_off();
 }
 
 /*
-- 
2.33.1



[PATCH v4 08/25] kernel: Add combined power-off+restart handler call chain API

2021-11-26 Thread Dmitry Osipenko
SoC platforms often have multiple ways to perform system power-off and
restart operations, while today's kernel is limited to a single option.
Add a combined power-off+restart handler call chain API, inspired by the
restart API. The new API provides both power-off and restart
functionality.

The old pm_power_off method will be kept around until all users are
converted to the new API.

The current restart API will be replaced by the new unified API, since
the new API is its superset. The restart functionality of the sys-off
handler API is built upon the existing restart-notifier APIs.

To ease conversion to the new API, convenience helpers are added for the
common use cases. They reduce the amount of boilerplate code and remove
global variables. These helpers preserve the old behaviour for cases
where only one power-off handler is expected, which is what all existing
drivers want, so they can be easily converted to the new API. Users of
the new API should explicitly enable power-off chaining by setting the
corresponding flag of the power_handler structure.

Signed-off-by: Dmitry Osipenko 
---
 include/linux/reboot.h   | 265 ++-
 kernel/power/hibernate.c |   2 +-
 kernel/reboot.c  | 536 ++-
 3 files changed, 795 insertions(+), 8 deletions(-)

diff --git a/include/linux/reboot.h b/include/linux/reboot.h
index b7fa25726323..76799bb3a560 100644
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -8,10 +8,35 @@
 
 struct device;
 
-#define SYS_DOWN   0x0001  /* Notify of system down */
-#define SYS_RESTART    SYS_DOWN
-#define SYS_HALT   0x0002  /* Notify of system halt */
-#define SYS_POWER_OFF  0x0003  /* Notify of system power off */
+enum reboot_prepare_mode {
+   SYS_DOWN = 1,   /* Notify of system down */
+   SYS_RESTART = SYS_DOWN,
+   SYS_HALT,   /* Notify of system halt */
+   SYS_POWER_OFF,  /* Notify of system power off */
+};
+
+/*
+ * Standard restart priority levels. Intended to be set in the
+ * sys_off_handler.restart_priority field.
+ *
+ * Use `RESTART_PRIO_ABC +- prio` style for additional levels.
+ *
+ * RESTART_PRIO_RESERVED:  Falls back to RESTART_PRIO_DEFAULT.
+ * Drivers may leave priority initialized
+ * to zero, to auto-set it to the default level.
+ *
+ * RESTART_PRIO_LOW:   Use this for handler of last resort.
+ *
+ * RESTART_PRIO_DEFAULT:   Use this for default/generic handler.
+ *
+ * RESTART_PRIO_HIGH:  Use this if you have multiple handlers and
+ * this handler has higher priority than the
+ * default handler.
+ */
+#define RESTART_PRIO_RESERVED  0
+#define RESTART_PRIO_LOW   8
+#define RESTART_PRIO_DEFAULT   128
+#define RESTART_PRIO_HIGH  192
 
 enum reboot_mode {
REBOOT_UNDEFINED = -1,
@@ -49,6 +74,237 @@ int register_restart_handler(struct notifier_block *);
 int unregister_restart_handler(struct notifier_block *);
 void do_kernel_restart(char *cmd);
 
+/*
+ * System power-off and restart API.
+ */
+
+/*
+ * Standard power-off priority levels. Intended to be set in the
+ * sys_off_handler.power_off_priority field.
+ *
+ * Use `POWEROFF_PRIO_ABC +- prio` style for additional levels.
+ *
+ * POWEROFF_PRIO_RESERVED: Falls back to POWEROFF_PRIO_DEFAULT.
+ * Drivers may leave priority initialized
+ * to zero, to auto-set it to the default level.
+ *
+ * POWEROFF_PRIO_PLATFORM: Intended to be used by platform-level handler.
+ * Has lowest priority since device drivers are
+ * expected to take over platform handler which
+ * doesn't allow further callback chaining.
+ *
+ * POWEROFF_PRIO_DEFAULT:  Use this for default/generic handler.
+ *
+ * POWEROFF_PRIO_FIRMWARE: Use this if handler uses firmware call.
+ * Has highest priority since firmware is expected
+ * to know best how to power-off hardware properly.
+ */
+#define POWEROFF_PRIO_RESERVED 0
+#define POWEROFF_PRIO_PLATFORM 1
+#define POWEROFF_PRIO_DEFAULT  128
+#define POWEROFF_PRIO_HIGH 192
+#define POWEROFF_PRIO_FIRMWARE 224
+
+enum poweroff_mode {
+   POWEROFF_NORMAL = 0,
+   POWEROFF_PREPARE,
+};
+
+/**
+ * struct power_off_data - Power-off callback argument
+ *
+ * @cb_data: Callback data.
+ */
+struct power_off_data {
+   void *cb_data;
+};
+
+/**
+ * struct power_off_prep_data - Power-off preparation callback argument
+ *
+ * @cb_data: Callback data.
+ */
+struct power_off_prep_data {
+   void *cb_data;
+};
+
+/**
+ * struct restart_data - Restart callback argument
+ *
+ * @cb_data: Callback data.
+ * @cmd: Restart command string.
+ * 
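
Editorial note: the priority comments in this header say that a
zero-initialized priority (POWEROFF_PRIO_RESERVED) falls back to
POWEROFF_PRIO_DEFAULT. The sketch below shows that rule as a small
userspace function. The macro values are copied from the patch;
effective_power_off_priority() is a hypothetical helper, and where the
kernel actually applies the fallback is not shown in this excerpt.

```c
/* Values copied from the patch. */
#define POWEROFF_PRIO_RESERVED	0
#define POWEROFF_PRIO_PLATFORM	1
#define POWEROFF_PRIO_DEFAULT	128
#define POWEROFF_PRIO_HIGH	192
#define POWEROFF_PRIO_FIRMWARE	224

/* Hypothetical helper: map a zero-initialized priority to the default. */
static int effective_power_off_priority(int prio)
{
	return prio == POWEROFF_PRIO_RESERVED ? POWEROFF_PRIO_DEFAULT : prio;
}
```

The "POWEROFF_PRIO_ABC +- prio" style mentioned in the comment then works
as expected, for example POWEROFF_PRIO_DEFAULT + 1 for a handler that should
run just ahead of the generic one.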

[PATCH v4 07/25] reboot: Remove extern annotation from function prototypes

2021-11-26 Thread Dmitry Osipenko
There is no need to annotate function prototypes with 'extern'; it makes
the code less readable. Remove the unnecessary annotations from
<linux/reboot.h>.

Signed-off-by: Dmitry Osipenko 
---
 include/linux/reboot.h | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/include/linux/reboot.h b/include/linux/reboot.h
index 7c288013a3ca..b7fa25726323 100644
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -40,36 +40,36 @@ extern int reboot_cpu;
 extern int reboot_force;
 
 
-extern int register_reboot_notifier(struct notifier_block *);
-extern int unregister_reboot_notifier(struct notifier_block *);
+int register_reboot_notifier(struct notifier_block *);
+int unregister_reboot_notifier(struct notifier_block *);
 
-extern int devm_register_reboot_notifier(struct device *, struct notifier_block *);
+int devm_register_reboot_notifier(struct device *, struct notifier_block *);
 
-extern int register_restart_handler(struct notifier_block *);
-extern int unregister_restart_handler(struct notifier_block *);
-extern void do_kernel_restart(char *cmd);
+int register_restart_handler(struct notifier_block *);
+int unregister_restart_handler(struct notifier_block *);
+void do_kernel_restart(char *cmd);
 
 /*
  * Architecture-specific implementations of sys_reboot commands.
  */
 
-extern void migrate_to_reboot_cpu(void);
-extern void machine_restart(char *cmd);
-extern void machine_halt(void);
-extern void machine_power_off(void);
+void migrate_to_reboot_cpu(void);
+void machine_restart(char *cmd);
+void machine_halt(void);
+void machine_power_off(void);
 
-extern void machine_shutdown(void);
+void machine_shutdown(void);
 struct pt_regs;
-extern void machine_crash_shutdown(struct pt_regs *);
+void machine_crash_shutdown(struct pt_regs *);
 
 /*
  * Architecture independent implementations of sys_reboot commands.
  */
 
-extern void kernel_restart_prepare(char *cmd);
-extern void kernel_restart(char *cmd);
-extern void kernel_halt(void);
-extern void kernel_power_off(void);
+void kernel_restart_prepare(char *cmd);
+void kernel_restart(char *cmd);
+void kernel_halt(void);
+void kernel_power_off(void);
 
 extern int C_A_D; /* for sysctl */
 void ctrl_alt_del(void);
@@ -77,15 +77,15 @@ void ctrl_alt_del(void);
 #define POWEROFF_CMD_PATH_LEN  256
 extern char poweroff_cmd[POWEROFF_CMD_PATH_LEN];
 
-extern void orderly_poweroff(bool force);
-extern void orderly_reboot(void);
+void orderly_poweroff(bool force);
+void orderly_reboot(void);
 void hw_protection_shutdown(const char *reason, int ms_until_forced);
 
 /*
  * Emergency restart, callable from an interrupt handler.
  */
 
-extern void emergency_restart(void);
+void emergency_restart(void);
 #include 
 
 #endif /* _LINUX_REBOOT_H */
-- 
2.33.1
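
Editorial note: in C, a function declaration at file scope has external
linkage by default, so 'extern' on a prototype changes nothing. A minimal
sketch, with add_one() as a hypothetical example function:

```c
/* Both lines declare the same function with external linkage. */
extern int add_one(int x);
int add_one(int x);

/* A single definition satisfies both declarations. */
int add_one(int x)
{
	return x + 1;
}
```

This does not apply to variables, which is why declarations such as
"extern int reboot_force;" keep their annotation in the diff above.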



[PATCH v4 06/25] reboot: Warn if unregister_restart_handler() fails

2021-11-26 Thread Dmitry Osipenko
Emit a warning if unregister_restart_handler() fails, since it should
never fail. This will ease further API development by catching mistakes
early.

Signed-off-by: Dmitry Osipenko 
---
 kernel/reboot.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/reboot.c b/kernel/reboot.c
index e6659ae329f1..f0e7b9c13f6b 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -210,7 +210,7 @@ EXPORT_SYMBOL(register_restart_handler);
  */
 int unregister_restart_handler(struct notifier_block *nb)
 {
-   return atomic_notifier_chain_unregister(&restart_handler_list, nb);
+   return WARN_ON(atomic_notifier_chain_unregister(&restart_handler_list, nb));
 }
 EXPORT_SYMBOL(unregister_restart_handler);
 
-- 
2.33.1
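
For readers who want to see the effect of this patch in isolation, here is a minimal userspace sketch, not kernel code. `struct nb`, `chain_unregister()` and `unregister_warn()` are hypothetical stand-ins for the notifier block, atomic_notifier_chain_unregister() and the WARN_ON() wrapper. As in the kernel helper, unregistering a block that is not on the chain yields -ENOENT. One deliberate difference: WARN_ON() evaluates to the truth value of its argument, so the patched function would return 1 on failure, while the sketch keeps the errno.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct notifier_block. */
struct nb {
	struct nb *next;
	int priority;
};

/* Unlink n from the chain; return -ENOENT if it was never registered. */
static int chain_unregister(struct nb **nl, struct nb *n)
{
	while (*nl) {
		if (*nl == n) {
			*nl = n->next;
			return 0;
		}
		nl = &(*nl)->next;
	}
	return -ENOENT;
}

/* Complain on stderr when unregistration fails, then pass the error on. */
static int unregister_warn(struct nb **nl, struct nb *n)
{
	int ret = chain_unregister(nl, n);

	if (ret)
		fprintf(stderr, "WARNING: unregister failed: %d\n", ret);
	return ret;
}
```

A failing call here typically means the handler was unregistered twice or never registered, which is the kind of mistake the warning is meant to surface.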



[PATCH v4 05/25] reboot: Warn if restart handler has duplicated priority

2021-11-26 Thread Dmitry Osipenko
Add a sanity check that ensures no two restart handlers are
registered with the same priority. Two handlers sharing a priority is
normally a direct sign of a problem.

Signed-off-by: Dmitry Osipenko 
---
 kernel/reboot.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/reboot.c b/kernel/reboot.c
index 6bcc5d6a6572..e6659ae329f1 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -182,7 +182,20 @@ static ATOMIC_NOTIFIER_HEAD(restart_handler_list);
  */
 int register_restart_handler(struct notifier_block *nb)
 {
-   return atomic_notifier_chain_register(&restart_handler_list, nb);
+   int ret;
+
+   ret = atomic_notifier_chain_register(&restart_handler_list, nb);
+   if (ret)
+   return ret;
+
+   /*
+* Handler must have unique priority. Otherwise call order is
+* determined by registration order, which is unreliable.
+*/
+   WARN(!atomic_notifier_has_unique_priority(&restart_handler_list, nb),
+"restart handler must have unique priority\n");
+
+   return 0;
 }
 EXPORT_SYMBOL(register_restart_handler);
 
-- 
2.33.1



[PATCH v4 04/25] reboot: Correct typo in a comment

2021-11-26 Thread Dmitry Osipenko
Correct s/implemenations/implementations/ in <linux/reboot.h>.

Signed-off-by: Dmitry Osipenko 
---
 include/linux/reboot.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/reboot.h b/include/linux/reboot.h
index af907a3d68d1..7c288013a3ca 100644
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -63,7 +63,7 @@ struct pt_regs;
 extern void machine_crash_shutdown(struct pt_regs *);
 
 /*
- * Architecture independent implemenations of sys_reboot commands.
+ * Architecture independent implementations of sys_reboot commands.
  */
 
 extern void kernel_restart_prepare(char *cmd);
-- 
2.33.1



[PATCH v4 03/25] notifier: Add atomic/blocking_notifier_has_unique_priority()

2021-11-26 Thread Dmitry Osipenko
Add atomic/blocking_notifier_has_unique_priority() helpers that return
true if the given handler has a unique priority.

Signed-off-by: Dmitry Osipenko 
---
 include/linux/notifier.h |  5 +++
 kernel/notifier.c| 69 
 2 files changed, 74 insertions(+)

diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index 924c9d7c8e73..2c4036f225e1 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -175,6 +175,11 @@ int raw_notifier_call_chain_robust(struct raw_notifier_head *nh,
 
 bool blocking_notifier_call_chain_is_empty(struct blocking_notifier_head *nh);
 
+bool atomic_notifier_has_unique_priority(struct atomic_notifier_head *nh,
+   struct notifier_block *nb);
+bool blocking_notifier_has_unique_priority(struct blocking_notifier_head *nh,
+   struct notifier_block *nb);
+
 #define NOTIFY_DONE        0x0000  /* Don't care */
 #define NOTIFY_OK  0x0001  /* Suits me */
 #define NOTIFY_STOP_MASK   0x8000  /* Don't call further */
diff --git a/kernel/notifier.c b/kernel/notifier.c
index b20cb7b9b1f0..7a325b742104 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -122,6 +122,19 @@ static int notifier_call_chain_robust(struct notifier_block **nl,
return ret;
 }
 
+static int notifier_has_unique_priority(struct notifier_block **nl,
+   struct notifier_block *n)
+{
+   while (*nl && (*nl)->priority >= n->priority) {
+   if ((*nl)->priority == n->priority && *nl != n)
+   return false;
+
+   nl = &((*nl)->next);
+   }
+
+   return true;
+}
+
 /*
  * Atomic notifier chain routines.  Registration and unregistration
  * use a spinlock, and call_chain is synchronized by RCU (no locks).
@@ -203,6 +216,30 @@ int atomic_notifier_call_chain(struct atomic_notifier_head *nh,
 EXPORT_SYMBOL_GPL(atomic_notifier_call_chain);
 NOKPROBE_SYMBOL(atomic_notifier_call_chain);
 
+/**
+ * atomic_notifier_has_unique_priority - Checks whether notifier's priority is unique
+ * @nh: Pointer to head of the atomic notifier chain
+ * @n: Entry in notifier chain to check
+ *
+ * Checks whether there is another notifier in the chain with the same priority.
+ * Must be called in process context.
+ *
+ * Returns true if priority is unique, false otherwise.
+ */
+bool atomic_notifier_has_unique_priority(struct atomic_notifier_head *nh,
+   struct notifier_block *n)
+{
+   unsigned long flags;
+   bool ret;
+
+   spin_lock_irqsave(&nh->lock, flags);
+   ret = notifier_has_unique_priority(&nh->head, n);
+   spin_unlock_irqrestore(&nh->lock, flags);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(atomic_notifier_has_unique_priority);
+
 /*
  * Blocking notifier chain routines.  All access to the chain is
  * synchronized by an rwsem.
@@ -336,6 +373,38 @@ bool blocking_notifier_call_chain_is_empty(struct blocking_notifier_head *nh)
 }
 EXPORT_SYMBOL_GPL(blocking_notifier_call_chain_is_empty);
 
+/**
+ * blocking_notifier_has_unique_priority - Checks whether notifier's priority is unique
+ * @nh: Pointer to head of the blocking notifier chain
+ * @n: Entry in notifier chain to check
+ *
+ * Checks whether there is another notifier in the chain with the same priority.
+ * Must be called in process context.
+ *
+ * Returns true if priority is unique, false otherwise.
+ */
+bool blocking_notifier_has_unique_priority(struct blocking_notifier_head *nh,
+   struct notifier_block *n)
+{
+   bool ret;
+
+   /*
+* This code gets used during boot-up, when task switching is
+* not yet working and interrupts must remain disabled. At such
+* times we must not call down_read().
+*/
+   if (system_state != SYSTEM_BOOTING)
+   down_read(&nh->rwsem);
+
+   ret = notifier_has_unique_priority(&nh->head, n);
+
+   if (system_state != SYSTEM_BOOTING)
+   up_read(&nh->rwsem);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(blocking_notifier_has_unique_priority);
+
 /*
  * Raw notifier chain routines.  There is no protection;
  * the caller must provide it.  Use at your own risk!
-- 
2.33.1
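
The check added by this patch relies on the chain being kept sorted by descending priority, so the walk can stop as soon as priorities drop below the one being checked. The following is a minimal userspace sketch under that assumption, without any locking. `struct nb`, `chain_register()` and `has_unique_priority()` are hypothetical names used for illustration only; the second function mirrors the loop from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct notifier_block. */
struct nb {
	struct nb *next;
	int priority;
};

/* Insert n while keeping the chain sorted by descending priority. */
static void chain_register(struct nb **nl, struct nb *n)
{
	while (*nl && (*nl)->priority >= n->priority)
		nl = &(*nl)->next;
	n->next = *nl;
	*nl = n;
}

/* Same walk as in the patch: stop once priorities drop below n's. */
static bool has_unique_priority(struct nb **nl, struct nb *n)
{
	while (*nl && (*nl)->priority >= n->priority) {
		/* Another entry with the same priority means n is not unique. */
		if ((*nl)->priority == n->priority && *nl != n)
			return false;
		nl = &(*nl)->next;
	}
	return true;
}
```

With handlers at priorities 128 and 64 both are unique; registering a second handler at 128 makes both 128 entries report false, while the 64 entry is unaffected.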



[PATCH v4 02/25] notifier: Add blocking_notifier_call_chain_is_empty()

2021-11-26 Thread Dmitry Osipenko
Add blocking_notifier_call_chain_is_empty(), which returns true if the
call chain is empty.

Signed-off-by: Dmitry Osipenko 
---
 include/linux/notifier.h |  2 ++
 kernel/notifier.c| 14 ++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index 4b80a815b666..924c9d7c8e73 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -173,6 +173,8 @@ int blocking_notifier_call_chain_robust(struct blocking_notifier_head *nh,
 int raw_notifier_call_chain_robust(struct raw_notifier_head *nh,
unsigned long val_up, unsigned long val_down, void *v);
 
+bool blocking_notifier_call_chain_is_empty(struct blocking_notifier_head *nh);
+
 #define NOTIFY_DONE        0x0000  /* Don't care */
 #define NOTIFY_OK  0x0001  /* Suits me */
 #define NOTIFY_STOP_MASK   0x8000  /* Don't call further */
diff --git a/kernel/notifier.c b/kernel/notifier.c
index b8251dc0bc0f..b20cb7b9b1f0 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -322,6 +322,20 @@ int blocking_notifier_call_chain(struct blocking_notifier_head *nh,
 }
 EXPORT_SYMBOL_GPL(blocking_notifier_call_chain);
 
+/**
+ * blocking_notifier_call_chain_is_empty - Check whether notifier chain is empty
+ * @nh: Pointer to head of the blocking notifier chain
+ *
+ * Checks whether notifier chain is empty.
+ *
+ * Returns true if notifier chain is empty, false otherwise.
+ */
+bool blocking_notifier_call_chain_is_empty(struct blocking_notifier_head *nh)
+{
+   return !rcu_access_pointer(nh->head);
+}
+EXPORT_SYMBOL_GPL(blocking_notifier_call_chain_is_empty);
+
 /*
  * Raw notifier chain routines.  There is no protection;
  * the caller must provide it.  Use at your own risk!
-- 
2.33.1



[PATCH v4 01/25] notifier: Remove extern annotation from function prototypes

2021-11-26 Thread Dmitry Osipenko
There is no need to annotate function prototypes with 'extern'; it makes
code less readable. Remove unnecessary annotations from <linux/notifier.h>.

Signed-off-by: Dmitry Osipenko 
---
 include/linux/notifier.h | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index 87069b8459af..4b80a815b666 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -90,7 +90,7 @@ struct srcu_notifier_head {
} while (0)
 
 /* srcu_notifier_heads must be cleaned up dynamically */
-extern void srcu_init_notifier_head(struct srcu_notifier_head *nh);
+void srcu_init_notifier_head(struct srcu_notifier_head *nh);
 #define srcu_cleanup_notifier_head(name)   \
cleanup_srcu_struct(&(name)->srcu);
 
@@ -141,36 +141,36 @@ extern void srcu_init_notifier_head(struct srcu_notifier_head *nh);
 
 #ifdef __KERNEL__
 
-extern int atomic_notifier_chain_register(struct atomic_notifier_head *nh,
+int atomic_notifier_chain_register(struct atomic_notifier_head *nh,
struct notifier_block *nb);
-extern int blocking_notifier_chain_register(struct blocking_notifier_head *nh,
+int blocking_notifier_chain_register(struct blocking_notifier_head *nh,
struct notifier_block *nb);
-extern int raw_notifier_chain_register(struct raw_notifier_head *nh,
+int raw_notifier_chain_register(struct raw_notifier_head *nh,
struct notifier_block *nb);
-extern int srcu_notifier_chain_register(struct srcu_notifier_head *nh,
+int srcu_notifier_chain_register(struct srcu_notifier_head *nh,
struct notifier_block *nb);
 
-extern int atomic_notifier_chain_unregister(struct atomic_notifier_head *nh,
+int atomic_notifier_chain_unregister(struct atomic_notifier_head *nh,
struct notifier_block *nb);
-extern int blocking_notifier_chain_unregister(struct blocking_notifier_head *nh,
+int blocking_notifier_chain_unregister(struct blocking_notifier_head *nh,
struct notifier_block *nb);
-extern int raw_notifier_chain_unregister(struct raw_notifier_head *nh,
+int raw_notifier_chain_unregister(struct raw_notifier_head *nh,
struct notifier_block *nb);
-extern int srcu_notifier_chain_unregister(struct srcu_notifier_head *nh,
+int srcu_notifier_chain_unregister(struct srcu_notifier_head *nh,
struct notifier_block *nb);
 
-extern int atomic_notifier_call_chain(struct atomic_notifier_head *nh,
+int atomic_notifier_call_chain(struct atomic_notifier_head *nh,
unsigned long val, void *v);
-extern int blocking_notifier_call_chain(struct blocking_notifier_head *nh,
+int blocking_notifier_call_chain(struct blocking_notifier_head *nh,
unsigned long val, void *v);
-extern int raw_notifier_call_chain(struct raw_notifier_head *nh,
+int raw_notifier_call_chain(struct raw_notifier_head *nh,
unsigned long val, void *v);
-extern int srcu_notifier_call_chain(struct srcu_notifier_head *nh,
+int srcu_notifier_call_chain(struct srcu_notifier_head *nh,
unsigned long val, void *v);
 
-extern int blocking_notifier_call_chain_robust(struct blocking_notifier_head *nh,
+int blocking_notifier_call_chain_robust(struct blocking_notifier_head *nh,
unsigned long val_up, unsigned long val_down, void *v);
-extern int raw_notifier_call_chain_robust(struct raw_notifier_head *nh,
+int raw_notifier_call_chain_robust(struct raw_notifier_head *nh,
unsigned long val_up, unsigned long val_down, void *v);
 
 #define NOTIFY_DONE        0x0000  /* Don't care */
-- 
2.33.1



[PATCH v4 00/25] Introduce power-off+restart call chain API

2021-11-26 Thread Dmitry Osipenko
Problem
-------

SoC devices require power-off call chaining functionality from the kernel.
We have widely used restart chaining provided by the restart notifier API,
but nothing for power-off.

Solution
--------

Introduce a new API that provides both restart and power-off call chains.

Why combine restart with power-off? Because drivers often do both.
It is more practical to have an API that provides both under the same roof.

The new API is designed with simplicity and extensibility in mind.
It's built upon the existing restart and reboot APIs. The simplicity
is in new helper functions that are convenient for drivers. The
extensibility is in the design that doesn't hardcode callback
arguments, making it easy to add new parameters and remove old ones.

This is the third attempt to introduce the new API. The first was made by
Guenter Roeck back in 2014, the second by Thierry Reding in 2017.
In fact the work didn't stop: arm_pm_restart() was recently removed
from the v5.14 kernel, which was part of the preparatory work started by
Guenter Roeck. I took into account the experience and ideas from the
previous attempts, and extended and polished them.

Adoption plan
-------------

This patchset introduces the new API. It also converts multiple drivers
and arch code to the new API to demonstrate how it all looks in practice.

The plan is:

1. Merge new API (patches 1-8). This API will co-exist with the old APIs.

2. Convert arch code to do_kernel_power_off() (patches 9-21).

3. Convert drivers and platform code to the new API.

4. Remove obsolete pm_power_off and pm_power_off_prepare variables.

5. Make restart-notifier API private to kernel/reboot.c once no users are left.

It's fully implemented here:

[1] https://github.com/grate-driver/linux/commits/sys-off-handler

For now I'm sending only the first 25 base patches out of ~180. It's
preferable to squash points 1-2, and partially 3 and 4, of the plan into a
single patchset to ease and speed up applying the rest of the patches.
The majority of driver and platform patches depend on the base, hence they
will come later (and per subsystem), once the base lands.

All [1] patches are compile-tested. Tegra and x86 ACPI patches are tested
on hardware. The remaining ones should be covered by unit tests (unpublished).

Results
-------

1. Devices can be powered off properly.

2. Global variables are removed from drivers.

3. Global pm_power_off and pm_power_off_prepare callback variables are
removed once all users are converted to the new API. The latter callback
is removed by patch #25 of this series.

4. Ambiguous call chain ordering is prohibited. See patch #5 which adds
verification of restart handlers priorities, ensuring that they are unique.

Changelog:

v4: - Made a very minor improvement to doc comments, clarifying a couple
      of default values.

    - Corrected the list of email recipients by adding Linus, Sebastian,
      Philipp and more NDS people. Removed bouncing emails.

    - Added acks that were given to v3.

v3: - Renamed power_handler to sys_off_handler, as suggested by
      Rafael Wysocki.

    - Improved doc-comments, as suggested by Rafael Wysocki. Added more
      doc-comments.

    - Implemented the full set of 180 patches which convert the whole kernel
      in accordance with the plan, see link [1] above. Slightly adjusted the
      API to better suit the remaining converted drivers.

      * Added unregister_sys_off_handler(), which is handy for a couple of
        old platform drivers.

      * Dropped devm_register_trivial_restart_handler(); the 'simple' variant
        is enough to have.

    - Improved the "Add atomic/blocking_notifier_has_unique_priority()" patch,
      as suggested by Andy Shevchenko. Also replaced down_write() with
      down_read() and factored out the common notifier_has_unique_priority().

    - Added a stop_chain field to struct restart_data and reboot_prep_data
      after discovering a couple of drivers wanting that feature.

    - Added acks that were given to v2.

v2: - Replaced the standalone power-off call chain demo-API with the combined
      power-off+restart API because this is what drivers want. It's a more
      comprehensive solution.

    - Converted multiple drivers and arch code to the new API. Suggested by
      Andy Shevchenko. I skimmed through the rest of the drivers, verifying
      that the new API suits them. The rest of the drivers will be converted
      once we settle on the new API; otherwise there would be too many
      patches here.

    - The v2 API doesn't expose the notifier to users and requires handlers to
      have unique priority. Suggested by Guenter Roeck.

    - The v2 API has power-off chaining disabled by default and requires
      drivers to explicitly opt in to the chaining. This preserves the old
      behaviour for existing drivers once they are converted to the new
      API.

Dmitry Osipenko (25):
  notifier: Remove extern annotation from function prototypes
  notifier: Add blocking_notifier_call_chain_is_empty()
  notifier: Add atomic/blocking_notifier_has_unique_priority()
  

Re: [PATCH v2 9/9] powerpc: Simplify and move arch_randomize_brk()

2021-11-26 Thread kernel test robot
Hi Christophe,

I love your patch! Perhaps something to improve:

[auto build test WARNING on powerpc/next]
[also build test WARNING on hnaz-mm/master linus/master v5.16-rc2 next-20211126]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Christophe-Leroy/Convert-powerpc-to-default-topdown-mmap-layout/20211125-162916
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc64-buildonly-randconfig-r006-20211125 (https://download.01.org/0day-ci/archive/20211127/202111270342.b1y85fuz-...@intel.com/config)
compiler: powerpc64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/554c475dfb73dc352708dff3589b55845b3dd751
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Christophe-Leroy/Convert-powerpc-to-default-topdown-mmap-layout/20211125-162916
        git checkout 554c475dfb73dc352708dff3589b55845b3dd751
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=powerpc SHELL=/bin/bash arch/powerpc/mm/book3s64/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> arch/powerpc/mm/book3s64/hash_utils.c:2077:15: warning: no previous prototype for 'arch_randomize_brk' [-Wmissing-prototypes]
    2077 | unsigned long arch_randomize_brk(struct mm_struct *mm)
         |               ^~~~~~~~~~~~~~~~~~


vim +/arch_randomize_brk +2077 arch/powerpc/mm/book3s64/hash_utils.c

  2076  
> 2077  unsigned long arch_randomize_brk(struct mm_struct *mm)

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH] powerpc/code-patching: Relax verification of patchability

2021-11-26 Thread Sachin Sant


>  Running code patching self-tests ...
>  patch_instruction() called on invalid text address 0xe1011e58 from test_code_patching+0x34/0xd6c
> 
> Reported-by: Sachin Sant 
> Reported-by: Stephen Rothwell 
> Cc: Nicholas Piggin 
> Fixes: 8b8a8f0ab3f5 ("powerpc/code-patching: Improve verification of patchability")
> Signed-off-by: Christophe Leroy 
> ---
> arch/powerpc/lib/code-patching.c | 6 +-
> 1 file changed, 5 insertions(+), 1 deletion(-)
> 

This fixes the problem for me.

Tested-by: Sachin Sant 

Thanks
-Sachin



Re: [PATCH] powerpc/mm: Use refcount_t for refcount

2021-11-26 Thread Christophe Leroy




On 09/08/2019 at 14:36, Michael Ellerman wrote:

Chuhong Yuan  writes:

Reference counters are preferred to use refcount_t instead of
atomic_t.
This is because the implementation of refcount_t can prevent
overflows and detect possible use-after-free.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan 


Thanks.

We don't have a fast implementation of refcount_t, so I'm worried this
could cause a measurable performance regression.


Fast implementations have been removed by commit 
https://github.com/linuxppc/linux/commit/fb041bb7c0a918b95c6889fc965cdc4a75b4c0ca


It's now considered that the generic implementation is good enough for 
everybody.


However, this series doesn't apply anymore and needs rebase:

Applying: powerpc/mm: Use refcount_t for refcount
Using index info to reconstruct a base tree...
M   arch/powerpc/mm/book3s64/mmu_context.c
M   arch/powerpc/mm/book3s64/pgtable.c
M   arch/powerpc/mm/pgtable-frag.c
M   include/linux/mm_types.h
Falling back to patching base and 3-way merge...
Auto-merging include/linux/mm_types.h
CONFLICT (content): Merge conflict in include/linux/mm_types.h
Auto-merging arch/powerpc/mm/pgtable-frag.c
CONFLICT (content): Merge conflict in arch/powerpc/mm/pgtable-frag.c
Auto-merging arch/powerpc/mm/book3s64/pgtable.c
CONFLICT (content): Merge conflict in arch/powerpc/mm/book3s64/pgtable.c
Auto-merging arch/powerpc/mm/book3s64/mmu_context.c
Patch failed at 0001 powerpc/mm: Use refcount_t for refcount
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Thanks
Christophe




Did you benchmark it at all?

cheers


diff --git a/arch/powerpc/mm/book3s64/mmu_context.c b/arch/powerpc/mm/book3s64/mmu_context.c
index 2d0cb5ba9a47..f836fd5a6abc 100644
--- a/arch/powerpc/mm/book3s64/mmu_context.c
+++ b/arch/powerpc/mm/book3s64/mmu_context.c
@@ -231,7 +231,7 @@ static void pmd_frag_destroy(void *pmd_frag)
 	/* drop all the pending references */
 	count = ((unsigned long)pmd_frag & ~PAGE_MASK) >> PMD_FRAG_SIZE_SHIFT;
 	/* We allow PTE_FRAG_NR fragments from a PTE page */
-	if (atomic_sub_and_test(PMD_FRAG_NR - count, &page->pt_frag_refcount)) {
+	if (refcount_sub_and_test(PMD_FRAG_NR - count, &page->pt_frag_refcount)) {
 		pgtable_pmd_page_dtor(page);
 		__free_page(page);
 	}
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 7d0e0d0d22c4..40056896ce4e 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -277,7 +277,7 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
 		return NULL;
 	}
 
-	atomic_set(&page->pt_frag_refcount, 1);
+	refcount_set(&page->pt_frag_refcount, 1);
 
 	ret = page_address(page);
 	/*
@@ -294,7 +294,7 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
 	 * count.
 	 */
 	if (likely(!mm->context.pmd_frag)) {
-		atomic_set(&page->pt_frag_refcount, PMD_FRAG_NR);
+		refcount_set(&page->pt_frag_refcount, PMD_FRAG_NR);
 		mm->context.pmd_frag = ret + PMD_FRAG_SIZE;
 	}
 	spin_unlock(&mm->page_table_lock);
@@ -317,8 +317,7 @@ void pmd_fragment_free(unsigned long *pmd)
 {
 	struct page *page = virt_to_page(pmd);
 
-	BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
-	if (atomic_dec_and_test(&page->pt_frag_refcount)) {
+	if (refcount_dec_and_test(&page->pt_frag_refcount)) {
 		pgtable_pmd_page_dtor(page);
 		__free_page(page);
 	}
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index a7b05214760c..4ef8231b677f 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -24,7 +24,7 @@ void pte_frag_destroy(void *pte_frag)
 	/* drop all the pending references */
 	count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
 	/* We allow PTE_FRAG_NR fragments from a PTE page */
-	if (atomic_sub_and_test(PTE_FRAG_NR - count, &page->pt_frag_refcount)) {
+	if (refcount_sub_and_test(PTE_FRAG_NR - count, &page->pt_frag_refcount)) {
 		pgtable_page_dtor(page);
 		__free_page(page);
 	}
@@ -71,7 +71,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
 		return NULL;
 	}
 
-	atomic_set(&page->pt_frag_refcount, 1);
+	refcount_set(&page->pt_frag_refcount, 1);
 
 	ret = page_address(page);
 	/*
@@ -87,7 +87,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
 	 * count.
 	 */
 	if (likely(!pte_frag_get(&mm->context))) {
-		atomic_set(&page->pt_frag_refcount, PTE_FRAG_NR);
+		refcount_set(&page->pt_frag_refcount, PTE_FRAG_NR);
 		pte_frag_set(&mm->context, ret + PTE_FRAG_SIZE);
}
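
The original commit message argues that refcount_t can prevent overflows and detect possible use-after-free, which a plain atomic_t cannot. The toy below illustrates that idea only. It is a single-threaded userspace sketch and is not the kernel's refcount_t implementation: `struct toy_ref`, `toy_ref_inc()`, `toy_ref_dec_and_test()` and the saturation value are all assumptions made for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Toy saturating counter; NOT the kernel's refcount_t. */
struct toy_ref {
	unsigned int count;
};

#define TOY_REF_SATURATED UINT_MAX

/* Take a reference. Returns false when the increment looks like a bug. */
static bool toy_ref_inc(struct toy_ref *r)
{
	if (r->count == 0)	/* already released: possible use-after-free */
		return false;
	if (r->count >= TOY_REF_SATURATED - 1) {
		r->count = TOY_REF_SATURATED;	/* pin instead of wrapping to 0 */
		return false;
	}
	r->count++;
	return true;
}

/* Drop a reference. Returns true only when the last one is dropped. */
static bool toy_ref_dec_and_test(struct toy_ref *r)
{
	if (r->count == 0 || r->count == TOY_REF_SATURATED)
		return false;	/* underflow or saturated: never free */
	return --r->count == 0;
}
```

The design point is that a saturated counter leaks the object rather than freeing it while references may still exist, which trades a memory leak for the absence of a use-after-free.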

Re: [PATCH v5 1/3] powerpc/bitops: Use immediate operand when possible

2021-11-26 Thread LEROY Christophe
Hi Michael,

Any chance of getting this series merged this cycle?

Thanks
Christophe

On 21/09/2021 at 17:09, Christophe Leroy wrote:
> Today we get the following code generation for bitops like
> set or clear bit:
> 
>   c0009fe0:   39 40 08 00 li  r10,2048
>   c0009fe4:   7c e0 40 28 lwarx   r7,0,r8
>   c0009fe8:   7c e7 53 78 or  r7,r7,r10
>   c0009fec:   7c e0 41 2d stwcx.  r7,0,r8
> 
>   c000d568:   39 00 18 00 li  r8,6144
>   c000d56c:   7c c0 38 28 lwarx   r6,0,r7
>   c000d570:   7c c6 40 78 andc    r6,r6,r8
>   c000d574:   7c c0 39 2d stwcx.  r6,0,r7
> 
> Most set-bit masks are constants that fit in the lower 16 bits, so the
> operation can easily be replaced by its "immediate" form. Allow
> GCC to choose between the normal and the immediate form.
> 
> For clear bits, on 32 bits, 'rlwinm' can be used instead of 'andc'
> when all bits to be cleared are consecutive.
> 
> On 64 bits there is no equivalent single operation for clearing
> a single bit or a few bits; we'd need two 'rldicl', so it is not
> worth it: the li/andc sequence does the same job.
> 
> With this patch we get:
> 
>   c0009fe0:   7d 00 50 28 lwarx   r8,0,r10
>   c0009fe4:   61 08 08 00 ori r8,r8,2048
>   c0009fe8:   7d 00 51 2d stwcx.  r8,0,r10
> 
>   c000d558:   7c e0 40 28 lwarx   r7,0,r8
>   c000d55c:   54 e7 05 64 rlwinm  r7,r7,0,21,18
>   c000d560:   7c e0 41 2d stwcx.  r7,0,r8
> 
> On pmac32_defconfig, it reduces the text by approx 10 kbytes.
> 
> Signed-off-by: Christophe Leroy 
> Reviewed-by: Segher Boessenkool 
> ---
> v5: Fixed the argument of is_rlwinm_mask_valid() in test_and_clear_bits()
> 
> v4: Rebased
> 
> v3:
> - Using the mask validation proposed by Segher
> 
> v2:
> - Use "n" instead of "i" as constraint for the rlwinm mask
> - Improve mask verification to handle more than single bit masks
> 
> Signed-off-by: Christophe Leroy 
> ---
>   arch/powerpc/include/asm/bitops.h | 89 ---
>   1 file changed, 81 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/bitops.h 
> b/arch/powerpc/include/asm/bitops.h
> index 11847b6a244e..a05d8c62cbea 100644
> --- a/arch/powerpc/include/asm/bitops.h
> +++ b/arch/powerpc/include/asm/bitops.h
> @@ -71,19 +71,61 @@ static inline void fn(unsigned long mask, \
>   __asm__ __volatile__ (  \
>   prefix  \
>   "1:"PPC_LLARX "%0,0,%3,0\n" \
> - stringify_in_c(op) "%0,%0,%2\n" \
> + #op "%I2 %0,%0,%2\n"\
>   PPC_STLCX "%0,0,%3\n"   \
>   "bne- 1b\n" \
>   : "=&r" (old), "+m" (*p)\
> - : "r" (mask), "r" (p)   \
> + : "rK" (mask), "r" (p)  \
>   : "cc", "memory");  \
>   }
>   
>   DEFINE_BITOP(set_bits, or, "")
> -DEFINE_BITOP(clear_bits, andc, "")
> -DEFINE_BITOP(clear_bits_unlock, andc, PPC_RELEASE_BARRIER)
>   DEFINE_BITOP(change_bits, xor, "")
>   
> +static __always_inline bool is_rlwinm_mask_valid(unsigned long x)
> +{
> + if (!x)
> + return false;
> + if (x & 1)
> + x = ~x; // make the mask non-wrapping
> + x += x & -x;// adding the low set bit results in at most one bit set
> +
> + return !(x & (x - 1));
> +}
> +
> +#define DEFINE_CLROP(fn, prefix) \
> +static inline void fn(unsigned long mask, volatile unsigned long *_p) \
> +{\
> + unsigned long old;  \
> + unsigned long *p = (unsigned long *)_p; \
> + \
> + if (IS_ENABLED(CONFIG_PPC32) && \
> + __builtin_constant_p(mask) && is_rlwinm_mask_valid(~mask)) {\
> + asm volatile (  \
> + prefix  \
> + "1:""lwarx  %0,0,%3\n"  \
> + "rlwinm %0,%0,0,%2\n"   \
> + "stwcx. %0,0,%3\n"  \
> + "bne- 1b\n" \
> + : "=&r" (old), "+m" (*p)\
> + : "n" (~mask), "r" (p)  \
> + : "cc", "memory");  \
> + } else {\
> + asm volatile (  \
> + prefix  \
> 
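
The mask check quoted above can be tried out in userspace. The sketch below follows the same steps as is_rlwinm_mask_valid() but fixes the width at 32 bits, since the patch only uses the check under CONFIG_PPC32. The name `is_contiguous_mask32()` is hypothetical. It accepts any non-empty run of consecutive 1 bits, including a run that wraps from bit 31 to bit 0.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if x is a non-empty run of consecutive 1 bits in a 32-bit word,
 * where the run may wrap around from bit 31 to bit 0.
 */
static bool is_contiguous_mask32(uint32_t x)
{
	if (!x)
		return false;
	if (x & 1)
		x = ~x;		/* complement of a wrapping run does not wrap */
	x += x & -x;		/* adding the lowest set bit collapses one run */

	return !(x & (x - 1));	/* zero or a single bit left */
}
```

For the clear-bits example in the commit message, the bits to clear are 0x1800, so the check is applied to the complement 0xffffe7ff, which is a wrapping run and therefore valid.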

Re: [PATCH v2] w1: Misuse of get_user()/put_user() reported by sparse

2021-11-26 Thread Greg Kroah-Hartman
On Fri, Nov 26, 2021 at 05:57:58PM +0100, Christophe Leroy wrote:
> 
> 
> On 26/11/2021 at 17:54, Greg Kroah-Hartman wrote:
> > On Fri, Nov 26, 2021 at 05:47:58PM +0100, Christophe Leroy wrote:
> > > sparse warnings: (new ones prefixed by >>)
> > > >> drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected char [noderef] __user *_pu_addr @@ got char *buf @@
> > >    drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: expected char [noderef] __user *_pu_addr
> > >    drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: got char *buf
> > > >> drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected char const [noderef] __user *_gu_addr @@ got char const *buf @@
> > >    drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: expected char const [noderef] __user *_gu_addr
> > >    drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: got char const *buf
> > > 
> > > The buffer buf is a failsafe buffer in kernel space, it's not user
> > > memory hence doesn't deserve the use of get_user() or put_user().
> > > 
> > > Access 'buf' content directly.
> > > 
> > > Reported-by: kernel test robot 
> > > Link: https://lore.kernel.org/lkml/20290526.k5vb7nwc-...@intel.com/T/
> > > Signed-off-by: Christophe Leroy 
> > > ---
> > > v2: Use sysfs_emit() and kstrtobool()
> > > ---
> > >   drivers/w1/slaves/w1_ds28e04.c | 25 +++--
> > >   1 file changed, 3 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/drivers/w1/slaves/w1_ds28e04.c 
> > > b/drivers/w1/slaves/w1_ds28e04.c
> > > index e4f336111edc..98f80f412cfd 100644
> > > --- a/drivers/w1/slaves/w1_ds28e04.c
> > > +++ b/drivers/w1/slaves/w1_ds28e04.c
> > > @@ -32,7 +32,7 @@ static int w1_strong_pullup = 1;
> > >   module_param_named(strong_pullup, w1_strong_pullup, int, 0);
> > >   /* enable/disable CRC checking on DS28E04-100 memory accesses */
> > > -static char w1_enable_crccheck = 1;
> > > +static bool w1_enable_crccheck = true;
> > >   #define W1_EEPROM_SIZE  512
> > >   #define W1_PAGE_COUNT   16
> > > @@ -339,32 +339,13 @@ static BIN_ATTR_RW(pio, 1);
> > >   static ssize_t crccheck_show(struct device *dev, struct device_attribute *attr,
> > >char *buf)
> > >   {
> > > - if (put_user(w1_enable_crccheck + 0x30, buf))
> > > - return -EFAULT;
> > > -
> > > - return sizeof(w1_enable_crccheck);
> > > + return sysfs_emit(buf, "%d\n", w1_enable_crccheck);
> > >   }
> > >   static ssize_t crccheck_store(struct device *dev, struct device_attribute *attr,
> > > const char *buf, size_t count)
> > >   {
> > > - char val;
> > > -
> > > - if (count != 1 || !buf)
> > > - return -EINVAL;
> > > -
> > > - if (get_user(val, buf))
> > > - return -EFAULT;
> > > -
> > > - /* convert to decimal */
> > > - val = val - 0x30;
> > > - if (val != 0 && val != 1)
> > > - return -EINVAL;
> > > -
> > > - /* set the new value */
> > > - w1_enable_crccheck = val;
> > > -
> > > - return sizeof(w1_enable_crccheck);
> > > + return kstrtobool(buf, &w1_enable_crccheck) ? : count;
> > 
> > Please spell this line out, using ? : is unreadable at times.
> > 
> 
> You prefer something like:
> 
>   int err = kstrtobool(buf, &w1_enable_crccheck);
> 
>   return err ? err : count;
> 
> 
> Or
> 
>   int err = kstrtobool(buf, &w1_enable_crccheck);
> 
>   if (err)
>   return err;
> 
>   return count;

This one.  Write code for people to read first, compiler second.

thanks,

greg k-h


[PATCH v3] w1: Misuse of get_user()/put_user() reported by sparse

2021-11-26 Thread Christophe Leroy
sparse warnings: (new ones prefixed by >>)
>> drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected char [noderef] __user *_pu_addr @@ got char *buf @@
   drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: expected char [noderef] __user *_pu_addr
   drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: got char *buf
>> drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected char const [noderef] __user *_gu_addr @@ got char const *buf @@
   drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: expected char const [noderef] __user *_gu_addr
   drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: got char const *buf

The buffer 'buf' is a failsafe buffer in kernel space. It is not user
memory, so it must not be accessed with get_user() or put_user().

Access 'buf' content directly.

Reported-by: kernel test robot 
Link: https://lore.kernel.org/lkml/20290526.k5vb7nwc-...@intel.com/T/
Signed-off-by: Christophe Leroy 
---
v3: Rewrite crccheck_store() in a more user-friendly way

v2: Use sysfs_emit() and kstrtobool()
---
 drivers/w1/slaves/w1_ds28e04.c | 26 ++
 1 file changed, 6 insertions(+), 20 deletions(-)

diff --git a/drivers/w1/slaves/w1_ds28e04.c b/drivers/w1/slaves/w1_ds28e04.c
index e4f336111edc..6cef6e2edb89 100644
--- a/drivers/w1/slaves/w1_ds28e04.c
+++ b/drivers/w1/slaves/w1_ds28e04.c
@@ -32,7 +32,7 @@ static int w1_strong_pullup = 1;
 module_param_named(strong_pullup, w1_strong_pullup, int, 0);
 
 /* enable/disable CRC checking on DS28E04-100 memory accesses */
-static char w1_enable_crccheck = 1;
+static bool w1_enable_crccheck = true;
 
 #define W1_EEPROM_SIZE 512
 #define W1_PAGE_COUNT  16
@@ -339,32 +339,18 @@ static BIN_ATTR_RW(pio, 1);
 static ssize_t crccheck_show(struct device *dev, struct device_attribute *attr,
 char *buf)
 {
-   if (put_user(w1_enable_crccheck + 0x30, buf))
-   return -EFAULT;
-
-   return sizeof(w1_enable_crccheck);
+   return sysfs_emit(buf, "%d\n", w1_enable_crccheck);
 }
 
 static ssize_t crccheck_store(struct device *dev, struct device_attribute 
*attr,
  const char *buf, size_t count)
 {
-   char val;
-
-   if (count != 1 || !buf)
-   return -EINVAL;
+   int err = kstrtobool(buf, &w1_enable_crccheck);
 
-   if (get_user(val, buf))
-   return -EFAULT;
+   if (err)
+   return err;
 
-   /* convert to decimal */
-   val = val - 0x30;
-   if (val != 0 && val != 1)
-   return -EINVAL;
-
-   /* set the new value */
-   w1_enable_crccheck = val;
-
-   return sizeof(w1_enable_crccheck);
+   return count;
 }
 
 static DEVICE_ATTR_RW(crccheck);
-- 
2.33.1



Re: [PATCH v2] w1: Misuse of get_user()/put_user() reported by sparse

2021-11-26 Thread Christophe Leroy




On 26/11/2021 at 17:54, Greg Kroah-Hartman wrote:

On Fri, Nov 26, 2021 at 05:47:58PM +0100, Christophe Leroy wrote:

sparse warnings: (new ones prefixed by >>)

drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: sparse: incorrect type in 
initializer (different address spaces) @@ expected char [noderef] __user 
*_pu_addr @@ got char *buf @@

drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: expected char [noderef] 
__user *_pu_addr
drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: got char *buf

drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: sparse: incorrect type in 
initializer (different address spaces) @@ expected char const [noderef] 
__user *_gu_addr @@ got char const *buf @@

drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: expected char const 
[noderef] __user *_gu_addr
drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: got char const *buf

The buffer buf is a failsafe buffer in kernel space, it's not user
memory hence doesn't deserve the use of get_user() or put_user().

Access 'buf' content directly.

Reported-by: kernel test robot 
Link: https://lore.kernel.org/lkml/20290526.k5vb7nwc-...@intel.com/T/
Signed-off-by: Christophe Leroy 
---
v2: Use sysfs_emit() and kstrtobool()
---
  drivers/w1/slaves/w1_ds28e04.c | 25 +++--
  1 file changed, 3 insertions(+), 22 deletions(-)

diff --git a/drivers/w1/slaves/w1_ds28e04.c b/drivers/w1/slaves/w1_ds28e04.c
index e4f336111edc..98f80f412cfd 100644
--- a/drivers/w1/slaves/w1_ds28e04.c
+++ b/drivers/w1/slaves/w1_ds28e04.c
@@ -32,7 +32,7 @@ static int w1_strong_pullup = 1;
  module_param_named(strong_pullup, w1_strong_pullup, int, 0);
  
  /* enable/disable CRC checking on DS28E04-100 memory accesses */

-static char w1_enable_crccheck = 1;
+static bool w1_enable_crccheck = true;
  
  #define W1_EEPROM_SIZE		512

  #define W1_PAGE_COUNT 16
@@ -339,32 +339,13 @@ static BIN_ATTR_RW(pio, 1);
  static ssize_t crccheck_show(struct device *dev, struct device_attribute 
*attr,
 char *buf)
  {
-   if (put_user(w1_enable_crccheck + 0x30, buf))
-   return -EFAULT;
-
-   return sizeof(w1_enable_crccheck);
+   return sysfs_emit(buf, "%d\n", w1_enable_crccheck);
  }
  
  static ssize_t crccheck_store(struct device *dev, struct device_attribute *attr,

  const char *buf, size_t count)
  {
-   char val;
-
-   if (count != 1 || !buf)
-   return -EINVAL;
-
-   if (get_user(val, buf))
-   return -EFAULT;
-
-   /* convert to decimal */
-   val = val - 0x30;
-   if (val != 0 && val != 1)
-   return -EINVAL;
-
-   /* set the new value */
-   w1_enable_crccheck = val;
-
-   return sizeof(w1_enable_crccheck);
+   return kstrtobool(buf, &w1_enable_crccheck) ? : count;


Please spell this line out, using ? : is unreadable at times.



You prefer something like:

int err = kstrtobool(buf, &w1_enable_crccheck);

return err ? err : count;


Or

int err = kstrtobool(buf, &w1_enable_crccheck);

if (err)
return err;

return count;

?


Re: [PATCH v2] w1: Misuse of get_user()/put_user() reported by sparse

2021-11-26 Thread Greg Kroah-Hartman
On Fri, Nov 26, 2021 at 05:47:58PM +0100, Christophe Leroy wrote:
> sparse warnings: (new ones prefixed by >>)
> >> drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: sparse: incorrect type in 
> >> initializer (different address spaces) @@ expected char [noderef] 
> >> __user *_pu_addr @@ got char *buf @@
>drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: expected char [noderef] 
> __user *_pu_addr
>drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: got char *buf
> >> drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: sparse: incorrect type in 
> >> initializer (different address spaces) @@ expected char const 
> >> [noderef] __user *_gu_addr @@ got char const *buf @@
>drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: expected char const 
> [noderef] __user *_gu_addr
>drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: got char const *buf
> 
> The buffer buf is a failsafe buffer in kernel space, it's not user
> memory hence doesn't deserve the use of get_user() or put_user().
> 
> Access 'buf' content directly.
> 
> Reported-by: kernel test robot 
> Link: https://lore.kernel.org/lkml/20290526.k5vb7nwc-...@intel.com/T/
> Signed-off-by: Christophe Leroy 
> ---
> v2: Use sysfs_emit() and kstrtobool()
> ---
>  drivers/w1/slaves/w1_ds28e04.c | 25 +++--
>  1 file changed, 3 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/w1/slaves/w1_ds28e04.c b/drivers/w1/slaves/w1_ds28e04.c
> index e4f336111edc..98f80f412cfd 100644
> --- a/drivers/w1/slaves/w1_ds28e04.c
> +++ b/drivers/w1/slaves/w1_ds28e04.c
> @@ -32,7 +32,7 @@ static int w1_strong_pullup = 1;
>  module_param_named(strong_pullup, w1_strong_pullup, int, 0);
>  
>  /* enable/disable CRC checking on DS28E04-100 memory accesses */
> -static char w1_enable_crccheck = 1;
> +static bool w1_enable_crccheck = true;
>  
>  #define W1_EEPROM_SIZE   512
>  #define W1_PAGE_COUNT16
> @@ -339,32 +339,13 @@ static BIN_ATTR_RW(pio, 1);
>  static ssize_t crccheck_show(struct device *dev, struct device_attribute 
> *attr,
>char *buf)
>  {
> - if (put_user(w1_enable_crccheck + 0x30, buf))
> - return -EFAULT;
> -
> - return sizeof(w1_enable_crccheck);
> + return sysfs_emit(buf, "%d\n", w1_enable_crccheck);
>  }
>  
>  static ssize_t crccheck_store(struct device *dev, struct device_attribute 
> *attr,
> const char *buf, size_t count)
>  {
> - char val;
> -
> - if (count != 1 || !buf)
> - return -EINVAL;
> -
> - if (get_user(val, buf))
> - return -EFAULT;
> -
> - /* convert to decimal */
> - val = val - 0x30;
> - if (val != 0 && val != 1)
> - return -EINVAL;
> -
> - /* set the new value */
> - w1_enable_crccheck = val;
> -
> - return sizeof(w1_enable_crccheck);
> + return kstrtobool(buf, &w1_enable_crccheck) ? : count;

Please spell this line out, using ? : is unreadable at times.

thanks,

greg k-h


[PATCH v2] w1: Misuse of get_user()/put_user() reported by sparse

2021-11-26 Thread Christophe Leroy
sparse warnings: (new ones prefixed by >>)
>> drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: sparse: incorrect type in 
>> initializer (different address spaces) @@ expected char [noderef] __user 
>> *_pu_addr @@ got char *buf @@
   drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: expected char [noderef] 
__user *_pu_addr
   drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: got char *buf
>> drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: sparse: incorrect type in 
>> initializer (different address spaces) @@ expected char const [noderef] 
>> __user *_gu_addr @@ got char const *buf @@
   drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: expected char const 
[noderef] __user *_gu_addr
   drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: got char const *buf

The buffer buf is a failsafe buffer in kernel space, it's not user
memory hence doesn't deserve the use of get_user() or put_user().

Access 'buf' content directly.

Reported-by: kernel test robot 
Link: https://lore.kernel.org/lkml/20290526.k5vb7nwc-...@intel.com/T/
Signed-off-by: Christophe Leroy 
---
v2: Use sysfs_emit() and kstrtobool()
---
 drivers/w1/slaves/w1_ds28e04.c | 25 +++--
 1 file changed, 3 insertions(+), 22 deletions(-)

diff --git a/drivers/w1/slaves/w1_ds28e04.c b/drivers/w1/slaves/w1_ds28e04.c
index e4f336111edc..98f80f412cfd 100644
--- a/drivers/w1/slaves/w1_ds28e04.c
+++ b/drivers/w1/slaves/w1_ds28e04.c
@@ -32,7 +32,7 @@ static int w1_strong_pullup = 1;
 module_param_named(strong_pullup, w1_strong_pullup, int, 0);
 
 /* enable/disable CRC checking on DS28E04-100 memory accesses */
-static char w1_enable_crccheck = 1;
+static bool w1_enable_crccheck = true;
 
 #define W1_EEPROM_SIZE 512
 #define W1_PAGE_COUNT  16
@@ -339,32 +339,13 @@ static BIN_ATTR_RW(pio, 1);
 static ssize_t crccheck_show(struct device *dev, struct device_attribute *attr,
 char *buf)
 {
-   if (put_user(w1_enable_crccheck + 0x30, buf))
-   return -EFAULT;
-
-   return sizeof(w1_enable_crccheck);
+   return sysfs_emit(buf, "%d\n", w1_enable_crccheck);
 }
 
 static ssize_t crccheck_store(struct device *dev, struct device_attribute 
*attr,
  const char *buf, size_t count)
 {
-   char val;
-
-   if (count != 1 || !buf)
-   return -EINVAL;
-
-   if (get_user(val, buf))
-   return -EFAULT;
-
-   /* convert to decimal */
-   val = val - 0x30;
-   if (val != 0 && val != 1)
-   return -EINVAL;
-
-   /* set the new value */
-   w1_enable_crccheck = val;
-
-   return sizeof(w1_enable_crccheck);
+   return kstrtobool(buf, &w1_enable_crccheck) ? : count;
 }
 
 static DEVICE_ATTR_RW(crccheck);
-- 
2.33.1



Re: [PATCH] powerpc: mm: radix_tlb: rearrange the if-else block

2021-11-26 Thread Christophe Leroy




On 26/11/2021 at 16:46, Nathan Chancellor wrote:

On Fri, Nov 26, 2021 at 02:59:29PM +0100, Arnd Bergmann wrote:

On Fri, Nov 26, 2021 at 2:43 PM Christophe Leroy
 wrote:

On 25/11/2021 at 16:44, Anders Roxell wrote:
Can't you fix CLANG instead :) ?

Or just add an else to the IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) that
sets hstart and hend to 0 ?


That doesn't sound any less risky than duplicating the code, it can lead to
incorrect changes just as easily if a patch ends up actually flushing at the
wrong address, and the compiler fails to complain because of the bogus
initialization.


Or just put hstart and hend calculation outside the IS_ENABLED() ? After
all GCC should drop the calculation when not used.


I like this one. I'm still unsure how clang can get so confused about whether
the variables are initialized or not, usually it handles this much better than
gcc. My best guess is that one of the memory clobbers makes it conclude
that 'hflush' can be true when it gets written to by an inline asm.


As far as I am aware, clang's analysis does not evaluate variables when
generating a control flow graph and using that for static analysis:

https://godbolt.org/z/PdGxoq9j7

Based on the control flow graph, it knows that hstart and hend are
uninitialized because IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) gets
expanded to 0 by the preprocessor but it does not seem like it can piece
together that hflush's value of false is only changed to true under the
now 'if (0) {' branch, meaning that all the calls to __tlbiel_va_range()
never get evaluated. That may or may not be easy to fix in clang but we
run into issues like this so infrequently.

At any rate, the below diff works for me.

Cheers,
Nathan

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c 
b/arch/powerpc/mm/book3s64/radix_tlb.c
index 7724af19ed7e..156a631df976 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -1174,12 +1174,10 @@ static inline void __radix__flush_tlb_range(struct 
mm_struct *mm,
bool hflush = false;
unsigned long hstart, hend;
  
-		if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {

-   hstart = (start + PMD_SIZE - 1) & PMD_MASK;
-   hend = end & PMD_MASK;
-   if (hstart < hend)
-   hflush = true;
-   }
+   hstart = (start + PMD_SIZE - 1) & PMD_MASK;
+   hend = end & PMD_MASK;
+   if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hstart < hend)
+   hflush = true;


Yes I like that much better.

Maybe even better with

hflush = IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hstart < hend;

(And remove default false value at declaration).

  
  		if (type == FLUSH_TYPE_LOCAL) {

asm volatile("ptesync": : :"memory");



Re: [PATCH] w1: Misuse of get_user()/put_user() reported by sparse

2021-11-26 Thread Greg Kroah-Hartman
On Fri, Nov 26, 2021 at 05:10:46PM +0100, Christophe Leroy wrote:
> 
> 
> On 26/11/2021 at 17:00, Greg Kroah-Hartman wrote:
> > On Fri, Nov 19, 2021 at 10:15:09AM +0100, Christophe Leroy wrote:
> > > sparse warnings: (new ones prefixed by >>)
> > > > > drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: sparse: incorrect type 
> > > > > in initializer (different address spaces) @@ expected char 
> > > > > [noderef] __user *_pu_addr @@ got char *buf @@
> > > drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: expected char 
> > > [noderef] __user *_pu_addr
> > > drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: got char *buf
> > > > > drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: sparse: incorrect type 
> > > > > in initializer (different address spaces) @@ expected char const 
> > > > > [noderef] __user *_gu_addr @@ got char const *buf @@
> > > drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: expected char 
> > > const [noderef] __user *_gu_addr
> > > drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: got char const *buf
> > > 
> > > The buffer buf is a failsafe buffer in kernel space, it's not user
> > > memory hence doesn't deserve the use of get_user() or put_user().
> > > 
> > > Access 'buf' content directly.
> > > 
> > > Reported-by: kernel test robot 
> > > Link: https://lore.kernel.org/lkml/20290526.k5vb7nwc-...@intel.com/T/
> > > Signed-off-by: Christophe Leroy 
> > > ---
> > >   drivers/w1/slaves/w1_ds28e04.c | 10 ++
> > >   1 file changed, 2 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/w1/slaves/w1_ds28e04.c 
> > > b/drivers/w1/slaves/w1_ds28e04.c
> > > index e4f336111edc..d75bb16fb7a1 100644
> > > --- a/drivers/w1/slaves/w1_ds28e04.c
> > > +++ b/drivers/w1/slaves/w1_ds28e04.c
> > > @@ -339,10 +339,7 @@ static BIN_ATTR_RW(pio, 1);
> > >   static ssize_t crccheck_show(struct device *dev, struct 
> > > device_attribute *attr,
> > >char *buf)
> > >   {
> > > - if (put_user(w1_enable_crccheck + 0x30, buf))
> > > - return -EFAULT;
> > > -
> > > - return sizeof(w1_enable_crccheck);
> > > + return sprintf(buf, "%d", w1_enable_crccheck);
> > 
> > This should be sysfs_emit(), right?
> 
> Ok
> 
> > 
> > >   }
> > >   static ssize_t crccheck_store(struct device *dev, struct 
> > > device_attribute *attr,
> > > @@ -353,11 +350,8 @@ static ssize_t crccheck_store(struct device *dev, 
> > > struct device_attribute *attr,
> > >   if (count != 1 || !buf)
> > >   return -EINVAL;
> > > - if (get_user(val, buf))
> > > - return -EFAULT;
> > > -
> > >   /* convert to decimal */
> > > - val = val - 0x30;
> > > + val = *buf - 0x30;
> > 
> > Why not use a proper function that can parse a string and turn it into a
> > number?
> 
> I wanted to keep the change minimal. But I can also replace it with some
> scanf.
> 
> But don't we have any generic function to read and store a bool after all ?

Yes we do, please use kstrtobool().

thanks,

greg k-h


Re: [PATCH] w1: Misuse of get_user()/put_user() reported by sparse

2021-11-26 Thread Christophe Leroy




On 26/11/2021 at 17:00, Greg Kroah-Hartman wrote:

On Fri, Nov 19, 2021 at 10:15:09AM +0100, Christophe Leroy wrote:

sparse warnings: (new ones prefixed by >>)

drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: sparse: incorrect type in 
initializer (different address spaces) @@ expected char [noderef] __user 
*_pu_addr @@ got char *buf @@

drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: expected char [noderef] 
__user *_pu_addr
drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: got char *buf

drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: sparse: incorrect type in 
initializer (different address spaces) @@ expected char const [noderef] 
__user *_gu_addr @@ got char const *buf @@

drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: expected char const 
[noderef] __user *_gu_addr
drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: got char const *buf

The buffer buf is a failsafe buffer in kernel space, it's not user
memory hence doesn't deserve the use of get_user() or put_user().

Access 'buf' content directly.

Reported-by: kernel test robot 
Link: https://lore.kernel.org/lkml/20290526.k5vb7nwc-...@intel.com/T/
Signed-off-by: Christophe Leroy 
---
  drivers/w1/slaves/w1_ds28e04.c | 10 ++
  1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/w1/slaves/w1_ds28e04.c b/drivers/w1/slaves/w1_ds28e04.c
index e4f336111edc..d75bb16fb7a1 100644
--- a/drivers/w1/slaves/w1_ds28e04.c
+++ b/drivers/w1/slaves/w1_ds28e04.c
@@ -339,10 +339,7 @@ static BIN_ATTR_RW(pio, 1);
  static ssize_t crccheck_show(struct device *dev, struct device_attribute 
*attr,
 char *buf)
  {
-   if (put_user(w1_enable_crccheck + 0x30, buf))
-   return -EFAULT;
-
-   return sizeof(w1_enable_crccheck);
+   return sprintf(buf, "%d", w1_enable_crccheck);


This should be sysfs_emit(), right?


Ok




  }
  
  static ssize_t crccheck_store(struct device *dev, struct device_attribute *attr,

@@ -353,11 +350,8 @@ static ssize_t crccheck_store(struct device *dev, struct 
device_attribute *attr,
if (count != 1 || !buf)
return -EINVAL;
  
-	if (get_user(val, buf))

-   return -EFAULT;
-
/* convert to decimal */
-   val = val - 0x30;
+   val = *buf - 0x30;


Why not use a proper function that can parse a string and turn it into a
number?


I wanted to keep the change minimal. But I can also replace it with some 
scanf.


But don't we have any generic function to read and store a bool after all ?

Thanks
Christophe


Re: [PATCH] w1: Misuse of get_user()/put_user() reported by sparse

2021-11-26 Thread Greg Kroah-Hartman
On Fri, Nov 19, 2021 at 10:15:09AM +0100, Christophe Leroy wrote:
> sparse warnings: (new ones prefixed by >>)
> >> drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: sparse: incorrect type in 
> >> initializer (different address spaces) @@ expected char [noderef] 
> >> __user *_pu_addr @@ got char *buf @@
>drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: expected char [noderef] 
> __user *_pu_addr
>drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: got char *buf
> >> drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: sparse: incorrect type in 
> >> initializer (different address spaces) @@ expected char const 
> >> [noderef] __user *_gu_addr @@ got char const *buf @@
>drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: expected char const 
> [noderef] __user *_gu_addr
>drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: got char const *buf
> 
> The buffer buf is a failsafe buffer in kernel space, it's not user
> memory hence doesn't deserve the use of get_user() or put_user().
> 
> Access 'buf' content directly.
> 
> Reported-by: kernel test robot 
> Link: https://lore.kernel.org/lkml/20290526.k5vb7nwc-...@intel.com/T/
> Signed-off-by: Christophe Leroy 
> ---
>  drivers/w1/slaves/w1_ds28e04.c | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/w1/slaves/w1_ds28e04.c b/drivers/w1/slaves/w1_ds28e04.c
> index e4f336111edc..d75bb16fb7a1 100644
> --- a/drivers/w1/slaves/w1_ds28e04.c
> +++ b/drivers/w1/slaves/w1_ds28e04.c
> @@ -339,10 +339,7 @@ static BIN_ATTR_RW(pio, 1);
>  static ssize_t crccheck_show(struct device *dev, struct device_attribute 
> *attr,
>char *buf)
>  {
> - if (put_user(w1_enable_crccheck + 0x30, buf))
> - return -EFAULT;
> -
> - return sizeof(w1_enable_crccheck);
> + return sprintf(buf, "%d", w1_enable_crccheck);

This should be sysfs_emit(), right?

>  }
>  
>  static ssize_t crccheck_store(struct device *dev, struct device_attribute 
> *attr,
> @@ -353,11 +350,8 @@ static ssize_t crccheck_store(struct device *dev, struct 
> device_attribute *attr,
>   if (count != 1 || !buf)
>   return -EINVAL;
>  
> - if (get_user(val, buf))
> - return -EFAULT;
> -
>   /* convert to decimal */
> - val = val - 0x30;
> + val = *buf - 0x30;

Why not use a proper function that can parse a string and turn it into a
number?

thanks,

greg k-h


Re: [PATCH] powerpc: mm: radix_tlb: rearrange the if-else block

2021-11-26 Thread Nathan Chancellor
On Fri, Nov 26, 2021 at 02:59:29PM +0100, Arnd Bergmann wrote:
> On Fri, Nov 26, 2021 at 2:43 PM Christophe Leroy
>  wrote:
> > On 25/11/2021 at 16:44, Anders Roxell wrote:
> > Can't you fix CLANG instead :) ?
> >
> > Or just add an else to the IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) that
> > sets hstart and hend to 0 ?
> 
> That doesn't sound any less risky than duplicating the code, it can lead to
> incorrect changes just as easily if a patch ends up actually flushing at the
> wrong address, and the compiler fails to complain because of the bogus
> initialization.
> 
> > Or just put hstart and hend calculation outside the IS_ENABLED() ? After
> > all GCC should drop the calculation when not used.
> 
> I like this one. I'm still unsure how clang can get so confused about whether
> the variables are initialized or not, usually it handles this much better than
> gcc. My best guess is that one of the memory clobbers makes it conclude
> that 'hflush' can be true when it gets written to by an inline asm.

As far as I am aware, clang's analysis does not evaluate variables when
generating a control flow graph and using that for static analysis:

https://godbolt.org/z/PdGxoq9j7

Based on the control flow graph, it knows that hstart and hend are
uninitialized because IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) gets
expanded to 0 by the preprocessor but it does not seem like it can piece
together that hflush's value of false is only changed to true under the
now 'if (0) {' branch, meaning that all the calls to __tlbiel_va_range()
never get evaluated. That may or may not be easy to fix in clang but we
run into issues like this so infrequently.

At any rate, the below diff works for me.

Cheers,
Nathan

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c 
b/arch/powerpc/mm/book3s64/radix_tlb.c
index 7724af19ed7e..156a631df976 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -1174,12 +1174,10 @@ static inline void __radix__flush_tlb_range(struct 
mm_struct *mm,
bool hflush = false;
unsigned long hstart, hend;
 
-   if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-   hstart = (start + PMD_SIZE - 1) & PMD_MASK;
-   hend = end & PMD_MASK;
-   if (hstart < hend)
-   hflush = true;
-   }
+   hstart = (start + PMD_SIZE - 1) & PMD_MASK;
+   hend = end & PMD_MASK;
+   if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hstart < hend)
+   hflush = true;
 
if (type == FLUSH_TYPE_LOCAL) {
asm volatile("ptesync": : :"memory");
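
The rearrangement in the diff above can be sketched in plain C, outside the kernel. The names demo_* are hypothetical, the PMD size is assumed to be 2M, and a runtime flag stands in for IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE). The point is only that hstart and hend are now assigned on every path, so no use of them can be flagged as uninitialized, while hflush still depends on the config option. The single-assignment form of hflush follows the variant Christophe suggested in his reply.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PMD_SIZE (1UL << 21)		/* assumption: 2M huge pages */
#define DEMO_PMD_MASK (~(DEMO_PMD_SIZE - 1))

struct demo_hrange {
	unsigned long hstart;	/* start rounded up to a PMD boundary */
	unsigned long hend;	/* end rounded down to a PMD boundary */
	bool hflush;		/* true if a huge-page flush is needed */
};

/*
 * Compute the PMD-aligned sub-range unconditionally, then decide
 * whether to flush it. thp_enabled is a stand-in for the compile-time
 * IS_ENABLED() check.
 */
static struct demo_hrange demo_huge_range(unsigned long start,
					  unsigned long end,
					  bool thp_enabled)
{
	struct demo_hrange r;

	assert(start <= end);

	r.hstart = (start + DEMO_PMD_SIZE - 1) & DEMO_PMD_MASK;
	r.hend = end & DEMO_PMD_MASK;
	r.hflush = thp_enabled && r.hstart < r.hend;

	return r;
}
```

When the option is a compile-time constant 0, an optimizing compiler is free to discard the two computations if their results are never used, which is the argument made earlier in the thread for hoisting them.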


Re: [PATCH] powerpc: mm: radix_tlb: rearrange the if-else block

2021-11-26 Thread Arnd Bergmann
On Fri, Nov 26, 2021 at 2:43 PM Christophe Leroy
 wrote:
> On 25/11/2021 at 16:44, Anders Roxell wrote:
> Can't you fix CLANG instead :) ?
>
> Or just add an else to the IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) that
> sets hstart and hend to 0 ?

That doesn't sound any less risky than duplicating the code, it can lead to
incorrect changes just as easily if a patch ends up actually flushing at the
wrong address, and the compiler fails to complain because of the bogus
initialization.

> Or just put hstart and hend calculation outside the IS_ENABLED() ? After
> all GCC should drop the calculation when not used.

I like this one. I'm still unsure how clang can get so confused about whether
the variables are initialized or not, usually it handles this much better than
gcc. My best guess is that one of the memory clobbers makes it conclude
that 'hflush' can be true when it gets written to by an inline asm.

Arnd


[powerpc:merge] BUILD SUCCESS 2dbc3a3e8fc1ea24589150a874cd37904898286a

2021-11-26 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
merge
branch HEAD: 2dbc3a3e8fc1ea24589150a874cd37904898286a  Automatic merge of 
'next' into merge (2021-11-25 21:55)

elapsed time: 1569m

configs tested: 54
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm defconfig
arm  allyesconfig
arm  allmodconfig
arm64allyesconfig
arm64   defconfig
i386 randconfig-c001-20211125
ia64defconfig
ia64 allmodconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
arc  allyesconfig
nds32 allnoconfig
cskydefconfig
alpha   defconfig
nds32   defconfig
alphaallyesconfig
nios2allyesconfig
arc defconfig
sh   allmodconfig
h8300allyesconfig
xtensa   allyesconfig
parisc  defconfig
s390 allyesconfig
s390 allmodconfig
parisc   allyesconfig
s390defconfig
i386 allyesconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
i386  debian-10.3
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
riscvnommu_k210_defconfig
riscvallyesconfig
riscvnommu_virt_defconfig
riscv allnoconfig
riscv   defconfig
riscv  rv32_defconfig
riscvallmodconfig
um i386_defconfig
um   x86_64_defconfig
x86_64   allyesconfig
x86_64  defconfig
x86_64   rhel-8.3
x86_64  kexec
x86_64  rhel-8.3-func
x86_64rhel-8.3-kselftests

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH] powerpc: mm: radix_tlb: rearrange the if-else block

2021-11-26 Thread Christophe Leroy




On 25/11/2021 at 16:44, Anders Roxell wrote:

Clang warns:

arch/powerpc/mm/book3s64/radix_tlb.c:1191:23: error: variable 'hstart' is uninitialized when used here [-Werror,-Wuninitialized]
                        __tlbiel_va_range(hstart, hend, pid,
                                          ^~~~~~
arch/powerpc/mm/book3s64/radix_tlb.c:1175:23: note: initialize the variable 'hstart' to silence this warning
                unsigned long hstart, hend;
                                    ^
                                     = 0
arch/powerpc/mm/book3s64/radix_tlb.c:1191:31: error: variable 'hend' is uninitialized when used here [-Werror,-Wuninitialized]
                        __tlbiel_va_range(hstart, hend, pid,
                                                  ^~~~
arch/powerpc/mm/book3s64/radix_tlb.c:1175:29: note: initialize the variable 'hend' to silence this warning
                unsigned long hstart, hend;
                                          ^
                                           = 0
2 errors generated.

Rework the if-else to pull the 'IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)'
check one level up, this will silence the warnings. That will also
simplify the 'else' path. Clang is getting confused with these warnings,
but the warnings are false positives.


But you are duplicating a significant part of the code by doing that, 
and duplicated code generally leads to bugs.


And we already have redundant stuff between FLUSH_TYPE_LOCAL leg and 
cputlb_use_tlbie() leg.


Can't you fix CLANG instead :) ?

Or just add an else to the IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) that 
sets hstart and hend to 0 ?


Or just put hstart and hend calculation outside the IS_ENABLED() ? After 
all GCC should drop the calculation when not used.





Suggested-by: Arnd Bergmann 
Signed-off-by: Anders Roxell 
---
  arch/powerpc/mm/book3s64/radix_tlb.c | 31 +---
  1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c 
b/arch/powerpc/mm/book3s64/radix_tlb.c
index 7724af19ed7e..e494a45ce1b4 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -1170,16 +1170,14 @@ static inline void __radix__flush_tlb_range(struct 
mm_struct *mm,
_tlbiel_pid_multicast(mm, pid, RIC_FLUSH_ALL);
}
}
-   } else {
+   } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
bool hflush = false;
unsigned long hstart, hend;
  
-		if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {

-   hstart = (start + PMD_SIZE - 1) & PMD_MASK;
-   hend = end & PMD_MASK;
-   if (hstart < hend)
-   hflush = true;
-   }
+   hstart = (start + PMD_SIZE - 1) & PMD_MASK;
+   hend = end & PMD_MASK;
+   if (hstart < hend)
+   hflush = true;
  
  		if (type == FLUSH_TYPE_LOCAL) {

asm volatile("ptesync": : :"memory");
@@ -1207,6 +1205,25 @@ static inline void __radix__flush_tlb_range(struct 
mm_struct *mm,
_tlbiel_va_range_multicast(mm,
hstart, hend, pid, PMD_SIZE, 
MMU_PAGE_2M, flush_pwc);
}
+   } else {
+
+   if (type == FLUSH_TYPE_LOCAL) {
+   asm volatile("ptesync" : : : "memory");
+   if (flush_pwc)
+   /* For PWC, only one flush is needed */
+   __tlbiel_pid(pid, 0, RIC_FLUSH_PWC);
+   __tlbiel_va_range(start, end, pid, page_size, 
mmu_virtual_psize);
+   ppc_after_tlbiel_barrier();
+   } else if (cputlb_use_tlbie()) {
+   asm volatile("ptesync" : : : "memory");
+   if (flush_pwc)
+   __tlbie_pid(pid, RIC_FLUSH_PWC);
+   __tlbie_va_range(start, end, pid, page_size, 
mmu_virtual_psize);
+   asm volatile("eieio; tlbsync; ptesync" : : : "memory");
+   } else {
+   _tlbiel_va_range_multicast(mm,
+   start, end, pid, page_size, 
mmu_virtual_psize, flush_pwc);
+   }
}
  out:
preempt_enable();



[PATCH] powerpc/32s: Allocate one 256k IBAT instead of two consecutive 128k IBATs

2021-11-26 Thread Christophe Leroy
Today we have the following IBATs allocated:

---[ Instruction Block Address Translation ]---
0: 0xc0000000-0xc03fffff 0x00000000         4M Kernel   x     m
1: 0xc0400000-0xc05fffff 0x00400000         2M Kernel   x     m
2: 0xc0600000-0xc06fffff 0x00600000         1M Kernel   x     m
3: 0xc0700000-0xc077ffff 0x00700000       512K Kernel   x     m
4: 0xc0780000-0xc079ffff 0x00780000       128K Kernel   x     m
5: 0xc07a0000-0xc07bffff 0x007a0000       128K Kernel   x     m
6:         -
7:         -

The two 128K should be a single 256K instead.

When _etext is not aligned to 128Kbytes, the system will allocate
all necessary BATs to the lower 128Kbytes boundary, then allocate
an additional 128Kbytes BAT for the remaining block.

Instead, align the top to 128Kbytes so that the function directly
allocates a 256Kbytes last block:

---[ Instruction Block Address Translation ]---
0: 0xc0000000-0xc03fffff 0x00000000         4M Kernel   x     m
1: 0xc0400000-0xc05fffff 0x00400000         2M Kernel   x     m
2: 0xc0600000-0xc06fffff 0x00600000         1M Kernel   x     m
3: 0xc0700000-0xc077ffff 0x00700000       512K Kernel   x     m
4: 0xc0780000-0xc07bffff 0x00780000       256K Kernel   x     m
5:         -
6:         -
7:         -

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/book3s32/mmu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 27061583a010..33ab63d56435 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -196,18 +196,17 @@ void mmu_mark_initmem_nx(void)
int nb = mmu_has_feature(MMU_FTR_USE_HIGH_BATS) ? 8 : 4;
int i;
unsigned long base = (unsigned long)_stext - PAGE_OFFSET;
-   unsigned long top = (unsigned long)_etext - PAGE_OFFSET;
+   unsigned long top = ALIGN((unsigned long)_etext - PAGE_OFFSET, SZ_128K);
unsigned long border = (unsigned long)__init_begin - PAGE_OFFSET;
unsigned long size;
 
-   for (i = 0; i < nb - 1 && base < top && top - base > (128 << 10);) {
+   for (i = 0; i < nb - 1 && base < top;) {
size = block_size(base, top);
setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_TEXT);
base += size;
}
if (base < top) {
size = block_size(base, top);
-   size = max(size, 128UL << 10);
if ((top - base) > size) {
size <<= 1;
if (strict_kernel_rwx_enabled() && base + size > border)
-- 
2.33.1
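
The effect of aligning the top before the loop can be reproduced with a userspace sketch. All demo_* names are hypothetical, and demo_block_size() is only an approximation of what block_size() is assumed to do here: pick the largest power-of-two block that is naturally aligned at base, fits below top, and does not exceed 256M. The _etext offset 0x7a1234 used in the usage note is an invented value chosen to be consistent with the tables above.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SZ_128K (128u << 10)
#define DEMO_SZ_256M (256u << 20)

/* Round x up to a multiple of a (a must be a power of two). */
static uint32_t demo_align_up(uint32_t x, uint32_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Largest power of two dividing x; base 0 is aligned to anything. */
static uint32_t demo_lowest_bit(uint32_t x)
{
	return x ? (x & (~x + 1)) : (1u << 31);
}

/* Largest power of two not greater than x (x must be non-zero). */
static uint32_t demo_highest_bit(uint32_t x)
{
	uint32_t b = 1;

	while (x >>= 1)
		b <<= 1;
	return b;
}

/* Approximation of block_size(): min(256M, alignment of base, room left). */
static uint32_t demo_block_size(uint32_t base, uint32_t top)
{
	uint32_t size = DEMO_SZ_256M;
	uint32_t align = demo_lowest_bit(base);
	uint32_t room = demo_highest_bit(top - base);

	if (align < size)
		size = align;
	if (room < size)
		size = room;
	return size;
}

/* Fill sizes[] with the blocks covering [base, top); return how many. */
static int demo_plan_bats(uint32_t base, uint32_t top, uint32_t *sizes, int max)
{
	int n = 0;

	/* Both ends must sit on the 128K minimum BAT granularity. */
	assert(base % DEMO_SZ_128K == 0 && top % DEMO_SZ_128K == 0);

	while (base < top && n < max) {
		sizes[n] = demo_block_size(base, top);
		base += sizes[n++];
	}
	return n;
}
```

Calling demo_plan_bats(0, demo_align_up(0x7a1234, DEMO_SZ_128K), sizes, 8) walks the range in five blocks of 4M, 2M, 1M, 512K and 256K, which matches the second table in the commit message.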



[Bug 205099] KASAN hit at raid6_pq: BUG: Unable to handle kernel data access at 0x00f0fd0d

2021-11-26 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=205099

--- Comment #40 from Christophe Leroy (christophe.le...@csgroup.eu) ---
Would also be great if you can activate CONFIG_PTDUMP_DEBUGFS and provide the
content of /sys/kernel/debug/kernel_page_tables

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH] powerpc/64s/radix: Fix unmapping huge vmaps when CONFIG_HUGETLB_PAGE=n

2021-11-26 Thread Nicholas Piggin
Excerpts from Daniel Axtens's message of November 26, 2021 4:09 pm:
> Hi,
> 
>> pmd_huge is defined out to false when HUGETLB_PAGE is not configured,
>> but the vmap code still installs huge PMDs. This leads to errors
>> encountering bad PMDs when vunmapping because it is not seen as a
>> huge PTE, and the bad PMD check catches it. The end result may not
>> be much more serious than some bad pmd warning messages, because the
>> pmd_none_or_clear_bad() does what we wanted and clears the huge PTE
>> anyway.
> 
> Huh. So vmap seems to key off arch_vmap_p?d_supported which checks for
> radix and HAVE_ARCH_HUGE_VMAP.
> 
>> Fix this by checking pmd_is_leaf(), which checks for a PTE regardless
>> of config options. The whole huge/large/leaf stuff is a tangled mess
>> but that's kernel-wide and not something we can improve much in
>> arch/powerpc code.
> 
> I guess I'm a bit late to the party here because p?d_is_leaf was added
> in 2019 in commit d6eacedd1f0e ("powerpc/book3s: Use config independent
> helpers for page table walk") but why wouldn't we just make pmd_huge()
> not config dependent?

I guess it is so the code constant-folds away when hugetlbfs is not
configured (and maybe so !huge kernels would correctly print a bad PMD
warning if they got a huge PMD in user mappings).

> 
> Also, looking at that commit, there are a few places that might still
> throw warnings, e.g. find_linux_pte, find_current_mm_pte, pud_page which
> seem like they might still throw warnings if they were to encounter a
> huge vmap page:
> 
> struct page *pud_page(pud_t pud)
> {
>   if (pud_is_leaf(pud)) {
>   VM_WARN_ON(!pud_huge(pud));

Oh, hmm. That is used in vmalloc.c so maybe that warning should be
removed as a false positive. Good catch.

> Do these functions need special treatment for huge vmappings()?

find_linux_pte etc could be called for vmaps. I'm not sure I see a
problem in that function.

Thanks,
Nick

> 
> Apart from those questions, the patch itself makes sense to me and I can
> follow how it would fix a problem.
> 
> Reviewed-by: Daniel Axtens 
> 
> Kind regards,
> Daniel
> 
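
To make the pmd_huge() versus pmd_is_leaf() distinction discussed in this
thread concrete, here is a toy user-space model. All names and the bit
position are hypothetical stand-ins, not the kernel definitions; the only
point is that one predicate is compiled out by a config switch while the
other always inspects the entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Set to 1 to model a kernel built with CONFIG_HUGETLB_PAGE=y. */
#ifndef TOY_CONFIG_HUGETLB_PAGE
#define TOY_CONFIG_HUGETLB_PAGE 0
#endif

/* Hypothetical bit marking an entry as a leaf mapping. */
#define TOY_PAGE_PTE (UINT64_C(1) << 62)

typedef struct { uint64_t val; } toy_pmd_t;

/* Config-independent check: looks at the entry itself. */
static bool toy_pmd_is_leaf(toy_pmd_t pmd)
{
	return (pmd.val & TOY_PAGE_PTE) != 0;
}

/* Config-dependent check: constant false when hugetlb is disabled. */
static bool toy_pmd_huge(toy_pmd_t pmd)
{
#if TOY_CONFIG_HUGETLB_PAGE
	return toy_pmd_is_leaf(pmd);
#else
	(void)pmd;
	return false;
#endif
}
```

In this model, with the switch off, a huge vmap entry is reported as a
leaf by toy_pmd_is_leaf() but not as huge by toy_pmd_huge(), which is the
mismatch that lets the bad PMD check fire on unmap.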


Re: [PATCH 1/3] powerpc/code-patching: work around code patching verification in patching tests

2021-11-26 Thread Christophe Leroy




On 26/11/2021 at 11:27, Nicholas Piggin wrote:
> Excerpts from Christophe Leroy's message of November 26, 2021 4:34 pm:
>>
>>
>> On 26/11/2021 at 04:22, Nicholas Piggin wrote:
>>> Code patching tests patch the stack and (non-module) vmalloc space now,
>>> which falls afoul of the new address check.
>>>
>>> The stack patching can easily be fixed, but the vmalloc patching is more
>>> difficult. For now, add an ugly workaround to skip the check while the
>>> test code is running.
>>
>> This really looks hacky.
>>
>> To skip the test, you can call do_patch_instruction() instead of calling
>> patch_instruction().
>
> And make a do_patch_branch function. I thought about it, and thought
> this is slightly easier.



Anyway, as reported by Sachin the ftrace code also trips in the new 
verification. So I have submitted a patch to revert to the previous 
level of verification.


Then we can fix all this properly without going through a temporary hack 
and activate the verification again once every caller is fixed.


I was not able to reproduce Sachin's problem on PPC32. Could it be 
specific to PPC64 ?


Christophe


Re: [PATCH] powerpc/pseries/vas: Don't print an error when VAS is unavailable

2021-11-26 Thread Nicholas Piggin
Excerpts from Cédric Le Goater's message of November 26, 2021 5:13 pm:
> On 11/26/21 06:21, Nicholas Piggin wrote:
>> KVM does not support VAS so guests always print a useless error on boot
>> 
>>  vas: HCALL(398) error -2, query_type 0, result buffer 0x57f2000
>> 
>> Change this to only print the message if the error is not H_FUNCTION.
> 
> 
> Just being curious, why is it even called since "ibm,compression" should
> not be exposed in the DT ?

It looks like vas does not test for it. I guess in theory there can be 
other functions than compression implemented as an accelerator. Maybe
that's why?

Thanks,
Nick
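
The rule proposed in the patch description ("only print the message if
the error is not H_FUNCTION") can be sketched as a small predicate. The
helper name is hypothetical, and the constant values are assumptions taken
from my reading of the kernel's hvcall.h; -2 matches the "error -2" in the
boot log quoted above.

```c
#include <stdbool.h>

/* Assumed hcall return codes; check hvcall.h before relying on them. */
#define TOY_H_SUCCESS    0L
#define TOY_H_FUNCTION  (-2L)

/*
 * Return true when a failed query hcall deserves an error message.
 * H_FUNCTION means the hypervisor does not implement the call at all,
 * which is expected under KVM, so it is not reported.
 */
static bool toy_vas_should_report(long rc)
{
	return rc != TOY_H_SUCCESS && rc != TOY_H_FUNCTION;
}
```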



[PATCH] powerpc/ptdump: Fix display a BAT's size unit

2021-11-26 Thread Christophe Leroy
We have wrong units on BAT's sizes (G instead of M, M instead of ...)

---[ Instruction Block Address Translation ]---
0: 0xc000-0xc03f 0x 4G Kernel   x m
1: 0xc040-0xc05f 0x0040 2G Kernel   x m
2: 0xc060-0xc06f 0x0060 1G Kernel   x m
3: 0xc070-0xc077 0x0070   512M Kernel   x m
4: 0xc078-0xc079 0x0078   128M Kernel   x m
5: 0xc07a-0xc07b 0x007a   128M Kernel   x m
6: -
7: -

This is because pt_dump_size() expects a size in Kbytes but
bat_show_603() gives the size in bytes.

To avoid risk of confusion, change pt_dump_size() to take bytes.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/ptdump/ptdump.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index bf251191e78d..031956d0ee84 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -123,7 +123,7 @@ static struct ptdump_range ptdump_range[] __ro_after_init = {
 
 void pt_dump_size(struct seq_file *m, unsigned long size)
 {
-	static const char units[] = "KMGTPE";
+	static const char units[] = " KMGTPE";
 	const char *unit = units;
 
 	/* Work out what appropriate unit to use */
@@ -176,7 +176,7 @@ static void dump_addr(struct pg_state *st, unsigned long addr)
 
 	pt_dump_seq_printf(st->seq, REG "-" REG " ", st->start_address, addr - 1);
 	pt_dump_seq_printf(st->seq, " " REG " ", st->start_pa);
-	pt_dump_size(st->seq, (addr - st->start_address) >> 10);
+	pt_dump_size(st->seq, addr - st->start_address);
 }
 
 static void note_prot_wx(struct pg_state *st, unsigned long addr)
-- 
2.33.1
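
As an illustration of the unit selection once the size is passed in bytes,
here is a minimal user-space sketch. toy_format_size() is a hypothetical
helper, not pt_dump_size() itself: it writes into a buffer instead of a
seq_file, omits the unit for plain bytes, and adds a guard for size 0 that
is my own assumption.

```c
#include <stdio.h>

/*
 * Format a size given in bytes using the largest unit that divides it
 * exactly. The leading ' ' in units[] stands for plain bytes, so a
 * byte count and a Kbyte count can no longer be mixed up by the caller.
 */
static int toy_format_size(char *buf, size_t len, unsigned long size)
{
	static const char units[] = " KMGTPE";
	const char *unit = units;

	/* Step up one unit for each exact factor of 1024. */
	while (size && !(size & 1023) && unit[1]) {
		size >>= 10;
		unit++;
	}
	if (*unit == ' ')
		return snprintf(buf, len, "%lu", size);
	return snprintf(buf, len, "%lu%c", size, *unit);
}
```

For example, toy_format_size(buf, sizeof(buf), 4UL << 20) produces "4M",
whereas feeding the same byte count to a table that starts at 'K' would
have produced "4G", which is the symptom shown in the listing above.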



Re: [PATCH 1/3] powerpc/code-patching: work around code patching verification in patching tests

2021-11-26 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of November 26, 2021 4:34 pm:
> 
> 
> On 26/11/2021 at 04:22, Nicholas Piggin wrote:
>> Code patching tests patch the stack and (non-module) vmalloc space now,
>> which falls afoul of the new address check.
>> 
>> The stack patching can easily be fixed, but the vmalloc patching is more
>> difficult. For now, add an ugly workaround to skip the check while the
>> test code is running.
> 
> This really looks hacky.
> 
> To skip the test, you can call do_patch_instruction() instead of calling 
> patch_instruction().

And make a do_patch_branch function. I thought about it, and thought
this is slightly easier.

Thanks,
Nick


[Bug 205099] KASAN hit at raid6_pq: BUG: Unable to handle kernel data access at 0x00f0fd0d

2021-11-26 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=205099

--- Comment #39 from Christophe Leroy (christophe.le...@csgroup.eu) ---
Can you retry with CONFIG_LOWMEM_SIZE=0x2800 or
CONFIG_LOWMEM_SIZE=0x2000 ?


[Bug 205099] KASAN hit at raid6_pq: BUG: Unable to handle kernel data access at 0x00f0fd0d

2021-11-26 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=205099

--- Comment #38 from Christophe Leroy (christophe.le...@csgroup.eu) ---
Looks like only x86 and arm implement this vmalloc= parameter:

[chleroy@PO20335 linux-powerpc]$ git grep 'early_param("vmalloc"' arch/
arch/arm/mm/mmu.c:early_param("vmalloc", early_vmalloc);
arch/x86/mm/pgtable_32.c:early_param("vmalloc", parse_vmalloc);

However, your vmalloc area has a size of 65M:

Kernel virtual memory layout:
  * 0xf600..0xfec0  : kasan shadow mem
  * 0xf5bbf000..0xf5fff000  : fixmap
  * 0xf540..0xf580  : highmem PTEs
  * 0xf5115000..0xf540  : early ioremap
  * 0xf100..0xf511  : vmalloc & ioremap
  * 0xb000..0xc000  : modules
Memory: 1928984K/2097152K available (22288K kernel code, 2616K rwdata, 4868K
rodata, 1408K init, 8981K bss, 168168K reserved, 0K cma-reserved, 1310720K
highmem)


[Bug 205099] KASAN hit at raid6_pq: BUG: Unable to handle kernel data access at 0x00f0fd0d

2021-11-26 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=205099

--- Comment #37 from Christophe Leroy (christophe.le...@csgroup.eu) ---
I see no obvious reason for a 32MB allocation to fail while you have 588612kB
of free memory.

And that happens early at boot, before user processes are started, so the vmap
area, although not very big, should still have 32M of space available.


Re: [PATCH] powerpc/code-patching: Relax verification of patchability

2021-11-26 Thread Christophe Leroy




On 26/11/2021 at 08:39, Christophe Leroy wrote:
> Commit 8b8a8f0ab3f5 ("powerpc/code-patching: Improve verification of
> patchability") introduced a stricter verification of the patched
> area by checking it is proper kernel text.
>
> But at least two usages of patch_instruction() fall outside:
> - Code patching selftests, which use stack and vmalloc space.
> - Ftrace
>
> So for the time being, partially revert commit 8b8a8f0ab3f5 and add
> a one-time warning:
>
>   Running code patching self-tests ...
>   patch_instruction() called on invalid text address 0xe1011e58 from test_code_patching+0x34/0xd6c
>
> Reported-by: Sachin Sant
> Reported-by: Stephen Rothwell
> Cc: Nicholas Piggin
> Fixes: 8b8a8f0ab3f5 ("powerpc/code-patching: Improve verification of patchability")
> Signed-off-by: Christophe Leroy
> ---
>  arch/powerpc/lib/code-patching.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index 1dd636a85cc1..c87eea773930 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -190,9 +190,13 @@ static int do_patch_instruction(u32 *addr, struct ppc_inst instr)
>  int patch_instruction(u32 *addr, struct ppc_inst instr)
>  {
>  	/* Make sure we aren't patching a freed init section */
> -	if (!kernel_text_address((unsigned long)addr))
> +	if (system_state >= SYSTEM_FREEING_INITMEM && init_section_contains(addr, 4))
>  		return 0;
>
> +	if (!kernel_text_address((unsigned long)addr))
> +		pr_warn_once("%s() called on invalid text address 0x%p from %pS\n",
> +			     __func__, addr, __builtin_return_address(0));
> +

Might it be better to use pr_warn_ratelimited() instead, in order to catch
more than the first occurrence?

>  	return do_patch_instruction(addr, instr);
>  }
>  NOKPROBE_SYMBOL(patch_instruction);
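
The decision order in the relaxed patch_instruction() can be summarised
with a small user-space model. Everything here is hypothetical: the three
booleans stand in for the kernel predicates named in the comment, and the
enum only labels the three outcomes of the quoted diff.

```c
#include <stdbool.h>

enum toy_patch_decision {
	TOY_SKIP,           /* address lies in an already freed init section */
	TOY_PATCH,          /* regular kernel text, patched silently */
	TOY_PATCH_AND_WARN, /* not kernel text: patched, but reported once */
};

/*
 * init_freed stands in for system_state >= SYSTEM_FREEING_INITMEM,
 * in_init    stands in for init_section_contains(addr, 4),
 * in_text    stands in for kernel_text_address((unsigned long)addr).
 */
static enum toy_patch_decision toy_patch_check(bool init_freed, bool in_init,
					       bool in_text)
{
	if (init_freed && in_init)
		return TOY_SKIP;
	if (!in_text)
		return TOY_PATCH_AND_WARN;
	return TOY_PATCH;
}
```

In this model the selftest and ftrace callers, which patch outside kernel
text, land in TOY_PATCH_AND_WARN rather than being refused.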



Re: [PATCH] recordmcount: Support empty section from recent binutils

2021-11-26 Thread LEROY Christophe


On 24/11/2021 at 15:43, Christophe Leroy wrote:
> Looks like recent binutils (2.36 and over ?) may empty some section,
> leading to failure like:
> 
>   Cannot find symbol for section 11: .text.unlikely.
>   kernel/kexec_file.o: failed
>   make[1]: *** [scripts/Makefile.build:287: kernel/kexec_file.o] Error 1
> 
> In order to avoid that, ensure that the section has content before
> returning its name in has_rel_mcount().

This patch doesn't work: on PPC32 I get the following message with this
patch applied:

[0.00] ftrace: No functions to be traced?

Without the patch I get:

[0.00] ftrace: allocating 22381 entries in 66 pages
[0.00] ftrace: allocated 66 pages with 2 groups

Christophe

> 
> Suggested-by: Steven Rostedt 
> Link: https://github.com/linuxppc/issues/issues/388
> Link: https://lore.kernel.org/all/20210215162209.5e2a4...@gandalf.local.home/
> Signed-off-by: Christophe Leroy 
> ---
>   scripts/recordmcount.h | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/scripts/recordmcount.h b/scripts/recordmcount.h
> index 1e9baa5c4fc6..cc6600b729ae 100644
> --- a/scripts/recordmcount.h
> +++ b/scripts/recordmcount.h
> @@ -575,6 +575,8 @@ static char const *has_rel_mcount(Elf_Shdr const *const relhdr,
> char const *const shstrtab,
> char const *const fname)
>   {
> + if (!shdr0->sh_size)
> + return NULL;
>   if (w(relhdr->sh_type) != SHT_REL && w(relhdr->sh_type) != SHT_RELA)
>   return NULL;
>   return __has_rel_mcount(relhdr, shdr0, shstrtab, fname);
>