[PATCH 1/3] PCI: pci-host-generic: Fix lookup of linux, pci-probe-only property

2015-08-14 Thread Marc Zyngier
When pci-host-generic looks for the probe-only property, it seems
to trust the DT to be correctly written, and assumes that there
is a parameter to the property.

Unfortunately, this is not always the case, and some firmware expose
this property naked. The driver ends up making a decision based on
whatever the property pointer points to, which is likely to be junk.

Instead, let's check for the validity of the property, and ignore
it if the firmware couldn't make up its mind.

Signed-off-by: Marc Zyngier 
---
 drivers/pci/host/pci-host-generic.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/host/pci-host-generic.c 
b/drivers/pci/host/pci-host-generic.c
index 265dd25..2b2e2ff 100644
--- a/drivers/pci/host/pci-host-generic.c
+++ b/drivers/pci/host/pci-host-generic.c
@@ -211,6 +211,7 @@ static int gen_pci_probe(struct platform_device *pdev)
const char *type;
const struct of_device_id *of_id;
const int *prop;
+   int len;
struct device *dev = &pdev->dev;
struct device_node *np = dev->of_node;
struct gen_pci *pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL);
@@ -225,12 +226,16 @@ static int gen_pci_probe(struct platform_device *pdev)
return -EINVAL;
}
 
-   prop = of_get_property(of_chosen, "linux,pci-probe-only", NULL);
+   prop = of_get_property(of_chosen, "linux,pci-probe-only", &len);
if (prop) {
-   if (*prop)
-   pci_add_flags(PCI_PROBE_ONLY);
-   else
-   pci_clear_flags(PCI_PROBE_ONLY);
+   if (len) {
+   if (be32_to_cpup(prop))
+   pci_add_flags(PCI_PROBE_ONLY);
+   else
+   pci_clear_flags(PCI_PROBE_ONLY);
+   } else {
+   dev_warn(&pdev->dev, "linux,pci-probe-only set without 
value, ignoring\n");
+   }
}
 
of_id = of_match_node(gen_pci_of_match, np);
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 0/3] PCI: arm64/powerpc: Fix parsing of linux,pci-probe-only

2015-08-14 Thread Marc Zyngier
The pci-host-generic driver parses the linux,pci-probe-only property,
and assumes that it will have a boolean parameter.

Turns out that the Seattle DTS file has a naked "linux,pci-probe-only"
property, which leads to the driver dereferencing some unsuspecting
memory location. Nothing really bad happens (we end up reading some
other bit of DT, fortunately), but that not a reason to keep it this
way.

The first patch fixes the driver not to do silly things, and simply
give a warning when this happens. The powerpc code from where this
code was lifted is also fixed in a second patch.

Finally, the bad property is removed from the Seatle DTS, because it
is simply not necessary (it actually prevents me from using SR-IOV,
which otherwise runs fine without the probe-only thing).

Marc Zyngier (3):
  PCI: pci-host-generic: Fix lookup of linux,pci-probe-only property
  powerpc: PCI: Fix lookup of linux,pci-probe-only property
  arm64: dts: Drop linux,pci-probe-only from the Seattle DTS

 arch/arm64/boot/dts/amd/amd-overdrive.dts |  1 -
 arch/powerpc/platforms/pseries/setup.c| 15 ++-
 drivers/pci/host/pci-host-generic.c   | 15 ++-
 3 files changed, 20 insertions(+), 11 deletions(-)

-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/3] powerpc: PCI: Fix lookup of linux, pci-probe-only property

2015-08-14 Thread Marc Zyngier
When find_and_init_phbs() looks for the probe-only property, it seems
to trust the firmware to be correctly written, and assumes that there
is a parameter to the property.

It is conceivable that the firmware could not be that perfect, and it
could expose this property naked (at least one arm64 platform seems to
exhibit this exact behaviour). The setup code the ends up making
a decision based on whatever the property pointer points to, which
is likely to be junk.

Instead, let's check for the validity of the property, and ignore
it if the firmware couldn't make up its mind.

Signed-off-by: Marc Zyngier 
---
 arch/powerpc/platforms/pseries/setup.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index df6a704..6bdc1f9 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -490,14 +490,19 @@ static void __init find_and_init_phbs(void)
 */
if (of_chosen) {
const int *prop;
+   int len;
 
prop = of_get_property(of_chosen,
-   "linux,pci-probe-only", NULL);
+   "linux,pci-probe-only", &len);
if (prop) {
-   if (*prop)
-   pci_add_flags(PCI_PROBE_ONLY);
-   else
-   pci_clear_flags(PCI_PROBE_ONLY);
+   if (len) {
+   if (be32_to_cpup(prop))
+   pci_add_flags(PCI_PROBE_ONLY);
+   else
+   pci_clear_flags(PCI_PROBE_ONLY);
+   } else {
+   pr_warn("linux,pci-probe-only set without 
value, ignoring\n");
+   }
}
}
 }
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 3/3] arm64: dts: Drop linux, pci-probe-only from the Seattle DTS

2015-08-14 Thread Marc Zyngier
The linux,pci-probe-only property mandates an argument to indicate
whether or not to engage the "probe-only" mode, but the Seattle
DTS just provides a naked property, which is illegal.

Also, it turns out that the board is perfectly happy without
probe-only, so let's drop this altogether.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/boot/dts/amd/amd-overdrive.dts | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/boot/dts/amd/amd-overdrive.dts 
b/arch/arm64/boot/dts/amd/amd-overdrive.dts
index 564a3f7..128fa94 100644
--- a/arch/arm64/boot/dts/amd/amd-overdrive.dts
+++ b/arch/arm64/boot/dts/amd/amd-overdrive.dts
@@ -14,7 +14,6 @@
 
chosen {
stdout-path = &serial0;
-   linux,pci-probe-only;
};
 };
 
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 2/3] powerpc: PCI: Fix lookup of linux,pci-probe-only property

2015-08-14 Thread Marc Zyngier
Hi Bjorn,

On 14/08/15 15:57, Bjorn Helgaas wrote:
> Hi Marc,
> 
> On Fri, Aug 14, 2015 at 03:41:07PM +0100, Marc Zyngier wrote:
>> When find_and_init_phbs() looks for the probe-only property, it seems
>> to trust the firmware to be correctly written, and assumes that there
>> is a parameter to the property.
>>
>> It is conceivable that the firmware could not be that perfect, and it
>> could expose this property naked (at least one arm64 platform seems to
>> exhibit this exact behaviour). The setup code the ends up making
>> a decision based on whatever the property pointer points to, which
>> is likely to be junk.
>>
>> Instead, let's check for the validity of the property, and ignore
>> it if the firmware couldn't make up its mind.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/powerpc/platforms/pseries/setup.c | 15 ++-
>>  1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/setup.c 
>> b/arch/powerpc/platforms/pseries/setup.c
>> index df6a704..6bdc1f9 100644
>> --- a/arch/powerpc/platforms/pseries/setup.c
>> +++ b/arch/powerpc/platforms/pseries/setup.c
>> @@ -490,14 +490,19 @@ static void __init find_and_init_phbs(void)
>>   */
>>  if (of_chosen) {
>>  const int *prop;
>> +int len;
>>  
>>  prop = of_get_property(of_chosen,
>> -"linux,pci-probe-only", NULL);
>> +"linux,pci-probe-only", &len);
>>  if (prop) {
>> -if (*prop)
>> -pci_add_flags(PCI_PROBE_ONLY);
>> -else
>> -pci_clear_flags(PCI_PROBE_ONLY);
>> +if (len) {
>> +if (be32_to_cpup(prop))
>> +pci_add_flags(PCI_PROBE_ONLY);
>> +else
>> +pci_clear_flags(PCI_PROBE_ONLY);
>> +} else {
>> +pr_warn("linux,pci-probe-only set without 
>> value, ignoring\n");
>> +}
> 
> This seems essentially identical to the pci-host-generic version.
> Is there a way we can factor it out so there's only one copy?

Probably. drivers/of/of_pci.c seems like a good landing place for it.
I'll hack something and repost it.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 1/4] of/pci: Add of_pci_check_probe_only to parse "linux, pci-probe-only"

2015-08-14 Thread Marc Zyngier
Both pci-host-generic and Pseries parse the "linux,pci-probe-only"
to engage the PCI_PROBE_ONLY mode, and both have a subtle bug that
can be triggered if the property has no parameter.

Provide a generic implementation that can be used by both.

Signed-off-by: Marc Zyngier 
---
 drivers/of/of_pci.c| 30 ++
 include/linux/of_pci.h |  3 +++
 2 files changed, 33 insertions(+)

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 5751dc5..a4e29ff 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -118,6 +118,36 @@ int of_get_pci_domain_nr(struct device_node *node)
 EXPORT_SYMBOL_GPL(of_get_pci_domain_nr);
 
 /**
+ * of_pci_check_probe_only - Setup probe only mode if linux,pci-probe-only
+ *   is present and valid
+ *
+ * @node: device tree node that may contain the property (usually "chosen")
+ *
+ */
+void of_pci_check_probe_only(struct device_node *node)
+{
+   const int *prop;
+   int len;
+
+   if (!node)
+   return;
+
+   prop = of_get_property(node, "linux,pci-probe-only", &len);
+   if (prop) {
+   if (!len) {
+   pr_warn("linux,pci-probe-only set without value, 
ignoring\n");
+   return;
+   }
+
+   if (be32_to_cpup(prop))
+   pci_add_flags(PCI_PROBE_ONLY);
+   else
+   pci_clear_flags(PCI_PROBE_ONLY);
+   }
+}
+EXPORT_SYMBOL_GPL(of_pci_check_probe_only);
+
+/**
  * of_pci_dma_configure - Setup DMA configuration
  * @dev: ptr to pci_dev struct of the PCI device
  *
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 29fd3fe..4c0a617 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -17,6 +17,7 @@ int of_irq_parse_and_map_pci(const struct pci_dev *dev, u8 
slot, u8 pin);
 int of_pci_parse_bus_range(struct device_node *node, struct resource *res);
 int of_get_pci_domain_nr(struct device_node *node);
 void of_pci_dma_configure(struct pci_dev *pci_dev);
+void of_pci_check_probe_only(struct device_node *node);
 #else
 static inline int of_irq_parse_pci(const struct pci_dev *pdev, struct 
of_phandle_args *out_irq)
 {
@@ -53,6 +54,8 @@ of_get_pci_domain_nr(struct device_node *node)
 }
 
 static inline void of_pci_dma_configure(struct pci_dev *pci_dev) { }
+
+static inline void of_pci_check_probe_only(struct device_node *node) { }
 #endif
 
 #if defined(CONFIG_OF_ADDRESS)
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 0/4] PCI: arm64/powerpc: Fix parsing of linux, pci-probe-only

2015-08-14 Thread Marc Zyngier
The pci-host-generic driver parses the linux,pci-probe-only property,
and assumes that it will have a boolean parameter.

Turns out that the Seattle DTS file has a naked "linux,pci-probe-only"
property, which leads to the driver dereferencing some unsuspecting
memory location. Nothing really bad happens (we end up reading some
other bit of DT, fortunately), but that not a reason to keep it this
way. Turns out that the Pseries code (where this code was lifted from)
may suffer from the same issue.

The first patch introduces a common (and fixed) version of that check
that can be used by drivers and architectures that require it. The two
following patches change the pci-host-generic driver and the powerpc
code to use it.

Finally, the bad property is removed from the Seatle DTS, because it
is simply not necessary (it actually prevents me from using SR-IOV,
which otherwise runs fine without the probe-only thing).

This has been tested on the offending Seattle board.

* From v1:
  - Consolidate the parsing in of_pci.c (Bjorn)

Marc Zyngier (4):
  of/pci: Add of_pci_check_probe_only to parse "linux,pci-probe-only"
  PCI: pci-host-generic: Fix lookup of linux,pci-probe-only property
  powerpc: PCI: Fix lookup of linux,pci-probe-only property
  arm64: dts: Drop linux,pci-probe-only from the Seattle DTS

 arch/arm64/boot/dts/amd/amd-overdrive.dts |  1 -
 arch/powerpc/platforms/pseries/setup.c| 14 ++
 drivers/of/of_pci.c   | 30 ++
 drivers/pci/host/pci-host-generic.c   |  9 +
 include/linux/of_pci.h|  3 +++
 5 files changed, 36 insertions(+), 21 deletions(-)

-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 2/4] PCI: pci-host-generic: Fix lookup of linux, pci-probe-only property

2015-08-14 Thread Marc Zyngier
When pci-host-generic looks for the probe-only property, it seems
to trust the DT to be correctly written, and assumes that there
is a parameter to the property.

Unfortunately, this is not always the case, and some firmware expose
this property naked. The driver ends up making a decision based on
whatever the property pointer points to, which is likely to be junk.

Switch to the common of_pci.c implementation that doesn't suffer
from this problem.

Signed-off-by: Marc Zyngier 
---
 drivers/pci/host/pci-host-generic.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/pci/host/pci-host-generic.c 
b/drivers/pci/host/pci-host-generic.c
index 265dd25..545ff4e 100644
--- a/drivers/pci/host/pci-host-generic.c
+++ b/drivers/pci/host/pci-host-generic.c
@@ -210,7 +210,6 @@ static int gen_pci_probe(struct platform_device *pdev)
int err;
const char *type;
const struct of_device_id *of_id;
-   const int *prop;
struct device *dev = &pdev->dev;
struct device_node *np = dev->of_node;
struct gen_pci *pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL);
@@ -225,13 +224,7 @@ static int gen_pci_probe(struct platform_device *pdev)
return -EINVAL;
}
 
-   prop = of_get_property(of_chosen, "linux,pci-probe-only", NULL);
-   if (prop) {
-   if (*prop)
-   pci_add_flags(PCI_PROBE_ONLY);
-   else
-   pci_clear_flags(PCI_PROBE_ONLY);
-   }
+   of_pci_check_probe_only(of_chosen);
 
of_id = of_match_node(gen_pci_of_match, np);
pci->cfg.ops = of_id->data;
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 3/4] powerpc: PCI: Fix lookup of linux, pci-probe-only property

2015-08-14 Thread Marc Zyngier
When find_and_init_phbs() looks for the probe-only property, it seems
to trust the firmware to be correctly written, and assumes that there
is a parameter to the property.

It is conceivable that the firmware could not be that perfect, and it
could expose this property naked (at least one arm64 platform seems to
exhibit this exact behaviour). The setup code the ends up making
a decision based on whatever the property pointer points to, which
is likely to be junk.

Instead, switch to the common of_pci.c implementation that doesn't
suffer from this problem and ignore the property if the firmware
couldn't make up its mind.

Signed-off-by: Marc Zyngier 
---
 arch/powerpc/platforms/pseries/setup.c | 14 ++
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index df6a704..8434953 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -488,18 +489,7 @@ static void __init find_and_init_phbs(void)
 * PCI_PROBE_ONLY and PCI_REASSIGN_ALL_BUS can be set via properties
 * in chosen.
 */
-   if (of_chosen) {
-   const int *prop;
-
-   prop = of_get_property(of_chosen,
-   "linux,pci-probe-only", NULL);
-   if (prop) {
-   if (*prop)
-   pci_add_flags(PCI_PROBE_ONLY);
-   else
-   pci_clear_flags(PCI_PROBE_ONLY);
-   }
-   }
+   of_pci_check_probe_only(of_chosen);
 }
 
 static void __init pSeries_setup_arch(void)
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 4/4] arm64: dts: Drop linux, pci-probe-only from the Seattle DTS

2015-08-14 Thread Marc Zyngier
The linux,pci-probe-only property mandates an argument to indicate
whether or not to engage the "probe-only" mode, but the Seattle
DTS just provides a naked property, which is illegal.

Also, it turns out that the board is perfectly happy without
probe-only, so let's drop this altogether.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/boot/dts/amd/amd-overdrive.dts | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/boot/dts/amd/amd-overdrive.dts 
b/arch/arm64/boot/dts/amd/amd-overdrive.dts
index 564a3f7..128fa94 100644
--- a/arch/arm64/boot/dts/amd/amd-overdrive.dts
+++ b/arch/arm64/boot/dts/amd/amd-overdrive.dts
@@ -14,7 +14,6 @@
 
chosen {
stdout-path = &serial0;
-   linux,pci-probe-only;
};
 };
 
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 1/4] of/pci: Add of_pci_check_probe_only to parse "linux, pci-probe-only"

2015-09-03 Thread Marc Zyngier
On 02/09/15 23:23, Bjorn Helgaas wrote:
> On Fri, Aug 14, 2015 at 04:08:10PM -0500, Rob Herring wrote:
>> On Fri, Aug 14, 2015 at 11:19 AM, Marc Zyngier  wrote:
>>> Both pci-host-generic and Pseries parse the "linux,pci-probe-only"
>>> to engage the PCI_PROBE_ONLY mode, and both have a subtle bug that
>>> can be triggered if the property has no parameter.
>>
>> Humm, I bet we could break a lot of machines if we fixed the core code
>> to properly make pp->value NULL when there is no value.
>>
>>> Provide a generic implementation that can be used by both.
>>>
>>> Signed-off-by: Marc Zyngier 
>>> ---
>>>  drivers/of/of_pci.c| 30 ++
>>>  include/linux/of_pci.h |  3 +++
>>>  2 files changed, 33 insertions(+)
>>>
>>> diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
>>> index 5751dc5..a4e29ff 100644
>>> --- a/drivers/of/of_pci.c
>>> +++ b/drivers/of/of_pci.c
>>> @@ -118,6 +118,36 @@ int of_get_pci_domain_nr(struct device_node *node)
>>>  EXPORT_SYMBOL_GPL(of_get_pci_domain_nr);
>>>
>>>  /**
>>> + * of_pci_check_probe_only - Setup probe only mode if linux,pci-probe-only
>>> + *   is present and valid
>>> + *
>>> + * @node: device tree node that may contain the property (usually "chosen")
>>> + *
>>> + */
>>> +void of_pci_check_probe_only(struct device_node *node)
>>> +{
>>> +   const int *prop;
>>> +   int len;
>>> +
>>> +   if (!node)
>>> +   return;
>>> +
>>> +   prop = of_get_property(node, "linux,pci-probe-only", &len);
>>
>> It is preferred to use of_property_read_u32 to avoid just these types
>> of problems.
> 
> I don't know enough OF to really understand this, but I infer that
> this is a suggestion for improving the patch.  Should I be waiting for
> a v3 series?

Yes, I'll post an update very shortly. Thanks for reminding me!

M.
-- 
Jazz is not dead. It just smells funny...
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 1/4] of/pci: Add of_pci_check_probe_only to parse "linux, pci-probe-only"

2015-09-03 Thread Marc Zyngier
Both pci-host-generic and Pseries parse the "linux,pci-probe-only"
property to engage the PCI_PROBE_ONLY mode, and both have a subtle
bug that can be triggered if the property has no parameter.

Provide a generic, safe implementation that can be used by both.

Signed-off-by: Marc Zyngier 
---
 drivers/of/of_pci.c| 31 +++
 include/linux/of_pci.h |  3 +++
 2 files changed, 34 insertions(+)

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 5751dc5..7876343 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -118,6 +118,37 @@ int of_get_pci_domain_nr(struct device_node *node)
 EXPORT_SYMBOL_GPL(of_get_pci_domain_nr);
 
 /**
+ * of_pci_check_probe_only - Setup probe only mode if linux,pci-probe-only
+ *   is present and valid
+ *
+ * @node: device tree node that may contain the property (usually "chosen")
+ *
+ */
+void of_pci_check_probe_only(struct device_node *node)
+{
+   u32 val;
+   int ret;
+
+   if (!node)
+   return;
+
+   ret = of_property_read_u32(node, "linux,pci-probe-only", &val);
+   if (ret) {
+   if (ret == -ENODATA || ret == -EOVERFLOW)
+   pr_warn("linux,pci-probe-only without valid value, 
ignoring\n");
+   return;
+   }
+
+   if (val)
+   pci_add_flags(PCI_PROBE_ONLY);
+   else
+   pci_clear_flags(PCI_PROBE_ONLY);
+
+   pr_info("PCI: PROBE_ONLY %sabled\n", val ? "en" : "dis");
+}
+EXPORT_SYMBOL_GPL(of_pci_check_probe_only);
+
+/**
  * of_pci_dma_configure - Setup DMA configuration
  * @dev: ptr to pci_dev struct of the PCI device
  *
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 29fd3fe..4c0a617 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -17,6 +17,7 @@ int of_irq_parse_and_map_pci(const struct pci_dev *dev, u8 
slot, u8 pin);
 int of_pci_parse_bus_range(struct device_node *node, struct resource *res);
 int of_get_pci_domain_nr(struct device_node *node);
 void of_pci_dma_configure(struct pci_dev *pci_dev);
+void of_pci_check_probe_only(struct device_node *node);
 #else
 static inline int of_irq_parse_pci(const struct pci_dev *pdev, struct 
of_phandle_args *out_irq)
 {
@@ -53,6 +54,8 @@ of_get_pci_domain_nr(struct device_node *node)
 }
 
 static inline void of_pci_dma_configure(struct pci_dev *pci_dev) { }
+
+static inline void of_pci_check_probe_only(struct device_node *node) { }
 #endif
 
 #if defined(CONFIG_OF_ADDRESS)
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 0/4] PCI: arm64/powerpc: Fix parsing of linux, pci-probe-only

2015-09-03 Thread Marc Zyngier
The pci-host-generic driver parses the linux,pci-probe-only property,
and assumes that it will have a boolean parameter.

Turns out that the Seattle DTS file has a naked "linux,pci-probe-only"
property, which leads to the driver dereferencing some unsuspecting
memory location. Nothing really bad happens (we end up reading some
other bit of DT, fortunately), but that not a reason to keep it this
way. Turns out that the Pseries code (where this code was lifted from)
may suffer from the same issue.

The first patch introduces a common (and fixed) version of that check
that can be used by drivers and architectures that require it. The two
following patches change the pci-host-generic driver and the powerpc
code to use it.

Finally, the bad property is removed from the Seatle DTS, because it
is simply not necessary (it actually prevents me from using SR-IOV,
which otherwise runs fine without the probe-only thing).

This has been tested on the offending Seattle board.

* From v2:
  - Use of_property_read_u32 to safely read the property (Rob)
  - Add a log message to indicate when we enable probe-only
(probably quite useful for debugging)

* From v1:
  - Consolidate the parsing in of_pci.c (Bjorn)

Marc Zyngier (4):
  of/pci: Add of_pci_check_probe_only to parse "linux,pci-probe-only"
  PCI: pci-host-generic: Fix lookup of linux,pci-probe-only property
  powerpc: PCI: Fix lookup of linux,pci-probe-only property
  arm64: dts: Drop linux,pci-probe-only from the Seattle DTS

 arch/arm64/boot/dts/amd/amd-overdrive.dts |  1 -
 arch/powerpc/platforms/pseries/setup.c| 14 ++
 drivers/of/of_pci.c   | 31 +++
 drivers/pci/host/pci-host-generic.c   |  9 +
 include/linux/of_pci.h|  3 +++
 5 files changed, 37 insertions(+), 21 deletions(-)

-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 2/4] PCI: pci-host-generic: Fix lookup of linux, pci-probe-only property

2015-09-03 Thread Marc Zyngier
When pci-host-generic looks for the probe-only property, it seems
to trust the DT to be correctly written, and assumes that there
is a parameter to the property.

Unfortunately, this is not always the case, and some firmware expose
this property naked. The driver ends up making a decision based on
whatever the property pointer points to, which is likely to be junk.

Switch to the common of_pci.c implementation that doesn't suffer
from this problem.

Signed-off-by: Marc Zyngier 
---
 drivers/pci/host/pci-host-generic.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/pci/host/pci-host-generic.c 
b/drivers/pci/host/pci-host-generic.c
index 265dd25..545ff4e 100644
--- a/drivers/pci/host/pci-host-generic.c
+++ b/drivers/pci/host/pci-host-generic.c
@@ -210,7 +210,6 @@ static int gen_pci_probe(struct platform_device *pdev)
int err;
const char *type;
const struct of_device_id *of_id;
-   const int *prop;
struct device *dev = &pdev->dev;
struct device_node *np = dev->of_node;
struct gen_pci *pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL);
@@ -225,13 +224,7 @@ static int gen_pci_probe(struct platform_device *pdev)
return -EINVAL;
}
 
-   prop = of_get_property(of_chosen, "linux,pci-probe-only", NULL);
-   if (prop) {
-   if (*prop)
-   pci_add_flags(PCI_PROBE_ONLY);
-   else
-   pci_clear_flags(PCI_PROBE_ONLY);
-   }
+   of_pci_check_probe_only(of_chosen);
 
of_id = of_match_node(gen_pci_of_match, np);
pci->cfg.ops = of_id->data;
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 3/4] powerpc: PCI: Fix lookup of linux, pci-probe-only property

2015-09-03 Thread Marc Zyngier
When find_and_init_phbs() looks for the probe-only property, it seems
to trust the firmware to be correctly written, and assumes that there
is a parameter to the property.

It is conceivable that the firmware could not be that perfect, and it
could expose this property naked (at least one arm64 platform seems to
exhibit this exact behaviour). The setup code the ends up making
a decision based on whatever the property pointer points to, which
is likely to be junk.

Instead, switch to the common of_pci.c implementation that doesn't
suffer from this problem and ignore the property if the firmware
couldn't make up its mind.

Signed-off-by: Marc Zyngier 
---
 arch/powerpc/platforms/pseries/setup.c | 14 ++
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 39a74fa..412531f 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -495,18 +496,7 @@ static void __init find_and_init_phbs(void)
 * PCI_PROBE_ONLY and PCI_REASSIGN_ALL_BUS can be set via properties
 * in chosen.
 */
-   if (of_chosen) {
-   const int *prop;
-
-   prop = of_get_property(of_chosen,
-   "linux,pci-probe-only", NULL);
-   if (prop) {
-   if (*prop)
-   pci_add_flags(PCI_PROBE_ONLY);
-   else
-   pci_clear_flags(PCI_PROBE_ONLY);
-   }
-   }
+   of_pci_check_probe_only(of_chosen);
 }
 
 static void __init pSeries_setup_arch(void)
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 4/4] arm64: dts: Drop linux, pci-probe-only from the Seattle DTS

2015-09-03 Thread Marc Zyngier
The linux,pci-probe-only property mandates an argument to indicate
whether or not to engage the "probe-only" mode, but the Seattle
DTS just provides a naked property, which is illegal.

Also, it turns out that the board is perfectly happy without
probe-only, so let's drop this altogether.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/boot/dts/amd/amd-overdrive.dts | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/boot/dts/amd/amd-overdrive.dts 
b/arch/arm64/boot/dts/amd/amd-overdrive.dts
index 564a3f7..128fa94 100644
--- a/arch/arm64/boot/dts/amd/amd-overdrive.dts
+++ b/arch/arm64/boot/dts/amd/amd-overdrive.dts
@@ -14,7 +14,6 @@
 
chosen {
stdout-path = &serial0;
-   linux,pci-probe-only;
};
 };
 
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 0/4] PCI: arm64/powerpc: Fix parsing of linux, pci-probe-only

2015-09-04 Thread Marc Zyngier
The pci-host-generic driver parses the linux,pci-probe-only property,
and assumes that it will have a boolean parameter.

Turns out that the Seattle DTS file has a naked "linux,pci-probe-only"
property, which leads to the driver dereferencing some unsuspecting
memory location. Nothing really bad happens (we end up reading some
other bit of DT, fortunately), but that not a reason to keep it this
way. Turns out that the Pseries code (where this code was lifted from)
may suffer from the same issue.

The first patch introduces a common (and fixed) version of that check
that can be used by drivers and architectures that require it. The two
following patches change the pci-host-generic driver and the powerpc
code to use it.

Finally, the bad property is removed from the Seatle DTS, because it
is simply not necessary (it actually prevents me from using SR-IOV,
which otherwise runs fine without the probe-only thing).

This has been tested on the offending Seattle board.

* From v3:
  - Restrict the property lookup to /chosen (Rob)
  - Acked-by on patch #4 from Suravee
  - I swear this is the last time I rework these patches! ;-)

* From v2:
  - Use of_property_read_u32 to safely read the property (Rob)
  - Add a log message to indicate when we enable probe-only
(probably quite useful for debugging)

* From v1:
  - Consolidate the parsing in of_pci.c (Bjorn)

Marc Zyngier (4):
  of/pci: Add of_pci_check_probe_only to parse "linux,pci-probe-only"
  PCI: pci-host-generic: Fix lookup of linux,pci-probe-only property
  powerpc: PCI: Fix lookup of linux,pci-probe-only property
  arm64: dts: Drop linux,pci-probe-only from the Seattle DTS

 arch/arm64/boot/dts/amd/amd-overdrive.dts |  1 -
 arch/powerpc/platforms/pseries/setup.c| 14 ++
 drivers/of/of_pci.c   | 28 
 drivers/pci/host/pci-host-generic.c   |  9 +
 include/linux/of_pci.h|  3 +++
 5 files changed, 34 insertions(+), 21 deletions(-)

-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 1/4] of/pci: Add of_pci_check_probe_only to parse "linux, pci-probe-only"

2015-09-04 Thread Marc Zyngier
Both pci-host-generic and Pseries parse the "linux,pci-probe-only"
property to engage the PCI_PROBE_ONLY mode, and both have a subtle
bug that can be triggered if the property has no parameter.

Provide a generic, safe implementation that can be used by both.

Signed-off-by: Marc Zyngier 
---
 drivers/of/of_pci.c| 28 
 include/linux/of_pci.h |  3 +++
 2 files changed, 31 insertions(+)

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 5751dc5..5dd4966 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -118,6 +118,34 @@ int of_get_pci_domain_nr(struct device_node *node)
 EXPORT_SYMBOL_GPL(of_get_pci_domain_nr);
 
 /**
+ * of_pci_check_probe_only - Setup probe only mode if linux,pci-probe-only
+ *   is present and valid
+ *
+ * @node: device tree node that may contain the property (usually "chosen")
+ *
+ */
+void of_pci_check_probe_only(void)
+{
+   u32 val;
+   int ret;
+
+   ret = of_property_read_u32(of_chosen, "linux,pci-probe-only", &val);
+   if (ret) {
+   if (ret == -ENODATA || ret == -EOVERFLOW)
+   pr_warn("linux,pci-probe-only without valid value, 
ignoring\n");
+   return;
+   }
+
+   if (val)
+   pci_add_flags(PCI_PROBE_ONLY);
+   else
+   pci_clear_flags(PCI_PROBE_ONLY);
+
+   pr_info("PCI: PROBE_ONLY %sabled\n", val ? "en" : "dis");
+}
+EXPORT_SYMBOL_GPL(of_pci_check_probe_only);
+
+/**
  * of_pci_dma_configure - Setup DMA configuration
  * @dev: ptr to pci_dev struct of the PCI device
  *
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 29fd3fe..38c0533 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -17,6 +17,7 @@ int of_irq_parse_and_map_pci(const struct pci_dev *dev, u8 
slot, u8 pin);
 int of_pci_parse_bus_range(struct device_node *node, struct resource *res);
 int of_get_pci_domain_nr(struct device_node *node);
 void of_pci_dma_configure(struct pci_dev *pci_dev);
+void of_pci_check_probe_only(void);
 #else
 static inline int of_irq_parse_pci(const struct pci_dev *pdev, struct 
of_phandle_args *out_irq)
 {
@@ -53,6 +54,8 @@ of_get_pci_domain_nr(struct device_node *node)
 }
 
 static inline void of_pci_dma_configure(struct pci_dev *pci_dev) { }
+
+static inline void of_pci_check_probe_only(void) { }
 #endif
 
 #if defined(CONFIG_OF_ADDRESS)
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 2/4] PCI: pci-host-generic: Fix lookup of linux, pci-probe-only property

2015-09-04 Thread Marc Zyngier
When pci-host-generic looks for the probe-only property, it seems
to trust the DT to be correctly written, and assumes that there
is a parameter to the property.

Unfortunately, this is not always the case, and some firmware expose
this property naked. The driver ends up making a decision based on
whatever the property pointer points to, which is likely to be junk.

Switch to the common of_pci.c implementation that doesn't suffer
from this problem.

Signed-off-by: Marc Zyngier 
---
 drivers/pci/host/pci-host-generic.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/pci/host/pci-host-generic.c 
b/drivers/pci/host/pci-host-generic.c
index 265dd25..224303d 100644
--- a/drivers/pci/host/pci-host-generic.c
+++ b/drivers/pci/host/pci-host-generic.c
@@ -210,7 +210,6 @@ static int gen_pci_probe(struct platform_device *pdev)
int err;
const char *type;
const struct of_device_id *of_id;
-   const int *prop;
struct device *dev = &pdev->dev;
struct device_node *np = dev->of_node;
struct gen_pci *pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL);
@@ -225,13 +224,7 @@ static int gen_pci_probe(struct platform_device *pdev)
return -EINVAL;
}
 
-   prop = of_get_property(of_chosen, "linux,pci-probe-only", NULL);
-   if (prop) {
-   if (*prop)
-   pci_add_flags(PCI_PROBE_ONLY);
-   else
-   pci_clear_flags(PCI_PROBE_ONLY);
-   }
+   of_pci_check_probe_only();
 
of_id = of_match_node(gen_pci_of_match, np);
pci->cfg.ops = of_id->data;
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 3/4] powerpc: PCI: Fix lookup of linux, pci-probe-only property

2015-09-04 Thread Marc Zyngier
When find_and_init_phbs() looks for the probe-only property, it seems
to trust the firmware to be correctly written, and assumes that there
is a parameter to the property.

It is conceivable that the firmware could not be that perfect, and it
could expose this property naked (at least one arm64 platform seems to
exhibit this exact behaviour). The setup code the ends up making
a decision based on whatever the property pointer points to, which
is likely to be junk.

Instead, switch to the common of_pci.c implementation that doesn't
suffer from this problem and ignore the property if the firmware
couldn't make up its mind.

Signed-off-by: Marc Zyngier 
---
 arch/powerpc/platforms/pseries/setup.c | 14 ++
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 39a74fa..6016709 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -495,18 +496,7 @@ static void __init find_and_init_phbs(void)
 * PCI_PROBE_ONLY and PCI_REASSIGN_ALL_BUS can be set via properties
 * in chosen.
 */
-   if (of_chosen) {
-   const int *prop;
-
-   prop = of_get_property(of_chosen,
-   "linux,pci-probe-only", NULL);
-   if (prop) {
-   if (*prop)
-   pci_add_flags(PCI_PROBE_ONLY);
-   else
-   pci_clear_flags(PCI_PROBE_ONLY);
-   }
-   }
+   of_pci_check_probe_only();
 }
 
 static void __init pSeries_setup_arch(void)
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 4/4] arm64: dts: Drop linux, pci-probe-only from the Seattle DTS

2015-09-04 Thread Marc Zyngier
The linux,pci-probe-only property mandates an argument to indicate
whether or not to engage the "probe-only" mode, but the Seattle
DTS just provides a naked property, which is illegal.

Also, it turns out that the board is perfectly happy without
probe-only, so let's drop this altogether.

Acked-by: Suravee Suthikulpanit 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/boot/dts/amd/amd-overdrive.dts | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/boot/dts/amd/amd-overdrive.dts 
b/arch/arm64/boot/dts/amd/amd-overdrive.dts
index 564a3f7..128fa94 100644
--- a/arch/arm64/boot/dts/amd/amd-overdrive.dts
+++ b/arch/arm64/boot/dts/amd/amd-overdrive.dts
@@ -14,7 +14,6 @@
 
chosen {
stdout-path = &serial0;
-   linux,pci-probe-only;
};
 };
 
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 0/4] PCI: arm64/powerpc: Fix parsing of linux, pci-probe-only

2015-09-17 Thread Marc Zyngier
On 17/09/15 16:30, Bjorn Helgaas wrote:
> On Fri, Sep 04, 2015 at 05:50:07PM +0100, Marc Zyngier wrote:
>> The pci-host-generic driver parses the linux,pci-probe-only property,
>> and assumes that it will have a boolean parameter.
>>
>> Turns out that the Seattle DTS file has a naked "linux,pci-probe-only"
>> property, which leads to the driver dereferencing some unsuspecting
>> memory location. Nothing really bad happens (we end up reading some
>> other bit of DT, fortunately), but that not a reason to keep it this
>> way. Turns out that the Pseries code (where this code was lifted from)
>> may suffer from the same issue.
>>
>> The first patch introduces a common (and fixed) version of that check
>> that can be used by drivers and architectures that require it. The two
>> following patches change the pci-host-generic driver and the powerpc
>> code to use it.
>>
>> Finally, the bad property is removed from the Seatle DTS, because it
>> is simply not necessary (it actually prevents me from using SR-IOV,
>> which otherwise runs fine without the probe-only thing).
>>
>> This has been tested on the offending Seattle board.
>>
>> * From v3:
>>   - Restrict the property lookup to /chosen (Rob)
>>   - Acked-by on patch #4 from Suravee
>>   - I swear this is the last time I rework these patches! ;-)
>>
>> * From v2:
>>   - Use of_property_read_u32 to safely read the property (Rob)
>>   - Add a log message to indicate when we enable probe-only
>> (probably quite useful for debugging)
>>
>> * From v1:
>>   - Consolidate the parsing in of_pci.c (Bjorn)
>>
>> Marc Zyngier (4):
>>   of/pci: Add of_pci_check_probe_only to parse "linux,pci-probe-only"
>>   PCI: pci-host-generic: Fix lookup of linux,pci-probe-only property
>>   powerpc: PCI: Fix lookup of linux,pci-probe-only property
>>   arm64: dts: Drop linux,pci-probe-only from the Seattle DTS
>>
>>  arch/arm64/boot/dts/amd/amd-overdrive.dts |  1 -
>>  arch/powerpc/platforms/pseries/setup.c| 14 ++
>>  drivers/of/of_pci.c   | 28 
>>  drivers/pci/host/pci-host-generic.c   |  9 +
>>  include/linux/of_pci.h|  3 +++
>>  5 files changed, 34 insertions(+), 21 deletions(-)
> 
> Applied with the comment tweak and acks to pci/host-generic for v4.4,
> thanks!

Turns out that the 01.org infrastructure has picked up on a compilation
bug with randconfig. The following patch seems to fix it and should be
applied on the first patch:

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 485d625..2da5abc 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -6,6 +6,8 @@
 #include 
 #include 
 
+#include 
+
 static inline int __of_pci_pci_compare(struct device_node *node,
   unsigned int data)
 {

Sorry for the annoyance.

M.
-- 
Jazz is not dead. It just smells funny...
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 03/35] docs: fix broken references to text files

2020-04-09 Thread Marc Zyngier
On Wed,  8 Apr 2020 17:45:55 +0200
Mauro Carvalho Chehab  wrote:

> Several references got broken due to txt to ReST conversion.
> 
> Several of them can be automatically fixed with:
> 
>   scripts/documentation-file-ref-check --fix
> 
> Reviewed-by: Mathieu Poirier  # 
> hwtracing/coresight/Kconfig
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  Documentation/virt/kvm/arm/pvtime.rst|  2 +-
>  virt/kvm/arm/vgic/vgic-mmio-v3.c |  2 +-
>  virt/kvm/arm/vgic/vgic.h         |  4 ++--

Acked-by: Marc Zyngier  # kvm/arm64

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v2] KVM: Optimize kvm_arch_vcpu_ioctl_run function

2020-04-16 Thread Marc Zyngier

On 2020-04-16 08:03, Vitaly Kuznetsov wrote:

Tianjia Zhang  writes:


In earlier versions of kvm, 'kvm_run' is an independent structure
and is not included in the vcpu structure. At present, 'kvm_run'
is already included in the vcpu structure, so the parameter
'kvm_run' is redundant.

This patch simplify the function definition, removes the extra
'kvm_run' parameter, and extract it from the 'kvm_vcpu' structure
if necessary.

Signed-off-by: Tianjia Zhang 
---

v2 change:
  remove 'kvm_run' parameter and extract it from 'kvm_vcpu'

 arch/mips/kvm/mips.c   |  3 ++-
 arch/powerpc/kvm/powerpc.c |  3 ++-
 arch/s390/kvm/kvm-s390.c   |  3 ++-
 arch/x86/kvm/x86.c | 11 ++-
 include/linux/kvm_host.h   |  2 +-
 virt/kvm/arm/arm.c |  6 +++---
 virt/kvm/kvm_main.c|  2 +-
 7 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 8f05dd0a0f4e..ec24adf4857e 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -439,8 +439,9 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct 
kvm_vcpu *vcpu,

return -ENOIOCTLCMD;
 }

-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run 
*run)

+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+   struct kvm_run *run = vcpu->run;
int r = -EINTR;

vcpu_load(vcpu);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index e15166b0a16d..7e24691e138a 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1764,8 +1764,9 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu 
*vcpu, struct kvm_one_reg *reg)

return r;
 }

-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run 
*run)

+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+   struct kvm_run *run = vcpu->run;
int r;

vcpu_load(vcpu);
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 19a81024fe16..443af3ead739 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4333,8 +4333,9 @@ static void store_regs(struct kvm_vcpu *vcpu, 
struct kvm_run *kvm_run)

store_regs_fmt2(vcpu, kvm_run);
 }

-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run 
*kvm_run)

+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+   struct kvm_run *kvm_run = vcpu->run;
int rc;

if (kvm_run->immediate_exit)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3bf2ecafd027..a0338e86c90f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8707,8 +8707,9 @@ static void kvm_put_guest_fpu(struct kvm_vcpu 
*vcpu)

trace_kvm_fpu(0);
 }

-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run 
*kvm_run)

+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+   struct kvm_run *kvm_run = vcpu->run;
int r;

vcpu_load(vcpu);
@@ -8726,18 +8727,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu 
*vcpu, struct kvm_run *kvm_run)

r = -EAGAIN;
if (signal_pending(current)) {
r = -EINTR;
-   vcpu->run->exit_reason = KVM_EXIT_INTR;
+   kvm_run->exit_reason = KVM_EXIT_INTR;
++vcpu->stat.signal_exits;
}
goto out;
}

-   if (vcpu->run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) {
+   if (kvm_run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) {
r = -EINVAL;
goto out;
}

-   if (vcpu->run->kvm_dirty_regs) {
+   if (kvm_run->kvm_dirty_regs) {
r = sync_regs(vcpu);
if (r != 0)
goto out;
@@ -8767,7 +8768,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu 
*vcpu, struct kvm_run *kvm_run)


 out:
kvm_put_guest_fpu(vcpu);
-   if (vcpu->run->kvm_valid_regs)
+   if (kvm_run->kvm_valid_regs)
store_regs(vcpu);
post_kvm_run_save(vcpu);
kvm_sigset_deactivate(vcpu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6d58beb65454..1e17ef719595 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -866,7 +866,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct 
kvm_vcpu *vcpu,

struct kvm_mp_state *mp_state);
 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
struct kvm_guest_debug *dbg);
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run 
*kvm_run);

+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu);

 int kvm_arch_init(void *opaque);
 void kvm_arch_exit(void);
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 48d0ec44ad77..f5390ac2165b 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -639,7 +639,6 @@ static void check_vcpu_requests(struct kvm_vcpu 
*vcpu)

 /**
  * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute 
guest code

  * @vcpu:  The 

Re: [PATCH v2] KVM: Optimize kvm_arch_vcpu_ioctl_run function

2020-04-16 Thread Marc Zyngier

On 2020-04-16 09:45, Tianjia Zhang wrote:

On 2020/4/16 16:28, Marc Zyngier wrote:


[...]

Overall, there is a large set of cleanups to be done when both the 
vcpu and the run
structures are passed as parameters at the same time. Just grepping 
the tree for

kvm_run is pretty instructive.

     M.


Sorry, it's my mistake, I only compiled the x86 platform, I will
submit patch again.


Not a mistake. All I'm saying is that there is an opportunity for a 
larger

series that cleans up the code base, rather than just doing a couple of
localized changes.

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH v4 2/7] KVM: arm64: clean up redundant 'kvm_run' parameters

2020-05-05 Thread Marc Zyngier

Hi Tianjia,

On 2020-04-27 05:35, Tianjia Zhang wrote:
In the current kvm version, 'kvm_run' has been included in the 
'kvm_vcpu'

structure. For historical reasons, many kvm-related function parameters
retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This
patch does a unified cleanup of these remaining redundant parameters.

Signed-off-by: Tianjia Zhang 


On the face of it, this looks OK, but I haven't tried to run the
resulting kernel. I'm not opposed to taking this patch *if* there
is an agreement across architectures to take the series (I value
consistency over the janitorial exercise).

Another thing is that this is going to conflict with the set of
patches that move the KVM/arm code back where it belongs 
(arch/arm64/kvm),

so I'd probably cherry-pick that one directly.

Thanks,

M.


--
Jazz is not dead. It just smells funny...


Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI

2018-06-13 Thread Marc Zyngier
On 13/06/18 10:20, Thomas Gleixner wrote:
> On Wed, 13 Jun 2018, Julien Thierry wrote:
>> On 13/06/18 09:34, Peter Zijlstra wrote:
>>> On Tue, Jun 12, 2018 at 05:57:23PM -0700, Ricardo Neri wrote:
 diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
 index 5426627..dbc5e02 100644
 --- a/include/linux/interrupt.h
 +++ b/include/linux/interrupt.h
 @@ -61,6 +61,8 @@
*interrupt handler after suspending interrupts. For
 system
*wakeup devices users need to implement wakeup
 detection in
*their interrupt handlers.
 + * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as
 non-maskable, if
 + *supported by the chip.
*/
>>>
>>> NAK on the first 6 patches. You really _REALLY_ don't want to expose
>>> NMIs to this level.
>>>
>>
>> I've been working on something similar on arm64 side, and effectively the one
>> thing that might be common to arm64 and intel is the interface to set an
>> interrupt as NMI. So I guess it would be nice to agree on the right approach
>> for this.
>>
>> The way I did it was by introducing a new irq_state and let the irqchip 
>> driver
>> handle most of the work (if it supports that state):
>>
>> https://lkml.org/lkml/2018/5/25/181
>>
>> This has not been ACKed nor NAKed. So I am just asking whether this is a more
>> suitable approach, and if not, is there any suggestions on how to do this?
> 
> I really didn't pay attention to that as it's burried in the GIC/ARM series
> which is usually Marc's playground.

I'm working my way through it ATM now that I have some brain cycles back.

> Adding NMI delivery support at low level architecture irq chip level is
> perfectly fine, but the exposure of that needs to be restricted very
> much. Adding it to the generic interrupt control interfaces is not going to
> happen. That's doomed to begin with and a complete abuse of the interface
> as the handler can not ever be used for that.

I can only agree with that. Allowing random driver to use request_irq()
to make anything an NMI ultimately turns it into a complete mess ("hey,
NMI is *faster*, let's use that"), and a potential source of horrible
deadlocks.

What I'd find more palatable is a way for an irqchip to be able to
prioritize some interrupts based on a set of architecturally-defined
requirements, and a separate NMI requesting/handling framework that is
separate from the IRQ API, as the overall requirements are likely to
completely different.

It shouldn't have to be nearly as complex as the IRQ API, and require
much stricter requirements in terms of what you can do there (flow
handling should definitely be different).

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH 10/10] soc: qcom: convert to devm_platform_ioremap_resource

2019-12-15 Thread Marc Zyngier
On Sat, 14 Dec 2019 17:54:47 +
Yangtao Li  wrote:

> Use devm_platform_ioremap_resource() to simplify code.
> 
> Signed-off-by: Yangtao Li 
> ---
>  drivers/soc/qcom/llcc-qcom.c| 7 +--
>  drivers/soc/qcom/qcom-geni-se.c | 4 +---
>  drivers/soc/qcom/qcom_aoss.c| 4 +---
>  drivers/soc/qcom/qcom_gsbi.c| 5 +
>  drivers/soc/qcom/spm.c  | 4 +---
>  5 files changed, 5 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/soc/qcom/llcc-qcom.c b/drivers/soc/qcom/llcc-qcom.c
> index 429b5a60a1ba..99e19df76889 100644
> --- a/drivers/soc/qcom/llcc-qcom.c
> +++ b/drivers/soc/qcom/llcc-qcom.c
> @@ -387,7 +387,6 @@ static int qcom_llcc_remove(struct platform_device *pdev)
>  static struct regmap *qcom_llcc_init_mmio(struct platform_device *pdev,
>   const char *name)
>  {
> - struct resource *res;
>   void __iomem *base;
>   struct regmap_config llcc_regmap_config = {
>   .reg_bits = 32,
> @@ -396,11 +395,7 @@ static struct regmap *qcom_llcc_init_mmio(struct 
> platform_device *pdev,
>   .fast_io = true,
>   };
>  
> - res = platform_get_resource_byname(pdev, IORESOURCE_MEM, name);
> - if (!res)
> - return ERR_PTR(-ENODEV);
> -
> - base = devm_ioremap_resource(&pdev->dev, res);
> + base = devm_platform_ioremap_resource(pdev, 0);

What guarantees do you have that entry 0 matches name?

I find these changes pointless: they don't add much to the readability
or maintainability of the code, and instead introduce creative bugs.

M.
-- 
Jazz is not dead. It just smells funny...


[PATCH] Make auto-loading of therm_pm72 possible

2010-12-05 Thread Marc Zyngier
The therm_pm72 driver, used on the PowerMac G5 range, cannot be
auto-loaded, since the driver itself creates both the device node
and the driver instance.

Moving the device node creation to the platform setup code and
adding the necessary MODULE_DEVICE_TABLE() information allows the
driver to be automatically loaded by udev on any semi-modern
distribution.

It "fixes" a major source of problem on G5 machines where the
driver wasn't explicitely loaded by default, and the system
would automatically shutdown under load.

Tested on an Xserve G5.

Signed-off-by: Marc Zyngier 
Cc: Benjamin Herrenschmidt 
---
 arch/powerpc/platforms/powermac/setup.c |9 +
 drivers/macintosh/therm_pm72.c  |   30 +-
 2 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/setup.c 
b/arch/powerpc/platforms/powermac/setup.c
index 9deb274..d5aceb7 100644
--- a/arch/powerpc/platforms/powermac/setup.c
+++ b/arch/powerpc/platforms/powermac/setup.c
@@ -506,6 +506,15 @@ static int __init pmac_declare_of_platform_devices(void)
of_platform_device_create(np, "smu", NULL);
of_node_put(np);
}
+   np = of_find_node_by_type(NULL, "fcu");
+   if (np == NULL) {
+   /* Some machines have strangely broken device-tree */
+   np = 
of_find_node_by_path("/u...@0,f800/i...@f8001000/f...@15e");
+   }
+   if (np) {
+   of_platform_device_create(np, "temperature", NULL);
+   of_node_put(np);
+   }
 
return 0;
 }
diff --git a/drivers/macintosh/therm_pm72.c b/drivers/macintosh/therm_pm72.c
index 4454927..2e041fd 100644
--- a/drivers/macintosh/therm_pm72.c
+++ b/drivers/macintosh/therm_pm72.c
@@ -2213,6 +2213,9 @@ static void fcu_lookup_fans(struct device_node *fcu_node)
 static int fcu_of_probe(struct platform_device* dev, const struct of_device_id 
*match)
 {
state = state_detached;
+   of_dev = dev;
+
+   dev_info(&dev->dev, "PowerMac G5 Thermal control driver %s\n", VERSION);
 
/* Lookup the fans in the device tree */
fcu_lookup_fans(dev->dev.of_node);
@@ -2235,6 +2238,7 @@ static const struct of_device_id fcu_match[] =
},
{},
 };
+MODULE_DEVICE_TABLE(of, fcu_match);
 
 static struct of_platform_driver fcu_of_platform_driver = 
 {
@@ -2252,8 +2256,6 @@ static struct of_platform_driver fcu_of_platform_driver =
  */
 static int __init therm_pm72_init(void)
 {
-   struct device_node *np;
-
rackmac = of_machine_is_compatible("RackMac3,1");
 
if (!of_machine_is_compatible("PowerMac7,2") &&
@@ -2261,34 +2263,12 @@ static int __init therm_pm72_init(void)
!rackmac)
return -ENODEV;
 
-   printk(KERN_INFO "PowerMac G5 Thermal control driver %s\n", VERSION);
-
-   np = of_find_node_by_type(NULL, "fcu");
-   if (np == NULL) {
-   /* Some machines have strangely broken device-tree */
-   np = 
of_find_node_by_path("/u...@0,f800/i...@f8001000/f...@15e");
-   if (np == NULL) {
-   printk(KERN_ERR "Can't find FCU in device-tree 
!\n");
-   return -ENODEV;
-   }
-   }
-   of_dev = of_platform_device_create(np, "temperature", NULL);
-   if (of_dev == NULL) {
-   printk(KERN_ERR "Can't register FCU platform device !\n");
-   return -ENODEV;
-   }
-
-   of_register_platform_driver(&fcu_of_platform_driver);
-   
-   return 0;
+   return of_register_platform_driver(&fcu_of_platform_driver);
 }
 
 static void __exit therm_pm72_exit(void)
 {
of_unregister_platform_driver(&fcu_of_platform_driver);
-
-   if (of_dev)
-   of_device_unregister(of_dev);
 }
 
 module_init(therm_pm72_init);
-- 
1.7.2.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH] Make auto-loading of therm_pm72 possible

2011-01-03 Thread Marc Zyngier
The therm_pm72 driver, used on the PowerMac G5 range, cannot be
auto-loaded, since the driver itself creates both the device node
and the driver instance.

Moving the device node creation to the platform setup code and
adding the necessary MODULE_DEVICE_TABLE() information allows the
driver to be automatically loaded by udev on any semi-modern
distribution.

It "fixes" a major source of problem on G5 machines where the
driver wasn't explicitely loaded by default, and the system
would automatically shutdown under load.

Tested on an Xserve G5.

Signed-off-by: Marc Zyngier 
Cc: Benjamin Herrenschmidt 
---
 arch/powerpc/platforms/powermac/setup.c |9 +
 drivers/macintosh/therm_pm72.c  |   30 +-
 2 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/setup.c 
b/arch/powerpc/platforms/powermac/setup.c
index 9deb274..d5aceb7 100644
--- a/arch/powerpc/platforms/powermac/setup.c
+++ b/arch/powerpc/platforms/powermac/setup.c
@@ -506,6 +506,15 @@ static int __init pmac_declare_of_platform_devices(void)
of_platform_device_create(np, "smu", NULL);
of_node_put(np);
}
+   np = of_find_node_by_type(NULL, "fcu");
+   if (np == NULL) {
+   /* Some machines have strangely broken device-tree */
+   np = 
of_find_node_by_path("/u...@0,f800/i...@f8001000/f...@15e");
+   }
+   if (np) {
+   of_platform_device_create(np, "temperature", NULL);
+   of_node_put(np);
+   }
 
return 0;
 }
diff --git a/drivers/macintosh/therm_pm72.c b/drivers/macintosh/therm_pm72.c
index 4454927..2e041fd 100644
--- a/drivers/macintosh/therm_pm72.c
+++ b/drivers/macintosh/therm_pm72.c
@@ -2213,6 +2213,9 @@ static void fcu_lookup_fans(struct device_node *fcu_node)
 static int fcu_of_probe(struct platform_device* dev, const struct of_device_id 
*match)
 {
state = state_detached;
+   of_dev = dev;
+
+   dev_info(&dev->dev, "PowerMac G5 Thermal control driver %s\n", VERSION);
 
/* Lookup the fans in the device tree */
fcu_lookup_fans(dev->dev.of_node);
@@ -2235,6 +2238,7 @@ static const struct of_device_id fcu_match[] =
},
{},
 };
+MODULE_DEVICE_TABLE(of, fcu_match);
 
 static struct of_platform_driver fcu_of_platform_driver = 
 {
@@ -2252,8 +2256,6 @@ static struct of_platform_driver fcu_of_platform_driver =
  */
 static int __init therm_pm72_init(void)
 {
-   struct device_node *np;
-
rackmac = of_machine_is_compatible("RackMac3,1");
 
if (!of_machine_is_compatible("PowerMac7,2") &&
@@ -2261,34 +2263,12 @@ static int __init therm_pm72_init(void)
!rackmac)
return -ENODEV;
 
-   printk(KERN_INFO "PowerMac G5 Thermal control driver %s\n", VERSION);
-
-   np = of_find_node_by_type(NULL, "fcu");
-   if (np == NULL) {
-   /* Some machines have strangely broken device-tree */
-   np = 
of_find_node_by_path("/u...@0,f800/i...@f8001000/f...@15e");
-   if (np == NULL) {
-   printk(KERN_ERR "Can't find FCU in device-tree 
!\n");
-   return -ENODEV;
-   }
-   }
-   of_dev = of_platform_device_create(np, "temperature", NULL);
-   if (of_dev == NULL) {
-   printk(KERN_ERR "Can't register FCU platform device !\n");
-   return -ENODEV;
-   }
-
-   of_register_platform_driver(&fcu_of_platform_driver);
-   
-   return 0;
+   return of_register_platform_driver(&fcu_of_platform_driver);
 }
 
 static void __exit therm_pm72_exit(void)
 {
of_unregister_platform_driver(&fcu_of_platform_driver);
-
-   if (of_dev)
-   of_device_unregister(of_dev);
 }
 
 module_init(therm_pm72_init);
-- 
1.7.2.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: kvm/arm64: Spark benchmark

2023-06-09 Thread Marc Zyngier
On Fri, 09 Jun 2023 01:59:35 +0100,
Yu Zhao  wrote:
> 
> TLDR
> 
> Apache Spark spent 12% less time sorting four billion random integers twenty 
> times (in ~4 hours) after this patchset [1].

Why are the 3 architectures you have considered being evaluated with 3
different benchmarks? I am not suspecting you to have cherry-picked
the best results, but I'd really like to see a variety of benchmarks
that exercise this stuff differently.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v4 12/12] KVM: remove KVM_REQ_UNHALT

2022-09-22 Thread Marc Zyngier
On Wed, 21 Sep 2022 01:32:01 +0100,
Sean Christopherson  wrote:
> 
> From: Paolo Bonzini 
> 
> KVM_REQ_UNHALT is now unnecessary because it is replaced by the return
> value of kvm_vcpu_block/kvm_vcpu_halt.  Remove it.
> 
> No functional change intended.
> 
> Signed-off-by: Paolo Bonzini 
> Signed-off-by: Sean Christopherson 

Acked-by: Marc Zyngier 

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH] KVM: Get rid of return value from kvm_arch_create_vm_debugfs()

2024-02-16 Thread Marc Zyngier
On Fri, 16 Feb 2024 15:59:41 +,
Oliver Upton  wrote:
> 
> The general expectation with debugfs is that any initialization failure
> is nonfatal. Nevertheless, kvm_arch_create_vm_debugfs() allows
> implementations to return an error and kvm_create_vm_debugfs() allows
> that to fail VM creation.
> 
> Change to a void return to discourage architectures from making debugfs
> failures fatal for the VM. Seems like everyone already had the right
> idea, as all implementations already return 0 unconditionally.
> 
> Signed-off-by: Oliver Upton 

Acked-by: Marc Zyngier 

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH mm-unstable v1 3/5] kvm/arm64: add kvm_arch_test_clear_young()

2023-02-17 Thread Marc Zyngier
On Fri, 17 Feb 2023 04:21:28 +,
Yu Zhao  wrote:
> 
> On Thu, Feb 16, 2023 at 9:12 PM Yu Zhao  wrote:
> >
> > This patch adds kvm_arch_test_clear_young() for the vast majority of
> > VMs that are not pKVM and run on hardware that sets the accessed bit
> > in KVM page tables.

I'm really interested in how you can back this statement. 90% of the
HW I have access to is not FEAT_HWAFDB capable, either because it
predates the feature or because the feature is too buggy to be useful.

Do you have numbers?

> >
> > It relies on two techniques, RCU and cmpxchg, to safely test and clear
> > the accessed bit without taking the MMU lock. The former protects KVM
> > page tables from being freed while the latter clears the accessed bit
> > atomically against both the hardware and other software page table
> > walkers.
> >
> > Signed-off-by: Yu Zhao 
> > ---
> >  arch/arm64/include/asm/kvm_host.h   |  7 +++
> >  arch/arm64/include/asm/kvm_pgtable.h|  8 +++
> >  arch/arm64/include/asm/stage2_pgtable.h | 43 ++
> >  arch/arm64/kvm/arm.c|  1 +
> >  arch/arm64/kvm/hyp/pgtable.c| 51 ++--
> >  arch/arm64/kvm/mmu.c| 77 -
> >  6 files changed, 141 insertions(+), 46 deletions(-)
> 
> Adding Marc and Will.
> 
> Can you please add other interested parties that I've missed?

The MAINTAINERS file has it all:

KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)
M:  Marc Zyngier 
M:  Oliver Upton 
R:  James Morse 
R:  Suzuki K Poulose 
R:  Zenghui Yu 
L:  kvm...@lists.linux.dev

May I suggest that you repost your patch and Cc the interested
parties yourself? I guess most folks will want to see this in context,
and not as a random, isolated change with no rationale.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH mm-unstable v1 3/5] kvm/arm64: add kvm_arch_test_clear_young()

2023-02-23 Thread Marc Zyngier
On Thu, 23 Feb 2023 03:58:47 +,
Yu Zhao  wrote:
> 
> On Fri, Feb 17, 2023 at 2:00 AM Marc Zyngier  wrote:
> >
> > On Fri, 17 Feb 2023 04:21:28 +,
> > Yu Zhao  wrote:
> > >
> > > On Thu, Feb 16, 2023 at 9:12 PM Yu Zhao  wrote:
> > > >
> > > > This patch adds kvm_arch_test_clear_young() for the vast majority of
> > > > VMs that are not pKVM and run on hardware that sets the accessed bit
> > > > in KVM page tables.
> >
> > I'm really interested in how you can back this statement. 90% of the
> > HW I have access to is not FEAT_HWAFDB capable, either because it
> > predates the feature or because the feature is too buggy to be useful.
> 
> This is my expericen too -- most devices are pre v8.2.

And yet you have no issue writing the above. Puzzling.

> 
> > Do you have numbers?
> 
> Let's do a quick market survey by segment. The following only applies
> to ARM CPUs:
> 
> 1. Phones: none of the major Android phone vendors sell phones running
> VMs; no other major Linux phone vendors.

Maybe you should have a reality check and look at what your own
employer is shipping.

> 2. Laptops: only a very limited number of Chromebooks run VMs, namely
> ACRVM. No other major Linux laptop vendors.

Again, your employer disagree.

> 3. Desktops: no major Linux desktop vendors.

My desktop disagree (I send this from my arm64 desktop VM ).

> 4. Embedded/IoT/Router: no major Linux vendors run VMs (Android Auto
> can be a VM guest on QNX host).

This email is brought to you via a router VM on an arm64 box.

> 5. Cloud: this is where the vast majority VMs come from. Among the
> vendors available to the general public, Ampere is the biggest player.
> Here [1] is a list of its customers. The A-bit works well even on its
> EVT products (Neoverse cores).

Just the phone stuff dwarfs the number of cloud hosts.

Hopefully your patches are better than your market analysis...

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 0/4] KVM, mm: remove the .change_pte() MMU notifier and set_pte_at_notify()

2024-04-12 Thread Marc Zyngier
On Fri, 05 Apr 2024 12:58:11 +0100,
Paolo Bonzini  wrote:
> 
> The .change_pte() MMU notifier callback was intended as an optimization
> and for this reason it was initially called without a surrounding
> mmu_notifier_invalidate_range_{start,end}() pair.  It was only ever
> implemented by KVM (which was also the original user of MMU notifiers)
> and the rules on when to call set_pte_at_notify() rather than set_pte_at()
> have always been pretty obscure.
> 
> It may seem a miracle that it has never caused any hard to trigger
> bugs, but there's a good reason for that: KVM's implementation has
> been nonfunctional for a good part of its existence.  Already in
> 2012, commit 6bdb913f0a70 ("mm: wrap calls to set_pte_at_notify with
> invalidate_range_start and invalidate_range_end", 2012-10-09) changed the
> .change_pte() callback to occur within an invalidate_range_start/end()
> pair; and because KVM unmaps the sPTEs during .invalidate_range_start(),
> .change_pte() has no hope of finding a sPTE to change.
> 
> Therefore, all the code for .change_pte() can be removed from both KVM
> and mm/, and set_pte_at_notify() can be replaced with just set_pte_at().
> 
> Please review!  Also feel free to take the KVM patches through the mm
> tree, as I don't expect any conflicts.
> 
> Thanks,
> 
> Paolo
> 
> Paolo Bonzini (4):
>   KVM: delete .change_pte MMU notifier callback
>   KVM: remove unused argument of kvm_handle_hva_range()
>   mmu_notifier: remove the .change_pte() callback
>   mm: replace set_pte_at_notify() with just set_pte_at()
> 
>  arch/arm64/kvm/mmu.c  | 34 -
>  arch/loongarch/include/asm/kvm_host.h |  1 -
>  arch/loongarch/kvm/mmu.c  | 32 
>  arch/mips/kvm/mmu.c   | 30 ---
>  arch/powerpc/include/asm/kvm_ppc.h|  1 -
>  arch/powerpc/kvm/book3s.c |  5 ---
>  arch/powerpc/kvm/book3s.h |  1 -
>  arch/powerpc/kvm/book3s_64_mmu_hv.c   | 12 --
>  arch/powerpc/kvm/book3s_hv.c  |  1 -
>  arch/powerpc/kvm/book3s_pr.c  |  7 
>  arch/powerpc/kvm/e500_mmu_host.c  |  6 ---
>  arch/riscv/kvm/mmu.c  | 20 --
>  arch/x86/kvm/mmu/mmu.c| 54 +--
>  arch/x86/kvm/mmu/spte.c   | 16 
>  arch/x86/kvm/mmu/spte.h   |  2 -
>  arch/x86/kvm/mmu/tdp_mmu.c| 46 ---
>  arch/x86/kvm/mmu/tdp_mmu.h|  1 -
>  include/linux/kvm_host.h  |  2 -
>  include/linux/mmu_notifier.h  | 44 --
>  include/trace/events/kvm.h| 15 
>  kernel/events/uprobes.c   |  5 +--
>  mm/ksm.c  |  4 +-
>  mm/memory.c   |  7 +---
>  mm/migrate_device.c   |  8 +---
>  mm/mmu_notifier.c | 17 -
>  virt/kvm/kvm_main.c   | 50 +
>  26 files changed, 10 insertions(+), 411 deletions(-)
> 

Reviewed-by: Marc Zyngier 

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-12 Thread Marc Zyngier
On Fri, 12 Apr 2024 11:44:09 +0100,
Will Deacon  wrote:
> 
> On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index dc04bc767865..ff17849be9f4 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1768,40 +1768,6 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct 
> > kvm_gfn_range *range)
> > return false;
> >  }
> >  
> > -bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> > -{
> > -   kvm_pfn_t pfn = pte_pfn(range->arg.pte);
> > -
> > -   if (!kvm->arch.mmu.pgt)
> > -   return false;
> > -
> > -   WARN_ON(range->end - range->start != 1);
> > -
> > -   /*
> > -* If the page isn't tagged, defer to user_mem_abort() for sanitising
> > -* the MTE tags. The S2 pte should have been unmapped by
> > -* mmu_notifier_invalidate_range_end().
> > -*/
> > -   if (kvm_has_mte(kvm) && !page_mte_tagged(pfn_to_page(pfn)))
> > -   return false;
> > -
> > -   /*
> > -* We've moved a page around, probably through CoW, so let's treat
> > -* it just like a translation fault and the map handler will clean
> > -* the cache to the PoC.
> > -*
> > -* The MMU notifiers will have unmapped a huge PMD before calling
> > -* ->change_pte() (which in turn calls kvm_set_spte_gfn()) and
> > -* therefore we never need to clear out a huge PMD through this
> > -* calling path and a memcache is not required.
> > -*/
> > -   kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT,
> > -  PAGE_SIZE, __pfn_to_phys(pfn),
> > -  KVM_PGTABLE_PROT_R, NULL, 0);
> > -
> > -   return false;
> > -}
> > -
> >  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> >  {
> > u64 size = (range->end - range->start) << PAGE_SHIFT;
> 
> Thanks. It's nice to see this code retire:
> 
> Acked-by: Will Deacon 
> 
> Also, if you're in the business of hacking the MMU notifier code, it
> would be really great to change the .clear_flush_young() callback so
> that the architecture could handle the TLB invalidation. At the moment,
> the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret'
> being set by kvm_handle_hva_range(), whereas we could do a much
> lighter-weight and targetted TLBI in the architecture page-table code
> when we actually update the ptes for small ranges.

Indeed, and I was looking at this earlier this week as it has a pretty
devastating effect with NV (it blows the shadow S2 for that VMID, with
costly consequences).

In general, it feels like the TLB invalidation should stay with the
code that deals with the page tables, as it has a pretty good idea of
what needs to be invalidated and how -- specially on architectures
that have a HW-broadcast facility like arm64.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-13 Thread Marc Zyngier
On Fri, 12 Apr 2024 15:54:22 +0100,
Sean Christopherson  wrote:
> 
> On Fri, Apr 12, 2024, Marc Zyngier wrote:
> > On Fri, 12 Apr 2024 11:44:09 +0100, Will Deacon  wrote:
> > > On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> > > Also, if you're in the business of hacking the MMU notifier code, it
> > > would be really great to change the .clear_flush_young() callback so
> > > that the architecture could handle the TLB invalidation. At the moment,
> > > the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret'
> > > being set by kvm_handle_hva_range(), whereas we could do a much
> > > lighter-weight and targetted TLBI in the architecture page-table code
> > > when we actually update the ptes for small ranges.
> > 
> > Indeed, and I was looking at this earlier this week as it has a pretty
> > devastating effect with NV (it blows the shadow S2 for that VMID, with
> > costly consequences).
> > 
> > In general, it feels like the TLB invalidation should stay with the
> > code that deals with the page tables, as it has a pretty good idea of
> > what needs to be invalidated and how -- specially on architectures
> > that have a HW-broadcast facility like arm64.
> 
> Would this be roughly on par with an in-line flush on arm64?  The simpler, 
> more
> straightforward solution would be to let architectures override flush_on_ret,
> but I would prefer something like the below as x86 can also utilize a 
> range-based
> flush when running as a nested hypervisor.
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ff0a20565f90..b65116294efe 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -601,6 +601,7 @@ static __always_inline kvm_mn_ret_t 
> __kvm_handle_hva_range(struct kvm *kvm,
> struct kvm_gfn_range gfn_range;
> struct kvm_memory_slot *slot;
> struct kvm_memslots *slots;
> +   bool need_flush = false;
> int i, idx;
>  
> if (WARN_ON_ONCE(range->end <= range->start))
> @@ -653,10 +654,22 @@ static __always_inline kvm_mn_ret_t 
> __kvm_handle_hva_range(struct kvm *kvm,
> break;
> }
> r.ret |= range->handler(kvm, &gfn_range);
> +
> +   /*
> +* Use a precise gfn-based TLB flush when possible, as
> +* most mmu_notifier events affect a small-ish range.
> +* Fall back to a full TLB flush if the gfn-based 
> flush
> +* fails, and don't bother trying the gfn-based flush
> +* if a full flush is already pending.
> +*/
> +   if (range->flush_on_ret && !need_flush && r.ret &&
> +   kvm_arch_flush_remote_tlbs_range(kvm, 
> gfn_range.start
> +gfn_range.end - 
> gfn_range.start + 1))
> +   need_flush = true;
> }
> }
>  
> -   if (range->flush_on_ret && r.ret)
> +   if (need_flush)
> kvm_flush_remote_tlbs(kvm);
>  
> if (r.found_memslot)

I think this works for us on HW that has range invalidation, which
would already be a positive move.

For the lesser HW that isn't range capable, it also gives the
opportunity to perform the iteration ourselves or go for the nuclear
option if the range is larger than some arbitrary constant (though
this is additional work).

But this still considers the whole range as being affected by
range->handler(). It'd be interesting to try and see whether more
precise tracking is (or isn't) generally beneficial.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [RFC PATCH kernel 1/2] irq: Add reference counting to IRQ mappings

2020-10-27 Thread Marc Zyngier

Hi Alexey,

On 2020-10-27 09:06, Alexey Kardashevskiy wrote:
PCI devices share 4 legacy INTx interrupts from the same PCI host 
bridge.

Device drivers map/unmap hardware interrupts via irq_create_mapping()/
irq_dispose_mapping(). The problem with that these interrupts are
shared and when performing hot unplug, we need to unmap the interrupt
only when the last device is released.

This reuses already existing irq_desc::kobj for this purpose.
The refcounter is naturally 1 when the descriptor is allocated already;
this adds kobject_get() in places where already existing mapped virq
is returned.


That's quite interesting, as I was about to revive a patch series that
rework the irqdomain subsystem to directly cache irq_desc instead of
raw interrupt numbers. And for that, I needed some form of 
refcounting...




This reorganizes irq_dispose_mapping() to release the kobj and let
the release callback do the cleanup.

If some driver or platform does its own reference counting, this 
expects

those parties to call irq_find_mapping() and call irq_dispose_mapping()
for every irq_create_fwspec_mapping()/irq_create_mapping().

Signed-off-by: Alexey Kardashevskiy 
---
 kernel/irq/irqdesc.c   | 35 +++
 kernel/irq/irqdomain.c | 27 +--
 2 files changed, 36 insertions(+), 26 deletions(-)

diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 1a7723604399..dae096238500 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -419,20 +419,39 @@ static struct irq_desc *alloc_desc(int irq, int
node, unsigned int flags,
return NULL;
 }

+static void delayed_free_desc(struct rcu_head *rhp);
 static void irq_kobj_release(struct kobject *kobj)
 {
struct irq_desc *desc = container_of(kobj, struct irq_desc, kobj);
+   struct irq_domain *domain;
+   unsigned int virq = desc->irq_data.irq;

-   free_masks(desc);
-   free_percpu(desc->kstat_irqs);
-   kfree(desc);
+   domain = desc->irq_data.domain;
+   if (domain) {
+   if (irq_domain_is_hierarchy(domain)) {
+   irq_domain_free_irqs(virq, 1);


How does this work with hierarchical domains? Each domain should
contribute as a reference on the irq_desc. But if you got here,
it means the refcount has already dropped to 0.

So either there is nothing to free here, or you don't track the
references implied by the hierarchy. I suspect the latter.


+   } else {
+   irq_domain_disassociate(domain, virq);
+   irq_free_desc(virq);
+   }
+   }
+
+   /*
+* We free the descriptor, masks and stat fields via RCU. That
+* allows demultiplex interrupts to do rcu based management of
+* the child interrupts.
+* This also allows us to use rcu in kstat_irqs_usr().
+*/
+   call_rcu(&desc->rcu, delayed_free_desc);
 }

 static void delayed_free_desc(struct rcu_head *rhp)
 {
struct irq_desc *desc = container_of(rhp, struct irq_desc, rcu);

-   kobject_put(&desc->kobj);
+   free_masks(desc);
+   free_percpu(desc->kstat_irqs);
+   kfree(desc);
 }

 static void free_desc(unsigned int irq)
@@ -453,14 +472,6 @@ static void free_desc(unsigned int irq)
 */
irq_sysfs_del(desc);
delete_irq_desc(irq);
-
-   /*
-* We free the descriptor, masks and stat fields via RCU. That
-* allows demultiplex interrupts to do rcu based management of
-* the child interrupts.
-* This also allows us to use rcu in kstat_irqs_usr().
-*/
-   call_rcu(&desc->rcu, delayed_free_desc);
 }

 static int alloc_descs(unsigned int start, unsigned int cnt, int node,
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index cf8b374b892d..02733ddc321f 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -638,6 +638,7 @@ unsigned int irq_create_mapping(struct irq_domain 
*domain,

 {
struct device_node *of_node;
int virq;
+   struct irq_desc *desc;

pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq);

@@ -655,7 +656,9 @@ unsigned int irq_create_mapping(struct irq_domain 
*domain,

/* Check if mapping already exists */
virq = irq_find_mapping(domain, hwirq);
if (virq) {
+   desc = irq_to_desc(virq);
pr_debug("-> existing mapping on virq %d\n", virq);
+   kobject_get(&desc->kobj);


My worry with this is that there is probably a significant amount of
code out there that relies on multiple calls to irq_create_mapping()
with the same parameters not to have any side effects. They would
expect a subsequent irq_dispose_mapping() to drop the translation
altogether, and that's obviously not the case here.

Have you audited the various call sites to see what could break?


return virq;
}

@@ -751,6 +754,7 @@ unsigned int irq_create_fwspec_mapping(struct
irq_fws

Re: [PATCH AUTOSEL 4.14 3/4] powerpc/msi: Fix deassociation of MSI descriptors

2022-12-28 Thread Marc Zyngier
On Tue, 27 Dec 2022 20:36:08 +,
Sasha Levin  wrote:
> 
> From: Marc Zyngier 
> 
> [ Upstream commit 4545c6a3d6ba71747eaa984c338ddd745e56e23f ]
> 
> Since 2f2940d16823 ("genirq/msi: Remove filter from
> msi_free_descs_free_range()"), the core MSI code relies on the
> msi_desc->irq field to have been cleared before the descriptor
> can be freed, as it indicates that there is no association with
> a device anymore.
> 
> The irq domain code provides this guarantee, and so does s390,
> which is one of the two architectures not using irq domains for
> MSIs.
> 
> Powerpc, however, is missing this particular requirements,
> leading in a splat and leaked MSI descriptors.
> 
> Adding the now required irq reset to the handful of powerpc backends
> that implement MSIs fixes that particular problem.
> 
> Reported-by: Guenter Roeck 
> Signed-off-by: Marc Zyngier 
> Link: 
> https://lore.kernel.org/r/70dab88e-6119-0c12-7c6a-61bcbe239...@roeck-us.net
> Signed-off-by: Sasha Levin 
> ---
>  arch/powerpc/platforms/4xx/hsta_msi.c  | 1 +
>  arch/powerpc/platforms/cell/axon_msi.c | 1 +
>  arch/powerpc/platforms/pasemi/msi.c| 1 +
>  arch/powerpc/sysdev/fsl_msi.c  | 1 +
>  arch/powerpc/sysdev/mpic_u3msi.c   | 1 +
>  5 files changed, 5 insertions(+)

Please drop this patch from all stable branches. It isn't needed
before 6.2, and doesn't fix anything on its own as nobody uses this
structure after this point.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v2 00/50] KVM: Rework kvm_init() and hardware enabling

2022-12-28 Thread Marc Zyngier

On 2022-12-27 13:02, Paolo Bonzini wrote:
Queued, thanks.  I will leave this in kvm/queue after testing 
everything

else and moving it to kvm/next; this way, we can wait for test results
on other architectures.


Can you please make this a topic branch, and if possible based
on a released -rc? It would make it a lot easier for everyone.

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: Call Trace and scary messages on kernel 2.6.27-rc5

2008-09-09 Thread Marc Zyngier
On Mon, 8 Sep 2008 21:47:29 -0300
Rogério Brito <[EMAIL PROTECTED]> wrote:

> Hi there.
> 
> I just compiled a new 2.6.27-rc5 kernel for my standard Kurobox (an
> embedded NAS that has an MPC8241 CPU, if I'm not mistaken) and upon
> booting, I get these scary messages and Call Traces:

[snip]

This is a known problem.

Check patch bb23b431db7405f6d79f989ad0236bf6428ba1cb in Linus' tree, it
should correct the traces you see.

Regards,

M.
-- 
And if you don't know where you're going, any road will take you there.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 2/9] ARM: PXA: Kill use of irq_create_strict_mappings()

2021-04-06 Thread Marc Zyngier
irq_create_strict_mappings() is a poor way to allow the use of
a linear IRQ domain as a legacy one. Let's be upfront about
it and use a legacy domain when appropriate.

Signed-off-by: Marc Zyngier 
---
 arch/arm/mach-pxa/pxa_cplds_irqs.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/arch/arm/mach-pxa/pxa_cplds_irqs.c 
b/arch/arm/mach-pxa/pxa_cplds_irqs.c
index 45c19ca96f7a..ec0d9b094744 100644
--- a/arch/arm/mach-pxa/pxa_cplds_irqs.c
+++ b/arch/arm/mach-pxa/pxa_cplds_irqs.c
@@ -147,22 +147,20 @@ static int cplds_probe(struct platform_device *pdev)
}
 
irq_set_irq_wake(fpga->irq, 1);
-   fpga->irqdomain = irq_domain_add_linear(pdev->dev.of_node,
-  CPLDS_NB_IRQ,
-  &cplds_irq_domain_ops, fpga);
+   if (base_irq)
+   fpga->irqdomain = irq_domain_add_legacy(pdev->dev.of_node,
+   CPLDS_NB_IRQ,
+   base_irq, 0,
+   &cplds_irq_domain_ops,
+   fpga);
+   else
+   fpga->irqdomain = irq_domain_add_linear(pdev->dev.of_node,
+   CPLDS_NB_IRQ,
+   &cplds_irq_domain_ops,
+   fpga);
if (!fpga->irqdomain)
return -ENODEV;
 
-   if (base_irq) {
-   ret = irq_create_strict_mappings(fpga->irqdomain, base_irq, 0,
-CPLDS_NB_IRQ);
-   if (ret) {
-   dev_err(&pdev->dev, "couldn't create the irq mapping 
%d..%d\n",
-   base_irq, base_irq + CPLDS_NB_IRQ);
-   return ret;
-   }
-   }
-
return 0;
 }
 
-- 
2.29.2



[PATCH 3/9] irqchip/jcore-aic: Kill use of irq_create_strict_mappings()

2021-04-06 Thread Marc Zyngier
irq_create_strict_mappings() is a poor way to allow the use of
a linear IRQ domain as a legacy one. Let's be upfront about it.

Signed-off-by: Marc Zyngier 
---
 drivers/irqchip/irq-jcore-aic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-jcore-aic.c b/drivers/irqchip/irq-jcore-aic.c
index 033bccb41455..5f47d8ee4ae3 100644
--- a/drivers/irqchip/irq-jcore-aic.c
+++ b/drivers/irqchip/irq-jcore-aic.c
@@ -100,11 +100,11 @@ static int __init aic_irq_of_init(struct device_node 
*node,
jcore_aic.irq_unmask = noop;
jcore_aic.name = "AIC";
 
-   domain = irq_domain_add_linear(node, dom_sz, &jcore_aic_irqdomain_ops,
+   domain = irq_domain_add_legacy(node, dom_sz - min_irq, min_irq, min_irq,
+  &jcore_aic_irqdomain_ops,
   &jcore_aic);
if (!domain)
return -ENOMEM;
-   irq_create_strict_mappings(domain, min_irq, min_irq, dom_sz - min_irq);
 
return 0;
 }
-- 
2.29.2



[PATCH 1/9] irqdomain: Reimplement irq_linear_revmap() with irq_find_mapping()

2021-04-06 Thread Marc Zyngier
irq_linear_revmap() is supposed to be a fast path for domain
lookups, but it only exposes low-level details of the irqdomain
implementation, details which are better kept private.

The *overhead* between the two is only a function call and
a couple of tests, so it is likely that noone can show any
meaningful difference compared to the cost of taking an
interrupt.

Reimplement irq_linear_revmap() with irq_find_mapping()
in order to preserve source code compatibility, and
rename the internal field for a measure.

Signed-off-by: Marc Zyngier 
---
 include/linux/irqdomain.h | 22 +-
 kernel/irq/irqdomain.c|  6 +++---
 2 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 33cacc8af26d..b9600f24878a 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -154,9 +154,9 @@ struct irq_domain_chip_generic;
  * Revmap data, used internally by irq_domain
  * @revmap_direct_max_irq: The largest hwirq that can be set for controllers 
that
  * support direct mapping
- * @revmap_size: Size of the linear map table @linear_revmap[]
+ * @revmap_size: Size of the linear map table @revmap[]
  * @revmap_tree: Radix map tree for hwirqs that don't fit in the linear map
- * @linear_revmap: Linear table of hwirq->virq reverse mappings
+ * @revmap: Linear table of hwirq->virq reverse mappings
  */
 struct irq_domain {
struct list_head link;
@@ -180,7 +180,7 @@ struct irq_domain {
unsigned int revmap_size;
struct radix_tree_root revmap_tree;
struct mutex revmap_tree_mutex;
-   unsigned int linear_revmap[];
+   unsigned int revmap[];
 };
 
 /* Irq domain flags */
@@ -396,24 +396,20 @@ static inline unsigned int irq_create_mapping(struct 
irq_domain *host,
return irq_create_mapping_affinity(host, hwirq, NULL);
 }
 
-
 /**
- * irq_linear_revmap() - Find a linux irq from a hw irq number.
+ * irq_find_mapping() - Find a linux irq from a hw irq number.
  * @domain: domain owning this hardware interrupt
  * @hwirq: hardware irq number in that domain space
- *
- * This is a fast path alternative to irq_find_mapping() that can be
- * called directly by irq controller code to save a handful of
- * instructions. It is always safe to call, but won't find irqs mapped
- * using the radix tree.
  */
+extern unsigned int irq_find_mapping(struct irq_domain *host,
+irq_hw_number_t hwirq);
+
 static inline unsigned int irq_linear_revmap(struct irq_domain *domain,
 irq_hw_number_t hwirq)
 {
-   return hwirq < domain->revmap_size ? domain->linear_revmap[hwirq] : 0;
+   return irq_find_mapping(domain, hwirq);
 }
-extern unsigned int irq_find_mapping(struct irq_domain *host,
-irq_hw_number_t hwirq);
+
 extern unsigned int irq_create_direct_mapping(struct irq_domain *host);
 extern int irq_create_strict_mappings(struct irq_domain *domain,
  unsigned int irq_base,
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index d10ab1d689d5..dfa716305ea9 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -486,7 +486,7 @@ static void irq_domain_clear_mapping(struct irq_domain 
*domain,
 irq_hw_number_t hwirq)
 {
if (hwirq < domain->revmap_size) {
-   domain->linear_revmap[hwirq] = 0;
+   domain->revmap[hwirq] = 0;
} else {
mutex_lock(&domain->revmap_tree_mutex);
radix_tree_delete(&domain->revmap_tree, hwirq);
@@ -499,7 +499,7 @@ static void irq_domain_set_mapping(struct irq_domain 
*domain,
   struct irq_data *irq_data)
 {
if (hwirq < domain->revmap_size) {
-   domain->linear_revmap[hwirq] = irq_data->irq;
+   domain->revmap[hwirq] = irq_data->irq;
} else {
mutex_lock(&domain->revmap_tree_mutex);
radix_tree_insert(&domain->revmap_tree, hwirq, irq_data);
@@ -920,7 +920,7 @@ unsigned int irq_find_mapping(struct irq_domain *domain,
 
/* Check if the hwirq is in the linear revmap. */
if (hwirq < domain->revmap_size)
-   return domain->linear_revmap[hwirq];
+   return domain->revmap[hwirq];
 
rcu_read_lock();
data = radix_tree_lookup(&domain->revmap_tree, hwirq);
-- 
2.29.2



[PATCH 0/9] Cleaning up some of the irqdomain features

2021-04-06 Thread Marc Zyngier
The irqdomain subsystem has grown quite a lot over the years, and some
of its features are either oddly used or just pretty useless. Some
other helpers expose internals that are likely to change soon.

Here are the bits that I'm trying to get rid of:

- irq_linear_revmap exposes the internals of the domains, and only
  works for linear domains. The supposed speed improvement really
  isn't an argument, as it gets in the way of more significant
  optimisations. Reimplemented in terms of irq_find_mapping, which
  always works, and will eventually go at some point.

- irq_create_strict_mappings is just a way to constraint the
  allocation of irqdescs into a given range, which is better served by
  creating a legacy irqdomain, and shows that the platform really
  needs to catch up with the 21st century.

- irq_create_identity mapping is just a variation on the above, with
  irq==hwirq, although the way it is used is a gross hack in the SH
  code that needs to go.

- irq_domain_add_legacy_isa is, as the names shows, a variation on
  irq_domain_add_legacy with a reservation of 16 interrupts. This is
  only used in the PPC code.

The patches address all of the above, touching some of the ARM, PPC,
and SH code that is affected. Another couple of patches address a MIPS
platform that could use the generic code, and clean some of the
comments in the irqdomain code.

Unless anyone shouts, I'd like to take this into 5.13, as it is the
basis of some further (and deeper) changes in the way irqdomains work.

    M.

Marc Zyngier (9):
  irqdomain: Reimplement irq_linear_revmap() with irq_find_mapping()
  ARM: PXA: Kill use of irq_create_strict_mappings()
  irqchip/jcore-aic: Kill use of irq_create_strict_mappings()
  sh: intc: Drop the use of irq_create_identity_mapping()
  irqdomain: Kill
irq_create_strict_mappings()/irq_create_identity_mapping()
  mips: netlogic: Use irq_domain_simple_ops for XLP PIC
  irqdomain: Drop references to recusive irqdomain setup
  powerpc: Convert irq_domain_add_legacy_isa use to
irq_domain_add_legacy
  irqdomain: Kill irq_domain_add_legacy_isa

 Documentation/core-api/irq/irq-domain.rst |  1 -
 arch/arm/mach-pxa/pxa_cplds_irqs.c| 24 +--
 arch/mips/netlogic/common/irq.c   |  6 +--
 arch/powerpc/include/asm/irq.h|  4 +-
 arch/powerpc/platforms/ps3/interrupt.c|  4 +-
 arch/powerpc/sysdev/i8259.c   |  3 +-
 arch/powerpc/sysdev/mpic.c|  2 +-
 arch/powerpc/sysdev/tsi108_pci.c  |  3 +-
 arch/powerpc/sysdev/xics/xics-common.c|  2 +-
 drivers/irqchip/irq-jcore-aic.c   |  4 +-
 drivers/sh/intc/core.c| 50 ++-
 include/linux/irqdomain.h | 42 ---
 kernel/irq/irqdomain.c| 49 +++---
 13 files changed, 59 insertions(+), 135 deletions(-)

-- 
2.29.2



[PATCH 4/9] sh: intc: Drop the use of irq_create_identity_mapping()

2021-04-06 Thread Marc Zyngier
Instead of playing games with using irq_create_identity_mapping()
and irq_domain_associate(), drop the use of the former and only
use the latter, together with the allocation of the irq_desc
as needed.

It doesn't make the code less awful, but at least the intent
is clearer.

Signed-off-by: Marc Zyngier 
---
 drivers/sh/intc/core.c | 50 ++
 1 file changed, 21 insertions(+), 29 deletions(-)

diff --git a/drivers/sh/intc/core.c b/drivers/sh/intc/core.c
index a14684ffe4c1..6c57ee1ce6c4 100644
--- a/drivers/sh/intc/core.c
+++ b/drivers/sh/intc/core.c
@@ -179,6 +179,23 @@ static unsigned int __init save_reg(struct intc_desc_int 
*d,
return 0;
 }
 
+static bool __init intc_map(struct irq_domain *domain, int irq)
+{
+   int res;
+
+   if (!irq_to_desc(irq) && irq_alloc_desc_at(irq, NUMA_NO_NODE) != irq) {
+   pr_err("uname to allocate IRQ %d\n", irq);
+   return false;
+   }
+
+   if (irq_domain_associate(domain, irq, irq)) {
+   pr_err("domain association failure\n");
+   return false;
+   }
+
+   return true;
+}
+
 int __init register_intc_controller(struct intc_desc *desc)
 {
unsigned int i, k, smp;
@@ -316,19 +333,8 @@ int __init register_intc_controller(struct intc_desc *desc)
if (!vect->enum_id)
continue;
 
-   res = irq_create_identity_mapping(d->domain, irq);
-   if (unlikely(res)) {
-   if (res == -EEXIST) {
-   res = irq_domain_associate(d->domain, irq, irq);
-   if (unlikely(res)) {
-   pr_err("domain association failure\n");
-   continue;
-   }
-   } else {
-   pr_err("can't identity map IRQ %d\n", irq);
-   continue;
-   }
-   }
+   if (!intc_map(d->domain, irq))
+   continue;
 
intc_irq_xlate_set(irq, vect->enum_id, d);
intc_register_irq(desc, d, vect->enum_id, irq);
@@ -345,22 +351,8 @@ int __init register_intc_controller(struct intc_desc *desc)
 * IRQ support, each vector still needs to have
 * its own backing irq_desc.
 */
-   res = irq_create_identity_mapping(d->domain, irq2);
-   if (unlikely(res)) {
-   if (res == -EEXIST) {
-   res = irq_domain_associate(d->domain,
-  irq2, irq2);
-   if (unlikely(res)) {
-   pr_err("domain association "
-  "failure\n");
-   continue;
-   }
-   } else {
-   pr_err("can't identity map IRQ %d\n",
-  irq);
-   continue;
-   }
-   }
+   if (!intc_map(d->domain, irq2))
+   continue;
 
vect2->enum_id = 0;
 
-- 
2.29.2



[PATCH 6/9] mips: netlogic: Use irq_domain_simple_ops for XLP PIC

2021-04-06 Thread Marc Zyngier
Use the generic irq_domain_simple_ops structure instead of
a home-grown one.

Signed-off-by: Marc Zyngier 
---
 arch/mips/netlogic/common/irq.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/mips/netlogic/common/irq.c b/arch/mips/netlogic/common/irq.c
index cf33dd8a487e..c25a2ce5e29f 100644
--- a/arch/mips/netlogic/common/irq.c
+++ b/arch/mips/netlogic/common/irq.c
@@ -276,10 +276,6 @@ asmlinkage void plat_irq_dispatch(void)
 }
 
 #ifdef CONFIG_CPU_XLP
-static const struct irq_domain_ops xlp_pic_irq_domain_ops = {
-   .xlate = irq_domain_xlate_onetwocell,
-};
-
 static int __init xlp_of_pic_init(struct device_node *node,
struct device_node *parent)
 {
@@ -324,7 +320,7 @@ static int __init xlp_of_pic_init(struct device_node *node,
 
xlp_pic_domain = irq_domain_add_legacy(node, n_picirqs,
nlm_irq_to_xirq(socid, PIC_IRQ_BASE), PIC_IRQ_BASE,
-   &xlp_pic_irq_domain_ops, NULL);
+   &irq_domain_simple_ops, NULL);
if (xlp_pic_domain == NULL) {
pr_err("PIC %pOFn: Creating legacy domain failed!\n", node);
return -EINVAL;
-- 
2.29.2



[PATCH 7/9] irqdomain: Drop references to recusive irqdomain setup

2021-04-06 Thread Marc Zyngier
It was never completely implemented, and was removed a long time
ago. Adjust the documentation to reflect this.

Signed-off-by: Marc Zyngier 
---
 kernel/irq/irqdomain.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 0ba761e6b0a8..89474aa88220 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -1659,12 +1659,10 @@ void irq_domain_free_irqs(unsigned int virq, unsigned 
int nr_irqs)
 
 /**
  * irq_domain_alloc_irqs_parent - Allocate interrupts from parent domain
+ * @domain:Domain below which interrupts must be allocated
  * @irq_base:  Base IRQ number
  * @nr_irqs:   Number of IRQs to allocate
  * @arg:   Allocation data (arch/domain specific)
- *
- * Check whether the domain has been setup recursive. If not allocate
- * through the parent domain.
  */
 int irq_domain_alloc_irqs_parent(struct irq_domain *domain,
 unsigned int irq_base, unsigned int nr_irqs,
@@ -1680,11 +1678,9 @@ EXPORT_SYMBOL_GPL(irq_domain_alloc_irqs_parent);
 
 /**
  * irq_domain_free_irqs_parent - Free interrupts from parent domain
+ * @domain:Domain below which interrupts must be freed
  * @irq_base:  Base IRQ number
  * @nr_irqs:   Number of IRQs to free
- *
- * Check whether the domain has been setup recursive. If not free
- * through the parent domain.
  */
 void irq_domain_free_irqs_parent(struct irq_domain *domain,
 unsigned int irq_base, unsigned int nr_irqs)
-- 
2.29.2



[PATCH 5/9] irqdomain: Kill irq_create_strict_mappings()/irq_create_identity_mapping()

2021-04-06 Thread Marc Zyngier
No user of these APIs are left, remove them.

Signed-off-by: Marc Zyngier 
---
 include/linux/irqdomain.h |  9 -
 kernel/irq/irqdomain.c| 35 ---
 2 files changed, 44 deletions(-)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index b9600f24878a..3997ed9e4d7d 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -411,15 +411,6 @@ static inline unsigned int irq_linear_revmap(struct 
irq_domain *domain,
 }
 
 extern unsigned int irq_create_direct_mapping(struct irq_domain *host);
-extern int irq_create_strict_mappings(struct irq_domain *domain,
- unsigned int irq_base,
- irq_hw_number_t hwirq_base, int count);
-
-static inline int irq_create_identity_mapping(struct irq_domain *host,
- irq_hw_number_t hwirq)
-{
-   return irq_create_strict_mappings(host, hwirq, hwirq, 1);
-}
 
 extern const struct irq_domain_ops irq_domain_simple_ops;
 
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index dfa716305ea9..0ba761e6b0a8 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -703,41 +703,6 @@ unsigned int irq_create_mapping_affinity(struct irq_domain 
*domain,
 }
 EXPORT_SYMBOL_GPL(irq_create_mapping_affinity);
 
-/**
- * irq_create_strict_mappings() - Map a range of hw irqs to fixed linux irqs
- * @domain: domain owning the interrupt range
- * @irq_base: beginning of linux IRQ range
- * @hwirq_base: beginning of hardware IRQ range
- * @count: Number of interrupts to map
- *
- * This routine is used for allocating and mapping a range of hardware
- * irqs to linux irqs where the linux irq numbers are at pre-defined
- * locations. For use by controllers that already have static mappings
- * to insert in to the domain.
- *
- * Non-linear users can use irq_create_identity_mapping() for IRQ-at-a-time
- * domain insertion.
- *
- * 0 is returned upon success, while any failure to establish a static
- * mapping is treated as an error.
- */
-int irq_create_strict_mappings(struct irq_domain *domain, unsigned int 
irq_base,
-  irq_hw_number_t hwirq_base, int count)
-{
-   struct device_node *of_node;
-   int ret;
-
-   of_node = irq_domain_get_of_node(domain);
-   ret = irq_alloc_descs(irq_base, irq_base, count,
- of_node_to_nid(of_node));
-   if (unlikely(ret < 0))
-   return ret;
-
-   irq_domain_associate_many(domain, irq_base, hwirq_base, count);
-   return 0;
-}
-EXPORT_SYMBOL_GPL(irq_create_strict_mappings);
-
 static int irq_domain_translate(struct irq_domain *d,
struct irq_fwspec *fwspec,
irq_hw_number_t *hwirq, unsigned int *type)
-- 
2.29.2



[PATCH 8/9] powerpc: Convert irq_domain_add_legacy_isa use to irq_domain_add_legacy

2021-04-06 Thread Marc Zyngier
irq_domain_add_legacy_isa is a pain. It only exists for the benefit of
two PPC-specific drivers, and creates an ugly dependency between asm/irq.h
and linux/irqdomain.h

Instead, let's convert these two drivers to irq_domain_add_legacy(),
stop using NUM_ISA_INTERRUPTS by directly setting NR_IRQS_LEGACY.

The dependency cannot be broken yet as there is a lot of PPC-related
code that depends on it, but that's the first step towards it.

A followup patch will remove irq_domain_add_legacy_isa.

Signed-off-by: Marc Zyngier 
---
 arch/powerpc/include/asm/irq.h | 4 ++--
 arch/powerpc/platforms/ps3/interrupt.c | 4 ++--
 arch/powerpc/sysdev/i8259.c| 3 ++-
 arch/powerpc/sysdev/mpic.c | 2 +-
 arch/powerpc/sysdev/tsi108_pci.c   | 3 ++-
 arch/powerpc/sysdev/xics/xics-common.c | 2 +-
 6 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index f3f264e441a7..aeb209144c68 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -23,8 +23,8 @@ extern atomic_t ppc_n_lost_interrupts;
 /* Total number of virq in the platform */
 #define NR_IRQSCONFIG_NR_IRQS
 
-/* Same thing, used by the generic IRQ code */
-#define NR_IRQS_LEGACY NUM_ISA_INTERRUPTS
+/* Number of irqs reserved for a legacy isa controller */
+#define NR_IRQS_LEGACY 16
 
 extern irq_hw_number_t virq_to_hw(unsigned int virq);
 
diff --git a/arch/powerpc/platforms/ps3/interrupt.c 
b/arch/powerpc/platforms/ps3/interrupt.c
index 78f2339ed5cb..93e367a00452 100644
--- a/arch/powerpc/platforms/ps3/interrupt.c
+++ b/arch/powerpc/platforms/ps3/interrupt.c
@@ -45,7 +45,7 @@
  * implementation equates HV plug value to Linux virq value, constrains each
  * interrupt to have a system wide unique plug number, and limits the range
  * of the plug values to map into the first dword of the bitmaps.  This
- * gives a usable range of plug values of  {NUM_ISA_INTERRUPTS..63}.  Note
+ * gives a usable range of plug values of  {NR_IRQS_LEGACY..63}.  Note
  * that there is no constraint on how many in this set an individual thread
  * can acquire.
  *
@@ -721,7 +721,7 @@ static unsigned int ps3_get_irq(void)
}
 
 #if defined(DEBUG)
-   if (unlikely(plug < NUM_ISA_INTERRUPTS || plug > PS3_PLUG_MAX)) {
+   if (unlikely(plug < NR_IRQS_LEGACY || plug > PS3_PLUG_MAX)) {
dump_bmp(&per_cpu(ps3_private, 0));
dump_bmp(&per_cpu(ps3_private, 1));
BUG();
diff --git a/arch/powerpc/sysdev/i8259.c b/arch/powerpc/sysdev/i8259.c
index c1d76c344351..dc1a151c63d7 100644
--- a/arch/powerpc/sysdev/i8259.c
+++ b/arch/powerpc/sysdev/i8259.c
@@ -260,7 +260,8 @@ void i8259_init(struct device_node *node, unsigned long 
intack_addr)
raw_spin_unlock_irqrestore(&i8259_lock, flags);
 
/* create a legacy host */
-   i8259_host = irq_domain_add_legacy_isa(node, &i8259_host_ops, NULL);
+   i8259_host = irq_domain_add_legacy(node, NR_IRQS_LEGACY, 0, 0,
+  &i8259_host_ops, NULL);
if (i8259_host == NULL) {
printk(KERN_ERR "i8259: failed to allocate irq host !\n");
return;
diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
index b0426f28946a..995fb2ada507 100644
--- a/arch/powerpc/sysdev/mpic.c
+++ b/arch/powerpc/sysdev/mpic.c
@@ -602,7 +602,7 @@ static void __init mpic_scan_ht_pics(struct mpic *mpic)
 /* Find an mpic associated with a given linux interrupt */
 static struct mpic *mpic_find(unsigned int irq)
 {
-   if (irq < NUM_ISA_INTERRUPTS)
+   if (irq < NR_IRQS_LEGACY)
return NULL;
 
return irq_get_chip_data(irq);
diff --git a/arch/powerpc/sysdev/tsi108_pci.c b/arch/powerpc/sysdev/tsi108_pci.c
index 49f9541954f8..042bb38fa5c2 100644
--- a/arch/powerpc/sysdev/tsi108_pci.c
+++ b/arch/powerpc/sysdev/tsi108_pci.c
@@ -404,7 +404,8 @@ void __init tsi108_pci_int_init(struct device_node *node)
 {
DBG("Tsi108_pci_int_init: initializing PCI interrupts\n");
 
-   pci_irq_host = irq_domain_add_legacy_isa(node, &pci_irq_domain_ops, 
NULL);
+   pci_irq_host = irq_domain_add_legacy(node, NR_IRQS_LEGACY, 0, 0,
+&pci_irq_domain_ops, NULL);
if (pci_irq_host == NULL) {
printk(KERN_ERR "pci_irq_host: failed to allocate irq 
domain!\n");
return;
diff --git a/arch/powerpc/sysdev/xics/xics-common.c 
b/arch/powerpc/sysdev/xics/xics-common.c
index 7e4305c01bac..fdf8dbb6 100644
--- a/arch/powerpc/sysdev/xics/xics-common.c
+++ b/arch/powerpc/sysdev/xics/xics-common.c
@@ -201,7 +201,7 @@ void xics_migrate_irqs_away(void)
struct ics *ics;
 
/* We can't set affinity on ISA interrupts */
-   if (virq < NUM_ISA_INTERRUPTS)
+   

[PATCH 9/9] irqdomain: Kill irq_domain_add_legacy_isa

2021-04-06 Thread Marc Zyngier
This helper doesn't have a user anymore, let's remove it.

Signed-off-by: Marc Zyngier 
---
 Documentation/core-api/irq/irq-domain.rst |  1 -
 include/linux/irqdomain.h | 11 ---
 2 files changed, 12 deletions(-)

diff --git a/Documentation/core-api/irq/irq-domain.rst 
b/Documentation/core-api/irq/irq-domain.rst
index a77c24c27f7b..84e561db468f 100644
--- a/Documentation/core-api/irq/irq-domain.rst
+++ b/Documentation/core-api/irq/irq-domain.rst
@@ -146,7 +146,6 @@ Legacy
 
irq_domain_add_simple()
irq_domain_add_legacy()
-   irq_domain_add_legacy_isa()
irq_domain_create_legacy()
 
 The Legacy mapping is a special case for drivers that already have a
diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 3997ed9e4d7d..2a7ecf08d56e 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -45,9 +45,6 @@ struct cpumask;
 struct seq_file;
 struct irq_affinity_desc;
 
-/* Number of irqs reserved for a legacy isa controller */
-#define NUM_ISA_INTERRUPTS 16
-
 #define IRQ_DOMAIN_IRQ_SPEC_PARAMS 16
 
 /**
@@ -346,14 +343,6 @@ static inline struct irq_domain 
*irq_domain_add_nomap(struct device_node *of_nod
 {
return __irq_domain_add(of_node_to_fwnode(of_node), 0, max_irq, 
max_irq, ops, host_data);
 }
-static inline struct irq_domain *irq_domain_add_legacy_isa(
-   struct device_node *of_node,
-   const struct irq_domain_ops *ops,
-   void *host_data)
-{
-   return irq_domain_add_legacy(of_node, NUM_ISA_INTERRUPTS, 0, 0, ops,
-host_data);
-}
 static inline struct irq_domain *irq_domain_add_tree(struct device_node 
*of_node,
 const struct irq_domain_ops *ops,
 void *host_data)
-- 
2.29.2



Re: [PATCH 1/9] irqdomain: Reimplement irq_linear_revmap() with irq_find_mapping()

2021-04-06 Thread Marc Zyngier
Christophe,

On Tue, 06 Apr 2021 12:21:33 +0100,
Christophe Leroy  wrote:
> 
> 
> 
> Le 06/04/2021 à 11:35, Marc Zyngier a écrit :
> > irq_linear_revmap() is supposed to be a fast path for domain
> > lookups, but it only exposes low-level details of the irqdomain
> > implementation, details which are better kept private.
> 
> Can you elaborate with more details ?

Things like directly picking into the revmap are positively awful, and
doesn't work if the domain has been constructed using the radix
tree. Which on its own is totally broken, because things like
irq_domain_create_hierarchy() will pick an implementation or the
other.

> 
> > 
> > The *overhead* between the two is only a function call and
> > a couple of tests, so it is likely that noone can show any
> > meaningful difference compared to the cost of taking an
> > interrupt.
> 
> Do you have any measurement ?

I did measure things on arm64, and couldn't come up with any
difference other than noise.

> Can you make the "likely" a certitude ?

Of course not. You can always come up with an artificial CPU
implementation that has a very small exception entry overhead, and a
ridiculously slow memory subsystem. Do I care about these? No.

If you can come up with realistic platforms that show a regression
with this patch, I'm all ears.

> 
> > 
> > Reimplement irq_linear_revmap() with irq_find_mapping()
> > in order to preserve source code compatibility, and
> > rename the internal field for a measure.
> 
> This is in complete contradiction with commit 
> https://github.com/torvalds/linux/commit/d3dcb436
> 
> At that time, irq_linear_revmap() was less complex than what
> irq_find_mapping() is today, and nevertheless it was considered worth
> restoring in as a fast path. What has changed since then ?

Over 8 years? Plenty. The use of irqdomains has been generalised, we
have domain hierarchies, and if anything, this commit introduces the
buggy behaviour I was mentioning above. I also don't see any mention
of actual performance in that commit.

And if we're worried about a fast path, being able to directly cache
the irq_data in the revmap, hence skipping the irq_desc lookup that
inevitable follows, is a much more interesting prospect than the "get
useless data quick" that irq_linear_revmap() implements.

This latter optimisation is what I am after.

> Can you also explain the reason for the renaming of "linear_revmap"
> into "revmap" ? What is that "measure" ?

To catch the potential direct use of the reverse map field.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 4/9] sh: intc: Drop the use of irq_create_identity_mapping()

2021-04-06 Thread Marc Zyngier
On Tue, 06 Apr 2021 11:32:13 +0100,
Geert Uytterhoeven  wrote:
> 
> Hi Marc,
> 
> On Tue, Apr 6, 2021 at 11:44 AM Marc Zyngier  wrote:
> > Instead of playing games with using irq_create_identity_mapping()
> > and irq_domain_associate(), drop the use of the former and only
> > use the latter, together with the allocation of the irq_desc
> > as needed.
> >
> > It doesn't make the code less awful, but at least the intent
> > is clearer.
> >
> > Signed-off-by: Marc Zyngier 
> 
> Thanks for your patch!
> 
> > --- a/drivers/sh/intc/core.c
> > +++ b/drivers/sh/intc/core.c
> > @@ -179,6 +179,23 @@ static unsigned int __init save_reg(struct 
> > intc_desc_int *d,
> > return 0;
> >  }
> >
> > +static bool __init intc_map(struct irq_domain *domain, int irq)
> > +{
> > +   int res;
> 
> warning: unused variable ‘res’ [-Wunused-variable]
> 
> > +
> > +   if (!irq_to_desc(irq) && irq_alloc_desc_at(irq, NUMA_NO_NODE) != 
> > irq) {
> > +   pr_err("uname to allocate IRQ %d\n", irq);
> > +   return false;
> > +   }
> > +
> > +   if (irq_domain_associate(domain, irq, irq)) {
> > +   pr_err("domain association failure\n");
> > +   return false;
> > +   }
> > +
> > +   return true;
> > +}
> > +
> >  int __init register_intc_controller(struct intc_desc *desc)
> >  {
> > unsigned int i, k, smp;
> > @@ -316,19 +333,8 @@ int __init register_intc_controller(struct intc_desc 
> > *desc)
> 
> warning: unused variable ‘res’ [-Wunused-variable]

Ah, thanks for spotting these.

> 
> > if (!vect->enum_id)
> > continue;
> >
> > -   res = irq_create_identity_mapping(d->domain, irq);
> 
> 
> > -   if (unlikely(res)) {
> > -   if (res == -EEXIST) {
> > -   res = irq_domain_associate(d->domain, irq, 
> > irq);
> > -   if (unlikely(res)) {
> > -   pr_err("domain association 
> > failure\n");
> > -   continue;
> > -   }
> > -   } else {
> > -   pr_err("can't identity map IRQ %d\n", irq);
> > -   continue;
> > -   }
> > -   }
> > +   if (!intc_map(d->domain, irq))
> > +   continue;
> >
> > intc_irq_xlate_set(irq, vect->enum_id, d);
> > intc_register_irq(desc, d, vect->enum_id, irq);
> 
> Otherwise this seems to work fine on real hardware (landisk) and qemu
> (rts7751r2d).  I did verify that the new function intc_map() is called.
> 
> Tested-by: Geert Uytterhoeven 

Awesome, thanks Geert.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH] powerpc/sysdev: Use of_device_get_match_data()

2022-03-04 Thread Marc Zyngier
On Fri, 04 Mar 2022 13:10:19 +,
Christophe Leroy  wrote:
> 
> 
> 
> Le 04/03/2022 à 02:18, cgel@gmail.com a écrit :
> > From: Minghao Chi (CGEL ZTE) 
> > 
> > Use of_device_get_match_data() to simplify the code.
> > 
> > Reported-by: Zeal Robot 
> > Signed-off-by: Minghao Chi (CGEL ZTE) 
> > ---
> >   arch/powerpc/sysdev/fsl_msi.c | 6 +-
> >   1 file changed, 1 insertion(+), 5 deletions(-)
> > 
> > diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
> > index b3475ae9f236..9d135bbb30b7 100644
> > --- a/arch/powerpc/sysdev/fsl_msi.c
> > +++ b/arch/powerpc/sysdev/fsl_msi.c
> > @@ -387,7 +387,6 @@ static int fsl_msi_setup_hwirq(struct fsl_msi *msi, 
> > struct platform_device *dev,
> >   static const struct of_device_id fsl_of_msi_ids[];
> >   static int fsl_of_msi_probe(struct platform_device *dev)
> >   {
> > -   const struct of_device_id *match;
> > struct fsl_msi *msi;
> > struct resource res, msiir;
> > int err, i, j, irq_index, count;
> > @@ -397,10 +396,7 @@ static int fsl_of_msi_probe(struct platform_device 
> > *dev)
> > u32 offset;
> > struct pci_controller *phb;
> >   
> > -   match = of_match_device(fsl_of_msi_ids, &dev->dev);
> > -   if (!match)
> > -   return -EINVAL;
> > -   features = match->data;
> > +   features = of_device_get_match_data(&dev->dev);
> 
> What happens when features is NULL ?

I did jump at that one too, but as it turns out, it cannot happen, by
construction. All the fsl_of_msi_ids[] entries have a non-NULL .data
pointer, and you only enter probe if you match a fsl_of_msi_ids[]
entry with the DT.

So the current check for a NULL match is not something that can happen
short of some other bug somewhere.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path

2022-04-29 Thread Marc Zyngier
On Fri, 29 Apr 2022 22:45:14 +0100,
"Russell King (Oracle)"  wrote:
> 
> On Fri, Apr 29, 2022 at 06:38:19PM -0300, Guilherme G. Piccoli wrote:
> > Thanks Marc and Michael for the review/discussion.
> > 
> > On 29/04/2022 15:20, Marc Zyngier wrote:
> > > [...]
> > 
> > > My expectations would be that, since we're getting here using an IPI,
> > > interrupts are already masked. So what reenabled them the first place?
> > > 
> > > Thanks,
> > > 
> > >   M.
> > > 
> > 
> > Marc, I did some investigation in the code (and tried/failed in the ARM
> > documentation as well heh), but this is still not 100% clear for me.
> > 
> > You're saying IPI calls disable IRQs/FIQs by default in the the target
> > CPUs? Where does it happen? I'm a bit confused if this a processor
> > mechanism, or it's in code.
> 
> When we taken an IRQ, IRQs will be masked, FIQs will not. IPIs are
> themselves interrupts, so IRQs will be masked while the IPI is being
> processed. Therefore, there should be no need to re-disable the
> already disabled interrupts.
> 
> > But crash_smp_send_stop() is different, it seems to IPI the other CPUs
> > with the flag IPI_CALL_FUNC, which leads to calling
> > generic_smp_call_function_interrupt() - does it disable interrupts/FIQs
> > as well? I couldn't find it.
> 
> It's buried in the architecture behaviour. When the CPU takes an
> interrupt and jumps to the interrupt vector in the vectors page, it is
> architecturally defined that interrupts will be disabled. If they
> weren't architecturally disabled at this point, then as soon as the
> first instruction is processed (at the interrupt vector, likely a
> branch) the CPU would immediately take another jump to the interrupt
> vector, and this process would continue indefinitely, making interrupt
> handling utterly useless.
> 
> So, you won't find an explicit instruction in the code path from the
> vectors to the IPI handler that disables interrupts - because it's
> written into the architecture that this is what must happen.
> 
> IRQs are a lower priority than FIQs, so FIQs remain unmasked.

Ah, you're of course right. That's one of the huge differences between
AArch32 and AArch64, where the former has per target mode masking
rules, and the later masks everything on entry...

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path

2022-04-29 Thread Marc Zyngier
On Wed, 27 Apr 2022 23:48:56 +0100,
"Guilherme G. Piccoli"  wrote:
> 
> Currently the regular CPU shutdown path for ARM disables IRQs/FIQs
> in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which
> is responsible for that. This makes sense, since we're turning off
> such CPUs, putting them in an endless busy-wait loop.
> 
> Problem is that there is an alternative path for disabling CPUs,
> in the form of function crash_smp_send_stop(), used for kexec/panic
> paths. This functions relies in a SMP call that also triggers a
> busy-wait loop [at machine_crash_nonpanic_core()], but *without*
> disabling interrupts. This might lead to odd scenarios, like early
> interrupts in the boot of kexec'd kernel or even interrupts in
> other CPUs while the main one still works in the panic path and
> assumes all secondary CPUs are (really!) off.
> 
> This patch mimics the ipi_cpu_stop() interrupt disable mechanism
> in the crash CPU shutdown path, hence disabling IRQs/FIQs in all
> secondary CPUs in the kexec/panic path as well.
> 
> Cc: Marc Zyngier 
> Cc: Russell King 
> Signed-off-by: Guilherme G. Piccoli 
> ---
>  arch/arm/kernel/machine_kexec.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> index f567032a09c0..ef788ee00519 100644
> --- a/arch/arm/kernel/machine_kexec.c
> +++ b/arch/arm/kernel/machine_kexec.c
> @@ -86,6 +86,9 @@ void machine_crash_nonpanic_core(void *unused)
>   set_cpu_online(smp_processor_id(), false);
>   atomic_dec(&waiting_for_crash_ipi);
>  
> + local_fiq_disable();
> + local_irq_disable();
> +

My expectations would be that, since we're getting here using an IPI,
interrupts are already masked. So what reenabled them the first place?

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 2/9] ARM: PXA: Kill use of irq_create_strict_mappings()

2021-04-27 Thread Marc Zyngier
Hi Guenter,

Thanks for the heads up.

On Mon, 26 Apr 2021 23:39:42 +0100,
Guenter Roeck  wrote:
> 
> On Tue, Apr 06, 2021 at 10:35:50AM +0100, Marc Zyngier wrote:
> > irq_create_strict_mappings() is a poor way to allow the use of
> > a linear IRQ domain as a legacy one. Let's be upfront about
> > it and use a legacy domain when appropriate.
> > 
> > Signed-off-by: Marc Zyngier 
> > ---
> 
> When running the "mainstone" qemu emulation, this patch results
> in many (32, actually) runtime warnings such as the following.
> 
> [0.528272] [ cut here ]
> [0.528285] WARNING: CPU: 0 PID: 1 at kernel/irq/irqdomain.c:550 
> irq_domain_associate+0x194/0x1f0
> [0.528315] error: virq335 is not allocated

[...]

This looks like a case of CONFIG_SPARSE_IRQ, combined with a lack of
brain engagement. I've come up with the following patch, which lets
the kernel boot in QEMU without screaming (other than the lack of a
rootfs...).

Please let me know if this helps.

Thanks,

    M.

From 4d7f6ddbbfdff1c9f029bafca79020d3294dc32c Mon Sep 17 00:00:00 2001
From: Marc Zyngier 
Date: Tue, 27 Apr 2021 09:00:28 +0100
Subject: [PATCH] ARM: PXA: Fix cplds irqdesc allocation when using legacy mode

The Mainstone PXA platform uses CONFIG_SPARSE_IRQ, and thus we
cannot rely on the irq descriptors to be readilly allocated
before creating the irqdomain in legacy mode. The kernel then
complains loudly about not being able to associate the interrupt
in the domain -- can't blame it.

Fix it by allocating the irqdescs upfront in the legacy case.

Fixes: b68761da0111 ("ARM: PXA: Kill use of irq_create_strict_mappings()")
Reported-by: Guenter Roeck 
Signed-off-by: Marc Zyngier 
---
 arch/arm/mach-pxa/pxa_cplds_irqs.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-pxa/pxa_cplds_irqs.c 
b/arch/arm/mach-pxa/pxa_cplds_irqs.c
index ec0d9b094744..bddfc7cd5d40 100644
--- a/arch/arm/mach-pxa/pxa_cplds_irqs.c
+++ b/arch/arm/mach-pxa/pxa_cplds_irqs.c
@@ -121,8 +121,13 @@ static int cplds_probe(struct platform_device *pdev)
return fpga->irq;
 
base_irq = platform_get_irq(pdev, 1);
-   if (base_irq < 0)
+   if (base_irq < 0) {
base_irq = 0;
+   } else {
+   ret = devm_irq_alloc_descs(&pdev->dev, base_irq, base_irq, 
CPLDS_NB_IRQ, 0);
+   if (ret < 0)
+   return ret;
+   }
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
fpga->base = devm_ioremap_resource(&pdev->dev, res);
-- 
2.30.2


-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 15/31] KVM: PPC: Book3S HV: XIVE: Fix mapping of passthrough interrupts

2021-05-15 Thread Marc Zyngier
On Fri, 14 May 2021 21:51:51 +0100,
Thomas Gleixner  wrote:
> 
> On Fri, Apr 30 2021 at 10:03, Cédric Le Goater wrote:
> 
> CC: +Marc

Thanks Thomas.

> 
> > PCI MSI interrupt numbers are now mapped in a PCI-MSI domain but the
> > underlying calls handling the passthrough of the interrupt in the
> > guest need a number in the XIVE IRQ domain.
> >
> > Use the IRQ data mapped in the XIVE IRQ domain and not the one in the
> > PCI-MSI domain.
> >
> > Exporting irq_get_default_host() might not be the best solution.
> >
> > Cc: Thomas Gleixner 
> > Cc: Paul Mackerras 
> > Signed-off-by: Cédric Le Goater 
> > ---
> >  arch/powerpc/kvm/book3s_xive.c | 3 ++-
> >  kernel/irq/irqdomain.c | 1 +
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> > index 3a7da42bed57..81b9f4fc3978 100644
> > --- a/arch/powerpc/kvm/book3s_xive.c
> > +++ b/arch/powerpc/kvm/book3s_xive.c
> > @@ -861,7 +861,8 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned 
> > long guest_irq,
> > struct kvmppc_xive *xive = kvm->arch.xive;
> > struct kvmppc_xive_src_block *sb;
> > struct kvmppc_xive_irq_state *state;
> > -   struct irq_data *host_data = irq_get_irq_data(host_irq);
> > +   struct irq_data *host_data =
> > +   irq_domain_get_irq_data(irq_get_default_host(), host_irq);
> > unsigned int hw_irq = (unsigned int)irqd_to_hwirq(host_data);
> > u16 idx;
> > u8 prio;
> > diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> > index d10ab1d689d5..8a073d1ce611 100644
> > --- a/kernel/irq/irqdomain.c
> > +++ b/kernel/irq/irqdomain.c
> > @@ -481,6 +481,7 @@ struct irq_domain *irq_get_default_host(void)
> >  {
> > return irq_default_domain;
> >  }
> > +EXPORT_SYMBOL_GPL(irq_get_default_host);
> >  
> >  static void irq_domain_clear_mapping(struct irq_domain *domain,
> >  irq_hw_number_t hwirq)
> 

Is there any reason why we should add more users of the "default host"
fallback? I would really hope that new code would actually track their
irqdomain in a more fine-grained way, specially when using the
hierarchical MSi setup, which seems to be the goal of this series.

Don't you have enough topology information that you can make use of to
correctly assign a domain identifier (of_node or otherwise)?

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.15-1 tag

2021-09-03 Thread Marc Zyngier
Hi Michael,

On Fri, 03 Sep 2021 14:36:57 +0100,
Michael Ellerman  wrote:
>
> Hi Linus,
> 
> Please pull powerpc updates for 5.15.
> 
> A bit of a small batch this time.
> 
> There was one conflict against my own fixes branch, and the resolution was a 
> little bit
> messy, so I just did a merge of fixes myself to resolve the conflict. I 
> didn't think there
> was any value in having you resolve a conflict between two of my own branches.
> 
> Notable out of area changes:
>   scripts/mod/modpost.c   # 1e688dd2a3d6 powerpc/bug: Provide 
> better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
>   kernel/irq/irqdomain.c  # 51be9e51a800 KVM: PPC: Book3S HV: XIVE: Fix 
> mapping of passthrough interrupts
> 
> That second one generated a bit of discussion[1] with tglx and maz,
> who asked if we could avoid adding the export of
> irq_get_default_host(). Cédric replied explaining that we don't
> really have good way to avoid it, but we never heard back from them,
> so in the end I decided to merge it.

Apologies for this, I clearly have lost track of it.

It clearly isn't pretty, but it isn't a deal breaker either. In the
end, it will be easier to address with the code being in the tree.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v5 00/14] PCI: Add support for Apple M1

2021-10-05 Thread Marc Zyngier
On Mon, 04 Oct 2021 21:42:45 +0100,
Linus Walleij  wrote:
> 
> On Mon, Oct 4, 2021 at 9:52 PM Rob Herring  wrote:
> 
> > FYI, I pushed patches 1-3 to kernelCI and didn't see any regressions.
> > I am a bit worried about changes to the DT interrupt parsing and
> > ancient platforms (such as PowerMacs). Most likely there wouldn't be
> > any report until -rc1 or months later on those old systems.
> 
> Lets page the PPC lists to see if someone can test on some powermac.

/me eyes the XServe-G5 that hasn't been powered on in 10 years. What
could possibly go wrong?

M.

-- 
Without deviation from the norm, progress is not possible.


[PATCH 2/5] KVM: mips: Use kvm_get_vcpu() instead of open-coded access

2021-11-05 Thread Marc Zyngier
As we are about to change the way vcpus are allocated, mandate
the use of kvm_get_vcpu() instead of open-coding the access.

Signed-off-by: Marc Zyngier 
---
 arch/mips/kvm/loongson_ipi.c | 4 ++--
 arch/mips/kvm/mips.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/mips/kvm/loongson_ipi.c b/arch/mips/kvm/loongson_ipi.c
index 3681fc8fba38..5d53f32d837c 100644
--- a/arch/mips/kvm/loongson_ipi.c
+++ b/arch/mips/kvm/loongson_ipi.c
@@ -120,7 +120,7 @@ static int loongson_vipi_write(struct loongson_kvm_ipi *ipi,
s->status |= data;
irq.cpu = id;
irq.irq = 6;
-   kvm_vcpu_ioctl_interrupt(kvm->vcpus[id], &irq);
+   kvm_vcpu_ioctl_interrupt(kvm_get_vcpu(kvm, id), &irq);
break;
 
case CORE0_CLEAR_OFF:
@@ -128,7 +128,7 @@ static int loongson_vipi_write(struct loongson_kvm_ipi *ipi,
if (!s->status) {
irq.cpu = id;
irq.irq = -6;
-   kvm_vcpu_ioctl_interrupt(kvm->vcpus[id], &irq);
+   kvm_vcpu_ioctl_interrupt(kvm_get_vcpu(kvm, id), &irq);
}
break;
 
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index ceacca74f808..6228bf396d63 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -479,7 +479,7 @@ int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,
if (irq->cpu == -1)
dvcpu = vcpu;
else
-   dvcpu = vcpu->kvm->vcpus[irq->cpu];
+   dvcpu = kvm_get_vcpu(vcpu->kvm, irq->cpu);
 
if (intr == 2 || intr == 3 || intr == 4 || intr == 6) {
kvm_mips_callbacks->queue_io_int(dvcpu, irq);
-- 
2.30.2



[PATCH 0/5] KVM: Turn the vcpu array into an xarray

2021-11-05 Thread Marc Zyngier
The kvm structure is pretty large. A large portion of it is the vcpu
array, which is 4kB on x86_64 and arm64 as they deal with 512 vcpu
VMs. Of course, hardly anyone runs VMs this big, so this is often a
net waste of memory and cache locality.

A possible approach is to turn the fixed-size array into an xarray,
which results in a net code deletion after a bit of cleanup.

This series is on top of the current linux/master as it touches the
RISC-V implementation. Only tested on arm64.

Marc Zyngier (5):
  KVM: Move wiping of the kvm->vcpus array to common code
  KVM: mips: Use kvm_get_vcpu() instead of open-coded access
  KVM: s390: Use kvm_get_vcpu() instead of open-coded access
  KVM: x86: Use kvm_get_vcpu() instead of open-coded access
  KVM: Convert the kvm->vcpus array to a xarray

 arch/arm64/kvm/arm.c   | 10 +-
 arch/mips/kvm/loongson_ipi.c   |  4 ++--
 arch/mips/kvm/mips.c   | 23 ++-
 arch/powerpc/kvm/powerpc.c | 10 +-
 arch/riscv/kvm/vm.c| 10 +-
 arch/s390/kvm/kvm-s390.c   | 26 ++
 arch/x86/kvm/vmx/posted_intr.c |  2 +-
 arch/x86/kvm/x86.c |  9 +
 include/linux/kvm_host.h   |  7 ---
 virt/kvm/kvm_main.c| 33 ++---
 10 files changed, 45 insertions(+), 89 deletions(-)

-- 
2.30.2



[PATCH 3/5] KVM: s390: Use kvm_get_vcpu() instead of open-coded access

2021-11-05 Thread Marc Zyngier
As we are about to change the way vcpus are allocated, mandate
the use of kvm_get_vcpu() instead of open-coding the access.

Signed-off-by: Marc Zyngier 
---
 arch/s390/kvm/kvm-s390.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 7af53b8788fa..4a0f62b03964 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4572,7 +4572,7 @@ int kvm_s390_vcpu_start(struct kvm_vcpu *vcpu)
}
 
for (i = 0; i < online_vcpus; i++) {
-   if (!is_vcpu_stopped(vcpu->kvm->vcpus[i]))
+   if (!is_vcpu_stopped(kvm_get_vcpu(vcpu->kvm, i)))
started_vcpus++;
}
 
@@ -4634,9 +4634,11 @@ int kvm_s390_vcpu_stop(struct kvm_vcpu *vcpu)
__disable_ibs_on_vcpu(vcpu);
 
for (i = 0; i < online_vcpus; i++) {
-   if (!is_vcpu_stopped(vcpu->kvm->vcpus[i])) {
+   struct kvm_vcpu *tmp = kvm_get_vcpu(vcpu->kvm, i);
+
+   if (!is_vcpu_stopped(tmp)) {
started_vcpus++;
-   started_vcpu = vcpu->kvm->vcpus[i];
+   started_vcpu = tmp;
}
}
 
-- 
2.30.2



[PATCH 4/5] KVM: x86: Use kvm_get_vcpu() instead of open-coded access

2021-11-05 Thread Marc Zyngier
As we are about to change the way vcpus are allocated, mandate
the use of kvm_get_vcpu() instead of open-coding the access.

Signed-off-by: Marc Zyngier 
---
 arch/x86/kvm/vmx/posted_intr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 5f81ef092bd4..82a49720727d 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -272,7 +272,7 @@ int pi_update_irte(struct kvm *kvm, unsigned int host_irq, 
uint32_t guest_irq,
 
if (!kvm_arch_has_assigned_device(kvm) ||
!irq_remapping_cap(IRQ_POSTING_CAP) ||
-   !kvm_vcpu_apicv_active(kvm->vcpus[0]))
+   !kvm_vcpu_apicv_active(kvm_get_vcpu(kvm, 0)))
return 0;
 
idx = srcu_read_lock(&kvm->irq_srcu);
-- 
2.30.2



[PATCH 1/5] KVM: Move wiping of the kvm->vcpus array to common code

2021-11-05 Thread Marc Zyngier
All architectures have similar loops iterating over the vcpus,
freeing one vcpu at a time, and eventually wiping the reference
off the vcpus array. They are also inconsistently taking
the kvm->lock mutex when wiping the references from the array.

Make this code common, which will simplify further changes.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/arm.c   | 10 +-
 arch/mips/kvm/mips.c   | 21 +
 arch/powerpc/kvm/powerpc.c | 10 +-
 arch/riscv/kvm/vm.c| 10 +-
 arch/s390/kvm/kvm-s390.c   | 18 +-
 arch/x86/kvm/x86.c |  9 +
 include/linux/kvm_host.h   |  2 +-
 virt/kvm/kvm_main.c| 20 ++--
 8 files changed, 25 insertions(+), 75 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f5490afe1ebf..75bb7215da03 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -175,19 +175,11 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, 
struct vm_fault *vmf)
  */
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
-   int i;
-
bitmap_free(kvm->arch.pmu_filter);
 
kvm_vgic_destroy(kvm);
 
-   for (i = 0; i < KVM_MAX_VCPUS; ++i) {
-   if (kvm->vcpus[i]) {
-   kvm_vcpu_destroy(kvm->vcpus[i]);
-   kvm->vcpus[i] = NULL;
-   }
-   }
-   atomic_set(&kvm->online_vcpus, 0);
+   kvm_destroy_vcpus(kvm);
 }
 
 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 562aa878b266..ceacca74f808 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -171,25 +171,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return 0;
 }
 
-void kvm_mips_free_vcpus(struct kvm *kvm)
-{
-   unsigned int i;
-   struct kvm_vcpu *vcpu;
-
-   kvm_for_each_vcpu(i, vcpu, kvm) {
-   kvm_vcpu_destroy(vcpu);
-   }
-
-   mutex_lock(&kvm->lock);
-
-   for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
-   kvm->vcpus[i] = NULL;
-
-   atomic_set(&kvm->online_vcpus, 0);
-
-   mutex_unlock(&kvm->lock);
-}
-
 static void kvm_mips_free_gpa_pt(struct kvm *kvm)
 {
/* It should always be safe to remove after flushing the whole range */
@@ -199,7 +180,7 @@ static void kvm_mips_free_gpa_pt(struct kvm *kvm)
 
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
-   kvm_mips_free_vcpus(kvm);
+   kvm_destroy_vcpus(kvm);
kvm_mips_free_gpa_pt(kvm);
 }
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 35e9cccdeef9..492e4a4121cb 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -463,9 +463,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
-   unsigned int i;
-   struct kvm_vcpu *vcpu;
-
 #ifdef CONFIG_KVM_XICS
/*
 * We call kick_all_cpus_sync() to ensure that all
@@ -476,14 +473,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kick_all_cpus_sync();
 #endif
 
-   kvm_for_each_vcpu(i, vcpu, kvm)
-   kvm_vcpu_destroy(vcpu);
+   kvm_destroy_vcpus(kvm);
 
mutex_lock(&kvm->lock);
-   for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
-   kvm->vcpus[i] = NULL;
-
-   atomic_set(&kvm->online_vcpus, 0);
 
kvmppc_core_destroy_vm(kvm);
 
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index 26399df15b63..6af6cde295eb 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -46,15 +46,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
-   int i;
-
-   for (i = 0; i < KVM_MAX_VCPUS; ++i) {
-   if (kvm->vcpus[i]) {
-   kvm_vcpu_destroy(kvm->vcpus[i]);
-   kvm->vcpus[i] = NULL;
-   }
-   }
-   atomic_set(&kvm->online_vcpus, 0);
+   kvm_destroy_vcpus(kvm);
 }
 
 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index c6257f625929..7af53b8788fa 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2819,27 +2819,11 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
free_page((unsigned long)(vcpu->arch.sie_block));
 }
 
-static void kvm_free_vcpus(struct kvm *kvm)
-{
-   unsigned int i;
-   struct kvm_vcpu *vcpu;
-
-   kvm_for_each_vcpu(i, vcpu, kvm)
-   kvm_vcpu_destroy(vcpu);
-
-   mutex_lock(&kvm->lock);
-   for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
-   kvm->vcpus[i] = NULL;
-
-   atomic_set(&kvm->online_vcpus, 0);
-   mutex_unlock(&kvm->lock);
-}
-
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 

[PATCH 5/5] KVM: Convert the kvm->vcpus array to a xarray

2021-11-05 Thread Marc Zyngier
At least on arm64 and x86, the vcpus array is pretty huge (512 entries),
and is mostly empty in most cases (running 512 vcpu VMs is not that
common). This mean that we end-up with a 4kB block of unused memory
in the middle of the kvm structure.

Instead of wasting away this memory, let's use an xarray instead,
which gives us almost the same flexibility as a normal array, but
with a reduced memory usage with smaller VMs.

Signed-off-by: Marc Zyngier 
---
 include/linux/kvm_host.h |  5 +++--
 virt/kvm/kvm_main.c  | 15 +--
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 36967291b8c6..3933d825e28b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -552,7 +553,7 @@ struct kvm {
struct mutex slots_arch_lock;
struct mm_struct *mm; /* userspace tied to this vm */
struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
-   struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
+   struct xarray vcpu_array;
 
/* Used to wait for completion of MMU notifiers.  */
spinlock_t mn_invalidate_lock;
@@ -693,7 +694,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm 
*kvm, int i)
 
/* Pairs with smp_wmb() in kvm_vm_ioctl_create_vcpu.  */
smp_rmb();
-   return kvm->vcpus[i];
+   return xa_load(&kvm->vcpu_array, i);
 }
 
 #define kvm_for_each_vcpu(idx, vcpup, kvm) \
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d83553eeea21..4c18d7911fa5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -461,7 +461,7 @@ void kvm_destroy_vcpus(struct kvm *kvm)
 
mutex_lock(&kvm->lock);
for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
-   kvm->vcpus[i] = NULL;
+   xa_erase(&kvm->vcpu_array, i);
 
atomic_set(&kvm->online_vcpus, 0);
mutex_unlock(&kvm->lock);
@@ -1066,6 +1066,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
mutex_init(&kvm->slots_arch_lock);
spin_lock_init(&kvm->mn_invalidate_lock);
rcuwait_init(&kvm->mn_memslots_update_rcuwait);
+   xa_init(&kvm->vcpu_array);
 
INIT_LIST_HEAD(&kvm->devices);
 
@@ -3661,7 +3662,10 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 
id)
}
 
vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
-   BUG_ON(kvm->vcpus[vcpu->vcpu_idx]);
+   r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, 
GFP_KERNEL_ACCOUNT);
+   BUG_ON(r == -EBUSY);
+   if (r)
+   goto unlock_vcpu_destroy;
 
/* Fill the stats id string for the vcpu */
snprintf(vcpu->stats_id, sizeof(vcpu->stats_id), "kvm-%d/vcpu-%d",
@@ -3671,15 +3675,14 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, 
u32 id)
kvm_get_kvm(kvm);
r = create_vcpu_fd(vcpu);
if (r < 0) {
+   xa_erase(&kvm->vcpu_array, vcpu->vcpu_idx);
kvm_put_kvm_no_destroy(kvm);
goto unlock_vcpu_destroy;
}
 
-   kvm->vcpus[vcpu->vcpu_idx] = vcpu;
-
/*
-* Pairs with smp_rmb() in kvm_get_vcpu.  Write kvm->vcpus
-* before kvm->online_vcpu's incremented value.
+* Pairs with smp_rmb() in kvm_get_vcpu.  Store the vcpu
+* pointer before kvm->online_vcpu's incremented value.
 */
smp_wmb();
atomic_inc(&kvm->online_vcpus);
-- 
2.30.2



Re: [PATCH 1/5] KVM: Move wiping of the kvm->vcpus array to common code

2021-11-06 Thread Marc Zyngier
On Fri, 05 Nov 2021 20:12:12 +,
Sean Christopherson  wrote:
> 
> On Fri, Nov 05, 2021, Marc Zyngier wrote:
> > All architectures have similar loops iterating over the vcpus,
> > freeing one vcpu at a time, and eventually wiping the reference
> > off the vcpus array. They are also inconsistently taking
> > the kvm->lock mutex when wiping the references from the array.
> 
> ...
> 
> > +void kvm_destroy_vcpus(struct kvm *kvm)
> > +{
> > +   unsigned int i;
> > +   struct kvm_vcpu *vcpu;
> > +
> > +   kvm_for_each_vcpu(i, vcpu, kvm)
> > +   kvm_vcpu_destroy(vcpu);
> > +
> > +   mutex_lock(&kvm->lock);
> 
> But why is kvm->lock taken here?  Unless I'm overlooking an arch,
> everyone calls this from kvm_arch_destroy_vm(), in which case this
> is the only remaining reference to @kvm.  And if there's some magic
> path for which that's not true, I don't see how it can possibly be
> safe to call kvm_vcpu_destroy() without holding kvm->lock, or how
> this would guarantee that all vCPUs have actually been destroyed
> before nullifying the array.

I asked myself the same question two years ago, and couldn't really
understand the requirement. However, x86 does just that, so I
preserved the behaviour.

If you too believe that this is just wrong, I'm happy to drop the
locking altogether. If that breaks someone's flow, they'll shout soon
enough.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 5/5] KVM: Convert the kvm->vcpus array to a xarray

2021-11-06 Thread Marc Zyngier
On Fri, 05 Nov 2021 20:21:36 +,
Sean Christopherson  wrote:
> 
> On Fri, Nov 05, 2021, Marc Zyngier wrote:
> > At least on arm64 and x86, the vcpus array is pretty huge (512 entries),
> > and is mostly empty in most cases (running 512 vcpu VMs is not that
> > common). This mean that we end-up with a 4kB block of unused memory
> > in the middle of the kvm structure.
> 
> Heh, x86 is now up to 1024 entries.

Humph. I don't want to know whether people are actually using that in
practice. The only time I create VMs with 512 vcpus is to check
whether it still works...

>  
> > Instead of wasting away this memory, let's use an xarray instead,
> > which gives us almost the same flexibility as a normal array, but
> > with a reduced memory usage with smaller VMs.
> > 
> > Signed-off-by: Marc Zyngier 
> > ---
> > @@ -693,7 +694,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm 
> > *kvm, int i)
> >  
> > /* Pairs with smp_wmb() in kvm_vm_ioctl_create_vcpu.  */
> > smp_rmb();
> > -   return kvm->vcpus[i];
> > +   return xa_load(&kvm->vcpu_array, i);
> >  }
> 
> It'd be nice for this series to convert kvm_for_each_vcpu() to use
> xa_for_each() as well.  Maybe as a patch on top so that potential
> explosions from that are isolated from the initiali conversion?
> 
> Or maybe even use xa_for_each_range() to cap at online_vcpus?
> That's technically a functional change, but IMO it's easier to
> reason about iterating over a snapshot of vCPUs as opposed to being
> able to iterate over vCPUs as their being added.  In practice I
> doubt it matters.
> 
> #define kvm_for_each_vcpu(idx, vcpup, kvm) \
>   xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0, 
> atomic_read(&kvm->online_vcpus))
>

I think that's already the behaviour of this iterator (we stop at the
first empty slot capped to online_vcpus. The only change in behaviour
is that vcpup currently holds a pointer to the last vcpu in no empty
slot has been encountered. xa_for_each{,_range}() would set the
pointer to NULL at all times.

I doubt anyone relies on that, but it is probably worth eyeballing
some of the use cases...

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 5/5] KVM: Convert the kvm->vcpus array to a xarray

2021-11-08 Thread Marc Zyngier

On 2021-11-06 11:48, Marc Zyngier wrote:

On Fri, 05 Nov 2021 20:21:36 +,
Sean Christopherson  wrote:


On Fri, Nov 05, 2021, Marc Zyngier wrote:
> At least on arm64 and x86, the vcpus array is pretty huge (512 entries),
> and is mostly empty in most cases (running 512 vcpu VMs is not that
> common). This mean that we end-up with a 4kB block of unused memory
> in the middle of the kvm structure.

Heh, x86 is now up to 1024 entries.


Humph. I don't want to know whether people are actually using that in
practice. The only time I create VMs with 512 vcpus is to check
whether it still works...



> Instead of wasting away this memory, let's use an xarray instead,
> which gives us almost the same flexibility as a normal array, but
> with a reduced memory usage with smaller VMs.
>
> Signed-off-by: Marc Zyngier 
> ---
> @@ -693,7 +694,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm 
*kvm, int i)
>
>/* Pairs with smp_wmb() in kvm_vm_ioctl_create_vcpu.  */
>smp_rmb();
> -  return kvm->vcpus[i];
> +  return xa_load(&kvm->vcpu_array, i);
>  }

It'd be nice for this series to convert kvm_for_each_vcpu() to use
xa_for_each() as well.  Maybe as a patch on top so that potential
explosions from that are isolated from the initiali conversion?

Or maybe even use xa_for_each_range() to cap at online_vcpus?
That's technically a functional change, but IMO it's easier to
reason about iterating over a snapshot of vCPUs as opposed to being
able to iterate over vCPUs as their being added.  In practice I
doubt it matters.

#define kvm_for_each_vcpu(idx, vcpup, kvm) \
	xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0, 
atomic_read(&kvm->online_vcpus))




I think that's already the behaviour of this iterator (we stop at the
first empty slot capped to online_vcpus. The only change in behaviour
is that vcpup currently holds a pointer to the last vcpu in no empty
slot has been encountered. xa_for_each{,_range}() would set the
pointer to NULL at all times.

I doubt anyone relies on that, but it is probably worth eyeballing
some of the use cases...


This turned out to be an interesting exercise, as we always use an
int for the index, and the xarray iterators insist on an unsigned
long (and even on a pointer to it). On the other hand, I couldn't
spot any case where we'd rely on the last value of the vcpu pointer.

I'll repost the series once we have a solution for patch #4, and
we can then decide whether we want the iterator churn.
--
Jazz is not dead. It just smells funny...


Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates

2021-11-10 Thread Marc Zyngier
HI all,

On Wed, 10 Nov 2021 18:41:06 +,
Bjorn Helgaas  wrote:
> 
> On Wed, Nov 10, 2021 at 07:07:24PM +0100, Christian Zigotzky wrote:
> > On 09 November 2021 at 03:45 pm, Christian Zigotzky wrote:
> > > Hello,
> > >
> > > The Nemo board [1] doesn't recognize any ATA disks with the pci-v5.16
> > updates [2].
> > >
> > > Error messages:
> > >
> > > ata4.00: gc timeout cmd 0xec
> > > ata4.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> > > ata1.00: gc timeout cmd 0xec
> > > ata1.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> > > ata3.00: gc timeout cmd 0xec
> > > ata3.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> > >
> > > I was able to revert the new pci-v5.16 updates [2]. After a new
> > compiling, the kernel recognize all ATA disks correctly.
> > >
> > > Could you please check the pci-v5.16 updates [2]?
> > >
> > > Please find attached the kernel config.
> > >
> > > Thanks,
> > > Christian
> > >
> > > [1] https://en.wikipedia.org/wiki/AmigaOne_X1000
> > > [2] 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c5c62ddf88c34bc83b66e4ac9beb2bb0e1887d4
> > 
> > Hi All,
> > 
> > Many thanks for your nice responses.
> > 
> > I bisected today [1]. 0412841812265734c306ba5ef8088bcb64d5d3bd (of/irq:
> > Allow matching of an interrupt-map local to an interrupt controller) [2] is
> > the first bad commit.
> > 
> > I was able to revert the first bad commit [1]. After a new compiling, the
> > kernel detects all ATA disks without any problems.
> > 
> > I created a patch for an easy reverting the bad commit [1]. With this patch
> > we can do further our kernel tests.
> > 
> > Could you please check the first bad commit [2]?
> > 
> > Thanks,
> > Christian
> > 
> > [1] https://forum.hyperion-entertainment.com/viewtopic.php?p=54398#p54398
> > [2] 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0412841812265734c306ba5ef8088bcb64d5d3bd
> > 
> > [+ Marc Zyngier, Alyssa Rosenzweig, Lorenzo Pieralisi, and Rob Herring
> > because of the first bad commit]
> 
> Thank you very much for the bisection and for also testing the revert!
> 
> It's easy enough to revert 041284181226 ("of/irq: Allow matching of an
> interrupt-map local to an interrupt controller"), and it seems like
> that's what we need to do.  I have it tentatively queued up.
> 
> That commit was part of the new support for the Apple M1 PCIe
> interface, and I don't know what effect a revert will have on that
> support.  Marc, Alyssa?

It is going to badly break the M1 support, as we won't be able to take
interrupts to detect that the PCIe link is up.

Before we apply a full blown revert and decide that this isn't
workable (and revert the whole M1 PCIe series, because they are
otherwise somewhat pointless), I'd like to understand *what* breaks
exactly.

Christian, could you point me to the full DT that this machine uses?
This would help understanding what goes wrong, and cook something for
you to test.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates

2021-11-10 Thread Marc Zyngier
On Thu, 11 Nov 2021 05:24:52 +,
Christian Zigotzky  wrote:
> 
> On 10 November 2021 at 08:09 pm, Marc Zyngier wrote:
> > HI all,
> > 
> > On Wed, 10 Nov 2021 18:41:06 +,
> > Bjorn Helgaas  wrote:
> >> On Wed, Nov 10, 2021 at 07:07:24PM +0100, Christian Zigotzky wrote:
> >>> On 09 November 2021 at 03:45 pm, Christian Zigotzky wrote:
> >>>> Hello,
> >>>> 
> >>>> The Nemo board [1] doesn't recognize any ATA disks with the pci-v5.16
> >>> updates [2].
> >>>> Error messages:
> >>>> 
> >>>> ata4.00: gc timeout cmd 0xec
> >>>> ata4.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> >>>> ata1.00: gc timeout cmd 0xec
> >>>> ata1.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> >>>> ata3.00: gc timeout cmd 0xec
> >>>> ata3.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> >>>> 
> >>>> I was able to revert the new pci-v5.16 updates [2]. After a new
> >>> compiling, the kernel recognize all ATA disks correctly.
> >>>> Could you please check the pci-v5.16 updates [2]?
> >>>> 
> >>>> Please find attached the kernel config.
> >>>> 
> >>>> Thanks,
> >>>> Christian
> >>>> 
> >>>> [1] https://en.wikipedia.org/wiki/AmigaOne_X1000
> >>>> [2] 
> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c5c62ddf88c34bc83b66e4ac9beb2bb0e1887d4
> >>> Hi All,
> >>> 
> >>> Many thanks for your nice responses.
> >>> 
> >>> I bisected today [1]. 0412841812265734c306ba5ef8088bcb64d5d3bd (of/irq:
> >>> Allow matching of an interrupt-map local to an interrupt controller) [2] 
> >>> is
> >>> the first bad commit.
> >>> 
> >>> I was able to revert the first bad commit [1]. After a new compiling, the
> >>> kernel detects all ATA disks without any problems.
> >>> 
> >>> I created a patch for an easy reverting the bad commit [1]. With this 
> >>> patch
> >>> we can do further our kernel tests.
> >>> 
> >>> Could you please check the first bad commit [2]?
> >>> 
> >>> Thanks,
> >>> Christian
> >>> 
> >>> [1] https://forum.hyperion-entertainment.com/viewtopic.php?p=54398#p54398
> >>> [2] 
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0412841812265734c306ba5ef8088bcb64d5d3bd
> >>> 
> >>> [+ Marc Zyngier, Alyssa Rosenzweig, Lorenzo Pieralisi, and Rob Herring
> >>> because of the first bad commit]
> >> Thank you very much for the bisection and for also testing the revert!
> >> 
> >> It's easy enough to revert 041284181226 ("of/irq: Allow matching of an
> >> interrupt-map local to an interrupt controller"), and it seems like
> >> that's what we need to do.  I have it tentatively queued up.
> >> 
> >> That commit was part of the new support for the Apple M1 PCIe
> >> interface, and I don't know what effect a revert will have on that
> >> support.  Marc, Alyssa?
> > It is going to badly break the M1 support, as we won't be able to take
> > interrupts to detect that the PCIe link is up.
> > 
> > Before we apply a full blown revert and decide that this isn't
> > workable (and revert the whole M1 PCIe series, because they are
> > otherwise somewhat pointless), I'd like to understand *what* breaks
> > exactly.
> > 
> > Christian, could you point me to the full DT that this machine uses?
> > This would help understanding what goes wrong, and cook something for
> > you to test.
> > 
> > Thanks,
> > 
> > M.
> > 
> Hello Marc,
> 
> Here you are:
> https://forum.hyperion-entertainment.com/viewtopic.php?p=54406#p54406

This is not what I asked. I need the actual source file, or at the
very least the compiled object (the /sys/firmware/fdt file, for
example). Not an interpretation that I can't feed to the kernel.

Without this, I can't debug your problem.

> We are very happy to have the patch for reverting the bad commit
> because we want to test the new PASEMI i2c driver with support for the
> Apple M1 [1] on our Nemo boards.

You can revert the patch on your own. At this stage, we're not blindly
reverting things in the tree, but instead trying to understand what
is happening on your particular system.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates

2021-11-11 Thread Marc Zyngier
On Thu, 11 Nov 2021 07:47:08 +,
Christian Zigotzky  wrote:
> 
> On 11 November 2021 at 08:13 am, Marc Zyngier wrote:
> > On Thu, 11 Nov 2021 05:24:52 +,
> > Christian Zigotzky  wrote:
> >> Hello Marc,
> >> 
> >> Here you are:
> >> https://forum.hyperion-entertainment.com/viewtopic.php?p=54406#p54406
> > This is not what I asked. I need the actual source file, or at the
> > very least the compiled object (the /sys/firmware/fdt file, for
> > example). Not an interpretation that I can't feed to the kernel.
> > 
> > Without this, I can't debug your problem.
> > 
> >> We are very happy to have the patch for reverting the bad commit
> >> because we want to test the new PASEMI i2c driver with support for the
> >> Apple M1 [1] on our Nemo boards.
> > You can revert the patch on your own. At this stage, we're not blindly
> > reverting things in the tree, but instead trying to understand what
> > is happening on your particular system.
> > 
> > Thanks,
> > 
> > M.
> > 
> I added a download link for the fdt file to the post [1]. Please read
> also Darren's comments in this post.

Thanks for that. The DT looks absolutely bonkers, no wonder that
things break with something like that.

But Darren's comments made me jump a bit, and I quote them here for
everyone to see:


[...]

The dtb passed by the CFE firmware has a number of issues, which up till
now have been fixed by use of patches applied to the mainline kernel.
This occasionally causes problems with changes made to mainline.

[...]


Am I right in understanding that the upstream kernel does not support
the machine out of the box, and that you actually have to apply out of
tree patches to make it work? That these patches have to do with the
IRQ routing?

If so, I wonder why upstream should revert a patch to work on a system
that isn't supported upstream the first place. I will still try and
come up with a solution for you. But asking for the revert of a patch
on these grounds is not, IMHO, acceptable. Also, please provide these
patches on the list so that I can help you to some extend (and I mean
*on the list*, not on a random forum that collects my information).

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates

2021-11-11 Thread Marc Zyngier
On Thu, 11 Nov 2021 10:44:30 +,
Christian Zigotzky  wrote:
> 
> On 11 November 2021 at 11:20 am, Marc Zyngier wrote:
> > On Thu, 11 Nov 2021 07:47:08 +,
> > Christian Zigotzky  wrote:
> >> On 11 November 2021 at 08:13 am, Marc Zyngier wrote:
> >>> On Thu, 11 Nov 2021 05:24:52 +,
> >>> Christian Zigotzky  wrote:
> >>>> Hello Marc,
> >>>> 
> >>>> Here you are:
> >>>> https://forum.hyperion-entertainment.com/viewtopic.php?p=54406#p54406
> >>> This is not what I asked. I need the actual source file, or at the
> >>> very least the compiled object (the /sys/firmware/fdt file, for
> >>> example). Not an interpretation that I can't feed to the kernel.
> >>> 
> >>> Without this, I can't debug your problem.
> >>> 
> >>>> We are very happy to have the patch for reverting the bad commit
> >>>> because we want to test the new PASEMI i2c driver with support for the
> >>>> Apple M1 [1] on our Nemo boards.
> >>> You can revert the patch on your own. At this stage, we're not blindly
> >>> reverting things in the tree, but instead trying to understand what
> >>> is happening on your particular system.
> >>> 
> >>> Thanks,
> >>> 
> >>>   M.
> >>> 
> >> I added a download link for the fdt file to the post [1]. Please read
> >> also Darren's comments in this post.
> > 
> > Am I right in understanding that the upstream kernel does not support
> > the machine out of the box, and that you actually have to apply out of
> > tree patches to make it work?
> No, you aren't right. The upstream kernel supports the Nemo board out
> of the box. [1]

That's not the way I interpret Darren's comments, but you surely know
better than I do.

I'll try to come up with something for you to test shortly.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates

2021-11-11 Thread Marc Zyngier
On Wed, 10 Nov 2021 18:07:24 +,
Christian Zigotzky  wrote:
> 
> On 09 November 2021 at 03:45 pm, Christian Zigotzky wrote:
> > Hello,
> >
> > The Nemo board [1] doesn't recognize any ATA disks with the
> pci-v5.16 updates [2].
> >
> > Error messages:
> >
> > ata4.00: gc timeout cmd 0xec
> > ata4.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> > ata1.00: gc timeout cmd 0xec
> > ata1.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> > ata3.00: gc timeout cmd 0xec
> > ata3.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> >
> > I was able to revert the new pci-v5.16 updates [2]. After a new
> compiling, the kernel recognize all ATA disks correctly.
> >
> > Could you please check the pci-v5.16 updates [2]?
> >
> > Please find attached the kernel config.
> >
> > Thanks,
> > Christian
> >
> > [1] https://en.wikipedia.org/wiki/AmigaOne_X1000
> > [2]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c5c62ddf88c34bc83b66e4ac9beb2bb0e1887d4
> 
> Hi All,
> 
> Many thanks for your nice responses.
> 
> I bisected today [1]. 0412841812265734c306ba5ef8088bcb64d5d3bd
> (of/irq: Allow matching of an interrupt-map local to an interrupt
> controller) [2] is the first bad commit.

Can you please give the following hack a go and post the result
(including the full dmesg)?

Thanks,

M.
diff --git a/drivers/of/irq.c b/drivers/of/irq.c
index 32be5a03951f..8cf0cc9b7caf 100644
--- a/drivers/of/irq.c
+++ b/drivers/of/irq.c
@@ -156,14 +156,15 @@ int of_irq_parse_raw(const __be32 *addr, struct 
of_phandle_args *out_irq)
 
/* Now start the actual "proper" walk of the interrupt tree */
while (ipar != NULL) {
+   bool intc = of_property_read_bool(ipar, "interrupt-controller");
+
/*
 * Now check if cursor is an interrupt-controller and
 * if it is then we are done, unless there is an
 * interrupt-map which takes precedence.
 */
imap = of_get_property(ipar, "interrupt-map", &imaplen);
-   if (imap == NULL &&
-   of_property_read_bool(ipar, "interrupt-controller")) {
+   if (imap == NULL && intc) {
pr_debug(" -> got it !\n");
return 0;
}
@@ -244,8 +245,14 @@ int of_irq_parse_raw(const __be32 *addr, struct 
of_phandle_args *out_irq)
 
pr_debug(" -> imaplen=%d\n", imaplen);
}
-   if (!match)
+   if (!match) {
+   if (intc) {
+   pr_info("%pOF interrupt-map failed, using 
interrupt-controller\n", ipar);
+   return 0;
+   }
+
goto fail;
+   }
 
/*
 * Successfully parsed an interrrupt-map translation; copy new

-- 
Without deviation from the norm, progress is not possible.


Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates

2021-11-12 Thread Marc Zyngier
On Fri, 12 Nov 2021 09:40:30 +,
Christian Zigotzky  wrote:
> 
> On 11 November 2021 at 06:39 pm, Marc Zyngier wrote:
> > On Wed, 10 Nov 2021 18:07:24 +,
> > Christian Zigotzky  wrote:
> >> On 09 November 2021 at 03:45 pm, Christian Zigotzky wrote:
> >>> Hello,
> >>> 
> >>> The Nemo board [1] doesn't recognize any ATA disks with the
> >> pci-v5.16 updates [2].
> >>> Error messages:
> >>> 
> >>> ata4.00: gc timeout cmd 0xec
> >>> ata4.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> >>> ata1.00: gc timeout cmd 0xec
> >>> ata1.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> >>> ata3.00: gc timeout cmd 0xec
> >>> ata3.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> >>> 
> >>> I was able to revert the new pci-v5.16 updates [2]. After a new
> >> compiling, the kernel recognize all ATA disks correctly.
> >>> Could you please check the pci-v5.16 updates [2]?
> >>> 
> >>> Please find attached the kernel config.
> >>> 
> >>> Thanks,
> >>> Christian
> >>> 
> >>> [1] https://en.wikipedia.org/wiki/AmigaOne_X1000
> >>> [2]
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c5c62ddf88c34bc83b66e4ac9beb2bb0e1887d4
> >> 
> >> Hi All,
> >> 
> >> Many thanks for your nice responses.
> >> 
> >> I bisected today [1]. 0412841812265734c306ba5ef8088bcb64d5d3bd
> >> (of/irq: Allow matching of an interrupt-map local to an interrupt
> >> controller) [2] is the first bad commit.
> > Can you please give the following hack a go and post the result
> > (including the full dmesg)?
> > 
> > Thanks,
> > 
> > M.
> > diff --git a/drivers/of/irq.c b/drivers/of/irq.c
> > index 32be5a03951f..8cf0cc9b7caf 100644
> > --- a/drivers/of/irq.c
> > +++ b/drivers/of/irq.c
> > @@ -156,14 +156,15 @@ int of_irq_parse_raw(const __be32 *addr, struct 
> > of_phandle_args *out_irq)
> > /* Now start the actual "proper" walk of the interrupt tree */
> > while (ipar != NULL) {
> > +   bool intc = of_property_read_bool(ipar, "interrupt-controller");
> > +
> > /*
> >  * Now check if cursor is an interrupt-controller and
> >  * if it is then we are done, unless there is an
> >  * interrupt-map which takes precedence.
> >  */
> > imap = of_get_property(ipar, "interrupt-map", &imaplen);
> > -   if (imap == NULL &&
> > -   of_property_read_bool(ipar, "interrupt-controller")) {
> > +   if (imap == NULL && intc) {
> > pr_debug(" -> got it !\n");
> > return 0;
> > }
> > @@ -244,8 +245,14 @@ int of_irq_parse_raw(const __be32 *addr, struct 
> > of_phandle_args *out_irq)
> > pr_debug(" -> imaplen=%d\n", imaplen);
> > }
> > -   if (!match)
> > +   if (!match) {
> > +   if (intc) {
> > +   pr_info("%pOF interrupt-map failed, using 
> > interrupt-controller\n", ipar);
> > +   return 0;
> > +   }
> > +
> > goto fail;
> > +   }
> > /*
> >  * Successfully parsed an interrrupt-map translation; copy new
> > 
> The detecting of the ATA disks works with this patch! Well done!
> Thanks a lot!

Thanks for testing it. I'll turn that into a proper patch.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates

2021-11-12 Thread Marc Zyngier
On Fri, 12 Nov 2021 14:15:18 +,
Christian Zigotzky  wrote:
> 
> On 12 November 2021 at 02:41 pm, Marc Zyngier wrote:
> > On Fri, 12 Nov 2021 09:40:30 +,
> > Christian Zigotzky  wrote:
> >> On 11 November 2021 at 06:39 pm, Marc Zyngier wrote:
> >>> On Wed, 10 Nov 2021 18:07:24 +,
> >>> Christian Zigotzky  wrote:
> >>>> On 09 November 2021 at 03:45 pm, Christian Zigotzky wrote:
> >>>>> Hello,
> >>>>> 
> >>>>> The Nemo board [1] doesn't recognize any ATA disks with the
> >>>> pci-v5.16 updates [2].
> >>>>> Error messages:
> >>>>> 
> >>>>> ata4.00: gc timeout cmd 0xec
> >>>>> ata4.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> >>>>> ata1.00: gc timeout cmd 0xec
> >>>>> ata1.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> >>>>> ata3.00: gc timeout cmd 0xec
> >>>>> ata3.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> >>>>> 
> >>>>> I was able to revert the new pci-v5.16 updates [2]. After a new
> >>>> compiling, the kernel recognize all ATA disks correctly.
> >>>>> Could you please check the pci-v5.16 updates [2]?
> >>>>> 
> >>>>> Please find attached the kernel config.
> >>>>> 
> >>>>> Thanks,
> >>>>> Christian
> >>>>> 
> >>>>> [1] https://en.wikipedia.org/wiki/AmigaOne_X1000
> >>>>> [2]
> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c5c62ddf88c34bc83b66e4ac9beb2bb0e1887d4
> >>>> 
> >>>> Hi All,
> >>>> 
> >>>> Many thanks for your nice responses.
> >>>> 
> >>>> I bisected today [1]. 0412841812265734c306ba5ef8088bcb64d5d3bd
> >>>> (of/irq: Allow matching of an interrupt-map local to an interrupt
> >>>> controller) [2] is the first bad commit.
> >>> Can you please give the following hack a go and post the result
> >>> (including the full dmesg)?
> >>> 
> >>> Thanks,
> >>> 
> >>>   M.
> >>> diff --git a/drivers/of/irq.c b/drivers/of/irq.c
> >>> index 32be5a03951f..8cf0cc9b7caf 100644
> >>> --- a/drivers/of/irq.c
> >>> +++ b/drivers/of/irq.c
> >>> @@ -156,14 +156,15 @@ int of_irq_parse_raw(const __be32 *addr, struct 
> >>> of_phandle_args *out_irq)
> >>>   /* Now start the actual "proper" walk of the interrupt tree */
> >>>   while (ipar != NULL) {
> >>> + bool intc = of_property_read_bool(ipar, "interrupt-controller");
> >>> +
> >>>   /*
> >>>* Now check if cursor is an interrupt-controller and
> >>>* if it is then we are done, unless there is an
> >>>* interrupt-map which takes precedence.
> >>>*/
> >>>   imap = of_get_property(ipar, "interrupt-map", &imaplen);
> >>> - if (imap == NULL &&
> >>> - of_property_read_bool(ipar, "interrupt-controller")) {
> >>> + if (imap == NULL && intc) {
> >>>   pr_debug(" -> got it !\n");
> >>>   return 0;
> >>>   }
> >>> @@ -244,8 +245,14 @@ int of_irq_parse_raw(const __be32 *addr, struct 
> >>> of_phandle_args *out_irq)
> >>>   pr_debug(" -> imaplen=%d\n", imaplen);
> >>>   }
> >>> - if (!match)
> >>> + if (!match) {
> >>> + if (intc) {
> >>> + pr_info("%pOF interrupt-map failed, using 
> >>> interrupt-controller\n", ipar);
> >>> + return 0;
> >>> + }
> >>> +
> >>>   goto fail;
> >>> + }
> >>>   /*
> >>>* Successfully parsed an interrrupt-map translation; 
> >>> copy new
> >>> 
> >> The detecting of the ATA disks works with this patch! Well done!
> >> Thanks a lot!
> > Thanks for testing it. I'll turn that into a proper patch.
> > 
> > M.
> > 
> Could you please explain your patch?

Please refer to the commit message[1].

> I am not a developer. I work for the A-EON Linux FLS.

I have no idea what this is, unfortunately.

M.

[1] https://lore.kernel.org/r/2022143644.434995-1-...@kernel.org

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 16/39] irqdomain: Make normal and nomap irqdomains exclusive

2021-11-16 Thread Marc Zyngier
On Mon, 15 Nov 2021 19:05:17 +,
Cédric Le Goater  wrote:
> 
> Hello Mark,

s/k/c/

> 
> On 5/20/21 18:37, Marc Zyngier wrote:
> > Direct mappings are completely exclusive of normal mappings, meaning
> > that we can refactor the code slightly so that we can get rid of
> > the revmap_direct_max_irq field and use the revmap_size field
> > instead, reducing the size of the irqdomain structure.
> > 
> > Signed-off-by: Marc Zyngier 
> 
> 
> This patch is breaking the POWER9/POWER10 XIVE driver (these are not
> old PPC systems :) on machines sharing the same LSI HW IRQ. For instance,
> a linux KVM guest with a virtio-rng and a virtio-balloon device. In that
> case, Linux creates two distinct IRQ mappings which can lead to some
> unexpected behavior.

Either the irq domain translates, or it doesn't. If the driver creates
a nomap domain, and yet expects some sort of translation to happen,
then the driver is fundamentally broken. And even without that: how do
you end-up with a single HW interrupt having two mappings?

> A fix to go forward would be to change the XIVE IRQ domain to use a
> 'Tree' domain for reverse mapping and not the 'No Map' domain mapping.
> I will keep you updated for XIVE.

I bet there is a bit more to it. From what you are saying above,
something rather ungodly is happening in the XIVE code.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 0/5] KVM: Turn the vcpu array into an xarray

2021-11-16 Thread Marc Zyngier
On Tue, 16 Nov 2021 15:03:40 +,
Paolo Bonzini  wrote:
> 
> On 11/5/21 20:20, Marc Zyngier wrote:
> > The kvm structure is pretty large. A large portion of it is the vcpu
> > array, which is 4kB on x86_64 and arm64 as they deal with 512 vcpu
> > VMs. Of course, hardly anyone runs VMs this big, so this is often a
> > net waste of memory and cache locality.
> > 
> > A possible approach is to turn the fixed-size array into an xarray,
> > which results in a net code deletion after a bit of cleanup.
> > 
> > This series is on top of the current linux/master as it touches the
> > RISC-V implementation. Only tested on arm64.
> 
> Queued, only locally until I get a review for my replacement of patch
> 4 (see
> https://lore.kernel.org/kvm/2026142205.719375-1-pbonz...@redhat.com/T/).

In which case, let me send a v2 with the changes that we discussed
with Sean. It will still have my version of patch 4, but that's
nothing you can't fix.

M.

-- 
Without deviation from the norm, progress is not possible.


[PATCH v2 0/7] KVM: Turn the vcpu array into an xarray

2021-11-16 Thread Marc Zyngier
The kvm structure is pretty large. A large portion of it is the vcpu
array, which is 4kB on arm64 with 512 vcpu, double that on x86-64.  Of
course, hardly anyone runs VMs this big, so this is often a net waste
of memory and cache locality.

A possible approach is to turn the fixed-size array into an xarray,
which results in a net code deletion after a bit of cleanup.

* From v1:
  - Rebased on v5.16-rc1
  - Dropped the dubious locking on teardown
  - Converted kvm_for_each_vcpu() to xa_for_each_range(), together with
an invasive change converting the index to an unsigned long

Marc Zyngier (7):
  KVM: Move wiping of the kvm->vcpus array to common code
  KVM: mips: Use kvm_get_vcpu() instead of open-coded access
  KVM: s390: Use kvm_get_vcpu() instead of open-coded access
  KVM: x86: Use kvm_get_vcpu() instead of open-coded access
  KVM: Convert the kvm->vcpus array to a xarray
  KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s index
  KVM: Convert kvm_for_each_vcpu() to using xa_for_each_range()

 arch/arm64/kvm/arch_timer.c   |  8 ++---
 arch/arm64/kvm/arm.c  | 16 +++--
 arch/arm64/kvm/pmu-emul.c |  2 +-
 arch/arm64/kvm/psci.c |  6 ++--
 arch/arm64/kvm/reset.c|  2 +-
 arch/arm64/kvm/vgic/vgic-init.c   | 10 +++---
 arch/arm64/kvm/vgic/vgic-kvm-device.c |  2 +-
 arch/arm64/kvm/vgic/vgic-mmio-v2.c|  3 +-
 arch/arm64/kvm/vgic/vgic-mmio-v3.c|  7 ++--
 arch/arm64/kvm/vgic/vgic-v3.c |  4 +--
 arch/arm64/kvm/vgic/vgic-v4.c |  5 +--
 arch/arm64/kvm/vgic/vgic.c|  2 +-
 arch/mips/kvm/loongson_ipi.c  |  4 +--
 arch/mips/kvm/mips.c  | 23 ++---
 arch/powerpc/kvm/book3s_32_mmu.c  |  2 +-
 arch/powerpc/kvm/book3s_64_mmu.c  |  2 +-
 arch/powerpc/kvm/book3s_hv.c  |  8 ++---
 arch/powerpc/kvm/book3s_pr.c  |  2 +-
 arch/powerpc/kvm/book3s_xics.c|  6 ++--
 arch/powerpc/kvm/book3s_xics.h|  2 +-
 arch/powerpc/kvm/book3s_xive.c| 15 +
 arch/powerpc/kvm/book3s_xive.h|  4 +--
 arch/powerpc/kvm/book3s_xive_native.c |  8 ++---
 arch/powerpc/kvm/e500_emulate.c   |  2 +-
 arch/powerpc/kvm/powerpc.c| 10 +-
 arch/riscv/kvm/vcpu_sbi.c |  2 +-
 arch/riscv/kvm/vm.c   | 10 +-
 arch/riscv/kvm/vmid.c |  2 +-
 arch/s390/kvm/interrupt.c |  2 +-
 arch/s390/kvm/kvm-s390.c  | 47 ++-
 arch/s390/kvm/kvm-s390.h  |  4 +--
 arch/x86/kvm/hyperv.c |  7 ++--
 arch/x86/kvm/i8254.c  |  2 +-
 arch/x86/kvm/i8259.c  |  5 +--
 arch/x86/kvm/ioapic.c |  4 +--
 arch/x86/kvm/irq_comm.c   |  7 ++--
 arch/x86/kvm/kvm_onhyperv.c   |  3 +-
 arch/x86/kvm/lapic.c  |  6 ++--
 arch/x86/kvm/svm/avic.c   |  2 +-
 arch/x86/kvm/svm/sev.c|  3 +-
 arch/x86/kvm/vmx/posted_intr.c|  2 +-
 arch/x86/kvm/x86.c| 30 +++--
 include/linux/kvm_host.h  | 17 +-
 virt/kvm/kvm_main.c   | 41 ---
 44 files changed, 158 insertions(+), 193 deletions(-)

-- 
2.30.2



[PATCH v2 1/7] KVM: Move wiping of the kvm->vcpus array to common code

2021-11-16 Thread Marc Zyngier
All architectures have similar loops iterating over the vcpus,
freeing one vcpu at a time, and eventually wiping the reference
off the vcpus array. They are also inconsistently taking
the kvm->lock mutex when wiping the references from the array.

Make this code common, which will simplify further changes.
The locking is dropped altogether, as this should only be called
when there is no further references on the kvm structure.

Reviewed-by: Claudio Imbrenda 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/arm.c   | 10 +-
 arch/mips/kvm/mips.c   | 21 +
 arch/powerpc/kvm/powerpc.c | 10 +-
 arch/riscv/kvm/vm.c| 10 +-
 arch/s390/kvm/kvm-s390.c   | 18 +-
 arch/x86/kvm/x86.c |  9 +
 include/linux/kvm_host.h   |  2 +-
 virt/kvm/kvm_main.c| 17 +++--
 8 files changed, 22 insertions(+), 75 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2f03cbfefe67..29942d8c357c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -175,19 +175,11 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, 
struct vm_fault *vmf)
  */
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
-   int i;
-
bitmap_free(kvm->arch.pmu_filter);
 
kvm_vgic_destroy(kvm);
 
-   for (i = 0; i < KVM_MAX_VCPUS; ++i) {
-   if (kvm->vcpus[i]) {
-   kvm_vcpu_destroy(kvm->vcpus[i]);
-   kvm->vcpus[i] = NULL;
-   }
-   }
-   atomic_set(&kvm->online_vcpus, 0);
+   kvm_destroy_vcpus(kvm);
 }
 
 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 562aa878b266..ceacca74f808 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -171,25 +171,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return 0;
 }
 
-void kvm_mips_free_vcpus(struct kvm *kvm)
-{
-   unsigned int i;
-   struct kvm_vcpu *vcpu;
-
-   kvm_for_each_vcpu(i, vcpu, kvm) {
-   kvm_vcpu_destroy(vcpu);
-   }
-
-   mutex_lock(&kvm->lock);
-
-   for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
-   kvm->vcpus[i] = NULL;
-
-   atomic_set(&kvm->online_vcpus, 0);
-
-   mutex_unlock(&kvm->lock);
-}
-
 static void kvm_mips_free_gpa_pt(struct kvm *kvm)
 {
/* It should always be safe to remove after flushing the whole range */
@@ -199,7 +180,7 @@ static void kvm_mips_free_gpa_pt(struct kvm *kvm)
 
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
-   kvm_mips_free_vcpus(kvm);
+   kvm_destroy_vcpus(kvm);
kvm_mips_free_gpa_pt(kvm);
 }
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 35e9cccdeef9..492e4a4121cb 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -463,9 +463,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
-   unsigned int i;
-   struct kvm_vcpu *vcpu;
-
 #ifdef CONFIG_KVM_XICS
/*
 * We call kick_all_cpus_sync() to ensure that all
@@ -476,14 +473,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kick_all_cpus_sync();
 #endif
 
-   kvm_for_each_vcpu(i, vcpu, kvm)
-   kvm_vcpu_destroy(vcpu);
+   kvm_destroy_vcpus(kvm);
 
mutex_lock(&kvm->lock);
-   for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
-   kvm->vcpus[i] = NULL;
-
-   atomic_set(&kvm->online_vcpus, 0);
 
kvmppc_core_destroy_vm(kvm);
 
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index 26399df15b63..6af6cde295eb 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -46,15 +46,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
-   int i;
-
-   for (i = 0; i < KVM_MAX_VCPUS; ++i) {
-   if (kvm->vcpus[i]) {
-   kvm_vcpu_destroy(kvm->vcpus[i]);
-   kvm->vcpus[i] = NULL;
-   }
-   }
-   atomic_set(&kvm->online_vcpus, 0);
+   kvm_destroy_vcpus(kvm);
 }
 
 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index c6257f625929..7af53b8788fa 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2819,27 +2819,11 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
free_page((unsigned long)(vcpu->arch.sie_block));
 }
 
-static void kvm_free_vcpus(struct kvm *kvm)
-{
-   unsigned int i;
-   struct kvm_vcpu *vcpu;
-
-   kvm_for_each_vcpu(i, vcpu, kvm)
-   kvm_vcpu_destroy(vcpu);
-
-   mutex_lock(&kvm->lock);
-   for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
-   kvm->vcpus[i] = NUL

[PATCH v2 2/7] KVM: mips: Use kvm_get_vcpu() instead of open-coded access

2021-11-16 Thread Marc Zyngier
As we are about to change the way vcpus are allocated, mandate
the use of kvm_get_vcpu() instead of open-coding the access.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Marc Zyngier 
---
 arch/mips/kvm/loongson_ipi.c | 4 ++--
 arch/mips/kvm/mips.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/mips/kvm/loongson_ipi.c b/arch/mips/kvm/loongson_ipi.c
index 3681fc8fba38..5d53f32d837c 100644
--- a/arch/mips/kvm/loongson_ipi.c
+++ b/arch/mips/kvm/loongson_ipi.c
@@ -120,7 +120,7 @@ static int loongson_vipi_write(struct loongson_kvm_ipi *ipi,
s->status |= data;
irq.cpu = id;
irq.irq = 6;
-   kvm_vcpu_ioctl_interrupt(kvm->vcpus[id], &irq);
+   kvm_vcpu_ioctl_interrupt(kvm_get_vcpu(kvm, id), &irq);
break;
 
case CORE0_CLEAR_OFF:
@@ -128,7 +128,7 @@ static int loongson_vipi_write(struct loongson_kvm_ipi *ipi,
if (!s->status) {
irq.cpu = id;
irq.irq = -6;
-   kvm_vcpu_ioctl_interrupt(kvm->vcpus[id], &irq);
+   kvm_vcpu_ioctl_interrupt(kvm_get_vcpu(kvm, id), &irq);
}
break;
 
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index ceacca74f808..6228bf396d63 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -479,7 +479,7 @@ int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,
if (irq->cpu == -1)
dvcpu = vcpu;
else
-   dvcpu = vcpu->kvm->vcpus[irq->cpu];
+   dvcpu = kvm_get_vcpu(vcpu->kvm, irq->cpu);
 
if (intr == 2 || intr == 3 || intr == 4 || intr == 6) {
kvm_mips_callbacks->queue_io_int(dvcpu, irq);
-- 
2.30.2



[PATCH v2 3/7] KVM: s390: Use kvm_get_vcpu() instead of open-coded access

2021-11-16 Thread Marc Zyngier
As we are about to change the way vcpus are allocated, mandate
the use of kvm_get_vcpu() instead of open-coding the access.

Reviewed-by: Claudio Imbrenda 
Signed-off-by: Marc Zyngier 
---
 arch/s390/kvm/kvm-s390.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 7af53b8788fa..4a0f62b03964 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4572,7 +4572,7 @@ int kvm_s390_vcpu_start(struct kvm_vcpu *vcpu)
}
 
for (i = 0; i < online_vcpus; i++) {
-   if (!is_vcpu_stopped(vcpu->kvm->vcpus[i]))
+   if (!is_vcpu_stopped(kvm_get_vcpu(vcpu->kvm, i)))
started_vcpus++;
}
 
@@ -4634,9 +4634,11 @@ int kvm_s390_vcpu_stop(struct kvm_vcpu *vcpu)
__disable_ibs_on_vcpu(vcpu);
 
for (i = 0; i < online_vcpus; i++) {
-   if (!is_vcpu_stopped(vcpu->kvm->vcpus[i])) {
+   struct kvm_vcpu *tmp = kvm_get_vcpu(vcpu->kvm, i);
+
+   if (!is_vcpu_stopped(tmp)) {
started_vcpus++;
-   started_vcpu = vcpu->kvm->vcpus[i];
+   started_vcpu = tmp;
}
}
 
-- 
2.30.2



[PATCH v2 4/7] KVM: x86: Use kvm_get_vcpu() instead of open-coded access

2021-11-16 Thread Marc Zyngier
As we are about to change the way vcpus are allocated, mandate
the use of kvm_get_vcpu() instead of open-coding the access.

Signed-off-by: Marc Zyngier 
---
 arch/x86/kvm/vmx/posted_intr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 5f81ef092bd4..82a49720727d 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -272,7 +272,7 @@ int pi_update_irte(struct kvm *kvm, unsigned int host_irq, 
uint32_t guest_irq,
 
if (!kvm_arch_has_assigned_device(kvm) ||
!irq_remapping_cap(IRQ_POSTING_CAP) ||
-   !kvm_vcpu_apicv_active(kvm->vcpus[0]))
+   !kvm_vcpu_apicv_active(kvm_get_vcpu(kvm, 0)))
return 0;
 
idx = srcu_read_lock(&kvm->irq_srcu);
-- 
2.30.2



[PATCH v2 5/7] KVM: Convert the kvm->vcpus array to a xarray

2021-11-16 Thread Marc Zyngier
At least on arm64 and x86, the vcpus array is pretty huge (up to
1024 entries on x86) and is mostly empty in the majority of the cases
(running 1k vcpu VMs is not that common).

This mean that we end-up with a 4kB block of unused memory in the
middle of the kvm structure.

Instead of wasting away this memory, let's use an xarray instead,
which gives us almost the same flexibility as a normal array, but
with a reduced memory usage with smaller VMs.

Signed-off-by: Marc Zyngier 
---
 include/linux/kvm_host.h |  5 +++--
 virt/kvm/kvm_main.c  | 15 +--
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9a8c997ff9bc..df1b2b183d94 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -552,7 +553,7 @@ struct kvm {
struct mutex slots_arch_lock;
struct mm_struct *mm; /* userspace tied to this vm */
struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
-   struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
+   struct xarray vcpu_array;
 
/* Used to wait for completion of MMU notifiers.  */
spinlock_t mn_invalidate_lock;
@@ -701,7 +702,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm 
*kvm, int i)
 
/* Pairs with smp_wmb() in kvm_vm_ioctl_create_vcpu.  */
smp_rmb();
-   return kvm->vcpus[i];
+   return xa_load(&kvm->vcpu_array, i);
 }
 
 #define kvm_for_each_vcpu(idx, vcpup, kvm) \
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 73811c3829a2..9a7bce944427 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -458,7 +458,7 @@ void kvm_destroy_vcpus(struct kvm *kvm)
 
kvm_for_each_vcpu(i, vcpu, kvm) {
kvm_vcpu_destroy(vcpu);
-   kvm->vcpus[i] = NULL;
+   xa_erase(&kvm->vcpu_array, i);
}
 
atomic_set(&kvm->online_vcpus, 0);
@@ -1063,6 +1063,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
mutex_init(&kvm->slots_arch_lock);
spin_lock_init(&kvm->mn_invalidate_lock);
rcuwait_init(&kvm->mn_memslots_update_rcuwait);
+   xa_init(&kvm->vcpu_array);
 
INIT_LIST_HEAD(&kvm->devices);
 
@@ -3658,7 +3659,10 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 
id)
}
 
vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
-   BUG_ON(kvm->vcpus[vcpu->vcpu_idx]);
+   r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, 
GFP_KERNEL_ACCOUNT);
+   BUG_ON(r == -EBUSY);
+   if (r)
+   goto unlock_vcpu_destroy;
 
/* Fill the stats id string for the vcpu */
snprintf(vcpu->stats_id, sizeof(vcpu->stats_id), "kvm-%d/vcpu-%d",
@@ -3668,15 +3672,14 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, 
u32 id)
kvm_get_kvm(kvm);
r = create_vcpu_fd(vcpu);
if (r < 0) {
+   xa_erase(&kvm->vcpu_array, vcpu->vcpu_idx);
kvm_put_kvm_no_destroy(kvm);
goto unlock_vcpu_destroy;
}
 
-   kvm->vcpus[vcpu->vcpu_idx] = vcpu;
-
/*
-* Pairs with smp_rmb() in kvm_get_vcpu.  Write kvm->vcpus
-* before kvm->online_vcpu's incremented value.
+* Pairs with smp_rmb() in kvm_get_vcpu.  Store the vcpu
+* pointer before kvm->online_vcpu's incremented value.
 */
smp_wmb();
atomic_inc(&kvm->online_vcpus);
-- 
2.30.2



[PATCH v2 6/7] KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s index

2021-11-16 Thread Marc Zyngier
Everywhere we use kvm_for_each_vpcu(), we use an int as the vcpu
index. Unfortunately, we're about to move rework the iterator,
which requires this to be upgrade to an unsigned long.

Let's bite the bullet and repaint all of it in one go.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/arch_timer.c   |  8 
 arch/arm64/kvm/arm.c  |  6 +++---
 arch/arm64/kvm/pmu-emul.c |  2 +-
 arch/arm64/kvm/psci.c |  6 +++---
 arch/arm64/kvm/reset.c|  2 +-
 arch/arm64/kvm/vgic/vgic-init.c   | 10 ++
 arch/arm64/kvm/vgic/vgic-kvm-device.c |  2 +-
 arch/arm64/kvm/vgic/vgic-mmio-v2.c|  3 +--
 arch/arm64/kvm/vgic/vgic-mmio-v3.c|  7 ---
 arch/arm64/kvm/vgic/vgic-v3.c |  4 ++--
 arch/arm64/kvm/vgic/vgic-v4.c |  5 +++--
 arch/arm64/kvm/vgic/vgic.c|  2 +-
 arch/powerpc/kvm/book3s_32_mmu.c  |  2 +-
 arch/powerpc/kvm/book3s_64_mmu.c  |  2 +-
 arch/powerpc/kvm/book3s_hv.c  |  8 
 arch/powerpc/kvm/book3s_pr.c  |  2 +-
 arch/powerpc/kvm/book3s_xics.c|  6 +++---
 arch/powerpc/kvm/book3s_xics.h|  2 +-
 arch/powerpc/kvm/book3s_xive.c| 15 +--
 arch/powerpc/kvm/book3s_xive.h|  4 ++--
 arch/powerpc/kvm/book3s_xive_native.c |  8 
 arch/powerpc/kvm/e500_emulate.c   |  2 +-
 arch/riscv/kvm/vcpu_sbi.c |  2 +-
 arch/riscv/kvm/vmid.c |  2 +-
 arch/s390/kvm/interrupt.c |  2 +-
 arch/s390/kvm/kvm-s390.c  | 21 +++--
 arch/s390/kvm/kvm-s390.h  |  4 ++--
 arch/x86/kvm/hyperv.c |  7 ---
 arch/x86/kvm/i8254.c  |  2 +-
 arch/x86/kvm/i8259.c  |  5 +++--
 arch/x86/kvm/ioapic.c |  4 ++--
 arch/x86/kvm/irq_comm.c   |  7 ---
 arch/x86/kvm/kvm_onhyperv.c   |  3 ++-
 arch/x86/kvm/lapic.c  |  6 +++---
 arch/x86/kvm/svm/avic.c   |  2 +-
 arch/x86/kvm/svm/sev.c|  3 ++-
 arch/x86/kvm/x86.c| 21 +++--
 include/linux/kvm_host.h  |  2 +-
 virt/kvm/kvm_main.c   | 13 +++--
 39 files changed, 114 insertions(+), 100 deletions(-)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 3df67c127489..d6f4114f1d11 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -750,7 +750,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
 /* Make the updates of cntvoff for all vtimer contexts atomic */
 static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, u64 cntvoff)
 {
-   int i;
+   unsigned long i;
struct kvm *kvm = vcpu->kvm;
struct kvm_vcpu *tmp;
 
@@ -1189,8 +1189,8 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
 
 static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
 {
-   int vtimer_irq, ptimer_irq;
-   int i, ret;
+   int vtimer_irq, ptimer_irq, ret;
+   unsigned long i;
 
vtimer_irq = vcpu_vtimer(vcpu)->irq.irq;
ret = kvm_vgic_set_owner(vcpu, vtimer_irq, vcpu_vtimer(vcpu));
@@ -1297,7 +1297,7 @@ void kvm_timer_init_vhe(void)
 static void set_timer_irqs(struct kvm *kvm, int vtimer_irq, int ptimer_irq)
 {
struct kvm_vcpu *vcpu;
-   int i;
+   unsigned long i;
 
kvm_for_each_vcpu(i, vcpu, kvm) {
vcpu_vtimer(vcpu)->irq.irq = vtimer_irq;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 29942d8c357c..6177b0c882ea 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -624,7 +624,7 @@ bool kvm_arch_intc_initialized(struct kvm *kvm)
 
 void kvm_arm_halt_guest(struct kvm *kvm)
 {
-   int i;
+   unsigned long i;
struct kvm_vcpu *vcpu;
 
kvm_for_each_vcpu(i, vcpu, kvm)
@@ -634,7 +634,7 @@ void kvm_arm_halt_guest(struct kvm *kvm)
 
 void kvm_arm_resume_guest(struct kvm *kvm)
 {
-   int i;
+   unsigned long i;
struct kvm_vcpu *vcpu;
 
kvm_for_each_vcpu(i, vcpu, kvm) {
@@ -2020,7 +2020,7 @@ static int finalize_hyp_mode(void)
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
 {
struct kvm_vcpu *vcpu;
-   int i;
+   unsigned long i;
 
mpidr &= MPIDR_HWID_BITMASK;
kvm_for_each_vcpu(i, vcpu, kvm) {
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index a5e4bbf5e68f..0404357705a8 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -900,7 +900,7 @@ static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu)
  */
 static bool pmu_irq_is_valid(struct kvm *kvm, int irq)
 {
-   int i;
+   unsigned long i;
struct kvm_vcpu *vcpu;
 
kvm_for_each_vcpu(i, vcpu, kvm) {
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 74c47d420253..ed675fce8fb7 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -121,8 +121,8 

[PATCH v2 7/7] KVM: Convert kvm_for_each_vcpu() to using xa_for_each_range()

2021-11-16 Thread Marc Zyngier
Now that the vcpu array is backed by an xarray, use the optimised
iterator that matches the underlying data structure.

Suggested-by: Sean Christopherson 
Signed-off-by: Marc Zyngier 
---
 include/linux/kvm_host.h | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index dc635fbfd901..d20e33378c63 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -705,11 +705,9 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm 
*kvm, int i)
return xa_load(&kvm->vcpu_array, i);
 }
 
-#define kvm_for_each_vcpu(idx, vcpup, kvm) \
-   for (idx = 0; \
-idx < atomic_read(&kvm->online_vcpus) && \
-(vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
-idx++)
+#define kvm_for_each_vcpu(idx, vcpup, kvm)\
+   xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0, \
+ (atomic_read(&kvm->online_vcpus) - 1))
 
 static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
 {
-- 
2.30.2



Re: [PATCH] powerpc/xive: Change IRQ domain to a tree domain

2021-11-16 Thread Marc Zyngier
On Tue, 16 Nov 2021 13:40:22 +,
Cédric Le Goater  wrote:
> 
> Commit 4f86a06e2d6e ("irqdomain: Make normal and nomap irqdomains
> exclusive") introduced an IRQ_DOMAIN_FLAG_NO_MAP flag to isolate the
> 'nomap' domains still in use under the powerpc arch. With this new
> flag, the revmap_tree of the IRQ domain is not used anymore. This
> change broke the support of shared LSIs [1] in the XIVE driver because
> it was relying on a lookup in the revmap_tree to query previously
> mapped interrupts.

Just a lookup? Surely there is more to it, no?

> Linux now creates two distinct IRQ mappings on the
> same HW IRQ which can lead to unexpected behavior in the drivers.
> 
> The XIVE IRQ domain is not a direct mapping domain and its HW IRQ
> interrupt number space is rather large : 1M/socket on POWER9 and
> POWER10, change the XIVE driver to use a 'tree' domain type instead.
> 
> [1] For instance, a linux KVM guest with virtio-rng and virtio-balloon
> devices.
> 
> Cc: Marc Zyngier 
> Cc: sta...@vger.kernel.org # v5.14+
> Fixes: 4f86a06e2d6e ("irqdomain: Make normal and nomap irqdomains exclusive")
> Signed-off-by: Cédric Le Goater 
> ---
> 
>  Marc,
> 
>  The Fixes tag is there because the patch in question revealed that
>  something was broken in XIVE. genirq is not in cause. However, I
>  don't know for PS3 and Cell. May be less critical for now.

Depends if they expect something that a no-map domain cannot provide.

>  
>  arch/powerpc/sysdev/xive/common.c | 3 +--
>  arch/powerpc/sysdev/xive/Kconfig  | 1 -
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/sysdev/xive/common.c 
> b/arch/powerpc/sysdev/xive/common.c
> index fed6fd16c8f4..9d0f0fe25598 100644
> --- a/arch/powerpc/sysdev/xive/common.c
> +++ b/arch/powerpc/sysdev/xive/common.c
> @@ -1536,8 +1536,7 @@ static const struct irq_domain_ops xive_irq_domain_ops 
> = {
>  
>  static void __init xive_init_host(struct device_node *np)
>  {
> - xive_irq_domain = irq_domain_add_nomap(np, XIVE_MAX_IRQ,
> -&xive_irq_domain_ops, NULL);
> + xive_irq_domain = irq_domain_add_tree(np, &xive_irq_domain_ops, NULL);
>   if (WARN_ON(xive_irq_domain == NULL))
>   return;
>   irq_set_default_host(xive_irq_domain);
> diff --git a/arch/powerpc/sysdev/xive/Kconfig 
> b/arch/powerpc/sysdev/xive/Kconfig
> index 97796c6b63f0..785c292d104b 100644
> --- a/arch/powerpc/sysdev/xive/Kconfig
> +++ b/arch/powerpc/sysdev/xive/Kconfig
> @@ -3,7 +3,6 @@ config PPC_XIVE
>   bool
>   select PPC_SMP_MUXED_IPI
>   select HARDIRQS_SW_RESEND
> - select IRQ_DOMAIN_NOMAP
>  
>  config PPC_XIVE_NATIVE
>   bool

As long as this works, I'm happy with one less no-map user.

Acked-by: Marc Zyngier 

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v3 07/12] KVM: arm64: Use Makefile.kvm for common files

2021-11-17 Thread Marc Zyngier
On Wed, 17 Nov 2021 17:39:58 +,
David Woodhouse  wrote:
> 
> From: David Woodhouse 
> 
> Signed-off-by: David Woodhouse 
> ---
>  arch/arm64/kvm/Makefile | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 989bb5dad2c8..04a53f71a6b6 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -5,14 +5,12 @@
>  
>  ccflags-y += -I $(srctree)/$(src)
>  
> -KVM=../../../virt/kvm
> +include $(srctree)/virt/kvm/Makefile.kvm
>  
>  obj-$(CONFIG_KVM) += kvm.o
>  obj-$(CONFIG_KVM) += hyp/
>  
> -kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
> -  $(KVM)/vfio.o $(KVM)/irqchip.o $(KVM)/binary_stats.o \
> -  arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \
> +kvm-y += arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \
>inject_fault.o va_layout.o handle_exit.o \
>guest.o debug.o reset.o sys_regs.o \
>vgic-sys-reg-v3.o fpsimd.o pmu.o \

Acked-by: Marc Zyngier 

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v3 02/12] KVM: Add Makefile.kvm for common files, use it for x86

2021-11-17 Thread Marc Zyngier
On Wed, 17 Nov 2021 17:39:53 +,
David Woodhouse  wrote:
> 
> From: David Woodhouse 
> 
> Splitting kvm_main.c out into smaller and better-organized files is
> slightly non-trivial when it involves editing a bunch of per-arch
> KVM makefiles. Provide virt/kvm/Makefile.kvm for them to include.
> 
> Signed-off-by: David Woodhouse 
> ---
>  arch/x86/kvm/Makefile |  7 +--
>  virt/kvm/Makefile.kvm | 13 +
>  2 files changed, 14 insertions(+), 6 deletions(-)
>  create mode 100644 virt/kvm/Makefile.kvm
> 
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index 75dfd27b6e8a..30f244b64523 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -7,12 +7,7 @@ ifeq ($(CONFIG_FRAME_POINTER),y)
>  OBJECT_FILES_NON_STANDARD_vmenter.o := y
>  endif
>  
> -KVM := ../../../virt/kvm
> -
> -kvm-y+= $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
> - $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o 
> \
> - $(KVM)/dirty_ring.o $(KVM)/binary_stats.o
> -kvm-$(CONFIG_KVM_ASYNC_PF)   += $(KVM)/async_pf.o
> +include $(srctree)/virt/kvm/Makefile.kvm
>  
>  kvm-y+= x86.o emulate.o i8259.o irq.o lapic.o \
>  i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
> diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
> new file mode 100644
> index ..ffdcad3cc97a
> --- /dev/null
> +++ b/virt/kvm/Makefile.kvm
> @@ -0,0 +1,13 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Makefile for Kernel-based Virtual Machine module
> +#
> +
> +KVM ?= ../../../virt/kvm
> +
> +kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
> +kvm-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o
> +kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
> +kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
> +kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
> +kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o

Acked-by: Marc Zyngier 

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v3 08/12] KVM: Propagate vcpu explicitly to mark_page_dirty_in_slot()

2021-11-17 Thread Marc Zyngier
On Wed, 17 Nov 2021 17:39:59 +,
David Woodhouse  wrote:
> 
> From: David Woodhouse 
> 
> The kvm_dirty_ring_get() function uses kvm_get_running_vcpu() to work out
> which dirty ring to use, but there are some use cases where that doesn't
> work.
> 
> There's one in setting the Xen shared info page, introduced in commit
> 629b5348841a ("KVM: x86/xen: update wallclock region") and reported by
> "butt3rflyh4ck"  in
> https://lore.kernel.org/kvm/cafco6xomos7eacn_n6v4txk7xl7iqra2gabg3f7e3naf5ug...@mail.gmail.com/
> 
> There's also about to be another one when the newly-reintroduced
> gfn_to_pfn_cache needs to mark a page as dirty from the MMU notifier
> which invalidates the mapping. In that case, we will *know* the vcpu
> that can be 'blamed' for dirtying the page, and we just need to be
> able to pass it in as an explicit argument when doing so.
> 
> This patch preemptively resolves the second issue, and paves the way
> for resolving the first. A complete fix for the first issue will need
> us to switch the Xen shinfo to be owned by a particular vCPU, which
> will happen in a separate patch.
> 
> Signed-off-by: David Woodhouse 
> ---
>  arch/arm64/kvm/mmu.c   |  2 +-
>  arch/x86/kvm/mmu/mmu.c |  2 +-
>  arch/x86/kvm/mmu/spte.c|  2 +-
>  arch/x86/kvm/mmu/tdp_mmu.c |  2 +-
>  arch/x86/kvm/x86.c |  4 ++--
>  include/linux/kvm_dirty_ring.h |  6 --
>  include/linux/kvm_host.h   |  3 ++-
>  virt/kvm/dirty_ring.c  |  8 ++--
>  virt/kvm/kvm_main.c| 18 +-
>  9 files changed, 27 insertions(+), 20 deletions(-)

What's the base for this series? This patch fails to compile for me
(at least on arm64), and the following patch doesn't apply on -rc1.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function

2020-11-25 Thread Marc Zyngier

On 2020-11-25 14:09, Laurent Vivier wrote:

On 25/11/2020 14:20, Thomas Gleixner wrote:

Laurent,

On Wed, Nov 25 2020 at 12:16, Laurent Vivier wrote:

The proper subsystem prefix is: 'genirq/irqdomain:' and the first 
letter

after the colon wants to be uppercase.


Ok.


This function adds an affinity parameter to irq_create_mapping().
This parameter is needed to pass it to irq_domain_alloc_descs().


A changelog has to explain the WHY. 'The parameter is needed' is not
really useful information.



The reason of this change is explained in PATCH 2.

I have two patches, one to change the interface with no functional
change (PATCH 1) and
one to fix the problem (PATCH 2). Moreover they don't cover the same 
subsystems.


I can either:
- merge the two patches
- or make a reference in the changelog of PATCH 1 to PATCH 2
  (something like "(see folowing patch "powerpc/pseries: pass MSI 
affinity to

   irq_create_mapping()")")
- or copy some information from PATCH 2
  (something like "this parameter is needed by rtas_setup_msi_irqs()
to pass the affinity
   to irq_domain_alloc_descs() to fix multiqueue affinity")

What do you prefer?


How about something like this for the first patch:

"There is currently no way to convey the affinity of an interrupt
 via irq_create_mapping(), which creates issues for devices that
 expect that affinity to be managed by the kernel.

 In order to sort this out, rename irq_create_mapping() to
 irq_create_mapping_affinity() with an additional affinity parameter
 that can conveniently passed down to irq_domain_alloc_descs().

 irq_create_mapping() is then re-implemented as a wrapper around
 irq_create_mapping_affinity()."

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()

2020-11-25 Thread Marc Zyngier

On 2020-11-25 16:24, Laurent Vivier wrote:

On 25/11/2020 17:05, Denis Kirjanov wrote:

On 11/25/20, Laurent Vivier  wrote:

With virtio multiqueue, normally each queue IRQ is mapped to a CPU.

But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ 
affinity")

this is broken on pseries.


Please add "Fixes" tag.


In fact, the code in commit 0d9f0a52c8b9f is correct.

The problem is with MSI/X irq affinity and pseries. So this patch
fixes more than virtio_scsi. I put this information because this
commit allows to clearly show the problem. Perhaps I should remove
this line in fact?


This patch does not fix virtio_scsi at all, which as you noticed, is
correct. It really fixes the PPC MSI setup, which is starting to show
its age. So getting rid of the reference seems like the right thing to 
do.


I'm also not keen on the BugId thing. It should really be a lore link.
I also cannot find any such tag in the kernel, nor is it a documented
practice. The last reference to a Bugzilla entry seems to have happened
with 786b5219081ff16 (five years ago).

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn

2021-06-24 Thread Marc Zyngier
Hi David,

On Thu, 24 Jun 2021 04:57:45 +0100,
David Stevens  wrote:
> 
> From: David Stevens 
> 
> Return a struct kvm_pfn_page containing both a pfn and an optional
> struct page from the gfn_to_pfn family of functions. This differentiates
> the gup and follow_fault_pfn cases, which allows callers that only need
> a pfn to avoid touching the page struct in the latter case. For callers
> that need a struct page, introduce a helper function that unwraps a
> struct kvm_pfn_page into a struct page. This helper makes the call to
> kvm_get_pfn which had previously been in hva_to_pfn_remapped.
> 
> For now, wrap all calls to gfn_to_pfn functions in the new helper
> function. Callers which don't need the page struct will be updated in
> follow-up patches.
> 
> Signed-off-by: David Stevens 
> ---
>  arch/arm64/kvm/mmu.c   |   5 +-
>  arch/mips/kvm/mmu.c|   3 +-
>  arch/powerpc/kvm/book3s.c  |   3 +-
>  arch/powerpc/kvm/book3s_64_mmu_hv.c|   5 +-
>  arch/powerpc/kvm/book3s_64_mmu_radix.c |   5 +-
>  arch/powerpc/kvm/book3s_hv_uvmem.c |   4 +-
>  arch/powerpc/kvm/e500_mmu_host.c   |   2 +-
>  arch/x86/kvm/mmu/mmu.c |  11 ++-
>  arch/x86/kvm/mmu/mmu_audit.c   |   2 +-
>  arch/x86/kvm/x86.c |   2 +-
>  drivers/gpu/drm/i915/gvt/kvmgt.c   |   2 +-
>  include/linux/kvm_host.h   |  27 --
>  include/linux/kvm_types.h  |   5 +
>  virt/kvm/kvm_main.c| 121 +
>  14 files changed, 109 insertions(+), 88 deletions(-)
> 

[...]

> +kvm_pfn_t kvm_pfn_page_unwrap(struct kvm_pfn_page pfnpg)
> +{
> + if (pfnpg.page)
> + return pfnpg.pfn;
> +
> + kvm_get_pfn(pfnpg.pfn);
> + return pfnpg.pfn;
> +}
> +EXPORT_SYMBOL_GPL(kvm_pfn_page_unwrap);

I'd really like to see a tiny bit of documentation explaining that
calls to kvm_pfn_page_unwrap() are not idempotent.

Otherwise, looks good to me.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 3/6] KVM: x86/mmu: avoid struct page in MMU

2021-06-24 Thread Marc Zyngier
On Thu, 24 Jun 2021 09:58:00 +0100,
Nicholas Piggin  wrote:
> 
> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
> > From: David Stevens 
> >  out_unlock:
> > if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
> > read_unlock(&vcpu->kvm->mmu_lock);
> > else
> > write_unlock(&vcpu->kvm->mmu_lock);
> > -   kvm_release_pfn_clean(pfn);
> > +   if (pfnpg.page)
> > +   put_page(pfnpg.page);
> > return r;
> >  }
> 
> How about
> 
>   kvm_release_pfn_page_clean(pfnpg);

I'm not sure. I always found kvm_release_pfn_clean() ugly, because it
doesn't mark the page 'clean'. I find put_page() more correct.

Something like 'kvm_put_pfn_page()' would make more sense, but I'm so
bad at naming things that I could just as well call it 'bob()'.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 4/6] KVM: arm64/mmu: avoid struct page in MMU

2021-06-24 Thread Marc Zyngier
On Thu, 24 Jun 2021 04:57:47 +0100,
David Stevens  wrote:
> 
> From: David Stevens 
> 
> Avoid converting pfns returned by follow_fault_pfn to struct pages to
> transiently take a reference. The reference was originally taken to
> match the reference taken by gup. However, pfns returned by
> follow_fault_pfn may not have a struct page set up for reference
> counting.
> 
> Signed-off-by: David Stevens 
> ---
>  arch/arm64/kvm/mmu.c | 43 +++
>  1 file changed, 23 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 896b3644b36f..a741972cb75f 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c

[...]

> @@ -968,16 +968,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa,
>*/
>   if (vma_pagesize == PAGE_SIZE && !force_pte)
>   vma_pagesize = transparent_hugepage_adjust(memslot, hva,
> -&pfn, &fault_ipa);
> +&pfnpg, &fault_ipa);
>   if (writable)
>   prot |= KVM_PGTABLE_PROT_W;
>  
>   if (fault_status != FSC_PERM && !device)
> - clean_dcache_guest_page(pfn, vma_pagesize);
> + clean_dcache_guest_page(pfnpg.pfn, vma_pagesize);
>  
>   if (exec_fault) {
>   prot |= KVM_PGTABLE_PROT_X;
> - invalidate_icache_guest_page(pfn, vma_pagesize);
> + invalidate_icache_guest_page(pfnpg.pfn, vma_pagesize);

This is going to clash with what is currently in -next, specially with
MTE.

Paolo, if you really want to take this in 5.13, can you please push a
stable branch based on -rc4 or older so that I can merge it in and
test it?

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 0/2] Fix arm64 boot regression in 5.14

2021-07-20 Thread Marc Zyngier

On 2021-07-20 13:35, Will Deacon wrote:

Hi folks,

Jonathan reports [1] that commit c742199a014d ("mm/pgtable: add stubs
for {pmd/pub}_{set/clear}_huge") breaks the boot on arm64 when huge
mappings are used to map the kernel linear map but the VA size is
configured such that PUDs are folded. This is because the 
non-functional

pud_set_huge() stub is used to create the linear map, which results in
1GB holes and a fatal data abort when the kernel attemps to access 
them.


Digging further into the issue, it also transpired that huge-vmap is
silently disabled in these configurations as well [2], despite working
correctly in 5.13. The latter issue causes the pgtable selftests to
scream due to a failing consistency check [3].

Rather than leave mainline in a terminally broken state for arm64 while
we figure this out, revert the offending commit to get things working
again. Unfortunately, reverting the change in isolation causes a build
breakage for 32-bit PowerPC 8xx machines which recently started relying
on the problematic stubs to support pte-level huge-vmap entries [4].
Since Christophe is away at the moment, this series first reverts the
PowerPC 8xx change in order to avoid breaking the build.

I would really like this to land for -rc3 and I can take these via the
arm64 fixes queue if the PowerPC folks are alright with them.

Cheers,

Will

[1] https://lore.kernel.org/r/20210717160118.9855-1-jonat...@marek.ca
[2] https://lore.kernel.org/r/20210719104918.GA6440@willie-the-truck
[3] 
https://lore.kernel.org/r/camuhmdxshordox-xxaeufdw3wx2peggfsqhvshvznkcgk-y...@mail.gmail.com/
[4] 
https://lore.kernel.org/r/8b972f1c03fb6bd59953035f0a3e4d26659de4f8.1620795204.git.christophe.le...@csgroup.eu/


Cc: Ard Biesheuvel 
Cc: Michael Ellerman 
Cc: Thomas Gleixner 
Cc: Benjamin Herrenschmidt 
Cc: Christophe Leroy 
Cc: Paul Mackerras 
Cc: Jonathan Marek 
Cc: Catalin Marinas 
Cc: Andrew Morton 
Cc: Nicholas Piggin 
Cc: Mark Rutland 
Cc: Geert Uytterhoeven 
Cc: Marc Zyngier 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org

--->8

Jonathan Marek (1):
  Revert "mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge"

Will Deacon (1):
  Revert "powerpc/8xx: add support for huge pages on VMAP and VMALLOC"

 arch/arm64/mm/mmu.c  | 20 -
 arch/powerpc/Kconfig |  2 +-
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h | 43 
 arch/x86/mm/pgtable.c| 34 +++-
 include/linux/pgtable.h  | 26 +---
 5 files changed, 25 insertions(+), 100 deletions(-)


Acked-by: Marc Zyngier 

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH v10 0/4] this patchset is to remove PPCisms for QEIC

2017-11-02 Thread Marc Zyngier
On 01/11/17 17:09, Thomas Gleixner wrote:
> On Wed, 1 Nov 2017, Qiang Zhao wrote:
>> Michael Ellerman  wrote
>>>
>>> Qiang Zhao  writes:
>>>
 Hi all,

 Could anybody review this patchset and take action on them? Thank you!
>>>
>>> Who maintains this? I don't actually know, it's not powerpc code, or is it?
>>
>> Yes, it's not powerpc code, it is irqchip code, maintained by Thomas, Jason 
>> and Marc according to MAINTAINERS file.
>>
>> Hi Thomas, Jason and Marc,
>>
>> Could you keep an eye on this patchset? Thank you!
> 
> It's on my radar, but I have zero capacity at the moment. Hopefully Marc
> can spare a few cycles.
I'll try, but I haven't been cc-ed on that one. I'll try to dig it out.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v6 2/4] irqchip/qeic: merge qeic init code from platforms to a common function

2016-12-16 Thread Marc Zyngier
On 28/09/16 04:25, Zhao Qiang wrote:
> The codes of qe_ic init from a variety of platforms are redundant,
> merge them to a common function and put it to irqchip/irq-qeic.c
> 
> For non-p1021_mds mpc85xx_mds boards, use "qe_ic_init(np, 0,
> qe_ic_cascade_low_mpic, qe_ic_cascade_high_mpic);" instead of
> "qe_ic_init(np, 0, qe_ic_cascade_muxed_mpic, NULL);".
> 
> qe_ic_cascade_muxed_mpic was used for boards has the same interrupt
> number for low interrupt and high interrupt, qe_ic_init has checked
> if "low interrupt == high interrupt"
> 
> Signed-off-by: Zhao Qiang 
> ---
> Changes for v2:
>   - modify subject and commit msg
>   - add check for qeic by type
> Changes for v3:
>   - na
> Changes for v4:
>   - na
> Changes for v5:
>   - na
> Changes for v6:
>   - rebase
> 
>  arch/powerpc/platforms/83xx/misc.c| 15 ---
>  arch/powerpc/platforms/85xx/corenet_generic.c |  9 -
>  arch/powerpc/platforms/85xx/mpc85xx_mds.c | 14 --
>  arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 16 
>  arch/powerpc/platforms/85xx/twr_p102x.c   | 14 --
>  drivers/irqchip/irq-qeic.c| 16 
>  6 files changed, 16 insertions(+), 68 deletions(-)
> 

[...]

> --- a/drivers/irqchip/irq-qeic.c
> +++ b/drivers/irqchip/irq-qeic.c
> @@ -598,4 +598,20 @@ static int __init init_qe_ic_sysfs(void)
>   return 0;
>  }
>  
> +static int __init qeic_of_init(void)
> +{
> + struct device_node *np;
> +
> + np = of_find_compatible_node(NULL, NULL, "fsl,qe-ic");
> + if (!np) {
> + np = of_find_node_by_type(NULL, "qeic");
> + if (!np)
> + return;
> + }
> + qe_ic_init(np, 0, qe_ic_cascade_low_mpic,
> +qe_ic_cascade_high_mpic);
> + of_node_put(np);
> +}

Have you actually compiled that code?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v6 2/4] irqchip/qeic: merge qeic init code from platforms to a common function

2016-12-16 Thread Marc Zyngier
On 16/12/16 08:43, Qiang Zhao wrote:
> On 16/12/16 04:33, Marc Zyngier  wrote:
> 
>> -Original Message-----
>> From: Marc Zyngier [mailto:marc.zyng...@arm.com]
>> Sent: Friday, December 16, 2016 4:33 PM
>> To: Qiang Zhao ; o...@buserror.net; t...@linutronix.de
>> Cc: ja...@lakedaemon.net; Xiaobo Xie ; linux-
>> ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
>> Subject: Re: [PATCH v6 2/4] irqchip/qeic: merge qeic init code from 
>> platforms to
>> a common function
>>
>> On 28/09/16 04:25, Zhao Qiang wrote:
>>> The codes of qe_ic init from a variety of platforms are redundant,
>>> merge them to a common function and put it to irqchip/irq-qeic.c
>>>
>>> For non-p1021_mds mpc85xx_mds boards, use "qe_ic_init(np, 0,
>>> qe_ic_cascade_low_mpic, qe_ic_cascade_high_mpic);" instead of
>>> "qe_ic_init(np, 0, qe_ic_cascade_muxed_mpic, NULL);".
>>>
>>> qe_ic_cascade_muxed_mpic was used for boards has the same interrupt
>>> number for low interrupt and high interrupt, qe_ic_init has checked if
>>> "low interrupt == high interrupt"
>>>
>>> Signed-off-by: Zhao Qiang 
>>> ---
>>> Changes for v2:
>>> - modify subject and commit msg
>>> - add check for qeic by type
>>> Changes for v3:
>>> - na
>>> Changes for v4:
>>> - na
>>> Changes for v5:
>>> - na
>>> Changes for v6:
>>> - rebase
>>>
>>>  arch/powerpc/platforms/83xx/misc.c| 15 ---
>>>  arch/powerpc/platforms/85xx/corenet_generic.c |  9 -
>>>  arch/powerpc/platforms/85xx/mpc85xx_mds.c | 14 --
>>>  arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 16 
>>>  arch/powerpc/platforms/85xx/twr_p102x.c   | 14 --
>>>  drivers/irqchip/irq-qeic.c| 16 
>>>  6 files changed, 16 insertions(+), 68 deletions(-)
>>>
>>
>> [...]
>>
>>> --- a/drivers/irqchip/irq-qeic.c
>>> +++ b/drivers/irqchip/irq-qeic.c
>>> @@ -598,4 +598,20 @@ static int __init init_qe_ic_sysfs(void)
>>> return 0;
>>>  }
>>>
>>> +static int __init qeic_of_init(void)
>>> +{
>>> +   struct device_node *np;
>>> +
>>> +   np = of_find_compatible_node(NULL, NULL, "fsl,qe-ic");
>>> +   if (!np) {
>>> +   np = of_find_node_by_type(NULL, "qeic");
>>> +   if (!np)
>>> +   return;
>>> +   }
>>> +   qe_ic_init(np, 0, qe_ic_cascade_low_mpic,
>>> +  qe_ic_cascade_high_mpic);
>>> +   of_node_put(np);
>>> +}
>>
>> Have you actually compiled that code?
> 
> Yes.

And the abundance of warnings that the compiler certainly spits doesn't
strike you as an issue that should be addressed before posting the patches?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


  1   2   >