[PATCH 1/3] PCI: pci-host-generic: Fix lookup of linux, pci-probe-only property
When pci-host-generic looks for the probe-only property, it seems to trust the DT to be correctly written, and assumes that there is a parameter to the property. Unfortunately, this is not always the case, and some firmware expose this property naked. The driver ends up making a decision based on whatever the property pointer points to, which is likely to be junk. Instead, let's check for the validity of the property, and ignore it if the firmware couldn't make up its mind. Signed-off-by: Marc Zyngier --- drivers/pci/host/pci-host-generic.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/pci/host/pci-host-generic.c b/drivers/pci/host/pci-host-generic.c index 265dd25..2b2e2ff 100644 --- a/drivers/pci/host/pci-host-generic.c +++ b/drivers/pci/host/pci-host-generic.c @@ -211,6 +211,7 @@ static int gen_pci_probe(struct platform_device *pdev) const char *type; const struct of_device_id *of_id; const int *prop; + int len; struct device *dev = &pdev->dev; struct device_node *np = dev->of_node; struct gen_pci *pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL); @@ -225,12 +226,16 @@ static int gen_pci_probe(struct platform_device *pdev) return -EINVAL; } - prop = of_get_property(of_chosen, "linux,pci-probe-only", NULL); + prop = of_get_property(of_chosen, "linux,pci-probe-only", &len); if (prop) { - if (*prop) - pci_add_flags(PCI_PROBE_ONLY); - else - pci_clear_flags(PCI_PROBE_ONLY); + if (len) { + if (be32_to_cpup(prop)) + pci_add_flags(PCI_PROBE_ONLY); + else + pci_clear_flags(PCI_PROBE_ONLY); + } else { + dev_warn(&pdev->dev, "linux,pci-probe-only set without value, ignoring\n"); + } } of_id = of_match_node(gen_pci_of_match, np); -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 0/3] PCI: arm64/powerpc: Fix parsing of linux,pci-probe-only
The pci-host-generic driver parses the linux,pci-probe-only property, and assumes that it will have a boolean parameter. Turns out that the Seattle DTS file has a naked "linux,pci-probe-only" property, which leads to the driver dereferencing some unsuspecting memory location. Nothing really bad happens (we end up reading some other bit of DT, fortunately), but that not a reason to keep it this way. The first patch fixes the driver not to do silly things, and simply give a warning when this happens. The powerpc code from where this code was lifted is also fixed in a second patch. Finally, the bad property is removed from the Seatle DTS, because it is simply not necessary (it actually prevents me from using SR-IOV, which otherwise runs fine without the probe-only thing). Marc Zyngier (3): PCI: pci-host-generic: Fix lookup of linux,pci-probe-only property powerpc: PCI: Fix lookup of linux,pci-probe-only property arm64: dts: Drop linux,pci-probe-only from the Seattle DTS arch/arm64/boot/dts/amd/amd-overdrive.dts | 1 - arch/powerpc/platforms/pseries/setup.c| 15 ++- drivers/pci/host/pci-host-generic.c | 15 ++- 3 files changed, 20 insertions(+), 11 deletions(-) -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 2/3] powerpc: PCI: Fix lookup of linux, pci-probe-only property
When find_and_init_phbs() looks for the probe-only property, it seems to trust the firmware to be correctly written, and assumes that there is a parameter to the property. It is conceivable that the firmware could not be that perfect, and it could expose this property naked (at least one arm64 platform seems to exhibit this exact behaviour). The setup code the ends up making a decision based on whatever the property pointer points to, which is likely to be junk. Instead, let's check for the validity of the property, and ignore it if the firmware couldn't make up its mind. Signed-off-by: Marc Zyngier --- arch/powerpc/platforms/pseries/setup.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index df6a704..6bdc1f9 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -490,14 +490,19 @@ static void __init find_and_init_phbs(void) */ if (of_chosen) { const int *prop; + int len; prop = of_get_property(of_chosen, - "linux,pci-probe-only", NULL); + "linux,pci-probe-only", &len); if (prop) { - if (*prop) - pci_add_flags(PCI_PROBE_ONLY); - else - pci_clear_flags(PCI_PROBE_ONLY); + if (len) { + if (be32_to_cpup(prop)) + pci_add_flags(PCI_PROBE_ONLY); + else + pci_clear_flags(PCI_PROBE_ONLY); + } else { + pr_warn("linux,pci-probe-only set without value, ignoring\n"); + } } } } -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 3/3] arm64: dts: Drop linux, pci-probe-only from the Seattle DTS
The linux,pci-probe-only property mandates an argument to indicate whether or not to engage the "probe-only" mode, but the Seattle DTS just provides a naked property, which is illegal. Also, it turns out that the board is perfectly happy without probe-only, so let's drop this altogether. Signed-off-by: Marc Zyngier --- arch/arm64/boot/dts/amd/amd-overdrive.dts | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/arm64/boot/dts/amd/amd-overdrive.dts b/arch/arm64/boot/dts/amd/amd-overdrive.dts index 564a3f7..128fa94 100644 --- a/arch/arm64/boot/dts/amd/amd-overdrive.dts +++ b/arch/arm64/boot/dts/amd/amd-overdrive.dts @@ -14,7 +14,6 @@ chosen { stdout-path = &serial0; - linux,pci-probe-only; }; }; -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 2/3] powerpc: PCI: Fix lookup of linux,pci-probe-only property
Hi Bjorn, On 14/08/15 15:57, Bjorn Helgaas wrote: > Hi Marc, > > On Fri, Aug 14, 2015 at 03:41:07PM +0100, Marc Zyngier wrote: >> When find_and_init_phbs() looks for the probe-only property, it seems >> to trust the firmware to be correctly written, and assumes that there >> is a parameter to the property. >> >> It is conceivable that the firmware could not be that perfect, and it >> could expose this property naked (at least one arm64 platform seems to >> exhibit this exact behaviour). The setup code the ends up making >> a decision based on whatever the property pointer points to, which >> is likely to be junk. >> >> Instead, let's check for the validity of the property, and ignore >> it if the firmware couldn't make up its mind. >> >> Signed-off-by: Marc Zyngier >> --- >> arch/powerpc/platforms/pseries/setup.c | 15 ++- >> 1 file changed, 10 insertions(+), 5 deletions(-) >> >> diff --git a/arch/powerpc/platforms/pseries/setup.c >> b/arch/powerpc/platforms/pseries/setup.c >> index df6a704..6bdc1f9 100644 >> --- a/arch/powerpc/platforms/pseries/setup.c >> +++ b/arch/powerpc/platforms/pseries/setup.c >> @@ -490,14 +490,19 @@ static void __init find_and_init_phbs(void) >> */ >> if (of_chosen) { >> const int *prop; >> +int len; >> >> prop = of_get_property(of_chosen, >> -"linux,pci-probe-only", NULL); >> +"linux,pci-probe-only", &len); >> if (prop) { >> -if (*prop) >> -pci_add_flags(PCI_PROBE_ONLY); >> -else >> -pci_clear_flags(PCI_PROBE_ONLY); >> +if (len) { >> +if (be32_to_cpup(prop)) >> +pci_add_flags(PCI_PROBE_ONLY); >> +else >> +pci_clear_flags(PCI_PROBE_ONLY); >> +} else { >> +pr_warn("linux,pci-probe-only set without >> value, ignoring\n"); >> +} > > This seems essentially identical to the pci-host-generic version. > Is there a way we can factor it out so there's only one copy? Probably. drivers/of/of_pci.c seems like a good landing place for it. I'll hack something and repost it. Thanks, M. -- Jazz is not dead. It just smells funny... ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 1/4] of/pci: Add of_pci_check_probe_only to parse "linux, pci-probe-only"
Both pci-host-generic and Pseries parse the "linux,pci-probe-only" to engage the PCI_PROBE_ONLY mode, and both have a subtle bug that can be triggered if the property has no parameter. Provide a generic implementation that can be used by both. Signed-off-by: Marc Zyngier --- drivers/of/of_pci.c| 30 ++ include/linux/of_pci.h | 3 +++ 2 files changed, 33 insertions(+) diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c index 5751dc5..a4e29ff 100644 --- a/drivers/of/of_pci.c +++ b/drivers/of/of_pci.c @@ -118,6 +118,36 @@ int of_get_pci_domain_nr(struct device_node *node) EXPORT_SYMBOL_GPL(of_get_pci_domain_nr); /** + * of_pci_check_probe_only - Setup probe only mode if linux,pci-probe-only + * is present and valid + * + * @node: device tree node that may contain the property (usually "chosen") + * + */ +void of_pci_check_probe_only(struct device_node *node) +{ + const int *prop; + int len; + + if (!node) + return; + + prop = of_get_property(node, "linux,pci-probe-only", &len); + if (prop) { + if (!len) { + pr_warn("linux,pci-probe-only set without value, ignoring\n"); + return; + } + + if (be32_to_cpup(prop)) + pci_add_flags(PCI_PROBE_ONLY); + else + pci_clear_flags(PCI_PROBE_ONLY); + } +} +EXPORT_SYMBOL_GPL(of_pci_check_probe_only); + +/** * of_pci_dma_configure - Setup DMA configuration * @dev: ptr to pci_dev struct of the PCI device * diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h index 29fd3fe..4c0a617 100644 --- a/include/linux/of_pci.h +++ b/include/linux/of_pci.h @@ -17,6 +17,7 @@ int of_irq_parse_and_map_pci(const struct pci_dev *dev, u8 slot, u8 pin); int of_pci_parse_bus_range(struct device_node *node, struct resource *res); int of_get_pci_domain_nr(struct device_node *node); void of_pci_dma_configure(struct pci_dev *pci_dev); +void of_pci_check_probe_only(struct device_node *node); #else static inline int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *out_irq) { @@ -53,6 +54,8 @@ of_get_pci_domain_nr(struct device_node *node) } static inline void of_pci_dma_configure(struct pci_dev *pci_dev) { } + +static inline void of_pci_check_probe_only(struct device_node *node) { } #endif #if defined(CONFIG_OF_ADDRESS) -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 0/4] PCI: arm64/powerpc: Fix parsing of linux, pci-probe-only
The pci-host-generic driver parses the linux,pci-probe-only property, and assumes that it will have a boolean parameter. Turns out that the Seattle DTS file has a naked "linux,pci-probe-only" property, which leads to the driver dereferencing some unsuspecting memory location. Nothing really bad happens (we end up reading some other bit of DT, fortunately), but that not a reason to keep it this way. Turns out that the Pseries code (where this code was lifted from) may suffer from the same issue. The first patch introduces a common (and fixed) version of that check that can be used by drivers and architectures that require it. The two following patches change the pci-host-generic driver and the powerpc code to use it. Finally, the bad property is removed from the Seatle DTS, because it is simply not necessary (it actually prevents me from using SR-IOV, which otherwise runs fine without the probe-only thing). This has been tested on the offending Seattle board. * From v1: - Consolidate the parsing in of_pci.c (Bjorn) Marc Zyngier (4): of/pci: Add of_pci_check_probe_only to parse "linux,pci-probe-only" PCI: pci-host-generic: Fix lookup of linux,pci-probe-only property powerpc: PCI: Fix lookup of linux,pci-probe-only property arm64: dts: Drop linux,pci-probe-only from the Seattle DTS arch/arm64/boot/dts/amd/amd-overdrive.dts | 1 - arch/powerpc/platforms/pseries/setup.c| 14 ++ drivers/of/of_pci.c | 30 ++ drivers/pci/host/pci-host-generic.c | 9 + include/linux/of_pci.h| 3 +++ 5 files changed, 36 insertions(+), 21 deletions(-) -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 2/4] PCI: pci-host-generic: Fix lookup of linux, pci-probe-only property
When pci-host-generic looks for the probe-only property, it seems to trust the DT to be correctly written, and assumes that there is a parameter to the property. Unfortunately, this is not always the case, and some firmware expose this property naked. The driver ends up making a decision based on whatever the property pointer points to, which is likely to be junk. Switch to the common of_pci.c implementation that doesn't suffer from this problem. Signed-off-by: Marc Zyngier --- drivers/pci/host/pci-host-generic.c | 9 + 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/drivers/pci/host/pci-host-generic.c b/drivers/pci/host/pci-host-generic.c index 265dd25..545ff4e 100644 --- a/drivers/pci/host/pci-host-generic.c +++ b/drivers/pci/host/pci-host-generic.c @@ -210,7 +210,6 @@ static int gen_pci_probe(struct platform_device *pdev) int err; const char *type; const struct of_device_id *of_id; - const int *prop; struct device *dev = &pdev->dev; struct device_node *np = dev->of_node; struct gen_pci *pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL); @@ -225,13 +224,7 @@ static int gen_pci_probe(struct platform_device *pdev) return -EINVAL; } - prop = of_get_property(of_chosen, "linux,pci-probe-only", NULL); - if (prop) { - if (*prop) - pci_add_flags(PCI_PROBE_ONLY); - else - pci_clear_flags(PCI_PROBE_ONLY); - } + of_pci_check_probe_only(of_chosen); of_id = of_match_node(gen_pci_of_match, np); pci->cfg.ops = of_id->data; -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 3/4] powerpc: PCI: Fix lookup of linux, pci-probe-only property
When find_and_init_phbs() looks for the probe-only property, it seems to trust the firmware to be correctly written, and assumes that there is a parameter to the property. It is conceivable that the firmware could not be that perfect, and it could expose this property naked (at least one arm64 platform seems to exhibit this exact behaviour). The setup code the ends up making a decision based on whatever the property pointer points to, which is likely to be junk. Instead, switch to the common of_pci.c implementation that doesn't suffer from this problem and ignore the property if the firmware couldn't make up its mind. Signed-off-by: Marc Zyngier --- arch/powerpc/platforms/pseries/setup.c | 14 ++ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index df6a704..8434953 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -40,6 +40,7 @@ #include #include #include +#include #include #include @@ -488,18 +489,7 @@ static void __init find_and_init_phbs(void) * PCI_PROBE_ONLY and PCI_REASSIGN_ALL_BUS can be set via properties * in chosen. */ - if (of_chosen) { - const int *prop; - - prop = of_get_property(of_chosen, - "linux,pci-probe-only", NULL); - if (prop) { - if (*prop) - pci_add_flags(PCI_PROBE_ONLY); - else - pci_clear_flags(PCI_PROBE_ONLY); - } - } + of_pci_check_probe_only(of_chosen); } static void __init pSeries_setup_arch(void) -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 4/4] arm64: dts: Drop linux, pci-probe-only from the Seattle DTS
The linux,pci-probe-only property mandates an argument to indicate whether or not to engage the "probe-only" mode, but the Seattle DTS just provides a naked property, which is illegal. Also, it turns out that the board is perfectly happy without probe-only, so let's drop this altogether. Signed-off-by: Marc Zyngier --- arch/arm64/boot/dts/amd/amd-overdrive.dts | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/arm64/boot/dts/amd/amd-overdrive.dts b/arch/arm64/boot/dts/amd/amd-overdrive.dts index 564a3f7..128fa94 100644 --- a/arch/arm64/boot/dts/amd/amd-overdrive.dts +++ b/arch/arm64/boot/dts/amd/amd-overdrive.dts @@ -14,7 +14,6 @@ chosen { stdout-path = &serial0; - linux,pci-probe-only; }; }; -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2 1/4] of/pci: Add of_pci_check_probe_only to parse "linux, pci-probe-only"
On 02/09/15 23:23, Bjorn Helgaas wrote: > On Fri, Aug 14, 2015 at 04:08:10PM -0500, Rob Herring wrote: >> On Fri, Aug 14, 2015 at 11:19 AM, Marc Zyngier wrote: >>> Both pci-host-generic and Pseries parse the "linux,pci-probe-only" >>> to engage the PCI_PROBE_ONLY mode, and both have a subtle bug that >>> can be triggered if the property has no parameter. >> >> Humm, I bet we could break a lot of machines if we fixed the core code >> to properly make pp->value NULL when there is no value. >> >>> Provide a generic implementation that can be used by both. >>> >>> Signed-off-by: Marc Zyngier >>> --- >>> drivers/of/of_pci.c| 30 ++ >>> include/linux/of_pci.h | 3 +++ >>> 2 files changed, 33 insertions(+) >>> >>> diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c >>> index 5751dc5..a4e29ff 100644 >>> --- a/drivers/of/of_pci.c >>> +++ b/drivers/of/of_pci.c >>> @@ -118,6 +118,36 @@ int of_get_pci_domain_nr(struct device_node *node) >>> EXPORT_SYMBOL_GPL(of_get_pci_domain_nr); >>> >>> /** >>> + * of_pci_check_probe_only - Setup probe only mode if linux,pci-probe-only >>> + * is present and valid >>> + * >>> + * @node: device tree node that may contain the property (usually "chosen") >>> + * >>> + */ >>> +void of_pci_check_probe_only(struct device_node *node) >>> +{ >>> + const int *prop; >>> + int len; >>> + >>> + if (!node) >>> + return; >>> + >>> + prop = of_get_property(node, "linux,pci-probe-only", &len); >> >> It is preferred to use of_property_read_u32 to avoid just these types >> of problems. > > I don't know enough OF to really understand this, but I infer that > this is a suggestion for improving the patch. Should I be waiting for > a v3 series? Yes, I'll post an update very shortly. Thanks for reminding me! M. -- Jazz is not dead. It just smells funny... ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3 1/4] of/pci: Add of_pci_check_probe_only to parse "linux, pci-probe-only"
Both pci-host-generic and Pseries parse the "linux,pci-probe-only" property to engage the PCI_PROBE_ONLY mode, and both have a subtle bug that can be triggered if the property has no parameter. Provide a generic, safe implementation that can be used by both. Signed-off-by: Marc Zyngier --- drivers/of/of_pci.c| 31 +++ include/linux/of_pci.h | 3 +++ 2 files changed, 34 insertions(+) diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c index 5751dc5..7876343 100644 --- a/drivers/of/of_pci.c +++ b/drivers/of/of_pci.c @@ -118,6 +118,37 @@ int of_get_pci_domain_nr(struct device_node *node) EXPORT_SYMBOL_GPL(of_get_pci_domain_nr); /** + * of_pci_check_probe_only - Setup probe only mode if linux,pci-probe-only + * is present and valid + * + * @node: device tree node that may contain the property (usually "chosen") + * + */ +void of_pci_check_probe_only(struct device_node *node) +{ + u32 val; + int ret; + + if (!node) + return; + + ret = of_property_read_u32(node, "linux,pci-probe-only", &val); + if (ret) { + if (ret == -ENODATA || ret == -EOVERFLOW) + pr_warn("linux,pci-probe-only without valid value, ignoring\n"); + return; + } + + if (val) + pci_add_flags(PCI_PROBE_ONLY); + else + pci_clear_flags(PCI_PROBE_ONLY); + + pr_info("PCI: PROBE_ONLY %sabled\n", val ? "en" : "dis"); +} +EXPORT_SYMBOL_GPL(of_pci_check_probe_only); + +/** * of_pci_dma_configure - Setup DMA configuration * @dev: ptr to pci_dev struct of the PCI device * diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h index 29fd3fe..4c0a617 100644 --- a/include/linux/of_pci.h +++ b/include/linux/of_pci.h @@ -17,6 +17,7 @@ int of_irq_parse_and_map_pci(const struct pci_dev *dev, u8 slot, u8 pin); int of_pci_parse_bus_range(struct device_node *node, struct resource *res); int of_get_pci_domain_nr(struct device_node *node); void of_pci_dma_configure(struct pci_dev *pci_dev); +void of_pci_check_probe_only(struct device_node *node); #else static inline int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *out_irq) { @@ -53,6 +54,8 @@ of_get_pci_domain_nr(struct device_node *node) } static inline void of_pci_dma_configure(struct pci_dev *pci_dev) { } + +static inline void of_pci_check_probe_only(struct device_node *node) { } #endif #if defined(CONFIG_OF_ADDRESS) -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3 0/4] PCI: arm64/powerpc: Fix parsing of linux, pci-probe-only
The pci-host-generic driver parses the linux,pci-probe-only property, and assumes that it will have a boolean parameter. Turns out that the Seattle DTS file has a naked "linux,pci-probe-only" property, which leads to the driver dereferencing some unsuspecting memory location. Nothing really bad happens (we end up reading some other bit of DT, fortunately), but that not a reason to keep it this way. Turns out that the Pseries code (where this code was lifted from) may suffer from the same issue. The first patch introduces a common (and fixed) version of that check that can be used by drivers and architectures that require it. The two following patches change the pci-host-generic driver and the powerpc code to use it. Finally, the bad property is removed from the Seatle DTS, because it is simply not necessary (it actually prevents me from using SR-IOV, which otherwise runs fine without the probe-only thing). This has been tested on the offending Seattle board. * From v2: - Use of_property_read_u32 to safely read the property (Rob) - Add a log message to indicate when we enable probe-only (probably quite useful for debugging) * From v1: - Consolidate the parsing in of_pci.c (Bjorn) Marc Zyngier (4): of/pci: Add of_pci_check_probe_only to parse "linux,pci-probe-only" PCI: pci-host-generic: Fix lookup of linux,pci-probe-only property powerpc: PCI: Fix lookup of linux,pci-probe-only property arm64: dts: Drop linux,pci-probe-only from the Seattle DTS arch/arm64/boot/dts/amd/amd-overdrive.dts | 1 - arch/powerpc/platforms/pseries/setup.c| 14 ++ drivers/of/of_pci.c | 31 +++ drivers/pci/host/pci-host-generic.c | 9 + include/linux/of_pci.h| 3 +++ 5 files changed, 37 insertions(+), 21 deletions(-) -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3 2/4] PCI: pci-host-generic: Fix lookup of linux, pci-probe-only property
When pci-host-generic looks for the probe-only property, it seems to trust the DT to be correctly written, and assumes that there is a parameter to the property. Unfortunately, this is not always the case, and some firmware expose this property naked. The driver ends up making a decision based on whatever the property pointer points to, which is likely to be junk. Switch to the common of_pci.c implementation that doesn't suffer from this problem. Signed-off-by: Marc Zyngier --- drivers/pci/host/pci-host-generic.c | 9 + 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/drivers/pci/host/pci-host-generic.c b/drivers/pci/host/pci-host-generic.c index 265dd25..545ff4e 100644 --- a/drivers/pci/host/pci-host-generic.c +++ b/drivers/pci/host/pci-host-generic.c @@ -210,7 +210,6 @@ static int gen_pci_probe(struct platform_device *pdev) int err; const char *type; const struct of_device_id *of_id; - const int *prop; struct device *dev = &pdev->dev; struct device_node *np = dev->of_node; struct gen_pci *pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL); @@ -225,13 +224,7 @@ static int gen_pci_probe(struct platform_device *pdev) return -EINVAL; } - prop = of_get_property(of_chosen, "linux,pci-probe-only", NULL); - if (prop) { - if (*prop) - pci_add_flags(PCI_PROBE_ONLY); - else - pci_clear_flags(PCI_PROBE_ONLY); - } + of_pci_check_probe_only(of_chosen); of_id = of_match_node(gen_pci_of_match, np); pci->cfg.ops = of_id->data; -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3 3/4] powerpc: PCI: Fix lookup of linux, pci-probe-only property
When find_and_init_phbs() looks for the probe-only property, it seems to trust the firmware to be correctly written, and assumes that there is a parameter to the property. It is conceivable that the firmware could not be that perfect, and it could expose this property naked (at least one arm64 platform seems to exhibit this exact behaviour). The setup code the ends up making a decision based on whatever the property pointer points to, which is likely to be junk. Instead, switch to the common of_pci.c implementation that doesn't suffer from this problem and ignore the property if the firmware couldn't make up its mind. Signed-off-by: Marc Zyngier --- arch/powerpc/platforms/pseries/setup.c | 14 ++ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 39a74fa..412531f 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -40,6 +40,7 @@ #include #include #include +#include #include #include @@ -495,18 +496,7 @@ static void __init find_and_init_phbs(void) * PCI_PROBE_ONLY and PCI_REASSIGN_ALL_BUS can be set via properties * in chosen. */ - if (of_chosen) { - const int *prop; - - prop = of_get_property(of_chosen, - "linux,pci-probe-only", NULL); - if (prop) { - if (*prop) - pci_add_flags(PCI_PROBE_ONLY); - else - pci_clear_flags(PCI_PROBE_ONLY); - } - } + of_pci_check_probe_only(of_chosen); } static void __init pSeries_setup_arch(void) -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3 4/4] arm64: dts: Drop linux, pci-probe-only from the Seattle DTS
The linux,pci-probe-only property mandates an argument to indicate whether or not to engage the "probe-only" mode, but the Seattle DTS just provides a naked property, which is illegal. Also, it turns out that the board is perfectly happy without probe-only, so let's drop this altogether. Signed-off-by: Marc Zyngier --- arch/arm64/boot/dts/amd/amd-overdrive.dts | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/arm64/boot/dts/amd/amd-overdrive.dts b/arch/arm64/boot/dts/amd/amd-overdrive.dts index 564a3f7..128fa94 100644 --- a/arch/arm64/boot/dts/amd/amd-overdrive.dts +++ b/arch/arm64/boot/dts/amd/amd-overdrive.dts @@ -14,7 +14,6 @@ chosen { stdout-path = &serial0; - linux,pci-probe-only; }; }; -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v4 0/4] PCI: arm64/powerpc: Fix parsing of linux, pci-probe-only
The pci-host-generic driver parses the linux,pci-probe-only property, and assumes that it will have a boolean parameter. Turns out that the Seattle DTS file has a naked "linux,pci-probe-only" property, which leads to the driver dereferencing some unsuspecting memory location. Nothing really bad happens (we end up reading some other bit of DT, fortunately), but that not a reason to keep it this way. Turns out that the Pseries code (where this code was lifted from) may suffer from the same issue. The first patch introduces a common (and fixed) version of that check that can be used by drivers and architectures that require it. The two following patches change the pci-host-generic driver and the powerpc code to use it. Finally, the bad property is removed from the Seatle DTS, because it is simply not necessary (it actually prevents me from using SR-IOV, which otherwise runs fine without the probe-only thing). This has been tested on the offending Seattle board. * From v3: - Restrict the property lookup to /chosen (Rob) - Acked-by on patch #4 from Suravee - I swear this is the last time I rework these patches! ;-) * From v2: - Use of_property_read_u32 to safely read the property (Rob) - Add a log message to indicate when we enable probe-only (probably quite useful for debugging) * From v1: - Consolidate the parsing in of_pci.c (Bjorn) Marc Zyngier (4): of/pci: Add of_pci_check_probe_only to parse "linux,pci-probe-only" PCI: pci-host-generic: Fix lookup of linux,pci-probe-only property powerpc: PCI: Fix lookup of linux,pci-probe-only property arm64: dts: Drop linux,pci-probe-only from the Seattle DTS arch/arm64/boot/dts/amd/amd-overdrive.dts | 1 - arch/powerpc/platforms/pseries/setup.c| 14 ++ drivers/of/of_pci.c | 28 drivers/pci/host/pci-host-generic.c | 9 + include/linux/of_pci.h| 3 +++ 5 files changed, 34 insertions(+), 21 deletions(-) -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v4 1/4] of/pci: Add of_pci_check_probe_only to parse "linux, pci-probe-only"
Both pci-host-generic and Pseries parse the "linux,pci-probe-only" property to engage the PCI_PROBE_ONLY mode, and both have a subtle bug that can be triggered if the property has no parameter. Provide a generic, safe implementation that can be used by both. Signed-off-by: Marc Zyngier --- drivers/of/of_pci.c| 28 include/linux/of_pci.h | 3 +++ 2 files changed, 31 insertions(+) diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c index 5751dc5..5dd4966 100644 --- a/drivers/of/of_pci.c +++ b/drivers/of/of_pci.c @@ -118,6 +118,34 @@ int of_get_pci_domain_nr(struct device_node *node) EXPORT_SYMBOL_GPL(of_get_pci_domain_nr); /** + * of_pci_check_probe_only - Setup probe only mode if linux,pci-probe-only + * is present and valid + * + * @node: device tree node that may contain the property (usually "chosen") + * + */ +void of_pci_check_probe_only(void) +{ + u32 val; + int ret; + + ret = of_property_read_u32(of_chosen, "linux,pci-probe-only", &val); + if (ret) { + if (ret == -ENODATA || ret == -EOVERFLOW) + pr_warn("linux,pci-probe-only without valid value, ignoring\n"); + return; + } + + if (val) + pci_add_flags(PCI_PROBE_ONLY); + else + pci_clear_flags(PCI_PROBE_ONLY); + + pr_info("PCI: PROBE_ONLY %sabled\n", val ? "en" : "dis"); +} +EXPORT_SYMBOL_GPL(of_pci_check_probe_only); + +/** * of_pci_dma_configure - Setup DMA configuration * @dev: ptr to pci_dev struct of the PCI device * diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h index 29fd3fe..38c0533 100644 --- a/include/linux/of_pci.h +++ b/include/linux/of_pci.h @@ -17,6 +17,7 @@ int of_irq_parse_and_map_pci(const struct pci_dev *dev, u8 slot, u8 pin); int of_pci_parse_bus_range(struct device_node *node, struct resource *res); int of_get_pci_domain_nr(struct device_node *node); void of_pci_dma_configure(struct pci_dev *pci_dev); +void of_pci_check_probe_only(void); #else static inline int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *out_irq) { @@ -53,6 +54,8 @@ of_get_pci_domain_nr(struct device_node *node) } static inline void of_pci_dma_configure(struct pci_dev *pci_dev) { } + +static inline void of_pci_check_probe_only(void) { } #endif #if defined(CONFIG_OF_ADDRESS) -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v4 2/4] PCI: pci-host-generic: Fix lookup of linux, pci-probe-only property
When pci-host-generic looks for the probe-only property, it seems to trust the DT to be correctly written, and assumes that there is a parameter to the property. Unfortunately, this is not always the case, and some firmware expose this property naked. The driver ends up making a decision based on whatever the property pointer points to, which is likely to be junk. Switch to the common of_pci.c implementation that doesn't suffer from this problem. Signed-off-by: Marc Zyngier --- drivers/pci/host/pci-host-generic.c | 9 + 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/drivers/pci/host/pci-host-generic.c b/drivers/pci/host/pci-host-generic.c index 265dd25..224303d 100644 --- a/drivers/pci/host/pci-host-generic.c +++ b/drivers/pci/host/pci-host-generic.c @@ -210,7 +210,6 @@ static int gen_pci_probe(struct platform_device *pdev) int err; const char *type; const struct of_device_id *of_id; - const int *prop; struct device *dev = &pdev->dev; struct device_node *np = dev->of_node; struct gen_pci *pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL); @@ -225,13 +224,7 @@ static int gen_pci_probe(struct platform_device *pdev) return -EINVAL; } - prop = of_get_property(of_chosen, "linux,pci-probe-only", NULL); - if (prop) { - if (*prop) - pci_add_flags(PCI_PROBE_ONLY); - else - pci_clear_flags(PCI_PROBE_ONLY); - } + of_pci_check_probe_only(); of_id = of_match_node(gen_pci_of_match, np); pci->cfg.ops = of_id->data; -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v4 3/4] powerpc: PCI: Fix lookup of linux, pci-probe-only property
When find_and_init_phbs() looks for the probe-only property, it seems to trust the firmware to be correctly written, and assumes that there is a parameter to the property. It is conceivable that the firmware could not be that perfect, and it could expose this property naked (at least one arm64 platform seems to exhibit this exact behaviour). The setup code the ends up making a decision based on whatever the property pointer points to, which is likely to be junk. Instead, switch to the common of_pci.c implementation that doesn't suffer from this problem and ignore the property if the firmware couldn't make up its mind. Signed-off-by: Marc Zyngier --- arch/powerpc/platforms/pseries/setup.c | 14 ++ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 39a74fa..6016709 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -40,6 +40,7 @@ #include #include #include +#include #include #include @@ -495,18 +496,7 @@ static void __init find_and_init_phbs(void) * PCI_PROBE_ONLY and PCI_REASSIGN_ALL_BUS can be set via properties * in chosen. */ - if (of_chosen) { - const int *prop; - - prop = of_get_property(of_chosen, - "linux,pci-probe-only", NULL); - if (prop) { - if (*prop) - pci_add_flags(PCI_PROBE_ONLY); - else - pci_clear_flags(PCI_PROBE_ONLY); - } - } + of_pci_check_probe_only(); } static void __init pSeries_setup_arch(void) -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v4 4/4] arm64: dts: Drop linux, pci-probe-only from the Seattle DTS
The linux,pci-probe-only property mandates an argument to indicate whether or not to engage the "probe-only" mode, but the Seattle DTS just provides a naked property, which is illegal. Also, it turns out that the board is perfectly happy without probe-only, so let's drop this altogether. Acked-by: Suravee Suthikulpanit Signed-off-by: Marc Zyngier --- arch/arm64/boot/dts/amd/amd-overdrive.dts | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/arm64/boot/dts/amd/amd-overdrive.dts b/arch/arm64/boot/dts/amd/amd-overdrive.dts index 564a3f7..128fa94 100644 --- a/arch/arm64/boot/dts/amd/amd-overdrive.dts +++ b/arch/arm64/boot/dts/amd/amd-overdrive.dts @@ -14,7 +14,6 @@ chosen { stdout-path = &serial0; - linux,pci-probe-only; }; }; -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v4 0/4] PCI: arm64/powerpc: Fix parsing of linux, pci-probe-only
On 17/09/15 16:30, Bjorn Helgaas wrote: > On Fri, Sep 04, 2015 at 05:50:07PM +0100, Marc Zyngier wrote: >> The pci-host-generic driver parses the linux,pci-probe-only property, >> and assumes that it will have a boolean parameter. >> >> Turns out that the Seattle DTS file has a naked "linux,pci-probe-only" >> property, which leads to the driver dereferencing some unsuspecting >> memory location. Nothing really bad happens (we end up reading some >> other bit of DT, fortunately), but that not a reason to keep it this >> way. Turns out that the Pseries code (where this code was lifted from) >> may suffer from the same issue. >> >> The first patch introduces a common (and fixed) version of that check >> that can be used by drivers and architectures that require it. The two >> following patches change the pci-host-generic driver and the powerpc >> code to use it. >> >> Finally, the bad property is removed from the Seatle DTS, because it >> is simply not necessary (it actually prevents me from using SR-IOV, >> which otherwise runs fine without the probe-only thing). >> >> This has been tested on the offending Seattle board. >> >> * From v3: >> - Restrict the property lookup to /chosen (Rob) >> - Acked-by on patch #4 from Suravee >> - I swear this is the last time I rework these patches! ;-) >> >> * From v2: >> - Use of_property_read_u32 to safely read the property (Rob) >> - Add a log message to indicate when we enable probe-only >> (probably quite useful for debugging) >> >> * From v1: >> - Consolidate the parsing in of_pci.c (Bjorn) >> >> Marc Zyngier (4): >> of/pci: Add of_pci_check_probe_only to parse "linux,pci-probe-only" >> PCI: pci-host-generic: Fix lookup of linux,pci-probe-only property >> powerpc: PCI: Fix lookup of linux,pci-probe-only property >> arm64: dts: Drop linux,pci-probe-only from the Seattle DTS >> >> arch/arm64/boot/dts/amd/amd-overdrive.dts | 1 - >> arch/powerpc/platforms/pseries/setup.c| 14 ++ >> drivers/of/of_pci.c | 28 >> drivers/pci/host/pci-host-generic.c | 9 + >> include/linux/of_pci.h| 3 +++ >> 5 files changed, 34 insertions(+), 21 deletions(-) > > Applied with the comment tweak and acks to pci/host-generic for v4.4, > thanks! Turns out that the 01.org infrastructure has picked up on a compilation bug with randconfig. The following patch seems to fix it and should be applied on the first patch: diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c index 485d625..2da5abc 100644 --- a/drivers/of/of_pci.c +++ b/drivers/of/of_pci.c @@ -6,6 +6,8 @@ #include #include +#include + static inline int __of_pci_pci_compare(struct device_node *node, unsigned int data) { Sorry for the annoyance. M. -- Jazz is not dead. It just smells funny... ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 03/35] docs: fix broken references to text files
On Wed, 8 Apr 2020 17:45:55 +0200 Mauro Carvalho Chehab wrote: > Several references got broken due to txt to ReST conversion. > > Several of them can be automatically fixed with: > > scripts/documentation-file-ref-check --fix > > Reviewed-by: Mathieu Poirier # > hwtracing/coresight/Kconfig > Signed-off-by: Mauro Carvalho Chehab > --- > Documentation/virt/kvm/arm/pvtime.rst| 2 +- > virt/kvm/arm/vgic/vgic-mmio-v3.c | 2 +- > virt/kvm/arm/vgic/vgic.h | 4 ++-- Acked-by: Marc Zyngier # kvm/arm64 M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v2] KVM: Optimize kvm_arch_vcpu_ioctl_run function
On 2020-04-16 08:03, Vitaly Kuznetsov wrote: Tianjia Zhang writes: In earlier versions of kvm, 'kvm_run' is an independent structure and is not included in the vcpu structure. At present, 'kvm_run' is already included in the vcpu structure, so the parameter 'kvm_run' is redundant. This patch simplify the function definition, removes the extra 'kvm_run' parameter, and extract it from the 'kvm_vcpu' structure if necessary. Signed-off-by: Tianjia Zhang --- v2 change: remove 'kvm_run' parameter and extract it from 'kvm_vcpu' arch/mips/kvm/mips.c | 3 ++- arch/powerpc/kvm/powerpc.c | 3 ++- arch/s390/kvm/kvm-s390.c | 3 ++- arch/x86/kvm/x86.c | 11 ++- include/linux/kvm_host.h | 2 +- virt/kvm/arm/arm.c | 6 +++--- virt/kvm/kvm_main.c| 2 +- 7 files changed, 17 insertions(+), 13 deletions(-) diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index 8f05dd0a0f4e..ec24adf4857e 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -439,8 +439,9 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, return -ENOIOCTLCMD; } -int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) { + struct kvm_run *run = vcpu->run; int r = -EINTR; vcpu_load(vcpu); diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index e15166b0a16d..7e24691e138a 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -1764,8 +1764,9 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) return r; } -int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) { + struct kvm_run *run = vcpu->run; int r; vcpu_load(vcpu); diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 19a81024fe16..443af3ead739 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -4333,8 +4333,9 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) store_regs_fmt2(vcpu, kvm_run); } -int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) { + struct kvm_run *kvm_run = vcpu->run; int rc; if (kvm_run->immediate_exit) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3bf2ecafd027..a0338e86c90f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8707,8 +8707,9 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) trace_kvm_fpu(0); } -int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) { + struct kvm_run *kvm_run = vcpu->run; int r; vcpu_load(vcpu); @@ -8726,18 +8727,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) r = -EAGAIN; if (signal_pending(current)) { r = -EINTR; - vcpu->run->exit_reason = KVM_EXIT_INTR; + kvm_run->exit_reason = KVM_EXIT_INTR; ++vcpu->stat.signal_exits; } goto out; } - if (vcpu->run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) { + if (kvm_run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) { r = -EINVAL; goto out; } - if (vcpu->run->kvm_dirty_regs) { + if (kvm_run->kvm_dirty_regs) { r = sync_regs(vcpu); if (r != 0) goto out; @@ -8767,7 +8768,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) out: kvm_put_guest_fpu(vcpu); - if (vcpu->run->kvm_valid_regs) + if (kvm_run->kvm_valid_regs) store_regs(vcpu); post_kvm_run_save(vcpu); kvm_sigset_deactivate(vcpu); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 6d58beb65454..1e17ef719595 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -866,7 +866,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, struct kvm_mp_state *mp_state); int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg); -int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run); +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu); int kvm_arch_init(void *opaque); void kvm_arch_exit(void); diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index 48d0ec44ad77..f5390ac2165b 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -639,7 +639,6 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu) /** * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code * @vcpu: The
Re: [PATCH v2] KVM: Optimize kvm_arch_vcpu_ioctl_run function
On 2020-04-16 09:45, Tianjia Zhang wrote: On 2020/4/16 16:28, Marc Zyngier wrote: [...] Overall, there is a large set of cleanups to be done when both the vcpu and the run structures are passed as parameters at the same time. Just grepping the tree for kvm_run is pretty instructive. M. Sorry, it's my mistake, I only compiled the x86 platform, I will submit patch again. Not a mistake. All I'm saying is that there is an opportunity for a larger series that cleans up the code base, rather than just doing a couple of localized changes. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v4 2/7] KVM: arm64: clean up redundant 'kvm_run' parameters
Hi Tianjia, On 2020-04-27 05:35, Tianjia Zhang wrote: In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu' structure. For historical reasons, many kvm-related function parameters retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This patch does a unified cleanup of these remaining redundant parameters. Signed-off-by: Tianjia Zhang On the face of it, this looks OK, but I haven't tried to run the resulting kernel. I'm not opposed to taking this patch *if* there is an agreement across architectures to take the series (I value consistency over the janitorial exercise). Another thing is that this is going to conflict with the set of patches that move the KVM/arm code back where it belongs (arch/arm64/kvm), so I'd probably cherry-pick that one directly. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
On 13/06/18 10:20, Thomas Gleixner wrote: > On Wed, 13 Jun 2018, Julien Thierry wrote: >> On 13/06/18 09:34, Peter Zijlstra wrote: >>> On Tue, Jun 12, 2018 at 05:57:23PM -0700, Ricardo Neri wrote: diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 5426627..dbc5e02 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -61,6 +61,8 @@ *interrupt handler after suspending interrupts. For system *wakeup devices users need to implement wakeup detection in *their interrupt handlers. + * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as non-maskable, if + *supported by the chip. */ >>> >>> NAK on the first 6 patches. You really _REALLY_ don't want to expose >>> NMIs to this level. >>> >> >> I've been working on something similar on arm64 side, and effectively the one >> thing that might be common to arm64 and intel is the interface to set an >> interrupt as NMI. So I guess it would be nice to agree on the right approach >> for this. >> >> The way I did it was by introducing a new irq_state and let the irqchip >> driver >> handle most of the work (if it supports that state): >> >> https://lkml.org/lkml/2018/5/25/181 >> >> This has not been ACKed nor NAKed. So I am just asking whether this is a more >> suitable approach, and if not, is there any suggestions on how to do this? > > I really didn't pay attention to that as it's burried in the GIC/ARM series > which is usually Marc's playground. I'm working my way through it ATM now that I have some brain cycles back. > Adding NMI delivery support at low level architecture irq chip level is > perfectly fine, but the exposure of that needs to be restricted very > much. Adding it to the generic interrupt control interfaces is not going to > happen. That's doomed to begin with and a complete abuse of the interface > as the handler can not ever be used for that. I can only agree with that. Allowing random driver to use request_irq() to make anything an NMI ultimately turns it into a complete mess ("hey, NMI is *faster*, let's use that"), and a potential source of horrible deadlocks. What I'd find more palatable is a way for an irqchip to be able to prioritize some interrupts based on a set of architecturally-defined requirements, and a separate NMI requesting/handling framework that is separate from the IRQ API, as the overall requirements are likely to completely different. It shouldn't have to be nearly as complex as the IRQ API, and require much stricter requirements in terms of what you can do there (flow handling should definitely be different). Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH 10/10] soc: qcom: convert to devm_platform_ioremap_resource
On Sat, 14 Dec 2019 17:54:47 + Yangtao Li wrote: > Use devm_platform_ioremap_resource() to simplify code. > > Signed-off-by: Yangtao Li > --- > drivers/soc/qcom/llcc-qcom.c| 7 +-- > drivers/soc/qcom/qcom-geni-se.c | 4 +--- > drivers/soc/qcom/qcom_aoss.c| 4 +--- > drivers/soc/qcom/qcom_gsbi.c| 5 + > drivers/soc/qcom/spm.c | 4 +--- > 5 files changed, 5 insertions(+), 19 deletions(-) > > diff --git a/drivers/soc/qcom/llcc-qcom.c b/drivers/soc/qcom/llcc-qcom.c > index 429b5a60a1ba..99e19df76889 100644 > --- a/drivers/soc/qcom/llcc-qcom.c > +++ b/drivers/soc/qcom/llcc-qcom.c > @@ -387,7 +387,6 @@ static int qcom_llcc_remove(struct platform_device *pdev) > static struct regmap *qcom_llcc_init_mmio(struct platform_device *pdev, > const char *name) > { > - struct resource *res; > void __iomem *base; > struct regmap_config llcc_regmap_config = { > .reg_bits = 32, > @@ -396,11 +395,7 @@ static struct regmap *qcom_llcc_init_mmio(struct > platform_device *pdev, > .fast_io = true, > }; > > - res = platform_get_resource_byname(pdev, IORESOURCE_MEM, name); > - if (!res) > - return ERR_PTR(-ENODEV); > - > - base = devm_ioremap_resource(&pdev->dev, res); > + base = devm_platform_ioremap_resource(pdev, 0); What guarantees do you have that entry 0 matches name? I find these changes pointless: they don't add much to the readability or maintainability of the code, and instead introduce creative bugs. M. -- Jazz is not dead. It just smells funny...
[PATCH] Make auto-loading of therm_pm72 possible
The therm_pm72 driver, used on the PowerMac G5 range, cannot be auto-loaded, since the driver itself creates both the device node and the driver instance. Moving the device node creation to the platform setup code and adding the necessary MODULE_DEVICE_TABLE() information allows the driver to be automatically loaded by udev on any semi-modern distribution. It "fixes" a major source of problem on G5 machines where the driver wasn't explicitely loaded by default, and the system would automatically shutdown under load. Tested on an Xserve G5. Signed-off-by: Marc Zyngier Cc: Benjamin Herrenschmidt --- arch/powerpc/platforms/powermac/setup.c |9 + drivers/macintosh/therm_pm72.c | 30 +- 2 files changed, 14 insertions(+), 25 deletions(-) diff --git a/arch/powerpc/platforms/powermac/setup.c b/arch/powerpc/platforms/powermac/setup.c index 9deb274..d5aceb7 100644 --- a/arch/powerpc/platforms/powermac/setup.c +++ b/arch/powerpc/platforms/powermac/setup.c @@ -506,6 +506,15 @@ static int __init pmac_declare_of_platform_devices(void) of_platform_device_create(np, "smu", NULL); of_node_put(np); } + np = of_find_node_by_type(NULL, "fcu"); + if (np == NULL) { + /* Some machines have strangely broken device-tree */ + np = of_find_node_by_path("/u...@0,f800/i...@f8001000/f...@15e"); + } + if (np) { + of_platform_device_create(np, "temperature", NULL); + of_node_put(np); + } return 0; } diff --git a/drivers/macintosh/therm_pm72.c b/drivers/macintosh/therm_pm72.c index 4454927..2e041fd 100644 --- a/drivers/macintosh/therm_pm72.c +++ b/drivers/macintosh/therm_pm72.c @@ -2213,6 +2213,9 @@ static void fcu_lookup_fans(struct device_node *fcu_node) static int fcu_of_probe(struct platform_device* dev, const struct of_device_id *match) { state = state_detached; + of_dev = dev; + + dev_info(&dev->dev, "PowerMac G5 Thermal control driver %s\n", VERSION); /* Lookup the fans in the device tree */ fcu_lookup_fans(dev->dev.of_node); @@ -2235,6 +2238,7 @@ static const struct of_device_id fcu_match[] = }, {}, }; +MODULE_DEVICE_TABLE(of, fcu_match); static struct of_platform_driver fcu_of_platform_driver = { @@ -2252,8 +2256,6 @@ static struct of_platform_driver fcu_of_platform_driver = */ static int __init therm_pm72_init(void) { - struct device_node *np; - rackmac = of_machine_is_compatible("RackMac3,1"); if (!of_machine_is_compatible("PowerMac7,2") && @@ -2261,34 +2263,12 @@ static int __init therm_pm72_init(void) !rackmac) return -ENODEV; - printk(KERN_INFO "PowerMac G5 Thermal control driver %s\n", VERSION); - - np = of_find_node_by_type(NULL, "fcu"); - if (np == NULL) { - /* Some machines have strangely broken device-tree */ - np = of_find_node_by_path("/u...@0,f800/i...@f8001000/f...@15e"); - if (np == NULL) { - printk(KERN_ERR "Can't find FCU in device-tree !\n"); - return -ENODEV; - } - } - of_dev = of_platform_device_create(np, "temperature", NULL); - if (of_dev == NULL) { - printk(KERN_ERR "Can't register FCU platform device !\n"); - return -ENODEV; - } - - of_register_platform_driver(&fcu_of_platform_driver); - - return 0; + return of_register_platform_driver(&fcu_of_platform_driver); } static void __exit therm_pm72_exit(void) { of_unregister_platform_driver(&fcu_of_platform_driver); - - if (of_dev) - of_device_unregister(of_dev); } module_init(therm_pm72_init); -- 1.7.2.3 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] Make auto-loading of therm_pm72 possible
The therm_pm72 driver, used on the PowerMac G5 range, cannot be auto-loaded, since the driver itself creates both the device node and the driver instance. Moving the device node creation to the platform setup code and adding the necessary MODULE_DEVICE_TABLE() information allows the driver to be automatically loaded by udev on any semi-modern distribution. It "fixes" a major source of problem on G5 machines where the driver wasn't explicitely loaded by default, and the system would automatically shutdown under load. Tested on an Xserve G5. Signed-off-by: Marc Zyngier Cc: Benjamin Herrenschmidt --- arch/powerpc/platforms/powermac/setup.c |9 + drivers/macintosh/therm_pm72.c | 30 +- 2 files changed, 14 insertions(+), 25 deletions(-) diff --git a/arch/powerpc/platforms/powermac/setup.c b/arch/powerpc/platforms/powermac/setup.c index 9deb274..d5aceb7 100644 --- a/arch/powerpc/platforms/powermac/setup.c +++ b/arch/powerpc/platforms/powermac/setup.c @@ -506,6 +506,15 @@ static int __init pmac_declare_of_platform_devices(void) of_platform_device_create(np, "smu", NULL); of_node_put(np); } + np = of_find_node_by_type(NULL, "fcu"); + if (np == NULL) { + /* Some machines have strangely broken device-tree */ + np = of_find_node_by_path("/u...@0,f800/i...@f8001000/f...@15e"); + } + if (np) { + of_platform_device_create(np, "temperature", NULL); + of_node_put(np); + } return 0; } diff --git a/drivers/macintosh/therm_pm72.c b/drivers/macintosh/therm_pm72.c index 4454927..2e041fd 100644 --- a/drivers/macintosh/therm_pm72.c +++ b/drivers/macintosh/therm_pm72.c @@ -2213,6 +2213,9 @@ static void fcu_lookup_fans(struct device_node *fcu_node) static int fcu_of_probe(struct platform_device* dev, const struct of_device_id *match) { state = state_detached; + of_dev = dev; + + dev_info(&dev->dev, "PowerMac G5 Thermal control driver %s\n", VERSION); /* Lookup the fans in the device tree */ fcu_lookup_fans(dev->dev.of_node); @@ -2235,6 +2238,7 @@ static const struct of_device_id fcu_match[] = }, {}, }; +MODULE_DEVICE_TABLE(of, fcu_match); static struct of_platform_driver fcu_of_platform_driver = { @@ -2252,8 +2256,6 @@ static struct of_platform_driver fcu_of_platform_driver = */ static int __init therm_pm72_init(void) { - struct device_node *np; - rackmac = of_machine_is_compatible("RackMac3,1"); if (!of_machine_is_compatible("PowerMac7,2") && @@ -2261,34 +2263,12 @@ static int __init therm_pm72_init(void) !rackmac) return -ENODEV; - printk(KERN_INFO "PowerMac G5 Thermal control driver %s\n", VERSION); - - np = of_find_node_by_type(NULL, "fcu"); - if (np == NULL) { - /* Some machines have strangely broken device-tree */ - np = of_find_node_by_path("/u...@0,f800/i...@f8001000/f...@15e"); - if (np == NULL) { - printk(KERN_ERR "Can't find FCU in device-tree !\n"); - return -ENODEV; - } - } - of_dev = of_platform_device_create(np, "temperature", NULL); - if (of_dev == NULL) { - printk(KERN_ERR "Can't register FCU platform device !\n"); - return -ENODEV; - } - - of_register_platform_driver(&fcu_of_platform_driver); - - return 0; + return of_register_platform_driver(&fcu_of_platform_driver); } static void __exit therm_pm72_exit(void) { of_unregister_platform_driver(&fcu_of_platform_driver); - - if (of_dev) - of_device_unregister(of_dev); } module_init(therm_pm72_init); -- 1.7.2.3 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: kvm/arm64: Spark benchmark
On Fri, 09 Jun 2023 01:59:35 +0100, Yu Zhao wrote: > > TLDR > > Apache Spark spent 12% less time sorting four billion random integers twenty > times (in ~4 hours) after this patchset [1]. Why are the 3 architectures you have considered being evaluated with 3 different benchmarks? I am not suspecting you to have cherry-picked the best results, but I'd really like to see a variety of benchmarks that exercise this stuff differently. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v4 12/12] KVM: remove KVM_REQ_UNHALT
On Wed, 21 Sep 2022 01:32:01 +0100, Sean Christopherson wrote: > > From: Paolo Bonzini > > KVM_REQ_UNHALT is now unnecessary because it is replaced by the return > value of kvm_vcpu_block/kvm_vcpu_halt. Remove it. > > No functional change intended. > > Signed-off-by: Paolo Bonzini > Signed-off-by: Sean Christopherson Acked-by: Marc Zyngier M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH] KVM: Get rid of return value from kvm_arch_create_vm_debugfs()
On Fri, 16 Feb 2024 15:59:41 +, Oliver Upton wrote: > > The general expectation with debugfs is that any initialization failure > is nonfatal. Nevertheless, kvm_arch_create_vm_debugfs() allows > implementations to return an error and kvm_create_vm_debugfs() allows > that to fail VM creation. > > Change to a void return to discourage architectures from making debugfs > failures fatal for the VM. Seems like everyone already had the right > idea, as all implementations already return 0 unconditionally. > > Signed-off-by: Oliver Upton Acked-by: Marc Zyngier M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH mm-unstable v1 3/5] kvm/arm64: add kvm_arch_test_clear_young()
On Fri, 17 Feb 2023 04:21:28 +, Yu Zhao wrote: > > On Thu, Feb 16, 2023 at 9:12 PM Yu Zhao wrote: > > > > This patch adds kvm_arch_test_clear_young() for the vast majority of > > VMs that are not pKVM and run on hardware that sets the accessed bit > > in KVM page tables. I'm really interested in how you can back this statement. 90% of the HW I have access to is not FEAT_HWAFDB capable, either because it predates the feature or because the feature is too buggy to be useful. Do you have numbers? > > > > It relies on two techniques, RCU and cmpxchg, to safely test and clear > > the accessed bit without taking the MMU lock. The former protects KVM > > page tables from being freed while the latter clears the accessed bit > > atomically against both the hardware and other software page table > > walkers. > > > > Signed-off-by: Yu Zhao > > --- > > arch/arm64/include/asm/kvm_host.h | 7 +++ > > arch/arm64/include/asm/kvm_pgtable.h| 8 +++ > > arch/arm64/include/asm/stage2_pgtable.h | 43 ++ > > arch/arm64/kvm/arm.c| 1 + > > arch/arm64/kvm/hyp/pgtable.c| 51 ++-- > > arch/arm64/kvm/mmu.c| 77 - > > 6 files changed, 141 insertions(+), 46 deletions(-) > > Adding Marc and Will. > > Can you please add other interested parties that I've missed? The MAINTAINERS file has it all: KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64) M: Marc Zyngier M: Oliver Upton R: James Morse R: Suzuki K Poulose R: Zenghui Yu L: kvm...@lists.linux.dev May I suggest that you repost your patch and Cc the interested parties yourself? I guess most folks will want to see this in context, and not as a random, isolated change with no rationale. M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH mm-unstable v1 3/5] kvm/arm64: add kvm_arch_test_clear_young()
On Thu, 23 Feb 2023 03:58:47 +, Yu Zhao wrote: > > On Fri, Feb 17, 2023 at 2:00 AM Marc Zyngier wrote: > > > > On Fri, 17 Feb 2023 04:21:28 +, > > Yu Zhao wrote: > > > > > > On Thu, Feb 16, 2023 at 9:12 PM Yu Zhao wrote: > > > > > > > > This patch adds kvm_arch_test_clear_young() for the vast majority of > > > > VMs that are not pKVM and run on hardware that sets the accessed bit > > > > in KVM page tables. > > > > I'm really interested in how you can back this statement. 90% of the > > HW I have access to is not FEAT_HWAFDB capable, either because it > > predates the feature or because the feature is too buggy to be useful. > > This is my expericen too -- most devices are pre v8.2. And yet you have no issue writing the above. Puzzling. > > > Do you have numbers? > > Let's do a quick market survey by segment. The following only applies > to ARM CPUs: > > 1. Phones: none of the major Android phone vendors sell phones running > VMs; no other major Linux phone vendors. Maybe you should have a reality check and look at what your own employer is shipping. > 2. Laptops: only a very limited number of Chromebooks run VMs, namely > ACRVM. No other major Linux laptop vendors. Again, your employer disagree. > 3. Desktops: no major Linux desktop vendors. My desktop disagree (I send this from my arm64 desktop VM ). > 4. Embedded/IoT/Router: no major Linux vendors run VMs (Android Auto > can be a VM guest on QNX host). This email is brought to you via a router VM on an arm64 box. > 5. Cloud: this is where the vast majority VMs come from. Among the > vendors available to the general public, Ampere is the biggest player. > Here [1] is a list of its customers. The A-bit works well even on its > EVT products (Neoverse cores). Just the phone stuff dwarfs the number of cloud hosts. Hopefully your patches are better than your market analysis... M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 0/4] KVM, mm: remove the .change_pte() MMU notifier and set_pte_at_notify()
On Fri, 05 Apr 2024 12:58:11 +0100, Paolo Bonzini wrote: > > The .change_pte() MMU notifier callback was intended as an optimization > and for this reason it was initially called without a surrounding > mmu_notifier_invalidate_range_{start,end}() pair. It was only ever > implemented by KVM (which was also the original user of MMU notifiers) > and the rules on when to call set_pte_at_notify() rather than set_pte_at() > have always been pretty obscure. > > It may seem a miracle that it has never caused any hard to trigger > bugs, but there's a good reason for that: KVM's implementation has > been nonfunctional for a good part of its existence. Already in > 2012, commit 6bdb913f0a70 ("mm: wrap calls to set_pte_at_notify with > invalidate_range_start and invalidate_range_end", 2012-10-09) changed the > .change_pte() callback to occur within an invalidate_range_start/end() > pair; and because KVM unmaps the sPTEs during .invalidate_range_start(), > .change_pte() has no hope of finding a sPTE to change. > > Therefore, all the code for .change_pte() can be removed from both KVM > and mm/, and set_pte_at_notify() can be replaced with just set_pte_at(). > > Please review! Also feel free to take the KVM patches through the mm > tree, as I don't expect any conflicts. > > Thanks, > > Paolo > > Paolo Bonzini (4): > KVM: delete .change_pte MMU notifier callback > KVM: remove unused argument of kvm_handle_hva_range() > mmu_notifier: remove the .change_pte() callback > mm: replace set_pte_at_notify() with just set_pte_at() > > arch/arm64/kvm/mmu.c | 34 - > arch/loongarch/include/asm/kvm_host.h | 1 - > arch/loongarch/kvm/mmu.c | 32 > arch/mips/kvm/mmu.c | 30 --- > arch/powerpc/include/asm/kvm_ppc.h| 1 - > arch/powerpc/kvm/book3s.c | 5 --- > arch/powerpc/kvm/book3s.h | 1 - > arch/powerpc/kvm/book3s_64_mmu_hv.c | 12 -- > arch/powerpc/kvm/book3s_hv.c | 1 - > arch/powerpc/kvm/book3s_pr.c | 7 > arch/powerpc/kvm/e500_mmu_host.c | 6 --- > arch/riscv/kvm/mmu.c | 20 -- > arch/x86/kvm/mmu/mmu.c| 54 +-- > arch/x86/kvm/mmu/spte.c | 16 > arch/x86/kvm/mmu/spte.h | 2 - > arch/x86/kvm/mmu/tdp_mmu.c| 46 --- > arch/x86/kvm/mmu/tdp_mmu.h| 1 - > include/linux/kvm_host.h | 2 - > include/linux/mmu_notifier.h | 44 -- > include/trace/events/kvm.h| 15 > kernel/events/uprobes.c | 5 +-- > mm/ksm.c | 4 +- > mm/memory.c | 7 +--- > mm/migrate_device.c | 8 +--- > mm/mmu_notifier.c | 17 - > virt/kvm/kvm_main.c | 50 + > 26 files changed, 10 insertions(+), 411 deletions(-) > Reviewed-by: Marc Zyngier M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback
On Fri, 12 Apr 2024 11:44:09 +0100, Will Deacon wrote: > > On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote: > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > > index dc04bc767865..ff17849be9f4 100644 > > --- a/arch/arm64/kvm/mmu.c > > +++ b/arch/arm64/kvm/mmu.c > > @@ -1768,40 +1768,6 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct > > kvm_gfn_range *range) > > return false; > > } > > > > -bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range) > > -{ > > - kvm_pfn_t pfn = pte_pfn(range->arg.pte); > > - > > - if (!kvm->arch.mmu.pgt) > > - return false; > > - > > - WARN_ON(range->end - range->start != 1); > > - > > - /* > > -* If the page isn't tagged, defer to user_mem_abort() for sanitising > > -* the MTE tags. The S2 pte should have been unmapped by > > -* mmu_notifier_invalidate_range_end(). > > -*/ > > - if (kvm_has_mte(kvm) && !page_mte_tagged(pfn_to_page(pfn))) > > - return false; > > - > > - /* > > -* We've moved a page around, probably through CoW, so let's treat > > -* it just like a translation fault and the map handler will clean > > -* the cache to the PoC. > > -* > > -* The MMU notifiers will have unmapped a huge PMD before calling > > -* ->change_pte() (which in turn calls kvm_set_spte_gfn()) and > > -* therefore we never need to clear out a huge PMD through this > > -* calling path and a memcache is not required. > > -*/ > > - kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT, > > - PAGE_SIZE, __pfn_to_phys(pfn), > > - KVM_PGTABLE_PROT_R, NULL, 0); > > - > > - return false; > > -} > > - > > bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) > > { > > u64 size = (range->end - range->start) << PAGE_SHIFT; > > Thanks. It's nice to see this code retire: > > Acked-by: Will Deacon > > Also, if you're in the business of hacking the MMU notifier code, it > would be really great to change the .clear_flush_young() callback so > that the architecture could handle the TLB invalidation. At the moment, > the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret' > being set by kvm_handle_hva_range(), whereas we could do a much > lighter-weight and targetted TLBI in the architecture page-table code > when we actually update the ptes for small ranges. Indeed, and I was looking at this earlier this week as it has a pretty devastating effect with NV (it blows the shadow S2 for that VMID, with costly consequences). In general, it feels like the TLB invalidation should stay with the code that deals with the page tables, as it has a pretty good idea of what needs to be invalidated and how -- specially on architectures that have a HW-broadcast facility like arm64. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback
On Fri, 12 Apr 2024 15:54:22 +0100, Sean Christopherson wrote: > > On Fri, Apr 12, 2024, Marc Zyngier wrote: > > On Fri, 12 Apr 2024 11:44:09 +0100, Will Deacon wrote: > > > On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote: > > > Also, if you're in the business of hacking the MMU notifier code, it > > > would be really great to change the .clear_flush_young() callback so > > > that the architecture could handle the TLB invalidation. At the moment, > > > the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret' > > > being set by kvm_handle_hva_range(), whereas we could do a much > > > lighter-weight and targetted TLBI in the architecture page-table code > > > when we actually update the ptes for small ranges. > > > > Indeed, and I was looking at this earlier this week as it has a pretty > > devastating effect with NV (it blows the shadow S2 for that VMID, with > > costly consequences). > > > > In general, it feels like the TLB invalidation should stay with the > > code that deals with the page tables, as it has a pretty good idea of > > what needs to be invalidated and how -- specially on architectures > > that have a HW-broadcast facility like arm64. > > Would this be roughly on par with an in-line flush on arm64? The simpler, > more > straightforward solution would be to let architectures override flush_on_ret, > but I would prefer something like the below as x86 can also utilize a > range-based > flush when running as a nested hypervisor. > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index ff0a20565f90..b65116294efe 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -601,6 +601,7 @@ static __always_inline kvm_mn_ret_t > __kvm_handle_hva_range(struct kvm *kvm, > struct kvm_gfn_range gfn_range; > struct kvm_memory_slot *slot; > struct kvm_memslots *slots; > + bool need_flush = false; > int i, idx; > > if (WARN_ON_ONCE(range->end <= range->start)) > @@ -653,10 +654,22 @@ static __always_inline kvm_mn_ret_t > __kvm_handle_hva_range(struct kvm *kvm, > break; > } > r.ret |= range->handler(kvm, &gfn_range); > + > + /* > +* Use a precise gfn-based TLB flush when possible, as > +* most mmu_notifier events affect a small-ish range. > +* Fall back to a full TLB flush if the gfn-based > flush > +* fails, and don't bother trying the gfn-based flush > +* if a full flush is already pending. > +*/ > + if (range->flush_on_ret && !need_flush && r.ret && > + kvm_arch_flush_remote_tlbs_range(kvm, > gfn_range.start > +gfn_range.end - > gfn_range.start + 1)) > + need_flush = true; > } > } > > - if (range->flush_on_ret && r.ret) > + if (need_flush) > kvm_flush_remote_tlbs(kvm); > > if (r.found_memslot) I think this works for us on HW that has range invalidation, which would already be a positive move. For the lesser HW that isn't range capable, it also gives the opportunity to perform the iteration ourselves or go for the nuclear option if the range is larger than some arbitrary constant (though this is additional work). But this still considers the whole range as being affected by range->handler(). It'd be interesting to try and see whether more precise tracking is (or isn't) generally beneficial. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [RFC PATCH kernel 1/2] irq: Add reference counting to IRQ mappings
Hi Alexey, On 2020-10-27 09:06, Alexey Kardashevskiy wrote: PCI devices share 4 legacy INTx interrupts from the same PCI host bridge. Device drivers map/unmap hardware interrupts via irq_create_mapping()/ irq_dispose_mapping(). The problem with that these interrupts are shared and when performing hot unplug, we need to unmap the interrupt only when the last device is released. This reuses already existing irq_desc::kobj for this purpose. The refcounter is naturally 1 when the descriptor is allocated already; this adds kobject_get() in places where already existing mapped virq is returned. That's quite interesting, as I was about to revive a patch series that rework the irqdomain subsystem to directly cache irq_desc instead of raw interrupt numbers. And for that, I needed some form of refcounting... This reorganizes irq_dispose_mapping() to release the kobj and let the release callback do the cleanup. If some driver or platform does its own reference counting, this expects those parties to call irq_find_mapping() and call irq_dispose_mapping() for every irq_create_fwspec_mapping()/irq_create_mapping(). Signed-off-by: Alexey Kardashevskiy --- kernel/irq/irqdesc.c | 35 +++ kernel/irq/irqdomain.c | 27 +-- 2 files changed, 36 insertions(+), 26 deletions(-) diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c index 1a7723604399..dae096238500 100644 --- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -419,20 +419,39 @@ static struct irq_desc *alloc_desc(int irq, int node, unsigned int flags, return NULL; } +static void delayed_free_desc(struct rcu_head *rhp); static void irq_kobj_release(struct kobject *kobj) { struct irq_desc *desc = container_of(kobj, struct irq_desc, kobj); + struct irq_domain *domain; + unsigned int virq = desc->irq_data.irq; - free_masks(desc); - free_percpu(desc->kstat_irqs); - kfree(desc); + domain = desc->irq_data.domain; + if (domain) { + if (irq_domain_is_hierarchy(domain)) { + irq_domain_free_irqs(virq, 1); How does this work with hierarchical domains? Each domain should contribute as a reference on the irq_desc. But if you got here, it means the refcount has already dropped to 0. So either there is nothing to free here, or you don't track the references implied by the hierarchy. I suspect the latter. + } else { + irq_domain_disassociate(domain, virq); + irq_free_desc(virq); + } + } + + /* +* We free the descriptor, masks and stat fields via RCU. That +* allows demultiplex interrupts to do rcu based management of +* the child interrupts. +* This also allows us to use rcu in kstat_irqs_usr(). +*/ + call_rcu(&desc->rcu, delayed_free_desc); } static void delayed_free_desc(struct rcu_head *rhp) { struct irq_desc *desc = container_of(rhp, struct irq_desc, rcu); - kobject_put(&desc->kobj); + free_masks(desc); + free_percpu(desc->kstat_irqs); + kfree(desc); } static void free_desc(unsigned int irq) @@ -453,14 +472,6 @@ static void free_desc(unsigned int irq) */ irq_sysfs_del(desc); delete_irq_desc(irq); - - /* -* We free the descriptor, masks and stat fields via RCU. That -* allows demultiplex interrupts to do rcu based management of -* the child interrupts. -* This also allows us to use rcu in kstat_irqs_usr(). -*/ - call_rcu(&desc->rcu, delayed_free_desc); } static int alloc_descs(unsigned int start, unsigned int cnt, int node, diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index cf8b374b892d..02733ddc321f 100644 --- a/kernel/irq/irqdomain.c +++ b/kernel/irq/irqdomain.c @@ -638,6 +638,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain, { struct device_node *of_node; int virq; + struct irq_desc *desc; pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq); @@ -655,7 +656,9 @@ unsigned int irq_create_mapping(struct irq_domain *domain, /* Check if mapping already exists */ virq = irq_find_mapping(domain, hwirq); if (virq) { + desc = irq_to_desc(virq); pr_debug("-> existing mapping on virq %d\n", virq); + kobject_get(&desc->kobj); My worry with this is that there is probably a significant amount of code out there that relies on multiple calls to irq_create_mapping() with the same parameters not to have any side effects. They would expect a subsequent irq_dispose_mapping() to drop the translation altogether, and that's obviously not the case here. Have you audited the various call sites to see what could break? return virq; } @@ -751,6 +754,7 @@ unsigned int irq_create_fwspec_mapping(struct irq_fws
Re: [PATCH AUTOSEL 4.14 3/4] powerpc/msi: Fix deassociation of MSI descriptors
On Tue, 27 Dec 2022 20:36:08 +, Sasha Levin wrote: > > From: Marc Zyngier > > [ Upstream commit 4545c6a3d6ba71747eaa984c338ddd745e56e23f ] > > Since 2f2940d16823 ("genirq/msi: Remove filter from > msi_free_descs_free_range()"), the core MSI code relies on the > msi_desc->irq field to have been cleared before the descriptor > can be freed, as it indicates that there is no association with > a device anymore. > > The irq domain code provides this guarantee, and so does s390, > which is one of the two architectures not using irq domains for > MSIs. > > Powerpc, however, is missing this particular requirements, > leading in a splat and leaked MSI descriptors. > > Adding the now required irq reset to the handful of powerpc backends > that implement MSIs fixes that particular problem. > > Reported-by: Guenter Roeck > Signed-off-by: Marc Zyngier > Link: > https://lore.kernel.org/r/70dab88e-6119-0c12-7c6a-61bcbe239...@roeck-us.net > Signed-off-by: Sasha Levin > --- > arch/powerpc/platforms/4xx/hsta_msi.c | 1 + > arch/powerpc/platforms/cell/axon_msi.c | 1 + > arch/powerpc/platforms/pasemi/msi.c| 1 + > arch/powerpc/sysdev/fsl_msi.c | 1 + > arch/powerpc/sysdev/mpic_u3msi.c | 1 + > 5 files changed, 5 insertions(+) Please drop this patch from all stable branches. It isn't needed before 6.2, and doesn't fix anything on its own as nobody uses this structure after this point. M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v2 00/50] KVM: Rework kvm_init() and hardware enabling
On 2022-12-27 13:02, Paolo Bonzini wrote: Queued, thanks. I will leave this in kvm/queue after testing everything else and moving it to kvm/next; this way, we can wait for test results on other architectures. Can you please make this a topic branch, and if possible based on a released -rc? It would make it a lot easier for everyone. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: Call Trace and scary messages on kernel 2.6.27-rc5
On Mon, 8 Sep 2008 21:47:29 -0300 Rogério Brito <[EMAIL PROTECTED]> wrote: > Hi there. > > I just compiled a new 2.6.27-rc5 kernel for my standard Kurobox (an > embedded NAS that has an MPC8241 CPU, if I'm not mistaken) and upon > booting, I get these scary messages and Call Traces: [snip] This is a known problem. Check patch bb23b431db7405f6d79f989ad0236bf6428ba1cb in Linus' tree, it should correct the traces you see. Regards, M. -- And if you don't know where you're going, any road will take you there. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH 2/9] ARM: PXA: Kill use of irq_create_strict_mappings()
irq_create_strict_mappings() is a poor way to allow the use of a linear IRQ domain as a legacy one. Let's be upfront about it and use a legacy domain when appropriate. Signed-off-by: Marc Zyngier --- arch/arm/mach-pxa/pxa_cplds_irqs.c | 24 +++- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/arch/arm/mach-pxa/pxa_cplds_irqs.c b/arch/arm/mach-pxa/pxa_cplds_irqs.c index 45c19ca96f7a..ec0d9b094744 100644 --- a/arch/arm/mach-pxa/pxa_cplds_irqs.c +++ b/arch/arm/mach-pxa/pxa_cplds_irqs.c @@ -147,22 +147,20 @@ static int cplds_probe(struct platform_device *pdev) } irq_set_irq_wake(fpga->irq, 1); - fpga->irqdomain = irq_domain_add_linear(pdev->dev.of_node, - CPLDS_NB_IRQ, - &cplds_irq_domain_ops, fpga); + if (base_irq) + fpga->irqdomain = irq_domain_add_legacy(pdev->dev.of_node, + CPLDS_NB_IRQ, + base_irq, 0, + &cplds_irq_domain_ops, + fpga); + else + fpga->irqdomain = irq_domain_add_linear(pdev->dev.of_node, + CPLDS_NB_IRQ, + &cplds_irq_domain_ops, + fpga); if (!fpga->irqdomain) return -ENODEV; - if (base_irq) { - ret = irq_create_strict_mappings(fpga->irqdomain, base_irq, 0, -CPLDS_NB_IRQ); - if (ret) { - dev_err(&pdev->dev, "couldn't create the irq mapping %d..%d\n", - base_irq, base_irq + CPLDS_NB_IRQ); - return ret; - } - } - return 0; } -- 2.29.2
[PATCH 3/9] irqchip/jcore-aic: Kill use of irq_create_strict_mappings()
irq_create_strict_mappings() is a poor way to allow the use of a linear IRQ domain as a legacy one. Let's be upfront about it. Signed-off-by: Marc Zyngier --- drivers/irqchip/irq-jcore-aic.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/irqchip/irq-jcore-aic.c b/drivers/irqchip/irq-jcore-aic.c index 033bccb41455..5f47d8ee4ae3 100644 --- a/drivers/irqchip/irq-jcore-aic.c +++ b/drivers/irqchip/irq-jcore-aic.c @@ -100,11 +100,11 @@ static int __init aic_irq_of_init(struct device_node *node, jcore_aic.irq_unmask = noop; jcore_aic.name = "AIC"; - domain = irq_domain_add_linear(node, dom_sz, &jcore_aic_irqdomain_ops, + domain = irq_domain_add_legacy(node, dom_sz - min_irq, min_irq, min_irq, + &jcore_aic_irqdomain_ops, &jcore_aic); if (!domain) return -ENOMEM; - irq_create_strict_mappings(domain, min_irq, min_irq, dom_sz - min_irq); return 0; } -- 2.29.2
[PATCH 1/9] irqdomain: Reimplement irq_linear_revmap() with irq_find_mapping()
irq_linear_revmap() is supposed to be a fast path for domain lookups, but it only exposes low-level details of the irqdomain implementation, details which are better kept private. The *overhead* between the two is only a function call and a couple of tests, so it is likely that noone can show any meaningful difference compared to the cost of taking an interrupt. Reimplement irq_linear_revmap() with irq_find_mapping() in order to preserve source code compatibility, and rename the internal field for a measure. Signed-off-by: Marc Zyngier --- include/linux/irqdomain.h | 22 +- kernel/irq/irqdomain.c| 6 +++--- 2 files changed, 12 insertions(+), 16 deletions(-) diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h index 33cacc8af26d..b9600f24878a 100644 --- a/include/linux/irqdomain.h +++ b/include/linux/irqdomain.h @@ -154,9 +154,9 @@ struct irq_domain_chip_generic; * Revmap data, used internally by irq_domain * @revmap_direct_max_irq: The largest hwirq that can be set for controllers that * support direct mapping - * @revmap_size: Size of the linear map table @linear_revmap[] + * @revmap_size: Size of the linear map table @revmap[] * @revmap_tree: Radix map tree for hwirqs that don't fit in the linear map - * @linear_revmap: Linear table of hwirq->virq reverse mappings + * @revmap: Linear table of hwirq->virq reverse mappings */ struct irq_domain { struct list_head link; @@ -180,7 +180,7 @@ struct irq_domain { unsigned int revmap_size; struct radix_tree_root revmap_tree; struct mutex revmap_tree_mutex; - unsigned int linear_revmap[]; + unsigned int revmap[]; }; /* Irq domain flags */ @@ -396,24 +396,20 @@ static inline unsigned int irq_create_mapping(struct irq_domain *host, return irq_create_mapping_affinity(host, hwirq, NULL); } - /** - * irq_linear_revmap() - Find a linux irq from a hw irq number. + * irq_find_mapping() - Find a linux irq from a hw irq number. * @domain: domain owning this hardware interrupt * @hwirq: hardware irq number in that domain space - * - * This is a fast path alternative to irq_find_mapping() that can be - * called directly by irq controller code to save a handful of - * instructions. It is always safe to call, but won't find irqs mapped - * using the radix tree. */ +extern unsigned int irq_find_mapping(struct irq_domain *host, +irq_hw_number_t hwirq); + static inline unsigned int irq_linear_revmap(struct irq_domain *domain, irq_hw_number_t hwirq) { - return hwirq < domain->revmap_size ? domain->linear_revmap[hwirq] : 0; + return irq_find_mapping(domain, hwirq); } -extern unsigned int irq_find_mapping(struct irq_domain *host, -irq_hw_number_t hwirq); + extern unsigned int irq_create_direct_mapping(struct irq_domain *host); extern int irq_create_strict_mappings(struct irq_domain *domain, unsigned int irq_base, diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index d10ab1d689d5..dfa716305ea9 100644 --- a/kernel/irq/irqdomain.c +++ b/kernel/irq/irqdomain.c @@ -486,7 +486,7 @@ static void irq_domain_clear_mapping(struct irq_domain *domain, irq_hw_number_t hwirq) { if (hwirq < domain->revmap_size) { - domain->linear_revmap[hwirq] = 0; + domain->revmap[hwirq] = 0; } else { mutex_lock(&domain->revmap_tree_mutex); radix_tree_delete(&domain->revmap_tree, hwirq); @@ -499,7 +499,7 @@ static void irq_domain_set_mapping(struct irq_domain *domain, struct irq_data *irq_data) { if (hwirq < domain->revmap_size) { - domain->linear_revmap[hwirq] = irq_data->irq; + domain->revmap[hwirq] = irq_data->irq; } else { mutex_lock(&domain->revmap_tree_mutex); radix_tree_insert(&domain->revmap_tree, hwirq, irq_data); @@ -920,7 +920,7 @@ unsigned int irq_find_mapping(struct irq_domain *domain, /* Check if the hwirq is in the linear revmap. */ if (hwirq < domain->revmap_size) - return domain->linear_revmap[hwirq]; + return domain->revmap[hwirq]; rcu_read_lock(); data = radix_tree_lookup(&domain->revmap_tree, hwirq); -- 2.29.2
[PATCH 0/9] Cleaning up some of the irqdomain features
The irqdomain subsystem has grown quite a lot over the years, and some of its features are either oddly used or just pretty useless. Some other helpers expose internals that are likely to change soon. Here are the bits that I'm trying to get rid of: - irq_linear_revmap exposes the internals of the domains, and only works for linear domains. The supposed speed improvement really isn't an argument, as it gets in the way of more significant optimisations. Reimplemented in terms of irq_find_mapping, which always works, and will eventually go at some point. - irq_create_strict_mappings is just a way to constraint the allocation of irqdescs into a given range, which is better served by creating a legacy irqdomain, and shows that the platform really needs to catch up with the 21st century. - irq_create_identity mapping is just a variation on the above, with irq==hwirq, although the way it is used is a gross hack in the SH code that needs to go. - irq_domain_add_legacy_isa is, as the names shows, a variation on irq_domain_add_legacy with a reservation of 16 interrupts. This is only used in the PPC code. The patches address all of the above, touching some of the ARM, PPC, and SH code that is affected. Another couple of patches address a MIPS platform that could use the generic code, and clean some of the comments in the irqdomain code. Unless anyone shouts, I'd like to take this into 5.13, as it is the basis of some further (and deeper) changes in the way irqdomains work. M. Marc Zyngier (9): irqdomain: Reimplement irq_linear_revmap() with irq_find_mapping() ARM: PXA: Kill use of irq_create_strict_mappings() irqchip/jcore-aic: Kill use of irq_create_strict_mappings() sh: intc: Drop the use of irq_create_identity_mapping() irqdomain: Kill irq_create_strict_mappings()/irq_create_identity_mapping() mips: netlogic: Use irq_domain_simple_ops for XLP PIC irqdomain: Drop references to recusive irqdomain setup powerpc: Convert irq_domain_add_legacy_isa use to irq_domain_add_legacy irqdomain: Kill irq_domain_add_legacy_isa Documentation/core-api/irq/irq-domain.rst | 1 - arch/arm/mach-pxa/pxa_cplds_irqs.c| 24 +-- arch/mips/netlogic/common/irq.c | 6 +-- arch/powerpc/include/asm/irq.h| 4 +- arch/powerpc/platforms/ps3/interrupt.c| 4 +- arch/powerpc/sysdev/i8259.c | 3 +- arch/powerpc/sysdev/mpic.c| 2 +- arch/powerpc/sysdev/tsi108_pci.c | 3 +- arch/powerpc/sysdev/xics/xics-common.c| 2 +- drivers/irqchip/irq-jcore-aic.c | 4 +- drivers/sh/intc/core.c| 50 ++- include/linux/irqdomain.h | 42 --- kernel/irq/irqdomain.c| 49 +++--- 13 files changed, 59 insertions(+), 135 deletions(-) -- 2.29.2
[PATCH 4/9] sh: intc: Drop the use of irq_create_identity_mapping()
Instead of playing games with using irq_create_identity_mapping() and irq_domain_associate(), drop the use of the former and only use the latter, together with the allocation of the irq_desc as needed. It doesn't make the code less awful, but at least the intent is clearer. Signed-off-by: Marc Zyngier --- drivers/sh/intc/core.c | 50 ++ 1 file changed, 21 insertions(+), 29 deletions(-) diff --git a/drivers/sh/intc/core.c b/drivers/sh/intc/core.c index a14684ffe4c1..6c57ee1ce6c4 100644 --- a/drivers/sh/intc/core.c +++ b/drivers/sh/intc/core.c @@ -179,6 +179,23 @@ static unsigned int __init save_reg(struct intc_desc_int *d, return 0; } +static bool __init intc_map(struct irq_domain *domain, int irq) +{ + int res; + + if (!irq_to_desc(irq) && irq_alloc_desc_at(irq, NUMA_NO_NODE) != irq) { + pr_err("uname to allocate IRQ %d\n", irq); + return false; + } + + if (irq_domain_associate(domain, irq, irq)) { + pr_err("domain association failure\n"); + return false; + } + + return true; +} + int __init register_intc_controller(struct intc_desc *desc) { unsigned int i, k, smp; @@ -316,19 +333,8 @@ int __init register_intc_controller(struct intc_desc *desc) if (!vect->enum_id) continue; - res = irq_create_identity_mapping(d->domain, irq); - if (unlikely(res)) { - if (res == -EEXIST) { - res = irq_domain_associate(d->domain, irq, irq); - if (unlikely(res)) { - pr_err("domain association failure\n"); - continue; - } - } else { - pr_err("can't identity map IRQ %d\n", irq); - continue; - } - } + if (!intc_map(d->domain, irq)) + continue; intc_irq_xlate_set(irq, vect->enum_id, d); intc_register_irq(desc, d, vect->enum_id, irq); @@ -345,22 +351,8 @@ int __init register_intc_controller(struct intc_desc *desc) * IRQ support, each vector still needs to have * its own backing irq_desc. */ - res = irq_create_identity_mapping(d->domain, irq2); - if (unlikely(res)) { - if (res == -EEXIST) { - res = irq_domain_associate(d->domain, - irq2, irq2); - if (unlikely(res)) { - pr_err("domain association " - "failure\n"); - continue; - } - } else { - pr_err("can't identity map IRQ %d\n", - irq); - continue; - } - } + if (!intc_map(d->domain, irq2)) + continue; vect2->enum_id = 0; -- 2.29.2
[PATCH 6/9] mips: netlogic: Use irq_domain_simple_ops for XLP PIC
Use the generic irq_domain_simple_ops structure instead of a home-grown one. Signed-off-by: Marc Zyngier --- arch/mips/netlogic/common/irq.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/arch/mips/netlogic/common/irq.c b/arch/mips/netlogic/common/irq.c index cf33dd8a487e..c25a2ce5e29f 100644 --- a/arch/mips/netlogic/common/irq.c +++ b/arch/mips/netlogic/common/irq.c @@ -276,10 +276,6 @@ asmlinkage void plat_irq_dispatch(void) } #ifdef CONFIG_CPU_XLP -static const struct irq_domain_ops xlp_pic_irq_domain_ops = { - .xlate = irq_domain_xlate_onetwocell, -}; - static int __init xlp_of_pic_init(struct device_node *node, struct device_node *parent) { @@ -324,7 +320,7 @@ static int __init xlp_of_pic_init(struct device_node *node, xlp_pic_domain = irq_domain_add_legacy(node, n_picirqs, nlm_irq_to_xirq(socid, PIC_IRQ_BASE), PIC_IRQ_BASE, - &xlp_pic_irq_domain_ops, NULL); + &irq_domain_simple_ops, NULL); if (xlp_pic_domain == NULL) { pr_err("PIC %pOFn: Creating legacy domain failed!\n", node); return -EINVAL; -- 2.29.2
[PATCH 7/9] irqdomain: Drop references to recusive irqdomain setup
It was never completely implemented, and was removed a long time ago. Adjust the documentation to reflect this. Signed-off-by: Marc Zyngier --- kernel/irq/irqdomain.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index 0ba761e6b0a8..89474aa88220 100644 --- a/kernel/irq/irqdomain.c +++ b/kernel/irq/irqdomain.c @@ -1659,12 +1659,10 @@ void irq_domain_free_irqs(unsigned int virq, unsigned int nr_irqs) /** * irq_domain_alloc_irqs_parent - Allocate interrupts from parent domain + * @domain:Domain below which interrupts must be allocated * @irq_base: Base IRQ number * @nr_irqs: Number of IRQs to allocate * @arg: Allocation data (arch/domain specific) - * - * Check whether the domain has been setup recursive. If not allocate - * through the parent domain. */ int irq_domain_alloc_irqs_parent(struct irq_domain *domain, unsigned int irq_base, unsigned int nr_irqs, @@ -1680,11 +1678,9 @@ EXPORT_SYMBOL_GPL(irq_domain_alloc_irqs_parent); /** * irq_domain_free_irqs_parent - Free interrupts from parent domain + * @domain:Domain below which interrupts must be freed * @irq_base: Base IRQ number * @nr_irqs: Number of IRQs to free - * - * Check whether the domain has been setup recursive. If not free - * through the parent domain. */ void irq_domain_free_irqs_parent(struct irq_domain *domain, unsigned int irq_base, unsigned int nr_irqs) -- 2.29.2
[PATCH 5/9] irqdomain: Kill irq_create_strict_mappings()/irq_create_identity_mapping()
No user of these APIs are left, remove them. Signed-off-by: Marc Zyngier --- include/linux/irqdomain.h | 9 - kernel/irq/irqdomain.c| 35 --- 2 files changed, 44 deletions(-) diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h index b9600f24878a..3997ed9e4d7d 100644 --- a/include/linux/irqdomain.h +++ b/include/linux/irqdomain.h @@ -411,15 +411,6 @@ static inline unsigned int irq_linear_revmap(struct irq_domain *domain, } extern unsigned int irq_create_direct_mapping(struct irq_domain *host); -extern int irq_create_strict_mappings(struct irq_domain *domain, - unsigned int irq_base, - irq_hw_number_t hwirq_base, int count); - -static inline int irq_create_identity_mapping(struct irq_domain *host, - irq_hw_number_t hwirq) -{ - return irq_create_strict_mappings(host, hwirq, hwirq, 1); -} extern const struct irq_domain_ops irq_domain_simple_ops; diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index dfa716305ea9..0ba761e6b0a8 100644 --- a/kernel/irq/irqdomain.c +++ b/kernel/irq/irqdomain.c @@ -703,41 +703,6 @@ unsigned int irq_create_mapping_affinity(struct irq_domain *domain, } EXPORT_SYMBOL_GPL(irq_create_mapping_affinity); -/** - * irq_create_strict_mappings() - Map a range of hw irqs to fixed linux irqs - * @domain: domain owning the interrupt range - * @irq_base: beginning of linux IRQ range - * @hwirq_base: beginning of hardware IRQ range - * @count: Number of interrupts to map - * - * This routine is used for allocating and mapping a range of hardware - * irqs to linux irqs where the linux irq numbers are at pre-defined - * locations. For use by controllers that already have static mappings - * to insert in to the domain. - * - * Non-linear users can use irq_create_identity_mapping() for IRQ-at-a-time - * domain insertion. - * - * 0 is returned upon success, while any failure to establish a static - * mapping is treated as an error. - */ -int irq_create_strict_mappings(struct irq_domain *domain, unsigned int irq_base, - irq_hw_number_t hwirq_base, int count) -{ - struct device_node *of_node; - int ret; - - of_node = irq_domain_get_of_node(domain); - ret = irq_alloc_descs(irq_base, irq_base, count, - of_node_to_nid(of_node)); - if (unlikely(ret < 0)) - return ret; - - irq_domain_associate_many(domain, irq_base, hwirq_base, count); - return 0; -} -EXPORT_SYMBOL_GPL(irq_create_strict_mappings); - static int irq_domain_translate(struct irq_domain *d, struct irq_fwspec *fwspec, irq_hw_number_t *hwirq, unsigned int *type) -- 2.29.2
[PATCH 8/9] powerpc: Convert irq_domain_add_legacy_isa use to irq_domain_add_legacy
irq_domain_add_legacy_isa is a pain. It only exists for the benefit of two PPC-specific drivers, and creates an ugly dependency between asm/irq.h and linux/irqdomain.h Instead, let's convert these two drivers to irq_domain_add_legacy(), stop using NUM_ISA_INTERRUPTS by directly setting NR_IRQS_LEGACY. The dependency cannot be broken yet as there is a lot of PPC-related code that depends on it, but that's the first step towards it. A followup patch will remove irq_domain_add_legacy_isa. Signed-off-by: Marc Zyngier --- arch/powerpc/include/asm/irq.h | 4 ++-- arch/powerpc/platforms/ps3/interrupt.c | 4 ++-- arch/powerpc/sysdev/i8259.c| 3 ++- arch/powerpc/sysdev/mpic.c | 2 +- arch/powerpc/sysdev/tsi108_pci.c | 3 ++- arch/powerpc/sysdev/xics/xics-common.c | 2 +- 6 files changed, 10 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h index f3f264e441a7..aeb209144c68 100644 --- a/arch/powerpc/include/asm/irq.h +++ b/arch/powerpc/include/asm/irq.h @@ -23,8 +23,8 @@ extern atomic_t ppc_n_lost_interrupts; /* Total number of virq in the platform */ #define NR_IRQSCONFIG_NR_IRQS -/* Same thing, used by the generic IRQ code */ -#define NR_IRQS_LEGACY NUM_ISA_INTERRUPTS +/* Number of irqs reserved for a legacy isa controller */ +#define NR_IRQS_LEGACY 16 extern irq_hw_number_t virq_to_hw(unsigned int virq); diff --git a/arch/powerpc/platforms/ps3/interrupt.c b/arch/powerpc/platforms/ps3/interrupt.c index 78f2339ed5cb..93e367a00452 100644 --- a/arch/powerpc/platforms/ps3/interrupt.c +++ b/arch/powerpc/platforms/ps3/interrupt.c @@ -45,7 +45,7 @@ * implementation equates HV plug value to Linux virq value, constrains each * interrupt to have a system wide unique plug number, and limits the range * of the plug values to map into the first dword of the bitmaps. This - * gives a usable range of plug values of {NUM_ISA_INTERRUPTS..63}. Note + * gives a usable range of plug values of {NR_IRQS_LEGACY..63}. Note * that there is no constraint on how many in this set an individual thread * can acquire. * @@ -721,7 +721,7 @@ static unsigned int ps3_get_irq(void) } #if defined(DEBUG) - if (unlikely(plug < NUM_ISA_INTERRUPTS || plug > PS3_PLUG_MAX)) { + if (unlikely(plug < NR_IRQS_LEGACY || plug > PS3_PLUG_MAX)) { dump_bmp(&per_cpu(ps3_private, 0)); dump_bmp(&per_cpu(ps3_private, 1)); BUG(); diff --git a/arch/powerpc/sysdev/i8259.c b/arch/powerpc/sysdev/i8259.c index c1d76c344351..dc1a151c63d7 100644 --- a/arch/powerpc/sysdev/i8259.c +++ b/arch/powerpc/sysdev/i8259.c @@ -260,7 +260,8 @@ void i8259_init(struct device_node *node, unsigned long intack_addr) raw_spin_unlock_irqrestore(&i8259_lock, flags); /* create a legacy host */ - i8259_host = irq_domain_add_legacy_isa(node, &i8259_host_ops, NULL); + i8259_host = irq_domain_add_legacy(node, NR_IRQS_LEGACY, 0, 0, + &i8259_host_ops, NULL); if (i8259_host == NULL) { printk(KERN_ERR "i8259: failed to allocate irq host !\n"); return; diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c index b0426f28946a..995fb2ada507 100644 --- a/arch/powerpc/sysdev/mpic.c +++ b/arch/powerpc/sysdev/mpic.c @@ -602,7 +602,7 @@ static void __init mpic_scan_ht_pics(struct mpic *mpic) /* Find an mpic associated with a given linux interrupt */ static struct mpic *mpic_find(unsigned int irq) { - if (irq < NUM_ISA_INTERRUPTS) + if (irq < NR_IRQS_LEGACY) return NULL; return irq_get_chip_data(irq); diff --git a/arch/powerpc/sysdev/tsi108_pci.c b/arch/powerpc/sysdev/tsi108_pci.c index 49f9541954f8..042bb38fa5c2 100644 --- a/arch/powerpc/sysdev/tsi108_pci.c +++ b/arch/powerpc/sysdev/tsi108_pci.c @@ -404,7 +404,8 @@ void __init tsi108_pci_int_init(struct device_node *node) { DBG("Tsi108_pci_int_init: initializing PCI interrupts\n"); - pci_irq_host = irq_domain_add_legacy_isa(node, &pci_irq_domain_ops, NULL); + pci_irq_host = irq_domain_add_legacy(node, NR_IRQS_LEGACY, 0, 0, +&pci_irq_domain_ops, NULL); if (pci_irq_host == NULL) { printk(KERN_ERR "pci_irq_host: failed to allocate irq domain!\n"); return; diff --git a/arch/powerpc/sysdev/xics/xics-common.c b/arch/powerpc/sysdev/xics/xics-common.c index 7e4305c01bac..fdf8dbb6 100644 --- a/arch/powerpc/sysdev/xics/xics-common.c +++ b/arch/powerpc/sysdev/xics/xics-common.c @@ -201,7 +201,7 @@ void xics_migrate_irqs_away(void) struct ics *ics; /* We can't set affinity on ISA interrupts */ - if (virq < NUM_ISA_INTERRUPTS) +
[PATCH 9/9] irqdomain: Kill irq_domain_add_legacy_isa
This helper doesn't have a user anymore, let's remove it. Signed-off-by: Marc Zyngier --- Documentation/core-api/irq/irq-domain.rst | 1 - include/linux/irqdomain.h | 11 --- 2 files changed, 12 deletions(-) diff --git a/Documentation/core-api/irq/irq-domain.rst b/Documentation/core-api/irq/irq-domain.rst index a77c24c27f7b..84e561db468f 100644 --- a/Documentation/core-api/irq/irq-domain.rst +++ b/Documentation/core-api/irq/irq-domain.rst @@ -146,7 +146,6 @@ Legacy irq_domain_add_simple() irq_domain_add_legacy() - irq_domain_add_legacy_isa() irq_domain_create_legacy() The Legacy mapping is a special case for drivers that already have a diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h index 3997ed9e4d7d..2a7ecf08d56e 100644 --- a/include/linux/irqdomain.h +++ b/include/linux/irqdomain.h @@ -45,9 +45,6 @@ struct cpumask; struct seq_file; struct irq_affinity_desc; -/* Number of irqs reserved for a legacy isa controller */ -#define NUM_ISA_INTERRUPTS 16 - #define IRQ_DOMAIN_IRQ_SPEC_PARAMS 16 /** @@ -346,14 +343,6 @@ static inline struct irq_domain *irq_domain_add_nomap(struct device_node *of_nod { return __irq_domain_add(of_node_to_fwnode(of_node), 0, max_irq, max_irq, ops, host_data); } -static inline struct irq_domain *irq_domain_add_legacy_isa( - struct device_node *of_node, - const struct irq_domain_ops *ops, - void *host_data) -{ - return irq_domain_add_legacy(of_node, NUM_ISA_INTERRUPTS, 0, 0, ops, -host_data); -} static inline struct irq_domain *irq_domain_add_tree(struct device_node *of_node, const struct irq_domain_ops *ops, void *host_data) -- 2.29.2
Re: [PATCH 1/9] irqdomain: Reimplement irq_linear_revmap() with irq_find_mapping()
Christophe, On Tue, 06 Apr 2021 12:21:33 +0100, Christophe Leroy wrote: > > > > Le 06/04/2021 à 11:35, Marc Zyngier a écrit : > > irq_linear_revmap() is supposed to be a fast path for domain > > lookups, but it only exposes low-level details of the irqdomain > > implementation, details which are better kept private. > > Can you elaborate with more details ? Things like directly picking into the revmap are positively awful, and doesn't work if the domain has been constructed using the radix tree. Which on its own is totally broken, because things like irq_domain_create_hierarchy() will pick an implementation or the other. > > > > > The *overhead* between the two is only a function call and > > a couple of tests, so it is likely that noone can show any > > meaningful difference compared to the cost of taking an > > interrupt. > > Do you have any measurement ? I did measure things on arm64, and couldn't come up with any difference other than noise. > Can you make the "likely" a certitude ? Of course not. You can always come up with an artificial CPU implementation that has a very small exception entry overhead, and a ridiculously slow memory subsystem. Do I care about these? No. If you can come up with realistic platforms that show a regression with this patch, I'm all ears. > > > > > Reimplement irq_linear_revmap() with irq_find_mapping() > > in order to preserve source code compatibility, and > > rename the internal field for a measure. > > This is in complete contradiction with commit > https://github.com/torvalds/linux/commit/d3dcb436 > > At that time, irq_linear_revmap() was less complex than what > irq_find_mapping() is today, and nevertheless it was considered worth > restoring in as a fast path. What has changed since then ? Over 8 years? Plenty. The use of irqdomains has been generalised, we have domain hierarchies, and if anything, this commit introduces the buggy behaviour I was mentioning above. I also don't see any mention of actual performance in that commit. And if we're worried about a fast path, being able to directly cache the irq_data in the revmap, hence skipping the irq_desc lookup that inevitable follows, is a much more interesting prospect than the "get useless data quick" that irq_linear_revmap() implements. This latter optimisation is what I am after. > Can you also explain the reason for the renaming of "linear_revmap" > into "revmap" ? What is that "measure" ? To catch the potential direct use of the reverse map field. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 4/9] sh: intc: Drop the use of irq_create_identity_mapping()
On Tue, 06 Apr 2021 11:32:13 +0100, Geert Uytterhoeven wrote: > > Hi Marc, > > On Tue, Apr 6, 2021 at 11:44 AM Marc Zyngier wrote: > > Instead of playing games with using irq_create_identity_mapping() > > and irq_domain_associate(), drop the use of the former and only > > use the latter, together with the allocation of the irq_desc > > as needed. > > > > It doesn't make the code less awful, but at least the intent > > is clearer. > > > > Signed-off-by: Marc Zyngier > > Thanks for your patch! > > > --- a/drivers/sh/intc/core.c > > +++ b/drivers/sh/intc/core.c > > @@ -179,6 +179,23 @@ static unsigned int __init save_reg(struct > > intc_desc_int *d, > > return 0; > > } > > > > +static bool __init intc_map(struct irq_domain *domain, int irq) > > +{ > > + int res; > > warning: unused variable ‘res’ [-Wunused-variable] > > > + > > + if (!irq_to_desc(irq) && irq_alloc_desc_at(irq, NUMA_NO_NODE) != > > irq) { > > + pr_err("uname to allocate IRQ %d\n", irq); > > + return false; > > + } > > + > > + if (irq_domain_associate(domain, irq, irq)) { > > + pr_err("domain association failure\n"); > > + return false; > > + } > > + > > + return true; > > +} > > + > > int __init register_intc_controller(struct intc_desc *desc) > > { > > unsigned int i, k, smp; > > @@ -316,19 +333,8 @@ int __init register_intc_controller(struct intc_desc > > *desc) > > warning: unused variable ‘res’ [-Wunused-variable] Ah, thanks for spotting these. > > > if (!vect->enum_id) > > continue; > > > > - res = irq_create_identity_mapping(d->domain, irq); > > > > - if (unlikely(res)) { > > - if (res == -EEXIST) { > > - res = irq_domain_associate(d->domain, irq, > > irq); > > - if (unlikely(res)) { > > - pr_err("domain association > > failure\n"); > > - continue; > > - } > > - } else { > > - pr_err("can't identity map IRQ %d\n", irq); > > - continue; > > - } > > - } > > + if (!intc_map(d->domain, irq)) > > + continue; > > > > intc_irq_xlate_set(irq, vect->enum_id, d); > > intc_register_irq(desc, d, vect->enum_id, irq); > > Otherwise this seems to work fine on real hardware (landisk) and qemu > (rts7751r2d). I did verify that the new function intc_map() is called. > > Tested-by: Geert Uytterhoeven Awesome, thanks Geert. M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH] powerpc/sysdev: Use of_device_get_match_data()
On Fri, 04 Mar 2022 13:10:19 +, Christophe Leroy wrote: > > > > Le 04/03/2022 à 02:18, cgel@gmail.com a écrit : > > From: Minghao Chi (CGEL ZTE) > > > > Use of_device_get_match_data() to simplify the code. > > > > Reported-by: Zeal Robot > > Signed-off-by: Minghao Chi (CGEL ZTE) > > --- > > arch/powerpc/sysdev/fsl_msi.c | 6 +- > > 1 file changed, 1 insertion(+), 5 deletions(-) > > > > diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c > > index b3475ae9f236..9d135bbb30b7 100644 > > --- a/arch/powerpc/sysdev/fsl_msi.c > > +++ b/arch/powerpc/sysdev/fsl_msi.c > > @@ -387,7 +387,6 @@ static int fsl_msi_setup_hwirq(struct fsl_msi *msi, > > struct platform_device *dev, > > static const struct of_device_id fsl_of_msi_ids[]; > > static int fsl_of_msi_probe(struct platform_device *dev) > > { > > - const struct of_device_id *match; > > struct fsl_msi *msi; > > struct resource res, msiir; > > int err, i, j, irq_index, count; > > @@ -397,10 +396,7 @@ static int fsl_of_msi_probe(struct platform_device > > *dev) > > u32 offset; > > struct pci_controller *phb; > > > > - match = of_match_device(fsl_of_msi_ids, &dev->dev); > > - if (!match) > > - return -EINVAL; > > - features = match->data; > > + features = of_device_get_match_data(&dev->dev); > > What happens when features is NULL ? I did jump at that one too, but as it turns out, it cannot happen, by construction. All the fsl_of_msi_ids[] entries have a non-NULL .data pointer, and you only enter probe if you match a fsl_of_msi_ids[] entry with the DT. So the current check for a NULL match is not something that can happen short of some other bug somewhere. M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path
On Fri, 29 Apr 2022 22:45:14 +0100, "Russell King (Oracle)" wrote: > > On Fri, Apr 29, 2022 at 06:38:19PM -0300, Guilherme G. Piccoli wrote: > > Thanks Marc and Michael for the review/discussion. > > > > On 29/04/2022 15:20, Marc Zyngier wrote: > > > [...] > > > > > My expectations would be that, since we're getting here using an IPI, > > > interrupts are already masked. So what reenabled them the first place? > > > > > > Thanks, > > > > > > M. > > > > > > > Marc, I did some investigation in the code (and tried/failed in the ARM > > documentation as well heh), but this is still not 100% clear for me. > > > > You're saying IPI calls disable IRQs/FIQs by default in the the target > > CPUs? Where does it happen? I'm a bit confused if this a processor > > mechanism, or it's in code. > > When we taken an IRQ, IRQs will be masked, FIQs will not. IPIs are > themselves interrupts, so IRQs will be masked while the IPI is being > processed. Therefore, there should be no need to re-disable the > already disabled interrupts. > > > But crash_smp_send_stop() is different, it seems to IPI the other CPUs > > with the flag IPI_CALL_FUNC, which leads to calling > > generic_smp_call_function_interrupt() - does it disable interrupts/FIQs > > as well? I couldn't find it. > > It's buried in the architecture behaviour. When the CPU takes an > interrupt and jumps to the interrupt vector in the vectors page, it is > architecturally defined that interrupts will be disabled. If they > weren't architecturally disabled at this point, then as soon as the > first instruction is processed (at the interrupt vector, likely a > branch) the CPU would immediately take another jump to the interrupt > vector, and this process would continue indefinitely, making interrupt > handling utterly useless. > > So, you won't find an explicit instruction in the code path from the > vectors to the IPI handler that disables interrupts - because it's > written into the architecture that this is what must happen. > > IRQs are a lower priority than FIQs, so FIQs remain unmasked. Ah, you're of course right. That's one of the huge differences between AArch32 and AArch64, where the former has per target mode masking rules, and the later masks everything on entry... M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path
On Wed, 27 Apr 2022 23:48:56 +0100, "Guilherme G. Piccoli" wrote: > > Currently the regular CPU shutdown path for ARM disables IRQs/FIQs > in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which > is responsible for that. This makes sense, since we're turning off > such CPUs, putting them in an endless busy-wait loop. > > Problem is that there is an alternative path for disabling CPUs, > in the form of function crash_smp_send_stop(), used for kexec/panic > paths. This functions relies in a SMP call that also triggers a > busy-wait loop [at machine_crash_nonpanic_core()], but *without* > disabling interrupts. This might lead to odd scenarios, like early > interrupts in the boot of kexec'd kernel or even interrupts in > other CPUs while the main one still works in the panic path and > assumes all secondary CPUs are (really!) off. > > This patch mimics the ipi_cpu_stop() interrupt disable mechanism > in the crash CPU shutdown path, hence disabling IRQs/FIQs in all > secondary CPUs in the kexec/panic path as well. > > Cc: Marc Zyngier > Cc: Russell King > Signed-off-by: Guilherme G. Piccoli > --- > arch/arm/kernel/machine_kexec.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c > index f567032a09c0..ef788ee00519 100644 > --- a/arch/arm/kernel/machine_kexec.c > +++ b/arch/arm/kernel/machine_kexec.c > @@ -86,6 +86,9 @@ void machine_crash_nonpanic_core(void *unused) > set_cpu_online(smp_processor_id(), false); > atomic_dec(&waiting_for_crash_ipi); > > + local_fiq_disable(); > + local_irq_disable(); > + My expectations would be that, since we're getting here using an IPI, interrupts are already masked. So what reenabled them the first place? Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 2/9] ARM: PXA: Kill use of irq_create_strict_mappings()
Hi Guenter, Thanks for the heads up. On Mon, 26 Apr 2021 23:39:42 +0100, Guenter Roeck wrote: > > On Tue, Apr 06, 2021 at 10:35:50AM +0100, Marc Zyngier wrote: > > irq_create_strict_mappings() is a poor way to allow the use of > > a linear IRQ domain as a legacy one. Let's be upfront about > > it and use a legacy domain when appropriate. > > > > Signed-off-by: Marc Zyngier > > --- > > When running the "mainstone" qemu emulation, this patch results > in many (32, actually) runtime warnings such as the following. > > [0.528272] [ cut here ] > [0.528285] WARNING: CPU: 0 PID: 1 at kernel/irq/irqdomain.c:550 > irq_domain_associate+0x194/0x1f0 > [0.528315] error: virq335 is not allocated [...] This looks like a case of CONFIG_SPARSE_IRQ, combined with a lack of brain engagement. I've come up with the following patch, which lets the kernel boot in QEMU without screaming (other than the lack of a rootfs...). Please let me know if this helps. Thanks, M. From 4d7f6ddbbfdff1c9f029bafca79020d3294dc32c Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 27 Apr 2021 09:00:28 +0100 Subject: [PATCH] ARM: PXA: Fix cplds irqdesc allocation when using legacy mode The Mainstone PXA platform uses CONFIG_SPARSE_IRQ, and thus we cannot rely on the irq descriptors to be readilly allocated before creating the irqdomain in legacy mode. The kernel then complains loudly about not being able to associate the interrupt in the domain -- can't blame it. Fix it by allocating the irqdescs upfront in the legacy case. Fixes: b68761da0111 ("ARM: PXA: Kill use of irq_create_strict_mappings()") Reported-by: Guenter Roeck Signed-off-by: Marc Zyngier --- arch/arm/mach-pxa/pxa_cplds_irqs.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/arm/mach-pxa/pxa_cplds_irqs.c b/arch/arm/mach-pxa/pxa_cplds_irqs.c index ec0d9b094744..bddfc7cd5d40 100644 --- a/arch/arm/mach-pxa/pxa_cplds_irqs.c +++ b/arch/arm/mach-pxa/pxa_cplds_irqs.c @@ -121,8 +121,13 @@ static int cplds_probe(struct platform_device *pdev) return fpga->irq; base_irq = platform_get_irq(pdev, 1); - if (base_irq < 0) + if (base_irq < 0) { base_irq = 0; + } else { + ret = devm_irq_alloc_descs(&pdev->dev, base_irq, base_irq, CPLDS_NB_IRQ, 0); + if (ret < 0) + return ret; + } res = platform_get_resource(pdev, IORESOURCE_MEM, 0); fpga->base = devm_ioremap_resource(&pdev->dev, res); -- 2.30.2 -- Without deviation from the norm, progress is not possible.
Re: [PATCH 15/31] KVM: PPC: Book3S HV: XIVE: Fix mapping of passthrough interrupts
On Fri, 14 May 2021 21:51:51 +0100, Thomas Gleixner wrote: > > On Fri, Apr 30 2021 at 10:03, Cédric Le Goater wrote: > > CC: +Marc Thanks Thomas. > > > PCI MSI interrupt numbers are now mapped in a PCI-MSI domain but the > > underlying calls handling the passthrough of the interrupt in the > > guest need a number in the XIVE IRQ domain. > > > > Use the IRQ data mapped in the XIVE IRQ domain and not the one in the > > PCI-MSI domain. > > > > Exporting irq_get_default_host() might not be the best solution. > > > > Cc: Thomas Gleixner > > Cc: Paul Mackerras > > Signed-off-by: Cédric Le Goater > > --- > > arch/powerpc/kvm/book3s_xive.c | 3 ++- > > kernel/irq/irqdomain.c | 1 + > > 2 files changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c > > index 3a7da42bed57..81b9f4fc3978 100644 > > --- a/arch/powerpc/kvm/book3s_xive.c > > +++ b/arch/powerpc/kvm/book3s_xive.c > > @@ -861,7 +861,8 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned > > long guest_irq, > > struct kvmppc_xive *xive = kvm->arch.xive; > > struct kvmppc_xive_src_block *sb; > > struct kvmppc_xive_irq_state *state; > > - struct irq_data *host_data = irq_get_irq_data(host_irq); > > + struct irq_data *host_data = > > + irq_domain_get_irq_data(irq_get_default_host(), host_irq); > > unsigned int hw_irq = (unsigned int)irqd_to_hwirq(host_data); > > u16 idx; > > u8 prio; > > diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c > > index d10ab1d689d5..8a073d1ce611 100644 > > --- a/kernel/irq/irqdomain.c > > +++ b/kernel/irq/irqdomain.c > > @@ -481,6 +481,7 @@ struct irq_domain *irq_get_default_host(void) > > { > > return irq_default_domain; > > } > > +EXPORT_SYMBOL_GPL(irq_get_default_host); > > > > static void irq_domain_clear_mapping(struct irq_domain *domain, > > irq_hw_number_t hwirq) > Is there any reason why we should add more users of the "default host" fallback? I would really hope that new code would actually track their irqdomain in a more fine-grained way, specially when using the hierarchical MSi setup, which seems to be the goal of this series. Don't you have enough topology information that you can make use of to correctly assign a domain identifier (of_node or otherwise)? Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.15-1 tag
Hi Michael, On Fri, 03 Sep 2021 14:36:57 +0100, Michael Ellerman wrote: > > Hi Linus, > > Please pull powerpc updates for 5.15. > > A bit of a small batch this time. > > There was one conflict against my own fixes branch, and the resolution was a > little bit > messy, so I just did a merge of fixes myself to resolve the conflict. I > didn't think there > was any value in having you resolve a conflict between two of my own branches. > > Notable out of area changes: > scripts/mod/modpost.c # 1e688dd2a3d6 powerpc/bug: Provide > better flexibility to WARN_ON/__WARN_FLAGS() with asm goto > kernel/irq/irqdomain.c # 51be9e51a800 KVM: PPC: Book3S HV: XIVE: Fix > mapping of passthrough interrupts > > That second one generated a bit of discussion[1] with tglx and maz, > who asked if we could avoid adding the export of > irq_get_default_host(). Cédric replied explaining that we don't > really have good way to avoid it, but we never heard back from them, > so in the end I decided to merge it. Apologies for this, I clearly have lost track of it. It clearly isn't pretty, but it isn't a deal breaker either. In the end, it will be easier to address with the code being in the tree. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v5 00/14] PCI: Add support for Apple M1
On Mon, 04 Oct 2021 21:42:45 +0100, Linus Walleij wrote: > > On Mon, Oct 4, 2021 at 9:52 PM Rob Herring wrote: > > > FYI, I pushed patches 1-3 to kernelCI and didn't see any regressions. > > I am a bit worried about changes to the DT interrupt parsing and > > ancient platforms (such as PowerMacs). Most likely there wouldn't be > > any report until -rc1 or months later on those old systems. > > Lets page the PPC lists to see if someone can test on some powermac. /me eyes the XServe-G5 that hasn't been powered on in 10 years. What could possibly go wrong? M. -- Without deviation from the norm, progress is not possible.
[PATCH 2/5] KVM: mips: Use kvm_get_vcpu() instead of open-coded access
As we are about to change the way vcpus are allocated, mandate the use of kvm_get_vcpu() instead of open-coding the access. Signed-off-by: Marc Zyngier --- arch/mips/kvm/loongson_ipi.c | 4 ++-- arch/mips/kvm/mips.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/mips/kvm/loongson_ipi.c b/arch/mips/kvm/loongson_ipi.c index 3681fc8fba38..5d53f32d837c 100644 --- a/arch/mips/kvm/loongson_ipi.c +++ b/arch/mips/kvm/loongson_ipi.c @@ -120,7 +120,7 @@ static int loongson_vipi_write(struct loongson_kvm_ipi *ipi, s->status |= data; irq.cpu = id; irq.irq = 6; - kvm_vcpu_ioctl_interrupt(kvm->vcpus[id], &irq); + kvm_vcpu_ioctl_interrupt(kvm_get_vcpu(kvm, id), &irq); break; case CORE0_CLEAR_OFF: @@ -128,7 +128,7 @@ static int loongson_vipi_write(struct loongson_kvm_ipi *ipi, if (!s->status) { irq.cpu = id; irq.irq = -6; - kvm_vcpu_ioctl_interrupt(kvm->vcpus[id], &irq); + kvm_vcpu_ioctl_interrupt(kvm_get_vcpu(kvm, id), &irq); } break; diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index ceacca74f808..6228bf396d63 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -479,7 +479,7 @@ int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, if (irq->cpu == -1) dvcpu = vcpu; else - dvcpu = vcpu->kvm->vcpus[irq->cpu]; + dvcpu = kvm_get_vcpu(vcpu->kvm, irq->cpu); if (intr == 2 || intr == 3 || intr == 4 || intr == 6) { kvm_mips_callbacks->queue_io_int(dvcpu, irq); -- 2.30.2
[PATCH 0/5] KVM: Turn the vcpu array into an xarray
The kvm structure is pretty large. A large portion of it is the vcpu array, which is 4kB on x86_64 and arm64 as they deal with 512 vcpu VMs. Of course, hardly anyone runs VMs this big, so this is often a net waste of memory and cache locality. A possible approach is to turn the fixed-size array into an xarray, which results in a net code deletion after a bit of cleanup. This series is on top of the current linux/master as it touches the RISC-V implementation. Only tested on arm64. Marc Zyngier (5): KVM: Move wiping of the kvm->vcpus array to common code KVM: mips: Use kvm_get_vcpu() instead of open-coded access KVM: s390: Use kvm_get_vcpu() instead of open-coded access KVM: x86: Use kvm_get_vcpu() instead of open-coded access KVM: Convert the kvm->vcpus array to a xarray arch/arm64/kvm/arm.c | 10 +- arch/mips/kvm/loongson_ipi.c | 4 ++-- arch/mips/kvm/mips.c | 23 ++- arch/powerpc/kvm/powerpc.c | 10 +- arch/riscv/kvm/vm.c| 10 +- arch/s390/kvm/kvm-s390.c | 26 ++ arch/x86/kvm/vmx/posted_intr.c | 2 +- arch/x86/kvm/x86.c | 9 + include/linux/kvm_host.h | 7 --- virt/kvm/kvm_main.c| 33 ++--- 10 files changed, 45 insertions(+), 89 deletions(-) -- 2.30.2
[PATCH 3/5] KVM: s390: Use kvm_get_vcpu() instead of open-coded access
As we are about to change the way vcpus are allocated, mandate the use of kvm_get_vcpu() instead of open-coding the access. Signed-off-by: Marc Zyngier --- arch/s390/kvm/kvm-s390.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 7af53b8788fa..4a0f62b03964 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -4572,7 +4572,7 @@ int kvm_s390_vcpu_start(struct kvm_vcpu *vcpu) } for (i = 0; i < online_vcpus; i++) { - if (!is_vcpu_stopped(vcpu->kvm->vcpus[i])) + if (!is_vcpu_stopped(kvm_get_vcpu(vcpu->kvm, i))) started_vcpus++; } @@ -4634,9 +4634,11 @@ int kvm_s390_vcpu_stop(struct kvm_vcpu *vcpu) __disable_ibs_on_vcpu(vcpu); for (i = 0; i < online_vcpus; i++) { - if (!is_vcpu_stopped(vcpu->kvm->vcpus[i])) { + struct kvm_vcpu *tmp = kvm_get_vcpu(vcpu->kvm, i); + + if (!is_vcpu_stopped(tmp)) { started_vcpus++; - started_vcpu = vcpu->kvm->vcpus[i]; + started_vcpu = tmp; } } -- 2.30.2
[PATCH 4/5] KVM: x86: Use kvm_get_vcpu() instead of open-coded access
As we are about to change the way vcpus are allocated, mandate the use of kvm_get_vcpu() instead of open-coding the access. Signed-off-by: Marc Zyngier --- arch/x86/kvm/vmx/posted_intr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c index 5f81ef092bd4..82a49720727d 100644 --- a/arch/x86/kvm/vmx/posted_intr.c +++ b/arch/x86/kvm/vmx/posted_intr.c @@ -272,7 +272,7 @@ int pi_update_irte(struct kvm *kvm, unsigned int host_irq, uint32_t guest_irq, if (!kvm_arch_has_assigned_device(kvm) || !irq_remapping_cap(IRQ_POSTING_CAP) || - !kvm_vcpu_apicv_active(kvm->vcpus[0])) + !kvm_vcpu_apicv_active(kvm_get_vcpu(kvm, 0))) return 0; idx = srcu_read_lock(&kvm->irq_srcu); -- 2.30.2
[PATCH 1/5] KVM: Move wiping of the kvm->vcpus array to common code
All architectures have similar loops iterating over the vcpus, freeing one vcpu at a time, and eventually wiping the reference off the vcpus array. They are also inconsistently taking the kvm->lock mutex when wiping the references from the array. Make this code common, which will simplify further changes. Signed-off-by: Marc Zyngier --- arch/arm64/kvm/arm.c | 10 +- arch/mips/kvm/mips.c | 21 + arch/powerpc/kvm/powerpc.c | 10 +- arch/riscv/kvm/vm.c| 10 +- arch/s390/kvm/kvm-s390.c | 18 +- arch/x86/kvm/x86.c | 9 + include/linux/kvm_host.h | 2 +- virt/kvm/kvm_main.c| 20 ++-- 8 files changed, 25 insertions(+), 75 deletions(-) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index f5490afe1ebf..75bb7215da03 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -175,19 +175,11 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf) */ void kvm_arch_destroy_vm(struct kvm *kvm) { - int i; - bitmap_free(kvm->arch.pmu_filter); kvm_vgic_destroy(kvm); - for (i = 0; i < KVM_MAX_VCPUS; ++i) { - if (kvm->vcpus[i]) { - kvm_vcpu_destroy(kvm->vcpus[i]); - kvm->vcpus[i] = NULL; - } - } - atomic_set(&kvm->online_vcpus, 0); + kvm_destroy_vcpus(kvm); } int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index 562aa878b266..ceacca74f808 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -171,25 +171,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) return 0; } -void kvm_mips_free_vcpus(struct kvm *kvm) -{ - unsigned int i; - struct kvm_vcpu *vcpu; - - kvm_for_each_vcpu(i, vcpu, kvm) { - kvm_vcpu_destroy(vcpu); - } - - mutex_lock(&kvm->lock); - - for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) - kvm->vcpus[i] = NULL; - - atomic_set(&kvm->online_vcpus, 0); - - mutex_unlock(&kvm->lock); -} - static void kvm_mips_free_gpa_pt(struct kvm *kvm) { /* It should always be safe to remove after flushing the whole range */ @@ -199,7 +180,7 @@ static void kvm_mips_free_gpa_pt(struct kvm *kvm) void kvm_arch_destroy_vm(struct kvm *kvm) { - kvm_mips_free_vcpus(kvm); + kvm_destroy_vcpus(kvm); kvm_mips_free_gpa_pt(kvm); } diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 35e9cccdeef9..492e4a4121cb 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -463,9 +463,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) void kvm_arch_destroy_vm(struct kvm *kvm) { - unsigned int i; - struct kvm_vcpu *vcpu; - #ifdef CONFIG_KVM_XICS /* * We call kick_all_cpus_sync() to ensure that all @@ -476,14 +473,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm) kick_all_cpus_sync(); #endif - kvm_for_each_vcpu(i, vcpu, kvm) - kvm_vcpu_destroy(vcpu); + kvm_destroy_vcpus(kvm); mutex_lock(&kvm->lock); - for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) - kvm->vcpus[i] = NULL; - - atomic_set(&kvm->online_vcpus, 0); kvmppc_core_destroy_vm(kvm); diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c index 26399df15b63..6af6cde295eb 100644 --- a/arch/riscv/kvm/vm.c +++ b/arch/riscv/kvm/vm.c @@ -46,15 +46,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) void kvm_arch_destroy_vm(struct kvm *kvm) { - int i; - - for (i = 0; i < KVM_MAX_VCPUS; ++i) { - if (kvm->vcpus[i]) { - kvm_vcpu_destroy(kvm->vcpus[i]); - kvm->vcpus[i] = NULL; - } - } - atomic_set(&kvm->online_vcpus, 0); + kvm_destroy_vcpus(kvm); } int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index c6257f625929..7af53b8788fa 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -2819,27 +2819,11 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) free_page((unsigned long)(vcpu->arch.sie_block)); } -static void kvm_free_vcpus(struct kvm *kvm) -{ - unsigned int i; - struct kvm_vcpu *vcpu; - - kvm_for_each_vcpu(i, vcpu, kvm) - kvm_vcpu_destroy(vcpu); - - mutex_lock(&kvm->lock); - for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) - kvm->vcpus[i] = NULL; - - atomic_set(&kvm->online_vcpus, 0); - mutex_unlock(&kvm->lock); -} - void kvm_arch_destroy_vm(struct kvm *kvm) {
[PATCH 5/5] KVM: Convert the kvm->vcpus array to a xarray
At least on arm64 and x86, the vcpus array is pretty huge (512 entries), and is mostly empty in most cases (running 512 vcpu VMs is not that common). This mean that we end-up with a 4kB block of unused memory in the middle of the kvm structure. Instead of wasting away this memory, let's use an xarray instead, which gives us almost the same flexibility as a normal array, but with a reduced memory usage with smaller VMs. Signed-off-by: Marc Zyngier --- include/linux/kvm_host.h | 5 +++-- virt/kvm/kvm_main.c | 15 +-- 2 files changed, 12 insertions(+), 8 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 36967291b8c6..3933d825e28b 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -552,7 +553,7 @@ struct kvm { struct mutex slots_arch_lock; struct mm_struct *mm; /* userspace tied to this vm */ struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM]; - struct kvm_vcpu *vcpus[KVM_MAX_VCPUS]; + struct xarray vcpu_array; /* Used to wait for completion of MMU notifiers. */ spinlock_t mn_invalidate_lock; @@ -693,7 +694,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i) /* Pairs with smp_wmb() in kvm_vm_ioctl_create_vcpu. */ smp_rmb(); - return kvm->vcpus[i]; + return xa_load(&kvm->vcpu_array, i); } #define kvm_for_each_vcpu(idx, vcpup, kvm) \ diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d83553eeea21..4c18d7911fa5 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -461,7 +461,7 @@ void kvm_destroy_vcpus(struct kvm *kvm) mutex_lock(&kvm->lock); for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) - kvm->vcpus[i] = NULL; + xa_erase(&kvm->vcpu_array, i); atomic_set(&kvm->online_vcpus, 0); mutex_unlock(&kvm->lock); @@ -1066,6 +1066,7 @@ static struct kvm *kvm_create_vm(unsigned long type) mutex_init(&kvm->slots_arch_lock); spin_lock_init(&kvm->mn_invalidate_lock); rcuwait_init(&kvm->mn_memslots_update_rcuwait); + xa_init(&kvm->vcpu_array); INIT_LIST_HEAD(&kvm->devices); @@ -3661,7 +3662,10 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) } vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus); - BUG_ON(kvm->vcpus[vcpu->vcpu_idx]); + r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT); + BUG_ON(r == -EBUSY); + if (r) + goto unlock_vcpu_destroy; /* Fill the stats id string for the vcpu */ snprintf(vcpu->stats_id, sizeof(vcpu->stats_id), "kvm-%d/vcpu-%d", @@ -3671,15 +3675,14 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) kvm_get_kvm(kvm); r = create_vcpu_fd(vcpu); if (r < 0) { + xa_erase(&kvm->vcpu_array, vcpu->vcpu_idx); kvm_put_kvm_no_destroy(kvm); goto unlock_vcpu_destroy; } - kvm->vcpus[vcpu->vcpu_idx] = vcpu; - /* -* Pairs with smp_rmb() in kvm_get_vcpu. Write kvm->vcpus -* before kvm->online_vcpu's incremented value. +* Pairs with smp_rmb() in kvm_get_vcpu. Store the vcpu +* pointer before kvm->online_vcpu's incremented value. */ smp_wmb(); atomic_inc(&kvm->online_vcpus); -- 2.30.2
Re: [PATCH 1/5] KVM: Move wiping of the kvm->vcpus array to common code
On Fri, 05 Nov 2021 20:12:12 +, Sean Christopherson wrote: > > On Fri, Nov 05, 2021, Marc Zyngier wrote: > > All architectures have similar loops iterating over the vcpus, > > freeing one vcpu at a time, and eventually wiping the reference > > off the vcpus array. They are also inconsistently taking > > the kvm->lock mutex when wiping the references from the array. > > ... > > > +void kvm_destroy_vcpus(struct kvm *kvm) > > +{ > > + unsigned int i; > > + struct kvm_vcpu *vcpu; > > + > > + kvm_for_each_vcpu(i, vcpu, kvm) > > + kvm_vcpu_destroy(vcpu); > > + > > + mutex_lock(&kvm->lock); > > But why is kvm->lock taken here? Unless I'm overlooking an arch, > everyone calls this from kvm_arch_destroy_vm(), in which case this > is the only remaining reference to @kvm. And if there's some magic > path for which that's not true, I don't see how it can possibly be > safe to call kvm_vcpu_destroy() without holding kvm->lock, or how > this would guarantee that all vCPUs have actually been destroyed > before nullifying the array. I asked myself the same question two years ago, and couldn't really understand the requirement. However, x86 does just that, so I preserved the behaviour. If you too believe that this is just wrong, I'm happy to drop the locking altogether. If that breaks someone's flow, they'll shout soon enough. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 5/5] KVM: Convert the kvm->vcpus array to a xarray
On Fri, 05 Nov 2021 20:21:36 +, Sean Christopherson wrote: > > On Fri, Nov 05, 2021, Marc Zyngier wrote: > > At least on arm64 and x86, the vcpus array is pretty huge (512 entries), > > and is mostly empty in most cases (running 512 vcpu VMs is not that > > common). This mean that we end-up with a 4kB block of unused memory > > in the middle of the kvm structure. > > Heh, x86 is now up to 1024 entries. Humph. I don't want to know whether people are actually using that in practice. The only time I create VMs with 512 vcpus is to check whether it still works... > > > Instead of wasting away this memory, let's use an xarray instead, > > which gives us almost the same flexibility as a normal array, but > > with a reduced memory usage with smaller VMs. > > > > Signed-off-by: Marc Zyngier > > --- > > @@ -693,7 +694,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm > > *kvm, int i) > > > > /* Pairs with smp_wmb() in kvm_vm_ioctl_create_vcpu. */ > > smp_rmb(); > > - return kvm->vcpus[i]; > > + return xa_load(&kvm->vcpu_array, i); > > } > > It'd be nice for this series to convert kvm_for_each_vcpu() to use > xa_for_each() as well. Maybe as a patch on top so that potential > explosions from that are isolated from the initiali conversion? > > Or maybe even use xa_for_each_range() to cap at online_vcpus? > That's technically a functional change, but IMO it's easier to > reason about iterating over a snapshot of vCPUs as opposed to being > able to iterate over vCPUs as their being added. In practice I > doubt it matters. > > #define kvm_for_each_vcpu(idx, vcpup, kvm) \ > xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0, > atomic_read(&kvm->online_vcpus)) > I think that's already the behaviour of this iterator (we stop at the first empty slot capped to online_vcpus. The only change in behaviour is that vcpup currently holds a pointer to the last vcpu in no empty slot has been encountered. xa_for_each{,_range}() would set the pointer to NULL at all times. I doubt anyone relies on that, but it is probably worth eyeballing some of the use cases... Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 5/5] KVM: Convert the kvm->vcpus array to a xarray
On 2021-11-06 11:48, Marc Zyngier wrote: On Fri, 05 Nov 2021 20:21:36 +, Sean Christopherson wrote: On Fri, Nov 05, 2021, Marc Zyngier wrote: > At least on arm64 and x86, the vcpus array is pretty huge (512 entries), > and is mostly empty in most cases (running 512 vcpu VMs is not that > common). This mean that we end-up with a 4kB block of unused memory > in the middle of the kvm structure. Heh, x86 is now up to 1024 entries. Humph. I don't want to know whether people are actually using that in practice. The only time I create VMs with 512 vcpus is to check whether it still works... > Instead of wasting away this memory, let's use an xarray instead, > which gives us almost the same flexibility as a normal array, but > with a reduced memory usage with smaller VMs. > > Signed-off-by: Marc Zyngier > --- > @@ -693,7 +694,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i) > >/* Pairs with smp_wmb() in kvm_vm_ioctl_create_vcpu. */ >smp_rmb(); > - return kvm->vcpus[i]; > + return xa_load(&kvm->vcpu_array, i); > } It'd be nice for this series to convert kvm_for_each_vcpu() to use xa_for_each() as well. Maybe as a patch on top so that potential explosions from that are isolated from the initiali conversion? Or maybe even use xa_for_each_range() to cap at online_vcpus? That's technically a functional change, but IMO it's easier to reason about iterating over a snapshot of vCPUs as opposed to being able to iterate over vCPUs as their being added. In practice I doubt it matters. #define kvm_for_each_vcpu(idx, vcpup, kvm) \ xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0, atomic_read(&kvm->online_vcpus)) I think that's already the behaviour of this iterator (we stop at the first empty slot capped to online_vcpus. The only change in behaviour is that vcpup currently holds a pointer to the last vcpu in no empty slot has been encountered. xa_for_each{,_range}() would set the pointer to NULL at all times. I doubt anyone relies on that, but it is probably worth eyeballing some of the use cases... This turned out to be an interesting exercise, as we always use an int for the index, and the xarray iterators insist on an unsigned long (and even on a pointer to it). On the other hand, I couldn't spot any case where we'd rely on the last value of the vcpu pointer. I'll repost the series once we have a solution for patch #4, and we can then decide whether we want the iterator churn. -- Jazz is not dead. It just smells funny...
Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates
HI all, On Wed, 10 Nov 2021 18:41:06 +, Bjorn Helgaas wrote: > > On Wed, Nov 10, 2021 at 07:07:24PM +0100, Christian Zigotzky wrote: > > On 09 November 2021 at 03:45 pm, Christian Zigotzky wrote: > > > Hello, > > > > > > The Nemo board [1] doesn't recognize any ATA disks with the pci-v5.16 > > updates [2]. > > > > > > Error messages: > > > > > > ata4.00: gc timeout cmd 0xec > > > ata4.00: failed to IDENTIFY (I/O error, error_mask=0x4) > > > ata1.00: gc timeout cmd 0xec > > > ata1.00: failed to IDENTIFY (I/O error, error_mask=0x4) > > > ata3.00: gc timeout cmd 0xec > > > ata3.00: failed to IDENTIFY (I/O error, error_mask=0x4) > > > > > > I was able to revert the new pci-v5.16 updates [2]. After a new > > compiling, the kernel recognize all ATA disks correctly. > > > > > > Could you please check the pci-v5.16 updates [2]? > > > > > > Please find attached the kernel config. > > > > > > Thanks, > > > Christian > > > > > > [1] https://en.wikipedia.org/wiki/AmigaOne_X1000 > > > [2] > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c5c62ddf88c34bc83b66e4ac9beb2bb0e1887d4 > > > > Hi All, > > > > Many thanks for your nice responses. > > > > I bisected today [1]. 0412841812265734c306ba5ef8088bcb64d5d3bd (of/irq: > > Allow matching of an interrupt-map local to an interrupt controller) [2] is > > the first bad commit. > > > > I was able to revert the first bad commit [1]. After a new compiling, the > > kernel detects all ATA disks without any problems. > > > > I created a patch for an easy reverting the bad commit [1]. With this patch > > we can do further our kernel tests. > > > > Could you please check the first bad commit [2]? > > > > Thanks, > > Christian > > > > [1] https://forum.hyperion-entertainment.com/viewtopic.php?p=54398#p54398 > > [2] > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0412841812265734c306ba5ef8088bcb64d5d3bd > > > > [+ Marc Zyngier, Alyssa Rosenzweig, Lorenzo Pieralisi, and Rob Herring > > because of the first bad commit] > > Thank you very much for the bisection and for also testing the revert! > > It's easy enough to revert 041284181226 ("of/irq: Allow matching of an > interrupt-map local to an interrupt controller"), and it seems like > that's what we need to do. I have it tentatively queued up. > > That commit was part of the new support for the Apple M1 PCIe > interface, and I don't know what effect a revert will have on that > support. Marc, Alyssa? It is going to badly break the M1 support, as we won't be able to take interrupts to detect that the PCIe link is up. Before we apply a full blown revert and decide that this isn't workable (and revert the whole M1 PCIe series, because they are otherwise somewhat pointless), I'd like to understand *what* breaks exactly. Christian, could you point me to the full DT that this machine uses? This would help understanding what goes wrong, and cook something for you to test. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates
On Thu, 11 Nov 2021 05:24:52 +, Christian Zigotzky wrote: > > On 10 November 2021 at 08:09 pm, Marc Zyngier wrote: > > HI all, > > > > On Wed, 10 Nov 2021 18:41:06 +, > > Bjorn Helgaas wrote: > >> On Wed, Nov 10, 2021 at 07:07:24PM +0100, Christian Zigotzky wrote: > >>> On 09 November 2021 at 03:45 pm, Christian Zigotzky wrote: > >>>> Hello, > >>>> > >>>> The Nemo board [1] doesn't recognize any ATA disks with the pci-v5.16 > >>> updates [2]. > >>>> Error messages: > >>>> > >>>> ata4.00: gc timeout cmd 0xec > >>>> ata4.00: failed to IDENTIFY (I/O error, error_mask=0x4) > >>>> ata1.00: gc timeout cmd 0xec > >>>> ata1.00: failed to IDENTIFY (I/O error, error_mask=0x4) > >>>> ata3.00: gc timeout cmd 0xec > >>>> ata3.00: failed to IDENTIFY (I/O error, error_mask=0x4) > >>>> > >>>> I was able to revert the new pci-v5.16 updates [2]. After a new > >>> compiling, the kernel recognize all ATA disks correctly. > >>>> Could you please check the pci-v5.16 updates [2]? > >>>> > >>>> Please find attached the kernel config. > >>>> > >>>> Thanks, > >>>> Christian > >>>> > >>>> [1] https://en.wikipedia.org/wiki/AmigaOne_X1000 > >>>> [2] > >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c5c62ddf88c34bc83b66e4ac9beb2bb0e1887d4 > >>> Hi All, > >>> > >>> Many thanks for your nice responses. > >>> > >>> I bisected today [1]. 0412841812265734c306ba5ef8088bcb64d5d3bd (of/irq: > >>> Allow matching of an interrupt-map local to an interrupt controller) [2] > >>> is > >>> the first bad commit. > >>> > >>> I was able to revert the first bad commit [1]. After a new compiling, the > >>> kernel detects all ATA disks without any problems. > >>> > >>> I created a patch for an easy reverting the bad commit [1]. With this > >>> patch > >>> we can do further our kernel tests. > >>> > >>> Could you please check the first bad commit [2]? > >>> > >>> Thanks, > >>> Christian > >>> > >>> [1] https://forum.hyperion-entertainment.com/viewtopic.php?p=54398#p54398 > >>> [2] > >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0412841812265734c306ba5ef8088bcb64d5d3bd > >>> > >>> [+ Marc Zyngier, Alyssa Rosenzweig, Lorenzo Pieralisi, and Rob Herring > >>> because of the first bad commit] > >> Thank you very much for the bisection and for also testing the revert! > >> > >> It's easy enough to revert 041284181226 ("of/irq: Allow matching of an > >> interrupt-map local to an interrupt controller"), and it seems like > >> that's what we need to do. I have it tentatively queued up. > >> > >> That commit was part of the new support for the Apple M1 PCIe > >> interface, and I don't know what effect a revert will have on that > >> support. Marc, Alyssa? > > It is going to badly break the M1 support, as we won't be able to take > > interrupts to detect that the PCIe link is up. > > > > Before we apply a full blown revert and decide that this isn't > > workable (and revert the whole M1 PCIe series, because they are > > otherwise somewhat pointless), I'd like to understand *what* breaks > > exactly. > > > > Christian, could you point me to the full DT that this machine uses? > > This would help understanding what goes wrong, and cook something for > > you to test. > > > > Thanks, > > > > M. > > > Hello Marc, > > Here you are: > https://forum.hyperion-entertainment.com/viewtopic.php?p=54406#p54406 This is not what I asked. I need the actual source file, or at the very least the compiled object (the /sys/firmware/fdt file, for example). Not an interpretation that I can't feed to the kernel. Without this, I can't debug your problem. > We are very happy to have the patch for reverting the bad commit > because we want to test the new PASEMI i2c driver with support for the > Apple M1 [1] on our Nemo boards. You can revert the patch on your own. At this stage, we're not blindly reverting things in the tree, but instead trying to understand what is happening on your particular system. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates
On Thu, 11 Nov 2021 07:47:08 +, Christian Zigotzky wrote: > > On 11 November 2021 at 08:13 am, Marc Zyngier wrote: > > On Thu, 11 Nov 2021 05:24:52 +, > > Christian Zigotzky wrote: > >> Hello Marc, > >> > >> Here you are: > >> https://forum.hyperion-entertainment.com/viewtopic.php?p=54406#p54406 > > This is not what I asked. I need the actual source file, or at the > > very least the compiled object (the /sys/firmware/fdt file, for > > example). Not an interpretation that I can't feed to the kernel. > > > > Without this, I can't debug your problem. > > > >> We are very happy to have the patch for reverting the bad commit > >> because we want to test the new PASEMI i2c driver with support for the > >> Apple M1 [1] on our Nemo boards. > > You can revert the patch on your own. At this stage, we're not blindly > > reverting things in the tree, but instead trying to understand what > > is happening on your particular system. > > > > Thanks, > > > > M. > > > I added a download link for the fdt file to the post [1]. Please read > also Darren's comments in this post. Thanks for that. The DT looks absolutely bonkers, no wonder that things break with something like that. But Darren's comments made me jump a bit, and I quote them here for everyone to see: [...] The dtb passed by the CFE firmware has a number of issues, which up till now have been fixed by use of patches applied to the mainline kernel. This occasionally causes problems with changes made to mainline. [...] Am I right in understanding that the upstream kernel does not support the machine out of the box, and that you actually have to apply out of tree patches to make it work? That these patches have to do with the IRQ routing? If so, I wonder why upstream should revert a patch to work on a system that isn't supported upstream the first place. I will still try and come up with a solution for you. But asking for the revert of a patch on these grounds is not, IMHO, acceptable. Also, please provide these patches on the list so that I can help you to some extend (and I mean *on the list*, not on a random forum that collects my information). Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates
On Thu, 11 Nov 2021 10:44:30 +, Christian Zigotzky wrote: > > On 11 November 2021 at 11:20 am, Marc Zyngier wrote: > > On Thu, 11 Nov 2021 07:47:08 +, > > Christian Zigotzky wrote: > >> On 11 November 2021 at 08:13 am, Marc Zyngier wrote: > >>> On Thu, 11 Nov 2021 05:24:52 +, > >>> Christian Zigotzky wrote: > >>>> Hello Marc, > >>>> > >>>> Here you are: > >>>> https://forum.hyperion-entertainment.com/viewtopic.php?p=54406#p54406 > >>> This is not what I asked. I need the actual source file, or at the > >>> very least the compiled object (the /sys/firmware/fdt file, for > >>> example). Not an interpretation that I can't feed to the kernel. > >>> > >>> Without this, I can't debug your problem. > >>> > >>>> We are very happy to have the patch for reverting the bad commit > >>>> because we want to test the new PASEMI i2c driver with support for the > >>>> Apple M1 [1] on our Nemo boards. > >>> You can revert the patch on your own. At this stage, we're not blindly > >>> reverting things in the tree, but instead trying to understand what > >>> is happening on your particular system. > >>> > >>> Thanks, > >>> > >>> M. > >>> > >> I added a download link for the fdt file to the post [1]. Please read > >> also Darren's comments in this post. > > > > Am I right in understanding that the upstream kernel does not support > > the machine out of the box, and that you actually have to apply out of > > tree patches to make it work? > No, you aren't right. The upstream kernel supports the Nemo board out > of the box. [1] That's not the way I interpret Darren's comments, but you surely know better than I do. I'll try to come up with something for you to test shortly. M. -- Without deviation from the norm, progress is not possible.
Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates
On Wed, 10 Nov 2021 18:07:24 +, Christian Zigotzky wrote: > > On 09 November 2021 at 03:45 pm, Christian Zigotzky wrote: > > Hello, > > > > The Nemo board [1] doesn't recognize any ATA disks with the > pci-v5.16 updates [2]. > > > > Error messages: > > > > ata4.00: gc timeout cmd 0xec > > ata4.00: failed to IDENTIFY (I/O error, error_mask=0x4) > > ata1.00: gc timeout cmd 0xec > > ata1.00: failed to IDENTIFY (I/O error, error_mask=0x4) > > ata3.00: gc timeout cmd 0xec > > ata3.00: failed to IDENTIFY (I/O error, error_mask=0x4) > > > > I was able to revert the new pci-v5.16 updates [2]. After a new > compiling, the kernel recognize all ATA disks correctly. > > > > Could you please check the pci-v5.16 updates [2]? > > > > Please find attached the kernel config. > > > > Thanks, > > Christian > > > > [1] https://en.wikipedia.org/wiki/AmigaOne_X1000 > > [2] > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c5c62ddf88c34bc83b66e4ac9beb2bb0e1887d4 > > Hi All, > > Many thanks for your nice responses. > > I bisected today [1]. 0412841812265734c306ba5ef8088bcb64d5d3bd > (of/irq: Allow matching of an interrupt-map local to an interrupt > controller) [2] is the first bad commit. Can you please give the following hack a go and post the result (including the full dmesg)? Thanks, M. diff --git a/drivers/of/irq.c b/drivers/of/irq.c index 32be5a03951f..8cf0cc9b7caf 100644 --- a/drivers/of/irq.c +++ b/drivers/of/irq.c @@ -156,14 +156,15 @@ int of_irq_parse_raw(const __be32 *addr, struct of_phandle_args *out_irq) /* Now start the actual "proper" walk of the interrupt tree */ while (ipar != NULL) { + bool intc = of_property_read_bool(ipar, "interrupt-controller"); + /* * Now check if cursor is an interrupt-controller and * if it is then we are done, unless there is an * interrupt-map which takes precedence. */ imap = of_get_property(ipar, "interrupt-map", &imaplen); - if (imap == NULL && - of_property_read_bool(ipar, "interrupt-controller")) { + if (imap == NULL && intc) { pr_debug(" -> got it !\n"); return 0; } @@ -244,8 +245,14 @@ int of_irq_parse_raw(const __be32 *addr, struct of_phandle_args *out_irq) pr_debug(" -> imaplen=%d\n", imaplen); } - if (!match) + if (!match) { + if (intc) { + pr_info("%pOF interrupt-map failed, using interrupt-controller\n", ipar); + return 0; + } + goto fail; + } /* * Successfully parsed an interrrupt-map translation; copy new -- Without deviation from the norm, progress is not possible.
Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates
On Fri, 12 Nov 2021 09:40:30 +, Christian Zigotzky wrote: > > On 11 November 2021 at 06:39 pm, Marc Zyngier wrote: > > On Wed, 10 Nov 2021 18:07:24 +, > > Christian Zigotzky wrote: > >> On 09 November 2021 at 03:45 pm, Christian Zigotzky wrote: > >>> Hello, > >>> > >>> The Nemo board [1] doesn't recognize any ATA disks with the > >> pci-v5.16 updates [2]. > >>> Error messages: > >>> > >>> ata4.00: gc timeout cmd 0xec > >>> ata4.00: failed to IDENTIFY (I/O error, error_mask=0x4) > >>> ata1.00: gc timeout cmd 0xec > >>> ata1.00: failed to IDENTIFY (I/O error, error_mask=0x4) > >>> ata3.00: gc timeout cmd 0xec > >>> ata3.00: failed to IDENTIFY (I/O error, error_mask=0x4) > >>> > >>> I was able to revert the new pci-v5.16 updates [2]. After a new > >> compiling, the kernel recognize all ATA disks correctly. > >>> Could you please check the pci-v5.16 updates [2]? > >>> > >>> Please find attached the kernel config. > >>> > >>> Thanks, > >>> Christian > >>> > >>> [1] https://en.wikipedia.org/wiki/AmigaOne_X1000 > >>> [2] > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c5c62ddf88c34bc83b66e4ac9beb2bb0e1887d4 > >> > >> Hi All, > >> > >> Many thanks for your nice responses. > >> > >> I bisected today [1]. 0412841812265734c306ba5ef8088bcb64d5d3bd > >> (of/irq: Allow matching of an interrupt-map local to an interrupt > >> controller) [2] is the first bad commit. > > Can you please give the following hack a go and post the result > > (including the full dmesg)? > > > > Thanks, > > > > M. > > diff --git a/drivers/of/irq.c b/drivers/of/irq.c > > index 32be5a03951f..8cf0cc9b7caf 100644 > > --- a/drivers/of/irq.c > > +++ b/drivers/of/irq.c > > @@ -156,14 +156,15 @@ int of_irq_parse_raw(const __be32 *addr, struct > > of_phandle_args *out_irq) > > /* Now start the actual "proper" walk of the interrupt tree */ > > while (ipar != NULL) { > > + bool intc = of_property_read_bool(ipar, "interrupt-controller"); > > + > > /* > > * Now check if cursor is an interrupt-controller and > > * if it is then we are done, unless there is an > > * interrupt-map which takes precedence. > > */ > > imap = of_get_property(ipar, "interrupt-map", &imaplen); > > - if (imap == NULL && > > - of_property_read_bool(ipar, "interrupt-controller")) { > > + if (imap == NULL && intc) { > > pr_debug(" -> got it !\n"); > > return 0; > > } > > @@ -244,8 +245,14 @@ int of_irq_parse_raw(const __be32 *addr, struct > > of_phandle_args *out_irq) > > pr_debug(" -> imaplen=%d\n", imaplen); > > } > > - if (!match) > > + if (!match) { > > + if (intc) { > > + pr_info("%pOF interrupt-map failed, using > > interrupt-controller\n", ipar); > > + return 0; > > + } > > + > > goto fail; > > + } > > /* > > * Successfully parsed an interrrupt-map translation; copy new > > > The detecting of the ATA disks works with this patch! Well done! > Thanks a lot! Thanks for testing it. I'll turn that into a proper patch. M. -- Without deviation from the norm, progress is not possible.
Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates
On Fri, 12 Nov 2021 14:15:18 +, Christian Zigotzky wrote: > > On 12 November 2021 at 02:41 pm, Marc Zyngier wrote: > > On Fri, 12 Nov 2021 09:40:30 +, > > Christian Zigotzky wrote: > >> On 11 November 2021 at 06:39 pm, Marc Zyngier wrote: > >>> On Wed, 10 Nov 2021 18:07:24 +, > >>> Christian Zigotzky wrote: > >>>> On 09 November 2021 at 03:45 pm, Christian Zigotzky wrote: > >>>>> Hello, > >>>>> > >>>>> The Nemo board [1] doesn't recognize any ATA disks with the > >>>> pci-v5.16 updates [2]. > >>>>> Error messages: > >>>>> > >>>>> ata4.00: gc timeout cmd 0xec > >>>>> ata4.00: failed to IDENTIFY (I/O error, error_mask=0x4) > >>>>> ata1.00: gc timeout cmd 0xec > >>>>> ata1.00: failed to IDENTIFY (I/O error, error_mask=0x4) > >>>>> ata3.00: gc timeout cmd 0xec > >>>>> ata3.00: failed to IDENTIFY (I/O error, error_mask=0x4) > >>>>> > >>>>> I was able to revert the new pci-v5.16 updates [2]. After a new > >>>> compiling, the kernel recognize all ATA disks correctly. > >>>>> Could you please check the pci-v5.16 updates [2]? > >>>>> > >>>>> Please find attached the kernel config. > >>>>> > >>>>> Thanks, > >>>>> Christian > >>>>> > >>>>> [1] https://en.wikipedia.org/wiki/AmigaOne_X1000 > >>>>> [2] > >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c5c62ddf88c34bc83b66e4ac9beb2bb0e1887d4 > >>>> > >>>> Hi All, > >>>> > >>>> Many thanks for your nice responses. > >>>> > >>>> I bisected today [1]. 0412841812265734c306ba5ef8088bcb64d5d3bd > >>>> (of/irq: Allow matching of an interrupt-map local to an interrupt > >>>> controller) [2] is the first bad commit. > >>> Can you please give the following hack a go and post the result > >>> (including the full dmesg)? > >>> > >>> Thanks, > >>> > >>> M. > >>> diff --git a/drivers/of/irq.c b/drivers/of/irq.c > >>> index 32be5a03951f..8cf0cc9b7caf 100644 > >>> --- a/drivers/of/irq.c > >>> +++ b/drivers/of/irq.c > >>> @@ -156,14 +156,15 @@ int of_irq_parse_raw(const __be32 *addr, struct > >>> of_phandle_args *out_irq) > >>> /* Now start the actual "proper" walk of the interrupt tree */ > >>> while (ipar != NULL) { > >>> + bool intc = of_property_read_bool(ipar, "interrupt-controller"); > >>> + > >>> /* > >>>* Now check if cursor is an interrupt-controller and > >>>* if it is then we are done, unless there is an > >>>* interrupt-map which takes precedence. > >>>*/ > >>> imap = of_get_property(ipar, "interrupt-map", &imaplen); > >>> - if (imap == NULL && > >>> - of_property_read_bool(ipar, "interrupt-controller")) { > >>> + if (imap == NULL && intc) { > >>> pr_debug(" -> got it !\n"); > >>> return 0; > >>> } > >>> @@ -244,8 +245,14 @@ int of_irq_parse_raw(const __be32 *addr, struct > >>> of_phandle_args *out_irq) > >>> pr_debug(" -> imaplen=%d\n", imaplen); > >>> } > >>> - if (!match) > >>> + if (!match) { > >>> + if (intc) { > >>> + pr_info("%pOF interrupt-map failed, using > >>> interrupt-controller\n", ipar); > >>> + return 0; > >>> + } > >>> + > >>> goto fail; > >>> + } > >>> /* > >>>* Successfully parsed an interrrupt-map translation; > >>> copy new > >>> > >> The detecting of the ATA disks works with this patch! Well done! > >> Thanks a lot! > > Thanks for testing it. I'll turn that into a proper patch. > > > > M. > > > Could you please explain your patch? Please refer to the commit message[1]. > I am not a developer. I work for the A-EON Linux FLS. I have no idea what this is, unfortunately. M. [1] https://lore.kernel.org/r/2022143644.434995-1-...@kernel.org -- Without deviation from the norm, progress is not possible.
Re: [PATCH 16/39] irqdomain: Make normal and nomap irqdomains exclusive
On Mon, 15 Nov 2021 19:05:17 +, Cédric Le Goater wrote: > > Hello Mark, s/k/c/ > > On 5/20/21 18:37, Marc Zyngier wrote: > > Direct mappings are completely exclusive of normal mappings, meaning > > that we can refactor the code slightly so that we can get rid of > > the revmap_direct_max_irq field and use the revmap_size field > > instead, reducing the size of the irqdomain structure. > > > > Signed-off-by: Marc Zyngier > > > This patch is breaking the POWER9/POWER10 XIVE driver (these are not > old PPC systems :) on machines sharing the same LSI HW IRQ. For instance, > a linux KVM guest with a virtio-rng and a virtio-balloon device. In that > case, Linux creates two distinct IRQ mappings which can lead to some > unexpected behavior. Either the irq domain translates, or it doesn't. If the driver creates a nomap domain, and yet expects some sort of translation to happen, then the driver is fundamentally broken. And even without that: how do you end-up with a single HW interrupt having two mappings? > A fix to go forward would be to change the XIVE IRQ domain to use a > 'Tree' domain for reverse mapping and not the 'No Map' domain mapping. > I will keep you updated for XIVE. I bet there is a bit more to it. From what you are saying above, something rather ungodly is happening in the XIVE code. M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 0/5] KVM: Turn the vcpu array into an xarray
On Tue, 16 Nov 2021 15:03:40 +, Paolo Bonzini wrote: > > On 11/5/21 20:20, Marc Zyngier wrote: > > The kvm structure is pretty large. A large portion of it is the vcpu > > array, which is 4kB on x86_64 and arm64 as they deal with 512 vcpu > > VMs. Of course, hardly anyone runs VMs this big, so this is often a > > net waste of memory and cache locality. > > > > A possible approach is to turn the fixed-size array into an xarray, > > which results in a net code deletion after a bit of cleanup. > > > > This series is on top of the current linux/master as it touches the > > RISC-V implementation. Only tested on arm64. > > Queued, only locally until I get a review for my replacement of patch > 4 (see > https://lore.kernel.org/kvm/2026142205.719375-1-pbonz...@redhat.com/T/). In which case, let me send a v2 with the changes that we discussed with Sean. It will still have my version of patch 4, but that's nothing you can't fix. M. -- Without deviation from the norm, progress is not possible.
[PATCH v2 0/7] KVM: Turn the vcpu array into an xarray
The kvm structure is pretty large. A large portion of it is the vcpu array, which is 4kB on arm64 with 512 vcpu, double that on x86-64. Of course, hardly anyone runs VMs this big, so this is often a net waste of memory and cache locality. A possible approach is to turn the fixed-size array into an xarray, which results in a net code deletion after a bit of cleanup. * From v1: - Rebased on v5.16-rc1 - Dropped the dubious locking on teardown - Converted kvm_for_each_vcpu() to xa_for_each_range(), together with an invasive change converting the index to an unsigned long Marc Zyngier (7): KVM: Move wiping of the kvm->vcpus array to common code KVM: mips: Use kvm_get_vcpu() instead of open-coded access KVM: s390: Use kvm_get_vcpu() instead of open-coded access KVM: x86: Use kvm_get_vcpu() instead of open-coded access KVM: Convert the kvm->vcpus array to a xarray KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s index KVM: Convert kvm_for_each_vcpu() to using xa_for_each_range() arch/arm64/kvm/arch_timer.c | 8 ++--- arch/arm64/kvm/arm.c | 16 +++-- arch/arm64/kvm/pmu-emul.c | 2 +- arch/arm64/kvm/psci.c | 6 ++-- arch/arm64/kvm/reset.c| 2 +- arch/arm64/kvm/vgic/vgic-init.c | 10 +++--- arch/arm64/kvm/vgic/vgic-kvm-device.c | 2 +- arch/arm64/kvm/vgic/vgic-mmio-v2.c| 3 +- arch/arm64/kvm/vgic/vgic-mmio-v3.c| 7 ++-- arch/arm64/kvm/vgic/vgic-v3.c | 4 +-- arch/arm64/kvm/vgic/vgic-v4.c | 5 +-- arch/arm64/kvm/vgic/vgic.c| 2 +- arch/mips/kvm/loongson_ipi.c | 4 +-- arch/mips/kvm/mips.c | 23 ++--- arch/powerpc/kvm/book3s_32_mmu.c | 2 +- arch/powerpc/kvm/book3s_64_mmu.c | 2 +- arch/powerpc/kvm/book3s_hv.c | 8 ++--- arch/powerpc/kvm/book3s_pr.c | 2 +- arch/powerpc/kvm/book3s_xics.c| 6 ++-- arch/powerpc/kvm/book3s_xics.h| 2 +- arch/powerpc/kvm/book3s_xive.c| 15 + arch/powerpc/kvm/book3s_xive.h| 4 +-- arch/powerpc/kvm/book3s_xive_native.c | 8 ++--- arch/powerpc/kvm/e500_emulate.c | 2 +- arch/powerpc/kvm/powerpc.c| 10 +- arch/riscv/kvm/vcpu_sbi.c | 2 +- arch/riscv/kvm/vm.c | 10 +- arch/riscv/kvm/vmid.c | 2 +- arch/s390/kvm/interrupt.c | 2 +- arch/s390/kvm/kvm-s390.c | 47 ++- arch/s390/kvm/kvm-s390.h | 4 +-- arch/x86/kvm/hyperv.c | 7 ++-- arch/x86/kvm/i8254.c | 2 +- arch/x86/kvm/i8259.c | 5 +-- arch/x86/kvm/ioapic.c | 4 +-- arch/x86/kvm/irq_comm.c | 7 ++-- arch/x86/kvm/kvm_onhyperv.c | 3 +- arch/x86/kvm/lapic.c | 6 ++-- arch/x86/kvm/svm/avic.c | 2 +- arch/x86/kvm/svm/sev.c| 3 +- arch/x86/kvm/vmx/posted_intr.c| 2 +- arch/x86/kvm/x86.c| 30 +++-- include/linux/kvm_host.h | 17 +- virt/kvm/kvm_main.c | 41 --- 44 files changed, 158 insertions(+), 193 deletions(-) -- 2.30.2
[PATCH v2 1/7] KVM: Move wiping of the kvm->vcpus array to common code
All architectures have similar loops iterating over the vcpus, freeing one vcpu at a time, and eventually wiping the reference off the vcpus array. They are also inconsistently taking the kvm->lock mutex when wiping the references from the array. Make this code common, which will simplify further changes. The locking is dropped altogether, as this should only be called when there is no further references on the kvm structure. Reviewed-by: Claudio Imbrenda Signed-off-by: Marc Zyngier --- arch/arm64/kvm/arm.c | 10 +- arch/mips/kvm/mips.c | 21 + arch/powerpc/kvm/powerpc.c | 10 +- arch/riscv/kvm/vm.c| 10 +- arch/s390/kvm/kvm-s390.c | 18 +- arch/x86/kvm/x86.c | 9 + include/linux/kvm_host.h | 2 +- virt/kvm/kvm_main.c| 17 +++-- 8 files changed, 22 insertions(+), 75 deletions(-) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 2f03cbfefe67..29942d8c357c 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -175,19 +175,11 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf) */ void kvm_arch_destroy_vm(struct kvm *kvm) { - int i; - bitmap_free(kvm->arch.pmu_filter); kvm_vgic_destroy(kvm); - for (i = 0; i < KVM_MAX_VCPUS; ++i) { - if (kvm->vcpus[i]) { - kvm_vcpu_destroy(kvm->vcpus[i]); - kvm->vcpus[i] = NULL; - } - } - atomic_set(&kvm->online_vcpus, 0); + kvm_destroy_vcpus(kvm); } int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index 562aa878b266..ceacca74f808 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -171,25 +171,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) return 0; } -void kvm_mips_free_vcpus(struct kvm *kvm) -{ - unsigned int i; - struct kvm_vcpu *vcpu; - - kvm_for_each_vcpu(i, vcpu, kvm) { - kvm_vcpu_destroy(vcpu); - } - - mutex_lock(&kvm->lock); - - for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) - kvm->vcpus[i] = NULL; - - atomic_set(&kvm->online_vcpus, 0); - - mutex_unlock(&kvm->lock); -} - static void kvm_mips_free_gpa_pt(struct kvm *kvm) { /* It should always be safe to remove after flushing the whole range */ @@ -199,7 +180,7 @@ static void kvm_mips_free_gpa_pt(struct kvm *kvm) void kvm_arch_destroy_vm(struct kvm *kvm) { - kvm_mips_free_vcpus(kvm); + kvm_destroy_vcpus(kvm); kvm_mips_free_gpa_pt(kvm); } diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 35e9cccdeef9..492e4a4121cb 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -463,9 +463,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) void kvm_arch_destroy_vm(struct kvm *kvm) { - unsigned int i; - struct kvm_vcpu *vcpu; - #ifdef CONFIG_KVM_XICS /* * We call kick_all_cpus_sync() to ensure that all @@ -476,14 +473,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm) kick_all_cpus_sync(); #endif - kvm_for_each_vcpu(i, vcpu, kvm) - kvm_vcpu_destroy(vcpu); + kvm_destroy_vcpus(kvm); mutex_lock(&kvm->lock); - for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) - kvm->vcpus[i] = NULL; - - atomic_set(&kvm->online_vcpus, 0); kvmppc_core_destroy_vm(kvm); diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c index 26399df15b63..6af6cde295eb 100644 --- a/arch/riscv/kvm/vm.c +++ b/arch/riscv/kvm/vm.c @@ -46,15 +46,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) void kvm_arch_destroy_vm(struct kvm *kvm) { - int i; - - for (i = 0; i < KVM_MAX_VCPUS; ++i) { - if (kvm->vcpus[i]) { - kvm_vcpu_destroy(kvm->vcpus[i]); - kvm->vcpus[i] = NULL; - } - } - atomic_set(&kvm->online_vcpus, 0); + kvm_destroy_vcpus(kvm); } int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index c6257f625929..7af53b8788fa 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -2819,27 +2819,11 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) free_page((unsigned long)(vcpu->arch.sie_block)); } -static void kvm_free_vcpus(struct kvm *kvm) -{ - unsigned int i; - struct kvm_vcpu *vcpu; - - kvm_for_each_vcpu(i, vcpu, kvm) - kvm_vcpu_destroy(vcpu); - - mutex_lock(&kvm->lock); - for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) - kvm->vcpus[i] = NUL
[PATCH v2 2/7] KVM: mips: Use kvm_get_vcpu() instead of open-coded access
As we are about to change the way vcpus are allocated, mandate the use of kvm_get_vcpu() instead of open-coding the access. Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Marc Zyngier --- arch/mips/kvm/loongson_ipi.c | 4 ++-- arch/mips/kvm/mips.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/mips/kvm/loongson_ipi.c b/arch/mips/kvm/loongson_ipi.c index 3681fc8fba38..5d53f32d837c 100644 --- a/arch/mips/kvm/loongson_ipi.c +++ b/arch/mips/kvm/loongson_ipi.c @@ -120,7 +120,7 @@ static int loongson_vipi_write(struct loongson_kvm_ipi *ipi, s->status |= data; irq.cpu = id; irq.irq = 6; - kvm_vcpu_ioctl_interrupt(kvm->vcpus[id], &irq); + kvm_vcpu_ioctl_interrupt(kvm_get_vcpu(kvm, id), &irq); break; case CORE0_CLEAR_OFF: @@ -128,7 +128,7 @@ static int loongson_vipi_write(struct loongson_kvm_ipi *ipi, if (!s->status) { irq.cpu = id; irq.irq = -6; - kvm_vcpu_ioctl_interrupt(kvm->vcpus[id], &irq); + kvm_vcpu_ioctl_interrupt(kvm_get_vcpu(kvm, id), &irq); } break; diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index ceacca74f808..6228bf396d63 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -479,7 +479,7 @@ int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, if (irq->cpu == -1) dvcpu = vcpu; else - dvcpu = vcpu->kvm->vcpus[irq->cpu]; + dvcpu = kvm_get_vcpu(vcpu->kvm, irq->cpu); if (intr == 2 || intr == 3 || intr == 4 || intr == 6) { kvm_mips_callbacks->queue_io_int(dvcpu, irq); -- 2.30.2
[PATCH v2 3/7] KVM: s390: Use kvm_get_vcpu() instead of open-coded access
As we are about to change the way vcpus are allocated, mandate the use of kvm_get_vcpu() instead of open-coding the access. Reviewed-by: Claudio Imbrenda Signed-off-by: Marc Zyngier --- arch/s390/kvm/kvm-s390.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 7af53b8788fa..4a0f62b03964 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -4572,7 +4572,7 @@ int kvm_s390_vcpu_start(struct kvm_vcpu *vcpu) } for (i = 0; i < online_vcpus; i++) { - if (!is_vcpu_stopped(vcpu->kvm->vcpus[i])) + if (!is_vcpu_stopped(kvm_get_vcpu(vcpu->kvm, i))) started_vcpus++; } @@ -4634,9 +4634,11 @@ int kvm_s390_vcpu_stop(struct kvm_vcpu *vcpu) __disable_ibs_on_vcpu(vcpu); for (i = 0; i < online_vcpus; i++) { - if (!is_vcpu_stopped(vcpu->kvm->vcpus[i])) { + struct kvm_vcpu *tmp = kvm_get_vcpu(vcpu->kvm, i); + + if (!is_vcpu_stopped(tmp)) { started_vcpus++; - started_vcpu = vcpu->kvm->vcpus[i]; + started_vcpu = tmp; } } -- 2.30.2
[PATCH v2 4/7] KVM: x86: Use kvm_get_vcpu() instead of open-coded access
As we are about to change the way vcpus are allocated, mandate the use of kvm_get_vcpu() instead of open-coding the access. Signed-off-by: Marc Zyngier --- arch/x86/kvm/vmx/posted_intr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c index 5f81ef092bd4..82a49720727d 100644 --- a/arch/x86/kvm/vmx/posted_intr.c +++ b/arch/x86/kvm/vmx/posted_intr.c @@ -272,7 +272,7 @@ int pi_update_irte(struct kvm *kvm, unsigned int host_irq, uint32_t guest_irq, if (!kvm_arch_has_assigned_device(kvm) || !irq_remapping_cap(IRQ_POSTING_CAP) || - !kvm_vcpu_apicv_active(kvm->vcpus[0])) + !kvm_vcpu_apicv_active(kvm_get_vcpu(kvm, 0))) return 0; idx = srcu_read_lock(&kvm->irq_srcu); -- 2.30.2
[PATCH v2 5/7] KVM: Convert the kvm->vcpus array to a xarray
At least on arm64 and x86, the vcpus array is pretty huge (up to 1024 entries on x86) and is mostly empty in the majority of the cases (running 1k vcpu VMs is not that common). This mean that we end-up with a 4kB block of unused memory in the middle of the kvm structure. Instead of wasting away this memory, let's use an xarray instead, which gives us almost the same flexibility as a normal array, but with a reduced memory usage with smaller VMs. Signed-off-by: Marc Zyngier --- include/linux/kvm_host.h | 5 +++-- virt/kvm/kvm_main.c | 15 +-- 2 files changed, 12 insertions(+), 8 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 9a8c997ff9bc..df1b2b183d94 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -552,7 +553,7 @@ struct kvm { struct mutex slots_arch_lock; struct mm_struct *mm; /* userspace tied to this vm */ struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM]; - struct kvm_vcpu *vcpus[KVM_MAX_VCPUS]; + struct xarray vcpu_array; /* Used to wait for completion of MMU notifiers. */ spinlock_t mn_invalidate_lock; @@ -701,7 +702,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i) /* Pairs with smp_wmb() in kvm_vm_ioctl_create_vcpu. */ smp_rmb(); - return kvm->vcpus[i]; + return xa_load(&kvm->vcpu_array, i); } #define kvm_for_each_vcpu(idx, vcpup, kvm) \ diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 73811c3829a2..9a7bce944427 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -458,7 +458,7 @@ void kvm_destroy_vcpus(struct kvm *kvm) kvm_for_each_vcpu(i, vcpu, kvm) { kvm_vcpu_destroy(vcpu); - kvm->vcpus[i] = NULL; + xa_erase(&kvm->vcpu_array, i); } atomic_set(&kvm->online_vcpus, 0); @@ -1063,6 +1063,7 @@ static struct kvm *kvm_create_vm(unsigned long type) mutex_init(&kvm->slots_arch_lock); spin_lock_init(&kvm->mn_invalidate_lock); rcuwait_init(&kvm->mn_memslots_update_rcuwait); + xa_init(&kvm->vcpu_array); INIT_LIST_HEAD(&kvm->devices); @@ -3658,7 +3659,10 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) } vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus); - BUG_ON(kvm->vcpus[vcpu->vcpu_idx]); + r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT); + BUG_ON(r == -EBUSY); + if (r) + goto unlock_vcpu_destroy; /* Fill the stats id string for the vcpu */ snprintf(vcpu->stats_id, sizeof(vcpu->stats_id), "kvm-%d/vcpu-%d", @@ -3668,15 +3672,14 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) kvm_get_kvm(kvm); r = create_vcpu_fd(vcpu); if (r < 0) { + xa_erase(&kvm->vcpu_array, vcpu->vcpu_idx); kvm_put_kvm_no_destroy(kvm); goto unlock_vcpu_destroy; } - kvm->vcpus[vcpu->vcpu_idx] = vcpu; - /* -* Pairs with smp_rmb() in kvm_get_vcpu. Write kvm->vcpus -* before kvm->online_vcpu's incremented value. +* Pairs with smp_rmb() in kvm_get_vcpu. Store the vcpu +* pointer before kvm->online_vcpu's incremented value. */ smp_wmb(); atomic_inc(&kvm->online_vcpus); -- 2.30.2
[PATCH v2 6/7] KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s index
Everywhere we use kvm_for_each_vpcu(), we use an int as the vcpu index. Unfortunately, we're about to move rework the iterator, which requires this to be upgrade to an unsigned long. Let's bite the bullet and repaint all of it in one go. Signed-off-by: Marc Zyngier --- arch/arm64/kvm/arch_timer.c | 8 arch/arm64/kvm/arm.c | 6 +++--- arch/arm64/kvm/pmu-emul.c | 2 +- arch/arm64/kvm/psci.c | 6 +++--- arch/arm64/kvm/reset.c| 2 +- arch/arm64/kvm/vgic/vgic-init.c | 10 ++ arch/arm64/kvm/vgic/vgic-kvm-device.c | 2 +- arch/arm64/kvm/vgic/vgic-mmio-v2.c| 3 +-- arch/arm64/kvm/vgic/vgic-mmio-v3.c| 7 --- arch/arm64/kvm/vgic/vgic-v3.c | 4 ++-- arch/arm64/kvm/vgic/vgic-v4.c | 5 +++-- arch/arm64/kvm/vgic/vgic.c| 2 +- arch/powerpc/kvm/book3s_32_mmu.c | 2 +- arch/powerpc/kvm/book3s_64_mmu.c | 2 +- arch/powerpc/kvm/book3s_hv.c | 8 arch/powerpc/kvm/book3s_pr.c | 2 +- arch/powerpc/kvm/book3s_xics.c| 6 +++--- arch/powerpc/kvm/book3s_xics.h| 2 +- arch/powerpc/kvm/book3s_xive.c| 15 +-- arch/powerpc/kvm/book3s_xive.h| 4 ++-- arch/powerpc/kvm/book3s_xive_native.c | 8 arch/powerpc/kvm/e500_emulate.c | 2 +- arch/riscv/kvm/vcpu_sbi.c | 2 +- arch/riscv/kvm/vmid.c | 2 +- arch/s390/kvm/interrupt.c | 2 +- arch/s390/kvm/kvm-s390.c | 21 +++-- arch/s390/kvm/kvm-s390.h | 4 ++-- arch/x86/kvm/hyperv.c | 7 --- arch/x86/kvm/i8254.c | 2 +- arch/x86/kvm/i8259.c | 5 +++-- arch/x86/kvm/ioapic.c | 4 ++-- arch/x86/kvm/irq_comm.c | 7 --- arch/x86/kvm/kvm_onhyperv.c | 3 ++- arch/x86/kvm/lapic.c | 6 +++--- arch/x86/kvm/svm/avic.c | 2 +- arch/x86/kvm/svm/sev.c| 3 ++- arch/x86/kvm/x86.c| 21 +++-- include/linux/kvm_host.h | 2 +- virt/kvm/kvm_main.c | 13 +++-- 39 files changed, 114 insertions(+), 100 deletions(-) diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c index 3df67c127489..d6f4114f1d11 100644 --- a/arch/arm64/kvm/arch_timer.c +++ b/arch/arm64/kvm/arch_timer.c @@ -750,7 +750,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu) /* Make the updates of cntvoff for all vtimer contexts atomic */ static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, u64 cntvoff) { - int i; + unsigned long i; struct kvm *kvm = vcpu->kvm; struct kvm_vcpu *tmp; @@ -1189,8 +1189,8 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu) static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu) { - int vtimer_irq, ptimer_irq; - int i, ret; + int vtimer_irq, ptimer_irq, ret; + unsigned long i; vtimer_irq = vcpu_vtimer(vcpu)->irq.irq; ret = kvm_vgic_set_owner(vcpu, vtimer_irq, vcpu_vtimer(vcpu)); @@ -1297,7 +1297,7 @@ void kvm_timer_init_vhe(void) static void set_timer_irqs(struct kvm *kvm, int vtimer_irq, int ptimer_irq) { struct kvm_vcpu *vcpu; - int i; + unsigned long i; kvm_for_each_vcpu(i, vcpu, kvm) { vcpu_vtimer(vcpu)->irq.irq = vtimer_irq; diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 29942d8c357c..6177b0c882ea 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -624,7 +624,7 @@ bool kvm_arch_intc_initialized(struct kvm *kvm) void kvm_arm_halt_guest(struct kvm *kvm) { - int i; + unsigned long i; struct kvm_vcpu *vcpu; kvm_for_each_vcpu(i, vcpu, kvm) @@ -634,7 +634,7 @@ void kvm_arm_halt_guest(struct kvm *kvm) void kvm_arm_resume_guest(struct kvm *kvm) { - int i; + unsigned long i; struct kvm_vcpu *vcpu; kvm_for_each_vcpu(i, vcpu, kvm) { @@ -2020,7 +2020,7 @@ static int finalize_hyp_mode(void) struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr) { struct kvm_vcpu *vcpu; - int i; + unsigned long i; mpidr &= MPIDR_HWID_BITMASK; kvm_for_each_vcpu(i, vcpu, kvm) { diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c index a5e4bbf5e68f..0404357705a8 100644 --- a/arch/arm64/kvm/pmu-emul.c +++ b/arch/arm64/kvm/pmu-emul.c @@ -900,7 +900,7 @@ static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu) */ static bool pmu_irq_is_valid(struct kvm *kvm, int irq) { - int i; + unsigned long i; struct kvm_vcpu *vcpu; kvm_for_each_vcpu(i, vcpu, kvm) { diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c index 74c47d420253..ed675fce8fb7 100644 --- a/arch/arm64/kvm/psci.c +++ b/arch/arm64/kvm/psci.c @@ -121,8 +121,8
[PATCH v2 7/7] KVM: Convert kvm_for_each_vcpu() to using xa_for_each_range()
Now that the vcpu array is backed by an xarray, use the optimised iterator that matches the underlying data structure. Suggested-by: Sean Christopherson Signed-off-by: Marc Zyngier --- include/linux/kvm_host.h | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index dc635fbfd901..d20e33378c63 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -705,11 +705,9 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i) return xa_load(&kvm->vcpu_array, i); } -#define kvm_for_each_vcpu(idx, vcpup, kvm) \ - for (idx = 0; \ -idx < atomic_read(&kvm->online_vcpus) && \ -(vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \ -idx++) +#define kvm_for_each_vcpu(idx, vcpup, kvm)\ + xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0, \ + (atomic_read(&kvm->online_vcpus) - 1)) static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id) { -- 2.30.2
Re: [PATCH] powerpc/xive: Change IRQ domain to a tree domain
On Tue, 16 Nov 2021 13:40:22 +, Cédric Le Goater wrote: > > Commit 4f86a06e2d6e ("irqdomain: Make normal and nomap irqdomains > exclusive") introduced an IRQ_DOMAIN_FLAG_NO_MAP flag to isolate the > 'nomap' domains still in use under the powerpc arch. With this new > flag, the revmap_tree of the IRQ domain is not used anymore. This > change broke the support of shared LSIs [1] in the XIVE driver because > it was relying on a lookup in the revmap_tree to query previously > mapped interrupts. Just a lookup? Surely there is more to it, no? > Linux now creates two distinct IRQ mappings on the > same HW IRQ which can lead to unexpected behavior in the drivers. > > The XIVE IRQ domain is not a direct mapping domain and its HW IRQ > interrupt number space is rather large : 1M/socket on POWER9 and > POWER10, change the XIVE driver to use a 'tree' domain type instead. > > [1] For instance, a linux KVM guest with virtio-rng and virtio-balloon > devices. > > Cc: Marc Zyngier > Cc: sta...@vger.kernel.org # v5.14+ > Fixes: 4f86a06e2d6e ("irqdomain: Make normal and nomap irqdomains exclusive") > Signed-off-by: Cédric Le Goater > --- > > Marc, > > The Fixes tag is there because the patch in question revealed that > something was broken in XIVE. genirq is not in cause. However, I > don't know for PS3 and Cell. May be less critical for now. Depends if they expect something that a no-map domain cannot provide. > > arch/powerpc/sysdev/xive/common.c | 3 +-- > arch/powerpc/sysdev/xive/Kconfig | 1 - > 2 files changed, 1 insertion(+), 3 deletions(-) > > diff --git a/arch/powerpc/sysdev/xive/common.c > b/arch/powerpc/sysdev/xive/common.c > index fed6fd16c8f4..9d0f0fe25598 100644 > --- a/arch/powerpc/sysdev/xive/common.c > +++ b/arch/powerpc/sysdev/xive/common.c > @@ -1536,8 +1536,7 @@ static const struct irq_domain_ops xive_irq_domain_ops > = { > > static void __init xive_init_host(struct device_node *np) > { > - xive_irq_domain = irq_domain_add_nomap(np, XIVE_MAX_IRQ, > -&xive_irq_domain_ops, NULL); > + xive_irq_domain = irq_domain_add_tree(np, &xive_irq_domain_ops, NULL); > if (WARN_ON(xive_irq_domain == NULL)) > return; > irq_set_default_host(xive_irq_domain); > diff --git a/arch/powerpc/sysdev/xive/Kconfig > b/arch/powerpc/sysdev/xive/Kconfig > index 97796c6b63f0..785c292d104b 100644 > --- a/arch/powerpc/sysdev/xive/Kconfig > +++ b/arch/powerpc/sysdev/xive/Kconfig > @@ -3,7 +3,6 @@ config PPC_XIVE > bool > select PPC_SMP_MUXED_IPI > select HARDIRQS_SW_RESEND > - select IRQ_DOMAIN_NOMAP > > config PPC_XIVE_NATIVE > bool As long as this works, I'm happy with one less no-map user. Acked-by: Marc Zyngier M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v3 07/12] KVM: arm64: Use Makefile.kvm for common files
On Wed, 17 Nov 2021 17:39:58 +, David Woodhouse wrote: > > From: David Woodhouse > > Signed-off-by: David Woodhouse > --- > arch/arm64/kvm/Makefile | 6 ++ > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile > index 989bb5dad2c8..04a53f71a6b6 100644 > --- a/arch/arm64/kvm/Makefile > +++ b/arch/arm64/kvm/Makefile > @@ -5,14 +5,12 @@ > > ccflags-y += -I $(srctree)/$(src) > > -KVM=../../../virt/kvm > +include $(srctree)/virt/kvm/Makefile.kvm > > obj-$(CONFIG_KVM) += kvm.o > obj-$(CONFIG_KVM) += hyp/ > > -kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \ > - $(KVM)/vfio.o $(KVM)/irqchip.o $(KVM)/binary_stats.o \ > - arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \ > +kvm-y += arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \ >inject_fault.o va_layout.o handle_exit.o \ >guest.o debug.o reset.o sys_regs.o \ >vgic-sys-reg-v3.o fpsimd.o pmu.o \ Acked-by: Marc Zyngier M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v3 02/12] KVM: Add Makefile.kvm for common files, use it for x86
On Wed, 17 Nov 2021 17:39:53 +, David Woodhouse wrote: > > From: David Woodhouse > > Splitting kvm_main.c out into smaller and better-organized files is > slightly non-trivial when it involves editing a bunch of per-arch > KVM makefiles. Provide virt/kvm/Makefile.kvm for them to include. > > Signed-off-by: David Woodhouse > --- > arch/x86/kvm/Makefile | 7 +-- > virt/kvm/Makefile.kvm | 13 + > 2 files changed, 14 insertions(+), 6 deletions(-) > create mode 100644 virt/kvm/Makefile.kvm > > diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile > index 75dfd27b6e8a..30f244b64523 100644 > --- a/arch/x86/kvm/Makefile > +++ b/arch/x86/kvm/Makefile > @@ -7,12 +7,7 @@ ifeq ($(CONFIG_FRAME_POINTER),y) > OBJECT_FILES_NON_STANDARD_vmenter.o := y > endif > > -KVM := ../../../virt/kvm > - > -kvm-y+= $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \ > - $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o > \ > - $(KVM)/dirty_ring.o $(KVM)/binary_stats.o > -kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o > +include $(srctree)/virt/kvm/Makefile.kvm > > kvm-y+= x86.o emulate.o i8259.o irq.o lapic.o \ > i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \ > diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm > new file mode 100644 > index ..ffdcad3cc97a > --- /dev/null > +++ b/virt/kvm/Makefile.kvm > @@ -0,0 +1,13 @@ > +# SPDX-License-Identifier: GPL-2.0 > +# > +# Makefile for Kernel-based Virtual Machine module > +# > + > +KVM ?= ../../../virt/kvm > + > +kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o > +kvm-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o > +kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o > +kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o > +kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o > +kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o Acked-by: Marc Zyngier M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v3 08/12] KVM: Propagate vcpu explicitly to mark_page_dirty_in_slot()
On Wed, 17 Nov 2021 17:39:59 +, David Woodhouse wrote: > > From: David Woodhouse > > The kvm_dirty_ring_get() function uses kvm_get_running_vcpu() to work out > which dirty ring to use, but there are some use cases where that doesn't > work. > > There's one in setting the Xen shared info page, introduced in commit > 629b5348841a ("KVM: x86/xen: update wallclock region") and reported by > "butt3rflyh4ck" in > https://lore.kernel.org/kvm/cafco6xomos7eacn_n6v4txk7xl7iqra2gabg3f7e3naf5ug...@mail.gmail.com/ > > There's also about to be another one when the newly-reintroduced > gfn_to_pfn_cache needs to mark a page as dirty from the MMU notifier > which invalidates the mapping. In that case, we will *know* the vcpu > that can be 'blamed' for dirtying the page, and we just need to be > able to pass it in as an explicit argument when doing so. > > This patch preemptively resolves the second issue, and paves the way > for resolving the first. A complete fix for the first issue will need > us to switch the Xen shinfo to be owned by a particular vCPU, which > will happen in a separate patch. > > Signed-off-by: David Woodhouse > --- > arch/arm64/kvm/mmu.c | 2 +- > arch/x86/kvm/mmu/mmu.c | 2 +- > arch/x86/kvm/mmu/spte.c| 2 +- > arch/x86/kvm/mmu/tdp_mmu.c | 2 +- > arch/x86/kvm/x86.c | 4 ++-- > include/linux/kvm_dirty_ring.h | 6 -- > include/linux/kvm_host.h | 3 ++- > virt/kvm/dirty_ring.c | 8 ++-- > virt/kvm/kvm_main.c| 18 +- > 9 files changed, 27 insertions(+), 20 deletions(-) What's the base for this series? This patch fails to compile for me (at least on arm64), and the following patch doesn't apply on -rc1. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
On 2020-11-25 14:09, Laurent Vivier wrote: On 25/11/2020 14:20, Thomas Gleixner wrote: Laurent, On Wed, Nov 25 2020 at 12:16, Laurent Vivier wrote: The proper subsystem prefix is: 'genirq/irqdomain:' and the first letter after the colon wants to be uppercase. Ok. This function adds an affinity parameter to irq_create_mapping(). This parameter is needed to pass it to irq_domain_alloc_descs(). A changelog has to explain the WHY. 'The parameter is needed' is not really useful information. The reason of this change is explained in PATCH 2. I have two patches, one to change the interface with no functional change (PATCH 1) and one to fix the problem (PATCH 2). Moreover they don't cover the same subsystems. I can either: - merge the two patches - or make a reference in the changelog of PATCH 1 to PATCH 2 (something like "(see folowing patch "powerpc/pseries: pass MSI affinity to irq_create_mapping()")") - or copy some information from PATCH 2 (something like "this parameter is needed by rtas_setup_msi_irqs() to pass the affinity to irq_domain_alloc_descs() to fix multiqueue affinity") What do you prefer? How about something like this for the first patch: "There is currently no way to convey the affinity of an interrupt via irq_create_mapping(), which creates issues for devices that expect that affinity to be managed by the kernel. In order to sort this out, rename irq_create_mapping() to irq_create_mapping_affinity() with an additional affinity parameter that can conveniently passed down to irq_domain_alloc_descs(). irq_create_mapping() is then re-implemented as a wrapper around irq_create_mapping_affinity()." Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
On 2020-11-25 16:24, Laurent Vivier wrote: On 25/11/2020 17:05, Denis Kirjanov wrote: On 11/25/20, Laurent Vivier wrote: With virtio multiqueue, normally each queue IRQ is mapped to a CPU. But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity") this is broken on pseries. Please add "Fixes" tag. In fact, the code in commit 0d9f0a52c8b9f is correct. The problem is with MSI/X irq affinity and pseries. So this patch fixes more than virtio_scsi. I put this information because this commit allows to clearly show the problem. Perhaps I should remove this line in fact? This patch does not fix virtio_scsi at all, which as you noticed, is correct. It really fixes the PPC MSI setup, which is starting to show its age. So getting rid of the reference seems like the right thing to do. I'm also not keen on the BugId thing. It should really be a lore link. I also cannot find any such tag in the kernel, nor is it a documented practice. The last reference to a Bugzilla entry seems to have happened with 786b5219081ff16 (five years ago). Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn
Hi David, On Thu, 24 Jun 2021 04:57:45 +0100, David Stevens wrote: > > From: David Stevens > > Return a struct kvm_pfn_page containing both a pfn and an optional > struct page from the gfn_to_pfn family of functions. This differentiates > the gup and follow_fault_pfn cases, which allows callers that only need > a pfn to avoid touching the page struct in the latter case. For callers > that need a struct page, introduce a helper function that unwraps a > struct kvm_pfn_page into a struct page. This helper makes the call to > kvm_get_pfn which had previously been in hva_to_pfn_remapped. > > For now, wrap all calls to gfn_to_pfn functions in the new helper > function. Callers which don't need the page struct will be updated in > follow-up patches. > > Signed-off-by: David Stevens > --- > arch/arm64/kvm/mmu.c | 5 +- > arch/mips/kvm/mmu.c| 3 +- > arch/powerpc/kvm/book3s.c | 3 +- > arch/powerpc/kvm/book3s_64_mmu_hv.c| 5 +- > arch/powerpc/kvm/book3s_64_mmu_radix.c | 5 +- > arch/powerpc/kvm/book3s_hv_uvmem.c | 4 +- > arch/powerpc/kvm/e500_mmu_host.c | 2 +- > arch/x86/kvm/mmu/mmu.c | 11 ++- > arch/x86/kvm/mmu/mmu_audit.c | 2 +- > arch/x86/kvm/x86.c | 2 +- > drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +- > include/linux/kvm_host.h | 27 -- > include/linux/kvm_types.h | 5 + > virt/kvm/kvm_main.c| 121 + > 14 files changed, 109 insertions(+), 88 deletions(-) > [...] > +kvm_pfn_t kvm_pfn_page_unwrap(struct kvm_pfn_page pfnpg) > +{ > + if (pfnpg.page) > + return pfnpg.pfn; > + > + kvm_get_pfn(pfnpg.pfn); > + return pfnpg.pfn; > +} > +EXPORT_SYMBOL_GPL(kvm_pfn_page_unwrap); I'd really like to see a tiny bit of documentation explaining that calls to kvm_pfn_page_unwrap() are not idempotent. Otherwise, looks good to me. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 3/6] KVM: x86/mmu: avoid struct page in MMU
On Thu, 24 Jun 2021 09:58:00 +0100, Nicholas Piggin wrote: > > Excerpts from David Stevens's message of June 24, 2021 1:57 pm: > > From: David Stevens > > out_unlock: > > if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa)) > > read_unlock(&vcpu->kvm->mmu_lock); > > else > > write_unlock(&vcpu->kvm->mmu_lock); > > - kvm_release_pfn_clean(pfn); > > + if (pfnpg.page) > > + put_page(pfnpg.page); > > return r; > > } > > How about > > kvm_release_pfn_page_clean(pfnpg); I'm not sure. I always found kvm_release_pfn_clean() ugly, because it doesn't mark the page 'clean'. I find put_page() more correct. Something like 'kvm_put_pfn_page()' would make more sense, but I'm so bad at naming things that I could just as well call it 'bob()'. M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 4/6] KVM: arm64/mmu: avoid struct page in MMU
On Thu, 24 Jun 2021 04:57:47 +0100, David Stevens wrote: > > From: David Stevens > > Avoid converting pfns returned by follow_fault_pfn to struct pages to > transiently take a reference. The reference was originally taken to > match the reference taken by gup. However, pfns returned by > follow_fault_pfn may not have a struct page set up for reference > counting. > > Signed-off-by: David Stevens > --- > arch/arm64/kvm/mmu.c | 43 +++ > 1 file changed, 23 insertions(+), 20 deletions(-) > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > index 896b3644b36f..a741972cb75f 100644 > --- a/arch/arm64/kvm/mmu.c > +++ b/arch/arm64/kvm/mmu.c [...] > @@ -968,16 +968,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, > phys_addr_t fault_ipa, >*/ > if (vma_pagesize == PAGE_SIZE && !force_pte) > vma_pagesize = transparent_hugepage_adjust(memslot, hva, > -&pfn, &fault_ipa); > +&pfnpg, &fault_ipa); > if (writable) > prot |= KVM_PGTABLE_PROT_W; > > if (fault_status != FSC_PERM && !device) > - clean_dcache_guest_page(pfn, vma_pagesize); > + clean_dcache_guest_page(pfnpg.pfn, vma_pagesize); > > if (exec_fault) { > prot |= KVM_PGTABLE_PROT_X; > - invalidate_icache_guest_page(pfn, vma_pagesize); > + invalidate_icache_guest_page(pfnpg.pfn, vma_pagesize); This is going to clash with what is currently in -next, specially with MTE. Paolo, if you really want to take this in 5.13, can you please push a stable branch based on -rc4 or older so that I can merge it in and test it? Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 0/2] Fix arm64 boot regression in 5.14
On 2021-07-20 13:35, Will Deacon wrote: Hi folks, Jonathan reports [1] that commit c742199a014d ("mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge") breaks the boot on arm64 when huge mappings are used to map the kernel linear map but the VA size is configured such that PUDs are folded. This is because the non-functional pud_set_huge() stub is used to create the linear map, which results in 1GB holes and a fatal data abort when the kernel attemps to access them. Digging further into the issue, it also transpired that huge-vmap is silently disabled in these configurations as well [2], despite working correctly in 5.13. The latter issue causes the pgtable selftests to scream due to a failing consistency check [3]. Rather than leave mainline in a terminally broken state for arm64 while we figure this out, revert the offending commit to get things working again. Unfortunately, reverting the change in isolation causes a build breakage for 32-bit PowerPC 8xx machines which recently started relying on the problematic stubs to support pte-level huge-vmap entries [4]. Since Christophe is away at the moment, this series first reverts the PowerPC 8xx change in order to avoid breaking the build. I would really like this to land for -rc3 and I can take these via the arm64 fixes queue if the PowerPC folks are alright with them. Cheers, Will [1] https://lore.kernel.org/r/20210717160118.9855-1-jonat...@marek.ca [2] https://lore.kernel.org/r/20210719104918.GA6440@willie-the-truck [3] https://lore.kernel.org/r/camuhmdxshordox-xxaeufdw3wx2peggfsqhvshvznkcgk-y...@mail.gmail.com/ [4] https://lore.kernel.org/r/8b972f1c03fb6bd59953035f0a3e4d26659de4f8.1620795204.git.christophe.le...@csgroup.eu/ Cc: Ard Biesheuvel Cc: Michael Ellerman Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Christophe Leroy Cc: Paul Mackerras Cc: Jonathan Marek Cc: Catalin Marinas Cc: Andrew Morton Cc: Nicholas Piggin Cc: Mark Rutland Cc: Geert Uytterhoeven Cc: Marc Zyngier Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-arm-ker...@lists.infradead.org --->8 Jonathan Marek (1): Revert "mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge" Will Deacon (1): Revert "powerpc/8xx: add support for huge pages on VMAP and VMALLOC" arch/arm64/mm/mmu.c | 20 - arch/powerpc/Kconfig | 2 +- arch/powerpc/include/asm/nohash/32/mmu-8xx.h | 43 arch/x86/mm/pgtable.c| 34 +++- include/linux/pgtable.h | 26 +--- 5 files changed, 25 insertions(+), 100 deletions(-) Acked-by: Marc Zyngier M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v10 0/4] this patchset is to remove PPCisms for QEIC
On 01/11/17 17:09, Thomas Gleixner wrote: > On Wed, 1 Nov 2017, Qiang Zhao wrote: >> Michael Ellerman wrote >>> >>> Qiang Zhao writes: >>> Hi all, Could anybody review this patchset and take action on them? Thank you! >>> >>> Who maintains this? I don't actually know, it's not powerpc code, or is it? >> >> Yes, it's not powerpc code, it is irqchip code, maintained by Thomas, Jason >> and Marc according to MAINTAINERS file. >> >> Hi Thomas, Jason and Marc, >> >> Could you keep an eye on this patchset? Thank you! > > It's on my radar, but I have zero capacity at the moment. Hopefully Marc > can spare a few cycles. I'll try, but I haven't been cc-ed on that one. I'll try to dig it out. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v6 2/4] irqchip/qeic: merge qeic init code from platforms to a common function
On 28/09/16 04:25, Zhao Qiang wrote: > The codes of qe_ic init from a variety of platforms are redundant, > merge them to a common function and put it to irqchip/irq-qeic.c > > For non-p1021_mds mpc85xx_mds boards, use "qe_ic_init(np, 0, > qe_ic_cascade_low_mpic, qe_ic_cascade_high_mpic);" instead of > "qe_ic_init(np, 0, qe_ic_cascade_muxed_mpic, NULL);". > > qe_ic_cascade_muxed_mpic was used for boards has the same interrupt > number for low interrupt and high interrupt, qe_ic_init has checked > if "low interrupt == high interrupt" > > Signed-off-by: Zhao Qiang > --- > Changes for v2: > - modify subject and commit msg > - add check for qeic by type > Changes for v3: > - na > Changes for v4: > - na > Changes for v5: > - na > Changes for v6: > - rebase > > arch/powerpc/platforms/83xx/misc.c| 15 --- > arch/powerpc/platforms/85xx/corenet_generic.c | 9 - > arch/powerpc/platforms/85xx/mpc85xx_mds.c | 14 -- > arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 16 > arch/powerpc/platforms/85xx/twr_p102x.c | 14 -- > drivers/irqchip/irq-qeic.c| 16 > 6 files changed, 16 insertions(+), 68 deletions(-) > [...] > --- a/drivers/irqchip/irq-qeic.c > +++ b/drivers/irqchip/irq-qeic.c > @@ -598,4 +598,20 @@ static int __init init_qe_ic_sysfs(void) > return 0; > } > > +static int __init qeic_of_init(void) > +{ > + struct device_node *np; > + > + np = of_find_compatible_node(NULL, NULL, "fsl,qe-ic"); > + if (!np) { > + np = of_find_node_by_type(NULL, "qeic"); > + if (!np) > + return; > + } > + qe_ic_init(np, 0, qe_ic_cascade_low_mpic, > +qe_ic_cascade_high_mpic); > + of_node_put(np); > +} Have you actually compiled that code? Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v6 2/4] irqchip/qeic: merge qeic init code from platforms to a common function
On 16/12/16 08:43, Qiang Zhao wrote: > On 16/12/16 04:33, Marc Zyngier wrote: > >> -Original Message----- >> From: Marc Zyngier [mailto:marc.zyng...@arm.com] >> Sent: Friday, December 16, 2016 4:33 PM >> To: Qiang Zhao ; o...@buserror.net; t...@linutronix.de >> Cc: ja...@lakedaemon.net; Xiaobo Xie ; linux- >> ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org >> Subject: Re: [PATCH v6 2/4] irqchip/qeic: merge qeic init code from >> platforms to >> a common function >> >> On 28/09/16 04:25, Zhao Qiang wrote: >>> The codes of qe_ic init from a variety of platforms are redundant, >>> merge them to a common function and put it to irqchip/irq-qeic.c >>> >>> For non-p1021_mds mpc85xx_mds boards, use "qe_ic_init(np, 0, >>> qe_ic_cascade_low_mpic, qe_ic_cascade_high_mpic);" instead of >>> "qe_ic_init(np, 0, qe_ic_cascade_muxed_mpic, NULL);". >>> >>> qe_ic_cascade_muxed_mpic was used for boards has the same interrupt >>> number for low interrupt and high interrupt, qe_ic_init has checked if >>> "low interrupt == high interrupt" >>> >>> Signed-off-by: Zhao Qiang >>> --- >>> Changes for v2: >>> - modify subject and commit msg >>> - add check for qeic by type >>> Changes for v3: >>> - na >>> Changes for v4: >>> - na >>> Changes for v5: >>> - na >>> Changes for v6: >>> - rebase >>> >>> arch/powerpc/platforms/83xx/misc.c| 15 --- >>> arch/powerpc/platforms/85xx/corenet_generic.c | 9 - >>> arch/powerpc/platforms/85xx/mpc85xx_mds.c | 14 -- >>> arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 16 >>> arch/powerpc/platforms/85xx/twr_p102x.c | 14 -- >>> drivers/irqchip/irq-qeic.c| 16 >>> 6 files changed, 16 insertions(+), 68 deletions(-) >>> >> >> [...] >> >>> --- a/drivers/irqchip/irq-qeic.c >>> +++ b/drivers/irqchip/irq-qeic.c >>> @@ -598,4 +598,20 @@ static int __init init_qe_ic_sysfs(void) >>> return 0; >>> } >>> >>> +static int __init qeic_of_init(void) >>> +{ >>> + struct device_node *np; >>> + >>> + np = of_find_compatible_node(NULL, NULL, "fsl,qe-ic"); >>> + if (!np) { >>> + np = of_find_node_by_type(NULL, "qeic"); >>> + if (!np) >>> + return; >>> + } >>> + qe_ic_init(np, 0, qe_ic_cascade_low_mpic, >>> + qe_ic_cascade_high_mpic); >>> + of_node_put(np); >>> +} >> >> Have you actually compiled that code? > > Yes. And the abundance of warnings that the compiler certainly spits doesn't strike you as an issue that should be addressed before posting the patches? Thanks, M. -- Jazz is not dead. It just smells funny...