Re: [PATCH 2/3] powerpc/pci: unmap legacy INTx interrupts of passthrough IO adapters

2020-06-10 Thread Cédric Le Goater
On 5/27/20 2:57 AM, Oliver O'Halloran wrote:
> On Wed, Apr 29, 2020 at 5:51 PM Cédric Le Goater  wrote:
>>
>> When a passthrough IO adapter is removed from a pseries machine using
>> hash MMU and the XIVE interrupt mode, the POWER hypervisor, pHyp,
>> expects the guest OS to have cleared all page table entries related to
>> the adapter. If some are still present, the RTAS call which isolates
>> the PCI slot returns error 9001 "valid outstanding translations" and
>> the removal of the IO adapter fails.
>>
>> INTx interrupt numbers need special care because Linux maps the
>> interrupts automatically in the Linux interrupt number space if they
>> are presented in the device tree node describing the IO adapter. These
>> interrupts are not un-mapped automatically and in case of an hot-plug
>> adapter, the PCI hot-plug layer needs to handle the cleanup to make
>> sure that all the page table entries of the XIVE ESB pages are
>> cleared.
>>
>> Cc: "Oliver O'Halloran" 
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/kernel/pci-hotplug.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/pci-hotplug.c 
>> b/arch/powerpc/kernel/pci-hotplug.c
>> index bf83f76563a3..9e9c6befd7ea 100644
>> --- a/arch/powerpc/kernel/pci-hotplug.c
>> +++ b/arch/powerpc/kernel/pci-hotplug.c
>> @@ -57,6 +57,8 @@ void pcibios_release_device(struct pci_dev *dev)
>> struct pci_controller *phb = pci_bus_to_host(dev->bus);
>> struct pci_dn *pdn = pci_get_pdn(dev);
>>
>> +   irq_dispose_mapping(dev->irq);
> 
> What does the original mapping? Powerpc arch code or the PCI core?
> Tearing down the mapping in pcibios_release_device() seems a bit fishy
> to me since the PCI core has already torn down the device state at
> that point. If the release is delayed it's possible that another
> pci_dev has mapped the IRQ before we get here, but maybe that's ok.

How's that below ? INTx mappings are cleared only when the PHB is removed.
It applies to all platforms but we could limit the removal to PHB hotplug
on pseries. 

C.


>From 10794159567552355f87e86e24002641c54e7ab5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= 
Date: Wed, 10 Jun 2020 19:55:24 +0200
Subject: [PATCH] powerpc/pci: unmap legacy INTx interrupts of passthrough IO
 adapters
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When a passthrough IO adapter is removed from a pseries machine using
hash MMU and the XIVE interrupt mode, the POWER hypervisor, pHyp,
expects the guest OS to have cleared all page table entries related to
the adapter. If some are still present, the RTAS call which isolates
the PCI slot returns error 9001 "valid outstanding translations" and
the removal of the IO adapter fails.

INTx interrupt numbers need special care because Linux maps the
interrupts automatically in the Linux interrupt number space. These
interrupts are not un-mapped automatically and in case of an hot-plug
adapter, the PCI hot-plug layer needs to handle the cleanup to make
sure that all the page table entries of the ESB pages are cleared.

Signed-off-by: Cédric Le Goater 
---
 arch/powerpc/include/asm/pci-bridge.h |  4 +++
 arch/powerpc/kernel/pci-common.c  | 47 +++
 2 files changed, 51 insertions(+)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index b92e81b256e5..9960dd249079 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -48,6 +48,8 @@ struct pci_controller_ops {
 
 /*
  * Structure of a PCI controller (host bridge)
+ *
+ * @intx: legacy INTx mappings
  */
 struct pci_controller {
struct pci_bus *bus;
@@ -127,6 +129,8 @@ struct pci_controller {
 
void *private_data;
struct npu *npu;
+
+   unsigned int intx[PCI_NUM_INTX];
 };
 
 /* These are used for config access before all the PCI probing
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index be108616a721..795a9b49e0d6 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -353,6 +353,48 @@ struct pci_controller *pci_find_controller_for_domain(int 
domain_nr)
return NULL;
 }
 
+static void pci_intx_register(struct pci_dev *pdev, int virq)
+{
+   struct pci_controller *phb = pci_bus_to_host(pdev->bus);
+   int i;
+
+   for (i = 0; i < PCI_NUM_INTX; i++) {
+   /*
+* Look for an empty or an equivalent slot, IRQs can be
+* shared
+*/
+   if (phb->intx[i] == virq || !phb->intx[i]) {
+   phb->intx[i] = virq;
+   break;
+   }
+   }
+
+   if (i == PCI_NUM_INTX)
+   pr_err("PCI:%s INTx all mapped\n", pci_name(pdev));
+}
+
+/*
+ * Clearing the mapped INTx interrupts will also clear the underlying
+ * mappings of the ESB pages of the interrupts when under 

Re: [PATCH 2/3] powerpc/pci: unmap legacy INTx interrupts of passthrough IO adapters

2020-05-28 Thread Michael Ellerman
Cédric Le Goater  writes:
> On 4/29/20 9:51 AM, Cédric Le Goater wrote:
>> When a passthrough IO adapter is removed from a pseries machine using
>> hash MMU and the XIVE interrupt mode, the POWER hypervisor, pHyp,
>> expects the guest OS to have cleared all page table entries related to
>> the adapter. If some are still present, the RTAS call which isolates
>> the PCI slot returns error 9001 "valid outstanding translations" and
>> the removal of the IO adapter fails.
>> 
>> INTx interrupt numbers need special care because Linux maps the
>> interrupts automatically in the Linux interrupt number space if they
>> are presented in the device tree node describing the IO adapter. These
>> interrupts are not un-mapped automatically and in case of an hot-plug
>> adapter, the PCI hot-plug layer needs to handle the cleanup to make
>> sure that all the page table entries of the XIVE ESB pages are
>> cleared.
>
> It seems this patch needs more digging to make sure we are handling
> the IRQ unmapping in the correct PCI handler. Could you please keep
> it back for the moment ? 

Yep no worries.

cheers


Re: [PATCH 2/3] powerpc/pci: unmap legacy INTx interrupts of passthrough IO adapters

2020-05-27 Thread Cédric Le Goater
Hello Michael,

On 4/29/20 9:51 AM, Cédric Le Goater wrote:
> When a passthrough IO adapter is removed from a pseries machine using
> hash MMU and the XIVE interrupt mode, the POWER hypervisor, pHyp,
> expects the guest OS to have cleared all page table entries related to
> the adapter. If some are still present, the RTAS call which isolates
> the PCI slot returns error 9001 "valid outstanding translations" and
> the removal of the IO adapter fails.
> 
> INTx interrupt numbers need special care because Linux maps the
> interrupts automatically in the Linux interrupt number space if they
> are presented in the device tree node describing the IO adapter. These
> interrupts are not un-mapped automatically and in case of an hot-plug
> adapter, the PCI hot-plug layer needs to handle the cleanup to make
> sure that all the page table entries of the XIVE ESB pages are
> cleared.

It seems this patch needs more digging to make sure we are handling
the IRQ unmapping in the correct PCI handler. Could you please keep
it back for the moment ? 

Thanks,

C.


Re: [PATCH 2/3] powerpc/pci: unmap legacy INTx interrupts of passthrough IO adapters

2020-05-27 Thread Cédric Le Goater
On 5/27/20 2:57 AM, Oliver O'Halloran wrote:
> On Wed, Apr 29, 2020 at 5:51 PM Cédric Le Goater  wrote:
>>
>> When a passthrough IO adapter is removed from a pseries machine using
>> hash MMU and the XIVE interrupt mode, the POWER hypervisor, pHyp,
>> expects the guest OS to have cleared all page table entries related to
>> the adapter. If some are still present, the RTAS call which isolates
>> the PCI slot returns error 9001 "valid outstanding translations" and
>> the removal of the IO adapter fails.
>>
>> INTx interrupt numbers need special care because Linux maps the
>> interrupts automatically in the Linux interrupt number space if they
>> are presented in the device tree node describing the IO adapter. These
>> interrupts are not un-mapped automatically and in case of an hot-plug
>> adapter, the PCI hot-plug layer needs to handle the cleanup to make
>> sure that all the page table entries of the XIVE ESB pages are
>> cleared.
>>
>> Cc: "Oliver O'Halloran" 
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/kernel/pci-hotplug.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/pci-hotplug.c 
>> b/arch/powerpc/kernel/pci-hotplug.c
>> index bf83f76563a3..9e9c6befd7ea 100644
>> --- a/arch/powerpc/kernel/pci-hotplug.c
>> +++ b/arch/powerpc/kernel/pci-hotplug.c
>> @@ -57,6 +57,8 @@ void pcibios_release_device(struct pci_dev *dev)
>> struct pci_controller *phb = pci_bus_to_host(dev->bus);
>> struct pci_dn *pdn = pci_get_pdn(dev);
>>
>> +   irq_dispose_mapping(dev->irq);
> 
> What does the original mapping? Powerpc arch code or the PCI core?

Powerpc. In pci_read_irq_line() when a device is added.

> Tearing down the mapping in pcibios_release_device() seems a bit fishy
> to me since the PCI core has already torn down the device state at
> that point. If the release is delayed it's possible that another
> pci_dev has mapped the IRQ before we get here, but maybe that's ok.

Which scenario would that be ? multiple devices mapping the same INTx
interrupt because all are used already ? 

Where should we drop the mapping ?

Thanks,

C.

>> +
>> eeh_remove_device(dev);
>>
>> if (phb->controller_ops.release_device)
>> --
>> 2.25.4
>>



Re: [PATCH 2/3] powerpc/pci: unmap legacy INTx interrupts of passthrough IO adapters

2020-05-26 Thread Oliver O'Halloran
On Wed, Apr 29, 2020 at 5:51 PM Cédric Le Goater  wrote:
>
> When a passthrough IO adapter is removed from a pseries machine using
> hash MMU and the XIVE interrupt mode, the POWER hypervisor, pHyp,
> expects the guest OS to have cleared all page table entries related to
> the adapter. If some are still present, the RTAS call which isolates
> the PCI slot returns error 9001 "valid outstanding translations" and
> the removal of the IO adapter fails.
>
> INTx interrupt numbers need special care because Linux maps the
> interrupts automatically in the Linux interrupt number space if they
> are presented in the device tree node describing the IO adapter. These
> interrupts are not un-mapped automatically and in case of an hot-plug
> adapter, the PCI hot-plug layer needs to handle the cleanup to make
> sure that all the page table entries of the XIVE ESB pages are
> cleared.
>
> Cc: "Oliver O'Halloran" 
> Signed-off-by: Cédric Le Goater 
> ---
>  arch/powerpc/kernel/pci-hotplug.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/kernel/pci-hotplug.c 
> b/arch/powerpc/kernel/pci-hotplug.c
> index bf83f76563a3..9e9c6befd7ea 100644
> --- a/arch/powerpc/kernel/pci-hotplug.c
> +++ b/arch/powerpc/kernel/pci-hotplug.c
> @@ -57,6 +57,8 @@ void pcibios_release_device(struct pci_dev *dev)
> struct pci_controller *phb = pci_bus_to_host(dev->bus);
> struct pci_dn *pdn = pci_get_pdn(dev);
>
> +   irq_dispose_mapping(dev->irq);

What does the original mapping? Powerpc arch code or the PCI core?
Tearing down the mapping in pcibios_release_device() seems a bit fishy
to me since the PCI core has already torn down the device state at
that point. If the release is delayed it's possible that another
pci_dev has mapped the IRQ before we get here, but maybe that's ok.

> +
> eeh_remove_device(dev);
>
> if (phb->controller_ops.release_device)
> --
> 2.25.4
>


Re: [PATCH 2/3] powerpc/pci: unmap legacy INTx interrupts of passthrough IO adapters

2020-05-21 Thread Cédric Le Goater
On 4/29/20 9:51 AM, Cédric Le Goater wrote:
> When a passthrough IO adapter is removed from a pseries machine using
> hash MMU and the XIVE interrupt mode, the POWER hypervisor, pHyp,
> expects the guest OS to have cleared all page table entries related to
> the adapter. If some are still present, the RTAS call which isolates
> the PCI slot returns error 9001 "valid outstanding translations" and
> the removal of the IO adapter fails.
> 
> INTx interrupt numbers need special care because Linux maps the
> interrupts automatically in the Linux interrupt number space if they
> are presented in the device tree node describing the IO adapter. These
> interrupts are not un-mapped automatically and in case of an hot-plug
> adapter, the PCI hot-plug layer needs to handle the cleanup to make
> sure that all the page table entries of the XIVE ESB pages are
> cleared.
> 
> Cc: "Oliver O'Halloran" 
> Signed-off-by: Cédric Le Goater 


We did some tests with differnt passthrough adapters under LPAPRs (PowerVM) 
and KVM guests P8 (HPT) and P9 (HPT+Radix). Baremetal behaves correctly. 
But I would feel more comfortable if someone could give a try (PATCH1-2) 
on a different system.


Thanks,

C. 


> ---
>  arch/powerpc/kernel/pci-hotplug.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/pci-hotplug.c 
> b/arch/powerpc/kernel/pci-hotplug.c
> index bf83f76563a3..9e9c6befd7ea 100644
> --- a/arch/powerpc/kernel/pci-hotplug.c
> +++ b/arch/powerpc/kernel/pci-hotplug.c
> @@ -57,6 +57,8 @@ void pcibios_release_device(struct pci_dev *dev)
>   struct pci_controller *phb = pci_bus_to_host(dev->bus);
>   struct pci_dn *pdn = pci_get_pdn(dev);
>  
> + irq_dispose_mapping(dev->irq);
> +
>   eeh_remove_device(dev);
>  
>   if (phb->controller_ops.release_device)
> 



[PATCH 2/3] powerpc/pci: unmap legacy INTx interrupts of passthrough IO adapters

2020-04-29 Thread Cédric Le Goater
When a passthrough IO adapter is removed from a pseries machine using
hash MMU and the XIVE interrupt mode, the POWER hypervisor, pHyp,
expects the guest OS to have cleared all page table entries related to
the adapter. If some are still present, the RTAS call which isolates
the PCI slot returns error 9001 "valid outstanding translations" and
the removal of the IO adapter fails.

INTx interrupt numbers need special care because Linux maps the
interrupts automatically in the Linux interrupt number space if they
are presented in the device tree node describing the IO adapter. These
interrupts are not un-mapped automatically and in case of an hot-plug
adapter, the PCI hot-plug layer needs to handle the cleanup to make
sure that all the page table entries of the XIVE ESB pages are
cleared.

Cc: "Oliver O'Halloran" 
Signed-off-by: Cédric Le Goater 
---
 arch/powerpc/kernel/pci-hotplug.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/pci-hotplug.c 
b/arch/powerpc/kernel/pci-hotplug.c
index bf83f76563a3..9e9c6befd7ea 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -57,6 +57,8 @@ void pcibios_release_device(struct pci_dev *dev)
struct pci_controller *phb = pci_bus_to_host(dev->bus);
struct pci_dn *pdn = pci_get_pdn(dev);
 
+   irq_dispose_mapping(dev->irq);
+
eeh_remove_device(dev);
 
if (phb->controller_ops.release_device)
-- 
2.25.4