Re: [Patch v4 21/23] x86/PCI: Refine the way to release PCI IRQ resources
On Wed, 2015-03-04 at 09:06 -0700, Alex Williamson wrote: > Hi, > > I'm getting a regression from this patch when using VFIO for device > assignment to a QEMU VM. I have a device initially bound to the nouveau > driver, which is unbound from that driver and bound to vfio-pci for use > by userspace. vfio-pci calls pci_enable_device, but when userspace > attempts to setup the legacy INTx interrupt, pci_dev->irq is zero and > vfio-pci errors with -ENODEV. > > Where is pci_dev->irq supposed to be getting re-initialized in this > case? Am I missing an important initialization step in vfio-pci? I > certainly don't think I should be calling request_irq(pci_dev->irq, ... > when pci_dev->irq is zero. Thanks, It looks like the imbalance is in pci_dev->enable_cnt. Any call to pci_disable_device() will clear pci_dev->irq, but pci_enable_device() does effectively nothing unless it's called with enable_cnt == zero. So I'm at the mercy of the previous driver behaving relatively sanely in order to re-enable legacy interrupts for a device :-\ Maybe the unbound notifier needs to do some sanitizing of the device? Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch v4 21/23] x86/PCI: Refine the way to release PCI IRQ resources
Hi, I'm getting a regression from this patch when using VFIO for device assignment to a QEMU VM. I have a device initially bound to the nouveau driver, which is unbound from that driver and bound to vfio-pci for use by userspace. vfio-pci calls pci_enable_device, but when userspace attempts to setup the legacy INTx interrupt, pci_dev->irq is zero and vfio-pci errors with -ENODEV. Where is pci_dev->irq supposed to be getting re-initialized in this case? Am I missing an important initialization step in vfio-pci? I certainly don't think I should be calling request_irq(pci_dev->irq, ... when pci_dev->irq is zero. Thanks, Alex On Thu, 2015-02-05 at 13:44 +0800, Jiang Liu wrote: > Some PCI device drivers assume that pci_dev->irq won't change after > calling pci_disable_device() and pci_enable_device() during suspend and > resume. > > Commit c03b3b0738a5 ("x86, irq, mpparse: Release IOAPIC pin when > PCI device is disabled") frees PCI IRQ resources when pci_disable_device() > is called and reallocate IRQ resources when pci_enable_device() is > called again. This breaks above assumption. So commit 3eec595235c1 > ("x86, irq, PCI: Keep IRQ assignment for PCI devices during > suspend/hibernation") and 9eabc99a635a ("x86, irq, PCI: Keep IRQ > assignment for runtime power management") fix the issue by avoiding > freeing/reallocating IRQ resources during PCI device suspend/resume. > They achieve this by checking dev.power.is_prepared and > dev.power.runtime_status. PM maintainer, Rafael, then pointed out that > it's really an ugly fix which leaking PM internal state information to > IRQ subsystem. > > Recently David Vrabel also reports an > regression in pciback driver caused by commit cffe0a2b5a34 ("x86, irq: > Keep balance of IOAPIC pin reference count"). Please refer to: > http://lkml.org/lkml/2015/1/14/546 > > So this patch refine the way to release PCI IRQ resources. Instead of > releasing PCI IRQ resources in pci_disable_device()/ > pcibios_disable_device(), we now release it at driver unbinding > notification BUS_NOTIFY_UNBOUND_DRIVER. In other word, we only release > PCI IRQ resources when there's no driver bound to the PCI device, and > it keeps the assumption that pci_dev->irq won't through multiple > invocation of pci_enable_device()/pci_disable_device(). > > Signed-off-by: Jiang Liu > --- > arch/x86/include/asm/pci_x86.h |2 -- > arch/x86/pci/common.c | 34 -- > arch/x86/pci/intel_mid_pci.c |4 ++-- > arch/x86/pci/irq.c | 15 +-- > drivers/acpi/pci_irq.c |9 + > 5 files changed, 32 insertions(+), 32 deletions(-) > > diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h > index 164e3f8d3c3d..fa1195dae425 100644 > --- a/arch/x86/include/asm/pci_x86.h > +++ b/arch/x86/include/asm/pci_x86.h > @@ -93,8 +93,6 @@ extern raw_spinlock_t pci_config_lock; > extern int (*pcibios_enable_irq)(struct pci_dev *dev); > extern void (*pcibios_disable_irq)(struct pci_dev *dev); > > -extern bool mp_should_keep_irq(struct device *dev); > - > struct pci_raw_ops { > int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn, > int reg, int len, u32 *val); > diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c > index 7b20bccf3648..ff1f0afa5ed1 100644 > --- a/arch/x86/pci/common.c > +++ b/arch/x86/pci/common.c > @@ -497,6 +497,31 @@ void __init pcibios_set_cache_line_size(void) > } > } > > +/* > + * Some device drivers assume dev->irq won't change after calling > + * pci_disable_device(). So delay releasing of IRQ resource to driver > + * unbinding time. Otherwise it will break PM subsystem and drivers > + * like xen-pciback etc. > + */ > +static int pci_irq_notifier(struct notifier_block *nb, unsigned long action, > + void *data) > +{ > + struct pci_dev *dev = to_pci_dev(data); > + > + if (action != BUS_NOTIFY_UNBOUND_DRIVER) > + return NOTIFY_DONE; > + > + if (pcibios_disable_irq) > + pcibios_disable_irq(dev); > + > + return NOTIFY_OK; > +} > + > +static struct notifier_block pci_irq_nb = { > + .notifier_call = pci_irq_notifier, > + .priority = INT_MIN, > +}; > + > int __init pcibios_init(void) > { > if (!raw_pci_ops) { > @@ -509,6 +534,9 @@ int __init pcibios_init(void) > > if (pci_bf_sort >= pci_force_bf) > pci_sort_breadthfirst(); > + > + bus_register_notifier(_bus_type, _irq_nb); > + > return 0; > } > > @@ -667,12 +695,6 @@ int pcibios_enable_device(struct pci_dev *dev, int mask) > return 0; > } > > -void pcibios_disable_device (struct pci_dev *dev) > -{ > - if (!pci_dev_msi_enabled(dev) && pcibios_disable_irq) > - pcibios_disable_irq(dev); > -} > - > int pci_ext_cfg_avail(void) > { > if (raw_pci_ext_ops) > diff --git a/arch/x86/pci/intel_mid_pci.c
Re: [Patch v4 21/23] x86/PCI: Refine the way to release PCI IRQ resources
Hi, I'm getting a regression from this patch when using VFIO for device assignment to a QEMU VM. I have a device initially bound to the nouveau driver, which is unbound from that driver and bound to vfio-pci for use by userspace. vfio-pci calls pci_enable_device, but when userspace attempts to setup the legacy INTx interrupt, pci_dev-irq is zero and vfio-pci errors with -ENODEV. Where is pci_dev-irq supposed to be getting re-initialized in this case? Am I missing an important initialization step in vfio-pci? I certainly don't think I should be calling request_irq(pci_dev-irq, ... when pci_dev-irq is zero. Thanks, Alex On Thu, 2015-02-05 at 13:44 +0800, Jiang Liu wrote: Some PCI device drivers assume that pci_dev-irq won't change after calling pci_disable_device() and pci_enable_device() during suspend and resume. Commit c03b3b0738a5 (x86, irq, mpparse: Release IOAPIC pin when PCI device is disabled) frees PCI IRQ resources when pci_disable_device() is called and reallocate IRQ resources when pci_enable_device() is called again. This breaks above assumption. So commit 3eec595235c1 (x86, irq, PCI: Keep IRQ assignment for PCI devices during suspend/hibernation) and 9eabc99a635a (x86, irq, PCI: Keep IRQ assignment for runtime power management) fix the issue by avoiding freeing/reallocating IRQ resources during PCI device suspend/resume. They achieve this by checking dev.power.is_prepared and dev.power.runtime_status. PM maintainer, Rafael, then pointed out that it's really an ugly fix which leaking PM internal state information to IRQ subsystem. Recently David Vrabel david.vra...@citrix.com also reports an regression in pciback driver caused by commit cffe0a2b5a34 (x86, irq: Keep balance of IOAPIC pin reference count). Please refer to: http://lkml.org/lkml/2015/1/14/546 So this patch refine the way to release PCI IRQ resources. Instead of releasing PCI IRQ resources in pci_disable_device()/ pcibios_disable_device(), we now release it at driver unbinding notification BUS_NOTIFY_UNBOUND_DRIVER. In other word, we only release PCI IRQ resources when there's no driver bound to the PCI device, and it keeps the assumption that pci_dev-irq won't through multiple invocation of pci_enable_device()/pci_disable_device(). Signed-off-by: Jiang Liu jiang@linux.intel.com --- arch/x86/include/asm/pci_x86.h |2 -- arch/x86/pci/common.c | 34 -- arch/x86/pci/intel_mid_pci.c |4 ++-- arch/x86/pci/irq.c | 15 +-- drivers/acpi/pci_irq.c |9 + 5 files changed, 32 insertions(+), 32 deletions(-) diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h index 164e3f8d3c3d..fa1195dae425 100644 --- a/arch/x86/include/asm/pci_x86.h +++ b/arch/x86/include/asm/pci_x86.h @@ -93,8 +93,6 @@ extern raw_spinlock_t pci_config_lock; extern int (*pcibios_enable_irq)(struct pci_dev *dev); extern void (*pcibios_disable_irq)(struct pci_dev *dev); -extern bool mp_should_keep_irq(struct device *dev); - struct pci_raw_ops { int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn, int reg, int len, u32 *val); diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c index 7b20bccf3648..ff1f0afa5ed1 100644 --- a/arch/x86/pci/common.c +++ b/arch/x86/pci/common.c @@ -497,6 +497,31 @@ void __init pcibios_set_cache_line_size(void) } } +/* + * Some device drivers assume dev-irq won't change after calling + * pci_disable_device(). So delay releasing of IRQ resource to driver + * unbinding time. Otherwise it will break PM subsystem and drivers + * like xen-pciback etc. + */ +static int pci_irq_notifier(struct notifier_block *nb, unsigned long action, + void *data) +{ + struct pci_dev *dev = to_pci_dev(data); + + if (action != BUS_NOTIFY_UNBOUND_DRIVER) + return NOTIFY_DONE; + + if (pcibios_disable_irq) + pcibios_disable_irq(dev); + + return NOTIFY_OK; +} + +static struct notifier_block pci_irq_nb = { + .notifier_call = pci_irq_notifier, + .priority = INT_MIN, +}; + int __init pcibios_init(void) { if (!raw_pci_ops) { @@ -509,6 +534,9 @@ int __init pcibios_init(void) if (pci_bf_sort = pci_force_bf) pci_sort_breadthfirst(); + + bus_register_notifier(pci_bus_type, pci_irq_nb); + return 0; } @@ -667,12 +695,6 @@ int pcibios_enable_device(struct pci_dev *dev, int mask) return 0; } -void pcibios_disable_device (struct pci_dev *dev) -{ - if (!pci_dev_msi_enabled(dev) pcibios_disable_irq) - pcibios_disable_irq(dev); -} - int pci_ext_cfg_avail(void) { if (raw_pci_ext_ops) diff --git a/arch/x86/pci/intel_mid_pci.c b/arch/x86/pci/intel_mid_pci.c index 44b9271580b5..95c2471f6819 100644 ---
Re: [Patch v4 21/23] x86/PCI: Refine the way to release PCI IRQ resources
On Wed, 2015-03-04 at 09:06 -0700, Alex Williamson wrote: Hi, I'm getting a regression from this patch when using VFIO for device assignment to a QEMU VM. I have a device initially bound to the nouveau driver, which is unbound from that driver and bound to vfio-pci for use by userspace. vfio-pci calls pci_enable_device, but when userspace attempts to setup the legacy INTx interrupt, pci_dev-irq is zero and vfio-pci errors with -ENODEV. Where is pci_dev-irq supposed to be getting re-initialized in this case? Am I missing an important initialization step in vfio-pci? I certainly don't think I should be calling request_irq(pci_dev-irq, ... when pci_dev-irq is zero. Thanks, It looks like the imbalance is in pci_dev-enable_cnt. Any call to pci_disable_device() will clear pci_dev-irq, but pci_enable_device() does effectively nothing unless it's called with enable_cnt == zero. So I'm at the mercy of the previous driver behaving relatively sanely in order to re-enable legacy interrupts for a device :-\ Maybe the unbound notifier needs to do some sanitizing of the device? Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/