Re: [Patch v4 21/23] x86/PCI: Refine the way to release PCI IRQ resources

2015-03-04 Thread Alex Williamson
On Wed, 2015-03-04 at 09:06 -0700, Alex Williamson wrote:
> Hi,
> 
> I'm getting a regression from this patch when using VFIO for device
> assignment to a QEMU VM.  I have a device initially bound to the nouveau
> driver, which is unbound from that driver and bound to vfio-pci for use
> by userspace.  vfio-pci calls pci_enable_device, but when userspace
> attempts to setup the legacy INTx interrupt, pci_dev->irq is zero and
> vfio-pci errors with -ENODEV.
> 
> Where is pci_dev->irq supposed to be getting re-initialized in this
> case?  Am I missing an important initialization step in vfio-pci?  I
> certainly don't think I should be calling request_irq(pci_dev->irq, ...
> when pci_dev->irq is zero.  Thanks,

It looks like the imbalance is in pci_dev->enable_cnt.  Any call to
pci_disable_device() will clear pci_dev->irq, but pci_enable_device()
does effectively nothing unless it's called with enable_cnt == zero.  So
I'm at the mercy of the previous driver behaving relatively sanely in
order to re-enable legacy interrupts for a device :-\  Maybe the unbound
notifier needs to do some sanitizing of the device?  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch v4 21/23] x86/PCI: Refine the way to release PCI IRQ resources

2015-03-04 Thread Alex Williamson
Hi,

I'm getting a regression from this patch when using VFIO for device
assignment to a QEMU VM.  I have a device initially bound to the nouveau
driver, which is unbound from that driver and bound to vfio-pci for use
by userspace.  vfio-pci calls pci_enable_device, but when userspace
attempts to setup the legacy INTx interrupt, pci_dev->irq is zero and
vfio-pci errors with -ENODEV.

Where is pci_dev->irq supposed to be getting re-initialized in this
case?  Am I missing an important initialization step in vfio-pci?  I
certainly don't think I should be calling request_irq(pci_dev->irq, ...
when pci_dev->irq is zero.  Thanks,

Alex


On Thu, 2015-02-05 at 13:44 +0800, Jiang Liu wrote:
> Some PCI device drivers assume that pci_dev->irq won't change after
> calling pci_disable_device() and pci_enable_device() during suspend and
> resume.
> 
> Commit c03b3b0738a5 ("x86, irq, mpparse: Release IOAPIC pin when
> PCI device is disabled") frees PCI IRQ resources when pci_disable_device()
> is called and reallocate IRQ resources when pci_enable_device() is
> called again. This breaks above assumption. So commit 3eec595235c1
> ("x86, irq, PCI: Keep IRQ assignment for PCI devices during
> suspend/hibernation") and 9eabc99a635a ("x86, irq, PCI: Keep IRQ
> assignment for runtime power management") fix the issue by avoiding
> freeing/reallocating IRQ resources during PCI device suspend/resume.
> They achieve this by checking dev.power.is_prepared and
> dev.power.runtime_status.  PM maintainer, Rafael, then pointed out that
> it's really an ugly fix which leaking PM internal state information to
> IRQ subsystem.
> 
> Recently David Vrabel  also reports an
> regression in pciback driver caused by commit cffe0a2b5a34 ("x86, irq:
> Keep balance of IOAPIC pin reference count"). Please refer to:
> http://lkml.org/lkml/2015/1/14/546
> 
> So this patch refine the way to release PCI IRQ resources. Instead of
> releasing PCI IRQ resources in pci_disable_device()/
> pcibios_disable_device(), we now release it at driver unbinding
> notification BUS_NOTIFY_UNBOUND_DRIVER. In other word, we only release
> PCI IRQ resources when there's no driver bound to the PCI device, and
> it keeps the assumption that pci_dev->irq won't through multiple
> invocation of pci_enable_device()/pci_disable_device().
> 
> Signed-off-by: Jiang Liu 
> ---
>  arch/x86/include/asm/pci_x86.h |2 --
>  arch/x86/pci/common.c  |   34 --
>  arch/x86/pci/intel_mid_pci.c   |4 ++--
>  arch/x86/pci/irq.c |   15 +--
>  drivers/acpi/pci_irq.c |9 +
>  5 files changed, 32 insertions(+), 32 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index 164e3f8d3c3d..fa1195dae425 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -93,8 +93,6 @@ extern raw_spinlock_t pci_config_lock;
>  extern int (*pcibios_enable_irq)(struct pci_dev *dev);
>  extern void (*pcibios_disable_irq)(struct pci_dev *dev);
>  
> -extern bool mp_should_keep_irq(struct device *dev);
> -
>  struct pci_raw_ops {
>   int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
>   int reg, int len, u32 *val);
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index 7b20bccf3648..ff1f0afa5ed1 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -497,6 +497,31 @@ void __init pcibios_set_cache_line_size(void)
>   }
>  }
>  
> +/*
> + * Some device drivers assume dev->irq won't change after calling
> + * pci_disable_device(). So delay releasing of IRQ resource to driver
> + * unbinding time. Otherwise it will break PM subsystem and drivers
> + * like xen-pciback etc.
> + */
> +static int pci_irq_notifier(struct notifier_block *nb, unsigned long action,
> + void *data)
> +{
> + struct pci_dev *dev = to_pci_dev(data);
> +
> + if (action != BUS_NOTIFY_UNBOUND_DRIVER)
> + return NOTIFY_DONE;
> +
> + if (pcibios_disable_irq)
> + pcibios_disable_irq(dev);
> +
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block pci_irq_nb = {
> + .notifier_call = pci_irq_notifier,
> + .priority = INT_MIN,
> +};
> +
>  int __init pcibios_init(void)
>  {
>   if (!raw_pci_ops) {
> @@ -509,6 +534,9 @@ int __init pcibios_init(void)
>  
>   if (pci_bf_sort >= pci_force_bf)
>   pci_sort_breadthfirst();
> +
> + bus_register_notifier(_bus_type, _irq_nb);
> +
>   return 0;
>  }
>  
> @@ -667,12 +695,6 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
>   return 0;
>  }
>  
> -void pcibios_disable_device (struct pci_dev *dev)
> -{
> - if (!pci_dev_msi_enabled(dev) && pcibios_disable_irq)
> - pcibios_disable_irq(dev);
> -}
> -
>  int pci_ext_cfg_avail(void)
>  {
>   if (raw_pci_ext_ops)
> diff --git a/arch/x86/pci/intel_mid_pci.c 

Re: [Patch v4 21/23] x86/PCI: Refine the way to release PCI IRQ resources

2015-03-04 Thread Alex Williamson
Hi,

I'm getting a regression from this patch when using VFIO for device
assignment to a QEMU VM.  I have a device initially bound to the nouveau
driver, which is unbound from that driver and bound to vfio-pci for use
by userspace.  vfio-pci calls pci_enable_device, but when userspace
attempts to setup the legacy INTx interrupt, pci_dev-irq is zero and
vfio-pci errors with -ENODEV.

Where is pci_dev-irq supposed to be getting re-initialized in this
case?  Am I missing an important initialization step in vfio-pci?  I
certainly don't think I should be calling request_irq(pci_dev-irq, ...
when pci_dev-irq is zero.  Thanks,

Alex


On Thu, 2015-02-05 at 13:44 +0800, Jiang Liu wrote:
 Some PCI device drivers assume that pci_dev-irq won't change after
 calling pci_disable_device() and pci_enable_device() during suspend and
 resume.
 
 Commit c03b3b0738a5 (x86, irq, mpparse: Release IOAPIC pin when
 PCI device is disabled) frees PCI IRQ resources when pci_disable_device()
 is called and reallocate IRQ resources when pci_enable_device() is
 called again. This breaks above assumption. So commit 3eec595235c1
 (x86, irq, PCI: Keep IRQ assignment for PCI devices during
 suspend/hibernation) and 9eabc99a635a (x86, irq, PCI: Keep IRQ
 assignment for runtime power management) fix the issue by avoiding
 freeing/reallocating IRQ resources during PCI device suspend/resume.
 They achieve this by checking dev.power.is_prepared and
 dev.power.runtime_status.  PM maintainer, Rafael, then pointed out that
 it's really an ugly fix which leaking PM internal state information to
 IRQ subsystem.
 
 Recently David Vrabel david.vra...@citrix.com also reports an
 regression in pciback driver caused by commit cffe0a2b5a34 (x86, irq:
 Keep balance of IOAPIC pin reference count). Please refer to:
 http://lkml.org/lkml/2015/1/14/546
 
 So this patch refine the way to release PCI IRQ resources. Instead of
 releasing PCI IRQ resources in pci_disable_device()/
 pcibios_disable_device(), we now release it at driver unbinding
 notification BUS_NOTIFY_UNBOUND_DRIVER. In other word, we only release
 PCI IRQ resources when there's no driver bound to the PCI device, and
 it keeps the assumption that pci_dev-irq won't through multiple
 invocation of pci_enable_device()/pci_disable_device().
 
 Signed-off-by: Jiang Liu jiang@linux.intel.com
 ---
  arch/x86/include/asm/pci_x86.h |2 --
  arch/x86/pci/common.c  |   34 --
  arch/x86/pci/intel_mid_pci.c   |4 ++--
  arch/x86/pci/irq.c |   15 +--
  drivers/acpi/pci_irq.c |9 +
  5 files changed, 32 insertions(+), 32 deletions(-)
 
 diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
 index 164e3f8d3c3d..fa1195dae425 100644
 --- a/arch/x86/include/asm/pci_x86.h
 +++ b/arch/x86/include/asm/pci_x86.h
 @@ -93,8 +93,6 @@ extern raw_spinlock_t pci_config_lock;
  extern int (*pcibios_enable_irq)(struct pci_dev *dev);
  extern void (*pcibios_disable_irq)(struct pci_dev *dev);
  
 -extern bool mp_should_keep_irq(struct device *dev);
 -
  struct pci_raw_ops {
   int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
   int reg, int len, u32 *val);
 diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
 index 7b20bccf3648..ff1f0afa5ed1 100644
 --- a/arch/x86/pci/common.c
 +++ b/arch/x86/pci/common.c
 @@ -497,6 +497,31 @@ void __init pcibios_set_cache_line_size(void)
   }
  }
  
 +/*
 + * Some device drivers assume dev-irq won't change after calling
 + * pci_disable_device(). So delay releasing of IRQ resource to driver
 + * unbinding time. Otherwise it will break PM subsystem and drivers
 + * like xen-pciback etc.
 + */
 +static int pci_irq_notifier(struct notifier_block *nb, unsigned long action,
 + void *data)
 +{
 + struct pci_dev *dev = to_pci_dev(data);
 +
 + if (action != BUS_NOTIFY_UNBOUND_DRIVER)
 + return NOTIFY_DONE;
 +
 + if (pcibios_disable_irq)
 + pcibios_disable_irq(dev);
 +
 + return NOTIFY_OK;
 +}
 +
 +static struct notifier_block pci_irq_nb = {
 + .notifier_call = pci_irq_notifier,
 + .priority = INT_MIN,
 +};
 +
  int __init pcibios_init(void)
  {
   if (!raw_pci_ops) {
 @@ -509,6 +534,9 @@ int __init pcibios_init(void)
  
   if (pci_bf_sort = pci_force_bf)
   pci_sort_breadthfirst();
 +
 + bus_register_notifier(pci_bus_type, pci_irq_nb);
 +
   return 0;
  }
  
 @@ -667,12 +695,6 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
   return 0;
  }
  
 -void pcibios_disable_device (struct pci_dev *dev)
 -{
 - if (!pci_dev_msi_enabled(dev)  pcibios_disable_irq)
 - pcibios_disable_irq(dev);
 -}
 -
  int pci_ext_cfg_avail(void)
  {
   if (raw_pci_ext_ops)
 diff --git a/arch/x86/pci/intel_mid_pci.c b/arch/x86/pci/intel_mid_pci.c
 index 44b9271580b5..95c2471f6819 100644
 --- 

Re: [Patch v4 21/23] x86/PCI: Refine the way to release PCI IRQ resources

2015-03-04 Thread Alex Williamson
On Wed, 2015-03-04 at 09:06 -0700, Alex Williamson wrote:
 Hi,
 
 I'm getting a regression from this patch when using VFIO for device
 assignment to a QEMU VM.  I have a device initially bound to the nouveau
 driver, which is unbound from that driver and bound to vfio-pci for use
 by userspace.  vfio-pci calls pci_enable_device, but when userspace
 attempts to setup the legacy INTx interrupt, pci_dev-irq is zero and
 vfio-pci errors with -ENODEV.
 
 Where is pci_dev-irq supposed to be getting re-initialized in this
 case?  Am I missing an important initialization step in vfio-pci?  I
 certainly don't think I should be calling request_irq(pci_dev-irq, ...
 when pci_dev-irq is zero.  Thanks,

It looks like the imbalance is in pci_dev-enable_cnt.  Any call to
pci_disable_device() will clear pci_dev-irq, but pci_enable_device()
does effectively nothing unless it's called with enable_cnt == zero.  So
I'm at the mercy of the previous driver behaving relatively sanely in
order to re-enable legacy interrupts for a device :-\  Maybe the unbound
notifier needs to do some sanitizing of the device?  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/